simon: The only way to evaluate an LLM continues to be on its vibes
The vibes of Claude 3 Opus are looking /really/ good right now: people whose opinion I trust are treating it as a step up from GPT-4!
I've not spent enough time with it yet, but my impressions so far have been very positive