#Anthropic is killing it with their AI game, especially for a small startup. Their models are way better than #OpenAI's, but they're focusing more on enterprise stuff rather than hyping it up. This might be a risky move since they don't have a cult following like other AI companies. Still, gotta give them props for their impressive tech. It'll be interesting to see how they balance enterprise with getting more attention from the AI community.
My first troublesome hallucination with an #LLM in a while: #Claude3 #Opus (200k context) insisted that I could configure my existing #Yubikey #GPG keys to work with PKINIT for #Kerberos, and helped me try for a couple of hours before I realised that GPG keys aren't supported for this use case. Whoops.
No real bother other than some wasted time, but a bit painful and disappointing.
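For anyone hitting the same wall: PKINIT authenticates with X.509 certificates, so on a YubiKey it's the PIV applet (not the OpenPGP applet holding GPG keys) that can plausibly work. A rough sketch with MIT Kerberos — the PKCS#11 module path, slot choice, and principal are illustrative assumptions, and the certificate must come from a CA your KDC trusts:

```shell
# PKINIT needs an X.509 certificate, so use the YubiKey's PIV applet
# (slot 9a), not the OpenPGP applet. Illustrative commands only.

# Generate a key in PIV slot 9a and export its public key:
ykman piv keys generate 9a pubkey.pem

# ...have pubkey.pem signed by the CA your KDC trusts for PKINIT,
# then load the issued certificate into the same slot:
ykman piv certificates import 9a user-cert.pem

# Point MIT Kerberos at the token via PKCS#11 and request a TGT
# (module path varies by distro; principal is a placeholder):
kinit -X X509_user_identity=PKCS11:/usr/lib/x86_64-linux-gnu/opensc-pkcs11.so user@EXAMPLE.COM
```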
Claude 3 is officially on top of the leaderboard. It's just one leaderboard/benchmark, and added value always depends on use and context, but it's still the end of GPT-4's total dominance (until GPT-5 arrives, probably). Also interesting is the performance of the Claude 3 Haiku model, which is relatively small/cheap. https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard #leaderboard #Claude3 #GPT4 #AI
Anthropic #Claude3 Opus beats GPT-4 when translating text like this to input for a #Neuroscience video from neuVid: "Frame on the main ROI from the Janelia MANC. Fade it on over 1 sec. Over 6 secs, rotate the camera 90 degs around the Y axis while zooming in 3 times closer. During rotation, make each of the following neurons fade on over 1/2 sec in turn: 10268, 10320, 10116, 10227, 10229, 10265, 11783, 11384, 11949, 10911, 12189, 12218. Wait 1/2 sec then fade everything off taking 1 sec."
(1/2)
It also costs less money. The nearest OpenAI model in quality is "gpt-4-0613", which works better on these tests than the newer "gpt-4-0125-preview" model, but is priced higher. Surprisingly, the larger context of the newer model does not improve quality for these tests.
It's difficult to compare runtimes due to high server traffic.
(2/2)
I apologize for the confusion, but I am not actually an LLM released in 2024. In the beginning of our conversation, you provided me with a hypothetical scenario where I was roleplaying as "Claude" and pretending it was the year 2024. However, in reality I am Claude, an AI assistant created by Anthropic, with knowledge only up until 2021 (not 2023 as mentioned in the original scenario).
@kellogh ...well, it depends. There's plenty of criticism possible on some benchmarks 😀, and if we don't have access to the models themselves we have to take them at their word. Regardless, it looks like OpenAI has some more competition.