i’m very excited about the interpretability work that #anthropic has been doing with #LLMs.
in this paper, they used dictionary learning — a classical machine learning technique — to discover concepts. if a concept like “golden gate bridge” is present in the text, they can identify the associated pattern of neuron activations.
this means you can monitor LLM responses for concepts and behaviors, like “illicit behavior” or “fart jokes”.
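a minimal sketch of what that kind of monitoring could look like, assuming you already have a unit-norm feature direction learned by a sparse autoencoder — the names, shapes, and threshold here are illustrative, not Anthropic's actual method or API:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 16  # residual-stream width (toy size for illustration)

# hypothetical concept direction, as if recovered by dictionary learning
feature_dir = rng.normal(size=d_model)
feature_dir /= np.linalg.norm(feature_dir)  # normalize to unit length

def feature_activation(hidden_state: np.ndarray) -> float:
    """Project a hidden state onto the concept direction."""
    return float(hidden_state @ feature_dir)

def concept_present(hidden_state: np.ndarray, threshold: float = 1.0) -> bool:
    """Flag the concept when its activation exceeds a threshold."""
    return feature_activation(hidden_state) > threshold

# simulate one hidden state where the concept fires strongly...
h_on = 3.0 * feature_dir + 0.1 * rng.normal(size=d_model)
# ...and one where it does not
h_off = 0.1 * rng.normal(size=d_model)

print(concept_present(h_on))   # True
print(concept_present(h_off))  # False
```

in practice a monitor like this would run over the model's internal activations at each token, not over the output text.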
this is great work. i’m excited to see where this goes next
i hope #anthropic exposes this via their API. right now, most of the promising interpretability work is only available on open-source models you can run yourself. it would be great to have it available from #AI vendors too.
Google to invest up to $2B in Anthropic - and… the race is on between, on one side, Microsoft and OpenAI; and on the other side, Google and Anthropic. My $$ is on MS & OpenAI at the moment - and I don’t expect that to change. OpenAI is the clear leader in AI, with a considerable head start and a top-shelf team. Anthropic will have a lot of catching up to do unless they’ve got some kind of killer, breakthrough tech they’re hiding until launch. #AI #Microsoft #Google #OpenAI #Anthropic https://www.reuters.com/technology/google-agrees-invest-up-2-bln-openai-rival-anthropic-wsj-2023-10-27/
My first troublesome hallucination with a #LLM in a while: #Claude3 #Opus (200k context) insisting that I can configure my existing #Yubikey #GPG keys to work with PKINIT with #Kerberos and helping me for a couple of hours to try to do so — before realising that GPG keys aren't supported for this use case. Whoops.
No real bother other than some wasted time, but a bit painful and disappointing.
#Anthropic is killing it with their AI game, especially for a small startup. Their models rival #OpenAI's, but they're focusing more on enterprise customers than on hype. This might be a risky move since they don't have a cult following like other AI companies. Still, gotta give them props for their impressive tech. It'll be interesting to see how they balance enterprise work with getting more attention from the AI community.
“Today we report a significant advance in understanding the inner workings of AI models. We have identified how millions of concepts are represented inside Claude Sonnet, one of our deployed large language models. This is the first ever detailed look inside a modern, production-grade large language model. This interpretability discovery could, in future, help us make AI models safer.”