I’m very excited about the interpretability work that #Anthropic has been doing with #LLMs.
In this paper, they used classical machine-learning algorithms to discover concepts: when a concept like “golden gate bridge” is present in the text, they can identify the associated pattern of neuron activations.
This means you can monitor LLM responses for concepts and behaviors, like “illicit behavior” or “fart jokes”.
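The monitoring idea can be sketched in a few lines. This is a toy illustration, not Anthropic's actual method: `concept_direction` here is a random stand-in for a feature direction that, in the real work, would be learned from model activations, and the threshold is arbitrary.

```python
import numpy as np

# Toy sketch of concept monitoring. The "concept direction" is a random
# stand-in for a feature that would really be learned from activations.
rng = np.random.default_rng(0)
d_model = 64

concept_direction = rng.normal(size=d_model)
concept_direction /= np.linalg.norm(concept_direction)  # unit-length feature

def concept_activation(residual_stream: np.ndarray) -> float:
    """Project a token's activation vector onto the concept direction."""
    return float(residual_stream @ concept_direction)

def concept_present(residual_stream: np.ndarray, threshold: float = 2.0) -> bool:
    """Flag the concept when its feature fires above a chosen threshold."""
    return concept_activation(residual_stream) > threshold

# An activation with the concept strongly mixed in fires the detector;
# a random activation usually does not.
with_concept = 5.0 * concept_direction + rng.normal(size=d_model)
without_concept = rng.normal(size=d_model)
print(concept_present(with_concept), concept_present(without_concept))
```

A real monitor would apply this check per token to the model's internal activations rather than to synthetic vectors.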
My first troublesome hallucination with a #LLM in a while: #Claude3 #Opus (200k context) insisting that I can configure my existing #Yubikey #GPG keys to work with PKINIT with #Kerberos and helping me for a couple of hours to try to do so — before realising that GPG keys aren't supported for this use case. Whoops.
No real bother other than some wasted time, but a bit painful and disappointing.
After months of work and $10 million, Databricks has unveiled DBRX - the world's most powerful publicly available open-source large language model.
DBRX outperforms open models like Meta's Llama 2 across benchmarks, even nearing the abilities of OpenAI's closed GPT-4. Novel architectural tweaks like a "mixture of experts" boosted DBRX's training efficiency by 30-50%.
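The "mixture of experts" idea can be illustrated with a minimal sketch. This is not DBRX's actual code; the sizes and weights are made up. The key point is that a router picks only a few experts per token, so per-token compute stays small even as total parameters grow.

```python
import numpy as np

# Minimal mixture-of-experts forward pass (illustrative only).
rng = np.random.default_rng(1)
d_model, n_experts, top_k = 8, 4, 2

router_w = rng.normal(size=(d_model, n_experts))        # router weights
expert_ws = rng.normal(size=(n_experts, d_model, d_model))  # one matrix per expert

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ router_w                 # one score per expert
    top = np.argsort(logits)[-top_k:]     # indices of the top-k experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                  # softmax over the chosen experts only
    # Only the selected experts do any work; the rest are skipped entirely.
    return sum(g * (x @ expert_ws[i]) for g, i in zip(gates, top))

token = rng.normal(size=d_model)
out = moe_forward(token)
print(out.shape)  # (8,)
```

The efficiency win comes from that `top_k` selection: total parameters scale with `n_experts`, but each token only pays for `top_k` expert matmuls.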
Anthropic researchers find that AI models can be trained to deceive
The models acted deceptively when fed their respective trigger phrases. Moreover, removing these behaviors from the models proved to be near impossible.
The most commonly used AI safety techniques had little to no effect on the models’ deceptive behaviors
Sounds like it can replace/augment those with experience levels #lmgt4y #StackOverflow #StackExchange
But actual specialists? Have -1 incentive now to write down their experience. 📉 trends ensue.
Tried Claude.ai from #Anthropic -
Its UX has an ivory background with black and violet font. Not sure if it’s a conscious choice of showing privilege based on trust, but it works.
The chat responses have an embedded option to ‘copy’ and give feedback. It’s helpful for both users and the product.
It says “no” more often than its competitor for answers it is not sure of.
Has little features like the provision to delete the security code that’s sent via SMS once used. #ai #chatgpt
Google to invest up to $2B in Anthropic - and… the race is on between, on one side, Microsoft and OpenAI; and on the other side, Google and Anthropic. My $$ is on MS & OpenAI at the moment - and I don’t expect that to change. OpenAI is the clear leader in AI, with a considerable head start and a top-shelf team. Anthropic will have a lot of catching up to do unless they’ve got some kind of killer, breakthrough tech they’re hiding until launch. #AI #Microsoft #Google #OpenAI #Anthropic https://www.reuters.com/technology/google-agrees-invest-up-2-bln-openai-rival-anthropic-wsj-2023-10-27/
#Tech giants have been partnering w/ up-&-coming #AI start-ups, like #Microsoft backing #OpenAI, but Amazon has not been as active as rivals until now.
Considering this set of principles by which #Anthropic tries to train its #AI, I found that it does not always meet those principles.
Anthropic, an AI startup founded by former OpenAI staff and that raised $1.3B, including $300M from #Google, details its “constitutional AI” for safer #chatbots.
Researchers discover 'Reversal Curse:' LLMs trained on "A is B" fail to learn "B is A"
Training AI models like GPT-3 on "A is B" statements fails to let them deduce "B is A" without further training, exhibiting a flaw in generalization. (https://arxiv.org/pdf/2309.12288v1.pdf)...