kellogh, to LLMs
@kellogh@hachyderm.io

i’m very excited about the interpretability work that Anthropic has been doing with Claude.

in this paper, they used classical machine learning algorithms to discover concepts. if a concept like “golden gate bridge” is present in the text, then they discover the associated pattern of neuron activations.

this means that you can monitor LLM responses for concepts and behaviors, like “illicit behavior” or “fart jokes”

https://www.anthropic.com/research/mapping-mind-language-model

kellogh,
@kellogh@hachyderm.io

this is great work. i’m excited to see where this goes next

i hope Anthropic exposes this via their API. at this point in time, most of the promising interpretability work is only available on open source models that you can run yourself. it would be great to also have it available from vendors

Lobrien,

@kellogh This does, of course, imply vastly easier subversion of guardrails. Bad actors will have an easier time manipulating bias.

br00t4c, to llm
@br00t4c@mastodon.social

Here's what's really going on inside an LLM's neural network

https://arstechnica.com/?p=2026236

ianRobinson, to llm
@ianRobinson@mastodon.social

Research paper from Anthropic.

“Today we report a significant advance in understanding the inner workings of AI models. We have identified how millions of concepts are represented inside Claude Sonnet, one of our deployed large language models. This is the first ever detailed look inside a modern, production-grade large language model. This interpretability discovery could, in future, help us make AI models safer.”

https://www.anthropic.com/research/mapping-mind-language-model

upright, to random
@upright@sfba.social

Why would require a phone number to use its app? NOPE.

br00t4c, to random
@br00t4c@mastodon.social

Anthropic's founders took a shot at OpenAI executives

https://qz.com/anthropic-founders-openai-executives-ai-1851469940

br00t4c, to OpenAI
@br00t4c@mastodon.social

Anthropic co-founders say their AI models are taking lessons from the harms of social media

https://qz.com/anthropic-safe-ai-bloomberg-technology-summit-amodei-1851466207

robert, to emacs
@robert@toot.kra.hn

org-ai got an update today. It now supports the #anthropic #claude and the #perplexity.ai APIs.

https://github.com/rksm/org-ai

#emacs #orgmode #llms

ErikJonker, to ai
@ErikJonker@mastodon.social

Interesting, "Maestro - A Framework for Claude Opus, GPT and local LLMs to Orchestrate Subagents",
i think organising tasks, orchestrating various agents will be important.
https://github.com/Doriandarko/maestro

br00t4c, to random
@br00t4c@mastodon.social

Anthropic releases Claude AI chatbot iOS app

https://arstechnica.com/?p=2021092

br00t4c, to random
@br00t4c@mastodon.social

Anthropic Wants to Put Its Claude AI Wherever You Are With New App

https://gizmodo.com/anthropic-claude-ai-ios-apple-app-1851448285
