
kellogh

@kellogh@hachyderm.io

I'm a software engineer and sometimes manager. Currently #Raleigh but also #Seattle. Building an ML platform for a healthcare startup. Previously, built an IoT platform for one of "those" companies.

Open source: dura, fossil, Jump-Location, Moq.AutoMock, others

Do I have other interests? No, but I do have kids and they have interests. I think that counts for something. I can braid hair and hunt unicorns!

I put the #rust in frustrate

He/Him

#metal #science #python


kellogh, to LLMs

i’m very excited about the interpretability work that #Anthropic has been doing with #Claude.

in this paper, they used classical machine learning algorithms to discover concepts: if a concept like “golden gate bridge” is present in the text, they can identify the associated pattern of neuron activations.

this means that you can monitor LLM responses for concepts and behaviors, like “illicit behavior” or “fart jokes”

https://www.anthropic.com/research/mapping-mind-language-model
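(to make the “monitoring for a concept” idea concrete, here’s a rough sketch — it assumes an open-weights model whose residual-stream activations you can capture yourself, plus a feature direction already learned by something like a sparse autoencoder; hosted APIs don’t expose any of this today, and the threshold is arbitrary)

```python
# hypothetical sketch: score a response against one learned concept feature.
# `activations` are assumed residual-stream activations captured from an
# open-weights model; `feature_direction` is an assumed unit vector for a
# concept like "illicit behavior".
import numpy as np

def concept_score(activations: np.ndarray, feature_direction: np.ndarray) -> float:
    """mean activation of the concept across the response tokens."""
    per_token = activations @ feature_direction   # shape: (num_tokens,)
    return float(per_token.mean())

def flag_response(activations: np.ndarray, feature_direction: np.ndarray,
                  threshold: float = 4.0) -> bool:
    """flag the response if the concept fires strongly on any token."""
    per_token = activations @ feature_direction
    return bool((per_token > threshold).any())
```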

kellogh,

further, you can also artificially activate these concepts

they have a version of #Claude with the “golden gate bridge” concept artificially activated, and so it tries to make everything it says about the golden gate bridge

https://www.anthropic.com/news/golden-gate-claude
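(golden gate claude itself can’t be reproduced from outside Anthropic, but here’s a hedged sketch of the same trick on an open-weights model: add a scaled concept direction into the residual stream via a forward hook — the layer index, strength, and the direction itself are all assumptions)

```python
# hypothetical sketch: artificially activate a concept on an open-weights,
# HuggingFace-style transformer by adding a scaled feature direction to the
# residual stream. layer index, strength, and `feature_direction` are
# assumptions for illustration only.
import torch

def make_steering_hook(feature_direction: torch.Tensor, strength: float = 10.0):
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + strength * feature_direction   # broadcasts over tokens
        if isinstance(output, tuple):
            return (hidden,) + output[1:]
        return hidden
    return hook

# handle = model.model.layers[20].register_forward_hook(make_steering_hook(direction))
# ... model.generate(...) now skews toward the concept ...
# handle.remove()
```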

kellogh,

so now we have a way to interpret and query responses in a structured format, as well as a control mechanism for driving LLM behavior

this is great news

Bruce Schneier wrote that prompt injection boils down to the fact that data and code pass through the same channel. with this interpretability work, we’re seeing the beginnings of a control channel separated from the data channel — you can control LLM behavior in a way that you can’t override via the data channel

https://www.schneier.com/blog/archives/2024/05/llms-data-control-path-insecurity.html
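(a toy sketch of what a guardrail on that separate control channel could look like: the blocked-concept list lives server-side and is checked against internal feature activations rather than prompt text, so nothing in the data channel can switch it off. `generate_with_activations` is a made-up helper passed in by the caller, and the concept directions are assumed)

```python
# hypothetical sketch: a guardrail enforced on feature activations rather
# than on prompt text. the generation callable and concept directions are
# assumptions supplied by the caller.
import numpy as np

def guarded_generate(generate_with_activations, prompt: str,
                     blocked: dict, threshold: float = 4.0) -> str:
    """generate_with_activations: assumed callable returning (text, activations);
    blocked: concept name -> learned feature direction of shape (d_model,)."""
    response, activations = generate_with_activations(prompt)
    for name, direction in blocked.items():
        per_token = activations @ direction
        if (per_token > threshold).any():
            return f"[refused: '{name}' feature activated]"
    return response
```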

kellogh,

this is great work. i’m excited to see where this goes next

i hope #Anthropic exposes this via their API. at this point in time, most of the promising interpretability work is only available on open source models that you can run yourself. it would be great to also have it available from vendors

ErikJonker, to ai

It seems that Google is failing again with its first attempts at integrating AI into search — or are we just seeing the terrible mistakes while 98% of the experience is great?
In marketing/PR terms, things are not going well, I think. For OpenAI it's easier; they don't have a userbase of billions of users (Gmail, Search, Drive etc.).

kellogh,

@ErikJonker yeah, the trustworthiness of LLMs isn’t really the issue. the problem is that their profile of trustworthiness doesn’t match Google’s existing product

when you create a new product, e.g. perplexity, it’s all new so any feature is positive. when you change an existing product, each feature is a diff from the previous state. so in this case, LLMs made google decidedly worse

if they had launched a totally new product, it probably would have had a mostly positive response

Lobrien, to random

Nothing can be done to stop this, says only industry where this regularly happens. https://mastodon.world/@hn100/112493477923174205

kellogh,

@Lobrien i can’t get past the toots where the AI summary is citing Reddit. it’s like a scene from Silicon Valley (HBO)
