i’m very excited about the interpretability work that #anthropic has been doing with #LLMs.
in this paper, they used classical machine learning (dictionary learning with sparse autoencoders) to discover concepts. if a concept like “golden gate bridge” is present in the text, they can pick out the associated pattern of neuron activations.
this means that you can monitor LLM responses for concepts and behaviors, like “illicit behavior” or “fart jokes”
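a toy sketch of what that kind of monitoring could look like, assuming you already have a learned feature direction for a concept (in the paper this comes from a sparse autoencoder; the `FEATURE` vector and activations below are made up for illustration):

```python
import math

# hypothetical learned feature direction for a concept (e.g. "golden gate bridge");
# in practice this would come from a sparse autoencoder, values here are made up
FEATURE = [0.8, -0.1, 0.55, 0.2]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def concept_active(activations, threshold=0.9):
    """flag a response whose activation vector aligns with the concept direction"""
    return cosine(activations, FEATURE) > threshold

# activations that roughly match the feature direction trip the monitor
print(concept_active([0.82, -0.08, 0.5, 0.22]))  # True
print(concept_active([-0.5, 0.9, -0.3, 0.1]))    # False
```

real monitoring would run this check against internal activations at every generated token, but the core idea (project onto a feature direction, threshold) is this simple.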
this is great work. i’m excited to see where this goes next
i hope #anthropic exposes this via their API. right now, most of the promising interpretability work is only possible on open source models you can run yourself. it would be great to have it available from #AI vendors too
@Xoriff eh, the hacker news & mastodon comments got into the bullying range pretty fast.
a lot of people seem to feel entitled to free software being catered to their wishes. i’ve run into the same sort of entitlement in software i’ve open sourced
@sanityinc the whole fiasco highlights how much we demand from open source, how little respect maintainers get, and how tiny the communities are. most people didn’t even realize this was an open source project
if i had more time, i'd love to investigate PII coming from #LLMs. i've seen it generate phone numbers and secrets, but i wonder if these are real or not. i imagine you could look at the logits to figure out if phone number digits were randomly chosen or if the sequence is meaningful to the LLM. anyone aware of researchers who have already done this?
i would guess that phone numbers are probably mostly random, since so many phone numbers are found online, whereas AWS keys are less common, so you're probably more likely to get partial or even full real keys
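one way to operationalize the logit idea, sketch only: restrict the next-token distribution to the ten digit tokens and compute its entropy. near-maximal entropy (log2(10) ≈ 3.32 bits) suggests the digits are random; a sharply peaked distribution suggests the model has memorized the sequence. the probabilities below are made up, not real model output:

```python
import math

def digit_entropy(digit_probs):
    """entropy (bits) of a next-token distribution restricted to digits 0-9"""
    total = sum(digit_probs)
    return -sum((p / total) * math.log2(p / total) for p in digit_probs if p > 0)

# made-up examples, not real model output:
uniform = [0.1] * 10          # no preference among digits -> likely random
peaked = [0.91] + [0.01] * 9  # strongly predicts one digit -> possibly memorized

print(round(digit_entropy(uniform), 2))  # 3.32, i.e. ~log2(10)
print(digit_entropy(peaked) < 1.0)       # True, far below max entropy
```

you'd run this per digit position while generating a phone number or key, then compare the entropy profile against known-random sequences.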
@kellogh Someone claimed that a long magic number used in their highly-optimized (FFT?) code was spit out by Copilot. (This was soon after release.) The constant was arrived at by long fine-tuning, not conceptual in any way.
this has been bugging me a lot. like, yeah, there’s definitely AI scams out there. and yeah, a lot of people are using it from the wrong end, but it’s also clearly a substantial technology. time to realize that https://mas.to/@carnage4life/112484753548884371
thinking about my education growing up, my k-6 teachers were wretched at getting facts right. one teacher couldn’t get a single science experiment to work. a lot of what i was taught k-12 was outright wrong.
the thing is, students exceed their teachers all the time. a teacher isn’t the limiting factor for a student
i keep hearing that #AI is worthless bc it hallucinates. yet it’s taught me functional skills in UI dev, graphic design, 3D printing, and 3D design
yesterday i spent 15 minutes on a “strong password training” that could be replaced with a paragraph of how to use a password manager
i’m pretty sure password managers, as difficult as they are, are still far simpler than all these rules we subject non-technical people to
like, they like writing things down. everything in their being says they need to write things down in order to remember them. why not just give them a secure way to do what they’re going to do anyway?
postgres’ extension ecosystem is incredible. back in the nosql days, you had to make a bold decision to abandon all of sql just for a few features (KV, graphs, time series, etc.). now, you can keep using postgres and just install an extension for vector search, graphs, etc., and the DB engine is adapted to a new use case 🤯
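for a sense of what an extension like pgvector adds: its `<->` operator orders rows by L2 distance to a query vector. roughly what `ORDER BY embedding <-> query LIMIT k` computes, as a toy pure-python sketch (table and vectors made up):

```python
import math

# toy "table" of rows with embeddings; in postgres+pgvector this would be
# a column of type vector(3) — values here are made up
rows = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.9, 0.1, 0.0],
}

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, k=2):
    """roughly what `ORDER BY embedding <-> query LIMIT k` does (<-> is L2 distance)"""
    return sorted(rows, key=lambda name: l2(rows[name], query))[:k]

print(nearest([1.0, 0.05, 0.0]))  # ['doc_a', 'doc_c']
```

the point is you get this inside the same engine as your relational data, with indexes, instead of standing up a separate vector database.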
i took a picture of my daughter’s math assignment and #gpt4o completed it with 100% accuracy. i had a talk with her about how these tools for cheating will always be available to her, but if she uses them she won’t learn.
thought: she’s doing this math because she wants to. what happens when she’s assigned work, and the cheat way seems more attractive?
@kellogh Having almost stepped on a couple of rattlesnakes in my life, I can say for absolute certain that no way in hell am I paying attention to this study or its results and will continue to not step on snek. 😂
this is in reference to super-alignment & safety, but my cousin also had her DEI team disbanded and “distributed” in the same way
on the surface, i think safety, DEI, and similar topics should be embedded in the culture and not centralized into a specific team. centralization would cause people to say, “oh that’s not my job”.
then again, any time a centralized team is disbanded, my immediate thought is, “apparently safety/DEI/etc. doesn’t matter to this company”. it’s a paradox, i suppose
@kellogh I always feel a little conflicted about reports like this. Like, it's 100% a good and important thing in general, but that doesn't mean a specific person or team or culture engaged with AI safety automatically inherits that value regardless of what they're actually contributing.
That said, I do think it might need a dedicated if small team to ensure that things are widely embedded.
@TEG yeah, security is another one. but security is hard and you typically need a dedicated team just to house security professionals. someone needs to act as a bar raiser in order to maintain the culture…
yesterday while trail running i came across this fallen tree. it looks like a thick vine wrapped and choked the life out of it, and the storm this weekend finally took it out. i couldn’t easily identify the vine, but whatever
this morning i wake up and i’m breaking tf out with what sure looks like poison ivy rashes.
i came back, and identified the vine as, yep, poison ivy. thick woody 1/3” vines up and down the full tree
@sashawood i actually asked to not have prednisone, bc i didn’t like the reactions i’ve had in the past, and i’m actually managing the symptoms okay with my OTC cocktail