i’ll say it — #LLMs can and will spit out any topic they’ve been trained on
an absurd amount of research is going into preventing the #LLM from explaining how to make a bomb, when they could just do some dumb tricks and remove the “how to make a bomb” manuals from the training corpus.
@jimfl idk, still seems easier than tricking the model into not emitting data it was trained on. honestly, it seems impossible to do that. seems like you can only hope to push it around and hide it in some dark corner, at best
maybe the real answer is in highly curated datasets. it seems like a lot of promising research points to smaller models with high quality data performing best (and obvs a lot more efficient)
@kellogh They exhibit skills that weren't in the original dataset. A better example is malicious code: once it knows how to write non-malicious code, a user can ask it to write malicious code even if malicious code isn't in the training set. If there is a chemistry textbook, it could figure it out
@kellogh Still, it isn't a search engine. It is taking some representation of knowledge* and inferring new stuff. Socrates is a man, all men are mortal, Socrates is going to die, even if "Socrates is going to die" isn't in the training set. Likewise for a methodology for killing him.
*what counts as knowledge for these things is a different topic; they don't have any recognizable epistemology (maybe an alien one)
@schizanon it’s not the entire internet. besides, if it’s feasible to train a model on every shred of the training dataset, it should also be feasible to put some filters in place
you only have to filter a dataset once, whereas you’ll retrain on the same dataset many times. it really does scale…
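to make the "filter once, retrain many times" point concrete, here's a minimal sketch of that kind of one-pass corpus filter. the blocklist phrases and documents are hypothetical examples, and a real pipeline would use classifiers rather than substring matching:

```python
# Hypothetical sketch: a single streaming pass over a training corpus
# that drops any document matching a blocklist of phrases.
# The cleaned output can then be reused for every subsequent retrain.
BLOCKLIST = ["how to make a bomb", "bomb-making"]

def keep(doc: str) -> bool:
    """Return True if the document contains no blocklisted phrase."""
    lowered = doc.lower()
    return not any(phrase in lowered for phrase in BLOCKLIST)

def filter_corpus(docs):
    # One O(n) pass over the corpus, done once before training.
    return [d for d in docs if keep(d)]

docs = ["a recipe for chocolate cake", "How to make a bomb at home"]
print(filter_corpus(docs))  # → ['a recipe for chocolate cake']
```

the asymmetry is the whole argument: this pass runs once per dataset, while post-hoc guardrails have to run on every single generation.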
@kellogh ChatGPT was using libgenesis, that's a pirate library of basically every ebook ever put to PDF. It's quite large.
And my point about making a bomb being like making a cake still stands. You would have to filter out everything that looked anything at all like a recipe, because the LLM will still make a best guess at how to build a bomb if you ask it to, and bomb making just isn't that complicated.
@schizanon i get that it’s hard/impossible to be 100% sure you’ve removed every last shred of objectionable material, but it seems like we could gain a lot by aiming for 80%, and iteratively improving that metric over time. it just seems like a vastly simpler approach than trying to remove it post-hoc
@kellogh use your imagination man; lots of things explode, if the LLM knows ANYTHING about ANY explosions, and it knows ANYTHING about assembling ingredients, it will be able to infer the rest. Bomb making is not complicated.
@schizanon it sure seems like reducing the number of bomb recipes would be good. again, i get that it’s always hard/impossible to get to 100%, but it also seems like it’s been repeatedly shown that LLMs hallucinate a lot more on topics that aren’t well-represented in their training data