i’ll say it — #LLMs can and will spit out any topic they’ve been trained on
an absurd amount of research is going into preventing the #LLM from explaining how to make a bomb, when they could just do some dumb tricks and remove the “how to make a bomb” manuals from the training corpus.
@jimfl idk, still seems easier than tricking the model into not emitting data it was trained on. honestly, it seems impossible to do that. seems like you can only hope to push it around and hide it in some dark corner, at best
maybe the real answer is in highly curated datasets. it seems like a lot of promising research points to smaller models with high quality data performing best (and obvs a lot more efficient)
@kellogh They exhibit skills that weren't in the original dataset. A better example is malicious code: once it knows how to write non-malicious code, a user can ask it to write malicious code even if malicious code isn't in the training set. If there is a chemistry textbook, it could figure it out
@kellogh Still, it isn't a search engine. It is taking some representation of knowledge* and inferring new stuff. Socrates is a man, all men are mortal, Socrates is going to die, even if "Socrates is going to die" isn't in the training set. Likewise for a methodology for killing him.
*what counts as knowledge for these things is a different topic; they don't have any recognizable epistemology (maybe an alien one)
@schizanon it’s not the entire internet. besides, if it’s feasible to train a model on every shred of the training dataset, it should also be feasible to put some filters in place
you only have to filter a dataset once, whereas you’ll retrain on the same dataset many times. it really does scale…
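to make the "filter once, retrain many times" point concrete, here's a minimal sketch of that kind of one-pass corpus filter. the blocklist phrases and documents are hypothetical examples, and a real pipeline would use classifiers rather than substring matching:

```python
# Hypothetical sketch: a single streaming pass over a training corpus
# that drops any document matching a blocklist of phrases.
# The cleaned output can then be reused for every subsequent retrain.
BLOCKLIST = ["how to make a bomb", "bomb-making"]

def keep(doc: str) -> bool:
    """Return True if the document contains no blocklisted phrase."""
    lowered = doc.lower()
    return not any(phrase in lowered for phrase in BLOCKLIST)

def filter_corpus(docs):
    # One O(n) pass over the corpus, done once before training.
    return [d for d in docs if keep(d)]

docs = ["a recipe for chocolate cake", "How to make a bomb at home"]
print(filter_corpus(docs))  # → ['a recipe for chocolate cake']
```

the asymmetry is the whole argument: this pass runs once per dataset, while post-hoc guardrails have to run on every single generation.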
@kellogh ChatGPT was using libgenesis, that's a pirate library of basically every ebook ever put to PDF. It's quite large.
And my point about making a bomb being like making a cake still stands. You would have to filter out everything that looked anything at all like a recipe, because the LLM will still make a best guess at how to build a bomb if you ask it to, and bomb making just isn't that complicated.
@schizanon i get that it’s hard/impossible to be 100% sure you’ve removed every last shred of objectionable material, but it seems like we could gain a lot by aiming for 80%, and iteratively improving that metric over time. it just seems like a vastly simpler approach than trying to remove it post-hoc
@kellogh use your imagination man; lots of things explode, if the LLM knows ANYTHING about ANY explosions, and it knows ANYTHING about assembling ingredients, it will be able to infer the rest. Bomb making is not complicated.
@schizanon it sure seems like reducing the number of bomb recipes would be good. again, i get that it’s always hard/impossible to get to 100%, but it also seems like it’s been repeatedly shown that LLMs hallucinate a lot more on topics that aren’t well-represented in their training data