kellogh,
@kellogh@hachyderm.io avatar

i’ll say it — #LLMs can and will spit out any topic they’ve been trained on

an absurd amount of research is going into preventing #LLMs from explaining how to make a bomb, when labs could just do some dumb tricks and remove the “how to make a bomb” manuals from the training corpus.

am i missing something?
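
For concreteness, the “dumb tricks” could be as simple as a one-pass blocklist run over the corpus before pre-training. A minimal sketch; the patterns and corpus format here are hypothetical placeholders, not a real moderation list:

```python
import re
from typing import Iterable, Iterator

# One-pass blocklist filter over a training corpus, run once before
# pre-training. The patterns are illustrative placeholders only.
BLOCKLIST = re.compile(
    r"how to (make|build) a bomb|improvised explosive",
    re.IGNORECASE,
)

def filter_corpus(documents: Iterable[str]) -> Iterator[str]:
    """Yield only documents that don't match the blocklist."""
    for doc in documents:
        if not BLOCKLIST.search(doc):
            yield doc
```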

jimfl,
@jimfl@hachyderm.io avatar

deleted_by_author

    kellogh,
    @kellogh@hachyderm.io avatar

    @jimfl idk, still seems easier than tricking the model into not emitting data it was trained on. honestly, that seems impossible. seems like you can only hope to push it around and hide it in some dark corner, at best

    maybe the real answer is in highly curated datasets. a lot of promising research points to smaller models with high-quality data performing best (and obvs being a lot more efficient)

    mistersql,
    @mistersql@mastodon.social avatar

    @kellogh They exhibit skills that weren't in the original dataset. A better example is malicious code: once it knows how to write non-malicious code, a user can ask it to write malicious code even if malicious code isn't in the training set. If there's a chemistry textbook in there, it could probably figure it out.

    kellogh,
    @kellogh@hachyderm.io avatar

    @mistersql this is in reference to knowledge, not skills

    mistersql,
    @mistersql@mastodon.social avatar

    @kellogh Still, it isn't a search engine. It's taking some representation of knowledge* and inferring new stuff. Socrates is a man, all men are mortal, Socrates is going to die, even if "Socrates is going to die" isn't in the training set. Likewise for a methodology for killing him.

    * what counts as knowledge for these things is a different topic; they don't have any recognizable epistemology (maybe an alien one)

    ideaferace,

    @kellogh you can't retrain your foundation model from scratch each time a country threatens to shut you down unless you censor <topic>.

    Such filtering techniques, if they ever exist, will be vital to business stability.

    schizanon,

    @kellogh the real problem is that making a bomb isn't that different from making a cake

    kellogh,
    @kellogh@hachyderm.io avatar

    @schizanon surely it’s easier than trying to remove it from an already-trained model

    schizanon,

    @kellogh I think you underestimate the size of the Internet

    kellogh,
    @kellogh@hachyderm.io avatar

    @schizanon it’s not the entire internet. besides, if it’s feasible to train a model on every shred of the training dataset, it should also be feasible to put some filters in place

    you only have to filter a dataset once, whereas you’ll retrain on the same dataset many times. it really does scale…
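
A back-of-envelope check on that amortization claim; every number below is a made-up placeholder, chosen only to show the shape of the math:

```python
# Filtering is a one-time cost per dataset; training recurs. Assume
# (hypothetically) one filtering pass over the corpus, three epochs
# per training run, and five models trained on the same filtered set.
filter_passes = 1
epochs_per_run = 3
training_runs = 5

total_training_passes = epochs_per_run * training_runs  # 15 passes
overhead = filter_passes / total_training_passes
print(f"filtering adds ~{overhead:.0%} to total compute")  # ~7%
```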

    schizanon,

    @kellogh ChatGPT was using Library Genesis, a pirate library of basically every ebook ever put to PDF. It's quite large.

    And my point about making a bomb being like making a cake still stands. You would have to filter out everything that looks anything at all like a recipe, because the LLM will still make a best guess at how to build a bomb if you ask it to, and bomb making just isn't that complicated.

    kellogh,
    @kellogh@hachyderm.io avatar

    @schizanon i get that it’s hard/impossible to be 100% sure you’ve removed every last shred of objectionable material, but it seems like we could gain a lot by aiming for 80% and iteratively improving that metric over time. it just seems like a vastly simpler approach than trying to remove it post-hoc

    schizanon,

    @kellogh it's not simpler at all, for the reasons I mentioned

    kellogh,
    @kellogh@hachyderm.io avatar

    @schizanon every sample in the TDS (training dataset) touches every parameter of the LLM during pre-training.

    naive implementation: use an LLM to detect the topic/content. lots of problems here, but let’s go…

    the cost would be about the same as a single pass of pre-training. unless there are multiple epochs, in which case filtering costs many times less

    and it’s further amortized when several training runs for many models reuse the same dataset

    and then consider that you can do a lot better & cheaper than using an LLM
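
A minimal sketch of that naive implementation: one classification pass over the dataset, with the classifier left pluggable since, as the post notes, something much cheaper than an LLM could fill that slot. The classifier and its labels here are assumptions, not any particular training stack's API:

```python
from typing import Callable, Iterable, Iterator

def filter_tds(
    documents: Iterable[str],
    is_objectionable: Callable[[str], bool],
) -> Iterator[str]:
    """One pass over the training dataset (TDS); the filtered set can
    then be reused across many training runs."""
    for doc in documents:
        if not is_objectionable(doc):
            yield doc

# Stand-in classifier: the naive version would wrap an LLM prompt
# ("does this document explain how to build a weapon?"); a small
# fine-tuned text classifier would be far cheaper. This keyword check
# is only a placeholder so the example runs.
placeholder_classifier = lambda d: "bomb" in d.lower()

docs = ["how to bake a cake", "how to make a bomb"]
print(list(filter_tds(docs, placeholder_classifier)))
# -> ['how to bake a cake']
```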

    schizanon,

    @kellogh your LLM won't be able to tell people how to make a cake and so no one will use it

    kellogh,
    @kellogh@hachyderm.io avatar

    @schizanon are you saying there’s absolutely no difference between making a bomb vs a cake?

    i mean, i know you can make a bomb with sugar, but you also need an oxidizer, which you don’t need with cake…

    schizanon,

    @kellogh use your imagination man; lots of things explode. If the LLM knows ANYTHING about ANY explosions, and ANYTHING about assembling ingredients, it will be able to infer the rest. Bomb making is not complicated.

    kellogh,
    @kellogh@hachyderm.io avatar

    @schizanon it sure seems like reducing the number of bomb recipes would be good. again, i get that it’s always hard/impossible to get to 100%, but it’s been repeatedly shown that LLMs hallucinate a lot more on topics that aren’t well-represented in their TDS

    schizanon,

    @kellogh okay, you read the whole Internet and filter out all the bomb-making stuff then, if it's so easy
