bpaassen,
@bpaassen@bildung.social

Over the last few days, I had the chance to participate in a Dagstuhl Seminar on Generalization in Humans and Machines. I learned a lot of things, especially one: how weird it is that we expect large language models to generalize to all kinds of tasks. Let me explain. (1/10)

https://www.dagstuhl.de/seminars/seminar-calendar/seminar-details/24192

bpaassen,
@bpaassen@bildung.social

When I say "generalize" I mean that a model is able to perform well on new, previously unseen data. In classic machine learning, this was very strictly defined: We assumed that training and test data were sampled from exactly the same distribution. (2/10)
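To make the classic setting concrete, here is a toy sketch of my own (not from the thread): a one-dimensional threshold classifier trained and tested on samples from exactly the same distribution.

```python
import random

random.seed(0)

# Two classes as 1-D Gaussians: class 0 around 0, class 1 around 3.
def sample(n):
    data = [(random.gauss(0, 1), 0) for _ in range(n)]
    data += [(random.gauss(3, 1), 1) for _ in range(n)]
    return data

train = sample(500)
test = sample(500)  # drawn from exactly the same distribution

# "Training": place a threshold halfway between the class means.
mean0 = sum(x for x, y in train if y == 0) / 500
mean1 = sum(x for x, y in train if y == 1) / 500
threshold = (mean0 + mean1) / 2

def accuracy(data):
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)

# Train and test accuracy land close together: this is generalization
# in the strict, same-distribution sense.
print(accuracy(train), accuracy(test))
```

Because test data comes from the same distribution as training data, the test accuracy tracks the training accuracy closely; that closeness is exactly what the classic definition promises.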

bpaassen,
@bpaassen@bildung.social

For example, when we train a system to distinguish cat pictures from dog pictures, we would test it on new cat pictures and new dog pictures that were recorded in the same way as the training data. And for these test pictures, we would expect generalization. (3/10)

bpaassen,
@bpaassen@bildung.social

There is a whole research field of statistical learning theory that tells us under which conditions we can expect generalization in this setting. So that setting is somewhat "solved" by now. (4/10)
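For the curious, one standard textbook example of such a condition (my addition, not a specific seminar result) is the Hoeffding bound for a finite hypothesis class $\mathcal{H}$: for an i.i.d. sample of size $n$, with probability at least $1 - \delta$,

```latex
R(h) \;\le\; \hat{R}(h) + \sqrt{\frac{\ln|\mathcal{H}| + \ln(2/\delta)}{2n}}
\qquad \text{for all } h \in \mathcal{H},
```

where $R(h)$ is the true risk and $\hat{R}(h)$ the empirical (training) risk. Note that the proof leans entirely on training and test data sharing one distribution; drop that assumption and the guarantee evaporates.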

bpaassen,
@bpaassen@bildung.social

In recent years, machine learning got more ambitious: We also tried to make models generalize to test data from other distributions, like cat and dog pictures in different lighting conditions or rotated pictures, or even to entirely new classes, like pictures of sloths or quokkas. (5/10)
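A toy numeric sketch of my own (not from the thread) shows what distribution shift does to a classifier that would be perfectly fine in the same-distribution setting:

```python
import random

random.seed(0)

def sample(n, shift=0.0):
    # Class 0 around 0+shift, class 1 around 3+shift;
    # the shift stands in for, e.g., new lighting conditions.
    data = [(random.gauss(0 + shift, 1), 0) for _ in range(n)]
    data += [(random.gauss(3 + shift, 1), 1) for _ in range(n)]
    return data

train = sample(500)
mean0 = sum(x for x, y in train if y == 0) / 500
mean1 = sum(x for x, y in train if y == 1) / 500
threshold = (mean0 + mean1) / 2

def accuracy(data):
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)

iid_acc = accuracy(sample(500))            # same distribution: high
shifted_acc = accuracy(sample(500, 2.0))   # shifted distribution: much worse
print(iid_acc, shifted_acc)
```

The model did not change at all; only the test distribution moved, and the accuracy drops sharply. This is the regime where classic guarantees no longer apply.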

bpaassen,
@bpaassen@bildung.social

The most extreme form is transfer learning. There, we take parts of an existing model and transfer them to an entirely new task. Like taking parts of neural networks trained on really big image data sets and fine-tuning them for other image processing tasks. (6/10)
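As a heavily simplified toy sketch of my own (real transfer learning reuses deep network layers, not a one-dimensional statistic): "pretrain" a feature on a big source task, freeze it, and fit only a tiny head on a handful of target-task labels.

```python
import random

random.seed(0)

# "Pretraining": learn class centroids on a big source task.
source = [(random.gauss(0, 1), 0) for _ in range(500)] + \
         [(random.gauss(3, 1), 1) for _ in range(500)]
mu0 = sum(x for x, y in source if y == 0) / 500
mu1 = sum(x for x, y in source if y == 1) / 500

# Frozen "feature extractor": signed distance to the source boundary.
def feature(x):
    return x - (mu0 + mu1) / 2

# "Fine-tuning": on a NEW task with only 40 labels, we fit just a
# threshold on top of the frozen feature instead of learning from scratch.
target = [(random.gauss(1.0, 1), 0) for _ in range(20)] + \
         [(random.gauss(4.0, 1), 1) for _ in range(20)]
f0 = [feature(x) for x, y in target if y == 0]
f1 = [feature(x) for x, y in target if y == 1]
thr = (sum(f0) / len(f0) + sum(f1) / len(f1)) / 2

test = [(random.gauss(1.0, 1), 0) for _ in range(200)] + \
       [(random.gauss(4.0, 1), 1) for _ in range(200)]
acc = sum((feature(x) > thr) == bool(y) for x, y in test) / len(test)
print(acc)
```

The mechanics are the point: most parameters stay frozen, only a small head sees the new task, and whether this works depends entirely on how well the pretrained part happens to fit the new task.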

bpaassen,
@bpaassen@bildung.social

In these scenarios, the theoretical guarantees for generalization are essentially lost (except for special, nice cases). At least, we still know that our models were kind of trained and designed for the task we're using them for. (7/10)

bpaassen,
@bpaassen@bildung.social

And now I get back to large language models (LLMs). For some reason, we expect LLMs to generalize to entirely new tasks that the models were not designed for. This is such a wild and weird expectation. Why would we think that this works? (8/10)

bpaassen,
@bpaassen@bildung.social

I guess because we saw empirical examples that it works. But I hope my long-winded explanation makes clear: We should be skeptical. Expecting generalization to tasks a system was not designed for goes far beyond the guarantees we can give. (9/10)

bpaassen,
@bpaassen@bildung.social

The reasonable generalization expectation for LLMs is that they can auto-complete text in ways that are consistent with the training data and superficially satisfy human raters who operate under time pressure. That's what they are trained for. Anything else is coincidental. (10/10)
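To illustrate just how modest the underlying training objective is, here is a toy bigram autocompleter of my own (a deliberately crude stand-in; real LLMs use neural networks over huge corpora, but the objective is the same flavor: continue text consistently with training statistics).

```python
import random
from collections import defaultdict

random.seed(1)
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Training" is nothing but counting which token follows which.
successors = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    successors[a].append(b)

def complete(prompt, n=5):
    out = prompt.split()
    for _ in range(n):
        nxt = successors.get(out[-1])
        if not nxt:
            break
        out.append(random.choice(nxt))
    return " ".join(out)

print(complete("the"))
```

Every continuation this model produces is, by construction, consistent with the training data, and nothing more. Asking it to generalize to a task it was never designed for would be exactly the category error the thread describes.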

bornach,
@bornach@masto.ai

@bpaassen
I was skeptical when the AI companies refused to reveal what was in the training data and seemed uninterested in determining whether their LLM was figuring things out for itself or was simply regurgitating an answer that got scraped into the dataset.

So taking a lead from Yejin Choi
https://www.ted.com/talks/yejin_choi_why_ai_is_incredibly_smart_and_shockingly_stupid?language=en

I tried prompting with well-known FAQ puzzles, but with slight changes that invalidated the stock answer. It didn't take long to confuse the LLM
https://masto.ai/@bornach/112207324622232774

bornach,
@bornach@masto.ai

@bpaassen
We laymen think it should work because of scifi movies where
Johnny 5 reads all the encyclopedias and becomes sentient
https://youtu.be/WnTKllDbu5o
