bpaassen,
@bpaassen@bildung.social

Over the last few days, I had the chance to participate in a Dagstuhl Seminar on Generalization in Humans and Machines. I learned a lot of things, especially one: how weird it is that we expect large language models to generalize to all kinds of tasks. Let me explain. (1/10)

https://www.dagstuhl.de/seminars/seminar-calendar/seminar-details/24192

bpaassen,
@bpaassen@bildung.social

When I say "generalize" I mean that a model is able to perform well on new, previously unseen data. In classic machine learning, this was very strictly defined: We assumed that training and test data were sampled from exactly the same distribution. (2/10)
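To make the classic setting concrete, here is a toy sketch of my own (not from the thread): a one-dimensional threshold classifier trained and tested on samples from exactly the same distribution.

```python
import random

random.seed(0)

# Two classes as 1-D Gaussians: class 0 around 0, class 1 around 3.
def sample(n):
    data = [(random.gauss(0, 1), 0) for _ in range(n)]
    data += [(random.gauss(3, 1), 1) for _ in range(n)]
    return data

train = sample(500)
test = sample(500)  # drawn from exactly the same distribution

# "Training": place a threshold halfway between the class means.
mean0 = sum(x for x, y in train if y == 0) / 500
mean1 = sum(x for x, y in train if y == 1) / 500
threshold = (mean0 + mean1) / 2

def accuracy(data):
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)

# Train and test accuracy land close together: this is generalization
# in the strict, same-distribution sense.
print(accuracy(train), accuracy(test))
```

Because test data comes from the same distribution as training data, the test accuracy tracks the training accuracy closely; that closeness is exactly what the classic definition promises.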

bpaassen,
@bpaassen@bildung.social

For example, when we train a system to distinguish cat pictures from dog pictures, we would test it on new cat pictures and new dog pictures that were recorded in the same way as the training data. And for these test pictures, we would expect generalization. (3/10)

bpaassen,
@bpaassen@bildung.social

There is a whole research field of statistical learning theory that tells us under which conditions we can expect generalization in this setting. So that setting is somewhat "solved" by now. (4/10)
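For the curious, one standard textbook example of such a condition (my addition, not a specific seminar result) is the Hoeffding bound for a finite hypothesis class $\mathcal{H}$: for an i.i.d. sample of size $n$, with probability at least $1 - \delta$,

```latex
R(h) \;\le\; \hat{R}(h) + \sqrt{\frac{\ln|\mathcal{H}| + \ln(2/\delta)}{2n}}
\qquad \text{for all } h \in \mathcal{H},
```

where $R(h)$ is the true risk and $\hat{R}(h)$ the empirical (training) risk. Note that the proof leans entirely on training and test data sharing one distribution; drop that assumption and the guarantee evaporates.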

bpaassen,
@bpaassen@bildung.social

In recent years, machine learning got more ambitious: We also tried to make models generalize to test data from other distributions, like cat and dog pictures in different lighting conditions or rotated pictures, or even to entirely new classes, like pictures of sloths or quokkas. (5/10)
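A toy numeric sketch of my own (not from the thread) shows what distribution shift does to a classifier that would be perfectly fine in the same-distribution setting:

```python
import random

random.seed(0)

def sample(n, shift=0.0):
    # Class 0 around 0+shift, class 1 around 3+shift;
    # the shift stands in for, e.g., new lighting conditions.
    data = [(random.gauss(0 + shift, 1), 0) for _ in range(n)]
    data += [(random.gauss(3 + shift, 1), 1) for _ in range(n)]
    return data

train = sample(500)
mean0 = sum(x for x, y in train if y == 0) / 500
mean1 = sum(x for x, y in train if y == 1) / 500
threshold = (mean0 + mean1) / 2

def accuracy(data):
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)

iid_acc = accuracy(sample(500))            # same distribution: high
shifted_acc = accuracy(sample(500, 2.0))   # shifted distribution: much worse
print(iid_acc, shifted_acc)
```

The model did not change at all; only the test distribution moved, and the accuracy drops sharply. This is the regime where classic guarantees no longer apply.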

bpaassen,
@bpaassen@bildung.social

The most extreme form is transfer learning. There, we take parts of an existing model and transfer them to an entirely new task. Like taking parts of neural networks trained on really big image data sets and fine-tuning them for other image processing tasks. (6/10)
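As a heavily simplified toy sketch of my own (real transfer learning reuses deep network layers, not a one-dimensional statistic): "pretrain" a feature on a big source task, freeze it, and fit only a tiny head on a handful of target-task labels.

```python
import random

random.seed(0)

# "Pretraining": learn class centroids on a big source task.
source = [(random.gauss(0, 1), 0) for _ in range(500)] + \
         [(random.gauss(3, 1), 1) for _ in range(500)]
mu0 = sum(x for x, y in source if y == 0) / 500
mu1 = sum(x for x, y in source if y == 1) / 500

# Frozen "feature extractor": signed distance to the source boundary.
def feature(x):
    return x - (mu0 + mu1) / 2

# "Fine-tuning": on a NEW task with only 40 labels, we fit just a
# threshold on top of the frozen feature instead of learning from scratch.
target = [(random.gauss(1.0, 1), 0) for _ in range(20)] + \
         [(random.gauss(4.0, 1), 1) for _ in range(20)]
f0 = [feature(x) for x, y in target if y == 0]
f1 = [feature(x) for x, y in target if y == 1]
thr = (sum(f0) / len(f0) + sum(f1) / len(f1)) / 2

test = [(random.gauss(1.0, 1), 0) for _ in range(200)] + \
       [(random.gauss(4.0, 1), 1) for _ in range(200)]
acc = sum((feature(x) > thr) == bool(y) for x, y in test) / len(test)
print(acc)
```

The mechanics are the point: most parameters stay frozen, only a small head sees the new task, and whether this works depends entirely on how well the pretrained part happens to fit the new task.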

bpaassen,
@bpaassen@bildung.social

In these scenarios, the theoretical guarantees for generalization are essentially lost (except for special, nice cases). At least, we still know that our models were kind of trained and designed for the task we're using them for. (7/10)

bpaassen,
@bpaassen@bildung.social

And now I get back to large language models (LLMs). For some reason, we expect LLMs to generalize to entirely new tasks that the models were not designed for. This is such a wild and weird expectation. Why would we think that this works? (8/10)

bpaassen,
@bpaassen@bildung.social

I guess because we saw empirical examples that it works. But I hope my long-winded explanation makes clear: We should be skeptical. Expecting generalization to tasks a system was not designed for goes far beyond the guarantees we can give. (9/10)

bpaassen,
@bpaassen@bildung.social

The reasonable generalization expectation for LLMs is that they can auto-complete text in ways that are consistent with the training data and superficially satisfy human raters who operate under time pressure. That's what they are trained for. Anything else is coincidental. (10/10)
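To illustrate just how modest the underlying training objective is, here is a toy bigram autocompleter of my own (a deliberately crude stand-in; real LLMs use neural networks over huge corpora, but the objective is the same flavor: continue text consistently with training statistics).

```python
import random
from collections import defaultdict

random.seed(1)
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Training" is nothing but counting which token follows which.
successors = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    successors[a].append(b)

def complete(prompt, n=5):
    out = prompt.split()
    for _ in range(n):
        nxt = successors.get(out[-1])
        if not nxt:
            break
        out.append(random.choice(nxt))
    return " ".join(out)

print(complete("the"))
```

Every continuation this model produces is, by construction, consistent with the training data, and nothing more. Asking it to generalize to a task it was never designed for would be exactly the category error the thread describes.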

bornach,
@bornach@masto.ai

@bpaassen
I was skeptical when the AI companies refused to reveal what was in the training data and seemed uninterested in determining whether their LLM was figuring things out for itself or was simply regurgitating an answer that got scraped into the dataset.

So taking a lead from Yejin Choi
https://www.ted.com/talks/yejin_choi_why_ai_is_incredibly_smart_and_shockingly_stupid?language=en

I tried prompting with well-known FAQ puzzles, but with slight changes that invalidated the stock answer. It didn't take long to confuse the LLM
https://masto.ai/@bornach/112207324622232774

bornach,
@bornach@masto.ai

@bpaassen
We laymen think it should work because of scifi movies where
Johnny 5 reads all the encyclopedias and becomes sentient
https://youtu.be/WnTKllDbu5o
