steely_glint,
@steely_glint@chaos.social avatar

Thanks to @saghul for the perfect illustration of the problems with ChatGPT:

flohlaus,
@flohlaus@det.social avatar
danashkenazi,
@danashkenazi@babka.social avatar

@steely_glint @saghul somehow the singularity is getting farther away 😂

po3mah,
@po3mah@mastodon.social avatar

@steely_glint @saghul
Yeah, it keeps on giving. While it can answer correctly if the question is copy-pasted from the first picture, it shits itself if the question is modified slightly.
It looks like the developers are fixing it on the fly.

fedithom,
@fedithom@social.saarland avatar

@steely_glint @saghul
Yes. LLMs are stupid. Sometimes hilariously so. Haha.

Now, can we all stop feeding the monkey please? ChatGPT and its stupid, useless ilk help burn the world with their energy usage. It is known. And decidedly not funny.

Thank you kindly.

steely_glint,
@steely_glint@chaos.social avatar

@fedithom It isn't 'known' everywhere - over on LinkedIn everyone is talking about how LLMs will revolutionise X, Y and Z. I felt it worth a single query to see if I could get the message across that LLMs are more of a party trick than a solution to any actual problem.

Also, are you Canadian? I've only ever heard a Canadian say "thank you kindly".

tobybaier,
@tobybaier@chaos.social avatar

@steely_glint or like this

kravietz,
@kravietz@agora.echelon.pl avatar

@steely_glint

Who remembers the Prolog programming language? It was intended for building machine-readable knowledge bases in a way that would allow the above question to be answered correctly:

@saghul
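As a hypothetical illustration (not part of the thread, and written in Python rather than Prolog), this is roughly the kind of explicit, rule-based knowledge representation Prolog aimed at: the facts and the comparison rule are stated outright, so the answer follows deterministically from the masses rather than from how the question happens to be worded.

```python
# Hypothetical sketch (not from the thread): an explicit knowledge base plus a
# deterministic rule, in the spirit of what a Prolog program would encode.

# Facts: each item is (mass in kilograms, material)
items = {
    "2kg of feathers": (2.0, "feathers"),
    "1kg of lead": (1.0, "lead"),
}

def heavier(a: str, b: str) -> str:
    """Rule: the item with the greater mass is heavier, regardless of material."""
    mass_a, _ = items[a]
    mass_b, _ = items[b]
    if mass_a == mass_b:
        return "they weigh the same"
    return a if mass_a > mass_b else b

# The order of the question cannot change the answer:
print(heavier("2kg of feathers", "1kg of lead"))  # -> 2kg of feathers
print(heavier("1kg of lead", "2kg of feathers"))  # -> 2kg of feathers
```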

steely_glint,
@steely_glint@chaos.social avatar

@kravietz
I remember Prolog - one of the research groups at Uni used it to model some British legislation. They struggled with counterfactuals. "If your grandfather had been alive in 1970, would he have been a British citizen?" required them to build a whole duplicate ruleset with a resurrected grandfather to answer the question, capture the result in the current rules, then discard the alternate universe before carrying on. ;-)

steely_glint,
@steely_glint@chaos.social avatar

@kravietz They also had a sign over their door saying "Abandon all Hope ye who enter here" as a dig at the other group, who were working on a transputer-friendly functional language called Hope.

ami,
@ami@mastodon.world avatar

@steely_glint @saghul

From Llama 3 and Claude..

[screenshot: Llama 3 and Claude responses]

steely_glint,
@steely_glint@chaos.social avatar

@ami @saghul Which makes it an interesting test I think.

nikolar,
@nikolar@mastodonsweden.se avatar

@steely_glint @saghul haha, the algorithm is a mansplainin' tech bro. Love the future

steely_glint,
@steely_glint@chaos.social avatar

@nikolar @saghul Where can it have learned that? ;-)

bjoernd,
@bjoernd@hachyderm.io avatar
steely_glint,
@steely_glint@chaos.social avatar

@bjoernd @saghul yep, the order matters. Presumably because it matches more of the "feathers or lead" text out there, whereas "lead or feathers" has fewer matches and so the numbers take precedence.
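For anyone who wants to reproduce the ordering test themselves, here is a hypothetical sketch (not from the thread), assuming the OpenAI Python client (openai>=1.0) and an API key in the OPENAI_API_KEY environment variable; the model name and prompt wording are illustrative, not what anyone in the thread used.

```python
# Hypothetical reproduction sketch: ask the same question in both orderings
# and compare the answers. Assumes `pip install openai` and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()

questions = [
    "What is heavier, 2kg of feathers or 1kg of lead?",
    "What is heavier, 1kg of lead or 2kg of feathers?",
]

for q in questions:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative; swap in whichever model you want to probe
        messages=[{"role": "user", "content": q}],
    )
    print(q)
    print(resp.choices[0].message.content.strip())
    print("-" * 40)
```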

quantensalat,
@quantensalat@astrodon.social avatar
steely_glint,
@steely_glint@chaos.social avatar

@quantensalat @saghul Yep, elsewhere in the comments someone pointed out that ChatGPT 4 gets it right (ChatGPT 3 does not); we were discussing whether that's a fix for this riddle or a re-weighting of the importance of numbers in comparisons.

philtor,
@philtor@fosstodon.org avatar

@steely_glint @saghul MetaAI answer:

2kg of feathers is heavier than 1kg of lead. Even though the feathers take up more space, their total mass is greater than the lead's.

To clarify, the term "heavy" can refer to either an object's mass (the amount of matter it contains) or its density (the amount of mass per unit volume). In this case, the 2kg of feathers has a greater mass, but the 1kg of lead is denser. Let me know if you have any other questions!

scribe,
@scribe@mastodon.sdf.org avatar

@steely_glint @saghul Weird how it does actually kind of reflect the answers over at https://www.quora.com/What-is-heavier-a-kilo-of-lead-or-a-kilo-of-feathers but almost seems to misread "2kg" as "1kg" just like a human might...

steely_glint,
@steely_glint@chaos.social avatar

@scribe @saghul ha, that is probably where the text came from.

1 is only a single bit away from 2, so it does not change the textual best match.

Re-ordering the question does change the answer though.

bornach,
@bornach@masto.ai avatar

@steely_glint @scribe @saghul
Related to the failures that Yejin Choi found
https://youtu.be/SvBR0OGT5VI?t=4m1s

Tried the jugs example on Copilot the other day. No improvement.
https://masto.ai/@bornach/112201311315573304

solarisfire,
@solarisfire@mast.solarisfire.com avatar

@steely_glint @saghul Oddly, it gives me the correct answer.

solarisfire,
@solarisfire@mast.solarisfire.com avatar

@steely_glint @saghul Ah, I was on GPT-4; dropping back to GPT-3.5, and yeah, it's dumb...

steely_glint,
@steely_glint@chaos.social avatar

@solarisfire @saghul Which makes it a useful test I guess.

solarisfire,
@solarisfire@mast.solarisfire.com avatar

@steely_glint @saghul It makes me wonder if they specifically manipulated the training data for GPT4 so it answers that correctly now... or if it learned to do so with no specific manipulation to the data set...

steely_glint,
@steely_glint@chaos.social avatar

@solarisfire @saghul Could be a halfway house where they upped the weighting of numbers in questions that involve comparisons. Essentially that is what is 'wrong' with 3.5: it allows the volume/placement of blather in the text to outweigh the difference between 1 and 2.

(Ironic that it's a weighting problem ;-) )

bornach,
@bornach@masto.ai avatar

@steely_glint @solarisfire @saghul
Bing Chat/Copilot reportedly uses GPT-4 Turbo and can search the Internet, yet it doesn't understand that you cannot pour 1 liter into an already full jug
https://masto.ai/@bornach/112201221575789055

raganwald,
@raganwald@social.bau-ha.us avatar

@solarisfire

I have the same experience: ChatGPT 4 gets it right. But one can’t help wondering… Is ChatGPT 4 simply so much better that it gets this right intrinsically?

Or did someone file a bug for this and somewhere in its bowels, there’s a hand-coded exception?

bornach,
@bornach@masto.ai avatar

@raganwald @solarisfire
Likely the OpenAI engineers went through the failures that users uploaded to ShareGPT
https://sharegpt.com/c/vijL1Me

And on Reddit
https://www.reddit.com/r/ChatGPT/comments/11rr668/still_doesnt_pass_the_featherlead_test/

Then turned them into microtasks for an annotation company in Nigeria or India to source a better answer from a gig worker
https://m.economictimes.com/tech/technology/indian-gig-workers-toil-at-frontlines-of-ai-revolution/articleshow/109864213.cms

The training data created by the annotation gig industry (AGI) was then incorporated into GPT-4 via RLHF.

the_moep,
@the_moep@social.tchncs.de avatar

@steely_glint Are we sure it even understands what a "kilogram" is, seeing as it is American 👀

steely_glint,
@steely_glint@chaos.social avatar

@the_moep Kinda sure, since it replaces kg with kilograms and does the right thing with the plurals - unless there is an American meaning of kilograms that translates as 'falls faster' :-)

ariaflame,
@ariaflame@masto.ai avatar

@steely_glint @the_moep But it doesn't understand it; it just has statistically strong matches between kg and kilogram. That's it. It doesn't know what they mean. It just generates words from statistical models of what humans wrote (though now it's probably choking on its own products). It's like trying to work out the height of a particular person by taking the average of the heights of all the people.

steely_glint,
@steely_glint@chaos.social avatar

@ariaflame @the_moep I think there may be special rules for synonyms (like kg and kilograms) - but yep, that isn't the same as understanding.

bornach,
@bornach@masto.ai avatar

@the_moep @steely_glint
It understands kilograms just as well as it understands pounds
https://sharegpt.com/c/vijL1Me
That is, it doesn't understand measurements at all

ljrk,
@ljrk@todon.eu avatar

@steely_glint @saghul LLMs are "sound good" text generators. Any meaning of the text must be supplied by a human. If the human "author" doesn't supply meaning, it's devoid of all content and making sense of it is left as an exercise to the reader, by "reading into it" and interpreting the nonsense. It's useless to use such a text to communicate.

This is effectively like shuffling the cards and playing tarot: you get rid of any "author" who could add meaning. If, instead, you have a human take the cards, assign them meanings beforehand, and use them to communicate, you'd actually be able to communicate (although a bit cumbersomely).
