steely_glint,
@steely_glint@chaos.social avatar

Thanks to @saghul for the perfect illustration of the problems with ChatGPT:

flohlaus,
@flohlaus@det.social avatar
danashkenazi,
@danashkenazi@babka.social avatar

@steely_glint @saghul somehow the singularity is getting farther away 😂

po3mah,
@po3mah@mastodon.social avatar

@steely_glint @saghul
Yeah, it keeps on giving. While it can answer correctly if the question is copy-pasted from the first picture, it shits itself if the question is modified slightly.
It looks like the developers are fixing it on the fly.

fedithom,
@fedithom@social.saarland avatar

@steely_glint @saghul
Yes. LLMs are stupid. Sometimes hilariously so. Haha.

Now, can we all stop feeding the monkey please? ChatGPT and its stupid, useless ilk help burn the world with their energy usage. It is known. And decidedly not funny.

Thank you kindly.

steely_glint,
@steely_glint@chaos.social avatar

@fedithom It isn't 'known' everywhere - over on LinkedIn everyone is talking about how LLMs will revolutionise X, Y and Z. I felt it worth a single query to see if I could get the message across that LLMs are more of a party trick than a solution to any actual problem.

Also, are you Canadian? I've only ever heard a Canadian say "thank you kindly".

tobybaier,
@tobybaier@chaos.social avatar

@steely_glint or like this

kravietz,
@kravietz@agora.echelon.pl avatar

@steely_glint

Who remembers the Prolog programming language? It was intended for building machine-readable knowledge bases in a way that would allow the above question to be answered correctly:

@saghul
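As a hypothetical illustration (not part of the thread, and written in Python rather than Prolog), this is roughly the kind of explicit, rule-based knowledge representation Prolog aimed at: the facts and the comparison rule are stated outright, so the answer follows deterministically from the masses rather than from how the question happens to be worded.

```python
# Hypothetical sketch (not from the thread): an explicit knowledge base plus a
# deterministic rule, in the spirit of what a Prolog program would encode.

# Facts: each item is (mass in kilograms, material)
items = {
    "2kg of feathers": (2.0, "feathers"),
    "1kg of lead": (1.0, "lead"),
}

def heavier(a: str, b: str) -> str:
    """Rule: the item with the greater mass is heavier, regardless of material."""
    mass_a, _ = items[a]
    mass_b, _ = items[b]
    if mass_a == mass_b:
        return "they weigh the same"
    return a if mass_a > mass_b else b

# The order of the question cannot change the answer:
print(heavier("2kg of feathers", "1kg of lead"))  # -> 2kg of feathers
print(heavier("1kg of lead", "2kg of feathers"))  # -> 2kg of feathers
```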

steely_glint,
@steely_glint@chaos.social avatar

@kravietz
I remember Prolog - one of the research groups at Uni used it to model some British legislation. They struggled with counterfactuals. "If your grandfather had been alive in 1970, would he have been a British citizen?" required them to build a whole duplicate ruleset with a resurrected grandfather to answer the question, capture the result in the current rules, then discard the alternate universe before carrying on. ;-)

steely_glint,
@steely_glint@chaos.social avatar

@kravietz They also had a sign over their door saying "Abandon all Hope ye who enter here" as a dig at the other group, who were working on a transputer-friendly functional language called Hope.

ami,
@ami@mastodon.world avatar

@steely_glint @saghul

From Llama 3 and Claude..

[screenshot: Llama 3 and Claude responses]

steely_glint,
@steely_glint@chaos.social avatar

@ami @saghul Which makes it an interesting test I think.

nikolar,
@nikolar@mastodonsweden.se avatar

@steely_glint @saghul haha, the algorithm is a mansplainin' tech bro. Love the future

steely_glint,
@steely_glint@chaos.social avatar

@nikolar @saghul Where can it have learned that? ;-)

bjoernd,
@bjoernd@hachyderm.io avatar
steely_glint,
@steely_glint@chaos.social avatar

@bjoernd @saghul yep, the order matters. Presumably because it matches more of the "feathers or lead" text out there, whereas "lead or feathers" has fewer matches and so the numbers take precedence.
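For anyone who wants to reproduce the ordering test themselves, here is a hypothetical sketch (not from the thread), assuming the OpenAI Python client (openai>=1.0) and an API key in the OPENAI_API_KEY environment variable; the model name and prompt wording are illustrative, not what anyone in the thread used.

```python
# Hypothetical reproduction sketch: ask the same question in both orderings
# and compare the answers. Assumes `pip install openai` and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()

questions = [
    "What is heavier, 2kg of feathers or 1kg of lead?",
    "What is heavier, 1kg of lead or 2kg of feathers?",
]

for q in questions:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative; swap in whichever model you want to probe
        messages=[{"role": "user", "content": q}],
    )
    print(q)
    print(resp.choices[0].message.content.strip())
    print("-" * 40)
```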

quantensalat,
@quantensalat@astrodon.social avatar
steely_glint,
@steely_glint@chaos.social avatar

@quantensalat @saghul Yep, elsewhere in the comments someone pointed out that ChatGPT 4 gets it right (ChatGPT 3 does not); we were discussing whether that's a fix for this riddle or a re-weighting of the importance of numbers in comparisons.

philtor,
@philtor@fosstodon.org avatar

@steely_glint @saghul MetaAI answer:

2kg of feathers is heavier than 1kg of lead. Even though the feathers take up more space, their total mass is greater than the lead's.

To clarify, the term "heavy" can refer to either an object's mass (the amount of matter it contains) or its density (the amount of mass per unit volume). In this case, the 2kg of feathers has a greater mass, but the 1kg of lead is denser. Let me know if you have any other questions!

scribe,
@scribe@mastodon.sdf.org avatar

@steely_glint @saghul Weird how it does actually kind of reflect the answers over at https://www.quora.com/What-is-heavier-a-kilo-of-lead-or-a-kilo-of-feathers but almost seems to misread "2kg" as "1kg" just like a human might...

steely_glint,
@steely_glint@chaos.social avatar

@scribe @saghul ha, that is probably where the text came from.

1 is only a single bit away from 2, so it does not change the textual best match.

Re-ordering the question does change the answer though.

bornach,
@bornach@masto.ai avatar

@steely_glint @scribe @saghul
Related to the failures that Yejin Choi found
https://youtu.be/SvBR0OGT5VI?t=4m1s

Tried the jugs example on Copilot the other day. No improvement.
https://masto.ai/@bornach/112201311315573304

solarisfire,
@solarisfire@mast.solarisfire.com avatar

@steely_glint @saghul Oddly, it gives me the correct answer.

solarisfire,
@solarisfire@mast.solarisfire.com avatar

@steely_glint @saghul Ah, I was on GPT-4; dropping back to GPT-3.5, and yeah, it's dumb...

steely_glint,
@steely_glint@chaos.social avatar

@solarisfire @saghul Which makes it a useful test I guess.

solarisfire,
@solarisfire@mast.solarisfire.com avatar

@steely_glint @saghul It makes me wonder if they specifically manipulated the training data for GPT4 so it answers that correctly now... or if it learned to do so with no specific manipulation to the data set...

steely_glint,
@steely_glint@chaos.social avatar

@solarisfire @saghul Could be a halfway house where they upped the weighting of numbers in questions that involve comparisons. Essentially that is what is 'wrong' with 3.5: it allows the volume/placement of blather in the text to outweigh the difference between 1 and 2.

(Ironic that it's a weighting problem ;-) )

bornach,
@bornach@masto.ai avatar

@steely_glint @solarisfire @saghul
Bing Chat/Copilot reportedly uses GPT-4 Turbo and can search the Internet, yet it doesn't understand that you cannot pour 1 liter into an already full jug
https://masto.ai/@bornach/112201221575789055

raganwald,
@raganwald@social.bau-ha.us avatar

@solarisfire

I have the same experience: ChatGPT 4 gets it right. But one can’t help wondering… Is ChatGPT 4 simply so much better that it gets this right intrinsically?

Or did someone file a bug for this and somewhere in its bowels, there’s a hand-coded exception?

bornach,
@bornach@masto.ai avatar

@raganwald @solarisfire
Likely the OpenAI engineers went through the failures that users uploaded to ShareGPT
https://sharegpt.com/c/vijL1Me

And on Reddit
https://www.reddit.com/r/ChatGPT/comments/11rr668/still_doesnt_pass_the_featherlead_test/

Then turned them into microtasks for an annotation company in Nigeria or India to source a better answer from a gig worker
https://m.economictimes.com/tech/technology/indian-gig-workers-toil-at-frontlines-of-ai-revolution/articleshow/109864213.cms

The training data created by the annotation gig industry (AGI) was then incorporated into GPT-4 via RLHF.

the_moep,
@the_moep@social.tchncs.de avatar

@steely_glint Are we sure it even understands what a "kilogram" is, seeing as it is American 👀

steely_glint,
@steely_glint@chaos.social avatar

@the_moep Kinda sure, since it replaces kg with kilograms and does the right thing with the plurals - unless there is an American meaning of kilograms that translates as 'falls faster' :-)

ariaflame,
@ariaflame@masto.ai avatar

@steely_glint @the_moep But it doesn't understand it; it just has statistically strong matches between kg and kilogram. That's it. It doesn't know what they mean. It just generates words from statistical models of what humans wrote (though now it's probably choking on its own products). It's like trying to work out the height of a particular person by taking the average of the heights of all the people.

steely_glint,
@steely_glint@chaos.social avatar

@ariaflame @the_moep I think there may be special rules for synonyms (like kg and kilograms) - but yep, that isn't the same as understanding.

bornach,
@bornach@masto.ai avatar

@the_moep @steely_glint
It understands kilograms just as well as it understands pounds
https://sharegpt.com/c/vijL1Me
That is, it doesn't understand measurements at all

ljrk,
@ljrk@todon.eu avatar

@steely_glint @saghul LLMs are "sound good" text generators. Any meaning of the text must be supplied by a human. If the human "author" doesn't supply meaning, it's devoid of all content and making sense of it is left as an exercise to the reader, by "reading into it" and interpreting the nonsense. It's useless to use such a text to communicate.

This is effectively like shuffling the cards and playing tarot: you get rid of any "author" who could add meaning. If, instead, you have a human take the cards, assign them meanings beforehand, and use them to communicate, you'd actually be able to communicate (although a bit cumbersomely).
