tomw,
@tomw@mastodon.social avatar

Every so often I see a post about how LLMs fail logic puzzles.

And... yes? Of course they do. The only way an LLM could solve a puzzle is if it has seen that puzzle before, or a substantially similar one. (But that might cause it to give the answer to the similar one, not the correct answer.)

Why is this even tested so often or considered surprising? It is, in essence, an autocomplete. It does not understand logic. It has no concept of a correct answer. It gives the most likely completion.

SteveGodwin,
@SteveGodwin@mastodon.social avatar

@tomw not so much a demonstration of how smart computers are but instead a demonstration of how stupid people are.

pdcawley,
@pdcawley@mendeddrum.org avatar

@tomw @wordshaper because LLMs are consistently oversold as AI, and solving logic puzzles or simple arithmetic problems is the sort of thing that a 'real' AI should be able to do.

tek_dmn,
@tek_dmn@mastodon.tekdmn.me avatar

@tomw Because people don't realize what they are. They see it as "AI" and are trying to test the intelligence part of artificial intelligence.

They're not intelligent. They're, yeah, autocorrect (well, predictive text completion) that has enough grasp of language to attempt to construct correct sentences, without blindly just using some Markov chain of "if the user types artificial, they'll type intelligence next".

It's not just that there's no understanding; there's not even a concept of things like fact, fiction, or logic. But either because it's branded AI, or because you can ask a natural-language question and get a natural-language answer, people think it, well, thinks.

It's scary what that implies. All that passes for computer "intelligence" is natural language I/O.
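The "Markov chain" scheme dismissed above can be made concrete. This is a hypothetical toy sketch, not anything from the thread: a bigram predictor that suggests whichever word most often followed the current one in its training text. Real LLMs condition on far more context than one word, but the training objective is the same kind of thing — emit a likely continuation, with no concept of truth.

```python
from collections import Counter, defaultdict

# Toy bigram predictor: count which word follows which in the training
# text, then "predict" by picking the most frequent continuation.
# The training text here is an invented example.
training_text = (
    "artificial intelligence is not intelligence "
    "artificial intelligence is branded as intelligence"
)

follows = defaultdict(Counter)
words = training_text.split()
for prev, nxt in zip(words, words[1:]):
    follows[prev][nxt] += 1

def predict(word):
    """Return the continuation seen most often after `word`, or None."""
    candidates = follows.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict("artificial"))  # -> intelligence (pure counting, no understanding)
```

The predictor outputs "intelligence" after "artificial" simply because that pair dominates its counts — which is exactly the point being argued: a likely completion, not a correct answer.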

benjamineskola,
@benjamineskola@hachyderm.io avatar

@tomw I agree, but also: given how much they’re hyped as being actually “intelligent”, it’s no surprise that many people assume that they can actually solve this kind of puzzle.

(On the other hand, I also see this from people who really ought to know better.)

tomw,
@tomw@mastodon.social avatar

@benjamineskola It adds to the illusion by being able to recite the answers to the best-known puzzles (similar to the absurdity about it "passing" exams).

samueljohnson,
@samueljohnson@mstdn.social avatar

@tomw What makes me laugh is when it apologises for basic mistakes.

"You're absolutely right. I am sorry for the confusion."

tomw,
@tomw@mastodon.social avatar

@samueljohnson It will also apologise for being correct if you insist that it is a mistake.

eibhear,

@tomw Nearly every single comment I see from those who aren't involved in designing AIs and LLMs over-emphasises the "I" and under-emphasises the "A". I suspect this is at the core of the surprise that LLMs are shite at problem-solving.

tomw,
@tomw@mastodon.social avatar

@eibhear Yes, the term is a problem. There is no more intelligence in "AI" than in, say, a calculator.

BenAveling,

@tomw @eibhear that’s too strong. AI isn’t intelligence, but for some purposes, it’s a good enough substitute.

tomw,
@tomw@mastodon.social avatar

@BenAveling @eibhear So is a calculator.

Azuaron,
@Azuaron@hachyderm.io avatar

@tomw Because the vast majority of people don't understand LLMs, and even some people who definitely know better keep talking about how it's actual intelligence and that proto-Skynet's an existential threat to humanity.

It's good to show such people that LLMs do not think, do not "know" things.

tomw,
@tomw@mastodon.social avatar

@Azuaron Yes, the existential threat stuff is so far from the reality of fancy autocomplete as to be laughable. It is bizarre that anyone takes it seriously and I don't know what the people pushing it hope to achieve.

I suppose it is on some level useful to demonstrate things like "look, I had a vaccine but I am not magnetic", but it does tend to take the proposition more seriously than is deserved.

Azuaron,
@Azuaron@hachyderm.io avatar

@tomw The existential threat people fall into two camps: grifters and griftees. When Musk, the OpenAI CEO, and their employees talk about it, what they're trying to do is convince people that LLMs are SO POWERFUL, they could even DESTROY HUMANITY. This helps them convince businesses to invest in AI because businesses love power, and doesn't that sound like the pinnacle of power?

This is why I see the logic puzzle tests as a net-good: deflating the "power" myth of LLMs reveals the grift.

tomw,
@tomw@mastodon.social avatar

@Azuaron In that sense I suppose it is a relative of "I tried to use cryptocurrency for its supposed purpose as a currency and it did not go well"

HauntedOwlbear,
@HauntedOwlbear@eldritch.cafe avatar

@tomw I was just wondering exactly this.

I imagine people feel that it's a point that needs proving, but it sometimes feels as though this approach takes the notion that LLMs might actually be able to do this more seriously than it deserves.

tomw,
@tomw@mastodon.social avatar

@HauntedOwlbear Yeah, sometimes people list out simple puzzles it can "solve" and more complex ones where it "fails" and I'm like... no. It cannot "solve" anything. At all.
