simon_brooke,
@simon_brooke@mastodon.scot avatar

I've been doing a bit more experimenting with LLMs and truth, and I've got an interesting one.

my experimental design was that I'd start asking about relationships between European monarchs, and then start introducing fictitious monarchs, but I didn't get that far...

#1/several

sunflowerinrain,
@sunflowerinrain@mastodon.online avatar

@simon_brooke I hope that attempts to replace thinking with LLM will fail quickly. What worries me more is swathes of people believing their nonsense to be true fact. We have the examples of lemming behaviour, Braveheart, oh good grief if I continue this list it will cause blood pressure to explode its monitor.

ACAElliott,
@ACAElliott@mstdn.social avatar

deleted_by_author

fitheach,
@fitheach@mstdn.io avatar

@ACAElliott
> (Says me, with no real expertise in this matter at all.)

Never stops anyone else.

@simon_brooke

ACAElliott,
@ACAElliott@mstdn.social avatar

@simon_brooke

Interesting thread.

It's almost as if Bard is reasonably plausible when it comes to 'one degree of separation' questions, but when there is a chain of facts to connect, it totally bombs out.

Which makes sense, since the longer the chain, the more branching possibilities there are, and the less likely it is that anything in the training data addresses that particular question.

(Says me, with no real expertise in this matter at all.)
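The branching intuition above can be sketched numerically: if each link in a chain of facts is independently right with probability p, the whole chain is right with probability p^k, which decays quickly with chain length k. (A toy model of my own, not anything from the thread.)

```python
# Toy model: if each link in a chain of facts is independently correct
# with probability p, a k-link chain is entirely correct with p**k.
def chain_accuracy(p: float, k: int) -> float:
    """Probability that a k-link chain of facts is entirely correct."""
    return p ** k

for k in (1, 3, 6):
    print(k, round(chain_accuracy(0.8, k), 3))
```

Even at 80% per link, a six-link chain like Bard's is right barely a quarter of the time.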

simon_brooke,
@simon_brooke@mastodon.scot avatar

I asked Google Bard:

"Describe the relationship between Sweyn Forkbeard and Tsar Nicholas II"

Bard answered, in part:

"Sweyn Forkbeard's daughter, Gytha, married Vladimir II Monomakh, the Grand Prince of Kiev. One of their grandsons, Inge I of Norway, was the great-grandfather of Margaret I of Denmark. Margaret I was the grandmother of Christian I of Denmark, who was the great-grandfather of Ivan IV of Russia, the grandfather of Tsar Nicholas II"

Plausible, yes?

#2 of several

HeavenlyPossum,
@HeavenlyPossum@kolektiva.social avatar

@simon_brooke

I like where this is going

simon_brooke,
@simon_brooke@mastodon.scot avatar

Well, let's check.

Was Vladimir II Monomakh a real person? ✅
Was he really married to Gytha? ✅
Was Gytha Sweyn Forkbeard's daughter? ❌

According to Wikipedia (which can also be wrong), Sweyn Forkbeard died in 1014; Gytha, daughter of Harold Godwinson, was born after 1053.
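That date check is mechanical enough to automate. A minimal sketch, using the dates cited above (the function name is my own):

```python
# Sanity check on the claimed parent-child link, using the dates cited
# above (from Wikipedia, which can also be wrong).
SWEYN_DIED = 1014          # Sweyn Forkbeard's death year
GYTHA_BORN_AFTER = 1053    # Gytha's earliest possible birth year

def could_be_parent(parent_death: int, child_earliest_birth: int) -> bool:
    """A father must have been alive within roughly a year of the birth."""
    return child_earliest_birth <= parent_death + 1

print(could_be_parent(SWEYN_DIED, GYTHA_BORN_AFTER))  # False
```

One comparison of two integers is all it takes to falsify the claim.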

#3 of several

simon_brooke,
@simon_brooke@mastodon.scot avatar

I expected an LLM to run into problems when I introduced Rufus of Rubovia, but the real royal families of Europe were massively intermarried, and in particular Scandinavian royalty had a major impact on the formation of Russia; and their genealogies are matters of public record you'd expect an LLM to have ingested. So if there were any semantic model here at all, this is just trivial data retrieval. But Bard can't do even that, so it just makes shit up.

#4 of several

simon_brooke,
@simon_brooke@mastodon.scot avatar

So if not Sweyn Forkbeard, what about his great rival Olaf Tryggvason? Bard tells me:

"Some people may mistakenly believe that there is a relationship between Olaf Tryggvason and Nicholas II because they both have the same surname, "Tryggvason." However, this is just a coincidence. The surname "Tryggvason" is a common Scandinavian surname, and it is not clear whether Olaf Tryggvason and Nicholas II are actually related."

Wait, WHAT!? There are so many ways this is wrong!

#5 of several

simon_brooke,
@simon_brooke@mastodon.scot avatar

1. Tryggvason is not a surname, it's a patronymic;
2. Tryggvason, as a patronymic, is not now common in Norway or anywhere else;
3. Nicholas II's family name was 'Romanov'; that literally means 'the Roman', but it's clearly not a patronymic;
4. Nicholas II did have a patronymic, but it was 'Alexandrovich';
5. 'Romanov' and 'Tryggvason' are not the same;
6. 'Alexandrovich' and 'Tryggvason' are not the same...

We expect LLM responses to be plausible; this fails even that test.

#6 of several

simon_brooke,
@simon_brooke@mastodon.scot avatar

Let's be clear: I know that LLM answers are truthful only by accident. But that is exactly why I started with questions whose answers were simple data retrieval: factual things which, while not necessarily trivial for humans to retrieve, should nevertheless be very simple for a machine. It's tree search, dammit!
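The tree-search point can be made concrete: with the genealogy in a graph, "what is the relationship between X and Y" is just a shortest-path query. A sketch with an invented toy family (names and edges are illustrative, not real genealogy):

```python
from collections import deque

# Toy genealogy as an undirected graph: person -> immediate relations.
# Names and edges are invented for illustration.
FAMILY = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": set(),        # unrelated: no path exists
}

def relationship_path(start: str, goal: str):
    """Breadth-first search; returns the chain of relations, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in FAMILY.get(path[-1], set()) - seen:
            seen.add(nxt)
            queue.append(path + [nxt])
    return None   # the honest answer: no known relationship

print(relationship_path("A", "D"))  # ['A', 'B', 'C', 'D']
print(relationship_path("A", "E"))  # None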

#7 of several

Loukas,
@Loukas@mastodon.nu avatar

@simon_brooke So an LLM can't even replace the labour of one reply guy with access to wikipedia

BarneyDellar,
@BarneyDellar@mastodon.scot avatar

@Loukas @simon_brooke That’s the thing though. These LLMs don’t have access to the internet. They don’t look up facts. They just give plausible answers. They’re great if you give them some text and ask them to summarise it. But they will just make stuff up if you ask them general knowledge questions.

simon_brooke,
@simon_brooke@mastodon.scot avatar

@BarneyDellar @Loukas Well, yes, but:

  1. They are trained, if not on the entire Internet, at least on substantial libraries of curated useful information; and
  2. You'd expect Google Bard, which is a search prototype, to be back-ended on some serious search technology, which Google certainly has access to.
simon_brooke,
@simon_brooke@mastodon.scot avatar

@Loukas Someone, in a history department in a university somewhere, has a database with a complete genealogy of European monarchs, which could answer these questions in milliseconds. I'd actually be surprised if Google does not have one.

This is astonishingly poor!

simon_brooke,
@simon_brooke@mastodon.scot avatar

I expected it to give answers like the first one (x begat y, who begat z, who begat...), except accurate, or else say that there was no relationship.

I expected nonsense to emerge only when I introduced fictitious names.

I expected at least some answers to be correct and verifiable. But not only do I have 'hallucinations' in both answers, but I have at least seven actual errors in the answers to two questions.

#8 of several

simon_brooke,
@simon_brooke@mastodon.scot avatar

Anyone who fears that LLMs are coming for their job either has a really shit job, or is really shit at their job.

Anyone who fears that LLMs, or anything which may evolve in the near term from them, might take over the world really doesn't understand what this technology is (in)capable of.

#9 of 10

simon_brooke,
@simon_brooke@mastodon.scot avatar

As @otfrom has just written, LLMs deliver "Eton Oxbridge PPE as a service" – just that, and nothing more.

They're as capable of competent governance, of being 'World King' as... well, as Boris Johnson.

#10 of 10

HeavenlyPossum,
@HeavenlyPossum@kolektiva.social avatar

@simon_brooke @otfrom

All these delusional credulous nerds out there insisting these things are alive and have internal mental models of the world and meanwhile they’re just spitting out blocks of text that are plausibly correlated to other blocks of text.

ncdominie,
@ncdominie@mastodon.scot avatar

@simon_brooke ... or is good at their job but is employed by people who massively overestimate what LLMs can do.

The danger isn't the Paperclip Maximiser; it's the people who want to turn the world into paperclips in anticipation of its coming.

HeavenlyPossum,
@HeavenlyPossum@kolektiva.social avatar

@simon_brooke

Don’t blame workers for capitalist exploitation!

lispi314,
@lispi314@mastodon.top avatar

@HeavenlyPossum @simon_brooke The jobs will be momentarily unavailable, unless the company actually collapses from such mismanagement first.

fedops,
@fedops@fosstodon.org avatar

@simon_brooke and anyone pushing for that to happen is really dangerous (and an idiot).

screwtape,
@screwtape@mastodon.sdf.org avatar

@fedops ++
@simon_brooke great series

Contrasting side note you kind of addressed, I think. IIRC, when an LLM bot is hallucinating, it makes logically inconsistent statements: G-- is so-and-so's daughter; G-- & X are married. Is G-- so-and-so's daughter? No.

"Small" LLMs have been applied to constructing proof sketches, in an effort to actually say at least internally consistent things. If you watch the articles @Jose_A_Alonso shares for a few days, the topic will come up again.

petealexharris,
@petealexharris@mastodon.scot avatar

@screwtape @fedops @simon_brooke @Jose_A_Alonso
but of course logical consistency between two statements A and B isn't part of the model, so checking it isn't part of the generative algorithm. Only whether statements like A and B are likely to occur together in the training data.

An automated proof sketching engine that also used a language model to reword maths into natural language might be interesting, but natural language itself is not and can't be what proofs are constructed with.
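The missing consistency check is easy to sketch outside the model: represent each statement as a (subject, relation, object) triple and reject pairs that contradict. The relation name and the single toy rule here are my own invention, not anything Bard or a proof assistant actually does:

```python
# Toy consistency check over extracted statements, represented as
# (subject, relation, object) triples. One invented rule: two
# 'daughter_of' claims about the same person must name the same father.
def contradicts(a: tuple, b: tuple) -> bool:
    """True if the two triples cannot both hold under the toy rule."""
    return (a[1] == b[1] == "daughter_of"
            and a[0] == b[0]        # same child
            and a[2] != b[2])       # different alleged fathers

s1 = ("Gytha", "daughter_of", "Sweyn Forkbeard")   # Bard's claim
s2 = ("Gytha", "daughter_of", "Harold Godwinson")  # the record
print(contradicts(s1, s2))  # True: they cannot both be true
```

Nothing in a generative language model performs even this one-line check; it only scores how plausibly the statements co-occur.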

screwtape,
@screwtape@mastodon.sdf.org avatar

@petealexharris
Thanks! I believe automatically using proof assistants using a Large Language Model approach on a large language model approach is an established research topic, with the problem that there is not a Large amount of Language examples on proof assistants. IIRC.

In this case, I was suggesting it had heuristic value due to some established properties of hallucination.
@fedops @simon_brooke @Jose_A_Alonso

screwtape,
@screwtape@mastodon.sdf.org avatar

@petealexharris
ie I believe you are wrong that LLMs are not being explored for proofs.
@fedops @simon_brooke @Jose_A_Alonso

petealexharris,
@petealexharris@mastodon.scot avatar

@screwtape @simon_brooke
Some physicists spent a couple of decades on string theory too.

screwtape,
@screwtape@mastodon.sdf.org avatar

@petealexharris
I would have a look at Jose's timeline of academic articles, including this research skein for context. After which I will feel better except for the comparison to string theorists! Aside, I am an opponent of LLMs for other reasons ( supporting gopher://perma.computer ).
@simon_brooke

otfrom,
@otfrom@functional.cafe avatar

deleted_by_author

simon_brooke,
@simon_brooke@mastodon.scot avatar

@otfrom shit job, definition of.

adamgreenfield,
@adamgreenfield@social.coop avatar

@simon_brooke @otfrom If all the jobs that just involved the churn of meaningless signs were eliminated from the economy, there would be no economy. Over the span of just a few months, GANs and LLMs have already done for the livelihood of a few people I know – people who had managed to retain a certain independence from the métro-boulot-dodo world, as well.

otfrom,
@otfrom@functional.cafe avatar

deleted_by_author

simon_brooke,
@simon_brooke@mastodon.scot avatar

@otfrom yes, but backending a natural language parser onto tree search is something I was doing in games programming forty-mumble years ago. It ain't rocket science.

Alternatecelt,
@Alternatecelt@mastodon.scot avatar

@simon_brooke
Those are some wild 'hallucinations'.
I keep seeing adverts on the work twitter for using these to 'increase productivity' and 'improve your management writing'. It's like management classes just can't wait to dive in to having these write their emails etc. Maybe they will end up out of a job because they get sacked for really bad emails....

otfrom,
@otfrom@functional.cafe avatar

deleted_by_author

simon_brooke,
@simon_brooke@mastodon.scot avatar

@otfrom Oh, that is very astute! Permission to plagiarise wildly?