simon_brooke,
@simon_brooke@mastodon.scot avatar

I've been doing a bit more experimenting with LLMs and truth, and I've got an interesting one.

my experimental design was that I'd start asking about relationships between European monarchs, and then start introducing fictitious monarchs, but I didn't get that far...

#1/several

sunflowerinrain,
@sunflowerinrain@mastodon.online avatar

@simon_brooke I hope that attempts to replace thinking with LLM will fail quickly. What worries me more is swathes of people believing their nonsense to be true fact. We have the examples of lemming behaviour, Braveheart, oh good grief if I continue this list it will cause blood pressure to explode its monitor.

ACAElliott,
@ACAElliott@mstdn.social avatar

deleted_by_author

fitheach,
@fitheach@mstdn.io avatar

@ACAElliott
> (Says me, with no real expertise in this matter at all.)

Never stops anyone else.

@simon_brooke

ACAElliott,
@ACAElliott@mstdn.social avatar

@simon_brooke

Interesting thread.

It's almost as if Bard is reasonably plausible when it comes to 'one degree of separation' questions, but when there is a chain of facts to connect, it totally bombs out.

Which makes sense, since the longer the chain, the more branching possibilities there are, and the less likely it is that anything in the training data addresses that particular question.

(Says me, with no real expertise in this matter at all.)
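The branching intuition above can be sketched numerically: if each link in a chain of facts is independently right with probability p, the whole chain is right with probability p^k, which decays quickly with chain length k. (A toy model of my own, not anything from the thread.)

```python
# Toy model: if each link in a chain of facts is independently correct
# with probability p, a k-link chain is entirely correct with p**k.
def chain_accuracy(p: float, k: int) -> float:
    """Probability that a k-link chain of facts is entirely correct."""
    return p ** k

for k in (1, 3, 6):
    print(k, round(chain_accuracy(0.8, k), 3))
```

Even at 80% per link, a six-link chain like Bard's is right barely a quarter of the time.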

simon_brooke,
@simon_brooke@mastodon.scot avatar

I asked Google Bard:

"Describe the relationship between Sweyn Forkbeard and Tsar Nicholas II"

Bard answered, in part:

"Sweyn Forkbeard's daughter, Gytha, married Vladimir II Monomakh, the Grand Prince of Kiev. One of their grandsons, Inge I of Norway, was the great-grandfather of Margaret I of Denmark. Margaret I was the grandmother of Christian I of Denmark, who was the great-grandfather of Ivan IV of Russia, the grandfather of Tsar Nicholas II"

Plausible, yes?

#2 of several

HeavenlyPossum,
@HeavenlyPossum@kolektiva.social avatar

@simon_brooke

I like where this is going

simon_brooke,
@simon_brooke@mastodon.scot avatar

Well, let's check.

Was Vladimir II Monomakh a real person? ✅
Was he really married to Gytha? ✅
Was Gytha Sweyn Forkbeard's daughter? ❌

According to Wikipedia (which can also be wrong), Sweyn Forkbeard died in 1014; Gytha, daughter of Harold Godwinson, was born after 1053.
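That date check is mechanical enough to automate. A minimal sketch, using the dates cited above (the function name is my own):

```python
# Sanity check on the claimed parent-child link, using the dates cited
# above (from Wikipedia, which can also be wrong).
SWEYN_DIED = 1014          # Sweyn Forkbeard's death year
GYTHA_BORN_AFTER = 1053    # Gytha's earliest possible birth year

def could_be_parent(parent_death: int, child_earliest_birth: int) -> bool:
    """A father must have been alive within roughly a year of the birth."""
    return child_earliest_birth <= parent_death + 1

print(could_be_parent(SWEYN_DIED, GYTHA_BORN_AFTER))  # False
```

One comparison of two integers is all it takes to falsify the claim.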

#3 of several

simon_brooke,
@simon_brooke@mastodon.scot avatar

I expected an LLM to run into problems when I introduced Rufus of Rubovia, but the real royal families of Europe were massively intermarried, and in particular Scandinavian royalty had a major impact on the formation of Russia; and their genealogies are matters of public record you'd expect an LLM to have ingested. So if there were any semantic model here at all, this is just trivial data retrieval. But Bard can't do even that, so it just makes shit up.

#4 of several

simon_brooke,
@simon_brooke@mastodon.scot avatar

So if not Sweyn Forkbeard, what about his great rival Olaf Tryggvason? Bard tells me:

"Some people may mistakenly believe that there is a relationship between Olaf Tryggvason and Nicholas II because they both have the same surname, "Tryggvason." However, this is just a coincidence. The surname "Tryggvason" is a common Scandinavian surname, and it is not clear whether Olaf Tryggvason and Nicholas II are actually related."

Wait, WHAT!? There are so many ways this is wrong!

#5 of several

simon_brooke,
@simon_brooke@mastodon.scot avatar

1. Tryggvason is not a surname, it's a patronymic;
2. Tryggvason, as a patronymic, is not now common in Norway or anywhere else;
3. Nicholas II's family name was 'Romanov'; that literally means 'the Roman', but it's clearly not a patronymic;
4. Nicholas II did have a patronymic, but it was 'Alexandrovich';
5. 'Romanov' and 'Tryggvason' are not the same;
6. 'Alexandrovich' and 'Tryggvason' are not the same...

We expect LLM responses to be plausible; this fails even that test.

#6 of several

simon_brooke,
@simon_brooke@mastodon.scot avatar

Let's be clear: I know that LLM answers are truthful only by accident. But that is exactly why I started with questions whose answers were simple data retrieval: factual things which, while not necessarily trivial for humans to retrieve, should nevertheless be very simple for a machine. It's tree search, dammit!
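The tree-search point can be made concrete: with the genealogy in a graph, "what is the relationship between X and Y" is just a shortest-path query. A sketch with an invented toy family (names and edges are illustrative, not real genealogy):

```python
from collections import deque

# Toy genealogy as an undirected graph: person -> immediate relations.
# Names and edges are invented for illustration.
FAMILY = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": set(),        # unrelated: no path exists
}

def relationship_path(start: str, goal: str):
    """Breadth-first search; returns the chain of relations, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in FAMILY.get(path[-1], set()) - seen:
            seen.add(nxt)
            queue.append(path + [nxt])
    return None   # the honest answer: no known relationship

print(relationship_path("A", "D"))  # ['A', 'B', 'C', 'D']
print(relationship_path("A", "E"))  # None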

#7 of several

Loukas,
@Loukas@mastodon.nu avatar

@simon_brooke So an LLM can't even replace the labour of one reply guy with access to wikipedia

BarneyDellar,
@BarneyDellar@mastodon.scot avatar

@Loukas @simon_brooke That’s the thing though. These LLMs don’t have access to the internet. They don’t look up facts. They just give plausible answers. They’re great if you give them some text and ask them to summarise it. But they will just make stuff up if you ask them general knowledge questions.

simon_brooke,
@simon_brooke@mastodon.scot avatar

@BarneyDellar @Loukas Well, yes, but:

  1. They are trained, if not on the entire Internet, at least on substantial libraries of curated useful information; and
  2. You'd expect Google Bard, which is a search prototype, to be back-ended on some serious search technology, which Google certainly has access to.
simon_brooke,
@simon_brooke@mastodon.scot avatar

@Loukas Someone, in a history department in a university somewhere, has a database with a complete genealogy of European monarchs, which could answer these questions in milliseconds. I'd actually be surprised if Google does not have one.

This is astonishingly poor!

simon_brooke,
@simon_brooke@mastodon.scot avatar

I expected it to give answers like the first one (x begat y, who begat z, who begat...), except accurate, or else say that there was no relationship.

I expected nonsense to emerge only when I introduced fictitious names.

I expected at least some answers to be correct and verifiable. But not only do I have 'hallucinations' in both answers, but I have at least seven actual errors in the answers to two questions.

#8 of several

simon_brooke,
@simon_brooke@mastodon.scot avatar

Anyone who fears that LLMs are coming for their job either has a really shit job, or is really shit at their job.

Anyone who fears that LLMs, or anything which may evolve in the near term from them, might take over the world really doesn't understand what this technology is (in)capable of.

#9 of 10

simon_brooke,
@simon_brooke@mastodon.scot avatar

As @otfrom has just written, LLMs deliver "Eton Oxbridge PPE as a service" – just that, and nothing more.

They're as capable of competent governance, of being 'World King' as... well, as Boris Johnson.

#10 of 10

HeavenlyPossum,
@HeavenlyPossum@kolektiva.social avatar

@simon_brooke @otfrom

All these delusional credulous nerds out there insisting these things are alive and have internal mental models of the world and meanwhile they’re just spitting out blocks of text that are plausibly correlated to other blocks of text.

ncdominie,
@ncdominie@mastodon.scot avatar

@simon_brooke ... or is good at their job but is employed by people who massively overestimate what LLMs can do.

The danger isn't the Paperclip Maximiser; it's the people who want to turn the world into paperclips in anticipation of its coming.

HeavenlyPossum,
@HeavenlyPossum@kolektiva.social avatar

@simon_brooke

Don’t blame workers for capitalist exploitation!

lispi314,
@lispi314@mastodon.top avatar

@HeavenlyPossum @simon_brooke The jobs will be momentarily unavailable, unless the company actually collapses from such mismanagement first.

fedops,
@fedops@fosstodon.org avatar

@simon_brooke and anyone pushing for that to happen is really dangerous (and an idiot).

screwtape,
@screwtape@mastodon.sdf.org avatar

@fedops ++
@simon_brooke great series

Contrasting side note you kind of addressed, I think. IIRC, when an LLM bot is hallucinating, it makes logically inconsistent statements: G-- is so-and-so's daughter; G-- & X are married. Is G-- so-and-so's daughter? No.

"Small" LLMs have been applied to constructing proof sketches, in an effort to actually say at least internally consistent things. If you watch the articles @Jose_A_Alonso shares for a few days, the topic will come up again.

petealexharris,
@petealexharris@mastodon.scot avatar

@screwtape @fedops @simon_brooke @Jose_A_Alonso
but of course logical consistency between two statements A and B isn't part of the model, so checking it isn't part of the generative algorithm. Only whether statements like A and B are likely to occur together in the training data.

An automated proof sketching engine that also used a language model to reword maths into natural language might be interesting, but natural language itself is not and can't be what proofs are constructed with.
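The missing consistency check is easy to sketch outside the model: represent each statement as a (subject, relation, object) triple and reject pairs that contradict. The relation name and the single toy rule here are my own invention, not anything Bard or a proof assistant actually does:

```python
# Toy consistency check over extracted statements, represented as
# (subject, relation, object) triples. One invented rule: two
# 'daughter_of' claims about the same person must name the same father.
def contradicts(a: tuple, b: tuple) -> bool:
    """True if the two triples cannot both hold under the toy rule."""
    return (a[1] == b[1] == "daughter_of"
            and a[0] == b[0]        # same child
            and a[2] != b[2])       # different alleged fathers

s1 = ("Gytha", "daughter_of", "Sweyn Forkbeard")   # Bard's claim
s2 = ("Gytha", "daughter_of", "Harold Godwinson")  # the record
print(contradicts(s1, s2))  # True: they cannot both be true
```

Nothing in a generative language model performs even this one-line check; it only scores how plausibly the statements co-occur.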

screwtape,
@screwtape@mastodon.sdf.org avatar

@petealexharris
Thanks! I believe automatically using proof assistants using a Large Language Model approach on a large language model approach is an established research topic, with the problem that there is not a Large amount of Language examples on proof assistants. IIRC.

In this case, I was suggesting it had heuristic value due to some established properties of hallucination.
@fedops @simon_brooke @Jose_A_Alonso

screwtape,
@screwtape@mastodon.sdf.org avatar

@petealexharris
ie I believe you are wrong that LLMs are not being explored for proofs.
@fedops @simon_brooke @Jose_A_Alonso

petealexharris,
@petealexharris@mastodon.scot avatar

@screwtape @simon_brooke
Some physicists spent a couple of decades on string theory too.

screwtape,
@screwtape@mastodon.sdf.org avatar

@petealexharris
I would have a look at Jose's timeline of academic articles, including this research skein for context. After which I will feel better except for the comparison to string theorists! Aside, I am an opponent of LLMs for other reasons ( supporting gopher://perma.computer ).
@simon_brooke

otfrom,
@otfrom@functional.cafe avatar

deleted_by_author

simon_brooke,
@simon_brooke@mastodon.scot avatar

@otfrom shit job, definition of.

adamgreenfield,
@adamgreenfield@social.coop avatar

@simon_brooke @otfrom If all the jobs that just involved the churn of meaningless signs were eliminated from the economy, there would be no economy. Over the span of just a few months, GANs and LLMs have already done for the livelihood of a few people I know – people who had managed to retain a certain independence from the métro-boulot-dodo world, as well.

otfrom,
@otfrom@functional.cafe avatar

deleted_by_author

simon_brooke,
@simon_brooke@mastodon.scot avatar

@otfrom yes, but backending a natural language parser onto tree search is something I was doing in games programming forty-mumble years ago. It ain't rocket science.

Alternatecelt,
@Alternatecelt@mastodon.scot avatar

@simon_brooke
Those are some wild 'hallucinations'.
I keep seeing adverts on the work twitter for using these to 'increase productivity' and 'improve your management writing'. It's like management classes just can't wait to dive in to having these write their emails etc. Maybe they will end up out of a job because they get sacked for really bad emails....

otfrom,
@otfrom@functional.cafe avatar

deleted_by_author

simon_brooke,
@simon_brooke@mastodon.scot avatar

@otfrom Oh, that is very astute! Permission to plagiarise wildly?