Replies - badlogic - kbin.social

badlogic, 2 days ago to random

EU elections in June, so I built a new thing: "EU Election Program Explorer".

https://wahlomat.marioslab.io

Visually explore statements made in Austrian parties' EU election programs, see where parties overlap and where they don't, etc.

How it works:

Take program PDFs of each party, extract plain text for each page

Let GPT 3.5 extract key statements from each page

Embed each statement into high-dim vector

Project to 2D space

Slap web UI on top of it (pan/zoom/select/keyword search)
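
The steps above can be sketched as a small data flow. This is only an illustration, not the code from the linked notebook: `Statement`, `build_map` and the three stage functions passed in (`extract_statements`, `embed`, `project_2d`) are hypothetical names standing in for the LLM call, the embedding model and the 2D projection.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Statement:
    party: str
    page: int
    text: str
    vector: list[float]                    # high-dimensional embedding
    xy: tuple[float, float] = (0.0, 0.0)   # 2D position for the scatter plot


def build_map(
    pages_by_party: dict[str, list[str]],
    extract_statements: Callable[[str], list[str]],   # stand-in for the LLM step
    embed: Callable[[str], list[float]],              # stand-in for the embedding model
    project_2d: Callable[[Sequence[list[float]]], list[tuple[float, float]]],
) -> list[Statement]:
    """Run page text through extract -> embed -> project, keeping party and page."""
    statements = []
    for party, pages in pages_by_party.items():
        for page_no, page_text in enumerate(pages, start=1):
            for text in extract_statements(page_text):
                statements.append(Statement(party, page_no, text, embed(text)))
    # Projection needs all vectors at once, so it runs after the loop.
    coords = project_2d([s.vector for s in statements])
    for statement, xy in zip(statements, coords):
        statement.xy = xy
    return statements
```

The web UI then only needs the resulting list: party for the colour, `xy` for the position, `text` and `page` for the tooltip and keyword search.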

video/mp4

badlogic, 2 days ago

All the preprocessing is about 200 LOC (including PDF parsing, talking to the LLM, embedding & projection), thanks to the easy-to-use libraries available nowadays. You just need to know what you're doing. :D

Code here:
https://github.com/badlogic/wahlomat/blob/main/preprocessing/main.ipynb

(see also llm.py and pdftools.py, which I stole and modified from somewhere on the web)

badlogic, 2 days ago

"Mario, this could have been a chatbot". That's actually what I wanted to do first, but that 1) costs more to provide at a level that doesn't hallucinate half the time and 2) is less exploratory in nature. You must know what you are interested in and wouldn't be able to find serendipitous info.

E.g. I didn't know most parties fucking LOVE trains.

badlogic, 2 days ago

It also wouldn't have allowed me to grasp that the Greens are all over the place with their statements and repeat themselves a lot in their program, while the Nazis have barely any program to speak of.

image/jpeg

badlogic, 2 days ago

Side note: I built the stupid pan/zoom scatter plot myself cause every single library out there that's supposed to do that is ... not great.

It's not perfect, but it is mine (and now you can use it too if you don't mind Lit Elements. You can easily extract the logic to vanilla JS tho)

Funny how that took 2x the LOC that was needed for all the fancy ML-y stuff. Sigh.

https://github.com/badlogic/wahlomat/blob/main/src/utils/plot.ts

badlogic, 2 days ago

And as always, if you find this kind of "work" entertaining/informative, and if you have disposable income, consider supporting our charity:

https://cards-for-ukraine.at

We are a zero-overhead charity. Every donation cent goes towards food vouchers for 🇺🇦 families (mostly women and their kids) in 🇦🇹. We pay everything else and do the labor for free.

All invoices, payment confirmations etc. here:
https://drive.google.com/drive/folders/1PxOL8A44bIRU1Hdoq87_2iXSLNmnMXQr

2 years and still going strong. Over €250k in vouchers delivered.

badlogic, 2 days ago

@Aaron I first take the statement text (which is usually just a single sentence) and embed it into a high dimensional space (~1500 dimensions) with OpenAI's text embedding model.

This vector basically encodes the semantics of the statement text.

Embedding vectors of statement texts with similar or related semantics will end up in the same area of that high dimensional space.

The closer two vectors are in that space, the more semantically similar they are.
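
To make "closer means more similar" concrete, here is a minimal sketch using cosine similarity, one common way to compare embedding vectors. The three 3-dimensional vectors are made up for illustration; real embeddings come from the model and have ~1500 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors, NOT real model output.
trains_a = [0.9, 0.1, 0.0]   # e.g. "expand the night train network"
trains_b = [0.8, 0.2, 0.1]   # e.g. "more cross-border rail connections"
taxes = [0.0, 0.2, 0.9]      # e.g. "lower taxes on labour"

print(cosine_similarity(trains_a, trains_b))  # high: related statements
print(cosine_similarity(trains_a, taxes))     # low: unrelated statements
```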

badlogic, 2 days ago

@Aaron now, we can obviously not draw 1500 dimensions. We can thus apply a method called projection (which is also a form of embedding). In this case, I use a method called UMAP.

Simplified: it takes the high dimensional vectors. For each vector it tries to find a few closest vectors.

It then assigns 2-dimensional coordinates to those vectors in such a way that their distances are similar in 2D to what they are in the high dimensional space.
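
The "find a few closest vectors" step can be sketched by brute force. This is only that first step, written out for illustration: actual UMAP implementations use approximate neighbour search and then optimise the 2D layout, neither of which is shown here. `k_nearest` is a hypothetical helper name.

```python
import math


def k_nearest(vectors, k):
    """For each vector, return the indices of its k closest other vectors."""
    neighbours = []
    for i, v in enumerate(vectors):
        # Euclidean distance to every other vector, paired with its index.
        dists = [(math.dist(v, w), j) for j, w in enumerate(vectors) if j != i]
        dists.sort()
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


# Two obvious pairs in a toy 3D space.
points = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0], [5.1, 5.0, 5.0]]
print(k_nearest(points, 1))
```

In practice you would not write this yourself. Assuming the `umap-learn` package, the whole projection is roughly `umap.UMAP(n_components=2).fit_transform(vectors)`.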

badlogic, 2 days ago

@Aaron This way, the 2D projection retains the "neighbourhoods" that exist in the richer high dimensional space.

The end result is that semantically similar or related points end up in the same area in 2D as well, which is nice for visualization purposes. We can clearly see clusters of points for different topics.

Neither the original neighbourhoods in the high dimensional space nor the 2D neighbourhoods are perfect of course. But it's plenty good enough for this purpose.

badlogic, 2 days ago

@Aaron UMAP has kind of become the standard method when you want to project high dimensional vectors to 2 or 3 dimensions for visualization.

For embedding text (single words, sentences, paragraphs, etc.) you have more options. I was lazy, so I used OpenAI's embedding model through their API.

A popular alternative is sentence BERT:
https://sbert.net/

For more details on text embeddings, start with word embeddings, then move up to sentence embeddings.

https://en.wikipedia.org/wiki/Word_embedding

badlogic, 2 days ago

@Aaron TL;DR: it's not just word frequencies.

Word embeddings are vectors that encode possible semantics of a single word.

Sentence embeddings are vectors that capture the semantics of an entire sentence. It starts with word embeddings for each word in the sentence, which are then distilled, to resolve ambiguities/references between words, ending up with a single vector that stores all the "meaning" in the sentence.

badlogic, 2 days ago

@Aaron enjoy! Just don't get too hung up on understanding the nitty gritty details. For projects like the above, all you need to understand is:

text goes in, vector comes out

similar texts have vectors that will be close to each other

The rest is just measuring distances or angles between those vectors.
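
A minimal sketch of "measuring distances or angles", assuming the vectors have already been computed and scaled to length 1. The statement names and numbers are made up. For unit-length vectors, ranking by distance and ranking by angle give the same order, so either works for a "find similar statements" feature.

```python
import math


def normalize(v):
    """Scale a vector to length 1."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def rank_by_distance(query, candidates):
    """Candidate names sorted by Euclidean distance to the query, closest first."""
    return sorted(candidates, key=lambda name: math.dist(query, candidates[name]))


def rank_by_angle(query, candidates):
    """Candidate names sorted by angle to the query (unit vectors assumed)."""
    def angle(name):
        dot = sum(x * y for x, y in zip(query, candidates[name]))
        # Clamp to [-1, 1] so rounding errors cannot break acos.
        return math.acos(max(-1.0, min(1.0, dot)))
    return sorted(candidates, key=angle)


# Made-up toy vectors, NOT real model output.
query = normalize([1.0, 0.2, 0.0])
candidates = {
    "night trains": normalize([0.9, 0.3, 0.1]),
    "farm subsidies": normalize([0.1, 0.9, 0.3]),
    "border control": normalize([0.0, 0.1, 1.0]),
}

print(rank_by_distance(query, candidates))
print(rank_by_angle(query, candidates))
```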

sinbad, 2 days ago to random

I have to keep reminding myself that I don’t hate tech per se, I just hate the tech industry

Tech can still be good. A lot of the time it’s not, because of the industry. But tech can still be good. I sometimes try to build some of it, that can be fun. There are pockets of people still doing that despite the best efforts of a hypercluster of bellends, managing and thought leadering everything off the nearest cliff because they got a whiff of extremely stupid money from that direction

badlogic, 2 days ago

@sinbad sometimes useful things fall out of that pile of money that can be used by us plebs to do meaningful things. It's not all bad.

sinbad, 29 days ago to random

F**ks sake they changed something pretty fundamental between UE 5.4 Preview and UE 5.4 Final - the ability to have multiple objects in an asset file, which SUDS relies on - the dialogue and string table are in the same asset; now the string table is gone.

I was worried they might do this because they started hiding them in 5.3 (not a problem) so I tested 5.4 Preview but everything was fine. Now it's completely broken in 5.4 Final, every single dialogue line is <MISSING STRING TABLE ENTRY> 😠

badlogic, 21 days ago

@sinbad Sure hope you have more luck getting Epic to even recognize the issue. We had to break every user's project...

badlogic, 21 days ago

@sinbad Hope!

accidentlyAnton, 24 days ago to austria

What's the latest status on ID Austria? I keep seeing that it doesn't work most of the time. I'm still on Handysignatur and that works perfectly fine. Some services push me into "upgrading". Should I delay switching to it as long as possible? #austria

badlogic, 22 days ago

@accidentlyAnton honestly no idea. Had to upgrade. Works. Most of the time.
