kellogh,
@kellogh@hachyderm.io

i wish i knew more about comparing embeddings. anyone have resources? one thing i’ve wondered is how to convert an embedding from a “point” to an “area” or “volume”. e.g. an embedding of a 5-paragraph essay occupies a single point in embedding space, but if you broke it down (e.g. by paragraph), there would be several points, and the whole would presumably sit at their center. is there a way to trace the full region a text occupies in embedding space?
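
One simple way to turn the paragraph points into a region is sketched below, assuming you already have one embedding per paragraph from some model (the 3-d vectors are made up). The hypothetical `chunk_region` helper takes the mean of the unit-normalized paragraph vectors as the center and uses the largest cosine distance from that center as a radius. A cosine-distance ball is only one choice; a convex hull or a covariance ellipsoid would trace the shape more tightly.

```python
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so cosine geometry applies."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / length for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def chunk_region(chunk_vecs: List[Sequence[float]]) -> Tuple[List[float], float]:
    """Summarize a text as a region rather than a point.

    Returns the mean of the unit-normalized chunk embeddings (the center) and a
    radius: the largest cosine distance from that center to any chunk.
    """
    units = [normalize(v) for v in chunk_vecs]
    dim = len(units[0])
    center = [sum(u[i] for u in units) / len(units) for i in range(dim)]
    radius = max(1.0 - cosine(center, u) for u in units)
    return center, radius


def in_region(vec: Sequence[float], center: Sequence[float], radius: float) -> bool:
    """True if vec lies within the cosine-distance ball around center."""
    return 1.0 - cosine(center, vec) <= radius


# Made-up 3-d "paragraph embeddings"; real ones come from an embedding model.
paragraphs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
center, radius = chunk_region(paragraphs)
print(round(radius, 4))  # 0.2929
print(in_region([0.0, 0.0, 1.0], center, radius))  # False
```

A larger radius means the paragraphs spread over more of the space, which is one rough reading of "how much area" a text covers.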

mkhoury,
@mkhoury@mastodon.online

@kellogh I don't know that it's true that the whole is at the center of the paragraphs. Because of the attention mechanism, some paragraphs might skew the whole's position more than others. I think it's something to test experimentally.
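
A minimal version of that experiment is sketched below, assuming the full essay and each paragraph are embedded with the same model (the 2-d vectors here are placeholders). The hypothetical `centroid_check` helper compares the whole-text embedding with the plain average of the paragraph embeddings and reports which paragraph the whole sits closest to.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def centroid_check(whole: Sequence[float], chunks: List[Sequence[float]]) -> Dict[str, object]:
    """Compare a whole-text embedding with the mean of its chunk embeddings.

    A whole_vs_centroid score well below 1, together with one chunk scoring much
    higher than the others, suggests some chunks pull the whole more than others.
    """
    dim = len(whole)
    centroid = [sum(c[i] for c in chunks) / len(chunks) for i in range(dim)]
    per_chunk = [cosine(whole, c) for c in chunks]
    return {
        "whole_vs_centroid": cosine(whole, centroid),
        "per_chunk": per_chunk,
        "closest_chunk": max(range(len(chunks)), key=lambda i: per_chunk[i]),
    }


# Placeholder vectors; in practice embed the full essay and each paragraph
# with the same model.
report = centroid_check([0.9, 0.1], [[1.0, 0.0], [0.0, 1.0]])
print(report["closest_chunk"], round(report["whole_vs_centroid"], 3))  # 0 0.781
```

Running this over many documents, rather than one, is what would show whether the skew is systematic.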

kellogh,
@kellogh@hachyderm.io

@mkhoury i think you’re right, but i don’t think it needs to be verified: there are lots of ways to chop up text so that the pieces mean something entirely different, even the opposite. there’s an entire category of jokes dedicated to doing stuff like that

redscroll,
@redscroll@hachyderm.io

@kellogh usually embedding models are going to have a max token length of something like 500 tokens, unless you’re using Longformer or something like that.
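
Because of that limit, long texts are usually split into windows before embedding. The sketch below uses a hypothetical `chunk_by_tokens` helper and assumes whitespace-separated words as a stand-in for tokens; a real model's tokenizer typically produces more subword tokens than words, so the 512 default (an assumption, common for BERT-style encoders) should be treated as an upper bound with headroom left.

```python
from typing import List


def chunk_by_tokens(text: str, max_tokens: int = 512, overlap: int = 64) -> List[str]:
    """Split text into overlapping windows of at most max_tokens "tokens".

    Whitespace-separated words stand in for tokens here; swap in the embedding
    model's own tokenizer for accurate counts.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_by_tokens(sample, max_tokens=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The overlap keeps a sentence that straddles a boundary from being cut off in every window that contains part of it.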

kellogh,
@kellogh@hachyderm.io

the best i’ve come up with is to have an LLM break down the text via summarization, which seems like it would work in at least a demo-quality sort of way, but it really just seems expensive and error-prone, idk

kellogh,
@kellogh@hachyderm.io

paging @vicki

vicki,
@vicki@jawns.club

@kellogh what are you looking to solve ultimately?

kellogh,
@kellogh@hachyderm.io

@vicki Take a statement that has 3 big concepts. Compare it to another that has 2 of the same and 3 others that pull it in a very different direction. The embeddings of the two are quite different, but there's a lot of overlap.
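
One way to keep that overlap visible (not something proposed in the thread) is to embed each concept separately and compare the two sets of points instead of two single points, similar in spirit to the greedy max-cosine matching used by BERTScore. The sketch assumes the concept vectors already exist (for example from the summarization idea above); the one-hot vectors and the `coverage` / `soft_overlap` helpers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def coverage(src: List[Sequence[float]], dst: List[Sequence[float]]) -> float:
    """Average, over concepts in src, of the best cosine match found in dst."""
    return sum(max(cosine(s, d) for d in dst) for s in src) / len(src)


def soft_overlap(a: List[Sequence[float]], b: List[Sequence[float]]) -> Tuple[float, float]:
    """Directional overlap between two sets of concept embeddings.

    Returns (how much of a is covered by b, how much of b is covered by a).
    The two numbers differ when one text adds concepts the other lacks.
    """
    return coverage(a, b), coverage(b, a)


def one_hot(i: int, dim: int = 6) -> List[float]:
    """Toy stand-in for a concept embedding: concept i is axis i."""
    return [1.0 if j == i else 0.0 for j in range(dim)]


# Statement A: concepts 0, 1, 2. Statement B: concepts 0, 1 plus 3, 4, 5.
a = [one_hot(i) for i in (0, 1, 2)]
b = [one_hot(i) for i in (0, 1, 3, 4, 5)]
a_in_b, b_in_a = soft_overlap(a, b)
print(round(a_in_b, 3), round(b_in_a, 3))  # 0.667 0.4
```

The asymmetry is the point: 2 of A's 3 concepts appear in B, while only 2 of B's 5 appear in A, which a single cosine between two whole-text embeddings cannot express.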

vicki,
@vicki@jawns.club

@kellogh so somewhere between sentiment analysis and topic detection at scale?

kellogh,
@kellogh@hachyderm.io

@vicki yeah, basically

vicki,
@vicki@jawns.club

@kellogh it sounds a bit like what you’re talking about is averaging embeddings, or coming up with unit embeddings that you can then do math on, but that collapses, like you said, depending on the context. There might be a different way to formulate your problem.
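
The collapse is easy to see with a toy example. The sketch assumes "unit embeddings" means unit-length (normalized) vectors, which the thread leaves unconfirmed; the hypothetical `mean_embedding` helper averages them. When the inputs agree, the average stays close to length 1; when they point in opposite directions, it shrinks toward the zero vector and the information in both inputs is lost.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def mean_embedding(vectors: List[Sequence[float]]) -> List[float]:
    """Average unit-normalized vectors; the result's length shrinks toward 0
    as the inputs point in more different directions."""
    units = [normalize(v) for v in vectors]
    dim = len(units[0])
    return [sum(u[i] for u in units) / len(units) for i in range(dim)]


def norm(v: Sequence[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


similar = mean_embedding([[1.0, 0.1], [1.0, -0.1]])
opposed = mean_embedding([[1.0, 0.0], [-1.0, 0.0]])
print(round(norm(similar), 3), norm(opposed))  # 0.995 0.0
```

Re-normalizing the averaged vector hides the shrinkage but cannot recover what cancelled out, which is one reason averaging works poorly for texts whose parts pull in different directions.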

kellogh,
@kellogh@hachyderm.io

@vicki what do you mean by a "unit embedding"?
