textvr, to generativeAI German
@textvr@berlin.social avatar

Stefano Fancello is talking about LangChain, an open-source Python-based toolkit for Retrieval-Augmented Generation (RAG). It helps you prepare your own data as context for a question you send to a Large Language Model (LLM). LangChain tools can ingest all kinds of document formats, split documents into chunks, create so-called embeddings, and send the relevant context to the LLM.
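
A minimal sketch of that pipeline, assuming the langchain-community, langchain-text-splitters, langchain-openai, and faiss-cpu packages plus an OpenAI API key; the file name, chunk sizes, and model choices are illustrative, and package layout varies across LangChain versions:

# Sketch of the RAG pipeline described above; names and parameters
# are illustrative assumptions, not the speaker's exact setup.
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Ingest a document and split it into overlapping chunks.
docs = TextLoader("my_notes.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# Embed the chunks and index them in a local FAISS vector store.
store = FAISS.from_documents(chunks, OpenAIEmbeddings())

# Retrieve the chunks closest to the question and send them as context.
question = "What did I write about vector search?"
context = "\n\n".join(d.page_content for d in store.similarity_search(question, k=4))
answer = ChatOpenAI().invoke(f"Answer from this context:\n{context}\n\nQuestion: {question}")
print(answer.content)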

paulox, to PostgreSQL
@paulox@fosstodon.org avatar

pgvector 0.6.0 has been released 🎉

It now supports parallel index builds for HNSW 🗂️

Building an HNSW index is now up to 30x faster for unlogged tables ⚡

For full release notes, please review the pgvector changelog 👇
https://github.com/pgvector/pgvector/blob/master/CHANGELOG.md#060-2024-01-29
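
For anyone curious what using that looks like, a minimal sketch via psycopg; the database, table, and column names and the worker count are assumptions:

# Sketch: parallel HNSW index build with pgvector 0.6.0; "items" and
# "embedding" are hypothetical names.
import psycopg

with psycopg.connect("dbname=mydb") as conn, conn.cursor() as cur:
    # Allow PostgreSQL to use parallel workers for the build.
    cur.execute("SET max_parallel_maintenance_workers = 7")
    # Build an HNSW index over the vector column (cosine distance).
    cur.execute("CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)")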

kellogh, to LLMs
@kellogh@hachyderm.io avatar

not sure if this exists — i wish i could prompt an embedding model.

i.e. provide context for the prompt to be interpreted, but not have the prompt contribute to the embedding value. like, the prompt could have concepts in it that aren’t lit up at all in the embedding, unless the core text references them

kellogh,
@kellogh@hachyderm.io avatar

for everything i’ve seen, it seems the answer is to fine tune the model. but what if i want a lighter version of that?
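
The closest lightweight mechanism that does exist may be prefix-conditioned models like E5, where a short task prefix steers how the text is encoded without itself being the content; a sketch, assuming sentence-transformers and the intfloat/e5-base-v2 model:

# Sketch: conditioning an embedding with a task prefix. E5-style models
# are trained with "query:" / "passage:" prefixes; the model choice is
# an assumption, and this is weaker than true prompt conditioning.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/e5-base-v2")

# The prefix shapes interpretation; the text after it carries the content.
q = model.encode("query: how do I tune hyperparameters?")
d = model.encode("passage: grid search explores combinations of model parameters.")

print(util.cos_sim(q, d))  # cosine similarity of the conditioned vectors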

cpbotha, to running
@cpbotha@emacs.ch avatar

Charl's log, Earth date Sunday 2023-11-27:

#log #lifelog #running #orgmode #llm #embeddings

cpbotha, to emacs
@cpbotha@emacs.ch avatar

AM EXCITE!

Witness my Emacs org-mode + Jina AI embeddings Frankenstein!

https://youtu.be/cHQx4ITQRNU

From the video description:

I've dreamed about this for quite some time, and now I've finally been able to cobble it together!

What you're seeing is live Jina AI (fully local) similarity search over org-roam nodes (subtrees or files) in the org-roam buffer, alongside your backlinks and reflinks. This automatically surfaces other org-roam nodes related to the one you're currently reading, or even working on!

This open source setup currently works as follows:

  • export all of your org-roam nodes as text files using the supplied emacs-lisp
  • use embed.py to calculate embeddings for all of these txt files and store them in a parquet file
  • run serve.py, which waits for submitted text and returns the N closest node ids according to the Jina AI learned embeddings. These are really quite good and fully local, but it would be straightforward to use a service like OpenAI embeddings instead
  • more emacs-lisp customizes the org-roam buffer setup to call serve.py's endpoint and render the list of similar nodes

The source directory for this is still in shambles. I'll try and make some time the coming days to clean up and push to https://github.com/cpbotha/org-roam-similarity
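
Until the repo is up, a hypothetical sketch of the embed.py / serve.py shape described above, assuming sentence-transformers can load the Jina model and Flask serves the endpoint; the paths, N, and response format are guesses:

# embed.py (sketch): embed exported org-roam nodes, store in parquet.
from pathlib import Path
import pandas as pd
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)
nodes = {p.stem: p.read_text() for p in Path("nodes").glob("*.txt")}
vecs = model.encode(list(nodes.values()))
df = pd.DataFrame({"node_id": list(nodes), "embedding": [v.tolist() for v in vecs]})
df.to_parquet("embeddings.parquet")

# serve.py (sketch): return the N node ids closest to the submitted text.
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)
df = pd.read_parquet("embeddings.parquet")
vectors = np.array(df["embedding"].tolist())

@app.route("/similar", methods=["POST"])
def similar():
    q = model.encode(request.get_data(as_text=True))
    scores = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    top = np.argsort(scores)[::-1][:10]
    return jsonify(df["node_id"].iloc[top].tolist())

app.run()  # listens on localhost:5000 by default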

#emacs #orgroam #ai #embeddings

clintlalonde, to random
@clintlalonde@mastodon.oeru.org avatar

deleted_by_author

cogdog,

@sleslie @clintlalonde That’s logical to want a more purely trained LLM, but isn’t the amount of data needed super large? I’m trying to wrap my head around it, but if I read Simon Willison right, we don’t necessarily need to build new LLMs but rather understand how to deploy embeddings over the source content we want it to draw from? https://simonwillison.net/2023/Aug/27/wordcamp-llms/#embeddings I read all the explanations of them being 1536-dimension vectorized representations of tokens but struggle to grok it
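
For intuition: an embedding is just a fixed-length list of floats (1536 of them for OpenAI's ada-002 model), and "related" means the vectors point in similar directions. A toy illustration with made-up 4-dimensional vectors:

# Toy example: similarity between embeddings is usually the cosine of
# the angle between the vectors; these 4-d values are made up.
import numpy as np

cat = np.array([0.9, 0.1, 0.3, 0.0])
kitten = np.array([0.8, 0.2, 0.4, 0.1])
car = np.array([0.1, 0.9, 0.0, 0.5])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(cat, kitten))  # high: related concepts
print(cosine(cat, car))     # lower: unrelated concepts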

schizanon, to llm
@schizanon@mas.to avatar

> Embeddings are a technology that’s adjacent to the wider field of Large Language Models—the technology behind ChatGPT and Bard

@simon

https://simonwillison.net/2023/Oct/23/embeddings/

kellogh, to LLMs
@kellogh@hachyderm.io avatar

i wish i knew more about comparing embeddings. anyone have resources? one thing i’ve wondered is how to convert an embedding from a “point” to an “area” or “volume”. e.g. an embedding of a 5-paragraph essay occupies a single point in embedding space, but if you broke it down (e.g. by paragraph), there would be several points, and the whole would presumably sit at their center. is there a way to trace the full region a text occupies in embedding space?
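
One rough way to get at that "area": embed each paragraph separately, treat the centroid as the whole-document point, and use distances from the centroid as the spread. A sketch assuming sentence-transformers; the model and the spread measure are arbitrary choices:

# Sketch: treating a text as a cloud of paragraph embeddings rather
# than a single point; the essay text here is a stand-in.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

essay = (
    "Cats are independent animals that groom themselves.\n\n"
    "Dogs, by contrast, crave constant companionship.\n\n"
    "Both species have lived alongside humans for millennia."
)
paragraphs = essay.split("\n\n")

points = model.encode(paragraphs)        # one point per paragraph
centroid = points.mean(axis=0)           # the "whole" as the center
radii = np.linalg.norm(points - centroid, axis=1)
print("mean radius:", radii.mean())      # crude size of the occupied region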

lysander07, to llm

Many new and interesting topics in our upcoming Knowledge Graphs - Foundations and Applications online lecture.

lysander07, to fediverse German

Keynote by Heiko Paulheim (who still hasn't arrived here in the Fediverse) at RuleML+RR 2023 in Oslo. Talk: "Knowledge Graph Embeddings meet Symbolic Schemas, or: what do they Actually Learn?" Slides: https://www.uni-mannheim.de/media/Einrichtungen/dws/Files_People/Profs/heiko/talks/RuleMLRR_2023.pdf
Of course, this slide had to be from Heiko 😉 @heikopaulheim

Dreamwieber, to random
@Dreamwieber@sigmoid.social avatar

I created vector embeddings of the entire Alan Watts lecture archive, and as they processed they flowed like waves across my terminal.

I thought it was beautiful and oddly fitting, so I captured it.
