joe, (edited) to ai

Back in January, we started looking at AI and how to run a large language model (LLM) locally (instead of just using something like ChatGPT or Gemini). A tool like Ollama is great for building a system that uses AI without depending on OpenAI. Today, we will look at creating a retrieval-augmented generation (RAG) application using Python, LangChain, Chroma DB, and Ollama. Retrieval-augmented generation optimizes the output of a large language model by having it consult an authoritative knowledge base outside of its training data before generating a response. If you have a source of truth that isn't in the training data, RAG is a good way to make the model aware of it. Let's get started!

Your RAG will need a model (like llama3 or mistral), an embedding model (like mxbai-embed-large), and a vector database. The vector database contains relevant documentation to help the model answer specific questions better. For this demo, our vector database is going to be Chroma DB. You will need to “chunk” the text you are feeding into the database. Let’s start there.

Chunking

There are many ways of choosing the right chunk size and overlap, but for this demo I am just going to use a chunk size of 7,500 characters and an overlap of 100 characters. I am also going to use LangChain's CharacterTextSplitter to do the chunking. The overlap means that the last 100 characters of each chunk are duplicated at the start of the next database record, so content at a chunk boundary isn't stranded without context.
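
A minimal sketch of that chunking step (docs here is a placeholder for documents that have already been loaded; loading is covered below):

from langchain.text_splitter import CharacterTextSplitter

# Split the loaded documents into ~7,500-character chunks with 100 characters of overlap
text_splitter = CharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_documents(docs)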

The Vector Database

A vector database is a type of database designed to store, manage, and manipulate vector embeddings. Vector embeddings are representations of data (such as text, images, or sounds) in a high-dimensional space, where each data item is represented as a dense vector of real numbers. When you query a vector database, your query is transformed into a vector of real numbers. The database then uses this vector to perform similarity searches.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-08-at-2.36.49%E2%80%AFPM.png?resize=665%2C560&ssl=1

You can think of it like a two-dimensional chart with points on it. One of those points is your query; the rest are your database records. The results of the search are simply the points that sit closest to the query point.
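
With Chroma (via LangChain), that nearest-neighbor lookup is a single call. A minimal sketch, assuming the vectorstore we build in the full app below and a made-up query:

# Return the 4 stored chunks whose embeddings are closest to the query's embedding
results = vectorstore.similarity_search("What did Apple announce at the event?", k=4)
for doc in results:
    print(doc.page_content[:200])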

Embedding Model

To build the vector database, you can't just use the main Ollama model; you also need an embedding model. There are three available to pull from the Ollama library as of this writing. For this demo, we are going to use nomic-embed-text.
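
A minimal sketch of wiring nomic-embed-text up through LangChain (assuming Ollama is running locally and the model has been pulled):

from langchain_community.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")

# Each string becomes a dense vector of floats
vector = embeddings.embed_query("What did Apple announce at the event?")
print(len(vector))  # the dimensionality of the embedding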

Main Model

Our main model for this demo is going to be phi3, a 3.8B-parameter model trained by Microsoft.
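
You can sanity-check that the model is reachable with a quick sketch like this (the prompt is just an example):

from langchain_community.llms import Ollama

llm = Ollama(model="phi3")
print(llm.invoke("In one sentence, what is retrieval-augmented generation?"))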

LangChain

You will notice that today's demo leans heavily on LangChain. LangChain is an open-source framework designed for developing applications that use LLMs. It provides tools and structures that enhance the customization, accuracy, and relevance of the outputs produced by these models, and developers can use it to create new prompt chains or modify existing ones. LangChain has APIs for pretty much everything we need to do in this app.

The Actual App

Before we start, you are going to want to pip install tiktoken langchain langchain-community langchain-core (plus chromadb and beautifulsoup4, which the vector store and the web loader rely on). You are also going to want to ollama pull phi3 and ollama pull nomic-embed-text. This is going to be a CLI app. You can run it from the terminal like python3 app.py "<Question Here>".

You also need a sources.txt file containing the URLs of things that you want to have in your vector database.
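
The file is just one URL per line; the entries below are placeholders:

https://example.com/apple-event-recap
https://example.com/ipad-pro-announcement

Putting it all together, a minimal version of app.py could look something like this (the prompt wording and variable names are one reasonable choice, not the only one):

import sys

from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

question = sys.argv[1]

# Read the list of source URLs, one per line
with open("sources.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

# Download the pages and chunk them
docs = WebBaseLoader(urls).load()
text_splitter = CharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_documents(docs)

# Embed the chunks and store them in Chroma
vectorstore = Chroma.from_documents(chunks, OllamaEmbeddings(model="nomic-embed-text"))
retriever = vectorstore.as_retriever()

# Build the RAG chain: retrieve context, fill the prompt, generate with phi3
prompt = ChatPromptTemplate.from_template(
    "Answer the question based only on the following context:\n"
    "{context}\n\nQuestion: {question}"
)
llm = Ollama(model="phi3")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke(question))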

So, what is happening here? Our app.py file is reading sources.txt to get a list of URLs for news stories from Tuesday’s Apple event. It then uses WebBaseLoader to download the pages behind those URLs, uses CharacterTextSplitter to chunk the data, and creates the vectorstore using Chroma. It then creates and invokes rag_chain.

Here is what the output looks like:

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-08-at-4.09.36%E2%80%AFPM.png?resize=1024%2C845&ssl=1

The May 7th event is too recent to be in the model's training data, but RAG makes sure the model knows about it anyway. You could also feed the model company policy documents, the rules to a board game, or your diary, and it will magically know that information. Since you are running the model locally in Ollama, there is also no risk of that information leaking out. It is pretty awesome.

Have any questions, comments, etc? Feel free to drop a comment, below.

https://jws.news/2024/how-to-build-a-rag-system-using-python-ollama-langchain-and-chroma-db/

KathyReid, to stackoverflow
@KathyReid@aus.social avatar

I just issued a data deletion request to #StackOverflow to erase all of the associations between my name and the questions, answers and comments I have on the platform.

One of the key ways in which #RAG works to supplement #LLMs is based on proven associations. Higher ranked Stack Overflow members' answers will carry more weight in any #LLM that is produced.

By asking for my name to be disassociated from the textual data, it removes a semantic relationship that is helpful for determining which tokens of text to use in an #LLM.

If you sell out your user base without consultation, expect a backlash.

johnleonard, to infosec
@johnleonard@mastodon.social avatar

Experimental Morris II worm can exploit popular AI services to steal data and spread malware

Cornell researchers created worm 'to serve as a whistleblower'

https://www.computing.co.uk/news/4203370/experimental-morris-ii-worm-exploit-popular-ai-services-steal-spread-malware

kjr, to llm
@kjr@babka.social avatar

I am trying to build a RAG with LLAMA 3 and... getting really crazy with the strange formats I get in the response....
Not only the response, but additional text, XML tags...

savvykenya, to LLMs
@savvykenya@famichiki.jp avatar

If you have documents with the answers you're looking for, why not search the documents directly? Why are you embedding the documents and then using RAG (Retrieval Augmented Generation) to make a large language model give you answers? An LLM generates text; it doesn't search a DB to give you results. So just search the damn DB directly, we already have great search algorithms with O(1) retrieval speeds! LLMs are so stupid.

ramikrispin, to llm
@ramikrispin@mstdn.social avatar

RAG from Scratch with LangChain 🦜👇🏼

FreeCodeCamp today released a new course on building RAG from scratch with LangChain. The course, taught by Lance Martin from LangChain, focuses on the foundations of Retrieval Augmented Generation (RAG).

Course 📽️: https://www.youtube.com/watch?v=sVcwVQRHIc8
Code 🔗: https://github.com/langchain-ai/rag-from-scratch

KathyReid, to generativeAI
@KathyReid@aus.social avatar

"We were supposed to research , not embrace it as a business model!" implored the DVC Research.

The Vice-Chancellor sighed audibly and exhaled.

"We're out of options."

She raised her hands, palms up, reminiscent of prayer.

"The research grants don't cover the research we do, much less the research we want to do.

International students have declined 20% year on year since India, China and Indonesia have on-shore partnerships with Deakin and Monash that still get the grads a permanent residency.

We have PhDs teaching most of the undergrad courses. The endowment took a major hit when the stock market crashed in '25.

Federation's gone bust, Adelaide's half the size it was before the merger, and you've seen CQ merge with SCU and James Cook and Charles Darwin just to be viable."

She took a sharp inhalation of burnt autumn air.

"It's tens of millions a year in recurring revenue. That's a School's worth of people."

"What do they get?"

"All the data, and lecture recordings."

"All of it?"

"Yeah, then new deltas each semester."

"So, what's to stop them using it to create that mimics a lecturer?"

"Good point, I suspect that's what their end game is."

"Given our smarts, couldn't we do that ourselves?

Use our LMS data and lecture recordings to build a personal assistant for students, you know, Diamond Age style?

Like an always-available personal tutor? Use RAG, make sure it doesn't spit out bullshit?

We have the best in the world. Why sacrifice a long term advantage for short-term money?

What if we built it ourselves and white-labelled it for other universities?"

The Vice-Chancellor raised her eyebrow.

And so was born.

TO BE CONTINUED

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org avatar

: "There are two reasons why using a publicly available LLM such as ChatGPT might not be appropriate for processing internal documents. Confidentiality is the first and obvious one. But the second reason, also important, is that the training data of a public LLM did not include your internal company information. Hence that LLM is unlikely to give useful answers when asked about that information.

Enter retrieval-augmented generation, or RAG. RAG is a technique used to augment an LLM with external data, such as your company documents, that provide the model with the knowledge and context it needs to produce accurate and useful output for your specific use case. RAG is a pragmatic and effective approach to using LLMs in the enterprise.

In this article, I’ll briefly explain how RAG works, list some examples of how RAG is being used, and provide a code example for setting up a simple RAG framework." https://www.infoworld.com/article/3712860/retrieval-augmented-generation-step-by-step.html

KathyReid, to opensource
@KathyReid@aus.social avatar

The @thoughtworks #TechRadar for April 2024 is worth a read - covers the movement away from #OpenSource licenses (the @osi gets a mention for their stewardship of licenses to date, and flags challenges), the rise of #AI code generation and the impacts on #CI and #CD workflows (AI code generators change workflow patterns), and #architectures for #LLM creation - we're seeing distinct patterns emerge through #RAG.

I've always enjoyed the Tech Radar format - it simplifies the complexity of a fast-changing landscape and makes it tractable for decision makers.

https://www.thoughtworks.com/radar

befreax, to LLMs
@befreax@mastodon.social avatar

This has been fun to learn about LLMs and their behavior on modern hardware; I just pushed my simple Rust-based RAG app that uses Mistral 7B for inference and that is (hopefully) easy to instrument: https://github.com/tmetsch/rusty_llm

And here is the matching image generated by DALL-E: a rusting llama being inspected while standing in mistral winds.

ramikrispin, to llm
@ramikrispin@mstdn.social avatar

RAG From Scratch - Langchain Tutorial 🦜👇🏼

RAG From Scratch is a crash course by Lance Martin from LangChain focusing on the foundations of Retrieval Augmented Generation (RAG). The tutorial covers the indexing, retrieval, and generation steps of answering a query from scratch 🚀.

Video 📽️: https://www.youtube.com/playlist?list=PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x
Code 🔗: https://github.com/langchain-ai/rag-from-scratch

Image credit: Tutorial slides

kellogh, to LLMs
@kellogh@hachyderm.io avatar

pgvector is developing so fast. imo it's probably going to be the winning approach: a vector DB isn't a database, it's an index, so it belongs inside the SQL RDBMS, not some external system that you have to manage https://jkatz05.com/post/postgres/distributed-pgvector/

TheNewStack, to LLMs
@TheNewStack@hachyderm.io avatar

Writer CEO May Habib says its semantic graphing approach is an alternative to the chunking process of using vector databases. https://thenewstack.io/writer-coms-graph-based-rag-alternative-to-vector-retrieval/

ricmac, to LLMs
@ricmac@mastodon.social avatar

I interviewed Writer CEO May Habib about its semantic graphing approach to AI applications. It's an alternative to the much more common "chunking" process of RAG using vector databases. Is this an opportunity for graph databases? https://thenewstack.io/writer-coms-graph-based-rag-alternative-to-vector-retrieval/

blinry, to random
@blinry@chaos.social avatar

@leftpaddotpy Just watched your talk on finding things in nixpkgs, and wanted to say thanks! Think that will be really helpful for me!

feliks,
@feliks@chaos.social avatar

@blinry Thanks for the link

@leftpaddotpy Thanks for the talk. Learned a few new things. Wanted to mention nil integration into vim, but that's been touched upon closely enough in the last question. If someone asks again about how to get true value out of LLMs in this context, you might steer people towards RAG. This might enable contextualized auto-completion, but it wouldn't be trivial to build, unfortunately.

ErikJonker, to ai
@ErikJonker@mastodon.social avatar

Nice article about Retrieval Augmented Generation (RAG) and how you can evaluate it. It also helped me to better understand RAG in a general sense.

https://huggingface.co/learn/cookbook/rag_evaluation

kellogh, to LLMs
@kellogh@hachyderm.io avatar

looks like the community is starting to dial in on GraphRAG for LLMs. i expect this will accelerate https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/

rimmon1971, to random

Read this story from Florian June on Medium: https://pub.towardsai.net/advanced-rag-02-unveiling-pdf-parsing-b84ae866344e

LChoshen, to ArtificialIntelligence
@LChoshen@sigmoid.social avatar

Happy to share our paper:

Genie🧞: Achieving Human Parity
in Content-Grounded Datasets Generation

was accepted to

From your content
Genie creates content-grounded data
of magical quality ✨
Rivaling human-based datasets!

https://arxiv.org/abs/2401.14367

thomasrenkert, to generativeAI German
@thomasrenkert@hcommons.social avatar

Is there any tutorial (or just better docs) on how to use LLMs with graph databases (like Neo4j) to build a RAG with my own data? Maybe somebody even knows how to use LangChain together with a local LLM?

@machinelearning

textvr, to generativeAI German
@textvr@berlin.social avatar

Stefano Fancello is talking about LangChain, an open-source Python-based toolkit for Retrieval Augmented Generation (RAG). It helps you prepare your own data as context for a question you send to a Large Language Model (LLM). LangChain tools can ingest all kinds of document formats, split documents into chunks, create so-called embeddings, and send them to the LLM.

changelog, to Software
@changelog@changelog.social avatar

🗞 New episode of Changelog News!

🙏 A plea for lean software
💽 PocketBase is a backend in 1 file
🏭 Vanna AI framework for text-to-SQL
💭 Henrik Karlsson on what to focus on
🏡 Calvin Wankhede’s offline smart home
🎙 hosted by @jerod

🎧 https://changelog.com/news/77
