VTeam AI

Retrieval-Augmented Generation (RAG) for LLMs explained

Diving into the Retrieval-Augmented Generation (RAG) framework, we find that it significantly enhances Large Language Models (LLMs) for practical applications. RAG retrieves relevant data from an external store and uses it to generate pertinent responses. Its pipeline is built on five components: Document Loaders, Document Transformers, Embedding models, Vector Databases, and Retrievers.

In past blog posts, we have discussed LLMs and LangChain at length, looking at the buzz around Gen-AI. Every day something new comes up in this space, be it a new framework or a new LLM. Today, we will discuss one such framework, or rather pipeline, that has made LLMs more useful for real-world problems: RAG, a.k.a. Retrieval-Augmented Generation.

Sounds like jargon, what is it?

Simplifying the term one word at a time:

  • Retrieval: the process of finding and bringing something back; the act of accessing information from storage or memory.
  • Augmented: enhanced by adding something to an existing object, system, or environment.
  • Generation: the act or process of producing or creating something.

If we combine everything, it means extracting information from memory and using it to generate an answer to a given prompt. That is RAG for you.


But why is it so hyped?

Because RAG connects external data sources to LLMs, giving them context they were never trained on. With RAG, you can build any sort of app that requires external data, be it a recommendation system or a classification model, without fine-tuning the LLM on your data.

Now that we know what the RAG pipeline is, let's understand its different components.

Document Loaders

This component helps in loading the external resource in memory. You can utilize document loaders to retrieve data from a specified source in the form of Documents. A Document comprises text and related metadata. For instance, document loaders are designed to fetch data from various sources, including simple .txt files, the textual content of web pages, and even transcripts of YouTube videos. Document loaders offer a "load" method to import data as Documents from a preconfigured source. Additionally, they can optionally employ a "lazy load" approach to load data into memory incrementally.
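For instance, here is a minimal sketch of loading a plain-text file into Documents (the notes.txt path is a hypothetical placeholder):

#Document loader sketch: read a local .txt file into Documents
from langchain.document_loaders import TextLoader

loader = TextLoader("notes.txt")   #hypothetical file path
docs = loader.load()               #returns a list of Documents
print(docs[0].page_content[:100])  #the raw text
print(docs[0].metadata)            #e.g. {'source': 'notes.txt'}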

Document Transformers

After loading documents, it's common to require transformations to align them with your specific application. A basic example is the need to divide a lengthy document into smaller segments that fit within your model's context window. LangChain offers a range of built-in document transformers designed to simplify tasks such as splitting, merging, filtering, and other document manipulations.
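As a small sketch (the sample text, separator, and chunk sizes here are arbitrary), splitting a Document into overlapping chunks looks like this:

#Document transformer sketch: split one Document into overlapping chunks
from langchain.schema import Document
from langchain.text_splitter import CharacterTextSplitter

doc = Document(page_content="RAG retrieves external context and feeds it to an LLM. " * 20)
text_splitter = CharacterTextSplitter(separator=" ", chunk_size=200, chunk_overlap=40)
chunks = text_splitter.split_documents([doc])
print(len(chunks))                  #number of chunks produced
print(chunks[0].page_content[:80])  #start of the first chunk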

Embedding model

The Embeddings class serves as an interface for interacting with text embedding models. It acts as a standardized interface for various embedding model providers, including OpenAI, Cohere, Hugging Face, and others. Text embeddings generate vector representations of textual content. This capability is valuable because it allows us to conceptualize text within a vector space. This, in turn, enables us to perform tasks like semantic search, where we can identify text segments that exhibit the highest similarity in the vector space.
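Here is a minimal sketch of embedding a single query string, assuming an OPENAI_API_KEY environment variable is set:

#Embedding model sketch: turn text into a vector
import os
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key=os.getenv("OPENAI_API_KEY"))
vector = embeddings.embed_query("What is attention in transformers?")
print(len(vector))  #dimensionality of the embedding, e.g. 1536 for OpenAI's default model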

Vector Databases/Store

Embedding and storing unstructured data as vectors is a widely used method for data storage and retrieval. The approach converts the data into embedding vectors at indexing time and, at query time, embeds the unstructured query itself so that the most similar embedding vectors can be retrieved. Vector stores are responsible for managing this embedded data and executing vector-based searches. An essential aspect of working with vector stores is therefore the creation of vectors, which are typically generated using the embedding model described above.
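A minimal sketch of this index-then-search loop with Chroma follows (the two toy Documents are made up for illustration, and an OPENAI_API_KEY environment variable is assumed):

#Vector store sketch: index Documents, then run a similarity search
import os
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.schema import Document
from langchain.vectorstores import Chroma

embeddings = OpenAIEmbeddings(openai_api_key=os.getenv("OPENAI_API_KEY"))
docs = [
    Document(page_content="RAG retrieves external context for an LLM."),
    Document(page_content="Vector stores index embeddings for similarity search."),
]
db = Chroma.from_documents(docs, embeddings)  #embed and index the Documents
results = db.similarity_search("how does an LLM get external context?", k=1)
print(results[0].page_content)                #most similar Document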

Retrievers

A retriever is an interface designed to provide documents in response to an unstructured query. It holds a broader scope compared to a vector store. Unlike a vector store, a retriever's primary function is to fetch and deliver documents, not necessarily store them. While vector stores can serve as the foundation for retrievers, various other retriever types are designed for specific purposes.
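Continuing from the Chroma store built in the previous sketch, exposing it as a retriever is a one-liner:

#Retriever sketch: wrap the vector store (db from the snippet above) as a retriever
retriever = db.as_retriever(search_kwargs={"k": 2})  #return the top 2 Documents
relevant = retriever.get_relevant_documents("what does a vector store do?")
for d in relevant:
    print(d.page_content)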

So these five components form a RAG pipeline. Let's understand them with a few demonstrations.

Q&A using a PDF

#!pip install chromadb langchain openai tiktoken unstructured pypdfium2

import os

from langchain.chains import RetrievalQA
from langchain.document_loaders import PyPDFium2Loader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

#OpenAI API key (set the OPENAI_API_KEY environment variable beforehand)
api_key = os.getenv("OPENAI_API_KEY")

#Document loader: read the PDF into a list of Documents
loader = PyPDFium2Loader("attention.pdf")
data = loader.load()

#Document transformer: split the Documents into 1000-character chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(data)

#Embedding model and LLM
embeddings = OpenAIEmbeddings(openai_api_key=api_key)
llm = OpenAI(openai_api_key=api_key)

#Vector DB: embed the chunks and index them in Chroma
docsearch = Chroma.from_documents(texts, embeddings)

#Retriever: "stuff" the retrieved chunks into the prompt and answer with the LLM
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())
Let's ask a few questions about the 'Attention Is All You Need' paper PDF.
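As a quick illustration, a question can be passed to the chain like this (the question below is hypothetical; qa.run returns the generated answer as a string):

#hypothetical question against the indexed paper
print(qa.run("What is multi-head attention?"))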

Q&A using a YouTube Video

 
#!pip install langchain openai youtube-transcript-api pytube chromadb tiktoken

import os

from langchain.chains import RetrievalQA
from langchain.document_loaders import YoutubeLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

#OpenAI API key (set the OPENAI_API_KEY environment variable beforehand)
api_key = os.getenv("OPENAI_API_KEY")

#Document loader: fetch the video transcript (English, falling back to Indonesian translated to English)
loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=3w0EhxiebUA",
    add_video_info=True,
    language=["en", "id"],
    translation="en",
)
data = loader.load()

#Document transformer: split the transcript into 1000-character chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(data)

#Embedding model and LLM
embeddings = OpenAIEmbeddings(openai_api_key=api_key)
llm = OpenAI(openai_api_key=api_key)

#Vector DB: embed the chunks and index them in Chroma
docsearch = Chroma.from_documents(texts, embeddings)

#Retriever: "stuff" the retrieved chunks into the prompt and answer with the LLM
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())
Time for Q&A on the YouTube video.
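Again, querying the chain is a one-liner (the question below is hypothetical):

#hypothetical question against the indexed transcript
print(qa.run("What is the main topic of the video?"))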

In conclusion, Retrieval Augmented Generation (RAG) presents an exciting and powerful pattern for enhancing the capabilities of Large Language Models (LLMs) in various applications. By seamlessly integrating external data sources and adapting to changing information, RAG offers flexibility and adaptability that can significantly improve LLM performance.

The practical examples and insights shared in this blog post highlight the potential of RAG to revolutionize how we interact with LLMs. Whether you are a data scientist or a product manager, understanding and implementing RAG can open up new horizons in natural language understanding, making LLMs more context-aware and data-responsive.

As we continue to explore the possibilities of RAG-based LLM applications, it becomes clear that this approach extends the utility of LLMs, making them even more valuable tools in our data-driven world. By leveraging the synergy between retrieval and generation, RAG offers a pathway to unlock the full potential of LLMs in various domains.

Stay tuned for more developments and innovations in the realm of RAG for LLMs, as this pattern is poised to shape the future of natural language processing and understanding.

Disclaimer: The views and opinions expressed in this blog post are solely those of the authors and do not reflect the official policy or position of any of the mentioned tools. This blog post is not a form of advertising and no remuneration was received for the creation and publication of this post. The intention is to share our findings and experiences using these tools and is intended purely for informational purposes.