Introduction to RAG

What is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is an AI technique that combines the power of large language models with external knowledge sources. Instead of relying solely on the information learned during training, RAG systems can "retrieve" relevant information from databases, documents, or other sources in real time and use that information to generate more accurate, up-to-date, and contextually relevant responses.

Think of RAG as giving an AI assistant access to a vast library. When you ask a question, the AI doesn't just rely on what it memorized during training: it can quickly search through the library, find relevant books or documents, read the pertinent sections, and then provide an answer based on that current information.

What does a RAG pipeline entail?

(Diagram: the RAG pipeline, from query to response. Source: Medium)

The RAG pipeline involves a blend of retrieval and generation. Here's a breakdown of the steps, as illustrated in the diagram:

1. Query: The user submits a question or request.

2. Embedding: The query is passed through an embedding model, which converts it into a dense vector, a numerical representation that captures the meaning of the text.

3. Retrieval: This vector is used to search a vector database for similar content. The database returns the most relevant documents or chunks, called retrieved contexts (steps 2 and 3 are sketched in code after this list).

4. Augmentation + Generation: The retrieved contexts are combined with the original query and sent to a Large Language Model (LLM). The model uses this enriched input to generate a well-informed response (see the prompt-building sketch after this list).

5. Response: The final answer is returned to the user, grounded in both the model's knowledge and the external information.
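To make steps 2 and 3 concrete, here is a minimal sketch in Python. It assumes the sentence-transformers library as the embedding model and uses a plain NumPy array in place of a real vector database (such as FAISS, Chroma, or Pinecone); the documents, model name, and query are illustrative only.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Illustrative knowledge base; a real system would hold many chunked documents.
documents = [
    "RAG combines retrieval from external sources with LLM generation.",
    "Embedding models map text to dense vectors that capture meaning.",
    "Vector databases index embeddings for fast similarity search.",
]

# Step 2: Embedding - convert text into dense vectors.
# "all-MiniLM-L6-v2" is one commonly used small embedding model (assumption).
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = model.encode(documents, normalize_embeddings=True)

query = "How does a RAG system find relevant information?"
query_vector = model.encode([query], normalize_embeddings=True)[0]

# Step 3: Retrieval - similarity search over the stored vectors.
# With normalized vectors, cosine similarity reduces to a dot product.
scores = doc_vectors @ query_vector
top_k = 2
top_indices = np.argsort(scores)[::-1][:top_k]
retrieved_contexts = [documents[i] for i in top_indices]

print(retrieved_contexts)
```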
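Step 4 then amounts to templating the retrieved contexts into the prompt that is sent to the LLM. The sketch below shows one common prompt layout; llm_generate is a hypothetical placeholder for whichever chat or completion API the system actually calls.

```python
def build_augmented_prompt(query: str, retrieved_contexts: list[str]) -> str:
    """Step 4 (augmentation): combine retrieved contexts with the user query."""
    context_block = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved_contexts)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Step 4 (generation) + step 5 (response): send the enriched prompt to an LLM.
# llm_generate() is hypothetical; substitute your provider's chat/completion call.
prompt = build_augmented_prompt(
    "How does a RAG system find relevant information?",
    ["Embedding models map text to dense vectors.",
     "Vector databases index embeddings for fast similarity search."],
)
# response = llm_generate(prompt)
print(prompt)
```

Instructions like "use only the context below" are one common way to keep the generated answer grounded in the retrieved information rather than in the model's training data alone.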
