Graph RAG
Installation
# Linux / macOS (you can use a Conda environment)
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-docproc[kg]"

# Windows PowerShell (you can use a Conda environment)
$token = (gcloud auth print-access-token)
pip install --extra-index-url "https://oauth2accesstoken:$token@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-docproc[kg]"

# Windows Command Prompt (you can use a Conda environment)
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO SET TOKEN=%T
pip install --extra-index-url "https://oauth2accesstoken:%TOKEN%@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-docproc[kg]"

LightRAG Graph RAG Indexer
1
"""Example: index chunked document elements into a LightRAG graph store backed by PostgreSQL."""
import json

from gllm_datastore.graph_data_store.light_rag_postgres_data_store import LightRAGPostgresDataStore
from gllm_docproc.indexer.graph.light_rag_graph_rag_indexer import LightRAGGraphRAGIndexer
from gllm_inference.em_invoker import OpenAIEMInvoker
from gllm_inference.lm_invoker import OpenAILMInvoker

# Read the chunker output (a JSON list of elements) produced in the previous step.
file_path = "./structuredelementchunker-output.json"
with open(file_path, "r", encoding="utf-8") as f:
    elements = json.load(f)

# LM and embedding invokers used by LightRAG for entity/relation extraction and embedding.
lm_invoker = OpenAILMInvoker(model_name="gpt-4o-mini")
em_invoker = OpenAIEMInvoker(model_name="text-embedding-3-small")

# PostgreSQL-backed LightRAG data store.
# NOTE(review): connection values below are local-development defaults — replace for real deployments.
graph_store = LightRAGPostgresDataStore(
    lm_invoker=lm_invoker,
    em_invoker=em_invoker,
    postgres_db_host="localhost",
    postgres_db_port=5455,
    postgres_db_user="rag",
    postgres_db_password="rag",
    postgres_db_name="rag",
    postgres_db_workspace="default",
)

# Index the elements under a file identifier so they can be traced back to the source document.
indexer = LightRAGGraphRAGIndexer(graph_store=graph_store)
indexer.index(elements, file_id="file_001")
2
export OPENAI_API_KEY=<OPENAI_API_KEY>
python main.py

LlamaIndex Graph RAG Indexer
1
"""Example: index chunked document elements into a Neo4j graph store via LlamaIndex."""
import json

from gllm_datastore.graph_data_store.llama_index_neo4j_graph_rag_data_store import (
    LlamaIndexNeo4jGraphRAGDataStore,
)
from gllm_docproc.indexer.graph.llama_index_graph_rag_indexer import LlamaIndexGraphRAGIndexer
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Read the chunker output (a JSON list of elements) produced in the previous step.
file_path = "./structuredelementchunker-output.json"
with open(file_path, "r", encoding="utf-8") as f:
    elements = json.load(f)

# LlamaIndex LLM and embedding model used for graph extraction; temperature=0 for deterministic output.
llm = OpenAI(model="gpt-4o-mini", temperature=0)
embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Neo4j graph store — replace the placeholders with real connection credentials.
graph_store = LlamaIndexNeo4jGraphRAGDataStore(
    username="<NEO4J_USERNAME>",
    password="<NEO4J_PASSWORD>",
    url="<NEO4J_URL>",
)

# Indexer wired with the default extractors.
indexer = LlamaIndexGraphRAGIndexer(
    graph_store=graph_store,
    llama_index_llm=llm,
    embed_model=embed_model,
)

# Index the elements under a file identifier so they can be traced back to the source document.
indexer.index(elements, file_id="file_001")
2
export OPENAI_API_KEY=<OPENAI_API_KEY>
python main.py

Last updated
Was this helpful?