Graph RAG

gllm-docproc | Tutorial : Graph RAG Loader | Use Case: Advanced DPO Pipeline | API Reference

The Graph RAG Indexer builds knowledge graphs from document chunks and indexes them into a graph database for advanced Retrieval-Augmented Generation (RAG) applications.

Prerequisites

Complete all setup steps listed on the Prerequisites page before running this example.

Installation

# Optional: run this inside a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-docproc[kg]"
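
To confirm that the installation succeeded, a quick import check can help. The sketch below uses only the standard library. The module names are taken from the imports used in the script later on this page, on the assumption that `gllm_inference` and `gllm_datastore` are installed as dependencies of `gllm-docproc[kg]`; the helper name `check_modules` is made up for this illustration.

```python
from importlib.util import find_spec


def check_modules(names):
    """Return a dict mapping each top-level module name to whether it is importable."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Top-level packages imported by the script in this tutorial.
    required = ["gllm_docproc", "gllm_inference", "gllm_datastore"]
    for name, found in check_modules(required).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

If any module is reported as MISSING, re-run the install command above in the same environment that runs your script.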

You can use the following as a sample file: structuredelementchunker-output.json.

LightRAG Graph RAG Indexer

LightRAGGraphRAGIndexer is a lightweight implementation that uses the LightRAG library to create knowledge graphs. It automatically extracts entities and relationships from text, stores them in a graph database, and maintains mappings between source files and their chunks.
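
The mapping between source files and their chunks can be pictured as a dictionary keyed by file name. The following is a toy illustration only, not the indexer's or LightRAG's internal implementation. The field names `source` and `chunk_id`, the sample data, and the helper `map_files_to_chunks` are all hypothetical; the real element schema may differ.

```python
from collections import defaultdict


def map_files_to_chunks(elements, source_key="source", id_key="chunk_id"):
    """Group chunk IDs by the file they came from.

    `source_key` and `id_key` are hypothetical field names chosen for this
    illustration; the real element schema may differ.
    """
    mapping = defaultdict(list)
    for element in elements:
        mapping[element[source_key]].append(element[id_key])
    return dict(mapping)


# Made-up chunks standing in for chunker output.
sample = [
    {"chunk_id": "c1", "source": "report.pdf", "text": "Alice leads the team."},
    {"chunk_id": "c2", "source": "report.pdf", "text": "The team ships v2."},
    {"chunk_id": "c3", "source": "notes.md", "text": "Bob reviews v2."},
]

file_to_chunks = map_files_to_chunks(sample)
print(file_to_chunks)
```

A mapping of this shape makes it possible to look up every chunk that belongs to a given file, for example when that file needs to be re-indexed.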

1. Create a script called main.py:

import json

from gllm_inference.em_invoker import OpenAIEMInvoker
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_docproc.indexer.graph.light_rag_graph_rag_indexer import LightRAGGraphRAGIndexer
from gllm_datastore.graph_data_store.light_rag_postgres_data_store import LightRAGPostgresDataStore

# Read elements from JSON file
file_path = "./structuredelementchunker-output.json"

with open(file_path, "r", encoding="utf-8") as f:
    elements = json.load(f)

# Initialize LM and Embedding invokers
lm_invoker = OpenAILMInvoker(model_name="gpt-4o-mini")
em_invoker = OpenAIEMInvoker(model_name="text-embedding-3-small")

# Create the LightRAG PostgreSQL data store
graph_store = LightRAGPostgresDataStore(
    lm_invoker=lm_invoker,
    em_invoker=em_invoker,
    postgres_db_host="localhost",
    postgres_db_port=5455,
    postgres_db_user="rag",
    postgres_db_password="rag",
    postgres_db_name="rag",
    postgres_db_workspace="default",
)

# Build the knowledge graph from the elements and index it into the graph store
indexer = LightRAGGraphRAGIndexer(graph_store=graph_store)
indexer.index(elements)

2. Run the script:

export OPENAI_API_KEY=<OPENAI_API_KEY>
python main.py
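
Before running main.py, it can help to check the two inputs it depends on: the `OPENAI_API_KEY` environment variable and the sample JSON file. The sketch below is a standalone helper, not part of gllm-docproc. The function name `preflight` is made up, and the check that the file holds a non-empty JSON array is an assumption about the chunker output.

```python
import json
import os


def preflight(file_path, env=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []

    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    try:
        with open(file_path, "r", encoding="utf-8") as f:
            elements = json.load(f)
    except FileNotFoundError:
        problems.append(f"file not found: {file_path}")
    except json.JSONDecodeError as exc:
        problems.append(f"invalid JSON in {file_path}: {exc}")
    else:
        # Assumption: the chunker output is a non-empty JSON array of elements.
        if not isinstance(elements, list) or not elements:
            problems.append("expected a non-empty JSON array of elements")

    return problems


if __name__ == "__main__":
    for problem in preflight("./structuredelementchunker-output.json"):
        print("preflight:", problem)
```

Passing `env` explicitly keeps the helper easy to test without touching the real environment.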

LlamaIndex Graph RAG Indexer

LlamaIndexGraphRAGIndexer is a comprehensive implementation using LlamaIndex's PropertyGraphIndex. It provides advanced knowledge graph construction with customizable entity extractors, vector embeddings for nodes, and support for multiple graph database backends.
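
To make the idea of a property graph with a pluggable entity extractor concrete, here is a purely conceptual sketch in plain Python. It does not call LlamaIndex or LlamaIndexGraphRAGIndexer; `ToyPropertyGraph`, `naive_extractor`, and `build_graph` are hypothetical names, and the word-splitting extractor is a stand-in for a real extractor.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPropertyGraph:
    """A minimal in-memory property graph: nodes with properties plus typed edges."""
    nodes: dict = field(default_factory=dict)  # name -> properties
    edges: list = field(default_factory=list)  # (source, relation, target)

    def add_triplet(self, source, relation, target, chunk_id):
        # Record which chunk each entity was seen in, as a node property.
        for name in (source, target):
            self.nodes.setdefault(name, {"chunks": []})["chunks"].append(chunk_id)
        self.edges.append((source, relation, target))


def naive_extractor(text):
    """Hypothetical extractor: treats an 'A <verb> B.' sentence as one triplet."""
    words = text.rstrip(".").split()
    return [(words[0], words[1], words[-1])] if len(words) >= 3 else []


def build_graph(chunks, extractor):
    """Run the given extractor over each (chunk_id, text) pair and collect triplets."""
    graph = ToyPropertyGraph()
    for chunk_id, text in chunks:
        for source, relation, target in extractor(text):
            graph.add_triplet(source, relation, target, chunk_id)
    return graph


graph = build_graph(
    [("c1", "Alice manages Bob."), ("c2", "Bob reviews Carol.")],
    extractor=naive_extractor,
)
print(graph.edges)
print(graph.nodes["Bob"])
```

Because the extractor is passed in as an argument, swapping in a different extraction strategy does not require changing the graph-building code, which is the design idea behind customizable extractors.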

1. Create a script called main.py:

2. Run the script:
