Text-to-Graph

Text-to-Graph is a transformation process that converts unstructured text into structured graph representations using Large Language Models (LLMs). This process extracts entities (nodes) and their relationships (edges) from natural language text, creating knowledge graphs that can be stored, queried, and visualized.

Available Implementations:

LMBasedGraphTransformer: General-purpose knowledge graph extraction from text
LMBasedMindMapTransformer: Hierarchical mind map extraction with central themes and sub-ideas

Installation

# Linux/macOS (optionally inside a Conda environment)
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-misc[json_repair]" openai

# Windows Command Prompt (optionally inside a Conda environment)
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-misc[json_repair]" openai

Before using the Text-to-Graph transformers, you should be familiar with:

LM Invoker: The component that calls the generative language model used to convert text into graphs

Quick Start

Here's a simple example to extract a knowledge graph from text:

import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_misc.graph_transformer import LMBasedGraphTransformer
from gllm_core.schema import Chunk

async def main():
    # 1. Initialize the LM invoker
    lm_invoker = OpenAILMInvoker(
        model_name="gpt-4o-mini",
        api_key="<YOUR_OPENAI_API_KEY>"
    )
    
    # 2. Create the graph transformer
    transformer = LMBasedGraphTransformer(lm_invoker=lm_invoker)
    
    # 3. Extract graph from text
    text = "Marie Curie discovered radium and won the Nobel Prize twice."
    chunks = [Chunk(content=text)]
    graph_docs = await transformer.convert_to_graph_documents(chunks)
    
    # 4. Print results
    graph = graph_docs[0]
    print("Nodes:", [node.id for node in graph.nodes])
    print("Relationships:", [(r.source.id, r.type, r.target.id) for r in graph.relationships])

asyncio.run(main())

Example output (LLM output is not deterministic, so the exact nodes and relationship names may vary):

Nodes: ['Marie Curie', 'radium', 'Nobel Prize']
Relationships: [('Marie Curie', 'DISCOVERED', 'radium'), ('Marie Curie', 'WON', 'Nobel Prize')]

What it does

The Text-to-Graph transformer analyzes text documents and extracts structured graph representations consisting of nodes (entities) and relationships. It uses LLMs to understand the semantic meaning of text and identify relevant entities and their connections.

Inputs

Documents: List of Chunk objects containing text content to transform
LM Invoker: A language model invoker for entity and relationship extraction
Schema Constraints (optional): Allowed node types and relationship types
Configuration: Structured output mode, strict mode, and custom prompts

Outputs

The Text-to-Graph transformer returns:

GraphDocument: A structured representation containing:
- Nodes: List of extracted entities with IDs, types, and properties
- Relationships: List of connections between nodes with types and properties
- Source: Reference to the original text chunk

Understanding the Output

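Each GraphDocument exposes its nodes, relationships, and source chunk as attributes. In the snippet below, graph_doc is one element of the list returned by convert_to_graph_documents: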

# Access nodes
for node in graph_doc.nodes:
    print(f"ID: {node.id}")
    print(f"Type: {node.type}")
    print(f"Properties: {node.properties}")

# Access relationships
for relationship in graph_doc.relationships:
    print(f"Source: {relationship.source.id}")
    print(f"Target: {relationship.target.id}")
    print(f"Type: {relationship.type}")
    print(f"Properties: {relationship.properties}")

# Access source document
print(f"Original text: {graph_doc.source.content}")
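To experiment with this structure without calling an LLM, the sketch below builds a small stand-in graph document and groups its relationships into an adjacency mapping. The `Node`, `Relationship`, and `GraphDoc` dataclasses are hypothetical stand-ins that only mirror the attributes used above (`id`, `type`, `properties`, `source`, `target`); they are not the library's classes. The `to_adjacency` helper and the `Element` and `Award` node types are likewise illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in mirroring the node attributes shown above."""
    id: str
    type: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """Hypothetical stand-in mirroring the relationship attributes shown above."""
    source: Node
    target: Node
    type: str
    properties: dict = field(default_factory=dict)


@dataclass
class GraphDoc:
    """Hypothetical stand-in for a graph document (nodes plus relationships)."""
    nodes: list
    relationships: list


def to_adjacency(graph_doc):
    """Map each source node ID to a list of (relationship type, target ID)."""
    adjacency = defaultdict(list)
    for rel in graph_doc.relationships:
        adjacency[rel.source.id].append((rel.type, rel.target.id))
    return dict(adjacency)


# Sample data modelled on the Quick Start example; node types are assumed.
curie = Node("Marie Curie", "Person")
radium = Node("radium", "Element")
prize = Node("Nobel Prize", "Award")
doc = GraphDoc(
    nodes=[curie, radium, prize],
    relationships=[
        Relationship(curie, radium, "DISCOVERED"),
        Relationship(curie, prize, "WON"),
    ],
)

adjacency = to_adjacency(doc)
print(adjacency)
```

Because `to_adjacency` only reads `source.id`, `target.id`, and `type`, it should work with any object exposing those attributes, which is what the access pattern above suggests for real graph documents.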

Customizing Graph Extraction

Constraining Node Types

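You can specify which types of entities to extract by providing allowed_nodes. The snippets in this section reuse lm_invoker from the Quick Start and, because they use await, must run inside an async function: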

# Extract only specific entity types
transformer = LMBasedGraphTransformer(
    lm_invoker=lm_invoker,
    allowed_nodes=["Person", "Organization", "Location", "Event"],
    strict_mode=True  # Only extract specified node types
)

chunks = [Chunk(content="Elon Musk founded SpaceX in California in 2002.")]
graph_docs = await transformer.convert_to_graph_documents(chunks)

# Result will only contain Person, Organization, Location, and Event nodes

Constraining Relationship Types

You can control which relationships to extract using allowed_relationships:

# Option 1: Simple relationship type list
transformer = LMBasedGraphTransformer(
    lm_invoker=lm_invoker,
    allowed_relationships=["WORKS_AT", "FOUNDED", "LOCATED_IN", "MANAGES"]
)

# Option 2: Typed relationships (source_type, relationship, target_type)
transformer = LMBasedGraphTransformer(
    lm_invoker=lm_invoker,
    allowed_nodes=["Person", "Organization", "Location"],
    allowed_relationships=[
        ("Person", "WORKS_AT", "Organization"),
        ("Person", "FOUNDED", "Organization"),
        ("Organization", "LOCATED_IN", "Location"),
        ("Person", "MANAGES", "Person")
    ],
    strict_mode=True
)

chunks = [Chunk(content="Alice works at TechCorp, which is located in San Francisco.")]
graph_docs = await transformer.convert_to_graph_documents(chunks)

Strict Mode vs. Lenient Mode

The strict_mode parameter controls how constraints are enforced:

# Strict mode: Only extract specified types
transformer_strict = LMBasedGraphTransformer(
    lm_invoker=lm_invoker,
    allowed_nodes=["Person", "Company"],
    allowed_relationships=["WORKS_AT"],
    strict_mode=True  # Filters out any nodes/relationships not in allowed lists
)

# Lenient mode: Use constraints as guidance but allow other types
transformer_lenient = LMBasedGraphTransformer(
    lm_invoker=lm_invoker,
    allowed_nodes=["Person", "Company"],
    allowed_relationships=["WORKS_AT"],
    strict_mode=False  # May extract additional types beyond the allowed lists
)
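The effect of strict mode can be pictured as a post-processing filter over the extracted graph. The sketch below is a conceptual illustration only, not the library's implementation: `filter_strict` is a hypothetical helper that works on plain tuples, and details such as exact string matching and dropping relationships whose endpoints were removed are assumptions.

```python
def filter_strict(nodes, relationships, allowed_nodes, allowed_relationships):
    """Keep only nodes and relationships whose types are in the allowed sets.

    nodes: list of (node_id, node_type) tuples
    relationships: list of (source_id, rel_type, target_id) tuples
    """
    kept_nodes = [(nid, ntype) for nid, ntype in nodes if ntype in allowed_nodes]
    kept_ids = {nid for nid, _ in kept_nodes}
    # Assumption: a relationship is also dropped when either endpoint was removed.
    kept_rels = [
        (src, rtype, tgt)
        for src, rtype, tgt in relationships
        if rtype in allowed_relationships and src in kept_ids and tgt in kept_ids
    ]
    return kept_nodes, kept_rels


# Hypothetical raw extraction that goes beyond the allowed lists.
nodes = [("Alice", "Person"), ("TechCorp", "Company"), ("San Francisco", "Location")]
relationships = [
    ("Alice", "WORKS_AT", "TechCorp"),
    ("TechCorp", "LOCATED_IN", "San Francisco"),
]

strict_nodes, strict_rels = filter_strict(
    nodes,
    relationships,
    allowed_nodes={"Person", "Company"},
    allowed_relationships={"WORKS_AT"},
)
print(strict_nodes)  # [('Alice', 'Person'), ('TechCorp', 'Company')]
print(strict_rels)   # [('Alice', 'WORKS_AT', 'TechCorp')]
```

In lenient mode, by contrast, the unfiltered `nodes` and `relationships` lists would be the kind of result to expect, since the allowed lists only guide the model.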

Mind Map Extraction

The LMBasedMindMapTransformer extends the basic graph transformer to create hierarchical mind map structures. It organizes information into central themes, main ideas, and sub-ideas.

What's a Mind Map?

A mind map is a hierarchical graph structure that represents information radiating from a central concept. Unlike general knowledge graphs, mind maps:

Have a single root node (Central Theme)
Follow a strict hierarchical structure
Organize information by levels of detail
Form a connected tree structure
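The properties listed above (one root, strict hierarchy, connectedness) can be checked mechanically. The following is a minimal sketch of such a check on plain node IDs and (parent, child) edges; `is_rooted_tree` is a hypothetical helper written for illustration and is not part of the library.

```python
from collections import defaultdict, deque


def is_rooted_tree(node_ids, edges):
    """Return True if the (parent, child) edges form one tree spanning node_ids."""
    children = defaultdict(list)
    parent_count = {nid: 0 for nid in node_ids}
    for parent, child in edges:
        if parent not in parent_count or child not in parent_count:
            return False  # edge references an unknown node
        children[parent].append(child)
        parent_count[child] += 1

    # Exactly one node without a parent, and no node with several parents.
    roots = [nid for nid, count in parent_count.items() if count == 0]
    if len(roots) != 1 or any(count > 1 for count in parent_count.values()):
        return False

    # A breadth-first walk from the root must reach every node.
    seen = {roots[0]}
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for child in children[current]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return len(seen) == len(parent_count)


nodes = ["AI", "Machine Learning", "Deep Learning", "Clustering"]
tree_edges = [
    ("AI", "Machine Learning"),
    ("AI", "Deep Learning"),
    ("Machine Learning", "Clustering"),
]
# Adding a second parent for "Clustering" breaks the strict hierarchy.
dag_edges = tree_edges + [("Deep Learning", "Clustering")]

print(is_rooted_tree(nodes, tree_edges))  # True
print(is_rooted_tree(nodes, dag_edges))   # False
```

A general knowledge graph would usually fail this check, since entities there can have several incoming relationships or none at all.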

Basic Mind Map Extraction

import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_misc.graph_transformer import LMBasedMindMapTransformer
from gllm_core.schema import Chunk

async def extract_mind_map():
    # Initialize the LM invoker
    lm_invoker = OpenAILMInvoker(
        model_name="gpt-4o-mini",
        api_key="<YOUR_OPENAI_API_KEY>"
    )
    
    # Create the mind map transformer
    transformer = LMBasedMindMapTransformer(lm_invoker=lm_invoker)
    
    # Prepare text
    text = """
    Artificial Intelligence is transforming industries through machine learning 
    and deep learning. Machine learning includes supervised learning techniques 
    like classification and regression, as well as unsupervised learning methods 
    like clustering. Deep learning uses neural networks with multiple layers to 
    process complex patterns in data.
    """
    chunks = [Chunk(content=text)]
    
    # Extract mind map
    mind_map_docs = await transformer.convert_to_graph_documents(chunks)
    
    # Access the mind map structure
    mind_map = mind_map_docs[0]
    
    print("Mind Map Structure:")
    for node in mind_map.nodes:
        print(f"  [{node.type}] {node.id}")
    
    print("\nHierarchical Relationships:")
    for rel in mind_map.relationships:
        print(f"  {rel.source.id} --[{rel.type}]--> {rel.target.id}")
    
    return mind_map_docs

# Run the extraction
asyncio.run(extract_mind_map())
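Since a mind map is a tree, its relationships can also be rendered as an indented outline, which is often easier to read than a flat edge list. The sketch below works on plain (source ID, target ID) pairs such as those printed by the example above; `render_outline` is a hypothetical helper, the sample edges are made up for illustration (real output depends on the model), and the recursion assumes the edges contain no cycles.

```python
from collections import defaultdict


def render_outline(root_id, edges):
    """Render (parent, child) edges as an indented outline starting at root_id."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    lines = []

    def walk(node_id, depth):
        # Indent each node by two spaces per level below the root.
        lines.append("  " * depth + node_id)
        for child in children[node_id]:
            walk(child, depth + 1)

    walk(root_id, 0)
    return "\n".join(lines)


# Hypothetical edges in the shape (rel.source.id, rel.target.id).
edges = [
    ("Artificial Intelligence", "Machine Learning"),
    ("Artificial Intelligence", "Deep Learning"),
    ("Machine Learning", "Supervised Learning"),
    ("Machine Learning", "Unsupervised Learning"),
    ("Deep Learning", "Neural Networks"),
]

outline = render_outline("Artificial Intelligence", edges)
print(outline)
```

With a real result, the edge list could be built as `[(rel.source.id, rel.target.id) for rel in mind_map.relationships]`, using the attributes shown in the example above.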

Mind Map Node Types

The mind map transformer uses three default node types:

CentralTheme: The root node representing the main topic (exactly one per mind map)
MainIdea: Primary branches from the central theme (2-5 recommended)
SubIdea: Supporting details branching from main ideas or other sub-ideas

Mind Map Relationship Types

The mind map uses hierarchical relationships:

HAS_MAIN_IDEA: Connects CentralTheme to MainIdea nodes
HAS_SUB_IDEA: Connects MainIdea to SubIdea, or SubIdea to SubIdea (for deeper levels)
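The node and relationship types above imply a small schema of allowed (source type, relationship, target type) combinations. As a rough sketch, that schema can be written down and used to spot relationships that do not fit it. `MIND_MAP_SCHEMA` and `find_schema_violations` are hypothetical names introduced here for illustration; the library may enforce these rules differently or not at all.

```python
# Allowed (source type, relationship type, target type) combinations,
# taken from the two relationship descriptions above.
MIND_MAP_SCHEMA = {
    ("CentralTheme", "HAS_MAIN_IDEA", "MainIdea"),
    ("MainIdea", "HAS_SUB_IDEA", "SubIdea"),
    ("SubIdea", "HAS_SUB_IDEA", "SubIdea"),
}


def find_schema_violations(node_types, relationships):
    """Return the relationships whose endpoint types do not match the schema.

    node_types: dict mapping node ID -> node type
    relationships: list of (source_id, rel_type, target_id) tuples
    """
    violations = []
    for src, rel_type, tgt in relationships:
        triple = (node_types.get(src), rel_type, node_types.get(tgt))
        if triple not in MIND_MAP_SCHEMA:
            violations.append((src, rel_type, tgt))
    return violations


# Hypothetical mind map; the last relationship skips the MainIdea level.
node_types = {
    "Artificial Intelligence": "CentralTheme",
    "Machine Learning": "MainIdea",
    "Clustering": "SubIdea",
    "Classification": "SubIdea",
}
relationships = [
    ("Artificial Intelligence", "HAS_MAIN_IDEA", "Machine Learning"),
    ("Machine Learning", "HAS_SUB_IDEA", "Clustering"),
    ("Artificial Intelligence", "HAS_SUB_IDEA", "Classification"),
]

violations = find_schema_violations(node_types, relationships)
print(violations)  # [('Artificial Intelligence', 'HAS_SUB_IDEA', 'Classification')]
```

The same triples have the shape of the typed `allowed_relationships` tuples shown earlier for the general graph transformer.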
