Text-to-Graph

Text-to-Graph is a transformation process that converts unstructured text into structured graph representations using Large Language Models (LLMs). This process extracts entities (nodes) and their relationships (edges) from natural language text, creating knowledge graphs that can be stored, queried, and visualized.

Available Implementations:

  • LMBasedGraphTransformer: General-purpose knowledge graph extraction from text

  • LMBasedMindMapTransformer: Hierarchical mind map extraction with central themes and sub-ideas

Installation

# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ gllm-misc

Quick Start

Here's a simple example to extract a knowledge graph from text:

import asyncio
from gllm_inference.lm_invoker import OpenAILMInvoker
from gllm_misc.graph_transformer import LMBasedGraphTransformer
from gllm_core.schema import Chunk

async def main():
    # 1. Initialize the LM invoker
    lm_invoker = OpenAILMInvoker(model_name="gpt-4o-mini")
    
    # 2. Create the graph transformer
    transformer = LMBasedGraphTransformer(lm_invoker=lm_invoker)
    
    # 3. Extract graph from text
    text = "Marie Curie discovered radium and won the Nobel Prize twice."
    chunks = [Chunk(content=text)]
    graph_docs = await transformer.convert_to_graph_documents(chunks)
    
    # 4. Print results
    graph = graph_docs[0]
    print("Nodes:", [node.id for node in graph.nodes])
    print("Relationships:", [(r.source.id, r.type, r.target.id) for r in graph.relationships])

asyncio.run(main())

Output:

What it does

The Text-to-Graph transformer analyzes text documents and extracts structured graph representations consisting of nodes (entities) and relationships. It uses LLMs to understand the semantic meaning of text and identify relevant entities and their connections.

Inputs

  • Documents: List of Chunk objects containing text content to transform

  • LM Invoker: A language model invoker for entity and relationship extraction

  • Schema Constraints (optional): Allowed node types and relationship types

  • Configuration: Structured output mode, strict mode, and custom prompts

Outputs

convert_to_graph_documents returns a list of GraphDocument objects (the Quick Start reads the first one with graph_docs[0]):

  • GraphDocument: A structured representation containing:

    • Nodes: List of extracted entities with IDs, types, and properties

    • Relationships: List of connections between nodes with types and properties

    • Source: Reference to the original text chunk
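To make the shape of this output concrete, here is a minimal sketch built from plain Python dataclasses. The class and field names (Node, Relationship, GraphDocument, properties, source) are hypothetical stand-ins that mirror the description above, not the actual gllm_misc classes. Only the attribute names read in the Quick Start (node.id, r.source.id, r.type, r.target.id) come from this page. The sample graph is hand-built, not real model output.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-ins for the library's output classes, for illustration only.
@dataclass
class Node:
    id: str                      # entity name, e.g. "Marie Curie"
    type: str                    # entity type, e.g. "Person" (assumed attribute name)
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    source: Node                 # start node of the edge
    target: Node                 # end node of the edge
    type: str                    # relationship label, e.g. "DISCOVERED"
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class GraphDocument:
    nodes: list[Node]
    relationships: list[Relationship]
    source: str                  # here simply the original chunk text (assumption)


# A hand-built example graph for the Quick Start sentence.
curie = Node(id="Marie Curie", type="Person")
radium = Node(id="Radium", type="Element")
doc = GraphDocument(
    nodes=[curie, radium],
    relationships=[Relationship(source=curie, target=radium, type="DISCOVERED")],
    source="Marie Curie discovered radium and won the Nobel Prize twice.",
)
print([(r.source.id, r.type, r.target.id) for r in doc.relationships])
```

Relationships hold references to Node objects rather than bare IDs, which is why the Quick Start reads r.source.id instead of r.source.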

Understanding the Output

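Each GraphDocument exposes its contents as attributes: graph.nodes and graph.relationships are lists, each node has an id, and each relationship has a source node, a type, and a target node (read in the Quick Start as r.source.id, r.type, and r.target.id).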
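A small, duck-typed helper can make a result easier to inspect by grouping node IDs by type and listing each node's outgoing edges. This is a hypothetical sketch: the summarize name is illustrative, and it assumes each node exposes a type attribute alongside id (suggested by the Outputs list, but not shown in the Quick Start). SimpleNamespace objects stand in for a real transformer result.

```python
from collections import defaultdict
from types import SimpleNamespace


def summarize(graph_doc):
    """Group node IDs by type and build an adjacency list from a GraphDocument-like object.

    Relies only on attribute access: graph_doc.nodes, graph_doc.relationships,
    node.id, node.type (assumed), rel.source.id, rel.type, rel.target.id.
    """
    nodes_by_type = defaultdict(list)
    for node in graph_doc.nodes:
        nodes_by_type[node.type].append(node.id)

    adjacency = defaultdict(list)
    for rel in graph_doc.relationships:
        adjacency[rel.source.id].append((rel.type, rel.target.id))
    return dict(nodes_by_type), dict(adjacency)


# Hand-built stand-in for a transformer result (not real model output).
curie = SimpleNamespace(id="Marie Curie", type="Person")
radium = SimpleNamespace(id="Radium", type="Element")
prize = SimpleNamespace(id="Nobel Prize", type="Award")
graph = SimpleNamespace(
    nodes=[curie, radium, prize],
    relationships=[
        SimpleNamespace(source=curie, type="DISCOVERED", target=radium),
        SimpleNamespace(source=curie, type="WON", target=prize),
    ],
)

by_type, adjacency = summarize(graph)
print(by_type)
print(adjacency)
```

Because the helper only uses attribute access, the same function should work on the real GraphDocument objects if their attribute names match these assumptions.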

Customizing Graph Extraction

Constraining Node Types

You can specify which types of entities to extract by providing allowed_nodes.
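This page does not say whether LMBasedGraphTransformer enforces allowed_nodes through the prompt, by filtering the model's output, or both. The sketch below illustrates only the filtering approach. Nodes whose type is not allowed are dropped, along with any relationship that would otherwise point at a removed node. The filter_nodes function and the tuple layouts are hypothetical.

```python
def filter_nodes(nodes, relationships, allowed_nodes):
    """Keep nodes whose type is in allowed_nodes and drop dangling relationships.

    nodes: list of (node_id, node_type) tuples (hypothetical layout)
    relationships: list of (source_id, rel_type, target_id) tuples
    """
    allowed = set(allowed_nodes)
    kept_nodes = [(nid, ntype) for nid, ntype in nodes if ntype in allowed]
    kept_ids = {nid for nid, _ in kept_nodes}
    # A relationship survives only if both of its endpoints survived.
    kept_rels = [
        (src, rel, tgt)
        for src, rel, tgt in relationships
        if src in kept_ids and tgt in kept_ids
    ]
    return kept_nodes, kept_rels


nodes = [("Marie Curie", "Person"), ("Radium", "Element"), ("Nobel Prize", "Award")]
rels = [("Marie Curie", "DISCOVERED", "Radium"), ("Marie Curie", "WON", "Nobel Prize")]
print(filter_nodes(nodes, rels, allowed_nodes=["Person", "Award"]))
```

Dropping dangling relationships keeps the result a valid graph: every edge still connects two nodes that are present.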

Constraining Relationship Types

You can control which relationship types are extracted by providing allowed_relationships.
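Here is a filtering sketch for relationship constraints. It is not the library's implementation. It accepts allowed_relationships either as bare relationship types or as (source_type, relationship_type, target_type) patterns. The triple form is an assumption that lets a label be allowed only between certain node types, and it may not be supported by gllm_misc. The function name and data shapes are hypothetical.

```python
def filter_relationships(relationships, node_types, allowed_relationships):
    """Keep relationships that match allowed_relationships.

    relationships: list of (source_id, rel_type, target_id) tuples
    node_types: dict mapping node_id -> node_type
    allowed_relationships: list of "REL_TYPE" strings and/or
        (source_type, rel_type, target_type) tuples (hypothetical format)
    """
    plain = {r for r in allowed_relationships if isinstance(r, str)}
    patterns = {r for r in allowed_relationships if isinstance(r, tuple)}
    kept = []
    for src, rel, tgt in relationships:
        pattern = (node_types.get(src), rel, node_types.get(tgt))
        # A bare type allows the label anywhere; a triple also checks endpoint types.
        if rel in plain or pattern in patterns:
            kept.append((src, rel, tgt))
    return kept


node_types = {"Marie Curie": "Person", "Radium": "Element", "Nobel Prize": "Award"}
rels = [
    ("Marie Curie", "DISCOVERED", "Radium"),
    ("Marie Curie", "WON", "Nobel Prize"),
    ("Radium", "WON", "Nobel Prize"),  # allowed label, implausible endpoint types
]
allowed = ["DISCOVERED", ("Person", "WON", "Award")]
print(filter_relationships(rels, node_types, allowed))
```

In this example, the last relationship is rejected because WON is only allowed from a Person to an Award.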

Strict Mode vs. Lenient Mode

The strict_mode parameter controls how strictly the allowed_nodes and allowed_relationships constraints are enforced.
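This page does not spell out the exact semantics of strict_mode. This sketch assumes one common interpretation. In strict mode, any node or relationship whose type falls outside the allowed lists is discarded. In lenient mode, the model's output is kept as-is, and the lists serve only as guidance in the prompt. The enforce function is hypothetical. It also returns what was dropped, which makes the trade-off visible.

```python
def enforce(nodes, relationships, allowed_nodes, allowed_relationships, strict_mode=True):
    """Apply type constraints to (id, type) nodes and (src, rel, tgt) relationships.

    Returns (nodes, relationships, dropped), where dropped lists removed items.
    """
    if not strict_mode:
        # Lenient (assumed): keep everything the model produced.
        return nodes, relationships, []

    node_ok = {nid for nid, ntype in nodes if ntype in allowed_nodes}
    kept_nodes = [n for n in nodes if n[0] in node_ok]
    # Keep a relationship only if its type is allowed and both endpoints survived.
    kept_rels = [
        r for r in relationships
        if r[1] in allowed_relationships and r[0] in node_ok and r[2] in node_ok
    ]
    dropped = [n for n in nodes if n not in kept_nodes] + [
        r for r in relationships if r not in kept_rels
    ]
    return kept_nodes, kept_rels, dropped


nodes = [("Marie Curie", "Person"), ("Radium", "Element"), ("Paris", "City")]
rels = [("Marie Curie", "DISCOVERED", "Radium"), ("Marie Curie", "LIVED_IN", "Paris")]

for strict in (True, False):
    kept_n, kept_r, dropped = enforce(
        nodes, rels, {"Person", "Element"}, {"DISCOVERED"}, strict_mode=strict
    )
    print(f"strict_mode={strict}: {len(kept_n)} nodes, {len(kept_r)} rels, dropped={dropped}")
```

Under this interpretation, strict mode guarantees that the output matches the schema but can lose correct facts. Lenient mode preserves everything but can return types outside the schema.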

Mind Map Extraction

The LMBasedMindMapTransformer extends the basic graph transformer to create hierarchical mind map structures. It organizes information into a central theme, main ideas, and sub-ideas.

What's a Mind Map?

A mind map is a hierarchical graph structure that represents information radiating from a central concept. Unlike general knowledge graphs, mind maps:

  • Have a single root node (Central Theme)

  • Follow a strict hierarchical structure

  • Organize information by levels of detail

  • Form a connected tree structure
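These structural properties can be checked mechanically. The hypothetical helper below is not part of gllm_misc. It treats a mind map as a list of node IDs plus (parent_id, child_id) edges. It confirms that there is exactly one root, that every other node has exactly one parent, and that every node is reachable from the root. Taken together, these three conditions rule out cycles and disconnected pieces.

```python
from collections import defaultdict, deque


def is_mind_map_tree(nodes, edges):
    """Return (ok, reason) for a mind map given node IDs and (parent, child) edges."""
    parents = defaultdict(list)
    children = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
        children[parent].append(child)

    roots = [n for n in nodes if not parents[n]]
    if len(roots) != 1:
        return False, f"expected exactly one root, found {len(roots)}"
    multi = [n for n in nodes if len(parents[n]) > 1]
    if multi:
        return False, f"nodes with more than one parent: {multi}"

    # Breadth-first search from the root; every node must be reached.
    seen = {roots[0]}
    queue = deque([roots[0]])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    unreached = [n for n in nodes if n not in seen]
    if unreached:
        return False, f"not reachable from root: {unreached}"
    return True, "ok"


nodes = ["Photosynthesis", "Inputs", "Outputs", "Sunlight", "Oxygen"]
edges = [
    ("Photosynthesis", "Inputs"),
    ("Photosynthesis", "Outputs"),
    ("Inputs", "Sunlight"),
    ("Outputs", "Oxygen"),
]
print(is_mind_map_tree(nodes, edges))
print(is_mind_map_tree(nodes, edges + [("Sunlight", "Oxygen")]))
```

The second call fails because Oxygen would have two parents. A general knowledge graph allows this, but a mind map does not.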

Basic Mind Map Extraction

Mind Map Node Types

The mind map transformer uses three default node types:

  • CentralTheme: The root node representing the main topic (exactly one per mind map)

  • MainIdea: Primary branches from the central theme (2-5 recommended)

  • SubIdea: Supporting details branching from main ideas or other sub-ideas
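The per-type rules above translate into simple counts. In this hypothetical check, a CentralTheme count other than exactly one is an error. A MainIdea count outside 2-5 only produces a warning, because the range is described as a recommendation. The function name and the dict-based input are illustrative assumptions.

```python
from collections import Counter

ALLOWED_TYPES = {"CentralTheme", "MainIdea", "SubIdea"}


def check_node_types(node_types):
    """Check mind map node-type counts.

    node_types: dict mapping node_id -> type label.
    Returns (errors, warnings) as lists of strings.
    """
    errors, warnings = [], []
    counts = Counter(node_types.values())

    unknown = sorted(set(counts) - ALLOWED_TYPES)
    if unknown:
        errors.append(f"unknown node types: {unknown}")
    if counts["CentralTheme"] != 1:
        errors.append(f"expected 1 CentralTheme, found {counts['CentralTheme']}")
    if not 2 <= counts["MainIdea"] <= 5:
        # 2-5 main ideas is a recommendation, so this is only a warning.
        warnings.append(f"{counts['MainIdea']} MainIdea nodes (2-5 recommended)")
    return errors, warnings


example = {
    "Photosynthesis": "CentralTheme",
    "Inputs": "MainIdea",
    "Outputs": "MainIdea",
    "Sunlight": "SubIdea",
    "Oxygen": "SubIdea",
}
print(check_node_types(example))
```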

Mind Map Relationship Types

The mind map uses hierarchical relationships:

  • HAS_MAIN_IDEA: Connects CentralTheme to MainIdea nodes

  • HAS_SUB_IDEA: Connects MainIdea to SubIdea, or SubIdea to SubIdea (for deeper levels)
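Combining the node types with these two relationship types yields a concrete validity rule for each edge. The sketch below uses hypothetical function names and an assumed tuple layout. It checks every edge against the allowed (source type, relationship, target type) combinations, then renders the mind map as an indented outline starting from the CentralTheme.

```python
from collections import defaultdict

# Allowed (source type, relationship, target type) combinations, as described above.
RULES = {
    ("CentralTheme", "HAS_MAIN_IDEA", "MainIdea"),
    ("MainIdea", "HAS_SUB_IDEA", "SubIdea"),
    ("SubIdea", "HAS_SUB_IDEA", "SubIdea"),
}


def invalid_edges(node_types, edges):
    """Return edges whose (source type, relationship, target type) is not allowed."""
    return [
        (src, rel, tgt)
        for src, rel, tgt in edges
        if (node_types.get(src), rel, node_types.get(tgt)) not in RULES
    ]


def outline(node_types, edges):
    """Render the mind map as an indented outline rooted at the CentralTheme.

    Raises StopIteration if there is no CentralTheme node.
    """
    children = defaultdict(list)
    for src, _, tgt in edges:
        children[src].append(tgt)
    root = next(n for n, t in node_types.items() if t == "CentralTheme")

    lines, seen = [], set()

    def walk(node, depth):
        if node in seen:  # guard against cycles in malformed input
            return
        seen.add(node)
        lines.append("  " * depth + node)
        for child in children[node]:
            walk(child, depth + 1)

    walk(root, 0)
    return "\n".join(lines)


types = {
    "Photosynthesis": "CentralTheme",
    "Inputs": "MainIdea",
    "Outputs": "MainIdea",
    "Sunlight": "SubIdea",
    "Oxygen": "SubIdea",
}
edges = [
    ("Photosynthesis", "HAS_MAIN_IDEA", "Inputs"),
    ("Photosynthesis", "HAS_MAIN_IDEA", "Outputs"),
    ("Inputs", "HAS_SUB_IDEA", "Sunlight"),
    ("Outputs", "HAS_SUB_IDEA", "Oxygen"),
]
print(invalid_edges(types, edges))
print(outline(types, edges))
```

An edge such as ("Photosynthesis", "HAS_SUB_IDEA", "Oxygen") would be reported as invalid, because HAS_SUB_IDEA never starts from a CentralTheme.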
