Parent Document Retriever
What's a Parent Document Retriever?
Parent Document Retriever retrieves parent chunks based on child chunk similarity search. It queries a child data store using vector similarity, then retrieves corresponding parent chunks using parent-child metadata links. This pattern enables fine-grained indexing with context-aware retrieval.
Best For:
Fine-grained indexing with parent-child relationships
Returning larger context (parent chunks) based on precise matches (child chunks)
Reducing redundancy in retrieved context
Splitting documents for better indexing while preserving context
Hierarchical chunk structures
Key Features:
Dual data store architecture (child + parent)
Vector similarity on child chunks
Fulltext retrieval of parent chunks by ID
Configurable parent-child relationship fields
Deduplication of parent results
Flexible parent result capping
Use Cases:
Document chunking with sentence-level indexing and paragraph-level retrieval
Section-level storage with sentence-level search
Hierarchical documents where you index fine-grained content but want broad context
Combining small, searchable chunks with larger, more coherent parent documents
Prerequisites
You should be familiar with:
Chunk schema and metadata fields
Data Store with vector and fulltext capabilities
EM Invoker for embeddings
Parent-child relationships encoded in chunk metadata
Installation
What it does
The Parent Document Retriever queries child chunks using vector similarity, extracts parent chunk identifiers from metadata, then retrieves those parent chunks from a separate data store. This gives you the precision of child-level search with the context of parent-level results.
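The flow described above can be sketched in plain Python. This is a minimal illustration only, not the library's implementation: the in-memory `CHILD_STORE` and `PARENT_STORE`, the `cosine` helper, and the `retrieve_parents` function are all hypothetical names. Only `top_k` and the `parent_chunk` metadata field are taken from this page.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical in-memory stand-ins for the child and parent data stores.
CHILD_STORE = [
    {"id": "c1", "vector": [1.0, 0.0], "metadata": {"parent_chunk": "p1"}},
    {"id": "c2", "vector": [0.9, 0.1], "metadata": {"parent_chunk": "p1"}},
    {"id": "c3", "vector": [0.0, 1.0], "metadata": {"parent_chunk": "p2"}},
]
PARENT_STORE = {
    "p1": "Full text of section 1.",
    "p2": "Full text of section 2.",
}

def retrieve_parents(query_vector, top_k=2, parent_metadata_field="parent_chunk"):
    # Step 1: rank child chunks by vector similarity and keep the best top_k.
    ranked = sorted(
        CHILD_STORE,
        key=lambda child: cosine(query_vector, child["vector"]),
        reverse=True,
    )[:top_k]

    # Step 2: collect parent IDs from child metadata, deduplicated in rank order.
    parent_ids = []
    for child in ranked:
        parent_id = child["metadata"].get(parent_metadata_field)
        if parent_id is not None and parent_id not in parent_ids:
            parent_ids.append(parent_id)

    # Step 3: fetch the parent chunks by ID from the separate parent store.
    return [PARENT_STORE[pid] for pid in parent_ids if pid in PARENT_STORE]

results = retrieve_parents([1.0, 0.0], top_k=2)
```

In this sketch the two best child matches share one parent, so a single parent chunk comes back even though `top_k` is 2.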
Basic Usage
Set up two data stores and create a retriever:
Configuring Parent Result Limits
Control the number of parent chunks returned:
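As a rough sketch of how a parent cap relates to the child `top_k`, the snippet below deduplicates parent IDs from an already ranked list of child hits and then truncates the result. The `parents_from_children` helper and the sample hits are hypothetical; only the `parent_top_k` and `parent_chunk` names come from this page, and the library's actual call signature may differ.

```python
def parents_from_children(child_hits, parent_top_k=None,
                          parent_metadata_field="parent_chunk"):
    """Return unique parent IDs in child rank order, optionally capped.

    child_hits is assumed to be ordered best match first.
    """
    # dict.fromkeys drops duplicates while preserving first-seen order.
    parent_ids = list(dict.fromkeys(
        hit["metadata"][parent_metadata_field] for hit in child_hits
    ))
    if parent_top_k is None:
        return parent_ids
    return parent_ids[:parent_top_k]

# Five child hits (as if top_k=5) that map to only three distinct parents.
child_hits = [
    {"id": "c1", "metadata": {"parent_chunk": "p1"}},
    {"id": "c2", "metadata": {"parent_chunk": "p2"}},
    {"id": "c3", "metadata": {"parent_chunk": "p1"}},
    {"id": "c4", "metadata": {"parent_chunk": "p3"}},
    {"id": "c5", "metadata": {"parent_chunk": "p2"}},
]

uncapped = parents_from_children(child_hits)
capped = parents_from_children(child_hits, parent_top_k=2)
```

Here five child hits collapse to three parents, and the cap keeps only the two whose children ranked highest.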
Filtering and Score Thresholds
Apply filters and thresholds:
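One way to picture filtering and thresholding is as a selection step over scored child hits before parents are resolved. The sketch below is illustrative only: the `select_children` helper, the `source` metadata key, and the choice of exact-match filters with an inclusive threshold are assumptions, not the library's documented behavior.

```python
def select_children(scored_children, metadata_filter=None, score_threshold=None):
    """Keep child hits that match every filter key and meet the threshold."""
    selected = []
    for child in scored_children:
        metadata = child["metadata"]
        # Assumed semantics: every filter key must match exactly.
        if metadata_filter and any(
            metadata.get(key) != value for key, value in metadata_filter.items()
        ):
            continue
        # Assumed semantics: scores equal to the threshold are kept.
        if score_threshold is not None and child["score"] < score_threshold:
            continue
        selected.append(child)
    return selected

scored_children = [
    {"id": "c1", "score": 0.92, "metadata": {"source": "handbook", "parent_chunk": "p1"}},
    {"id": "c2", "score": 0.85, "metadata": {"source": "wiki", "parent_chunk": "p2"}},
    {"id": "c3", "score": 0.40, "metadata": {"source": "handbook", "parent_chunk": "p3"}},
]

kept = select_children(
    scored_children,
    metadata_filter={"source": "handbook"},
    score_threshold=0.5,
)
kept_ids = [child["id"] for child in kept]
```

Only `c1` survives in this example: `c2` fails the filter and `c3` falls below the threshold, so only parent `p1` would be fetched afterwards.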
Implementation Notes:
The top_k parameter controls child chunks fetched, not final parent results
Multiple child chunks can map to the same parent (parent deduplication preserves order)
Use parent_top_k to cap final parent chunk count
Chunks without valid parent metadata are included in results as-is
The parent_metadata_field (default: "parent_chunk") should be consistent with your chunking strategy
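The pass-through behavior for chunks without valid parent metadata can be illustrated as follows. This is a sketch under stated assumptions: `resolve_results` is a hypothetical helper, and "valid" is taken here to mean that the metadata field is present and its value exists in the parent store, which may not match the library's exact rule.

```python
def resolve_results(child_hits, parent_store, parent_metadata_field="parent_chunk"):
    """Swap each child hit for its parent, passing unlinked children through."""
    results = []
    seen_parents = set()
    for hit in child_hits:
        parent_id = hit["metadata"].get(parent_metadata_field)
        if parent_id in parent_store:
            # Valid link: emit the parent once, at its first child's position.
            if parent_id not in seen_parents:
                seen_parents.add(parent_id)
                results.append(parent_store[parent_id])
        else:
            # Missing or unknown parent ID: include the child chunk as-is.
            results.append(hit)
    return results

parent_store = {"p1": {"id": "p1", "content": "Parent section text."}}
child_hits = [
    {"id": "c1", "content": "Linked child.", "metadata": {"parent_chunk": "p1"}},
    {"id": "c2", "content": "Orphan child.", "metadata": {}},
    {"id": "c3", "content": "Second linked child.", "metadata": {"parent_chunk": "p1"}},
]

resolved = resolve_results(child_hits, parent_store)
resolved_ids = [chunk["id"] for chunk in resolved]
```

The orphan child `c2` appears in the output alongside parent `p1`, while the second child of `p1` adds nothing new.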