Parent Document Retriever

What's a Parent Document Retriever?

The Parent Document Retriever returns parent chunks based on a similarity search over child chunks. It queries a child data store using vector similarity, then fetches the corresponding parent chunks through parent-child metadata links. This pattern combines fine-grained indexing with context-aware retrieval.

Best For:

  • Fine-grained indexing with parent-child relationships

  • Returning larger context (parent chunks) based on precise matches (child chunks)

  • Reducing redundancy in retrieved context

  • Splitting documents for better indexing while preserving context

  • Hierarchical chunk structures

Key Features:

  • Dual data store architecture (child + parent)

  • Vector similarity on child chunks

  • Fulltext retrieval of parent chunks by ID

  • Configurable parent-child relationship fields

  • Deduplication of parent results

  • Flexible parent result capping

Use Cases:

  • Document chunking with sentence-level indexing and paragraph-level retrieval

  • Section-level storage with sentence-level search

  • Hierarchical documents where you index fine-grained content but want broad context

  • Combining small, searchable chunks with larger, more coherent parent documents

Prerequisites

You should be familiar with:

  1. Chunk schema and metadata fields

  2. Data Store with vector and fulltext capabilities

  3. Parent-child relationships encoded in chunk metadata
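To make the third prerequisite concrete, here is a minimal sketch of how parent-child links might be encoded in chunk metadata. The `Chunk` dataclass and the `split_into_parents_and_children` helper are hypothetical stand-ins written for illustration, not the library's actual schema or chunker. The only name taken from this page is the default metadata field, "parent_chunk".

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a chunk: an id, text content, and metadata."""
    id: str
    content: str
    metadata: dict = field(default_factory=dict)


def split_into_parents_and_children(doc_id, text):
    """Paragraphs become parents; their sentences become children.

    Each child stores its parent's id under the "parent_chunk" metadata key.
    The splitting rules (blank lines, then ". ") are deliberately naive.
    """
    parents, children = [], []
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    for p_idx, paragraph in enumerate(paragraphs):
        parent = Chunk(id=f"{doc_id}-p{p_idx}", content=paragraph)
        parents.append(parent)
        sentences = [s.strip() for s in paragraph.split(". ") if s.strip()]
        for s_idx, sentence in enumerate(sentences):
            children.append(
                Chunk(
                    id=f"{parent.id}-s{s_idx}",
                    content=sentence,
                    metadata={"parent_chunk": parent.id},
                )
            )
    return parents, children


parents, children = split_into_parents_and_children(
    "doc", "Cats purr. Cats nap.\n\nDogs bark."
)
```

In this sketch the children would go into the child data store with their embeddings, and the parents into the parent data store keyed by id.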

Installation

What it does

The Parent Document Retriever queries child chunks using vector similarity, extracts parent chunk identifiers from metadata, then retrieves those parent chunks from a separate data store. This gives you the precision of child-level search with the context of parent-level results.

Basic Usage

Set up two data stores and create a retriever:
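The library's own setup code is not reproduced on this page, so the following is only an in-memory sketch of the pattern. `ToyChildStore`, `ToyParentStore`, and `ToyParentDocumentRetriever` are hypothetical classes written for illustration and do not reflect the real API; the `top_k` and `parent_metadata_field` names follow the parameters mentioned in the notes on this page.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyChildStore:
    """Hypothetical vector store for child chunks."""

    def __init__(self):
        self._rows = []  # (chunk_id, vector, metadata)

    def add(self, chunk_id, vector, metadata):
        self._rows.append((chunk_id, vector, metadata))

    def query(self, vector, top_k):
        scored = [(cosine(vector, v), cid, meta) for cid, v, meta in self._rows]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:top_k]


class ToyParentStore:
    """Hypothetical store that returns parent chunks by id."""

    def __init__(self):
        self._docs = {}

    def add(self, chunk_id, content):
        self._docs[chunk_id] = content

    def get_by_ids(self, ids):
        return [(i, self._docs[i]) for i in ids if i in self._docs]


class ToyParentDocumentRetriever:
    def __init__(self, child_store, parent_store, parent_metadata_field="parent_chunk"):
        self.child_store = child_store
        self.parent_store = parent_store
        self.parent_metadata_field = parent_metadata_field

    def retrieve(self, query_vector, top_k=4):
        # Step 1: vector search over child chunks.
        hits = self.child_store.query(query_vector, top_k)
        # Step 2: collect parent ids in rank order, dropping duplicates.
        # Children without the field are skipped here for brevity.
        parent_ids = []
        for _score, _child_id, meta in hits:
            pid = meta.get(self.parent_metadata_field)
            if pid is not None and pid not in parent_ids:
                parent_ids.append(pid)
        # Step 3: fetch the parents by id from the second store.
        return self.parent_store.get_by_ids(parent_ids)


child_store, parent_store = ToyChildStore(), ToyParentStore()
parent_store.add("p-cats", "Cats purr. Cats nap.")
parent_store.add("p-dogs", "Dogs bark.")
child_store.add("c1", [1.0, 0.0], {"parent_chunk": "p-cats"})
child_store.add("c2", [0.9, 0.1], {"parent_chunk": "p-cats"})
child_store.add("c3", [0.0, 1.0], {"parent_chunk": "p-dogs"})

retriever = ToyParentDocumentRetriever(child_store, parent_store)
cat_results = retriever.retrieve([1.0, 0.0], top_k=2)
dog_first_results = retriever.retrieve([0.0, 1.0], top_k=2)
```

In the first query both child hits share one parent, so a single parent chunk comes back even though `top_k` is 2.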

Configuring Parent Result Limits

Control the number of parent chunks returned:
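As a rough illustration of how `top_k` and `parent_top_k` interact, the sketch below deduplicates the parent ids behind a ranked list of child hits and then caps the result. The `cap_parents` helper is hypothetical. Applying the cap after deduplication, and treating `None` as "no cap", are assumptions made for this example.

```python
def cap_parents(parent_ids_in_rank_order, parent_top_k=None):
    """Deduplicate parent ids (keeping first-seen order), then cap the list."""
    unique = list(dict.fromkeys(parent_ids_in_rank_order))
    return unique if parent_top_k is None else unique[:parent_top_k]


# Five child hits (top_k=5) that resolve to only three distinct parents.
child_hit_parents = ["p3", "p1", "p3", "p2", "p1"]

uncapped = cap_parents(child_hit_parents)
capped = cap_parents(child_hit_parents, parent_top_k=2)
```

Here `top_k` decided that five child chunks were fetched, while `parent_top_k=2` limits how many parent chunks are finally returned.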

Filtering and Score Thresholds

Apply filters and thresholds:
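The exact filter and threshold parameters are not shown on this page, so the sketch below uses hypothetical names (`metadata_filter`, `score_threshold`) on a hypothetical helper. It assumes that filtering is applied to child hits before parents are resolved, and that a higher score means a closer match.

```python
def filter_child_hits(hits, metadata_filter=None, score_threshold=None):
    """Keep hits whose metadata matches every filter key and whose score
    is at least the threshold. Both arguments are optional."""
    kept = []
    for hit in hits:
        meta = hit["metadata"]
        if metadata_filter and any(
            meta.get(key) != value for key, value in metadata_filter.items()
        ):
            continue
        if score_threshold is not None and hit["score"] < score_threshold:
            continue
        kept.append(hit)
    return kept


hits = [
    {"id": "c1", "score": 0.92, "metadata": {"lang": "en", "parent_chunk": "p1"}},
    {"id": "c2", "score": 0.81, "metadata": {"lang": "de", "parent_chunk": "p2"}},
    {"id": "c3", "score": 0.40, "metadata": {"lang": "en", "parent_chunk": "p3"}},
]

kept = filter_child_hits(hits, metadata_filter={"lang": "en"}, score_threshold=0.5)
```

Only the hits that survive both checks would contribute parent ids to the final result.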


Implementation Notes:

  • The top_k parameter controls child chunks fetched, not final parent results

  • Multiple child chunks can map to the same parent (parent deduplication preserves order)

  • Use parent_top_k to cap final parent chunk count

  • Chunks without valid parent metadata are included in results as-is

  • The parent_metadata_field (default: "parent_chunk") should be consistent with your chunking strategy
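The deduplication and pass-through behaviour in these notes can be sketched as follows. `resolve_results` is a hypothetical helper, not the library's implementation. It assumes that "valid parent metadata" means the field is present and its value matches a stored parent, and that duplicates are resolved in favour of the first occurrence.

```python
def resolve_results(child_hits, parent_lookup, parent_metadata_field="parent_chunk"):
    """Map ranked child hits to parents, deduplicating in first-seen order.

    Children with a missing or unresolvable parent id are returned as-is.
    """
    results, seen = [], set()
    for child in child_hits:
        parent_id = child["metadata"].get(parent_metadata_field)
        parent = parent_lookup.get(parent_id) if parent_id is not None else None
        if parent is None:
            # No valid parent link: pass the child chunk through unchanged.
            key, item = ("child", child["id"]), child
        else:
            key, item = ("parent", parent_id), parent
        if key not in seen:
            seen.add(key)
            results.append(item)
    return results


parent_lookup = {
    "p1": {"id": "p1", "content": "Parent one"},
    "p2": {"id": "p2", "content": "Parent two"},
}
child_hits = [
    {"id": "c1", "content": "a", "metadata": {"parent_chunk": "p2"}},
    {"id": "c2", "content": "b", "metadata": {"parent_chunk": "p1"}},
    {"id": "c3", "content": "c", "metadata": {"parent_chunk": "p2"}},  # duplicate parent
    {"id": "c4", "content": "d", "metadata": {}},                      # no parent field
    {"id": "c5", "content": "e", "metadata": {"parent_chunk": "gone"}},  # unknown parent
]

results = resolve_results(child_hits, parent_lookup)
result_ids = [r["id"] for r in results]
```

Parent "p2" appears once, in the position of its first matching child, and the two children without a resolvable parent are kept in rank order.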
