Document Processing Orchestrator

gllm-docproc | Related tutorials: Build Document Processing Pipeline | API Reference

Language models (LMs) are powerful, but they have no knowledge of your private documents.

The Document Processing Orchestrator (DPO) lets you process your documents and store them in a retrieval source (e.g. a vector database, graph database, or SQL database). The stored data is then used during the retrieval process.

Our DPO components allow you to:

  1. Extract data from a document (e.g. PDF, DOCX, HTML; see the complete list below).

  2. Chunk the data.

  3. Enrich the data with various metadata.

  4. Index the data into a retrieval source.
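
The four steps above can be pictured as a simple pipeline. The sketch below is only an illustration in plain Python and does not use the gllm-docproc API: every name in it (`Chunk`, `extract`, `chunk`, `enrich`, `index`, `process_document`) is hypothetical, the "document" is an already-decoded string, and the "retrieval source" is assumed to be an in-memory list standing in for a real database.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical container for one piece of text plus its metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def extract(raw_document: str) -> str:
    """Step 1 (stand-in): treat the input as decoded text and normalise whitespace."""
    return " ".join(raw_document.split())


def chunk(text: str, max_words: int = 5) -> list[Chunk]:
    """Step 2: split the text into fixed-size groups of words."""
    words = text.split()
    return [
        Chunk(" ".join(words[start:start + max_words]))
        for start in range(0, len(words), max_words)
    ]


def enrich(chunks: list[Chunk], source: str) -> list[Chunk]:
    """Step 3: attach simple metadata to every chunk."""
    for position, item in enumerate(chunks):
        item.metadata.update(
            {
                "source": source,
                "position": position,
                "word_count": len(item.text.split()),
            }
        )
    return chunks


def index(chunks: list[Chunk], store: list[Chunk]) -> int:
    """Step 4 (stand-in): append chunks to an in-memory list acting as the retrieval source."""
    store.extend(chunks)
    return len(chunks)


def process_document(raw_document: str, source: str, store: list[Chunk]) -> int:
    """Run extract -> chunk -> enrich -> index and return the number of chunks indexed."""
    text = extract(raw_document)
    chunks = enrich(chunk(text), source)
    return index(chunks, store)


# Example run with an assumed file name and an empty in-memory store.
store: list[Chunk] = []
indexed = process_document(
    "Language models are powerful,\nbut they lack   your private documents.",
    source="notes.txt",
    store=store,
)
```

In a real setup, each stand-in function would be replaced by the corresponding DPO component, and the list by an actual vector, graph, or SQL database. Keeping the stages separate, as here, means one stage can be swapped without touching the others.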
