# Document Processing Orchestrator

[**`gllm-datastore`**](https://github.com/GDP-ADMIN/gl-sdk/tree/main/libs/gllm-datastore/gllm_datastore) | Related tutorials: [index-your-data-with-vector-data-store](https://gdplabs.gitbook.io/sdk/~/revisions/beykCxz0UanaEX0sPJJu/how-to-guides/index-your-data-with-vector-data-store "mention") [#index-your-data](https://gdplabs.gitbook.io/sdk/~/revisions/beykCxz0UanaEX0sPJJu/how-to-guides/build-end-to-end-rag-pipeline/your-first-rag-pipeline#index-your-data "mention") | [API Reference](https://api.python.docs.gdplabs.id/gen-ai/library/gllm_datastore/index.html)

Language models (LMs) are powerful, but they have no knowledge of your private documents.

**Document Processing Orchestrator (DPO)** lets you process your documents and store them in a retrieval source (e.g. vector database, graph database, SQL database). The stored data is then used in the [Retrieval](https://gdplabs.gitbook.io/sdk/~/revisions/beykCxz0UanaEX0sPJJu/tutorials/retrieval) process.

Our DPO components allow you to:

1. Extract data from a document (e.g. PDF, DOCX, HTML; see the complete list [below](#features)).
2. Chunk the data.
3. Enrich the data with various metadata.
4. Index the data into a retrieval source.
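
The four steps above can be pictured as a simple pipeline. The sketch below is a conceptual illustration in plain Python only: it does not use the DPO or `gllm-datastore` API, and every name in it (`Chunk`, `extract`, `chunk`, `enrich`, `index`, `process_document`) is hypothetical. It assumes the document is already available as raw text and uses an in-memory dictionary as a stand-in for a retrieval source.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical container for a piece of text and its metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def extract(raw_document: str) -> str:
    """Step 1 (stand-in for a real parser): collapse all whitespace runs."""
    return " ".join(raw_document.split())


def chunk(text: str, max_words: int = 5) -> list[Chunk]:
    """Step 2: split the text into chunks of at most `max_words` words."""
    words = text.split()
    return [
        Chunk(" ".join(words[i:i + max_words]))
        for i in range(0, len(words), max_words)
    ]


def enrich(chunks: list[Chunk], source: str) -> list[Chunk]:
    """Step 3: attach simple metadata to each chunk."""
    for position, item in enumerate(chunks):
        item.metadata.update(
            {
                "source": source,
                "position": position,
                "word_count": len(item.text.split()),
            }
        )
    return chunks


def index(chunks: list[Chunk], store: dict) -> dict:
    """Step 4: write chunks into the store (a dict standing in for a database)."""
    for item in chunks:
        key = f"{item.metadata['source']}#{item.metadata['position']}"
        store[key] = item
    return store


def process_document(raw_document: str, source: str, store: dict) -> dict:
    """Run extract -> chunk -> enrich -> index in order."""
    return index(enrich(chunk(extract(raw_document)), source), store)
```

Calling `process_document("one two three  four five\nsix seven", "doc.txt", {})` under these assumptions yields two entries keyed `doc.txt#0` and `doc.txt#1`. In a real setup, each step would be handled by a dedicated component and the dictionary replaced by an actual retrieval source.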
