# DOCX

**DOCX Parser** is responsible for parsing the text structure within DOCX documents. It maps loaded elements from the DOCX Loader into structures such as header, title, footer, heading, and paragraph, based on their style names.

This page is a guide to using the DOCX Parser in the Document Processing Orchestrator (DPO).
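To make the idea of style-based mapping concrete, here is a minimal sketch of how a style name could be turned into a structure label. It is an illustration only: the `STYLE_TO_STRUCTURE` table and the `classify_style` function are hypothetical and are not the rules that `gllm-docproc` actually applies.

```python
# Hypothetical lookup table for illustration; NOT the mapping used by gllm-docproc.
STYLE_TO_STRUCTURE = {
    "title": "title",
    "header": "header",
    "footer": "footer",
}


def classify_style(style_name):
    """Return a structure label for a DOCX style name (illustrative rules only)."""
    name = (style_name or "").strip().lower()
    if name in STYLE_TO_STRUCTURE:
        return STYLE_TO_STRUCTURE[name]
    # Style names such as "Heading 1" or "Heading 2" all count as headings here.
    if name.startswith("heading"):
        return "heading"
    # Anything unrecognized falls back to a plain paragraph.
    return "paragraph"


for style in ["Title", "Heading 1", "Normal", "Footer"]:
    print(style, "->", classify_style(style))
```

The real parser may handle more styles and edge cases; the sketch only shows the general shape of a style-name lookup with a paragraph fallback.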

<details>

<summary>Prerequisites</summary>

Complete all setup steps listed on the [Prerequisites](/sdk/gen-ai-sdk/prerequisites.md) page before running this example.

</details>

## **Installation**

{% tabs %}
{% tab title="Linux, macOS, or Windows WSL" %}

```bash
# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-docproc[docx]"
```

{% endtab %}

{% tab title="Windows PowerShell" %}

```powershell
# you can use a Conda environment
$token = (gcloud auth print-access-token)
pip install --extra-index-url "https://oauth2accesstoken:$token@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-docproc[docx]"
```

{% endtab %}

{% tab title="Windows Command Prompt" %}

```bash
REM you can use a Conda environment
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO SET TOKEN=%T
pip install --extra-index-url "https://oauth2accesstoken:%TOKEN%@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-docproc[docx]"
```

{% endtab %}
{% endtabs %}
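After installing, you may want to confirm that the package is importable from the active environment. The sketch below uses only the standard library; the `is_installed` helper is a hypothetical name, and `gllm_docproc` is the top-level module name taken from the import used later on this page.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Reports False if the install step above has not succeeded in this environment.
print("gllm_docproc importable:", is_installed("gllm_docproc"))
```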

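You can use the following sample file as input: [loaded\_elements.json](https://assets.analytics.glair.ai/generative/docx/docx2pythonloader-pythondocxtableloader-output.json). Save it as `./data/source/loaded_elements.json`, which is the path the script below reads from.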
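If you prefer to fetch the sample from Python, the following is a minimal sketch using only the standard library. The `download_file` helper is a hypothetical name; the destination path in the commented call matches the path that `main.py` opens.

```python
from pathlib import Path
from urllib.request import urlopen

SAMPLE_URL = (
    "https://assets.analytics.glair.ai/generative/docx/"
    "docx2pythonloader-pythondocxtableloader-output.json"
)


def download_file(url, dest):
    """Download url to dest, creating parent folders if they do not exist."""
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url, timeout=30) as response:
        dest_path.write_bytes(response.read())
    return dest_path


# Uncomment to fetch the sample (requires network access):
# download_file(SAMPLE_URL, "./data/source/loaded_elements.json")
```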

{% stepper %}
{% step %}
Create a script called `main.py`:

{% code lineNumbers="true" %}

```python
import json

from gllm_docproc.parser.document import DOCXParser

# Load the elements produced by the DOCX Loader (the parser's input)
with open("./data/source/loaded_elements.json", "r", encoding="utf-8") as file:
    loaded_elements = json.load(file)

# Initialize the DOCX Parser
parser = DOCXParser()

# Parse the loaded elements into structured elements
parsed_elements = parser.parse(loaded_elements)
```

{% endcode %}
{% endstep %}

{% step %}
Run the script:

```bash
python main.py
```

{% endstep %}

{% step %}
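For the sample file, `parsed_elements` should match this [output JSON](https://assets.analytics.glair.ai/generative/docx/docxparser-output.json). Note that the script keeps the result in memory; it does not print it or write it to disk.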
{% endstep %}
{% endstepper %}
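Because `main.py` only holds the result in memory, you may want to inspect or save it. The sketch below assumes that the parsed elements form a JSON-serializable list of dictionaries with a `structure` key; that shape is an assumption, so check it against the output of your own run. The helpers `summarize_structures` and `save_elements` are hypothetical names, and the sample list is stand-in data rather than real parser output.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_structures(elements):
    """Count elements per structure label (assumes a list of dicts)."""
    return Counter(el.get("structure", "unknown") for el in elements)


def save_elements(elements, path):
    """Write elements to a UTF-8 JSON file, creating parent folders."""
    out_path = Path(path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        json.dump(elements, f, ensure_ascii=False, indent=2)
    return out_path


# Stand-in data; in main.py this would be the result of parser.parse(...).
parsed_elements = [
    {"text": "Annual Report", "structure": "title"},
    {"text": "Overview", "structure": "heading"},
    {"text": "Revenue grew this year.", "structure": "paragraph"},
    {"text": "Costs were flat.", "structure": "paragraph"},
]

print(summarize_structures(parsed_elements))
```

In `main.py`, a call such as `save_elements(parsed_elements, "./data/output/parsed_elements.json")` would persist the result; the output path is only an example.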


