PPTX
gllm-docproc | Tutorial : PPTX Parser | Use Case: Advanced DPO Pipeline | API Reference
PPTX Parser is responsible for parsing the shape structure within PPTX documents. It maps loaded elements from the PPTX Loader into structures such as title, paragraph, and footer, based on their placeholder types.
This page provides a guide to using the PPTX Parser in the Document Processing Orchestrator (DPO).
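To make the placeholder-based mapping described above more concrete, here is a minimal sketch of the idea. It is not the gllm-docproc implementation: the `PLACEHOLDER_TO_STRUCTURE` table, the `structure_for` helper, and the choice to fall back to "paragraph" for shapes without a placeholder are all assumptions made for illustration. The placeholder names loosely mirror those used by python-pptx.

```python
# Hypothetical lookup table: placeholder type name -> structure label.
# The real PPTX Parser may use different names and cover more types.
PLACEHOLDER_TO_STRUCTURE = {
    "TITLE": "title",
    "CENTER_TITLE": "title",
    "BODY": "paragraph",
    "FOOTER": "footer",
}


def structure_for(placeholder_type, default="paragraph"):
    """Return a structure label for a placeholder type name.

    Shapes without a placeholder (None) and unknown types fall back to
    `default`; this fallback is an assumption, not documented behavior.
    """
    if placeholder_type is None:
        return default
    return PLACEHOLDER_TO_STRUCTURE.get(placeholder_type.upper(), default)


# Classify a few example placeholder types, including a shape with none.
labels = [structure_for(t) for t in ("TITLE", "body", "FOOTER", None)]
print(labels)
```

A table-driven lookup like this keeps the classification rules in one place, so supporting another placeholder type only requires a new dictionary entry.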
Installation
Linux/macOS:
# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-docproc[pptx]"
Windows PowerShell:
# you can use a Conda environment
$token = (gcloud auth print-access-token)
pip install --extra-index-url "https://oauth2accesstoken:$token@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-docproc[pptx]"
Windows Command Prompt:
REM you can use a Conda environment
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO SET TOKEN=%T
pip install --extra-index-url "https://oauth2accesstoken:%TOKEN%@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-docproc[pptx]"
You can use the following as a sample file: pptx-example.pptx.
1. Create a script called main.py:
from gllm_docproc.loader.pptx import PythonPPTXLoader
from gllm_docproc.parser.document import PPTXParser
source = "./data/source/pptx-example.pptx"
# initialize the PPTX Loader
loader = PythonPPTXLoader()
# load source
loaded_elements = loader.load(source)
# initialize the PPTX Parser
parser = PPTXParser()
# parse loaded elements
parsed_elements = parser.parse(loaded_elements)
2. Run the script:
python main.py
3. The loader and parser will generate the following: output JSON.
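The script above keeps the result in memory only. As a rough sketch of how the parsed elements could be written to a JSON file and summarized, the helpers below assume that the elements are a list of dictionaries and that each one carries a "structure" key. Both are assumptions about the output shape, and `save_elements`, `count_by_structure`, and the sample data are hypothetical names introduced here. In main.py you would pass `parsed_elements` instead of `sample_elements`.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def save_elements(elements, output_path):
    """Write elements to a JSON file and return the resulting path."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # default=str is a fallback for values json cannot serialize natively.
    text = json.dumps(elements, indent=2, ensure_ascii=False, default=str)
    path.write_text(text, encoding="utf-8")
    return path


def count_by_structure(elements, key="structure"):
    """Count elements per structure label (the key name is an assumption)."""
    return Counter(element.get(key, "unknown") for element in elements)


# Hypothetical stand-in for `parsed_elements` from main.py.
sample_elements = [
    {"text": "Quarterly Review", "structure": "title"},
    {"text": "Revenue grew in Q3.", "structure": "paragraph"},
    {"text": "Costs stayed flat.", "structure": "paragraph"},
    {"text": "Confidential", "structure": "footer"},
]

# A temporary directory keeps the example from touching your project files.
output_dir = Path(tempfile.mkdtemp())
output_file = save_elements(sample_elements, output_dir / "pptx-example.json")
structure_counts = count_by_structure(sample_elements)
print(output_file, dict(structure_counts))
```

Counting elements per structure is a quick sanity check that titles, paragraphs, and footers were recognized as expected before the output is passed to later pipeline steps.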