Parser Router
ParserRouter identifies the appropriate ParserType for loaded elements. It accepts either a JSON file path or an in-memory list of element dictionaries. It determines the parser type by inspecting the source_type field in the metadata of the first element, and returns the result as a dictionary keyed by ParserType.KEY. If no match is found, the result is marked as uncategorized.
Installation
# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-docproc"
You can use the following as a sample file: pymupdfloader-output.json
Running the Router
1. Create a script called main.py:
from gllm_docproc.dpo_router.parser_router import ParserRouter
from gllm_docproc.model.parser_type import ParserType
# Example source: path to loaded elements JSON file
source = "./loaded_elements.json"
# Initialize ParserRouter
router = ParserRouter()
# Route the file to get the parser type
result = router.route(source)
# Access the detected parser type
print(f"Detected parser type: {result[ParserType.KEY]}")
2. Run the script:
python main.py
3. Example output:
Detected parser type: pdf_parser
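If the router reports an unexpected result, it can help to look at the value it inspects. The following is a minimal sketch that reads a loaded-elements file and returns the source_type of its first element. It assumes the file is a JSON array of element dictionaries, each with a `metadata` dictionary; the helper `peek_source_type` and the sample element contents are hypothetical and not part of gllm-docproc.

```python
import json
import tempfile
from pathlib import Path


def peek_source_type(path):
    """Return metadata.source_type of the first element, or None if absent."""
    elements = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(elements, list) or not elements:
        return None
    first = elements[0]
    if not isinstance(first, dict):
        return None
    return (first.get("metadata") or {}).get("source_type")


# Demo with a throwaway file; the element contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "loaded_elements.json"
    sample.write_text(
        json.dumps([{"text": "Hello", "metadata": {"source_type": "pdf"}}]),
        encoding="utf-8",
    )
    print(peek_source_type(sample))
```

A return value of None suggests the first element carries no source_type, which, per the description above, would leave the result uncategorized.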