Loader Router
LoaderRouter identifies the appropriate LoaderType for a given input by examining its path, extension, metadata, content, or URL.
It supports common document, media, and text-based files. The matched type is returned in a dictionary keyed by LoaderType.KEY; if no match is found, the input is marked as uncategorized.
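To make the routing idea concrete, here is a minimal sketch of extension-based routing only. It is not the gllm-docproc implementation: the function `route_by_extension`, the key string `"loader_type"`, the `"uncategorized"` value, and every loader name other than `pdf_loader` (which appears in the example output on this page) are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical stand-ins; the real LoaderType.KEY value and loader names may differ.
KEY = "loader_type"
UNCATEGORIZED = "uncategorized"

# Illustrative extension table; only "pdf_loader" is taken from this page.
EXTENSION_TO_LOADER = {
    ".pdf": "pdf_loader",
    ".txt": "txt_loader",
    ".mp3": "audio_loader",
}


def route_by_extension(source: str) -> dict[str, str]:
    """Return {KEY: loader_name} based only on the extension of a path or URL."""
    # For http(s) URLs, use the path component so a query string does not hide the extension.
    parsed = urlparse(source)
    path = parsed.path if parsed.scheme in ("http", "https") else source
    suffix = PurePosixPath(path).suffix.lower()
    return {KEY: EXTENSION_TO_LOADER.get(suffix, UNCATEGORIZED)}


if __name__ == "__main__":
    for src in ("./pdf-example.pdf", "https://example.com/report.PDF?download=1", "notes.xyz"):
        print(src, "->", route_by_extension(src)[KEY])
```

A real router would presumably fall back to metadata or content inspection when the extension is missing or misleading; this sketch stops at the extension check.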
Installation
Linux, macOS, or WSL (bash):
# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-docproc"

Windows PowerShell:
# you can use a Conda environment
$token = (gcloud auth print-access-token)
pip install --extra-index-url "https://oauth2accesstoken:$token@glsdk.gdplabs.id/gen-ai-internal/simple/" "gllm-docproc"

Windows Command Prompt:
REM you can use a Conda environment
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO SET TOKEN=%T
pip install --extra-index-url "https://oauth2accesstoken:%TOKEN%@glsdk.gdplabs.id/gen-ai-interna

You can use the following as a sample file: pdf-example.pdf.
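Before running the router, it can be useful to confirm that the package is importable in the active environment. The following is a small sketch using only the standard library; the helper name `is_installed` is hypothetical, and the module name `gllm_docproc` is taken from the import statements used on this page.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("gllm_docproc"):
        print("gllm_docproc is available")
    else:
        print("gllm_docproc was not found; check the active environment and the pip install step")
```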
Running the Router
1. Create a script called main.py:
from gllm_docproc.dpo_router.loader_router import LoaderRouter
from gllm_docproc.model.loader_type import LoaderType
# Example source: local PDF file
source = "./pdf-example.pdf"
# Initialize LoaderRouter
router = LoaderRouter()
# Route the input to get the loader type
result = router.route(source)
# Access the detected loader type
print(f"Detected loader type: {result[LoaderType.KEY]}")
2. Run the script:
python main.py
3. Example output:
Detected loader type: pdf_loader
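When several inputs need routing, grouping them by detected type makes uncategorized inputs easy to spot. The sketch below takes the routing function and the result key as parameters so that it runs without gllm-docproc installed; `group_by_loader_type` and `fake_route` are hypothetical helpers, and the `"loader_type"` and `"uncategorized"` strings are placeholders. With the library, one would presumably pass `router.route` and `LoaderType.KEY` instead.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Mapping


def group_by_loader_type(
    route: Callable[[str], Mapping],
    key: Hashable,
    sources: Iterable[str],
) -> dict:
    """Group sources by the value each routing result stores under `key`."""
    groups = defaultdict(list)
    for source in sources:
        result = route(source)
        groups[result[key]].append(source)
    return dict(groups)


# Stub router used only to keep this sketch runnable without gllm-docproc.
def fake_route(source: str) -> dict:
    detected = "pdf_loader" if source.lower().endswith(".pdf") else "uncategorized"
    return {"loader_type": detected}


if __name__ == "__main__":
    print(group_by_loader_type(fake_route, "loader_type", ["a.pdf", "b.PDF", "c.bin"]))
```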