Loader Router
LoaderRouter identifies the appropriate LoaderType for a given input by examining its path, extension, metadata, content, or URL.
It supports common document, media, and text-based files. It returns the matched type in a dictionary keyed by LoaderType.KEY, or marks the input as uncategorized if no match is found.
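To make the routing idea concrete, here is a minimal sketch of one of the signals mentioned above: matching on the file extension of a path or URL. It is not gllm-docproc's implementation. The names KEY, UNCATEGORIZED, EXTENSION_TO_LOADER, and route_by_extension, as well as every loader name other than pdf_loader, are hypothetical and chosen only for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical names for illustration only; not part of gllm-docproc.
KEY = "loader_type"              # stands in for LoaderType.KEY
UNCATEGORIZED = "uncategorized"  # assumed fallback label

EXTENSION_TO_LOADER = {
    ".pdf": "pdf_loader",
    ".txt": "txt_loader",
    ".csv": "csv_loader",
}


def route_by_extension(source: str) -> dict[str, str]:
    """Return {KEY: loader_name} based on the extension of a path or URL."""
    parsed = urlparse(source)
    # For http(s) URLs, only the path component carries the extension;
    # the query string and fragment are ignored.
    path = parsed.path if parsed.scheme in ("http", "https") else source
    suffix = PurePosixPath(path).suffix.lower()
    return {KEY: EXTENSION_TO_LOADER.get(suffix, UNCATEGORIZED)}
```

A real router would likely combine several signals, such as metadata or content sniffing, before falling back to the uncategorized result; this sketch covers the extension check only.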
Installation
# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-docproc"
You can use the following as a sample file: pdf-example.pdf.
Running the Router
1. Create a script called main.py:
from gllm_docproc.dpo_router.loader_router import LoaderRouter
from gllm_docproc.model.loader_type import LoaderType
# Example source: local PDF file
source = "./pdf-example.pdf"
# Initialize LoaderRouter
router = LoaderRouter()
# Route the input to get the loader type
result = router.route(source)
# Access the detected loader type
print(f"Detected loader type: {result[LoaderType.KEY]}")
2. Run the script:
python main.py
3. Example output:
Detected loader type: pdf_loader
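Once the type is known, calling code usually branches on it. The sketch below shows one way to do that with a handler table and a fallback for types that have no handler. It uses a stand-in result dictionary so that it runs without gllm-docproc installed. KEY, load_pdf, HANDLERS, and dispatch are hypothetical names, and the exact value the library returns for uncategorized inputs is an assumption here.

```python
# Hypothetical stand-ins so the snippet runs without gllm-docproc installed.
KEY = "loader_type"  # plays the role of LoaderType.KEY


def load_pdf(source: str) -> str:
    """Placeholder for real PDF loading logic."""
    return f"parsing {source} as PDF"


# Map each routed type to the function that should handle it.
HANDLERS = {"pdf_loader": load_pdf}


def dispatch(result: dict, source: str) -> str:
    """Pick a handler for the routed type, or report that none is registered."""
    loader_type = result.get(KEY)
    handler = HANDLERS.get(loader_type)
    if handler is None:
        return f"no handler registered for {loader_type!r}"
    return handler(source)


print(dispatch({KEY: "pdf_loader"}, "./pdf-example.pdf"))
```

With the real library, the dictionary returned by router.route(source) would take the place of the hand-built result passed to dispatch.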