Loader Router

LoaderRouter identifies the appropriate LoaderType for a given input by examining its path, extension, metadata, content, or URL.

It supports common document, media, and text-based files. The result is a dictionary keyed by LoaderType.KEY whose value is the matched type; if no match is found, the input is marked as uncategorized.
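
To make the routing idea concrete, here is a minimal, self-contained sketch of extension-based routing only. It is not the gllm-docproc implementation: the function name route_by_extension, the extension table, and the "uncategorized" label are assumptions made for illustration, and the real LoaderRouter also inspects metadata and content.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical stand-ins; the real constants live in gllm_docproc's LoaderType.
LOADER_TYPE_KEY = "loader_type"
UNCATEGORIZED = "uncategorized"  # assumed label for the no-match case

# Assumed extension table, limited to loader names mentioned on this page.
EXTENSION_TO_LOADER = {
    ".pdf": "pdf_loader",
    ".docx": "docx_loader",
    ".mp3": "audio_loader",
    ".wav": "audio_loader",
}


def route_by_extension(source: str) -> dict[str, str]:
    """Return a {loader_type: value} dict based only on the file extension."""
    # For URLs, look at the path component so query strings are ignored.
    parsed = urlparse(source)
    path = parsed.path if parsed.scheme in ("http", "https") else source
    suffix = Path(path).suffix.lower()
    return {LOADER_TYPE_KEY: EXTENSION_TO_LOADER.get(suffix, UNCATEGORIZED)}


print(route_by_extension("./pdf-example.pdf"))
print(route_by_extension("https://example.com/files/report.DOCX?download=1"))
print(route_by_extension("notes.xyz"))
```

A lookup table keeps the mapping easy to extend, but an extension check alone can be fooled by misnamed files, which is presumably why the router also considers metadata and content.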

Prerequisites

To try the code snippets on this page, complete the following setup.

Installation

# Optional: run this inside a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-docproc"

You can use the following as a sample file: pdf-example.pdf.

Running the Router

1. Create a script called main.py:

from gllm_docproc.dpo_router.loader_router import LoaderRouter
from gllm_docproc.model.loader_type import LoaderType

# Example source: local PDF file
source = "./pdf-example.pdf"

# Initialize LoaderRouter
router = LoaderRouter()

# Route the input to get the loader type
result = router.route(source)

# Access the detected loader type
print(f"Detected loader type: {result[LoaderType.KEY]}")

2. Run the script:

python main.py

3. Example output:

Detected loader type: pdf_loader

The returned dictionary has:

  • Key: LoaderType.KEY ("loader_type")

  • Value: one of the values defined in LoaderType, such as "pdf_loader", "docx_loader", "audio_loader", etc.
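
One way to act on the returned dictionary is to dispatch on its value. The sketch below is self-contained and uses a plain dict in place of a real routing result. The handler functions and the dispatch helper are hypothetical, and it assumes the value compares equal to the plain strings listed above.

```python
# Assumption: mirrors LoaderType.KEY, which this page shows as "loader_type".
LOADER_TYPE_KEY = "loader_type"


# Hypothetical handlers, one per loader type value.
def handle_pdf(source: str) -> str:
    return f"parse PDF pages from {source}"


def handle_docx(source: str) -> str:
    return f"parse DOCX paragraphs from {source}"


def handle_audio(source: str) -> str:
    return f"transcribe audio from {source}"


HANDLERS = {
    "pdf_loader": handle_pdf,
    "docx_loader": handle_docx,
    "audio_loader": handle_audio,
}


def dispatch(source: str, result: dict) -> str:
    """Pick a handler from the routing result, with a fallback for unknown types."""
    loader_type = result.get(LOADER_TYPE_KEY)
    handler = HANDLERS.get(loader_type)
    if handler is None:
        return f"no handler registered for {loader_type!r}; skipping {source}"
    return handler(source)


# In real code, `result` would come from LoaderRouter().route(source).
result = {LOADER_TYPE_KEY: "pdf_loader"}
print(dispatch("./pdf-example.pdf", result))
```

The explicit fallback branch matters because the router can return a value with no registered handler, for example when an input is marked as uncategorized.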
