Loader

gllm-docproc | Tutorial: Loader | Use Case: Advanced DPO Pipeline | API Reference

The Loader extracts information, such as text and its associated metadata, from a provided source.

To give you an idea of what a Loader does, here is a snippet of sample JSON output produced from a PDF source:

```json
{
    "text": "[Header] This is the Header of the Document",
    "structure": "uncategorized",
    "metadata": {
        "source": "pdf-example.pdf",
        "source_type": "pdf",
        "loaded_datetime": "2024-10-17 17:10:30",
        "font_size": 12,
        "font_family": "TimesNewRomanPSMT",
        "font_color": "#000000",
        "coordinates": [
            72,
            292,
            49,
            36
        ],
        "links": [],
        "layout_width": 612,
        "layout_height": 792,
        "page_number": 1,
        "sorted_element_format": [
            [
                12,
                "TimesNewRomanPSMT",
                "#000000"
            ]
        ]
    }
}
```
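Downstream code usually reads only a few fields from each element. The sketch below parses the sample element with Python's standard `json` module and pulls out the text, page number, and font details. `summarize` and `group_by_page` are hypothetical helpers for illustration, not gllm-docproc functions. The `loaded_datetime` format string and the assumption that a full Loader output is a list of such elements are inferred from the sample, not taken from the library's documentation.

```python
import json
from collections import defaultdict
from datetime import datetime

# The sample element shown above, embedded as a string.
SAMPLE = """
{
    "text": "[Header] This is the Header of the Document",
    "structure": "uncategorized",
    "metadata": {
        "source": "pdf-example.pdf",
        "source_type": "pdf",
        "loaded_datetime": "2024-10-17 17:10:30",
        "font_size": 12,
        "font_family": "TimesNewRomanPSMT",
        "font_color": "#000000",
        "coordinates": [72, 292, 49, 36],
        "links": [],
        "layout_width": 612,
        "layout_height": 792,
        "page_number": 1,
        "sorted_element_format": [[12, "TimesNewRomanPSMT", "#000000"]]
    }
}
"""

# Assumed timestamp layout, inferred from the sample value "2024-10-17 17:10:30".
LOADED_DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def summarize(element: dict) -> dict:
    """Hypothetical helper: pick out a few commonly used fields from one element."""
    meta = element["metadata"]
    return {
        "text": element["text"],
        "structure": element["structure"],
        "source": meta["source"],
        "page": meta["page_number"],
        "font": (meta["font_family"], meta["font_size"], meta["font_color"]),
        "loaded_at": datetime.strptime(meta["loaded_datetime"], LOADED_DATETIME_FORMAT),
    }


def group_by_page(elements: list[dict]) -> dict[int, list[str]]:
    """Hypothetical helper: collect element texts per page, preserving input order.

    Assumes the full Loader output is a list of elements shaped like SAMPLE.
    """
    pages: dict[int, list[str]] = defaultdict(list)
    for element in elements:
        pages[element["metadata"]["page_number"]].append(element["text"])
    return dict(pages)


if __name__ == "__main__":
    element = json.loads(SAMPLE)
    info = summarize(element)
    print(info["source"], info["page"], info["text"])
    print(group_by_page([element]))
```

Keeping the field access in one small function means that if the real schema differs from this sample, only `summarize` needs to change.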

The Loader has the following sub-components, each handling a different type of document:

  1. Audio

  2. CSV

  3. DOCX

  4. HTML

  5. Image

  6. JSON

  7. PDF

  8. PPTX

  9. TXT

  10. XLSX
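One simple way to route a source to one of these sub-components is by file extension. The sketch below shows that idea in plain Python. `EXTENSION_TO_LOADER` and `pick_loader` are hypothetical names, not part of the gllm-docproc API, and the extensions listed for each sub-component (especially Audio and Image) are illustrative assumptions rather than the library's documented set of supported formats.

```python
from pathlib import Path

# Hypothetical routing table: file extension -> Loader sub-component name.
# The extension lists are assumptions for illustration only.
EXTENSION_TO_LOADER = {
    ".mp3": "Audio", ".wav": "Audio", ".m4a": "Audio",
    ".csv": "CSV",
    ".docx": "DOCX",
    ".html": "HTML", ".htm": "HTML",
    ".png": "Image", ".jpg": "Image", ".jpeg": "Image",
    ".json": "JSON",
    ".pdf": "PDF",
    ".pptx": "PPTX",
    ".txt": "TXT",
    ".xlsx": "XLSX",
}


def pick_loader(source: str) -> str:
    """Return the name of the sub-component that should load `source`.

    Matching is case-insensitive on the final extension; unknown
    extensions raise ValueError instead of silently picking a default.
    """
    suffix = Path(source).suffix.lower()
    try:
        return EXTENSION_TO_LOADER[suffix]
    except KeyError:
        raise ValueError(f"No loader registered for {suffix!r} ({source})") from None


if __name__ == "__main__":
    for name in ["pdf-example.pdf", "Slides.PPTX", "notes.txt"]:
        print(name, "->", pick_loader(name))
```

Raising on unknown extensions makes unsupported inputs visible early; a real pipeline might instead inspect file contents, since extensions can be missing or misleading.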
