Loader
The Loader extracts content and its metadata from a provided source document.
To give an idea of what a Loader produces, here is a snippet of sample JSON output:
{
  "text": "[Header] This is the Header of the Document",
  "structure": "uncategorized",
  "metadata": {
    "source": "pdf-example.pdf",
    "source_type": "pdf",
    "loaded_datetime": "2024-10-17 17:10:30",
    "font_size": 12,
    "font_family": "TimesNewRomanPSMT",
    "font_color": "#000000",
    "coordinates": [
      72,
      292,
      49,
      36
    ],
    "links": [],
    "layout_width": 612,
    "layout_height": 792,
    "page_number": 1,
    "sorted_element_format": [
      [
        12,
        "TimesNewRomanPSMT",
        "#000000"
      ]
    ]
  }
}

The Loader has the following sub-components to handle various types of documents:
Audio
CSV
DOCX
HTML
Image
JSON
PDF
PPTX
TXT
XLSX
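
To make the sample JSON output above more concrete, here is a minimal sketch of how such an element could be read once it has been saved as JSON. It uses only Python's standard `json` module and does not call the Loader itself. The helper name `summarize_element` is hypothetical and only for illustration, and the element is a trimmed copy of the sample shown earlier.

```python
import json

# Trimmed copy of the sample element shown earlier on this page.
sample = """
{
  "text": "[Header] This is the Header of the Document",
  "structure": "uncategorized",
  "metadata": {
    "source": "pdf-example.pdf",
    "source_type": "pdf",
    "page_number": 1,
    "font_size": 12,
    "coordinates": [72, 292, 49, 36]
  }
}
"""


def summarize_element(element: dict) -> str:
    """Hypothetical helper: build a one-line summary of a loaded element."""
    meta = element.get("metadata", {})
    source = meta.get("source")
    page = meta.get("page_number")
    structure = element.get("structure")
    return f"{source} p.{page} [{structure}]: {element.get('text')}"


element = json.loads(sample)
summary = summarize_element(element)
print(summary)
```

Because the text and its metadata travel together in one element, downstream steps can filter or group elements by fields such as `source_type` or `page_number` without re-reading the original file.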