TXT


TXT Loader is a component that extracts the content of a text file and converts it into a standardized JSON format.

Prerequisites

Complete all setup steps listed on the Prerequisites page before running this example.

Installation

# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-docproc"

You can use the following as a sample file: txt-example.txt.

TXT Loader

TXTLoader extracts information from a text file. It uses magic to verify that the file is text-based before extracting its content.
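To make the idea of a text-based check concrete, here is a minimal standard-library sketch. It is not TXTLoader's implementation, which relies on magic; it swaps in a simple heuristic (no NUL bytes, and the sampled bytes decode as UTF-8). The function name `looks_like_text` and the `sample_size` parameter are hypothetical and exist only for this illustration.

```python
import codecs


def looks_like_text(path, sample_size=8192):
    """Heuristic stand-in for a magic-based check (an assumption, not the library's logic).

    Reads up to `sample_size` bytes and treats the file as text when the sample
    contains no NUL byte and decodes as UTF-8.
    """
    with open(path, "rb") as f:
        sample = f.read(sample_size)

    # Binary formats almost always contain NUL bytes; plain text rarely does.
    if b"\x00" in sample:
        return False

    # An incremental decoder tolerates a multi-byte character that was cut off
    # at the end of the sample, but still rejects invalid byte sequences.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

A check like this could be run on a path such as ./data/source/txt-example.txt before handing it to a loader. Unlike this heuristic, a magic-based check can also recognize text in encodings other than UTF-8.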

1. Create a script called main.py:

from gllm_docproc.loader.txt import TXTLoader

source = "./data/source/txt-example.txt"

# initialize the TXTLoader
loader = TXTLoader()

# load the txt source
loaded_elements = loader.load(source)

2. Run the script:

python main.py

3. The loader generates the following: output JSON.
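The script in step 1 stores the result in loaded_elements but does not print or save it. The sketch below shows one way to write the result to a JSON file for inspection. It assumes that loaded_elements is JSON-serializable (for example, a list of dictionaries), which this page does not confirm. The helper name `save_elements` and the output path are hypothetical.

```python
import json
from pathlib import Path


def save_elements(elements, output_path):
    """Write loaded elements to a JSON file and return the path.

    Assumes `elements` is JSON-serializable (e.g. a list of dicts); this is an
    assumption about the loader's return value, not documented behavior.
    """
    path = Path(output_path)
    # Create the output directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps non-ASCII text readable in the output file.
    path.write_text(
        json.dumps(elements, indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return path


# Hypothetical usage at the end of main.py:
# save_elements(loaded_elements, "./data/output/txt-example.json")
```

Writing to a file rather than printing keeps long outputs easy to compare against the reference output JSON.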
