Table Chunker
Table Chunker handles the chunking of table elements. Tables need specialized treatment to keep their integrity and usability when they are split into chunks, so Table Chunker ensures that related data remains connected and contextually meaningful.
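To illustrate what keeping related data connected can mean in practice, here is a minimal sketch of one common approach: splitting a table by rows and repeating the header in every chunk. It is not the gllm-docproc implementation. The function name `chunk_markdown_table`, its parameters, and the sample data are hypothetical and exist only for this example.

```python
def chunk_markdown_table(header, rows, chunk_size):
    """Split a table into Markdown chunks, repeating the header in each one.

    Hypothetical helper for illustration; not part of gllm-docproc.
    """
    header_line = "| " + " | ".join(header) + " |"
    separator = "| " + " | ".join("---" for _ in header) + " |"
    prefix = header_line + "\n" + separator

    chunks = []
    current = prefix
    has_rows = False
    for row in rows:
        row_line = "| " + " | ".join(str(cell) for cell in row) + " |"
        # Start a new chunk if adding this row would exceed the size limit,
        # but never emit a chunk that holds a header and no rows.
        if has_rows and len(current) + 1 + len(row_line) > chunk_size:
            chunks.append(current)
            current = prefix
            has_rows = False
        current += "\n" + row_line
        has_rows = True
    if has_rows:
        chunks.append(current)
    return chunks


# Made-up sample data
header = ["name", "city"]
rows = [["Ana", "Jakarta"], ["Budi", "Bandung"], ["Citra", "Surabaya"]]

chunks = chunk_markdown_table(header, rows, chunk_size=70)
for chunk in chunks:
    print(chunk)
    print()
```

Because every chunk carries its own header, each one can be read without the others. A single row longer than `chunk_size` still becomes its own oversized chunk in this sketch, since a row is never split.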
Installation
# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-docproc"
You can use the following as a sample file: input.
1. Create a script called main.py:
import json

from gllm_docproc.chunker.table import MARKDOWN, TableChunker

# Load the table element you want to chunk
with open('./data/source/table_element.json', 'r') as file:
    table_element = json.load(file)

# Initialize Table Chunker
chunker = TableChunker(
    chunk_size=4000,
    chunk_overlap=0,
    table_format=MARKDOWN,
)

# Chunk the table element
chunked_elements = chunker.chunk([table_element], is_table_need_index=True)
print(chunked_elements)
2. Run the script:
python main.py
3. The chunker will generate the following: output JSON.