File Processing Guide

Attach files to agent runs, reuse artifacts from prior attachments, and manage chunk IDs for long-form analysis. Reach for this guide when agents need to consume documents, transcripts, or datasets across REST, the Python SDK, and the CLI.

File-handling support is summarised in the AIP capability matrix. The main limitation today is regenerating presigned URLs: stick with the REST helper until SDK/CLI shortcuts arrive.

aip agents run accepts either an agent ID or a unique name. Use --select to pick from partial name matches or provide the ID directly when scripting.

Attach Files to an Agent Run

When to use: Collect fresh documents from users or pipelines and supply them during execution.

Local Document Processing: For local execution, you can use document loader tools like PDFReaderTool, DocxReaderTool, and ExcelReaderTool from aip-agents to read files directly from disk without uploading to the server. Attach the file in the same run so the tool has access to it. See the Local vs Remote guide for the document loader tools quickstart and example (main_with_docproc_pdf.py).

from glaip_sdk import Agent

# Define the agent that will receive the attachments.
agent = Agent(name="analysis-agent", instruction="You analyze documents.")

# Pass local file paths via files= to attach them to this run.
response = agent.run(
    "Summarise the document and extract key metrics",
    files=["./reports/q1.pdf", "./reports/q2.pdf"],
)
print(response)

Common attachment errors

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| 413 Payload Too Large | File exceeds backend attachment/upload limits. | Compress the file or split it into smaller chunks. |
| Missing file in run logs | File path incorrect or permissions denied. | Double-check the path, ensure the process can read the file, or use absolute paths. |
| Duplicate chunks created | Run attaches files without reusing artifact_id. | Pass the stored chunk IDs using the reuse workflows in the next section. |
| Unsupported media type errors | File type not allowed for ingestion. | Convert to a supported format (PDF, TXT, DOCX) or register a custom ingestion pipeline. |
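Several of the common errors listed above can be caught before a run starts. The following is a minimal pre-flight sketch, not part of the SDK: `check_attachment`, `SUPPORTED_SUFFIXES`, and `MAX_BYTES` are hypothetical names, and the 25 MB limit is a placeholder because this guide does not state the backend's real limit.

```python
from pathlib import Path

# Formats named in this guide; extend the set if a custom ingestion pipeline is registered.
SUPPORTED_SUFFIXES = {".pdf", ".txt", ".docx"}

# Placeholder only: the real backend upload limit is not stated in this guide.
MAX_BYTES = 25 * 1024 * 1024


def check_attachment(path, max_bytes=MAX_BYTES):
    """Return a list of problems for one file; an empty list means it looks attachable."""
    p = Path(path).expanduser().resolve()  # absolute paths avoid working-directory surprises
    if not p.is_file():
        return [f"{p}: not found or not a regular file"]

    problems = []
    if p.suffix.lower() not in SUPPORTED_SUFFIXES:
        problems.append(f"{p}: unsupported file type {p.suffix!r}")
    if p.stat().st_size > max_bytes:
        problems.append(f"{p}: size exceeds {max_bytes} bytes")
    try:
        # Opening the file confirms the current process has read permission.
        with p.open("rb"):
            pass
    except OSError as err:
        problems.append(f"{p}: cannot be read ({err})")
    return problems
```

Running the check over every path and attaching only when all lists are empty gives a local error message instead of a failed run.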

Reuse Chunk IDs from Prior Attachments

When to use: Avoid re-ingesting the same files while keeping chunk IDs stable across runs.

When the backend returns chunk_ids, store them and pass them to later runs instead of re-attaching the same files.
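One way to keep stored chunk IDs usable is to key them by file content, so an unchanged file always maps back to the same IDs. The sketch below covers only that local bookkeeping. `ChunkRegistry` and `file_digest` are hypothetical helpers, and how chunk_ids are read from a response or passed into a later run is not shown in this guide, so that part is deliberately left out.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """SHA-256 of the file contents, read in 1 MiB blocks to handle large files."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


class ChunkRegistry:
    """Hypothetical JSON-backed map from file digest to the chunk IDs the backend returned."""

    def __init__(self, store_path):
        self.store_path = Path(store_path)
        if self.store_path.exists():
            self._data = json.loads(self.store_path.read_text())
        else:
            self._data = {}

    def remember(self, file_path, chunk_ids):
        """Record chunk IDs for a file and persist the registry to disk."""
        self._data[file_digest(file_path)] = list(chunk_ids)
        self.store_path.write_text(json.dumps(self._data, indent=2))

    def lookup(self, file_path):
        """Return stored chunk IDs for this exact content, or None if it was never attached."""
        return self._data.get(file_digest(file_path))
```

Before a run, `lookup` tells you whether the file can be referenced by its stored IDs; after a run that attached a new file, `remember` records what the backend returned.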

CLI support for passing chunk_ids is coming soon. Until then, use the SDK or REST API to avoid re-attaching large files.

Retrieve Artifacts and Output

When to use: Capture the processed results, enriched files, or generated reports after execution.

  1. Capture the run ID from the streaming response (X-Run-ID).

  2. List run history:

  3. Download artifacts directly from the presigned URLs in the response. If a URL has expired, regenerate it with /utils/regenerate_presigned_url.
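The download step can be sketched as a fetch that retries once with a fresh URL. This is an illustration under assumptions: it treats HTTP 403 as the signal for an expired presigned URL (typical of S3-style storage, but not confirmed by this guide), and `regenerate` is a caller-supplied function that wraps /utils/regenerate_presigned_url, whose request and response format is not documented here. `download_artifact` and `_http_get` are hypothetical names.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def _http_get(url):
    """Fetch a URL and return the response body as bytes."""
    with urlopen(url) as resp:
        return resp.read()


def download_artifact(url, regenerate, fetch=_http_get):
    """Download an artifact, asking for a fresh presigned URL once if the first one is rejected.

    regenerate: callable taking the old URL and returning a new one (assumed to call
    /utils/regenerate_presigned_url on the caller's behalf).
    """
    try:
        return fetch(url)
    except HTTPError as err:
        # Assumption: an expired presigned URL is rejected with 403; other errors propagate.
        if err.code != 403:
            raise
    return fetch(regenerate(url))
```

Keeping `regenerate` as a parameter means the retry logic stays the same once SDK or CLI shortcuts for regeneration arrive.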

Best Practices

When to use: Create organisation-wide guardrails for storage, retention, and compliance.

  • Compress large files: keep attachments efficient and within allowable limits.

  • Track chunk IDs: store them alongside run metadata so you can reference prior attachments without retransmitting data.

  • Sanitise inputs: redact or mask PII before attaching sensitive documents; see the Security & privacy guide.

  • Automate clean-up: if you store artifacts locally for auditing, make sure rotation policies are in place.
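For the clean-up practice, a rotation policy can be as simple as deleting local artifact files older than a retention window. The sketch below is one possible approach, not a prescribed one: `prune_old_artifacts` is a hypothetical helper, and the directory layout and retention period are whatever your audit requirements dictate.

```python
import time
from pathlib import Path


def prune_old_artifacts(directory, max_age_days, now=None):
    """Delete regular files in `directory` last modified more than `max_age_days` ago.

    Returns the sorted list of removed paths. Subdirectories are left untouched.
    """
    current = time.time() if now is None else now
    cutoff = current - max_age_days * 86400  # seconds per day
    removed = []
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            entry.unlink()
            removed.append(entry)
    return removed
```

Scheduling this from cron or a pipeline step, and logging the returned paths, keeps a record of what was rotated out.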
