Run Pipeline on a Server

This guide will walk you through deploying your RAG pipeline as a FastAPI web service that can handle HTTP requests and stream real-time responses. You'll learn how to create a production-ready API server that exposes your pipeline functionality through REST endpoints.

Running a pipeline on a server allows you to expose your AI pipeline as a web service, enabling multiple clients to interact with your RAG system through HTTP requests. This approach provides scalability, accessibility, and the ability to integrate your pipeline into web applications, mobile apps, or other services.

For example, instead of running your pipeline locally each time, you can deploy it as an API where users can send queries via HTTP requests and receive streaming responses in real-time.

Prerequisites

This example specifically requires:

  1. Completion of the Your First RAG Pipeline tutorial - you need a working pipeline implementation to deploy

  2. Completion of all setup steps listed on the Prerequisites page

You should be familiar with these concepts and components:

  1. Pipeline - Basic pipeline construction and execution

  2. Basic understanding of APIs and HTTP requests

Installation

# you can use a Conda environment
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-pipeline gllm-rag gllm-core gllm-generation gllm-inference gllm-retrieval gllm-misc gllm-datastore

Project Setup

1. Project Structure

Ensure your project structure includes your working pipeline and server files:

<your-project>/
├── modules/
│   └── [your actual components]
├── pipeline.py                     # Your working RAG pipeline
├── main.py                         # 👈 FastAPI server (we'll create this)
├── run.py                          # 👈 Test client (we'll create this)
└── .env
2. Environment Configuration

Ensure you have all necessary environment variables configured in your .env file:

CSV_DATA_PATH="data/imaginary_animals.csv"
ELASTICSEARCH_URL="http://localhost:9200/"
EMBEDDING_MODEL="text-embedding-3-small"
LANGUAGE_MODEL="gpt-4o-mini"
INDEX_NAME="first-quest"
OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"

Make sure your pipeline works correctly in standalone mode before deploying it as a server.


Build Your FastAPI Server

1) Create the FastAPI Application

The FastAPI application serves as the web interface for your RAG pipeline, handling HTTP requests and managing streaming responses:

1. Create the main server file

Create main.py with the FastAPI server implementation:

Key components:

  • FastAPI app: Web framework for creating REST API endpoints

  • Event system: Handles real-time streaming of pipeline events

  • Pipeline runner: Manages pipeline execution with proper error handling

  • Stream endpoint: Provides HTTP interface for pipeline requests

2. Understanding the server architecture

The server uses several key patterns:

  1. Async execution: Pipeline runs concurrently with response streaming

  2. Event streaming: Real-time updates sent to clients during processing

  3. Error handling: Proper exception management and cleanup

  4. Dual handlers: Both console logging and stream responses

The StreamingResponse allows clients to receive updates in real-time rather than waiting for the complete response.

2) Start the Server

Deploy your FastAPI server and make it accessible for HTTP requests:

1. Run the FastAPI server

Start your server using Uvicorn (included with FastAPI):
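Assuming your server file is named main.py with an app object called `app`, the standard invocation is:

```shell
uvicorn main:app --reload
```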

You should see output similar to:
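With the default host and port, Uvicorn's startup log typically looks like this (the process ID will differ):

```
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
```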

Server options:

  • --reload: Automatically restart server when code changes

  • --host 0.0.0.0: Make server accessible from other machines

  • --port 8001: Use a different port if 8000 is occupied

2. Verify server is running

Check that your server is operational:
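Open http://127.0.0.1:8000/docs in a browser, or check from the command line (assuming the default host and port):

```shell
curl -I http://127.0.0.1:8000/docs
```

A `200 OK` response confirms the documentation page is being served.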

This should show the FastAPI interactive documentation interface, confirming your server is running correctly.

FastAPI automatically generates interactive API documentation at /docs and /redoc endpoints.


Test Your Pipeline Server

1) Create a Test Client

Build a client application to test your deployed pipeline server:

1. Create the test client

Create run.py with a client that tests your server:
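Here is a stdlib-only sketch of run.py. The URL, `/stream` endpoint, and newline-delimited JSON event schema are assumptions that match the hypothetical server sketch above, not a fixed contract; adjust them to whatever your server actually emits:

```python
# run.py -- minimal streaming test client sketch.
# NOTE: the URL, endpoint, and event schema are assumptions;
# adjust them to match your actual server.
import json
import urllib.error
import urllib.request

SERVER_URL = "http://127.0.0.1:8000/stream"  # adjust host/port as needed


def parse_event(line: str) -> dict:
    """Decode one newline-delimited JSON event from the stream."""
    return json.loads(line)


def main() -> None:
    payload = json.dumps({"query": "What is a snow fox?", "debug": True}).encode()
    req = urllib.request.Request(
        SERVER_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req) as response:
            for raw in response:  # iterate the stream line by line
                if not raw.strip():
                    continue
                event = parse_event(raw.decode())
                if event["type"] == "status":
                    print(f"[status] {event['value']}")
                else:
                    # Print streamed response tokens as they arrive.
                    print(event["value"], end="", flush=True)
        print()
    except urllib.error.URLError as exc:
        # Connection refused, timeout, etc.
        print(f"Could not reach server: {exc}")


if __name__ == "__main__":
    main()
```

Using `urllib` keeps the sketch dependency-free; a library such as `requests` (with `stream=True`) or `httpx` would make the same logic more concise.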

Client features:

  • Streaming support: Processes real-time responses from the server

  • Event filtering: Separates status messages from content responses

  • Error handling: Manages HTTP errors and connection issues

  • Formatted output: Clean display of streaming pipeline results

2) Test the Complete System

Run your test client to verify the server deployment works correctly:

1. Execute the test client

With your server running in one terminal, run the client in another:
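Assuming the client file is named run.py:

```shell
python run.py
```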

2. Observe the streaming output

You should see real-time pipeline execution logs and results:
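The exact output depends on your pipeline and event handlers, but with the hypothetical event format used in the sketches above it would look roughly like:

```
[status] retrieving documents
[status] generating response
The snow fox is an imaginary arctic animal that ...
```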

What you'll see:

  • Status events: Pipeline step execution logs with timestamps

  • Streaming content: Response text appearing in real-time

  • Debug information: Detailed step-by-step pipeline execution (if debug: true)

3. Verify successful deployment

If everything works correctly, you should be able to see:

  • Real-time pipeline execution logs

  • Streaming response content as it's generated

  • Proper error handling if issues occur

  • Clean server shutdown when stopping


Troubleshooting

Common Issues

  1. Server fails to start:

    • Check if port 8000 is already in use: lsof -i :8000

    • Verify all dependencies are installed correctly

    • Ensure your pipeline imports work: python -c "from pipeline import e2e_pipeline"

  2. No streaming responses received:

    • Verify the StreamEventHandler is properly configured

    • Check that your pipeline uses the event emitter correctly

    • Test with debug: true to see detailed pipeline execution

  3. Client connection errors:

    • Confirm server is running on the correct host and port

    • Check firewall settings if accessing from another machine

    • Verify the request JSON format matches the expected schema

  4. Pipeline execution errors:

    • Review server console output for detailed error messages

    • Test your pipeline standalone before deploying as server

    • Check environment variables and API keys are properly configured

Debug Tips

  1. Enable detailed logging: Set debug: true in client requests

  2. Test components separately: Verify pipeline works before server deployment

  3. Monitor server logs: Watch console output while testing client requests

  4. Use interactive docs: Visit http://127.0.0.1:8000/docs to test endpoints manually

  5. Check network connectivity: Ensure client can reach server host and port


Congratulations! You've successfully deployed your RAG pipeline as a FastAPI web service. Your pipeline is now accessible through HTTP endpoints, supporting real-time streaming responses and ready for integration into web applications, mobile apps, or other services. This server-based approach provides the foundation for building scalable AI-powered applications.
