Run Pipeline on a Server
This guide walks you through deploying your RAG pipeline as a FastAPI web service that handles HTTP requests and streams responses in real time. You'll learn how to create a production-ready API server that exposes your pipeline through REST endpoints.
Exposing the pipeline as a web service lets multiple clients interact with your RAG system over HTTP. This provides scalability, accessibility, and easy integration into web applications, mobile apps, or other services.
For example, instead of running your pipeline locally each time, you can deploy it as an API where users can send queries via HTTP requests and receive streaming responses in real-time.
Installation
# you can use a Conda environment

macOS / Linux:
pip install --extra-index-url "https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-pipeline gllm-rag gllm-core gllm-generation gllm-inference gllm-retrieval gllm-misc gllm-datastore

Windows (Command Prompt):
FOR /F "tokens=*" %T IN ('gcloud auth print-access-token') DO pip install --extra-index-url "https://oauth2accesstoken:%T@glsdk.gdplabs.id/gen-ai-internal/simple/" gllm-pipeline gllm-rag gllm-core gllm-generation gllm-inference gllm-retrieval gllm-misc gllm-datastore

Project Setup
Project Structure
Ensure your project structure includes your working pipeline and server files:
<your-project>/
├── modules/
│ └── [your actual components]
├── pipeline.py # Your working RAG pipeline
├── main.py # 👈 FastAPI server (we'll create this)
├── run.py # 👈 Test client (we'll create this)
└── .env

Environment Configuration
Ensure you have all necessary environment variables configured in your .env file:
CSV_DATA_PATH="data/imaginary_animals.csv"
ELASTICSEARCH_URL="http://localhost:9200/"
EMBEDDING_MODEL="text-embedding-3-small"
LANGUAGE_MODEL="gpt-4o-mini"
INDEX_NAME="first-quest"
OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"

Build Your FastAPI Server
1) Create the FastAPI Application
The FastAPI application serves as the web interface for your RAG pipeline, handling HTTP requests and managing streaming responses:
Create the main server file
Create main.py with the FastAPI server implementation:
Key components:
FastAPI app: Web framework for creating REST API endpoints
Event system: Handles real-time streaming of pipeline events
Pipeline runner: Manages pipeline execution with proper error handling
Stream endpoint: Provides HTTP interface for pipeline requests
Understanding the server architecture
The server uses several key patterns:
Async execution: Pipeline runs concurrently with response streaming
Event streaming: Real-time updates sent to clients during processing
Error handling: Proper exception management and cleanup
Dual handlers: Both console logging and stream responses
2) Start the Server
Deploy your FastAPI server and make it accessible for HTTP requests:
Run the FastAPI server
Start your server using Uvicorn (included with FastAPI):
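Assuming your FastAPI instance is named `app` inside main.py:

```shell
uvicorn main:app --reload
```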
You should see output similar to:
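Typical Uvicorn startup output (paths and process IDs will differ):

```
INFO:     Will watch for changes in these directories: ['/path/to/your-project']
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [12345] using StatReload
INFO:     Application startup complete.
```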
Server options:
--reload: Automatically restart the server when code changes
--host 0.0.0.0: Make the server accessible from other machines
--port 8001: Use a different port if 8000 is occupied
Verify server is running
Check that your server is operational:
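Open http://127.0.0.1:8000/docs in a browser, or probe it from the command line (assuming the default host and port):

```shell
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8000/docs
# prints 200 when the server is up
```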
This should show the FastAPI interactive documentation interface, confirming your server is running correctly.
Test Your Pipeline Server
1) Create a Test Client
Build a client application to test your deployed pipeline server:
Create the test client
Create run.py with a client that tests your server:
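A minimal client sketch follows. The `/stream` endpoint name and the newline-delimited JSON event shape are assumptions here and must match whatever your server actually emits.

```python
# run.py -- minimal test client sketch; the endpoint URL and event format
# are assumptions that must match your server implementation.
import json

import requests

SERVER_URL = "http://127.0.0.1:8000/stream"


def parse_event(line: str) -> tuple[str, str]:
    """Decode one NDJSON event line into (event_type, value)."""
    record = json.loads(line)
    return record["type"], record["value"]


def main() -> None:
    payload = {"query": "What animals live in the imaginary forest?", "debug": True}
    # stream=True keeps the connection open so events arrive incrementally
    with requests.post(SERVER_URL, json=payload, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines(decode_unicode=True):
            if not line:
                continue
            event_type, value = parse_event(line)
            if event_type == "response":
                print(value, end="", flush=True)  # print content as it streams
            else:
                print(f"[{event_type}] {value}")  # status/debug messages


if __name__ == "__main__":
    main()
```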
Client features:
Streaming support: Processes real-time responses from the server
Event filtering: Separates status messages from content responses
Error handling: Manages HTTP errors and connection issues
Formatted output: Clean display of streaming pipeline results
2) Test the Complete System
Run your test client to verify the server deployment works correctly:
Execute the test client
With your server running in one terminal, run the client in another:
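```shell
python run.py
```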
Observe the streaming output
You should see real-time pipeline execution logs and results:
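The exact events depend on your pipeline and event handlers; a hypothetical run might look roughly like:

```
[status] Processing: What animals live in the imaginary forest?
[status] Retrieval step completed
The imaginary forest is home to ...
```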
What you'll see:
Status events: Pipeline step execution logs with timestamps
Streaming content: Response text appearing in real-time
Debug information: Detailed step-by-step pipeline execution (if debug: true)
Verify successful deployment
If everything works correctly, you should be able to see:
Real-time pipeline execution logs
Streaming response content as it's generated
Proper error handling if issues occur
Clean server shutdown when stopping
Congratulations! Your RAG pipeline is now successfully deployed as a web service and ready for production use.
Troubleshooting
Common Issues
Server fails to start:
Check if port 8000 is already in use:
lsof -i :8000
Verify all dependencies are installed correctly
Ensure your pipeline imports work:
python -c "from pipeline import e2e_pipeline"
No streaming responses received:
Verify the StreamEventHandler is properly configured
Check that your pipeline uses the event emitter correctly
Test with debug: true to see detailed pipeline execution
Client connection errors:
Confirm server is running on the correct host and port
Check firewall settings if accessing from another machine
Verify the request JSON format matches the expected schema
Pipeline execution errors:
Review server console output for detailed error messages
Test your pipeline standalone before deploying as server
Check environment variables and API keys are properly configured
Debug Tips
Enable detailed logging: Set debug: true in client requests
Test components separately: Verify the pipeline works before server deployment
Monitor server logs: Watch console output while testing client requests
Use interactive docs: Visit http://127.0.0.1:8000/docs to test endpoints manually
Check network connectivity: Ensure the client can reach the server host and port
Congratulations! You've successfully deployed your RAG pipeline as a FastAPI web service. Your pipeline is now accessible through HTTP endpoints, supporting real-time streaming responses and ready for integration into web applications, mobile apps, or other services. This server-based approach provides the foundation for building scalable AI-powered applications.