Response Synthesizer


What’s a Response Synthesizer?

The response synthesizer is a utility module designed to synthesize the final response of a RAG pipeline based on the provided inputs and contexts. It can be executed with various strategies, such as stuff, static list, and more.

In this tutorial, you'll learn how to use the ResponseSynthesizer with the stuff strategy in just a few lines of code. The other strategies are covered in the Other Strategies section below.

Prerequisites

This example specifically requires completion of all setup steps listed on the Prerequisites page.


Installation

```bash
# you can use a Conda environment
pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ "gllm-generation"
```

Available Strategies

The Response Synthesizer supports multiple synthesis strategies, each designed for different use cases and content processing needs. Choose the strategy that best fits your requirements:

| Strategy | When to Use | Link to Section |
| --- | --- | --- |
| Stuff | When all context fits within the model's token limit. Simple, single-pass processing. | Quickstart |
| Static List | When you want to return a formatted list without LM processing. No model calls needed. | Other Strategies, 1. Static List |
| Map Reduce | When dealing with large amounts of content that need hierarchical, parallel processing and combining. | Other Strategies, 2. Map Reduce |
| Refine | When you need iterative refinement, processing chunks sequentially to build up an answer. | Other Strategies, 3. Refine |

Note: The examples in this tutorial will use the stuff strategy, as it is the most common and basic strategy for RAG applications.

Quickstart

Let’s jump into a basic example using ResponseSynthesizer with the stuff strategy.

Here, we're going to use a preset that contains predefined prompt templates with the following keys:

  1. query: Will be filled with the query parameter passed to the synthesize() method.

  2. context: Will be filled with the list of chunks passed to the chunks parameter of the synthesize() method. These chunks will be repacked (stuffed together) into a context string before being passed to the prompt template.

This preset is particularly useful for a RAG pipeline where we want to use a list of retrieved chunks as context to answer a user query. It can be instantiated through the stuff_preset() method by providing the desired model_id.
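
Below is a minimal sketch of this flow. The stuff_preset() method and the query and chunks parameters come from this tutorial; the import path, the model_id format, the use of plain strings as chunks, and synthesize() being awaitable are assumptions, so check the API Reference for the exact signatures.

```python
import asyncio

from gllm_generation.response_synthesizer import ResponseSynthesizer  # assumed import path


async def main():
    # Instantiate the preset; the model_id format below is an assumption.
    synthesizer = ResponseSynthesizer.stuff_preset(model_id="openai/gpt-4o-mini")

    # `query` fills the `query` key; `chunks` are repacked into the `context` key.
    # Plain strings are used for brevity; the SDK may expect Chunk objects.
    response = await synthesizer.synthesize(
        query="What is the capital of France?",
        chunks=["Paris is the capital and most populous city of France."],
    )
    print(response)


asyncio.run(main())
```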

Expected Output

Customizing Language Model

We can also customize the language-model-related configuration, such as the prompt templates. In the example below, we create a prompt template that has no keys, so we don't need to pass any parameters to the synthesize() method.
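
A sketch of what this could look like. The stuff() constructor is named later in this tutorial; the prompt-template parameter names (system_template, user_template) are assumptions.

```python
import asyncio

from gllm_generation.response_synthesizer import ResponseSynthesizer  # assumed import path


async def main():
    # The prompt-template parameter names below are assumptions.
    synthesizer = ResponseSynthesizer.stuff(
        model_id="openai/gpt-4o-mini",
        system_template="You are a cheerful assistant.",
        user_template="Greet the user in one sentence.",  # no placeholder keys
    )

    # Since the templates define no keys, synthesize() needs no parameters.
    response = await synthesizer.synthesize()
    print(response)


asyncio.run(main())
```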

Expected Output

Passing Custom LM Request Processor

Alternatively, we can perform the customization by passing an LMRequestProcessor object to the stuff() method. This is particularly useful when we already have an LMRequestProcessor object, such as when we use the LMRequestProcessorCatalog.
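
A sketch under the assumption that stuff() accepts a prebuilt processor via an lm_request_processor parameter. The catalog setup is elided, and the lookup style shown is illustrative only.

```python
import asyncio

from gllm_generation.response_synthesizer import ResponseSynthesizer  # assumed import path


async def main():
    # Assume `catalog` is an LMRequestProcessorCatalog configured elsewhere;
    # the key-based lookup below is illustrative, not the documented API.
    lm_request_processor = catalog["response-synthesis"]

    synthesizer = ResponseSynthesizer.stuff(lm_request_processor=lm_request_processor)
    response = await synthesizer.synthesize(
        query="What is the capital of France?",
        chunks=["Paris is the capital of France."],
    )
    print(response)


asyncio.run(main())
```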

Expected Output

Using Prompt Variables

ResponseSynthesizer with the stuff strategy supports adding prompt variables to be injected into the prompt template.
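
A sketch assuming extra template keys are supplied as keyword arguments to synthesize(); the exact mechanism (e.g., a dedicated variables parameter) may differ, and the user_template parameter name is an assumption.

```python
import asyncio

from gllm_generation.response_synthesizer import ResponseSynthesizer  # assumed import path


async def main():
    # Assumption: stuff() accepts a user template containing custom keys.
    synthesizer = ResponseSynthesizer.stuff(
        model_id="openai/gpt-4o-mini",
        user_template="Answer in {language}.\n\nContext: {context}\n\nQuestion: {query}",
    )
    response = await synthesizer.synthesize(
        query="When was the Eiffel Tower completed?",
        chunks=["The Eiffel Tower was completed in 1889."],
        language="French",  # fills the custom {language} key
    )
    print(response)


asyncio.run(main())
```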

Expected Output

Adding History

ResponseSynthesizer with the stuff strategy supports adding history as additional context for the language model.
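
A sketch assuming synthesize() accepts a history parameter; the (role, content) tuple format shown is illustrative and may differ from the real message schema.

```python
import asyncio

from gllm_generation.response_synthesizer import ResponseSynthesizer  # assumed import path


async def main():
    synthesizer = ResponseSynthesizer.stuff_preset(model_id="openai/gpt-4o-mini")

    # Assumed message format: (role, content) pairs from earlier turns.
    history = [
        ("user", "Hi, my name is Alice."),
        ("assistant", "Nice to meet you, Alice!"),
    ]
    response = await synthesizer.synthesize(
        query="What is my name?",
        chunks=[],
        history=history,
    )
    print(response)


asyncio.run(main())
```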

Expected Output

Adding Extra Contents

ResponseSynthesizer with stuff strategy supports adding extra contents — such as attachments — as additional context for the language model.
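
A sketch assuming attachments are passed as raw bytes through an extra_contents parameter; the real API may instead use a dedicated attachment class, so treat both the parameter name and the payload type as assumptions.

```python
import asyncio

from gllm_generation.response_synthesizer import ResponseSynthesizer  # assumed import path


async def main():
    synthesizer = ResponseSynthesizer.stuff_preset(model_id="openai/gpt-4o-mini")

    # Hypothetical local file read as raw bytes; the SDK may expect a richer type.
    with open("report.pdf", "rb") as f:
        attachment = f.read()

    response = await synthesizer.synthesize(
        query="Summarize the attached report.",
        chunks=[],
        extra_contents=[attachment],
    )
    print(response)


asyncio.run(main())
```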

Expected Output

Customizing Extractor Function

By default, the ResponseSynthesizer with the stuff strategy uses an extractor function that extracts only the response attribute of the language model's LMOutput schema. This ensures that the synthesized response will always be a string by default.

For example, let's try setting the output_analytics parameter to True, which will cause the internal LMInvoker to output an LMOutput object with analytics information.
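
A sketch of this experiment. The output_analytics parameter is named in this tutorial; its placement on the preset constructor is an assumption.

```python
import asyncio

from gllm_generation.response_synthesizer import ResponseSynthesizer  # assumed import path


async def main():
    # Assumption: output_analytics is forwarded to the internal LMInvoker.
    synthesizer = ResponseSynthesizer.stuff_preset(
        model_id="openai/gpt-4o-mini",
        output_analytics=True,
    )
    response = await synthesizer.synthesize(
        query="What is the capital of France?",
        chunks=["Paris is the capital of France."],
    )

    # Even though the LMInvoker now produces a full LMOutput, the default
    # extractor still returns only its `response` attribute: a plain string.
    print(response)


asyncio.run(main())
```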

Expected Output

As we can see, due to the default extractor function, the ResponseSynthesizer will always output just the response string, regardless of the LMInvoker's output.

If we want to get other attributes of the LMOutput, or even the whole LMOutput, we can define a custom extractor function as follows:
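
A sketch of a custom extractor. The idea of overriding the extractor comes from this tutorial; the extractor_func parameter name and the extractor's signature are assumptions.

```python
import asyncio

from gllm_generation.response_synthesizer import ResponseSynthesizer  # assumed import path


def extract_full_output(lm_output):
    # Return the entire LMOutput instead of just its `response` attribute.
    return lm_output


async def main():
    # Assumption: the extractor is supplied via an `extractor_func` parameter.
    synthesizer = ResponseSynthesizer.stuff_preset(
        model_id="openai/gpt-4o-mini",
        output_analytics=True,
        extractor_func=extract_full_output,
    )
    result = await synthesizer.synthesize(
        query="What is the capital of France?",
        chunks=["Paris is the capital of France."],
    )
    print(result)  # now the whole LMOutput, including analytics information


asyncio.run(main())
```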

Expected Output

Other Strategies

1. Static List

The Static List strategy generates responses by formatting a list of context items without using a language model. This is the most lightweight strategy, as it doesn't require any LM calls. You can customize the formatter function by defining the format_response_func parameter.

When to Use

Use the Static List strategy when:

  1. You want to return retrieved chunks directly without LM processing.

  2. You need fast responses without API costs.

  3. Your use case requires only simple list formatting (e.g., search results, document listings).

  4. You want to display all retrieved context items to users.

Example: Using default configuration
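
A sketch of the default setup; the static_list() constructor name mirrors the strategy name but is an assumption.

```python
import asyncio

from gllm_generation.response_synthesizer import ResponseSynthesizer  # assumed import path


async def main():
    # Assumption: the strategy is exposed as a static_list() constructor.
    synthesizer = ResponseSynthesizer.static_list()

    # No language model is invoked; the chunks are simply formatted into a list.
    response = await synthesizer.synthesize(
        chunks=["First search result", "Second search result"],
    )
    print(response)


asyncio.run(main())
```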

Expected Output:


Example: Using custom format_response_func
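
A sketch of a custom formatter. The format_response_func parameter is named in this tutorial; its exact signature (a list of chunks in, a string out) is an assumption.

```python
import asyncio

from gllm_generation.response_synthesizer import ResponseSynthesizer  # assumed import path


def numbered_list(chunks):
    # Assumed signature: receives the chunk list, returns the formatted string.
    return "\n".join(f"{i}. {chunk}" for i, chunk in enumerate(chunks, start=1))


async def main():
    synthesizer = ResponseSynthesizer.static_list(format_response_func=numbered_list)
    response = await synthesizer.synthesize(chunks=["apple", "banana", "cherry"])
    print(response)
    # 1. apple
    # 2. banana
    # 3. cherry


asyncio.run(main())
```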

Expected Output:


2. Map Reduce

The Map Reduce strategy uses a two-phase approach to process large amounts of content in parallel and then combine the results. This is ideal for handling content that exceeds token limits.

When to Use

Use the Map Reduce strategy when:

  • You have a large number of chunks that exceed the model's context window.

  • You want to process chunks in parallel for better performance.

  • Each chunk can be summarized independently before combining.

  • You need to handle hundreds or thousands of documents.

How It Works

  1. Map Phase: Each chunk (or batch of chunks) is processed individually to generate intermediate draft responses. This step repeats until the number of drafts fits within the reduce phase's batch size.

  2. Reduce Phase: All draft responses are then combined into a final response.

Note: You can specify which model to use for the map and reduce phases (e.g., the map phase can use a smaller model, while the reduce phase can use a larger model). This is useful when you want to balance performance and cost.

Prompt Template Variables

The required prompt keys are as follows:

  1. For Map Phase prompts:

    1. query: The input query from the user.

    2. context: The context(s) provided to the map phase.

  2. For Reduce Phase prompts:

    1. query: The input query from the user.

    2. context: The partial responses from the map phase.

Example: Using Preset
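
A sketch assuming a map_reduce_preset() constructor that mirrors stuff_preset(); the constructor name is an assumption.

```python
import asyncio

from gllm_generation.response_synthesizer import ResponseSynthesizer  # assumed import path


async def main():
    # Assumption: a map_reduce_preset() constructor mirrors stuff_preset().
    synthesizer = ResponseSynthesizer.map_reduce_preset(model_id="openai/gpt-4o-mini")

    many_chunks = [f"Document {i}: ..." for i in range(200)]  # illustrative corpus
    response = await synthesizer.synthesize(
        query="What are the recurring themes across these documents?",
        chunks=many_chunks,
    )
    print(response)


asyncio.run(main())
```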

Example: Custom Configuration
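
A sketch illustrating the note above: a smaller model for the map phase and a larger one for the reduce phase. All parameter names here are assumptions.

```python
import asyncio

from gllm_generation.response_synthesizer import ResponseSynthesizer  # assumed import path


async def main():
    # Assumed parameter names for per-phase models and batching.
    synthesizer = ResponseSynthesizer.map_reduce(
        map_model_id="openai/gpt-4o-mini",  # cheap drafts per chunk batch
        reduce_model_id="openai/gpt-4o",    # stronger model to combine drafts
        batch_size=10,                      # assumed: chunks processed per map call
    )
    response = await synthesizer.synthesize(
        query="Summarize the key findings.",
        chunks=[f"Finding {i}: ..." for i in range(100)],
    )
    print(response)


asyncio.run(main())
```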


3. Refine

The Refine strategy iteratively refines an answer by processing chunks sequentially. It starts with an initial answer from the first chunk(s), then refines it based on subsequent chunks.

When to Use

Use the Refine strategy when:

  1. You want to build up an answer incrementally.

  2. The order of chunks matters (e.g., chronological events, step-by-step instructions).

  3. You need to maintain context from previous chunks while adding new information.

  4. You want to see how the answer evolves (you can do this by using stream_drafts=True).

How It Works

  1. Initial Response: Generate an initial answer from the first chunk(s).

  2. Iterative Refinement: For each subsequent chunk (or batch), refine the previous answer by incorporating new information.

Prompt Template Variables

The required prompt keys are as follows:

  1. query: The input query.

  2. context: The new context(s) to incorporate into the refined answer.

  3. draft_response: The answer from the previous iteration that will be refined.

Example: Using Preset
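
A sketch assuming a refine_preset() constructor analogous to stuff_preset(); the constructor name is an assumption.

```python
import asyncio

from gllm_generation.response_synthesizer import ResponseSynthesizer  # assumed import path


async def main():
    # Assumption: a refine_preset() constructor mirrors stuff_preset().
    synthesizer = ResponseSynthesizer.refine_preset(model_id="openai/gpt-4o-mini")

    # Order matters for this strategy: chunks are processed sequentially.
    response = await synthesizer.synthesize(
        query="Reconstruct the timeline of events.",
        chunks=["Event 1: ...", "Event 2: ...", "Event 3: ..."],
    )
    print(response)


asyncio.run(main())
```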

Example: Custom Configuration
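
A sketch of a custom setup. The stream_drafts option is named in this tutorial; the refine() constructor name and how streamed drafts are consumed are assumptions.

```python
import asyncio

from gllm_generation.response_synthesizer import ResponseSynthesizer  # assumed import path


async def main():
    synthesizer = ResponseSynthesizer.refine(
        model_id="openai/gpt-4o-mini",
        stream_drafts=True,  # emit each intermediate draft as it is refined
    )
    response = await synthesizer.synthesize(
        query="Reconstruct the timeline of events.",
        chunks=["Event 1: ...", "Event 2: ...", "Event 3: ..."],
    )
    print(response)


asyncio.run(main())
```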

Congratulations! You've finished the ResponseSynthesizer tutorial!
