Web Search Client

The WebSearchClient class provides a Python interface for web searching, page fetching, and content extraction.

Overview

The Web Search Client allows developers to:

  • Search the web for relevant documents and pages

  • Retrieve URLs matching search queries

  • Fetch and parse web pages

  • Extract snippets and keypoints from web content

  • Handle both streaming and non-streaming responses

Installation

pip install smart-search-sdk

Quick Start

import asyncio
import os
from dotenv import load_dotenv

from smart_search_sdk.web.client import WebSearchClient
from smart_search_sdk.web.models import GetWebSearchResultsRequest

load_dotenv()

async def main():
    # Initialize the client
    client = WebSearchClient(base_url=os.getenv("SMARTSEARCH_BASE_URL"))

    # Authenticate
    await client.authenticate(token=os.getenv("SMARTSEARCH_TOKEN"))

    # Search the web
    request = GetWebSearchResultsRequest(
        query="Python tutorials",
        result_type="snippets",
        size=5
    )

    response = await client.search_web(request, stream=False)

    for item in response["data"]:
        print(f"Title: {item['metadata']['title']}")
        print(f"URL: {item['metadata']['source']}")
        print(f"Content: {item['content'][:100]}...")
        print("---")

asyncio.run(main())

Class: WebSearchClient

Constructor

Parameters:

  • base_url (str): The base URL of the Smart Search API

Example:
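
As in the Quick Start, with the base URL taken from the environment:

import os

from smart_search_sdk.web.client import WebSearchClient

client = WebSearchClient(base_url=os.getenv("SMARTSEARCH_BASE_URL"))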


Methods

Web - Search

Search the web for documents or pages relevant to the input query.

Signature
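
A sketch inferred from the Quick Start call and the parameter table below; the return annotation reflects the documented return types:

async def search_web(
    self,
    request: GetWebSearchResultsRequest,
    stream: bool = False,
    timeout: float = 300.0,
) -> dict | AsyncGenerator[dict, None]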

Parameters

Parameter   Type                         Required   Default   Description
request     GetWebSearchResultsRequest   Yes        -         Search request object
stream      bool                         No         False     Enable streaming response
timeout     float                        No         300.0     Request timeout in seconds

Request Fields

GetWebSearchResultsRequest:

  • query (str): The search query string

  • result_type (Literal["snippets", "keypoints", "summary", "description"]): Type of results to return

  • size (int): Maximum number of results (1-50)

  • site (list[AnyHttpUrl] | None, optional): List of URLs that limits search results to specific sites or domains. A single URL string is also accepted and is converted to a list.

  • engine (WebSearchEngine, optional): Search engine to use (auto, firecrawl, or perplexity). Defaults to auto, which uses the service's default engine.

Returns

  • Non-streaming: dict with search results

  • Streaming: AsyncGenerator[dict, None] yielding response chunks

Response Structure
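
Based on the fields read in the Quick Start, a non-streaming response is a dict shaped roughly like this (other fields may be present):

{
  "data": [
    {
      "content": "Snippet or keypoint text...",
      "metadata": {
        "title": "Page title",
        "source": "https://example.com/page"
      }
    }
  ]
}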

Examples

The following variants are shown together in the sketch below:

  • Basic search

  • Search with keypoints

  • Search with site filter

  • Search with multiple sites

  • Search with specific engine

  • Streaming search
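
A hedged sketch of these variants; run it inside an async function with an authenticated client, as in Quick Start. The streaming call shape is an assumption based on the documented return type:

from smart_search_sdk.web.models import GetWebSearchResultsRequest

# Basic search: five snippet results
request = GetWebSearchResultsRequest(query="Python tutorials", result_type="snippets", size=5)
response = await client.search_web(request)

# Keypoints instead of snippets
request = GetWebSearchResultsRequest(query="Python tutorials", result_type="keypoints", size=5)

# Site filter: a single URL string is converted to a list
request = GetWebSearchResultsRequest(
    query="asyncio", result_type="snippets", size=5,
    site="https://docs.python.org",
)

# Multiple sites
request = GetWebSearchResultsRequest(
    query="asyncio", result_type="snippets", size=5,
    site=["https://docs.python.org", "https://realpython.com"],
)

# Specific engine (string value assumed to coerce to WebSearchEngine)
request = GetWebSearchResultsRequest(
    query="asyncio", result_type="snippets", size=5,
    engine="firecrawl",
)

# Streaming: consume chunks as they arrive
stream = await client.search_web(request, stream=True)
async for chunk in stream:
    print(chunk)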


Web - Search Map

Map a website and discover its URL structure and hierarchy.

Signature
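
A sketch based on the tables below; the method name search_map is an assumption (this page does not show it):

async def search_map(
    self,
    request: GetWebSearchMapRequest,
    stream: bool = False,
    timeout: float = 30.0,
) -> dict | AsyncGenerator[dict, None]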

Parameters

Parameter   Type                     Required   Default   Description
request     GetWebSearchMapRequest   Yes        -         Website map request object
stream      bool                     No         False     Enable streaming response
timeout     float                    No         30.0      Request timeout in seconds

Request Fields

GetWebSearchMapRequest:

  • base_url (AnyHttpUrl): The base URL of the website to map

  • page (int): Page number for pagination (default: 1)

  • size (int): Maximum number of URLs to return (1-1000, default: 20)

  • return_all_map (bool): Whether to return all mapped links (default: False)

  • include_subdomains (bool): Whether to include subdomains in the mapping (default: False)

  • query (str, optional): Optional search query to filter URLs by keywords

Returns

  • Non-streaming: dict with mapped URLs

  • Streaming: AsyncGenerator[dict, None] yielding URL chunks

Response Structure

Examples

The following variants are shown together in the sketch below:

  • Basic website mapping

  • Map with keyword filtering

  • Streaming website map

  • Return all mapped links

  • Pagination
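
A hedged sketch; the method name search_map is an assumption, the client is assumed authenticated, and the streaming call shape follows the documented return type:

from smart_search_sdk.web.models import GetWebSearchMapRequest

# Basic mapping: first 20 URLs of the site
request = GetWebSearchMapRequest(base_url="https://docs.python.org")
response = await client.search_map(request)  # assumed method name

# Filter mapped URLs by keyword
request = GetWebSearchMapRequest(base_url="https://docs.python.org", query="asyncio")

# Streaming map: consume URL chunks as they arrive
stream = await client.search_map(request, stream=True)
async for chunk in stream:
    print(chunk)

# Return every mapped link, including subdomains
request = GetWebSearchMapRequest(
    base_url="https://python.org",
    return_all_map=True,
    include_subdomains=True,
)

# Pagination: second page of 50 URLs
request = GetWebSearchMapRequest(base_url="https://docs.python.org", page=2, size=50)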


Web - Search URLs

Search the web for pages and return their URLs (like a search engine).

Signature
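
A sketch based on the tables below; the method name search_urls is an assumption:

async def search_urls(
    self,
    request: GetWebSearchUrlsRequest,
    stream: bool = False,
    timeout: float = 30.0,
) -> dict | AsyncGenerator[dict, None]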

Parameters

Parameter   Type                      Required   Default   Description
request     GetWebSearchUrlsRequest   Yes        -         URL search request object
stream      bool                      No         False     Enable streaming response
timeout     float                     No         30.0      Request timeout in seconds

Request Fields

GetWebSearchUrlsRequest:

  • query (str): The search query string

  • size (int): Maximum number of URLs (1-50)

  • site (list[AnyHttpUrl] | None, optional): List of URLs that limits search results to specific sites or domains. A single URL string is also accepted and is converted to a list.

  • engine (WebSearchEngine, optional): Search engine to use (auto, firecrawl, or perplexity). Defaults to auto, which uses the service's default engine.

Returns

  • Non-streaming: dict with URL list

  • Streaming: AsyncGenerator[dict, None] yielding URL chunks

Response Structure

Examples

The following variants are shown together in the sketch below:

  • Get URLs

  • Get URLs with site filter

  • Search multiple sites for URLs
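
A hedged sketch; the method name search_urls is an assumption and the client is assumed authenticated:

from smart_search_sdk.web.models import GetWebSearchUrlsRequest

# Get URLs for a query
request = GetWebSearchUrlsRequest(query="Python tutorials", size=10)
response = await client.search_urls(request)  # assumed method name

# Restrict to one site (a bare string is converted to a list)
request = GetWebSearchUrlsRequest(query="asyncio", size=10, site="https://docs.python.org")

# Search several sites at once
request = GetWebSearchUrlsRequest(
    query="asyncio", size=10,
    site=["https://docs.python.org", "https://realpython.com"],
)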


Web - Fetch Page

Fetch a single web page by URL and return its content and metadata.

Signature
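
A sketch based on the tables below; the method name fetch_page is an assumption:

async def fetch_page(
    self,
    request: GetWebPageRequest,
    stream: bool = False,
    timeout: float = 30.0,
) -> dict | AsyncGenerator[dict, None]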

Parameters

Parameter   Type                Required   Default   Description
request     GetWebPageRequest   Yes        -         Page fetch request object
stream      bool                No         False     Enable streaming response
timeout     float               No         30.0      Request timeout in seconds

Request Fields

GetWebPageRequest:

  • source (AnyHttpUrl): The URL of the web page to fetch

  • json_schema (dict[str, Any] | None, optional): JSON schema for custom structured data extraction. When provided, uses Firecrawl extract API to extract data matching the schema.

  • return_html (bool): Whether to return full HTML content (default: False)

Returns

  • Non-streaming: dict with page content and metadata

  • Streaming: AsyncGenerator[dict, None] yielding page chunks

Response Structure

Examples

The following variants are shown together in the sketch below:

  • Fetch page

  • Fetch with HTML

  • Fetch with JSON schema extraction

  • Using Pydantic models (recommended)
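
A hedged sketch; the method name fetch_page is an assumption, and the Pydantic variant assumes the SDK accepts a plain JSON schema dict, produced here with Pydantic v2's model_json_schema():

from pydantic import BaseModel

from smart_search_sdk.web.models import GetWebPageRequest

# Fetch a page
request = GetWebPageRequest(source="https://example.com/article")
response = await client.fetch_page(request)  # assumed method name

# Include the full HTML alongside the parsed content
request = GetWebPageRequest(source="https://example.com/article", return_html=True)

# Structured extraction with a hand-written JSON schema
request = GetWebPageRequest(
    source="https://example.com/article",
    json_schema={
        "type": "object",
        "properties": {"title": {"type": "string"}, "author": {"type": "string"}},
    },
)

# Recommended: derive the schema from a Pydantic model
class Article(BaseModel):
    title: str
    author: str

request = GetWebPageRequest(
    source="https://example.com/article",
    json_schema=Article.model_json_schema(),  # Pydantic v2
)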


Web - Get Web Page Snippets

Extract relevant text snippets from a web page for a given query.

Signature
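
A sketch based on the tables below; the method name get_page_snippets is an assumption:

async def get_page_snippets(
    self,
    request: GetWebPageSnippetsRequest,
    stream: bool = False,
    timeout: float = 300.0,
) -> dict | AsyncGenerator[dict, None]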

Parameters

Parameter   Type                        Required   Default   Description
request     GetWebPageSnippetsRequest   Yes        -         Snippets request object
stream      bool                        No         False     Enable streaming response
timeout     float                       No         300.0     Request timeout in seconds

Request Fields

GetWebPageSnippetsRequest:

  • query (str): Query to search for within the page

  • source (AnyHttpUrl): URL of the web page to extract from

  • size (int): Maximum number of snippets (1-50)

  • json_schema (dict[str, Any] | None, optional): Optional JSON schema for custom extraction. If provided, uses Firecrawl extract API instead of standard snippet extraction.

  • snippet_style (Literal["paragraph", "sentence"]): Style of snippet (default: "paragraph")

Returns

  • Non-streaming: dict with snippets

  • Streaming: AsyncGenerator[dict, None] yielding snippet chunks

Response Structure

Examples

The following variants are shown together in the sketch below:

  • Extract snippets

  • Extract sentence-style snippets

  • Extract with JSON schema

  • Using Pydantic models (recommended)
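
A hedged sketch; the method name get_page_snippets is an assumption and the client is assumed authenticated:

from smart_search_sdk.web.models import GetWebPageSnippetsRequest

# Paragraph-style snippets (the default)
request = GetWebPageSnippetsRequest(
    query="error handling",
    source="https://docs.python.org/3/tutorial/errors.html",
    size=5,
)
response = await client.get_page_snippets(request)  # assumed method name

# Sentence-style snippets for shorter, more precise excerpts
request = GetWebPageSnippetsRequest(
    query="error handling",
    source="https://docs.python.org/3/tutorial/errors.html",
    size=5,
    snippet_style="sentence",
)

# Custom extraction via JSON schema (routes through the Firecrawl extract API);
# a Pydantic model's model_json_schema() output also works here
request = GetWebPageSnippetsRequest(
    query="error handling",
    source="https://docs.python.org/3/tutorial/errors.html",
    size=5,
    json_schema={"type": "object", "properties": {"summary": {"type": "string"}}},
)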


Web - Get Web Page Keypoints

Fetch a web page and extract concise key points from it.

Signature
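
A sketch based on the tables below; the method name get_page_keypoints is an assumption:

async def get_page_keypoints(
    self,
    request: GetWebPageKeypointsRequest,
    stream: bool = False,
    timeout: float = 300.0,
) -> dict | AsyncGenerator[dict, None]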

Parameters

Parameter   Type                         Required   Default   Description
request     GetWebPageKeypointsRequest   Yes        -         Keypoints request object
stream      bool                         No         False     Enable streaming response
timeout     float                        No         300.0     Request timeout in seconds

Request Fields

GetWebPageKeypointsRequest:

  • query (str): Query to search for

  • source (AnyHttpUrl): URL of the web page to extract from

  • size (int): Maximum number of keypoints (1-50)

  • json_schema (dict[str, Any] | None, optional): Optional JSON schema for custom extraction. If provided, uses Firecrawl extract API.

Returns

  • Non-streaming: dict with keypoints

  • Streaming: AsyncGenerator[dict, None] yielding keypoint chunks

Response Structure

Examples

The following variants are shown together in the sketch below:

  • Extract keypoints

  • Extract keypoints with JSON schema

  • Using Pydantic models (recommended)
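
A hedged sketch; the method name get_page_keypoints is an assumption and the client is assumed authenticated:

from smart_search_sdk.web.models import GetWebPageKeypointsRequest

# Extract up to five keypoints
request = GetWebPageKeypointsRequest(
    query="performance tips",
    source="https://docs.python.org/3/howto/perf_profiling.html",
    size=5,
)
response = await client.get_page_keypoints(request)  # assumed method name

# Custom extraction via JSON schema; a Pydantic model's
# model_json_schema() output also works here
request = GetWebPageKeypointsRequest(
    query="performance tips",
    source="https://docs.python.org/3/howto/perf_profiling.html",
    size=5,
    json_schema={
        "type": "object",
        "properties": {"points": {"type": "array", "items": {"type": "string"}}},
    },
)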


Models

  • GetWebSearchResultsRequest

  • GetWebSearchUrlsRequest

  • GetWebPageRequest

  • GetWebPageSnippetsRequest

  • GetWebPageKeypointsRequest

  • GetWebSearchMapRequest

  • GetWebSearchMapResponse


Complete Examples

Multi-Step Web Research
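
A hedged sketch of a search-then-fetch pipeline; fetch_page is an assumed method name and the client is assumed authenticated:

from smart_search_sdk.web.models import GetWebPageRequest, GetWebSearchResultsRequest

async def research(topic: str) -> list[dict]:
    # Step 1: find relevant pages
    search_request = GetWebSearchResultsRequest(query=topic, result_type="snippets", size=5)
    results = await client.search_web(search_request)

    # Step 2: fetch each source page in full
    pages = []
    for item in results["data"]:
        page_request = GetWebPageRequest(source=item["metadata"]["source"])
        pages.append(await client.fetch_page(page_request))  # assumed method name
    return pages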

Parallel URL Processing
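
A hedged sketch using asyncio.gather to fetch several pages concurrently (see best practice 6 below); fetch_page is an assumed method name:

import asyncio

from smart_search_sdk.web.models import GetWebPageRequest

async def fetch_all(urls: list[str]) -> list[dict]:
    # Schedule all fetches at once instead of awaiting them one by one
    tasks = [client.fetch_page(GetWebPageRequest(source=url)) for url in urls]
    return await asyncio.gather(*tasks)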

Streaming Large Result Sets
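
A hedged sketch; the streaming call shape is an assumption based on the documented return type:

from smart_search_sdk.web.models import GetWebSearchResultsRequest

# Request the maximum result size and handle chunks as they arrive
request = GetWebSearchResultsRequest(query="machine learning", result_type="snippets", size=50)
stream = await client.search_web(request, stream=True)
async for chunk in stream:
    handle_chunk(chunk)  # hypothetical handler; replace with your own logic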


Error Handling
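
This page does not document the SDK's exception types, so the sketch below catches broadly; narrow the except clause once you know the concrete exceptions:

from smart_search_sdk.web.models import GetWebSearchResultsRequest

request = GetWebSearchResultsRequest(query="Python tutorials", result_type="snippets", size=5)
try:
    response = await client.search_web(request, timeout=60.0)
except Exception as exc:  # replace with the SDK's specific exception types
    print(f"Search failed: {exc}")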


Best Practices

  1. Authentication: Always authenticate before making requests

  2. Timeout Configuration: Set timeouts to match the operation; note the defaults differ (300.0 seconds for search, snippet, and keypoint calls; 30.0 for mapping, URL search, and page fetch)

  3. Streaming for Large Results: Use streaming for large result sets so chunks can be processed as they arrive instead of waiting for the full response

  4. Error Handling: Always wrap requests in try-except blocks

  5. Resource Cleanup: Use async context managers when available

  6. Batch Processing: Process multiple URLs concurrently

  7. Size Optimization: Request only what you need

  8. Choose Appropriate Result Types:

    • Use snippets for detailed text excerpts

    • Use keypoints for concise summaries

  9. Snippet Style Selection:

    • Use paragraph for contextual information

    • Use sentence for precise, concise answers

