Web Search Client

The WebSearchClient class provides a Python interface for web searching, page fetching, and content extraction.

Overview

The Web Search Client allows developers to:

  • Search the web for relevant documents and pages

  • Retrieve URLs matching search queries

  • Fetch and parse web pages

  • Extract snippets and keypoints from web content

  • Handle both streaming and non-streaming responses

Installation

pip install smart-search-sdk

Quick Start

import asyncio
import os
from dotenv import load_dotenv

from smart_search_sdk.web.client import WebSearchClient
from smart_search_sdk.web.models import GetWebSearchResultsRequest

load_dotenv()

async def main():
    # Initialize the client
    client = WebSearchClient(base_url=os.getenv("SMARTSEARCH_BASE_URL"))

    # Authenticate
    await client.authenticate(token=os.getenv("SMARTSEARCH_TOKEN"))

    # Search the web
    request = GetWebSearchResultsRequest(
        query="Python tutorials",
        result_type="snippets",
        size=5
    )

    response = await client.search_web(request, stream=False)

    for item in response["data"]:
        print(f"Title: {item['metadata']['title']}")
        print(f"URL: {item['metadata']['source']}")
        print(f"Content: {item['content'][:100]}...")
        print("---")

asyncio.run(main())

Class: WebSearchClient

Constructor

Parameters:

  • base_url (str): The base URL of the Smart Search API

Example:
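
As in the Quick Start, with the base URL taken from the environment:

import os

from smart_search_sdk.web.client import WebSearchClient

client = WebSearchClient(base_url=os.getenv("SMARTSEARCH_BASE_URL"))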


Methods

Web - Search

Search the web for documents or pages relevant to the input query.

Signature
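
A sketch inferred from the Quick Start call and the parameter table below; the return annotation reflects the documented return types:

async def search_web(
    self,
    request: GetWebSearchResultsRequest,
    stream: bool = False,
    timeout: float = 300.0,
) -> dict | AsyncGenerator[dict, None]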

Parameters

Parameter   Type                         Required   Default   Description
request     GetWebSearchResultsRequest   Yes        -         Search request object
stream      bool                         No         False     Enable streaming response
timeout     float                        No         300.0     Request timeout in seconds

Request Fields

GetWebSearchResultsRequest:

  • query (str): The search query string

  • result_type (Literal["snippets", "keypoints", "summary", "description"]): Type of results to return

  • size (int): Maximum number of results (1-50)

  • site (list[AnyHttpUrl] | None, optional): List of URLs that limits search results to specific sites or domains. A single URL string is also accepted and is converted to a list.

  • engine (WebSearchEngine, optional): Search engine to use (auto, firecrawl, or perplexity). Defaults to auto, which uses the service's default engine.

Returns

  • Non-streaming: dict with search results

  • Streaming: AsyncGenerator[dict, None] yielding response chunks

Response Structure
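
Based on the fields read in the Quick Start, a non-streaming response is a dict shaped roughly like this (other fields may be present):

{
  "data": [
    {
      "content": "Snippet or keypoint text...",
      "metadata": {
        "title": "Page title",
        "source": "https://example.com/page"
      }
    }
  ]
}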

Examples

The following variants are shown together in the sketch below:

  • Basic search

  • Search with keypoints

  • Search with site filter

  • Search with multiple sites

  • Search with specific engine

  • Streaming search
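
A hedged sketch of these variants; run it inside an async function with an authenticated client, as in Quick Start. The streaming call shape is an assumption based on the documented return type:

from smart_search_sdk.web.models import GetWebSearchResultsRequest

# Basic search: five snippet results
request = GetWebSearchResultsRequest(query="Python tutorials", result_type="snippets", size=5)
response = await client.search_web(request)

# Keypoints instead of snippets
request = GetWebSearchResultsRequest(query="Python tutorials", result_type="keypoints", size=5)

# Site filter: a single URL string is converted to a list
request = GetWebSearchResultsRequest(
    query="asyncio", result_type="snippets", size=5,
    site="https://docs.python.org",
)

# Multiple sites
request = GetWebSearchResultsRequest(
    query="asyncio", result_type="snippets", size=5,
    site=["https://docs.python.org", "https://realpython.com"],
)

# Specific engine (string value assumed to coerce to WebSearchEngine)
request = GetWebSearchResultsRequest(
    query="asyncio", result_type="snippets", size=5,
    engine="firecrawl",
)

# Streaming: consume chunks as they arrive
stream = await client.search_web(request, stream=True)
async for chunk in stream:
    print(chunk)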


Web - Search Map

Map a website and discover its URL structure and hierarchy.

Signature
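
A sketch based on the tables below; the method name search_map is an assumption (this page does not show it):

async def search_map(
    self,
    request: GetWebSearchMapRequest,
    stream: bool = False,
    timeout: float = 30.0,
) -> dict | AsyncGenerator[dict, None]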

Parameters

Parameter   Type                     Required   Default   Description
request     GetWebSearchMapRequest   Yes        -         Website map request object
stream      bool                     No         False     Enable streaming response
timeout     float                    No         30.0      Request timeout in seconds

Request Fields

GetWebSearchMapRequest:

  • base_url (AnyHttpUrl): The base URL of the website to map

  • page (int): Page number for pagination (default: 1)

  • size (int): Maximum number of URLs to return (1-1000, default: 20)

  • return_all_map (bool): Whether to return all mapped links (default: False)

  • include_subdomains (bool): Whether to include subdomains in the mapping (default: False)

  • query (str, optional): Optional search query to filter URLs by keywords

Returns

  • Non-streaming: dict with mapped URLs

  • Streaming: AsyncGenerator[dict, None] yielding URL chunks

Response Structure

Examples

The following variants are shown together in the sketch below:

  • Basic website mapping

  • Map with keyword filtering

  • Streaming website map

  • Return all mapped links

  • Pagination
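
A hedged sketch; the method name search_map is an assumption, the client is assumed authenticated, and the streaming call shape follows the documented return type:

from smart_search_sdk.web.models import GetWebSearchMapRequest

# Basic mapping: first 20 URLs of the site
request = GetWebSearchMapRequest(base_url="https://docs.python.org")
response = await client.search_map(request)  # assumed method name

# Filter mapped URLs by keyword
request = GetWebSearchMapRequest(base_url="https://docs.python.org", query="asyncio")

# Streaming map: consume URL chunks as they arrive
stream = await client.search_map(request, stream=True)
async for chunk in stream:
    print(chunk)

# Return every mapped link, including subdomains
request = GetWebSearchMapRequest(
    base_url="https://python.org",
    return_all_map=True,
    include_subdomains=True,
)

# Pagination: second page of 50 URLs
request = GetWebSearchMapRequest(base_url="https://docs.python.org", page=2, size=50)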


Web - Search URLs

Search the web for pages and return their URLs (like a search engine).

Signature
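
A sketch based on the tables below; the method name search_urls is an assumption:

async def search_urls(
    self,
    request: GetWebSearchUrlsRequest,
    stream: bool = False,
    timeout: float = 30.0,
) -> dict | AsyncGenerator[dict, None]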

Parameters

Parameter   Type                      Required   Default   Description
request     GetWebSearchUrlsRequest   Yes        -         URL search request object
stream      bool                      No         False     Enable streaming response
timeout     float                     No         30.0      Request timeout in seconds

Request Fields

GetWebSearchUrlsRequest:

  • query (str): The search query string

  • size (int): Maximum number of URLs (1-50)

  • site (list[AnyHttpUrl] | None, optional): List of URLs that limits search results to specific sites or domains. A single URL string is also accepted and is converted to a list.

  • engine (WebSearchEngine, optional): Search engine to use (auto, firecrawl, or perplexity). Defaults to auto, which uses the service's default engine.

Returns

  • Non-streaming: dict with URL list

  • Streaming: AsyncGenerator[dict, None] yielding URL chunks

Response Structure

Examples

The following variants are shown together in the sketch below:

  • Get URLs

  • Get URLs with site filter

  • Search multiple sites for URLs
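
A hedged sketch; the method name search_urls is an assumption and the client is assumed authenticated:

from smart_search_sdk.web.models import GetWebSearchUrlsRequest

# Get URLs for a query
request = GetWebSearchUrlsRequest(query="Python tutorials", size=10)
response = await client.search_urls(request)  # assumed method name

# Restrict to one site (a bare string is converted to a list)
request = GetWebSearchUrlsRequest(query="asyncio", size=10, site="https://docs.python.org")

# Search several sites at once
request = GetWebSearchUrlsRequest(
    query="asyncio", size=10,
    site=["https://docs.python.org", "https://realpython.com"],
)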


Web - Fetch Page

Fetch a single web page by URL and return its content and metadata.

Signature
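
A sketch based on the tables below; the method name fetch_page is an assumption:

async def fetch_page(
    self,
    request: GetWebPageRequest,
    stream: bool = False,
    timeout: float = 30.0,
) -> dict | AsyncGenerator[dict, None]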

Parameters

Parameter   Type                Required   Default   Description
request     GetWebPageRequest   Yes        -         Page fetch request object
stream      bool                No         False     Enable streaming response
timeout     float               No         30.0      Request timeout in seconds

Request Fields

GetWebPageRequest:

  • source (AnyHttpUrl): The URL of the web page to fetch

  • json_schema (dict[str, Any] | None, optional): JSON schema for custom structured data extraction. When provided, uses Firecrawl extract API to extract data matching the schema.

  • return_html (bool): Whether to return full HTML content (default: False)

Returns

  • Non-streaming: dict with page content and metadata

  • Streaming: AsyncGenerator[dict, None] yielding page chunks

Response Structure

Examples

The following variants are shown together in the sketch below:

  • Fetch page

  • Fetch with HTML

  • Fetch with JSON schema extraction

  • Using Pydantic models (recommended)
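
A hedged sketch; the method name fetch_page is an assumption, and the Pydantic variant assumes the SDK accepts a plain JSON schema dict, produced here with Pydantic v2's model_json_schema():

from pydantic import BaseModel

from smart_search_sdk.web.models import GetWebPageRequest

# Fetch a page
request = GetWebPageRequest(source="https://example.com/article")
response = await client.fetch_page(request)  # assumed method name

# Include the full HTML alongside the parsed content
request = GetWebPageRequest(source="https://example.com/article", return_html=True)

# Structured extraction with a hand-written JSON schema
request = GetWebPageRequest(
    source="https://example.com/article",
    json_schema={
        "type": "object",
        "properties": {"title": {"type": "string"}, "author": {"type": "string"}},
    },
)

# Recommended: derive the schema from a Pydantic model
class Article(BaseModel):
    title: str
    author: str

request = GetWebPageRequest(
    source="https://example.com/article",
    json_schema=Article.model_json_schema(),  # Pydantic v2
)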


Web - Get Web Page Snippets

Extract relevant text snippets from a web page for a given query.

Signature
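
A sketch based on the tables below; the method name get_page_snippets is an assumption:

async def get_page_snippets(
    self,
    request: GetWebPageSnippetsRequest,
    stream: bool = False,
    timeout: float = 300.0,
) -> dict | AsyncGenerator[dict, None]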

Parameters

Parameter   Type                        Required   Default   Description
request     GetWebPageSnippetsRequest   Yes        -         Snippets request object
stream      bool                        No         False     Enable streaming response
timeout     float                       No         300.0     Request timeout in seconds

Request Fields

GetWebPageSnippetsRequest:

  • query (str): Query to search for within the page

  • source (AnyHttpUrl): URL of the web page to extract from

  • size (int): Maximum number of snippets (1-50)

  • json_schema (dict[str, Any] | None, optional): Optional JSON schema for custom extraction. If provided, uses Firecrawl extract API instead of standard snippet extraction.

  • snippet_style (Literal["paragraph", "sentence"]): Style of snippet (default: "paragraph")

Returns

  • Non-streaming: dict with snippets

  • Streaming: AsyncGenerator[dict, None] yielding snippet chunks

Response Structure

Examples

The following variants are shown together in the sketch below:

  • Extract snippets

  • Extract sentence-style snippets

  • Extract with JSON schema

  • Using Pydantic models (recommended)
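
A hedged sketch; the method name get_page_snippets is an assumption and the client is assumed authenticated:

from smart_search_sdk.web.models import GetWebPageSnippetsRequest

# Paragraph-style snippets (the default)
request = GetWebPageSnippetsRequest(
    query="error handling",
    source="https://docs.python.org/3/tutorial/errors.html",
    size=5,
)
response = await client.get_page_snippets(request)  # assumed method name

# Sentence-style snippets for shorter, more precise excerpts
request = GetWebPageSnippetsRequest(
    query="error handling",
    source="https://docs.python.org/3/tutorial/errors.html",
    size=5,
    snippet_style="sentence",
)

# Custom extraction via JSON schema (routes through the Firecrawl extract API);
# a Pydantic model's model_json_schema() output also works here
request = GetWebPageSnippetsRequest(
    query="error handling",
    source="https://docs.python.org/3/tutorial/errors.html",
    size=5,
    json_schema={"type": "object", "properties": {"summary": {"type": "string"}}},
)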


Web - Get Web Page Keypoints

Fetch a web page and extract concise key points from it.

Signature
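
A sketch based on the tables below; the method name get_page_keypoints is an assumption:

async def get_page_keypoints(
    self,
    request: GetWebPageKeypointsRequest,
    stream: bool = False,
    timeout: float = 300.0,
) -> dict | AsyncGenerator[dict, None]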

Parameters

Parameter   Type                         Required   Default   Description
request     GetWebPageKeypointsRequest   Yes        -         Keypoints request object
stream      bool                         No         False     Enable streaming response
timeout     float                        No         300.0     Request timeout in seconds

Request Fields

GetWebPageKeypointsRequest:

  • query (str): Query to search for

  • source (AnyHttpUrl): URL of the web page to extract from

  • size (int): Maximum number of keypoints (1-50)

  • json_schema (dict[str, Any] | None, optional): Optional JSON schema for custom extraction. If provided, uses Firecrawl extract API.

Returns

  • Non-streaming: dict with keypoints

  • Streaming: AsyncGenerator[dict, None] yielding keypoint chunks

Response Structure

Examples

The following variants are shown together in the sketch below:

  • Extract keypoints

  • Extract keypoints with JSON schema

  • Using Pydantic models (recommended)
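
A hedged sketch; the method name get_page_keypoints is an assumption and the client is assumed authenticated:

from smart_search_sdk.web.models import GetWebPageKeypointsRequest

# Extract up to five keypoints
request = GetWebPageKeypointsRequest(
    query="performance tips",
    source="https://docs.python.org/3/howto/perf_profiling.html",
    size=5,
)
response = await client.get_page_keypoints(request)  # assumed method name

# Custom extraction via JSON schema; a Pydantic model's
# model_json_schema() output also works here
request = GetWebPageKeypointsRequest(
    query="performance tips",
    source="https://docs.python.org/3/howto/perf_profiling.html",
    size=5,
    json_schema={
        "type": "object",
        "properties": {"points": {"type": "array", "items": {"type": "string"}}},
    },
)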


Models

  • GetWebSearchResultsRequest

  • GetWebSearchUrlsRequest

  • GetWebPageRequest

  • GetWebPageSnippetsRequest

  • GetWebPageKeypointsRequest

  • GetWebSearchMapRequest

  • GetWebSearchMapResponse


Complete Examples

Multi-Step Web Research
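
A hedged sketch of a search-then-fetch pipeline; fetch_page is an assumed method name and the client is assumed authenticated:

from smart_search_sdk.web.models import GetWebPageRequest, GetWebSearchResultsRequest

async def research(topic: str) -> list[dict]:
    # Step 1: find relevant pages
    search_request = GetWebSearchResultsRequest(query=topic, result_type="snippets", size=5)
    results = await client.search_web(search_request)

    # Step 2: fetch each source page in full
    pages = []
    for item in results["data"]:
        page_request = GetWebPageRequest(source=item["metadata"]["source"])
        pages.append(await client.fetch_page(page_request))  # assumed method name
    return pages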

Parallel URL Processing
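
A hedged sketch using asyncio.gather to fetch several pages concurrently (see best practice 6 below); fetch_page is an assumed method name:

import asyncio

from smart_search_sdk.web.models import GetWebPageRequest

async def fetch_all(urls: list[str]) -> list[dict]:
    # Schedule all fetches at once instead of awaiting them one by one
    tasks = [client.fetch_page(GetWebPageRequest(source=url)) for url in urls]
    return await asyncio.gather(*tasks)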

Streaming Large Result Sets
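
A hedged sketch; the streaming call shape is an assumption based on the documented return type:

from smart_search_sdk.web.models import GetWebSearchResultsRequest

# Request the maximum result size and handle chunks as they arrive
request = GetWebSearchResultsRequest(query="machine learning", result_type="snippets", size=50)
stream = await client.search_web(request, stream=True)
async for chunk in stream:
    handle_chunk(chunk)  # hypothetical handler; replace with your own logic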


Error Handling
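
This page does not document the SDK's exception types, so the sketch below catches broadly; narrow the except clause once you know the concrete exceptions:

from smart_search_sdk.web.models import GetWebSearchResultsRequest

request = GetWebSearchResultsRequest(query="Python tutorials", result_type="snippets", size=5)
try:
    response = await client.search_web(request, timeout=60.0)
except Exception as exc:  # replace with the SDK's specific exception types
    print(f"Search failed: {exc}")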


Best Practices

  1. Authentication: Always authenticate before making requests

  2. Timeout Configuration: Set timeouts to match the operation; note the defaults differ (300.0 seconds for search, snippet, and keypoint calls; 30.0 for mapping, URL search, and page fetch)

  3. Streaming for Large Results: Use streaming for large result sets so chunks can be processed as they arrive instead of waiting for the full response

  4. Error Handling: Always wrap requests in try-except blocks

  5. Resource Cleanup: Use async context managers when available

  6. Batch Processing: Process multiple URLs concurrently

  7. Size Optimization: Request only what you need

  8. Choose Appropriate Result Types:

    • Use snippets for detailed text excerpts

    • Use keypoints for concise summaries

  9. Snippet Style Selection:

    • Use paragraph for contextual information

    • Use sentence for precise, concise answers

