Web Search Client
The WebSearchClient class provides a Python interface for web searching, page fetching, and content extraction.
Overview
The Web Search Client allows developers to:
Search the web for relevant documents and pages
Retrieve URLs matching search queries
Fetch and parse web pages
Extract snippets and keypoints from web content
Handle both streaming and non-streaming responses
Installation
```bash
pip install smart-search-sdk
```

Quick Start
```python
import asyncio
import os

from dotenv import load_dotenv

from smart_search_sdk.web.client import WebSearchClient
from smart_search_sdk.web.models import GetWebSearchResultsRequest

load_dotenv()


async def main():
    # Initialize the client
    client = WebSearchClient(base_url=os.getenv("SMARTSEARCH_BASE_URL"))

    # Authenticate
    await client.authenticate(token=os.getenv("SMARTSEARCH_TOKEN"))

    # Search the web
    request = GetWebSearchResultsRequest(
        query="Python tutorials",
        result_type="snippets",
        size=5
    )
    response = await client.search_web(request, stream=False)

    for item in response["data"]:
        print(f"Title: {item['metadata']['title']}")
        print(f"URL: {item['metadata']['source']}")
        print(f"Content: {item['content'][:100]}...")
        print("---")


asyncio.run(main())
```

Class: WebSearchClient
Constructor
Parameters:
`base_url` (str): The base URL of the Smart Search API
Example:
Methods
Web - Search
Search the web for documents or pages relevant to the input query.
Signature
Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| request | GetWebSearchResultsRequest | Yes | - | Search request object |
| stream | bool | No | False | Enable streaming response |
| timeout | float | No | 300.0 | Request timeout in seconds |
Request Fields
GetWebSearchResultsRequest:
- `query` (str): The search query string
- `result_type` (Literal["snippets", "keypoints", "summary", "description"]): Type of results to return
- `size` (int): Maximum number of results (1-50)
- `site` (list[AnyHttpUrl] | None, optional): List of URLs to limit search results to specific sites or domains. A single URL string is also accepted and converted to a list.
- `engine` (WebSearchEngine, optional): Search engine to use (auto, firecrawl, perplexity). Defaults to auto, which uses the default engine.
Returns
Non-streaming: `dict` with search results
Streaming: `AsyncGenerator[dict, None]` yielding response chunks
Response Structure
Examples
Basic search:
Search with keypoints:
Search with site filter:
Search with multiple sites:
Search with specific engine:
Streaming search:
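A minimal sketch of handling a non-streaming search response, assuming the same `data`/`content`/`metadata` shape shown in the Quick Start (the sample payload below is illustrative, not real API output):

```python
# Hypothetical non-streaming response, shaped like the Quick Start example:
# a "data" list of items, each with "content" and "metadata".
response = {
    "data": [
        {
            "content": "Python is a general-purpose programming language...",
            "metadata": {
                "title": "Python Tutorial",
                "source": "https://docs.python.org/3/tutorial/",
            },
        },
        {
            "content": "Learn Python step by step...",
            "metadata": {
                "title": "Learn Python",
                "source": "https://www.learnpython.org/",
            },
        },
    ]
}


def summarize_results(response: dict) -> list[str]:
    """Format each result as 'title - source' for quick display."""
    return [
        f"{item['metadata']['title']} - {item['metadata']['source']}"
        for item in response["data"]
    ]


for line in summarize_results(response):
    print(line)
```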
Web - Search Map
Map a website and discover its URL structure and hierarchy.
Signature
Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| request | GetWebSearchMapRequest | Yes | - | Website map request object |
| stream | bool | No | False | Enable streaming response |
| timeout | float | No | 30.0 | Request timeout in seconds |
Request Fields
GetWebSearchMapRequest:
- `base_url` (AnyHttpUrl): The base URL of the website to map
- `page` (int): Page number for pagination (default: 1)
- `size` (int): Maximum number of URLs to return (1-1000, default: 20)
- `return_all_map` (bool): Whether to return all mapped links (default: False)
- `include_subdomains` (bool): Whether to include subdomains in the mapping (default: False)
- `query` (str, optional): Optional search query to filter URLs by keywords
Returns
Non-streaming: `dict` with mapped URLs
Streaming: `AsyncGenerator[dict, None]` yielding URL chunks
Response Structure
Examples
Basic website mapping:
Map with keyword filtering:
Streaming website map:
Return all mapped links:
Pagination:
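The `page` and `size` fields follow standard 1-indexed pagination. The API paginates server-side; this client-side sketch only illustrates the slicing behavior those fields imply:

```python
def paginate(urls: list[str], page: int = 1, size: int = 20) -> list[str]:
    """Return the slice of URLs for a 1-indexed page of the given size."""
    start = (page - 1) * size
    return urls[start:start + size]


# 45 mapped URLs split into pages of 20: pages 1 and 2 are full,
# page 3 holds the remaining 5, page 4 is empty.
mapped = [f"https://example.com/docs/page-{i}" for i in range(45)]
print(len(paginate(mapped, page=3, size=20)))
```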
Web - Search URLs
Search the web for pages and return their URLs (like a search engine).
Signature
Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| request | GetWebSearchUrlsRequest | Yes | - | URL search request object |
| stream | bool | No | False | Enable streaming response |
| timeout | float | No | 30.0 | Request timeout in seconds |
Request Fields
GetWebSearchUrlsRequest:
- `query` (str): The search query string
- `size` (int): Maximum number of URLs (1-50)
- `site` (list[AnyHttpUrl] | None, optional): List of URLs to limit search results to specific sites or domains. A single URL string is also accepted and converted to a list.
- `engine` (WebSearchEngine, optional): Search engine to use (auto, firecrawl, perplexity). Defaults to auto, which uses the default engine.
Returns
Non-streaming: `dict` with URL list
Streaming: `AsyncGenerator[dict, None]` yielding URL chunks
Response Structure
Examples
Get URLs:
Get URLs with site filter:
Search multiple sites for URLs:
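The `site` field restricts results server-side. For intuition, a client-side equivalent that keeps only URLs on the listed hosts (or their subdomains) might look like this sketch:

```python
from urllib.parse import urlparse


def filter_by_sites(urls: list[str], sites: list[str]) -> list[str]:
    """Keep only URLs whose host matches, or is a subdomain of, an allowed site."""
    allowed = [urlparse(s).netloc or s for s in sites]
    return [
        u for u in urls
        if any(
            urlparse(u).netloc == host or urlparse(u).netloc.endswith("." + host)
            for host in allowed
        )
    ]


urls = [
    "https://docs.python.org/3/tutorial/",
    "https://stackoverflow.com/questions/1",
    "https://pypi.org/project/requests/",
]
print(filter_by_sites(urls, ["https://docs.python.org"]))
```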
Web - Fetch Page
Fetch a single web page by URL and return its content and metadata.
Signature
Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| request | GetWebPageRequest | Yes | - | Page fetch request object |
| stream | bool | No | False | Enable streaming response |
| timeout | float | No | 30.0 | Request timeout in seconds |
Request Fields
GetWebPageRequest:
- `source` (AnyHttpUrl): The URL of the web page to fetch
- `json_schema` (dict[str, Any] | None, optional): JSON schema for custom structured data extraction. When provided, uses the Firecrawl extract API to extract data matching the schema.
- `return_html` (bool): Whether to return full HTML content (default: False)
Returns
Non-streaming: `dict` with page content and metadata
Streaming: `AsyncGenerator[dict, None]` yielding page chunks
Response Structure
Examples
Fetch page:
Fetch with HTML:
Fetch with JSON schema extraction:
Using Pydantic Models (Recommended):
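Since `json_schema` is a plain dict, one convenient way to build it is from a Pydantic model via `model_json_schema()` (Pydantic v2). A sketch, where `Article` is a hypothetical extraction target, not part of the SDK:

```python
from pydantic import BaseModel


class Article(BaseModel):
    """Hypothetical target shape for structured extraction."""
    title: str
    author: str
    word_count: int


# model_json_schema() produces the JSON schema dict that the
# `json_schema` request field expects.
schema = Article.model_json_schema()
print(sorted(schema["properties"]))
```

The resulting `schema` can then be passed as `json_schema=schema` when building the `GetWebPageRequest`.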
Web - Get Web Page Snippets
Extract relevant text snippets from a web page for a given query.
Signature
Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| request | GetWebPageSnippetsRequest | Yes | - | Snippets request object |
| stream | bool | No | False | Enable streaming response |
| timeout | float | No | 300.0 | Request timeout in seconds |
Request Fields
GetWebPageSnippetsRequest:
- `query` (str): Query to search for within the page
- `source` (AnyHttpUrl): URL of the web page to extract from
- `size` (int): Maximum number of snippets (1-50)
- `json_schema` (dict[str, Any] | None, optional): Optional JSON schema for custom extraction. If provided, uses the Firecrawl extract API instead of standard snippet extraction.
- `snippet_style` (Literal["paragraph", "sentence"]): Style of snippet (default: "paragraph")
Returns
Non-streaming: `dict` with snippets
Streaming: `AsyncGenerator[dict, None]` yielding snippet chunks
Response Structure
Examples
Extract snippets:
Extract sentence-style snippets:
Extract with JSON schema:
Using Pydantic Models (Recommended):
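Extraction happens server-side; this purely illustrative client-side sketch shows the difference the two `snippet_style` values describe, splitting on blank lines versus sentence boundaries:

```python
import re


def to_snippets(text: str, style: str = "paragraph", size: int = 5) -> list[str]:
    """Illustrative analogue of snippet_style: 'paragraph' splits on blank
    lines, 'sentence' splits on sentence-ending punctuation."""
    if style == "sentence":
        parts = re.split(r"(?<=[.!?])\s+", text.strip())
    else:
        parts = [p.strip() for p in text.split("\n\n")]
    return [p for p in parts if p][:size]


text = "Python is popular. It has a large ecosystem.\n\nIt is easy to learn."
print(to_snippets(text, style="sentence", size=2))
print(to_snippets(text, style="paragraph"))
```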
Web - Get Web Page Keypoints
Fetch a web page and extract concise key points from it.
Signature
Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| request | GetWebPageKeypointsRequest | Yes | - | Keypoints request object |
| stream | bool | No | False | Enable streaming response |
| timeout | float | No | 300.0 | Request timeout in seconds |
Request Fields
GetWebPageKeypointsRequest:
- `query` (str): Query to search for
- `source` (AnyHttpUrl): URL of the web page to extract from
- `size` (int): Maximum number of keypoints (1-50)
- `json_schema` (dict[str, Any] | None, optional): Optional JSON schema for custom extraction. If provided, uses the Firecrawl extract API.
Returns
Non-streaming: `dict` with keypoints
Streaming: `AsyncGenerator[dict, None]` yielding keypoint chunks
Response Structure
Examples
Extract keypoints:
Extract keypoints with JSON schema:
Using Pydantic Models (Recommended):
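A sketch of turning a keypoints response into a bulleted summary, assuming the same `data` list of items with a `content` field as the search response in the Quick Start (the payload below is illustrative):

```python
# Hypothetical keypoints response, assuming the "data" list shape
# used by the search response in the Quick Start.
response = {
    "data": [
        {"content": "Python 3.13 adds an experimental JIT."},
        {"content": "The GIL can be disabled in free-threaded builds."},
    ]
}


def to_bullets(response: dict) -> str:
    """Join extracted keypoints into a Markdown bullet list."""
    return "\n".join(f"- {item['content']}" for item in response["data"])


print(to_bullets(response))
```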
Models
GetWebSearchResultsRequest
GetWebSearchUrlsRequest
GetWebPageRequest
GetWebPageSnippetsRequest
GetWebPageKeypointsRequest
GetWebSearchMapRequest
GetWebSearchMapResponse
Complete Examples
Multi-Step Web Research
Parallel URL Processing
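Multiple URLs can be processed concurrently with `asyncio.gather`. This sketch uses `fetch_stub` as a hypothetical stand-in for an awaitable SDK call such as fetching a page:

```python
import asyncio


async def fetch_stub(url: str) -> dict:
    """Stand-in for an awaitable SDK page-fetch call."""
    await asyncio.sleep(0.01)
    return {"source": url, "content": f"content of {url}"}


async def fetch_all(urls: list[str]) -> list[dict]:
    # gather runs the fetches concurrently and preserves input order
    return await asyncio.gather(*(fetch_stub(u) for u in urls))


urls = [f"https://example.com/{i}" for i in range(3)]
pages = asyncio.run(fetch_all(urls))
print([p["source"] for p in pages])
```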
Streaming Large Result Sets
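Streaming methods return an `AsyncGenerator[dict, None]`, consumed with `async for`. This sketch substitutes `stream_stub` (hypothetical) for a real `stream=True` call so the consumption pattern stands alone:

```python
import asyncio
from typing import AsyncGenerator


async def stream_stub(n: int) -> AsyncGenerator[dict, None]:
    """Stand-in for a streaming call such as search_web(..., stream=True)."""
    for i in range(n):
        await asyncio.sleep(0)
        yield {"chunk": i}


async def consume() -> list[dict]:
    chunks = []
    # Process each chunk as it arrives instead of buffering the full result
    async for chunk in stream_stub(3):
        chunks.append(chunk)
    return chunks


print(asyncio.run(consume()))
```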
Error Handling
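One common pattern is wrapping requests with a per-attempt timeout and retries. A sketch, where `flaky_request` is a hypothetical stand-in for any awaitable SDK call:

```python
import asyncio


async def flaky_request() -> str:
    """Stand-in for an SDK call that may be slow or fail."""
    await asyncio.sleep(0)
    return "ok"


async def with_retries(coro_factory, attempts: int = 3, timeout: float = 30.0):
    """Retry an awaitable on timeout/connection errors, re-raising on the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(coro_factory(), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError):
            if attempt == attempts:
                raise
            await asyncio.sleep(2 ** attempt)  # exponential backoff


result = asyncio.run(with_retries(flaky_request))
print(result)
```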
Best Practices
Authentication: Always authenticate before making requests
Timeout Configuration: Set appropriate timeouts for different operations
Streaming for Large Results: Use streaming for better performance
Error Handling: Always wrap requests in try-except blocks
Resource Cleanup: Use async context managers when available
Batch Processing: Process multiple URLs concurrently
Size Optimization: Request only what you need
Choose Appropriate Result Types:
- Use `snippets` for detailed text excerpts
- Use `keypoints` for concise summaries

Snippet Style Selection:
- Use `paragraph` for contextual information
- Use `sentence` for precise, concise answers
Related Documentation