Browser Use Agent
Overview
The Browser Use Agent enables AI agents to interact with websites automatically. It can navigate pages, fill forms, extract data, and handle complex scenarios like CAPTCHAs and logins with human assistance when needed.
Execution Flow & Architecture
Process Flow Diagram
Workflow Explanation
Task Submission: You provide a web automation task (e.g., "Search for jobs on LinkedIn")
Browser Session Creation: The agent creates an isolated browser session and provides you with a live browser URL
Step-by-Step Execution: The agent plans and executes actions one by one:
Analyzes the current page
Decides what to do next (click button, fill form, navigate)
Executes the action
Reports progress
Human Assistance: When the agent encounters challenges like CAPTCHAs or login prompts, you can help by opening the live browser URL and completing them manually
Result Delivery: The agent returns the final results (extracted data, completion status, etc.)
Human-in-the-Loop Mechanisms
The Browser Use Agent first tries to get past CAPTCHAs, login prompts, and similar obstacles on its own, and lets you step in through the live browser URL when that is not enough.
How It Handles CAPTCHAs and Logins
Automatic Attempts: The agent tries to handle challenges automatically using its built-in instructions:
CAPTCHAs: Attempts to solve them when possible; uses alternative strategies if blocked
Logins: Only attempts login if credentials are provided or explicitly required
Stuck Situations: Re-evaluates the task and tries different approaches
When You Need to Help: Sometimes the agent needs human assistance for complex challenges:
Agent encounters a challenge (CAPTCHA, login prompt, etc.) during execution
Live browser URL is available - You receive a URL that shows the current browser session in real-time
You open the URL and manually complete the challenge (solve CAPTCHA, enter login credentials, etc.)
Agent continues - The agent proceeds with its next step independently (it doesn't wait for you)
Task continues - When the agent executes its next step, it may detect that you've completed the challenge and continue with the updated page state (this detection is automatic but not guaranteed)
Important Points:
The agent doesn't pause or wait for you - it continues executing steps independently
You can help at any time by opening the live browser URL
The agent may detect your changes when it executes its next step (this is automatic, not guaranteed)
There's no explicit pause/resume - you're helping in parallel with the agent's execution
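The points above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: FakePage, run_agent, and human_solves_at_step are hypothetical names invented for illustration and are not part of the agent's API. It shows the key behavior: the agent never pauses, but it takes a fresh look at the page on every step, which is when a manual fix can be noticed.

```python
from dataclasses import dataclass


@dataclass
class FakePage:
    """Hypothetical stand-in for the shared browser session."""
    captcha_solved: bool = False


def run_agent(page, max_steps, human_solves_at_step=None):
    """Toy agent that never waits; it re-reads the page on every step."""
    log = []
    for step in range(1, max_steps + 1):
        # The human works in parallel through the live browser URL.
        # Here that is simulated by flipping the flag at a chosen step.
        if human_solves_at_step == step:
            page.captcha_solved = True
        # Fresh observation: the only point where manual changes are seen.
        if page.captcha_solved:
            log.append(f"step {step}: captcha cleared, continuing task")
            return log
        log.append(f"step {step}: captcha still present, trying alternative")
    log.append("gave up: step budget exhausted")
    return log


helped = run_agent(FakePage(), max_steps=5, human_solves_at_step=3)
unhelped = run_agent(FakePage(), max_steps=2)
```

In the first run the agent keeps working through steps 1 and 2 and only picks up the solved CAPTCHA at step 3. In the second run nobody helps, so the step budget runs out.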
Example Scenario: Job Board Search
Task: "Search for software engineer jobs on a job board and extract the first 5 listings"
What Happens:
Agent navigates to the job board website
Agent encounters a CAPTCHA during search
You receive a live browser URL
You open the URL and solve the CAPTCHA manually
Agent continues searching and extracts job listings
Agent returns the results
If Login is Required:
If no credentials are provided, the agent skips login (per its instructions)
If login is necessary, you can complete it via the live browser URL
The agent then continues with the task
Recovery Strategies
When the agent gets stuck or encounters errors:
Automatic Retry: The agent tries alternative approaches automatically
Session Recovery: If the browser connection is lost, the agent recreates the session and continues
State Preservation: Your manual changes (like completing a CAPTCHA) are typically preserved in the browser session, and the agent may detect them when it executes its next step
How It Works
Browser Automation Process
Step-by-Step Execution:
The agent analyzes the current webpage
It plans the next action using AI reasoning
It executes the action (click, type, navigate, extract data)
It checks the result and plans the next step
This continues until the task is complete
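The loop described above can be sketched in a few lines. This is a minimal illustration, not the agent's actual implementation: run_task and its observe, plan, and act callables are hypothetical, and the step cap of 20 is an arbitrary value chosen for the sketch.

```python
from typing import Callable, Optional


def run_task(observe: Callable[[], str],
             plan: Callable[[str], Optional[str]],
             act: Callable[[str], None],
             max_steps: int = 20) -> int:
    """Observe-plan-act loop; returns the number of actions executed."""
    steps = 0
    while steps < max_steps:
        page = observe()      # analyze the current page
        action = plan(page)   # decide the next action; None means "done"
        if action is None:
            break
        act(action)           # click, type, navigate, extract...
        steps += 1
    return steps


# Toy usage: three "pages" visited in order; the task ends on "results".
pages = ["home", "search", "results"]
position = {"index": 0}


def observe() -> str:
    return pages[position["index"]]


def plan(page: str) -> Optional[str]:
    return None if page == "results" else f"leave {page}"


def act(action: str) -> None:
    position["index"] += 1


steps_taken = run_task(observe, plan, act)
```

Checking the page again at the top of every iteration is what lets the real agent react to whatever the previous action (or a human) changed.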
Real-Time Updates:
You receive progress updates showing what the agent is doing
You can see the agent's "thinking" process
A live browser URL lets you watch or intervene if needed
Session Recording:
Optionally records a video of the entire browser session
Useful for debugging or reviewing what happened
Available after task completion
Error Handling
Automatic Recovery:
If the browser disconnects, the agent automatically recreates the session
If an action fails, the agent tries alternative approaches
Configurable retry limits prevent infinite loops
Common Issues:
CAPTCHA/Login Blocks: Use the live browser URL to complete manually
Element Not Found: Agent waits, refreshes, or tries alternative selectors
Session Disconnects: Automatic retry with session recreation
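The session-recreation behavior with a retry limit might look roughly like the following. This is a sketch only: SessionLost, run_with_recovery, and the default of three retries are assumptions made for illustration and do not describe the agent's internals.

```python
class SessionLost(Exception):
    """Hypothetical error raised when the browser connection drops."""


def run_with_recovery(action, create_session, max_retries=3):
    """Run `action`, recreating the session after each disconnect.

    Gives up after `max_retries` recreations so that a permanently
    broken connection cannot cause an infinite loop.
    """
    session = create_session()
    for attempt in range(max_retries + 1):
        try:
            return action(session)
        except SessionLost:
            if attempt == max_retries:
                raise  # retry budget exhausted
            session = create_session()


# Toy usage: the first two sessions drop, the third one works.
created = []


def create_session():
    created.append(len(created) + 1)
    return created[-1]


def flaky_action(session):
    if session < 3:
        raise SessionLost()
    return f"done on session {session}"


outcome = run_with_recovery(flaky_action, create_session)
```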
Sample Usage
Basic Web Automation
Using via SDK:
Example Output:
Task Requiring Human Assistance
What to Expect:
Agent starts executing the task
If a CAPTCHA appears, you receive a live browser URL as a status update
Open the URL, solve the CAPTCHA
Agent continues and may detect your changes when it executes its next step
Final results are returned with the job listings
Configuring Timeouts
You can configure timeout settings to match your task requirements:
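As a rough illustration, the three timeout settings documented under Timeouts & Limits can be modeled as below. TimeoutSettings is a hypothetical stand-in, not the real BrowserUseToolConfig class, whose import path and constructor are not shown in this document. Only the three field names and their defaults come from this page; the two sanity checks are assumptions added for the sketch.

```python
from dataclasses import dataclass

# 15-minute session cap listed under the Steel Hobby plan limits.
HOBBY_MAX_SESSION_MS = 15 * 60 * 1000


@dataclass
class TimeoutSettings:
    """Hypothetical mirror of the BrowserUseToolConfig timeout fields."""
    steel_timeout_in_ms: int = 600_000        # 600 seconds (10 minutes)
    browser_use_llm_timeout_in_s: int = 60    # per LLM planning call
    browser_use_step_timeout_in_s: int = 180  # per agent step (3 minutes)

    def warnings(self) -> list:
        """Return human-readable warnings for suspicious combinations."""
        found = []
        if self.steel_timeout_in_ms > HOBBY_MAX_SESSION_MS:
            found.append("steel_timeout_in_ms exceeds the 15-minute Hobby session cap")
        # Assumption: a step includes its LLM call, so the LLM timeout
        # should not be longer than the step timeout.
        if self.browser_use_llm_timeout_in_s > self.browser_use_step_timeout_in_s:
            found.append("LLM timeout is longer than the step timeout")
        return found


defaults = TimeoutSettings()
long_task = TimeoutSettings(steel_timeout_in_ms=20 * 60 * 1000)
```

Note the unit difference: the Steel session timeout is expressed in milliseconds, while the two agent timeouts are in seconds.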
Capabilities & Limitations
Known Capabilities
The Browser Use Agent excels at a wide range of web automation tasks:
Web Navigation & Form Filling
Use Case: Automatically fill out contact forms, registration pages, or search forms
Example Task: "Go to https://duckduckgo.com, search for 'Python web automation', and extract the titles of the first 5 search results"
Data Extraction & Collection
Use Case: Gather information from multiple pages or websites
Example Task: "Navigate to https://en.wikipedia.org/wiki/Python_(programming_language) and extract the first paragraph"
Multi-Step Task Automation
Use Case: Complete complex workflows that require multiple sequential actions
Example Task: "Go to https://www.python.org, navigate to the documentation section, find the 'Tutorial' page, and extract the main topics covered"
Scrolling & Pagination
Use Case: Navigate through long pages or multiple pages of results
Example Task: "Go to https://en.wikipedia.org/wiki/Python_(programming_language) and scroll down past the introduction section"
Multi-Tab Operations
Use Case: Open multiple tabs for research or parallel information gathering
Example Task: "Open https://www.python.org, open the documentation section in a new tab, then extract the main heading from each page"
Known Limitations
While powerful, the Browser Use Agent has some limitations:
CAPTCHAs in Iframes
Limitation: CAPTCHAs embedded in iframes are difficult to solve automatically
Example Scenario: A login page with a CAPTCHA widget loaded in an iframe may require manual intervention
Workaround: Use the live browser URL to complete CAPTCHAs manually when needed
Login Without Credentials
Limitation: The agent skips login attempts if no credentials are provided (by design for security)
Example Scenario: Task requires accessing a protected area but no login credentials are available
Workaround: Provide credentials in the task description or complete login manually via the live browser URL
Timeouts & Limits
Limitation: Several timeout and limit constraints may affect task execution:
Task Length: Tasks requiring more than 100 steps may face memory constraints. This is a practical limitation based on observed memory usage patterns, not a hard limit enforced by the tool. The browser-use framework roadmap includes plans to improve agent memory handling for longer tasks.
Timeout Settings: Three timeout settings limit task duration (all configurable via BrowserUseToolConfig):
Steel Session API Timeout: Default 600 seconds (10 minutes) - controls how long the Steel session can remain active (steel_timeout_in_ms)
Browser Use Agent LLM Timeout: Default 60 seconds - controls how long the LLM has to respond for each planning step (browser_use_llm_timeout_in_s)
Browser Use Agent Step Timeout: Default 180 seconds (3 minutes) - controls how long each agent step can take (browser_use_step_timeout_in_s)
Network Latency: The Browser Use deployment (Southeast Asia) is geographically distant from the Steel servers (United States), so network latency can cause timeouts during rapid interactions
Steel Hobby Plan Limits: Browser Use currently uses Steel's free Hobby plan with the following limits (note: these limits are subject to change if we upgrade to a paid Steel plan):
Max Session Time: 15 minutes per browser session
Daily Requests: 500 requests per day
Requests per Second: 1 request per second rate limit
Concurrent Sessions: Maximum 5 concurrent browser sessions
Data Retention: Session data retained for 24 hours
Example Scenario: A long-running task that exceeds 15 minutes is terminated, rapid interactions time out because of network latency, or new sessions are refused once the daily request limit is reached
Workaround: Break large tasks into smaller subtasks under 15 minutes, use the file system to track progress across multiple runs, configure timeout values if needed, or upgrade to a paid Steel plan for higher limits (see Steel Pricing)
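One way to track progress across multiple runs is a small checkpoint file. The sketch below is an illustration under stated assumptions: run_batch, load_done, the JSON file layout, and the per-run budget are all hypothetical, and run_subtask stands in for whatever call actually submits a subtask to the agent.

```python
import json
import tempfile
from pathlib import Path


def load_done(path: Path) -> set:
    """Read the names of subtasks finished in earlier runs."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()


def run_batch(subtasks, path, run_subtask, budget):
    """Run up to `budget` pending subtasks, saving progress after each one."""
    done = load_done(path)
    ran = []
    for name in subtasks:
        if name in done:
            continue  # already completed in a previous run
        if len(ran) >= budget:
            break     # leave the rest for the next run
        run_subtask(name)
        done.add(name)
        path.write_text(json.dumps(sorted(done)))
        ran.append(name)
    return ran


# Toy usage: three subtasks, at most two per run.
with tempfile.TemporaryDirectory() as tmp:
    progress = Path(tmp) / "progress.json"
    pages = ["page-1", "page-2", "page-3"]
    first_run = run_batch(pages, progress, lambda name: None, budget=2)
    second_run = run_batch(pages, progress, lambda name: None, budget=2)
```

Writing the file after every subtask, rather than once at the end, means a session that is cut off mid-run loses at most one subtask of work.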
Cross-Origin Iframe Interactions
Limitation: Interacting with elements inside cross-origin iframes can be unreliable
Example Scenario: A payment form embedded in an iframe from a different domain
Workaround: Manual intervention via live browser URL for critical iframe interactions
Sequential Execution
Limitation: Tasks execute sequentially, not in parallel
Example Scenario: Applying to 50 different job postings must be done one at a time
Workaround: For parallel tasks, run multiple agent instances or break into batches
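If you run multiple agent instances, it helps to cap how many run at once so you stay within the concurrent-session limit listed under the Steel Hobby plan. The following is a sketch using Python's standard asyncio library. run_all and fake_agent are hypothetical names, and fake_agent merely sleeps in place of a real agent call.

```python
import asyncio

MAX_CONCURRENT_SESSIONS = 5  # concurrent-session cap cited in this document


async def run_all(tasks, run_agent, limit=MAX_CONCURRENT_SESSIONS):
    """Run one agent instance per task, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(task):
        async with semaphore:
            return await run_agent(task)

    # gather() returns results in the same order as the input tasks.
    return await asyncio.gather(*(guarded(task) for task in tasks))


async def _demo():
    active = 0
    peak = 0

    async def fake_agent(task):
        # Placeholder for a real agent run; tracks concurrency for the demo.
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return f"applied to {task}"

    results = await run_all([f"job-{i}" for i in range(12)], fake_agent)
    return results, peak


results, peak = asyncio.run(_demo())
```

Each individual agent still works through its own task sequentially; only the instances run side by side.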
UI Element Detection
Limitation: Some dynamically loaded or custom UI elements may not be immediately detected
Example Scenario: A custom dropdown menu that loads content via JavaScript after a delay
Workaround: The agent will wait and retry, or you can use the live browser URL to verify element visibility
Real-Time Interactive Elements
Limitation: Elements that depend on continuous, real-time pointer input (like drag-and-drop) may be difficult for the agent to operate reliably
Example Scenario: A complex image editor with drag-and-drop functionality
Workaround: Use manual intervention via live browser URL for complex interactions
Elements with Mouse Events
Limitation: Elements that rely on mouse event handlers (such as mousedown, mouseup, mouseover, etc.) instead of standard click events may not respond correctly to agent interactions
Example Scenario: A custom button or interactive element that only triggers actions on mouse events (common in some JavaScript frameworks or custom UI libraries)
Workaround: Use manual intervention via live browser URL to interact with such elements, or contact support if this is a critical requirement
Token Consumption
Limitation: Very large pages with extensive DOM content can consume significant tokens
Example Scenario: A single-page application with thousands of interactive elements
Workaround: Configure vision detail levels (auto/low/high) to optimize token usage
Technical Details
Browser Sessions
The agent uses isolated browser sessions that:
Run in secure, isolated environments
Automatically clean up after task completion
Support real-time monitoring via live browser URLs
AI Models
The agent uses two AI models:
Primary Model: Plans actions and makes decisions
Secondary Model: Extracts structured data from web pages
Both models work together to understand pages and execute tasks effectively.
Streaming Events
The agent provides real-time updates through streaming events:
Status Updates: Progress notifications, session initialization
Step Results: Action execution results with thinking process
Live Browser URL: You'll receive the live browser URL as a status update early in execution, allowing you to monitor or intervene if needed
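A consumer of these events might be structured as follows. The event shape is an assumption made for this sketch: the dictionary keys type, live_url, message, and result, and the type values status and step_result, are hypothetical and may not match the real payloads.

```python
def handle_events(events):
    """Collect the live browser URL and step results from an event stream."""
    live_url = None
    steps = []
    for event in events:
        kind = event.get("type")
        if kind == "status":
            # The live URL arrives as a status update early in execution.
            live_url = event.get("live_url", live_url)
        elif kind == "step_result":
            steps.append(event.get("result", ""))
        # Unknown event types are ignored so that new ones do not break us.
    return live_url, steps


# Toy usage with made-up events (example.invalid is a reserved test domain).
sample_events = [
    {"type": "status", "message": "session initialized"},
    {"type": "status", "live_url": "https://example.invalid/live/123"},
    {"type": "step_result", "result": "navigated to job board"},
    {"type": "step_result", "result": "extracted 5 listings"},
]
live_url, steps = handle_events(sample_events)
```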
Security
Isolation:
Each browser session is isolated from others
No data persists between tasks
API keys are loaded from environment variables (never hardcoded)
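Reading a key from the environment, rather than hardcoding it, can be as simple as the sketch below. load_api_key is a hypothetical helper, and the variable name is up to your deployment; this document does not specify the names the agent itself reads.

```python
import os


def load_api_key(var_name: str) -> str:
    """Return the key stored in `var_name`, failing loudly if it is missing."""
    value = os.environ.get(var_name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return value
```

Failing at startup with a clear message is usually easier to debug than an authentication error in the middle of a task.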
Safety:
Actions are validated before execution (enforced by the browser-use framework)
Error messages are sanitized
Logging available for monitoring
Performance & Troubleshooting
Efficiency:
Configurable vision detail levels (auto/low/high) for faster processing
Background video recording doesn't slow down execution
Automatic resource cleanup
Common Issues:
CAPTCHA/Login blocks: Use the live browser URL to complete manually
Session disconnects: Automatic retry - agent recreates session
Element not found: Agent waits and retries with alternative approaches
Task stuck: Agent re-evaluates and tries different strategies
Debug Resources:
Live browser URLs for real-time monitoring
Video recordings (if enabled) for reviewing sessions
Action logs showing what the agent did
AI reasoning traces showing decision-making process