Commit e6c248c

fix: update to newer models + stagehand v3 (#6)
1 parent 1a8e269 commit e6c248c

10 files changed

+2486
-163
lines changed

.cursorrules

Lines changed: 221 additions & 41 deletions
@@ -1,84 +1,264 @@
11
# Stagehand Project
22

3-
This is a project that uses Stagehand, which amplifies Playwright with `act`, `extract`, and `observe` added to the Page class.
3+
This is a project that uses Stagehand V3, a browser automation framework with AI-powered `act`, `extract`, `observe`, and `agent` methods.
44

5-
`Stagehand` is a class that provides config, a `StagehandPage` object via `stagehand.page`, and a `StagehandContext` object via `stagehand.context`.
5+
The main class can be imported as `Stagehand` from `@browserbasehq/stagehand`.
66

7-
`Page` is a class that extends the Playwright `Page` class and adds `act`, `extract`, and `observe` methods.
8-
`Context` is a class that extends the Playwright `BrowserContext` class.
7+
**Key Classes:**
98

10-
Use the following rules to write code for this project.
11-
12-
- When writing Playwright code, wrap it with Stagehand `act`
13-
- When writing code that needs to extract data from the page, use Stagehand `extract`
14-
- When writing code that needs to observe the page, use Stagehand `observe`
9+
- `Stagehand`: Main orchestrator class providing `act`, `extract`, `observe`, and `agent` methods
10+
- `context`: A `V3Context` object that manages browser contexts and pages
11+
- `page`: Individual page objects accessed via `stagehand.context.pages()[i]` or created with `stagehand.context.newPage()`
1512

1613
## Initialize
1714

1815
```typescript
1916
import { Stagehand } from "@browserbasehq/stagehand";
20-
import StagehandConfig from "./stagehand.config";
2117

22-
const stagehand = new Stagehand(StagehandConfig);
18+
const stagehand = new Stagehand({
19+
env: "LOCAL", // or "BROWSERBASE"
20+
verbose: 2, // 0, 1, or 2
21+
model: "openai/gpt-4.1-mini", // or any supported model
22+
});
23+
2324
await stagehand.init();
2425

25-
const page = stagehand.page; // Playwright Page with act, extract, and observe methods
26-
const context = stagehand.context; // Playwright BrowserContext
26+
// Access the browser context and pages
27+
const page = stagehand.context.pages()[0];
28+
const context = stagehand.context;
29+
30+
// Create new pages if needed
31+
const page2 = await stagehand.context.newPage();
2732
```
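The model strings in this file (`openai/gpt-4.1-mini`, `google/gemini-2.0-flash`, and so on) appear to follow a `provider/model` pattern, while the README still mentions unprefixed names such as `gpt-4o`. A minimal sketch of a check that catches an unprefixed name before it reaches the constructor is below. `splitModelId` and `ModelId` are hypothetical names for illustration, not part of Stagehand.

```typescript
// Hypothetical helper, not part of Stagehand: splits a "provider/model" string.
interface ModelId {
  provider: string;
  name: string;
}

function splitModelId(model: string): ModelId {
  const slash = model.indexOf("/");
  // Reject strings with no slash, an empty provider, or an empty model name.
  if (slash <= 0 || slash === model.length - 1) {
    throw new Error(`expected "provider/model", got "${model}"`);
  }
  return { provider: model.slice(0, slash), name: model.slice(slash + 1) };
}

console.log(splitModelId("openai/gpt-4.1-mini"));
```

Only the first slash is treated as the separator, so any further slashes stay in `name`.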
2833

2934
## Act
3035

31-
For example, if you are writing Playwright code, wrap it with Stagehand `act` like this:
36+
Actions are called on the `stagehand` instance (not the page). Use atomic, specific instructions:
3237

3338
```typescript
34-
try {
35-
await page.locator('button[name="Sign in"]').click();
36-
} catch (error) {
37-
await page.act({
38-
action: "click the sign in button",
39-
});
40-
}
39+
// Act on the current active page
40+
await stagehand.act("click the sign in button");
41+
42+
// Act on a specific page (when you need to target a page that isn't currently active)
43+
await stagehand.act("click the sign in button", { page: page2 });
4144
```
4245

43-
Act `action` should be as atomic and specific as possible, i.e. "Click the sign in button" or "Type 'hello' into the search input". Avoid actions that are too broad, i.e. "Order me pizza" or "Send an email to Paul asking him to call me". Actions work best for Playwright code that is vulnerable to unexpected DOM changes.
46+
**Important:** Act instructions should be atomic and specific:
47+
48+
- ✅ Good: "Click the sign in button" or "Type 'hello' into the search input"
49+
- ❌ Bad: "Order me pizza" or "Type in the search bar and hit enter" (multi-step)
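One way to catch some multi-step instructions before they reach `act` is a crude text check. The sketch below is a heuristic only: `looksMultiStep` and its connector list are assumptions made for illustration, not a Stagehand rule. It flags harmless phrases such as "Terms and Conditions", and it does not catch instructions that are merely too broad, such as "Order me pizza".

```typescript
// Hypothetical lint helper; the connector list is an assumption, not a Stagehand rule.
const STEP_CONNECTORS = [" and ", " then ", ";"];

// Returns true when the instruction contains a word that usually joins two steps.
function looksMultiStep(instruction: string): boolean {
  const lower = instruction.toLowerCase();
  return STEP_CONNECTORS.some((connector) => lower.includes(connector));
}

for (const instruction of [
  "Click the sign in button",
  "Type in the search bar and hit enter",
]) {
  console.log(instruction, "->", looksMultiStep(instruction));
}
```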
50+
51+
### Observe + Act Pattern (Recommended)
52+
53+
Cache the results of `observe` so that `act` replays a known action instead of re-resolving the instruction against a DOM that may have changed:
54+
55+
```typescript
56+
const instruction = "Click the sign in button";
4457

45-
When using `act`, write Playwright code FIRST, then wrap it with a try-catch block where the catch block is `act`.
58+
// Get candidate actions
59+
const actions = await stagehand.observe(instruction);
60+
61+
// Execute the first action
62+
await stagehand.act(actions[0]);
63+
```
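The snippet above observes once and acts once. A minimal sketch of an in-memory cache keyed by instruction is below, so repeated calls reuse the first observed action. `ObserveActClient` and `ObserveCache` are hypothetical names; the interface only mirrors the two calls used above and makes no claim about Stagehand's actual types. A cached action can go stale if the page changes, so a real cache would need an invalidation strategy.

```typescript
// Hypothetical structural type: only the two calls used in the snippet above.
interface ObserveActClient<A> {
  observe(instruction: string): Promise<A[]>;
  act(action: A): Promise<unknown>;
}

class ObserveCache<A> {
  private readonly client: ObserveActClient<A>;
  private readonly cache = new Map<string, A>();

  constructor(client: ObserveActClient<A>) {
    this.client = client;
  }

  // Observes on the first call for an instruction, then replays the cached action.
  async actCached(instruction: string): Promise<void> {
    let action = this.cache.get(instruction);
    if (action === undefined) {
      const candidates = await this.client.observe(instruction);
      if (candidates.length === 0) {
        throw new Error(`observe returned no candidates for: ${instruction}`);
      }
      action = candidates[0];
      this.cache.set(instruction, action);
    }
    await this.client.act(action);
  }
}
```

Under these assumptions, usage would look like `await new ObserveCache(stagehand).actCached("Click the sign in button")`, provided the instance satisfies the interface.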
64+
65+
To target a specific page:
66+
67+
```typescript
68+
const actions = await stagehand.observe("select blue as the favorite color", {
69+
page: page2,
70+
});
71+
await stagehand.act(actions[0], { page: page2 });
72+
```
4673

4774
## Extract
4875

49-
If you are writing code that needs to extract data from the page, use Stagehand `extract` like this:
76+
Extract data from pages using natural language instructions. The `extract` method is called on the `stagehand` instance.
77+
78+
### Basic Extraction (with schema)
5079

5180
```typescript
52-
const data = await page.extract({
53-
instruction: "extract the sign in button text",
54-
schema: z.object({
55-
text: z.string(),
81+
import { z } from "zod/v3";
82+
83+
// Extract with explicit schema
84+
const data = await stagehand.extract(
85+
"extract all apartment listings with prices and addresses",
86+
z.object({
87+
listings: z.array(
88+
z.object({
89+
price: z.string(),
90+
address: z.string(),
91+
}),
92+
),
5693
}),
57-
useTextExtract: true,
58-
});
94+
);
95+
96+
console.log(data.listings);
5997
```
6098

61-
`schema` is a Zod schema that describes the data you want to extract. To extract an array, make sure to pass in a single object that contains the array, as follows:
99+
### Simple Extraction (without schema)
62100

63101
```typescript
64-
const data = await page.extract({
65-
instruction: "extract the text inside all buttons",
66-
schema: z.object({
67-
text: z.array(z.string()),
102+
// Without a schema, extract returns an object with an `extraction` field
103+
const result = await stagehand.extract("extract the sign in button text");
104+
105+
console.log(result);
106+
// Output: { extraction: "Sign in" }
107+
108+
// Or destructure directly
109+
const { extraction } = await stagehand.extract(
110+
"extract the sign in button text",
111+
);
112+
console.log(extraction); // "Sign in"
113+
```
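When the result of a schema-less extraction is passed around as `unknown`, a type guard can narrow it back to the `{ extraction: string }` shape shown above. This is a sketch: `DefaultExtraction` and `isDefaultExtraction` are hypothetical names, and the shape is taken only from the example output in this file.

```typescript
// Hypothetical type mirroring the example output above: { extraction: "Sign in" }.
interface DefaultExtraction {
  extraction: string;
}

// Narrows an unknown value to the schema-less extraction shape.
function isDefaultExtraction(value: unknown): value is DefaultExtraction {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { extraction?: unknown }).extraction === "string"
  );
}

const sample: unknown = { extraction: "Sign in" };
if (isDefaultExtraction(sample)) {
  console.log(sample.extraction);
}
```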
114+
115+
### Targeted Extraction
116+
117+
Extract data from a specific element using a selector:
118+
119+
```typescript
120+
const reason = await stagehand.extract(
121+
"extract the reason why script injection fails",
122+
z.string(),
123+
{ selector: "/html/body/div[2]/div[3]/iframe/html/body/p[2]" },
124+
);
125+
```
126+
127+
### URL Extraction
128+
129+
When extracting links or URLs, use `z.string().url()`:
130+
131+
```typescript
132+
const { links } = await stagehand.extract(
133+
"extract all navigation links",
134+
z.object({
135+
links: z.array(z.string().url()),
68136
}),
69-
});
137+
);
70138
```
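If links arrive from somewhere other than a schema-validated `extract` call, a similar sanity check can be sketched with the standard `URL` constructor. `keepParsableUrls` is a hypothetical helper; it keeps any string that parses as an absolute URL (including schemes such as `mailto:`), and it is not a claim about how Zod validates URLs.

```typescript
// Hypothetical helper: keeps only strings that parse as absolute URLs.
function keepParsableUrls(candidates: string[]): string[] {
  return candidates.filter((candidate) => {
    try {
      new URL(candidate); // throws a TypeError for relative or malformed input
      return true;
    } catch {
      return false;
    }
  });
}

console.log(
  keepParsableUrls(["https://example.com/a", "/relative", "not a url"]),
);
```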
71139

72-
Set `useTextExtract` to `true` for better results.
140+
### Extracting from a Specific Page
141+
142+
```typescript
143+
// Extract from a specific page (when you need to target a page that isn't currently active)
144+
const data = await stagehand.extract(
145+
"extract the placeholder text on the name field",
146+
{ page: page2 },
147+
);
148+
```
73149

74150
## Observe
75151

76-
If you are writing code that needs to observe the page, use Stagehand `observe` like this:
152+
Plan actions before executing them. Returns an array of candidate actions:
77153

78154
```typescript
79-
const data = await page.observe({
80-
instruction: "observe the page",
155+
// Get candidate actions on the current active page
156+
const [action] = await stagehand.observe("Click the sign in button");
157+
158+
// Execute the action
159+
await stagehand.act(action);
160+
```
161+
162+
Observing on a specific page:
163+
164+
```typescript
165+
// Target a specific page (when you need to target a page that isn't currently active)
166+
const actions = await stagehand.observe("find the next page button", {
167+
page: page2,
81168
});
169+
await stagehand.act(actions[0], { page: page2 });
82170
```
83171

84-
This returns a list of XPaths and descriptions of the data you want to extract as `{ selector: string; description: string }[]`.
172+
## Agent
173+
174+
Use the `agent` method to autonomously execute complex, multi-step tasks.
175+
176+
### Basic Agent Usage
177+
178+
```typescript
179+
const page = stagehand.context.pages()[0];
180+
await page.goto("https://www.google.com");
181+
182+
const agent = stagehand.agent({
183+
model: "google/gemini-2.0-flash",
184+
executionModel: "google/gemini-2.0-flash",
185+
});
186+
187+
const result = await agent.execute({
188+
instruction: "Search for the stock price of NVDA",
189+
maxSteps: 20,
190+
});
191+
192+
console.log(result.message);
193+
```
194+
195+
### Computer Use Agent (CUA)
196+
197+
For more advanced scenarios using computer-use models:
198+
199+
```typescript
200+
const agent = stagehand.agent({
201+
cua: true, // Enable Computer Use Agent mode
202+
model: "anthropic/claude-sonnet-4-20250514",
203+
// or "google/gemini-2.5-computer-use-preview-10-2025"
204+
systemPrompt: `You are a helpful assistant that can use a web browser.
205+
Do not ask follow up questions, the user will trust your judgement.`,
206+
});
207+
208+
await agent.execute({
209+
instruction: "Apply for a library card at the San Francisco Public Library",
210+
maxSteps: 30,
211+
});
212+
```
213+
214+
### Agent with Custom Model Configuration
215+
216+
```typescript
217+
const agent = stagehand.agent({
218+
cua: true,
219+
model: {
220+
modelName: "google/gemini-2.5-computer-use-preview-10-2025",
221+
apiKey: process.env.GEMINI_API_KEY,
222+
},
223+
systemPrompt: `You are a helpful assistant.`,
224+
});
225+
```
226+
227+
### Agent with Integrations (MCP/External Tools)
228+
229+
```typescript
230+
const agent = stagehand.agent({
231+
integrations: [`https://mcp.exa.ai/mcp?exaApiKey=${process.env.EXA_API_KEY}`],
232+
systemPrompt: `You have access to the Exa search tool.`,
233+
});
234+
```
235+
236+
## Advanced Features
237+
238+
### DeepLocator (XPath Targeting)
239+
240+
Target specific elements across shadow DOM and iframes:
241+
242+
```typescript
243+
await page
244+
.deepLocator("/html/body/div[2]/div[3]/iframe/html/body/p")
245+
.highlight({
246+
durationMs: 5000,
247+
contentColor: { r: 255, g: 0, b: 0 },
248+
});
249+
```
250+
251+
### Multi-Page Workflows
252+
253+
```typescript
254+
const page1 = stagehand.context.pages()[0];
255+
await page1.goto("https://example.com");
256+
257+
const page2 = await stagehand.context.newPage();
258+
await page2.goto("https://example2.com");
259+
260+
// Act/extract/observe operate on the current active page by default
261+
// Pass { page } option to target a specific page
262+
await stagehand.act("click button", { page: page1 });
263+
await stagehand.extract("get title", { page: page2 });
264+
```

README.md

Lines changed: 4 additions & 4 deletions
@@ -8,14 +8,14 @@ You can build your own web agent using: `npx create-browser-app`!
88

99
## Setting the Stage
1010

11-
Stagehand is an SDK for automating browsers. It's built on top of [Playwright](https://playwright.dev/) and provides a higher-level API for better debugging and AI fail-safes.
11+
Stagehand is an SDK for automating browsers. It's built directly on top of [CDP](https://chromedevtools.github.io/devtools-protocol/) and provides a higher-level API for better debugging and AI fail-safes.
1212

1313
## Curtain Call
1414

1515
Get ready for a show-stopping development experience. Just run:
1616

1717
```bash
18-
npm install && npm run dev
18+
pnpm install && pnpm dev
1919
```
2020

2121
## What's Next?
@@ -40,8 +40,8 @@ We have custom .cursorrules for this project. It'll help quite a bit with writin
4040

4141
To run on Browserbase, add your API keys to .env and change `env: "LOCAL"` to `env: "BROWSERBASE"` in [stagehand.config.ts](stagehand.config.ts).
4242

43-
### Use Anthropic Claude 3.5 Sonnet
43+
### Use Anthropic Claude Sonnet 4.5
4444

4545
1. Add your API key to .env
46-
2. Change `modelName: "gpt-4o"` to `modelName: "claude-3-5-sonnet-latest"` in [stagehand.config.ts](stagehand.config.ts)
46+
2. Change `modelName: "gpt-4o"` to `modelName: "claude-sonnet-4-5"` in [stagehand.config.ts](stagehand.config.ts)
4747
3. Change `modelClientOptions: { apiKey: process.env.OPENAI_API_KEY }` to `modelClientOptions: { apiKey: process.env.ANTHROPIC_API_KEY }` in [stagehand.config.ts](stagehand.config.ts)
