Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
262 changes: 221 additions & 41 deletions .cursorrules
Original file line number Diff line number Diff line change
@@ -1,84 +1,264 @@
# Stagehand Project

This is a project that uses Stagehand, which amplifies Playwright with `act`, `extract`, and `observe` added to the Page class.
This is a project that uses Stagehand V3, a browser automation framework with AI-powered `act`, `extract`, `observe`, and `agent` methods.

`Stagehand` is a class that provides config, a `StagehandPage` object via `stagehand.page`, and a `StagehandContext` object via `stagehand.context`.
The main class can be imported as `Stagehand` from `@browserbasehq/stagehand`.

`Page` is a class that extends the Playwright `Page` class and adds `act`, `extract`, and `observe` methods.
`Context` is a class that extends the Playwright `BrowserContext` class.
**Key Classes:**

Use the following rules to write code for this project.

- When writing Playwright code, wrap it with Stagehand `act`
- When writing code that needs to extract data from the page, use Stagehand `extract`
- When writing code that needs to observe the page, use Stagehand `observe`
- `Stagehand`: Main orchestrator class providing `act`, `extract`, `observe`, and `agent` methods
- `context`: A `V3Context` object that manages browser contexts and pages
- `page`: Individual page objects accessed via `stagehand.context.pages()[i]` or created with `stagehand.context.newPage()`

## Initialize

```typescript
import { Stagehand } from "@browserbasehq/stagehand";
import StagehandConfig from "./stagehand.config";

const stagehand = new Stagehand(StagehandConfig);
const stagehand = new Stagehand({
env: "LOCAL", // or "BROWSERBASE"
verbose: 2, // 0, 1, or 2
model: "openai/gpt-4.1-mini", // or any supported model
});

await stagehand.init();

const page = stagehand.page; // Playwright Page with act, extract, and observe methods
const context = stagehand.context; // Playwright BrowserContext
// Access the browser context and pages
const page = stagehand.context.pages()[0];
const context = stagehand.context;

// Create new pages if needed
const page2 = await stagehand.context.newPage();
```

## Act

For example, if you are writing Playwright code, wrap it with Stagehand `act` like this:
Actions are called on the `stagehand` instance (not the page). Use atomic, specific instructions:

```typescript
try {
await page.locator('button[name="Sign in"]').click();
} catch (error) {
await page.act({
action: "click the sign in button",
});
}
// Act on the current active page
await stagehand.act("click the sign in button");

// Act on a specific page (when you need to target a page that isn't currently active)
await stagehand.act("click the sign in button", { page: page2 });
```

Act `action` should be as atomic and specific as possible, i.e. "Click the sign in button" or "Type 'hello' into the search input". Avoid actions that are too broad, i.e. "Order me pizza" or "Send an email to Paul asking him to call me". Actions work best for Playwright code that is vulnerable to unexpected DOM changes.
**Important:** Act instructions should be atomic and specific:

- ✅ Good: "Click the sign in button" or "Type 'hello' into the search input"
- ❌ Bad: "Order me pizza" or "Type in the search bar and hit enter" (multi-step)

### Observe + Act Pattern (Recommended)

Cache the results of `observe` to avoid unexpected DOM changes:

```typescript
const instruction = "Click the sign in button";

When using `act`, write Playwright code FIRST, then wrap it with a try-catch block where the catch block is `act`.
// Get candidate actions
const actions = await stagehand.observe(instruction);

// Execute the first action
await stagehand.act(actions[0]);
```

To target a specific page:

```typescript
const actions = await stagehand.observe("select blue as the favorite color", {
page: page2,
});
await stagehand.act(actions[0], { page: page2 });
```

## Extract

If you are writing code that needs to extract data from the page, use Stagehand `extract` like this:
Extract data from pages using natural language instructions. The `extract` method is called on the `stagehand` instance.

### Basic Extraction (with schema)

```typescript
const data = await page.extract({
instruction: "extract the sign in button text",
schema: z.object({
text: z.string(),
import { z } from "zod/v3";

// Extract with explicit schema
const data = await stagehand.extract(
"extract all apartment listings with prices and addresses",
z.object({
listings: z.array(
z.object({
price: z.string(),
address: z.string(),
}),
),
}),
useTextExtract: true,
});
);

console.log(data.listings);
```

`schema` is a Zod schema that describes the data you want to extract. To extract an array, make sure to pass in a single object that contains the array, as follows:
### Simple Extraction (without schema)

```typescript
const data = await page.extract({
instruction: "extract the text inside all buttons",
schema: z.object({
text: z.array(z.string()),
// Extract returns a default object with 'extraction' field
const result = await stagehand.extract("extract the sign in button text");

console.log(result);
// Output: { extraction: "Sign in" }

// Or destructure directly
const { extraction } = await stagehand.extract(
"extract the sign in button text",
);
console.log(extraction); // "Sign in"
```

### Targeted Extraction

Extract data from a specific element using a selector:

```typescript
const reason = await stagehand.extract(
"extract the reason why script injection fails",
z.string(),
{ selector: "/html/body/div[2]/div[3]/iframe/html/body/p[2]" },
);
```

### URL Extraction

When extracting links or URLs, use `z.string().url()`:

```typescript
const { links } = await stagehand.extract(
"extract all navigation links",
z.object({
links: z.array(z.string().url()),
}),
});
);
```

Set `useTextExtract` to `true` for better results.
### Extracting from a Specific Page

```typescript
// Extract from a specific page (when you need to target a page that isn't currently active)
const data = await stagehand.extract(
"extract the placeholder text on the name field",
{ page: page2 },
);
```

## Observe

If you are writing code that needs to observe the page, use Stagehand `observe` like this:
Plan actions before executing them. Returns an array of candidate actions:

```typescript
const data = await page.observe({
instruction: "observe the page",
// Get candidate actions on the current active page
const [action] = await stagehand.observe("Click the sign in button");

// Execute the action
await stagehand.act(action);
```

Observing on a specific page:

```typescript
// Target a specific page (when you need to target a page that isn't currently active)
const actions = await stagehand.observe("find the next page button", {
page: page2,
});
await stagehand.act(actions[0], { page: page2 });
```

This returns a list of XPaths and descriptions of the data you want to extract as `{ selector: string; description: string }[]`.
## Agent

Use the `agent` method to autonomously execute complex, multi-step tasks.

### Basic Agent Usage

```typescript
const page = stagehand.context.pages()[0];
await page.goto("https://www.google.com");

const agent = stagehand.agent({
model: "google/gemini-2.0-flash",
executionModel: "google/gemini-2.0-flash",
});

const result = await agent.execute({
instruction: "Search for the stock price of NVDA",
maxSteps: 20,
});

console.log(result.message);
```

### Computer Use Agent (CUA)

For more advanced scenarios using computer-use models:

```typescript
const agent = stagehand.agent({
cua: true, // Enable Computer Use Agent mode
model: "anthropic/claude-sonnet-4-20250514",
// or "google/gemini-2.5-computer-use-preview-10-2025"
systemPrompt: `You are a helpful assistant that can use a web browser.
Do not ask follow up questions, the user will trust your judgement.`,
});

await agent.execute({
instruction: "Apply for a library card at the San Francisco Public Library",
maxSteps: 30,
});
```

### Agent with Custom Model Configuration

```typescript
const agent = stagehand.agent({
cua: true,
model: {
modelName: "google/gemini-2.5-computer-use-preview-10-2025",
apiKey: process.env.GEMINI_API_KEY,
},
systemPrompt: `You are a helpful assistant.`,
});
```

### Agent with Integrations (MCP/External Tools)

```typescript
const agent = stagehand.agent({
integrations: [`https://mcp.exa.ai/mcp?exaApiKey=${process.env.EXA_API_KEY}`],
systemPrompt: `You have access to the Exa search tool.`,
});
```

## Advanced Features

### DeepLocator (XPath Targeting)

Target specific elements across shadow DOM and iframes:

```typescript
await page
.deepLocator("/html/body/div[2]/div[3]/iframe/html/body/p")
.highlight({
durationMs: 5000,
contentColor: { r: 255, g: 0, b: 0 },
});
```

### Multi-Page Workflows

```typescript
const page1 = stagehand.context.pages()[0];
await page1.goto("https://example.com");

const page2 = await stagehand.context.newPage();
await page2.goto("https://example2.com");

// Act/extract/observe operate on the current active page by default
// Pass { page } option to target a specific page
await stagehand.act("click button", { page: page1 });
await stagehand.extract("get title", { page: page2 });
```
8 changes: 4 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,14 +8,14 @@ You can build your own web agent using: `npx create-browser-app`!

## Setting the Stage

Stagehand is an SDK for automating browsers. It's built on top of [Playwright](https://playwright.dev/) and provides a higher-level API for better debugging and AI fail-safes.
Stagehand is an SDK for automating browsers. It's built directly on top of [CDP](https://chromedevtools.github.io/devtools-protocol/) and provides a higher-level API for better debugging and AI fail-safes.

## Curtain Call

Get ready for a show-stopping development experience. Just run:

```bash
npm install && npm run dev
pnpm install && pnpm dev
```

## What's Next?
Expand All @@ -40,8 +40,8 @@ We have custom .cursorrules for this project. It'll help quite a bit with writin

To run on Browserbase, add your API keys to .env and change `env: "LOCAL"` to `env: "BROWSERBASE"` in [stagehand.config.ts](stagehand.config.ts).

### Use Anthropic Claude 3.5 Sonnet
### Use Anthropic Claude 4.5 Sonnet

1. Add your API key to .env
2. Change `modelName: "gpt-4o"` to `modelName: "claude-3-5-sonnet-latest"` in [stagehand.config.ts](stagehand.config.ts)
2. Change `modelName: "gpt-4o"` to `modelName: "claude-sonnet-4-5"` in [stagehand.config.ts](stagehand.config.ts)
3. Change `modelClientOptions: { apiKey: process.env.OPENAI_API_KEY }` to `modelClientOptions: { apiKey: process.env.ANTHROPIC_API_KEY }` in [stagehand.config.ts](stagehand.config.ts)
Loading