Commit e9444a4

masnwilliams and claude committed
Add patterns guide for computer use + playwright execution
Worked recipes that show how customers combine computer controls (agent driver) and playwright execution (DOM, structured data, checkpoints).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 81d0027 commit e9444a4

3 files changed

Lines changed: 215 additions & 1 deletion

docs.json

Lines changed: 2 additions & 1 deletion
@@ -70,7 +70,8 @@
 "index",
 "introduction/create",
 "introduction/control",
-"introduction/observe"
+"introduction/observe",
+"introduction/patterns"
 ]
 },
 {

introduction/control.mdx

Lines changed: 1 addition & 0 deletions
@@ -197,6 +197,7 @@ print(response.result)
 
 ## Going deeper
 
+- [Patterns](/introduction/patterns): worked recipes that combine computer use and playwright execution.
 - [Computer Controls reference](/browsers/computer-controls): every mouse, keyboard, and screen primitive.
 - [Playwright Execution reference](/browsers/playwright-execution): the full execution surface, return values, and timeouts.
 - [Computer use integrations](/integrations/computer-use/anthropic): drop-in examples for Anthropic, Gemini, OpenAI, and more.

introduction/patterns.mdx

Lines changed: 212 additions & 0 deletions
@@ -0,0 +1,212 @@
---
title: "Patterns"
description: "Recipes that combine computer use and playwright execution"
---

Most production browser agents end up using both [computer controls](/browsers/computer-controls) and [playwright execution](/browsers/playwright-execution). Computer controls let the model drive (screenshot, click, type) without a CDP fingerprint. Playwright execution gives you a precise tool the model can call when it needs structured data, a fast deterministic step, or a checkpoint.

These recipes show how to combine them.

## Extract structured data after agent navigation

A computer-use agent navigates to a page that's hard to reach declaratively (past a search box, a cookie banner, or a paginated list), and you want the DOM in a typed shape, not a screenshot description. Let the agent drive, then call `playwright.execute` as a tool.

<CodeGroup>
```typescript Typescript/Javascript
import Kernel from '@onkernel/sdk';

const kernel = new Kernel();
const kernelBrowser = await kernel.browsers.create();

// ... agent uses kernel.browsers.computer.* to navigate to the product page ...

const response = await kernel.browsers.playwright.execute(
  kernelBrowser.session_id,
  {
    code: `
      await page.waitForSelector('[data-testid="size-selector"]');
      const variants = await page.$$eval(
        '[data-testid="size-selector"] [role="option"]',
        (els) => els.map((el) => ({
          size: el.getAttribute('data-size'),
          sku: el.getAttribute('data-sku'),
          inStock: el.getAttribute('aria-disabled') !== 'true',
        })),
      );
      return { url: page.url(), variants };
    `,
  },
);

console.log(response.result);
```

```python Python
from kernel import Kernel

kernel = Kernel()
kernel_browser = kernel.browsers.create()

# ... agent uses kernel.browsers.computer.* to navigate to the product page ...

response = kernel.browsers.playwright.execute(
    id=kernel_browser.session_id,
    code="""
        await page.waitForSelector('[data-testid="size-selector"]');
        const variants = await page.$$eval(
          '[data-testid="size-selector"] [role="option"]',
          (els) => els.map((el) => ({
            size: el.getAttribute('data-size'),
            sku: el.getAttribute('data-sku'),
            inStock: el.getAttribute('aria-disabled') !== 'true',
          })),
        );
        return { url: page.url(), variants };
    """,
)

print(response.result)
```
</CodeGroup>

Expose this to your model as a tool (`extract_variants`, `extract_table`, `extract_listings`) so it can decide when to call it. The agent stays in charge of navigation; your code owns the contract for what comes back.
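
A minimal sketch of what that tool wiring could look like. The schema shape (`name`, `description`, `input_schema`) follows the JSON-schema style that many tool-calling APIs accept, and `EXTRACT_VARIANTS_TOOL`, `make_tool_handler`, and the injected `execute` callable are hypothetical names for illustration, not part of the Kernel SDK.

```python
from typing import Any, Callable

# Hypothetical tool definition; adapt the shape to your model provider's API.
EXTRACT_VARIANTS_TOOL = {
    "name": "extract_variants",
    "description": "Return size, SKU, and stock for the product page currently open.",
    "input_schema": {"type": "object", "properties": {}, "required": []},
}

# The same extraction script used in the recipe above.
EXTRACT_VARIANTS_CODE = """
await page.waitForSelector('[data-testid="size-selector"]');
const variants = await page.$$eval(
  '[data-testid="size-selector"] [role="option"]',
  (els) => els.map((el) => ({
    size: el.getAttribute('data-size'),
    sku: el.getAttribute('data-sku'),
    inStock: el.getAttribute('aria-disabled') !== 'true',
  })),
);
return { url: page.url(), variants };
"""


def make_tool_handler(execute: Callable[[str], Any]) -> Callable[[str, dict], dict]:
    """Build a dispatcher. `execute` runs a script in the session and returns its result."""
    scripts = {"extract_variants": EXTRACT_VARIANTS_CODE}

    def handle(tool_name: str, tool_input: dict) -> dict:
        script = scripts.get(tool_name)
        if script is None:
            return {"ok": False, "error": f"unknown tool: {tool_name}"}
        try:
            return {"ok": True, "result": execute(script)}
        except Exception as exc:
            # Report the failure to the model instead of crashing the agent loop.
            return {"ok": False, "error": str(exc)}

    return handle
```

With the session from the recipe above, `execute` could be `lambda code: kernel.browsers.playwright.execute(id=kernel_browser.session_id, code=code).result`. Injecting it keeps the dispatcher testable without a live browser.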

## Replay at scale without re-rendering

Once the agent has located the data you care about and you have stable selectors, drop the agent entirely. Hit the same URLs directly via playwright execution: no model tokens, no vision loop, just deterministic extraction.

<CodeGroup>
```typescript Typescript/Javascript
import Kernel from '@onkernel/sdk';

const kernel = new Kernel();
const urls = [
  'https://shop.example.com/p/1',
  'https://shop.example.com/p/2',
  // ...
];

// Promise.all starts one browser per URL at once; batch the list if it is long.
const results = await Promise.all(
  urls.map(async (url) => {
    const browser = await kernel.browsers.create({ stealth: true });
    try {
      const response = await kernel.browsers.playwright.execute(
        browser.session_id,
        {
          code: `
            await page.goto(${JSON.stringify(url)});
            await page.waitForSelector('[data-testid="size-selector"]');
            return await page.$$eval(
              '[data-testid="size-selector"] [role="option"]',
              (els) => els.map((el) => ({
                size: el.getAttribute('data-size'),
                sku: el.getAttribute('data-sku'),
              })),
            );
          `,
        },
      );
      return { url, variants: response.result };
    } finally {
      // Always release the browser, even if extraction throws.
      await kernel.browsers.deleteByID(browser.session_id);
    }
  }),
);
```

```python Python
import json
from concurrent.futures import ThreadPoolExecutor
from kernel import Kernel

kernel = Kernel()
urls = [
    'https://shop.example.com/p/1',
    'https://shop.example.com/p/2',
    # ...
]

def fetch(url: str):
    browser = kernel.browsers.create(stealth=True)
    try:
        response = kernel.browsers.playwright.execute(
            id=browser.session_id,
            # json.dumps emits a valid JavaScript string literal for the URL;
            # doubled braces are literal braces inside the f-string.
            code=f"""
                await page.goto({json.dumps(url)});
                await page.waitForSelector('[data-testid="size-selector"]');
                return await page.$$eval(
                  '[data-testid="size-selector"] [role="option"]',
                  (els) => els.map((el) => ({{
                    size: el.getAttribute('data-size'),
                    sku: el.getAttribute('data-sku'),
                  }})),
                );
            """,
        )
        return {'url': url, 'variants': response.result}
    finally:
        # Always release the browser, even if extraction raises.
        kernel.browsers.delete_by_id(browser.session_id)

with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch, urls))
```
</CodeGroup>

The agent's job was finding the right selectors. Once you have them, you don't need the agent for the next million requests.
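
At that volume some pages will time out or change layout. `ThreadPoolExecutor.map` re-raises the first exception when its results are iterated, so one bad URL can abort collection of the whole batch. A hedged sketch of one way to isolate failures per URL follows; `fetch_all` is a hypothetical helper, and `fetch` stands in for any per-URL function like the one above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Iterable


def fetch_all(fetch: Callable[[str], Any], urls: Iterable[str], max_workers: int = 10):
    """Run `fetch` over `urls`; return (results, errors), both keyed by URL."""
    results = {}
    errors = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its URL so failures can be attributed.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # Record the failure and keep going with the remaining URLs.
                errors[url] = repr(exc)
    return results, errors
```

The `errors` mapping doubles as a retry queue: rerun `fetch_all` over `errors.keys()`, or hand those URLs back to the agent if the selectors appear to have drifted. Duplicate URLs collapse to one entry because both mappings are keyed by URL.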

## Checkpoint agent state between steps

After a computer-use step that you can't easily verify from a screenshot (a login, a cart add, a multi-step form), call `playwright.execute` to assert real browser state before letting the agent continue. This is cheaper than another screenshot round-trip and harder for the model to hallucinate around.

<CodeGroup>
```typescript Typescript/Javascript
// ... agent runs the login flow via kernel.browsers.computer.* ...

const response = await kernel.browsers.playwright.execute(
  kernelBrowser.session_id,
  {
    code: `
      const cookies = await context.cookies();
      const sessionCookie = cookies.find((c) => c.name === 'session');
      const userBadge = await page.$('[data-testid="user-badge"]');
      return {
        loggedIn: Boolean(sessionCookie && userBadge),
        url: page.url(),
      };
    `,
  },
);

if (!response.result.loggedIn) {
  // hand back to the agent with a corrective message, or fail fast
}
```

```python Python
# ... agent runs the login flow via kernel.browsers.computer.* ...

response = kernel.browsers.playwright.execute(
    id=kernel_browser.session_id,
    code="""
        const cookies = await context.cookies();
        const sessionCookie = cookies.find((c) => c.name === 'session');
        const userBadge = await page.$('[data-testid="user-badge"]');
        return {
          loggedIn: Boolean(sessionCookie && userBadge),
          url: page.url(),
        };
    """,
)

if not response.result["loggedIn"]:
    # hand back to the agent with a corrective message, or fail fast
    ...
```
</CodeGroup>
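
The "hand back or fail fast" branch can be sketched as a small retry loop. This is one possible shape, not a Kernel API: `run_with_checkpoint`, `run_step`, and `check` are hypothetical names, where `run_step` runs the agent step with an optional corrective hint and `check` returns the dict produced by the checkpoint script above.

```python
from typing import Callable


def run_with_checkpoint(
    run_step: Callable[[str], None],
    check: Callable[[], dict],
    passed_key: str = "loggedIn",
    max_attempts: int = 3,
) -> dict:
    """Run an agent step, verify browser state, and retry with a hint on failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    hint = ""  # the first attempt runs without a corrective message
    for attempt in range(1, max_attempts + 1):
        run_step(hint)
        state = check()
        if state.get(passed_key):
            return {"ok": True, "attempts": attempt, "state": state}
        # Tell the agent what the checkpoint observed so it can correct course.
        hint = (
            f"Checkpoint failed at {state.get('url')}: "
            f"expected {passed_key} to be true. Try again."
        )
    # Fail fast once the attempt budget is spent.
    return {"ok": False, "attempts": max_attempts, "state": state}
```

Capping attempts keeps a stuck agent from burning tokens indefinitely, and the returned `state` gives the caller the last observed URL for logging.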

## Choosing between the two

| You want to... | Reach for |
| --- | --- |
| Click something the model can see but you can't target with a selector | Computer controls |
| Navigate through a flow with unpredictable UI (banners, modals) | Computer controls |
| Drive a vision-language model loop | Computer controls |
| Extract structured data from the DOM | Playwright execution |
| Run the same step deterministically every time | Playwright execution |
| Assert a precondition or postcondition between agent steps | Playwright execution |
| Capture a full-page screenshot for a report | Playwright execution |

## Going deeper

- [Control](/introduction/control): the four primitives Kernel exposes and when to use each.
- [Computer Controls reference](/browsers/computer-controls): every mouse, keyboard, and screen primitive.
- [Playwright Execution reference](/browsers/playwright-execution): return values, timeouts, errors.
- [Computer use integrations](/integrations/computer-use/anthropic): drop-in examples for Anthropic, Gemini, OpenAI, and more.
