Add patterns guide for computer use + playwright execution

masnwilliams · claude · masnwilliams · commit e9444a475651 · 2026-05-22T16:56:23.000Z
Worked recipes that show how customers combine computer controls (agent
driver) and playwright execution (DOM, structured data, checkpoints).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
diff --git a/docs.json b/docs.json
@@ -70,7 +70,8 @@
               "index",
               "introduction/create",
               "introduction/control",
-              "introduction/observe"
+              "introduction/observe",
+              "introduction/patterns"
             ]
           },
           {
diff --git a/introduction/control.mdx b/introduction/control.mdx
@@ -197,6 +197,7 @@ print(response.result)
 
 ## Going deeper
 
+- [Patterns](/introduction/patterns) — worked recipes that combine computer use and playwright execution.
 - [Computer Controls reference](/browsers/computer-controls) — every mouse, keyboard, and screen primitive.
 - [Playwright Execution reference](/browsers/playwright-execution) — the full execution surface, return values, and timeouts.
 - [Computer use integrations](/integrations/computer-use/anthropic) — drop-in examples for Anthropic, Gemini, OpenAI, and more.
diff --git a/introduction/patterns.mdx b/introduction/patterns.mdx
@@ -0,0 +1,212 @@
+---
+title: "Patterns"
+description: "Recipes that combine computer use and playwright execution"
+---
+
+Most production browser agents end up using both [computer controls](/browsers/computer-controls) and [playwright execution](/browsers/playwright-execution). Computer controls let the model drive — screenshot, click, type — without a CDP fingerprint. Playwright execution gives you a precise tool the model can call when it needs structured data, a fast deterministic step, or a checkpoint.
+
+These recipes show how to combine them.
+
+## Extract structured data after agent navigation
+
+A computer-use agent navigates to a page that's hard to reach declaratively — past a search box, a cookie banner, a paginated list — and you want the DOM in a typed shape, not a screenshot description. Let the agent drive, then call `playwright.execute` as a tool.
+
+<CodeGroup>
+```typescript Typescript/Javascript
+import Kernel from '@onkernel/sdk';
+
+const kernel = new Kernel();
+const kernelBrowser = await kernel.browsers.create();
+
+// ... agent uses kernel.browsers.computer.* to navigate to the product page ...
+
+const response = await kernel.browsers.playwright.execute(
+  kernelBrowser.session_id,
+  {
+    code: `
+      await page.waitForSelector('[data-testid="size-selector"]');
+      const variants = await page.$$eval(
+        '[data-testid="size-selector"] [role="option"]',
+        (els) => els.map((el) => ({
+          size: el.getAttribute('data-size'),
+          sku: el.getAttribute('data-sku'),
+          inStock: el.getAttribute('aria-disabled') !== 'true',
+        })),
+      );
+      return { url: page.url(), variants };
+    `,
+  },
+);
+
+console.log(response.result);
+```
+
+```python Python
+from kernel import Kernel
+
+kernel = Kernel()
+kernel_browser = kernel.browsers.create()
+
+# ... agent uses kernel.browsers.computer.* to navigate to the product page ...
+
+response = kernel.browsers.playwright.execute(
+    id=kernel_browser.session_id,
+    code="""
+      await page.waitForSelector('[data-testid="size-selector"]');
+      const variants = await page.$$eval(
+        '[data-testid="size-selector"] [role="option"]',
+        (els) => els.map((el) => ({
+          size: el.getAttribute('data-size'),
+          sku: el.getAttribute('data-sku'),
+          inStock: el.getAttribute('aria-disabled') !== 'true',
+        })),
+      );
+      return { url: page.url(), variants };
+    """,
+)
+
+print(response.result)
+```
+</CodeGroup>
+
+Expose this to your model as a tool (`extract_variants`, `extract_table`, `extract_listings`) so it can decide when to call it. The agent stays in charge of navigation; your code owns the contract for what comes back.
+
+## Replay at scale without re-rendering
+
+Once the agent has located the data you care about and you have stable selectors, drop the agent entirely. Hit the same URLs directly via playwright execution — no model tokens, no vision loop, just deterministic extraction.
+
+<CodeGroup>
+```typescript Typescript/Javascript
+const urls = ['https://shop.example.com/p/1', 'https://shop.example.com/p/2', '...'];
+
+const results = await Promise.all(
+  urls.map(async (url) => {
+    const browser = await kernel.browsers.create({ stealth: true });
+    try {
+      const response = await kernel.browsers.playwright.execute(
+        browser.session_id,
+        {
+          code: `
+            await page.goto(${JSON.stringify(url)});
+            await page.waitForSelector('[data-testid="size-selector"]');
+            return await page.$$eval(
+              '[data-testid="size-selector"] [role="option"]',
+              (els) => els.map((el) => ({
+                size: el.getAttribute('data-size'),
+                sku: el.getAttribute('data-sku'),
+              })),
+            );
+          `,
+        },
+      );
+      return { url, variants: response.result };
+    } finally {
+      await kernel.browsers.deleteByID(browser.session_id);
+    }
+  }),
+);
+```
+
+```python Python
+from concurrent.futures import ThreadPoolExecutor
+from kernel import Kernel
+
+kernel = Kernel()
+urls = ['https://shop.example.com/p/1', 'https://shop.example.com/p/2', '...']
+
+def fetch(url: str):
+    browser = kernel.browsers.create(stealth=True)
+    try:
+        response = kernel.browsers.playwright.execute(
+            id=browser.session_id,
+            code=f"""
+              await page.goto({url!r});
+              await page.waitForSelector('[data-testid="size-selector"]');
+              return await page.$$eval(
+                '[data-testid="size-selector"] [role="option"]',
+                (els) => els.map((el) => ({{
+                  size: el.getAttribute('data-size'),
+                  sku: el.getAttribute('data-sku'),
+                }})),
+              );
+            """,
+        )
+        return { 'url': url, 'variants': response.result }
+    finally:
+        kernel.browsers.delete_by_id(browser.session_id)
+
+with ThreadPoolExecutor(max_workers=10) as pool:
+    results = list(pool.map(fetch, urls))
+```
+</CodeGroup>
+
+The agent's job was finding the right selectors. Once you have them, you don't need the agent for the next million requests.
+
+## Checkpoint agent state between steps
+
+After a computer-use step that you can't easily verify from a screenshot (a login, a cart add, a multi-step form), call `playwright.execute` to assert real browser state before letting the agent continue. This is cheaper than another screenshot round-trip, and it is harder for the model to hallucinate around.
+
+<CodeGroup>
+```typescript Typescript/Javascript
+// ... agent runs the login flow via kernel.browsers.computer.* ...
+
+const response = await kernel.browsers.playwright.execute(
+  kernelBrowser.session_id,
+  {
+    code: `
+      const cookies = await context.cookies();
+      const sessionCookie = cookies.find((c) => c.name === 'session');
+      const userBadge = await page.$('[data-testid="user-badge"]');
+      return {
+        loggedIn: Boolean(sessionCookie && userBadge),
+        url: page.url(),
+      };
+    `,
+  },
+);
+
+if (!response.result.loggedIn) {
+  // hand back to the agent with a corrective message, or fail fast
+}
+```
+
+```python Python
+# ... agent runs the login flow via kernel.browsers.computer.* ...
+
+response = kernel.browsers.playwright.execute(
+    id=kernel_browser.session_id,
+    code="""
+      const cookies = await context.cookies();
+      const sessionCookie = cookies.find((c) => c.name === 'session');
+      const userBadge = await page.$('[data-testid="user-badge"]');
+      return {
+        loggedIn: Boolean(sessionCookie && userBadge),
+        url: page.url(),
+      };
+    """,
+)
+
+if not response.result["loggedIn"]:
+    # hand back to the agent with a corrective message, or fail fast
+    ...
+```
+</CodeGroup>
+
+## Choosing between the two
+
+| You want to... | Reach for |
+| --- | --- |
+| Click something the model can see but you can't target with a selector | Computer controls |
+| Navigate through a flow with unpredictable UI (banners, modals) | Computer controls |
+| Drive a vision-language model loop | Computer controls |
+| Extract structured data from the DOM | Playwright execution |
+| Run the same step deterministically every time | Playwright execution |
+| Assert a precondition or postcondition between agent steps | Playwright execution |
+| Capture a full-page screenshot for a report | Playwright execution |
+
+## Going deeper
+
+- [Control](/introduction/control) — the four primitives Kernel exposes and when to use each.
+- [Computer Controls reference](/browsers/computer-controls) — every mouse, keyboard, and screen primitive.
+- [Playwright Execution reference](/browsers/playwright-execution) — return values, timeouts, errors.
+- [Computer use integrations](/integrations/computer-use/anthropic) — drop-in examples for Anthropic, Gemini, OpenAI, and more.
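
Note on the "Expose this to your model as a tool" step in `introduction/patterns.mdx`: the recipe names the tool (`extract_variants`) but does not show how a tool call gets routed to playwright execution. The following is a minimal sketch of that routing, not part of the commit. The tool schema shape, the `handle_tool_call` helper, and the injected `execute` parameter are all hypothetical names introduced for illustration; the only thing taken from the recipe is the JavaScript payload and the `id=` / `code=` / `.result` calling convention it uses.

```python
from typing import Any, Callable

# JavaScript for playwright execution, copied from the first recipe in patterns.mdx.
EXTRACT_VARIANTS_JS = """
  await page.waitForSelector('[data-testid="size-selector"]');
  const variants = await page.$$eval(
    '[data-testid="size-selector"] [role="option"]',
    (els) => els.map((el) => ({
      size: el.getAttribute('data-size'),
      sku: el.getAttribute('data-sku'),
      inStock: el.getAttribute('aria-disabled') !== 'true',
    })),
  );
  return { url: page.url(), variants };
"""

# Hypothetical tool definition; the exact schema depends on your model provider.
EXTRACT_VARIANTS_TOOL = {
    "name": "extract_variants",
    "description": "Return size, SKU, and stock status for each variant on the current page.",
    "input_schema": {"type": "object", "properties": {}},
}


def handle_tool_call(name: str, session_id: str, execute: Callable[..., Any]) -> Any:
    """Run the extraction when the model calls `extract_variants`.

    `execute` is assumed to accept `id=` and `code=` and to return an object
    with a `.result` attribute, matching how the recipe calls
    `kernel.browsers.playwright.execute`.
    """
    if name != EXTRACT_VARIANTS_TOOL["name"]:
        raise ValueError(f"unknown tool: {name}")
    response = execute(id=session_id, code=EXTRACT_VARIANTS_JS)
    return response.result
```

In an agent loop you would presumably pass `kernel.browsers.playwright.execute` as `execute` and `kernel_browser.session_id` as `session_id`. Injecting the callable keeps the handler testable without a live browser session.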
