Merged
18 commits
52a5b2d
yutori templates: update model to n1-latest, viewport to 1280, role t…
dprevoznik Feb 18, 2026
9cd6181
yutori templates: remove Playwright mode
dprevoznik Feb 18, 2026
c986618
yutori templates: update tool mapping for n1-latest API
dprevoznik Feb 18, 2026
f262fbc
yutori templates: rewrite sampling loop for n1-latest tool_calls format
dprevoznik Feb 18, 2026
bed71fa
yutori templates: simplify extractLastAssistantMessage
dprevoznik Feb 18, 2026
73fc850
yutori templates: update READMEs for n1-latest
dprevoznik Feb 18, 2026
9bc2f02
Ignore .cursor/plans in .gitignore
dprevoznik Feb 18, 2026
5f44c7d
Rename max_tokens to max_completion_tokens
dprevoznik Feb 18, 2026
b1a5160
yutori-cua: add kiosk mode option (Step 1)
dprevoznik Feb 18, 2026
e7b9819
yutori-cua: use Playwright for goto_url when kiosk mode (Step 2)
dprevoznik Feb 18, 2026
17c73fc
Update Yutori docs and example payloads
dprevoznik Feb 18, 2026
5bd582c
Add kiosk mode docs and update usage payloads
dprevoznik Feb 18, 2026
3541bbb
Deslop
dprevoznik Feb 18, 2026
b31643b
Throw ToolError on Playwright goto failure
dprevoznik Feb 18, 2026
1722268
Use bare raise to preserve original traceback in Python loop
dprevoznik Feb 18, 2026
639c0fa
Replace double type cast with @ts-expect-error for Yutori image content
dprevoznik Feb 18, 2026
6d878f3
Use model_dump for assistant message serialization in Python loop
dprevoznik Feb 18, 2026
c3757ab
Keep as-unknown-as-string cast with explanatory comment for Yutori im…
dprevoznik Feb 18, 2026
22 changes: 9 additions & 13 deletions .cursor/commands/qa.md
@@ -60,8 +60,6 @@ Here are all valid language + template combinations:
| typescript | claude-agent-sdk | ts-claude-agent-sdk | ts-claude-agent-sdk | Yes | ANTHROPIC_API_KEY |
| typescript | yutori-computer-use | ts-yutori-cua | ts-yutori-cua | Yes | YUTORI_API_KEY |

-> **Note:** The `yutori-computer-use` template supports two modes: `computer_use` (default, full VM screenshots) and `playwright` (viewport-only screenshots via CDP). Both modes should be tested.
-
| python | sample-app | py-sample-app | python-basic | No | - |
| python | gemini-computer-use | py-gemini-cua | python-gemini-cua | Yes | GOOGLE_API_KEY |
| python | captcha-solver | py-captcha-solver | python-captcha-solver | No | - |
@@ -72,9 +70,7 @@ Here are all valid language + template combinations:
| python | claude-agent-sdk | py-claude-agent-sdk | py-claude-agent-sdk | Yes | ANTHROPIC_API_KEY |
| python | yutori-computer-use | py-yutori-cua | python-yutori-cua | Yes | YUTORI_API_KEY |

-> **Yutori Modes:**
-> - `computer_use` (default): Uses Kernel's Computer Controls API with full VM screenshots
-> - `playwright`: Uses Playwright via CDP WebSocket for viewport-only screenshots (optimized for n1 model)
+> **Yutori:** Test both default browser and `"kiosk": true` (uses Playwright for goto_url when kiosk is enabled).

### Create Commands

@@ -275,8 +271,8 @@ kernel invoke ts-magnitude mag-url-extract --payload '{"url": "https://en.wikipe
kernel invoke ts-openai-cua cua-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 5 articles"}'
kernel invoke ts-gemini-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and move the 5 items in the To Do and In Progress items to the Done section of the Kanban board. You are done successfully when the items are moved.", "record_replay": true}'
kernel invoke ts-claude-agent-sdk agent-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 3 stories"}'
-kernel invoke ts-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "computer_use"}'
-kernel invoke ts-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "playwright"}'
+kernel invoke ts-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true}'
+kernel invoke ts-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "kiosk": true}'

# Python apps
kernel invoke python-basic get-page-title --payload '{"url": "https://www.google.com"}'
@@ -287,8 +283,8 @@ kernel invoke python-openai-cua cua-task --payload '{"task": "Go to https://news
kernel invoke python-openagi-cua openagi-default-task -p '{"instruction": "Navigate to https://agiopen.org and click the What is Computer Use? button"}'
kernel invoke py-claude-agent-sdk agent-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 3 stories"}'
kernel invoke python-gemini-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and move the 5 items in the To Do and In Progress items to the Done section of the Kanban board. You are done successfully when the items are moved.", "record_replay": true}'
-kernel invoke python-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "computer_use"}'
-kernel invoke python-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "playwright"}'
+kernel invoke python-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true}'
+kernel invoke python-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "kiosk": true}'
```
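
The Yutori invocations above differ only in the optional `kiosk` flag, so the two payloads can be generated rather than copied by hand. Below is a minimal Python sketch, assuming only the payload keys shown above (`query`, `record_replay`, `kiosk`). The helper name `build_invoke_command` is hypothetical and is not part of the Kernel CLI or the templates.

```python
import json
import shlex


def build_invoke_command(query, kiosk=False, record_replay=False,
                         app="python-yutori-cua", action="cua-task"):
    """Return a `kernel invoke` command line for a Yutori template.

    Hypothetical helper: it only assembles the payload keys that appear
    in the commands above and shell-quotes the resulting JSON.
    """
    payload = {"query": query}
    # Optional flags are added only when enabled, matching the examples,
    # which omit "kiosk" entirely for the default browser.
    if record_replay:
        payload["record_replay"] = True
    if kiosk:
        payload["kiosk"] = True
    return f"kernel invoke {app} {action} --payload {shlex.quote(json.dumps(payload))}"


if __name__ == "__main__":
    task = "Go to https://example.com and describe the page"
    # One command per variant under test: default browser, then kiosk.
    for kiosk in (False, True):
        print(build_invoke_command(task, kiosk=kiosk, record_replay=True))
```

Passing `app="ts-yutori-cua"` would produce the TypeScript variants; `shlex.quote` keeps the JSON intact even if the query contains a single quote.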

## Step 7: Automated Runtime Testing (Optional)
@@ -313,8 +309,8 @@ If the human agrees, invoke each template use the Kernel CLI and collect results
| ts-openai-cua | ts-openai-cua | | |
| ts-gemini-cua | ts-gemini-cua | | |
| ts-claude-agent-sdk | ts-claude-agent-sdk | | |
-| ts-yutori-cua | ts-yutori-cua | | mode: computer_use |
-| ts-yutori-cua | ts-yutori-cua | | mode: playwright |
+| ts-yutori-cua | ts-yutori-cua | | default |
+| ts-yutori-cua | ts-yutori-cua | | kiosk: true |
| py-sample-app | python-basic | | |
| py-captcha-solver | python-captcha-solver | | |
| py-browser-use | python-bu | | |
@@ -323,8 +319,8 @@ If the human agrees, invoke each template use the Kernel CLI and collect results
| py-openagi-cua | python-openagi-cua | | |
| py-claude-agent-sdk | py-claude-agent-sdk | | |
| py-gemini-cua | python-gemini-cua | | |
-| py-yutori-cua | python-yutori-cua | | mode: computer_use |
-| py-yutori-cua | python-yutori-cua | | mode: playwright |
+| py-yutori-cua | python-yutori-cua | | default |
+| py-yutori-cua | python-yutori-cua | | kiosk: true |

Status values:
- **SUCCESS**: App started and returned a result
3 changes: 3 additions & 0 deletions .gitignore
@@ -37,6 +37,9 @@ report.[0-9]_.[0-9]_.[0-9]_.[0-9]_.json

# Finder (MacOS) folder config
.DS_Store

+# Cursor
+.cursor/plans/
kernel

# QA testing directories
36 changes: 30 additions & 6 deletions pkg/templates/python/yutori-computer-use/README.md
@@ -20,7 +20,7 @@ kernel deploy main.py --env-file .env
## Usage

```bash
-kernel invoke python-yutori-cua cua-task --payload '{"query": "Navigate to https://example.com and describe the page"}'
+kernel invoke python-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items."}'
```

## Recording Replays
@@ -35,19 +35,44 @@ kernel invoke python-yutori-cua cua-task --payload '{"query": "Navigate to https

When enabled, the response will include a `replay_url` field with a link to view the recorded session.

+## Kiosk mode
+
+Prefer **non-kiosk mode** by default and when the agent is expected to switch domains via URL. Use **kiosk (`"kiosk": true`)** when: (1) you're recording sessions and want a cleaner UI in the replay, or (2) you're automating on a single website and the combination of the complex site layout and browser chrome (address bar, tabs) may confuse the agent.
+
+Note: In kiosk mode the agent may still try to use the address bar to enter URLs; it's not available, so it will eventually use `goto_url`, but those attempts may result in slowdown of the overall session.
+
+Default (non-kiosk):
+
+```bash
+kernel invoke python-yutori-cua cua-task --payload '{"query": "Navigate to https://example.com, then navigate to ign.com and describe the page"}'
+```
+
+With kiosk (single-site or recording):
+
+```bash
+kernel invoke python-yutori-cua cua-task --payload '{"query": "Enter https://example.com in the search box and then describe the page.", "kiosk": true}'
+```
+
## Viewport Configuration

-Yutori n1 recommends a **1280×800 (WXGA, 16:10)** viewport for best grounding accuracy. Kernel's closest supported viewport is **1200×800 at 25Hz**, which this template uses by default.
+Yutori n1 recommends a **1280×800 (WXGA, 16:10)** viewport for best grounding accuracy.

-> **Note:** n1 outputs coordinates in a 1000×1000 relative space, which are automatically scaled to the actual viewport dimensions. The slight width difference (1200 vs 1280) should have minimal impact on accuracy.
+> **Note:** n1 outputs coordinates in a 1000×1000 relative space, which are automatically scaled to the actual viewport dimensions.

See [Kernel Viewport Documentation](https://www.kernel.sh/docs/browsers/viewport) for all supported configurations.
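
To illustrate the coordinate scaling described in the note above, here is a minimal Python sketch. It assumes a plain linear mapping from the 1000×1000 relative space to a 1280×800 viewport and clamps the result to the last valid pixel. The function name `scale_to_viewport` and the clamping behavior are illustrative assumptions, not the template's actual implementation.

```python
def scale_to_viewport(x_rel, y_rel, width=1280, height=800, space=1000):
    """Map a point from the model's relative space to viewport pixels.

    Assumption: the mapping is linear per axis, so x is scaled by
    width/space and y by height/space independently.
    """
    x = round(x_rel / space * width)
    y = round(y_rel / space * height)
    # Clamp so that a coordinate on the far edge (e.g. 1000) still lands
    # on a valid pixel index inside the viewport.
    x = min(max(x, 0), width - 1)
    y = min(max(y, 0), height - 1)
    return x, y


if __name__ == "__main__":
    # The centre of the relative space maps to the centre of the viewport.
    print(scale_to_viewport(500, 500))
```

Because the two axes are scaled independently, the same relative point maps correctly even though the viewport is not square.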

-## n1 Supported Actions
+## Screenshots
+
+Screenshots are automatically converted to WebP format for better compression across multi-step trajectories, as recommended by Yutori.
+
+## n1-latest Supported Actions

| Action | Description |
|--------|-------------|
-| `click` | Left mouse click at coordinates |
+| `left_click` | Left mouse click at coordinates |
+| `double_click` | Double-click at coordinates |
+| `triple_click` | Triple-click at coordinates |
+| `right_click` | Right mouse click at coordinates |
| `scroll` | Scroll page in a direction |
| `type` | Type text into focused element |
| `key_press` | Send keyboard input |
@@ -57,7 +82,6 @@ See [Kernel Viewport Documentation](https://www.kernel.sh/docs/browsers/viewport
| `refresh` | Reload current page |
| `go_back` | Navigate back in history |
| `goto_url` | Navigate to a URL |
-| `stop` | End task with final answer |

## Resources
