Update yutori-computer-use templates for n1-latest #113

Open
dprevoznik wants to merge 14 commits into main from danny/n1-latest-breaking-update

@dprevoznik dprevoznik commented Feb 18, 2026

Summary

  • Update yutori templates (Python + TypeScript) to the n1-latest model API, which uses OpenAI-compatible tool_calls format instead of the previous custom format
  • Remove Playwright computer mode in favor of a single Computer Controls path (Playwright is still used for goto_url in kiosk mode)
  • Add kiosk mode option that hides browser chrome and uses Playwright for navigation
  • Rename max_tokens to max_completion_tokens, update viewport to 1280x800, simplify the sampling loop and tool mapping

Test plan

All 13 action types were tested against live Kernel browser sessions for both Python and TypeScript implementations (27 tests each, all passing):

| Action | Status |
| --- | --- |
| left_click, double_click, triple_click, right_click | Pass |
| hover, drag | Pass |
| type (basic, clear_before_typing, press_enter_after) | Pass |
| key_press (single key, combos, modifier mapping) | Pass |
| scroll (up, down, left, right) | Pass |
| goto_url, go_back, refresh, wait | Pass |
| Error handling (unknown action, missing required fields, invalid direction) | Pass |
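
The error-handling row above covers three failure cases: an unknown action, missing required fields, and an invalid scroll direction. The following is a minimal sketch of that kind of validation, not the template's actual code. `ActionError`, `REQUIRED_FIELDS`, and `validate_action` are hypothetical names, and every field name other than `coordinates` (for example `text`, `key`, `direction`, `url`, `start_coordinates`) is an assumption made for illustration.

```python
"""Sketch: validating a model-issued action before executing it."""
from typing import Any


class ActionError(ValueError):
    """Raised for an action the executor cannot run (hypothetical name)."""


# Assumed required fields per action. Only `coordinates` is named in the PR;
# the other field names are illustrative guesses.
REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "left_click": ("coordinates",),
    "double_click": ("coordinates",),
    "triple_click": ("coordinates",),
    "right_click": ("coordinates",),
    "hover": ("coordinates",),
    "drag": ("start_coordinates", "coordinates"),
    "type": ("text",),
    "key_press": ("key",),
    "scroll": ("coordinates", "direction"),
    "goto_url": ("url",),
    "go_back": (),
    "refresh": (),
    "wait": (),
}
SCROLL_DIRECTIONS = {"up", "down", "left", "right"}


def validate_action(name: str, args: dict[str, Any]) -> None:
    """Raise ActionError for the three failure cases listed in the test plan."""
    if name not in REQUIRED_FIELDS:
        raise ActionError(f"unknown action: {name}")
    missing = [field for field in REQUIRED_FIELDS[name] if field not in args]
    if missing:
        raise ActionError(f"{name} missing required fields: {', '.join(missing)}")
    if name == "scroll" and args["direction"] not in SCROLL_DIRECTIONS:
        raise ActionError(f"invalid scroll direction: {args['direction']}")
```

Validating before dispatch keeps each handler free of repeated argument checks and gives the model one consistent error message format to react to.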

Made with Cursor


Note

Medium Risk
Changes core agent-loop and action execution semantics (tool-calls, coordinate/action schema, screenshot encoding) and adds new dependencies, which could cause runtime regressions in browser automation if the API expectations differ.

Overview
Updates the Python and TypeScript yutori-computer-use templates to target Yutori n1-latest, switching the sampling loop from the prior JSON-in-content protocol to OpenAI-compatible tool_calls/tool messages and renaming token budgeting to max_completion_tokens.

Simplifies execution to a single Computer Controls tool path by removing the Playwright “mode” implementation, while adding an optional kiosk/kioskMode that launches the browser in kiosk mode and routes goto_url through browsers.playwright.execute when enabled.

Aligns defaults and payloads with the new API: viewport defaults move to 1280x800, actions change to coordinates plus distinct click types (left_click/double_click/triple_click/right_click), screenshots are converted from PNG to WebP (new Pillow/sharp deps), and QA/docs/examples are updated accordingly.

Written by Cursor Bugbot for commit b31643b. This will update automatically on new commits.

dprevoznik and others added 12 commits February 17, 2026 21:03
…o user

- Model: n1-preview-2025-11 -> n1-latest
- Viewport width: 1200 -> 1280 (Yutori's recommended resolution)
- Message role: observation -> user for screenshots and tool results
- Clean up JSDoc/docstring comments referencing old defaults

Co-authored-by: Cursor <cursoragent@cursor.com>
Simplify templates to only support Kernel's Computer Controls API.
Removes playwright-computer.ts/py, BrowserMode type, cdpWsUrl/mode
from loop options and entrypoint payloads, and playwright deps.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Rename click -> left_click, add double_click/triple_click/right_click
- Rename center_coordinates -> coordinates throughout
- Remove stop and read_texts_and_links action handlers
- Add WebP screenshot conversion (sharp for TS, Pillow for Python)
- Add sharp/Pillow to dependencies

Co-authored-by: Cursor <cursoragent@cursor.com>
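
The renames in the commit above (click to left_click, center_coordinates to coordinates, and the removal of stop and read_texts_and_links) can be illustrated with a small translation helper. This is only a sketch for reasoning about the schema change: `migrate_action` is a hypothetical function that is not part of the PR, and the `action_type` key is an assumed name for the field that carries the action name.

```python
"""Sketch: translating an old-style action dict to the new schema."""
from typing import Any

# Actions whose handlers this PR removes.
REMOVED_ACTIONS = {"stop", "read_texts_and_links"}


def migrate_action(action: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `action` using the new names (hypothetical helper)."""
    name = action.get("action_type")  # assumed key name
    if name in REMOVED_ACTIONS:
        raise ValueError(f"{name} has no handler after this change")
    migrated = dict(action)  # leave the caller's dict untouched
    if name == "click":
        migrated["action_type"] = "left_click"
    if "center_coordinates" in migrated:
        migrated["coordinates"] = migrated.pop("center_coordinates")
    return migrated
```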
- Parse actions from response.tool_calls instead of JSON in content
- Use role: "tool" with tool_call_id for tool results
- Combine task text + initial screenshot in single user message
- Stop condition: no tool_calls in response (model returns plain content)
- Update MIME type to image/webp
- Remove parseN1Response / _parse_n1_response JSON parsing
- Update scaleCoordinates for coordinates field (was center_coordinates)

Co-authored-by: Cursor <cursoragent@cursor.com>
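
The loop changes in the commit above can be sketched roughly as follows, using plain dicts in the OpenAI-compatible message shape. This is an illustration under stated assumptions rather than the template's code: `run_loop`, `call_model`, and `execute_action` are hypothetical names, the model call is injected as a callable so the sketch runs without a network client, and the tool result is passed through as whatever `execute_action` returns.

```python
"""Sketch: a tool_calls-driven sampling loop with a plain-content stop condition."""
import json
from typing import Any, Callable

Message = dict[str, Any]


def run_loop(
    call_model: Callable[[list[Message]], Message],
    execute_action: Callable[[str, dict[str, Any]], Any],
    task: str,
    screenshot_url: str,
    max_steps: int = 10,
) -> str:
    # Task text and the initial screenshot share a single user message.
    messages: list[Message] = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": task},
                {"type": "image_url", "image_url": {"url": screenshot_url}},
            ],
        }
    ]
    for _ in range(max_steps):
        assistant = call_model(messages)
        messages.append(assistant)
        tool_calls = assistant.get("tool_calls") or []
        if not tool_calls:
            # Stop condition: the model answered with plain content.
            return assistant.get("content") or ""
        for call in tool_calls:
            args = json.loads(call["function"]["arguments"])
            result = execute_action(call["function"]["name"], args)
            # Each result is tied back to its call through tool_call_id.
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": result}
            )
    raise RuntimeError("max_steps reached without a final answer")
```

Injecting `call_model` also makes the stop condition easy to exercise in tests with canned replies.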
n1-latest returns plain text content (not JSON), so remove JSON
parsing from the fallback message extraction in both entrypoints.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Update viewport docs (now using 1280 directly)
- Update action table with new click variants, remove stop
- Add WebP screenshot section

Co-authored-by: Cursor <cursoragent@cursor.com>
Add a .cursor/plans entry to .gitignore and a 'Cursor' comment header to prevent committing cursor plan files. Also include a blank line for readability.
Rename the max_tokens parameter to max_completion_tokens in the yutori-computer-use sampling loop templates (Python and TypeScript). Update function signatures, default values (keeps 4096), interface/property name in TS, and the client.chat.completions.create payload key to use max_completion_tokens. This aligns the parameter name with the completion API field and preserves existing behavior.
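
As a rough illustration of the renamed payload key, the sketch below builds the keyword arguments for a chat completion request. `build_request` is a hypothetical helper; the model name, the 4096 default, and the image/webp MIME type come from this PR, while the exact message layout is an assumption.

```python
"""Sketch: request kwargs using max_completion_tokens and a WebP data URL."""
import base64
from typing import Any


def build_request(
    task: str, webp_bytes: bytes, max_completion_tokens: int = 4096
) -> dict[str, Any]:
    """Build kwargs for a chat completion call (hypothetical helper)."""
    encoded = base64.b64encode(webp_bytes).decode("ascii")
    return {
        "model": "n1-latest",
        # Renamed from max_tokens; the default value is unchanged.
        "max_completion_tokens": max_completion_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": task},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/webp;base64,{encoded}"},
                    },
                ],
            }
        ],
    }
```

With an OpenAI-compatible client the result would be passed as `client.chat.completions.create(**build_request(task, webp_bytes))`.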
- Payload: optional kiosk (TS) / kiosk (Python) on cua-task
- Session: create browser with kiosk_mode when true (TS + Python)
- Loop: pass kioskMode/kiosk_mode into sampling loop and ComputerTool
- ComputerTool: accept kioskMode/kiosk_mode param (no behavior change yet)
- goto_url still uses Computer Controls (Ctrl+L); Playwright path in Step 2

Co-authored-by: Cursor <cursoragent@cursor.com>
When kiosk_mode is true, goto_url calls Playwright Execution API
(page.goto) instead of Computer Controls so navigation works without
the address bar. Non-kiosk unchanged (Ctrl+L + type + Enter).

Co-authored-by: Cursor <cursoragent@cursor.com>
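
The routing described in the commit above can be sketched as a single branch on the kiosk flag. This is a sketch, not the template's implementation: `goto_url`'s signature and the three injected callables (`run_playwright`, `press_keys`, `type_text`) are hypothetical stand-ins for the Playwright Execution API and Computer Controls calls, and the key names are assumptions.

```python
"""Sketch: routing goto_url by kiosk mode."""
import json
from typing import Callable


def goto_url(
    url: str,
    kiosk_mode: bool,
    run_playwright: Callable[[str], None],
    press_keys: Callable[[str], None],
    type_text: Callable[[str], None],
) -> None:
    if kiosk_mode:
        # No address bar in kiosk mode, so navigate through Playwright.
        # json.dumps quotes the URL safely inside the code string.
        run_playwright(f"await page.goto({json.dumps(url)})")
        return
    # Non-kiosk path: focus the address bar, type the URL, press Enter.
    press_keys("ctrl+l")
    type_text(url)
    press_keys("Return")
```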
Clarify Yutori testing instructions by replacing the previous 'computer_use'/'playwright' mode notes with a single guidance to test both the default browser and the 'kiosk: true' (Playwright) option. Update kernel invoke examples (TypeScript and Python) to remove the 'mode' field and use either no mode (default) or 'kiosk': true. Adjust the automated runtime test matrix to list 'default' and 'kiosk: true' entries. These changes simplify the documentation and align examples with the current CLI payload format.
Update Python and TypeScript yutori-computer-use README templates: replace the generic example invoke payload with a concrete Magnitasks kanban drag-and-drop scenario, and add a new "Kiosk mode" section. The new section explains when to use kiosk mode (recording or single-site automation), notes that the agent may still try the address bar which can slow sessions, and provides example invoke commands for default (non-kiosk) and kiosk usage. Changes applied to pkg/templates/python/yutori-computer-use/README.md and pkg/templates/typescript/yutori-computer-use/README.md to clarify usage and improve replay/automation guidance.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

@dprevoznik dprevoznik requested a review from Sayan- February 18, 2026 14:28
Remove 'Build tool response message' comments
Replace returning an error object with throwing a ToolError when Playwright goto fails in ComputerTool. This standardizes error handling so callers can catch exceptions and preserves the original response error or falls back to 'Playwright goto failed'.
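
The error-handling change in the commit above might look roughly like this. It is a sketch under assumptions: the `ToolError` here is a local stand-in for the template's class, `ensure_goto_succeeded` is a hypothetical helper, and the response shape (`success` and `error` keys) is assumed. Only the fallback message 'Playwright goto failed' comes from the commit description.

```python
"""Sketch: raising ToolError instead of returning an error object."""
from typing import Any


class ToolError(Exception):
    """Stand-in for the template's tool error type."""


def ensure_goto_succeeded(response: dict[str, Any]) -> None:
    """Raise ToolError when a Playwright goto response reports failure."""
    if not response.get("success"):  # assumed response field
        # Keep the original error text when present, else use the fallback.
        raise ToolError(response.get("error") or "Playwright goto failed")
```

Raising lets callers handle failures with a single `except ToolError` block instead of inspecting every return value.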
@Sayan- Sayan- left a comment

Great stuff!

A few items to address before merge:


1. Python loop.py: raise api_error should be a bare raise

except Exception as api_error:
    print(f"API call failed: {api_error}")
    raise api_error  # ← adds an extra traceback entry at this line

Should be:

except Exception as api_error:
    print(f"API call failed: {api_error}")
    raise  # preserves original traceback for debugging
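
To see the difference concretely, the sketch below compares the traceback frames produced by the two forms. In CPython 3 both keep the original frames, but re-raising the named exception adds one more entry at the re-raise line, which adds noise to the trace. All function names here are illustrative and not taken from loop.py.

```python
"""Sketch: comparing traceback frames for `raise exc` and a bare `raise`."""
import traceback
from typing import Callable


def fail() -> None:
    raise RuntimeError("API call failed")


def reraise_named() -> None:
    try:
        fail()
    except Exception as api_error:
        raise api_error  # adds a second entry for this function


def reraise_bare() -> None:
    try:
        fail()
    except Exception:
        raise  # re-raises without adding an entry


def frame_names(fn: Callable[[], None]) -> list[str]:
    """Return the function names found in the traceback raised by fn()."""
    try:
        fn()
    except Exception as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []
```

Comparing `frame_names(reraise_named)` with `frame_names(reraise_bare)` shows both ending at `fail`, with the named version carrying the additional frame.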

2. TypeScript loop.ts: as unknown as string cast hides API extension

content: [
  {
    type: 'image_url',
    image_url: {
      url: `data:image/webp;base64,${result.base64Image}`,
    },
  },
] as unknown as string,

This double cast is a lie to the type system. Consider replacing it with the comment "// @ts-expect-error Yutori n1 accepts image content arrays in tool messages". That documents the API extension and will surface a diagnostic if the OpenAI SDK types ever add native support for this.


3. README action tables are incomplete (both Python and TypeScript)

Both READMEs document 8 actions but the code handles 13. Missing from the tables: key_press, hover, drag, wait, refresh.


4. Python loop.py — manual assistant message serialization is brittle (nit)

The manual dict construction:

assistant_dict: dict[str, Any] = {
    "role": "assistant",
    "content": assistant_message.content or "",
}
if assistant_message.tool_calls:
    assistant_dict["tool_calls"] = [...]

Works today but could silently break if the API adds fields. Consider using assistant_message.model_dump(exclude_none=True) to future-proof this (similar to how the TypeScript version pushes the SDK object directly).
