Update yutori-computer-use templates for n1-latest #113
dprevoznik wants to merge 14 commits into main
…o user
- Model: n1-preview-2025-11 -> n1-latest
- Viewport width: 1200 -> 1280 (Yutori's recommended resolution)
- Message role: observation -> user for screenshots and tool results
- Clean up JSDoc/docstring comments referencing old defaults

Co-authored-by: Cursor <cursoragent@cursor.com>
Simplify templates to only support Kernel's Computer Controls API. Removes playwright-computer.ts/py, BrowserMode type, cdpWsUrl/mode from loop options and entrypoint payloads, and playwright deps. Co-authored-by: Cursor <cursoragent@cursor.com>
- Rename click -> left_click, add double_click/triple_click/right_click
- Rename center_coordinates -> coordinates throughout
- Remove stop and read_texts_and_links action handlers
- Add WebP screenshot conversion (sharp for TS, Pillow for Python)
- Add sharp/Pillow to dependencies

Co-authored-by: Cursor <cursoragent@cursor.com>
- Parse actions from response.tool_calls instead of JSON in content
- Use role: "tool" with tool_call_id for tool results
- Combine task text + initial screenshot in single user message
- Stop condition: no tool_calls in response (model returns plain content)
- Update MIME type to image/webp
- Remove parseN1Response / _parse_n1_response JSON parsing
- Update scaleCoordinates for coordinates field (was center_coordinates)

Co-authored-by: Cursor <cursoragent@cursor.com>
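The tool-call handling described in this commit can be sketched roughly as follows. This is a minimal illustration, not the template's actual code: it assumes the assistant message is a plain dict in the OpenAI-compatible shape (each tool call carrying `id` and a `function` with `name` and a JSON-string `arguments`), and the helper names `extract_actions` and `tool_result_message` are hypothetical.

```python
import json
from typing import Any


def extract_actions(assistant_message: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one action per tool call. An empty list is the stop condition."""
    actions = []
    for call in assistant_message.get("tool_calls") or []:
        fn = call["function"]
        actions.append(
            {
                "tool_call_id": call["id"],
                "action_type": fn["name"],
                # Arguments arrive as a JSON string in the OpenAI-compatible format.
                "arguments": json.loads(fn.get("arguments") or "{}"),
            }
        )
    return actions


def tool_result_message(tool_call_id: str, webp_base64: str) -> dict[str, Any]:
    """Build a role "tool" message carrying a WebP screenshot as a data URL."""
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/webp;base64,{webp_base64}"},
            }
        ],
    }
```

A loop built on these helpers would stop as soon as `extract_actions` returns an empty list and treat the message's plain `content` as the final answer.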
n1-latest returns plain text content (not JSON), so remove JSON parsing from the fallback message extraction in both entrypoints. Co-authored-by: Cursor <cursoragent@cursor.com>
- Update viewport docs (now using 1280 directly)
- Update action table with new click variants, remove stop
- Add WebP screenshot section

Co-authored-by: Cursor <cursoragent@cursor.com>
Add a .cursor/plans entry to .gitignore and a 'Cursor' comment header to prevent committing cursor plan files. Also include a blank line for readability.
Rename the max_tokens parameter to max_completion_tokens in the yutori-computer-use sampling loop templates (Python and TypeScript). Update function signatures, default values (keeps 4096), interface/property name in TS, and the client.chat.completions.create payload key to use max_completion_tokens. This aligns the parameter name with the completion API field and preserves existing behavior.
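As a rough sketch of the rename, the request payload might be assembled like this. The helper name `build_request` is hypothetical, and the payload holds only the fields mentioned in this PR; the real templates pass these keys to `client.chat.completions.create`.

```python
from typing import Any


def build_request(
    messages: list[dict[str, Any]],
    model: str = "n1-latest",
    max_completion_tokens: int = 4096,  # default kept from the old max_tokens
) -> dict[str, Any]:
    """Assemble keyword arguments for a chat completions call."""
    return {
        "model": model,
        "messages": messages,
        # Renamed key: previously sent as "max_tokens".
        "max_completion_tokens": max_completion_tokens,
    }
```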
- Payload: optional kiosk (TS) / kiosk (Python) on cua-task
- Session: create browser with kiosk_mode when true (TS + Python)
- Loop: pass kioskMode/kiosk_mode into sampling loop and ComputerTool
- ComputerTool: accept kioskMode/kiosk_mode param (no behavior change yet)
- goto_url still uses Computer Controls (Ctrl+L); Playwright path in Step 2

Co-authored-by: Cursor <cursoragent@cursor.com>
When kiosk_mode is true, goto_url calls Playwright Execution API (page.goto) instead of Computer Controls so navigation works without the address bar. Non-kiosk unchanged (Ctrl+L + type + Enter). Co-authored-by: Cursor <cursoragent@cursor.com>
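The routing described here could look roughly like the sketch below. The three callables stand in for the Playwright Execution API and the Computer Controls key/typing calls. They are hypothetical placeholders rather than real Kernel SDK signatures, and the key names are assumptions.

```python
import json
from typing import Callable


def goto_url(
    url: str,
    kiosk_mode: bool,
    run_playwright: Callable[[str], None],  # hypothetical: executes Playwright code
    press_keys: Callable[[list[str]], None],  # hypothetical: Computer Controls key press
    type_text: Callable[[str], None],  # hypothetical: Computer Controls typing
) -> str:
    """Navigate via Playwright in kiosk mode, otherwise via the address bar."""
    if kiosk_mode:
        # No address bar in kiosk mode, so navigate with page.goto instead.
        # json.dumps quotes the URL safely for embedding in the script.
        run_playwright(f"await page.goto({json.dumps(url)})")
        return "playwright"
    # Non-kiosk path is unchanged: focus the address bar, type, press Enter.
    press_keys(["ctrl+l"])
    type_text(url)
    press_keys(["Return"])
    return "computer_controls"
```

Passing the backends in as callables keeps the branch easy to test without a live browser session.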
Clarify Yutori testing instructions by replacing the previous 'computer_use'/'playwright' mode notes with a single guidance to test both the default browser and the 'kiosk: true' (Playwright) option. Update kernel invoke examples (TypeScript and Python) to remove the 'mode' field and use either no mode (default) or 'kiosk': true. Adjust the automated runtime test matrix to list 'default' and 'kiosk: true' entries. These changes simplify the documentation and align examples with the current CLI payload format.
Update Python and TypeScript yutori-computer-use README templates: replace the generic example invoke payload with a concrete Magnitasks kanban drag-and-drop scenario, and add a new "Kiosk mode" section. The new section explains when to use kiosk mode (recording or single-site automation), notes that the agent may still try the address bar which can slow sessions, and provides example invoke commands for default (non-kiosk) and kiosk usage. Changes applied to pkg/templates/python/yutori-computer-use/README.md and pkg/templates/typescript/yutori-computer-use/README.md to clarify usage and improve replay/automation guidance.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Replace returning an error object with throwing a ToolError when Playwright goto fails in ComputerTool. This standardizes error handling so callers can catch exceptions and preserves the original response error or falls back to 'Playwright goto failed'.
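In Python terms, the change amounts to something like the following sketch. The response shape (a dict with `success` and `error` keys) and the helper name `ensure_goto_succeeded` are assumptions for illustration; only the `ToolError` name and the fallback message come from the commit description.

```python
from typing import Any


class ToolError(Exception):
    """Raised when a tool action fails, so callers can catch it uniformly."""


def ensure_goto_succeeded(response: dict[str, Any]) -> None:
    """Raise ToolError instead of returning an error object on failure."""
    if not response.get("success"):
        # Preserve the original error text, or fall back to a generic message.
        raise ToolError(response.get("error") or "Playwright goto failed")
```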
Sayan- left a comment
Great stuff!
A few items to address before merge:
1. Python loop.py — raise api_error should be bare raise
    except Exception as api_error:
        print(f"API call failed: {api_error}")
        raise api_error  # ← resets the traceback to this line
Should be:
    except Exception as api_error:
        print(f"API call failed: {api_error}")
        raise  # preserves original traceback for debugging
2. TypeScript loop.ts — as unknown as string cast hides API extension
    content: [
      {
        type: 'image_url',
        image_url: {
          url: `data:image/webp;base64,${result.base64Image}`,
        },
      },
    ] as unknown as string,
This double-cast is a lie to the type system. Consider replacing it with a comment such as "// @ts-expect-error Yutori n1 accepts image content arrays in tool messages". That documents the API extension and will surface a diagnostic if the OpenAI SDK types ever add native support for this.
3. README action tables are incomplete (both Python and TypeScript)
Both READMEs document 8 actions but the code handles 13. Missing from the tables: key_press, hover, drag, wait, refresh.
4. Python loop.py — manual assistant message serialization is brittle (nit)
The manual dict construction:
    assistant_dict: dict[str, Any] = {
        "role": "assistant",
        "content": assistant_message.content or "",
    }
    if assistant_message.tool_calls:
        assistant_dict["tool_calls"] = [...]
This works today but could silently break if the API adds fields. Consider using assistant_message.model_dump(exclude_none=True) to future-proof this (similar to how the TypeScript version pushes the SDK object directly).
Summary
- … n1-latest model API, which uses OpenAI-compatible tool_calls format instead of the previous custom format
- … goto_url in kiosk mode)
- … max_tokens to max_completion_tokens, update viewport to 1280x800, simplify the sampling loop and tool mapping

Test plan
All 13 action types were tested against live Kernel browser sessions for both Python and TypeScript implementations (27 tests each, all passing):
- left_click, double_click, triple_click, right_click
- hover, drag
- type (basic, clear_before_typing, press_enter_after)
- key_press (single key, combos, modifier mapping)
- scroll (up, down, left, right)
- goto_url, go_back, refresh, wait

Made with Cursor
Note
Medium Risk
Changes core agent-loop and action execution semantics (tool-calls, coordinate/action schema, screenshot encoding) and adds new dependencies, which could cause runtime regressions in browser automation if the API expectations differ.
Overview
Updates the Python and TypeScript yutori-computer-use templates to target Yutori n1-latest, switching the sampling loop from the prior JSON-in-content protocol to OpenAI-compatible tool_calls/tool messages and renaming token budgeting to max_completion_tokens.

Simplifies execution to a single Computer Controls tool path by removing the Playwright "mode" implementation, while adding an optional kiosk/kioskMode that launches the browser in kiosk mode and routes goto_url through browsers.playwright.execute when enabled.

Aligns defaults and payloads with the new API: viewport defaults move to 1280x800, actions change to coordinates plus distinct click types (left_click/double_click/triple_click/right_click), screenshots are converted from PNG to WebP (new Pillow/sharp deps), and QA/docs/examples are updated accordingly.

Written by Cursor Bugbot for commit b31643b.