chore(skills): Add skill-creator and update managed agent skills #19713
Add `skill-creator` skill from anthropics/skills for creating and optimizing agent skills. Update `dotagents` and `skill-scanner` skills to their latest versions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```python
accumulated_json = ""
else:
    return False
```
Bug: The function returns after checking only the first tool use in a response, in both the streaming and fallback paths. This will miss valid skill triggers if they aren't the first tool used.
Severity: HIGH
Suggested Fix
In the non-streaming path, move the `return triggered` statement outside of the `for` loop so that all `content_items` are checked. In the streaming path, remove the `else: return False` block and continue processing events to check for subsequent tool uses within the same response.
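As a minimal sketch of the suggested fix for the non-streaming path: the function and item shape below (`skill_was_triggered`, dict items with `type`/`name` keys) are hypothetical stand-ins for the actual `run_eval.py` code, but they illustrate moving the `return` outside the loop so every `tool_use` item is inspected.

```python
def skill_was_triggered(content_items):
    """Return True if ANY tool_use item invokes Skill or Read.

    Hypothetical sketch: content_items is assumed to be a list of dicts
    like {"type": "tool_use", "name": "Skill"}; the real record shape
    in run_eval.py may differ.
    """
    triggered = False
    for item in content_items:
        if item.get("type") == "tool_use" and item.get("name") in ("Skill", "Read"):
            triggered = True
    # The return sits OUTSIDE the loop, so a Skill/Read call that is not
    # the first tool use is still detected.
    return triggered
```

With the buggy indentation (`return triggered` inside the loop), the second example below would incorrectly report `False`.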
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.
Location: .agents/skills/skill-creator/scripts/run_eval.py#L142
Potential issue: The skill trigger detection logic in `run_eval.py` can lead to false negatives. In both the streaming path (around line 142) and the non-streaming fallback path (line 164), the function returns prematurely. The streaming path explicitly returns `False` if the first tool is not `Skill` or `Read`. The fallback path has an indentation error, placing `return triggered` inside the `for` loop, causing it to exit after checking only the first `tool_use` item. This means that if an assistant response contains multiple tool calls and the relevant `Skill` or `Read` call is not the first one, it will be missed, incorrectly reporting that the skill was not triggered.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
| "analyzer_model": "<model-name>", | ||
| "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"), | ||
| "evals_run": eval_ids, | ||
| "runs_per_configuration": 3 |
Hardcoded `runs_per_configuration` ignores actual run count
Low Severity
The `runs_per_configuration` metadata field is hardcoded to `3` instead of being calculated from the actual data in `results`. This causes both the viewer and the generated `benchmark.md` to display incorrect information about how many runs were performed per configuration, regardless of the actual number of runs the user executed.
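A possible fix is to derive the value from the data itself. The helper and record shape below are hypothetical (the real `results` structure in this script is not shown here); the sketch assumes each result carries an `eval_id` and `model` identifying its configuration.

```python
from collections import Counter

def runs_per_configuration(results):
    """Derive runs-per-configuration from the results instead of hardcoding 3.

    Hypothetical sketch: assumes each result is a dict with "eval_id" and
    "model" keys forming the configuration key. If runs are balanced, all
    counts agree; otherwise we report the maximum observed.
    """
    counts = Counter((r["eval_id"], r["model"]) for r in results)
    return max(counts.values(), default=0)
```

The metadata entry then becomes `"runs_per_configuration": runs_per_configuration(results)`, keeping the viewer and `benchmark.md` consistent with what was actually run.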
```python
data_json = json.dumps(embedded)
return template.replace("/*__EMBEDDED_DATA__*/", f"const EMBEDDED_DATA = {data_json};")
```
Embedded JSON breaks viewer on `</script>` in output files
Medium Severity
`json.dumps` does not escape `</script>` sequences within string values. When `generate_html` embeds the JSON payload directly inside an HTML `<script>` tag, any text output file containing `</script>` will cause the browser's HTML parser to prematurely close the script block, completely breaking the viewer. The standard mitigation is to replace `</` with `<\/` in the serialized JSON before embedding.
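That mitigation works because `<\/` is a legal JSON escape that parses back to `</`, so the embedded payload round-trips unchanged while never containing a literal `</script>`. A minimal sketch (the function name is illustrative, not the actual `generate_html`):

```python
import json

def embed_json_for_script_tag(embedded):
    """Serialize data for safe inline <script> embedding.

    Replacing "</" with "<\\/" prevents a "</script>" inside string
    values from prematurely terminating the script block; the escaped
    form is still valid JSON and decodes back to "</".
    """
    data_json = json.dumps(embedded).replace("</", "<\\/")
    return f"const EMBEDDED_DATA = {data_json};"
```

In the snippet above, the fix is a one-line change: `data_json = json.dumps(embedded).replace("</", "<\\/")` before the `template.replace` call.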


Adds the official `skill-creator` skill from anthropics/skills for creating and optimizing agent skills. We should use this going forward with every skill. ref https://claude.com/blog/improving-skill-creator-test-measure-and-refine-agent-skills

Update `dotagents` and `skill-scanner` skills to their latest versions.

Closes #19760 (added automatically)