fix(skills): address NVSkills validator findings on user skills (stacked on #4438) by jyaunches · Pull Request #4455 · NVIDIA/NemoClaw

jyaunches · 2026-05-28T18:46:30Z

Summary

Stacked on top of #4438. Applies the NVSkills validator's verbatim fix recommendations from the most recent failed validation runs on export-nemoclaw-user-agent-skills (GitLab pipeline 52953267, job 328478588).

This PR edits source skills only (.agents/skills/<name>/SKILL.md). A follow-up commit will mirror the changes into skills/<name>/ per the publishing flow introduced in #4448.

What this addresses

Per the failure report, the validator flagged 5 [MEDIUM] static lints and 5 HIGH agent-eval regressions across the two user skills. This PR targets the structural items the validator can verify deterministically:

Validator finding	Fix in this PR
`metadata.author` missing (MEDIUM × both)	Added `Miyoung Choi <miyoungc@nvidia.com>`
`metadata.tags` missing (MEDIUM × both)	Added scoped tag lists
"Author not specified in metadata" (MEDIUM)	Resolved by metadata.author
Missing `## Instructions` (MEDIUM × both)	Added with explicit step sequence
Missing `## Examples` (MEDIUM × both)	Added with 3–4 grounded examples
Discoverability HIGH on `nemoclaw-user-agent-skills` — "codex conflates with nemoclaw-skills-guide"	Rewrote `description:` to scope to install/load and redirect discovery queries to nemoclaw-skills-guide
"Cap or sequence tool invocations so skill_efficiency stops collapsing to 0.0"	Both `## Instructions` sections now cap to one Read of one reference, no chaining

Out of scope (follow-up commit)

Mirror edits into skills/nemoclaw-user-agent-skills/SKILL.md and skills/nemoclaw-user-get-started/SKILL.md so NVSkills CI sees them when signing.
evals/evals.json rubric tuning.

Test plan

After mirror commit lands, comment /nvskills-ci on #4438 and confirm:

Static QUALITY/SCHEMA findings drop from 5 → 0.
Discoverability score on nemoclaw-user-agent-skills rises from 0.50 toward the 0.70 threshold.
skill_efficiency no longer collapses to 0.0 on case 003 trials.

Apply the validator's fix recommendations from PR #4438 NVSkills CI runs (GitLab pipeline 52953267, job 328478588). Targets the static QUALITY/SCHEMA findings and the discoverability/efficiency dimension regressions called out on both nemoclaw-user-agent-skills and nemoclaw-user-get-started. Frontmatter (both skills): - Add metadata.author and metadata.tags so SKILL_SPEC recommended fields are populated. Clears 3x [MEDIUM] QUALITY findings per skill. Description (nemoclaw-user-agent-skills): - Rewrite to disambiguate from nemoclaw-skills-guide. Validator finding #1 on the 15:40 run identified codex conflating the two skills, dropping discoverability to 0.50 with skill_efficiency=0.0 on case 003. New description scopes this skill explicitly to install/load workflows and redirects discovery queries to nemoclaw-skills-guide. SKILL.md structure (both skills): - Add ## Instructions section. Clears [MEDIUM] SCHEMA finding 'Missing recommended section' and gives the LLM judge a stable anchor for instruction-quality scoring. - Add ## Examples section. Clears the matching [MEDIUM] SCHEMA finding. - Both Instructions sections explicitly cap tool invocations (one Read of one reference, no chaining), addressing validator fix item #4 'Cap or sequence tool invocations so skill_efficiency stops collapsing to 0.0 on multiple trials.' Out of scope for this commit: - Mirror to skills/<name>/ for NVSkills CI signing (separate commit). - evals.json adjustments.

copy-pr-bot · 2026-05-28T18:46:34Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-05-28T18:46:37Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 57956732-46d0-4858-ad30-72530c70c962

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/4438-skill-validator-findings

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-28T18:47:38Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/export-nemoclaw-user-agent-skills
Head: HEAD
Confidence: high

Required E2E

None. No existing E2E job is recommended for this PR because the changes are confined to assistant skill Markdown content and metadata. They cannot affect NemoClaw runtime behavior, installer execution, sandbox lifecycle, credentials, network policy, inference routing, deployment, or OpenClaw sandbox operation.

Optional E2E

None.

New E2E recommendations

agent-skills-assistant-flows (medium): There does not appear to be an existing E2E job that exercises .agents/skills behavior from an assistant user's perspective, such as choosing the correct skill, reading only the intended reference once, enforcing tool budgets, and handing off to sibling skills. The existing skills-frontmatter coverage is a repository validation test rather than an end-to-end assistant-flow check.
- Suggested test: Add an E2E or scenario-style skills harness that simulates prompts for nemoclaw-user-get-started and nemoclaw-user-agent-skills, verifies selected references and stop conditions, and validates that quickstart/install guidance remains consistent with supported onboarding paths.

github-actions · 2026-05-28T18:47:39Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/export-nemoclaw-user-agent-skills
Head: HEAD
Confidence: high

Required scenario E2E

None. No scenario workflow, scenario metadata, scenario runtime, or validation-suite files changed.

Optional scenario E2E

None.

Relevant changed files

None.

jyaunches · 2026-05-28T18:48:35Z

Recreating off main instead of stacked on #4438.

github-actions · 2026-05-28T18:50:09Z

PR Review Advisor

Findings: 3 needs attention, 1 worth checking, 0 nice ideas
Top item: Align the changed skills with their evals and generated/exported sources

Review findings

🛠️ Needs attention

Discoverability fix conflicts with the existing skill evals (.agents/skills/nemoclaw-user-agent-skills/SKILL.md:23): The PR claims to fix the HIGH discoverability regression by redirecting catalog/discovery queries to `nemoclaw-skills-guide`, and the changed skill now says to use this skill only for installing or loading skills. However, the existing evals for `nemoclaw-user-agent-skills` still expect broad discovery/catalog/trust questions to load this same skill and `references/agent-skills.md`. That means the repository evidence for the claimed discoverability behavior is internally contradictory.
- Recommendation: Either update `.agents/skills/nemoclaw-user-agent-skills/evals/evals.json` so discovery/catalog cases expect `nemoclaw-skills-guide`, or narrow the new routing text so it still satisfies the existing eval contract. Keep the skill description, examples, and eval expectations aligned in the same change.
- Evidence: Changed line 23 says discovery questions should hand off to `nemoclaw-skills-guide`; existing evals ask to "find a skill that can guide installation, policy, inference, or operations" and "understand what each skill is designed to do" while still expecting `expected_skill: "nemoclaw-user-agent-skills"`.
Skill metadata and instruction fixes are not applied at the generation source or exported copy (.agents/skills/nemoclaw-user-get-started/SKILL.md:5): The patch edits only `.agents/skills/.../SKILL.md`, but nearby repository files show these user skills are generated from docs and also exported under `skills/`. The source docs still have the old `description-agent` values and no new `## Instructions` or `## Examples`, while `skills/nemoclaw-user-agent-skills/SKILL.md` and `skills/nemoclaw-user-get-started/SKILL.md` still contain the old frontmatter without the new metadata. This leaves the validator fixes vulnerable to being overwritten by the next docs-to-skills refresh and absent from the exported/published skill copy.
- Recommendation: Move the structural fixes into the documented source-of-truth path: update the source docs or generator so regenerated `.agents/skills` files include the required metadata/instructions/examples, and mirror/regenerate the corresponding `skills/` exports in the same stack before relying on the validator result.
- Evidence: `scripts/docs-to-skills.py` documents generation from `docs/` into `.agents/skills/`; `docs/resources/agent-skills.mdx` and `docs/get-started/quickstart.mdx` retain old agent descriptions; `skills/README.md` says exported skills are generated from `.agents/skills/`; exported `skills/.../SKILL.md` files remain unchanged.
Quickstart still contains reference-loading prompts that undermine the one-reference budget (.agents/skills/nemoclaw-user-get-started/SKILL.md:35): The new quickstart instructions cap the skill to at most one reference read, but the same skill body still tells the assistant/user to review prerequisites before following the guide and later explicitly says to load `references/quickstart-details.md`. Those remaining prompts can cause the agent to chain references despite the PR's stated acceptance goal of preventing tool-efficiency collapse.
- Recommendation: Reconcile the old body text with the new fixed sequence. For example, make prerequisite and detail links passive references, or condition them under the single-reference selection table, so there is only one authoritative rule for when a reference may be loaded.
- Evidence: Line 35 adds "Load at most one reference" and line 47 says not to chain references; line 26 still says to complete reviewing prerequisites, and line 195 still says "Load [references/quickstart-details.md]" for detailed wizard steps.

🔎 Worth checking

Source-of-truth review needed: Generated user skill content and exported skill copies: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: Changed files add metadata and instructions only in `.agents/skills`; source docs and exported `skills/.../SKILL.md` copies remain old, and existing evals still expect broad discovery queries to use `nemoclaw-user-agent-skills`.

🌱 Nice ideas

None.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

jyaunches closed this May 28, 2026

jyaunches deleted the fix/4438-skill-validator-findings branch May 28, 2026 18:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(skills): address NVSkills validator findings on user skills (stacked on #4438)#4455

fix(skills): address NVSkills validator findings on user skills (stacked on #4438)#4455
jyaunches wants to merge 1 commit into
export-nemoclaw-user-agent-skillsfrom
fix/4438-skill-validator-findings

jyaunches commented May 28, 2026

Uh oh!

copy-pr-bot Bot commented May 28, 2026

Uh oh!

coderabbitai Bot commented May 28, 2026

Review skipped

Uh oh!

github-actions Bot commented May 28, 2026

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Uh oh!

github-actions Bot commented May 28, 2026

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

jyaunches commented May 28, 2026

Uh oh!

github-actions Bot commented May 28, 2026

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jyaunches commented May 28, 2026

Summary

What this addresses

Out of scope (follow-up commit)

Test plan

Uh oh!

copy-pr-bot Bot commented May 28, 2026

Uh oh!

coderabbitai Bot commented May 28, 2026

Review skipped

Uh oh!

github-actions Bot commented May 28, 2026

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Uh oh!

github-actions Bot commented May 28, 2026

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

jyaunches commented May 28, 2026

Uh oh!

github-actions Bot commented May 28, 2026

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants