Skip to content

fix(skills): address NVSkills validator findings on user skills (stacked on #4438)#4455

Closed
jyaunches wants to merge 1 commit into
export-nemoclaw-user-agent-skillsfrom
fix/4438-skill-validator-findings
Closed

fix(skills): address NVSkills validator findings on user skills (stacked on #4438)#4455
jyaunches wants to merge 1 commit into
export-nemoclaw-user-agent-skillsfrom
fix/4438-skill-validator-findings

Conversation

@jyaunches
Copy link
Copy Markdown
Contributor

Summary

Stacked on top of #4438. Applies the NVSkills validator's verbatim fix recommendations from the most recent failed validation runs on export-nemoclaw-user-agent-skills (GitLab pipeline 52953267, job 328478588).

This PR edits source skills only (.agents/skills/<name>/SKILL.md). A follow-up commit will mirror the changes into skills/<name>/ per the publishing flow introduced in #4448.

What this addresses

Per the failure report, the validator flagged 5 [MEDIUM] static lints and 5 HIGH agent-eval regressions across the two user skills. This PR targets the structural items the validator can verify deterministically:

Validator finding Fix in this PR
metadata.author missing (MEDIUM × both) Added Miyoung Choi <miyoungc@nvidia.com>
metadata.tags missing (MEDIUM × both) Added scoped tag lists
"Author not specified in metadata" (MEDIUM) Resolved by metadata.author
Missing ## Instructions (MEDIUM × both) Added with explicit step sequence
Missing ## Examples (MEDIUM × both) Added with 3–4 grounded examples
Discoverability HIGH on nemoclaw-user-agent-skills — "codex conflates with nemoclaw-skills-guide" Rewrote description: to scope to install/load and redirect discovery queries to nemoclaw-skills-guide
"Cap or sequence tool invocations so skill_efficiency stops collapsing to 0.0" Both ## Instructions sections now cap to one Read of one reference, no chaining

Out of scope (follow-up commit)

  • Mirror edits into skills/nemoclaw-user-agent-skills/SKILL.md and skills/nemoclaw-user-get-started/SKILL.md so NVSkills CI sees them when signing.
  • evals/evals.json rubric tuning.

Test plan

After mirror commit lands, comment /nvskills-ci on #4438 and confirm:

  1. Static QUALITY/SCHEMA findings drop from 5 → 0.
  2. Discoverability score on nemoclaw-user-agent-skills rises from 0.50 toward the 0.70 threshold.
  3. skill_efficiency no longer collapses to 0.0 on case 003 trials.

Apply the validator's fix recommendations from PR #4438 NVSkills CI runs
(GitLab pipeline 52953267, job 328478588). Targets the static QUALITY/SCHEMA
findings and the discoverability/efficiency dimension regressions called out
on both nemoclaw-user-agent-skills and nemoclaw-user-get-started.

Frontmatter (both skills):
- Add metadata.author and metadata.tags so SKILL_SPEC recommended fields
  are populated. Clears 3x [MEDIUM] QUALITY findings per skill.

Description (nemoclaw-user-agent-skills):
- Rewrite to disambiguate from nemoclaw-skills-guide. Validator finding #1
  on the 15:40 run identified codex conflating the two skills, dropping
  discoverability to 0.50 with skill_efficiency=0.0 on case 003. New
  description scopes this skill explicitly to install/load workflows and
  redirects discovery queries to nemoclaw-skills-guide.

SKILL.md structure (both skills):
- Add ## Instructions section. Clears [MEDIUM] SCHEMA finding 'Missing
  recommended section' and gives the LLM judge a stable anchor for
  instruction-quality scoring.
- Add ## Examples section. Clears the matching [MEDIUM] SCHEMA finding.
- Both Instructions sections explicitly cap tool invocations (one Read of
  one reference, no chaining), addressing validator fix item #4 'Cap or
  sequence tool invocations so skill_efficiency stops collapsing to 0.0
  on multiple trials.'

Out of scope for this commit:
- Mirror to skills/<name>/ for NVSkills CI signing (separate commit).
- evals.json adjustments.
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 57956732-46d0-4858-ad30-72530c70c962

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/4438-skill-validator-findings

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/export-nemoclaw-user-agent-skills
Head: HEAD
Confidence: high

Required E2E

  • None. No existing E2E job is recommended for this PR because the changes are confined to assistant skill Markdown content and metadata. They cannot affect NemoClaw runtime behavior, installer execution, sandbox lifecycle, credentials, network policy, inference routing, deployment, or OpenClaw sandbox operation.

Optional E2E

  • None.

New E2E recommendations

  • agent-skills-assistant-flows (medium): There does not appear to be an existing E2E job that exercises .agents/skills behavior from an assistant user's perspective, such as choosing the correct skill, reading only the intended reference once, enforcing tool budgets, and handing off to sibling skills. The existing skills-frontmatter coverage is a repository validation test rather than an end-to-end assistant-flow check.
    • Suggested test: Add an E2E or scenario-style skills harness that simulates prompts for nemoclaw-user-get-started and nemoclaw-user-agent-skills, verifies selected references and stop conditions, and validates that quickstart/install guidance remains consistent with supported onboarding paths.

@github-actions
Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/export-nemoclaw-user-agent-skills
Head: HEAD
Confidence: high

Required scenario E2E

  • None. No scenario workflow, scenario metadata, scenario runtime, or validation-suite files changed.

Optional scenario E2E

  • None.

Relevant changed files

  • None.

@jyaunches
Copy link
Copy Markdown
Contributor Author

Recreating off main instead of stacked on #4438.

@jyaunches jyaunches closed this May 28, 2026
@jyaunches jyaunches deleted the fix/4438-skill-validator-findings branch May 28, 2026 18:48
@github-actions
Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 3 needs attention, 1 worth checking, 0 nice ideas
Top item: Align the changed skills with their evals and generated/exported sources

Review findings

🛠️ Needs attention

  • Discoverability fix conflicts with the existing skill evals (.agents/skills/nemoclaw-user-agent-skills/SKILL.md:23): The PR claims to fix the HIGH discoverability regression by redirecting catalog/discovery queries to `nemoclaw-skills-guide`, and the changed skill now says to use this skill only for installing or loading skills. However, the existing evals for `nemoclaw-user-agent-skills` still expect broad discovery/catalog/trust questions to load this same skill and `references/agent-skills.md`. That means the repository evidence for the claimed discoverability behavior is internally contradictory.
    • Recommendation: Either update `.agents/skills/nemoclaw-user-agent-skills/evals/evals.json` so discovery/catalog cases expect `nemoclaw-skills-guide`, or narrow the new routing text so it still satisfies the existing eval contract. Keep the skill description, examples, and eval expectations aligned in the same change.
    • Evidence: Changed line 23 says discovery questions should hand off to `nemoclaw-skills-guide`; existing evals ask to "find a skill that can guide installation, policy, inference, or operations" and "understand what each skill is designed to do" while still expecting `expected_skill: "nemoclaw-user-agent-skills"`.
  • Skill metadata and instruction fixes are not applied at the generation source or exported copy (.agents/skills/nemoclaw-user-get-started/SKILL.md:5): The patch edits only `.agents/skills/.../SKILL.md`, but nearby repository files show these user skills are generated from docs and also exported under `skills/`. The source docs still have the old `description-agent` values and no new `## Instructions` or `## Examples`, while `skills/nemoclaw-user-agent-skills/SKILL.md` and `skills/nemoclaw-user-get-started/SKILL.md` still contain the old frontmatter without the new metadata. This leaves the validator fixes vulnerable to being overwritten by the next docs-to-skills refresh and absent from the exported/published skill copy.
    • Recommendation: Move the structural fixes into the documented source-of-truth path: update the source docs or generator so regenerated `.agents/skills` files include the required metadata/instructions/examples, and mirror/regenerate the corresponding `skills/` exports in the same stack before relying on the validator result.
    • Evidence: `scripts/docs-to-skills.py` documents generation from `docs/` into `.agents/skills/`; `docs/resources/agent-skills.mdx` and `docs/get-started/quickstart.mdx` retain old agent descriptions; `skills/README.md` says exported skills are generated from `.agents/skills/`; exported `skills/.../SKILL.md` files remain unchanged.
  • Quickstart still contains reference-loading prompts that undermine the one-reference budget (.agents/skills/nemoclaw-user-get-started/SKILL.md:35): The new quickstart instructions cap the skill to at most one reference read, but the same skill body still tells the assistant/user to review prerequisites before following the guide and later explicitly says to load `references/quickstart-details.md`. Those remaining prompts can cause the agent to chain references despite the PR's stated acceptance goal of preventing tool-efficiency collapse.
    • Recommendation: Reconcile the old body text with the new fixed sequence. For example, make prerequisite and detail links passive references, or condition them under the single-reference selection table, so there is only one authoritative rule for when a reference may be loaded.
    • Evidence: Line 35 adds "Load at most one reference" and line 47 says not to chain references; line 26 still says to complete reviewing prerequisites, and line 195 still says "Load [references/quickstart-details.md]" for detailed wizard steps.

🔎 Worth checking

  • Source-of-truth review needed: Generated user skill content and exported skill copies: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: Changed files add metadata and instructions only in `.agents/skills`; source docs and exported `skills/.../SKILL.md` copies remain old, and existing evals still expect broad discovery queries to use `nemoclaw-user-agent-skills`.

🌱 Nice ideas

  • None.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants