chore: fix for eval schema validation #4469

Closed
miyoungc wants to merge 1 commit into main from skills-eval-schema-fix

Conversation

@miyoungc
Collaborator

@miyoungc miyoungc commented May 28, 2026

Summary

Related Issue

Changes

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Your Name your-email@example.com

Summary by CodeRabbit

Release Notes

  • Tests
    • Enhanced evaluation specifications across NemoClaw skills with explicit expected behavior requirements for skill usage and documentation references
    • Updated validation criteria to improve consistency and accuracy of agent responses

@miyoungc
Collaborator Author

/nvskills-ci

@coderabbitai
Contributor

coderabbitai Bot commented May 28, 2026

Walkthrough

This pull request standardizes evaluation test cases across 16 NemoClaw skill evaluation JSON files by adding structured expected_behavior arrays. Each evaluation case now explicitly specifies the required skill to invoke, the reference documentation file(s) that must be cited, and that responses must directly answer the stated task. Some evaluations also include updated ground_truth text with more specific NemoClaw-oriented guidance.
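As a rough illustration of the shape described in this walkthrough (not taken from the PR diff), the sketch below builds one eval case and checks its `expected_behavior` array. The field names `id`, `question`, `ground_truth`, `expected_skill`, and `expected_behavior` appear in this thread; the sample values, the `references/example.md` path, and the `check_case` helper are assumptions, and the actual NVSkills eval schema is not shown here.

```python
# Hypothetical eval case: only the field names come from this PR thread.
SAMPLE_CASE = {
    "id": "example-case-001",  # hypothetical id
    "question": "Which command should I use to inspect a sandbox?",  # hypothetical
    "ground_truth": "A NemoClaw-specific answer that names the right command.",  # hypothetical
    "expected_skill": "nemoclaw-user-reference",
    "expected_behavior": [
        "Uses the `nemoclaw-user-reference` skill.",
        "Cites `references/example.md`.",  # hypothetical reference file
        "Directly answers the stated task.",
    ],
}


def check_case(case: dict) -> list[str]:
    """Return a list of problems found in one eval case (empty if none)."""
    problems = []
    behavior = case.get("expected_behavior")
    if not isinstance(behavior, list) or not behavior:
        problems.append("expected_behavior must be a non-empty list")
    elif not all(isinstance(item, str) and item.strip() for item in behavior):
        problems.append("expected_behavior items must be non-empty strings")
    return problems
```

A check like this only confirms the array's presence and basic types; whether the evaluator accepts and consumes the field still depends on the real schema.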

Changes

Standardized Skill Evaluation Assertions

| Layer / File(s) | Summary |
| --- | --- |
| Agent Skills evaluation updates: .agents/skills/nemoclaw-user-agent-skills/evals/evals.json, skills/nemoclaw-user-agent-skills/evals/evals.json | Added expected_behavior arrays specifying nemoclaw-user-agent-skills skill and references/agent-skills.md reference for all three agent-skills documentation cases. |
| Inference configuration evaluation updates: .agents/skills/nemoclaw-user-configure-inference/evals/evals.json, skills/nemoclaw-user-configure-inference/evals/evals.json | Extended inference scenario evaluations (inference options, local routing, provider switching, sub-agent setup, tool-calling reliability) with expected_behavior arrays that require nemoclaw-user-configure-inference skill and scenario-specific references/*.md files. |
| Security configuration evaluation updates: .agents/skills/nemoclaw-user-configure-security/evals/evals.json, skills/nemoclaw-user-configure-security/evals/evals.json | Added expected_behavior arrays across best-practices, credential-storage, and OpenClaw-controls topics, specifying nemoclaw-user-configure-security skill and corresponding reference files, with revised ground_truth text for NemoClaw-specific phrasing. |
| Remote deployment evaluation updates: .agents/skills/nemoclaw-user-deploy-remote/evals/evals.json, skills/nemoclaw-user-deploy-remote/evals/evals.json | Enhanced deployment scenario evaluations with expected_behavior arrays requiring nemoclaw-user-deploy-remote skill and either SKILL.md or specific references/*.md deployment documentation. |
| Get started evaluation updates: .agents/skills/nemoclaw-user-get-started/evals/evals.json, skills/nemoclaw-user-get-started/evals/evals.json | Added expected_behavior assertions to onboarding cases (prerequisites, Windows preparation, quickstart variants) specifying nemoclaw-user-get-started skill and corresponding reference markdown files. |
| Network policy management evaluation updates: .agents/skills/nemoclaw-user-manage-policy/evals/evals.json, skills/nemoclaw-user-manage-policy/evals/evals.json | Extended network-policy evaluations across customize-network-policy, approve-network-requests, and integration-policy-examples with expected_behavior arrays specifying nemoclaw-user-manage-policy skill and scenario-specific reference files. |
| Sandbox management evaluation updates: .agents/skills/nemoclaw-user-manage-sandboxes/evals/evals.json, skills/nemoclaw-user-manage-sandboxes/evals/evals.json | Enriched sandbox evaluations (lifecycle, runtime controls, backup/restore, workspace files, messaging channels) with expected_behavior arrays requiring nemoclaw-user-manage-sandboxes skill and topic-specific reference documentation. |
| Sandbox monitoring evaluation updates: .agents/skills/nemoclaw-user-monitor-sandbox/evals/evals.json, skills/nemoclaw-user-monitor-sandbox/evals/evals.json | Updated three monitoring activity cases with expected_behavior arrays and revised ground_truth text emphasizing NemoClaw-specific guidance for separating host, gateway, sandbox, policy, and inference concerns. |
| Overview and ecosystem evaluation updates: .agents/skills/nemoclaw-user-overview/evals/evals.json, skills/nemoclaw-user-overview/evals/evals.json | Added expected_behavior arrays to overview, ecosystem, how-it-works, and release-notes evaluations specifying nemoclaw-user-overview skill and corresponding reference files, with updated ground_truth phrasing. |
| Reference documentation evaluation updates: .agents/skills/nemoclaw-user-reference/evals/evals.json, skills/nemoclaw-user-reference/evals/evals.json | Extended reference evaluations (architecture, CLI guide, commands, policies, troubleshooting) with expected_behavior arrays requiring nemoclaw-user-reference skill and topic-specific references/*.md citation. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4466: Directly conflicts with this PR by removing expected_behavior arrays from the same skill eval entries that this PR adds them to.
  • NVIDIA/NemoClaw#4463: Removes expected_behavior from nemoclaw-user-configure-inference and nemoclaw-user-configure-security eval objects that this PR is adding.
  • NVIDIA/NemoClaw#4293: Updates skill reference markdown documentation (e.g., inference options, get-started Windows prep, policy, overview) that aligns with the expected_behavior reference assertions being added in this PR.

Suggested labels

fix

Suggested reviewers

  • jyaunches
  • ericksoa
  • cv

Poem

🐰 In eval files, a pattern takes flight,
expected_behavior shines bold and bright,
Each skill knows its dance, its docs in place,
From agent to overview, standards embrace,
The rabbit hops through sixteen files with grace! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title 'chore: fix for eval schema validation' is too vague to accurately describe the changeset, which updates expected_behavior arrays across 16 eval JSON files. | Consider a more specific title like 'chore: add expected_behavior to eval cases across NemoClaw skills' that better reflects the systematic updates to the evaluation schema across multiple skill modules. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@github-actions
Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None. No NemoClaw E2E is recommended because the PR only updates agent-skill evaluation metadata. Existing E2E jobs such as skill-agent-e2e validate sandbox skill injection and agent use of an injected fixture, not these eval definitions, and docs-validation-e2e focuses on installed CLI/docs/link parity. These changes should be covered by skill/eval validation or NVSkills CI rather than NemoClaw runtime E2E.

Optional E2E

  • None.

New E2E recommendations

  • None.

@github-actions
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • None. No scenario workflow, scenario metadata, scenario runtime, or validation-suite files changed.

Optional scenario E2E

  • None.

Relevant changed files

  • None.

@miyoungc miyoungc added the enhancement: skill label (Improvements to NemoClaw repository hygiene or user functionality with skills.) May 28, 2026

@github-actions
Contributor

PR Review Advisor

Findings: 0 needs attention, 2 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 0 still apply, 2 new items found

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Published skill contents changed without refreshing signing artifacts (skills/nemoclaw-user-agent-skills/evals/evals.json:7): The PR changes eval files inside the root `skills/` directories, which the repository documents as the NVSkills CI watched, signed, and published location. The corresponding `skill.oms.sig` artifacts remain unchanged in the diff. If eval files are part of the signed skill payload, downstream users or publication tooling may see stale trust metadata for modified content.
    • Recommendation: Refresh or remove/regenerate the affected `skills/nemoclaw-user-*/skill.oms.sig` artifacts according to the NVSkills signing flow before treating the root `skills/` copies as published/signed content.
    • Evidence: Changed files include `skills/nemoclaw-user-*/evals/evals.json`; `skills/nemoclaw-user-*/skill.oms.sig` files exist in the same published skill directories but are not part of the diff. `.github/catalog-skills-signing-flow.md` states that root `skills/` is what gets signed and published.
  • Schema-validation fix lacks local validation evidence (.agents/skills/nemoclaw-user-agent-skills/evals/evals.json:7): The patch adds `expected_behavior` arrays to every eval case, which plausibly addresses the stated eval schema issue, but the repository checkout did not show an in-repo eval schema or evaluator code proving that this new field is accepted and consumed. Benchmark files and generated skill-card metadata were also not updated with evidence from a refreshed eval run.
    • Recommendation: Add or point to the targeted NVSkills-Eval/schema validation output for the changed eval files, and update any generated benchmark/metadata artifacts if they are expected to reflect the new eval definitions.
    • Evidence: All changed eval objects add `expected_behavior`; repository searches did not find a local schema/evaluator reference for that field, and no `BENCHMARK.md` or `skill-card.md` files changed.
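The first finding above can be approximated with a small path check. This is only a sketch under stated assumptions: the `stale_signature_candidates` helper is hypothetical, it works on a plain list of changed paths rather than on any NVSkills tooling, and whether eval files are actually part of the signed payload is left open in the finding itself.

```python
from pathlib import PurePosixPath


def stale_signature_candidates(changed_paths):
    """Return skills/<name> dirs with changed files but no changed skill.oms.sig."""
    touched = set()
    resigned = set()
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        # Only the root skills/<name>/... tree is described as signed and published;
        # the .agents/skills/ copies are skipped.
        if len(parts) < 3 or parts[0] != "skills":
            continue
        skill_dir = f"{parts[0]}/{parts[1]}"
        if parts[-1] == "skill.oms.sig":
            resigned.add(skill_dir)
        else:
            touched.add(skill_dir)
    return sorted(touched - resigned)
```

Feeding it the file list from a diff would flag each published skill directory whose contents changed while its signature file did not.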

🌱 Nice ideas

  • None.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
skills/nemoclaw-user-deploy-remote/evals/evals.json (1)

59-59: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix grammatical error in question text.

The question starts with "I'm the hosted sandbox is created" which is grammatically incorrect. Based on the pattern used in line 26 ("I'm after remote deployment succeeds"), this should likely be "I'm after the hosted sandbox is created" or "Once the hosted sandbox is created".

📝 Proposed fix
-    "question": "I'm the hosted sandbox is created. Help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work.",
+    "question": "I'm after the hosted sandbox is created. Help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work.",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/nemoclaw-user-deploy-remote/evals/evals.json` at line 59, Update the
JSON "question" value to fix the grammar by replacing "I'm the hosted sandbox is
created" with a correct phrase such as "Once the hosted sandbox is created,"
(e.g., change the entry's question to: "Once the hosted sandbox is created, help
me confirm where to connect and how to start using it so I can move from
provisioning to actual agent work."). Make the change to the "question" field in
the same JSON object so the sentence follows the pattern used in other entries
like "I'm after remote deployment succeeds".
🧹 Nitpick comments (4)
skills/nemoclaw-user-reference/evals/evals.json (2)

6-6: ⚡ Quick win

Fix pronoun inconsistency in ground_truth fields.

The ground_truth text uses third-person framing ("helps the user") but then includes first-person pronouns copied from the question text ("my operation", "I need"). This creates grammatical inconsistency.

For example, line 39:

  • Current: "helps the user pick the command surface that owns my operation"
  • Should be: "helps the user pick the command surface that owns their operation" or "...owns the operation"

This issue appears in lines 6, 17, 28, 39, 50, 61, 72, 83, 94, 105, 116, 127, 138, 149, and 160.

📝 Example fixes for pronoun consistency
-    "ground_truth": "A NemoClaw-specific answer that helps the user pick the command surface that owns my operation and gives enough concrete guidance, decision criteria, verification steps, or risk framing to complete the task without breaking NemoClaw management.",
+    "ground_truth": "A NemoClaw-specific answer that helps the user pick the command surface that owns their operation and gives enough concrete guidance, decision criteria, verification steps, or risk framing to complete the task without breaking NemoClaw management.",
-    "ground_truth": "A NemoClaw-specific answer that helps the user find the exact action, flag, or recovery path I need and gives enough concrete guidance, decision criteria, verification steps, or risk framing to run the right command without scanning source code.",
+    "ground_truth": "A NemoClaw-specific answer that helps the user find the exact action, flag, or recovery path they need and gives enough concrete guidance, decision criteria, verification steps, or risk framing to run the right command without scanning source code.",

Apply similar changes to all affected ground_truth fields.

Also applies to: 17-17, 28-28, 39-39, 50-50, 61-61, 72-72, 83-83, 94-94, 105-105, 116-116, 127-127, 138-138, 149-149, 160-160

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/nemoclaw-user-reference/evals/evals.json` at line 6, Several
ground_truth JSON entries use inconsistent first-person pronouns ("my", "I")
inside third‑person framing; update each affected "ground_truth" value to
replace first‑person pronouns with third‑person equivalents (e.g., "my" →
"their" or "the", "I" → "the user") while preserving the original meaning and
punctuation; target every "ground_truth" field instance mentioned in the comment
(the occurrences listed) so each sentence reads consistently in third person.

1-167: 💤 Low value

Consider removing redundancy between expected_skill and expected_behavior.

Every evaluation case includes both an expected_skill field and an expected_behavior array where the first item duplicates the skill name. For example:

  • "expected_skill": "nemoclaw-user-reference"
  • "expected_behavior": ["Uses the `nemoclaw-user-reference` skill.", ...]

If the evaluation framework requires both fields, this is acceptable. Otherwise, consider whether expected_skill can be removed to reduce duplication across all 18 test cases.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/nemoclaw-user-reference/evals/evals.json` around lines 1 - 167, The
test cases duplicate the skill name in both the "expected_skill" field and as
the first item of the "expected_behavior" array (e.g., "expected_skill":
"nemoclaw-user-reference" and "expected_behavior": ["Uses the
`nemoclaw-user-reference` skill.", ...]); remove the redundancy by either
deleting the "expected_skill" field across cases or removing the redundant first
item from each "expected_behavior" entry, and if the framework requires both
keep them but change the "expected_behavior" items to describe behavior (e.g.,
"Invokes the skill for answers") instead of repeating the skill identifier;
update all 18 objects (look for "expected_skill" and the corresponding
"expected_behavior" arrays) consistently.
.agents/skills/nemoclaw-user-reference/evals/evals.json (2)

6-6: ⚡ Quick win

Fix pronoun inconsistency in ground_truth fields.

The ground_truth text uses third-person framing ("helps the user") but then includes first-person pronouns copied from the question text ("my operation", "I need"). This creates grammatical inconsistency.

For example, line 39:

  • Current: "helps the user pick the command surface that owns my operation"
  • Should be: "helps the user pick the command surface that owns their operation" or "...owns the operation"

This issue appears in lines 6, 17, 28, 39, 50, 61, 72, 83, 94, 105, 116, 127, 138, 149, and 160.

📝 Example fixes for pronoun consistency
-    "ground_truth": "A NemoClaw-specific answer that helps the user pick the command surface that owns my operation and gives enough concrete guidance, decision criteria, verification steps, or risk framing to complete the task without breaking NemoClaw management.",
+    "ground_truth": "A NemoClaw-specific answer that helps the user pick the command surface that owns their operation and gives enough concrete guidance, decision criteria, verification steps, or risk framing to complete the task without breaking NemoClaw management.",
-    "ground_truth": "A NemoClaw-specific answer that helps the user find the exact action, flag, or recovery path I need and gives enough concrete guidance, decision criteria, verification steps, or risk framing to run the right command without scanning source code.",
+    "ground_truth": "A NemoClaw-specific answer that helps the user find the exact action, flag, or recovery path they need and gives enough concrete guidance, decision criteria, verification steps, or risk framing to run the right command without scanning source code.",

Apply similar changes to all affected ground_truth fields.

Also applies to: 17-17, 28-28, 39-39, 50-50, 61-61, 72-72, 83-83, 94-94, 105-105, 116-116, 127-127, 138-138, 149-149, 160-160

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/nemoclaw-user-reference/evals/evals.json at line 6, Update
all ground_truth string values to use consistent third-person phrasing: replace
first-person pronouns ("my", "I", "we" etc.) with third-person equivalents
("their", "the") or rephrase to avoid pronouns; the affected field is the JSON
key "ground_truth" in each object (occurrences around the listed entries) —
ensure lines that currently read like "helps the user pick the command surface
that owns my operation" become "helps the user pick the command surface that
owns their operation" or "owns the operation", and apply the same pronoun
replacements across all mentioned ground_truth entries.

1-167: 💤 Low value

Consider removing redundancy between expected_skill and expected_behavior.

Every evaluation case includes both an expected_skill field and an expected_behavior array where the first item duplicates the skill name. For example:

  • "expected_skill": "nemoclaw-user-reference"
  • "expected_behavior": ["Uses the `nemoclaw-user-reference` skill.", ...]

If the evaluation framework requires both fields, this is acceptable. Otherwise, consider whether expected_skill can be removed to reduce duplication across all 18 test cases.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/nemoclaw-user-reference/evals/evals.json around lines 1 -
167, The test cases duplicate the skill name in "expected_skill" and the first
entry of "expected_behavior" (e.g., "expected_skill": "nemoclaw-user-reference"
and "expected_behavior": ["Uses the `nemoclaw-user-reference` skill.", ...]);
remove the redundancy by deleting the repeated skill string from each
"expected_behavior" first item or, if the framework requires a single canonical
field, remove the "expected_skill" key and keep "expected_behavior"
entries—update all 18 objects in evals.json consistently (look for the keys
expected_skill and expected_behavior) so only one source of truth remains.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: db57419c-007e-4997-adb7-ce809fc61642

📥 Commits

Reviewing files that changed from the base of the PR and between 442f64b and 8bc92c5.

📒 Files selected for processing (20)
  • .agents/skills/nemoclaw-user-agent-skills/evals/evals.json
  • .agents/skills/nemoclaw-user-configure-inference/evals/evals.json
  • .agents/skills/nemoclaw-user-configure-security/evals/evals.json
  • .agents/skills/nemoclaw-user-deploy-remote/evals/evals.json
  • .agents/skills/nemoclaw-user-get-started/evals/evals.json
  • .agents/skills/nemoclaw-user-manage-policy/evals/evals.json
  • .agents/skills/nemoclaw-user-manage-sandboxes/evals/evals.json
  • .agents/skills/nemoclaw-user-monitor-sandbox/evals/evals.json
  • .agents/skills/nemoclaw-user-overview/evals/evals.json
  • .agents/skills/nemoclaw-user-reference/evals/evals.json
  • skills/nemoclaw-user-agent-skills/evals/evals.json
  • skills/nemoclaw-user-configure-inference/evals/evals.json
  • skills/nemoclaw-user-configure-security/evals/evals.json
  • skills/nemoclaw-user-deploy-remote/evals/evals.json
  • skills/nemoclaw-user-get-started/evals/evals.json
  • skills/nemoclaw-user-manage-policy/evals/evals.json
  • skills/nemoclaw-user-manage-sandboxes/evals/evals.json
  • skills/nemoclaw-user-monitor-sandbox/evals/evals.json
  • skills/nemoclaw-user-overview/evals/evals.json
  • skills/nemoclaw-user-reference/evals/evals.json

},
{
"id": "docs-deployment-brev-web-ui-003",
"question": "I'm the hosted sandbox is created. Help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work.",

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix grammatical error in question text.

The question starts with "I'm the hosted sandbox is created" which is grammatically incorrect. Based on the pattern used in line 26 ("I'm after remote deployment succeeds"), this should likely be "I'm after the hosted sandbox is created" or "Once the hosted sandbox is created".

📝 Proposed fix
-    "question": "I'm the hosted sandbox is created. Help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work.",
+    "question": "I'm after the hosted sandbox is created. Help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work.",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/nemoclaw-user-deploy-remote/evals/evals.json at line 59, The
"question" string contains a grammar error ("I'm the hosted sandbox is
created"); update the JSON value for the "question" key to a correct phrasing
such as "Once the hosted sandbox is created, help me confirm where to connect
and how to start using it so I can move from provisioning to actual agent work."
— edit the value associated with "question" in the evals.json entry to replace
the incorrect sentence with this corrected wording.

@miyoungc miyoungc closed this May 28, 2026