chore: fix for eval schema validation by miyoungc · Pull Request #4469 · NVIDIA/NemoClaw

miyoungc · 2026-05-28T21:50:07Z

Summary

Related Issue

Changes

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Your Name <your-email@example.com>

Summary by CodeRabbit

Release Notes

Tests
- Enhanced evaluation specifications across NemoClaw skills with explicit expected behavior requirements for skill usage and documentation references
- Updated validation criteria to improve consistency and accuracy of agent responses

miyoungc · 2026-05-28T21:50:18Z

/nvskills-ci

coderabbitai · 2026-05-28T21:50:19Z

📝 Walkthrough

This pull request standardizes evaluation test cases across 16 NemoClaw skill evaluation JSON files by adding structured expected_behavior arrays. Each evaluation case now explicitly specifies the required skill to invoke, the reference documentation file(s) that must be cited, and that responses must directly answer the stated task. Some evaluations also include updated ground_truth text with more specific NemoClaw-oriented guidance.
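As a rough illustration of the shape described above, the sketch below builds one hypothetical eval case and checks its `expected_behavior` array. The field names, the skill name, and `references/agent-skills.md` come from this PR's review comments. The `id`, `question`, `ground_truth`, the exact wording of the second and third behavior strings, and the `behavior_problems` helper are assumptions for illustration only, not the NVSkills schema or evaluator.

```python
import json

# Hypothetical eval case. Only the field names, the skill name, and the
# reference path are taken from the PR discussion; all other values are invented.
SAMPLE_CASE = json.loads("""
{
  "id": "example-agent-skills-001",
  "question": "Help me find out which agent skills ship with NemoClaw.",
  "ground_truth": "A NemoClaw-specific answer that ...",
  "expected_skill": "nemoclaw-user-agent-skills",
  "expected_behavior": [
    "Uses the `nemoclaw-user-agent-skills` skill.",
    "Cites `references/agent-skills.md`.",
    "Directly answers the stated task."
  ]
}
""")


def behavior_problems(case: dict) -> list:
    """Return human-readable problems with a case's expected_behavior field."""
    behavior = case.get("expected_behavior")
    if not isinstance(behavior, list) or not behavior:
        return ["expected_behavior must be a non-empty array"]
    problems = []
    for i, item in enumerate(behavior):
        # Each entry is assumed to be a plain, non-empty assertion string.
        if not isinstance(item, str) or not item.strip():
            problems.append(f"expected_behavior[{i}] must be a non-empty string")
    return problems


if __name__ == "__main__":
    print(behavior_problems(SAMPLE_CASE) or "no problems found")
```

A check like this only confirms the array is present and well formed; whether the real evaluator accepts or consumes the field is a separate question.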

Changes

Standardized Skill Evaluation Assertions

| Layer / File(s) | Summary |
| --- | --- |
| Agent Skills evaluation updates: `.agents/skills/nemoclaw-user-agent-skills/evals/evals.json`, `skills/nemoclaw-user-agent-skills/evals/evals.json` | Added `expected_behavior` arrays specifying `nemoclaw-user-agent-skills` skill and `references/agent-skills.md` reference for all three agent-skills documentation cases. |
| Inference configuration evaluation updates: `.agents/skills/nemoclaw-user-configure-inference/evals/evals.json`, `skills/nemoclaw-user-configure-inference/evals/evals.json` | Extended inference scenario evaluations (inference options, local routing, provider switching, sub-agent setup, tool-calling reliability) with `expected_behavior` that require `nemoclaw-user-configure-inference` skill and scenario-specific `references/*.md` files. |
| Security configuration evaluation updates: `.agents/skills/nemoclaw-user-configure-security/evals/evals.json`, `skills/nemoclaw-user-configure-security/evals/evals.json` | Added `expected_behavior` arrays across best-practices, credential-storage, and OpenClaw-controls topics, specifying `nemoclaw-user-configure-security` skill and corresponding reference files, with revised `ground_truth` text for NemoClaw-specific phrasing. |
| Remote deployment evaluation updates: `.agents/skills/nemoclaw-user-deploy-remote/evals/evals.json`, `skills/nemoclaw-user-deploy-remote/evals/evals.json` | Enhanced deployment scenario evaluations with `expected_behavior` arrays requiring `nemoclaw-user-deploy-remote` skill and either `SKILL.md` or specific `references/*.md` deployment documentation. |
| Get started evaluation updates: `.agents/skills/nemoclaw-user-get-started/evals/evals.json`, `skills/nemoclaw-user-get-started/evals/evals.json` | Added `expected_behavior` assertions to onboarding cases (prerequisites, Windows preparation, quickstart variants) specifying `nemoclaw-user-get-started` skill and corresponding reference markdown files. |
| Network policy management evaluation updates: `.agents/skills/nemoclaw-user-manage-policy/evals/evals.json`, `skills/nemoclaw-user-manage-policy/evals/evals.json` | Extended network-policy evaluations across customize-network-policy, approve-network-requests, and integration-policy-examples with `expected_behavior` arrays specifying `nemoclaw-user-manage-policy` skill and scenario-specific reference files. |
| Sandbox management evaluation updates: `.agents/skills/nemoclaw-user-manage-sandboxes/evals/evals.json`, `skills/nemoclaw-user-manage-sandboxes/evals/evals.json` | Enriched sandbox evaluations (lifecycle, runtime controls, backup/restore, workspace files, messaging channels) with `expected_behavior` arrays requiring `nemoclaw-user-manage-sandboxes` skill and topic-specific reference documentation. |
| Sandbox monitoring evaluation updates: `.agents/skills/nemoclaw-user-monitor-sandbox/evals/evals.json`, `skills/nemoclaw-user-monitor-sandbox/evals/evals.json` | Updated three monitoring activity cases with `expected_behavior` arrays and revised `ground_truth` text emphasizing NemoClaw-specific guidance for separating host, gateway, sandbox, policy, and inference concerns. |
| Overview and ecosystem evaluation updates: `.agents/skills/nemoclaw-user-overview/evals/evals.json`, `skills/nemoclaw-user-overview/evals/evals.json` | Added `expected_behavior` arrays to overview, ecosystem, how-it-works, and release-notes evaluations specifying `nemoclaw-user-overview` skill and corresponding reference files, with updated `ground_truth` phrasing. |
| Reference documentation evaluation updates: `.agents/skills/nemoclaw-user-reference/evals/evals.json`, `skills/nemoclaw-user-reference/evals/evals.json` | Extended reference evaluations (architecture, CLI guide, commands, policies, troubleshooting) with `expected_behavior` arrays requiring `nemoclaw-user-reference` skill and topic-specific `references/*.md` citation. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

NVIDIA/NemoClaw#4466: Directly conflicts with this PR by removing expected_behavior arrays from the same skill eval entries that this PR adds them to.
NVIDIA/NemoClaw#4463: Removes expected_behavior from nemoclaw-user-configure-inference and nemoclaw-user-configure-security eval objects that this PR is adding.
NVIDIA/NemoClaw#4293: Updates skill reference markdown documentation (e.g., inference options, get-started Windows prep, policy, overview) that aligns with the expected_behavior reference assertions being added in this PR.

Suggested labels

fix

Suggested reviewers

jyaunches
ericksoa
cv

Poem

🐰 In eval files, a pattern takes flight,
expected_behavior shines bold and bright,
Each skill knows its dance, its docs in place,
From agent to overview, standards embrace,
The rabbit hops through sixteen files with grace! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title 'chore: fix for eval schema validation' is too vague to accurately describe the changeset, which updates expected_behavior arrays across 16 eval JSON files. | Consider a more specific title like 'chore: add expected_behavior to eval cases across NemoClaw skills' that better reflects the systematic updates to the evaluation schema across multiple skill modules. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


github-actions · 2026-05-28T21:51:29Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

None. No NemoClaw E2E is recommended because the PR only updates agent-skill evaluation metadata. Existing E2E jobs such as skill-agent-e2e validate sandbox skill injection and agent use of an injected fixture, not these eval definitions, and docs-validation-e2e focuses on installed CLI/docs/link parity. These changes should be covered by skill/eval validation or NVSkills CI rather than NemoClaw runtime E2E.

Optional E2E

None.

New E2E recommendations

None.

github-actions · 2026-05-28T21:51:30Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

None. No scenario workflow, scenario metadata, scenario runtime, or validation-suite files changed.

Optional scenario E2E

None.

Relevant changed files

None.

github-actions · 2026-05-28T21:53:10Z

PR Review Advisor

Findings: 0 needs attention, 2 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 0 still apply, 2 new items found

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Published skill contents changed without refreshing signing artifacts (skills/nemoclaw-user-agent-skills/evals/evals.json:7): The PR changes eval files inside the root `skills/` directories, which the repository documents as the NVSkills CI watched, signed, and published location. The corresponding `skill.oms.sig` artifacts remain unchanged in the diff. If eval files are part of the signed skill payload, downstream users or publication tooling may see stale trust metadata for modified content.
- Recommendation: Refresh or remove/regenerate the affected `skills/nemoclaw-user-*/skill.oms.sig` artifacts according to the NVSkills signing flow before treating the root `skills/` copies as published/signed content.
- Evidence: Changed files include `skills/nemoclaw-user-*/evals/evals.json`; `skills/nemoclaw-user-*/skill.oms.sig` files exist in the same published skill directories but are not part of the diff. `.github/catalog-skills-signing-flow.md` states that root `skills/` is what gets signed and published.
Schema-validation fix lacks local validation evidence (.agents/skills/nemoclaw-user-agent-skills/evals/evals.json:7): The patch adds `expected_behavior` arrays to every eval case, which plausibly addresses the stated eval schema issue, but the repository checkout did not show an in-repo eval schema or evaluator code proving that this new field is accepted and consumed. Benchmark files and generated skill-card metadata were also not updated with evidence from a refreshed eval run.
- Recommendation: Add or point to the targeted NVSkills-Eval/schema validation output for the changed eval files, and update any generated benchmark/metadata artifacts if they are expected to reflect the new eval definitions.
- Evidence: All changed eval objects add `expected_behavior`; repository searches did not find a local schema/evaluator reference for that field, and no `BENCHMARK.md` or `skill-card.md` files changed.
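The first finding above (changed `skills/` content with untouched `skill.oms.sig` files) can be spot-checked from a list of changed paths. This is a minimal sketch that assumes each signature lives at `skills/<skill-name>/skill.oms.sig`, as the finding's evidence suggests; the function name is hypothetical, and whether eval files are actually part of the signed payload is not settled in this thread.

```python
from pathlib import PurePosixPath

SIGNATURE_NAME = "skill.oms.sig"


def skills_missing_signature_refresh(changed_paths):
    """List skills/<name> dirs whose content changed but whose signature did not."""
    changed = {PurePosixPath(p) for p in changed_paths}
    stale = set()
    for path in changed:
        parts = path.parts
        # Only the root skills/ tree is described as signed and published,
        # so the .agents/skills/ copies are ignored here.
        if len(parts) < 3 or parts[0] != "skills":
            continue
        if path.name == SIGNATURE_NAME:
            continue
        skill_dir = PurePosixPath(parts[0], parts[1])
        if skill_dir / SIGNATURE_NAME not in changed:
            stale.add(str(skill_dir))
    return sorted(stale)


if __name__ == "__main__":
    example = [
        ".agents/skills/nemoclaw-user-overview/evals/evals.json",
        "skills/nemoclaw-user-overview/evals/evals.json",
    ]
    print(skills_missing_signature_refresh(example))
```

The input could come from something like `git diff --name-only origin/main...HEAD`; the result is advisory and does not replace the NVSkills signing flow.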

🌱 Nice ideas

None.

This is an automated advisory review. A human maintainer must make the final merge decision.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

skills/nemoclaw-user-deploy-remote/evals/evals.json (1)

59-59: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix grammatical error in question text.

The question starts with "I'm the hosted sandbox is created" which is grammatically incorrect. Based on the pattern used in line 26 ("I'm after remote deployment succeeds"), this should likely be "I'm after the hosted sandbox is created" or "Once the hosted sandbox is created".

📝 Proposed fix

-    "question": "I'm the hosted sandbox is created. Help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work.",
+    "question": "I'm after the hosted sandbox is created. Help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work.",

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/nemoclaw-user-deploy-remote/evals/evals.json` at line 59, Update the
JSON "question" value to fix the grammar by replacing "I'm the hosted sandbox is
created" with a correct phrase such as "Once the hosted sandbox is created,"
(e.g., change the entry's question to: "Once the hosted sandbox is created, help
me confirm where to connect and how to start using it so I can move from
provisioning to actual agent work."). Make the change to the "question" field in
the same JSON object so the sentence follows the pattern used in other entries
like "I'm after remote deployment succeeds".

🧹 Nitpick comments (4)

skills/nemoclaw-user-reference/evals/evals.json (2)
6-6: ⚡ Quick win

Fix pronoun inconsistency in ground_truth fields.

The ground_truth text uses third-person framing ("helps the user") but then includes first-person pronouns copied from the question text ("my operation", "I need"). This creates grammatical inconsistency.

For example, line 39:

Current: "helps the user pick the command surface that owns my operation"

Should be: "helps the user pick the command surface that owns their operation" or "...owns the operation"

This issue appears in lines 6, 17, 28, 39, 50, 61, 72, 83, 94, 105, 116, 127, 138, 149, and 160.
📝 Example fixes for pronoun consistency
-    "ground_truth": "A NemoClaw-specific answer that helps the user pick the command surface that owns my operation and gives enough concrete guidance, decision criteria, verification steps, or risk framing to complete the task without breaking NemoClaw management.",
+    "ground_truth": "A NemoClaw-specific answer that helps the user pick the command surface that owns their operation and gives enough concrete guidance, decision criteria, verification steps, or risk framing to complete the task without breaking NemoClaw management.",
-    "ground_truth": "A NemoClaw-specific answer that helps the user find the exact action, flag, or recovery path I need and gives enough concrete guidance, decision criteria, verification steps, or risk framing to run the right command without scanning source code.",
+    "ground_truth": "A NemoClaw-specific answer that helps the user find the exact action, flag, or recovery path they need and gives enough concrete guidance, decision criteria, verification steps, or risk framing to run the right command without scanning source code.",
Apply similar changes to all affected ground_truth fields.
Also applies to: 17-17, 28-28, 39-39, 50-50, 61-61, 72-72, 83-83, 94-94, 105-105, 116-116, 127-127, 138-138, 149-149, 160-160
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/nemoclaw-user-reference/evals/evals.json` at line 6, Several
ground_truth JSON entries use inconsistent first-person pronouns ("my", "I")
inside third‑person framing; update each affected "ground_truth" value to
replace first‑person pronouns with third‑person equivalents (e.g., "my" →
"their" or "the", "I" → "the user") while preserving the original meaning and
punctuation; target every "ground_truth" field instance mentioned in the comment
(the occurrences listed) so each sentence reads consistently in third person.
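One way to locate the affected entries without reading every line is a small scan for first-person words in each `ground_truth` value. The sketch below is a heuristic under stated assumptions: the eval file is taken to be a list of objects with `id` and `ground_truth` keys, the pronoun list is a guess at what matters here, and `first_person_hits` is a hypothetical helper rather than part of any NemoClaw tooling.

```python
import re

# Heuristic pronoun list; "I" is matched case-sensitively so that a lowercase
# "i" (for example in "i.e.") is not reported.
FIRST_PERSON = re.compile(r"\b(?:I|[Mm]y|[Mm]e|[Ww]e|[Oo]ur)\b")


def first_person_hits(cases):
    """Map each case id to the first-person words found in its ground_truth."""
    hits = {}
    for case in cases:
        found = FIRST_PERSON.findall(case.get("ground_truth", ""))
        if found:
            hits[case.get("id", "<no id>")] = found
    return hits


if __name__ == "__main__":
    # Hypothetical ids; the ground_truth fragments are quoted from the review.
    sample = [
        {"id": "case-a", "ground_truth": "helps the user pick the command surface that owns my operation"},
        {"id": "case-b", "ground_truth": "helps the user pick the command surface that owns their operation"},
    ]
    print(first_person_hits(sample))
```

Matches still need a human read, since a pronoun inside a quoted user utterance would be a false positive.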
1-167: 💤 Low value

Consider removing redundancy between expected_skill and expected_behavior.

Every evaluation case includes both an expected_skill field and an expected_behavior array where the first item duplicates the skill name. For example:

"expected_skill": "nemoclaw-user-reference"

"expected_behavior": ["Uses the `nemoclaw-user-reference` skill.", ...]

If the evaluation framework requires both fields, this is acceptable. Otherwise, consider whether expected_skill can be removed to reduce duplication across all 18 test cases.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/nemoclaw-user-reference/evals/evals.json` around lines 1 - 167, The
test cases duplicate the skill name in both the "expected_skill" field and as
the first item of the "expected_behavior" array (e.g., "expected_skill":
"nemoclaw-user-reference" and "expected_behavior": ["Uses the
`nemoclaw-user-reference` skill.", ...]); remove the redundancy by either
deleting the "expected_skill" field across cases or removing the redundant first
item from each "expected_behavior" entry, and if the framework requires both
keep them but change the "expected_behavior" items to describe behavior (e.g.,
"Invokes the skill for answers") instead of repeating the skill identifier;
update all 18 objects (look for "expected_skill" and the corresponding
"expected_behavior" arrays) consistently.
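If the framework does turn out to require both fields, a lightweight consistency check keeps the two from drifting apart. The sketch below assumes the convention quoted in the comment above, where a behavior string names the skill in backticks; `behavior_names_skill` is a hypothetical helper, not an existing evaluator function.

```python
def behavior_names_skill(case: dict) -> bool:
    """True if some expected_behavior entry names the expected_skill in backticks."""
    skill = case.get("expected_skill", "")
    behavior = case.get("expected_behavior") or []
    if not skill:
        return False
    marker = f"`{skill}`"
    return any(isinstance(item, str) and marker in item for item in behavior)


if __name__ == "__main__":
    case = {
        "expected_skill": "nemoclaw-user-reference",
        "expected_behavior": ["Uses the `nemoclaw-user-reference` skill."],
    }
    print(behavior_names_skill(case))
```

A check of this kind flags a case whose `expected_skill` was renamed without updating the matching behavior string, which is the main risk of keeping two sources of truth.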
.agents/skills/nemoclaw-user-reference/evals/evals.json (2)
6-6: ⚡ Quick win

Fix pronoun inconsistency in ground_truth fields.

The ground_truth text uses third-person framing ("helps the user") but then includes first-person pronouns copied from the question text ("my operation", "I need"). This creates grammatical inconsistency.

For example, line 39:

Current: "helps the user pick the command surface that owns my operation"

Should be: "helps the user pick the command surface that owns their operation" or "...owns the operation"

This issue appears in lines 6, 17, 28, 39, 50, 61, 72, 83, 94, 105, 116, 127, 138, 149, and 160.
📝 Example fixes for pronoun consistency
-    "ground_truth": "A NemoClaw-specific answer that helps the user pick the command surface that owns my operation and gives enough concrete guidance, decision criteria, verification steps, or risk framing to complete the task without breaking NemoClaw management.",
+    "ground_truth": "A NemoClaw-specific answer that helps the user pick the command surface that owns their operation and gives enough concrete guidance, decision criteria, verification steps, or risk framing to complete the task without breaking NemoClaw management.",
-    "ground_truth": "A NemoClaw-specific answer that helps the user find the exact action, flag, or recovery path I need and gives enough concrete guidance, decision criteria, verification steps, or risk framing to run the right command without scanning source code.",
+    "ground_truth": "A NemoClaw-specific answer that helps the user find the exact action, flag, or recovery path they need and gives enough concrete guidance, decision criteria, verification steps, or risk framing to run the right command without scanning source code.",
Apply similar changes to all affected ground_truth fields.
Also applies to: 17-17, 28-28, 39-39, 50-50, 61-61, 72-72, 83-83, 94-94, 105-105, 116-116, 127-127, 138-138, 149-149, 160-160
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/nemoclaw-user-reference/evals/evals.json at line 6, Update
all ground_truth string values to use consistent third-person phrasing: replace
first-person pronouns ("my", "I", "we" etc.) with third-person equivalents
("their", "the") or rephrase to avoid pronouns; the affected field is the JSON
key "ground_truth" in each object (occurrences around the listed entries) —
ensure lines that currently read like "helps the user pick the command surface
that owns my operation" become "helps the user pick the command surface that
owns their operation" or "owns the operation", and apply the same pronoun
replacements across all mentioned ground_truth entries.
1-167: 💤 Low value

Consider removing redundancy between expected_skill and expected_behavior.

Every evaluation case includes both an expected_skill field and an expected_behavior array where the first item duplicates the skill name. For example:

"expected_skill": "nemoclaw-user-reference"

"expected_behavior": ["Uses the `nemoclaw-user-reference` skill.", ...]

If the evaluation framework requires both fields, this is acceptable. Otherwise, consider whether expected_skill can be removed to reduce duplication across all 18 test cases.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/nemoclaw-user-reference/evals/evals.json around lines 1 -
167, The test cases duplicate the skill name in "expected_skill" and the first
entry of "expected_behavior" (e.g., "expected_skill": "nemoclaw-user-reference"
and "expected_behavior": ["Uses the `nemoclaw-user-reference` skill.", ...]);
remove the redundancy by deleting the repeated skill string from each
"expected_behavior" first item or, if the framework requires a single canonical
field, remove the "expected_skill" key and keep "expected_behavior"
entries—update all 18 objects in evals.json consistently (look for the keys
expected_skill and expected_behavior) so only one source of truth remains.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.agents/skills/nemoclaw-user-deploy-remote/evals/evals.json:
- Line 59: The "question" string contains a grammar error ("I'm the hosted
sandbox is created"); update the JSON value for the "question" key to a correct
phrasing such as "Once the hosted sandbox is created, help me confirm where to
connect and how to start using it so I can move from provisioning to actual
agent work." — edit the value associated with "question" in the evals.json entry
to replace the incorrect sentence with this corrected wording.

---

Duplicate comments:
In `@skills/nemoclaw-user-deploy-remote/evals/evals.json`:
- Line 59: Update the JSON "question" value to fix the grammar by replacing "I'm
the hosted sandbox is created" with a correct phrase such as "Once the hosted
sandbox is created," (e.g., change the entry's question to: "Once the hosted
sandbox is created, help me confirm where to connect and how to start using it
so I can move from provisioning to actual agent work."). Make the change to the
"question" field in the same JSON object so the sentence follows the pattern
used in other entries like "I'm after remote deployment succeeds".

---

Nitpick comments:
In @.agents/skills/nemoclaw-user-reference/evals/evals.json:
- Line 6: Update all ground_truth string values to use consistent third-person
phrasing: replace first-person pronouns ("my", "I", "we" etc.) with third-person
equivalents ("their", "the") or rephrase to avoid pronouns; the affected field
is the JSON key "ground_truth" in each object (occurrences around the listed
entries) — ensure lines that currently read like "helps the user pick the
command surface that owns my operation" become "helps the user pick the command
surface that owns their operation" or "owns the operation", and apply the same
pronoun replacements across all mentioned ground_truth entries.
- Around line 1-167: The test cases duplicate the skill name in "expected_skill"
and the first entry of "expected_behavior" (e.g., "expected_skill":
"nemoclaw-user-reference" and "expected_behavior": ["Uses the
`nemoclaw-user-reference` skill.", ...]); remove the redundancy by deleting the
repeated skill string from each "expected_behavior" first item or, if the
framework requires a single canonical field, remove the "expected_skill" key and
keep "expected_behavior" entries—update all 18 objects in evals.json
consistently (look for the keys expected_skill and expected_behavior) so only
one source of truth remains.

In `@skills/nemoclaw-user-reference/evals/evals.json`:
- Line 6: Several ground_truth JSON entries use inconsistent first-person
pronouns ("my", "I") inside third‑person framing; update each affected
"ground_truth" value to replace first‑person pronouns with third‑person
equivalents (e.g., "my" → "their" or "the", "I" → "the user") while preserving
the original meaning and punctuation; target every "ground_truth" field instance
mentioned in the comment (the occurrences listed) so each sentence reads
consistently in third person.
- Around line 1-167: The test cases duplicate the skill name in both the
"expected_skill" field and as the first item of the "expected_behavior" array
(e.g., "expected_skill": "nemoclaw-user-reference" and "expected_behavior":
["Uses the `nemoclaw-user-reference` skill.", ...]); remove the redundancy by
either deleting the "expected_skill" field across cases or removing the
redundant first item from each "expected_behavior" entry, and if the framework
requires both keep them but change the "expected_behavior" items to describe
behavior (e.g., "Invokes the skill for answers") instead of repeating the skill
identifier; update all 18 objects (look for "expected_skill" and the
corresponding "expected_behavior" arrays) consistently.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: db57419c-007e-4997-adb7-ce809fc61642

📥 Commits

Reviewing files that changed from the base of the PR and between 442f64b and 8bc92c5.

📒 Files selected for processing (20)

.agents/skills/nemoclaw-user-agent-skills/evals/evals.json
.agents/skills/nemoclaw-user-configure-inference/evals/evals.json
.agents/skills/nemoclaw-user-configure-security/evals/evals.json
.agents/skills/nemoclaw-user-deploy-remote/evals/evals.json
.agents/skills/nemoclaw-user-get-started/evals/evals.json
.agents/skills/nemoclaw-user-manage-policy/evals/evals.json
.agents/skills/nemoclaw-user-manage-sandboxes/evals/evals.json
.agents/skills/nemoclaw-user-monitor-sandbox/evals/evals.json
.agents/skills/nemoclaw-user-overview/evals/evals.json
.agents/skills/nemoclaw-user-reference/evals/evals.json
skills/nemoclaw-user-agent-skills/evals/evals.json
skills/nemoclaw-user-configure-inference/evals/evals.json
skills/nemoclaw-user-configure-security/evals/evals.json
skills/nemoclaw-user-deploy-remote/evals/evals.json
skills/nemoclaw-user-get-started/evals/evals.json
skills/nemoclaw-user-manage-policy/evals/evals.json
skills/nemoclaw-user-manage-sandboxes/evals/evals.json
skills/nemoclaw-user-monitor-sandbox/evals/evals.json
skills/nemoclaw-user-overview/evals/evals.json
skills/nemoclaw-user-reference/evals/evals.json

coderabbitai · 2026-05-28T21:58:54Z

  },
  {
    "id": "docs-deployment-brev-web-ui-003",
    "question": "I'm the hosted sandbox is created. Help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work.",


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix grammatical error in question text.

The question starts with "I'm the hosted sandbox is created" which is grammatically incorrect. Based on the pattern used in line 26 ("I'm after remote deployment succeeds"), this should likely be "I'm after the hosted sandbox is created" or "Once the hosted sandbox is created".

📝 Proposed fix

-    "question": "I'm the hosted sandbox is created. Help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work.",
+    "question": "I'm after the hosted sandbox is created. Help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work.",

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/nemoclaw-user-deploy-remote/evals/evals.json at line 59, The "question" string contains a grammar error ("I'm the hosted sandbox is created"); update the JSON value for the "question" key to a correct phrasing such as "Once the hosted sandbox is created, help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work." — edit the value associated with "question" in the evals.json entry to replace the incorrect sentence with this corrected wording.

chore: fix for eval schema validation

8bc92c5

miyoungc added the "enhancement: skill" label (Improvements to NemoClaw repository hygiene or user functionality with skills.) May 28, 2026

coderabbitai Bot reviewed May 28, 2026

miyoungc closed this May 28, 2026