chore: fix for eval schema validation #4469
/nvskills-ci
📝 Walkthrough
This pull request standardizes evaluation test cases across 16 NemoClaw skill evaluation JSON files by adding structured
Changes: Standardized Skill Evaluation Assertions
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
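Because the PR is about eval schema validation, a minimal structural check can help confirm that every case carries the fields the review comments keep referring to. The sketch below is hypothetical: it assumes each case is a JSON object with `id`, `question`, `ground_truth`, `expected_skill`, and `expected_behavior` (the keys visible in the review excerpts), and it does not know the real schema NemoClaw validates against. The helper names `iter_cases` and `validate_cases` are illustrative only.

```python
import json
from pathlib import Path

# Assumption: these are the keys visible in the review excerpts; the real
# eval schema may require more or different fields.
REQUIRED_STRING_FIELDS = ("id", "question", "ground_truth", "expected_skill")


def iter_cases(node):
    """Yield every dict that looks like an eval case (has a "question" key).

    The top-level layout of evals.json is not shown in the review, so this
    walks nested lists and dicts instead of assuming one shape.
    """
    if isinstance(node, dict):
        if "question" in node:
            yield node
            return
        for value in node.values():
            yield from iter_cases(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_cases(item)


def validate_cases(data):
    """Return human-readable problems; an empty list means the check passed."""
    problems = []
    seen_ids = set()
    for index, case in enumerate(iter_cases(data)):
        label = case.get("id", f"case #{index}")
        for field in REQUIRED_STRING_FIELDS:
            value = case.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{label}: '{field}' must be a non-empty string")
        behavior = case.get("expected_behavior")
        if not isinstance(behavior, list) or not behavior or not all(
            isinstance(item, str) and item.strip() for item in behavior
        ):
            problems.append(f"{label}: 'expected_behavior' must be a non-empty list of strings")
        case_id = case.get("id")
        if isinstance(case_id, str):
            if case_id in seen_ids:
                problems.append(f"{label}: duplicate id")
            seen_ids.add(case_id)
    return problems


def validate_file(path):
    """Load one evals.json file and validate it."""
    return validate_cases(json.loads(Path(path).read_text(encoding="utf-8")))
```

For example, `validate_file("skills/nemoclaw-user-reference/evals/evals.json")` would return an empty list if every case in that file has the assumed fields.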
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
E2E Advisor Recommendation: Required E2E: None
E2E Scenario Advisor Recommendation: Required scenario E2E: None
PR Review Advisor findings: 0 needs attention, 2 worth checking, 0 nice ideas
This is an automated advisory review. A human maintainer must make the final merge decision.
Actionable comments posted: 1
♻️ Duplicate comments (1)
skills/nemoclaw-user-deploy-remote/evals/evals.json (1)
59-59: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Fix grammatical error in question text.
The question starts with "I'm the hosted sandbox is created" which is grammatically incorrect. Based on the pattern used in line 26 ("I'm after remote deployment succeeds"), this should likely be "I'm after the hosted sandbox is created" or "Once the hosted sandbox is created".
📝 Proposed fix
- "question": "I'm the hosted sandbox is created. Help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work.",
+ "question": "I'm after the hosted sandbox is created. Help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work.",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/nemoclaw-user-deploy-remote/evals/evals.json` at line 59, Update the JSON "question" value to fix the grammar by replacing "I'm the hosted sandbox is created" with a correct phrase such as "Once the hosted sandbox is created," (e.g., change the entry's question to: "Once the hosted sandbox is created, help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work."). Make the change to the "question" field in the same JSON object so the sentence follows the pattern used in other entries like "I'm after remote deployment succeeds".
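To keep the change minimal, as the prompt asks, the fix can be applied as an exact text substitution rather than by re-serializing the whole JSON file. This is a hedged sketch: `replace_once` and `fix_file` are hypothetical helpers, and `FIXED` uses the "Once the hosted sandbox is created, ..." wording from the prompt; swap in the "I'm after the hosted sandbox is created." wording if that is preferred.

```python
import json
from pathlib import Path

BROKEN = "I'm the hosted sandbox is created. Help me confirm"
# One of the two wordings suggested in the review; either reads correctly.
FIXED = "Once the hosted sandbox is created, help me confirm"


def replace_once(text, old, new):
    """Replace `old` with `new`, refusing to act unless there is exactly one match."""
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    return text.replace(old, new)


def fix_file(path):
    """Apply the grammar fix in place and confirm the file is still valid JSON."""
    file_path = Path(path)
    fixed = replace_once(file_path.read_text(encoding="utf-8"), BROKEN, FIXED)
    json.loads(fixed)  # raises if the edit somehow broke the JSON
    file_path.write_text(fixed, encoding="utf-8")
```

The review flags the same sentence in both `skills/nemoclaw-user-deploy-remote/evals/evals.json` and `.agents/skills/nemoclaw-user-deploy-remote/evals/evals.json`, so `fix_file` would need to run on each copy.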
🧹 Nitpick comments (4)
skills/nemoclaw-user-reference/evals/evals.json (2)
6-6: ⚡ Quick win
Fix pronoun inconsistency in `ground_truth` fields.
The `ground_truth` text uses third-person framing ("helps the user") but then includes first-person pronouns copied from the question text ("my operation", "I need"). This creates grammatical inconsistency.
For example, line 39:
- Current: "helps the user pick the command surface that owns my operation"
- Should be: "helps the user pick the command surface that owns their operation" or "...owns the operation"
This issue appears in lines 6, 17, 28, 39, 50, 61, 72, 83, 94, 105, 116, 127, 138, 149, and 160.
📝 Example fixes for pronoun consistency
- "ground_truth": "A NemoClaw-specific answer that helps the user pick the command surface that owns my operation and gives enough concrete guidance, decision criteria, verification steps, or risk framing to complete the task without breaking NemoClaw management.",
+ "ground_truth": "A NemoClaw-specific answer that helps the user pick the command surface that owns their operation and gives enough concrete guidance, decision criteria, verification steps, or risk framing to complete the task without breaking NemoClaw management.",
- "ground_truth": "A NemoClaw-specific answer that helps the user find the exact action, flag, or recovery path I need and gives enough concrete guidance, decision criteria, verification steps, or risk framing to run the right command without scanning source code.",
+ "ground_truth": "A NemoClaw-specific answer that helps the user find the exact action, flag, or recovery path they need and gives enough concrete guidance, decision criteria, verification steps, or risk framing to run the right command without scanning source code.",
Apply similar changes to all affected `ground_truth` fields.
Also applies to: 17-17, 28-28, 39-39, 50-50, 61-61, 72-72, 83-83, 94-94, 105-105, 116-116, 127-127, 138-138, 149-149, 160-160
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/nemoclaw-user-reference/evals/evals.json` at line 6, Several ground_truth JSON entries use inconsistent first-person pronouns ("my", "I") inside third‑person framing; update each affected "ground_truth" value to replace first‑person pronouns with third‑person equivalents (e.g., "my" → "their" or "the", "I" → "the user") while preserving the original meaning and punctuation; target every "ground_truth" field instance mentioned in the comment (the occurrences listed) so each sentence reads consistently in third person.
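Before and after editing, a quick scan can list which `ground_truth` strings still contain first-person words, which makes it easier to confirm that all 15 flagged lines were fixed. This is a hedged sketch: the pronoun list is an assumption (the review only names "my" and "I"), `iter_cases` assumes the evals.json layout is some nesting of lists and objects, and matches are flags for human review, not automatic rewrites.

```python
import json
import re
from pathlib import Path

# Assumption: "my" and "I" come from the review; the rest are close relatives.
# No IGNORECASE, so a lowercase "i" (as in "i.e.") is not flagged.
FIRST_PERSON = re.compile(r"\b(?:I|[Mm]e|[Mm]y|[Mm]ine|[Ww]e|[Uu]s|[Oo]urs?)\b")


def iter_cases(node):
    """Yield every dict that has a "question" key, at any nesting depth."""
    if isinstance(node, dict):
        if "question" in node:
            yield node
            return
        for value in node.values():
            yield from iter_cases(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_cases(item)


def first_person_hits(data):
    """Return (case id, matched words) for ground_truth strings written in first person."""
    hits = []
    for case in iter_cases(data):
        text = case.get("ground_truth")
        words = FIRST_PERSON.findall(text) if isinstance(text, str) else []
        if words:
            hits.append((case.get("id"), words))
    return hits


def scan_file(path):
    """Load one evals.json file and report first-person ground_truth entries."""
    return first_person_hits(json.loads(Path(path).read_text(encoding="utf-8")))
```

An empty result from `scan_file` on both the `skills/` and `.agents/skills/` copies would indicate the pronoun nitpick is resolved.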
1-167: 💤 Low value
Consider removing redundancy between `expected_skill` and `expected_behavior`.
Every evaluation case includes both an `expected_skill` field and an `expected_behavior` array where the first item duplicates the skill name. For example:
"expected_skill": "nemoclaw-user-reference"
"expected_behavior": ["Uses the `nemoclaw-user-reference` skill.", ...]
If the evaluation framework requires both fields, this is acceptable. Otherwise, consider whether `expected_skill` can be removed to reduce duplication across all 18 test cases.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/nemoclaw-user-reference/evals/evals.json` around lines 1 - 167, The test cases duplicate the skill name in both the "expected_skill" field and as the first item of the "expected_behavior" array (e.g., "expected_skill": "nemoclaw-user-reference" and "expected_behavior": ["Uses the `nemoclaw-user-reference` skill.", ...]); remove the redundancy by either deleting the "expected_skill" field across cases or removing the redundant first item from each "expected_behavior" entry, and if the framework requires both keep them but change the "expected_behavior" items to describe behavior (e.g., "Invokes the skill for answers") instead of repeating the skill identifier; update all 18 objects (look for "expected_skill" and the corresponding "expected_behavior" arrays) consistently.
.agents/skills/nemoclaw-user-reference/evals/evals.json (2)
6-6: ⚡ Quick win
Fix pronoun inconsistency in `ground_truth` fields.
The `ground_truth` text uses third-person framing ("helps the user") but then includes first-person pronouns copied from the question text ("my operation", "I need"). This creates grammatical inconsistency.
For example, line 39:
- Current: "helps the user pick the command surface that owns my operation"
- Should be: "helps the user pick the command surface that owns their operation" or "...owns the operation"
This issue appears in lines 6, 17, 28, 39, 50, 61, 72, 83, 94, 105, 116, 127, 138, 149, and 160.
📝 Example fixes for pronoun consistency
- "ground_truth": "A NemoClaw-specific answer that helps the user pick the command surface that owns my operation and gives enough concrete guidance, decision criteria, verification steps, or risk framing to complete the task without breaking NemoClaw management.",
+ "ground_truth": "A NemoClaw-specific answer that helps the user pick the command surface that owns their operation and gives enough concrete guidance, decision criteria, verification steps, or risk framing to complete the task without breaking NemoClaw management.",
- "ground_truth": "A NemoClaw-specific answer that helps the user find the exact action, flag, or recovery path I need and gives enough concrete guidance, decision criteria, verification steps, or risk framing to run the right command without scanning source code.",
+ "ground_truth": "A NemoClaw-specific answer that helps the user find the exact action, flag, or recovery path they need and gives enough concrete guidance, decision criteria, verification steps, or risk framing to run the right command without scanning source code.",
Apply similar changes to all affected `ground_truth` fields.
Also applies to: 17-17, 28-28, 39-39, 50-50, 61-61, 72-72, 83-83, 94-94, 105-105, 116-116, 127-127, 138-138, 149-149, 160-160
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/nemoclaw-user-reference/evals/evals.json at line 6, Update all ground_truth string values to use consistent third-person phrasing: replace first-person pronouns ("my", "I", "we" etc.) with third-person equivalents ("their", "the") or rephrase to avoid pronouns; the affected field is the JSON key "ground_truth" in each object (occurrences around the listed entries) — ensure lines that currently read like "helps the user pick the command surface that owns my operation" become "helps the user pick the command surface that owns their operation" or "owns the operation", and apply the same pronoun replacements across all mentioned ground_truth entries.
1-167: 💤 Low value
Consider removing redundancy between `expected_skill` and `expected_behavior`.
Every evaluation case includes both an `expected_skill` field and an `expected_behavior` array where the first item duplicates the skill name. For example:
"expected_skill": "nemoclaw-user-reference"
"expected_behavior": ["Uses the `nemoclaw-user-reference` skill.", ...]
If the evaluation framework requires both fields, this is acceptable. Otherwise, consider whether `expected_skill` can be removed to reduce duplication across all 18 test cases.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/nemoclaw-user-reference/evals/evals.json around lines 1 - 167, The test cases duplicate the skill name in "expected_skill" and the first entry of "expected_behavior" (e.g., "expected_skill": "nemoclaw-user-reference" and "expected_behavior": ["Uses the `nemoclaw-user-reference` skill.", ...]); remove the redundancy by deleting the repeated skill string from each "expected_behavior" first item or, if the framework requires a single canonical field, remove the "expected_skill" key and keep "expected_behavior" entries—update all 18 objects in evals.json consistently (look for the keys expected_skill and expected_behavior) so only one source of truth remains.
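To check whether this low-value finding still applies, a small report can list the cases whose first `expected_behavior` item only restates `expected_skill`. This is a hypothetical sketch: it assumes the redundant item uses exactly the phrasing quoted in the review ("Uses the `<skill>` skill."), and `iter_cases` assumes the case objects can be found by walking nested lists and objects. It reports only; whether to drop `expected_skill` or the first behavior item depends on what the evaluation framework requires.

```python
import json
from pathlib import Path


def iter_cases(node):
    """Yield every dict that has a "question" key, at any nesting depth."""
    if isinstance(node, dict):
        if "question" in node:
            yield node
            return
        for value in node.values():
            yield from iter_cases(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_cases(item)


def redundant_first_behavior(data):
    """Return ids of cases whose first expected_behavior item restates expected_skill."""
    flagged = []
    for case in iter_cases(data):
        skill = case.get("expected_skill")
        behavior = case.get("expected_behavior")
        if not isinstance(skill, str) or not isinstance(behavior, list) or not behavior:
            continue
        first = behavior[0]
        # Assumption: the redundant item uses exactly the phrasing quoted in the review.
        if isinstance(first, str) and first.strip() == f"Uses the `{skill}` skill.":
            flagged.append(case.get("id"))
    return flagged


def report_file(path):
    """Load one evals.json file and list redundant cases."""
    return redundant_first_behavior(json.loads(Path(path).read_text(encoding="utf-8")))
```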
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.agents/skills/nemoclaw-user-deploy-remote/evals/evals.json:
- Line 59: The "question" string contains a grammar error ("I'm the hosted
sandbox is created"); update the JSON value for the "question" key to a correct
phrasing such as "Once the hosted sandbox is created, help me confirm where to
connect and how to start using it so I can move from provisioning to actual
agent work." — edit the value associated with "question" in the evals.json entry
to replace the incorrect sentence with this corrected wording.
---
Duplicate comments:
In `@skills/nemoclaw-user-deploy-remote/evals/evals.json`:
- Line 59: Update the JSON "question" value to fix the grammar by replacing "I'm
the hosted sandbox is created" with a correct phrase such as "Once the hosted
sandbox is created," (e.g., change the entry's question to: "Once the hosted
sandbox is created, help me confirm where to connect and how to start using it
so I can move from provisioning to actual agent work."). Make the change to the
"question" field in the same JSON object so the sentence follows the pattern
used in other entries like "I'm after remote deployment succeeds".
---
Nitpick comments:
In @.agents/skills/nemoclaw-user-reference/evals/evals.json:
- Line 6: Update all ground_truth string values to use consistent third-person
phrasing: replace first-person pronouns ("my", "I", "we" etc.) with third-person
equivalents ("their", "the") or rephrase to avoid pronouns; the affected field
is the JSON key "ground_truth" in each object (occurrences around the listed
entries) — ensure lines that currently read like "helps the user pick the
command surface that owns my operation" become "helps the user pick the command
surface that owns their operation" or "owns the operation", and apply the same
pronoun replacements across all mentioned ground_truth entries.
- Around line 1-167: The test cases duplicate the skill name in "expected_skill"
and the first entry of "expected_behavior" (e.g., "expected_skill":
"nemoclaw-user-reference" and "expected_behavior": ["Uses the
`nemoclaw-user-reference` skill.", ...]); remove the redundancy by deleting the
repeated skill string from each "expected_behavior" first item or, if the
framework requires a single canonical field, remove the "expected_skill" key and
keep "expected_behavior" entries—update all 18 objects in evals.json
consistently (look for the keys expected_skill and expected_behavior) so only
one source of truth remains.
In `@skills/nemoclaw-user-reference/evals/evals.json`:
- Line 6: Several ground_truth JSON entries use inconsistent first-person
pronouns ("my", "I") inside third‑person framing; update each affected
"ground_truth" value to replace first‑person pronouns with third‑person
equivalents (e.g., "my" → "their" or "the", "I" → "the user") while preserving
the original meaning and punctuation; target every "ground_truth" field instance
mentioned in the comment (the occurrences listed) so each sentence reads
consistently in third person.
- Around line 1-167: The test cases duplicate the skill name in both the
"expected_skill" field and as the first item of the "expected_behavior" array
(e.g., "expected_skill": "nemoclaw-user-reference" and "expected_behavior":
["Uses the `nemoclaw-user-reference` skill.", ...]); remove the redundancy by
either deleting the "expected_skill" field across cases or removing the
redundant first item from each "expected_behavior" entry, and if the framework
requires both keep them but change the "expected_behavior" items to describe
behavior (e.g., "Invokes the skill for answers") instead of repeating the skill
identifier; update all 18 objects (look for "expected_skill" and the
corresponding "expected_behavior" arrays) consistently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: db57419c-007e-4997-adb7-ce809fc61642
📒 Files selected for processing (20)
- .agents/skills/nemoclaw-user-agent-skills/evals/evals.json
- .agents/skills/nemoclaw-user-configure-inference/evals/evals.json
- .agents/skills/nemoclaw-user-configure-security/evals/evals.json
- .agents/skills/nemoclaw-user-deploy-remote/evals/evals.json
- .agents/skills/nemoclaw-user-get-started/evals/evals.json
- .agents/skills/nemoclaw-user-manage-policy/evals/evals.json
- .agents/skills/nemoclaw-user-manage-sandboxes/evals/evals.json
- .agents/skills/nemoclaw-user-monitor-sandbox/evals/evals.json
- .agents/skills/nemoclaw-user-overview/evals/evals.json
- .agents/skills/nemoclaw-user-reference/evals/evals.json
- skills/nemoclaw-user-agent-skills/evals/evals.json
- skills/nemoclaw-user-configure-inference/evals/evals.json
- skills/nemoclaw-user-configure-security/evals/evals.json
- skills/nemoclaw-user-deploy-remote/evals/evals.json
- skills/nemoclaw-user-get-started/evals/evals.json
- skills/nemoclaw-user-manage-policy/evals/evals.json
- skills/nemoclaw-user-manage-sandboxes/evals/evals.json
- skills/nemoclaw-user-monitor-sandbox/evals/evals.json
- skills/nemoclaw-user-overview/evals/evals.json
- skills/nemoclaw-user-reference/evals/evals.json
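The file list pairs each `.agents/skills/<skill>/evals/evals.json` with a `skills/<skill>/evals/evals.json`, and the review raises identical findings at identical line numbers in both copies. If the two trees are meant to stay identical (an assumption, not something the PR states), a byte-for-byte comparison catches a fix applied to only one copy. `mirrored_pairs` and `mismatched_mirrors` are hypothetical helper names.

```python
from pathlib import Path


def mirrored_pairs(repo_root):
    """Yield (.agents copy, skills copy) path pairs for every eval file under .agents/skills."""
    root = Path(repo_root)
    for agent_copy in sorted((root / ".agents" / "skills").glob("*/evals/evals.json")):
        skill_name = agent_copy.parent.parent.name  # e.g. nemoclaw-user-reference
        yield agent_copy, root / "skills" / skill_name / "evals" / "evals.json"


def mismatched_mirrors(repo_root):
    """Return messages for pairs that are missing or differ; empty means all match.

    Assumption: the two copies are intended to be byte-identical.
    """
    root = Path(repo_root)
    problems = []
    for agent_copy, skills_copy in mirrored_pairs(root):
        if not skills_copy.exists():
            problems.append(f"missing {skills_copy.relative_to(root)}")
        elif agent_copy.read_bytes() != skills_copy.read_bytes():
            problems.append(
                f"differs: {agent_copy.relative_to(root)} vs {skills_copy.relative_to(root)}"
            )
    return problems
```

Calling `mismatched_mirrors(".")` from the repository root would list any pair where only one copy received a fix.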
  },
  {
    "id": "docs-deployment-brev-web-ui-003",
    "question": "I'm the hosted sandbox is created. Help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work.",
Fix grammatical error in question text.
The question starts with "I'm the hosted sandbox is created" which is grammatically incorrect. Based on the pattern used in line 26 ("I'm after remote deployment succeeds"), this should likely be "I'm after the hosted sandbox is created" or "Once the hosted sandbox is created".
📝 Proposed fix
- "question": "I'm the hosted sandbox is created. Help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work.",
+ "question": "I'm after the hosted sandbox is created. Help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work.",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.agents/skills/nemoclaw-user-deploy-remote/evals/evals.json at line 59, The
"question" string contains a grammar error ("I'm the hosted sandbox is
created"); update the JSON value for the "question" key to a correct phrasing
such as "Once the hosted sandbox is created, help me confirm where to connect
and how to start using it so I can move from provisioning to actual agent work."
— edit the value associated with "question" in the evals.json entry to replace
the incorrect sentence with this corrected wording.
Summary
Related Issue
Changes
Type of Change
Verification
- `npx prek run --all-files` passes
- `npm test` passes
- `npm run docs` builds without warnings (doc changes only)

Signed-off-by: Your Name your-email@example.com
Summary by CodeRabbit
Release Notes