Skip to content

chore: skills signing batch 4#4465

Merged
miyoungc merged 3 commits into
mainfrom
skills-sign-batch-4
May 28, 2026
Merged

chore: skills signing batch 4#4465
miyoungc merged 3 commits into
mainfrom
skills-sign-batch-4

Conversation

@miyoungc
Copy link
Copy Markdown
Collaborator

@miyoungc miyoungc commented May 28, 2026

Summary

Related Issue

Changes

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Your Name your-email@example.com

Summary by CodeRabbit

  • Documentation

    • Added detailed guides for remote GPU deployment (Brev), sandbox monitoring/debugging, OpenClaw plugin installation, sandbox hardening, and Brev Web UI launch flow.
    • Added skill cards and reference docs for deploy and monitor skills.
  • Tests

    • Added and updated evaluation datasets for deployment and monitoring scenarios (adjusted expected outputs).
  • Benchmarks

    • Added NVSkills-Eval benchmark reports for deploy-remote (recommend review) and monitor-sandbox (pass).
  • Chores

    • Added signature/metadata files for skills.

Review Change Stack

@miyoungc
Copy link
Copy Markdown
Collaborator Author

/nvskills-ci

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

📝 Walkthrough

Walkthrough

This PR adds docs, reference guides, eval datasets, benchmarks, skill-cards, and signature bundles for two NemoClaw skills: remote GPU deployment (Brev, OpenClaw plugins, sandbox hardening) and sandbox monitoring/debugging, and normalizes agent-side eval JSON by removing expected_behavior arrays.

Changes

NemoClaw Skill Documentation and Evaluation

Layer / File(s) Summary
Remote deployment skill documentation and guides
skills/nemoclaw-user-deploy-remote/SKILL.md, skills/nemoclaw-user-deploy-remote/references/brev-web-ui.md, skills/nemoclaw-user-deploy-remote/references/install-openclaw-plugins.md, skills/nemoclaw-user-deploy-remote/references/sandbox-hardening.md, skills/nemoclaw-user-deploy-remote/BENCHMARK.md, skills/nemoclaw-user-deploy-remote/skill-card.md, skills/nemoclaw-user-deploy-remote/skill.oms.sig
Main skill guide covers legacy nemoclaw deploy wrapper, preferred Brev provisioning + onboarding flow, remote sandbox connection, OpenShell TUI monitoring, inference verification, dashboard access prerequisites (CHAT_UI_URL), readiness timeout behavior, proxy routing, GPU selection, OpenClaw plugin install patterns, and sandbox hardening (toolchain removal, ulimit, capability dropping, Landlock filesystem notes). Includes benchmark, skill-card, and signature bundle.
Remote deployment evaluation criteria
skills/nemoclaw-user-deploy-remote/evals/evals.json, .agents/skills/nemoclaw-user-deploy-remote/evals/evals.json
New evaluation dataset defines deployment test cases and ground-truth rubrics; agent-side eval entries were changed by removing expected_behavior arrays and consolidating criteria into ground_truth.
Sandbox monitoring skill documentation and artifacts
skills/nemoclaw-user-monitor-sandbox/SKILL.md, skills/nemoclaw-user-monitor-sandbox/BENCHMARK.md, skills/nemoclaw-user-monitor-sandbox/skill-card.md, skills/nemoclaw-user-monitor-sandbox/skill.oms.sig
Skill guide documents health checks (nemoclaw <name> status), log streaming (nemoclaw <name> logs --follow), OpenShell TUI network monitoring, test inference checks, and structured failure diagnosis checklist. Adds benchmark, skill-card, and signature bundle.
Sandbox monitoring evaluation criteria
skills/nemoclaw-user-monitor-sandbox/evals/evals.json, .agents/skills/nemoclaw-user-monitor-sandbox/evals/evals.json
New evaluation dataset with three monitoring-focused test cases; ground-truth strings adjusted and prior expected_behavior arrays removed from agent-side entries.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Suggested labels

enhancement: skill

Suggested reviewers

  • jyaunches
  • cv

🐰 Hops with glee, whiskers twitching with delight:
New skills blooming in the docs so bright,
Deploy remote with hardened might,
Monitor sandboxes through the night—
Two paths forward, guidance tight!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'chore: skills signing batch 4' accurately reflects the main change—adding NVSkills validation signatures to two skills (nemoclaw-user-deploy-remote and nemoclaw-user-monitor-sandbox).
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch skills-sign-batch-4

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 28, 2026

E2E Advisor Recommendation

Required E2E: skill-agent-e2e
Optional E2E: docs-validation-e2e, brev-launchable-cloud-openclaw

Dispatch hint: skill-agent-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required E2E

  • skill-agent-e2e (medium): Changed .agents/skills and published skill content can affect real assistant skill discovery and response flows. This is the existing E2E that installs NemoClaw, creates a sandbox, injects a skill, and verifies the OpenClaw agent reads SKILL.md and responds using the skill path.

Optional E2E

  • docs-validation-e2e (low): Useful as a low-cost confidence check for documentation/link integrity after adding multiple Markdown skill references, but the existing nightly job primarily validates core docs and does not fully cover top-level skills/ or .agents/skills content, so it is not a complete merge-blocking signal for this PR.
  • brev-launchable-cloud-openclaw (high): The new deploy-remote skill documents the Brev web UI and hosted launch flow. This scenario validates the Brev launchable path end-to-end, but the PR changes guidance rather than launchable/runtime code, so this is useful confidence only when maintainers want to compare docs against the live flow.

New E2E recommendations

  • repository skill content evaluation (high): No existing E2E appears to load the actual changed .agents/skills/nemoclaw-user-deploy-remote or .agents/skills/nemoclaw-user-monitor-sandbox content and ask the eval prompts to verify the expected skill/reference file is selected and the answer stays within supported NemoClaw behavior.
    • Suggested test: Add a repo-skill scenario E2E that installs NemoClaw/OpenClaw, exposes selected .agents/skills to the agent, runs the eval prompts for deploy-remote and monitor-sandbox, and asserts skill selection plus required safety/accuracy markers.
  • skill Markdown/link validation (medium): The existing docs-validation-e2e does not scan top-level skills/ by default and only scans .agents/skills when check-docs.sh is invoked with --with-skills. This PR adds many relative skill reference links that should be validated in CI.
    • Suggested test: Add or extend an E2E/docs job to run test/e2e/e2e-cloud-experimental/check-docs.sh --only-links --local-only --with-skills and include top-level skills/**/*.md files.
  • NVSkills eval schema regression (medium): The eval JSON changes remove expected_behavior fields. Existing NemoClaw E2E does not validate that these eval files remain compatible with the NVSkills-Eval consumer or that signed skill bundles match the checked-in resources.
    • Suggested test: Add a lightweight skill-publication validation job that runs the NVSkills/static eval schema checks against changed skills/*/evals/evals.json, verifies skill.oms.sig resource digests, and reports changed skill IDs.

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: skill-agent-e2e

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 28, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • None. No scenario workflow, scenario metadata, scenario runtime, or validation-suite files changed.

Optional scenario E2E

  • None.

Relevant changed files

  • None.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.agents/skills/nemoclaw-user-deploy-remote/evals/evals.json:
- Line 34: The "question" JSON value contains a malformed phrase ("I'm the
hosted sandbox is created."); update the "question" field to use correct grammar
by replacing that fragment with a clear phrasing such as "Now that the hosted
sandbox is created, help me confirm where to connect and how to start using it
so I can move from provisioning to actual agent work." Ensure you edit the
"question" key in the JSON object to exactly use the corrected sentence.

In `@skills/nemoclaw-user-deploy-remote/evals/evals.json`:
- Line 34: The "question" string in the eval case is malformed; update the value
of the "question" key in evals.json (the JSON object containing "question") to a
grammatically correct sentence such as "The hosted sandbox is created. Help me
confirm where to connect and how to start using it so I can move from
provisioning to actual agent work." or "After the hosted sandbox is created,
help me confirm where to connect and how to start using it so I can move from
provisioning to actual agent work." ensuring proper punctuation and spacing.

In `@skills/nemoclaw-user-deploy-remote/references/brev-web-ui.md`:
- Line 101: Replace the awkward phrase "Wait for about a few minutes for pairing
to finish automatically." (the sentence in the guidance text) with a tighter
wording such as "Wait a few minutes for pairing to finish automatically." or
"Wait about a minute or two for pairing to finish automatically." to improve
readability; update the sentence exactly where it appears in the guidance
content.

In `@skills/nemoclaw-user-deploy-remote/SKILL.md`:
- Line 3: Update the metadata description string in SKILL.md to fix product-name
casing: replace lowercase occurrences of "nvidia" with "NVIDIA" and "openclaw"
with "OpenClaw", and ensure any other product tokens in that description use the
exact casing "NemoClaw" and "OpenShell" where present; edit the description
value (the string assigned to description in the diff) to perform these
substitutions so all keywords and phrases follow the project's casing rules.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3b2a1281-50b9-409a-8bf5-23636083f4b3

📥 Commits

Reviewing files that changed from the base of the PR and between 195bbcc and 08cfc0c.

📒 Files selected for processing (9)
  • .agents/skills/nemoclaw-user-deploy-remote/evals/evals.json
  • .agents/skills/nemoclaw-user-monitor-sandbox/evals/evals.json
  • skills/nemoclaw-user-deploy-remote/SKILL.md
  • skills/nemoclaw-user-deploy-remote/evals/evals.json
  • skills/nemoclaw-user-deploy-remote/references/brev-web-ui.md
  • skills/nemoclaw-user-deploy-remote/references/install-openclaw-plugins.md
  • skills/nemoclaw-user-deploy-remote/references/sandbox-hardening.md
  • skills/nemoclaw-user-monitor-sandbox/SKILL.md
  • skills/nemoclaw-user-monitor-sandbox/evals/evals.json

},
{
"id": "docs-deployment-brev-web-ui-003",
"question": "I'm the hosted sandbox is created. Help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work.",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Mirror the eval question grammar fix here too.

Line 34 has the same malformed phrase (I'm the hosted sandbox is created.). Please apply the same corrected wording used in skills/nemoclaw-user-deploy-remote/evals/evals.json.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/nemoclaw-user-deploy-remote/evals/evals.json at line 34, The
"question" JSON value contains a malformed phrase ("I'm the hosted sandbox is
created."); update the "question" field to use correct grammar by replacing that
fragment with a clear phrasing such as "Now that the hosted sandbox is created,
help me confirm where to connect and how to start using it so I can move from
provisioning to actual agent work." Ensure you edit the "question" key in the
JSON object to exactly use the corrected sentence.

},
{
"id": "docs-deployment-brev-web-ui-003",
"question": "I'm the hosted sandbox is created. Help me confirm where to connect and how to start using it so I can move from provisioning to actual agent work.",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix malformed question text in eval case.

Line 34 should be rephrased (e.g., The hosted sandbox is created. Help me confirm... or After the hosted sandbox is created, help me confirm...).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/nemoclaw-user-deploy-remote/evals/evals.json` at line 34, The
"question" string in the eval case is malformed; update the value of the
"question" key in evals.json (the JSON object containing "question") to a
grammatically correct sentence such as "The hosted sandbox is created. Help me
confirm where to connect and how to start using it so I can move from
provisioning to actual agent work." or "After the hosted sandbox is created,
help me confirm where to connect and how to start using it so I can move from
provisioning to actual agent work." ensuring proper punctuation and spacing.


The dashboard might initially show a **Pairing required** warning.
This means the gateway is still completing pairing in the background.
Wait for about a few minutes for pairing to finish automatically. Refresh the dashboard to see if the warning is resolved and the connection is established.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Tighten wording for readability.

Wait for about a few minutes is awkward; use either Wait a few minutes or Wait about a minute or two.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/nemoclaw-user-deploy-remote/references/brev-web-ui.md` at line 101,
Replace the awkward phrase "Wait for about a few minutes for pairing to finish
automatically." (the sentence in the guidance text) with a tighter wording such
as "Wait a few minutes for pairing to finish automatically." or "Wait about a
minute or two for pairing to finish automatically." to improve readability;
update the sentence exactly where it appears in the guidance content.

@@ -0,0 +1,177 @@
---
name: "nemoclaw-user-deploy-remote"
description: "Explains how to run NemoClaw on a remote GPU instance, including the deprecated Brev compatibility path and the preferred installer plus onboard flow. Use when deploying NemoClaw to a remote VM, onboarding a Brev instance, or migrating away from the legacy `nemoclaw deploy` wrapper. Trigger keywords - deploy nemoclaw remote gpu, nemoclaw brev cloud deployment, nemoclaw plugins, openclaw plugins, install openclaw plugin, nemoclaw onboard from dockerfile, nemoclaw brev web ui, nemoclaw getting started, brev quickstart, nvidia nemotron agent, nemoclaw sandbox hardening, container security, docker capabilities, process limits."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix product-name casing in metadata keywords/description.

Line 3 uses nvidia and openclaw in lowercase. Please normalize to NVIDIA and OpenClaw (and keep NemoClaw/OpenShell exact casing everywhere in metadata too).

As per coding guidelines, "NVIDIA must be all caps (not Nvidia, nvidia)." and "NemoClaw, OpenClaw, and OpenShell must use correct casing."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/nemoclaw-user-deploy-remote/SKILL.md` at line 3, Update the metadata
description string in SKILL.md to fix product-name casing: replace lowercase
occurrences of "nvidia" with "NVIDIA" and "openclaw" with "OpenClaw", and ensure
any other product tokens in that description use the exact casing "NemoClaw" and
"OpenShell" where present; edit the description value (the string assigned to
description in the diff) to perform these substitutions so all keywords and
phrases follow the project's casing rules.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 28, 2026

PR Review Advisor

Findings: 0 needs attention, 15 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 13 still apply, 0 new items found

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: Legacy `nemoclaw deploy` compatibility guidance: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: skills/nemoclaw-user-deploy-remote/SKILL.md lines 39-66 document the legacy compatibility flow; src/lib/deploy/index.ts logs the deprecation warning.
  • Source-of-truth review needed: Proxy host/port configuration guidance: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: skills/nemoclaw-user-deploy-remote/SKILL.md says colons and 0-65535 are accepted; src/lib/onboard/dockerfile-patch.ts accepts only letters, digits, dot, underscore, hyphen and ports 1-65535.
  • Source-of-truth review needed: Landlock best_effort fallback guidance: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: skills/nemoclaw-user-deploy-remote/references/sandbox-hardening.md states Landlock is silently skipped on unsupported kernels, then claims files outside writable paths are inaccessible regardless of DAC permissions.
  • Source-of-truth review needed: First-run readiness timeout recovery guidance: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: skills/nemoclaw-user-deploy-remote/SKILL.md says onboard deletes the partially-created sandbox after timeout; src/lib/onboard.ts deletes in the non-GPU timeout cleanup branch and handles GPU failures separately.
  • Source-of-truth review needed: Brev web UI remote dashboard exposure guidance: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: skills/nemoclaw-user-deploy-remote/references/brev-web-ui.md discusses Chat With Agent, pairing, and Gateway Access without the non-loopback CHAT_UI_URL warning from the main skill.
  • Source-of-truth review needed: Diagnostics and log sharing guidance: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: skills/nemoclaw-user-monitor-sandbox/SKILL.md presents `nemoclaw <name> logs` and `--follow` without any redaction note.
  • Granular eval guardrails were removed from security-sensitive skills (.agents/skills/nemoclaw-user-deploy-remote/evals/evals.json:1): The updated evals remove the per-case expected_behavior arrays that previously required loading the intended skill or reference file, avoiding unsupported NemoClaw behavior, and following progressive disclosure. The replacement broad ground_truth text gives weaker regression coverage for remote deployment, plugin installation, dashboard auth, sandbox hardening, diagnostics handling, and deprecation guidance.
    • Recommendation: Restore the detailed expected_behavior checks, or add equivalent assertions in the eval format used by the signing/export pipeline. Include assertions for reference loading, unsupported-behavior avoidance, deprecated deploy warnings, dashboard exposure caveats, credential redaction, Landlock residual-risk wording, and plugin Dockerfile trust guidance.
    • Evidence: The diff removes expected_behavior arrays from .agents/skills/nemoclaw-user-deploy-remote/evals/evals.json and .agents/skills/nemoclaw-user-monitor-sandbox/evals/evals.json; the new skills/.../evals/evals.json files contain only id, question, expected_skill, and broad ground_truth fields.
  • Proxy validation documentation disagrees with implementation (skills/nemoclaw-user-deploy-remote/SKILL.md:152): The skill says proxy hosts may contain colons and ports may be 0-65535, but onboarding rejects colon-containing hosts and rejects port 0. Users following the skill could choose proxy values that are ignored or rejected, leaving default proxy behavior in place when they expected a custom route.
    • Recommendation: Update the skill text to match the source of truth: host characters are letters, digits, dot, underscore, and hyphen; port must be numeric from 1 through 65535. Add or restore an eval assertion covering rejected colon hosts and port 0 if proxy guidance remains in this skill.
    • Evidence: skills/nemoclaw-user-deploy-remote/SKILL.md says 'Only alphanumeric characters, dots, hyphens, and colons are accepted for the host' and '0-65535'; src/lib/onboard/dockerfile-patch.ts defines PROXY_HOST_RE = /^[A-Za-z0-9._-]+$/ and isValidProxyPort returns true only for ports >= 1 and <= 65535.
  • Diagnostics guidance lacks credential redaction warnings (skills/nemoclaw-user-monitor-sandbox/SKILL.md:45): The monitoring skill encourages users to collect logs and traces, including blueprint runner and sandbox logs, but does not warn users to redact provider keys, bot tokens, private URLs, chat IDs, PII, or other sensitive values before sharing diagnostics. The deploy path can materialize forwarded credentials into the remote .env file, making diagnostic handling more sensitive.
    • Recommendation: Add a short safety note before the log-sharing guidance telling users to redact API keys, bot tokens, provider credentials, private URLs, chat IDs, PII, and to avoid sharing remote .env contents.
    • Evidence: skills/nemoclaw-user-monitor-sandbox/SKILL.md shows `nemoclaw <name> logs` and `nemoclaw <name> logs --follow` without redaction guidance; src/lib/deploy/index.ts builds credential env lines and copies them to `${remoteDir}/.env` with chmod 600.
  • Brev web UI guide omits the remote dashboard auth caveat (skills/nemoclaw-user-deploy-remote/references/brev-web-ui.md:99): The main deployment skill warns that a non-loopback CHAT_UI_URL disables OpenClaw device pairing and that any device reaching the dashboard origin can connect. The Brev web UI reference discusses browser dashboard pairing and Gateway Access but does not repeat that exposure caveat for hosted browser use.
    • Recommendation: Add the same remote-origin/device-pairing caveat to the Brev web UI reference near the dashboard and pairing instructions, especially warning against internet-reachable or shared-network exposure without compensating controls.
    • Evidence: skills/nemoclaw-user-deploy-remote/SKILL.md warns that non-loopback CHAT_UI_URL disables device pairing and any reachable device can connect; skills/nemoclaw-user-deploy-remote/references/brev-web-ui.md discusses pairing and Gateway Access without that warning.
  • Landlock best_effort fallback wording overstates filesystem protection (skills/nemoclaw-user-deploy-remote/references/sandbox-hardening.md:110): The hardening reference says Landlock enforcement is silently skipped on unsupported kernels and falls back to DAC only, but then says files outside writable paths would be inaccessible regardless of DAC permissions. That overstates protection in the DAC-only fallback and can create a false sense of sandbox filesystem isolation.
    • Recommendation: Clarify that when Landlock is skipped, access depends on DAC and container/runtime permissions; reserve fixed writable/read-only path guarantees for hosts where Landlock enforcement is active. Add or restore an eval assertion that assistants explain this residual risk accurately.
    • Evidence: skills/nemoclaw-user-deploy-remote/references/sandbox-hardening.md says best_effort skips Landlock, then says 'Files outside the writable paths would be inaccessible to the agent regardless of DAC permissions.' src/lib/onboard.ts warns that filesystem restrictions silently degrade in best_effort mode.
  • Plugin Dockerfile example uses a mutable sandbox base image (skills/nemoclaw-user-deploy-remote/references/install-openclaw-plugins.md:35): The plugin installation guide extends the NemoClaw sandbox image from a mutable latest tag. Because this guidance is about adding code to a sandbox boundary, mutable bases make builds less reproducible and weaken review of the exact trusted image users are extending.
    • Recommendation: Use a pinned version tag or digest in the example, or add an explicit note instructing users to pin the base image they reviewed and update it deliberately. Also mention lockfiles and trusted registries for plugin dependencies.
    • Evidence: The added Dockerfile sample contains `ARG SANDBOX_BASE=ghcr.io/nvidia/nemoclaw/sandbox-base:latest` followed by `FROM ${SANDBOX_BASE}`.
  • Legacy deploy compatibility path needs a clearer removal and regression guard (skills/nemoclaw-user-deploy-remote/SKILL.md:39): The skill correctly marks `nemoclaw deploy` as deprecated and presents installer plus onboard as preferred, but the localized compatibility guidance does not state a concrete removal condition. The related eval no longer requires the assistant to treat the command as deprecated while explaining the path.
    • Recommendation: Add a removal condition such as a tracked issue, version, or command-reference link, and restore an eval assertion that the assistant warns about deprecation and does not recommend `nemoclaw deploy` as the primary path.
    • Evidence: skills/nemoclaw-user-deploy-remote/SKILL.md documents the legacy deploy path at lines 39-66; src/lib/deploy/index.ts logs that `nemoclaw deploy` is deprecated and will be removed in a future release; the related eval now has only broad ground_truth text after expected_behavior assertions were removed.
  • Readiness-timeout recovery guidance is overbroad and lacks a targeted guard (skills/nemoclaw-user-deploy-remote/SKILL.md:130): The skill states that if onboard times out, it deletes the partially-created sandbox before the next attempt. The implementation deletes the failed sandbox in the non-GPU timeout path, while GPU failure handling follows a separate branch, so the blanket statement can be inaccurate for a sandbox lifecycle edge case.
    • Recommendation: Tie the statement to the source command/reference and narrow it to the path the implementation actually guarantees, or update the source behavior if the intent is universal. Add a targeted eval/test assertion covering the invalid state, cleanup behavior, retry guidance, and any GPU/non-GPU distinction.
    • Evidence: skills/nemoclaw-user-deploy-remote/SKILL.md says onboard deletes the partially-created sandbox after a 180-second readiness timeout; src/lib/onboard.ts reports the timeout and only deletes the failed sandbox in the non-GPU cleanup branch, while GPU failure handling uses handleGpuSandboxCreateFailure.
  • Deploy benchmark FAIL conflicts with ready-for-use skill card (skills/nemoclaw-user-deploy-remote/BENCHMARK.md:10): The deploy skill benchmark reports an overall FAIL and recommends review before publication, while the skill card says the skill is ready for commercial/non-commercial use. That mismatch weakens the publication signal for a security-sensitive deployment skill.
    • Recommendation: Either address the benchmark findings and refresh the benchmark before marking the skill ready, or update the skill card to reflect that publication depends on resolving the current benchmark findings.
    • Evidence: skills/nemoclaw-user-deploy-remote/BENCHMARK.md says 'Overall verdict: FAIL' and 'The skill should be reviewed before NVSkills-Eval publication'; skills/nemoclaw-user-deploy-remote/skill-card.md says 'This skill is ready for commercial/non-commercial use.'

🌱 Nice ideas

  • None.
Since last review details

Current findings:

  • Source-of-truth review needed: Legacy `nemoclaw deploy` compatibility guidance: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: skills/nemoclaw-user-deploy-remote/SKILL.md lines 39-66 document the legacy compatibility flow; src/lib/deploy/index.ts logs the deprecation warning.
  • Source-of-truth review needed: Proxy host/port configuration guidance: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: skills/nemoclaw-user-deploy-remote/SKILL.md says colons and 0-65535 are accepted; src/lib/onboard/dockerfile-patch.ts accepts only letters, digits, dot, underscore, hyphen and ports 1-65535.
  • Source-of-truth review needed: Landlock best_effort fallback guidance: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: skills/nemoclaw-user-deploy-remote/references/sandbox-hardening.md states Landlock is silently skipped on unsupported kernels, then claims files outside writable paths are inaccessible regardless of DAC permissions.
  • Source-of-truth review needed: First-run readiness timeout recovery guidance: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: skills/nemoclaw-user-deploy-remote/SKILL.md says onboard deletes the partially-created sandbox after timeout; src/lib/onboard.ts deletes in the non-GPU timeout cleanup branch and handles GPU failures separately.
  • Source-of-truth review needed: Brev web UI remote dashboard exposure guidance: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: skills/nemoclaw-user-deploy-remote/references/brev-web-ui.md discusses Chat With Agent, pairing, and Gateway Access without the non-loopback CHAT_UI_URL warning from the main skill.
  • Source-of-truth review needed: Diagnostics and log sharing guidance: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: skills/nemoclaw-user-monitor-sandbox/SKILL.md presents `nemoclaw <name> logs` and `--follow` without any redaction note.
  • Granular eval guardrails were removed from security-sensitive skills (.agents/skills/nemoclaw-user-deploy-remote/evals/evals.json:1): The updated evals remove the per-case expected_behavior arrays that previously required loading the intended skill or reference file, avoiding unsupported NemoClaw behavior, and following progressive disclosure. The replacement broad ground_truth text gives weaker regression coverage for remote deployment, plugin installation, dashboard auth, sandbox hardening, diagnostics handling, and deprecation guidance.
    • Recommendation: Restore the detailed expected_behavior checks, or add equivalent assertions in the eval format used by the signing/export pipeline. Include assertions for reference loading, unsupported-behavior avoidance, deprecated deploy warnings, dashboard exposure caveats, credential redaction, Landlock residual-risk wording, and plugin Dockerfile trust guidance.
    • Evidence: The diff removes expected_behavior arrays from .agents/skills/nemoclaw-user-deploy-remote/evals/evals.json and .agents/skills/nemoclaw-user-monitor-sandbox/evals/evals.json; the new skills/.../evals/evals.json files contain only id, question, expected_skill, and broad ground_truth fields.
  • Proxy validation documentation disagrees with implementation (skills/nemoclaw-user-deploy-remote/SKILL.md:152): The skill says proxy hosts may contain colons and ports may be 0-65535, but onboarding rejects colon-containing hosts and rejects port 0. Users following the skill could choose proxy values that are ignored or rejected, leaving default proxy behavior in place when they expected a custom route.
    • Recommendation: Update the skill text to match the source of truth: host characters are letters, digits, dot, underscore, and hyphen; port must be numeric from 1 through 65535. Add or restore an eval assertion covering rejected colon hosts and port 0 if proxy guidance remains in this skill.
    • Evidence: skills/nemoclaw-user-deploy-remote/SKILL.md says 'Only alphanumeric characters, dots, hyphens, and colons are accepted for the host' and '0-65535'; src/lib/onboard/dockerfile-patch.ts defines PROXY_HOST_RE = /^[A-Za-z0-9._-]+$/ and isValidProxyPort returns true only for ports >= 1 and <= 65535.
  • Diagnostics guidance lacks credential redaction warnings (skills/nemoclaw-user-monitor-sandbox/SKILL.md:45): The monitoring skill encourages users to collect logs and traces, including blueprint runner and sandbox logs, but does not warn users to redact provider keys, bot tokens, private URLs, chat IDs, PII, or other sensitive values before sharing diagnostics. The deploy path can materialize forwarded credentials into the remote .env file, making diagnostic handling more sensitive.
    • Recommendation: Add a short safety note before the log-sharing guidance telling users to redact API keys, bot tokens, provider credentials, private URLs, chat IDs, PII, and to avoid sharing remote .env contents.
    • Evidence: skills/nemoclaw-user-monitor-sandbox/SKILL.md shows `nemoclaw <name> logs` and `nemoclaw <name> logs --follow` without redaction guidance; src/lib/deploy/index.ts builds credential env lines and copies them to `${remoteDir}/.env` with chmod 600.
  • Brev web UI guide omits the remote dashboard auth caveat (skills/nemoclaw-user-deploy-remote/references/brev-web-ui.md:99): The main deployment skill warns that a non-loopback CHAT_UI_URL disables OpenClaw device pairing and that any device reaching the dashboard origin can connect. The Brev web UI reference discusses browser dashboard pairing and Gateway Access but does not repeat that exposure caveat for hosted browser use.
    • Recommendation: Add the same remote-origin/device-pairing caveat to the Brev web UI reference near the dashboard and pairing instructions, especially warning against internet-reachable or shared-network exposure without compensating controls.
    • Evidence: skills/nemoclaw-user-deploy-remote/SKILL.md warns that non-loopback CHAT_UI_URL disables device pairing and any reachable device can connect; skills/nemoclaw-user-deploy-remote/references/brev-web-ui.md discusses pairing and Gateway Access without that warning.
  • Landlock best_effort fallback wording overstates filesystem protection (skills/nemoclaw-user-deploy-remote/references/sandbox-hardening.md:110): The hardening reference says Landlock enforcement is silently skipped on unsupported kernels and falls back to DAC only, but then says files outside writable paths would be inaccessible regardless of DAC permissions. That overstates protection in the DAC-only fallback and can create a false sense of sandbox filesystem isolation.
    • Recommendation: Clarify that when Landlock is skipped, access depends on DAC and container/runtime permissions; reserve fixed writable/read-only path guarantees for hosts where Landlock enforcement is active. Add or restore an eval assertion that assistants explain this residual risk accurately.
    • Evidence: skills/nemoclaw-user-deploy-remote/references/sandbox-hardening.md says best_effort skips Landlock, then says 'Files outside the writable paths would be inaccessible to the agent regardless of DAC permissions.' src/lib/onboard.ts warns that filesystem restrictions silently degrade in best_effort mode.
  • Plugin Dockerfile example uses a mutable sandbox base image (skills/nemoclaw-user-deploy-remote/references/install-openclaw-plugins.md:35): The plugin installation guide extends the NemoClaw sandbox image from a mutable latest tag. Because this guidance is about adding code to a sandbox boundary, mutable bases make builds less reproducible and weaken review of the exact trusted image users are extending.
    • Recommendation: Use a pinned version tag or digest in the example, or add an explicit note instructing users to pin the base image they reviewed and update it deliberately. Also mention lockfiles and trusted registries for plugin dependencies.
    • Evidence: The added Dockerfile sample contains `ARG SANDBOX_BASE=ghcr.io/nvidia/nemoclaw/sandbox-base:latest` followed by `FROM ${SANDBOX_BASE}`.
  • Legacy deploy compatibility path needs a clearer removal and regression guard (skills/nemoclaw-user-deploy-remote/SKILL.md:39): The skill correctly marks `nemoclaw deploy` as deprecated and presents installer plus onboard as preferred, but the localized compatibility guidance does not state a concrete removal condition. The related eval no longer requires the assistant to treat the command as deprecated while explaining the path.
    • Recommendation: Add a removal condition such as a tracked issue, version, or command-reference link, and restore an eval assertion that the assistant warns about deprecation and does not recommend `nemoclaw deploy` as the primary path.
    • Evidence: skills/nemoclaw-user-deploy-remote/SKILL.md documents the legacy deploy path at lines 39-66; src/lib/deploy/index.ts logs that `nemoclaw deploy` is deprecated and will be removed in a future release; the related eval now has only broad ground_truth text after expected_behavior assertions were removed.
  • Readiness-timeout recovery guidance is overbroad and lacks a targeted guard (skills/nemoclaw-user-deploy-remote/SKILL.md:130): The skill states that if onboard times out, it deletes the partially-created sandbox before the next attempt. The implementation deletes the failed sandbox in the non-GPU timeout path, while GPU failure handling follows a separate branch, so the blanket statement can be inaccurate for a sandbox lifecycle edge case.
    • Recommendation: Tie the statement to the source command/reference and narrow it to the path the implementation actually guarantees, or update the source behavior if the intent is universal. Add a targeted eval/test assertion covering the invalid state, cleanup behavior, retry guidance, and any GPU/non-GPU distinction.
    • Evidence: skills/nemoclaw-user-deploy-remote/SKILL.md says onboard deletes the partially-created sandbox after a 180-second readiness timeout; src/lib/onboard.ts reports the timeout and only deletes the failed sandbox in the non-GPU cleanup branch, while GPU failure handling uses handleGpuSandboxCreateFailure.
  • Deploy benchmark FAIL conflicts with ready-for-use skill card (skills/nemoclaw-user-deploy-remote/BENCHMARK.md:10): The deploy skill benchmark reports an overall FAIL and recommends review before publication, while the skill card says the skill is ready for commercial/non-commercial use. That mismatch weakens the publication signal for a security-sensitive deployment skill.
    • Recommendation: Either address the benchmark findings and refresh the benchmark before marking the skill ready, or update the skill card to reflect that publication depends on resolving the current benchmark findings.
    • Evidence: skills/nemoclaw-user-deploy-remote/BENCHMARK.md says 'Overall verdict: FAIL' and 'The skill should be reviewed before NVSkills-Eval publication'; skills/nemoclaw-user-deploy-remote/skill-card.md says 'This skill is ready for commercial/non-commercial use.'

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

Signed-off-by: nvskills-svc-account <svc-nvskills-signing@nvidia.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@skills/nemoclaw-user-monitor-sandbox/BENCHMARK.md`:
- Line 1: Add the required SPDX header as an HTML comment at the very top of the
BENCHMARK.md file; specifically insert an HTML comment of the form <!--
SPDX-License-Identifier: <SPDX-ID> --> (replacing <SPDX-ID> with the correct
license identifier for this repository) as the first line so the Markdown file
conforms to the project's SPDX header requirement.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 769de880-1097-4c9d-96c4-c062927a4859

📥 Commits

Reviewing files that changed from the base of the PR and between 08cfc0c and 8dbfcae.

📒 Files selected for processing (6)
  • skills/nemoclaw-user-deploy-remote/BENCHMARK.md
  • skills/nemoclaw-user-deploy-remote/skill-card.md
  • skills/nemoclaw-user-deploy-remote/skill.oms.sig
  • skills/nemoclaw-user-monitor-sandbox/BENCHMARK.md
  • skills/nemoclaw-user-monitor-sandbox/skill-card.md
  • skills/nemoclaw-user-monitor-sandbox/skill.oms.sig
✅ Files skipped from review due to trivial changes (4)
  • skills/nemoclaw-user-deploy-remote/skill-card.md
  • skills/nemoclaw-user-deploy-remote/skill.oms.sig
  • skills/nemoclaw-user-deploy-remote/BENCHMARK.md
  • skills/nemoclaw-user-monitor-sandbox/skill-card.md

@@ -0,0 +1,64 @@
# Evaluation Report
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the required SPDX header at the top of this Markdown file.

This new .md file is missing the mandatory SPDX header comment on Line 1.

Suggested fix
+<!-- SPDX-License-Identifier: Apache-2.0 -->
+
 # Evaluation Report

As per coding guidelines, "Markdown documentation files must include SPDX headers in HTML comment format".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Evaluation Report
<!-- SPDX-License-Identifier: Apache-2.0 -->
# Evaluation Report
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/nemoclaw-user-monitor-sandbox/BENCHMARK.md` at line 1, Add the
required SPDX header as an HTML comment at the very top of the BENCHMARK.md
file; specifically insert an HTML comment of the form <!--
SPDX-License-Identifier: <SPDX-ID> --> (replacing <SPDX-ID> with the correct
license identifier for this repository) as the first line so the Markdown file
conforms to the project's SPDX header requirement.

@jyaunches jyaunches self-requested a review May 28, 2026 21:24
@miyoungc miyoungc merged commit 7fb10de into main May 28, 2026
17 checks passed
@miyoungc miyoungc deleted the skills-sign-batch-4 branch May 28, 2026 21:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants