docs(skills): refactor d2s script #4549
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
📝 Walkthrough
This PR refactors the docs-to-skills generation pipeline from a …
Changes
Docs-to-Skills Generator and Skill Consolidation
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
🌿 Preview your docs: https://nvidia-preview-pr-4549.docs.buildwithfern.com/nemoclaw
E2E Advisor Recommendation
Required E2E: None
Full advisor summary
E2E Recommendation Advisor
Base:
Required E2E
Optional E2E
New E2E recommendations
E2E Scenario Advisor Recommendation
Required scenario E2E: None
Full scenario advisor summary
E2E Scenario Advisor
Base:
Required scenario E2E
Optional scenario E2E
Relevant changed files
PR Review Advisor
Findings: 0 needs attention, 5 worth checking, 0 nice ideas
Review findings
🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Since last review details
Current findings:
This is an automated advisory review. A human maintainer must make the final merge decision.
Actionable comments posted: 3
🧹 Nitpick comments (3)
.agents/skills/nemoclaw-user-overview/SKILL.md (1)
14-14: Split into one sentence per line.
This bullet contains three sentences on a single line, which makes diffs harder to review. Each sentence should be on its own line in the source Markdown.
As per coding guidelines: One sentence per line in source makes diffs readable.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/nemoclaw-user-overview/SKILL.md at line 14, Edit the bullet starting with "Load [references/overview.md](references/overview.md)" in SKILL.md to place each sentence on its own line: break the current single-line bullet into three separate lines so the sentence about when to use the Ecosystem page, the sentence about internal mechanics/How It Works, and the sentence explaining what NemoClaw covers (onboarding, lifecycle management, OpenClaw operations, capabilities and purpose) are each on their own line; keep the original wording but only adjust line breaks.
.agents/skills/nemoclaw-user-manage-policy/SKILL.md (1)
284-284: Use active voice.
"Custom presets applied with `--from-file` or `--from-dir` are recorded in the NemoClaw sandbox registry" uses passive voice. Consider: "NemoClaw records custom presets applied with `--from-file` or `--from-dir` in the sandbox registry."
Similarly, "can be removed" and "does not need to be kept" are passive constructions.
As per coding guidelines: Active voice is required for all documentation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/nemoclaw-user-manage-policy/SKILL.md at line 284, Rewrite the passive sentences to active voice: change "Custom presets applied with `--from-file` or `--from-dir` are recorded in the NemoClaw sandbox registry alongside their full YAML content, so they can be removed by name — the original file does not need to be kept on disk" to an active form such as "NemoClaw records custom presets applied with `--from-file` or `--from-dir` in the sandbox registry along with their full YAML content, so you can remove them by name and do not need to keep the original file on disk." Update the line containing `--from-file`, `--from-dir`, "NemoClaw sandbox registry", and references to "removed by name" / "does not need to be kept" to use active voice consistently.
.agents/skills/nemoclaw-user-configure-security/SKILL.md (1)
16-16: Split into one sentence per line.
This bullet contains multiple sentences on a single line. Each sentence should be on its own line in the source Markdown for better diff readability.
As per coding guidelines: One sentence per line in source makes diffs readable.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/nemoclaw-user-configure-security/SKILL.md at line 16, The bullet in .agents/skills/nemoclaw-user-configure-security/SKILL.md contains multiple sentences on one line; split that single bullet into separate lines so each sentence is on its own line (e.g., break the sentence that starts "Lists OpenClaw security controls..." into separate lines for each sentence), preserving the same wording and the reference **Load [references/openclaw-controls.md](references/openclaw-controls.md)** and the list of controls (prompt injection detection, tool access control, rate limiting, environment variable policy, audit framework, supply chain scanning, messaging access policy, context visibility, and safe regex) so diffs show one sentence per line.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/CONTRIBUTING.md`:
- Line 99: Replace the passive, inaccurate sentence "Sibling pages are written
unchanged to `references/`." with an active, accurate statement noting that the
generator rewrites and normalizes files before output; update the line to
something like: state that the generator rewrites and normalizes reference
markdown and then writes it to `references/`, removing the word "unchanged" and
using active voice so the documentation reflects the actual behavior of the
generator.
In `@scripts/docs-to-skills.py`:
- Around line 1629-1635: The code builds reference filenames using only
page.path.stem which causes collisions across directories; change the ref_name
creation in the loop (the block using reference_pages, _page_rel(page),
skill_md_local_links and reference_local_links) to derive a unique filename/path
from the page's relative path (include parent directory segments or use the
page.path relative-to-root with a .md suffix) instead of stem-only, and update
both assignments (skill_md_local_links[rel] and reference_local_links[rel])
accordingly; apply the same fix to the similar block around the other occurrence
(lines shown 1755-1766).
- Around line 1853-1855: The current assignment groups[page.path.stem] = [page]
can clobber other procedure pages with identical filenames (e.g., multiple
index.mdx); change the key to a collision-safe unique identifier instead of
page.path.stem (for example use page.path.as_posix() or
page.path.parent.joinpath(page.path.stem).as_posix(), or a page.slug/URL if
available) so each procedure group is unique; update the branch that checks
page.content_type in PROCEDURE_CONTENT_TYPES to use the new key when creating
the group and keep the rest of grouping logic unchanged.
---
Nitpick comments:
In @.agents/skills/nemoclaw-user-configure-security/SKILL.md:
- Line 16: The bullet in
.agents/skills/nemoclaw-user-configure-security/SKILL.md contains multiple
sentences on one line; split that single bullet into separate lines so each
sentence is on its own line (e.g., break the sentence that starts "Lists
OpenClaw security controls..." into separate lines for each sentence),
preserving the same wording and the reference **Load
[references/openclaw-controls.md](references/openclaw-controls.md)** and the
list of controls (prompt injection detection, tool access control, rate
limiting, environment variable policy, audit framework, supply chain scanning,
messaging access policy, context visibility, and safe regex) so diffs show one
sentence per line.
In @.agents/skills/nemoclaw-user-manage-policy/SKILL.md:
- Line 284: Rewrite the passive sentences to active voice: change "Custom
presets applied with `--from-file` or `--from-dir` are recorded in the NemoClaw
sandbox registry alongside their full YAML content, so they can be removed by
name — the original file does not need to be kept on disk" to an active form
such as "NemoClaw records custom presets applied with `--from-file` or
`--from-dir` in the sandbox registry along with their full YAML content, so you
can remove them by name and do not need to keep the original file on disk."
Update the line containing `--from-file`, `--from-dir`, "NemoClaw sandbox
registry", and references to "removed by name" / "does not need to be kept" to
use active voice consistently.
In @.agents/skills/nemoclaw-user-overview/SKILL.md:
- Line 14: Edit the bullet starting with "Load
[references/overview.md](references/overview.md)" in SKILL.md to place each
sentence on its own line: break the current single-line bullet into three
separate lines so the sentence about when to use the Ecosystem page, the
sentence about internal mechanics/How It Works, and the sentence explaining what
NemoClaw covers (onboarding, lifecycle management, OpenClaw operations,
capabilities and purpose) are each on their own line; keep the original wording
but only adjust line breaks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c77f483e-8e4c-4710-ad65-6b0abc62594b
📒 Files selected for processing (17)
- .agents/skills/nemoclaw-user-agent-skills/SKILL.md
- .agents/skills/nemoclaw-user-agent-skills/references/agent-skills.md
- .agents/skills/nemoclaw-user-configure-inference/SKILL.md
- .agents/skills/nemoclaw-user-configure-inference/references/use-local-inference-details.md
- .agents/skills/nemoclaw-user-configure-security/SKILL.md
- .agents/skills/nemoclaw-user-get-started/SKILL.md
- .agents/skills/nemoclaw-user-get-started/references/quickstart-details.md
- .agents/skills/nemoclaw-user-manage-policy/SKILL.md
- .agents/skills/nemoclaw-user-manage-policy/references/customize-network-policy-details.md
- .agents/skills/nemoclaw-user-manage-sandboxes/SKILL.md
- .agents/skills/nemoclaw-user-manage-sandboxes/references/lifecycle-details.md
- .agents/skills/nemoclaw-user-overview/SKILL.md
- .agents/skills/nemoclaw-user-reference/SKILL.md
- docs/CONTRIBUTING.md
- docs/about/overview.mdx
- docs/resources/agent-skills.mdx
- scripts/docs-to-skills.py
💤 Files with no reviewable changes (5)
- .agents/skills/nemoclaw-user-get-started/references/quickstart-details.md
- .agents/skills/nemoclaw-user-manage-policy/references/customize-network-policy-details.md
- .agents/skills/nemoclaw-user-agent-skills/references/agent-skills.md
- .agents/skills/nemoclaw-user-configure-inference/references/use-local-inference-details.md
- .agents/skills/nemoclaw-user-manage-sandboxes/references/lifecycle-details.md
Sibling procedure pages, concept pages, and reference pages go into a `references/` subdirectory for progressive disclosure, keeping `SKILL.md` concise while preserving access to the full docs.
The script reads YAML frontmatter from each doc page to determine its content type (`how_to`, `concept`, `reference`, `get_started`), then groups pages into skills using the `grouped` strategy by default.
Within each directory group, the highest-priority procedure page (`how_to`, `get_started`, or `tutorial`) becomes the full body of `SKILL.md`.
Sibling pages are written unchanged to `references/`.
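The grouping this CONTRIBUTING.md passage describes can be sketched in a few lines of Python. This is a hypothetical illustration, not code from `scripts/docs-to-skills.py`: `Page` stands in for `DocPage`, the file names are made up, and the numeric `priority` used to pick the lead page is an assumption about how "highest-priority" is ranked.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Procedure content types, as listed in the CONTRIBUTING.md passage.
PROCEDURE_TYPES = {"how_to", "get_started", "tutorial"}


@dataclass
class Page:
    """Hypothetical stand-in for the script's DocPage."""
    path: PurePosixPath   # docs-root-relative path
    content_type: str     # frontmatter content type, e.g. "how_to" or "concept"
    priority: int = 0     # assumed ranking: higher value wins the lead slot


def group_by_directory(pages):
    """Bucket pages by parent directory (one bucket per directory group)."""
    groups = {}
    for page in pages:
        groups.setdefault(page.path.parent.as_posix(), []).append(page)
    return groups


def plan_skill(group):
    """Return (lead, references) for one directory group.

    The lead page supplies the SKILL.md body; every other page goes to
    references/. With no procedure page, lead is None.
    """
    by_path = sorted(group, key=lambda p: p.path.as_posix())
    procedures = [p for p in by_path if p.content_type in PROCEDURE_TYPES]
    if not procedures:
        return None, by_path
    # Highest priority first; ties broken by path for stable output.
    lead = min(procedures, key=lambda p: (-p.priority, p.path.as_posix()))
    return lead, [p for p in by_path if p is not lead]


pages = [
    Page(PurePosixPath("get-started/quickstart.mdx"), "get_started", priority=1),
    Page(PurePosixPath("get-started/providers.mdx"), "reference"),
    Page(PurePosixPath("about/overview.mdx"), "concept", priority=10),
]
for directory, group in sorted(group_by_directory(pages).items()):
    lead, refs = plan_skill(group)
    print(directory, lead.path.name if lead else None, [r.path.name for r in refs])
# about None ['overview.mdx']
# get-started quickstart.mdx ['providers.mdx']
```

A group with no procedure page returns `lead=None` in this sketch, which corresponds to the thin `SKILL.md` (frontmatter plus `## References`) described in the PR summary.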
Do not describe reference pages as “unchanged.”
The generator rewrites and normalizes reference markdown before writing to references/, so “written unchanged” is inaccurate. Please also rewrite this sentence in active voice.
✏️ Proposed wording
- Sibling pages are written unchanged to `references/`.
+ The generator writes sibling pages to `references/` after markdown cleanup and link rewriting.
As per coding guidelines, "Active voice required. Flag passive constructions."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- Sibling pages are written unchanged to `references/`.
+ The generator writes sibling pages to `references/` after markdown cleanup and link rewriting.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/CONTRIBUTING.md` at line 99, Replace the passive, inaccurate sentence
"Sibling pages are written unchanged to `references/`." with an active, accurate
statement noting that the generator rewrites and normalizes files before output;
update the line to something like: state that the generator rewrites and
normalizes reference markdown and then writes it to `references/`, removing the
word "unchanged" and using active voice so the documentation reflects the actual
behavior of the generator.
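To make "link rewriting" concrete, here is a hypothetical sketch: links to pages that move into `references/` are redirected through a map shaped like `skill_md_local_links` (for `SKILL.md`) or `reference_local_links` (between sibling reference files). The key format (docs-relative path without suffix), the regex, and the function name are assumptions, not the generator's actual implementation.

```python
import posixpath
import re

# Inline Markdown links [text](target); reference-style links are not handled.
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def rewrite_links(markdown, source_rel, link_map):
    """Point links at pages in link_map to their generated location.

    source_rel: docs-root-relative path of the page being processed, no suffix.
    link_map:   docs-root-relative path (no suffix) -> new link target.
    """
    base_dir = posixpath.dirname(source_rel)

    def replace(match):
        text, target = match.group(1), match.group(2)
        path, sep, anchor = target.partition("#")
        if not path or "://" in path:
            return match.group(0)  # keep pure anchors and external URLs
        resolved = posixpath.normpath(posixpath.join(base_dir, path))
        key = posixpath.splitext(resolved)[0]
        if key not in link_map:
            return match.group(0)  # not a page that moved; leave as-is
        return f"[{text}]({link_map[key]}{sep}{anchor})"

    return LINK_RE.sub(replace, markdown)


# Hypothetical maps for a group whose lead page is get-started/quickstart.
skill_md_links = {"get-started/providers": "references/providers.md"}
reference_links = {"get-started/providers": "providers.md"}

source = "See [providers](providers.mdx#nvidia) or [site](https://example.com)."
# Text destined for SKILL.md links into the references/ subdirectory.
print(rewrite_links(source, "get-started/quickstart", skill_md_links))
# See [providers](references/providers.md#nvidia) or [site](https://example.com).
# Text inside another reference file links to its flat sibling.
print(rewrite_links(source, "get-started/troubleshooting", reference_links))
# See [providers](providers.md#nvidia) or [site](https://example.com).
```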
for page in reference_pages:
    rel = _page_rel(page)
    if rel is None:
        continue
    ref_name = page.path.stem + ".md"
    skill_md_local_links[rel] = f"references/{ref_name}"
    reference_local_links[rel] = ref_name
Avoid stem-only reference filenames in aggregated groups.
ref_name = page.path.stem + ".md" can collide in individual mode when concept/reference pages from different directories share a stem, causing last-write-wins overwrites and bad cross-links.
🔧 Proposed fix
 def generate_skill(
@@
-    skill_md_local_links: dict[str, str] = {}
-    reference_local_links: dict[str, str] = {}
+    skill_md_local_links: dict[str, str] = {}
+    reference_local_links: dict[str, str] = {}
+
+    def _reference_filename(page: DocPage) -> str:
+        if strategy == "individual":
+            rel = _page_rel(page)
+            source = Path(rel).with_suffix("").as_posix() if rel else page.path.with_suffix("").as_posix()
+            return source.strip("/").replace("/", "__") + ".md"
+        return page.path.stem + ".md"
@@
     for page in reference_pages:
@@
-        ref_name = page.path.stem + ".md"
+        ref_name = _reference_filename(page)
         skill_md_local_links[rel] = f"references/{ref_name}"
         reference_local_links[rel] = ref_name
@@
     for ref_page in reference_pages:
-        ref_name = ref_page.path.stem + ".md"
+        ref_name = _reference_filename(ref_page)
Also applies to: 1755-1766
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/docs-to-skills.py` around lines 1629 - 1635, The code builds
reference filenames using only page.path.stem which causes collisions across
directories; change the ref_name creation in the loop (the block using
reference_pages, _page_rel(page), skill_md_local_links and
reference_local_links) to derive a unique filename/path from the page's relative
path (include parent directory segments or use the page.path relative-to-root
with a .md suffix) instead of stem-only, and update both assignments
(skill_md_local_links[rel] and reference_local_links[rel]) accordingly; apply
the same fix to the similar block around the other occurrence (lines shown
1755-1766).
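A small standalone check with made-up paths shows why stem-only names collide and how a path-derived name avoids it. `stem_name` and `path_name` are hypothetical helpers that mirror the two expressions in the diff above; note that the proposed fix applies the path-derived name only in `individual` mode.

```python
from pathlib import PurePosixPath


def stem_name(rel):
    """Current behavior in the flagged block: keep only the file stem."""
    return PurePosixPath(rel).stem + ".md"


def path_name(rel):
    """Mirror of the proposed fix: drop the suffix, flatten '/' into '__'."""
    source = PurePosixPath(rel).with_suffix("").as_posix()
    return source.strip("/").replace("/", "__") + ".md"


def find_collisions(rels, name_fn):
    """Return (first_rel, later_rel, name) for every name produced twice."""
    seen = {}
    collisions = []
    for rel in rels:
        name = name_fn(rel)
        if name in seen:
            collisions.append((seen[name], rel, name))
        else:
            seen[name] = rel
    return collisions


# Made-up reference pages from different directories that share a stem.
rels = ["inference/index.mdx", "security/index.mdx", "reference/commands.mdx"]
print(find_collisions(rels, stem_name))
# [('inference/index.mdx', 'security/index.mdx', 'index.md')]
print(find_collisions(rels, path_name))
# []
```

One caveat of the `__` flattening: paths that already contain `__` (for example `a__b/c.mdx` and `a/b__c.mdx`) still map to the same name, so a collision check like `find_collisions` may be worth keeping as a guard.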
if page.content_type in PROCEDURE_CONTENT_TYPES:
    groups[page.path.stem] = [page]
elif page.content_type == "concept":
Use collision-safe keys for individual procedure groups.
groups[page.path.stem] can silently overwrite a previous procedure page when filenames repeat (for example, multiple index.mdx pages), dropping skills from generation.
🔧 Proposed fix
 def group_individual(pages: list[DocPage]) -> dict[str, list[DocPage]]:
@@
     for page in pages:
         if page.content_type in PROCEDURE_CONTENT_TYPES:
-            groups[page.path.stem] = [page]
+            base_key = re.sub(
+                r"[^a-z0-9-]",
+                "-",
+                page.path.with_suffix("").as_posix().lower(),
+            ).strip("-")
+            key = base_key
+            n = 2
+            while key in groups:
+                key = f"{base_key}-{n}"
+                n += 1
+            groups[key] = [page]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/docs-to-skills.py` around lines 1853 - 1855, The current assignment
groups[page.path.stem] = [page] can clobber other procedure pages with identical
filenames (e.g., multiple index.mdx); change the key to a collision-safe unique
identifier instead of page.path.stem (for example use page.path.as_posix() or
page.path.parent.joinpath(page.path.stem).as_posix(), or a page.slug/URL if
available) so each procedure group is unique; update the branch that checks
page.content_type in PROCEDURE_CONTENT_TYPES to use the new key when creating
the group and keep the rest of grouping logic unchanged.
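The proposed key scheme can be exercised on its own. This hypothetical sketch reimplements the slug-plus-suffix logic from the diff on bare `PurePosixPath` values (no `DocPage`) and compares it with stem keys; the page paths are made up.

```python
import re
from pathlib import PurePosixPath


def slug_key(path):
    """Slug of the full path without suffix, as in the proposed diff."""
    raw = path.with_suffix("").as_posix().lower()
    return re.sub(r"[^a-z0-9-]", "-", raw).strip("-")


def unique_key(base, taken):
    """Append -2, -3, ... until the key is not already used."""
    key, n = base, 2
    while key in taken:
        key = f"{base}-{n}"
        n += 1
    return key


# Two pages share the stem "index", and "deploy_index" slugs to the
# same text as "deploy/index".
paths = [PurePosixPath(p) for p in ("deploy/index.mdx", "monitor/index.mdx", "deploy_index.mdx")]

by_stem = {}
for path in paths:
    by_stem[path.stem] = [path]  # a later "index" page replaces the earlier one
print(len(by_stem))
# 2

groups = {}
for path in paths:
    groups[unique_key(slug_key(path), groups)] = [path]
print(sorted(groups))
# ['deploy-index', 'deploy-index-2', 'monitor-index']
```

Because the `-2` suffix goes to whichever page is processed second, iterating pages in a stable order (for example, sorted by path) keeps generated skill names deterministic across runs.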
Summary
Refactors `scripts/docs-to-skills.py` to a simpler two-strategy generator (`grouped` and `individual`) and regenerates `.agents/skills/nemoclaw-user-*` output to match. The old `smart` strategy, 11,500-character spill logic, and `*-details.md` deferral files are removed; sibling pages now land in `references/` unchanged, and procedure pages inline in full when they lead a group.
Related Issue
None.
Changes
Generator (`scripts/docs-to-skills.py`)
- Replace the `smart` default with `grouped` (directory groups) and keep `individual` (one skill per procedure page; concept/reference buckets).
- The highest-priority procedure page (`how_to`, `get_started`, or `tutorial`) becomes the `SKILL.md` body; all siblings go to `references/`.
- Groups without a procedure page (`overview`, `reference`, `configure-security`) emit a thin `SKILL.md` with frontmatter + `## References` only; full content stays in `references/`.
- Remove `MAX_SKILL_MD_CHARS`, section splitting/deferral, and generated `*-details.md` spill files.
- … (`collapse_consecutive_blank_lines`, `append_markdown_section`).
Docs source tweaks
- `docs/about/overview.mdx`: add `skill.priority: 10` for overview ordering in generated metadata.
- `docs/resources/agent-skills.mdx`: set `content.type` to `how_to` so the single-page group inlines correctly.
Regenerated user skills
- `get-started` (quickstart + provider options), `configure-inference`, `manage-policy`, `manage-sandboxes`, `deploy-remote`, `monitor-sandbox`, `agent-skills`.
- `overview`, `reference`, `configure-security`.
- `quickstart-details.md`, `use-local-inference-details.md`, `customize-network-policy-details.md`, `lifecycle-details.md`, `agent-skills.md`.
Docs pipeline docs
- `docs/CONTRIBUTING.md`: `grouped`-strategy description.
Type of Change
Design notes (reviewer follow-ups)
- … `references/` so agents load detail on demand (progressive disclosure). Routing text lives in `SKILL.md` frontmatter + References bullets; full task coverage remains in sibling reference files.
- `resolve_includes()` path containment is pre-existing MyST-only behavior; this PR does not expand its surface. Follow-up hardening can be a separate change if we want docs-root allowlisting.
Verification
- `npx prek run --all-files` passes (via CI `checks`)
- `npm test` passes (via CI `unit-vitest-linux`)
- `npm run docs` builds without warnings (doc changes only; Fern preview posted)
Local commands run:
Signed-off-by: Miyoung Choi miyoungc@nvidia.com