docs: restructure agent and contributor documentation (plan 427, PR 1) by nabinchha · Pull Request #454 · NVIDIA-NeMo/DataDesigner

nabinchha · 2026-03-24T00:11:48Z

📋 Summary

Restructure DataDesigner's documentation to clearly separate concerns: a concise architectural guide for agents (AGENTS.md), a comprehensive code style reference (STYLEGUIDE.md), a contributor-focused development guide (DEVELOPMENT.md), and an updated contribution guide (CONTRIBUTING.md). This is PR 1 of the agent-assisted development plan (plan #427), covering Phases 0–2.

🔄 Changes

✨ Added

STYLEGUIDE.md — New comprehensive code style guide extracted from the old AGENTS.md, expanded with:
- Google-style docstring conventions (Args:, Returns:, Raises:, Attributes:)
- Pydantic model and dataclass guidance (when to use each, ConfigBase patterns, validator naming)
- Error handling patterns (raise ... from exc, boundary wrapping, canonical error types)
- f-string preference, nested function avoidance, and other style rules
DEVELOPMENT.md — New development guide extracted from the old AGENTS.md and CONTRIBUTING.md, including:
- Per-package test targets (make test-config, make test-engine, make test-interface)
- E2E/tutorial/recipe test commands with API key setup note
- Flat test function preference (no class-based suites)
- Notebook regeneration commands (make convert-execute-notebooks, make generate-colab-notebooks)
- Import performance CI threshold documentation (3-second average)
.agents/README.md — Documents the .agents/ directory structure, symlink compatibility, and development-vs-usage scope
architecture/ — 10 stub architecture documents (overview.md, config.md, engine.md, models.md, mcp.md, dataset-builders.md, sampling.md, cli.md, agent-introspection.md, plugins.md) ready for Phase 3 content
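The 3-second import performance threshold mentioned above can be approximated locally. The following is a minimal sketch, not the project's actual CI check (this PR does not show how CI measures it). The function name `average_import_seconds` and the default of five runs are assumptions for illustration, and the timing includes interpreter startup.

```python
import statistics
import subprocess
import sys
import time

THRESHOLD_SECONDS = 3.0  # the average threshold named in DEVELOPMENT.md


def average_import_seconds(module: str, runs: int = 5) -> float:
    """Average wall-clock time to import `module` in a fresh interpreter.

    Args:
        module: Importable module name (hypothetical input; pass your package).
        runs: Number of fresh interpreter launches to average over.

    Returns:
        Mean elapsed seconds per run, including interpreter startup.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # A fresh subprocess avoids the sys.modules cache from earlier runs.
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)
```

Calling `average_import_seconds("json")` and comparing the result to `THRESHOLD_SECONDS` gives a rough local signal; substitute the package you actually want to measure.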

🔧 Changed

AGENTS.md — Rewritten from ~626 lines to ~56 lines as a focused architectural guide: identity, layering, core concepts, design principles, structural invariants, and development pointers
CONTRIBUTING.md — Overhauled to focus on agent-assisted contribution workflow, referencing the new doc structure
README.md — Added brief mention of agent-assisted development
.agents/skills/review-code/SKILL.md — Updated to reference the new three-file doc structure (AGENTS.md, STYLEGUIDE.md, DEVELOPMENT.md)
plans/427/agent-first-development-plan.md — Updated delivery strategy to combine Phases 0–2 into PR 1

🗑️ Removed

.claude/skills/new-sdg/ — Obsolete prototyping skill, superseded by skills/data-designer/

🏗️ Restructured

.claude/skills/ and .claude/agents/ — Canonical files moved to .agents/skills/ and .agents/agents/; .claude/ directories replaced with symlinks for backward compatibility
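One way to confirm that the symlinks described above resolve in a local checkout is a small script like the one below. It is a sketch under the assumption that the links and targets match what this PR states (`.claude/agents → ../.agents/agents`, `.claude/skills → ../.agents/skills`). The helper name `check_symlinks` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Expected symlinks as described in this PR: link path -> relative target.
EXPECTED_LINKS = {
    ".claude/agents": "../.agents/agents",
    ".claude/skills": "../.agents/skills",
}


def check_symlinks(repo_root: Path) -> list[str]:
    """Return human-readable problems; an empty list means all links look fine."""
    problems = []
    for link, target in EXPECTED_LINKS.items():
        link_path = repo_root / link
        if not link_path.is_symlink():
            problems.append(f"{link} is not a symlink")
            continue
        # The relative target is interpreted from the directory holding the link.
        expected = (link_path.parent / target).resolve()
        actual = link_path.resolve()
        if actual != expected:
            problems.append(f"{link} resolves to {actual}, expected {expected}")
        elif not expected.is_dir():
            problems.append(f"{link} points at a missing directory: {expected}")
    return problems
```

Running `check_symlinks(Path("."))` from the repository root and printing the result should show an empty list when both links are intact. On platforms where Git checks out symlinks as plain files, the first check will report them.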

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

AGENTS.md — This is the primary file agents read on every interaction. Verify the layering description, dependency direction, and structural invariants are accurate.
.claude/agents and .claude/skills — These are now symlinks to .agents/. Verify they resolve correctly in your local checkout.
STYLEGUIDE.md — New sections on docstrings, Pydantic/dataclass conventions, and error handling should be validated against current codebase patterns.
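When validating the STYLEGUIDE.md sections, it may help to have a concrete picture of the conventions this PR lists: Google-style docstrings, `raise ... from exc`, boundary wrapping into a canonical error type, and f-strings. The sketch below combines them. `ConfigLoadError` and `load_column_names` are hypothetical names invented for illustration; they are not DataDesigner APIs, and the real style guide text may word its examples differently.

```python
import json


class ConfigLoadError(Exception):
    """Hypothetical canonical error type; not a real DataDesigner class."""


def load_column_names(raw: str) -> list[str]:
    """Parse column names from a JSON document.

    Args:
        raw: JSON text containing a top-level "columns" list.

    Returns:
        The column names in their original order.

    Raises:
        ConfigLoadError: If the text is not valid JSON or lacks "columns".
    """
    try:
        payload = json.loads(raw)
        return [str(name) for name in payload["columns"]]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # Wrap low-level errors at the boundary and keep the original cause.
        raise ConfigLoadError(f"Could not load columns from config: {exc}") from exc
```

Because of `from exc`, the original exception stays available on `__cause__`, so callers see one error type without losing the underlying traceback.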

🤖 Generated with AI

PR-1 for #427

Restructure AGENTS.md from ~627 lines to ~55 lines of high-signal architectural invariants. Extract code style into STYLEGUIDE.md and development workflow into DEVELOPMENT.md. Overhaul CONTRIBUTING.md to reflect agent-assisted development as the primary workflow. Move skills and sub-agents from .claude/ to .agents/ as the tool-agnostic home, with symlinks back for Claude Code compatibility. Add architecture/ skeleton with 10 stub files for incremental population. Implements PR 1 of #427. Made-with: Cursor

The new-sdg skill is superseded by skills/data-designer/, which is the proper usage skill for building datasets. Update .agents/README.md to reference the usage skill's actual location. Made-with: Cursor

Add docstring conventions (Google style), Pydantic/dataclass guidance, error handling patterns, and f-string preference to STYLEGUIDE.md. Clarify per-package test targets, flat test style, e2e API key requirement, notebook regeneration commands, and import perf threshold in DEVELOPMENT.md. Point dataset-building agents to the data-designer skill in AGENTS.md and clarify dependency direction arrows. Made-with: Cursor

Made-with: Cursor

greptile-apps · 2026-03-24T00:15:50Z

Greptile Summary

This PR restructures DataDesigner's developer documentation by splitting a bloated 626-line AGENTS.md into three focused files (AGENTS.md at ~56 lines, STYLEGUIDE.md, DEVELOPMENT.md), adding a tool-agnostic .agents/ directory with symlinks from .claude/ for backward compatibility, creating 10 architecture stub documents for Phase 3 content, and overhauling CONTRIBUTING.md to reflect an agent-assisted contribution workflow.

AGENTS.md is cleanly trimmed to only architectural invariants, layering, and development pointers — the information an agent needs on every interaction, without volatility from style rules or key-file lists
STYLEGUIDE.md expands on the old style sections with new coverage of Google-style docstrings, Pydantic/dataclass guidance, and error handling; the previously flagged SIM-enforcement inconsistency was fixed in eb5315b
DEVELOPMENT.md consolidates setup, per-package test targets, notebook commands, and the 3-second import-performance CI threshold in one place
CONTRIBUTING.md copyright year was corrected in eb5315b; the new agent-first workflow is clearly explained and the create-pr/review-code skills are surfaced appropriately
Symlinks (.claude/agents → ../.agents/agents, .claude/skills → ../.agents/skills) are well-formed and their relative targets resolve correctly
Architecture stubs are all clearly marked as placeholders; cross-references between stubs are accurate
review-code skill is correctly updated to load all three new docs in Step 2

Confidence Score: 5/5

Pure documentation restructuring with no code changes; both previously flagged issues were resolved in the prior commit eb5315b.
All changed files are documentation and agent skill/config files — no production code is touched. The restructuring is logically consistent: content in AGENTS.md, STYLEGUIDE.md, and DEVELOPMENT.md aligns across all three files; symlink targets are correct; architecture stubs are clearly marked as placeholders. Both prior review findings (SIM enforcement wording and copyright year discrepancy) were addressed in eb5315b before this review. No new issues were identified.
No files require special attention.

Important Files Changed

Filename	Overview
AGENTS.md	Rewritten from ~626 lines to ~56 lines. Accurately describes layering, dependency direction, core concepts, and structural invariants. Correctly references STYLEGUIDE.md and DEVELOPMENT.md for details. No issues found.
STYLEGUIDE.md	New comprehensive style guide extracted from old AGENTS.md with added sections on docstrings, Pydantic/dataclass patterns, and error handling. The previously flagged SIM enforcement note was corrected in `eb5315b`. Active linter rules section accurately reflects pyproject.toml. No issues found.
DEVELOPMENT.md	New development guide covering prerequisites, setup, workflow, testing patterns, pre-commit hooks, and import performance thresholds. Content is accurate and well-organized. No issues found.
CONTRIBUTING.md	Overhauled to focus on agent-assisted contribution workflow. Previously flagged copyright year inconsistency was corrected in `eb5315b`. Streamlined significantly from the previous ~236 lines. Correctly links to issue templates and new doc structure.
.agents/README.md	New file clearly documents the .agents/ directory structure, symlink targets, and usage scope (development vs. end-user). Accurate and concise.
.agents/skills/review-code/SKILL.md	Updated Step 2 to load all three new docs (AGENTS.md, STYLEGUIDE.md, DEVELOPMENT.md) rather than the single old AGENTS.md. The split is logical and accurate. No issues found.
.claude/agents	Symlink pointing to ../.agents/agents. Relative target resolves correctly from .claude/ to the repo root and into .agents/agents/.
.claude/skills	Symlink pointing to ../.agents/skills. Relative target resolves correctly from .claude/ to the repo root and into .agents/skills/.
architecture/overview.md	Correctly marked as a stub with a prominent notice and placeholder sections. Cross-references to sibling architecture docs are accurate.
plans/427/agent-first-development-plan.md	Updated delivery strategy to combine Phases 0–2 into this PR. Content is consistent with what was actually delivered. No issues found.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    CLAUDE["CLAUDE.md\n(@AGENTS.md)"] --> AGENTS["AGENTS.md\nArchitecture · Layering\nCore principles · Invariants"]
    AGENTS --> STYLE["STYLEGUIDE.md\nFormatting · Naming · Types\nImports · Docstrings\nPydantic · Error handling"]
    AGENTS --> DEV["DEVELOPMENT.md\nSetup · Workflow · Testing\nPre-commit · Import perf"]
    AGENTS --> ARCH["architecture/\n10 stubs (Phase 3)"]
    CONTRIB["CONTRIBUTING.md\nAgent-assisted workflow\nIssues · PRs · Reviews"] --> DEV
    SKILL["review-code SKILL.md\n(Step 2: load all three)"] --> AGENTS
    SKILL --> STYLE
    SKILL --> DEV
    subgraph ".agents/ (canonical)"
        AG2["agents/\ndocs-searcher\ngithub-searcher"]
        SK2["skills/\ncommit · create-pr\nreview-code · update-pr\nsearch-docs · search-github"]
    end
    subgraph ".claude/ (symlinks)"
        CLA["agents → ../.agents/agents"]
        CLS["skills → ../.agents/skills"]
    end
    CLA -.->|symlink| AG2
    CLS -.->|symlink| SK2

Reviews (5): Last reviewed commit: "Merge branch 'main' into nmulepati/docs/..."


Add plan document step, self-review with multi-model passes, automated CI review expectations, and comment resolution protocol. Made-with: Cursor

Move architecture doc population from deferred/incremental to PR 2 since the subsystems already exist. Update plan delivery strategy, execution order, and out-of-scope sections accordingly. Made-with: Cursor

…ibuting Replace pd.DataFrame with list[dict[str, str]] in naming example to avoid contradicting lazy-import guidance in the same file. Soften "enforced by SIM" to note SIM rules are not yet enabled in CI. Fix upstream sync instructions for fork-based contributors. Update copyright year in CONTRIBUTING.md from 2025 to 2026 to match STYLEGUIDE.md. Made-with: Cursor

nabinchha · 2026-03-24T16:40:16Z

All findings addressed in eb5315b:

STYLEGUIDE.md naming example — replaced pd.DataFrame return type with list[dict[str, str]] to avoid contradicting the lazy-import guidance in the same file.
DEVELOPMENT.md upstream sync — added git remote add upstream instruction for fork-based contributors and changed fetch/merge to use upstream instead of origin.
STYLEGUIDE.md SIM claim — softened to `SIM` rules; not yet enforced by CI but code should comply.
CONTRIBUTING.md copyright year — updated from 2025 to 2026 to match STYLEGUIDE.md.
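The first fix above, swapping a `pd.DataFrame` return type for `list[dict[str, str]]`, can be illustrated as follows. This is a hypothetical function written for this note, not the actual snippet from STYLEGUIDE.md. The point it sketches is that built-in generic types let a naming example carry a full type annotation without a top-level `pandas` import.

```python
# Hypothetical example; the actual snippet in STYLEGUIDE.md may differ.
# Annotating the return type with built-in generics means the module needs
# no top-level `import pandas as pd` just to express a type.


def summarize_columns(columns: dict[str, str]) -> list[dict[str, str]]:
    """Describe each column as a plain record.

    Args:
        columns: Mapping of column name to its type label.

    Returns:
        One record per column with "name" and "type" keys.
    """
    return [{"name": name, "type": kind} for name, kind in columns.items()]
```

For instance, `summarize_columns({"age": "int"})` returns `[{"name": "age", "type": "int"}]`.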

nabinchha added 3 commits March 23, 2026 17:22

remove obsolete new-sdg skill

5d77717


nabinchha requested a review from a team as a code owner March 24, 2026 00:11

docs: link AGENTS.md to architecture/ directory

c415e15

Made-with: Cursor

greptile-apps bot reviewed Mar 24, 2026

STYLEGUIDE.md (outdated, resolved)

CONTRIBUTING.md (outdated, resolved)

nabinchha added 4 commits March 23, 2026 18:24

docs: refine CONTRIBUTING.md contribution workflow

2254bda


docs: add architecture/ to PR 2 scope and link from AGENTS.md

05a64f4


Merge branch 'main' into nmulepati/docs/427-agent-first-dev-pr-1

979f151

Merge branch 'main' into nmulepati/docs/427-agent-first-dev-pr-1

1584f50

nabinchha wants to merge 9 commits into main from nmulepati/docs/427-agent-first-dev-pr-1
