docs: restructure agent and contributor documentation (plan 427, PR 1)#454

Open
nabinchha wants to merge 9 commits into main from nmulepati/docs/427-agent-first-dev-pr-1

@nabinchha
Contributor

@nabinchha nabinchha commented Mar 24, 2026

📋 Summary

Restructure DataDesigner's documentation to clearly separate concerns: a concise architectural guide for agents (AGENTS.md), a comprehensive code style reference (STYLEGUIDE.md), a contributor-focused development guide (DEVELOPMENT.md), and an updated contribution guide (CONTRIBUTING.md). This is PR 1 of the agent-assisted development plan (plan #427), covering Phases 0–2.

🔄 Changes

✨ Added

  • STYLEGUIDE.md — New comprehensive code style guide extracted from the old AGENTS.md, expanded with:
    • Google-style docstring conventions (Args:, Returns:, Raises:, Attributes:)
    • Pydantic model and dataclass guidance (when to use each, ConfigBase patterns, validator naming)
    • Error handling patterns (raise ... from exc, boundary wrapping, canonical error types)
    • f-string preference, nested function avoidance, and other style rules
  • DEVELOPMENT.md — New development guide extracted from the old AGENTS.md and CONTRIBUTING.md, including:
    • Per-package test targets (make test-config, make test-engine, make test-interface)
    • E2E/tutorial/recipe test commands with API key setup note
    • Flat test function preference (no class-based suites)
    • Notebook regeneration commands (make convert-execute-notebooks, make generate-colab-notebooks)
    • Import performance CI threshold documentation (3-second average)
  • .agents/README.md — Documents the .agents/ directory structure, symlink compatibility, and development-vs-usage scope
  • architecture/ — 10 stub architecture documents (overview.md, config.md, engine.md, models.md, mcp.md, dataset-builders.md, sampling.md, cli.md, agent-introspection.md, plugins.md) ready for Phase 3 content
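
As a rough illustration of the docstring and error-handling conventions listed for STYLEGUIDE.md above, the sketch below combines a Google-style docstring (Args:, Returns:, Raises:) with boundary wrapping via raise ... from exc and an f-string message. The names `DatasetLoadError` and `parse_row_count` are hypothetical and are not taken from the DataDesigner codebase; the actual canonical error types are defined in STYLEGUIDE.md.

```python
class DatasetLoadError(Exception):
    """Hypothetical error type raised at a module boundary (illustration only)."""


def parse_row_count(raw_value: str) -> int:
    """Parse a row count supplied as text.

    Args:
        raw_value: The textual row count, for example ``"100"``.

    Returns:
        The parsed row count as a non-negative integer.

    Raises:
        DatasetLoadError: If ``raw_value`` is not a non-negative integer.
    """
    try:
        count = int(raw_value)
    except ValueError as exc:
        # Wrap the low-level error at the boundary and keep the original cause.
        raise DatasetLoadError(f"Invalid row count: {raw_value!r}") from exc
    if count < 0:
        raise DatasetLoadError(f"Row count must be non-negative, got {count}")
    return count
```

Chaining with `from exc` keeps the original `ValueError` available as `__cause__`, so the traceback shows both the boundary error and the underlying failure.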

🔧 Changed

  • AGENTS.md — Rewritten from ~626 lines to ~56 lines as a focused architectural guide: identity, layering, core concepts, design principles, structural invariants, and development pointers
  • CONTRIBUTING.md — Overhauled to focus on agent-assisted contribution workflow, referencing the new doc structure
  • README.md — Added brief mention of agent-assisted development
  • .agents/skills/review-code/SKILL.md — Updated to reference the new three-file doc structure (AGENTS.md, STYLEGUIDE.md, DEVELOPMENT.md)
  • plans/427/agent-first-development-plan.md — Updated delivery strategy to combine Phases 0–2 into PR 1

🗑️ Removed

🏗️ Restructured

  • .claude/skills/ and .claude/agents/ — Canonical files moved to .agents/skills/ and .agents/agents/; .claude/ directories replaced with symlinks for backward compatibility

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • AGENTS.md — This is the primary file agents read on every interaction. Verify the layering description, dependency direction, and structural invariants are accurate.
  • .claude/agents and .claude/skills — These are now symlinks to .agents/. Verify they resolve correctly in your local checkout.
  • STYLEGUIDE.md — New sections on docstrings, Pydantic/dataclass conventions, and error handling should be validated against current codebase patterns.
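
For the symlink check requested above, a reviewer could run something like the following minimal sketch from a local checkout. It assumes only the two link paths and targets named in this description; the helper name `check_symlinks` is hypothetical and is not part of the repository.

```python
from pathlib import Path

# Expected links, taken from the PR description: link path -> canonical target.
EXPECTED_LINKS = {
    ".claude/agents": ".agents/agents",
    ".claude/skills": ".agents/skills",
}


def check_symlinks(repo_root: Path) -> dict[str, bool]:
    """Report whether each expected symlink exists and resolves to its target."""
    results: dict[str, bool] = {}
    for link, target in EXPECTED_LINKS.items():
        link_path = repo_root / link
        target_path = repo_root / target
        results[link] = (
            link_path.is_symlink()
            and target_path.is_dir()
            and link_path.resolve() == target_path.resolve()
        )
    return results
```

Calling `check_symlinks(Path.cwd())` from the repository root returns a mapping from each link to `True` or `False`. A `False` entry may mean the link is missing, dangling, or was checked out as a plain file, which can happen on platforms where Git symlink support is disabled.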

🤖 Generated with AI

PR-1 for #427

Restructure AGENTS.md from ~627 lines to ~55 lines of high-signal
architectural invariants. Extract code style into STYLEGUIDE.md and
development workflow into DEVELOPMENT.md. Overhaul CONTRIBUTING.md
to reflect agent-assisted development as the primary workflow.

Move skills and sub-agents from .claude/ to .agents/ as the
tool-agnostic home, with symlinks back for Claude Code compatibility.
Add architecture/ skeleton with 10 stub files for incremental
population.

Implements PR 1 of #427.

Made-with: Cursor

The new-sdg skill is superseded by skills/data-designer/, which is the
proper usage skill for building datasets. Update .agents/README.md to
reference the usage skill's actual location.

Made-with: Cursor

Add docstring conventions (Google style), Pydantic/dataclass guidance,
error handling patterns, and f-string preference to STYLEGUIDE.md.

Clarify per-package test targets, flat test style, e2e API key
requirement, notebook regeneration commands, and import perf threshold
in DEVELOPMENT.md.

Point dataset-building agents to the data-designer skill in AGENTS.md
and clarify dependency direction arrows.

Made-with: Cursor

@nabinchha nabinchha requested a review from a team as a code owner March 24, 2026 00:11
@greptile-apps
Contributor

greptile-apps bot commented Mar 24, 2026

Greptile Summary

This PR restructures DataDesigner's developer documentation by splitting a bloated 626-line AGENTS.md into three focused files (AGENTS.md at ~56 lines, STYLEGUIDE.md, DEVELOPMENT.md), adding a tool-agnostic .agents/ directory with symlinks from .claude/ for backward compatibility, creating 10 architecture stub documents for Phase 3 content, and overhauling CONTRIBUTING.md to reflect an agent-assisted contribution workflow.

  • AGENTS.md is cleanly trimmed to only architectural invariants, layering, and development pointers — the information an agent needs on every interaction, without volatility from style rules or key-file lists
  • STYLEGUIDE.md expands on the old style sections with new coverage of Google-style docstrings, Pydantic/dataclass guidance, and error handling; the previously flagged SIM-enforcement inconsistency was fixed in eb5315b
  • DEVELOPMENT.md consolidates setup, per-package test targets, notebook commands, and the 3-second import-performance CI threshold in one place
  • CONTRIBUTING.md copyright year was corrected in eb5315b; the new agent-first workflow is clearly explained and the create-pr/review-code skills are surfaced appropriately
  • Symlinks (.claude/agents → ../.agents/agents, .claude/skills → ../.agents/skills) are well-formed and their relative targets resolve correctly
  • Architecture stubs are all clearly marked as placeholders; cross-references between stubs are accurate
  • review-code skill is correctly updated to load all three new docs in Step 2

Confidence Score: 5/5

  • Pure documentation restructuring with no code changes; both previously flagged issues were resolved in the prior commit eb5315b.
  • All changed files are documentation and agent skill/config files — no production code is touched. The restructuring is logically consistent: content in AGENTS.md, STYLEGUIDE.md, and DEVELOPMENT.md aligns across all three files; symlink targets are correct; architecture stubs are clearly marked as placeholders. Both prior review findings (SIM enforcement wording and copyright year discrepancy) were addressed in eb5315b before this review. No new issues were identified.
  • No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| AGENTS.md | Rewritten from ~626 lines to ~56 lines. Accurately describes layering, dependency direction, core concepts, and structural invariants. Correctly references STYLEGUIDE.md and DEVELOPMENT.md for details. No issues found. |
| STYLEGUIDE.md | New comprehensive style guide extracted from old AGENTS.md with added sections on docstrings, Pydantic/dataclass patterns, and error handling. The previously flagged SIM enforcement note was corrected in eb5315b. Active linter rules section accurately reflects pyproject.toml. No issues found. |
| DEVELOPMENT.md | New development guide covering prerequisites, setup, workflow, testing patterns, pre-commit hooks, and import performance thresholds. Content is accurate and well-organized. No issues found. |
| CONTRIBUTING.md | Overhauled to focus on agent-assisted contribution workflow. Previously flagged copyright year inconsistency was corrected in eb5315b. Streamlined significantly from the previous ~236 lines. Correctly links to issue templates and new doc structure. |
| .agents/README.md | New file clearly documents the .agents/ directory structure, symlink targets, and usage scope (development vs. end-user). Accurate and concise. |
| .agents/skills/review-code/SKILL.md | Updated Step 2 to load all three new docs (AGENTS.md, STYLEGUIDE.md, DEVELOPMENT.md) rather than the single old AGENTS.md. The split is logical and accurate. No issues found. |
| .claude/agents | Symlink pointing to ../.agents/agents. Relative target resolves correctly from .claude/ to the repo root and into .agents/agents/. |
| .claude/skills | Symlink pointing to ../.agents/skills. Relative target resolves correctly from .claude/ to the repo root and into .agents/skills/. |
| architecture/overview.md | Correctly marked as a stub with a prominent notice and placeholder sections. Cross-references to sibling architecture docs are accurate. |
| plans/427/agent-first-development-plan.md | Updated delivery strategy to combine Phases 0–2 into this PR. Content is consistent with what was actually delivered. No issues found. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    CLAUDE["CLAUDE.md\n(@AGENTS.md)"] --> AGENTS["AGENTS.md\nArchitecture · Layering\nCore principles · Invariants"]
    AGENTS --> STYLE["STYLEGUIDE.md\nFormatting · Naming · Types\nImports · Docstrings\nPydantic · Error handling"]
    AGENTS --> DEV["DEVELOPMENT.md\nSetup · Workflow · Testing\nPre-commit · Import perf"]
    AGENTS --> ARCH["architecture/\n10 stubs (Phase 3)"]
    CONTRIB["CONTRIBUTING.md\nAgent-assisted workflow\nIssues · PRs · Reviews"] --> DEV
    SKILL["review-code SKILL.md\n(Step 2: load all three)"] --> AGENTS
    SKILL --> STYLE
    SKILL --> DEV
    subgraph ".agents/ (canonical)"
        AG2["agents/\ndocs-searcher\ngithub-searcher"]
        SK2["skills/\ncommit · create-pr\nreview-code · update-pr\nsearch-docs · search-github"]
    end
    subgraph ".claude/ (symlinks)"
        CLA["agents → ../.agents/agents"]
        CLS["skills → ../.agents/skills"]
    end
    CLA -.->|symlink| AG2
    CLS -.->|symlink| SK2

Reviews (5): Last reviewed commit: "Merge branch 'main' into nmulepati/docs/..."

Add plan document step, self-review with multi-model passes,
automated CI review expectations, and comment resolution protocol.

Made-with: Cursor

Move architecture doc population from deferred/incremental to PR 2
since the subsystems already exist. Update plan delivery strategy,
execution order, and out-of-scope sections accordingly.

Made-with: Cursor

…ibuting

Replace pd.DataFrame with list[dict[str, str]] in naming example to
avoid contradicting lazy-import guidance in the same file. Soften
"enforced by SIM" to note SIM rules are not yet enabled in CI. Fix
upstream sync instructions for fork-based contributors. Update
copyright year in CONTRIBUTING.md from 2025 to 2026 to match
STYLEGUIDE.md.

Made-with: Cursor
@nabinchha
Contributor Author

All findings addressed in eb5315b:

  • STYLEGUIDE.md naming example — replaced pd.DataFrame return type with list[dict[str, str]] to avoid contradicting the lazy-import guidance in the same file.
  • DEVELOPMENT.md upstream sync — added git remote add upstream instruction for fork-based contributors and changed fetch/merge to use upstream instead of origin.
  • STYLEGUIDE.md SIM claim — softened to `SIM` rules; not yet enforced by CI but code should comply.
  • CONTRIBUTING.md copyright year — updated from 2025 to 2026 to match STYLEGUIDE.md.
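
To illustrate the first fix, here is a minimal sketch of a function whose return annotation is list[dict[str, str]] rather than pd.DataFrame, so that neither the signature nor the module needs pandas at import time. The function name `parse_records` and the CSV input are hypothetical; the actual naming example in STYLEGUIDE.md may look different.

```python
import csv
import io


def parse_records(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into plain records using only the standard library.

    Args:
        csv_text: CSV content whose first line is the header row.

    Returns:
        One dictionary per data row, keyed by column name.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Copy each row into a plain dict so callers get simple built-in types.
    return [dict(row) for row in reader]
```

Because the annotation uses only built-in types, the example stays consistent with guidance that discourages importing heavy dependencies at module load.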
