Skip to content

docs: add AGENTS.md, STYLEGUIDE.md, agent-assisted contribution (#114)#149

Merged
lipikaramaswamy merged 7 commits into
mainfrom
lipikaramaswamy/docs/114-agents-styleguide
May 13, 2026
Merged

docs: add AGENTS.md, STYLEGUIDE.md, agent-assisted contribution (#114)#149
lipikaramaswamy merged 7 commits into
mainfrom
lipikaramaswamy/docs/114-agents-styleguide

Conversation

@lipikaramaswamy
Copy link
Copy Markdown
Collaborator

Summary

  • Add AGENTS.md β€” architecture overview, pipeline diagrams, and structural invariants for agents working in
    the codebase.
  • Add STYLEGUIDE.md β€” code conventions ruff and ty cannot enforce (Pydantic vs dataclass, error handling,
    column-name constants, prompt construction).
  • Add CLAUDE.md β€” 3-line redirect to AGENTS.md so Claude Code picks it up.
  • Add an "Agent-Assisted Development" subsection to CONTRIBUTING.md establishing the
    plans/<issue-number>/<short-name>.md convention for non-trivial changes.
  • Ignore .claude/worktrees/ (Claude Code session worktrees).

Filed #148 as a follow-up for three pre-existing column-name string-literal violations in
llm_replace_workflow.py:53-55 that the new STYLEGUIDE rule covers.

Closes #114.

Test plan

  • CI passes (format-check, mkdocs build --strict)
  • Reviewer can navigate the AGENTS.md ↔ STYLEGUIDE.md ↔ CONTRIBUTING.md cross-links
  • git check-ignore .claude/worktrees/foo returns the path (worktrees properly ignored)

lipikaramaswamy and others added 2 commits May 8, 2026 16:56
Local Claude Code session worktrees under .claude/worktrees/ shouldn't
be tracked. Matches the convention already in the DataDesigner repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…orkflow (#114)

- AGENTS.md: architecture overview, pipeline diagrams, structural invariants
- STYLEGUIDE.md: code conventions ruff and ty cannot enforce
- CLAUDE.md: 3-line redirect to AGENTS.md
- CONTRIBUTING.md: new Agent-Assisted Development subsection establishing the
  plans/<issue-number>/<short-name>.md convention for non-trivial changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: lipikaramaswamy <lramaswamy@nvidia.com>
@lipikaramaswamy lipikaramaswamy requested a review from a team as a code owner May 9, 2026 01:03
@lipikaramaswamy lipikaramaswamy changed the title docs: add AGENTS.md, STYLEGUIDE.md, and agent-assisted contribution workflow (#114) docs: add AGENTS.md, STYLEGUIDE.md, agent-assisted contribution (#114) May 9, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 9, 2026

Greptile Summary

This PR adds developer and agent-oriented documentation (AGENTS.md, STYLEGUIDE.md, CLAUDE.md) and an "Agent-Assisted Development" subsection to CONTRIBUTING.md, establishing the plans/<issue-number>/<short-name>.md convention for non-trivial changes. It also expands docstrings on five engine/config functions and adds .claude/worktrees/ to .gitignore.

  • AGENTS.md provides a module map, pipeline diagrams, core-concept glossary, structural invariants, and development commands, all consistent with the actual codebase structure.
  • STYLEGUIDE.md captures conventions that linters cannot enforce: Pydantic vs. dataclass choice, error handling patterns, column-name constant usage, prompt construction, import style, and design principles.
  • Python file changes are purely docstring improvements β€” expanded explanations, cross-references, and conversion from RST double-backtick style to Markdown single-backtick style; no logic is altered.

Confidence Score: 5/5

Safe to merge β€” all changes are documentation and docstring improvements with no modifications to runtime logic.

Every Python change in this PR is a docstring expansion or style update (RST β†’ Markdown backticks). No control flow, data handling, or public API signatures are touched. The new Markdown files accurately reflect the codebase structure and cross-link correctly. The .gitignore addition is a single, scoped exclusion for Claude Code session worktrees.

No files require special attention.

Important Files Changed

Filename Overview
.gitignore Adds .claude/worktrees/ exclusion so Claude Code session worktrees are not tracked by git.
AGENTS.md New agent-oriented architecture reference: module map, pipeline ASCII diagrams, core concepts, structural invariants, and dev commands β€” accurate and consistent with the source code.
CLAUDE.md Thin vendor adapter file that redirects Claude Code to AGENTS.md via both a Markdown link and the @AGENTS.md include directive.
CONTRIBUTING.md Adds 'Agent-Assisted Development' subsection establishing the plans/<issue-number>/<short-name>.md plan-file convention and cross-links to AGENTS.md and STYLEGUIDE.md.
STYLEGUIDE.md New style guide covering Pydantic vs dataclasses, error handling, column-name constants, prompt construction, import style, type annotations, naming, comments, future annotations, and design principles.
src/anonymizer/config/anonymizer_config.py Expanded the Rewrite.evaluation property docstring with the sync contract and guidance on when to use the property vs. construct EvaluationCriteria directly.
src/anonymizer/config/rewrite.py Docstring style update on EvaluationCriteria: double-backtick RST formatting replaced with single-backtick Markdown to match the Google-style guide.
src/anonymizer/engine/constants.py Expanded _jinja() docstring to clarify when to use the helper (shared DataFrame column refs) vs. when not to (local Jinja loop variables).
src/anonymizer/engine/ndd/adapter.py Expanded NddAdapter.run_workflow() docstring with full Args/Returns sections and the engine boundary contract (sub-workflows must not call DataDesigner.create() / .preview() directly).
src/anonymizer/engine/prompt_utils.py Updated substitute_placeholders() docstring to clarify its role (dynamic prompt values, not DataFrame column refs) and converted RST backticks to Markdown style.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Developer or Agent] -->|reads| B[AGENTS.md\nArchitecture and invariants]
    A -->|reads| C[STYLEGUIDE.md\nCode conventions]
    A -->|reads| D[CONTRIBUTING.md\nWorkflow and PR process]

    B -->|cross-links| C
    B -->|cross-links| D
    C -->|cross-links| B
    C -->|cross-links| D

    D -->|new section| E[Agent-Assisted Development]
    E --> F{Non-trivial change?}
    F -->|yes| G[plans/issue/name.md\nDraft plan PR first]
    F -->|no| H[Implement directly]
    G -->|approved| H
    H -->|follows| B
    H -->|follows| C

    I[CLAUDE.md] --> B
Loading

Reviews (7): Last reviewed commit: "docs: add Agent compatibility section to..." | Re-trigger Greptile

Comment thread STYLEGUIDE.md Outdated
Comment thread STYLEGUIDE.md Outdated
lipikaramaswamy and others added 2 commits May 8, 2026 18:07
Required by the copyright-check CI step (tools/codestyle/copyright_fixer.py).
Matches the HTML-comment header format used by docs/concepts/*.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: lipikaramaswamy <lramaswamy@nvidia.com>
Address greptile-apps review feedback on PR #149:

- Rewrite the TYPE_CHECKING paragraph to explicitly split type-hint-only
  imports (TYPE_CHECKING block) from runtime use (top-level), and call
  out the NameError failure mode. Avoids the misread where an agent
  could wrap all heavy-library imports in TYPE_CHECKING.
- Use "DataDesigner" instead of "NeMo Data Designer" so the display
  name matches AGENTS.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: lipikaramaswamy <lramaswamy@nvidia.com>
Comment thread CONTRIBUTING.md
Copy link
Copy Markdown
Collaborator

@binaryaaron binaryaaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying something out- here's my feedback in prompt format.


Prompt for PR author

Please revise the agent/developer documentation in this branch with one main goal: make the agent instructions tool-neutral, cross-agent compatible, and less likely to drift from the code.

The current direction is strong, but AGENTS.md, CLAUDE.md, and STYLEGUIDE.md blur a few boundaries:

  • AGENTS.md mixes a canonical agent entry point with detailed implementation docs and style rules.
  • CLAUDE.md is a valid Claude Code shim, but non-Claude readers only see @AGENTS.md.
  • STYLEGUIDE.md should be the source of truth for review-enforced coding conventions, but some of those conventions are duplicated in AGENTS.md.
  • A few rules are phrased as absolute when the current code has local exceptions.

Please update the docs along these lines.

1. Make AGENTS.md the canonical tool-neutral entry point

Keep AGENTS.md focused on:

  • who the file is for
  • where product users should go instead
  • the high-level package map
  • the architectural guardrails agents must not violate
  • links to STYLEGUIDE.md, CONTRIBUTING.md, and deeper developer docs

Avoid vendor-specific syntax or IDE-specific assumptions in AGENTS.md. Do not put Claude @... includes, Cursor rule paths, MCP names, slash commands, or agent-product instructions there. Adapter files such as CLAUDE.md can carry those details.

Suggested addition near the top:

## Agent compatibility

`AGENTS.md` is the canonical instruction file for coding agents working in this repository. Keep it tool-neutral:

- Use plain Markdown and repository-relative links.
- Do not rely on vendor-specific include syntax, slash commands, MCP names, or IDE-only behavior.
- Put tool-specific adapter instructions in thin wrapper files such as `CLAUDE.md`.
- If an adapter file and this file disagree, this file wins.

2. Fix the module map wording

Current wording says the package has β€œthree modules.” That is too narrow. The code has three primary subpackages plus top-level public logging utilities.

Please change the wording to something like:

`nemo-anonymizer` is a single package with three primary subpackages plus top-level public utilities:

Then keep the existing anonymizer.config, anonymizer.engine, and anonymizer.interface bullets, and add a compact logging bullet:

- **`anonymizer.logging`** β€” public logging configuration (`LoggingConfig`, `configure_logging`) used by the API, CLI, and examples.

3. Make CLAUDE.md a readable shim

Keep the Claude Code include line, but add a human-readable fallback so GitHub reviewers and non-Claude tools understand the file.

Suggested replacement:

# Claude Code instructions

Canonical agent instructions live in [AGENTS.md](AGENTS.md).

@AGENTS.md

Do not duplicate the full agent instructions in CLAUDE.md.

4. Move detailed implementation material out of AGENTS.md

AGENTS.md should not be the only source for detailed pipeline contracts. Move or reduce these sections:

  • Pipeline diagrams: move the full replace/rewrite diagrams to docs/concepts/replace.md, docs/concepts/rewrite.md, or a new developer architecture page. In AGENTS.md, keep only a short pointer.
  • NddAdapter.run_workflow() behavior: document the detailed contract in the method/class docstring.
  • Rewrite.evaluation and EvaluationCriteria sync behavior: document the contract on the Rewrite.evaluation property.
  • Prompt helper behavior: keep the formal contract in _jinja() / substitute_placeholders() docstrings and the style rule in STYLEGUIDE.md.
  • Future imports, type annotations, SPDX headers, prompt construction, and column-name conventions: keep these in STYLEGUIDE.md; in AGENTS.md, link to the style guide rather than repeating the rules.

Good target shape: AGENTS.md as a short index and guardrail file, not a full architecture document.

5. Narrow over-broad rules

Several current statements are directionally right but too absolute for the existing code. Please calibrate them so agents do not churn valid code or infer false invariants.

NDD adapter wording

Current claim: NddAdapter is the only place the DataDesigner dependency crosses.

Issue: DataDesigner config types and decorators appear outside the adapter, and Anonymizer constructs DataDesigner.

Better wording:

`NddAdapter.run_workflow()` is the engine boundary for executing DataDesigner workflows. Engine workflows declare DataDesigner column configs, but they do not call `DataDesigner.create()` or `preview()` directly.

EvaluationCriteria wording

Current claim: never construct EvaluationCriteria.

Issue: tests and engine-level code construct or pass it directly. The production risk is manually duplicating the Rewrite to EvaluationCriteria mapping.

Better wording:

When production code starts from the user-facing `Rewrite` config, pass `Rewrite.evaluation` into the engine. Do not manually duplicate the `Rewrite` to `EvaluationCriteria` mapping.

Column-name constants

Current claim: all column names must be constants.

Issue: LlmReplaceWorkflow currently uses local scratch columns such as "_entity_examples" and "_entities_for_replace".

Either promote those scratch columns to COL_* constants, or narrow the rule:

Shared pipeline columns, trace columns, public output columns, NDD column names, prompt column references, and merge/join keys must use `COL_*` constants from `engine/constants.py`. Local scratch columns may be literal strings only when they do not cross workflow boundaries.

Prompt _jinja() usage

Current claim: all column references in NDD prompt templates go through _jinja().

Issue: local Jinja loop variables and local scratch prompt variables are not the same as shared DataFrame column references.

Better wording:

Shared DataFrame column references in NDD prompt templates should use `_jinja(COL_*)`. Local Jinja loop variables and explicitly local scratch prompt variables may remain local, but do not hardcode shared column names into prompt strings.

assert wording

If STYLEGUIDE.md says β€œNo assert for validation,” clarify that this applies to production/library validation. Pytest assertions in tests are fine.

Suggested wording:

In production/library code, do not use `assert` for validation. Pytest assertions in tests are fine.

6. Clarify STYLEGUIDE.md

Please make STYLEGUIDE.md the canonical home for review-enforced conventions.

Suggested intro adjustment:

Code and documentation conventions for NeMo Anonymizer that ruff and ty cannot enforce. Architecture boundaries and agent workflow rules live in [AGENTS.md](AGENTS.md).

Specific clarifications to consider:

  • In β€œPydantic vs Dataclasses,” say dataclasses are for private/internal frozen value objects, including private config helpers such as _RiskToleranceBundle, not only engine containers.
  • In β€œError Handling,” point the final-judge exception example at RewriteWorkflow._run_final_judge, since the broad catch lives in rewrite_workflow.py.
  • In β€œColumn Names,” narrow the absolute rule or promote local scratch columns to constants.
  • In β€œPrompt Construction,” distinguish shared DataFrame column references from local Jinja variables.
  • In β€œCode Organization,” consider β€œPrefer public functions before private helpers when adding new symbols” instead of an absolute rule, unless the repo already satisfies it everywhere.

7. Keep CONTRIBUTING.md vendor-neutral

The current β€œClaude Code, Cursor, Codex” list is understandable but may age poorly. Consider neutral wording:

When automating edits with coding agents (IDE assistants, CLI tools, or hosted models), follow the standard Pull Request Process plus these additions:

Also consider replacing β€œThe agent should read...” with language that applies to both humans and automated tools:

Implementers, human or automated, should read `AGENTS.md` and `STYLEGUIDE.md` before non-trivial changes.

Verification notes

These recommendations were checked against the current source:

  • Rewrite and Rewrite.evaluation live in src/anonymizer/config/anonymizer_config.py.
  • EvaluationCriteria, RiskTolerance, and _RiskToleranceBundle live in src/anonymizer/config/rewrite.py.
  • NddAdapter lives in src/anonymizer/engine/ndd/adapter.py.
  • _jinja() lives in src/anonymizer/engine/constants.py.
  • substitute_placeholders() lives in src/anonymizer/engine/prompt_utils.py.
  • configure_logging and LoggingConfig are public exports from anonymizer.
  • Current code has local scratch DataFrame columns in LlmReplaceWorkflow, so the column constant rule needs either a code change or narrower wording.

The main outcome should be a cleaner split:

AGENTS.md      = tool-neutral orientation and guardrails
CLAUDE.md      = Claude-specific adapter shim
STYLEGUIDE.md  = review-enforced code/documentation conventions
code docstrings = callable contracts and implementation-local invariants
developer docs = pipeline diagrams, concepts, and longer architecture explanations

Comment thread AGENTS.md
Comment thread STYLEGUIDE.md
lipikaramaswamy and others added 2 commits May 12, 2026 01:09
Signed-off-by: lipikaramaswamy <lramaswamy@nvidia.com>
- AGENTS.md: add anonymizer.logging to module map; narrow NDD adapter
  and EvaluationCriteria wording (used to claim too much); clarify
  _jinja scope (DataFrame columns vs local Jinja loop variables);
  shrink NDD Adapter, Config Pattern, Prompt Conventions, and
  Structural Invariants sections to pointers (contracts now live in
  docstrings or STYLEGUIDE.md).
- CLAUDE.md: readable shim header so non-Claude readers understand
  the file's purpose.
- CONTRIBUTING.md: vendor-neutral language in the Agent-Assisted
  Development section (don't enumerate specific agents; "human or
  agentic" implementers).
- STYLEGUIDE.md: add License Headers section; clarify that the assert
  ban is for production code (pytest assertions in tests are fine);
  scope "public before private" to newly added symbols; reframe
  dataclasses for private/internal frozen value objects; point
  error-handling example at RewriteWorkflow._run_final_judge.
- Docstrings expanded with the contracts that previously lived in
  AGENTS.md: NddAdapter.run_workflow, Rewrite.evaluation, _jinja,
  substitute_placeholders. EvaluationCriteria docstring backticks
  normalized.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: lipikaramaswamy <lramaswamy@nvidia.com>
@lipikaramaswamy lipikaramaswamy requested a review from a team as a code owner May 12, 2026 05:10
Signed-off-by: lipikaramaswamy <lramaswamy@nvidia.com>
@lipikaramaswamy
Copy link
Copy Markdown
Collaborator Author

@binaryaaron Thanks, I went through all 7 themes.

  1. Agent compatibility section β€” applied (cf49c17).

  2. Module map fix β€” applied (e122b4f). Added anonymizer.logging
    and reframed leader as "three primary subpackages plus top-level
    public utilities".

  3. CLAUDE.md readable shim β€” applied (e122b4f).

  4. Move detail out of AGENTS.md:

    • Pipeline diagrams: kept in AGENTS.md. They reference engine
      internals (EntityDetectionWorkflow.run(), NddAdapter, COL_*),
      which are dev-facing β€” moving them to docs/concepts/ would
      conflate audiences there.
    • NddAdapter.run_workflow β†’ docstring (e122b4f)
    • Rewrite.evaluation β†’ property docstring (e122b4f)
    • _jinja / substitute_placeholders β†’ expanded docstrings
      (e122b4f)
    • Structural Invariants β†’ pointer to STYLEGUIDE.md, with a new
      License Headers section added there (e122b4f)
  5. Narrow over-broad rules:

  6. STYLEGUIDE.md clarifications β€” applied (e122b4f): intro scoped vs
    AGENTS.md; dataclasses framing for private/internal frozen value
    objects; error-handling example pointed at
    RewriteWorkflow._run_final_judge; "public before private"
    scoped to new symbols.

  7. Vendor-neutral CONTRIBUTING.md β€” applied (e122b4f): dropped the
    "Claude Code, Cursor, Codex" enumeration; "human or agentic"
    implementers.

@lipikaramaswamy lipikaramaswamy merged commit f6b8a57 into main May 13, 2026
11 checks passed
@lipikaramaswamy lipikaramaswamy deleted the lipikaramaswamy/docs/114-agents-styleguide branch May 13, 2026 01:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

chore: add AGENTS.md, CLAUDE.md, and STYLEGUIDE.md for agent-assisted development

3 participants