Add agent-discoverability contract test (#461) by igerber · Pull Request #467 · igerber/diff-diff

igerber · 2026-05-19T01:27:22Z

Summary

New tests/test_agent_discoverability.py — static snapshot test that pins the agent-facing surface introduced by PR Surface agent_workflow() + curated dir() for LLM discoverability (#460) #464 against future regression. 17 assertions across 11 test functions (12 standalone + 5 parametrize cells over the agent-facing entrypoint names). Snapshot-only: no live API calls, no subprocess, no live agents — runs in the default pytest suite.
Removes the __dir__() contract-test row from TODO.md that PR Surface agent_workflow() + curated dir() for LLM discoverability (#460) #464 deferred here.
Adds a CHANGELOG bullet describing the test surface.

What's locked by the new test file:

__all__ membership of the 5 agent-facing primitives (agent_workflow, profile_panel, get_llm_guide, practitioner_next_steps, BusinessReport)
dir(diff_diff) head matches _AGENT_FACING_ORDER in declared order — anchors to the tuple, not a slice length, so trims/expansions to the head are tracked
dir() tail stays alphabetic when keyed by str (recovery key for tooling that re-sorts)
dir() returns the full module namespace, not just __all__ — preserves __doc__/__name__/__file__ for inspect.getmembers consumers
_OrderedName invariants: isinstance(_, str), str methods (.upper, ==, hash for dict keys, in, f-string)
inspect.getmembers(diff_diff) parity with dir(diff_diff)
Top-level __doc__ first non-blank paragraph names agent_workflow; the full doc names all 4 downstream primitives
agent_workflow() output script references each canonical helper by name; every fit_candidates entry resolves on the diff_diff namespace
Canonical estimator class names (CallawaySantAnna, ChaisemartinDHaultfoeuille, ContinuousDiD, DifferenceInDifferences, HeterogeneousAdoptionDiD, HonestDiD, ImputationDiD, PreTrendsPower, SunAbraham, TwoWayFixedEffects, WooldridgeDiD) remain importable
Each agent-facing entrypoint stays callable

Closes #461 (snapshot variant). The live-agent regression test (spawning a cold-start Claude subprocess against a staggered-DiD task) remains a follow-up that depends on causal-llm-eval packaging its harness module — that path will be tracked as its own follow-up issue if not already.

Methodology references (required if estimator / math changes)

Method name(s): N/A — no estimator, math, variance, or default inference behavior changes.
Paper / source link(s): N/A
Any intentional deviations from the source (and why): None

Validation

Tests added/updated: tests/test_agent_discoverability.py (new, 17 tests). All pass locally; existing tests/test_agent_workflow.py (23) and tests/test_guides.py (41) remain green for a combined 81/81.
Backtest / simulation / notebook evidence (if applicable): N/A — pure in-process inspection.

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

New `tests/test_agent_discoverability.py` pins the agent-facing surface introduced by PR #464 against future regression. Snapshot/static assertions only — no live API calls, no subprocess, runs in the default pytest suite. What's locked: 1. `__all__` membership of agent_workflow / profile_panel / get_llm_guide / practitioner_next_steps / BusinessReport (catches export pruning). 2. `dir(diff_diff)` head-first ordering matches `_AGENT_FACING_ORDER` (catches drift in `_OrderedName.__lt__` or `__dir__()` regression). 3. `dir()` tail stays alphabetic when keyed by `str` (recovery key for downstream tooling that re-sorts). 4. `dir()` returns the FULL module namespace, not just `__all__` (preserves `__doc__` / `__name__` / `__file__` for `inspect.getmembers` consumers). 5. `_OrderedName` invariants: `isinstance(_, str)` holds, str methods work (upper, eq, hash, `in`, f-string). 6. Top-level `__doc__` first non-blank paragraph names `agent_workflow`; full doc text names the 4 downstream primitives. 7. `agent_workflow()` output script references each canonical helper by name; every `fit_candidates` entry resolves on the diff_diff namespace. 8. Canonical estimator class names (CallawaySantAnna, ChaisemartinDHaultfoeuille, ContinuousDiD, DifferenceInDifferences, HeterogeneousAdoptionDiD, HonestDiD, ImputationDiD, PreTrendsPower, SunAbraham, TwoWayFixedEffects, WooldridgeDiD) remain importable. 9. Each agent-facing entrypoint stays callable. 17 tests (12 standalone + 5 parametrize cells over the agent-facing entrypoint names). Closes #461 (snapshot variant). The live-agent regression test remains a follow-up that depends on causal-llm-eval packaging its harness module. Also closes the `__dir__()` contract-test row from PR #464's TODO.md (deferred there, landed here). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-19T01:32:38Z

Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. This PR only adds a contract test plus CHANGELOG.md/TODO.md updates; it does not change estimator code, identification logic, weighting, variance/SE computation, or defaults. I cross-checked the affected surface against the Methodology Registry and the relevant package/docstring contracts in diff_diff/__init__.py and diff_diff/agent_workflow.py.

Executive Summary

No methodology-governed implementation changed; the estimator/inference contracts in docs/methodology/REGISTRY.md:L40-L178 are not implicated by this diff.
P2: the new “full namespace + inspect.getmembers parity” claim is under-tested. The current parity assertion can miss dropped non-dunder names while the PR closes the deferred TODO and advertises the contract as pinned (tests/test_agent_discoverability.py:L118-L144, CHANGELOG.md:L11-L14).
P3: DiagnosticReport is part of the surfaced head order in _AGENT_FACING_ORDER, but the new explicit public-surface/callability assertions exclude it (diff_diff/__init__.py:L529-L536, tests/test_agent_discoverability.py:L39-L54, tests/test_agent_discoverability.py:L267-L288).
The changelog/TODO edits are otherwise consistent with the new test file.
I could not execute pytest in this environment: pytest is unavailable, and importing diff_diff fails here because numpy is not installed.

Methodology

No findings. The diff only touches tests/test_agent_discoverability.py, CHANGELOG.md, and TODO.md; no estimator/math/variance/default-behavior path covered by docs/methodology/REGISTRY.md:L40-L178 was modified.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings beyond the test-coverage gaps noted below.

Tech Debt

Severity: P2
Impact: The PR removes the deferred contract-test item and advertises “dir() full-namespace + inspect.getmembers parity” as covered, but test_getmembers_parity_with_default_module_dir() is not an independent membership check. In CPython, inspect.getmembers() derives its name list from dir(), so dir_names == gm_names at tests/test_agent_discoverability.py:L137-L140 can still pass if __dir__() drops non-head, non-dunder names. Because test_dir_returns_full_module_namespace() only spot-checks four dunders at tests/test_agent_discoverability.py:L126-L128, the full-namespace contract described in CHANGELOG.md:L11-L14 is not actually pinned.
Concrete fix: Add an explicit membership assertion against the real module namespace, e.g. compare set(map(str, dir(diff_diff))) to set(vars(diff_diff)) (or an explicit allowlisted equivalent), and keep the inspect.getmembers() check as a secondary accessibility/value check rather than the primary membership proof.

Security

No findings.

Documentation/Tests

Severity: P3
Impact: _AGENT_FACING_ORDER treats DiagnosticReport as part of the surfaced agent-facing head (diff_diff/__init__.py:L529-L536), and the changelog advertises it that way (CHANGELOG.md:L14), but the new explicit __all__ and callability checks only cover five names and omit DiagnosticReport (tests/test_agent_discoverability.py:L45-L54, tests/test_agent_discoverability.py:L267-L288). A future regression that leaves DiagnosticReport in dir(diff_diff) but de-exports it or makes it non-callable would not fail this new contract suite.
Concrete fix: Either add DiagnosticReport to the explicit public-surface assertions, or narrow the changelog/contract wording so the test is only claiming to pin the five workflow primitives.

Verification note: static review only; I could not run the new test file in this environment because pytest is missing and diff_diff cannot be imported without numpy.

…overage) CI codex flagged 2 actionable items on the initial push: P2 (Tech Debt): `test_getmembers_parity_with_default_module_dir()` was tautological. CPython's `inspect.getmembers()` derives its name list from `dir(obj)`, so the `dir_names == gm_names` check was always true regardless of how narrow `__dir__()` became. The "full namespace" contract advertised in the CHANGELOG was therefore not actually pinned. Fix: split into two tests with independent assertions: - `test_dir_returns_full_module_namespace` now compares `set(map(str, dir(diff_diff)))` against `set(vars(diff_diff))` — vars() reads the real underlying module dict and `__dir__` does NOT derive from it automatically, so this catches a regression where `__dir__` is reduced to `__all__`. - `test_getmembers_returns_accessible_values` retains the `getattr` + `__doc__` accessibility check as a secondary contract (every reported name must actually resolve to a value). P3 (Docs/Tests): `_AGENT_FACING_ORDER` includes `DiagnosticReport`, but the explicit `__all__` and callability tests covered only 5 of the 6 names. A regression that left `DiagnosticReport` in `dir()` but de-exported it would not have failed. Fix: anchor both assertions to `_AGENT_FACING_ORDER` itself rather than duplicating a hard-coded list: - `test_agent_facing_names_in_all` now derives `required` from `_AGENT_FACING_ORDER`. - `test_agent_facing_entrypoint_callable` parametrizes over `sorted(_AGENT_FACING_ORDER)` directly. Future head-name additions automatically extend coverage instead of needing two sources of truth. Tests: 17 → 18 (DiagnosticReport added to callable parametrize). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-19T01:39:34Z

🔁 AI review rerun (requested by @igerber)

Head SHA: da184114f986c93cec0874dd5b7b798f65b8302f

Overall Assessment

✅ Looks good

Executive Summary

Re-review: both prior findings are addressed. The new test_dir_returns_full_module_namespace() independently compares dir(diff_diff) to vars(diff_diff), removing the earlier tautological inspect.getmembers()-only check (tests/test_agent_discoverability.py:L113-L137).
Re-review: DiagnosticReport is now covered by the explicit public-surface assertions because both the __all__ and callability tests are anchored to _AGENT_FACING_ORDER, which includes it (tests/test_agent_discoverability.py:L39-L49, L278-L290; diff_diff/__init__.py:L507-L523, L529-L536).
No estimator, identification, weighting, variance/SE, or default-behavior code changed. The diff is limited to a new contract test plus CHANGELOG.md / TODO.md updates (CHANGELOG.md:L11-L14, TODO.md:L159-L166), so the Methodology Registry is not implicated.
The new assertions line up with the existing discoverability contract in diff_diff.__dir__() and agent_workflow() (diff_diff/__init__.py:L525-L589; diff_diff/agent_workflow.py:L124-L135, L153-L160, L212-L220).
Verification was static only: I could not run pytest or import diff_diff in this environment because numpy, pandas, and pytest are unavailable.

Methodology

Severity: None. Impact: No affected method in docs/methodology/REGISTRY.md; this PR does not modify estimator/reporting implementations, only discoverability regression coverage (docs/methodology/REGISTRY.md:L3791-L3800). Concrete fix: None.

Code Quality

Severity: None. Impact: The new test is anchored to source-of-truth symbols (_AGENT_FACING_ORDER, vars(diff_diff)) rather than duplicated literals, which reduces drift risk and resolves the earlier review gaps (tests/test_agent_discoverability.py:L39-L49, L80-L137, L278-L290). Concrete fix: None.

Performance

Severity: None. Impact: Added checks are lightweight import-time/introspection assertions over tiny in-memory objects; no production runtime path is changed (tests/test_agent_discoverability.py:L228-L290). Concrete fix: None.

Maintainability

Severity: None. Impact: Removing the deferred TODO row is justified because the new test now covers the deferred __dir__() contract, including the previously missing full-namespace proof (tests/test_agent_discoverability.py:L113-L155; TODO.md:L159-L166). Concrete fix: None.

Tech Debt

Severity: None. Impact: No new untracked debt is introduced by this diff. Concrete fix: None.

Security

Severity: None. Impact: The PR adds no subprocess, network, secret-handling, or PR-controlled execution behavior; the test remains purely in-process introspection (tests/test_agent_discoverability.py:L1-L21). Concrete fix: None.

Documentation/Tests

Severity: None. Impact: The changelog/TODO edits are consistent with the new test coverage, and the prior re-review findings are closed (CHANGELOG.md:L11-L14; tests/test_agent_discoverability.py:L113-L137, L278-L290). Concrete fix: None.
Severity: P3. Impact: Residual risk is limited to unexecuted verification in this environment; I could not run the test suite because numpy, pandas, and pytest are missing locally. Concrete fix: Run tests/test_agent_discoverability.py in the project’s normal CI environment.

igerber added the ready-for-ci Triggers CI test workflows label May 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add agent-discoverability contract test (#461)#467

Add agent-discoverability contract test (#461)#467
igerber wants to merge 2 commits into
mainfrom
agent-discoverability-tests

igerber commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented May 19, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant