Skip to content

PreTrendsPower implementation audit (Roth 2022 — PR-B)#466

Open
igerber wants to merge 21 commits into
mainfrom
feature/pretrends-implementation-audit
Open

PreTrendsPower implementation audit (Roth 2022 — PR-B)#466
igerber wants to merge 21 commits into
mainfrom
feature/pretrends-implementation-audit

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 19, 2026

Summary

  • Implements Roth (2022) Section II.A-B NIS box probability as the new primary pretest_form='nis' on PreTrendsPower, compute_pretrends_power, and compute_mdv; Wald noncentral-χ² form retained as pretest_form='wald' for shipped numerical compatibility (Propositions 1+3+4 still apply — Wald acceptance region is a convex ellipsoid).
  • Full Σ_22 routing on CS and SA event-study adapters: non-bootstrap CS / SA now consume event_study_vcov instead of diag(ses**2). SunAbrahamResults extended with event_study_vcov + event_study_vcov_index fields built via W @ vcov_cohort @ W.T (cohort-aggregation matrix); bootstrap and replicate-weight paths clear these to preserve analytical/bootstrap SE separation.
  • γ-unit MDV for violation_type='linear': _get_violation_weights('linear') now consumes actual pre-period relative-time labels and skips L2 normalization, so reported MDV equals Roth's γ exactly on irregular and anticipation-shifted grids.
  • Custom-violation persistence: new violation_weights field on PreTrendsPowerResults lifts the PR-A NotImplementedError guard for power_at(violation_type='custom') on fresh fits; the helper API (compute_pretrends_power, compute_mdv) now accepts violation_weights and pretest_form end-to-end.
  • Covariance-source provenance propagation to DiagnosticReport / BusinessReport: new covariance_source field on PreTrendsPowerResults records the actual extraction path at fit time; DiagnosticReport consumes it directly so the report layer no longer guesses provenance from result-type alone (fixed the silent downgrade where correctly-computed full-VCV CS/SA fits were being tagged as moderately_powered). Legacy fallback for results lacking the field preserves the PR-A conservative downgrade contract.
  • New tests/test_methodology_pretrends.py (8 classes, 33 tests including 3 R-parity stubs that skip without r_pretrends_golden.json — populated in PR-C). New benchmarks/R/generate_pretrends_golden.R script committed with placeholder commit reference (PR-C pins the audited pretrends revision and lands the JSON goldens).
  • METHODOLOGY_REVIEW.md PreTrendsPower row promoted: In ProgressComplete (R parity pending), Last Review: 2026-05-18. REGISTRY ## PreTrendsPower section rewritten with explicit NIS + Wald equation blocks and item-by-item Requirements checklist. REPORTING.md and docs/api/_autosummary/*.rst brought into alignment with the new surface.

Methodology references

  • Method name(s): PreTrendsPower (compute_pretrends_power, compute_mdv), with adapter routing through CallawaySantAnnaResults / SunAbrahamResults.event_study_vcov and MultiPeriodDiDResults.interaction_indices.
  • Paper / source link(s):
    • Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. American Economic Review: Insights, 4(3), 305-322. DOI 10.1257/aeri.20210236.
    • Sun, L. & Abraham, S. (2021) — event_study_vcov construction via the W-matrix aggregation W @ vcov_cohort @ W.T.
    • Paper review on file: docs/methodology/papers/roth-2022-review.md (landed via PR PreTrendsPower: Roth (2022) paper review + silent-failure guard #463).
  • Any intentional deviations from the source (and why):
    • Wald noncentral-χ² acceptance region retained as pretest_form='wald' (paper-supported alternative under Propositions 1+3+4); not the paper's primary analysis convention but stays available for shipped users.
    • Linear-violation γ-unit convention applies only when relative_times is threaded through (via fit()); legacy callers that bypass fit() and pass only n_pre keep the previous count-based L2-normalized direction for backwards compatibility — explicitly documented in REGISTRY's ## PreTrendsPower Note.
    • R pretrends package parity is deferred to PR-C (mirroring Bacon's PR BaconDecomposition methodology audit (Goodman-Bacon 2021) #454 → PR BaconDecomposition R parity goldens #457 cadence); generator script benchmarks/R/generate_pretrends_golden.R is committed in this PR with a <PR-C-PIN> placeholder commit reference. The 3-tier parity contract (NIS box probability at fixed γ; γ_p MDV at target 0.5 and 0.8; γ-unit MDV invariance) is documented in the script header and locked in tests/test_methodology_pretrends.py::TestPretrendsParityR (currently @pytest.mark.skipif against the missing JSON).

Validation

  • Tests added/updated:
    • tests/test_methodology_pretrends.py (NEW, 924 lines, 8 classes / 33 tests covering Roth Section II.A-B paper-equation-numbered Verified Components: K=1 closed-form, NIS power vs MC simulation at K=2,3 with correlated Σ_22, MDV inversion round-trip, NIS-vs-Wald differentiation, linear-units γ-scale on irregular and anticipation-shifted grids, custom-weight persistence, CS/SA full-VCV adapter routing, helper API end-to-end, JSON-serializability of to_dict, R-parity stubs).
    • tests/test_pretrends.py — bulk pin to pretest_form='wald' on existing tests so shipped numerical assertions are byte-identical (only 3 tests required content-level updates because most assertions are form-invariant).
    • tests/test_diagnostic_report.py — added 3 new regression tests: persisted-full_pre_period_vcov-no-downgrade, legacy-missing-field-still-downgraded, persisted-label-takes-precedence-over-inference, and legacy-MPD-without-interaction_indices reports diag_fallback.
    • tests/test_business_report.py — new test_full_vcov_path_no_downgrade_on_real_cs_fit with deterministic tier pins (ratio < 0.25 → well_powered, asserted unconditionally, plus positive prose contract on summary()).
  • Backtest / simulation / notebook evidence: NIS power validated against Monte Carlo simulation (50k draws) at K=2 diagonal and K=3 equicorrelated Σ_22 to within 0.01 power tolerance. SA event_study_vcov diagonal-entry sanity check verifies event_study_vcov[i,i] = se(e_i)² from _compute_iw_effects at atol=1e-10. Tutorial docs/tutorials/07_pretrends_power.ipynb is unchanged — PR-B does NOT re-render the notebook (a tutorial-rendering follow-up is tracked in TODO).
  • Pre-PR codex review: 6 local /ai-review-local --backend codex rounds (R1-R6) plus the R7 verdict ✅ Looks good with zero findings of any severity. See feedback_local_codex_review_before_pr_submission.md.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

igerber and others added 14 commits May 18, 2026 20:38
…helper API

Implements Roth (2022) Section II.A-B NIS box probability as the new
primary `pretest_form='nis'` default. Wald noncentral-χ² form retained as
opt-in `pretest_form='wald'` for backwards-compat with shipped numerical
baselines AND as a paper-supported alternative (Wald acceptance region is a
convex ellipsoid, so Propositions 1+3+4 all apply).

Changes:
- `PreTrendsPower.__init__`: new `pretest_form: Literal['nis', 'wald']`
  parameter (default 'nis'); validated to one of the two enum values;
  threaded through `get_params()` / `set_params()`.
- New private helpers `_compute_power_nis` + `_compute_mdv_nis`:
  * `_compute_power_nis` uses `scipy.stats.multivariate_normal.cdf` with
    `lower_limit=` for the centered-box rejection probability under H1:
    `δ_pre = M * weights`, `Y = β̂_pre - δ_pre ~ N(0, Σ_22)`,
    `power = 1 - P(Y_t ∈ [-z σ_t - δ_t, z σ_t - δ_t] for all t)`.
    Falls back to MC simulation (N=20000) when the analytical CDF returns
    NaN on degenerate Σ.
  * `_compute_mdv_nis` solves `power_nis(M) = target_power` via doubling
    expansion + `optimize.brentq` bisection; non-convergence cap at
    M_high=1000 returns `np.inf` (mirrors Wald path's existing 1000-cap).
- Renamed existing `_compute_power` → `_compute_power_wald` and
  `_compute_mdv` → `_compute_mdv_wald`; the unsuffixed names are now
  dispatchers on `self.pretest_form`. Wald math is byte-identical.
- `PreTrendsPowerResults` gains 3 new fields:
  * `pretest_form: Literal['nis', 'wald'] = 'wald'` — default 'wald' for
    backwards-compat with older serialized results.
  * `nis_box_probability: float = np.nan` — NIS-specific acceptance
    probability (always NaN for Wald fits, no ambiguity).
  * `violation_weights: Optional[np.ndarray]` — fitted weights persisted
    on the result, enabling `power_at()` to work for ALL violation types
    on fresh fits.
- `fit()` populates all three new fields and dispatches.
- `power_curve()` inherits dispatch through `_compute_power`.
- `summary()` and `to_dict()` dispatch on `pretest_form` — NIS fits print
  "Box probability:" instead of "Non-centrality parameter:".
- `PreTrendsPowerResults.power_at()` refactored: uses
  `self.violation_weights` directly when populated, falls back to
  reconstruction for old serialized results (with the PR-A
  NotImplementedError guard retained only for custom-fit serialized
  results with `violation_weights=None`).
- `compute_pretrends_power` and `compute_mdv` helper signatures extended
  to accept `violation_weights` and `pretest_form`; helpers now forward
  both to the class. Closes the helper/class API gap from PR-A R18.

Smoke-tested with K=2 and K=3 panels:
- NIS power at M=0 with K=3 ≈ 0.138 (matches 1 - (1-α)^K = 0.143 for
  independent normals, with off-diagonal correlation pulling it down).
- Wald power at M=0 with K=3 = 0.05 (exact size under H0).
- NIS MDV(80%, K=3) = 0.59, Wald MDV(80%, K=3) = 0.71 (NIS is more
  powerful here because the rectangular acceptance region is tighter
  than the chi-squared ellipse along the linear-violation direction).

Pre-existing pyright type-stub warnings on `optimize.brentq` and
`stats.multivariate_normal.cdf` are not touched.

Plan ref: /Users/igerber/.claude/plans/stateless-prancing-iverson.md
Step 2 (NIS impl + dispatcher) + Step 5 (result-class field additions +
power_at refactor) + Step 6 (helper API extension).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pre-PR-B default was implicitly Wald (noncentral-χ²); PR-B Step 2
flipped it to NIS (box probability). The vast majority of existing tests
(64 of 66) assert form-invariant properties (positive, finite, monotone,
hasattr, etc.) and pass under either default. Only 3 tests needed
targeted fixes:

- `TestPowerComputation::test_power_at_zero_equals_alpha`: pinned
  `pretest_form='wald'`. The size-at-null property "power(M=0) = alpha
  exactly" is a Wald-form property (noncentrality = 0 at H0 yields the
  chi-squared distribution evaluated at its critical value). Under NIS
  with K=3 independent normals, the joint rejection probability at H0
  is 1 - (1 - alpha)^K ≈ 0.14, not 0.05.
- `TestPreTrendsPowerResultsPowerAt::test_power_at_zero`: same pin for
  the same reason.
- `TestPreTrendsPowerResults::test_power_at_raises_on_custom_violation_type`:
  inverted. The PR-A R18 silent-failure guard was lifted in PR-B Step 5
  (violation_weights are now persisted on PreTrendsPowerResults, so the
  custom path works for fresh fits). Renamed to
  `test_power_at_works_for_custom_violation_type` and assert finite
  power in [0, 1]. Added a new companion test
  `test_power_at_raises_on_legacy_custom_result_without_weights` that
  simulates an old serialized result (violation_weights cleared to None)
  and confirms the backwards-compat NotImplementedError guard still fires
  for that case.

Test count: 67 (was 66; net +1 from the legacy-guard companion test).
All 67 pass. Adjacent suites (test_pretrends_event_study.py and the
pretrends-tagged tests in test_diagnostic_report.py) also pass under
the NIS default — 31 passed, 0 failed.

This is much less test churn than the plan estimated (~101 bulk pins).
The form-invariance of most existing assertions means the flip is
substantially less disruptive than feared.

Plan ref: Step 6 (test bulk pin convention; user-locked Decision 5 in
plan mode).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…study_vcov

Adds full event-study covariance matrix on SunAbrahamResults, enabling
PreTrendsPower to consume Roth (2022) Σ_22 on the SA path instead of
falling back to diag(ses^2). Before PR-B, the SA adapter in
compute_pretrends_power was forced to diag because SunAbrahamResults
did not expose any event-study-level covariance surface; PR-A flagged
this as the SA branch of the diagonal-VCV deviation.

Construction
------------
After _compute_iw_effects() returns event_study_effects + cohort_weights,
we build the aggregation matrix W in fit() and compute

    event_study_vcov = W @ vcov_cohort @ W.T

where W is the |event_times| × n_interactions sparse aggregation matrix:

    event_study_vcov_index = sorted(cohort_weights.keys())
    W = np.zeros((n_event_times, n_interactions))
    for i, e in enumerate(event_study_vcov_index):
        for g, w in cohort_weights[e].items():
            if (g, e) in coef_index_map:
                W[i, coef_index_map[(g, e)]] = w

This matches the existing per-event-time variance computation at
sun_abraham.py:_compute_iw_effects (which already does
weight_vec @ vcov_subset @ weight_vec per event time) but batched
across all event times so the off-diagonals Cov(β̂_{e_i}, β̂_{e_k})
are also produced.

Smoke-test verified diagonal[i, i] of event_study_vcov matches
event_study_effects[e]['se']^2 at atol=1e-10 across all event times.

Bootstrap / replicate clears
----------------------------
Mirrors the CS pattern at staggered.py:2032-2036. When bootstrap_results
is not None OR _uses_replicate_sa is True, event_study_vcov and
event_study_vcov_index are set to None before constructing the result.
This prevents PreTrendsPower from silently mixing analytical VCV with
bootstrap/replicate SE overrides downstream (which would produce
mis-scaled MDV/power output).

Regression
----------
- 39/39 tests/test_sun_abraham.py pass.
- New fields default to None on the dataclass, so existing
  SunAbrahamResults consumers that don't read event_study_vcov see no
  change.

Plan ref: Step 3 SA upstream surface extension (review CRITICAL #2
resolution with explicit W-matrix pseudo-code locked in plan mode).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the hard-coded ``vcov = np.diag(ses**2)`` fallback on both the
CallawaySantAnnaResults and SunAbrahamResults branches of
``_extract_pre_period_params`` with a unified routing helper
``_extract_event_study_vcov_subblock`` that consumes the full
event_study_vcov sub-block when available, falling back to diag
otherwise.

Helper logic
------------
- When ``results.event_study_vcov`` is not None AND
  ``results.event_study_vcov_index`` is not None, look up each
  filtered pre_period via ``.index()`` and extract the
  ``[np.ix_(indices, indices)]`` sub-block.
- Defensive guard: if ``event_study_vcov_index`` is missing one of
  the pre-period labels, raise ValueError loudly rather than silently
  falling back to diag.
- When the result type does not expose event_study_vcov, return
  ``np.diag(ses**2)`` (the legacy behavior preserved for bootstrap
  fits, replicate-weight survey fits, and any future result type).

Impact on the three result types
--------------------------------
- ``MultiPeriodDiDResults``: unchanged — already extracts a full
  sub-block via interaction_indices at lines 700-708.
- ``CallawaySantAnnaResults``: non-bootstrap CS fits (event_study_vcov
  persisted at staggered_results.py:126-128) now consume the full
  Σ_22 instead of diag. Bootstrap CS fits (event_study_vcov cleared at
  staggered.py:2032-2036) keep falling through to diag.
- ``SunAbrahamResults``: non-bootstrap SA fits (event_study_vcov built
  via W @ vcov_cohort @ W.T in the previous commit) now consume the
  full Σ_22 instead of diag. Bootstrap SA fits and replicate-weight
  survey fits (event_study_vcov cleared by the new PR-B Step 3 SA
  guard) keep falling through to diag.

Regression
----------
- 67/67 tests/test_pretrends.py pass.
- 27/27 tests/test_pretrends_event_study.py pass.
- Total 94/94 across both suites.

Plan ref: Step 3 CS+SA adapter routes (closes the Σ_22 fidelity gap
documented in PR-A REGISTRY ## PreTrendsPower diagonal-VCV deviation
Note for non-bootstrap CS + non-bootstrap SA paths).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nit MDV

Threads actual pre-period relative-time labels through
``_get_violation_weights('linear')`` and ``_extract_pre_period_params``
so the reported MDV is in Roth's γ units on irregular and
anticipation-shifted grids. Closes the PR-A REGISTRY ## PreTrendsPower
"Note (deviation from paper — linear violation pattern)" deviation row
for the canonical fit() path.

Math
----
Pre-PR-B: `weights = [n_pre-1, ..., 1, 0] / ||·||_2` derived from `n_pre`
count alone (ignored relative-time labels). Under irregular grids like
{-5, -3, -1}, this treated the violation as if periods were {-3, -2, -1}.
After L2 normalization, the reported MDV = γ · ||t||_2, not γ — wrong
units.

PR-B: when `relative_times` is provided AND `violation_type='linear'`,
weights = |t| directly WITHOUT L2 normalization. Then δ_pre = M * |t| =
γ · t_signed under δ_t = γ · t, so M = γ exactly. Reported MDV is in
Roth's γ units (slope-per-period).

Verified:
- Regular grid [-3, -2, -1]: weights = [3, 2, 1]
- Irregular grid [-5, -3, -1]: weights = [5, 3, 1] (irregular spacing
  reflected — previously would have been [2, 1, 0]/||·||_2)
- Backwards-compat: callers that bypass fit() and pass only n_pre keep
  the legacy normalized [n_pre-1, ..., 0]/||·||_2 behavior (used by
  ~3 unit tests + any third-party direct-helper callers).

Changes
-------
- `_get_violation_weights(self, n_pre, relative_times=None)`: new
  optional kwarg. Linear path with `relative_times not None` uses
  `np.abs(relative_times)` directly + early-return (skip the
  normalize-at-end block). All other paths (constant, last_period,
  custom, linear-without-relative_times) unchanged — still L2-normalized.
- `_extract_pre_period_params` return type expanded from 4-tuple to
  5-tuple: now returns `(effects, ses, vcov, n_pre, relative_times)`.
  All three adapter branches (MultiPeriodDiD, CS, SA) populate
  `relative_times = np.asarray(sorted_pre_periods, dtype=float)` from
  their respective filtered pre-period list.
- `fit()` and `power_curve()` consume the new 5-tuple and thread
  `relative_times` into `_get_violation_weights`.

End-to-end smoke test: SA fit with regular K=3 grid + NIS pretest
produces an MDV ~0.087 (Roth γ scale), confirming the unit conversion
is wired correctly.

Regression
----------
94/94 tests/test_pretrends.py + tests/test_pretrends_event_study.py.
The 3 tests pinned to pretest_form='wald' in the previous commit
still hit the wald path and retain their exact numerical baseline;
the wald path uses the legacy normalized weights internally (because
fit() now threads relative_times for both forms, but the wald
quadratic form is scale-invariant up to M's overall scale).

Plan ref: Step 4 (review CRITICAL #1 resolution: skip L2 normalization
for linear-with-relative_times, locked via plan-mode AskUserQuestion).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… flip + TODO + CHANGELOG + llms.txt

Documents the four PR-A TODO rows that PR-B Steps 2-6 just resolved:
- Σ_22 fidelity on CS/SA adapters (full event_study_vcov sub-block routing)
- Helper API gap (compute_pretrends_power + compute_mdv accept
  violation_weights + pretest_form)
- power_at(custom) silent-failure guard (PR-A R18 mitigation lifted on
  fresh fits via the new persisted violation_weights field)
- Linear-units γ-scale (skip L2 norm for linear-with-relative_times)

Step 8 — REGISTRY.md ## PreTrendsPower:
- Wholesale replacement with NIS-framed entry.
- Explicit equation blocks for both NIS box probability (primary, Roth
  2022 Section II.A-B) and Wald noncentral-χ² (paper-supported
  alternative under Propositions 1+3+4).
- Three updated Notes:
  * Wald-alternative paper-supported Note (NEW)
  * Linear-convention Note (replaces the PR-A deviation Note; γ-unit
    MDV with relative_times threaded through fit())
  * Diagonal-VCV-fallback Note narrowed to bootstrap fits only (the
    non-bootstrap deviation is resolved by PR-B Step 3 CS/SA routing).
- Backwards-compat addendum on power_at(custom) for legacy serialized
  results (replaces the PR-A silent-failure-guard Note).
- Item-by-item Requirements checklist with PR-B-resolved checkboxes
  and a single deferred-to-PR-C item (R parity).
- Removed the prior Wald-test headline equation block (now subsumed by
  the explicit dual-form equation section).

Step 9 — METHODOLOGY_REVIEW.md flip:
- PreTrendsPower row status: **In Progress** → **Complete (R parity
  pending)**.
- Last Review: 2026-05-18.
- Documentation-in-place + Verified Components (10 checkboxes) +
  narrowed Outstanding-for-promotion to a single R-parity-fixture
  bullet for PR-C.

Step 10 — TODO.md cleanup:
- Four of five PR-A PreTrendsPower rows removed (resolved in PR-B);
  pointer comment in place of the removed block.
- R-package-pin row rewritten as a unified PR-C tracker: "PreTrendsPower
  R parity goldens (PR-C)" — covers pinning the commit, running the
  generator script, committing the JSON, activating
  TestPretrendsParityR, and flipping the tracker to fully Complete.

Step 11 — CHANGELOG.md [Unreleased]:
- Added: 6 new PreTrendsPower bullets covering NIS impl, CS/SA Σ_22
  routing + SA upstream surface, result-class field additions, helper
  API extension, methodology test file forward-pointer, R generator
  script forward-pointer.
- Changed: 2 new bullets covering default pretest_form flip
  (implicit-Wald → explicit-NIS, with shipped Wald baselines preserved
  via pretest_form='wald') and linear-violation γ-scale.
- Fixed: NEW section with 1 bullet documenting the PR-A R18 silent-
  failure guard lift for power_at(custom) on fresh fits.

llms.txt (agent-facing catalog):
- PreTrendsPower one-line entry expanded to mention NIS as primary
  default, Wald as alternative, γ-unit MDV, and Σ_22 routing.

Plan ref: Steps 8-12 (REGISTRY refresh + tracker flip + TODO cleanup +
CHANGELOG + agent-facing catalog), per the locked plan at
/Users/igerber/.claude/plans/stateless-prancing-iverson.md. Step 7
(methodology test file) and Step 12 (R generator script) ship in the
next commit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Roth (2022) Section II.A-B paper-equation-numbered methodology test
file. Mirrors `tests/test_methodology_bacon.py`'s structure: 8 classes,
28 tests collected (23 active + 3 R-parity skip-as-expected + 2
@pytest.mark.slow deselected by default).

Classes:

- `TestPretrendsHandCalculation` (8 tests): z_{1-α/2} = 1.96 at α=0.05;
  NIS power at H0 with diag Σ matches `1 - (1-α)^K` (independent
  normals); Wald power at H0 matches exactly α (chi² size); NIS power
  matches Monte Carlo simulation at K=2 diag (atol=0.01); NIS power
  matches MC at K=3 with ρ=0.3 equicorrelation; MDV(target=0.8) round-
  trip — power(MDV) = 0.80; NIS power monotone in |M|; NIS MDV non-
  convergence cap returns np.inf.

- `TestPretrendsPropositions` (2 @pytest.mark.slow tests): Proposition
  1 (conditional mean) matches MC at atol=0.01; Proposition 4 (variance
  reduction under convex B_NIS) — conditional Var ≤ unconditional Var.

- `TestPretrendsLinearGrid` (4 tests): regular grid [-3, -2, -1] →
  weights = [3, 2, 1]; irregular grid [-5, -3, -1] → weights = [5, 3, 1];
  no L2 normalization (||weights||_2 ≠ 1); backwards-compat fallback
  produces the legacy [n-1, ..., 0] / ||·||_2 direction.

- `TestPretrendsCustomWeightPersistence` (2 tests): custom weights
  persisted on PreTrendsPowerResults (L2-normalized); power_at(M) for
  custom matches fresh fit(M=M).

- `TestPretrendsCovarianceSource` (3 tests): SunAbrahamResults
  event_study_vcov populated on non-bootstrap fits; diagonal matches
  per-event-time SEs at atol=1e-10; non-trivial off-diagonals (proves
  full sub-VCV path, not silent diag fallback).

- `TestPretrendsHelperAPI` (3 tests): compute_pretrends_power and
  compute_mdv accept violation_weights for custom + pretest_form
  toggle.

- `TestPretrendsNISvsWald` (3 tests): default pretest_form is 'nis';
  Wald path preserves pre-PR-B output shape; NIS and Wald produce
  different power values under correlated Σ (constructed counter-
  example with ρ=0.6).

- `TestPretrendsParityR` (3 tests, all skip when goldens missing):
  stubs for PR-C R `pretrends` package parity at atol=1e-6. Skip
  decorator checks for `benchmarks/data/r_pretrends_golden.json`.

26/26 collected tests pass (after deselecting the 2 @pytest.mark.slow
Proposition simulation tests per the default pytest addopts).

Plan ref: Step 7 (methodology test file).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…en.R

R `pretrends` parity goldens generator script (PR-C deferred to land
the JSON output). Mirrors the Bacon precedent (`generate_bacon_golden.R`
ships in PR-B, JSON goldens deferred to PR-C following Bacon PR-C #457).

Structure
---------
Three fixtures matched to test_methodology_pretrends.py expectations:

1. `uniform_3_pre_periods_no_anticipation` — K=3 regular grid
   (t ∈ {-3, -2, -1}), never-treated control. Default-case parity
   baseline.
2. `irregular_pre_periods` — K=3 with relative_times = [-5, -3, -1].
   Tests PR-B's γ-unit linear-pattern fix; pre-PR-B Python with
   normalized count-based weights would have silently reported MDV
   in non-γ units. R `slope_for_power()` always reports γ.
3. `anticipation_shifted` — K=4 with anticipation=1. Verifies the
   pre-period filtering logic in `_extract_pre_period_params`.

Three-tier parity contract at atol=1e-6:
1. NIS box probability `P(β̂_pre ∈ B_NIS(Σ))` at fixed γ values on all
   3 fixtures.
2. γ_p MDV (slope at target power 0.5 and 0.8) on regular and irregular
   grids.
3. γ-unit MDV invariance: Python's PR-B Step 4 "skip-L2-norm" path
   produces MDV in Roth's γ units exactly, matching R's
   `slope_for_power()` which also reports γ.

PR-C TODO checklist (recorded at the bottom of the script for
self-contained PR-C handoff):
- Replace `<PR-C-PIN>` commit-hash placeholder with actual git SHA
  from https://github.com/jonathandroth/pretrends.
- Replace the NA_real_ stubs in `extract_pretrends()` with actual
  `pretrends::pretrends()` / `slope_for_power()` calls (the package
  API surface is documented in the script header but not yet exercised
  — PR-C is when it gets installed and pinned).
- Verify REGISTRY.md surface claims against the pinned revision.
- Activate `tests/test_methodology_pretrends.py::TestPretrendsParityR`
  (currently skips via @pytest.mark.skipif when the JSON is missing).
- Flip METHODOLOGY_REVIEW.md PreTrendsPower row to fully **Complete**.

The script's R `pretrends` calls are stubbed in PR-B because the
package is not installed on the audit machine; PR-C installs it,
pins the audited commit, runs the script, captures the actual JSON
output, and commits both the JSON and the updated R script with the
real surface calls.

Plan ref: Step 12 (R generator script + commit reference; goldens
deferred to PR-C following the Bacon cadence).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex Round 1 verdict was ⛔ Blocker — 2 unmitigated P0 defects in the
new default `pretest_form='nis'` path plus a P1 contract bug on the MPD
branch and a P2 code-duplication issue. All four addressed.

P0 #1 — `_compute_mdv_nis` low-target-power boundary bug:
- Pre-fix: when `target_power ≤ NIS-size` (e.g., α=0.05, K=3 →
  null_size ≈ 0.143, request target=0.10), the bracketing loop saw
  `power(0) >= target` immediately, `brentq(0, 1)` raised ValueError on
  the non-bracketing bounds, and the except-fallback silently returned
  `M_high=1.0` instead of 0.0.
- Post-fix: explicit short-circuit
  `if power_minus_target(0) >= 0: return 0.0` BEFORE the doubling loop.
  Regression test added at
  `TestPretrendsHandCalculation::test_mdv_nis_returns_zero_when_target_below_null_size`.

P0 #2 — non-finite scipy MVN CDF propagation:
- Pre-fix: `_compute_power_nis` and `PreTrendsPowerResults.power_at`
  only fell back to MC simulation on `ValueError` / `LinAlgError`
  exceptions; if scipy returned NaN directly (Genz internal
  cancellation on degenerate Σ), the NaN propagated through `np.clip`
  and into the MDV solver — silently producing a wrong-but-finite MDV
  via the brentq fallback path.
- Post-fix: extracted module-level helper `_compute_nis_acceptance_prob`
  that does the analytical scipy CDF call AND falls back to MC on EITHER
  exception OR non-finite output. Both call sites (`_compute_power_nis`
  and `PreTrendsPowerResults.power_at`) now use the helper — eliminates
  duplication AND fixes the NaN-propagation hole. Regression test
  monkey-patches `scipy.stats.multivariate_normal.cdf` to return NaN and
  asserts MC fallback engages
  (`test_nis_power_handles_non_finite_cdf_via_mc_fallback`).

P1 — MultiPeriodDiD raw period IDs treated as Roth relative times:
- Pre-fix: `_extract_pre_period_params` MPD branch passed
  `np.asarray(estimated_pre_periods, dtype=float)` directly into
  `_get_violation_weights('linear')`. For the common MPD case
  `pre_periods=[0, 1, 2, 3], reference_period=4`, this produced linear
  weights `[0, 1, 2, 3]` (raw period IDs) instead of Roth-style
  `[4, 3, 2, 1]` (|t - reference|). Non-numeric period labels (string
  IDs, calendar dates) would have failed outright.
- Post-fix: derive relative_times from `results.reference_period`:
  `[float(p) - float(ref) for p in estimated_pre_periods]`. When
  `reference_period` is None or non-numeric, fall back to legacy
  count-based path (returns `relative_times=None`,
  `_get_violation_weights` uses the normalized
  `[n_pre-1, ..., 0]/||·||_2` direction). Type signature of
  `_extract_pre_period_params` widened to
  `Tuple[..., Optional[np.ndarray]]`. Regression test at
  `TestPretrendsLinearGrid::test_mpd_calendar_period_ids_derive_relative_times_from_reference`
  verifies `pre_periods=[0,1,2,3], reference_period=4` → weights
  `[4, 3, 2, 1]` (and exercises the derivation math directly).

P2 — code duplication of NIS box-probability logic:
- Resolved structurally by the module-level helper extraction (above).
- The two call sites are now thin wrappers; future contract changes to
  the box probability happen in one place.

Regression
----------
- 23 active methodology tests pass (3 R-parity stubs still skip).
- 67/67 test_pretrends.py + 27/27 test_pretrends_event_study.py
  unchanged.
- Total 120/120 across the three suites (+ 3 expected R-parity skips).

Plan ref: HARD GATE Step 13 Round 1.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R2 codex review findings on the R1-cleaned NIS implementation; all four
items addressed in this single commit.

**P0: NIS MDV cap-vs-target ambiguity** (pretrends.py _compute_mdv_nis)

The pre-fix doubling loop exited on the `M_high < 1000` cap and
immediately returned ∞ even when power(M_high) >= target — silently
producing a wrong MDV=∞ result on finite-root cases. Codex's concrete
counterexample: vcov=[[50000]] with target_power=0.8 has a finite root
between M=512 and M=1024 (power(512)≈0.36, power(1024)≈0.997). Pre-fix
the cap fired at M_high=1024 and returned ∞; brentq could have bracketed.

Post-fix: evaluate power_minus_target(M_high) explicitly after the loop
exits. Return ∞ only when power at the capped endpoint is still below
target_power. Finite-root cases at the boundary now pass through to
brentq. The genuine-unreachable case (vcov=[[1e8]], target=0.99) still
returns ∞ as before.

**P1: scipy version pin** (pyproject.toml)

`scipy.stats.multivariate_normal.cdf(..., lower_limit=...)` — used by
the new `_compute_nis_acceptance_prob` for the rectangular box
probability — requires the `lower_limit` parameter introduced in scipy
1.10. Bump from `scipy>=1.7.0` to `scipy>=1.10` with an explanatory
comment referencing the release-notes link. Without this bump callers
on older scipy would hit a TypeError at the first NIS power call.

**P1: pretest_form not propagated to PreTrendsPowerCurve**

`PreTrendsPower.power_curve()` constructs a PreTrendsPowerCurve dataclass
but did NOT pass through the form used to compute the grid — so a NIS
fit's `result.power_curve(...)` returned a curve indistinguishable
from a Wald curve at the dataclass surface. Fix: add a
`pretest_form: Literal['nis', 'wald'] = 'wald'` field to the dataclass
(default 'wald' for backwards-compat with old serialized curves);
populate it from `self.pretest_form` in `power_curve()`; surface it on
`to_dataframe()` as a new "pretest_form" column so downstream tooling
that ingests the curve can disambiguate NIS vs Wald output.

**P2: MPD relative-times regression test was manual arithmetic**

The R1-fix added `test_mpd_calendar_period_ids_derive_relative_times_from_reference`
but the test only checked Python's subtraction operator, never invoking
the production `_extract_pre_period_params` MPD branch. Replace with an
end-to-end test that constructs a real `MultiPeriodDiDResults` and calls
the helper directly; assert that the returned `relative_times` is
`[-4, -3, -2, -1]` for `pre_periods=[0,1,2,3]` + `reference_period=4`,
and that the downstream `_get_violation_weights` produces `[4, 3, 2, 1]`.
Also add a companion test that confirms the non-numeric-reference branch
falls back to `relative_times=None` (preserves legacy direction).

**R2 P0 regression test: finite-root MDV at doubling endpoint**

New `test_mdv_nis_finite_root_at_doubling_endpoint` reproduces codex's
concrete counterexample (vcov=[[50000]], target_power=0.8) and asserts
the post-fix returns a finite MDV in (512, 1024) AND spot-checks that
the brentq root achieves the target power within 1e-3. Locks the
cap-vs-target contract against any future regression in the MDV solver.

**Pyright-stub annotations**

Coerce `z_alpha = float(stats.norm.ppf(...))` at both NIS call sites so
the `_compute_nis_acceptance_prob(M, weights, vcov, z_alpha)` argument
matches the helper's `z_alpha: float` signature; coerce the
`_compute_power_nis` return tuple elements to floats explicitly; add a
targeted `# type: ignore[arg-type]` on the `cov=vcov` kwarg in
`stats.multivariate_normal.cdf` (scipy stub bug — `cov` typed `int`).
These do not affect runtime; they preserve the new code's type-cleanliness
against IDE-level Pyright.

Tests: 122/122 pass (3 R-parity stubs skip; 2 slow tests deselected).
SA upstream regression: 39/39 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R3 codex review flagged a cross-surface drift after R2 cleared:
PR-B Step 3 routed CS / SA fits through the full Σ_22 sub-block at the
estimator layer, but ``DiagnosticReport._infer_cov_source`` kept the
pre-PR-B type-based inference. That returned
``"diag_fallback_available_full_vcov_unused"`` for any CS / SA fit with
populated ``event_study_vcov`` — and ``_apply_diag_fallback_downgrade``
then conservatively downgraded ``well_powered`` to ``moderately_powered``.
Net effect: correctly-computed full-VCV pre-trends power blocks were
silently being downgraded across the entire DR / BR rendering surface.

**P1 fix — provenance recorded at the estimator layer, consumed at the
report layer:**

- ``pretrends.py``:
  - ``_extract_event_study_vcov_subblock`` now returns
    ``(vcov, source)`` where ``source`` is ``"full_pre_period_vcov"``
    when the full sub-block was actually used or ``"diag_fallback"``
    when ``event_study_vcov`` was missing / cleared.
  - ``_extract_pre_period_params`` extended to a 6-tuple that includes
    the ``covariance_source`` label. MPD branch returns
    ``"full_pre_period_vcov"`` when ``interaction_indices`` is populated,
    ``"diag_fallback"`` otherwise; CS / SA branches forward the label
    from the sub-block helper.
  - ``PreTrendsPowerResults`` gains a new ``covariance_source: str``
    field (default ``"unknown"`` for backwards-compat with old
    serialized results). ``fit()`` populates it from the extraction
    path; ``power_curve()`` discards it because the curve dataclass
    is independent of any one fit's provenance.
- ``diagnostic_report.py``:
  - ``_check_pretrends_power`` and ``_format_precomputed_pretrends_power``
    now prefer the persisted ``pp.covariance_source`` field and fall
    back to ``_infer_cov_source(source_fit)`` only when the field is
    missing or ``"unknown"`` (legacy results).
  - ``_infer_cov_source`` updated: CS / SA + populated
    ``event_study_vcov`` now correctly returns
    ``"full_pre_period_vcov"`` (since PR-B routes through the full
    matrix). The legacy ``"diag_fallback_available_full_vcov_unused"``
    sentinel is no longer produced by any in-tree path.
  - ``_apply_diag_fallback_downgrade`` docstring updated to note that
    the rule is now effectively a no-op post-PR-B; the function is
    retained for backwards-compat with legacy serialized results
    carrying the old sentinel.

This is the architectural fix the R3 codex reviewer called out:
"stop inferring covariance provenance from result type alone — record
it on ``PreTrendsPowerResults`` when the covariance is built". The
result-type heuristic was a maintainability smell that drifted the
moment the estimator-layer routing changed.

**P3 fix — autosummary docs include new fields:**

- ``docs/api/_autosummary/diff_diff.PreTrendsPowerResults.rst`` adds
  ``pretest_form``, ``nis_box_probability``, ``violation_weights``,
  ``covariance_source`` to the attribute autosummary so the published
  API page is complete after the default flip to NIS.
- ``docs/api/_autosummary/diff_diff.PreTrendsPowerCurve.rst`` adds
  ``pretest_form``.
- ``docs/api/pretrends.rst`` example code shows the NIS-default flow
  and demonstrates how ``pretest_form='wald'`` opts back into the
  pre-PR-B numerical output.

**P3 fix — REPORTING.md aligned with the new estimator-layer reality:**

- ``docs/methodology/REPORTING.md`` "Diagonal-covariance fallback for
  staggered-estimator power" note rewritten to describe the PR-B
  routing accurately. Non-bootstrap CS / SA fits now consume the full
  ``event_study_vcov`` sub-block; the PR-A conservative downgrade is
  effectively dead post-PR-B (still defended for legacy serialized
  results). Bootstrap and replicate-weight CS / SA, plus
  ImputationDiD / Stacked / EfficientDiD / TwoStageDiD, still fall
  through to diag because nothing better is available on those
  result types yet.

**Tests:**

- Rewrote ``test_precomputed_pretrends_power_downgrades_when_full_vcov_unused``
  as ``test_precomputed_pretrends_power_full_vcov_yields_no_downgrade``:
  the same CS-shaped stub that used to be downgraded now correctly
  resolves to ``covariance_source='full_pre_period_vcov'`` and keeps
  the ``well_powered`` tier.
- New ``test_precomputed_pretrends_power_consumes_persisted_cov_source``
  explicitly constructs a ``PreTrendsPowerResults`` with the new field
  set and verifies the report adapter reads it directly (locks the
  architectural fix).
- Updated the two MPD-branch tests + the SA sub-block test in
  ``test_methodology_pretrends.py`` to unpack the new 6-tuple and
  assert the expected ``covariance_source`` label.

Tests: 583 pass across pretrends + DR + BR + SA + staggered.
4 skipped (R-parity stubs + 1 unrelated). 0 regressions.

CHANGELOG ``Fixed`` entry added under ``[Unreleased]``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R4 codex review sharpened a backwards-compat hole left open by R3:

**P1 — legacy missing-field fallback was upgrading silently**

R3 changed ``DiagnosticReport._infer_cov_source`` so CS / SA fits with
populated ``event_study_vcov`` returned ``"full_pre_period_vcov"`` on
the (now-architected) assumption that the field's presence implies
PR-B routing was used. That assumption is wrong for LEGACY
``PreTrendsPowerResults`` objects pre-PR-B: those results were computed
from ``diag(ses**2)`` even though the source fit's ``event_study_vcov``
was populated (PR-A behavior). Without the persisted
``covariance_source`` label we cannot distinguish a pre-PR-B fit from
a post-PR-B fit, so the legacy ambiguous case must keep the
conservative downgrade.

Fix: ``_infer_cov_source`` reinstates the
``"diag_fallback_available_full_vcov_unused"`` sentinel for legacy CS /
SA results with populated ``event_study_vcov``. New PR-B fits set
``PreTrendsPowerResults.covariance_source`` directly and bypass this
fallback entirely, so the sentinel only fires for legacy serialized
results — preserving the PR-A downgrade contract for backwards-compat
while keeping the new full-VCV no-downgrade contract intact for fresh
fits. ``_apply_diag_fallback_downgrade`` docstring updated.

Tests rewritten in ``tests/test_diagnostic_report.py``:

- ``test_precomputed_pretrends_power_persisted_full_vcov_no_downgrade``:
  constructs a real ``PreTrendsPowerResults`` with the new field set to
  ``full_pre_period_vcov`` (the value PR-B records on fresh fits) and
  verifies no downgrade.
- ``test_precomputed_pretrends_power_legacy_missing_field_still_downgraded``
  (NEW): legacy ``_PPStub`` without the field exercises the
  fallback path; the report adapter MUST emit the conservative
  sentinel + downgrade to ``moderately_powered``.
- ``test_precomputed_pretrends_power_consumes_persisted_cov_source``
  (rewritten): the persisted ``full_pre_period_vcov`` label takes
  precedence over the legacy fallback (which would now produce the
  sentinel). Locks the architectural fix.

**P3 — to_dict() / to_dataframe() drop violation_weights and
covariance_source**

The PR-B-added public fields are missing from ``PreTrendsPowerResults``
serialization surfaces. Any caller that round-trips through dict /
dataframe loses the very provenance the reporting layer reads off the
dataclass. Fix: include both fields in ``to_dict()`` (and therefore
``to_dataframe()`` via the existing single-source-of-truth path).

**P3 — BR-level live regression for the new full-VCV no-downgrade
path**

The existing ``TestDiagFallbackDowngradeAppliedCentrally``
class only locked the downgrade fires on the diag fallback. The new
full-VCV no-downgrade path was uncovered at the user-facing prose
surface. New ``test_full_vcov_path_no_downgrade_on_real_cs_fit``
exercises the live CS path: when ``pretrends.py`` records
``covariance_source='full_pre_period_vcov'`` on a real fit and the
headline tier is ``well_powered``, ``BusinessReport.full_report()``
must NOT contain the moderately-informative phrasing.

Tests: 400 across pretrends + DR + BR pass. 4 skipped (R-parity stubs
+ 1 fixture skip). SA + CS regression: 185 pass (no change).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R5 codex review on the R4 cleanup. Verdict was ✅ "Looks good" but with
three actionable items (1 P2 + 2 P3); polishing all three to land a
clean verdict per the HARD GATE rule "informational items only".

**P2 — `to_dict()` was not actually JSON-serializable**

R3 added the docstring "suitable for serialization or downstream
transport" on ``PreTrendsPowerResults.to_dict()`` but the body kept
emitting ``violation_weights`` as a raw ``np.ndarray``, so
``json.dumps(result.to_dict())`` raised ``TypeError``. Fix: coerce
``violation_weights`` to ``list[float]`` (or ``None``) inside
``to_dict``; the ndarray remains available on the dataclass attribute
directly for callers that need it.

New regression in ``tests/test_methodology_pretrends.py``:
``test_to_dict_is_json_serializable`` asserts both the type contract
and an end-to-end ``json.dumps`` round-trip on a real fit, with
``allow_nan=True`` (NIS box probability returns floats; Wald
noncentrality returns floats; both can be finite or NaN).

**P3 — BR live regression was checking the wrong surface**

R4's ``test_full_vcov_path_no_downgrade_on_real_cs_fit`` checked
``BusinessReport.full_report()`` for the absence of the
moderately-informative phrasing. But ``business_report.py`` actually
renders the well-powered phrasing on ``summary()`` (the in-sample
prose surface), not ``full_report()`` — so the prior assertion
silently passed on text that did not contain the well-powered prose
in the first place. Fix: assert positively that ``summary()`` contains
``"well-powered"`` and lacks ``"moderately informative"``; retain the
``full_report()`` negative check as a secondary defensive assertion.

**P3 — legacy MPD fallback was overstating provenance**

R4's ``_infer_cov_source`` unconditionally returned
``"full_pre_period_vcov"`` for non-event-study types — which silently
included ``MultiPeriodDiDResults`` fits where ``interaction_indices``
is ``None``. In that case ``pretrends.py:_extract_pre_period_params``
falls through to ``np.diag(ses**2)`` (a genuine fallback, not the
"available but unused" sentinel), so the legacy-fallback label was
wrong.

Fix: ``_infer_cov_source`` special-cases ``MultiPeriodDiDResults``.
When ``vcov`` and ``interaction_indices`` are both populated, returns
``"full_pre_period_vcov"``; otherwise returns ``"diag_fallback"``. The
``diag_fallback`` label does NOT trigger
``_apply_diag_fallback_downgrade`` (only the
``diag_fallback_available_full_vcov_unused`` sentinel does), so the
tier is unchanged for the MPD legacy case — this fix is provenance
accuracy, not tier behavior.

New regression in ``tests/test_diagnostic_report.py``:
``test_precomputed_pretrends_power_legacy_mpd_without_interaction_indices_reports_diag``
constructs a legacy-shaped MPD stub without ``interaction_indices`` /
``vcov`` and asserts ``covariance_source == "diag_fallback"`` (was
``"full_pre_period_vcov"`` pre-fix) and that the tier stays at
``well_powered`` (no downgrade applies on ``diag_fallback``).

Tests: 402 pass across pretrends + DR + BR. 4 skipped (R-parity stubs
+ 1 fixture skip). SA + CS regression: 185 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R6 codex verdict ✅ "Looks good" but with two P3 polish items.

**P3 — `_infer_cov_source` docstring drifted from new MPD special-case**

R5 added an explicit MPD branch to ``_infer_cov_source`` that returns
``"diag_fallback"`` when ``interaction_indices`` is absent, but the
docstring's ``"full_pre_period_vcov"`` bullet still claimed all
non-event-study types (including MPD) "always" expose full pre-period
covariance. Fix: update the docstring so the
``"full_pre_period_vcov"`` bullet excludes MPD (with a forward
pointer to the explicit MPD branch below), and the
``"diag_fallback"`` bullet enumerates the MPD-without-
``interaction_indices`` case.

**P3 — BR no-downgrade live regression was conditionally bypassed**

The R5-fixed ``test_full_vcov_path_no_downgrade_on_real_cs_fit``
gated the well-powered phrasing assertions on
``if block["tier"] == "well_powered"``, which silently skipped the
key prose assertion if a future regression reintroduced the
conservative downgrade (the test then passes trivially). Fix: pin
the expected tier deterministically on the ``cs_fit`` fixture, which
produces ``mdv/|att| ≈ 0.053`` (well under the ``0.25`` well_powered
threshold) on ``seed=7`` + ``treatment_effect=1.5``. New assertions:

- ``block["covariance_source"] == "full_pre_period_vcov"`` (asserted,
  not guarded)
- ``block["mdv_share_of_att"] < 0.25`` (asserts the raw ratio is in
  the well_powered range so the no-downgrade assertion below is
  meaningful)
- ``block["tier"] == "well_powered"`` (locks the no-downgrade
  contract — a regression reintroducing the downgrade would fail
  here, not silently bypass)

The well-powered / moderately-informative prose contracts on
``summary()`` and ``full_report()`` are now also unconditionally
asserted.

Tests: 125 pass on the impacted classes (BR centralized-downgrade +
all methodology + all DR). No regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall Assessment

⚠️ Needs changes

Executive Summary

  • PreTrendsPower is the affected method, and the main unmitigated issue is methodological: the new linear γ-unit path still silently falls back to the legacy count-based direction for MultiPeriodDiDResults with non-numeric period labels.
  • That fallback is now codified in a new test, but it is not documented in REGISTRY.md, while REGISTRY.md and METHODOLOGY_REVIEW.md now say the γ-unit deviation is resolved.
  • The new full-Σ_22 routing for non-bootstrap CS/SA fits and the report-layer covariance provenance fix look internally consistent; I did not find a new NaN/inference anti-pattern in the changed statistical code.
  • The R parity deferral is properly treated as deferred work, but the committed generator is still a placeholder script.
  • The edited API page still documents the wrong custom-violation parameter name.

Methodology

  • P1 diff_diff/pretrends.py:L914-L938, tests/test_methodology_pretrends.py:L530-L564, docs/methodology/REGISTRY.md:L2820-L2820, METHODOLOGY_REVIEW.md:L1054-L1066: the new linear-weight implementation only produces Roth γ-units when it can coerce MPD labels to floats. For supported MPD inputs with non-numeric period labels, it silently returns to the old count-based normalized direction, but the registry/review now describe the γ-unit deviation as resolved. Impact: silently mis-scaled MDV/power output and an undocumented methodology deviation on a changed estimator path. Concrete fix: either derive relative times for datetime/Period labels and add an explicit override for arbitrary labels, or raise/warn on unsupported labels and document that exception in REGISTRY.md/METHODOLOGY_REVIEW.md instead of marking the γ-unit path complete.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • P3 benchmarks/R/generate_pretrends_golden.R:L105-L123, TODO.md:L97-L98: the R parity generator is still a placeholder (NA_real_ stubs and <PR-C-PIN>), but this is explicitly deferred and tracked. Impact: not a blocker for this PR because parity tests are skip-gated and the follow-up is recorded. Concrete fix: leave it for PR-C as tracked, or replace the stubs now if you want the generator to be runnable immediately.

Security

  • No findings.

Documentation/Tests

  • P3 docs/api/pretrends.rst:L135-L136: the edited API page still tells users to pass custom_delta, but the public API uses violation_weights. Impact: copy-pasted documentation for the custom path is wrong on the exact surface this PR changed. Concrete fix: rename the parameter to violation_weights and mention that both PreTrendsPower and the helper functions now accept it.
  • Review note: I could not execute pytest here because this environment is missing pytest, numpy, and scipy.

Path to Approval

  1. Remove the silent MPD fallback as an unqualified “resolved” path: support real relative-time extraction for non-numeric time types where distances are well-defined, and otherwise warn or raise instead of silently reporting a Roth γ-unit MDV.
  2. Add regression coverage for that contract: one MPD fixture with datetime/Period labels that preserves γ-units, and one unsupported-label fixture that warns/errors; then update REGISTRY.md and METHODOLOGY_REVIEW.md to match the implemented behavior exactly.

R8 CI codex caught a P1 my local R7 reviewer missed — exactly the
`feedback_local_codex_vs_ci_codex_divergence.md` pattern.

**P1 — MPD non-numeric labels silently fell back to count-based,
undocumented as a deviation in REGISTRY**

R3's MPD branch returned `relative_times=None` for non-numeric
`reference_period` values (string period IDs, etc.), silently using
the legacy count-based normalized direction — but the REGISTRY note
described the γ-unit deviation as "resolved" without qualifying that
exception. Two-part fix:

1. **Better coercion** for datetime-like labels: new module-level
   helper `_coerce_relative_times_from_reference` (`pretrends.py:92`)
   handles three regimes:
   - Numeric (`int` / `float` / `np.int64`) — direct `float()`
   - `pandas.Period` / `Timestamp` / `np.datetime64` — subtraction-
     based offset arithmetic (`.n` for Period, `.days` for Timedelta,
     fall through to `/ np.timedelta64(1, 'D')`)
   - Genuinely non-numeric (string period IDs, unranked categoricals)
     — emits an explicit `UserWarning` documenting that the reported
     MDV is NOT in Roth's γ units under this fallback, and recommends
     re-fitting with numeric labels.

2. **Documentation alignment**: REGISTRY `## PreTrendsPower`
   convention note and METHODOLOGY_REVIEW.md `## PreTrendsPower`
   Verified Components checklist both enumerate the supported label
   types (numeric + pandas.Period + Timestamp + datetime64) and
   explicitly call out the non-numeric warn-and-fallback behavior as
   a documented edge case (not a "resolved" deviation).

**P3 — `docs/api/pretrends.rst` still referenced removed `custom_delta`
parameter name**

The custom-violation entry in the violation-types section used the
parameter name `custom_delta`, but the actual API exposes
`violation_weights` (both on `PreTrendsPower` and on the helper
functions per PR-B Step 6). Fix: rename in docs and add a one-line
note that both the class and the helpers accept the kwarg.

**Tests** (`tests/test_methodology_pretrends.py::TestPretrendsLinearGrid`):

- `test_mpd_non_numeric_reference_falls_back_to_legacy_weights`
  renamed to `..._warns_and_falls_back...` and now asserts the
  explicit `UserWarning` is emitted (mentioning "γ units").
- NEW `test_mpd_pandas_period_reference_yields_numeric_relative_times`:
  constructs a `MultiPeriodDiDResults` with `pd.Period('2019Q1..Q3')`
  pre-periods and `reference_period=pd.Period('2019Q4')`, asserts the
  derived `relative_times == [-3, -2, -1]` (quarters) and linear
  weights = `[3, 2, 1]` in γ units. Locks the Period-arithmetic path
  the codex specifically flagged.

The P3 R-parity-script placeholder is deferred to PR-C per the
existing TODO row (codex labeled it informational / non-blocker).

Tests: 403 pass across pretrends + DR + BR. 4 skipped (R-parity
stubs + 1 fixture skip). No regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: cfb3200d1b4d6b918c0aaabeded3e4be6c5603ab


Overall Assessment

✅ Looks good

Executive Summary

  • This re-review resolves the prior P1: PreTrendsPower now handles non-numeric MultiPeriodDiDResults period labels by warning and falling back explicitly, and the registry/review docs now describe that fallback consistently (diff_diff/pretrends.py:L92-L172, docs/methodology/REGISTRY.md:L2818-L2820, METHODOLOGY_REVIEW.md:L1059-L1067).
  • Affected methods are PreTrendsPower, compute_pretrends_power, compute_mdv, plus the CS/SA covariance adapters consumed by compute_pretrends_power.
  • I did not find any new P0/P1 issues in the changed estimator math, Σ_22 routing, variance handling, or report-layer provenance logic.
  • The NIS default, retained Wald alternative, custom-weight persistence, and non-bootstrap CS/SA full-Σ_22 routing are all aligned with the updated Methodology Registry.
  • R-package parity is still intentionally deferred and properly tracked in TODO.md; that is informational only.
  • I could not run the test suite in this environment because numpy/scipy are unavailable, so the verification here is source-level plus review of the added regression coverage.

Methodology

  • No findings. The changed methodology surfaces are PreTrendsPower, compute_pretrends_power, compute_mdv, and the CS/SA event-study covariance extraction path. The new NIS default and the retained Wald form match the updated registry contract, and the prior undocumented MPD-label fallback is now both explicit in code and documented (diff_diff/pretrends.py:L39-L89, diff_diff/pretrends.py:L175-L237, diff_diff/pretrends.py:L820-L1125, docs/methodology/REGISTRY.md:L2783-L2856).

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3. Impact: R parity is still deferred; the committed generator and parity tests are intentionally placeholder/skip-gated, so this is not a blocker because it is tracked in TODO.md. Concrete fix: in PR-C, pin the R pretrends revision, replace the NA_real_ stubs, commit benchmarks/data/r_pretrends_golden.json, and activate the parity assertions (benchmarks/R/generate_pretrends_golden.R:L1-L24, benchmarks/R/generate_pretrends_golden.R:L107-L123, benchmarks/R/generate_pretrends_golden.R:L189-L223, TODO.md:L97-L98).

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: the release/reporting docs overstate the legacy covariance-provenance behavior. They say the old sentinel path is gone / that legacy inference now labels CS/SA + event_study_vcov as full_pre_period_vcov, but the implementation intentionally still emits diag_fallback_available_full_vcov_unused for legacy precomputed results without persisted provenance. The code and tests are correct; the docs are the inconsistent piece. Concrete fix: reword the docs to distinguish “new fits with persisted covariance_source” from “legacy serialized results using the conservative sentinel fallback” (CHANGELOG.md:L39-L39, docs/methodology/REPORTING.md:L324-L339, diff_diff/diagnostic_report.py:L1535-L1544, tests/test_diagnostic_report.py:L434-L479).
  • Note: I could not execute pytest here because this environment is missing runtime dependencies (numpy/scipy); static AST parsing of the touched Python files succeeded.

R9 verdict ✅ Looks good with one actionable P3: CHANGELOG.md L39 and
REPORTING.md L324-342 overstated the legacy covariance-source path.
Both said the `"diag_fallback_available_full_vcov_unused"` sentinel is
gone / that legacy inference labels CS / SA + `event_study_vcov` as
`"full_pre_period_vcov"`, but the R4 fix RESTORED the conservative
sentinel for legacy precomputed results that lack the persisted
`covariance_source` field — because without it we cannot distinguish
a pre-PR-B fit (used diag) from a post-PR-B fit (used full Σ_22).

Code and tests are correct; docs were the inconsistent piece.

Fix: reword both surfaces to distinguish two paths explicitly:

- **New fits** (post-PR-B): persisted `covariance_source` is read
  directly, non-bootstrap CS / SA report `"full_pre_period_vcov"` and
  are NOT downgraded.
- **Legacy serialized results** (pre-PR-B, no field): legacy
  type-based inference still emits the conservative sentinel for
  CS / SA + populated `event_study_vcov`, and the
  `well_powered → moderately_powered` downgrade still applies. For
  legacy `MultiPeriodDiDResults` without `interaction_indices`, the
  fallback reports `"diag_fallback"` (genuine fallback, no
  downgrade).

CHANGELOG entry expanded to list all four covariance-source DR
regression tests by name; REPORTING.md "Pre-period covariance
routing for staggered-estimator power" Note rewritten with the
two-path structure.

R-parity P3 deferred to PR-C per the existing TODO row (codex
labeled it informational / non-blocker).

No source changes; 309 tests across pretrends + DR + BR continue to
pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 02b74a8ba4cd43d11458259f4f7bd88a15ed9c62


Overall Assessment

✅ Looks good

Executive Summary

  • The prior re-review item on covariance-provenance reporting appears resolved: DiagnosticReport / BusinessReport now prefer persisted covariance_source, and the legacy fallback is documented consistently (diff_diff/diagnostic_report.py:L1442-L1497, docs/methodology/REPORTING.md:L324-L350, tests/test_diagnostic_report.py:L430-L480, tests/test_business_report.py:L2423-L2498).
  • I did not find any unmitigated P0/P1 issues in the changed estimator math, Σ_22 routing, MDV/power dispatch, or report-layer provenance handling.
  • The methodology-affecting surfaces are PreTrendsPower, compute_pretrends_power, compute_mdv, _extract_pre_period_params, and the CS/SA covariance adapters.
  • One P3 documentation/tests issue remains: several touched docstrings/changelog lines still describe pre-PR-B behavior.
  • R-package parity is still deferred to PR-C and is properly tracked in TODO.md.
  • I could not run pytest here; this review is source-level plus AST parsing of the touched Python files.

Methodology

No findings. The changed methodology surfaces align with the updated registry and the cited Roth paper: NIS is now the primary box-pretest form, detectable-slope reporting is in the paper’s γ_p framing, and the retained Wald branch is explicitly treated in the registry as the documented convex-region alternative rather than an undocumented deviation (diff_diff/pretrends.py:L39-L89, diff_diff/pretrends.py:L905-L1125, diff_diff/pretrends.py:L1209-L1504, diff_diff/pretrends.py:L1664-L1775, diff_diff/sun_abraham.py:L94-L104, diff_diff/sun_abraham.py:L782-L810, diff_diff/sun_abraham.py:L948-L955, docs/methodology/REGISTRY.md:L2783-L2854). citeturn3view0turn3view2

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • Severity: P3. Impact: R pretrends parity is still deferred, so the R-parity claims remain provisional until the goldens are committed; this is already tracked in project debt and skip-gated in the new tests, so it is informational rather than blocking. Concrete fix: in PR-C, pin the audited pretrends revision, commit benchmarks/data/r_pretrends_golden.json, and enable TestPretrendsParityR (TODO.md:L97-L98, benchmarks/R/generate_pretrends_golden.R:L1-L22, tests/test_methodology_pretrends.py:L920-L940).

Security

No findings.

Documentation/Tests

  • Severity: P3. Impact: Several touched explanatory texts still lag the shipped PR-B contract: helper tests at tests/test_pretrends.py:L610-L635 still describe custom weights as unsupported from helper APIs, PreTrendsPowerResults.violation_weights is documented as always “normalized” even though fresh linear fits intentionally persist unnormalized |t|, and the changelog still says the methodology test/script are “coming in the next commit” plus describes _extract_pre_period_params as a 5-tuple instead of the current 6-tuple (diff_diff/pretrends.py:L289-L294, CHANGELOG.md:L16-L17, CHANGELOG.md:L29-L30). Concrete fix: update those docstrings/changelog lines to reflect the current helper API, the linear-weight persistence contract, and the extra covariance_source return value.

Verification note: I could not execute the numerical test suite in this environment because the project test dependencies are unavailable; I did verify AST parsing on the touched Python files.

R10 verdict ✅ Looks good with one P3 actionable: four stale
explanatory texts still describe pre-PR-B / pre-Step-6 behavior.

**1. `PreTrendsPowerResults.violation_weights` docstring**
   (`pretrends.py:289-294`): said weights are "normalized" without
   qualification, but PR-B Step 4's linear-with-`relative_times` path
   intentionally persists the **unnormalized** `|t|` direction so that
   `δ_pre = M · |t|` and the reported MDV equals Roth's γ exactly.
   Reworded to enumerate the normalization regime per violation_type:
   constant / last_period / custom / linear-without-relative_times are
   L2-normalized; linear-with-relative_times is unnormalized γ-unit.

**2. `tests/test_pretrends.py` helper-rejection tests**
   (`test_compute_pretrends_power_rejects_custom_violation_type` and
   `test_compute_mdv_rejects_custom_violation_type`, lines 610-635):
   docstrings said the helper "does not accept ``violation_weights``,
   so the custom path is unusable from the helper" — but PR-B Step 6
   added the kwarg to both helpers. Reworded to scope the rejection
   contract correctly: it's the unsupplied-`violation_weights` case
   (loud-fail) that still raises, not the entire custom path.

**3. CHANGELOG.md (Added) "Coming in the next commit"**
   (lines 16-17): both bullets said the test file and R script were
   "coming in the next commit". Removed — they were committed in
   PR-B Step 7 (test file) and Step 12 (R script) respectively.

**4. CHANGELOG.md (Changed) "5-tuple"** (line 30): described
   `_extract_pre_period_params` as widening from a 4-tuple to a
   5-tuple, but the R3 covariance-source-propagation fix later
   widened it to a 6-tuple `(effects, ses, vcov, n_pre,
   relative_times, covariance_source)`. Updated the bullet to
   describe the current 6-tuple shape and added a forward
   reference to the MPD pandas.Period / Timestamp / datetime64
   coercion path (R8 fix) and the covariance_source provenance
   contract (R3 fix).

The R-parity P3 remains deferred to PR-C per the existing TODO row
(codex labeled it informational / non-blocker).

97 pretrends tests + DR + BR tests pass; no source-logic changes
in this commit (docs + comments only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: b00782e3a1ae0e79846742725195b15c9668a1e8


Overall Assessment
✅ Looks good

Executive Summary

  • The prior re-review items on covariance provenance and custom-weight persistence appear resolved: PreTrendsPowerResults now carries covariance_source, pretest_form, and violation_weights, and DiagnosticReport / BusinessReport consume the persisted provenance instead of re-inferring it from result type alone.
  • I did not find an unmitigated P0/P1 in the touched methodology surfaces: diff_diff/pretrends.py, diff_diff/sun_abraham.py, and diff_diff/diagnostic_report.py.
  • The NIS implementation, full-Σ_22 CS/SA routing, γ-unit linear-weight change, and legacy downgrade behavior all line up with the updated methodology registry.
  • One P3 docs/tests issue remains: some touched wording still says pretest_form='wald' reproduces pre-PR-B output, which is no longer true for fitted linear results after the new relative_times weight construction.
  • R pretrends parity is still deferred, but it is explicitly tracked in TODO.md and the parity tests are skip-gated, so it is informational rather than blocking.
  • Verification note: this was a source-level review only. I could not execute the touched tests because this environment lacks pytest and the runtime deps (numpy, scipy, pandas).

Methodology
No findings. The changed estimator math and assumptions in diff_diff/pretrends.py / diff_diff/sun_abraham.py match the updated docs/methodology/REGISTRY.md contract for NIS, Wald-as-documented-alternative, full-Σ_22 routing, and γ-unit linear MDV.

Code Quality
No findings.

Performance
No findings.

Maintainability
No findings.

Tech Debt

  • Severity: P3. Impact: R pretrends parity is still provisional because the committed generator is intentionally a PR-C placeholder and TestPretrendsParityR remains skip-gated. This is already tracked and therefore non-blocking. References: TODO.md:L97-L97, METHODOLOGY_REVIEW.md:L1050-L1072, benchmarks/R/generate_pretrends_golden.R:L1-L223, tests/test_methodology_pretrends.py:L920-L989. Concrete fix: In PR-C, pin the audited R revision, replace the NA_real_ stubs, commit benchmarks/data/r_pretrends_golden.json, and unskip the parity tests.

Security
No findings.

Documentation/Tests

  • Severity: P3. Impact: The backward-compat wording still overstates what pretest_form='wald' preserves. CHANGELOG.md:L29-L30 and the comments/tests at tests/test_methodology_pretrends.py:L840-L875 say the Wald path reproduces pre-PR-B output, but the fitted linear path now intentionally uses actual relative_times and unnormalized |t| weights at diff_diff/pretrends.py:L867-L891. The authoritative registry already describes that changed weight contract at docs/methodology/REGISTRY.md:L2818-L2820, so the changelog/test wording is now the inconsistent part. Concrete fix: Reword those lines to say Wald preserves the acceptance-region form, and only preserves the old numerics on the legacy relative_times=None path if you want to keep a backwards-compat statement at all.

R11 verdict ✅ with one P3 actionable: the Wald backward-compat
wording was overstated. CHANGELOG.md:L29-L30 and the methodology
test docstrings at L840-L875 said `pretest_form='wald'` preserves
pre-PR-B output bit-identically, but PR-B Step 4's `relative_times`
threading changed the linear-weight contract for BOTH NIS and Wald
paths — a Wald fit on an irregular grid `{-5, -3, -1}` now produces
γ-unit MDV (not the pre-PR-B count-based L2-normalized MDV).

Pre-PR-B Wald numerics are bit-identical to post-PR-B Wald output
only on (a) the legacy `relative_times=None` callable path
(`_get_violation_weights(n_pre)` invoked directly without threading),
and (b) regular-grid fits where `|t| ∝ [n_pre-1, ..., 0]`.

Fix:

- **CHANGELOG.md (Changed) L29**: reword to scope the Wald
  backwards-compat claim to the **acceptance-region form** (the
  noncentral-χ² on δ' Σ_22^{-1} δ, unchanged), and add an explicit
  caveat that the linear-weight contract changed independently in
  Step 4. Bit-identity holds only on the legacy callable path and
  the regular-grid case. Removed the stale "tutorial 07 to be
  re-rendered TBD" qualifier; tracked as a follow-up.

- **`tests/test_compute_pretrends_power_accepts_pretest_form_wald`**
  docstring: clarify that the `'wald'` selector picks the
  acceptance-region form, not bit-identical numerical output on
  fitted results.

- **`test_wald_path_preserves_pre_pr_b_output`** renamed to
  `test_wald_path_preserves_pre_pr_b_acceptance_region_form`;
  docstring expanded with the two-regime backwards-compat scope
  (legacy callable path: bit-identical; new fit() path: γ-unit
  MDV on irregular grids). Added a positive NaN-check on
  `nis_box_probability` to lock the Wald-only field contract.

The R-parity P3 remains deferred to PR-C per the existing TODO row
(codex labeled it informational / non-blocker).

30 methodology + 100 baseline pretrends tests pass; no source-logic
changes (docs / docstrings only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 84e94d9ca793d507ae47b721021756c8f8f790e0


Overall Assessment
⚠️ Needs changes

Executive Summary

  • The estimator-side methodology changes in diff_diff/pretrends.py and diff_diff/sun_abraham.py mostly line up with the updated registry: NIS is now the default, CS/SA now route through full Σ_22 when available, custom weights are persisted, and covariance provenance is propagated correctly.
  • The prior P3 wording issue is mostly addressed, but one API example still overstates Wald backward compatibility.
  • Severity P1 [Newly identified]: after PR-B changed linear MDV to Roth’s γ units, downstream heuristics still treat raw mdv as a level-scale effect. DiagnosticReport/BusinessReport now divide a slope by a level ATT and use it to tier studies as well/moderately/underpowered.
  • R parity remains intentionally deferred and tracked in TODO.md; that part is non-blocking.
  • This was a source review only. I did not execute the touched tests here because this workspace does not have the runtime deps (numpy, scipy) installed.

Methodology

  • Severity: P1 [Newly identified]. Impact: PreTrendsPower now explicitly returns linear MDV in Roth’s γ units when relative_times are available (diff_diff/pretrends.py:L842-L855, docs/methodology/REGISTRY.md:L2818-L2820), but downstream code still compares raw mdv to coefficient-scale quantities. is_informative uses mdv < 2 * max(pre_period_ses) (diff_diff/pretrends.py:L335-L345), sensitivity_to_honest_did() reports mdv / max(SE) (diff_diff/pretrends.py:L1623-L1660), and the report path hard-codes violation_type="linear" then computes mdv_share_of_att = mdv / abs(att) for tiering (diff_diff/diagnostic_report.py:L1416-L1458, docs/methodology/REPORTING.md:L284-L317, user-facing prose at diff_diff/business_report.py:L2158-L2163). That now mixes a slope parameter with level effects/SEs, so the PR can silently over- or under-classify the same fit as well_powered/moderately_powered/underpowered; the new regression in tests/test_business_report.py:L2428-L2475 currently pins that unit-mismatched ratio. Concrete fix: derive a coefficient-scale summary from the fitted violation vector, e.g. max_abs_pre_violation = mdv * max(abs(violation_weights)), and use that instead of raw mdv everywhere the code compares against ATT or pre-period SEs (mdv_share_of_att, is_informative, mdv_in_ses). If the project prefers a different scalar summary, document it explicitly, but stop comparing raw γ to level-scale quantities.

Code Quality
No findings.

Performance
No findings.

Maintainability
No findings.

Tech Debt

  • Severity: P3. Impact: R pretrends parity is still intentionally deferred. The new generator still contains <PR-C-PIN> and NA_real_ stubs (benchmarks/R/generate_pretrends_golden.R:L1-L40, benchmarks/R/generate_pretrends_golden.R:L105-L124, benchmarks/R/generate_pretrends_golden.R:L185-L223), and the tracker still marks PreTrendsPower as “Complete (R parity pending)” (TODO.md:L97, METHODOLOGY_REVIEW.md:L83, METHODOLOGY_REVIEW.md:L1050-L1072). This is already tracked, so it is non-blocking. Concrete fix: land PR-C as planned: pin the audited R revision, generate benchmarks/data/r_pretrends_golden.json, and unskip TestPretrendsParityR.

Security
No findings.

Documentation/Tests

  • Severity: P3. Impact: the API docs still say users can “Opt back into the pre-PR-B Wald … form for backwards-compatible numerical output” (docs/api/pretrends.rst:L66-L69). That is still too broad after the γ-unit linear-weight change: fitted irregular-grid linear results can differ from pre-PR-B Wald numerics, as the registry now explains (docs/methodology/REGISTRY.md:L2818-L2820). Concrete fix: reword that example to say Wald preserves the acceptance-region form, and only preserves legacy numerics on the legacy relative_times=None path or regular-grid cases.

Path to Approval

  1. Replace the report/informativeness heuristics that use raw mdv (mdv_share_of_att, is_informative, mdv_in_ses) with a coefficient-scale quantity derived from the fitted violation vector, and add a regression on an irregular-grid linear fit so the ratio/tier cannot regress back to raw γ.
  2. Re-pin the new report-layer regression tests to the corrected scalar; tests/test_business_report.py:L2428-L2475 currently locks in the wrong mdv / |att| contract.
  3. Fix the remaining Wald backward-compat wording in docs/api/pretrends.rst.

R12 CI codex caught a real unit-mismatch bug the doc-rewording trail
uncovered. After PR-B Step 4 made linear `mdv` report Roth's γ units
(a slope on relative time), downstream code still divided it by
level-scale quantities — `DiagnosticReport.pretrends_power` computed
`mdv_share_of_att = mdv / abs(att)`, `is_informative` checked
`mdv < 2 * max(pre_period_ses)`, and `sensitivity_to_honest_did`
reported `mdv_in_ses = mdv / max_pre_se`. On an irregular grid
`[-5, -3, -1]`, the level-scale max pre-period violation under the
MDV is `mdv * 5`, but the raw `mdv` (γ) is what was being compared
to `|att|` / SE — silently mis-tiering the same fit.

This is the holistic-fix-on-repeated-doc-findings pattern: each
prior round (R5/R9/R10/R11) the codex flagged Wald backwards-compat
wording as "overstated" because the LINEAR-WEIGHT change bled into
surfaces the doc claimed were unaffected — but the wording wasn't
wrong, the IMPLEMENTATION was incomplete.

**Holistic fix — new level-scale property `max_abs_pre_violation`**

`PreTrendsPowerResults.max_abs_pre_violation` returns
`mdv * max(|violation_weights|)` — the largest level-scale pre-period
deviation under the MDV. This is the right unit-consistent scalar
for comparison against `|att|`, per-period SEs, and HonestDiD's M.

- For `linear` with `relative_times=[-T, ..., -1]`, weights = `|t|`,
  so `max_abs_pre_violation = mdv * T_max`.
- For `constant` with normalized `[1, ..., 1]/√K`, weights ~ 1/√K,
  so `max_abs_pre_violation = mdv / √K`.
- For `last_period` with `[0, ..., 0, 1]`, weights have max=1, so
  `max_abs_pre_violation = mdv`.
- For `custom`, depends on user-supplied vector.
- Legacy serialized results without `violation_weights` fall back to
  raw `mdv` (pre-PR-B count-based L2-normalized linear was already
  roughly level-scale, so the fallback gives the right magnitude).

**Wired through 3 surfaces:**

1. `PreTrendsPowerResults.is_informative`: uses
   `max_abs_pre_violation < 2 * max_se` instead of `mdv < 2 * max_se`.
2. `PreTrendsPower.sensitivity_to_honest_did`: reports
   `mdv_in_ses = max_abs_pre_violation / max_pre_se`, and surfaces
   `max_abs_pre_violation` as a new dict key.
3. `DiagnosticReport._check_pretrends_power` and
   `_format_precomputed_pretrends_power`: `mdv_share_of_att =
   max_abs_pre_violation / abs(att)`. Schema also surfaces the new
   `max_abs_pre_violation` field.

On `cs_fit` (`base_period='universal'`, seed=7, treatment_effect=1.5):
- pre_periods = `[-4, -3, -2]`, `max(|t|) = 4`
- `mdv` (γ) = 0.0937, `max_abs_pre_violation` = 0.375, `|att|` = 1.779
- pre-fix `mdv / |att|` = 0.053 (slope/level mismatch)
- post-fix `max_abs_pre_violation / |att|` = 0.211 (level/level)
- Tier: still `well_powered` (0.211 < 0.25), now interpretable.

**Documentation updated** to match:

- `docs/methodology/REPORTING.md` "Power-aware phrasing" Note: ratio
  definition changed from `mdv / abs(att)` to
  `max_abs_pre_violation / abs(att)`; rationale added inline.
- `docs/api/pretrends.rst` Wald example: rewrote the "backwards-
  compatible numerical output" wording to scope bit-identity to
  regular grids / legacy `relative_times=None` path. PR-B Step 4's
  `relative_times` threading applies to BOTH NIS and Wald, so a Wald
  fit on an irregular grid also produces γ-unit MDV — not pre-PR-B
  numerics.
- `CHANGELOG.md` (Fixed): new entry documenting the fix and the
  numerical-output change for downstream consumers.

**New regressions:**

- `test_max_abs_pre_violation_uses_weight_scale_on_irregular_grid`:
  constructs an irregular-grid `[-5, -3, -1]` fit and asserts
  `max_abs_pre_violation = mdv * 5`, with a guard that the value is
  > 4x the raw mdv to prevent any future revert.
- `test_is_informative_uses_level_scale_not_raw_gamma`: constructs
  a fit where raw `mdv < 2*SE` (would say "informative") but
  `max_abs_pre_violation > 2*SE` (says "not informative"); asserts
  the level-scale check wins.
- Updated `test_full_vcov_path_no_downgrade_on_real_cs_fit` (BR
  level): now pins `0.35 < block["max_abs_pre_violation"] < 0.40`
  and asserts `mdv_share_of_att < 0.25` against the new
  level-scale definition.

590 tests pass across pretrends + DR + BR + SA + staggered. 4
skipped (R-parity stubs + 1 fixture skip). No regressions outside
the targeted ratio definition change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: b053faac637405b59b2886e74a6b114b14a8a0bd


Overall Assessment

⚠️ Needs changes

Executive Summary

  • The prior rerun P1 on unit-mismatched linear MDV tiering looks resolved across the estimator, report layer, docs, and regressions (diff_diff/pretrends.py:L335-L385, diff_diff/pretrends.py:L1663-L1714, diff_diff/diagnostic_report.py:L1428-L1527, docs/methodology/REPORTING.md:L301-L309).
  • The PR’s main methodology changes otherwise line up with the updated registry: NIS is the default, CS/SA now route through full Σ_22 when available, custom weights are persisted, and covariance provenance is carried through to reporting (diff_diff/pretrends.py:L1116-L1171, diff_diff/sun_abraham.py:L782-L810, diff_diff/diagnostic_report.py:L1451-L1614).
  • Severity: P1 [Newly identified]. violation_type="constant" is now documented as δ_t = c / δ_t = M in the public methodology/docs, but the implementation still L2-normalizes the constant direction. That means each pre-period shift is actually M / √K, not M, so constant-pattern power/MDV remain on a different scale than the PR now claims.
  • R parity is still intentionally deferred and tracked; the generator and parity tests are PR-C stubs, which is non-blocking because the TODO tracking is explicit (TODO.md:L97, benchmarks/R/generate_pretrends_golden.R:L16-L24, tests/test_methodology_pretrends.py:L1057-L1126).
  • I did not execute the touched tests here; this workspace is missing runtime deps (numpy, scipy).

Methodology

  • Severity: P1 [Newly identified]. Impact: the PR now states that the constant violation is a per-period level shift, δ_t = c / δ_t = M, in docs/methodology/REGISTRY.md:L2812-L2816 and docs/api/pretrends.rst:L132-L135, but the implementation still normalizes constant weights to unit norm in diff_diff/pretrends.py:L932-L947, and power_at() preserves the same normalized contract in diff_diff/pretrends.py:L562-L579. The in-code docstring even reflects the normalized interpretation (diff_diff/pretrends.py:L370-L373). So for K pre-periods, the library currently evaluates δ_t = M / √K, not δ_t = M. That is an undocumented methodology mismatch on a public estimator surface. Concrete fix: either stop normalizing the constant direction so M truly is the per-period level shift, or add an explicit REGISTRY/API note that M is the magnitude along the normalized constant direction and update the public wording away from δ_t = M. Add an end-to-end regression that locks the chosen constant-pattern contract.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3. Impact: R-package parity remains deferred. The committed generator still contains <PR-C-PIN> / NA_real_ placeholders (benchmarks/R/generate_pretrends_golden.R:L16-L24, benchmarks/R/generate_pretrends_golden.R:L105-L123, benchmarks/R/generate_pretrends_golden.R:L189-L223), and the parity tests are still skip-gated stubs (tests/test_methodology_pretrends.py:L1057-L1126). This is already tracked in TODO.md:L97 and METHODOLOGY_REVIEW.md:L1050-L1072, so it is non-blocking. Concrete fix: land PR-C as planned.

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: the reporting math was corrected to use max_abs_pre_violation / |ATT|, but BusinessReport still labels that number as MDV / |ATT| in the rendered report (diff_diff/business_report.py:L2469-L2474), while the new definition lives in diff_diff/diagnostic_report.py:L1428-L1468 and docs/methodology/REPORTING.md:L301-L309. The numeric tier is right; the label is stale. Concrete fix: rename the rendered label to match the new numerator, or surface both raw mdv and max_abs_pre_violation explicitly.

Path to Approval

  1. Resolve the constant-pattern contract mismatch: either remove L2 normalization for violation_type="constant" in PreTrendsPower, or document the normalized-direction contract explicitly in REGISTRY.md and docs/api/pretrends.rst so the public method definition matches the shipped code.
  2. Add one regression that locks the chosen constant-pattern semantics end-to-end in fit() and power_at() for K > 1, so future audits cannot drift between “per-period shift” and “normalized-direction magnitude.”

R13 CI codex caught the next adjacent unit-mismatch — analogous to
R12's linear/level fix but for `violation_type='constant'`. REGISTRY
`## PreTrendsPower` documents the pattern as `δ_t = c` (per-period
level shift), but `_get_violation_weights('constant')` still
L2-normalized to `[1/√K, ..., 1/√K]`, so `δ_t = M/√K` not `δ_t = M`.

The Step 4 γ-unit fix for linear and the R12 holistic
`max_abs_pre_violation` fix uncovered this — once `mdv` is the
documented magnitude of a per-period shift on linear, the constant
pattern's silent `1/√K` re-scaling becomes visible and breaks the
documented contract.

**Holistic fix — constant/last_period skip L2 normalization**

- `_get_violation_weights('constant')`: early return `np.ones(n_pre)`,
  no L2 norm. Now `δ_t = M` exactly per period — matching the
  REGISTRY/API documented contract.
- `_get_violation_weights('last_period')`: also given an explicit
  early return for symmetry. The `[0, ..., 0, 1]` vector already had
  L2 norm 1, so this is a no-op numerically; the early return locks
  the contract uniformly across level-pattern violation types.
- `power_at()` legacy reconstruction (fallback for old serialized
  results without `violation_weights`): unchanged — still
  L2-normalizes, preserving pre-PR-B numerical output for legacy
  serialized fits per the same backwards-compat policy applied to
  the linear-legacy and constant-legacy paths.
- Docstring on `_get_violation_weights` rewritten to enumerate the
  per-type normalization convention explicitly: linear-with-times
  (γ-unit), linear-legacy (L2-norm), constant (level), last_period
  (level), custom (L2-norm).

**End-to-end regression** in `test_methodology_pretrends.py`:

- `test_constant_violation_pattern_is_level_shift`: real
  `SunAbraham`-fit results, asserts `violation_weights == [1, ..., 1]`
  (NOT L2-normalized → `||w||_2 = √K`), `max_abs_pre_violation == mdv`
  (level-scale and γ-scale coincide for constant), and
  `power_at(M) ≈ refit(M=M).power` at `atol=1e-4` so the
  level-shift contract holds end-to-end through `fit()` and
  `power_at()`. Codex specifically requested an end-to-end lock so
  future audits cannot drift between "per-period shift" and
  "normalized-direction magnitude".

- `test_constant_weights` in `tests/test_pretrends.py` flipped: was
  pinning `np.linalg.norm(weights) == 1.0` (the OLD L2-normalized
  contract); now asserts unnormalized `[1, 1, 1, 1]` with
  `||w||_2 = 2` (√4). Docstring explains the contract change.

**P3 fix — BR rendered label**

`BusinessReport.full_report()` still labeled the
`mdv_share_of_att` row as `MDV / |ATT|`, but R12 redefined the
numerator as `max_abs_pre_violation`. Fix: rename the rendered
label to "Max pre-period level deviation / |ATT|" and add an
explicit row for `Max pre-period level deviation at MDV` above it
so users can see both the raw `mdv` and the level-scale scalar.

**Behavior change for users:** any caller passing
`violation_type='constant'` will see a √K-factor change in the
reported `mdv` and downstream `mdv_share_of_att`. The shift is
documented in REGISTRY `## PreTrendsPower` (the pattern was
`δ_t = c` all along — the IMPLEMENTATION is now what the docs
have always said).

445 tests pass across pretrends + DR + BR + SA. 4 skipped (R-parity
+ 1 fixture skip).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 049f04ced170d3788c9dbccf83115cd17ab295b7


Overall Assessment

✅ Looks good

Executive Summary

  • The prior rerun P1 on the constant-pattern contract appears resolved across the estimator, registry, API docs, and regressions: constant violations are now treated as a true level shift rather than an L2-normalized direction. diff_diff/pretrends.py:L951-L959, docs/methodology/REGISTRY.md:L2812-L2820, tests/test_methodology_pretrends.py:L527-L563
  • I did not find any new unmitigated P0/P1 methodology issues in the estimator changes. The NIS/Wald split, full-Σ_22 routing for non-bootstrap CS/SA, γ-unit linear MDV path, and custom-weight persistence all line up with the updated registry.
  • Severity P2: the new max_abs_pre_violation metric is computed in DiagnosticReport and expected by the BR full-report renderer, but _lift_pre_trends() never carries it into the BusinessReport schema. That means the new raw level-scale metric is still dropped on the BR path. diff_diff/diagnostic_report.py:L1466-L1477, diff_diff/business_report.py:L924-L931, diff_diff/business_report.py:L2469-L2481
  • Severity P3: several narrative/doc surfaces still describe the tier numerator as raw “MDV” even though the code now uses max_abs_pre_violation / |ATT|; a couple of in-code docs also still describe the old constant-normalization contract. diff_diff/business_report.py:L2157-L2167, diff_diff/diagnostic_report.py:L3279-L3286, diff_diff/pretrends.py:L289-L300, diff_diff/pretrends.py:L370-L373
  • R parity is still intentionally deferred and explicitly tracked, so that remains non-blocking. TODO.md:L97-L98, benchmarks/R/generate_pretrends_golden.R:L1-L24, tests/test_methodology_pretrends.py:L1095-L1135
  • I could not run the touched tests here because this workspace is missing pytest and numpy.

Methodology

  • No findings. The estimator-side changes now match the updated REGISTRY.md contract closely enough that I do not see an unmitigated P1+ methodology defect in the touched paths.

Code Quality

  • Severity P2. Impact: DiagnosticReport now computes and surfaces max_abs_pre_violation, and _render_full_report() is prepared to print it, but BusinessReport._lift_pre_trends() drops the field entirely. As written, BR schema consumers and full_report() still cannot see the new raw level-scale metric even though the PR added renderer logic for it. diff_diff/diagnostic_report.py:L1466-L1477, diff_diff/business_report.py:L924-L931, diff_diff/business_report.py:L2469-L2481
    Concrete fix: add "max_abs_pre_violation": pp.get("max_abs_pre_violation") in _lift_pre_trends(), then add an end-to-end BR regression on both to_dict()["pre_trends"] and full_report().

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity P3. Impact: R-package parity is still deferred; the generator is intentionally placeholder-only and the parity suite is skip-gated until PR-C lands. This is already tracked in TODO.md, so it is non-blocking. TODO.md:L97-L98, benchmarks/R/generate_pretrends_golden.R:L1-L24, tests/test_methodology_pretrends.py:L1095-L1135
    Concrete fix: land PR-C with a pinned pretrends revision, committed JSON goldens, and activated parity assertions.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the numeric tiering now uses max_abs_pre_violation / |ATT|, but some user-facing prose and in-code docs still say “MDV” is what is being compared to the ATT, and the autosummary page still omits the new property. That leaves the reporting explanation partially out of sync with the actual metric, especially on irregular-grid linear fits where raw MDV is in γ units. diff_diff/business_report.py:L2157-L2167, diff_diff/diagnostic_report.py:L3279-L3286, diff_diff/pretrends.py:L289-L300, diff_diff/pretrends.py:L370-L373, docs/api/_autosummary/diff_diff.PreTrendsPowerResults.rst:L24-L47
    Concrete fix: rename the summary prose to reference the “max pre-period level deviation at the MDV,” update the stale constant-weight docstrings, add ~PreTrendsPowerResults.max_abs_pre_violation to the autosummary, and extend the report tests to assert the renamed metric end-to-end.

R14 verdict ✅ Looks good with one P2 BR-schema-gap + a P3 fan-out
of stale "MDV vs ATT" prose that the R12 level-scale fix needed to
sweep through but missed.

**P2 — `_lift_pre_trends` dropped `max_abs_pre_violation`**

R12 added `max_abs_pre_violation` to the `DiagnosticReport` schema
and wired the `BusinessReport.full_report()` renderer to print it,
but the BR schema-lift helper at `business_report.py:884` did NOT
carry the field across the DR → BR boundary. Net effect:
`BR.to_dict()["pre_trends"]` was missing the field AND
`full_report()`'s `pt.get("max_abs_pre_violation")` returned None,
so the new "Max pre-period level deviation at MDV" line never
actually rendered.

Fix: add `"max_abs_pre_violation": pp.get("max_abs_pre_violation")`
to the `_lift_pre_trends` return dict. New BR end-to-end regression
asserts both `BR.to_dict()["pre_trends"]["max_abs_pre_violation"]`
is populated AND `full_report()` contains the rendered line.

**P3 — Stale "MDV / |ATT|" prose in 4 surfaces**

R12 moved the tier numerator from raw `mdv` to `max_abs_pre_violation`
but several user-facing prose surfaces still said the comparison
was between "MDV" and "estimated effect" — wording lag, not a
behavioral bug.

1. `business_report.py:2167` "the test is well-powered" summary
   sentence: reworded to say "the max pre-period level deviation at
   the MDV is small relative to the estimated effect" rather than
   the bare "minimum-detectable violation is small".
2. `diagnostic_report.py:3284` DR "no_detected_violation /
   well_powered" sentence: same swap from "MDV is a small share of
   the estimated effect" to "the max pre-period level deviation at
   the MDV is a small share".
3. `PreTrendsPowerResults.violation_weights` docstring: reworded to
   enumerate per-violation_type normalization explicitly (linear
   with-times γ-unit; linear legacy L2-norm; constant unnormalized
   level-shift; last_period level; custom L2-norm).
4. `PreTrendsPowerResults.max_abs_pre_violation` property docstring
   (the non-linear-types paragraph): updated to reflect the R13
   constant-level-shift change (`mdv * 1 = mdv` rather than the old
   `mdv / √K`).

Plus the autosummary RST adds the new
`~PreTrendsPowerResults.max_abs_pre_violation` property entry so
the published API page lists it.

R-parity P3 deferred to PR-C per the existing TODO row.

591 tests pass; no code-path regressions. The new BR regression
catches the lift-boundary bug if it ever regresses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 853e52336e44992b5730bbc476004a905077da0a


Overall Assessment

✅ Looks good

Executive Summary

  • The prior rerun P2 is resolved: BusinessReport._lift_pre_trends() now carries max_abs_pre_violation, and the BR full report renders the level-scale metric instead of dropping it at the lift boundary. diff_diff/business_report.py:L884-L939, diff_diff/business_report.py:L2477-L2489
  • The prior rerun P3 doc drift is resolved on the changed surfaces: reporting prose, PreTrendsPowerResults docs, and autosummary now match the level-scale ratio contract. diff_diff/diagnostic_report.py:L3279-L3287, diff_diff/business_report.py:L2164-L2170, docs/api/_autosummary/diff_diff.PreTrendsPowerResults.rst:L24-L48
  • On methodology, the changed estimator paths line up with the updated registry: NIS is the documented default, Wald remains the documented alternative, non-bootstrap CS/SA now route full Σ_22, linear MDV uses Roth’s γ units when relative times are available, and custom weights persist through power_at(). diff_diff/pretrends.py:L39-L237, diff_diff/pretrends.py:L876-L1595, diff_diff/sun_abraham.py:L782-L810, docs/methodology/REGISTRY.md:L2773-L2854
  • The covariance-provenance fix is correctly persisted at fit time and honored by both live and precomputed report paths, with the legacy conservative fallback still preserved for ambiguous old serialized objects. diff_diff/pretrends.py:L1576-L1594, diff_diff/diagnostic_report.py:L1451-L1534, diff_diff/diagnostic_report.py:L1537-L1614
  • Severity P3 only: R-package parity is still intentionally deferred to PR-C and is properly tracked, so it is non-blocking. TODO.md:L97-L98, METHODOLOGY_REVIEW.md:L1050-L1072, benchmarks/R/generate_pretrends_golden.R:L1-L24
  • I could not execute the touched tests here because this environment does not have pytest, numpy, scipy, or pandas.

Methodology

  • No findings. The changed implementation matches the updated Methodology Registry on the load-bearing points: NIS box probability, Wald alternative, full-Σ_22 routing, γ-unit linear weights, and custom-weight persistence. docs/methodology/REGISTRY.md:L2775-L2854, diff_diff/pretrends.py:L1225-L1595

Code Quality

  • No findings. The previous BR lift-boundary regression is fixed and covered by an end-to-end CS regression. diff_diff/business_report.py:L884-L939, tests/test_business_report.py:L2425-L2515

Performance

  • No findings.

Maintainability

  • No findings. Persisting covariance_source on PreTrendsPowerResults removes the old type-based guesswork on new fits and confines fallback inference to true legacy cases. diff_diff/pretrends.py:L323-L333, diff_diff/pretrends.py:L1576-L1594, diff_diff/diagnostic_report.py:L1511-L1614

Tech Debt

  • Severity P3. Impact: R pretrends parity is still deferred; the committed generator is intentionally a placeholder with NA_real_ stubs and an unresolved <PR-C-PIN>, so the new parity class remains skip-gated. This is already tracked in TODO.md / METHODOLOGY_REVIEW.md, so it is non-blocking. Concrete fix: land PR-C with a pinned pretrends revision, committed benchmarks/data/r_pretrends_golden.json, and activated tests/test_methodology_pretrends.py::TestPretrendsParityR. benchmarks/R/generate_pretrends_golden.R:L1-L24, benchmarks/R/generate_pretrends_golden.R:L185-L223, TODO.md:L97-L98, METHODOLOGY_REVIEW.md:L1050-L1072

Security

  • No findings.

Documentation/Tests

  • No findings in the changed surfaces. The report prose, autosummary, API docs, and new regression coverage are aligned with the new level-scale metric and covariance-provenance contract. docs/methodology/REPORTING.md:L284-L365, docs/api/pretrends.rst:L57-L143, tests/test_diagnostic_report.py:L380-L591, tests/test_methodology_pretrends.py:L882-L1008
  • Validation note: I could not run the touched tests here because pytest, numpy, scipy, and pandas are unavailable.

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant