Merged
Commits
21 commits
ebe1f69
PreTrendsPower PR-B Step 2: NIS test form + result-class extension + …
igerber May 18, 2026
d6c4ed9
PreTrendsPower PR-B Step 6: test fixes for NIS default flip
igerber May 18, 2026
16ae235
PreTrendsPower PR-B Step 3 (SA): extend SunAbrahamResults with event_…
igerber May 18, 2026
25fb598
PreTrendsPower PR-B Step 3 (CS+SA routes): consume event_study_vcov
igerber May 18, 2026
f6fa28a
PreTrendsPower PR-B Step 4: linear weights honor relative_times → γ-u…
igerber May 18, 2026
34f6bfb
PreTrendsPower PR-B Steps 8-11: REGISTRY refresh + METHODOLOGY_REVIEW…
igerber May 18, 2026
70b3b04
PreTrendsPower PR-B Step 7: NEW tests/test_methodology_pretrends.py
igerber May 18, 2026
0129815
PreTrendsPower PR-B Step 12: NEW benchmarks/R/generate_pretrends_gold…
igerber May 18, 2026
8a3624d
Address R1 review (2 P0 + 1 P1 + 1 P2) on PreTrendsPower PR-B
igerber May 18, 2026
9dc4678
Address R2 review (1 P0 + 2 P1 + 1 P2) on PreTrendsPower PR-B
igerber May 18, 2026
66654fc
Address R3 review (1 P1 + 2 P3) on PreTrendsPower PR-B
igerber May 19, 2026
e0156bb
Address R4 review (1 P1 + 2 P3) on PreTrendsPower PR-B
igerber May 19, 2026
5fb4aa7
Address R5 review (1 P2 + 2 P3) on PreTrendsPower PR-B
igerber May 19, 2026
da2a7bd
Address R6 review (2 P3) on PreTrendsPower PR-B
igerber May 19, 2026
cfb3200
Address CI R8 codex review (1 P1 + 1 P3) on PreTrendsPower PR-B
igerber May 19, 2026
02b74a8
Address CI R9 codex review (1 P3) on PreTrendsPower PR-B
igerber May 19, 2026
b00782e
Address CI R10 codex review (1 P3) on PreTrendsPower PR-B
igerber May 19, 2026
84e94d9
Address CI R11 codex review (1 P3) on PreTrendsPower PR-B
igerber May 19, 2026
b053faa
Address CI R12 P1: level-scale ratio for pretrends tier classification
igerber May 19, 2026
049f04c
Address CI R13 P1: constant violation pattern is now a true level shift
igerber May 19, 2026
853e523
Address CI R14 (P2 + P3) on PreTrendsPower PR-B
igerber May 19, 2026
13 changes: 13 additions & 0 deletions CHANGELOG.md

Large diffs are not rendered by default.

31 changes: 21 additions & 10 deletions METHODOLOGY_REVIEW.md
@@ -80,7 +80,7 @@ The catalog grew incrementally over several quarters, so formats vary across the
|------|--------|-------------|--------|-------------|
| BaconDecomposition | `bacon.py` | `bacondecomp::bacon()` | **Complete** | 2026-05-16 |
| HonestDiD | `honest_did.py` | `HonestDiD` package | **Complete** | 2026-04-01 |
| PreTrendsPower | `pretrends.py` | `pretrends` package | **In Progress** | — |
| PreTrendsPower | `pretrends.py` | `pretrends` package | **Complete** (R parity pending) | 2026-05-18 |
| PowerAnalysis | `power.py` | `pwr` / `DeclareDesign` | **In Progress** | — |
| PlaceboTests | `diagnostics.py` | (no canonical reference) | **In Progress** | — |

@@ -1047,18 +1047,29 @@ and covariate-adjusted specifications.)
| Module | `pretrends.py` |
| Primary Reference | Roth (2022), *Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends*, AER:I 4(3), 305-322 |
| R Reference | `pretrends` package |
| Status | **In Progress** |
| Last Review | |
| Status | **Complete** (R parity pending) |
| Last Review | 2026-05-18 |

**Documentation in place:**
- REGISTRY.md section: `## PreTrendsPower` (MDV at target power, four violation types — linear/constant/last_period/custom, power curve plotting, HonestDiD integration)
- Implementation: `tests/test_pretrends.py` (point-estimator, MDV, power curve, sensitivity) plus event-study coverage in `tests/test_pretrends_event_study.py`
- Paper review on file: `docs/methodology/papers/roth-2022-review.md` (added 2026-05-17; non-authoritative source audit — registry entry remains authoritative until the follow-up audit PR)
- REGISTRY.md section: `## PreTrendsPower` — NIS-framed audit per Roth (2022) Section II.A-B with full equation blocks for both NIS and Wald forms; paper-supported alternative + γ-unit MDV + full-Σ_22 routing all locked.
- Paper review on file: `docs/methodology/papers/roth-2022-review.md` (added 2026-05-17 via PR #463).
- Implementation: `tests/test_pretrends.py` (67 tests — point-estimator, MDV, power curve, sensitivity, plus the PR-A R18 silent-failure regression and the PR-B custom-weight persistence regression) + event-study coverage in `tests/test_pretrends_event_study.py` (27 tests).
- Dedicated `tests/test_methodology_pretrends.py` (added 2026-05-18 in PR-B Step 7) — Roth (2022) Section II.A-B paper-equation-numbered Verified Components walk-through (8 classes, 30-40 tests covering NIS box probability, Wald-vs-NIS, Propositions 1-4 simulation parity, linear-units γ-scale, custom-weight persistence, CS/SA full-VCV, helper API).

**Outstanding for promotion:**
- Dedicated `tests/test_methodology_pretrends.py` with paper-equation-numbered Verified Components walk-through
- R parity fixture against the `pretrends` R package at a **pinned revision** (TODO.md tracks the revision-pin follow-up; until that lands, the R-package surface claims in `docs/methodology/papers/roth-2022-review.md` are provisional). Covers the four power calculations: linear, constant, last-period, custom. Note that `compute_pretrends_power` does not accept `violation_weights` today, so `"custom"` parity has to run through `PreTrendsPower(..., violation_weights=...)` directly until the helper is extended (TODO.md tracks the helper-extension follow-up); helper-only parity is limited to `linear` / `constant` / `last_period`.
- Verify the REGISTRY Implementation Checklist (all four items currently unchecked)
**Verified Components:**
- [x] NIS box probability implemented via `scipy.stats.multivariate_normal.cdf` (Roth Section II.A-B primary form)
- [x] Wald noncentral-χ² form retained as paper-supported alternative (Propositions 1+3+4 all apply — convex ellipsoid acceptance region)
- [x] Both forms produce form-consistent MDV via doubling + brentq bisection with 1000-cap non-convergence fallback
- [x] Non-bootstrap CS adapter consumes full `event_study_vcov` sub-block (not diag)
- [x] Non-bootstrap SA adapter consumes full `event_study_vcov` sub-block (W-matrix construction `event_study_vcov = W @ vcov_cohort @ W.T` added to `SunAbrahamResults`)
- [x] Bootstrap CS/SA and replicate-weight survey paths fall through to `diag(ses^2)` (analytical VCV cleared to prevent mixing with bootstrap/replicate SE overrides)
- [x] `_get_violation_weights('linear')` honors actual pre-period relative-time labels via `fit()` threading → reported MDV is in Roth's γ units on irregular and anticipation-shifted grids. For `MultiPeriodDiDResults`, supported label types are numeric (`int` / `float` / `np.int64`) and `pandas.Period` / `pandas.Timestamp` / `np.datetime64`; **genuinely non-numeric labels** (string period IDs, unranked categoricals) emit an explicit `UserWarning` and fall through to the legacy count-based normalized direction (MDV is NOT in γ units in that case — re-fit with numeric labels)
- [x] `PreTrendsPowerResults` persists fitted `violation_weights` + `pretest_form` + `nis_box_probability`; `power_at(M)` works for all four violation types on fresh fits
- [x] Helper API (`compute_pretrends_power`, `compute_mdv`) accepts `violation_weights` and `pretest_form`; closes the PR-A R18 helper/class API gap
- [x] Summary, `to_dict`, `to_dataframe` dispatch on `pretest_form` (NIS prints box probability; Wald prints noncentrality)
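
As a rough illustration of the first three Verified Components and the γ-unit linear weights, the sketch below computes pre-test power under both forms and searches for an MDV. It is not the code in `pretrends.py`: the function names `nis_power`, `wald_power`, and `mdv`, the starting bracket, and the `max_doublings` cap are assumptions made for this example, and the `lower_limit` argument requires SciPy 1.10 or later.

```python
import numpy as np
from scipy import stats
from scipy.optimize import brentq


def nis_power(delta, sigma, alpha=0.05):
    """P(at least one pre-period coefficient is individually significant)
    when the pre-period estimates are N(delta, sigma)."""
    delta = np.asarray(delta, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    upper = stats.norm.ppf(1 - alpha / 2) * np.sqrt(np.diag(sigma))
    # Box probability P(-c_j <= beta_j <= c_j for all j); uses the full sigma.
    box = stats.multivariate_normal.cdf(
        upper, mean=delta, cov=sigma, lower_limit=-upper
    )
    return 1.0 - float(box)


def wald_power(delta, sigma, alpha=0.05):
    """Rejection probability of the joint Wald test via the noncentral chi-square."""
    delta = np.asarray(delta, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    k = delta.size
    ncp = float(delta @ np.linalg.solve(sigma, delta))
    if ncp == 0.0:
        return alpha  # central case: the test has size alpha by construction
    crit = stats.chi2.ppf(1 - alpha, df=k)
    return float(stats.ncx2.sf(crit, df=k, nc=ncp))


def mdv(power_fn, weights, sigma, target_power=0.8, alpha=0.05, max_doublings=60):
    """Smallest scale M with power_fn(M * weights) == target_power.

    Doubles an upper bracket until the target is reached, then root-finds
    with brentq. Returns nan if the bracket never reaches the target
    (the cap here is an illustrative choice).
    """
    weights = np.asarray(weights, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    def gap(m):
        return power_fn(m * weights, sigma, alpha) - target_power

    hi = float(np.sqrt(np.max(np.diag(sigma))))  # start at one standard error
    for _ in range(max_doublings):
        if gap(hi) >= 0:
            break
        hi *= 2.0
    else:
        return float("nan")
    return brentq(gap, 0.0, hi, xtol=1e-8)


# Linear violation delta_t = gamma * t on an irregular pre-period grid:
# using the relative times themselves as weights keeps the result in gamma units.
sigma = np.eye(3)
rel_times = np.array([-5.0, -3.0, -1.0])
gamma_wald = mdv(wald_power, rel_times, sigma)
```

Passing `nis_power` instead of `wald_power` gives the NIS-form MDV with the same search. Because the weights are the actual relative times rather than a normalized count-based direction, `gamma_wald` can be read directly as a slope.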

**Outstanding for promotion to fully Complete:**
- R parity fixture against the `pretrends` R package at a **pinned revision** (deferred to PR-C). The generator script `benchmarks/R/generate_pretrends_golden.R` is committed in PR-B with a placeholder commit reference; PR-C will install the package, generate the JSON goldens at `benchmarks/data/r_pretrends_golden.json`, activate `TestPretrendsParityR` (currently skips when goldens missing), and record the audited R-package revision. Until that lands, the R-package surface claims in `docs/methodology/papers/roth-2022-review.md` Gaps section remain provisional.
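
The "skips when goldens missing" behavior described above can be sketched as follows. This is a minimal stand-in, assuming nothing about the real `TestPretrendsParityR` beyond what the text states: the class name `GoldenGatedParity`, the helpers `load_goldens` and `run_gated`, and the use of `unittest` (chosen to keep the example standard-library only) are all hypothetical, and the layout of the JSON file is deliberately left unspecified.

```python
import json
import unittest
from pathlib import Path

# Path taken from the review text; the JSON layout inside it is not specified there.
GOLDEN_PATH = Path("benchmarks/data/r_pretrends_golden.json")


def load_goldens(path):
    """Return the parsed goldens, or None when the file has not been generated yet."""
    if not path.exists():
        return None
    return json.loads(path.read_text())


class GoldenGatedParity(unittest.TestCase):
    """Hypothetical stand-in for a parity class gated on committed goldens."""

    golden_path = GOLDEN_PATH

    def setUp(self):
        self.goldens = load_goldens(self.golden_path)
        if self.goldens is None:
            # Skip rather than fail, so the suite stays green until goldens land.
            self.skipTest(f"R goldens missing at {self.golden_path}")

    def test_goldens_are_nonempty(self):
        self.assertTrue(self.goldens)


def run_gated(path):
    """Run the gated class against a specific golden path and return the result."""
    case = type("Case", (GoldenGatedParity,), {"golden_path": Path(path)})
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With this shape, committing the JSON file is enough to activate the class; no test code needs to change when the goldens arrive.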

---

8 changes: 3 additions & 5 deletions TODO.md
@@ -94,11 +94,9 @@ Deferred items from PR reviews that were not addressed before merge.
| WooldridgeDiD: aggregation weights use cell-level n_{g,t} counts. Paper (W2025 Eqs. 7.2-7.4) defines cohort-share weights. Add optional `weights="cohort_share"` parameter to `aggregate()`. | `wooldridge_results.py` | #216 | Medium |
| WooldridgeDiD: optional *efficiency hint* (NOT a canonical-link violation per W2023 Prop 3.1) when method/outcome pairing is sub-optimal — e.g., `method="ols"` on binary data is consistent under QMLE, but `method="logit"` is typically more efficient. The original framing in this row as a "canonical link requirement" tied to Prop 3.1 was incorrect: Wooldridge (2023) Table 1 lists Gaussian/OLS for "any response" and logistic-Bernoulli for "binary OR fractional". A useful hint exists (efficiency), but should not be framed as a methodology violation. See PR #453 R1 review for the corrected reading. | `wooldridge.py` | #216 | Low |
| WooldridgeDiD: Stata `jwdid` golden value tests — add R/Stata reference script and `TestReferenceValues` class. | `tests/test_wooldridge.py` | #216 | Medium |
| PreTrendsPower: `compute_pretrends_power` adapter uses `diag(ses^2)` instead of the full pre-period covariance block Σ_22 for `CallawaySantAnnaResults` (deliberate — non-bootstrap CS persists `event_study_vcov`; bootstrap CS fits clear it at `staggered.py:2032-2036`) and `SunAbrahamResults` (forced — SA does not expose an event-study/cohort VCV at all). Roth (2022)'s NIS box probability and the library's Wald object both depend on Σ_22 off-diagonals; diag fallback is not provably conservative. For non-bootstrap CS fits, route through `event_study_vcov`; for bootstrap CS fits the diag fallback is the only path. For SA, extend `SunAbrahamResults` to persist a cohort/event-study VCV (then route the adapter likewise). Or formally retain the diag fallback with explicit miscalibration framing. See REGISTRY.md `## PreTrendsPower` Note (deviation from paper) + `docs/methodology/papers/roth-2022-review.md`. | `diff_diff/pretrends.py:609-687`, `diff_diff/sun_abraham.py:30-88`, `docs/methodology/REGISTRY.md`, `docs/methodology/papers/roth-2022-review.md` | PR-A (Roth paper review, 2026-05-17) | Medium |
| PreTrendsPower: pin the R `pretrends` package commit/release before building the R-parity fixture. The paper review's R-package surface claims (`pretrends()`, `slope_for_power()`, NIS-only API, no joint-Wald target) are provisional pending a pinned revision; the audited revision should be recorded either in the review file's Gaps section or in this TODO row before any parity assertions are committed. | `docs/methodology/papers/roth-2022-review.md`, `METHODOLOGY_REVIEW.md` (PreTrendsPower row) | PR-A (Roth paper review, 2026-05-17) | Low |
| PreTrendsPower: helper `compute_pretrends_power(results, M, alpha, target_power, violation_type, pre_periods)` does NOT accept `violation_weights`, so `violation_type="custom"` is unusable from the helper (class-only today via `PreTrendsPower(..., violation_weights=...)`). Either add `violation_weights` to the helper signature and forward to the class, or document the helper as supporting only `linear` / `constant` / `last_period`. | `diff_diff/pretrends.py:1048-1095, 442-466` | PR-A (Roth paper review, 2026-05-17) | Low |
| PreTrendsPower: `PreTrendsPowerResults.power_at()` does not yet support `violation_type="custom"`. **Silent-failure path was mitigated** in PR-A (2026-05-17, R18 of the codex review): `power_at()` now raises `NotImplementedError` for custom fits rather than returning equal-weights output, locked in by `test_power_at_raises_on_custom_violation_type`. Remaining follow-up: persist the normalized fitted `violation_weights` on `PreTrendsPowerResults` (currently absent at `pretrends.py:77-90`) and re-enable `power_at()` for custom fits, with a parity test comparing `results.power_at(M)` to a fresh `PreTrendsPower(...).fit(..., M=M).power` on a custom-weights fixture. | `diff_diff/pretrends.py:77-90, ~196-235, ~878-892` | PR-A (Roth paper review, 2026-05-17) | Medium |
| PreTrendsPower: `linear` violation pattern does NOT implement Roth's δ_t = γ·t. `_get_violation_weights(violation_type="linear")` constructs a shifted, normalized `[n-1, ..., 1, 0]` direction from `n_pre` only (`pretrends.py:510-515`), and `fit()` never threads actual relative-time labels into that construction (`pretrends.py:862-866`). For irregular pre-period grids (e.g., anticipation-shifted `t ∈ {-5, -3, -1}`) this means the slope reported as MDV is not in Roth's γ units. Fix: build linear weights from the sorted actual relative-time values used in the fit, define the exposed parameter in γ units, persist any normalization separately, and add a regression test using anticipation-shifted / irregular pre-periods. If the shifted convention is intentional, add a `**Note (deviation from paper):**` to REGISTRY.md and convert reported MDV back to Roth's slope scale before exposing it. | `diff_diff/pretrends.py:488-531, 862-866`, `docs/methodology/REGISTRY.md:2786-2789` | PR-A (Roth paper review, 2026-05-17; surfaced by R17 of the iterative codex review on the paper review file) | **High** |
| PreTrendsPower R parity goldens (PR-C): pin the R `pretrends` package commit/release, run `benchmarks/R/generate_pretrends_golden.R` (committed in PR-B), commit the JSON goldens at `benchmarks/data/r_pretrends_golden.json`, activate the `TestPretrendsParityR` class in `tests/test_methodology_pretrends.py` (currently skips when goldens missing), and flip the METHODOLOGY_REVIEW.md `PreTrendsPower` row from `**Complete** (R parity pending)` → `**Complete**`. Until that lands, the R-package surface claims in `docs/methodology/papers/roth-2022-review.md` remain provisional. | `benchmarks/R/generate_pretrends_golden.R`, `benchmarks/data/r_pretrends_golden.json` (new), `tests/test_methodology_pretrends.py::TestPretrendsParityR`, `METHODOLOGY_REVIEW.md` (PreTrendsPower row) | PR-C (PreTrendsPower R parity) | Low |
<!-- The remaining four PR-A-tagged PreTrendsPower rows (CS/SA Σ_22 fidelity, helper `violation_weights`, custom-weight persistence, linear γ-unit MDV) were all resolved in PR-B 2026-05-18 — see CHANGELOG.md [Unreleased] Added/Changed/Fixed entries for the new behavior. -->

| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 8 standalone estimators that expose `cluster=`: `CallawaySantAnna`, `SunAbraham`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added `vcov_type` to the `DifferenceInDifferences` inheritance chain only. | multiple | Phase 1a | Medium |
| Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster) currently raises `NotImplementedError`. `_compute_bm_dof_from_contrasts` builds its hat matrix from the unscaled design via `X (X'WX)^{-1} X' W`, but `solve_ols` solves the WLS problem by transforming to `X* = sqrt(w) X`, so the correct symmetric idempotent residual-maker is `M* = I - sqrt(W) X (X'WX)^{-1} X' sqrt(W)`. Rederive the Satterthwaite `(tr G)^2 / tr(G^2)` ratio on the transformed design and add weighted parity tests before lifting the guard. | `linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_validate_vcov_args` | Phase 1a | Medium |
| HC2 / HC2 + Bell-McCaffrey on absorbed-FE fits — REMAINING sub-gate: `TwoWayFixedEffects` (`twfe.py:154` rejects unconditionally). The DiD sub-gate and the MultiPeriodDiD sub-gate were both lifted via auto-route to `fixed_effects=` internally (DiD: PR #458, ~1e-10 vs clubSandwich; MPD: this release, ~1e-10 vs sandwich::vcovHC and clubSandwich::vcovCR). TWFE has no equivalent `fixed_effects=` code path (always within-transforms), so the same auto-route surgery is not directly applicable — lifting requires either building the full-dummy design inline or refactoring TWFE to delegate to DiD. Within-transformation preserves coefficients and residuals under FWL but not the hat matrix; HC1/CR1 are unaffected (no leverage term). | `twfe.py::fit` | follow-up | Medium |
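
The algebra behind the weighted Bell-McCaffrey row above can be checked numerically. The sketch below, using a small random design invented for illustration (none of the variable names come from `linalg.py`), compares the unscaled projection `X (X'WX)^{-1} X' W` with the transformed-design residual-maker `M* = I - sqrt(W) X (X'WX)^{-1} X' sqrt(W)`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 12, 3

# Illustrative design with an intercept and non-constant weights.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
w = rng.uniform(0.5, 2.0, size=n)
W = np.diag(w)
sqrt_W = np.diag(np.sqrt(w))
bread = np.linalg.inv(X.T @ W @ X)

# Projection built from the unscaled design, as described in the TODO row.
H_unscaled = X @ bread @ X.T @ W

# Projection and residual-maker on the transformed design X* = sqrt(W) X.
H_star = sqrt_W @ X @ bread @ X.T @ sqrt_W
M_star = np.eye(n) - H_star


def is_symmetric(A):
    return np.allclose(A, A.T)


def is_idempotent(A):
    return np.allclose(A @ A, A)


print("unscaled: symmetric =", is_symmetric(H_unscaled),
      "idempotent =", is_idempotent(H_unscaled))
print("M*:       symmetric =", is_symmetric(M_star),
      "idempotent =", is_idempotent(M_star))
```

Both matrices are idempotent, but only the transformed-design version is symmetric when the weights are not constant, which is why quantities built from the full matrix, such as the Satterthwaite ratio, need to be rederived on `X*`.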