Skip to content

Commit f6499db

Browse files
authored
Merge pull request #463 from igerber/feature/roth-2022-paper-review
2 parents 1dc6a59 + b3f8181 commit f6499db

7 files changed

Lines changed: 434 additions & 14 deletions

File tree

METHODOLOGY_REVIEW.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1053,11 +1053,11 @@ and covariate-adjusted specifications.)
10531053
**Documentation in place:**
10541054
- REGISTRY.md section: `## PreTrendsPower` (MDV at target power, four violation types — linear/constant/last_period/custom, power curve plotting, HonestDiD integration)
10551055
- Implementation: `tests/test_pretrends.py` (point-estimator, MDV, power curve, sensitivity) plus event-study coverage in `tests/test_pretrends_event_study.py`
1056+
- Paper review on file: `docs/methodology/papers/roth-2022-review.md` (added 2026-05-17; non-authoritative source audit — registry entry remains authoritative until the follow-up audit PR)
10561057

10571058
**Outstanding for promotion:**
1058-
- Paper review under `docs/methodology/papers/roth-2022-review.md`
10591059
- Dedicated `tests/test_methodology_pretrends.py` with paper-equation-numbered Verified Components walk-through
1060-
- R parity fixture against the `pretrends` R package (the four power calculations: linear, constant, last-period, custom)
1060+
- R parity fixture against the `pretrends` R package at a **pinned revision** (TODO.md tracks the revision-pin follow-up; until that lands, the R-package surface claims in `docs/methodology/papers/roth-2022-review.md` are provisional). Covers the four power calculations: linear, constant, last-period, custom. Note that `compute_pretrends_power` does not accept `violation_weights` today, so `"custom"` parity has to run through `PreTrendsPower(..., violation_weights=...)` directly until the helper is extended (TODO.md tracks the helper-extension follow-up); helper-only parity is limited to `linear` / `constant` / `last_period`.
10611061
- Verify the REGISTRY Implementation Checklist (all four items currently unchecked)
10621062

10631063
---

TODO.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -94,6 +94,11 @@ Deferred items from PR reviews that were not addressed before merge.
9494
| WooldridgeDiD: aggregation weights use cell-level n_{g,t} counts. Paper (W2025 Eqs. 7.2-7.4) defines cohort-share weights. Add optional `weights="cohort_share"` parameter to `aggregate()`. | `wooldridge_results.py` | #216 | Medium |
9595
| WooldridgeDiD: optional *efficiency hint* (NOT a canonical-link violation per W2023 Prop 3.1) when method/outcome pairing is sub-optimal — e.g., `method="ols"` on binary data is consistent under QMLE, but `method="logit"` is typically more efficient. The original framing in this row as a "canonical link requirement" tied to Prop 3.1 was incorrect: Wooldridge (2023) Table 1 lists Gaussian/OLS for "any response" and logistic-Bernoulli for "binary OR fractional". A useful hint exists (efficiency), but should not be framed as a methodology violation. See PR #453 R1 review for the corrected reading. | `wooldridge.py` | #216 | Low |
9696
| WooldridgeDiD: Stata `jwdid` golden value tests — add R/Stata reference script and `TestReferenceValues` class. | `tests/test_wooldridge.py` | #216 | Medium |
97+
| PreTrendsPower: `compute_pretrends_power` adapter uses `diag(ses^2)` instead of the full pre-period covariance block Σ_22 for `CallawaySantAnnaResults` (deliberate — non-bootstrap CS persists `event_study_vcov`; bootstrap CS fits clear it at `staggered.py:2032-2036`) and `SunAbrahamResults` (forced — SA does not expose an event-study/cohort VCV at all). Roth (2022)'s NIS box probability and the library's Wald object both depend on Σ_22 off-diagonals; diag fallback is not provably conservative. For non-bootstrap CS fits, route through `event_study_vcov`; for bootstrap CS fits the diag fallback is the only path. For SA, extend `SunAbrahamResults` to persist a cohort/event-study VCV (then route the adapter likewise). Or formally retain the diag fallback with explicit miscalibration framing. See REGISTRY.md `## PreTrendsPower` Note (deviation from paper) + `docs/methodology/papers/roth-2022-review.md`. | `diff_diff/pretrends.py:609-687`, `diff_diff/sun_abraham.py:30-88`, `docs/methodology/REGISTRY.md`, `docs/methodology/papers/roth-2022-review.md` | PR-A (Roth paper review, 2026-05-17) | Medium |
98+
| PreTrendsPower: pin the R `pretrends` package commit/release before building the R-parity fixture. The paper review's R-package surface claims (`pretrends()`, `slope_for_power()`, NIS-only API, no joint-Wald target) are provisional pending a pinned revision; the audited revision should be recorded either in the review file's Gaps section or in this TODO row before any parity assertions are committed. | `docs/methodology/papers/roth-2022-review.md`, `METHODOLOGY_REVIEW.md` (PreTrendsPower row) | PR-A (Roth paper review, 2026-05-17) | Low |
99+
| PreTrendsPower: helper `compute_pretrends_power(results, M, alpha, target_power, violation_type, pre_periods)` does NOT accept `violation_weights`, so `violation_type="custom"` is unusable from the helper (class-only today via `PreTrendsPower(..., violation_weights=...)`). Either add `violation_weights` to the helper signature and forward to the class, or document the helper as supporting only `linear` / `constant` / `last_period`. | `diff_diff/pretrends.py:1048-1095, 442-466` | PR-A (Roth paper review, 2026-05-17) | Low |
100+
| PreTrendsPower: `PreTrendsPowerResults.power_at()` does not yet support `violation_type="custom"`. **Silent-failure path was mitigated** in PR-A (2026-05-17, R18 of the codex review): `power_at()` now raises `NotImplementedError` for custom fits rather than returning equal-weights output, locked in by `test_power_at_raises_on_custom_violation_type`. Remaining follow-up: persist the normalized fitted `violation_weights` on `PreTrendsPowerResults` (currently absent at `pretrends.py:77-90`) and re-enable `power_at()` for custom fits, with a parity test comparing `results.power_at(M)` to a fresh `PreTrendsPower(...).fit(..., M=M).power` on a custom-weights fixture. | `diff_diff/pretrends.py:77-90, ~196-235, ~878-892` | PR-A (Roth paper review, 2026-05-17) | Medium |
101+
| PreTrendsPower: `linear` violation pattern does NOT implement Roth's δ_t = γ·t. `_get_violation_weights(violation_type="linear")` constructs a shifted, normalized `[n-1, ..., 1, 0]` direction from `n_pre` only (`pretrends.py:510-515`), and `fit()` never threads actual relative-time labels into that construction (`pretrends.py:862-866`). For irregular pre-period grids (e.g., anticipation-shifted `t ∈ {-5, -3, -1}`) this means the slope reported as MDV is not in Roth's γ units. Fix: build linear weights from the sorted actual relative-time values used in the fit, define the exposed parameter in γ units, persist any normalization separately, and add a regression test using anticipation-shifted / irregular pre-periods. If the shifted convention is intentional, add a `**Note (deviation from paper):**` to REGISTRY.md and convert reported MDV back to Roth's slope scale before exposing it. | `diff_diff/pretrends.py:488-531, 862-866`, `docs/methodology/REGISTRY.md:2786-2789` | PR-A (Roth paper review, 2026-05-17; surfaced by R17 of the iterative codex review on the paper review file) | **High** |
97102
| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 8 standalone estimators that expose `cluster=`: `CallawaySantAnna`, `SunAbraham`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added `vcov_type` to the `DifferenceInDifferences` inheritance chain only. | multiple | Phase 1a | Medium |
98103
| Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster) currently raises `NotImplementedError`. `_compute_bm_dof_from_contrasts` builds its hat matrix from the unscaled design via `X (X'WX)^{-1} X' W`, but `solve_ols` solves the WLS problem by transforming to `X* = sqrt(w) X`, so the correct symmetric idempotent residual-maker is `M* = I - sqrt(W) X (X'WX)^{-1} X' sqrt(W)`. Rederive the Satterthwaite `(tr G)^2 / tr(G^2)` ratio on the transformed design and add weighted parity tests before lifting the guard. | `linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_validate_vcov_args` | Phase 1a | Medium |
99104
| HC2 / HC2 + Bell-McCaffrey on absorbed-FE fits — REMAINING sub-gate: `TwoWayFixedEffects` (`twfe.py:154` rejects unconditionally). The DiD sub-gate and the MultiPeriodDiD sub-gate were both lifted via auto-route to `fixed_effects=` internally (DiD: PR #458, ~1e-10 vs clubSandwich; MPD: this release, ~1e-10 vs sandwich::vcovHC and clubSandwich::vcovCR). TWFE has no equivalent `fixed_effects=` code path (always within-transforms), so the same auto-route surgery is not directly applicable — lifting requires either building the full-dummy design inline or refactoring TWFE to delegate to DiD. Within-transformation preserves coefficients and residuals under FWL but not the hat matrix; HC1/CR1 are unaffected (no leverage term). | `twfe.py::fit` | follow-up | Medium |

diff_diff/pretrends.py

Lines changed: 54 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -209,9 +209,31 @@ def power_at(self, M: float) -> float:
209209
-------
210210
float
211211
Power to detect violation of magnitude M.
212+
213+
Raises
214+
------
215+
NotImplementedError
216+
If the fit was made with ``violation_type="custom"``. The
217+
``PreTrendsPowerResults`` dataclass does not currently persist
218+
the fitted ``violation_weights``, so this method cannot
219+
reconstruct the custom weights. Refit
220+
``PreTrendsPower(violation_type="custom", violation_weights=...)``
221+
with the new ``M`` instead. Tracked in TODO.md as a planned
222+
follow-up to persist the fitted weights.
212223
"""
213224
from scipy import stats
214225

226+
if self.violation_type == "custom":
227+
raise NotImplementedError(
228+
"PreTrendsPowerResults.power_at() does not support "
229+
"violation_type='custom': fitted violation_weights are "
230+
"not persisted on the result object, so the custom weights "
231+
"cannot be reconstructed. Refit "
232+
"PreTrendsPower(violation_type='custom', "
233+
"violation_weights=...) with the new M instead. "
234+
"See TODO.md (PreTrendsPower power_at custom path)."
235+
)
236+
215237
n_pre = self.n_pre_periods
216238

217239
# Reconstruct violation weights based on violation type
@@ -227,8 +249,14 @@ def power_at(self, M: float) -> float:
227249
weights = np.zeros(n_pre)
228250
weights[-1] = 1.0
229251
else:
230-
# For custom, we can't reconstruct - use equal weights as fallback
231-
weights = np.ones(n_pre)
252+
# Fail loud on unknown violation_type values. Mirrors the raise
253+
# at the end of _get_violation_weights(); prevents silent
254+
# equal-weights output if a future violation_type is added to
255+
# fit() but not threaded through power_at().
256+
raise ValueError(
257+
f"Unknown violation_type: {self.violation_type!r}. "
258+
f"Expected one of: 'linear', 'constant', 'last_period', 'custom'."
259+
)
232260

233261
# Normalize weights to unit L2 norm
234262
norm = np.linalg.norm(weights)
@@ -1067,7 +1095,18 @@ def compute_pretrends_power(
10671095
target_power : float, default=0.80
10681096
Target power for MDV calculation.
10691097
violation_type : str, default='linear'
1070-
Type of violation pattern.
1098+
Type of violation pattern. This convenience helper supports
1099+
``linear`` / ``constant`` / ``last_period`` only and does NOT
1100+
accept ``violation_weights``, so passing
1101+
``violation_type='custom'`` will raise ``ValueError`` from the
1102+
underlying ``PreTrendsPower`` constructor (which requires
1103+
``violation_weights`` when ``violation_type='custom'``). To use a
1104+
custom violation pattern, instantiate ``PreTrendsPower(...,
1105+
violation_weights=...)`` directly. Note that
1106+
``PreTrendsPowerResults.power_at()`` on such a fit raises
1107+
``NotImplementedError`` because fitted weights are not yet
1108+
persisted on the result object; refit with the new ``M`` instead.
1109+
Both gaps are tracked in TODO.md until the follow-up audit lands.
10711110
pre_periods : list of int, optional
10721111
Explicit list of pre-treatment periods. If None, attempts to infer
10731112
from results. Use when you've estimated all periods as post_periods.
@@ -1114,7 +1153,18 @@ def compute_mdv(
11141153
target_power : float, default=0.80
11151154
Target power for MDV calculation.
11161155
violation_type : str, default='linear'
1117-
Type of violation pattern.
1156+
Type of violation pattern. This convenience helper supports
1157+
``linear`` / ``constant`` / ``last_period`` only and does NOT
1158+
accept ``violation_weights``, so passing
1159+
``violation_type='custom'`` will raise ``ValueError`` from the
1160+
underlying ``PreTrendsPower`` constructor (which requires
1161+
``violation_weights`` when ``violation_type='custom'``). To use a
1162+
custom violation pattern, instantiate ``PreTrendsPower(...,
1163+
violation_weights=...)`` directly. Note that
1164+
``PreTrendsPowerResults.power_at()`` on such a fit raises
1165+
``NotImplementedError`` because fitted weights are not yet
1166+
persisted on the result object; refit with the new ``M`` instead.
1167+
Both gaps are tracked in TODO.md until the follow-up audit lands.
11181168
pre_periods : list of int, optional
11191169
Explicit list of pre-treatment periods. If None, attempts to infer
11201170
from results. Use when you've estimated all periods as post_periods.

docs/methodology/REGISTRY.md

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2761,7 +2761,7 @@ CRITICAL: δ_pre = β_pre pins pre-treatment violations to observed coefficients
27612761

27622762
## PreTrendsPower
27632763

2764-
**Primary source:** [Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. *American Economic Review: Insights*, 4(3), 305-322.](https://doi.org/10.1257/aeri.20210236)
2764+
**Primary source:** [Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. *American Economic Review: Insights*, 4(3), 305-322.](https://doi.org/10.1257/aeri.20210236). Paper review on file: `docs/methodology/papers/roth-2022-review.md` (non-authoritative source audit; this REGISTRY entry remains the authoritative methodology contract).
27652765

27662766
**Key implementation requirements:**
27672767

@@ -2793,6 +2793,10 @@ Violation types:
27932793
- **Last period**: δ_{-1} = c, others zero
27942794
- **Custom**: user-specified pattern
27952795

2796+
- **Note (deviation from paper — `linear` violation pattern):** the shipped `PreTrendsPower._get_violation_weights("linear")` constructs `[n_pre-1, ..., 1, 0]` from `n_pre` alone and `PreTrendsPower.fit()` never threads the actual relative-time labels into that construction (`pretrends.py:488-531`, `pretrends.py:862-866`). For irregular or anticipation-shifted pre-period grids (e.g., `t ∈ {-5, -3, -1}`), this means the slope reported as MDV is NOT in Roth's `γ` units — the shifted/normalized direction effectively assumes contiguous relative times `{-(n_pre-1), ..., -1}`. The follow-up audit (tracked in TODO.md) will either rebuild `linear` weights from the sorted actual relative-time values and expose the parameter in Roth's `γ` units, or formally retain the current shifted/normalized contract with this Note as the deviation record.
2797+
2798+
- **Note (silent-failure guard — `power_at()` with `violation_type="custom"`):** `PreTrendsPowerResults` does not currently persist the fitted `violation_weights`, so `power_at(M)` cannot reconstruct the custom direction. As of this commit, `PreTrendsPowerResults.power_at()` raises `NotImplementedError` for `violation_type="custom"` rather than silently returning equal-weights output. To compute power at a new `M` for a custom fit, refit `PreTrendsPower(violation_type="custom", violation_weights=...)` with the new `M`. Tracked in TODO.md as a planned follow-up to persist the fitted weights and lift the guard.
2799+
27962800
*Standard errors:*
27972801
- Power calculations are exact (no sampling variability)
27982802
- Uncertainty comes from estimated Σ
@@ -2802,6 +2806,13 @@ Violation types:
28022806
- Single pre-period: power calculation trivial
28032807
- Very high power: MDV approaches zero
28042808

2809+
- **Note (deviation from paper — diagonal pre-period VCV fallback):** Roth (2022)'s power and bias objects (both the paper-analyzed NIS box probability and the library's Wald / noncentral-χ² form) operate on the full pre-period covariance block Σ_22. The shipped `compute_pretrends_power` adapter currently uses different sources for the pre-period covariance by result type:
2810+
- `MultiPeriodDiDResults` (`pretrends.py:592-601`): extracts the full pre-period sub-block from `results.vcov` when `interaction_indices` is populated; falls back to `diag(ses^2)` otherwise.
2811+
- `CallawaySantAnnaResults` (`pretrends.py:609-652`): hard-codes `vcov = diag(ses^2)`. Non-bootstrap CS fits persist a full `event_study_vcov` matrix (`staggered_results.py:126-128`), so the diag fallback is a deliberate choice in that path. Bootstrap CS fits clear `event_study_vcov` before storing results (`staggered.py:2032-2036`) to prevent mixing analytical VCV with bootstrap SEs, so the full-Σ22 route is not available for bootstrap fits at all.
2812+
- `SunAbrahamResults` (`pretrends.py:660-687`): hard-codes `vcov = diag(ses^2)`; the diag fallback is *forced* because `SunAbrahamResults` does not currently expose an event-study or cohort covariance matrix.
2813+
2814+
Dropping the off-diagonals is NOT a paper-supported numerical choice and is NOT guaranteed to be conservative for MDV/power (the direction of the discrepancy depends on the sign and magnitude of the dropped correlations). The PR-B follow-up audit (tracked in `TODO.md`) will either extend full-sub-VCV consumption to all three paths (with SA also requiring upstream surface work on `SunAbrahamResults`) or formally retain the diag fallback with explicit miscalibration framing. See `docs/methodology/papers/roth-2022-review.md` for the full derivation.
2815+
28052816
**Reference implementation(s):**
28062817
- R: `pretrends` package (Roth's official package)
28072818

docs/methodology/REPORTING.md

Lines changed: 14 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -328,13 +328,20 @@ a library setting.
328328
`DiagnosticReport.pretrends_power` block records
329329
`covariance_source: "diag_fallback_available_full_vcov_unused"` in
330330
that case, and `BusinessReport` downgrades a `well_powered` tier to
331-
`moderately_powered` before rendering prose. This is a known
332-
conservative deviation from the documented "use the full pre-period
333-
covariance" position — it prevents the diagonal approximation from
334-
producing an overly optimistic "well-powered" claim when correlated
335-
pre-period errors could tighten the MDV. The right long-term fix is
336-
to teach `compute_pretrends_power()` to consume `event_study_vcov`
337-
and `event_study_vcov_index`; until that lands this downgrade stays.
331+
`moderately_powered` before rendering prose. This is a documented
332+
deviation from the paper-derived "use the full pre-period covariance"
333+
position. **Not provably conservative**: under Roth (2022)'s NIS
334+
framework and the library's Wald form, the MDV/power objects depend
335+
on the off-diagonals of Σ_22, and the direction of the discrepancy
336+
between full-Σ_22 and diag(ses^2) depends on the sign and magnitude
337+
of the dropped correlations — see the `**Note (deviation from paper
338+
— diagonal pre-period VCV fallback):**` block under `## PreTrendsPower`
339+
in `docs/methodology/REGISTRY.md`. The `well_powered → moderately_powered`
340+
downgrade in BusinessReport reduces the chance of an overly optimistic
341+
claim in practice, but it is not a proof of conservatism. The right
342+
long-term fix is to teach `compute_pretrends_power()` to consume
343+
`event_study_vcov` and `event_study_vcov_index`; until that lands the
344+
downgrade stays.
338345

339346
- **Note:** Unit-translation policy. BusinessReport does not
340347
arithmetically translate log-points to percents or level effects to

0 commit comments

Comments
 (0)