Document RCS IF phi=psi/n convention, add analytical-vs-bootstrap SE convergence test

igerber · claude · igerber · commit cfcc441d5100 · 2026-03-29T09:22:00.000-04:00
REGISTRY.md: Document that RCS IFs use the phi=psi/n convention (SE = sqrt(sum(phi^2))),
which matches R's sd(psi)/sqrt(n) for mean-zero psi, up to the n vs. n-1 factor. The
1/n_all denominator in gradient terms is the colMeans -> phi conversion, not extra shrinkage.
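
A minimal numerical sketch of that equivalence, not taken from the library: `psi` here is simulated and centered (an assumption standing in for estimated influence-function values), and all variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical influence-function values; centering mimics the in-sample
# mean-zero property of estimated IFs (an assumption of this sketch).
psi = 3.0 * rng.normal(size=n)
psi -= psi.mean()

# Library-style convention described above: pre-scale by 1/n, then sum squares.
phi = psi / n
se_phi = np.sqrt(np.sum(phi**2))

# R-style convention: sample standard deviation (n-1 denominator) over sqrt(n).
se_r = np.std(psi, ddof=1) / np.sqrt(n)

# For centered psi the two differ only by the factor sqrt((n-1)/n).
ratio = se_phi / se_r
print(f"se_phi={se_phi:.6f}  se_r={se_r:.6f}  ratio={ratio:.6f}")
```

For centered `psi` the ratio equals `sqrt((n-1)/n)`, which is negligible at realistic sample sizes.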

Add a test that checks IF scaling: the analytical SE must be within 20% of the
bootstrap SE (499 iterations) for RCS reg with covariates.
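
Why this comparison detects a scaling error can be sketched in isolation. The snippet below is an illustration under stated assumptions, not the library's bootstrap: it uses Rademacher multiplier weights (one choice satisfying E[w]=0 and Var(w)=1), simulated `psi`, and the standard deviation of the bootstrap draws as the bootstrap SE.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
n_boot = 499

# Hypothetical mean-zero influence-function values (simulated for illustration).
psi = 2.0 * rng.normal(size=n)
psi -= psi.mean()
phi = psi / n

# Analytical SE under the phi = psi / n convention.
se_analytical = np.sqrt(np.sum(phi**2))

# Multiplier bootstrap: each draw perturbs the estimate by sum_i w_i * phi_i.
# Rademacher weights are an assumption; any w with E[w]=0, Var(w)=1 would do.
w = rng.choice([-1.0, 1.0], size=(n_boot, n))
draws = w @ phi
se_bootstrap = draws.std(ddof=1)

ratio = se_analytical / se_bootstrap
print(f"analytical={se_analytical:.5f}  bootstrap={se_bootstrap:.5f}  ratio={ratio:.3f}")
```

Because Var(sum_i w_i * phi_i) = sum_i phi_i^2 for such weights, a ratio near 1 indicates consistent scaling between the two paths. A stray factor of `n` or `sqrt(n)` in either path would push the ratio far outside [0.8, 1.2].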

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -420,7 +420,8 @@ The multiplier bootstrap uses random weights w_i with E[w]=0 and Var(w)=1:
 - **Note (deviation from R):** CallawaySantAnna survey reg+covariates per-cell SE uses a conservative plug-in IF based on WLS residuals. The treated IF is `inf_treated_i = (sw_i/sum(sw_treated)) * (resid_i - ATT)` (normalized by treated weight sum, matching unweighted `(resid-ATT)/n_t`). The control IF is `inf_control_i = -(sw_i/sum(sw_control)) * wls_resid_i` (normalized by control weight sum, matching unweighted `-resid/n_c`). SE is computed as `sqrt(sum(sw_t_norm * (resid_t - ATT)^2) + sum(sw_c_norm * resid_c^2))`, the weighted analogue of the unweighted `sqrt(var_t/n_t + var_c/n_c)`. This omits the semiparametrically efficient nuisance correction from DRDID's `reg_did_panel` — WLS residuals are orthogonal to the weighted design matrix by construction, so the first-order IF term is asymptotically valid but may be conservative. SEs pass weight-scale-invariance tests. The efficient DRDID correction is deferred to future work.
 - **Note (deviation from R):** Per-cell ATT(g,t) SEs under survey weights use influence-function-based variance (matching R's `did::att_gt` analytical SE path) rather than full Taylor-series linearization. When strata/PSU/FPC are present, analytical aggregated SEs (`n_bootstrap=0`) use `compute_survey_if_variance()` on the combined IF/WIF; bootstrap aggregated SEs (`n_bootstrap>0`) use PSU-level multiplier weights.
 
-- **Note:** Repeated cross-sections (`panel=False`, Phase 7b): supports surveys like BRFSS, ACS annual, and CPS monthly where units are not followed over time. Uses cross-sectional DRDID (Sant'Anna & Zhao 2020, Section 4): two outcome models (one per period) instead of one on ΔY, and per-observation influence functions instead of per-unit. All three estimation methods (reg, ipw, dr) supported with and without covariates. Aggregation and bootstrap use the "canonical index" abstraction where the index space is observations (not units). Survey weights are per-observation (no unit-level collapse). Data generated via `generate_staggered_data(panel=False)`.
+- **Note:** Repeated cross-sections (`panel=False`, Phase 7b): supports surveys like BRFSS, ACS annual, and CPS monthly where units are not followed over time. Uses cross-sectional DRDID (Sant'Anna & Zhao 2020, Section 4): `reg` matches `DRDID::reg_did_rc` (Eq 2.2), `dr` matches `DRDID::drdid_rc` (locally efficient, Eq 3.3+3.4 with 4 OLS fits), `ipw` matches `DRDID::std_ipw_did_rc`. Per-observation influence functions instead of per-unit. All three estimation methods support covariates and survey weights.
+- **Note (deviation from R):** RCS influence functions use `phi_i = psi_i / n` convention (SE = `sqrt(sum(phi^2))`), matching the library-wide IF convention where IFs are pre-scaled by `1/n`. R's DRDID uses `psi_i` directly with `SE = sd(psi) / sqrt(n)`. These are algebraically equivalent — `sqrt(sum(psi^2/n^2)) = sqrt(sum(psi^2))/n ≈ sd(psi)/sqrt(n)` — confirmed by analytical-vs-bootstrap SE convergence tests. The `1/n_all` denominator in gradient terms (`M1`, `M2`) is not "extra shrinkage" but the `colMeans` → phi convention conversion.
 - **Note:** Non-survey DR path also includes nuisance IF corrections (PS + OR), matching the survey path structure (Phase 7a). Previously used plug-in IF only.
 
 **Reference implementation(s):**
diff --git a/tests/test_staggered_rc.py b/tests/test_staggered_rc.py
@@ -405,6 +405,44 @@ def test_summary_labels_rcs(self, rc_data):
         assert "units:" not in summary.split("\n")[3]  # Treated line
 
 
+# =============================================================================
+# Analytical vs Bootstrap SE convergence (proves IF scaling is correct)
+# =============================================================================
+
+
+class TestAnalyticalBootstrapConvergence:
+    """Analytical SE should closely match bootstrap SE — proves IF magnitude is correct."""
+
+    def test_reg_se_matches_bootstrap(self, rc_data_with_covariates):
+        """Analytical reg SE should be within 20% of bootstrap SE."""
+        r_analytical = CallawaySantAnna(estimation_method="reg", panel=False).fit(
+            rc_data_with_covariates,
+            "outcome",
+            "unit",
+            "period",
+            "first_treat",
+            covariates=["x1"],
+        )
+        r_bootstrap = CallawaySantAnna(
+            estimation_method="reg", panel=False, n_bootstrap=499, seed=42
+        ).fit(
+            rc_data_with_covariates,
+            "outcome",
+            "unit",
+            "period",
+            "first_treat",
+            covariates=["x1"],
+        )
+        # ATTs should match (bootstrap doesn't change point estimate)
+        np.testing.assert_allclose(r_analytical.overall_att, r_bootstrap.overall_att, atol=1e-10)
+        # SEs should be within 20% (proves IF scaling is correct)
+        ratio = r_analytical.overall_se / r_bootstrap.overall_se
+        assert 0.8 < ratio < 1.2, (
+            f"Analytical/bootstrap SE ratio {ratio:.3f} outside [0.8, 1.2] — "
+            f"analytical={r_analytical.overall_se:.4f}, bootstrap={r_bootstrap.overall_se:.4f}"
+        )
+
+
 # =============================================================================
 # Unequal Cohort Counts Across Periods
 # =============================================================================