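A chronological account of how policyengine-us-data published two Social Security regressions over a fourteen-month window, why both slipped through the safety net the team installed after the March 2026 incident, and what would have caught them. All times UTC. Numbers throughout are weighted to the Enhanced CPS household weights. Reproduction appendix (notebook, plain Python, per-release H5 sweep, results CSV, plots): https://gist.github.com/MaxGhenis/ddf033f52eb51e6082bade87564f7b83 (metadata-only; no large H5 downloads required).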
TL;DR
policyengine-us-data shipped Social Security regressions during two distinct windows. The first (~14 days, 2026-02-18 → 2026-03-04) understated total social_security at sim-year 2024 by ~44% via a base-year-vs-projected-year discontinuity. The second (~9 days live, 2026-05-04 → 2026-05-12) understated total social_security at every sim year by ~65% because the retirement subcomponent stopped being emitted to the H5. Both were caused by a data-model contract break around what the data pipeline emits vs what policyengine-us computes. The fix that closed the first regression (#554) established the right contract (emit leaves, compute aggregates), but the model side later violated the same contract from the other direction by converting an emitted input to a formula. Detection of the second regression came from an external signal: ~48% baseline senior SPM poverty in a new dashboard, surfaced 2026-05-12 14:22 UTC and fixed 2h 40m later.
Chronology
2024-08-24 — Latent pattern established
Commit c28821b (@nikhilwoodruff, "Improve calibration for ECPS") removes a line that was shadowing the long IMPUTED_VARIABLES list in extended_cps.py:
-IMPUTED_VARIABLES = ["employment_income"]
After this commit, PUF imputation is active for the full 14-variable list, including social_security. The four sub-components (_retirement, _disability, _survivors, _dependents) are set independently from CPS source codes and are not reconciled with the PUF-imputed total. The H5 now stores two unreconciled paths for the same logical quantity.
For most of the following period, however, the two paths agreed: the sweep (below) shows the stored aggregate and the subcomponent sum within ~3% of each other from the earliest HF-tagged release (2025-05-27) through 2026-02-17. The mechanical pattern was latent; the magnitude was not yet damaging.
2026-02-12 → 2026-02-17 — puf_impute.py extraction
Issue #530 ("CPS top-coding caps AGI at $6.26M") motivates a pipeline restructure that extracts PUF imputation into its own module. The extraction lands in commits 6ecfc19 (2026-02-16) and 85f7e41 (2026-02-17 16:08 UTC) on a branch eventually merged via PR #530/#516.
The IMPUTED_VARIABLES list is preserved, but the new module's PUF-imputation behavior is materially narrower: it imputes positive social_security on far fewer PUF-cloned records than the previous implementation.
2026-02-18 22:11 UTC — Severe discontinuity begins
Release 1.69.0 is the first published H5 built with the new puf_impute.py. The stored social_security/2024 weighted total drops from $1,434B (release 1.68.0, 2026-02-17 16:13 UTC) to $824B, about 30 hours later. Subcomponents are unaffected at ~$1,430B. The two stored paths now disagree by ~42% in weighted dollars.
When policyengine-us runs a simulation:
At sim-year 2024 (the base year) it uses the stored aggregate → $824B
At sim-year 2025+ it falls through to the formula, summing the four subcomponents → ~$1,430B uprated
This produces a year-over-year step jump of ~$600B in total Social Security at the base-year boundary. The same artifact drove a ~9-point drop in the net income Gini between 2024 and 2025 (0.6502 → 0.5625), and that Gini drop is what was first noticed.
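The dual-path behavior can be illustrated with a toy lookup. It assumes a simulator in which a stored value for a given year shadows the formula, and every later year falls back to summing the subcomponents. The names below are hypothetical; this is not policyengine-us code. The 2.5% uprating factor is an illustrative assumption borrowed from the COLA used in the three-year plot later in this write-up.

```python
# Toy model of the dual-path lookup (hypothetical; not policyengine-us code).
BASE_YEAR = 2024
UPRATING = 1.025  # illustrative annual uprating factor (assumption)

# Weighted totals in $B, as quoted for release 1.69.0.
STORED_AGGREGATE = {2024: 824.0}  # stored social_security/2024
SUBCOMPONENT_SUM_2024 = 1_430.0   # sum of the four stored subcomponents


def total_social_security(year: int) -> float:
    """Total Social Security the toy simulator would report for `year`."""
    if year in STORED_AGGREGATE:
        # A stored value for this exact year shadows the formula.
        return STORED_AGGREGATE[year]
    # Later years fall through to the adds formula over uprated subcomponents.
    return SUBCOMPONENT_SUM_2024 * UPRATING ** (year - BASE_YEAR)


for year in (2024, 2025, 2026):
    print(year, round(total_social_security(year)))
```

The 2024 → 2025 step is the ~$606B gap between the two stored paths plus one year of uprating. Removing the stored aggregate, which is the #554 approach, sends every year through the second branch.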
2026-02-23 — Bug filed
Issue #551 (@MaxGhenis): "PUF imputation overwrites social_security but not sub-components, causing base-year discontinuity." The issue reports the magnitudes at the time of filing (5 days into the severe window):
Path | Nonzero records | Weighted recipients | Weighted total
--- | --- | --- | ---
Base year (stored aggregate) | 2,281 | 33.6M | ~$809B
Projected years (formula) | 7,197 | 67.8M | ~$1,436B
Ratio | 3.15× | 2.02× | 1.78×
The issue summary describes "~3x more Social Security recipients in projected years vs the base year" — that ratio is correct for raw nonzero records (3.15×) but overstates the weighted-recipient or weighted-dollar impact (2.02× and 1.78× respectively, i.e. a ~44% understatement of total SS at 2024).
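The gap between the record ratio and the dollar ratio is plain arithmetic on the table. The short check below uses only the figures reported in #551. The dictionary keys are illustrative labels.

```python
# Issue #551 figures, as in the table above.
base = {"records": 2_281, "recipients_m": 33.6, "total_b": 809.0}
projected = {"records": 7_197, "recipients_m": 67.8, "total_b": 1_436.0}

# Projected-to-base ratio for each measure.
ratios = {key: projected[key] / base[key] for key in base}

# If projected years show r times the base year, the base year carries only
# 1/r of the projected-year amount: an understatement of 1 - 1/r.
understatement = 1 - base["total_b"] / projected["total_b"]

for key, ratio in ratios.items():
    print(f"{key}: {ratio:.2f}x")
print(f"2024 total understated by {understatement:.0%}")
```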
PR #552 opens the same day with a tactical fix: rescale subcomponents proportionally after PUF overwrites the aggregate.
2026-02-26 — Tactical fix abandoned in favor of systemic fix
PR #552 is closed unmerged. The team takes the systemic path instead: stop storing variables that policyengine-us computes via formula / adds / subtracts, and impute the leaves directly. This eliminates the dual-path disagreement by removing one of the paths from storage. The work continues as PR #554.
2026-03-04 15:23 UTC — #554 merged; severe discontinuity ends
#554 ("Calibrate retirement contributions: targets, SS reconciliation, and QRF imputation", @PavelMakarchuk) merges. The relevant piece for what follows is _drop_formula_variables: the exporter strips variables that policyengine-us computes via formula / adds / subtracts. The first H5 published after this (1.71.2, at 2026-03-04 16:56 UTC) no longer stores social_security. Total severe pre-#554 window, from the 1.69.0 publish to the #554 merge: ~13 days 17 hours.
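The exporter rule can be sketched as a filter over model metadata. This is a hypothetical reimplementation, not the actual _drop_formula_variables. VariableMeta and the registry dict stand in for policyengine-us variable metadata, and only the formula / adds / subtracts test described above is modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableMeta:
    """Stand-in for model variable metadata (hypothetical)."""
    name: str
    has_formula: bool = False
    adds: tuple = ()
    subtracts: tuple = ()

    @property
    def is_computed(self) -> bool:
        return self.has_formula or bool(self.adds) or bool(self.subtracts)


def drop_formula_variables(columns, registry):
    """Return only the columns the model would treat as inputs."""
    return {
        name: values
        for name, values in columns.items()
        # Unknown names are kept; computed names are left to the model.
        if not (name in registry and registry[name].is_computed)
    }


SS_LEAVES = (
    "social_security_retirement",
    "social_security_disability",
    "social_security_survivors",
    "social_security_dependents",
)
registry = {"social_security": VariableMeta("social_security", adds=SS_LEAVES)}
registry.update({leaf: VariableMeta(leaf) for leaf in SS_LEAVES})

columns = {"social_security": [1.0], **{leaf: [0.25] for leaf in SS_LEAVES}}
exported = drop_formula_variables(columns, registry)
print(sorted(exported))
```

The rule keys off whatever the locked model version says at export time. If a model release flips a leaf to has_formula=True, that leaf disappears from the H5 with no change on the data side.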
The fix has a known cost: the data pipeline now depends on complete and consistent imputation of the four Social Security sub-components on both halves of the dataset. On the PUF clone half, subcomponents aren't yet imputed by the new pipeline, so the first post-#554 H5s undercount total SS by ~4% (subcomponent sum $1,383B vs the pre-#554 subcomponent path of $1,436B). This is a much milder effect than the discontinuity it replaced.
2026-03-04 ~20:00 UTC — Unrelated employment-income corruption
The same _drop_formula_variables introduced in #554 has an unrelated rename bug that drops employment_income from the H5, sending aggregate SPM poverty to ~42% (vs ~11% actual). This is documented in detail in postmortem #578.
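2026-03-05 — Prevention measures from #578 land
PR #570 adds three lines of defense in response to the employment-income incident:
Aggregate poverty-rate sanity test (5–30% range).
KEY_MONETARY_VARS NaN/Inf check, including social_security.
Upload-time validation in upload_completed_datasets.py (file size, structure, employment-income aggregate, weight totals).
None of these will fire on the May Social Security regression. Details in "Why prevention missed it" below.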
2026-03-14 13:41 UTC — #589 closes the post-#554 source-level window
#589 (@MaxGhenis, "QRF-impute CPS-only variables for PUF clone half") adds second-stage QRF imputation for CPS-only variables on the PUF clone half, including the four Social Security subcomponents. The subcomponent sum recovers from ~$1,383B to ~$1,449B. This closes a source-level undercount that had been open for ~10 days.
2026-04-17 18:36 UTC — Contract break from the model side
policyengine-us#8040 (@MaxGhenis, "Add Social Security retirement benefit calculation chain") merges. It makes social_security_retirement formula-backed; the observed survey value moves to social_security_retirement_reported behind the parameter gov.simulation.reported_social_security_retirement. A variable that policyengine-us-data had been emitting as a canonical input is now the same kind of variable the exporter strips by rule.
The first policyengine-us release with the formula (verified by PyPI sdist inspection) is 1.644.0. No policyengine-us-data build picks this up immediately — releases on 2026-04-18 still lock policyengine-us==1.637.0 and so are unaffected.
2026-04-29 21:04 UTC — First bad model lock
policyengine-us-data==1.88.3, commit 491ac09, locks policyengine-us==1.674.1 (which has the retirement formula). The pipeline run fails for an unrelated reason; no artifact is produced.
2026-04-30 20:47 UTC — First completed bad generated artifact
policyengine-us-data==1.89.1, commit a2f3bb3, policyengine-us==1.678.0. Pipeline succeeds. The H5 silently drops social_security_retirement (because it now has a formula, the exporter strips it), and the H5 does not emit social_security_retirement_reported as a replacement. At sim time, the model fills social_security_retirement with the reported path, which defaults to zero. Total social_security, recomputed from adds, drops from ~$1,470B to ~$540B (a ~65% understatement). Not yet promoted to live consumers.
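The failure mode reduces to two steps: the exporter drops the newly formula-backed leaf, and the aggregate then sums a missing input as zero. The toy below does not model the reported-value parameter; it treats the absent input as zero, which is the net effect described above. The leaf totals are illustrative. The retirement figure is the ~$930B implied by the ~$1,470B → ~$540B drop, and the other three leaves are lumped together under a hypothetical key.

```python
def export(columns, formula_backed):
    """Toy exporter: drop anything the model now computes."""
    return {k: v for k, v in columns.items() if k not in formula_backed}


def simulated_total(h5, leaves):
    """Toy `adds` aggregate: an input absent from the H5 contributes 0."""
    return sum(h5.get(leaf, 0.0) for leaf in leaves)


# Illustrative weighted totals in $B (see lead-in for how they were chosen).
LEAVES = ("social_security_retirement", "other_ss_leaves_combined")
built = {"social_security_retirement": 930.0, "other_ss_leaves_combined": 540.0}

before = simulated_total(export(built, formula_backed=set()), LEAVES)
after = simulated_total(
    export(built, formula_backed={"social_security_retirement"}), LEAVES
)
print(before, after, f"{1 - after / before:.0%} understatement")
```

Nothing in this path raises an error. The export succeeds, the aggregate computes, and the only symptom is a smaller number.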
2026-05-04 03:05 UTC — First promoted bad artifact
policyengine-us-data==1.90.1, commit f14931e, policyengine-us==1.678.0. Bad data now serving downstream consumers. None of the existing sanity checks fire — aggregate poverty stays within the wide tolerance because the loss is concentrated in seniors; the KEY_MONETARY_VARS NaN/Inf check silently skips because the variable is absent from the H5; the upload validator has no Social Security check.
2026-05-12 06:14 UTC — Dashboard scaffold
poverty-dashboard#1 (@PavelMakarchuk): initial scaffold of an internal poverty dashboard pulling from the latest policyengine-us-data artifact via a Modal backend.
2026-05-12 14:22 UTC — Detection
poverty-dashboard#2 merges, populating the 2026 baseline and adding senior poverty columns. Baseline senior SPM poverty reads ~48% — vs published SPM senior rates in the low double digits. The number is an aggregate impossibility, not a calibration miss. Investigation starts immediately.
Investigation path (captured in the gist): impossible senior SPM rate → decompose senior SPM resources → Social Security aggregate → compare stored H5 variables against the policyengine-us variable registry → identify variables dropped by _drop_formula_variables → trace policyengine-us release history to find when social_security_retirement became formula-backed → match against policyengine-us-data lockfile and pipeline history.
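2026-05-12 16:24 UTC — Model-side fix
policyengine-us#8263 (@MaxGhenis) merges:
Restores social_security_retirement as the canonical input.
Removes the public reported variable social_security_retirement_reported.
Removes the reported-data switch gov.simulation.reported_social_security_retirement.
Keeps lower-level SSA calculation internals but does not wire them into the canonical retirement input.
Removes several other package-side reported-data switches and public reported variables found during the same audit.
The model-side fix was chosen over a data-side workaround (emitting social_security_retirement_reported) because the long-term contract is the right place to fix it: any variable that policyengine-us-data emits should be a canonical input in policyengine-us, full stop.
2026-05-12 16:59 UTC — Data-side fix
policyengine-us-data#960 (@MaxGhenis) merges:
Requires policyengine-us>=1.691.3.
Locks policyengine-us==1.691.3.
Removes reported SPM/data helper outputs that should not be public runtime controls.
Keeps remaining data-backed leaves explicitly named as data leaves, e.g. *_data.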
2026-05-12 17:02 UTC — Live regression ends
First fixed Run Pipeline completes: run 25749650363, commit 61a43e9, policyengine-us-data==1.112.2, policyengine-us==1.691.3. Detection-to-fix: 2h 40m.
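The window lengths quoted in this write-up can be recomputed from the chronology's own timestamps. In this minimal sketch, the choice of endpoints for each window is an assumption: publish or promotion time at the start, and merge or first fixed run at the end.

```python
from datetime import datetime, timedelta, timezone


def utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' stamp from the chronology as UTC."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)


# (start, end) pairs taken from the chronology above.
WINDOWS = {
    "feb_severe": ("2026-02-18 22:11", "2026-03-04 15:23"),  # 1.69.0 publish -> #554 merge
    "may_live": ("2026-05-04 03:05", "2026-05-12 17:02"),    # 1.90.1 promote -> first fixed run
    "detect_to_fix": ("2026-05-12 14:22", "2026-05-12 17:02"),
}

durations = {name: utc(end) - utc(start) for name, (start, end) in WINDOWS.items()}

for name, delta in durations.items():
    print(f"{name}: {delta} ({delta / timedelta(hours=1):.1f} h)")
```

With these endpoints, the February window comes out at about 13 days 17 hours and the May live window at about 8 days 14 hours.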
2026-05-13 03:56 UTC — First post-fix promoted release
policyengine-us-data==1.113.1 published. Total Social Security back to ~$1,550B, with social_security_retirement ≈ $1,179B.
Visualizing the regressions in published H5s
The sweep below reads each Hugging Face tag's enhanced_cps_2024.h5, weights social_security/2024 and the four subcomponents/2024 by household weight, and plots the totals. One H5 at a time, discarded after read — no large local cache.
Red is stored social_security/2024 (present only in pre-#554 H5s — the PUF-imputed path). Blue is the sum of the four subcomponents at 2024. Key features:
Feb 18 2026 step change. Stored social_security/2024 drops from $1,434B (1.68.0) to $824B (1.69.0) when the new puf_impute.py ships. The simulator now disagrees with itself by ~44% at the base-year boundary.
Late April → mid-May 2026: blue cliff. Subcomponent sum falls from ~$1,470B to ~$540B as social_security_retirement stops being emitted to the H5. Total SS understated by ~65% at every sim year.
May 13 2026: blue recovers. 1.113.1 publishes with retirement restored.
A three-year view, with a 2.5% COLA factor applied to project 2025 and 2026 from the subcomponent sum, makes the simulator's late-February year-over-year discontinuity explicit: the red 2024 line crashes to $809B while the blue 2025 and green 2026 lines continue at ~$1,500B.
Underlying data and scripts: ss_sweep_results.csv, ss_sweep.py, ss_plot.py.
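The per-release computation can be sketched as follows. It assumes the person-level arrays, a person-to-household index, and household weights have already been read from a tag's enhanced_cps_2024.h5 (for example with h5py). The dict-based layout and the person-to-household mapping are assumptions for illustration; the sweep script in the gist is the actual implementation.

```python
SS_LEAVES = (
    "social_security_retirement",
    "social_security_disability",
    "social_security_survivors",
    "social_security_dependents",
)


def weighted_total(values, person_household, household_weight):
    """Weighted sum of a person-level array, using each person's household weight."""
    return sum(v * household_weight[h] for v, h in zip(values, person_household))


def summarize_release(arrays, person_household, household_weight):
    """Stored aggregate (if present) and subcomponent sum for one release."""
    leaf_sum = sum(
        weighted_total(arrays[leaf], person_household, household_weight)
        for leaf in SS_LEAVES
    )
    stored = arrays.get("social_security")  # absent from post-#554 H5s
    stored_total = (
        None if stored is None
        else weighted_total(stored, person_household, household_weight)
    )
    return {"stored": stored_total, "subcomponent_sum": leaf_sum}


# Toy release: three people in two households (illustrative data only).
household_weight = {1: 100.0, 2: 50.0}
person_household = [1, 1, 2]
arrays = {
    "social_security_retirement": [10.0, 0.0, 20.0],
    "social_security_disability": [0.0, 5.0, 0.0],
    "social_security_survivors": [0.0, 0.0, 0.0],
    "social_security_dependents": [0.0, 0.0, 0.0],
    "social_security": [10.0, 0.0, 10.0],
}
summary = summarize_release(arrays, person_household, household_weight)
print(summary)
```

Reading one tag, summarizing it, and discarding the arrays keeps memory flat, which matches the one-H5-at-a-time approach described above.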
[Figure: Incident windows]
Why prevention missed it
The three defenses installed after the March 2026 employment-income incident all failed to fire on the May Social Security regression:
Aggregate poverty-rate sanity test (5–30% range). Overall poverty stayed within the wide tolerance because the retirement-income loss is concentrated in seniors. A senior-stratified slice would have failed; the aggregate did not.
KEY_MONETARY_VARS NaN/Inf check on social_security. This check reads the variable from the H5, verifies it contains no NaN/Inf values, and continues silently if the variable is missing. Since #554, social_security has not been stored in the H5, so the check is a no-op. social_security_retirement is not in KEY_MONETARY_VARS at all.
Upload-time validation in upload_completed_datasets.py. Validates file size, structure, employment-income aggregate, weight totals. No Social Security check.
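The KEY_MONETARY_VARS gap comes down to one branch: what happens when the variable is absent. The sketch below is a hypothetical variant, not the repository's check. The allow_missing flag reproduces the silent skip described above, and the default treats absence as a failure.

```python
import math


class DatasetCheckError(Exception):
    """Raised when a release-blocking dataset check fails."""


def key_monetary_var_problems(columns, key_vars, *, allow_missing=False):
    """List problems with key variables: missing columns and non-finite values."""
    problems = []
    for name in key_vars:
        values = columns.get(name)
        if values is None:
            # The check described above behaves as if allow_missing=True.
            if not allow_missing:
                problems.append(f"{name}: missing from dataset")
            continue
        bad = sum(1 for v in values if not math.isfinite(v))
        if bad:
            problems.append(f"{name}: {bad} non-finite value(s)")
    return problems


def assert_key_monetary_vars(columns, key_vars):
    problems = key_monetary_var_problems(columns, key_vars)
    if problems:
        raise DatasetCheckError("; ".join(problems))


# A May-style export: the retirement leaf and the aggregate are both absent.
columns = {
    "social_security_disability": [100.0, 0.0],
    "employment_income": [50_000.0, 0.0],
}
key_vars = ["employment_income", "social_security", "social_security_retirement"]

print(key_monetary_var_problems(columns, key_vars, allow_missing=True))
print(key_monetary_var_problems(columns, key_vars))
```

Listing the stored leaves rather than the aggregate matters here. After #554 the aggregate is never in the H5, so a strict check on social_security alone would fail every build instead of catching real regressions.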
The deeper pattern: the March 2026 sanity tests were shaped to the specific failure of the March incident (employment_income == 0 at every record). The May regression presented as a different shape of the same underlying contract bug — a variable that should be present is silently absent from the H5, model fills with zero — and walked past every gate.
Two of the three checks would also have missed the late-February pre-#554 discontinuity. The aggregate poverty check's tolerance is wide enough, and the KEY_MONETARY_VARS check verifies only that social_security is free of NaN/Inf, not its magnitude. The upload validator was installed after the incident.
Follow-ups
Prioritized by how directly each would have caught the regressions documented here.
P0 — would have caught both regressions directly
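Aggregate target check against SSA published totals. At sim-year 2024 (base year), total weighted social_security must be within ±10% of the SSA Annual Statistical Supplement aggregate (~$1.5T). Would have failed in both windows: the late-February base-year total (~$824B stored) and the May total (~$540B) are each roughly 45% or more below the target.
Subcomponent identity check. At build time, assert social_security == retirement + disability + survivors + dependents, and analogous identities for other adds-defined aggregates; drift beyond tolerance fails the build. Would have caught the late-February disagreement between the stored aggregate and the subcomponent sum. Would not have caught the May regression on its own: the model's adds formula sums whatever subcomponents are present, including retirement = 0, so the identity holds by construction. Needs to be paired with the aggregate target check.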
Demographic-slice smoke tests. Release-blocking checks for baseline senior SPM poverty, senior market income, weighted SS recipient count by age cohort. The 48% senior SPM rate that surfaced the May regression is the same number that would have failed a 5–25% senior poverty tolerance.
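The three P0 gates can be sketched as plain predicates evaluated against figures quoted in this write-up. The function names are hypothetical. The ±10% band and the 5–25% senior-poverty band come from the text. The 1% identity tolerance and the round $1,500B target are assumptions; a real gate should pin the exact SSA Annual Statistical Supplement figure for 2024. The identity check receives the subcomponent sum as a single combined figure, because the per-leaf breakdown is not given here.

```python
SSA_AGGREGATE_2024_B = 1_500.0  # ~$1.5T as cited above (assumption: rounded)


def ssa_aggregate_ok(total_b, target_b=SSA_AGGREGATE_2024_B, tolerance=0.10):
    """P0.1: base-year weighted social_security within +/- tolerance of SSA."""
    return abs(total_b / target_b - 1) <= tolerance


def identity_ok(aggregate_b, leaf_totals_b, tolerance=0.01):
    """P0.2: stored aggregate equals the sum of its adds leaves."""
    leaf_sum = sum(leaf_totals_b)
    return abs(aggregate_b - leaf_sum) <= tolerance * abs(leaf_sum)


def senior_poverty_ok(rate, low=0.05, high=0.25):
    """P0.3: baseline senior SPM poverty inside a plausible band."""
    return low <= rate <= high


# Figures from this write-up, in $B or as rates.
checks = {
    "Feb base-year total ($824B)": ssa_aggregate_ok(824.0),
    "May total ($540B)": ssa_aggregate_ok(540.0),
    "Post-fix total ($1,550B)": ssa_aggregate_ok(1_550.0),
    "Feb stored vs leaves (824 vs 1,430)": identity_ok(824.0, [1_430.0]),
    "May senior SPM poverty (48%)": senior_poverty_ok(0.48),
}
for name, passed in checks.items():
    print(f"{'PASS' if passed else 'FAIL'}  {name}")
```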
P1 — would catch the contract break upstream
Canonical-inputs manifest with cross-repo check. policyengine-us-data maintains an explicit list of variables it emits. CI fails on a policyengine-us lock bump if any listed variable has become formula-backed in the new model version, unless the variable is allowlisted or a replacement input is emitted.
is_input annotation in policyengine-us. Variables that are canonical inputs declare it explicitly, and the exporter pivots off declared intent rather than "has a formula." A variable that both has a formula and is marked as a canonical input is the legitimate case: the exporter still emits it, and the model uses the formula only when the input is absent.
Reverse-direction CI. policyengine-us PRs that touch variables on the canonical-inputs manifest run the smoke tests (P0.1, P0.2, P0.3) against the latest policyengine-us-data artifact.
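The manifest check and the is_input escape hatch can be sketched together. This is hypothetical. ModelVariable stands in for metadata that a real check would load from the locked policyengine-us version, and is_input is the annotation proposed above, not an existing attribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVariable:
    name: str
    has_formula: bool = False
    is_input: bool = False  # hypothetical explicit annotation proposed in P1


def manifest_violations(manifest, model, allowlist=frozenset(), replacements=None):
    """Emitted variables that a new model version would no longer treat as inputs."""
    replacements = replacements or {}
    violations = []
    for name in sorted(manifest):
        var = model.get(name)
        if var is None:
            violations.append(f"{name}: not defined in this model version")
        elif var.has_formula and not var.is_input and name not in allowlist:
            # Acceptable only if a declared replacement input is also emitted.
            if replacements.get(name) not in manifest:
                violations.append(f"{name}: now formula-backed with no emitted replacement")
    return violations


MANIFEST = {"social_security_retirement", "social_security_disability"}
model_before = {n: ModelVariable(n) for n in MANIFEST}
# Simulate the #8040 change: retirement gains a formula.
model_after = dict(
    model_before,
    social_security_retirement=ModelVariable("social_security_retirement", has_formula=True),
)

print(manifest_violations(MANIFEST, model_before))  # []
print(manifest_violations(MANIFEST, model_after))
```

Marking the variable with is_input=True, allowlisting it, or emitting a declared replacement all clear the violation. In each case, the lock bump cannot silently change what the H5 contains.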
P2 — durable hygiene
Release ledger. For each policyengine-us-data package release: source SHA, GitHub Actions run, Modal run ID, locked policyengine-us version, and whether artifacts were merely generated, staged, or promoted. Currently scattered across git, the Actions UI, Modal, and HF.
Public-API discipline. Public APIs and formulas use canonical model variables; reported/data-only variables stay internal. Mostly done in policyengine-us#8263 and policyengine-us-data#960 ("Remove reported SPM data inputs"); codify as a checklist.
Hugging Face dataset card auto-update on revert. When a published artifact is found defective, programmatically update the HF dataset card so downstream consumers see the warning without polling our discussions.
Sweep tooling as a periodic check. The H5 sweep used to produce the plots above (one tag at a time, no local cache) is small and can run weekly. Anomalies in stored-vs-subcomponent agreement, or large jumps in any KEY_MONETARY_VAR aggregate between adjacent tags, become a release-blocking alarm.
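The two sweep alarms can be sketched as small functions over per-tag totals. Both thresholds are assumptions. The 5% disagreement threshold is chosen to sit above the ~3% agreement the stored aggregate and subcomponent sum showed before 2026-02-18. The 15% adjacent-tag threshold is illustrative. The sample series uses the two stored totals quoted for 1.68.0 and 1.69.0.

```python
def adjacent_jumps(series, threshold=0.15):
    """Flag adjacent tags whose weighted aggregate moves by more than `threshold`."""
    flags = []
    for (prev_tag, prev_total), (tag, total) in zip(series, series[1:]):
        if prev_total == 0:
            # Any move away from zero is suspicious; avoid dividing by zero.
            if total != 0:
                flags.append((prev_tag, tag, float("inf")))
            continue
        change = total / prev_total - 1
        if abs(change) > threshold:
            flags.append((prev_tag, tag, round(change, 3)))
    return flags


def paths_disagree(stored_total, leaf_sum, threshold=0.05):
    """Flag a tag whose stored aggregate and subcomponent sum diverge."""
    if stored_total is None:  # post-#554 H5s no longer store the aggregate
        return False
    return abs(stored_total / leaf_sum - 1) > threshold


# Stored social_security/2024 totals ($B) for two tags quoted above.
stored_series = [("1.68.0", 1_434.0), ("1.69.0", 824.0)]
print(adjacent_jumps(stored_series))
print(paths_disagree(1_434.0, 1_430.0), paths_disagree(824.0, 1_430.0))
```

The same adjacent-tag check applied to the subcomponent-sum series would trip on the ~$1,470B → ~$540B cliff in late April.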
References
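#530: top-coding bug that motivated the puf_impute.py extraction.
#551: the base-year discontinuity bug report; magnitudes captured 5 days into the severe window. The "~3x" ratio in the summary applies to nonzero record counts; weighted-recipient and weighted-dollar ratios are ~2.0× and ~1.8× respectively.
policyengine-us#8040: made social_security_retirement formula-backed.
policyengine-us#8263: restored social_security_retirement as canonical input.
First fixed Run Pipeline run 25749650363: commit 61a43e9.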