Improve reweighting: higher penalty, more iterations, gradient convergence #407
Conversation
@martinholmer, I erroneously pushed the updated PR to the origin fork, so I've opened this new one. As noted in PR 403, could you please run …? If we're looking good, I need to make changes to get it to pass tests, then revise, submit, and merge. I will fix the targets file so that it does not try to target all-zero columns, and change the expected values for the other tests, unless you recommend otherwise.
I can't do that until you update PR #407 for recent changes on the master branch (notably the bug fix in PR #408 and the test updates in PR #409). Also, your PR #407 fails the tests on GitHub, so you need to fix that as well (by running "make format" on your computer).
…e target filtering

Major changes to the national reweighting optimizer:

1. Drop impossible targets: automatically filter out targets whose data column is all-zero (8 targets for estate income/losses and rent & royalty net income/losses). These caused the loss to plateau at ~8.0. Now 550 targets instead of 558.
2. Switch float32 to float64: eliminates floating-point precision issues that caused cross-machine non-determinism on the flat loss surface.
3. Run reweighting in a subprocess: isolates it from PyTorch autograd state left by PolicyEngine Microsimulation, which shifted gradient-accumulation order by 1 ULP, compounding over many iterations.
4. Pre-scale weights: multiply all weights so the weighted filer total matches the SOI target before optimization. This ensures the L2 deviation penalty measures only redistributive changes, not the level shift.
5. Enable the L2 weight-deviation penalty (default 0.0001): penalizes sum((new - original)^2) / sum(original^2), scaled by the initial loss value. Reduces extreme weight distortion while maintaining excellent target accuracy.
6. Switch from Adam to the L-BFGS optimizer: a quasi-Newton method with strong Wolfe line search. Dramatically better convergence: 549/550 targets within 0.1% (vs. 523/550 with Adam at the same penalty). GPU and CPU produce nearly identical results.
7. Extract build_loss_matrix() to a module-level function for reuse.
8. Add diagnostic output: penalty value, target-accuracy statistics, weight-change distribution, and a reproducibility fingerprint for cross-machine comparison.

Remove TensorBoard and tqdm dependencies (no longer needed with L-BFGS).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
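Steps 4 and 5 above can be sketched in a few lines of NumPy. Everything here is illustrative: the weight array, the SOI total, and the `penalized_loss` helper are made-up stand-ins, not the repo's actual code.

```python
import numpy as np

# Illustrative values only: 1,000 fake record weights and a made-up SOI total.
rng = np.random.default_rng(0)
original = rng.uniform(50.0, 150.0, size=1_000)   # original record weights
soi_filer_total = 110_000.0                        # SOI weighted-filer target

# Step 4: pre-scale so the weighted filer total matches the SOI target,
# so the deviation penalty measures only redistributive changes.
scaled = original * (soi_filer_total / original.sum())

# Step 5: L2 weight-deviation penalty, sum((new - original)^2) / sum(original^2),
# scaled by the initial loss value so the two terms are comparable in size.
def penalized_loss(target_loss, new_w, base_w, initial_loss, penalty=0.0001):
    deviation = np.sum((new_w - base_w) ** 2) / np.sum(base_w ** 2)
    return target_loss + penalty * initial_loss * deviation
```

Because the level shift is removed up front, a solution with unchanged weights incurs zero penalty, and only redistributive movement away from `scaled` is penalized.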
…gence

- Increase REWEIGHT_DEVIATION_PENALTY from 0.0001 to 0.01 for better weight stability (99x stronger regularization toward the original weights)
- Increase max L-BFGS iterations from 200 to 800; in practice convergence typically occurs well before 800 steps
- Replace the loss-change convergence criterion (abs(prev - curr) < 1e-12) with a gradient-norm criterion (grad_norm < 1e-5), which is the proper first-order optimality condition and avoids false convergence when the Hessian approximation is poor
- Add grad_norm to the per-step console output alongside the loss
- Improve GPU status messages to cover all four cases: enabled, requested but unavailable, available but disabled by the user, and not available
- Remove the console NaN-check loop (replaced by unit-test coverage)
- Extract impossible-target filtering into a _drop_impossible_targets() helper and add unit tests for it in tests/test_reweight.py

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
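The gradient-norm stopping rule can be illustrated on a toy quadratic. This sketch uses plain NumPy gradient descent rather than the PR's actual L-BFGS loop; only the tolerances (`1e-5`, the 800-step cap) mirror the commit message, and `A`, `b`, and `lr` are invented.

```python
import numpy as np

# Toy problem: minimize f(w) = 0.5 * ||A w - b||^2.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
w = np.zeros(2)
lr = 0.1

for step in range(800):                  # max iterations, as in the PR
    grad = A.T @ (A @ w - b)             # gradient of the loss
    grad_norm = np.linalg.norm(grad)
    if grad_norm < 1e-5:                 # first-order optimality test
        break                            # replaces abs(prev - curr) < 1e-12
    w -= lr * grad
```

Unlike the loss-change test, a small `grad_norm` certifies an (approximate) stationary point even on flat regions where successive losses barely change, which is exactly where a poor Hessian approximation would otherwise trigger false convergence.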
Replace print("WARNING: ...") in _drop_impossible_targets with
warnings.warn(..., UserWarning) so the alert surfaces in pytest output
rather than being lost in console noise.
Update test_drop_impossible_targets_removes_all_zero_column to use
pytest.warns(UserWarning) to explicitly verify the warning is raised.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove 4 all-zero target variables (estate_income, estate_losses, rent_and_royalty_net_income, rent_and_royalty_net_losses) from the reweighting optimizer, since tc_to_soi() hardcodes them to zero.

Switch tests to np.allclose default tolerances (rtol=1e-5, atol=1e-8) with exact machine-generated expected values for test_weights, test_tax_expenditures, and test_imputed_variables.

Add test_no_all_zero_columns_in_real_loss_matrix to verify the fix. Skip test_variable_totals pending issue #410.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
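For reference, `np.allclose` with default tolerances passes when `|actual - expected| <= atol + rtol * |expected|` holds elementwise (rtol=1e-5, atol=1e-8). A quick illustration with made-up values:

```python
import numpy as np

expected = np.array([1.0, 1_000.0])      # stand-in for machine-generated values

# A 5e-6 relative error is inside the default rtol=1e-5 band...
assert np.allclose(expected * (1 + 5e-6), expected)

# ...but a 5e-5 relative error is not.
assert not np.allclose(expected * (1 + 5e-5), expected)
```

Because the tolerance is mostly relative, this comparison scales sensibly from small weights to large aggregate totals without hand-tuning per-test thresholds.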
67469e7 to e8700ec
@donboyd5, The current version of PR #407 fails the "make lint" test:
Ah, you told me to run make format, which I did, but not make lint. I will do that and resubmit.
On Fri, Feb 20, 2026 at 9:06 AM, Martin Holmer wrote:
```
(base) TMD> ./gitpr 407
remote: Enumerating objects: 48, done.
remote: Counting objects: 100% (48/48), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 35 (delta 29), reused 35 (delta 29), pack-reused 0 (from 0)
Unpacking objects: 100% (35/35), 12.08 KiB | 562.00 KiB/s, done.
From https://github.com/PSLmodels/tax-microdata-benchmarking
 * [new ref]  refs/pull/407/head -> pr-407
Switched to branch 'pr-407'
On branch pr-407
(base) TMD> make format
black . -l 79
All done! ✨ 🍰 ✨
45 files left unchanged.
(base) TMD> make lint
************* Module tests.test_reweight
tests/test_reweight.py:70:4: C0415: Import outside toplevel (warnings) (import-outside-toplevel)
make: *** [lint] Error 16
```
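pylint's C0415 flags an `import` statement placed inside a function body; the fix is simply hoisting it to module level. A hypothetical test-file excerpt (the function name and message are invented, not the repo's actual test):

```python
# Module-level import satisfies pylint C0415 (import-outside-toplevel);
# placing "import warnings" inside the test function is what triggered it.
import warnings

def emits_user_warning():
    # Stand-in body: raise the warning the real test checks for.
    warnings.warn("all-zero target column dropped", UserWarning)
```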
@martinholmer, agree 100%; I have made a note of it. I've fixed the linting error and pushed the update. Hopefully it is ready for review now. We won't know until you run it locally whether the CPU results on your machine are close enough to the GPU results on my machine to pass all the tests, but I hope so. They should certainly be extremely close, and it would be nice to pass without increasing any tolerances.
@donboyd5, Thanks for all your work on PR #407. Yes, the test results are much closer on my Apple M4 computer, but there are a few small differences that have more to do with the test logic than with the new reweighting algorithm. Why don't you merge PR #407 now, and I will create a PR that changes the test tolerances so that the tests pass on my computer. Does that seem like a sensible approach? I'll leave another comment on this PR reporting my results.
@martinholmer, yes, thanks!
@donboyd5, Here are the results on an Apple M4 computer:

And here are the three sets of test failures:
@martinholmer thanks!
Summary
Supersedes #403 (rebased on current master).
Improves the reweighting optimization to achieve reliable convergence:
- `REWEIGHT_DEVIATION_PENALTY` increased from `0.0001` to `0.01` (100x), providing sufficient regularization toward a unique solution
- `max_lbfgs_iter` increased from `200` to `800` (converges at ~332 on GPU, ~391 on CPU)
- Replaced the `loss_change < 1e-12` convergence criterion with `grad_norm < 1e-5` (proper first-order optimality condition, avoids false convergence)
- `_drop_impossible_targets()` helper that filters out all-zero columns before optimization, with `warnings.warn()` alerts

Results
- `tmd.csv.gz` (within same machine)

Files changed
- `tmd/imputation_assumptions.py`: `REWEIGHT_DEVIATION_PENALTY`: `0.0001` → `0.01`
- `tmd/utils/reweight.py`: `_drop_impossible_targets()`, GPU messaging
- `tests/test_reweight.py`: tests for `_drop_impossible_targets()`

Commits
- `e8a85be` Improve reweighting: L-BFGS optimizer, float64, L2 penalty, impossible target filtering
- `5624795` Improve reweighting: higher penalty, more iterations, gradient convergence
- `67469e7` Use warnings.warn() for impossible target alert; verify in test

🤖 Generated with Claude Code