Improve reweighting: higher penalty, more iterations, gradient convergence#407

Merged
donboyd5 merged 5 commits into master from pr-improve-reweighting on Feb 20, 2026

Conversation

donboyd5 (Collaborator) commented Feb 19, 2026

Summary

Supersedes #403 (rebased on current master).

Improves the reweighting optimization to achieve reliable convergence:

  • Higher deviation penalty: REWEIGHT_DEVIATION_PENALTY increased from 0.0001 to 0.01 (100x), providing sufficient regularization toward a unique solution
  • More iterations: max_lbfgs_iter increased from 200 to 800 (converges at ~332 on GPU, ~391 on CPU)
  • Gradient-norm convergence: replaced the loss_change < 1e-12 stopping rule with grad_norm < 1e-5, a proper first-order optimality condition that avoids false convergence (a sketch follows this list)
  • Impossible target filtering: Extracted _drop_impossible_targets() helper that filters out all-zero columns before optimization, with warnings.warn() alerts
  • GPU availability messaging: 4-case diagnostic output for GPU status
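
A minimal sketch of the gradient-norm stopping rule, assuming a PyTorch L-BFGS setup; the names optimize, loss_fn, and params are illustrative, not the actual tmd/utils/reweight.py code:

    import torch

    # Sketch: one L-BFGS iteration per outer step, stopping on the gradient
    # norm (first-order optimality) rather than on loss stagnation.
    def optimize(loss_fn, params, max_steps=800, grad_tol=1e-5):
        opt = torch.optim.LBFGS(
            [params], max_iter=1, line_search_fn="strong_wolfe"
        )

        def closure():
            opt.zero_grad()
            loss = loss_fn(params)
            loss.backward()
            return loss

        for step in range(1, max_steps + 1):
            loss = opt.step(closure)
            grad_norm = params.grad.norm().item()
            print(f"step {step}: loss={loss.item():.10f}, grad={grad_norm:.2e}")
            if grad_norm < grad_tol:
                print(f"converged at step {step} (grad norm {grad_norm:.2e} < {grad_tol})")
                break
        return params

A loss-change test such as loss_change < 1e-12 can fire spuriously when a poor Hessian approximation stalls progress for a single step; the gradient norm only drops below tolerance near a genuine stationary point.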

Results

  • GPU run: converges at step 332, ~77s
  • CPU run: converges at step 391
  • GPU vs CPU: bit-for-bit identical tmd.csv.gz (within the same machine)

Files changed

File                           Change
tmd/imputation_assumptions.py  REWEIGHT_DEVIATION_PENALTY: 0.0001 → 0.01
tmd/utils/reweight.py          higher iteration limit, gradient-norm convergence, _drop_impossible_targets(), GPU messaging
tests/test_reweight.py         new: 3 unit tests for _drop_impossible_targets()

Commits

  • e8a85be Improve reweighting: L-BFGS optimizer, float64, L2 penalty, impossible target filtering
  • 5624795 Improve reweighting: higher penalty, more iterations, gradient convergence
  • 67469e7 Use warnings.warn() for impossible target alert; verify in test

🤖 Generated with Claude Code

donboyd5 (Collaborator, Author) commented Feb 19, 2026

@martinholmer, I erroneously pushed the updated PR to the origin fork, so I've opened this new one.

As noted in PR 403, could you please run make clean && make data on your computer and compare to results in Google Drive subfolder tmd_2026-02-19_lbfgs_gpu_800iters_p010_gradnorm_prupdate? They should (I hope) be extremely close.

If we're looking good, I still need to make changes so it passes the tests, then revise, resubmit, and merge. I will fix the targets file so that it does not try to target all-zero columns, and change expected values for other tests, unless you recommend otherwise.

martinholmer (Collaborator) commented

@donboyd5 said in PR #407:

As noted in #403, could you please run make clean && make data on your computer and compare to results in Google Drive subfolder tmd_2026-02-19_lbfgs_gpu_800iters_p010_gradnorm_prupdate? They should (I hope) be extremely close.

I can't do that until you update PR #407 for recent changes on the master branch (notably the bug fix in PR #408 and the test updates in PR #409). Plus, your PR #407 fails the tests on GitHub, so you need to fix that as well (by running "make format" on your computer).

donboyd5 and others added 4 commits on February 19, 2026 at 17:09

Improve reweighting: L-BFGS optimizer, float64, L2 penalty, impossible target filtering

Major changes to the national reweighting optimizer:

1. Drop impossible targets: automatically filter out targets where the
   data column is all-zero (8 targets for estate income/losses and
   rent & royalty net income/losses). These caused the loss to plateau
   at ~8.0. Now 550 targets instead of 558.

2. Switch from float32 to float64: eliminates floating-point precision issues
   that caused cross-machine non-determinism on the flat loss surface.

3. Run reweighting in a subprocess: isolates from PyTorch autograd state
   left by PolicyEngine Microsimulation, which shifted gradient
   accumulation order by 1 ULP, compounding over many iterations.

4. Pre-scale weights: multiply all weights so the weighted filer total
   matches the SOI target before optimization. Ensures the L2 deviation
   penalty only measures redistributive changes, not the level shift.

5. Enable L2 weight deviation penalty (default 0.0001): penalizes
   sum((new - original)^2) / sum(original^2), scaled by the initial
   loss value. Reduces extreme weight distortion while maintaining
   excellent target accuracy (a sketch of items 4 and 5 follows this
   commit message).

6. Switch from Adam to the L-BFGS optimizer: quasi-Newton method with strong
   Wolfe line search. Dramatically better convergence: 549/550 targets
   within 0.1% (vs 523/550 with Adam at same penalty). GPU and CPU
   produce nearly identical results.

7. Extract build_loss_matrix() to a module-level function for reuse.

8. Add diagnostic output: penalty value, target accuracy statistics,
   weight change distribution, and reproducibility fingerprint for
   cross-machine comparison.

Remove TensorBoard and tqdm dependencies (no longer needed with L-BFGS).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
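
Items 4 and 5 above, sketched under assumed names (prescale and deviation_penalty are illustrative, not the repository's actual functions):

    import torch

    def prescale(weights: torch.Tensor, target_total: float) -> torch.Tensor:
        # Scale all weights so the weighted total matches the SOI target
        # before optimization; the L2 penalty then measures only
        # redistributive changes, not the level shift.
        return weights * (target_total / weights.sum())

    def deviation_penalty(new_w, orig_w, initial_loss, penalty=0.0001):
        # sum((new - original)^2) / sum(original^2), scaled by the initial
        # loss value; this PR later raises the default penalty to 0.01.
        ratio = ((new_w - orig_w) ** 2).sum() / (orig_w ** 2).sum()
        return penalty * initial_loss * ratio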
Improve reweighting: higher penalty, more iterations, gradient convergence

- Increase REWEIGHT_DEVIATION_PENALTY from 0.0001 to 0.01 for better
  weight stability (100x stronger regularization toward the original weights)
- Increase max L-BFGS iterations from 200 to 800; in practice convergence
  typically occurs well before 800 steps
- Replace loss-change convergence criterion (abs(prev-curr) < 1e-12) with
  gradient-norm criterion (grad_norm < 1e-5), which is the proper first-order
  optimality condition and avoids false convergence when the Hessian
  approximation is poor
- Add grad_norm to per-step console output alongside loss
- Improve GPU status messages to cover all four cases: enabled, requested
  but unavailable, available but disabled by user, and not available
- Remove console NaN-check loop (replaced by unit test coverage)
- Extract impossible-target filtering into a _drop_impossible_targets() helper
  (sketched after this commit message) and add unit tests for it in
  tests/test_reweight.py

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
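
A minimal sketch of what the _drop_impossible_targets() helper might look like, given the behavior described above; the actual signature in tmd/utils/reweight.py may differ:

    import warnings
    import numpy as np

    def _drop_impossible_targets(loss_matrix, targets, names):
        # Drop targets whose data column is all zero: no reweighting can
        # hit a nonzero target when every record contributes zero to it.
        keep = ~np.all(loss_matrix == 0, axis=0)
        for name in np.asarray(names)[~keep]:
            warnings.warn(
                f"dropping impossible target {name!r} (all-zero data column)",
                UserWarning,
            )
        return loss_matrix[:, keep], np.asarray(targets)[keep]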
Replace print("WARNING: ...") in _drop_impossible_targets with
warnings.warn(..., UserWarning) so the alert surfaces in pytest output
rather than being lost in console noise.

Update test_drop_impossible_targets_removes_all_zero_column to use
pytest.warns(UserWarning) to explicitly verify the warning is raised.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
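
The test pattern described above, as a sketch (the repository's actual test body may differ; the import path follows the files-changed table):

    import numpy as np
    import pytest

    from tmd.utils.reweight import _drop_impossible_targets

    def test_drop_impossible_targets_removes_all_zero_column():
        # The second column is all zero, so it should be dropped with a warning.
        matrix = np.array([[1.0, 0.0], [2.0, 0.0]])
        with pytest.warns(UserWarning):
            kept, targets = _drop_impossible_targets(
                matrix, targets=[10.0, 5.0], names=["a", "b"]
            )
        assert kept.shape == (2, 1)
        assert list(targets) == [10.0]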
Remove 4 all-zero target variables (estate_income, estate_losses,
rent_and_royalty_net_income, rent_and_royalty_net_losses) from the
reweighting optimizer since tc_to_soi() hardcodes them to zero.

Switch tests to np.allclose default tolerances (rtol=1e-5, atol=1e-8)
with exact machine-generated expected values for: test_weights,
test_tax_expenditures, test_imputed_variables.

Add test_no_all_zero_columns_in_real_loss_matrix to verify the fix.
Skip test_variable_totals pending issue #410.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
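
For reference, np.allclose(actual, expected) passes when |actual - expected| <= atol + rtol * |expected|, with defaults rtol=1e-5 and atol=1e-8. The weight-mean difference reported in the test output later in this thread is a concrete case:

    import numpy as np

    # Values taken from the test_weights failure reported below.
    actual, expected = 815.5412085349112, 815.5521277934885
    np.allclose(actual, expected)  # False: relative gap ~1.3e-5 exceeds rtol=1e-5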
donboyd5 force-pushed the pr-improve-reweighting branch from 67469e7 to e8700ec on February 20, 2026 00:28
martinholmer (Collaborator) commented

@donboyd5, the current version of PR #407 fails the "make lint" test:

(base) TMD> ./gitpr 407
remote: Enumerating objects: 48, done.
remote: Counting objects: 100% (48/48), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 35 (delta 29), reused 35 (delta 29), pack-reused 0 (from 0)
Unpacking objects: 100% (35/35), 12.08 KiB | 562.00 KiB/s, done.
From https://github.com/PSLmodels/tax-microdata-benchmarking
 * [new ref]         refs/pull/407/head -> pr-407
Switched to branch 'pr-407'
On branch pr-407

(base) TMD> make format
black . -l 79
All done! ✨ 🍰 ✨
45 files left unchanged.

(base) TMD> make lint  
************* Module tests.test_reweight
tests/test_reweight.py:70:4: C0415: Import outside toplevel (warnings) (import-outside-toplevel)
make: *** [lint] Error 16

donboyd5 (Collaborator, Author) commented Feb 20, 2026 via email

martinholmer (Collaborator) commented

@donboyd5 said in PR #407:

Ah, you told me to do "make format" which I did, but not "make lint".
I will do that and resubmit.

Thanks. Doing "make lint" on all your PRs seems important given that "make lint" identified a serious bug in the code from the original project.

donboyd5 (Collaborator, Author) commented Feb 20, 2026

@martinholmer agree 100%. Have made a note of it.

I've fixed the linting error and pushed the update. Hopefully, it is ready for review now.

We won't know until you run it locally whether the CPU results on your machine are close enough to the GPU results on my machine to pass all tests, but I hope so. They certainly should be extremely close, and it would be nice to pass without increasing any tolerances.

martinholmer (Collaborator) commented

@donboyd5, Thanks for all your work on PR #407. Yes, the test results are much closer on my Apple M4 computer, but there are a few small differences that have more to do with test logic than the new reweighting algorithm. Why don't you merge PR #407 now and I will create a PR that changes the test tolerances so that the tests pass on my computer. Does that seem like a sensible approach?

I'll leave another comment to this PR that reports my results.

donboyd5 (Collaborator, Author) commented

@martinholmer, yes, thanks!

donboyd5 merged commit bfffed1 into master on Feb 20, 2026
martinholmer (Collaborator) commented

@donboyd5, here are the results on an Apple M4 computer:

(base) TMD> git branch
  master
* pr-407

(base) TMD> make clean ; make test
rm -f tmd/storage/output/tmd*
rm -f tmd/storage/output/cached*
rm -f tmd/storage/output/preimpute_tmd.csv.gz
python tmd/create_taxcalc_input_variables.py
Creating 2021 PUF+CPS file assuming:
  IMPUTATION_RF_RNG_SEED = 1928374
  IMPUTATION_BETA_RNG_SEED = 37465
  ASSUMED ITMDED_GROW_RATE = 0.020
  ASSUMED W2_WAGES_SCALE = 0.15000
  WEIGHT_DEVIATION_PENALTY = 0.010
  ASSUMED CPS_WEIGHTS_SCALE = 0.58060
Skipping CPS previous year income imputation given lack of data...
Importing PolicyEngine-US variable metadata...
Uprating PUF from 2015 to 2021...
Pre-processing PUF...
Imputing missing PUF demographics...
Constructing hierarchical PUF: 100%|█████████████████████████████████████████| 207692/207692 [00:15<00:00, 13789.76it/s]
Creating tc dataset from 'PUF 2021' for year 2021...
Creating tc dataset from 'CPS 2021' for year 2021...
Combining PUF filers and CPS nonfilers...
Adding Tax-Calculator outputs for 2021...
Reweighting...
...reweighting for year 2021
...weight deviation penalty: 0.01
...weight multiplier bounds: [0.1, 10.0]
...GPU requested but not available, using CPU
...pre-scaled weights: target filers=160,824,340, current filers=161,180,573, scale=0.997790
Targeting 550 SOI statistics
...input records: 225256, columns: 212
...input weights: total=183102955.22, mean=812.866051, sdev=733.258325
...initial loss: 30.1622806381
...starting L-BFGS optimization (up to 800 steps)
    step    1: loss=1.4427306263, grad=7.31e-02
    step    2: loss=0.3713031445, grad=1.35e-02
    step    3: loss=0.2004057567, grad=6.96e-03
    step    4: loss=0.1515716113, grad=5.17e-03
    step    5: loss=0.1404665138, grad=2.23e-03
    step   10: loss=0.1281016441, grad=1.18e-03
    step   20: loss=0.1184997899, grad=5.72e-04
    step   30: loss=0.1164858462, grad=7.42e-04
    step   40: loss=0.1153402114, grad=3.06e-04
    step   50: loss=0.1146045623, grad=1.95e-04
    step   60: loss=0.1141314363, grad=3.22e-04
    step   70: loss=0.1138145214, grad=2.28e-04
    step   80: loss=0.1136413947, grad=1.16e-04
    step   90: loss=0.1135139809, grad=2.49e-04
    step  100: loss=0.1133615669, grad=9.88e-05
    step  110: loss=0.1132213482, grad=9.52e-05
    step  120: loss=0.1131482203, grad=1.46e-04
    step  130: loss=0.1130826892, grad=7.22e-05
    step  140: loss=0.1130244487, grad=9.86e-05
    step  150: loss=0.1129824139, grad=5.02e-05
    step  160: loss=0.1129526712, grad=7.28e-05
    step  170: loss=0.1129196719, grad=6.77e-05
    step  180: loss=0.1128822572, grad=8.37e-05
    step  190: loss=0.1128465989, grad=4.78e-05
    step  200: loss=0.1128207649, grad=2.90e-05
    step  210: loss=0.1127974399, grad=3.10e-05
    step  220: loss=0.1127707716, grad=6.36e-05
    step  230: loss=0.1127379391, grad=4.46e-05
    step  240: loss=0.1127238775, grad=5.05e-05
    step  250: loss=0.1127135144, grad=2.12e-05
    step  260: loss=0.1127061755, grad=2.29e-05
    step  270: loss=0.1126975333, grad=5.53e-05
    step  280: loss=0.1126902734, grad=2.28e-05
    step  290: loss=0.1126833862, grad=1.72e-05
    step  300: loss=0.1126771068, grad=1.70e-05
    step  310: loss=0.1126693816, grad=4.69e-05
    step  320: loss=0.1126622826, grad=2.05e-05
    step  330: loss=0.1126564082, grad=3.85e-05
    step  340: loss=0.1126497881, grad=2.26e-05
    step  350: loss=0.1126432003, grad=3.20e-05
    step  360: loss=0.1126367467, grad=1.93e-05
    step  370: loss=0.1126301489, grad=1.29e-05
    step  380: loss=0.1126251975, grad=4.44e-05
    step  390: loss=0.1126215176, grad=9.70e-06
    converged at step 390 (grad norm 9.70e-06 < 1e-5)
...optimization completed in 1028.2 seconds (390 steps)
...final loss: 0.1126215176
...final weights: total=183705550.49, mean=815.541209, sdev=961.786509
...target accuracy (550 targets):
    mean |relative error|: 0.001207
    max  |relative error|: 0.085798
    within   0.1%:  414/550 (75.3%)
    within   1.0%:  540/550 (98.2%)
    within   5.0%:  549/550 (99.8%)
    within  10.0%:  550/550 (100.0%)
    worst targets:
        8.580% | unemployment compensation/count/AGI in -inf-inf/all returns/All
        2.606% | unemployment compensation/total/AGI in -inf-inf/all returns/All
        2.019% | employment income/total/AGI in 15k-20k/all returns/All
        1.905% | count/count/AGI in -inf-0/all returns/Single
        1.503% | employment income/count/AGI in 10k-15k/all returns/All
        1.374% | count/count/AGI in -inf-0/all returns/All
        1.100% | employment income/count/AGI in 15k-20k/all returns/All
        1.092% | adjusted gross income/total/AGI in 15k-20k/all returns/Single
        1.074% | employment income/total/AGI in 20k-25k/all returns/All
        1.011% | employment income/total/AGI in 50k-75k/all returns/All
...weight changes (vs pre-optimization weights):
    weight ratio (new/original):
      min=0.099779, p5=0.099779, median=0.893403, p95=3.693994, max=9.977898
    distribution of |% change|:
          <0.01%:      15 (0.0%)
       0.01-0.1%:     180 (0.1%)
          0.1-1%:  19,941 (8.9%)
            1-5%:  12,263 (5.4%)
           5-10%:  17,429 (7.7%)
         10-100%: 147,222 (65.4%)
           >100%:  28,206 (12.5%)
...REPRODUCIBILITY FINGERPRINT:
    weights: n=225256, total=183705550.494724, mean=815.541209, sdev=961.786509
    weights: min=0.107695, p25=15.681872, p50=578.096774, p75=1364.337176, max=16527.649527
    sum(weights^2)=358188745118.965576
    final loss: 0.1126215176
...reweighting finished
Removing output variables from PUF+CPS DataFrame...
Writing PUF+CPS file... [/Users/mrh/TMD/tmd/storage/output/tmd.csv.gz]
python tmd/create_taxcalc_sampling_weights.py
python tmd/create_taxcalc_growth_factors.py
python tmd/create_taxcalc_cached_files.py
python tmd/create_taxcalc_imputed_variables.py
Preparing data for imputation...
Imputing overtime and tip data from SIPP to TMD...
Imputing auto loan interest data from CEX to TMD...
Writing preimpute TMD file... [/Users/mrh/TMD/tmd/storage/output/preimpute_tmd.csv.gz]
Writing augmented TMD file... [/Users/mrh/TMD/tmd/storage/output/tmd.csv.gz]

And here are the three sets of test failures:

E           ValueError: 
E           ACT-vs-EXP TAX EXPENDITURE DIFFERENCES:
E           *** actual
E           --- expect
E           ***************
E           *** 1,2 ****
E           ! YR,KIND,EST= 2023 paytax 1381.9
E           ! YR,KIND,EST= 2023 iitax 2237.4
E           --- 1,2 ----
E           ! YR,KIND,EST= 2023 paytax 1381.8
E           ! YR,KIND,EST= 2023 iitax 2237.2
E           ***************
E           *** 7 ****
E           ! YR,KIND,EST= 2023 cgqd_tax_preference 174.5
E           --- 7 ----
E           ! YR,KIND,EST= 2023 cgqd_tax_preference 174.4
E tests/test_tax_expenditures.py:52: ValueError
E           ValueError: 
E           WEIGHT VARIABLE ACT-vs-EXP DIFFS:
E           WEIGHT_DIFF:mean,act,exp= 815.5412085349112 815.5521277934885
E           WEIGHT_DIFF:sdev,act,exp= 961.7865088550712 961.7270821801824
E tests/test_weights.py:28: ValueError
E           ValueError: 
E           IMPUTED VARIABLE DEDUCTION BENEFIT ACT-vs-EXP DIFFS:
E           DIFF:OTM,totben,act,exp= 23.94 23.95
E tests/test_imputed_variables.py:143: ValueError

donboyd5 (Collaborator, Author) commented

@martinholmer thanks!
