Improve reweighting: higher penalty, more iterations, gradient convergence#407

Merged
donboyd5 merged 5 commits into master from pr-improve-reweighting on Feb 20, 2026

Conversation

donboyd5 (Collaborator) commented Feb 19, 2026

Summary

Supersedes #403 (rebased on current master).

Improves the reweighting optimization to achieve reliable convergence:

  • Higher deviation penalty: REWEIGHT_DEVIATION_PENALTY increased from 0.0001 to 0.01 (100x), providing sufficient regularization toward a unique solution
  • More iterations: max_lbfgs_iter increased from 200 to 800 (converges at ~332 on GPU, ~391 on CPU)
  • Gradient-norm convergence: replaced the loss_change < 1e-12 stopping rule with grad_norm < 1e-5, a proper first-order optimality condition that avoids false convergence (a sketch follows this list)
  • Impossible target filtering: Extracted _drop_impossible_targets() helper that filters out all-zero columns before optimization, with warnings.warn() alerts
  • GPU availability messaging: 4-case diagnostic output for GPU status
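
A minimal sketch of the gradient-norm stopping rule, assuming a PyTorch L-BFGS setup; the names optimize, loss_fn, and params are illustrative, not the actual tmd/utils/reweight.py code:

    import torch

    # Sketch: one L-BFGS iteration per outer step, stopping on the gradient
    # norm (first-order optimality) rather than on loss stagnation.
    def optimize(loss_fn, params, max_steps=800, grad_tol=1e-5):
        opt = torch.optim.LBFGS(
            [params], max_iter=1, line_search_fn="strong_wolfe"
        )

        def closure():
            opt.zero_grad()
            loss = loss_fn(params)
            loss.backward()
            return loss

        for step in range(1, max_steps + 1):
            loss = opt.step(closure)
            grad_norm = params.grad.norm().item()
            print(f"step {step}: loss={loss.item():.10f}, grad={grad_norm:.2e}")
            if grad_norm < grad_tol:
                print(f"converged at step {step} (grad norm {grad_norm:.2e} < {grad_tol})")
                break
        return params

A loss-change test such as loss_change < 1e-12 can fire spuriously when a poor Hessian approximation stalls progress for a single step; the gradient norm only drops below tolerance near a genuine stationary point.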

Results

  • GPU run: converges at step 332, ~77s
  • CPU run: converges at step 391
  • GPU vs CPU: bit-for-bit identical tmd.csv.gz (within the same machine)

Files changed

File                           Change
tmd/imputation_assumptions.py  REWEIGHT_DEVIATION_PENALTY: 0.0001 → 0.01
tmd/utils/reweight.py          higher iteration limit, gradient-norm convergence, _drop_impossible_targets(), GPU messaging
tests/test_reweight.py         new: 3 unit tests for _drop_impossible_targets()

Commits

  • e8a85be Improve reweighting: L-BFGS optimizer, float64, L2 penalty, impossible target filtering
  • 5624795 Improve reweighting: higher penalty, more iterations, gradient convergence
  • 67469e7 Use warnings.warn() for impossible target alert; verify in test

🤖 Generated with Claude Code

donboyd5 (Collaborator, Author) commented Feb 19, 2026

@martinholmer, I erroneously pushed the updated PR to the origin fork, so I've opened this new one.

As noted in PR 403, could you please run make clean && make data on your computer and compare to results in Google Drive subfolder tmd_2026-02-19_lbfgs_gpu_800iters_p010_gradnorm_prupdate? They should (I hope) be extremely close.

If we're looking good, I still need to make changes so it passes the tests, then revise, resubmit, and merge. I will fix the targets file so that it does not try to target all-zero columns, and change expected values for other tests, unless you recommend otherwise.

martinholmer (Collaborator) commented

@donboyd5 said in PR #407:

As noted in #403, could you please run make clean && make data on your computer and compare to results in Google Drive subfolder tmd_2026-02-19_lbfgs_gpu_800iters_p010_gradnorm_prupdate? They should (I hope) be extremely close.

I can't do that until you update PR #407 for recent changes on the master branch (notably the bug fix in PR #408 and the test updates in PR #409). Plus, your PR #407 fails the tests on GitHub, so you need to fix that as well (by running "make format" on your computer).

donboyd5 and others added 4 commits on February 19, 2026 at 17:09

Improve reweighting: L-BFGS optimizer, float64, L2 penalty, impossible target filtering

Major changes to the national reweighting optimizer:

1. Drop impossible targets: automatically filter out targets where the
   data column is all-zero (8 targets for estate income/losses and
   rent & royalty net income/losses). These caused the loss to plateau
   at ~8.0. Now 550 targets instead of 558.

2. Switch from float32 to float64: eliminates floating-point precision issues
   that caused cross-machine non-determinism on the flat loss surface.

3. Run reweighting in a subprocess: isolates from PyTorch autograd state
   left by PolicyEngine Microsimulation, which shifted gradient
   accumulation order by 1 ULP, compounding over many iterations.

4. Pre-scale weights: multiply all weights so the weighted filer total
   matches the SOI target before optimization. Ensures the L2 deviation
   penalty only measures redistributive changes, not the level shift.

5. Enable L2 weight deviation penalty (default 0.0001): penalizes
   sum((new - original)^2) / sum(original^2), scaled by the initial
   loss value. Reduces extreme weight distortion while maintaining
   excellent target accuracy (a sketch of items 4 and 5 follows this
   commit message).

6. Switch from Adam to the L-BFGS optimizer: quasi-Newton method with strong
   Wolfe line search. Dramatically better convergence: 549/550 targets
   within 0.1% (vs 523/550 with Adam at same penalty). GPU and CPU
   produce nearly identical results.

7. Extract build_loss_matrix() to a module-level function for reuse.

8. Add diagnostic output: penalty value, target accuracy statistics,
   weight change distribution, and reproducibility fingerprint for
   cross-machine comparison.

Remove TensorBoard and tqdm dependencies (no longer needed with L-BFGS).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
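
Items 4 and 5 above, sketched under assumed names (prescale and deviation_penalty are illustrative, not the repository's actual functions):

    import torch

    def prescale(weights: torch.Tensor, target_total: float) -> torch.Tensor:
        # Scale all weights so the weighted total matches the SOI target
        # before optimization; the L2 penalty then measures only
        # redistributive changes, not the level shift.
        return weights * (target_total / weights.sum())

    def deviation_penalty(new_w, orig_w, initial_loss, penalty=0.0001):
        # sum((new - original)^2) / sum(original^2), scaled by the initial
        # loss value; this PR later raises the default penalty to 0.01.
        ratio = ((new_w - orig_w) ** 2).sum() / (orig_w ** 2).sum()
        return penalty * initial_loss * ratio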
Improve reweighting: higher penalty, more iterations, gradient convergence

- Increase REWEIGHT_DEVIATION_PENALTY from 0.0001 to 0.01 for better
  weight stability (100x stronger regularization toward the original weights)
- Increase max L-BFGS iterations from 200 to 800; in practice convergence
  typically occurs well before 800 steps
- Replace loss-change convergence criterion (abs(prev-curr) < 1e-12) with
  gradient-norm criterion (grad_norm < 1e-5), which is the proper first-order
  optimality condition and avoids false convergence when the Hessian
  approximation is poor
- Add grad_norm to per-step console output alongside loss
- Improve GPU status messages to cover all four cases: enabled, requested
  but unavailable, available but disabled by user, and not available
- Remove console NaN-check loop (replaced by unit test coverage)
- Extract impossible-target filtering into a _drop_impossible_targets() helper
  (sketched after this commit message) and add unit tests for it in
  tests/test_reweight.py

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
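
A minimal sketch of what the _drop_impossible_targets() helper might look like, given the behavior described above; the actual signature in tmd/utils/reweight.py may differ:

    import warnings
    import numpy as np

    def _drop_impossible_targets(loss_matrix, targets, names):
        # Drop targets whose data column is all zero: no reweighting can
        # hit a nonzero target when every record contributes zero to it.
        keep = ~np.all(loss_matrix == 0, axis=0)
        for name in np.asarray(names)[~keep]:
            warnings.warn(
                f"dropping impossible target {name!r} (all-zero data column)",
                UserWarning,
            )
        return loss_matrix[:, keep], np.asarray(targets)[keep]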
Replace print("WARNING: ...") in _drop_impossible_targets with
warnings.warn(..., UserWarning) so the alert surfaces in pytest output
rather than being lost in console noise.

Update test_drop_impossible_targets_removes_all_zero_column to use
pytest.warns(UserWarning) to explicitly verify the warning is raised.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
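
The test pattern described above, as a sketch (the repository's actual test body may differ; the import path follows the files-changed table):

    import numpy as np
    import pytest

    from tmd.utils.reweight import _drop_impossible_targets

    def test_drop_impossible_targets_removes_all_zero_column():
        # The second column is all zero, so it should be dropped with a warning.
        matrix = np.array([[1.0, 0.0], [2.0, 0.0]])
        with pytest.warns(UserWarning):
            kept, targets = _drop_impossible_targets(
                matrix, targets=[10.0, 5.0], names=["a", "b"]
            )
        assert kept.shape == (2, 1)
        assert list(targets) == [10.0]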
Remove 4 all-zero target variables (estate_income, estate_losses,
rent_and_royalty_net_income, rent_and_royalty_net_losses) from the
reweighting optimizer since tc_to_soi() hardcodes them to zero.

Switch tests to np.allclose default tolerances (rtol=1e-5, atol=1e-8)
with exact machine-generated expected values for: test_weights,
test_tax_expenditures, test_imputed_variables.

Add test_no_all_zero_columns_in_real_loss_matrix to verify the fix.
Skip test_variable_totals pending issue #410.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
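
For reference, np.allclose(actual, expected) passes when |actual - expected| <= atol + rtol * |expected|, with defaults rtol=1e-5 and atol=1e-8. The weight-mean difference reported in the test output later in this thread is a concrete case:

    import numpy as np

    # Values taken from the test_weights failure reported below.
    actual, expected = 815.5412085349112, 815.5521277934885
    np.allclose(actual, expected)  # False: relative gap ~1.3e-5 exceeds rtol=1e-5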
donboyd5 force-pushed the pr-improve-reweighting branch from 67469e7 to e8700ec on February 20, 2026 00:28
martinholmer (Collaborator) commented

@donboyd5, the current version of PR #407 fails the "make lint" test:

(base) TMD> ./gitpr 407
remote: Enumerating objects: 48, done.
remote: Counting objects: 100% (48/48), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 35 (delta 29), reused 35 (delta 29), pack-reused 0 (from 0)
Unpacking objects: 100% (35/35), 12.08 KiB | 562.00 KiB/s, done.
From https://github.com/PSLmodels/tax-microdata-benchmarking
 * [new ref]         refs/pull/407/head -> pr-407
Switched to branch 'pr-407'
On branch pr-407

(base) TMD> make format
black . -l 79
All done! ✨ 🍰 ✨
45 files left unchanged.

(base) TMD> make lint  
************* Module tests.test_reweight
tests/test_reweight.py:70:4: C0415: Import outside toplevel (warnings) (import-outside-toplevel)
make: *** [lint] Error 16

donboyd5 (Collaborator, Author) commented Feb 20, 2026 via email

martinholmer (Collaborator) commented

@donboyd5 said in PR #407:

Ah, you told me to do "make format" which I did, but not "make lint".
I will do that and resubmit.

Thanks. Doing "make lint" on all your PRs seems important given that "make lint" identified a serious bug in the code from the original project.

donboyd5 (Collaborator, Author) commented Feb 20, 2026

@martinholmer agree 100%. Have made a note of it.

I've fixed the linting error and pushed the update. Hopefully, it is ready for review now.

We won't know until you run it locally whether the CPU results on your machine are close enough to the GPU results on my machine to pass all tests, but I hope so. They certainly should be extremely close, and it would be nice to pass without increasing any tolerances.

martinholmer (Collaborator) commented

@donboyd5, Thanks for all your work on PR #407. Yes, the test results are much closer on my Apple M4 computer, but there are a few small differences that have more to do with test logic than the new reweighting algorithm. Why don't you merge PR #407 now and I will create a PR that changes the test tolerances so that the tests pass on my computer. Does that seem like a sensible approach?

I'll leave another comment to this PR that reports my results.

donboyd5 (Collaborator, Author) commented

@martinholmer, yes, thanks!

donboyd5 merged commit bfffed1 into master on Feb 20, 2026
martinholmer (Collaborator) commented

@donboyd5, here are the results on an Apple M4 computer:

(base) TMD> git branch
  master
* pr-407

(base) TMD> make clean ; make test
rm -f tmd/storage/output/tmd*
rm -f tmd/storage/output/cached*
rm -f tmd/storage/output/preimpute_tmd.csv.gz
python tmd/create_taxcalc_input_variables.py
Creating 2021 PUF+CPS file assuming:
  IMPUTATION_RF_RNG_SEED = 1928374
  IMPUTATION_BETA_RNG_SEED = 37465
  ASSUMED ITMDED_GROW_RATE = 0.020
  ASSUMED W2_WAGES_SCALE = 0.15000
  WEIGHT_DEVIATION_PENALTY = 0.010
  ASSUMED CPS_WEIGHTS_SCALE = 0.58060
Skipping CPS previous year income imputation given lack of data...
Importing PolicyEngine-US variable metadata...
Uprating PUF from 2015 to 2021...
Pre-processing PUF...
Imputing missing PUF demographics...
Constructing hierarchical PUF: 100%|█████████████████████████████████████████| 207692/207692 [00:15<00:00, 13789.76it/s]
Creating tc dataset from 'PUF 2021' for year 2021...
Creating tc dataset from 'CPS 2021' for year 2021...
Combining PUF filers and CPS nonfilers...
Adding Tax-Calculator outputs for 2021...
Reweighting...
...reweighting for year 2021
...weight deviation penalty: 0.01
...weight multiplier bounds: [0.1, 10.0]
...GPU requested but not available, using CPU
...pre-scaled weights: target filers=160,824,340, current filers=161,180,573, scale=0.997790
Targeting 550 SOI statistics
...input records: 225256, columns: 212
...input weights: total=183102955.22, mean=812.866051, sdev=733.258325
...initial loss: 30.1622806381
...starting L-BFGS optimization (up to 800 steps)
    step    1: loss=1.4427306263, grad=7.31e-02
    step    2: loss=0.3713031445, grad=1.35e-02
    step    3: loss=0.2004057567, grad=6.96e-03
    step    4: loss=0.1515716113, grad=5.17e-03
    step    5: loss=0.1404665138, grad=2.23e-03
    step   10: loss=0.1281016441, grad=1.18e-03
    step   20: loss=0.1184997899, grad=5.72e-04
    step   30: loss=0.1164858462, grad=7.42e-04
    step   40: loss=0.1153402114, grad=3.06e-04
    step   50: loss=0.1146045623, grad=1.95e-04
    step   60: loss=0.1141314363, grad=3.22e-04
    step   70: loss=0.1138145214, grad=2.28e-04
    step   80: loss=0.1136413947, grad=1.16e-04
    step   90: loss=0.1135139809, grad=2.49e-04
    step  100: loss=0.1133615669, grad=9.88e-05
    step  110: loss=0.1132213482, grad=9.52e-05
    step  120: loss=0.1131482203, grad=1.46e-04
    step  130: loss=0.1130826892, grad=7.22e-05
    step  140: loss=0.1130244487, grad=9.86e-05
    step  150: loss=0.1129824139, grad=5.02e-05
    step  160: loss=0.1129526712, grad=7.28e-05
    step  170: loss=0.1129196719, grad=6.77e-05
    step  180: loss=0.1128822572, grad=8.37e-05
    step  190: loss=0.1128465989, grad=4.78e-05
    step  200: loss=0.1128207649, grad=2.90e-05
    step  210: loss=0.1127974399, grad=3.10e-05
    step  220: loss=0.1127707716, grad=6.36e-05
    step  230: loss=0.1127379391, grad=4.46e-05
    step  240: loss=0.1127238775, grad=5.05e-05
    step  250: loss=0.1127135144, grad=2.12e-05
    step  260: loss=0.1127061755, grad=2.29e-05
    step  270: loss=0.1126975333, grad=5.53e-05
    step  280: loss=0.1126902734, grad=2.28e-05
    step  290: loss=0.1126833862, grad=1.72e-05
    step  300: loss=0.1126771068, grad=1.70e-05
    step  310: loss=0.1126693816, grad=4.69e-05
    step  320: loss=0.1126622826, grad=2.05e-05
    step  330: loss=0.1126564082, grad=3.85e-05
    step  340: loss=0.1126497881, grad=2.26e-05
    step  350: loss=0.1126432003, grad=3.20e-05
    step  360: loss=0.1126367467, grad=1.93e-05
    step  370: loss=0.1126301489, grad=1.29e-05
    step  380: loss=0.1126251975, grad=4.44e-05
    step  390: loss=0.1126215176, grad=9.70e-06
    converged at step 390 (grad norm 9.70e-06 < 1e-5)
...optimization completed in 1028.2 seconds (390 steps)
...final loss: 0.1126215176
...final weights: total=183705550.49, mean=815.541209, sdev=961.786509
...target accuracy (550 targets):
    mean |relative error|: 0.001207
    max  |relative error|: 0.085798
    within   0.1%:  414/550 (75.3%)
    within   1.0%:  540/550 (98.2%)
    within   5.0%:  549/550 (99.8%)
    within  10.0%:  550/550 (100.0%)
    worst targets:
        8.580% | unemployment compensation/count/AGI in -inf-inf/all returns/All
        2.606% | unemployment compensation/total/AGI in -inf-inf/all returns/All
        2.019% | employment income/total/AGI in 15k-20k/all returns/All
        1.905% | count/count/AGI in -inf-0/all returns/Single
        1.503% | employment income/count/AGI in 10k-15k/all returns/All
        1.374% | count/count/AGI in -inf-0/all returns/All
        1.100% | employment income/count/AGI in 15k-20k/all returns/All
        1.092% | adjusted gross income/total/AGI in 15k-20k/all returns/Single
        1.074% | employment income/total/AGI in 20k-25k/all returns/All
        1.011% | employment income/total/AGI in 50k-75k/all returns/All
...weight changes (vs pre-optimization weights):
    weight ratio (new/original):
      min=0.099779, p5=0.099779, median=0.893403, p95=3.693994, max=9.977898
    distribution of |% change|:
          <0.01%:      15 (0.0%)
       0.01-0.1%:     180 (0.1%)
          0.1-1%:  19,941 (8.9%)
            1-5%:  12,263 (5.4%)
           5-10%:  17,429 (7.7%)
         10-100%: 147,222 (65.4%)
           >100%:  28,206 (12.5%)
...REPRODUCIBILITY FINGERPRINT:
    weights: n=225256, total=183705550.494724, mean=815.541209, sdev=961.786509
    weights: min=0.107695, p25=15.681872, p50=578.096774, p75=1364.337176, max=16527.649527
    sum(weights^2)=358188745118.965576
    final loss: 0.1126215176
...reweighting finished
Removing output variables from PUF+CPS DataFrame...
Writing PUF+CPS file... [/Users/mrh/TMD/tmd/storage/output/tmd.csv.gz]
python tmd/create_taxcalc_sampling_weights.py
python tmd/create_taxcalc_growth_factors.py
python tmd/create_taxcalc_cached_files.py
python tmd/create_taxcalc_imputed_variables.py
Preparing data for imputation...
Imputing overtime and tip data from SIPP to TMD...
Imputing auto loan interest data from CEX to TMD...
Writing preimpute TMD file... [/Users/mrh/TMD/tmd/storage/output/preimpute_tmd.csv.gz]
Writing augmented TMD file... [/Users/mrh/TMD/tmd/storage/output/tmd.csv.gz]

And here are the three sets of test failures:

E           ValueError: 
E           ACT-vs-EXP TAX EXPENDITURE DIFFERENCES:
E           *** actual
E           --- expect
E           ***************
E           *** 1,2 ****
E           ! YR,KIND,EST= 2023 paytax 1381.9
E           ! YR,KIND,EST= 2023 iitax 2237.4
E           --- 1,2 ----
E           ! YR,KIND,EST= 2023 paytax 1381.8
E           ! YR,KIND,EST= 2023 iitax 2237.2
E           ***************
E           *** 7 ****
E           ! YR,KIND,EST= 2023 cgqd_tax_preference 174.5
E           --- 7 ----
E           ! YR,KIND,EST= 2023 cgqd_tax_preference 174.4
E tests/test_tax_expenditures.py:52: ValueError
E           ValueError: 
E           WEIGHT VARIABLE ACT-vs-EXP DIFFS:
E           WEIGHT_DIFF:mean,act,exp= 815.5412085349112 815.5521277934885
E           WEIGHT_DIFF:sdev,act,exp= 961.7865088550712 961.7270821801824
E tests/test_weights.py:28: ValueError
E           ValueError: 
E           IMPUTED VARIABLE DEDUCTION BENEFIT ACT-vs-EXP DIFFS:
E           DIFF:OTM,totben,act,exp= 23.94 23.95
E tests/test_imputed_variables.py:143: ValueError

donboyd5 (Collaborator, Author) commented

@martinholmer thanks!
