Improve reweighting: L-BFGS optimizer, float64, L2 penalty #403
Conversation
…e target filtering

Major changes to the national reweighting optimizer:

1. Drop impossible targets: automatically filter out targets where the data column is all-zero (8 targets for estate income/losses and rent & royalty net income/losses). These caused the loss to plateau at ~8.0. Now 550 targets instead of 558.
2. Switch from float32 to float64: eliminates floating-point precision issues that caused cross-machine non-determinism on the flat loss surface.
3. Run reweighting in a subprocess: isolates it from PyTorch autograd state left by PolicyEngine Microsimulation, which shifted gradient accumulation order by 1 ULP, compounding over many iterations.
4. Pre-scale weights: multiply all weights so the weighted filer total matches the SOI target before optimization. This ensures the L2 deviation penalty only measures redistributive changes, not the level shift.
5. Enable an L2 weight deviation penalty (default 0.0001): penalizes sum((new - original)^2) / sum(original^2), scaled by the initial loss value. Reduces extreme weight distortion while maintaining excellent target accuracy.
6. Switch from Adam to the L-BFGS optimizer: a quasi-Newton method with strong Wolfe line search (sketched below). Dramatically better convergence: 549/550 targets within 0.1% (vs 523/550 with Adam at the same penalty). GPU and CPU produce nearly identical results.
7. Extract build_loss_matrix() to a module-level function for reuse.
8. Add diagnostic output: penalty value, target accuracy statistics, weight change distribution, and a reproducibility fingerprint for cross-machine comparison.

Remove TensorBoard and tqdm dependencies (no longer needed with L-BFGS).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
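For readers who want to see the shape of changes 5 and 6, here is a minimal sketch of an L-BFGS loop with the L2 weight-deviation penalty. The names (loss_matrix, targets, original_weights) and the direct optimization of the weight vector (no bounds handling, no diagnostics) are illustrative assumptions, not the actual tmd/utils/reweight.py code.

```python
# Illustrative sketch only (not the actual tmd/utils/reweight.py implementation).
# Shows an L-BFGS loop in float64 with the L2 penalty on deviations from the
# original weights, scaled by the initial target loss.
import torch

def reweight_sketch(loss_matrix, targets, original_weights,
                    penalty=1e-4, max_iter=200):
    A = torch.as_tensor(loss_matrix, dtype=torch.float64)        # (records, n_targets)
    t = torch.as_tensor(targets, dtype=torch.float64)            # (n_targets,)
    w0 = torch.as_tensor(original_weights, dtype=torch.float64)  # pre-scaled weights
    w = w0.clone().requires_grad_(True)

    def target_loss(weights):
        rel_err = (weights @ A) / t - 1.0                         # relative miss per target
        return (rel_err ** 2).sum()

    with torch.no_grad():
        loss0 = target_loss(w0)                                   # scale for the penalty term

    opt = torch.optim.LBFGS([w], max_iter=max_iter,
                            line_search_fn="strong_wolfe")

    def closure():
        opt.zero_grad()
        deviation = ((w - w0) ** 2).sum() / (w0 ** 2).sum()       # L2 weight-deviation penalty
        loss = target_loss(w) + penalty * loss0 * deviation
        loss.backward()
        return loss

    opt.step(closure)
    return w.detach()
```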
@martinholmer, could you please review this draft PR, which addresses #400? Please run it on your machine with

The key changes that probably help make results cross-machine-reproducible are (1) adding a regularization term, a penalty for deviations from the pre-optimization weights (biggest impact), which should lead the optimizer to a unique solution among the many possible solutions, and (2) moving from float32 to float64. I think the regularization term should make the results much better (more plausible and better representations of the real world) in addition to being more reproducible. I was always frustrated that we didn't use regularization before. (It was not working well, likely because it used an L1 loss function; this PR changes the weight deviation penalty to an L2 loss.)

Running it will probably take more time than before, because it does so much more. With a GPU, it runs in about 45 seconds on my machine, up from 13 seconds before. With CPU only (the automatic fallback), it runs in about 13 minutes, which is OK, I guess. If your machine is older and doesn't have a GPU, it may take longer. It runs for 200 iterations.

It currently fails several tests, as noted above. That will have to be fixed. I think the right solution is to change the expectations, but could you please weigh in on that? I can update the expectations if we agree that's the way to go.
Thanks, @martinholmer. Done.
@donboyd5, I downloaded PR #403 and merged in recent changes on the master branch (which is something you should do on your computer), and then executed "make clean ; make test".

Suggestion

One thing I would suggest is removing this print output: This is not a WARNING; these eight variables should never have been included in the reweighting optimization.

Question

I thought these changes would bring the reweighting results across computers into line, but that is not so.
@donboyd5, when on your computer you use your NVIDIA chip to accelerate the reweighting optimization, are the floating-point operations on the NVIDIA chip being done at 32-bit precision or at 64-bit precision? Google AI gives this response (which is not clear to me):
Actually, when using the M4 64-bit CPU on my MacBook Air, the reweighting time is 524 secs (~8.7 mins).
Here's what Claude says with regard to the specific code:

Our code does run in true 64-bit on the GPU. When we create tensors with torch.float64, PyTorch executes FP64 operations on the CUDA cores; the GPU respects the tensor dtype and doesn't silently downgrade. However, Google AI's answer highlights the tradeoff: FP64 on GeForce cards runs at ~1/32 the throughput of FP32 (consumer cards intentionally throttle FP64 to differentiate them from data-center GPUs like the A100/H100).

So, bottom line: we're genuinely computing in 64-bit; it's just slower than 32-bit would be. For our workload (~46 seconds total), it's well worth the precision.
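A quick way to check the dtype claim, assuming a CUDA device is available. This is a verification snippet, not project code:

```python
# Verifies that CUDA matmuls honor the tensor dtype: float64 inputs produce a
# float64 result, at noticeably lower throughput on consumer GeForce cards.
import time
import torch

def timed_matmul(dtype, n=4096, device="cuda"):
    a = torch.randn(n, n, dtype=dtype, device=device)
    b = torch.randn(n, n, dtype=dtype, device=device)
    torch.cuda.synchronize()
    start = time.perf_counter()
    c = a @ b
    torch.cuda.synchronize()
    return c.dtype, time.perf_counter() - start

if torch.cuda.is_available():
    print(timed_matmul(torch.float32))  # (torch.float32, <seconds>)
    print(timed_matmul(torch.float64))  # (torch.float64, <seconds>), much slower on GeForce
```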
Impressive.
I think we need some kind of programmer self-protection: with automated methods of defining targets, even a responsible programmer in the future, testing a large number of possible targets, could include a target that the data cannot hit. That should be reported. One option would be to add a test that checks whether any columns were all zeros (and therefore removed); then we wouldn't need anything in the normal console output.
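A minimal sketch of such a test follows. It assumes the module-level build_loss_matrix() mentioned in the PR yields a loss matrix plus its target names; the fixture and function names here are hypothetical, not the actual project API.

```python
# Hedged sketch of the proposed test. The fixture names and the assumption
# that the loss matrix and its target names are available to the test are
# hypothetical, not the actual project API.
import numpy as np

def test_no_all_zero_target_columns(loss_matrix, target_names):
    column_sums = np.abs(np.asarray(loss_matrix)).sum(axis=0)
    all_zero = [name for name, s in zip(target_names, column_sums) if s == 0]
    assert not all_zero, (
        f"these targets have all-zero data columns and cannot be hit: {all_zero}"
    )
```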
There are still hardware and software differences that could lead to very small differences in results. I think these differences are extremely small and might have no noticeable impact on results. Can you look at the actual values I have for the following test failures and see whether they are noticeably different from what you have? I suspect they are extremely close, if not identical (as far as the displayed amounts are concerned; there would be differences at full precision, of course). We should decide whether the differences are small enough for our purposes. If not, we can always try another 100 iterations and see what we get.
@martinholmer, the note above about differences is just for starters. Most of those numbers have few digits, so it would be better to compare more-precise numbers. If you can provide me with your tmd files in the folder we have been sharing, that would be great. Meanwhile, I'll post my files that result from the run I did for the PR.
This is an excellent idea --- one that should have been part of the original phase of the project. Why don't you add that check and streamline the "normal console output"?
Will definitely do when I revise the PR.
Comparing "actual values" is a good idea, but the test results are too indirect.
Thanks.
@donboyd5 said in PR #403 that, when using his NVIDIA chip, the reweighting FINGERPRINT was this:

Using my Apple M4 chip, I get this FINGERPRINT:

The differences in these fingerprints are considerable.
@martinholmer, you are right. I've learned a lot since yesterday and have a suggested alternative analysis if you have time to do one more run on your computer to compare to one on mine. Here's what I learned, with considerable help from Claude:
The fix, reflected in an update I'll push in a few minutes:
The update is based on current master. It also converts the all-zero-column console print to a proper Python UserWarning with a test, and improves the optimization log output.

The larger penalty for weight deviations means those deviations are more important relative to differences from targets than they were before, so errors in targets can worsen. We have a few targets with errors larger than we would like, but, as always, that can mean those targets are simply hard to achieve. I am OK with that.

Results on my machine are now virtually identical between GPU and CPU. I'm hopeful this means you'll see near-identical results too. The optimization runs in about 77 seconds using my GPU, and in 26 minutes using my CPU. Here is the fingerprint info from the GPU run:

Finally, with Claude I explored many dead ends and suboptimal solutions between yesterday and today, including Adam with different stopping criteria and max iterations (20,000, up from 2,000!), L-BFGS-B through scipy (the bounds version) rather than L-BFGS through PyTorch with clamping to implement bounds (the best approach), and several other approaches. Results are in the archive subfolder of our Drive folder if you have any interest; I will delete them in a day or two. I do have one idea for how we might speed up the CPU-only solution, but I am not sure the CPU implementation is so slow that it merits more work, and I am not sure the idea would work.

My GPU results are in folder tmd_2026-02-19_lbfgs_gpu_800iters_p010_gradnorm_prupdate. Would you be able to run

Many thanks.
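For reference, a minimal sketch of the print-to-UserWarning conversion described above; the function and argument names are illustrative, not the exact reweight.py code.

```python
# Sketch only: emit a UserWarning (instead of a print) when all-zero target
# columns are dropped, so the behavior can be asserted in a test.
import warnings
import numpy as np

def drop_all_zero_columns(loss_matrix, target_names):
    column_sums = np.abs(loss_matrix).sum(axis=0)
    keep = column_sums > 0
    dropped = [n for n, k in zip(target_names, keep) if not k]
    if dropped:
        warnings.warn(
            f"dropping {len(dropped)} target(s) with all-zero data columns: {dropped}",
            UserWarning,
        )
    kept_names = [n for n, k in zip(target_names, keep) if k]
    return loss_matrix[:, keep], kept_names

# A test can then check for the warning, e.g.:
#     with pytest.warns(UserWarning, match="all-zero data columns"):
#         drop_all_zero_columns(matrix_with_a_zero_column, names)
```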
@martinholmer, good to go.
Superseded by #407, which is rebased on current master. |
Summary
Addresses #400. Improves the reweighting optimization in several ways:
- Enables an L2 weight deviation penalty (REWEIGHT_DEVIATION_PENALTY = 0.0001) that penalizes large deviations from original weights using the L2 norm: sum((new - original)^2) / sum(original^2)

The 8 impossible targets (all-zero data columns) dropped are:
Results
Optimization (NVIDIA GeForce RTX 5070 Ti):
Target accuracy (550 targets after filtering impossible ones):
Weight change distribution (vs pre-optimization weights):
Reproducibility fingerprint:
Known test failures
4 tests fail because expected values need updating for the new weights. These are not regressions — they reflect the changed weight distribution. Test expectations should be updated after the approach is reviewed and approved.
- test_weights
- test_variable_totals
- test_imputed_variables
- test_tax_expenditures

40 passed, 4 failed, 2 skipped.
Files changed
- tmd/utils/reweight.py
- tmd/datasets/tmd.py
- tmd/imputation_assumptions.py: REWEIGHT_DEVIATION_PENALTY = 0.0001