Add adversarial weight regularisation pipeline by nikhilwoodruff · Pull Request #296 · PolicyEngine/policyengine-uk-data

nikhilwoodruff · 2026-03-17T16:40:02Z

Summary

Adds a diagnostics package implementing the adversarial weight regularisation pipeline from the design doc
Phase 1 (influence detector): Computes per-record influence across a reporting surface of 10 metrics × 4 slice dimensions (income decile, region, age band, tenure). Identifies records exceeding a configurable influence threshold, computes Kish effective sample sizes, and samples across random policy reforms.
Phase 2 (generative model): Trains a TVAE on FRS input attributes via sdv, with conditional sampling using varied conditioning fractions for diverse offspring generation.
Phase 3 (adversarial loop): Iteratively detects worst-offender records, generates synthetic offspring, and replaces high-weight records with weighted offspring.
Phase 4 (regularised recalibration): Entropy-regularised weight optimisation with KL divergence penalty and optional hard weight cap.
Adds a CLI (python -m policyengine_uk_data.diagnostics) with diagnose, train, and regularise commands.
Adds visualisation script producing weight distribution, Kish ESS, influence heatmap, and scatter plots.
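
To make Phase 4 concrete, here is a minimal sketch of an entropy-regularised recalibration with an optional hard cap. It is not the code in recalibrate.py: the function name `recalibrate`, its parameters, the squared relative-error loss and the log-weight gradient descent are all assumptions chosen for illustration, and zero-weight pruning is left out.

```python
import math


def recalibrate(prior, features, targets, lam=0.01, cap=None, lr=1.0, steps=500):
    """Hypothetical sketch: fit weights to targets while staying near the prior.

    Minimises
        sum_k ((sum_i w_i * features[i][k] - targets[k]) / targets[k]) ** 2
        + lam * sum_i (w_i * log(w_i / prior_i) - w_i + prior_i)
    by gradient descent on log-weights, which keeps every weight positive.
    The second term is a generalised KL divergence from the prior weights.
    """
    n, m = len(prior), len(targets)
    log_w = [math.log(p) for p in prior]
    log_cap = math.log(cap) if cap is not None else None
    for _ in range(steps):
        w = [math.exp(v) for v in log_w]
        # Relative error of each weighted total against its target.
        rel_err = [
            (sum(w[i] * features[i][k] for i in range(n)) - targets[k]) / targets[k]
            for k in range(m)
        ]
        for i in range(n):
            # Gradient with respect to w_i: calibration term plus KL term.
            grad = sum(2.0 * rel_err[k] * features[i][k] / targets[k] for k in range(m))
            grad += lam * math.log(w[i] / prior[i])
            # Chain rule for w_i = exp(log_w_i).
            log_w[i] -= lr * grad * w[i]
            # Hard cap applied as a projection after every update.
            if log_cap is not None:
                log_w[i] = min(log_w[i], log_cap)
    return [math.exp(v) for v in log_w]


# Toy data (assumed): one dominant record and a single population-count target.
prior = [1.0, 1.0, 10.0]
features = [[1.0], [1.0], [1.0]]
targets = [12.0]
capped = recalibrate(prior, features, targets, cap=5.0)
```

In this toy case the capped record is held at 5 and the two small weights rise to absorb part of the shortfall, with the KL term stopping them from closing the gap completely. The step size `lr` is tuned to these small weights and would need rescaling for survey-sized weights.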

Current dataset diagnostics

Running Phase 1 on the enhanced FRS reveals:

53,508 households, median weight 33, max weight 372,747 (skewness 31.4)
274 records exceed the 5% influence threshold
Overall Kish effective sample size: 930 (out of 53k records)
Worst offender (HH #506) has 95.5% influence on housing_benefit_reported/age_band=16-24
Most problematic slices: young age bands (16-24) and specific region×income combinations
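
For readers unfamiliar with the two statistics above, the sketch below shows how they can be computed. The Kish formula is standard. The influence definition (a record's share of the weighted total of one metric within one slice) is an assumption about what influence.py measures, and the function names and toy numbers are illustrative only.

```python
def kish_ess(weights):
    """Kish effective sample size: (sum of weights)^2 / sum of squared weights."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def influence_shares(weights, values):
    """Each record's share of the weighted total of one metric in one slice.

    Assumed definition, not necessarily the one used in the PR.
    """
    contributions = [abs(w * v) for w, v in zip(weights, values)]
    total = sum(contributions)
    if total == 0:
        return [0.0 for _ in contributions]
    return [c / total for c in contributions]


def flag_records(weights, values, threshold=0.05):
    """Indices of records whose influence share exceeds the threshold."""
    shares = influence_shares(weights, values)
    return [i for i, share in enumerate(shares) if share > threshold]


# Toy slice (assumed data): one record carries almost all of the weight.
weights = [97.0, 1.0, 1.0, 1.0]
values = [1.0, 1.0, 1.0, 1.0]
ess = kish_ess(weights)                 # close to 1 despite 4 records
flagged = flag_records(weights, values)  # only the heavy record is flagged
```

With equal weights the effective sample size equals the record count, so an ESS far below the number of records means a few weights dominate.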

Test plan

All new modules pass ruff lint and format checks
Syntax validation passes for all 6 new files
Diagnostics script runs successfully on enhanced_frs_2023_24.h5
Run full adversarial loop on a subset to verify convergence
Compare pre/post weight distributions after regularisation

Introduces a diagnostics package that detects high-influence survey records, generates synthetic offspring via TVAE, and recalibrates with entropy regularisation and weight capping to reduce output noise in population subgroup statistics.

Components:
- influence.py: reporting surface definition, per-record influence computation, Kish effective sample size, random reform sampling
- generative_model.py: TVAE training on FRS input attributes, conditional sampling with varied conditioning fractions
- offspring.py: adversarial detect-spawn-recalibrate loop
- recalibrate.py: entropy-regularised weight optimisation with optional hard weight cap and zero-weight pruning
- __main__.py: CLI with diagnose/train/regularise commands

MaxGhenis · 2026-03-24T10:52:43Z

What motivated this?

nikhilwoodruff · 2026-03-24T11:34:32Z

General noisiness/instability of results

MaxGhenis · 2026-05-02T13:00:46Z

I reviewed the offspring expansion path and I think this needs changes before it is usable as a weight regularisation input.

The main issue is that extract_household_features models hh_*_income as household sums, but expansion copies every source person row and only overwrites the head's income with the synthetic household-level total. Any copied non-head income remains, so a multi-earner household becomes synthetic household income + original non-head income.
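
The arithmetic of that double count can be illustrated with a small sketch. The record layout (`is_head`, `income`) and both function names are hypothetical and are not taken from offspring.py. The rescaling variant is only one possible way to make person rows sum to the sampled household total, and it assumes non-negative incomes.

```python
import copy


def household_income(persons):
    """Household total as the sum of person-level incomes."""
    return sum(p["income"] for p in persons)


def expand_overwrite_head(persons, synthetic_total):
    """The behaviour described above: only the head's income is replaced."""
    out = copy.deepcopy(persons)
    for p in out:
        if p["is_head"]:
            p["income"] = synthetic_total
    return out


def expand_rescale(persons, synthetic_total):
    """One possible alternative: scale every person's income so the household
    sum equals the synthetic total, preserving each person's share."""
    out = copy.deepcopy(persons)
    original_total = household_income(out)
    for p in out:
        if original_total > 0:
            p["income"] = p["income"] * synthetic_total / original_total
        else:
            # No income to share out: assign the whole total to the head.
            p["income"] = synthetic_total if p["is_head"] else 0.0
    return out


# Hypothetical two-earner household and a sampled household-level total.
source = [
    {"is_head": True, "income": 30000.0},
    {"is_head": False, "income": 20000.0},
]
synthetic_total = 60000.0

overwritten = household_income(expand_overwrite_head(source, synthetic_total))
rescaled = household_income(expand_rescale(source, synthetic_total))
```

Here `overwritten` comes to the synthetic 60,000 plus the partner's original 20,000, while `rescaled` matches the sampled total.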

There is a broader consistency issue too: the generator samples structure/demographic fields such as household size, children, head demographics, disability, hours, and UC claim status, but expansion preserves the source persons/benefit units and only mutates a small subset of household fields plus head income. That means the expanded records are not faithful realisations of the sampled diagnostic surface.

Separately, GitHub currently reports mergeable=CONFLICTING / mergeStateStatus=DIRTY, and lint is failing. I would keep this draft until the expansion path either constructs records consistent with the sampled features or narrows the sampled features to quantities it actually applies.

nikhilwoodruff added 2 commits March 17, 2026 16:37

Add weight diagnostics visualisation script

db6a4b7

Produces charts showing weight distribution, Kish effective sample sizes by population slice, high-influence records table, influence heatmap, and weight-vs-influence scatter plot.

nikhilwoodruff marked this pull request as draft March 24, 2026 11:34

nikhilwoodruff wants to merge 2 commits into main from feat/adversarial-weight-regularisation
