Add adversarial weight regularisation pipeline #296

Draft
nikhilwoodruff wants to merge 2 commits into main from feat/adversarial-weight-regularisation

Conversation

@nikhilwoodruff
Contributor

Summary

  • Adds a diagnostics package implementing the adversarial weight regularisation pipeline from the design doc
  • Phase 1 (influence detector): Computes per-record influence across a reporting surface of 10 metrics × 4 slice dimensions (income decile, region, age band, tenure). Identifies records exceeding a configurable influence threshold, computes Kish effective sample sizes, and samples across random policy reforms.
  • Phase 2 (generative model): Trains a TVAE on FRS input attributes via sdv, with conditional sampling using varied conditioning fractions for diverse offspring generation.
  • Phase 3 (adversarial loop): Iteratively detects worst-offender records, generates synthetic offspring, and replaces high-weight records with weighted offspring.
  • Phase 4 (regularised recalibration): Entropy-regularised weight optimisation with KL divergence penalty and optional hard weight cap.
  • Adds a CLI (python -m policyengine_uk_data.diagnostics) with diagnose, train, and regularise commands.
  • Adds visualisation script producing weight distribution, Kish ESS, influence heatmap, and scatter plots.
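
As a rough illustration of the Phase 4 idea (fit calibration targets while a KL penalty keeps the weights close to their starting values, with an optional hard cap), here is a minimal plain-Python sketch. It is not the code in recalibrate.py: the function name `recalibrate`, its parameters, the squared relative-error loss, and the projected gradient descent on log-weights are all assumptions chosen for a small self-contained example.

```python
import math


def recalibrate(w0, X, targets, kl_penalty=0.01, weight_cap=None,
                steps=5000, lr=0.05):
    """Toy entropy-regularised recalibration (hypothetical sketch).

    Minimises  sum_j ((sum_i w_i * X[i][j] - t_j) / t_j) ** 2
             + kl_penalty * sum_i (w_i * log(w_i / w0_i) - w_i + w0_i)
    by gradient descent on log-weights, so weights stay positive.
    """
    n, m = len(w0), len(targets)
    log_w = [math.log(w) for w in w0]
    log_cap = math.log(weight_cap) if weight_cap is not None else None
    for _ in range(steps):
        w = [math.exp(lw) for lw in log_w]
        # Relative calibration error for each target.
        rel_err = [
            (sum(w[i] * X[i][j] for i in range(n)) - targets[j]) / targets[j]
            for j in range(m)
        ]
        for i in range(n):
            grad_fit = sum(2 * rel_err[j] * X[i][j] / targets[j] for j in range(m))
            grad_kl = math.log(w[i] / w0[i])  # gradient of the generalised KL term
            # Chain rule: d/dlog_w = w * d/dw.
            log_w[i] -= lr * w[i] * (grad_fit + kl_penalty * grad_kl)
            if log_cap is not None:
                # Hard cap applied as a projection after each step.
                log_w[i] = min(log_w[i], log_cap)
    return [math.exp(lw) for lw in log_w]


# Toy data: four records, one population-count target of 20.
w0 = [1.0, 1.0, 1.0, 10.0]
X = [[1.0], [1.0], [1.0], [1.0]]
targets = [20.0]
uncapped = recalibrate(w0, X, targets)
capped = recalibrate(w0, X, targets, weight_cap=12.0)
```

In this toy setup the uncapped run lets the heavy record absorb most of the adjustment, while the capped run holds it at 12 and pushes the remaining adjustment onto the lighter records. The step size is tuned to toy-scale weights; real survey weights would need rescaling or a proper optimiser.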

Current dataset diagnostics

Running Phase 1 on the enhanced FRS reveals:

  • 53,508 households, median weight 33, max weight 372,747 (skewness 31.4)
  • 274 records exceed the 5% influence threshold
  • Overall Kish effective sample size: 930 (out of 53k records)
  • Worst offender (HH #506) has 95.5% influence on housing_benefit_reported/age_band=16-24
  • Most problematic slices: young age bands (16-24) and specific region×income combinations
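
For readers unfamiliar with the two diagnostics quoted above, the sketch below shows the standard Kish effective sample size, (Σw)² / Σw², alongside one plausible definition of per-record influence: a record's share of the weighted total within a slice. The influence definition and the helper names (`influence_shares`, `flag_high_influence`) are assumptions for illustration; the PR's influence.py may define influence differently.

```python
def kish_ess(weights):
    """Kish effective sample size: (sum of w)^2 / sum of w^2."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def influence_shares(weights, values):
    """Each record's share of the absolute weighted total (assumed definition)."""
    contributions = [abs(w * v) for w, v in zip(weights, values)]
    total = sum(contributions)
    if total == 0:
        return [0.0] * len(contributions)
    return [c / total for c in contributions]


def flag_high_influence(weights, values, threshold=0.05):
    """Indices of records whose influence share exceeds the threshold."""
    shares = influence_shares(weights, values)
    return [i for i, share in enumerate(shares) if share > threshold]


# Toy slice: nine records with weight 1 and one with weight 100.
weights = [1] * 9 + [100]
values = [1] * 10
ess = kish_ess(weights)
flagged = flag_high_influence(weights, values)
```

With equal weights the Kish ESS equals the record count; in the toy slice a single heavy record drags it down to just over 1 and is the only record flagged at the 5% threshold.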

Test plan

  • All new modules pass ruff lint and format checks
  • Syntax validation passes for all 6 new files
  • Diagnostics script runs successfully on enhanced_frs_2023_24.h5
  • Run full adversarial loop on a subset to verify convergence
  • Compare pre/post weight distributions after regularisation

Introduces a diagnostics package that detects high-influence survey
records, generates synthetic offspring via TVAE, and recalibrates
with entropy regularisation and weight capping to reduce output
noise in population subgroup statistics.

Components:
- influence.py: reporting surface definition, per-record influence
  computation, Kish effective sample size, random reform sampling
- generative_model.py: TVAE training on FRS input attributes,
  conditional sampling with varied conditioning fractions
- offspring.py: adversarial detect-spawn-recalibrate loop
- recalibrate.py: entropy-regularised weight optimisation with
  optional hard weight cap and zero-weight pruning
- __main__.py: CLI with diagnose/train/regularise commands
Produces charts showing weight distribution, Kish effective sample
sizes by population slice, high-influence records table, influence
heatmap, and weight-vs-influence scatter plot.
@MaxGhenis
Contributor

What motivated this?

@nikhilwoodruff marked this pull request as draft March 24, 2026 11:34
@nikhilwoodruff
Contributor Author

General noisiness/instability of results

@MaxGhenis
Contributor

I reviewed the offspring expansion path and I think this needs changes before it is usable as a weight regularisation input.

The main issue is that extract_household_features models hh_*_income as household sums, but expansion copies every source person row and only overwrites the head's income with the synthetic household-level total. Any copied non-head income remains, so a multi-earner household becomes synthetic household income + original non-head income.
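
To make the double-counting concrete, here is a minimal sketch with a two-earner household. It is not the PR's expansion code: the person fields (`is_head`, `income`) and both function names are hypothetical, and proportional rescaling is just one possible fix among several.

```python
def expand_naive(persons, synthetic_hh_income):
    """Copy every person row and overwrite only the head's income.

    Mirrors the behaviour described above: non-head income is kept,
    so the household total overshoots the synthetic value.
    """
    out = [dict(p) for p in persons]
    for p in out:
        if p["is_head"]:
            p["income"] = synthetic_hh_income
    return out


def expand_rescaled(persons, synthetic_hh_income):
    """One possible fix: rescale all incomes so the household sum matches."""
    out = [dict(p) for p in persons]
    original_total = sum(p["income"] for p in persons)
    if original_total == 0:
        # No income shares to preserve, so assign everything to the head.
        for p in out:
            p["income"] = synthetic_hh_income if p["is_head"] else 0.0
        return out
    scale = synthetic_hh_income / original_total
    for p in out:
        p["income"] = p["income"] * scale
    return out


def household_income(persons):
    return sum(p["income"] for p in persons)


source = [
    {"is_head": True, "income": 30_000.0},
    {"is_head": False, "income": 20_000.0},
]
naive_total = household_income(expand_naive(source, 40_000.0))
rescaled_total = household_income(expand_rescaled(source, 40_000.0))
```

Here the naive path yields a household total of 60,000 (the synthetic 40,000 plus the untouched 20,000 of non-head income), whereas the rescaled path sums to the sampled 40,000 while preserving each person's share.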

There is a broader consistency issue too: the generator samples structure/demographic fields such as household size, children, head demographics, disability, hours, and UC claim status, but expansion preserves the source persons/benefit units and only mutates a small subset of household fields plus head income. That means the expanded records are not faithful realisations of the sampled diagnostic surface.

Separately, GitHub currently reports mergeable=CONFLICTING / mergeStateStatus=DIRTY, and lint is failing. I would keep this draft until the expansion path either constructs records consistent with the sampled features or narrows the sampled features to quantities it actually applies.
