Add adversarial weight regularisation pipeline #296
nikhilwoodruff wants to merge 2 commits into main
Introduces a diagnostics package that detects high-influence survey records, generates synthetic offspring via TVAE, and recalibrates with entropy regularisation and weight capping to reduce output noise in population subgroup statistics.

Components:
- influence.py: reporting surface definition, per-record influence computation, Kish effective sample size, random reform sampling
- generative_model.py: TVAE training on FRS input attributes, conditional sampling with varied conditioning fractions
- offspring.py: adversarial detect-spawn-recalibrate loop
- recalibrate.py: entropy-regularised weight optimisation with optional hard weight cap and zero-weight pruning
- __main__.py: CLI with diagnose/train/regularise commands
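As a rough illustration of what entropy-regularised recalibration with a hard weight cap could look like, here is a minimal NumPy sketch. It is not the code in recalibrate.py: the function name `recalibrate`, its parameters, the squared relative error loss, and the choice of a generalised KL penalty towards the prior weights are all assumptions made for this example.

```python
import numpy as np


def recalibrate(w0, A, targets, lam=0.01, cap=None, lr=0.1, steps=2000):
    """Hypothetical sketch: fit weights to targets while staying close to w0.

    Minimises sum_k ((A w - t)_k / t_k)^2 + lam * KL(w || w0) by gradient
    descent on u = log(w), so weights stay strictly positive.
    Assumes every prior weight in w0 is > 0 and every target is non-zero.
    """
    w0 = np.asarray(w0, dtype=float)
    A = np.asarray(A, dtype=float)          # shape (n_targets, n_records)
    t = np.asarray(targets, dtype=float)    # shape (n_targets,)

    # Start from the prior weights, clipped to the cap if one is given.
    w = w0.copy() if cap is None else np.minimum(w0, cap)
    u = np.log(w)

    for _ in range(steps):
        w = np.exp(u)
        rel_err = (A @ w - t) / t
        # Gradient of the loss with respect to w:
        # calibration term plus the entropy (KL-to-prior) term.
        grad_w = 2.0 * A.T @ (rel_err / t) + lam * np.log(w / w0)
        # Chain rule for u = log(w): dL/du = w * dL/dw.
        u = u - lr * w * grad_w
        if cap is not None:
            # Hard cap applied as a projection after each step.
            u = np.minimum(u, np.log(cap))

    return np.exp(u)


# Toy usage: one population-total target, one dominant record.
prior = np.array([10.0, 1.0, 1.0, 1.0])
design = np.ones((1, 4))
capped = recalibrate(prior, design, [13.0], cap=5.0)
```

In this toy case the cap holds the dominant record at 5 and the remaining weights rise to partly restore the total. The penalty strength `lam` trades calibration accuracy against distance from the prior weights, and the step size would need tuning for weights on a realistic scale.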
Produces charts showing weight distribution, Kish effective sample sizes by population slice, high-influence records table, influence heatmap, and weight-vs-influence scatter plot.
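For readers unfamiliar with the diagnostics, the Kish effective sample size of a set of weights is (Σw)² / Σw². The sketch below computes it alongside a simple per-record influence measure. The influence definition used here (each record's share of a weighted total) and the function names are assumptions for illustration; the definition in influence.py may differ.

```python
import numpy as np


def kish_ess(weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    w = np.asarray(weights, dtype=float)
    sum_sq = np.sum(w ** 2)
    if sum_sq == 0:
        return 0.0
    return float(np.sum(w) ** 2 / sum_sq)


def influence_shares(weights, values):
    """Hypothetical influence measure: each record's share of the
    absolute weighted total of `values` within a slice."""
    w = np.asarray(weights, dtype=float)
    x = np.asarray(values, dtype=float)
    contrib = np.abs(w * x)
    total = contrib.sum()
    if total == 0:
        return np.zeros_like(contrib)
    return contrib / total


# Toy slice: one record carries most of the weight.
slice_weights = [10.0, 1.0, 1.0, 1.0]
slice_values = [1.0, 1.0, 1.0, 1.0]
ess = kish_ess(slice_weights)
shares = influence_shares(slice_weights, slice_values)
```

Equal weights give an effective sample size equal to the record count. The skewed slice above has four records but an effective sample size below two, and the first record accounts for 10/13 of the weighted total.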
What motivated this?

General noisiness/instability of results

I reviewed the offspring expansion path and I think this needs changes before it is usable as a weight regularisation input. The main issue is that

There is a broader consistency issue too: the generator samples structure/demographic fields such as household size, children, head demographics, disability, hours, and UC claim status, but expansion preserves the source persons/benefit units and only mutates a small subset of household fields plus head income. That means the expanded records are not faithful realisations of the sampled diagnostic surface.

Separately, GitHub currently reports
Summary
- `diagnostics` package implementing the adversarial weight regularisation pipeline from the design doc
- TVAE generative model via `sdv`, with conditional sampling using varied conditioning fractions for diverse offspring generation
- CLI (`python -m policyengine_uk_data.diagnostics`) with `diagnose`, `train`, and `regularise` commands

Current dataset diagnostics
Running Phase 1 on the enhanced FRS reveals:
housing_benefit_reported/age_band=16-24

Test plan