Add optional Modal calibration backend #386

Open
MaxGhenis wants to merge 1 commit into main from codex/modal-calibration-main

@MaxGhenis
Contributor

Summary

  • Recreates the useful part of #279 (Add Modal GPU calibration) on top of current main instead of resolving the stale, conflicting branch.
  • Adds an opt-in Modal GPU backend for the constituency and local-authority calibration steps.
  • Keeps pull-request builds on the existing CPU path; trusted push/manual builds enable Modal only when MODAL_TOKEN_ID and MODAL_TOKEN_SECRET are present.
  • Mirrors current CPU calibration semantics for sparse local targets by sending a local_target_available mask and replacing NaNs with zero only in the tensor payload.
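
As a rough illustration of the last bullet (not the code in this PR), the mask-and-zero-fill step could look like the sketch below. The helper names `build_target_payload` and `masked_mse` are hypothetical, and the masked squared-error loss is only a stand-in for whatever loss the CPU calibration actually uses; the point is that zero-filled entries never contribute once the mask is applied.

```python
import numpy as np


def build_target_payload(local_targets: np.ndarray):
    """Return (payload, local_target_available) for a target matrix with NaN gaps.

    Hypothetical helper: name and signature are illustrative assumptions.
    """
    # True where a target value exists for that area/metric pair.
    local_target_available = ~np.isnan(local_targets)
    # Zero-fill only the copy that is sent; the caller's array keeps its NaNs.
    payload = np.where(local_target_available, local_targets, 0.0)
    return payload, local_target_available


def masked_mse(estimates: np.ndarray, payload: np.ndarray, available: np.ndarray) -> float:
    """Mean squared error over available targets only (illustrative loss)."""
    squared_error = (estimates - payload) ** 2
    # Multiplying by the boolean mask drops entries that were zero-filled.
    n_available = max(int(available.sum()), 1)
    return float(np.sum(squared_error * available) / n_available)
```

Because the mask travels with the payload, the remote side never has to handle NaNs in tensors, while a missing target still behaves as "no target" rather than as "target of zero".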

What this does

The dataset build still creates/imputes/uprates the FRS locally. At calibration time:

  • default path: unchanged CPU calibrate_local_areas flow;
  • MODAL_CALIBRATE=1: build the constituency, LA, and national loss matrices locally, serialize the arrays, spawn both GPU optimization jobs on Modal concurrently, retrieve weight checkpoints, write the calibration logs, and save the same .h5 weight files.
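
A minimal sketch of the opt-in gating described above, assuming the choice is made in Python from environment variables. The function name `select_calibration_backend` is hypothetical, and the PR may perform the credential check in the workflow YAML instead; only the variable names MODAL_CALIBRATE, MODAL_TOKEN_ID, and MODAL_TOKEN_SECRET come from this description.

```python
import os
from typing import Mapping, Optional


def select_calibration_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "modal" only when explicitly requested and credentials exist.

    Hypothetical helper: illustrates the opt-in rule, not the PR's actual code.
    """
    env = os.environ if env is None else env
    # Modal is opt-in: the flag must be set to exactly "1".
    requested = env.get("MODAL_CALIBRATE") == "1"
    # Both token variables must be present and non-empty.
    has_credentials = bool(env.get("MODAL_TOKEN_ID")) and bool(env.get("MODAL_TOKEN_SECRET"))
    # Anything else falls back to the unchanged CPU path.
    return "modal" if requested and has_credentials else "cpu"
```

Under this rule a pull-request build, which has neither the flag nor the secrets, always resolves to "cpu", matching the behaviour described for PR CI.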

Verification

  • uv lock --check
  • uv run --python 3.13 --frozen --extra dev ruff check policyengine_uk_data/datasets/create_datasets.py policyengine_uk_data/utils/modal_calibrate.py policyengine_uk_data/tests/test_modal_calibration.py
  • uv run --python 3.13 --frozen --extra dev python - <<'PY' ... import modal_calibrate ... PY
  • uv run --python 3.13 --frozen pytest policyengine_uk_data/tests/test_modal_calibration.py policyengine_uk_data/tests/test_calibrate_save.py policyengine_uk_data/tests/test_calibration_progress.py -q
  • Parsed .github/workflows/push.yaml and .github/workflows/pull_request.yaml with PyYAML.

Supersedes #279.

@MaxGhenis MaxGhenis mentioned this pull request May 2, 2026
3 tasks
@MaxGhenis
Contributor Author

On reconsideration, I don't think this is needed for correctness: the data build already runs on GitHub Actions, and the failure I checked on the current main push happened after the dataset build, in make test on rail_subsidy_spending, not because calibration could not run.

This PR should be treated as optional infrastructure for reducing release-build wall time only. It keeps PR CI on the CPU path and only enables Modal in trusted push/manual builds when Modal secrets exist, but it still adds dependency/secrets/remote-execution complexity. I would leave this unmerged unless slow release builds are a real operational problem.
