Added daily requeuing of missing frames. #461
Pull request overview
Adds a Celery beat (cron) job to detect and requeue “missing” frames by querying the archive API, and refactors archive querying / queue routing to reduce duplication.
Changes:
- Introduces a new `celery.requeue_missing_frames` periodic task and wires it into the existing cron container/entrypoint.
- Adds a shared `banzai.query` module with retrying archive GET + archive frame pagination helpers.
- Refactors queue selection into `get_processing_queue()` and reuses the archive query helper in FITS downloading / BPM ingestion.
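The detection step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `iter_archive_frames`, `find_missing_frames`, `frame_stem`, and the injected `fetch_page` callable are hypothetical names, and the rule that a raw frame counts as missing when no processed frame shares its basename stem is an assumption made for the example. It also assumes the archive returns paginated JSON with `results` and `next` keys.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional


def iter_archive_frames(fetch_page: Callable[[str], Dict], first_url: str) -> Iterator[Dict]:
    """Yield frame records from every page, following `next` links until exhausted."""
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("results", [])
        url = page.get("next")


def frame_stem(basename: str) -> str:
    """Drop the trailing suffix (assumed naming scheme: <stem>-<reduction level>)."""
    return basename.rsplit("-", 1)[0]


def find_missing_frames(raw_frames: Iterable[Dict], processed_frames: Iterable[Dict]) -> List[Dict]:
    """Return raw frames that have no processed counterpart with the same stem."""
    processed_stems = {frame_stem(f["basename"]) for f in processed_frames}
    return [f for f in raw_frames if frame_stem(f["basename"]) not in processed_stems]
```

Passing `fetch_page` in as a callable keeps the pagination loop independent of the HTTP layer, so the cross-match can be exercised with canned pages instead of a live archive.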
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| uv.lock | Bumps lco-banzai to 1.36.0 and updates lock metadata. |
| pyproject.toml | Updates version and switches console script to new cron entrypoint. |
| helm-chart/banzai/templates/listener.yaml | Updates the cron container command to banzai_cron. |
| CHANGES.md | Adds 1.36.0 changelog entry for daily requeueing. |
| banzai/utils/observation_utils.py | Adds tenacity retry to calibration-block archive query. |
| banzai/utils/instrument_utils.py | Adds shared get_processing_queue() helper. |
| banzai/utils/fits_utils.py | Replaces direct requests.get with archive_get helper for downloads. |
| banzai/settings.py | Adds requeue cron configuration knobs. |
| banzai/scheduling.py | Adds the requeue_missing_frames Celery task and archive querying usage. |
| banzai/query.py | New module for archive querying helpers and cross-matching. |
| banzai/main.py | Adds new cron entrypoint scheduling and reuses archive_get / get_processing_queue. |
jchate6 left a comment
I would like to see more documentation here if possible.
- New or heavily reworked functions would benefit from docstrings describing their purpose, use, and intended functionality.
- I think we need to record this change of behavior somewhere easily accessible and referenced so that the precise details of Banzai's expected behavior are understood by a wider audience.
- There are several changes here whose motivation I don't really understand (like the changes in `fits_utils`). Some comments in the PR explaining how these different changes are related would generally be appreciated.
Also, there are no test changes. Do we want to test that we catch and re-queue missing frames correctly?
I'm curious if we have a different strategy for the cron. Does it make more sense to run it more than once a day? What if we triggered for each site at local noon rather than one bulk re-queue in the West Coast morning? This would spread out the queries and queuing as well.
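To make the per-site idea concrete, here is a rough sketch of how the trigger times could be derived. It is only an illustration under stated assumptions: the site longitudes are approximate values I filled in, `local_noon_utc` and `per_site_schedule` are hypothetical helpers, and "local noon" is taken as mean solar noon from longitude alone, ignoring civil time zones and daylight saving.

```python
from typing import Dict, Tuple

# Approximate site longitudes in degrees east; illustrative values, not taken from BANZAI.
SITE_LONGITUDES: Dict[str, float] = {
    "ogg": -156.3,
    "elp": -104.0,
    "lsc": -70.8,
    "tfn": -16.5,
    "cpt": 20.8,
    "coj": 149.1,
}


def local_noon_utc(longitude_deg: float) -> Tuple[int, int]:
    """Return the (hour, minute) in UTC of mean solar noon at the given longitude."""
    # The Earth turns 15 degrees per hour, so noon shifts by longitude / 15 hours.
    minutes = round((12.0 - longitude_deg / 15.0) * 60.0) % (24 * 60)
    return divmod(minutes, 60)


def per_site_schedule(longitudes: Dict[str, float]) -> Dict[str, Dict[str, int]]:
    """Build one schedule entry per site, keyed by a hypothetical entry name."""
    schedule = {}
    for site, lon in longitudes.items():
        hour, minute = local_noon_utc(lon)
        # These values could be passed to celery.schedules.crontab(hour=..., minute=...).
        schedule[f"requeue-missing-frames-{site}"] = {"hour": hour, "minute": minute}
    return schedule
```

With entries like these, each site's archive query and requeue would land at a different UTC time, which is the spreading effect I had in mind.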
The archive_get function doesn't change any of the behavior. It is just a refactor of previously duplicated retries. I've added a docstring.
This PR adds a cron job to requeue missing frames, using the same cron container as the calibration scheduler. The query relies only on the archive API.
I also refactored some of the archive querying for better reuse of the retry logic and less code duplication.
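The shape of that retry consolidation can be sketched as below. This is not the actual `archive_get` from the PR: the file summary mentions tenacity, whereas this sketch uses a small standard-library decorator as a stand-in so it stays self-contained, and `retry`, `make_archive_get`, and their parameters are hypothetical names chosen for illustration.

```python
import functools
import time
from typing import Callable, Tuple, Type


def retry(attempts: int = 3, base_delay: float = 1.0,
          exceptions: Tuple[Type[BaseException], ...] = (Exception,),
          sleep: Callable[[float], None] = time.sleep):
    """Retry the wrapped call with exponential backoff, re-raising the last error."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts - 1:
                        raise  # out of attempts: surface the original error
                    sleep(base_delay * 2 ** attempt)  # wait 1x, 2x, 4x, ... base_delay
        return wrapper
    return decorator


def make_archive_get(http_get: Callable[..., dict], **retry_kwargs) -> Callable[..., dict]:
    """Wrap any GET callable once so every caller shares the same retry policy."""
    return retry(**retry_kwargs)(http_get)
```

Wrapping the GET call in one place means the FITS download, BPM ingestion, and requeue paths all get the same retry behavior without each repeating the loop.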