Added daily requeuing of missing frames. #461

Open: cmccully wants to merge 8 commits into main from requeue-missing

cmccully (Collaborator) commented May 8, 2026

This PR adds a cron job that requeues missing frames, running in the same cron container as the calibration scheduler. The query relies only on the archive API.

I also refactored some of the archive querying so the retry logic is reused and there is less code duplication.
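
For readers unfamiliar with the pattern, a shared retrying GET helper can be sketched roughly as below. This is a minimal standard-library illustration, not the code in this PR: the name `archive_get_sketch`, its parameters, and the backoff values are all assumptions, and the PR itself reportedly uses tenacity rather than a hand-written loop.

```python
import time
import urllib.error
import urllib.request


def archive_get_sketch(url, headers=None, attempts=4, base_delay=1.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch `url`, retrying transient failures with exponential backoff.

    Hypothetical stand-in for a shared helper; not banzai's archive_get.
    `opener` and `sleep` are injectable so the retry logic can be tested
    without touching the network or actually waiting.
    """
    request = urllib.request.Request(url, headers=headers or {})
    for attempt in range(attempts):
        try:
            with opener(request, timeout=30) as response:
                return response.read()
        except (urllib.error.URLError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Wait 1x, 2x, 4x, ... the base delay before trying again.
            sleep(base_delay * 2 ** attempt)
```

Keeping the retry policy in one function means every caller (downloads, BPM ingestion, the requeue query) gets the same behavior, which is the duplication this refactor aims to remove.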

Copilot AI left a comment

Pull request overview

Adds a Celery beat (cron) job to detect and requeue “missing” frames by querying the archive API, and refactors archive querying / queue routing to reduce duplication.

Changes:

  • Introduces a new celery.requeue_missing_frames periodic task and wires it into the existing cron container/entrypoint.
  • Adds a shared banzai.query module with retrying archive GET + archive frame pagination helpers.
  • Refactors queue selection into get_processing_queue() and reuses archive query helper in FITS downloading / BPM ingestion.
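
As a rough illustration of the pagination and cross-matching idea summarized above, the sketch below walks a paginated listing and reports raw frames with no processed counterpart. It assumes each page is a dict with `results` and `next` fields; the function names, the `basename` field, and the matching key are hypothetical and are not taken from `banzai.query`.

```python
def iter_archive_frames(fetch_page, first_url):
    """Yield every frame record, following `next` links until exhausted.

    `fetch_page` is any callable that takes a URL and returns the decoded
    page as a dict (assumed shape: {"results": [...], "next": url or None}).
    """
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("results", [])
        url = page.get("next")


def find_missing_frames(raw_frames, processed_frames, key):
    """Return raw frames whose key has no match among processed frames.

    `key` maps a frame record to the identifier used for cross-matching.
    """
    processed_keys = {key(frame) for frame in processed_frames}
    return [frame for frame in raw_frames if key(frame) not in processed_keys]
```

Passing `fetch_page` in as an argument lets the same loop run against a retrying HTTP helper in production and against an in-memory dict in tests.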

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| uv.lock | Bumps lco-banzai to 1.36.0 and updates lock metadata. |
| pyproject.toml | Updates version and switches console script to new cron entrypoint. |
| helm-chart/banzai/templates/listener.yaml | Updates the cron container command to banzai_cron. |
| CHANGES.md | Adds 1.36.0 changelog entry for daily requeuing. |
| banzai/utils/observation_utils.py | Adds tenacity retry to calibration-block archive query. |
| banzai/utils/instrument_utils.py | Adds shared get_processing_queue() helper. |
| banzai/utils/fits_utils.py | Replaces direct requests.get with archive_get helper for downloads. |
| banzai/settings.py | Adds requeue cron configuration knobs. |
| banzai/scheduling.py | Adds the requeue_missing_frames Celery task and archive querying usage. |
| banzai/query.py | New module for archive querying helpers and cross-matching. |
| banzai/main.py | Adds new cron entrypoint scheduling and reuses archive_get / get_processing_queue. |

Comment thread banzai/settings.py Outdated
Comment thread banzai/query.py Outdated
Comment thread banzai/query.py Outdated
Comment thread banzai/query.py Outdated
Comment thread banzai/scheduling.py Outdated
Comment thread banzai/scheduling.py Outdated
Comment thread banzai/main.py Outdated
Comment thread banzai/utils/fits_utils.py
Comment thread banzai/utils/fits_utils.py Outdated
Comment thread banzai/main.py

jchate6 (Collaborator) left a comment

I would like to see more documentation here if possible.

  • New or heavily reworked functions would benefit from docstrings describing their purpose, use, and intended functionality.

  • I think we need to record this change of behavior somewhere easily accessible and referenced so that the precise details of Banzai's expected behavior are understood by a wider audience.

  • I don't really understand why several of the changes here were made (like the changes in fits_utils). Some comments in the PR explaining how these different changes are related would be appreciated.

Also, there are no test changes. Do we want to test that we catch and requeue missing frames correctly?

I'm also curious whether a different strategy for the cron would work better. Does it make more sense to run it more than once a day? What if we triggered it for each site at local noon rather than doing one bulk requeue in the West Coast morning? This would also spread out the queries and queuing.
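
To make the per-site idea concrete, here is a small sketch that derives a staggered UTC trigger time from each site's longitude. It approximates local noon as mean solar noon (one hour per 15 degrees of longitude) and ignores civil time zones; the function names and the example site labels are hypothetical, not banzai settings.

```python
def utc_hour_of_local_noon(longitude_deg):
    """Approximate UTC hour of local solar noon for an east-positive longitude."""
    return (12.0 - longitude_deg / 15.0) % 24.0


def staggered_schedule(site_longitudes):
    """Map each site to an (hour, minute) UTC trigger time, ordered by time."""
    schedule = {}
    for site, longitude in site_longitudes.items():
        hour = utc_hour_of_local_noon(longitude)
        # Round to the nearest minute and wrap around midnight if needed.
        total_minutes = round(hour * 60) % (24 * 60)
        schedule[site] = divmod(total_minutes, 60)
    return dict(sorted(schedule.items(), key=lambda item: item[1]))


if __name__ == "__main__":
    # Hypothetical sites, chosen only to show the spread across the day.
    example = {"site_west": -120.0, "site_zero": 0.0, "site_east": 150.0}
    for site, (hour, minute) in staggered_schedule(example).items():
        print(f"{site}: {hour:02d}:{minute:02d} UTC")
```

Each (hour, minute) pair could then be turned into its own periodic schedule entry, so the archive queries and queuing are spread over the day instead of arriving in one batch.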

Comment thread banzai/settings.py Outdated
Comment thread banzai/settings.py Outdated
Comment thread banzai/main.py Outdated
Comment thread banzai/scheduling.py
Comment thread banzai/query.py

cmccully (Collaborator, Author) commented

The archive_get function doesn't change any of the behavior. It is just a refactor of previously duplicated retries. I've added a docstring.

jchate6 (Collaborator) left a comment

Looks good. Well done!

3 participants