Smartstack Scaffolding#444
Conversation
0f7e6d3 to
3512bdb
Compare
3502eec to
8b6fb9a
Compare
3805921 to
ba59ec7
Compare
38dc3d6 to
b0189da
Compare
ba59ec7 to
e5350bd
Compare
e5350bd to
3bc4bc5
Compare
29b6816 to
b720aa8
Compare
30a1d0b to
ea53072
Compare
| update_filepath(db_address, cal.id, dest_dir) | ||
| logger.info(f"Downloaded {cal.filename}") | ||
| logger.info(f"Cached {cal.filename}") | ||
| return True |
There was a problem hiding this comment.
I don't generally like returning True unless it is a boolean check, e.g., is this object x type? I tend to raise an exception if something didn't work. This is kind of a stylistic thing, but it's a place to make the code more readable for future us.
There was a problem hiding this comment.
This function takes a file that needs to be cached and results in three possible outcomes: downloaded, skipped, or failed. The return value is used by the caller to keep track of how many files were downloaded vs how many were already cached. So I still think this is the cleanest solution, but I'm open to alternatives if you have one in mind.
There was a problem hiding this comment.
Why does the caller care if the file is downloaded or already cached? Doesn't it just care that the file is on disk now?
There was a problem hiding this comment.
Pull request overview
This PR introduces “smart stacking” scaffolding for site deployments (DB tracking + subframe reduction + Redis-based notifications + per-camera stacking workers), while also refactoring local/site Docker Compose to treat Redis/RabbitMQ as external dependencies and polishing site deployment/E2E workflows.
Changes:
- Add site-only DB tables (
StackFrameviaSiteBase) and smart-stacking worker/supervisor plumbing. - Externalize Redis/RabbitMQ into
docker-compose-dependencies.ymland switch workers toREDIS_URL/RABBITMQ_URL. - Expand/adjust unit + site E2E coverage and update site/local env templates and documentation accordingly.
Reviewed changes
Copilot reviewed 30 out of 32 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Dependency lock revision bump. |
| site-banzai-env.default | Site env defaults updated (absolute paths, broker URLs, smart stacking knobs). |
| local-banzai-env.default | Local env defaults updated to use broker URLs and queue names. |
| scripts/queue_images.py | Queueing script changed to publish absolute host paths. |
| README.rst | Docs updated for new dependencies compose flow and env vars. |
| pytest.ini | Adds smart stacking markers and updates recursion excludes. |
| pyproject.toml | Adds new console entry points for stacking/subframe components. |
| docker-compose-site.yml | Site compose updated for external brokers, absolute-path mounts, new stacking services, and e2e isolation knobs. |
| docker-compose-local.yml | Local compose updated for external brokers and configurable queue/worker naming. |
| docker-compose-dependencies.yml | New compose file for Redis + RabbitMQ dependencies. |
| banzai/utils/stage_utils.py | run_pipeline_stages now returns the processed images list. |
| banzai/utils/fits_utils.py | Adds optional attempt logging to download_from_s3. |
| banzai/stacking.py | New stacking worker loop, timeout handling, and supervisor process manager. |
| banzai/settings.py | Adds env-derived settings for subframe/stacking queues and broker URLs. |
| banzai/scheduling.py | Adds process_subframe Celery task and wiring for DB + Redis notifications. |
| banzai/main.py | Adds SubframeListener + CLI entrypoint to consume stack-queue messages and dispatch Celery tasks. |
| banzai/dbs.py | Introduces SiteBase, StackFrame, site-aware sessions, and stack frame CRUD/upsert helpers. |
| banzai/celeryconfig.py | Removes hardcoded broker_url from Celery config module. |
| banzai/cache/replication.py | Subscription naming/slot reuse improvements for replication setup. |
| banzai/cache/init.py | Ensures site DB schema creation and updated replication setup call signature. |
| banzai/cache/download_worker.py | Adds argparse CLI, improved logging/heartbeat, NULL-frameid handling, reconciliation and backoff logic. |
| banzai/tests/test_smart_stacking.py | New unit tests covering stacking DB ops, notifications, listener, task, supervisor, and resilience. |
| banzai/tests/test_replication.py | Updates replication tests for new subscription/slot behavior. |
| banzai/tests/test_download_worker.py | Updates tests for new download worker behavior and CLI parsing. |
| banzai/tests/test_dbs.py | Adds tests for site-table creation and site-aware session enforcement. |
| banzai/tests/test_cache_init.py | Updates cache-init expectations for site schema and replication call changes. |
| banzai/tests/site_e2e/test_site_e2e.py | Expands site E2E coverage (cached cal usage, drift reconciliation, stacking path, slot reuse). |
| banzai/tests/site_e2e/conftest.py | Improves e2e isolation/scoping, absolute path enforcement, compose helpers, and cleanup. |
| banzai/tests/site_e2e/site_e2e.env.template | Adds isolation knobs + absolute host path requirements + broker URLs. |
| banzai/tests/site_e2e/README.md | Updates e2e docs for isolation, absolute paths, and log watching. |
| .gitignore | Ignores local-banzai-env. |
| .dockerignore | Ignores data/ directory in Docker build context. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| shutil.rmtree(path, ignore_errors=True) | ||
|
|
||
|
|
||
| def run_docker_compose(compose_file, *args, cwd=None, env=None, project=None): |
There was a problem hiding this comment.
This may just be me, but I'm really nervous about python calling docker when this python code will likely be running in a docker container.
There was a problem hiding this comment.
My intent is that site_e2e is a host-run deployment harness, not code that runs inside the banzai containers. Pytest is currently the outer orchestrator for Docker Compose. If we want the site_e2e tests to run in CI then I agree it makes sense to move the compose lifecycle management out to CI and let pytest talk only to the already running services. Are you ok if we defer that for now?
There was a problem hiding this comment.
I don't really think pytest should be in charge of container management. That feels like an anti-pattern to me. In principle we could use tox, but I'm not really sure what you are trying to accomplish, so I'm concerned.
adf8300 to
5f760f6
Compare
|
Sacking architecture and process flow diagrams have been added here: https://github.com/LCOGT/banzai/pull/444/changes#diff-1df90f6630375c6d0b8ea4221090de028e494a40178e28d9a35f0458f6d82451 |
Originally this PR was just splitting Redis/RabbitMQ out of the site/local
docker-compose files, but it ended up bundling several dependent changes
together rather than stacking PRs that kept overriding each other.
What's in here now:
Smart stacking scaffolding
StackFramemodel +SiteBasedeclarative base for site-only tables(
dbs.py)SubframeListener+process_subframeCelery task: validates incomingRabbitMQ messages, inserts a tracking row, runs the reduction pipeline,
updates the row with the reduced filepath, and pushes a Redis notification
per camera (
main.py,scheduling.py)stacking.py: per-camera worker loop drains Redis notifications, checksgroup completion (all reduced AND all expected frames present, or
is_last), and finalizes complete or timed-out stacks; supervisor spawnsone worker per camera
banzai-subframe-listener,banzai-subframe-worker,banzai-stacking-supervisorservices + matching env varstest_smart_stacking.pyand a new site-E2E phaseRedis / RabbitMQ as external dependencies
docker-compose-{site,local}.ymlintodocker-compose-dependencies.ymlso they can be left running acrosspipeline restarts (and pointed at existing site infrastructure)
REDIS_URL/RABBITMQ_URL; no more hardcodedredis://redis:6379/0Site deployment fixes & polish (from running this at a site)
triggers, NULL-
frameidhandling, per-frameid failure backoff, filepathreconciliation when replication overwrites with NULL, "waiting for
replication" log while
calimagesis empty during initializationSLOT_NAMEoverride,reuse existing slot when recreating subscription
created_at(notdateobs) for stack-timeout — handles NULLdateobsand reprocessing (note: stack timeout is being reworked a bit in the branch I'm currently working on)
host.docker.internalresolution), container-prefix support so thefixture stack and
just site-upcan coexist, absolute-HOST_*_DIRrequirement so file paths resolve identically inside and outside
containers
argparseentry points for site workers; removed unusedOPENTSDB_PYTHON_METRICS_TEST_MODEplumbing