
feat: add trogon-cron scheduler#108

Open
yordis wants to merge 23 commits into main from CRON

Conversation

@yordis
Member

@yordis yordis commented Apr 9, 2026

Summary

  • add the CRON design docs and CRON-web companion artifacts
  • add trogon-cron, a distributed NATS/JetStream-backed scheduler with CLI, client API, leader election, retries, dead-lettering, and Docker/dev setup
  • expand the crate with traits, mocks, generic injection, trogon-nats helpers, and stronger internal job modeling/value objects
  • add broad unit, integration, and end-to-end coverage across scheduling, spawn execution, retry behavior, and hot-reload flows

@cursor

cursor bot commented Apr 9, 2026

PR Summary

Medium Risk
Introduces a new NATS/JetStream-backed scheduling control plane plus Docker/JetStream topology changes; behavior depends on newer NATS scheduler features and adds new event/snapshot persistence paths.

Overview
Adds a new Rust service, trogon-cron, providing a distributed CRON-like scheduler over NATS/JetStream, with a CLI to list, get, add, remove, enable, and disable jobs, and a controller that performs leader election and reconciles desired job specs into native NATS scheduled messages.
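
The reconcile step can be sketched as a diff between desired job specs and the schedules currently applied. This is a toy model over plain strings, with illustrative names, not the PR's actual API; the real controller works against native NATS scheduled-message state:

```rust
use std::collections::HashMap;

// Illustrative reconcile step: diff the desired job specs against the
// schedules currently applied, yielding what to (re)apply and what to
// remove. Names and types here are hypothetical.
fn reconcile(
    desired: &HashMap<String, String>, // job id -> desired schedule spec
    active: &HashMap<String, String>,  // job id -> currently applied spec
) -> (Vec<String>, Vec<String>) {
    let mut to_apply = Vec::new();
    let mut to_remove = Vec::new();
    for (id, spec) in desired {
        // New job, or spec changed since it was last applied.
        if active.get(id) != Some(spec) {
            to_apply.push(id.clone());
        }
    }
    for id in active.keys() {
        // Applied in NATS but no longer in the desired set.
        if !desired.contains_key(id) {
            to_remove.push(id.clone());
        }
    }
    to_apply.sort();
    to_remove.sort();
    (to_apply, to_remove)
}
```

Computing the full diff before touching NATS keeps the reconcile loop idempotent: re-running it against an already-converged state produces empty apply/remove sets.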

Implements job specs, validation/resolution into NATS subjects/headers, an event-sourced job stream (job_registered, job_state_changed, job_removed) with optimistic concurrency controls, snapshot projection/catch-up, and NATS integration for schedule publishing and active schedule reconciliation (plus mocks and tests).
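
A minimal sketch of the optimistic-concurrency idea behind the event-sourced job stream, assuming a compare-and-append on an expected revision; the `EventStore`/`append_expected` names are hypothetical, not the PR's actual types:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum AppendError {
    RevisionMismatch { expected: u64, actual: u64 },
}

// In-memory stand-in for an event-sourced store: each job has a stream
// of events, and appends are accepted only when the caller's expected
// revision matches the stream's current length (compare-and-append).
#[derive(Default)]
struct EventStore {
    streams: HashMap<String, Vec<String>>,
}

impl EventStore {
    /// Append `event` to `stream` iff the stream currently holds exactly
    /// `expected` events; returns the new revision on success.
    fn append_expected(
        &mut self,
        stream: &str,
        expected: u64,
        event: String,
    ) -> Result<u64, AppendError> {
        let events = self.streams.entry(stream.to_string()).or_default();
        let actual = events.len() as u64;
        if actual != expected {
            return Err(AppendError::RevisionMismatch { expected, actual });
        }
        events.push(event);
        Ok(actual + 1)
    }
}
```

A writer that lost a race gets a `RevisionMismatch`, reloads the stream, and retries its decision against the fresh state, which is what keeps concurrent `job_registered`/`job_state_changed` writers from clobbering each other.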

Updates dev Docker compose to run trogon-cron (profiled), switches NATS to synadia/nats-server:nightly with localhost-bound ports, and extends telemetry ServiceName plus workspace deps/lockfile to include the new crates and supporting dependencies.

Reviewed by Cursor Bugbot for commit 0010fe4. Bugbot is set up for automated code reviews on this repo.

@coderabbitai

coderabbitai bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a new crate trogon-cron: a distributed CRON scheduler using NATS JetStream/KV for config, leader election, and tick publishing. Provides serde job configs, domain validation, execution (publish/spawn) with retries and DLQ, a scheduler runtime with hot-reload and leader election, CLI tooling, mocks/tests, and Docker compose + Dockerfile.
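
The leader-election piece can be sketched as a TTL lease: acquire succeeds only when the lease is free or expired, and renew succeeds only for the current holder. This in-memory stand-in is an assumption for illustration; the real lock is built on NATS KV create/update semantics:

```rust
use std::time::{Duration, Instant};

// Hypothetical in-memory stand-in for the KV-backed leader lock.
struct LeaderLease {
    holder: Option<(String, Instant)>, // (instance id, lease expiry)
    ttl: Duration,
}

impl LeaderLease {
    fn new(ttl: Duration) -> Self {
        Self { holder: None, ttl }
    }

    /// Succeeds only when no live lease is held.
    fn try_acquire(&mut self, id: &str, now: Instant) -> bool {
        let held = matches!(&self.holder, Some((_, expiry)) if *expiry > now);
        if held {
            return false;
        }
        self.holder = Some((id.to_string(), now + self.ttl));
        true
    }

    /// Succeeds only for the current holder of a live lease.
    fn renew(&mut self, id: &str, now: Instant) -> bool {
        if let Some((holder, expiry)) = &mut self.holder {
            if holder.as_str() == id && *expiry > now {
                *expiry = now + self.ttl;
                return true;
            }
        }
        false
    }
}
```

The renew-before-expiry loop is what lets a standby instance take over automatically: if the leader stops renewing, the lease lapses and the next `try_acquire` wins.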

Changes

Cohort / File(s) Summary
Crate manifest
rsworkspace/crates/trogon-cron/Cargo.toml
New crate manifest, workspace lint settings, test-support feature, dependencies, binary and explicit test targets.
Public surface & CLI / binary
rsworkspace/crates/trogon-cron/src/lib.rs, rsworkspace/crates/trogon-cron/src/cli.rs, rsworkspace/crates/trogon-cron/src/main.rs
Crate root and re-exports; Clap CLI types; Tokio binary entrypoint and job-management command handlers and CLI error mapping.
Client & traits
rsworkspace/crates/trogon-cron/src/client.rs, rsworkspace/crates/trogon-cron/src/traits.rs
CronClient generic over ConfigStore with CRUD methods; trait interfaces for ConfigStore, LeaderLock, TickPublisher, and JobConfigChange stream types.
Config, domain & errors
rsworkspace/crates/trogon-cron/src/config.rs, rsworkspace/crates/trogon-cron/src/domain.rs, rsworkspace/crates/trogon-cron/src/error.rs
Serde job types (Schedule, Action, RetryConfig, JobConfig, TickPayload), domain conversions/validation to Registered/Runtime jobs (cron parsing, spawn validation), and CronError/JobConfigError/TimeoutError with Display/Error and From impls.
Execution & runtime
rsworkspace/crates/trogon-cron/src/executor.rs, rsworkspace/crates/trogon-cron/src/scheduler.rs, rsworkspace/crates/trogon-cron/src/leader.rs
JobState and firing logic; publish and spawn execution with retry/backoff, timeouts and DLQ; Scheduler that hot-reloads configs, enforces leader-only execution, preserves runtime flags; leader-election helper with renew/release semantics.
NATS / JetStream implementations & KV helpers
rsworkspace/crates/trogon-cron/src/kv.rs, rsworkspace/crates/trogon-cron/src/nats_impls.rs
KV/stream create-or-get helpers, job load/watch logic; NatsConfigStore (KV-backed), JetStream TickPublisher adapter with ack/timeout handling, and NatsLeaderLock.
Mocks & tests
rsworkspace/crates/trogon-cron/src/mocks.rs, rsworkspace/crates/trogon-cron/tests/cron_unit.rs
In-memory mocks for TickPublisher, LeaderLock, ConfigStore and unit/integration tests exercising client, executor, scheduler initialization and validation cases.
Packaging & infra
devops/docker/compose/compose.yml, devops/docker/compose/services/trogon-cron/Dockerfile
Docker Compose service entry (profile cron) and multi-stage Dockerfile using cargo-chef and a slim runtime image.
Telemetry enum update
rsworkspace/crates/acp-telemetry/src/service_name.rs
Added ServiceName::TrogonCron variant and updated tests.
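
The config-to-domain validation described in the "Config, domain & errors" cohort can be sketched as follows; field, type, and error names mirror the summary above but are assumptions, not the crate's exact definitions:

```rust
// Illustrative config -> domain validation: a raw JobConfig is checked
// before it becomes a RegisteredJob. A real implementation would fully
// parse the cron expression; this sketch only checks the classic
// five-field shape (min hour dom month dow) and basic subject syntax.
#[derive(Debug, PartialEq)]
struct JobConfig {
    id: String,
    cron: String,
    subject: String,
}

#[derive(Debug, PartialEq)]
struct RegisteredJob {
    id: String,
    cron: String,
    subject: String,
}

#[derive(Debug, PartialEq)]
enum JobConfigError {
    EmptyId,
    BadCron(String),
    BadSubject(String),
}

fn validate(cfg: JobConfig) -> Result<RegisteredJob, JobConfigError> {
    if cfg.id.trim().is_empty() {
        return Err(JobConfigError::EmptyId);
    }
    if cfg.cron.split_whitespace().count() != 5 {
        return Err(JobConfigError::BadCron(cfg.cron));
    }
    // NATS subjects are non-empty dot-separated tokens.
    if cfg.subject.is_empty() || cfg.subject.split('.').any(|t| t.is_empty()) {
        return Err(JobConfigError::BadSubject(cfg.subject));
    }
    Ok(RegisteredJob { id: cfg.id, cron: cfg.cron, subject: cfg.subject })
}
```

Rejecting bad specs at this boundary means the scheduler's runtime types only ever hold jobs that are already known to be schedulable.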

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Scheduler as Scheduler
    participant KV as NATS KV (configs/leader)
    participant Watch as Config Watcher
    participant Leader as LeaderElection
    participant Publisher as TickPublisher (JetStream)
    participant Executor as Executor

    Scheduler->>KV: load_jobs_and_watch()
    KV-->>Scheduler: (snapshot, watch_stream)
    loop main tick
        Scheduler->>Leader: ensure_leader()
        alt is leader
            Scheduler->>Scheduler: compute due jobs(now)
            Scheduler->>Executor: execute(job_state)
            Executor->>Publisher: publish_tick(...) / spawn process
            Publisher-->>Executor: publish result / ack
        else not leader
            Scheduler-->>Scheduler: skip execution
        end
    end
    Watch-->>Scheduler: JobConfigChange (Put/Delete)
    Scheduler->>Scheduler: apply config change (hot-reload)
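
The hot-reload branch at the bottom of the diagram amounts to applying Put/Delete changes from the watch stream to the scheduler's in-memory job map. A minimal sketch, where the `JobConfigChange` shape is an assumption based on the trait summary above:

```rust
use std::collections::HashMap;

// Hypothetical change event emitted by the KV config watcher.
enum JobConfigChange {
    Put { id: String, config: String },
    Delete { id: String },
}

// Apply one watch event to the in-memory job map: Put inserts or
// replaces a spec, Delete drops it.
fn apply_change(jobs: &mut HashMap<String, String>, change: JobConfigChange) {
    match change {
        JobConfigChange::Put { id, config } => {
            jobs.insert(id, config);
        }
        JobConfigChange::Delete { id } => {
            jobs.remove(&id);
        }
    }
}
```

Because each event is applied idempotently, replaying the watch stream from the initial snapshot always converges on the same job map.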
sequenceDiagram
    autonumber
    participant Executor as Executor
    participant Retry as RetryLoop
    participant Jet as JetStream
    participant DLQ as cron.errors

    Executor->>Retry: prepare TickPayload & headers
    Retry->>Jet: publish(payload)
    alt success
        Jet-->>Retry: ack
    else transient failure
        Retry->>Retry: backoff + jitter (respect max duration)
        Retry->>Jet: publish(retry)
        alt retries exhausted
            Retry->>DLQ: publish dead-letter(details)
        end
    end
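
The backoff-plus-jitter step in the retry loop can be sketched as exponential growth with a hard cap and full jitter. The function below is illustrative, not the crate's actual policy; the jitter fraction is passed in by the caller (a real executor would draw it at random) so the schedule stays deterministic here:

```rust
use std::time::Duration;

// Illustrative backoff schedule: base * 2^attempt, capped at `max`,
// then scaled by a jitter fraction in [0, 1].
fn backoff_delay(attempt: u32, base: Duration, max: Duration, jitter_frac: f64) -> Duration {
    // 2^attempt, saturating instead of overflowing for large attempts.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    // base * factor, clamped to the maximum allowed delay.
    let exp = base.checked_mul(factor).unwrap_or(max).min(max);
    // Full jitter: the actual delay is a fraction of the exponential one.
    exp.mul_f64(jitter_frac.clamp(0.0, 1.0))
}
```

Capping the delay bounds worst-case recovery time, while jitter spreads retries out so failed publishes from many jobs do not hammer JetStream in lockstep.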

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐇 I nudged the nodes at break of dawn,

jobs in rows, their timers drawn,
NATS hums ticks across the stream,
leader hops and runs the dream,
retries twirl — the scheduler's song.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 72.83%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed: the title 'feat: add trogon-cron scheduler' directly summarizes the main change: introducing a new distributed CRON scheduler called trogon-cron, the primary focus of the changeset.
  • Description check — ✅ Passed: the description is directly related to the changeset, providing context about the trogon-cron scheduler addition, its features (leader election, retries, dead-lettering), and the testing coverage included.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2026


Code Coverage Summary

Details
Filename                                                                      Stmts    Miss  Cover    Missing
--------------------------------------------------------------------------  -------  ------  -------  ---------------------------------------------------------------------------------------------------------
crates/acp-nats/src/client/request_permission.rs                                338       0  100.00%
crates/acp-nats/src/client/ext.rs                                               365       8  97.81%   193-204, 229-240
crates/acp-nats/src/client/rpc_reply.rs                                          71       0  100.00%
crates/acp-nats/src/client/fs_write_text_file.rs                                451       0  100.00%
crates/acp-nats/src/client/fs_read_text_file.rs                                 384       0  100.00%
crates/acp-nats/src/client/ext_session_prompt_response.rs                       157       0  100.00%
crates/acp-nats/src/client/terminal_create.rs                                   294       0  100.00%
crates/acp-nats/src/client/terminal_kill.rs                                     309       0  100.00%
crates/acp-nats/src/client/terminal_release.rs                                  357       0  100.00%
crates/acp-nats/src/client/session_update.rs                                     55       0  100.00%
crates/acp-nats/src/client/mod.rs                                              2987       0  100.00%
crates/acp-nats/src/client/terminal_output.rs                                   223       0  100.00%
crates/acp-nats/src/client/terminal_wait_for_exit.rs                            396       0  100.00%
crates/trogon-source-notion/src/notion_event_type.rs                             48       3  93.75%   49-51
crates/trogon-source-notion/src/notion_verification_token.rs                     17       0  100.00%
crates/trogon-source-notion/src/server.rs                                       351       8  97.72%   100-104, 137-138, 162-163
crates/trogon-source-notion/src/signature.rs                                     63       1  98.41%   32
crates/acp-nats/src/nats/subjects/commands/prompt.rs                             18       0  100.00%
crates/acp-nats/src/nats/subjects/commands/close.rs                              18       0  100.00%
crates/acp-nats/src/nats/subjects/commands/cancel.rs                             18       0  100.00%
crates/acp-nats/src/nats/subjects/commands/resume.rs                             18       0  100.00%
crates/acp-nats/src/nats/subjects/commands/set_mode.rs                           18       0  100.00%
crates/acp-nats/src/nats/subjects/commands/fork.rs                               18       0  100.00%
crates/acp-nats/src/nats/subjects/commands/set_config_option.rs                  18       0  100.00%
crates/acp-nats/src/nats/subjects/commands/load.rs                               18       0  100.00%
crates/acp-nats/src/nats/subjects/commands/set_model.rs                          18       0  100.00%
crates/trogon-std/src/env/in_memory.rs                                           81       0  100.00%
crates/trogon-std/src/env/system.rs                                              17       0  100.00%
crates/trogon-std/src/telemetry/http.rs                                         239       0  100.00%
crates/trogon-std/src/fs/mem.rs                                                 220      10  95.45%   61-63, 77-79, 133-135, 158
crates/trogon-std/src/fs/system.rs                                              102       0  100.00%
crates/acp-nats/src/jetstream/provision.rs                                       61       0  100.00%
crates/acp-nats/src/jetstream/streams.rs                                        194       4  97.94%   254-256, 266
crates/acp-nats/src/jetstream/consumers.rs                                       99       0  100.00%
crates/acp-nats/src/jetstream/ext_policy.rs                                      26       0  100.00%
crates/trogon-cron/src/events/decision.rs                                        70      11  84.29%   13-28
crates/trogon-cron/src/events/mod.rs                                            121       3  97.52%   45-46, 147
crates/trogon-cron/src/events/state.rs                                           64      21  67.19%   58, 60-61, 67-76, 99-101, 106-107, 114-119
crates/acp-telemetry/src/trace.rs                                                32       3  90.62%   23-24, 32
crates/acp-telemetry/src/service_name.rs                                         52       0  100.00%
crates/acp-telemetry/src/metric.rs                                               35       3  91.43%   30-31, 39
crates/acp-telemetry/src/lib.rs                                                 169      32  81.07%   28-34, 56-63, 98, 103, 108, 122-137, 174, 177, 180, 186
crates/acp-telemetry/src/signal.rs                                                7       1  85.71%   43
crates/acp-telemetry/src/log.rs                                                  71       1  98.59%   43
crates/trogon-gateway/src/source_status.rs                                       32       0  100.00%
crates/trogon-gateway/src/config.rs                                            1371       0  100.00%
crates/trogon-gateway/src/http.rs                                               114       0  100.00%
crates/trogon-gateway/src/main.rs                                                 4       0  100.00%
crates/trogon-gateway/src/streams.rs                                            128       0  100.00%
crates/acp-nats/src/telemetry/metrics.rs                                         65       0  100.00%
crates/trogon-eventsourcing/src/decision.rs                                      68       4  94.12%   42-44, 104
crates/trogon-eventsourcing/src/lib.rs                                          101       0  100.00%
crates/trogon-eventsourcing/src/snapshots.rs                                    371     207  44.20%   42-44, 133-141, 147, 151, 157-169, 215-440
crates/trogon-service-config/src/lib.rs                                          98       0  100.00%
crates/trogon-source-gitlab/src/server.rs                                       431       0  100.00%
crates/trogon-source-gitlab/src/signature.rs                                     30       0  100.00%
crates/trogon-source-gitlab/src/config.rs                                        17       0  100.00%
crates/acp-nats-agent/src/connection.rs                                        1434       1  99.93%   686
crates/trogon-source-linear/src/server.rs                                       392       0  100.00%
crates/trogon-source-linear/src/signature.rs                                     54       1  98.15%   16
crates/trogon-source-linear/src/config.rs                                        17       0  100.00%
crates/trogon-source-github/src/server.rs                                       351       0  100.00%
crates/trogon-source-github/src/signature.rs                                     64       0  100.00%
crates/trogon-source-github/src/config.rs                                        17       0  100.00%
crates/trogon-std/src/dirs/system.rs                                             76       0  100.00%
crates/trogon-std/src/dirs/fixed.rs                                              84       0  100.00%
crates/acp-nats-stdio/src/config.rs                                              72       0  100.00%
crates/acp-nats-stdio/src/main.rs                                               141      27  80.85%   64, 116-123, 129-131, 148, 179-200
crates/trogon-cron/src/store/append_events.rs                                    35      35  0.00%    51-189
crates/trogon-cron/src/store/catch_up_snapshots.rs                                7       7  0.00%    75-81
crates/trogon-cron/src/store/change_job_state.rs                                 50      19  62.00%   48-55, 145-155
crates/trogon-cron/src/store/connect.rs                                           6       6  0.00%    24-29
crates/trogon-cron/src/store/events_stream.rs                                     9       9  0.00%    19-27
crates/trogon-cron/src/store/get_job.rs                                          12       9  25.00%   22-30
crates/trogon-cron/src/store/list_jobs.rs                                         9       9  0.00%    12-20
crates/trogon-cron/src/store/load_and_watch.rs                                    7       7  0.00%    152-158
crates/trogon-cron/src/store/rewrite_projection.rs                               23      23  0.00%    12-39
crates/trogon-cron/src/store/config_bucket.rs                                     9       9  0.00%    19-27
crates/trogon-cron/src/store/project_event_to_snapshot.rs                        11      11  0.00%    71-81
crates/trogon-cron/src/store/register_job.rs                                     69      39  43.48%   31-35, 43-58, 72, 74-84, 162-172
crates/trogon-cron/src/store/remove_job.rs                                       40      23  42.50%   41, 45-52, 62-64, 125-135
crates/trogon-cron/src/store/snapshot_bucket.rs                                   9       9  0.00%    19-27
crates/trogon-nats/src/lease/lease_bucket.rs                                     19       0  100.00%
crates/trogon-nats/src/lease/lease_key.rs                                        19       0  100.00%
crates/trogon-nats/src/lease/renew_interval.rs                                   61       0  100.00%
crates/trogon-nats/src/lease/mod.rs                                             603      13  97.84%   184-197
crates/trogon-nats/src/lease/renew.rs                                           263      19  92.78%   21-27, 46-57
crates/trogon-nats/src/lease/nats_kv_lease_config.rs                             30       0  100.00%
crates/trogon-nats/src/lease/release.rs                                           5       5  0.00%    8-12
crates/trogon-nats/src/lease/acquire.rs                                           8       8  0.00%    9-18
crates/trogon-nats/src/lease/lease_timing.rs                                     21       0  100.00%
crates/trogon-nats/src/lease/ttl.rs                                              76       0  100.00%
crates/trogon-nats/src/lease/lease_config_error.rs                               13       0  100.00%
crates/trogon-nats/src/lease/provision.rs                                       196      10  94.90%   81-91
crates/acp-nats/src/nats/extensions.rs                                            3       0  100.00%
crates/acp-nats/src/nats/mod.rs                                                  23       0  100.00%
crates/acp-nats/src/nats/parsing.rs                                             285       1  99.65%   153
crates/acp-nats/src/nats/subjects/global/session_new.rs                           8       0  100.00%
crates/acp-nats/src/nats/subjects/global/authenticate.rs                          8       0  100.00%
crates/acp-nats/src/nats/subjects/global/initialize.rs                            8       0  100.00%
crates/acp-nats/src/nats/subjects/global/logout.rs                                8       0  100.00%
crates/acp-nats/src/nats/subjects/global/ext.rs                                  12       0  100.00%
crates/acp-nats/src/nats/subjects/global/ext_notify.rs                           12       0  100.00%
crates/acp-nats/src/nats/subjects/global/session_list.rs                          8       0  100.00%
crates/trogon-source-discord/src/config.rs                                      109       0  100.00%
crates/trogon-source-discord/src/gateway.rs                                     491       1  99.80%   157
crates/trogon-std/src/time/system.rs                                             35       0  100.00%
crates/trogon-std/src/time/mock.rs                                              129       0  100.00%
crates/acp-nats-ws/src/main.rs                                                  189      18  90.48%   89, 209-230, 308
crates/acp-nats-ws/src/connection.rs                                            166      35  78.92%   75-82, 87-98, 114, 116-117, 122, 133-135, 142, 146, 150, 153-161, 172, 176, 179, 182-186, 220
crates/acp-nats-ws/src/upgrade.rs                                                57       2  96.49%   59, 90
crates/acp-nats-ws/src/config.rs                                                 83       0  100.00%
crates/acp-nats/src/nats/subjects/client_ops/terminal_release.rs                 15       0  100.00%
crates/acp-nats/src/nats/subjects/client_ops/fs_write_text_file.rs               15       0  100.00%
crates/acp-nats/src/nats/subjects/client_ops/session_update.rs                   15       0  100.00%
crates/acp-nats/src/nats/subjects/client_ops/terminal_kill.rs                    15       0  100.00%
crates/acp-nats/src/nats/subjects/client_ops/terminal_output.rs                  15       0  100.00%
crates/acp-nats/src/nats/subjects/client_ops/terminal_create.rs                  15       0  100.00%
crates/acp-nats/src/nats/subjects/client_ops/terminal_wait_for_exit.rs           15       0  100.00%
crates/acp-nats/src/nats/subjects/client_ops/session_request_permission.rs       15       0  100.00%
crates/acp-nats/src/nats/subjects/client_ops/fs_read_text_file.rs                15       0  100.00%
crates/trogon-nats/src/jetstream/claim_check.rs                                 372       0  100.00%
crates/trogon-nats/src/jetstream/mocks.rs                                       740      44  94.05%   368-382, 388-396, 411-422, 436-446, 514-516
crates/trogon-nats/src/jetstream/stream_max_age.rs                               18       0  100.00%
crates/trogon-nats/src/jetstream/traits.rs                                       61      61  0.00%    152-264
crates/trogon-nats/src/jetstream/create_conflicts.rs                             26       0  100.00%
crates/trogon-nats/src/jetstream/publish.rs                                      64       0  100.00%
crates/acp-nats/src/nats/subjects/subscriptions/one_agent.rs                     18       0  100.00%
crates/acp-nats/src/nats/subjects/subscriptions/all_client.rs                    11       0  100.00%
crates/acp-nats/src/nats/subjects/subscriptions/all_agent.rs                     11       0  100.00%
crates/acp-nats/src/nats/subjects/subscriptions/prompt_wildcard.rs               11       0  100.00%
crates/acp-nats/src/nats/subjects/subscriptions/one_session.rs                   18       0  100.00%
crates/acp-nats/src/nats/subjects/subscriptions/all_session.rs                   11       0  100.00%
crates/acp-nats/src/nats/subjects/subscriptions/all_agent_ext.rs                 11       0  100.00%
crates/acp-nats/src/nats/subjects/subscriptions/one_client.rs                    18       0  100.00%
crates/acp-nats/src/nats/subjects/subscriptions/global_all.rs                    11       0  100.00%
crates/acp-nats/src/agent/ext_method.rs                                          92       0  100.00%
crates/acp-nats/src/agent/fork_session.rs                                       106       0  100.00%
crates/acp-nats/src/agent/resume_session.rs                                     102       0  100.00%
crates/acp-nats/src/agent/set_session_mode.rs                                    71       0  100.00%
crates/acp-nats/src/agent/initialize.rs                                          83       0  100.00%
crates/acp-nats/src/agent/list_sessions.rs                                       50       0  100.00%
crates/acp-nats/src/agent/ext_notification.rs                                    88       0  100.00%
crates/acp-nats/src/agent/close_session.rs                                       67       0  100.00%
crates/acp-nats/src/agent/load_session.rs                                       101       0  100.00%
crates/acp-nats/src/agent/set_session_model.rs                                   71       0  100.00%
crates/acp-nats/src/agent/test_support.rs                                       299       0  100.00%
crates/acp-nats/src/agent/new_session.rs                                         91       0  100.00%
crates/acp-nats/src/agent/authenticate.rs                                        52       0  100.00%
crates/acp-nats/src/agent/bridge.rs                                             123       4  96.75%   109-112
crates/acp-nats/src/agent/logout.rs                                              49       0  100.00%
crates/acp-nats/src/agent/js_request.rs                                         304       0  100.00%
crates/acp-nats/src/agent/prompt.rs                                             633       0  100.00%
crates/acp-nats/src/agent/set_session_config_option.rs                           71       0  100.00%
crates/acp-nats/src/agent/cancel.rs                                             105       0  100.00%
crates/acp-nats/src/agent/mod.rs                                                 65       0  100.00%
crates/trogon-nats/src/telemetry/messaging.rs                                   102       0  100.00%
crates/trogon-source-telegram/src/config.rs                                      17       0  100.00%
crates/trogon-source-telegram/src/server.rs                                     387       0  100.00%
crates/trogon-source-telegram/src/signature.rs                                   38       0  100.00%
crates/trogon-std/src/secret_string.rs                                           35       0  100.00%
crates/trogon-std/src/args.rs                                                    10       0  100.00%
crates/trogon-std/src/json.rs                                                    30       0  100.00%
crates/trogon-std/src/http.rs                                                    19       0  100.00%
crates/trogon-std/src/duration.rs                                                45       0  100.00%
crates/acp-nats/src/nats/subjects/stream.rs                                      58       0  100.00%
crates/acp-nats/src/nats/subjects/mod.rs                                        380       0  100.00%
crates/acp-nats/src/nats/subjects/responses/prompt_response.rs                   27       0  100.00%
crates/acp-nats/src/nats/subjects/responses/cancelled.rs                         18       0  100.00%
crates/acp-nats/src/nats/subjects/responses/response.rs                          20       0  100.00%
crates/acp-nats/src/nats/subjects/responses/update.rs                            27       0  100.00%
crates/acp-nats/src/nats/subjects/responses/ext_ready.rs                         15       0  100.00%
crates/trogon-source-slack/src/config.rs                                         17       0  100.00%
crates/trogon-source-slack/src/server.rs                                        954       0  100.00%
crates/trogon-source-slack/src/signature.rs                                      80       0  100.00%
crates/acp-nats/src/error.rs                                                     84       0  100.00%
crates/acp-nats/src/in_flight_slot_guard.rs                                      32       0  100.00%
crates/acp-nats/src/ext_method_name.rs                                           70       0  100.00%
crates/acp-nats/src/acp_prefix.rs                                                51       0  100.00%
crates/acp-nats/src/client_proxy.rs                                             200       0  100.00%
crates/acp-nats/src/config.rs                                                   204       0  100.00%
crates/acp-nats/src/lib.rs                                                       73       0  100.00%
crates/acp-nats/src/jsonrpc.rs                                                    6       0  100.00%
crates/acp-nats/src/pending_prompt_waiters.rs                                   141       0  100.00%
crates/acp-nats/src/req_id.rs                                                    39       0  100.00%
crates/acp-nats/src/session_id.rs                                                72       0  100.00%
crates/trogon-source-incidentio/src/config.rs                                    16       0  100.00%
crates/trogon-source-incidentio/src/incidentio_event_type.rs                     67       0  100.00%
crates/trogon-source-incidentio/src/signature.rs                                455       0  100.00%
crates/trogon-source-incidentio/src/server.rs                                   365       0  100.00%
crates/trogon-source-incidentio/src/incidentio_signing_secret.rs                 67       0  100.00%
crates/trogon-nats/src/messaging.rs                                             598       2  99.67%   153, 163
crates/trogon-nats/src/token.rs                                                   8       0  100.00%
crates/trogon-nats/src/mocks.rs                                                 304       0  100.00%
crates/trogon-nats/src/auth.rs                                                  119       0  100.00%
crates/trogon-nats/src/client.rs                                                 25      25  0.00%    50-89
crates/trogon-nats/src/connect.rs                                               105      11  89.52%   22-24, 37, 49, 68-73
crates/trogon-nats/src/nats_token.rs                                            161       0  100.00%
crates/trogon-cron/src/mocks.rs                                                 451      62  86.25%   94-96, 170-186, 197-201, 208-209, 226-229, 242-246, 256-260, 265-269, 276-277, 302, 306-320, 331-335, 340
crates/trogon-cron/src/error.rs                                                 213      10  95.31%   116, 223-236
crates/trogon-cron/src/nats.rs                                                  474     285  39.87%   43-148, 164-199, 210-214, 234-248, 257-264, 299-444, 454-500, 543
crates/trogon-cron/src/main.rs                                                    4       0  100.00%
crates/trogon-cron/src/config.rs                                                125       0  100.00%
crates/trogon-cron/src/kv.rs                                                     51      39  23.53%   40-191
crates/trogon-cron/src/scheduler.rs                                             359      93  74.09%   47-73, 98-170, 215, 307-331
crates/trogon-cron/src/domain.rs                                                382      15  96.07%   128-132, 180-183, 259, 264-266, 271-272
crates/trogon-cron/src/job_id.rs                                                 23      12  47.83%   27-32, 42-50
TOTAL                                                                         29246    1339  95.42%

Diff against main

Filename                                                     Stmts    Miss  Cover
---------------------------------------------------------  -------  ------  --------
crates/trogon-cron/src/events/decision.rs                      +70     +11  +84.29%
crates/trogon-cron/src/events/mod.rs                          +121      +3  +97.52%
crates/trogon-cron/src/events/state.rs                         +64     +21  +67.19%
crates/acp-telemetry/src/service_name.rs                        +3       0  +100.00%
crates/trogon-eventsourcing/src/decision.rs                    +68      +4  +94.12%
crates/trogon-eventsourcing/src/lib.rs                        +101       0  +100.00%
crates/trogon-eventsourcing/src/snapshots.rs                  +371    +207  +44.20%
crates/trogon-cron/src/store/append_events.rs                  +35     +35  +100.00%
crates/trogon-cron/src/store/catch_up_snapshots.rs              +7      +7  +100.00%
crates/trogon-cron/src/store/change_job_state.rs               +50     +19  +62.00%
crates/trogon-cron/src/store/connect.rs                         +6      +6  +100.00%
crates/trogon-cron/src/store/events_stream.rs                   +9      +9  +100.00%
crates/trogon-cron/src/store/get_job.rs                        +12      +9  +25.00%
crates/trogon-cron/src/store/list_jobs.rs                       +9      +9  +100.00%
crates/trogon-cron/src/store/load_and_watch.rs                  +7      +7  +100.00%
crates/trogon-cron/src/store/rewrite_projection.rs             +23     +23  +100.00%
crates/trogon-cron/src/store/config_bucket.rs                   +9      +9  +100.00%
crates/trogon-cron/src/store/project_event_to_snapshot.rs      +11     +11  +100.00%
crates/trogon-cron/src/store/register_job.rs                   +69     +39  +43.48%
crates/trogon-cron/src/store/remove_job.rs                     +40     +23  +42.50%
crates/trogon-cron/src/store/snapshot_bucket.rs                 +9      +9  +100.00%
crates/trogon-nats/src/lease/provision.rs                      -14       0  -0.34%
crates/trogon-nats/src/jetstream/traits.rs                     +18     +18  +100.00%
crates/trogon-nats/src/jetstream/create_conflicts.rs           +26       0  +100.00%
crates/trogon-cron/src/mocks.rs                               +451     +62  +86.25%
crates/trogon-cron/src/error.rs                               +213     +10  +95.31%
crates/trogon-cron/src/nats.rs                                +474    +285  +39.87%
crates/trogon-cron/src/main.rs                                  +4       0  +100.00%
crates/trogon-cron/src/config.rs                              +125       0  +100.00%
crates/trogon-cron/src/kv.rs                                   +51     +39  +23.53%
crates/trogon-cron/src/scheduler.rs                           +359     +93  +74.09%
crates/trogon-cron/src/domain.rs                              +382     +15  +96.07%
crates/trogon-cron/src/job_id.rs                               +23     +12  +47.83%
TOTAL                                                        +3206    +995  -3.26%

Results for commit: 10888ba

Minimum allowed coverage is 95%

♻️ This comment has been updated with latest results

@yordis yordis changed the title feat: add trogon-cron scheduler, tooling, and coverage feat: add trogon-cron scheduler Apr 9, 2026
@yordis yordis force-pushed the CRON branch 4 times, most recently from 1d1431c to 8652b66 Compare April 9, 2026 19:38

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

🧹 Nitpick comments (5)
rsworkspace/crates/trogon-cron/Dockerfile (1)

16-16: Add --no-install-recommends to reduce image size.

The apt-get install commands should include --no-install-recommends to avoid pulling unnecessary packages.

📦 Proposed fix
-RUN apt-get update && apt-get install -y pkg-config libssl-dev && rm -rf /var/lib/apt/lists/*
+RUN apt-get update && apt-get install -y --no-install-recommends pkg-config libssl-dev && rm -rf /var/lib/apt/lists/*
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/Dockerfile` at line 16, Modify the Dockerfile
RUN line that installs system packages so apt-get install uses
--no-install-recommends to avoid pulling unnecessary packages; update the RUN
command that currently calls "apt-get install -y pkg-config libssl-dev" to
include "--no-install-recommends" and keep the apt-get update and rm -rf
/var/lib/apt/lists/* steps intact.
rsworkspace/crates/trogon-cron/src/error.rs (1)

3-4: Kv and Publish variants discard error context by storing strings.

Per coding guidelines, errors should wrap the source error as a field or variant instead of converting to a string. The Kv(String) and Publish(String) variants likely originate from NATS KV/publish errors and should preserve the original error type for proper error chaining.

Consider wrapping the underlying async_nats error types:

 pub enum CronError {
-    Kv(String),
-    Publish(String),
+    Kv(async_nats::jetstream::kv::CreateError), // or appropriate error type
+    Publish(async_nats::PublishError),          // or appropriate error type

Then update source() to return Some(e) for these variants as well.

As per coding guidelines: "Never discard error context by converting a typed error into a string; wrap the source error as a field or variant instead."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/error.rs` around lines 3 - 4, The
Kv(String) and Publish(String) enum variants in the Error type discard original
error context; change them to hold the underlying async_nats error (e.g.,
Kv(async_nats::Error) and Publish(async_nats::Error) or a boxed source like
Kv(Box<dyn std::error::Error + Send + Sync>)) so the original error is
preserved, update any constructors/From impls that currently convert to String
to wrap the original error instead, and modify the Error::source(&self)
implementation to return Some(inner_error) for the Kv and Publish variants so
proper error chaining is maintained.
rsworkspace/crates/trogon-cron/src/config.rs (1)

325-334: Consider moving TickPayload before the tests module.

TickPayload is defined after #[cfg(test)] mod tests, which is unconventional. While valid Rust, it may confuse readers who expect all public types to be defined before tests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/config.rs` around lines 325 - 334, The
TickPayload struct is defined after the #[cfg(test)] mod tests block which is
unconventional and can confuse readers; move the TickPayload definition (the pub
struct TickPayload with fields job_id, fired_at, execution_id, payload and its
derives Serialize/Deserialize) so it appears before the tests module (i.e.,
place TickPayload above the tests mod) and ensure any test code referencing
TickPayload still compiles after the reorder.
rsworkspace/crates/trogon-cron/src/kv.rs (2)

96-127: The 500ms timeout for empty-bucket detection may be fragile.

The hardcoded 500ms deadline assumes the initial watcher snapshot arrives within that window. On high-latency networks or under load, this could cause the scheduler to start with an incomplete job set, silently missing jobs that arrive after the timeout but before the first delta == 0 entry.

Consider either:

  1. Making the timeout configurable
  2. Using the NATS pending_count or similar metadata to detect completion
  3. Documenting this limitation prominently
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/kv.rs` around lines 96 - 127, The
hardcoded 500ms deadline used via tokio::time::sleep (pinned as deadline) to
decide the watcher initial snapshot is fragile and can drop jobs; replace it
with a robust approach: either make the timeout configurable (expose a
parameter/const used where deadline is created) or remove the timeout and use
watcher metadata (e.g., NATS pending_count or another completion signal) to
detect the end-of-snapshot instead of breaking on timeout; update the loop
around watcher.next()/delta checks (and error path returning CronError::Kv) to
rely on the chosen mechanism so JobConfig deserialization and the is_last
(e.delta == 0) logic drive completion reliably.

55-62: Swallowing the create_stream error loses diagnostic information.

If create_stream fails for a reason other than "stream already exists" (e.g., permission denied, invalid config), the code falls through to get_stream, which may fail with a misleading error. Consider matching on the specific "already exists" error variant rather than catching all errors.

match js.create_stream(config).await {
    Ok(_) => Ok(()),
    Err(e) if e.kind() == CreateStreamErrorKind::StreamNameExists => {
        // Already exists, verify it's accessible
        js.get_stream(TICKS_STREAM).await.map(|_| ()).map_err(...)
    }
    Err(e) => Err(CronError::Stream(e)), // Propagate actual creation failures
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/kv.rs` around lines 55 - 62, The current
match on js.create_stream(config).await swallows all errors and falls back to
js.get_stream(TICKS_STREAM), losing the original failure details; update the
match to pattern-match the specific "stream already exists" error (e.g., Err(e)
if e.kind() == CreateStreamErrorKind::StreamNameExists) and only in that case
call js.get_stream(TICKS_STREAM).await.map(|_| ()).map_err(|e|
CronError::Kv(e.to_string())); for any other Err(e) from js.create_stream return
Err(CronError::Stream(e.to_string())) (or the appropriate CronError variant) so
creation failures are propagated instead of masked.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rsworkspace/crates/trogon-cron/Dockerfile`:
- Around line 22-31: The runtime Dockerfile currently runs the container as
root; modify the runtime stage to create a dedicated non-root user and group
(e.g., trogon), ensure the copied binary /usr/local/bin/trogon-cron is owned by
that user (chown) and has executable permissions, set a sensible HOME if needed,
and add a USER trogon directive before ENTRYPOINT so the container runs as the
non-root user; reference the runtime stage, the COPY of
/usr/local/bin/trogon-cron, ENTRYPOINT ["trogon-cron"], and CMD ["serve"] when
making these changes.

In `@rsworkspace/crates/trogon-cron/src/client.rs`:
- Around line 73-80: The code in set_enabled uses CronError::Kv(format!(...))
for a missing job; add a typed error variant like CronError::JobNotFound { id:
String } in your error enum (e.g., in error.rs) and replace the ok_or_else
mapping in set_enabled to return CronError::JobNotFound { id: id.to_string() }
instead of constructing a formatted string; update any other sites that
currently create "job not found" Kv strings to use the new JobNotFound variant
so callers can match on the typed error.

In `@rsworkspace/crates/trogon-cron/src/executor.rs`:
- Around line 279-281: The current guard around publish_dead_letter (the if
max_retries > 0 check) prevents publishing a DLQ event when an execution
permanently fails with zero retries; update the failure path so
publish_dead_letter(&publisher, &tick, "publish", actual_attempts,
last_error).await is invoked whenever execution ultimately fails (i.e., when
last_error indicates a terminal failure), regardless of max_retries or retry:
None. Remove or replace the max_retries conditional in the block that handles
final failures (referencing publish_dead_letter, publisher, tick,
actual_attempts, last_error) and make the same change in the analogous block
around lines 442-444 so dead-letter events are always emitted for permanent
failures.
- Around line 224-280: The retry loop is converting errors to String (last_error
= e.to_string()) and losing typed context; instead introduce a small internal
error enum (e.g., PublishError with variants for NatsPublish, Spawn, etc.) and
change last_error from String to that enum (or Option<PublishError>), assign
Err(e) into the appropriate variant when publish_tick fails, and propagate that
typed error through the loop; only convert to String when calling
publish_dead_letter and in the final tracing::error (i.e., stringify inside
publish_dead_letter or right at the logging call). Update publish_dead_letter
(or its callsite) to accept the typed error or perform the to_string there, and
ensure the enum implements Display/Debug/From for the original error types so no
source context is lost.
- Around line 184-201: The concurrent check in RuntimeAction::Spawn uses
state.is_running.load(...) then later sets it, which is racy; replace the
load+separate store with an atomic compare_exchange on state.is_running to
change false -> true (using appropriate Orderings, e.g., SeqCst or
Acquire/Release) and only call spawn_process when compare_exchange succeeds;
apply the same compare_exchange replacement for the other occurrence around
lines 297-303 so the guard for non-concurrent jobs is claimed atomically before
invoking spawn_process.

In `@rsworkspace/crates/trogon-cron/src/nats_impls.rs`:
- Around line 195-218: The TickPublisher::publish_tick implementation is
discarding typed error context by converting PublishOutcome errors to strings;
update the error handling to preserve and wrap the original error types instead
of calling to_string(). Add variants to CronError (e.g.,
CronError::PublishError(PublishError) and CronError::AckError(AckError) or a
single CronError::PublishSource(Box<dyn Error + Send + Sync>)) and map
PublishOutcome::PublishFailed(e), ::AckFailed(e), and ::StoreFailed(e) to those
variants (or use .source() wrapping) when returning Err; ensure publish_tick
(and any call sites of publish_event / JetStreamContextPublisher) return the new
CronError variants so the original PublishError/AckError types and context are
preserved rather than being stringified.
- Around line 36-49: The NatsConfigStore::put_job implementation wraps SDK
errors into CronError::Kv by converting them to strings and uses map_err,
violating the zero-cost passthrough rule; change the implementation of
ConfigStore for NatsConfigStore to return the underlying SDK errors directly
(i.e., remove map_err calls around self.js.get_key_value(...) and kv.put(...)
and avoid converting errors to strings) or adjust CronError::Kv to carry the
original SDK error type so you can use the ? operator without losing context;
ensure serde_json::to_vec errors are propagated similarly (e.g., via ?), so the
body of put_job is a direct await/return passthrough to js.get_key_value and
kv.put with no map_err conversions.
- Around line 234-259: The code in NatsLeaderLock's try_acquire, renew, and
release is converting underlying store errors to strings (map_err(|e|
CronError::Kv(e.to_string()))) which discards original error types; change this
by making CronError::Kv carry the original error (or a boxed dyn Error) and/or
implement From for the store error types, then replace map_err calls with
map_err(Into::into) or map_err(CronError::from) so create/update/delete errors
are preserved; update the CronError enum definition to accept the concrete
source types (or Box<dyn std::error::Error + Send + Sync>) and adjust
try_acquire, renew, and release to forward the original
CreateError/UpdateError/DeleteError instead of to_string.
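A minimal sketch of the `From`-based forwarding suggested above, with `UpdateError` as a hypothetical stand-in for the SDK's typed error (the real code would implement `From` for each of the create/update/delete error types):

```rust
#[derive(Debug)]
struct UpdateError; // stand-in for the SDK's typed update error

#[derive(Debug)]
enum CronError {
    Kv(UpdateError),
}

impl From<UpdateError> for CronError {
    fn from(e: UpdateError) -> Self {
        CronError::Kv(e)
    }
}

fn renew() -> Result<(), CronError> {
    // Stand-in for self.store.update(LEADER_KEY, value, revision).await
    let res: Result<(), UpdateError> = Err(UpdateError);
    res?; // `?` converts via From, preserving the source error; no map_err needed
    Ok(())
}
```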

In `@rsworkspace/crates/trogon-cron/src/scheduler.rs`:
- Around line 180-201: The helper reestablish_config_watch currently blocks
retries and is not cancellation-aware, which can prevent run() from observing
shutdown signals; modify reestablish_config_watch to accept a cancellation
trigger (e.g., a ShutdownReceiver, a CancellationToken, or an async
shutdown_signal future) or return early on cancellation, then inside the loop
use tokio::select! to race ConfigStore::load_and_watch().await and the shutdown
trigger (and the sleep retry against the shutdown), returning an Err/early exit
when cancelled so the outer run() can handle graceful leader-lock release;
alternatively move the retry sleep/select logic up into run() and keep
reestablish_config_watch as a single-attempt helper that returns its
watcher/error to the caller.
- Around line 153-156: preserve_runtime_state currently only copies is_running
and publish_in_flight, causing last_fired to reset and interval jobs to fire
immediately after reload; update preserve_runtime_state (and the build_job_state
path that calls it) to also carry forward the prior firing state by copying
old_state.last_fired into new_state.last_fired when the schedule is unchanged,
or, if the schedule has changed, recompute new_state.last_fired from
old_state.last_fired (e.g., advance/snap it through the new schedule) so reloads
do not treat every reload as a first load; refer to preserve_runtime_state,
build_job_state, and JobState.last_fired to locate where to copy or recompute
the value.
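A minimal sketch of the carry-forward described above, assuming hypothetical `JobState` and `Schedule` shapes (the real types in scheduler.rs are not shown in this diff):

```rust
use std::time::SystemTime;

#[derive(Clone, PartialEq, Debug)]
struct Schedule(String); // stand-in for the real schedule type

#[derive(Debug)]
struct JobState {
    schedule: Schedule,
    last_fired: Option<SystemTime>,
}

// Sketch: on reload, keep the firing history when the schedule is unchanged
// so interval jobs do not fire immediately after every config reload.
fn preserve_runtime_state(old: &JobState, new: &mut JobState) {
    if old.schedule == new.schedule {
        new.last_fired = old.last_fired;
    }
    // If the schedule changed, last_fired could instead be recomputed
    // against the new schedule rather than reset to None.
}
```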


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 824714aa-1b17-40de-8d8e-1d77a4a2f5c4

📥 Commits

Reviewing files that changed from the base of the PR and between 290461a and 8652b66.

⛔ Files ignored due to path filters (1)
  • rsworkspace/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
  • rsworkspace/crates/AGENTS.md
  • rsworkspace/crates/trogon-cron/Cargo.toml
  • rsworkspace/crates/trogon-cron/Dockerfile
  • rsworkspace/crates/trogon-cron/docker-compose.yml
  • rsworkspace/crates/trogon-cron/src/client.rs
  • rsworkspace/crates/trogon-cron/src/config.rs
  • rsworkspace/crates/trogon-cron/src/domain.rs
  • rsworkspace/crates/trogon-cron/src/error.rs
  • rsworkspace/crates/trogon-cron/src/executor.rs
  • rsworkspace/crates/trogon-cron/src/kv.rs
  • rsworkspace/crates/trogon-cron/src/leader.rs
  • rsworkspace/crates/trogon-cron/src/lib.rs
  • rsworkspace/crates/trogon-cron/src/main.rs
  • rsworkspace/crates/trogon-cron/src/mocks.rs
  • rsworkspace/crates/trogon-cron/src/nats_impls.rs
  • rsworkspace/crates/trogon-cron/src/scheduler.rs
  • rsworkspace/crates/trogon-cron/src/traits.rs
  • rsworkspace/crates/trogon-cron/tests/cron_unit.rs
  • rsworkspace/crates/trogon-cron/tests/integration.rs

@yordis yordis added the rust:coverage-baseline-reset Relax Rust coverage gate to establish a new baseline label Apr 9, 2026
@yordis yordis force-pushed the CRON branch 2 times, most recently from f0d1276 to 6728775 Compare April 9, 2026 19:53

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (6)
rsworkspace/crates/trogon-cron/src/nats_impls.rs (3)

36-120: ⚠️ Potential issue | 🟠 Major

Stop stringifying JetStream errors in the ConfigStore impl.

These map_err(|e| CronError::Kv(e.to_string())) calls erase the concrete async-nats error types on every KV operation. Either let CronError::Kv wrap the SDK errors directly or give ConfigStore an associated error type so these methods can stay passthroughs and use ?.

As per coding guidelines: "Production implementations of infrastructure traits must be zero-cost passthroughs to the underlying SDK. No error wrapping (use the SDK's error types directly via associated type Error), no return type conversion, no map_err, no map(|_| ())." and "Never discard error context by converting a typed error into a string; wrap the source error as a field or variant instead."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/nats_impls.rs` around lines 36 - 120, The
current NatsConfigStore implementation for ConfigStore (methods put_job,
get_job, delete_job, list_jobs, load_and_watch) is converting
async-nats/jetstream SDK errors to strings via map_err(|e|
CronError::Kv(e.to_string())), losing typed error context; fix by making these
methods passthrough zero-cost: either change CronError::Kv to hold the original
SDK error type (wrap the error instead of to_string) or update the ConfigStore
signature to use an associated Error type and return the SDK errors directly
(remove map_err and the e.to_string() conversions) so all KV calls
(get_key_value, entry, delete, keys, get, etc.) propagate the concrete SDK
errors via ?.

194-215: ⚠️ Potential issue | 🟠 Major

Preserve the original PublishOutcome failure sources here.

PublishFailed, AckFailed, AckTimedOut, and StoreFailed all get collapsed into CronError::Publish(String), so the caller loses the concrete publish/ack/store failure type and nested context. This should stay typed through the TickPublisher boundary.

As per coding guidelines: "Never discard error context by converting a typed error into a string; wrap the source error as a field or variant instead."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/nats_impls.rs` around lines 194 - 215, The
publish_tick implementation on jetstream::Context currently collapses
PublishOutcome variants into CronError::Publish(String); change CronError to
preserve source error types (e.g., add variants
CronError::PublishFailed(PublishError), CronError::AckFailed(AckError),
CronError::AckTimedOut(Duration) and CronError::StoreFailed(StoreError) or a
single CronError::Publish{source: PublishOutcomeError} that wraps the original
error), then update publish_tick to map each PublishOutcome variant to the
corresponding typed CronError (use .map_err(|e| CronError::PublishFailed(e)) or
construct the appropriate variant for AckTimedOut with the timeout), ensuring
you propagate the original error values instead of converting them to strings;
touch the publish_tick function, the PublishOutcome match arms, and the
CronError enum to implement this typed error propagation.

233-258: ⚠️ Potential issue | 🟠 Major

Leader-lock KV operations still erase SDK error types.

try_acquire, renew, and release are still doing CronError::Kv(e.to_string()), which drops the original create/update/delete errors and breaks the zero-cost passthrough rule for infrastructure code.

As per coding guidelines: "Production implementations of infrastructure traits must be zero-cost passthroughs to the underlying SDK. No error wrapping (use the SDK's error types directly via associated type Error), no return type conversion, no map_err, no map(|_| ())." and "Errors must be typed—use structs or enums, never String or format!()."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/nats_impls.rs` around lines 233 - 258, The
impl for NatsLeaderLock currently converts SDK errors into CronError via
map_err(e.to_string()) in try_acquire, renew and release; change it to be a
zero-cost passthrough by making the impl's associated type Error the underlying
KV/store error type (the concrete error returned by
self.store.create/update/delete, e.g., crate::kv::Error or the store client's
error type) and remove all map_err/string conversions so each method returns the
store's Result directly (call self.store.create(LEADER_KEY, value).await,
self.store.update(LEADER_KEY, value, revision).await, and
self.store.delete(LEADER_KEY).await without mapping errors). Ensure references
to NatsLeaderLock, try_acquire, renew, release, CronError and
crate::kv::LEADER_KEY are used to locate and update the code.
rsworkspace/crates/trogon-cron/src/executor.rs (3)

279-281: ⚠️ Potential issue | 🟠 Major

Dead-letter terminal failures even when retries are disabled.

Both branches only publish to cron.errors when max_retries > 0, so a permanent first-attempt failure with retry: None or max_retries: 0 never reaches the DLQ path. Emit the dead-letter event whenever execution ends in failure.

Suggested fix
-    if max_retries > 0 {
-        publish_dead_letter(&publisher, &tick, "publish", actual_attempts, last_error).await;
-    }
+    publish_dead_letter(&publisher, &tick, "publish", actual_attempts, last_error).await;

-            if max_retries > 0 {
-                publish_dead_letter(&publisher, &tick, "spawn", actual_attempts, last_error).await;
-            }
+            publish_dead_letter(&publisher, &tick, "spawn", actual_attempts, last_error).await;

Also applies to: 442-444

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/executor.rs` around lines 279 - 281, The
code only calls publish_dead_letter(&publisher, &tick, "publish",
actual_attempts, last_error).await when max_retries > 0, so permanent failures
with retry: None or max_retries == 0 never get sent to the DLQ; remove or change
the max_retries guard so publish_dead_letter is invoked whenever execution
finishes in a failure path regardless of max_retries (i.e., call
publish_dead_letter in the failure handling branch that currently handles
terminal errors), and mirror this same fix for the other failure branch that
currently has the same max_retries check (the similar publish_dead_letter call
later in the file).

184-201: ⚠️ Potential issue | 🔴 Critical

Claim is_running atomically before spawning.

load() in execute() and the later store(true) in spawn_process() are separate operations, so two callers can both observe false and both spawn when concurrent == false. Use compare_exchange(false, true, ...) in execute() and remove the unconditional store in spawn_process().

Suggested fix
-            let is_run = state.is_running.load(Ordering::SeqCst);
-            if !command.concurrent() && is_run {
+            if !command.concurrent()
+                && state
+                    .is_running
+                    .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
+                    .is_err()
+            {
                 tracing::debug!(job_id = %state.job.id(), "Skipping tick — previous invocation still running");
                 return;
             }
-    is_running.store(true, Ordering::SeqCst);

Also applies to: 297-303

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/executor.rs` around lines 184 - 201, The
check of state.is_running in RuntimeAction::Spawn is racy because load()
followed by spawn_process()’s store(true) allows two threads to both spawn when
concurrent() == false; change the logic in execute() (where RuntimeAction::Spawn
is handled) to attempt an atomic compare_exchange(false, true, Ordering::SeqCst,
Ordering::SeqCst) on state.is_running and only proceed to call spawn_process if
the exchange succeeds, and remove the unconditional store(true) inside
spawn_process(); ensure spawn_process (or its completion path) still clears
is_running back to false when the job finishes or fails, and apply the same
compare_exchange-based claim to the other similar invocation that currently
mirrors this pattern.

223-278: ⚠️ Potential issue | 🟠 Major

Keep retry failures typed until the DLQ/log boundary.

last_error = e.to_string(), Err(e.to_string()), and the format!(...)-based retry failures flatten everything into String long before the retry loop finishes. That drops source context and makes the DLQ/log path lossy. Please keep an internal typed failure enum here and stringify only when you finally serialize DeadLetterPayload.

Based on learnings: "Errors must be typed—use structs or enums, never String or format!()." and "Never discard error context by converting a typed error into a string; wrap the source error as a field or variant instead."

Also applies to: 305-440

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/executor.rs` around lines 223 - 278, The
retry loop currently collapses all errors to String early (last_error =
e.to_string()), losing source/context; introduce a small internal typed error
enum/struct (e.g., enum RetryError { Publish(trogon_nats::Error), Timeout,
BackoffExceeded, ... }) and change last_error from String to Option<RetryError>,
store Err variants as last_error = Some(RetryError::Publish(e)) inside the loop
(refer to publisher.publish_tick and the for attempt loop), keep all context
until the DLQ/log boundary, and only stringify or serialize the error when
constructing DeadLetterPayload or when emitting the final
tracing::error/tracing::warn (use %last_error.to_string() or implement Display
on RetryError at that boundary). Ensure other retry-related logic
(retry_backoff_with_jitter, max_retry_duration checks, and places around lines
305-440) uses the typed error similarly.
🧹 Nitpick comments (2)
devops/docker/compose/compose.yml (1)

85-87: Add optional NATS auth env pass-through for secured deployments.

This service currently injects only NATS_URL; when NATS auth is enabled, trogon-cron supports env-based auth but won’t receive credentials from Compose.

Suggested patch
   trogon-cron:
     build:
       context: ../../../rsworkspace
       dockerfile: ../devops/docker/compose/services/trogon-cron/Dockerfile
     environment:
       NATS_URL: "nats://nats:4222"
+      NATS_CREDS: "${NATS_CREDS:-}"
+      NATS_NKEY: "${NATS_NKEY:-}"
+      NATS_USER: "${NATS_USER:-}"
+      NATS_PASSWORD: "${NATS_PASSWORD:-}"
+      NATS_TOKEN: "${NATS_TOKEN:-}"
       RUST_LOG: "${RUST_LOG:-info}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@devops/docker/compose/compose.yml` around lines 85 - 87, Add optional NATS
auth env passthrough to the trogon-cron service by extending its environment
block (the same block that currently contains NATS_URL and RUST_LOG) to include
optional variables such as NATS_USER, NATS_PASS and NATS_TOKEN (or other auth
vars your deployment uses) and map them to host environment values (e.g.,
${NATS_USER:-} style) so credentials are forwarded only when provided; update
the environment section alongside NATS_URL and RUST_LOG to include these
optional keys.
rsworkspace/crates/trogon-cron/src/executor.rs (1)

61-65: Use saturating multiplication to guard against u64 overflow in the backoff calculation.

When base_sec is very large, the multiplication base_sec * multiplier can theoretically overflow (panic in debug builds, wraparound in release). While practical configurations use small values (1–40 seconds, all well below the u64::MAX / 32 threshold), using saturating_mul provides defensive correctness without cost.

Suggested fix
-    Duration::from_secs(base_sec * multiplier)
+    Duration::from_secs(base_sec.saturating_mul(multiplier))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/executor.rs` around lines 61 - 65, The
backoff multiplier computation in retry_backoff can overflow when multiplying a
large base_sec by the multiplier; change the multiplication to a saturating
multiplication (use base_sec.saturating_mul(multiplier)) before creating the
Duration so it never overflows. Update the retry_backoff function to compute
multiplier as currently done and pass base_sec.saturating_mul(multiplier) into
Duration::from_secs to defensively guard against u64 overflow.
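The overflow-safe backoff can be sketched as below; `retry_backoff` is a stand-in for the function in `executor.rs`, and the shift-based multiplier is an assumption about how the exponential factor is computed.

```rust
use std::time::Duration;

// Exponential backoff where both the 2^attempt shift and the multiply
// saturate instead of overflowing (panic in debug, wraparound in release).
fn retry_backoff(base_sec: u64, attempt: u32) -> Duration {
    // checked_shl returns None once attempt >= 64, so the shift cannot overflow.
    let multiplier = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    Duration::from_secs(base_sec.saturating_mul(multiplier))
}
```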
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rsworkspace/crates/trogon-cron/src/domain.rs`:
- Around line 293-300: PublishSubject::new currently only checks the "cron."
prefix and allows invalid NATS subjects (wildcards, whitespace, empty tokens);
update the constructor to validate the full subject using the existing
trogon-nats token validation (e.g., trogon_nats::token::validate or the SDK
subject wrapper) and return CronError::InvalidJobConfig with a clear reason on
failure; specifically, inside PublishSubject::new replace the simple starts_with
check with a call to the trogon-nats validation function (or construct the
trogon-nats Subject type) for the entire subject string and propagate validation
errors into the same InvalidJobConfig variant so invalid subjects cannot be
constructed.
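The actual rules live in trogon-nats; the sketch below only approximates what "reject wildcards, whitespace, and empty tokens" means for a concrete publish subject, with a hypothetical helper name.

```rust
// Approximation of concrete NATS publish-subject validation: dot-separated
// tokens, none empty, no whitespace, and no wildcard characters ('*', '>').
fn is_valid_publish_subject(subject: &str) -> bool {
    !subject.is_empty()
        && subject.split('.').all(|token| {
            !token.is_empty()
                && token
                    .chars()
                    .all(|c| !c.is_whitespace() && c != '*' && c != '>')
        })
}
```

`PublishSubject::new` would run this (or the real trogon-nats validator) over the whole string after the `cron.` prefix check, mapping a failure into `CronError::InvalidJobConfig`.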

In `@rsworkspace/crates/trogon-cron/src/error.rs`:
- Around line 2-9: Replace the String payloads in the CronError enum
(Kv(String), Publish(String), InvalidJobConfig { reason: String }) with typed
error wrappers so callers can preserve source chains: define concrete error
types (e.g., KvError, PublishError, JobConfigError) or use wrappers carrying the
original error (e.g., Kv(Box<dyn std::error::Error + Send + Sync>),
Publish(Box<dyn std::error::Error + Send + Sync>), InvalidJobConfig { source:
Box<...> }), update the CronError declaration to use those types, and adjust any
Display/impl std::error::Error for CronError to produce human-readable messages
while returning the original sources from source(). Ensure code that currently
calls to_string() on sources is updated to pass the original error into the new
Kv/Publish/InvalidJobConfig variants.

---

Duplicate comments:
In `@rsworkspace/crates/trogon-cron/src/executor.rs`:
- Around line 279-281: The code only calls publish_dead_letter(&publisher,
&tick, "publish", actual_attempts, last_error).await when max_retries > 0, so
permanent failures with retry: None or max_retries == 0 never get sent to the
DLQ; remove or change the max_retries guard so publish_dead_letter is invoked
whenever execution finishes in a failure path regardless of max_retries (i.e.,
call publish_dead_letter in the failure handling branch that currently handles
terminal errors), and mirror this same fix for the other failure branch that
currently has the same max_retries check (the similar publish_dead_letter call
later in the file).
- Around line 184-201: The check of state.is_running in RuntimeAction::Spawn is
racy because load() followed by spawn_process()’s store(true) allows two threads
to both spawn when concurrent() == false; change the logic in execute() (where
RuntimeAction::Spawn is handled) to attempt an atomic compare_exchange(false,
true, Ordering::SeqCst, Ordering::SeqCst) on state.is_running and only proceed
to call spawn_process if the exchange succeeds, and remove the unconditional
store(true) inside spawn_process(); ensure spawn_process (or its completion
path) still clears is_running back to false when the job finishes or fails, and
apply the same compare_exchange-based claim to the other similar invocation that
currently mirrors this pattern.
- Around line 223-278: The retry loop currently collapses all errors to String
early (last_error = e.to_string()), losing source/context; introduce a small
internal typed error enum/struct (e.g., enum RetryError {
Publish(trogon_nats::Error), Timeout, BackoffExceeded, ... }) and change
last_error from String to Option<RetryError>, store Err variants as last_error =
Some(RetryError::Publish(e)) inside the loop (refer to publisher.publish_tick
and the for attempt loop), keep all context until the DLQ/log boundary, and only
stringify or serialize the error when constructing DeadLetterPayload or when
emitting the final tracing::error/tracing::warn (use %last_error.to_string() or
implement Display on RetryError at that boundary). Ensure other retry-related
logic (retry_backoff_with_jitter, max_retry_duration checks, and places around
lines 305-440) uses the typed error similarly.

In `@rsworkspace/crates/trogon-cron/src/nats_impls.rs`:
- Around line 36-120: The current NatsConfigStore implementation for ConfigStore
(methods put_job, get_job, delete_job, list_jobs, load_and_watch) is converting
async-nats/jetstream SDK errors to strings via map_err(|e|
CronError::Kv(e.to_string())), losing typed error context; fix by making these
methods passthrough zero-cost: either change CronError::Kv to hold the original
SDK error type (wrap the error instead of to_string) or update the ConfigStore
signature to use an associated Error type and return the SDK errors directly
(remove map_err and the e.to_string() conversions) so all KV calls
(get_key_value, entry, delete, keys, get, etc.) propagate the concrete SDK
errors via ?.
- Around line 194-215: The publish_tick implementation on jetstream::Context
currently collapses PublishOutcome variants into CronError::Publish(String);
change CronError to preserve source error types (e.g., add variants
CronError::PublishFailed(PublishError), CronError::AckFailed(AckError),
CronError::AckTimedOut(Duration) and CronError::StoreFailed(StoreError) or a
single CronError::Publish{source: PublishOutcomeError} that wraps the original
error), then update publish_tick to map each PublishOutcome variant to the
corresponding typed CronError (use .map_err(|e| CronError::PublishFailed(e)) or
construct the appropriate variant for AckTimedOut with the timeout), ensuring
you propagate the original error values instead of converting them to strings;
touch the publish_tick function, the PublishOutcome match arms, and the
CronError enum to implement this typed error propagation.
- Around line 233-258: The impl for NatsLeaderLock currently converts SDK errors
into CronError via map_err(e.to_string()) in try_acquire, renew and release;
change it to be a zero-cost passthrough by making the impl's associated type
Error the underlying KV/store error type (the concrete error returned by
self.store.create/update/delete, e.g., crate::kv::Error or the store client's
error type) and remove all map_err/string conversions so each method returns the
store's Result directly (call self.store.create(LEADER_KEY, value).await,
self.store.update(LEADER_KEY, value, revision).await, and
self.store.delete(LEADER_KEY).await without mapping errors). Ensure references
to NatsLeaderLock, try_acquire, renew, release, CronError and
crate::kv::LEADER_KEY are used to locate and update the code.

---

Nitpick comments:
In `@devops/docker/compose/compose.yml`:
- Around line 85-87: Add optional NATS auth env passthrough to the trogon-cron
service by extending its environment block (the same block that currently
contains NATS_URL and RUST_LOG) to include optional variables such as NATS_USER,
NATS_PASS and NATS_TOKEN (or other auth vars your deployment uses) and map them
to host environment values (e.g., ${NATS_USER:-} style) so credentials are
forwarded only when provided; update the environment section alongside NATS_URL
and RUST_LOG to include these optional keys.

In `@rsworkspace/crates/trogon-cron/src/executor.rs`:
- Around line 61-65: The backoff multiplier computation in retry_backoff can
overflow when multiplying a large base_sec by the multiplier; change the
multiplication to a saturating multiplication (use
base_sec.saturating_mul(multiplier)) before creating the Duration so it never
overflows. Update the retry_backoff function to compute multiplier as currently
done and pass base_sec.saturating_mul(multiplier) into Duration::from_secs to
defensively guard against u64 overflow.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0b160d0f-18b6-4b16-99ce-1ce2a1741a29

📥 Commits

Reviewing files that changed from the base of the PR and between 8652b66 and df02750.

⛔ Files ignored due to path filters (1)
  • rsworkspace/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
  • devops/docker/compose/compose.yml
  • devops/docker/compose/services/trogon-cron/Dockerfile
  • rsworkspace/crates/trogon-cron/Cargo.toml
  • rsworkspace/crates/trogon-cron/src/client.rs
  • rsworkspace/crates/trogon-cron/src/config.rs
  • rsworkspace/crates/trogon-cron/src/domain.rs
  • rsworkspace/crates/trogon-cron/src/error.rs
  • rsworkspace/crates/trogon-cron/src/executor.rs
  • rsworkspace/crates/trogon-cron/src/kv.rs
  • rsworkspace/crates/trogon-cron/src/leader.rs
  • rsworkspace/crates/trogon-cron/src/lib.rs
  • rsworkspace/crates/trogon-cron/src/main.rs
  • rsworkspace/crates/trogon-cron/src/mocks.rs
  • rsworkspace/crates/trogon-cron/src/nats_impls.rs
  • rsworkspace/crates/trogon-cron/src/scheduler.rs
  • rsworkspace/crates/trogon-cron/src/traits.rs
  • rsworkspace/crates/trogon-cron/tests/cron_unit.rs
  • rsworkspace/crates/trogon-cron/tests/integration.rs
✅ Files skipped from review due to trivial changes (3)
  • rsworkspace/crates/trogon-cron/tests/cron_unit.rs
  • rsworkspace/crates/trogon-cron/Cargo.toml
  • rsworkspace/crates/trogon-cron/src/main.rs
🚧 Files skipped from review as they are similar to previous changes (7)
  • rsworkspace/crates/trogon-cron/src/leader.rs
  • rsworkspace/crates/trogon-cron/src/client.rs
  • rsworkspace/crates/trogon-cron/src/mocks.rs
  • rsworkspace/crates/trogon-cron/src/kv.rs
  • rsworkspace/crates/trogon-cron/src/lib.rs
  • rsworkspace/crates/trogon-cron/src/traits.rs
  • rsworkspace/crates/trogon-cron/src/scheduler.rs


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (7)
rsworkspace/crates/trogon-cron/src/error.rs (1)

2-33: ⚠️ Potential issue | 🟠 Major

Keep CronError variants typed instead of storing strings.

Kv(String), Publish(String), and the string reason payloads force callers to flatten failures before they reach this enum, which is why most variants cannot expose a source() chain. Please wrap concrete source errors or typed validation errors here and keep the human-readable rendering in Display.

Based on learnings: "Errors must be typed—use structs or enums, never String or format!()." and "Never discard error context by converting a typed error into a string; wrap the source error as a field or variant instead."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/error.rs` around lines 2 - 33, CronError
currently stores String payloads (Kv(String), Publish(String), InvalidJobConfig
{ reason: String }) which discards typed error context; change these variants to
hold concrete error types or boxed error trait objects (e.g., Kv(Box<dyn
std::error::Error + Send + Sync>), Publish(Box<dyn std::error::Error + Send +
Sync>), and InvalidJobConfig(ValidationError) where ValidationError is a new
struct/enum describing validation failure), update the Display impl to format
human-readable messages from those types, and extend std::error::Error::source()
to return the wrapped error for Kv, Publish and InvalidJobConfig so callers can
access the original error chain (leave Serde and Io handling as-is).
rsworkspace/crates/trogon-cron/src/nats_impls.rs (3)

233-258: ⚠️ Potential issue | 🟠 Major

Leader-lock operations are also discarding the KV source errors.

create, update, and delete all end in CronError::Kv(e.to_string()), so CAS mismatches and backend failures lose their concrete types and source() chain. Please preserve the underlying KV errors here too.

Based on learnings: "Never discard error context by converting a typed error into a string; wrap the source error as a field or variant instead." and "Errors must be typed—use structs or enums, never String or format!()."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/nats_impls.rs` around lines 233 - 258, The
leader-lock methods (NatsLeaderLock::try_acquire, ::renew, ::release) currently
map KV errors to CronError::Kv(e.to_string()), discarding the original typed
error and its source chain; change the CronError::Kv variant to hold the
original error type (or a boxed dyn Error) and map errors directly (e.g.,
.map_err(|e| CronError::Kv(e)) or .map_err(|e| CronError::Kv { source: e })
instead of e.to_string()), so the errors returned from
self.store.create/update/delete preserve their concrete type and source.

194-215: ⚠️ Potential issue | 🟠 Major

Preserve PublishOutcome sources in TickPublisher::Error.

PublishFailed, AckFailed, AckTimedOut, and StoreFailed are all collapsed into CronError::Publish(String), so downstream retry and DLQ handling cannot inspect or chain the real JetStream failure. This needs the same typed passthrough/wrapping treatment as the KV paths.

Based on learnings: "Never discard error context by converting a typed error into a string; wrap the source error as a field or variant instead." and "Errors must be typed—use structs or enums, never String or format!()."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/nats_impls.rs` around lines 194 - 215, The
TickPublisher impl for jetstream::Context currently collapses PublishOutcome
variants into CronError::Publish(String); instead preserve and wrap the original
typed errors instead of converting to strings—update CronError to include
variants/fields for PublishFailed, AckFailed, AckTimedOut (with duration), and
StoreFailed (or a single Publish variant that contains the original
PublishOutcome::... error type), then change the matching in publish_tick (which
calls publish_event and uses JetStreamContextPublisher and PublishOutcome) to
return Err(CronError::PublishFailed(e)) / Err(CronError::AckFailed(e)) /
Err(CronError::AckTimedOut(timeout)) / Err(CronError::StoreFailed(e)) (or the
equivalent wrapper) so the original error types are preserved and chainable
instead of using to_string()/format!.

36-157: ⚠️ Potential issue | 🟠 Major

ConfigStore should pass through SDK errors instead of stringifying them.

Every map_err(|e| CronError::Kv(e.to_string())) here loses the original JetStream/KV error type and breaks the zero-cost passthrough rule for production infrastructure traits. Either let the trait surface the SDK errors directly, or make CronError wrap the concrete sources so these calls can use ? without flattening.

As per coding guidelines, "Production implementations of infrastructure traits must be zero-cost passthroughs to the underlying SDK. No error wrapping (use the SDK's error types directly via associated type Error), no return type conversion (add associated types like type Info to match the SDK's return), no map_err, no map(|_| ())." and "Never discard error context by converting a typed error into a string; wrap the source error as a field or variant instead."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/nats_impls.rs` around lines 36 - 157, The
code currently stringifies SDK errors via map_err(|e|
CronError::Kv(e.to_string())) in NatsConfigStore (see put_job, get_job,
delete_job, list_jobs, load_and_watch), which loses error types; change
CronError::Kv to hold the concrete SDK error type (e.g., kv::Error or the
appropriate JetStream error) or add a new variant wrapping Box<dyn
std::error::Error + Send + Sync>, implement From<kv::Error> for CronError, then
remove the map_err closures and use the ? operator directly (e.g., let kv =
self.js.get_key_value(CONFIG_BUCKET).await?; and similarly for
kv.put(...).await? etc.), so SDK errors are propagated without being
stringified. Ensure any places that previously used CronError::Kv(String) are
updated to handle the new variant.
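The "zero-cost passthrough" shape the guideline describes can be sketched with an associated `Error` type; all names here are illustrative, and a NATS-backed impl would set `Error` to the SDK's concrete error type and use `?` with no `map_err` at all.

```rust
use std::collections::HashMap;
use std::convert::Infallible;

// Illustrative trait: the associated Error lets each production impl surface
// its SDK's error type directly instead of stringifying into a shared enum.
trait ConfigStore {
    type Error: std::error::Error + Send + Sync + 'static;
    fn get_job(&self, id: &str) -> Result<Option<String>, Self::Error>;
}

// In-memory impl whose lookups cannot fail, so Error is Infallible; a
// JetStream KV impl would substitute the SDK's key-value error type here.
struct MemStore(HashMap<String, String>);

impl ConfigStore for MemStore {
    type Error = Infallible;
    fn get_job(&self, id: &str) -> Result<Option<String>, Self::Error> {
        Ok(self.0.get(id).cloned())
    }
}
```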
rsworkspace/crates/trogon-cron/src/executor.rs (3)

223-281: ⚠️ Potential issue | 🟠 Major

Don't erase retry failures to String before the terminal path.

last_error = e.to_string(), Err(format!(...)), and the String-based spawn/publish retry loops drop the original failure type long before logging or DLQ publication. Keep an internal typed error through the retry path and stringify only at the final log / dead-letter boundary.

Based on learnings: "Errors must be typed—use structs or enums, never String or format!()." and "Never discard error context by converting a typed error into a string; wrap the source error as a field or variant instead."

Also applies to: 306-444

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/executor.rs` around lines 223 - 281, The
retry loop currently converts errors to String early (last_error =
e.to_string()), losing typed error context; change last_error to hold the
original typed error (e.g., Option<anyhow::Error> or Option<Box<dyn
std::error::Error + Send + Sync>>) and assign Some(e.into()) when
publisher.publish_tick(subject.clone(), headers, payload.clone().into()).await
returns Err(e), keep using that typed error for all logic, and only call
.to_string() (or format!) when emitting the final tracing::error and when
calling publish_dead_letter(&publisher, &tick, "publish", actual_attempts,
last_error_string). Ensure references to publisher.publish_tick, last_error,
actual_attempts, and publish_dead_letter are updated so the retry warn logs use
the typed error (or its display) without dropping error type until the terminal
path.

279-281: ⚠️ Potential issue | 🟠 Major

Emit DLQ events for terminal failures even when retries are disabled.

Right now a first-attempt permanent failure disappears from cron.errors whenever retry: None or max_retries == 0. If dead-lettering is part of the scheduler contract, the DLQ publish should happen whenever execution ends in failure, not only when retries were configured.

Also applies to: 442-444

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/executor.rs` around lines 279 - 281, The
DLQ publish is gated by max_retries > 0 so terminal failures when retries are
disabled never get emitted; remove that condition and always call
publish_dead_letter(...) when execution ends in failure. Locate the two guarded
calls (the one using publish_dead_letter(&publisher, &tick, "publish",
actual_attempts, last_error).await and the duplicate at the later block) and
change the control flow so that whenever a terminal failure path is taken (use
the existing failure context variables: publisher, tick, actual_attempts,
last_error) you await publish_dead_letter(...) unconditionally instead of only
when max_retries > 0.

184-201: ⚠️ Potential issue | 🟠 Major

Claim is_running atomically for concurrent: false jobs.

Line 185 does a load(), and Line 300 later does a separate store(true). Two ticks can both observe false before either store happens, so the non-concurrent guard can still launch duplicate processes. This needs a single atomic claim, e.g. compare_exchange(false, true, ...), before spawn_process() is called.

🔧 Suggested change
-            let is_run = state.is_running.load(Ordering::SeqCst);
-            if !command.concurrent() && is_run {
+            if !command.concurrent()
+                && state
+                    .is_running
+                    .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
+                    .is_err()
+            {
                 tracing::debug!(job_id = %state.job.id(), "Skipping tick — previous invocation still running");
                 return;
             }
@@
-    is_running.store(true, Ordering::SeqCst);
-
     tokio::spawn(async move {
         let _reset = AtomicFlagReset::new(is_running);

Also applies to: 297-300

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/executor.rs` around lines 184 - 201, The
non-concurrent guard currently does a separate load() then later store(true),
which allows a race where two ticks both see false and both spawn; change the
logic in the RuntimeAction::Spawn branch to atomically claim state.is_running
using compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst) when
!command.concurrent(), and only call spawn_process() if the compare_exchange
succeeds (otherwise log/debug skip). Also remove or adapt the separate
state.is_running.store(true) in the spawn_process / completion path so that only
the atomic claim sets the flag (and ensure the flag is cleared on process end as
before).
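The atomic claim in the suggested diff boils down to a single `compare_exchange`: only the tick that flips the flag from `false` to `true` may spawn, and the flag is cleared when the process finishes. A minimal sketch (helper name hypothetical):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Atomically claim the "running" slot for a non-concurrent job. Exactly one
// caller observes false-and-sets-true; all others see the claim already taken.
fn try_claim(is_running: &AtomicBool) -> bool {
    is_running
        .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
        .is_ok()
}
```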
🧹 Nitpick comments (2)
rsworkspace/crates/trogon-cron/src/config.rs (1)

7-31: Make invalid config states unrepresentable in the public API.

These exported config types still model validated concepts as raw primitives (expr, subject, bin, id, interval_sec, timeout_sec), so malformed jobs deserialize successfully and only fail later in aggregate conversion. Prefer per-field value objects/newtypes with serde support here, so invalid cron expressions, subjects, paths, and positive-second values are rejected at construction time instead of after JobConfig already exists.

As per coding guidelines, "Prefer domain-specific value objects over primitives (e.g., AcpPrefix not String). Each type's factory must guarantee correctness at construction—invalid instances should be unrepresentable." and "Validate per-type, not per-aggregate: avoid validating unrelated fields together in a single constructor."

Also applies to: 79-90

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/config.rs` around lines 7 - 31, Replace
raw primitive fields (expr, subject, bin, interval_sec, timeout_sec) in the
public config enums/structs (Schedule, Action, and JobConfig) with
domain-specific value objects/newtypes that perform validation at construction
and implement Serde (e.g., CronExpr, NatsSubject, ExecutablePath,
PositiveSeconds); update Schedule::Interval to use PositiveSeconds.interval_sec
and Schedule::Cron to use CronExpr.expr, and Action::Publish/Spawn to use
NatsSubject.subject and ExecutablePath.bin, plus PositiveSeconds.timeout_sec;
ensure each newtype exposes a fallible constructor that enforces invariants and
Serde deserializes by invoking that constructor so malformed configs fail at
deserialization rather than later during JobConfig conversion.
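One of the suggested newtypes can be sketched as follows; `PositiveSeconds` is a hypothetical name from the comment, and the serde wiring (noted in the comment) is assumed rather than shown.

```rust
use std::convert::TryFrom;

// Hypothetical value object: a strictly positive seconds count that cannot be
// constructed from 0. With serde, `#[serde(try_from = "u64")]` on the struct
// routes deserialization through this same TryFrom impl, so a malformed
// config fails at deserialization time rather than during aggregate conversion.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PositiveSeconds(u64);

impl TryFrom<u64> for PositiveSeconds {
    type Error = &'static str;

    fn try_from(v: u64) -> Result<Self, Self::Error> {
        if v == 0 {
            Err("seconds must be greater than zero")
        } else {
            Ok(PositiveSeconds(v))
        }
    }
}
```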
rsworkspace/crates/trogon-cron/src/main.rs (1)

136-150: Consider using a typed error instead of format!()-based io::Error.

The crate already defines CronError variants. Using a typed error (e.g., CronError::NotFound { id }) instead of constructing an io::Error with format!() would align better with the crate's error handling patterns.

♻️ Suggested approach
 async fn cmd_get(client: CronClient, id: &str) -> Result<(), Box<dyn std::error::Error>> {
     match client.get_job(id).await? {
         Some(job) => println!("{}", serde_json::to_string_pretty(&job).unwrap()),
         None => {
-            return Err(std::io::Error::new(
-                std::io::ErrorKind::NotFound,
-                format!("job '{id}' not found"),
-            )
-            .into());
+            return Err(trogon_cron::CronError::NotFound { id: id.to_string() }.into());
         }
     }
 
     Ok(())
 }

This requires adding a NotFound variant to CronError if it doesn't already exist.

As per coding guidelines: "Errors must be typed—use structs or enums, never String or format!()."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/main.rs` around lines 136 - 150, Replace
the ad-hoc io::Error in cmd_get with the crate's typed CronError: update/ensure
CronError has a NotFound { id: String } (or similar) variant, then return
Err(CronError::NotFound { id: id.to_string() }.into()) when
client.get_job(id).await? yields None; modify the cmd_get function to construct
and return that CronError instead of using format!() so error handling is
consistent with the rest of the crate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rsworkspace/crates/trogon-cron/src/main.rs`:
- Around line 159-168: The current map_err closures for
std::fs::read_to_string(file) and serde_json::from_str(&json) throw away the
original error by formatting it to a string; instead, add a CLI-specific error
type (e.g., an enum like CliError with variants ReadFile { path: String, source:
std::io::Error } and InvalidJobConfig { source: serde_json::Error }) or switch
the function to return a boxed error (Box<dyn std::error::Error>) / use
anyhow/thiserror, and change the map_err calls to wrap the original source error
into those variants (preserving the source) rather than calling format!;
reference the read_to_string(file) map_err closure and the
serde_json::from_str(&json) -> JobConfig mapping to locate and update the
conversions.

---

Duplicate comments:
In `@rsworkspace/crates/trogon-cron/src/error.rs`:
- Around line 2-33: CronError currently stores String payloads (Kv(String),
Publish(String), InvalidJobConfig { reason: String }) which discards typed error
context; change these variants to hold concrete error types or boxed error trait
objects (e.g., Kv(Box<dyn std::error::Error + Send + Sync>), Publish(Box<dyn
std::error::Error + Send + Sync>), and InvalidJobConfig(ValidationError) where
ValidationError is a new struct/enum describing validation failure), update the
Display impl to format human-readable messages from those types, and extend
std::error::Error::source() to return the wrapped error for Kv, Publish and
InvalidJobConfig so callers can access the original error chain (leave Serde and
Io handling as-is).

In `@rsworkspace/crates/trogon-cron/src/executor.rs`:
- Around line 223-281: The retry loop currently converts errors to String early
(last_error = e.to_string()), losing typed error context; change last_error to
hold the original typed error (e.g., Option<anyhow::Error> or Option<Box<dyn
std::error::Error + Send + Sync>>) and assign Some(e.into()) when
publisher.publish_tick(subject.clone(), headers, payload.clone().into()).await
returns Err(e), keep using that typed error for all logic, and only call
.to_string() (or format!) when emitting the final tracing::error and when
calling publish_dead_letter(&publisher, &tick, "publish", actual_attempts,
last_error_string). Ensure references to publisher.publish_tick, last_error,
actual_attempts, and publish_dead_letter are updated so the retry warn logs use
the typed error (or its display) without dropping error type until the terminal
path.
- Around line 279-281: The DLQ publish is gated by max_retries > 0 so terminal
failures when retries are disabled never get emitted; remove that condition and
always call publish_dead_letter(...) when execution ends in failure. Locate the
two guarded calls (the one using publish_dead_letter(&publisher, &tick,
"publish", actual_attempts, last_error).await and the duplicate at the later
block) and change the control flow so that whenever a terminal failure path is
taken (use the existing failure context variables: publisher, tick,
actual_attempts, last_error) you await publish_dead_letter(...) unconditionally
instead of only when max_retries > 0.
- Around line 184-201: The non-concurrent guard currently does a separate load()
then later store(true), which allows a race where two ticks both see false and
both spawn; change the logic in the RuntimeAction::Spawn branch to atomically
claim state.is_running using compare_exchange(false, true, Ordering::SeqCst,
Ordering::SeqCst) when !command.concurrent(), and only call spawn_process() if
the compare_exchange succeeds (otherwise log/debug skip). Also remove or adapt
the separate state.is_running.store(true) in the spawn_process / completion path
so that only the atomic claim sets the flag (and ensure the flag is cleared on
process end as before).

In `@rsworkspace/crates/trogon-cron/src/nats_impls.rs`:
- Around line 233-258: The leader-lock methods (NatsLeaderLock::try_acquire,
::renew, ::release) currently map KV errors to CronError::Kv(e.to_string()),
discarding the original typed error and its source chain; change the
CronError::Kv variant to hold the original error type (or a boxed dyn Error) and
map errors directly (e.g., .map_err(|e| CronError::Kv(e)) or .map_err(|e|
CronError::Kv { source: e }) instead of e.to_string()), so the errors returned
from self.store.create/update/delete preserve their concrete type and source.
- Around line 194-215: The TickPublisher impl for jetstream::Context currently
collapses PublishOutcome variants into CronError::Publish(String); instead
preserve and wrap the original typed errors instead of converting to
strings—update CronError to include variants/fields for PublishFailed,
AckFailed, AckTimedOut (with duration), and StoreFailed (or a single Publish
variant that contains the original PublishOutcome::... error type), then change
the matching in publish_tick (which calls publish_event and uses
JetStreamContextPublisher and PublishOutcome) to return
Err(CronError::PublishFailed(e)) / Err(CronError::AckFailed(e)) /
Err(CronError::AckTimedOut(timeout)) / Err(CronError::StoreFailed(e)) (or the
equivalent wrapper) so the original error types are preserved and chainable
instead of using to_string()/format!.
- Around line 36-157: The code currently stringifies SDK errors via map_err(|e|
CronError::Kv(e.to_string())) in NatsConfigStore (see put_job, get_job,
delete_job, list_jobs, load_and_watch), which loses error types; change
CronError::Kv to hold the concrete SDK error type (e.g., kv::Error or the
appropriate JetStream error) or add a new variant wrapping Box<dyn
std::error::Error + Send + Sync>, implement From<kv::Error> for CronError, then
remove the map_err closures and use the ? operator directly (e.g., let kv =
self.js.get_key_value(CONFIG_BUCKET).await?; and similarly for
kv.put(...).await? etc.), so SDK errors are propagated without being
stringified. Ensure any places that previously used CronError::Kv(String) are
updated to handle the new variant.
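The source-preserving variant these comments call for can be sketched without extra crates (thiserror's `#[source]` attribute achieves the same wiring); `KvSdkError` below is a hypothetical stand-in for the SDK's typed KV error, not the real async-nats type:

```rust
use std::error::Error;
use std::fmt;

// Hypothetical stand-in for the SDK's typed KV error.
#[derive(Debug)]
struct KvSdkError(&'static str);

impl fmt::Display for KvSdkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "kv operation failed: {}", self.0)
    }
}
impl Error for KvSdkError {}

#[derive(Debug)]
enum CronError {
    // Holds the concrete SDK error instead of e.to_string().
    Kv(KvSdkError),
}

impl fmt::Display for CronError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CronError::Kv(_) => write!(f, "kv backend error"),
        }
    }
}

impl Error for CronError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            // The original typed error stays reachable on the chain.
            CronError::Kv(e) => Some(e),
        }
    }
}

fn try_acquire() -> Result<(), CronError> {
    // Simulate a failed self.store.create(...) call.
    Err(CronError::Kv(KvSdkError("key already exists")))
}

fn main() {
    let err = try_acquire().unwrap_err();
    let source = err.source().expect("source preserved");
    println!("{err}: {source}");
}
```

With the stringified version, `err.source()` would return `None` and callers could no longer match on the concrete failure.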

---

Nitpick comments:
In `@rsworkspace/crates/trogon-cron/src/config.rs`:
- Around line 7-31: Replace raw primitive fields (expr, subject, bin,
interval_sec, timeout_sec) in the public config enums/structs (Schedule, Action,
and JobConfig) with domain-specific value objects/newtypes that perform
validation at construction and implement Serde (e.g., CronExpr, NatsSubject,
ExecutablePath, PositiveSeconds); update Schedule::Interval to use
PositiveSeconds.interval_sec and Schedule::Cron to use CronExpr.expr, and
Action::Publish/Spawn to use NatsSubject.subject and ExecutablePath.bin, plus
PositiveSeconds.timeout_sec; ensure each newtype exposes a fallible constructor
that enforces invariants and Serde deserializes by invoking that constructor so
malformed configs fail at deserialization rather than later during JobConfig
conversion.
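A minimal sketch of one such newtype, assuming the suggested `PositiveSeconds` shape; the `TryFrom` impl is what a `#[serde(try_from = "u64")]` attribute would route deserialization through, so malformed configs fail at parse time:

```rust
// Value object: a zero interval is unrepresentable after construction.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PositiveSeconds(u64);

impl PositiveSeconds {
    // Fallible constructor enforces the invariant in exactly one place.
    fn new(secs: u64) -> Result<Self, String> {
        if secs == 0 {
            Err("seconds must be greater than zero".to_string())
        } else {
            Ok(Self(secs))
        }
    }

    fn get(self) -> u64 {
        self.0
    }
}

// Serde's `try_from` container attribute would call this during
// deserialization, rejecting `interval_sec: 0` up front.
impl TryFrom<u64> for PositiveSeconds {
    type Error = String;
    fn try_from(v: u64) -> Result<Self, Self::Error> {
        Self::new(v)
    }
}

fn main() {
    assert!(PositiveSeconds::new(0).is_err());
    let interval = PositiveSeconds::new(30).unwrap();
    println!("interval = {}s", interval.get());
}
```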

In `@rsworkspace/crates/trogon-cron/src/main.rs`:
- Around line 136-150: Replace the ad-hoc io::Error in cmd_get with the crate's
typed CronError: update/ensure CronError has a NotFound { id: String } (or
similar) variant, then return Err(CronError::NotFound { id: id.to_string()
}.into()) when client.get_job(id).await? yields None; modify the cmd_get
function to construct and return that CronError instead of using format!() so
error handling is consistent with the rest of the crate.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 855f74ed-89ed-4cbb-b212-99cf8e932726

📥 Commits

Reviewing files that changed from the base of the PR and between df02750 and d6d34ed.

⛔ Files ignored due to path filters (1)
  • rsworkspace/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (20)
  • devops/docker/compose/compose.yml
  • devops/docker/compose/services/trogon-cron/Dockerfile
  • rsworkspace/crates/acp-telemetry/src/service_name.rs
  • rsworkspace/crates/trogon-cron/Cargo.toml
  • rsworkspace/crates/trogon-cron/src/cli.rs
  • rsworkspace/crates/trogon-cron/src/client.rs
  • rsworkspace/crates/trogon-cron/src/config.rs
  • rsworkspace/crates/trogon-cron/src/domain.rs
  • rsworkspace/crates/trogon-cron/src/error.rs
  • rsworkspace/crates/trogon-cron/src/executor.rs
  • rsworkspace/crates/trogon-cron/src/kv.rs
  • rsworkspace/crates/trogon-cron/src/leader.rs
  • rsworkspace/crates/trogon-cron/src/lib.rs
  • rsworkspace/crates/trogon-cron/src/main.rs
  • rsworkspace/crates/trogon-cron/src/mocks.rs
  • rsworkspace/crates/trogon-cron/src/nats_impls.rs
  • rsworkspace/crates/trogon-cron/src/scheduler.rs
  • rsworkspace/crates/trogon-cron/src/traits.rs
  • rsworkspace/crates/trogon-cron/tests/cron_unit.rs
  • rsworkspace/crates/trogon-cron/tests/integration.rs
✅ Files skipped from review due to trivial changes (3)
  • devops/docker/compose/services/trogon-cron/Dockerfile
  • rsworkspace/crates/trogon-cron/Cargo.toml
  • rsworkspace/crates/trogon-cron/src/leader.rs
🚧 Files skipped from review as they are similar to previous changes (4)
  • rsworkspace/crates/trogon-cron/tests/cron_unit.rs
  • rsworkspace/crates/trogon-cron/src/client.rs
  • rsworkspace/crates/trogon-cron/src/mocks.rs
  • rsworkspace/crates/trogon-cron/src/traits.rs

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 3

♻️ Duplicate comments (3)
rsworkspace/crates/trogon-cron/src/domain.rs (1)

296-304: ⚠️ Potential issue | 🟠 Major

Reject malformed NATS subjects in PublishSubject.

This factory still accepts invalid subjects as long as they start with cron.. Inputs like cron., cron..backup, cron.*, or cron.bad subject become valid domain objects here and only fail later when the scheduler tries to publish. Tighten this value object so construction guarantees a real NATS subject, not just the prefix. As per coding guidelines: "Prefer domain-specific value objects over primitives (e.g., AcpPrefix not String). Each type's factory must guarantee correctness at construction—invalid instances should be unrepresentable."

#!/bin/bash
# Inspect the current PublishSubject factory and the existing NATS token helpers.
sed -n '296,309p' rsworkspace/crates/trogon-cron/src/domain.rs
fd -i 'token.rs' rsworkspace/crates/trogon-nats -x sed -n '1,220p' {}
rg -n 'PublishSubject|validate|subject' rsworkspace/crates/trogon-cron/src rsworkspace/crates/trogon-nats/src
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/domain.rs` around lines 296 - 304, The
PublishSubject::new constructor currently only checks the "cron." prefix;
tighten it to reject empty or malformed NATS subjects by validating the
remainder after "cron.": ensure there's at least one non-empty segment after the
prefix, disallow consecutive dots or trailing dots (reject "cron.",
"cron..backup"), forbid wildcard tokens like '*' or '>' and any whitespace, and
enforce the same token rules used by the NATS helpers in the trogon-nats crate
(see token validation helpers in token.rs) so subject components are valid
identifiers; if validation fails, return
Err(CronError::invalid_job_config(JobConfigError::PublishSubjectPrefix { subject
})) as before.
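The tightened check can be sketched as a plain predicate; the token rules below (non-empty segments, no exact `*` or `>` tokens, no whitespace) are assumptions modeled on standard NATS subject syntax, not the trogon-nats helpers themselves:

```rust
// Sketch: validate the remainder after the required "cron." prefix.
fn is_valid_publish_subject(subject: &str) -> bool {
    let Some(rest) = subject.strip_prefix("cron.") else {
        return false; // missing prefix
    };
    !rest.is_empty()
        && rest.split('.').all(|token| {
            !token.is_empty()                       // rejects "cron." and "cron..backup"
                && token != "*"                     // rejects single-token wildcard
                && token != ">"                     // rejects full wildcard
                && !token.contains(char::is_whitespace) // rejects "cron.bad subject"
        })
}

fn main() {
    assert!(is_valid_publish_subject("cron.backup.daily"));
    for bad in ["cron.", "cron..backup", "cron.*", "cron.bad subject", "other.x"] {
        assert!(!is_valid_publish_subject(bad), "accepted {bad:?}");
    }
    println!("subject validation ok");
}
```

The real constructor would wrap a `false` result in the existing `CronError::invalid_job_config(...)` path rather than returning a bool.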
rsworkspace/crates/trogon-cron/src/nats_impls.rs (1)

35-40: 🛠️ Refactor suggestion | 🟠 Major

Keep the production trait impls as SDK passthroughs.

NatsConfigStore and NatsLeaderLock still do error adaptation inside the trait impls via CronError::kv_source(...)/map_err(...). That preserves context, but it still violates the repo rule that production infrastructure trait bodies should be thin passthroughs to the SDK. Consider moving this translation above the trait boundary or reshaping the traits so the impls can expose the SDK error/return types directly. As per coding guidelines: "Production implementations of infrastructure traits must be zero-cost passthroughs to the underlying SDK. No error wrapping (use the SDK's error types directly via associated type Error), no return type conversion (add associated types like type Info to match the SDK's return), no map_err, no map(|_| ())."

#!/bin/bash
# Inspect the trait signatures and the current production impl bodies.
sed -n '1,240p' rsworkspace/crates/trogon-cron/src/traits.rs
sed -n '1,260p' rsworkspace/crates/trogon-cron/src/nats_impls.rs

Also applies to: 44-99, 101-148, 228-253

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/nats_impls.rs` around lines 35 - 40, The
production impls (e.g., NatsConfigStore::config_bucket and NatsLeaderLock impls)
are wrapping SDK errors with CronError via map_err/CronError::kv_source inside
trait bodies; change the trait signatures to expose the SDK types/errors
directly (add associated types like type Error and type Info to the trait in
traits.rs) and make the impls zero-cost passthroughs that return the SDK's
Result/values without map_err or wrapping; move the CronError translation to the
call sites or adapter layer above the trait boundary so functions like
config_bucket simply call self.js.get_key_value(CONFIG_BUCKET).await and return
the SDK result as-is.
rsworkspace/crates/trogon-cron/src/kv.rs (1)

36-48: ⚠️ Potential issue | 🟠 Major

Don't treat any same-name JetStream resource as compatible.

After an "already exists" create failure, these helpers only prove that a bucket/stream with the requested name exists. They never verify that the existing resource still has the requested history, max_age, or subjects, so they can silently accept a leader bucket with the wrong TTL or tick/DLQ streams wired to the wrong subject set. Fetch the existing config and fail if the important fields differ.

#!/bin/bash
# Show the desired configs and the current already-exists fallback paths.
sed -n '24,127p' rsworkspace/crates/trogon-cron/src/kv.rs

Also applies to: 56-106, 108-127

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rsworkspace/crates/trogon-cron/src/kv.rs` around lines 36 - 48, The current
get_or_create_leader_bucket helper (and the other helpers using get_or_create
with kv::Config like the blocks at 56-106 and 108-127) only treats an "already
exists" result as success without validating the existing resource; update the
logic so that after a create failure due to already existing you fetch the
existing bucket/stream config from JetStream (via js) and compare the important
fields (for kv::Config: history and max_age; for streams also verify subjects
and retention/replicas/etc. as appropriate) against the requested
kv::Config/stream config used in get_or_create; if any critical field mismatches
(e.g., history != 1, max_age != Duration::from_secs(10), wrong subjects) return
a CronError indicating a config mismatch instead of silently accepting the
existing resource. Ensure this validation is applied in
get_or_create_leader_bucket and the similar helper blocks referenced so you fail
fast when an incompatible same-name resource exists.
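The mismatch check reduces to comparing the critical fields of the desired and live configs; `BucketSpec` below is a hypothetical stand-in for the relevant subset of `kv::Config`, since the real comparison would happen against whatever JetStream returns for the existing bucket:

```rust
use std::time::Duration;

// Hypothetical subset of kv::Config worth validating.
#[derive(Debug, PartialEq)]
struct BucketSpec {
    history: i64,
    max_age: Duration,
}

// After an "already exists" create failure, fetch the live config and
// fail fast on any critical-field mismatch instead of assuming compatibility.
fn check_existing(desired: &BucketSpec, existing: &BucketSpec) -> Result<(), String> {
    if existing != desired {
        return Err(format!(
            "bucket exists with incompatible config: wanted {desired:?}, found {existing:?}"
        ));
    }
    Ok(())
}

fn main() {
    let desired = BucketSpec { history: 1, max_age: Duration::from_secs(10) };
    let stale = BucketSpec { history: 5, max_age: Duration::from_secs(60) };
    assert!(check_existing(&desired, &stale).is_err());
    assert!(check_existing(&desired, &desired).is_ok());
    println!("config-mismatch check ok");
}
```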
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rsworkspace/crates/trogon-cron/src/executor.rs`:
- Around line 63-68: The multiplication in retry_backoff(base_sec, attempt) can
overflow; replace base_sec * multiplier with a saturation or capped value to
avoid panics/wraps — e.g., compute let secs =
base_sec.saturating_mul(multiplier) (or use checked_mul and clamp to a defined
MAX_RETRY_BACKOFF_SECS constant) and then return Duration::from_secs(secs);
update retry_backoff to use base_sec.saturating_mul(multiplier) or checked_mul +
min(secs, MAX) to keep the retry delay well-defined.
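A saturating version can look like the following; the doubling multiplier and the one-hour ceiling are assumptions for illustration, since the original `retry_backoff` body is not shown here:

```rust
use std::time::Duration;

// Assumed cap; pick whatever ceiling the scheduler should honor.
const MAX_RETRY_BACKOFF_SECS: u64 = 3600;

// Overflow-safe exponential backoff: saturate both the exponent and the
// multiplication, then clamp, so large attempt counts never panic or wrap.
fn retry_backoff(base_sec: u64, attempt: u32) -> Duration {
    let multiplier = 2u64.saturating_pow(attempt);
    let secs = base_sec
        .saturating_mul(multiplier)
        .min(MAX_RETRY_BACKOFF_SECS);
    Duration::from_secs(secs)
}

fn main() {
    assert_eq!(retry_backoff(5, 0), Duration::from_secs(5));
    assert_eq!(retry_backoff(5, 3), Duration::from_secs(40));
    // A huge attempt count no longer overflows; it clamps to the cap.
    assert_eq!(retry_backoff(5, 200), Duration::from_secs(MAX_RETRY_BACKOFF_SECS));
    println!("backoff ok");
}
```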

In `@rsworkspace/crates/trogon-cron/src/kv.rs`:
- Around line 163-199: The current 500ms deadline can truncate the watcher
initial-snapshot; change the logic in the loop that drains the initial snapshot
(the tokio::select! using deadline, tokio::pin!(deadline), and the _ = &mut
deadline branch) to instead rely on the watcher entries' e.delta == 0 as the
end-of-snapshot marker: keep consuming watcher.next() until you see Some(Ok(e))
with e.delta == 0 (or None/Err handled as now), deserialize JobConfig on
kv::Operation::Put, and only treat an empty bucket specially (e.g., if watcher
yields None immediately) or load a point-in-time snapshot before attaching the
watch; if you must keep a timeout for defense, make it much larger and clearly
document it.

In `@rsworkspace/crates/trogon-cron/src/nats_impls.rs`:
- Around line 79-99: list_jobs currently swallows deserialization failures and
omits corrupt entries; change the logic in list_jobs (and where it calls
config_bucket(), kv.keys(), kv.get(), and serde_json::from_slice::<JobConfig>)
to fail fast or surface the offending key instead of silently skipping: after
obtaining the value from kv.get(&key).await, attempt serde_json::from_slice and
map any Err into a CronError that includes the key (e.g. via
CronError::kv_source or a new CronError variant) so the function returns an
error containing the key and underlying serde error rather than dropping the
entry; ensure any early returns use the existing CronError type so callers see
corrupted configs.
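The key-attaching pattern is small; in this sketch `parse_job` stands in for `serde_json::from_slice::<JobConfig>` (parsing an integer instead, to stay dependency-free), and the error type is a plain `String` rather than `CronError`:

```rust
// Stand-in for serde_json::from_slice::<JobConfig>.
fn parse_job(bytes: &[u8]) -> Result<u32, String> {
    std::str::from_utf8(bytes)
        .map_err(|e| e.to_string())?
        .trim()
        .parse::<u32>()
        .map_err(|e| e.to_string())
}

fn load_job(key: &str, bytes: &[u8]) -> Result<u32, String> {
    // Surface *which* entry is corrupt instead of silently dropping it,
    // so a bad value cannot masquerade as a shorter job list.
    parse_job(bytes).map_err(|e| format!("corrupt job config at key {key:?}: {e}"))
}

fn main() {
    assert_eq!(load_job("jobs.backup", b"42"), Ok(42));
    let err = load_job("jobs.broken", b"not a number").unwrap_err();
    assert!(err.contains("jobs.broken"));
    println!("{err}");
}
```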

---

Duplicate comments:
In `@rsworkspace/crates/trogon-cron/src/domain.rs`:
- Around line 296-304: The PublishSubject::new constructor currently only checks
the "cron." prefix; tighten it to reject empty or malformed NATS subjects by
validating the remainder after "cron.": ensure there's at least one non-empty
segment after the prefix, disallow consecutive dots or trailing dots (reject
"cron.", "cron..backup"), forbid wildcard tokens like '*' or '>' and any
whitespace, and enforce the same token rules used by the NATS helpers in the
trogon-nats crate (see token validation helpers in token.rs) so subject
components are valid identifiers; if validation fails, return
Err(CronError::invalid_job_config(JobConfigError::PublishSubjectPrefix { subject
})) as before.

In `@rsworkspace/crates/trogon-cron/src/kv.rs`:
- Around line 36-48: The current get_or_create_leader_bucket helper (and the
other helpers using get_or_create with kv::Config like the blocks at 56-106 and
108-127) only treats an "already exists" result as success without validating
the existing resource; update the logic so that after a create failure due to
already existing you fetch the existing bucket/stream config from JetStream (via
js) and compare the important fields (for kv::Config: history and max_age; for
streams also verify subjects and retention/replicas/etc. as appropriate) against
the requested kv::Config/stream config used in get_or_create; if any critical
field mismatches (e.g., history != 1, max_age != Duration::from_secs(10), wrong
subjects) return a CronError indicating a config mismatch instead of silently
accepting the existing resource. Ensure this validation is applied in
get_or_create_leader_bucket and the similar helper blocks referenced so you fail
fast when an incompatible same-name resource exists.

In `@rsworkspace/crates/trogon-cron/src/nats_impls.rs`:
- Around line 35-40: The production impls (e.g., NatsConfigStore::config_bucket
and NatsLeaderLock impls) are wrapping SDK errors with CronError via
map_err/CronError::kv_source inside trait bodies; change the trait signatures to
expose the SDK types/errors directly (add associated types like type Error and
type Info to the trait in traits.rs) and make the impls zero-cost passthroughs
that return the SDK's Result/values without map_err or wrapping; move the
CronError translation to the call sites or adapter layer above the trait
boundary so functions like config_bucket simply call
self.js.get_key_value(CONFIG_BUCKET).await and return the SDK result as-is.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c4dd3c74-8f61-46c5-aef2-1ac766fb5f6b

📥 Commits

Reviewing files that changed from the base of the PR and between 04091e9 and a7d0da9.

📒 Files selected for processing (11)
  • rsworkspace/crates/trogon-cron/src/client.rs
  • rsworkspace/crates/trogon-cron/src/domain.rs
  • rsworkspace/crates/trogon-cron/src/error.rs
  • rsworkspace/crates/trogon-cron/src/executor.rs
  • rsworkspace/crates/trogon-cron/src/kv.rs
  • rsworkspace/crates/trogon-cron/src/lib.rs
  • rsworkspace/crates/trogon-cron/src/main.rs
  • rsworkspace/crates/trogon-cron/src/nats_impls.rs
  • rsworkspace/crates/trogon-cron/src/scheduler.rs
  • rsworkspace/crates/trogon-cron/tests/cron_unit.rs
  • rsworkspace/crates/trogon-cron/tests/integration.rs
✅ Files skipped from review due to trivial changes (2)
  • rsworkspace/crates/trogon-cron/src/lib.rs
  • rsworkspace/crates/trogon-cron/src/client.rs

Signed-off-by: Yordis Prieto <yordis.prieto@gmail.com>
}));
}
Ok(Some(timezone))
}
Timezone validation accepts arbitrary non-existent timezone strings

Low Severity

The validate_timezone function only checks for whitespace, control characters, and leading/trailing spaces. Any syntactically clean but semantically invalid timezone string (e.g., "Mars/Base", "Foo/Bar") passes validation. This defers failure to the NATS server at schedule publish time, making it harder to diagnose. Since the cron crate and chrono-tz are already available, the timezone could be validated against the IANA database at spec validation time.


Reviewed by Cursor Bugbot for commit acb4f3e. Configure here.

yordis added 2 commits April 12, 2026 19:43
services:
  nats:
-   image: nats:2.11-alpine
+   image: synadia/nats-server:nightly

NATS healthcheck tool may be missing in new image

High Severity

The NATS image changed from nats:2.11-alpine (which bundles busybox wget) to synadia/nats-server:nightly (a minimal Alpine image that may not include wget). The healthcheck still uses wget to probe the monitoring endpoint. If wget is unavailable, the healthcheck will permanently fail, and every service with depends_on: nats: condition: service_healthy — including trogon-gateway and trogon-cron — will never start.

Additional Locations (1)

Reviewed by Cursor Bugbot for commit 3f52c9c. Configure here.

yordis added 2 commits April 12, 2026 21:29

[dev-dependencies]
tempfile = { workspace = true }
time = "0.3"

Unused time dev-dependency in trogon-cron Cargo.toml

Low Severity

The time = "0.3" crate is listed as a dev-dependency but does not appear to be used in any test files for trogon-cron. The timestamp conversions in the crate use chrono, and while async_nats uses time internally for StreamMessage.time, the time crate is not directly referenced in test code. This is dead dependency clutter.


Reviewed by Cursor Bugbot for commit e940e83. Configure here.

yordis added 2 commits April 13, 2026 13:06
yordis added 7 commits April 13, 2026 15:48
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).


Reviewed by Cursor Bugbot for commit 0010fe4. Configure here.

}

if let Ok(source_stream) = self.js.stream_by_subject(source_subject).await
&& source_stream != SCHEDULES_STREAM

Errors from stream_by_subject are silently swallowed before the stream-name comparison

Low Severity

The ensure_source_subject method calls self.js.stream_by_subject(source_subject) which according to NATS JetStream API returns the stream name as a String. The comparison source_stream != SCHEDULES_STREAM compares this with the constant "CRON_SCHEDULES". However, if this API call fails for a reason other than "no stream found" (e.g., network error), the if let Ok(...) silently swallows the error and proceeds to add the subject, which could lead to subject conflicts if the subject belongs to a stream that was temporarily unreachable.


Reviewed by Cursor Bugbot for commit 0010fe4. Configure here.


Labels

rust:coverage-baseline-reset Relax Rust coverage gate to establish a new baseline
