feat: coordination mode aware crank#1232
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughTaskSchedulerService::start now waits for coordination to leave StartingUp and uses is_primary_mode() to decide behavior: in Primary it loads persisted tasks and spawns the scheduler run loop; in non-primary modes it spawns an immediately-resolving task (standby). load_persisted_tasks returns an explicit Ok(()). Tests were updated to run serially and to switch coordination mode; a new test asserts start() completes immediately when in replica mode. A dev-dependency for serial_test was added. Assessment against linked issues
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@magicblock-task-scheduler/src/service.rs`:
- Around line 735-749: The test
test_task_scheduler_does_not_start_on_standby_mode calls
switch_to_replica_mode() and never restores coordination state; capture the
current coordination mode before calling switch_to_replica_mode() (e.g., let
prev = get_coordination_mode()), then after the test body restore it (e.g.,
set_coordination_mode(prev)) in a finally/teardown block (or use a RAII guard)
so switch_to_replica_mode()’s mutation does not leak to other tests; ensure you
reference and use the existing helpers (switch_to_replica_mode,
get_coordination_mode, set_coordination_mode or equivalent) when implementing
the restore.
- Around line 501-507: The is_primary_mode loop can hang forever; wrap the
waiting loop in a bounded timeout (e.g. using tokio::time::timeout with a
reasonable Duration like 10s) and return false if the timeout elapses, and also
make the inner wait cancellation-aware by using tokio::select! to await either
the sleep or an existing shutdown/cancellation token (or future) if one exists;
modify async fn is_primary_mode(&self) to use
tokio::time::timeout(Duration::from_secs(...), async { loop { ... } }).await
handling the Err(timeout) case and/or the cancellation branch so startup cannot
deadlock while polling CoordinationMode::current.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ca1d00d1-f89c-463a-a54d-3c0e34e1e4bd
📒 Files selected for processing (1)
magicblock-task-scheduler/src/service.rs
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
magicblock-task-scheduler/src/service.rs (1)
762-775:⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy liftFix
is_primary_modeplacement and swap the misleading test names
- In
magicblock-task-scheduler/src/service.rs,async fn is_primary_mode(&self) -> boolis currently declared insidefn next_execution_millis(...)(nested item), so it cannot be an associatedTaskSchedulerServicemethod called viaself.is_primary_mode(). Moveis_primary_modeintoimpl TaskSchedulerService(or a separateimplblock) and closenext_execution_millisafter returning itsi64.- The test names are swapped around the current
test_stale_crank_completion_*/test_failed_records_*definitions: the first test is exercising periodic cleanup, and the second test is exercising “stale crank completion does not mutate replacement”.♻️ Suggested layout
impl TaskSchedulerService { // ... existing methods ... /// Waits until the coordination mode is not StartingUp. /// Should be fast because task scheduler is started after the ledger replay completes. async fn is_primary_mode(&self) -> bool { let mut mode = CoordinationMode::current(); while mode == CoordinationMode::StartingUp { tokio::select! { _ = self.token.cancelled() => return false, _ = tokio::time::sleep(Duration::from_millis(100)) => {} } mode = CoordinationMode::current(); } mode == CoordinationMode::Primary } } fn next_execution_millis( last_execution_millis: i64, execution_interval_millis: i64, now: i64, ) -> i64 { if last_execution_millis == 0 { now } else { last_execution_millis + execution_interval_millis } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@magicblock-task-scheduler/src/service.rs` around lines 762 - 775, The is_primary_mode async fn is currently nested inside next_execution_millis which prevents calling self.is_primary_mode(); move the async fn is_primary_mode(&self) into the impl TaskSchedulerService (or its own impl) so it becomes an associated method, and ensure next_execution_millis(...) is a standalone free function that returns its i64 before the impl block closes; also rename/swap the two test identifiers so the test exercising periodic cleanup is named test_stale_crank_completion_* and the one exercising “stale crank completion does not mutate replacement” is named test_failed_records_* to match their behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@magicblock-task-scheduler/src/service.rs`:
- Line 1004: The two test functions' identifiers are swapped: the function
currently named test_stale_crank_completion_does_not_mutate_replaced_task
actually tests periodic cleanup, and the function named
test_failed_records_are_cleaned_up_periodically actually tests stale-crank
handling. Rename the functions so the names match their behavior — change the
function currently declared as
test_stale_crank_completion_does_not_mutate_replaced_task to
test_failed_records_are_cleaned_up_periodically, and change the one declared as
test_failed_records_are_cleaned_up_periodically to
test_stale_crank_completion_does_not_mutate_replaced_task — keeping their
bodies, assertions, and any attributes unchanged (refer to the two test fn
symbols to locate and swap the identifiers).
---
Outside diff comments:
In `@magicblock-task-scheduler/src/service.rs`:
- Around line 762-775: The is_primary_mode async fn is currently nested inside
next_execution_millis which prevents calling self.is_primary_mode(); move the
async fn is_primary_mode(&self) into the impl TaskSchedulerService (or its own
impl) so it becomes an associated method, and ensure next_execution_millis(...)
is a standalone free function that returns its i64 before the impl block closes;
also rename/swap the two test identifiers so the test exercising periodic
cleanup is named test_stale_crank_completion_* and the one exercising “stale
crank completion does not mutate replacement” is named test_failed_records_* to
match their behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7b2b401f-742a-4009-8732-f044b43a254a
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (1)
magicblock-task-scheduler/src/service.rs
| pub async fn start( | ||
| mut self, | ||
| ) -> TaskSchedulerResult<JoinHandle<TaskSchedulerResult<()>>> { | ||
| if self.is_primary_mode().await { |
There was a problem hiding this comment.
I'd suggest doing it on calling site level(magic_validator.rs). It seems like task scheduler wether works or not, so from point crank service it shouldn't be aware of mode at the moment.
Closes #1231
Summary
Disables the crank service when the node starts in replica mode
Breaking Changes
Summary by CodeRabbit
New Features
Tests
Chores