Skip to content

feat: coordination mode aware crank#1232

Open
Dodecahedr0x wants to merge 10 commits into
masterfrom
dode/mode-aware-crank
Open

feat: coordination mode aware crank#1232
Dodecahedr0x wants to merge 10 commits into
masterfrom
dode/mode-aware-crank

Conversation

@Dodecahedr0x
Copy link
Copy Markdown
Contributor

@Dodecahedr0x Dodecahedr0x commented May 20, 2026

Closes #1231

Summary

Disables the crank service when the node starts in replica mode

Breaking Changes

  • None

Summary by CodeRabbit

  • New Features

    • Scheduler now waits for coordination startup and only begins its scheduling loop when in primary mode; in standby/replica mode startup returns immediately.
  • Tests

    • Added and updated tests to validate startup behavior across coordination states, including a test ensuring standby mode returns immediately and serialized startup tests.
  • Chores

    • Added a dev dependency to enable serialized test execution.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

Warning

Rate limit exceeded

@Dodecahedr0x has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 37 minutes and 58 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: de325a2b-452c-40c7-9fd1-5115e970e987

📥 Commits

Reviewing files that changed from the base of the PR and between 9e3855e and b13a200.

📒 Files selected for processing (1)
  • magicblock-task-scheduler/src/service.rs
📝 Walkthrough

Walkthrough

TaskSchedulerService::start now waits for coordination to leave StartingUp and uses is_primary_mode() to decide behavior: in Primary it loads persisted tasks and spawns the scheduler run loop; in non-primary modes it spawns an immediately-resolving task (standby). load_persisted_tasks returns an explicit Ok(()). Tests were updated to run serially and to switch coordination mode; a new test asserts start() completes immediately when in replica mode. A dev-dependency for serial_test was added.

Assessment against linked issues

Objective Addressed Explanation
Disable the task scheduler service for replicas [#1231]

Suggested reviewers

  • GabrielePicco
  • bmuddha
  • snawaz
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dode/mode-aware-crank

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@magicblock-task-scheduler/src/service.rs`:
- Around line 735-749: The test
test_task_scheduler_does_not_start_on_standby_mode calls
switch_to_replica_mode() and never restores coordination state; capture the
current coordination mode before calling switch_to_replica_mode() (e.g., let
prev = get_coordination_mode()), then after the test body restore it (e.g.,
set_coordination_mode(prev)) in a finally/teardown block (or use a RAII guard)
so switch_to_replica_mode()’s mutation does not leak to other tests; ensure you
reference and use the existing helpers (switch_to_replica_mode,
get_coordination_mode, set_coordination_mode or equivalent) when implementing
the restore.
- Around line 501-507: The is_primary_mode loop can hang forever; wrap the
waiting loop in a bounded timeout (e.g. using tokio::time::timeout with a
reasonable Duration like 10s) and return false if the timeout elapses, and also
make the inner wait cancellation-aware by using tokio::select! to await either
the sleep or an existing shutdown/cancellation token (or future) if one exists;
modify async fn is_primary_mode(&self) to use
tokio::time::timeout(Duration::from_secs(...), async { loop { ... } }).await
handling the Err(timeout) case and/or the cancellation branch so startup cannot
deadlock while polling CoordinationMode::current.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ca1d00d1-f89c-463a-a54d-3c0e34e1e4bd

📥 Commits

Reviewing files that changed from the base of the PR and between ac36179 and cb607bd.

📒 Files selected for processing (1)
  • magicblock-task-scheduler/src/service.rs

Comment thread magicblock-task-scheduler/src/service.rs Outdated
Comment thread magicblock-task-scheduler/src/service.rs Outdated
@Dodecahedr0x Dodecahedr0x marked this pull request as ready for review May 21, 2026 10:06
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
magicblock-task-scheduler/src/service.rs (1)

762-775: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Fix is_primary_mode placement and swap the misleading test names

  • In magicblock-task-scheduler/src/service.rs, async fn is_primary_mode(&self) -> bool is currently declared inside fn next_execution_millis(...) (nested item), so it cannot be an associated TaskSchedulerService method called via self.is_primary_mode(). Move is_primary_mode into impl TaskSchedulerService (or a separate impl block) and close next_execution_millis after returning its i64.
  • The test names are swapped around the current test_stale_crank_completion_* / test_failed_records_* definitions: the first test is exercising periodic cleanup, and the second test is exercising “stale crank completion does not mutate replacement”.
♻️ Suggested layout
impl TaskSchedulerService {
    // ... existing methods ...

    /// Waits until the coordination mode is not StartingUp.
    /// Should be fast because task scheduler is started after the ledger replay completes.
    async fn is_primary_mode(&self) -> bool {
        let mut mode = CoordinationMode::current();
        while mode == CoordinationMode::StartingUp {
            tokio::select! {
                _ = self.token.cancelled() => return false,
                _ = tokio::time::sleep(Duration::from_millis(100)) => {}
            }
            mode = CoordinationMode::current();
        }
        mode == CoordinationMode::Primary
    }
}

fn next_execution_millis(
    last_execution_millis: i64,
    execution_interval_millis: i64,
    now: i64,
) -> i64 {
    if last_execution_millis == 0 {
        now
    } else {
        last_execution_millis + execution_interval_millis
    }
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@magicblock-task-scheduler/src/service.rs` around lines 762 - 775, The
is_primary_mode async fn is currently nested inside next_execution_millis which
prevents calling self.is_primary_mode(); move the async fn
is_primary_mode(&self) into the impl TaskSchedulerService (or its own impl) so
it becomes an associated method, and ensure next_execution_millis(...) is a
standalone free function that returns its i64 before the impl block closes; also
rename/swap the two test identifiers so the test exercising periodic cleanup is
named test_stale_crank_completion_* and the one exercising “stale crank
completion does not mutate replacement” is named test_failed_records_* to match
their behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@magicblock-task-scheduler/src/service.rs`:
- Line 1004: The two test functions' identifiers are swapped: the function
currently named test_stale_crank_completion_does_not_mutate_replaced_task
actually tests periodic cleanup, and the function named
test_failed_records_are_cleaned_up_periodically actually tests stale-crank
handling. Rename the functions so the names match their behavior — change the
function currently declared as
test_stale_crank_completion_does_not_mutate_replaced_task to
test_failed_records_are_cleaned_up_periodically, and change the one declared as
test_failed_records_are_cleaned_up_periodically to
test_stale_crank_completion_does_not_mutate_replaced_task — keeping their
bodies, assertions, and any attributes unchanged (refer to the two test fn
symbols to locate and swap the identifiers).

---

Outside diff comments:
In `@magicblock-task-scheduler/src/service.rs`:
- Around line 762-775: The is_primary_mode async fn is currently nested inside
next_execution_millis which prevents calling self.is_primary_mode(); move the
async fn is_primary_mode(&self) into the impl TaskSchedulerService (or its own
impl) so it becomes an associated method, and ensure next_execution_millis(...)
is a standalone free function that returns its i64 before the impl block closes;
also rename/swap the two test identifiers so the test exercising periodic
cleanup is named test_stale_crank_completion_* and the one exercising “stale
crank completion does not mutate replacement” is named test_failed_records_* to
match their behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7b2b401f-742a-4009-8732-f044b43a254a

📥 Commits

Reviewing files that changed from the base of the PR and between bc9162e and 9e3855e.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • magicblock-task-scheduler/src/service.rs

Comment thread magicblock-task-scheduler/src/service.rs
Copy link
Copy Markdown
Contributor

@taco-paco taco-paco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment on design, but never the less this should work as well

pub async fn start(
mut self,
) -> TaskSchedulerResult<JoinHandle<TaskSchedulerResult<()>>> {
if self.is_primary_mode().await {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest doing it on calling site level(magic_validator.rs). It seems like task scheduler wether works or not, so from point crank service it shouldn't be aware of mode at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: crank is not aware of coordination mode

2 participants