Skip to content

fix: various correctness related enhancements #1248

Merged
lucacillario merged 7 commits into
masterfrom
bmuddha/fixes/correctness
May 27, 2026
Merged

fix: various correctness related enhancements #1248
lucacillario merged 7 commits into
masterfrom
bmuddha/fixes/correctness

Conversation

@bmuddha
Copy link
Copy Markdown
Collaborator

@bmuddha bmuddha commented May 26, 2026

Summary by CodeRabbit

  • New Features

    • Added COMMIT_CANCELLED transaction error support.
    • Replicator now derives producer/consumer identifiers from validator identity.
  • Bug Fixes

    • Fixed storage allocation cursor rollback on capacity overflow.
    • Shutdown ordering improved to flush durable state after worker joins.
    • Ledger/write ordering and snapshot pause adjusted to reduce race conditions.
    • Checksum logic now skips confined accounts.
    • Channel paths now apply backpressure (bounded queues).
  • Validation

    • v0 transactions with address table lookups are rejected.
  • Tests

    • Added manual-slot test environment and deterministic slot-notification test.

Review Change Stack

@bmuddha bmuddha self-assigned this May 26, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 77de2bde-8c14-4a12-baea-006207b8274f

📥 Commits

Reviewing files that changed from the base of the PR and between c380fb5 and 9c4c53e.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • test-integration/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
  • Cargo.toml
  • magicblock-account-cloner/src/lib.rs
  • magicblock-accounts-db/src/lib.rs
  • magicblock-accounts-db/src/storage.rs
  • magicblock-aperture/src/requests/http/mod.rs
  • magicblock-aperture/tests/setup.rs
  • magicblock-aperture/tests/websocket.rs
  • magicblock-api/src/magic_validator.rs
  • magicblock-core/src/link.rs
  • magicblock-ledger/src/store/api.rs
  • magicblock-processor/src/scheduler/mod.rs
  • magicblock-replicator/Cargo.toml
  • magicblock-replicator/src/service/context.rs
  • magicblock-replicator/src/service/mod.rs
  • storage-proto/proto/transaction_by_addr.proto
  • storage-proto/src/convert.rs
💤 Files with no reviewable changes (1)
  • Cargo.toml

📝 Walkthrough

Walkthrough

This PR replaces machine-id-based validator identity with identifiers derived from the validator public key and threads the validator identity into the replication service. It threads a WithEncoded wrapper through account cloner send paths, adds COMMIT_CANCELLED to the transaction error proto and codec with tests, replaces program-id-index validation with v0 transaction shape validation that rejects address-table lookups, fixes allocation cursor leaks and skips confined accounts in checksum, converts some channels to bounded for backpressure, refactors scheduler snapshot pausing and block finalization ordering, sequences durable-state flush after worker shutdown, and adds a manual-slot test env used by a deterministic slot subscription test.

Suggested reviewers

  • GabrielePicco
  • lucacillario
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bmuddha/fixes/correctness

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@bmuddha bmuddha changed the title draft: fix: various correctness related enhancements fix: various correctness related enhancements May 26, 2026
@bmuddha bmuddha marked this pull request as draft May 26, 2026 14:12
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
magicblock-api/src/magic_validator.rs (1)

244-261: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace the expect on the replication channel path.

Line 246 still panics during validator startup if dispatch.replication_messages is missing. This wiring error should return an initialization error, not abort the process.

As per coding guidelines, Treat any usage of .unwrap() or .expect() in production Rust code as a MAJOR issue.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@magicblock-api/src/magic_validator.rs` around lines 244 - 261, The code
panics by calling expect on dispatch.replication_messages in the
replication_service initialization; change this to return an initialization
error instead of aborting: check dispatch.replication_messages with if let
Some(messages_rx) = dispatch.replication_messages.take() (or match), and if None
return Err(...) (or propagate a suitable InitError/anyhow::Error) from the
surrounding function with a clear message referencing the missing replication
channel; then pass messages_rx into ReplicationService::new as before. Ensure
you update the function signature to return a Result if needed and reference
dispatch.replication_messages, ReplicationService::new, and replication_service
in your fix.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@magicblock-accounts-db/src/storage.rs`:
- Line 239: The unconditional header.write_cursor.store(start_index,
Ordering::Relaxed) can race with other concurrent allocations; instead only roll
back the cursor if it still equals the value returned by the earlier fetch_add
reservation. Replace the unconditional store with a conditional compare_exchange
on header.write_cursor using the expected post-fetch_add value (the value
returned by fetch_add) so the reserving thread only restores start_index when
compare_exchange succeeds; keep appropriate memory orderings consistent with the
original fetch_add/loads.

In `@magicblock-api/src/magic_validator.rs`:
- Around line 1022-1038: The replication thread (self.replication_handle) must
be stopped and joined before flushing or shutting down durable state; change the
shutdown sequence so you first request cancellation of the replication service,
wait for and join/await self.replication_handle to complete (handle and log any
join/await errors), and only then call self.accountsdb.flush(),
self.ledger.flush(), and self.ledger.shutdown(true); apply the same reorder/fix
for the later shutdown block around lines 1040-1051 so no replication mutations
occur while flushing or shutting down the ledger/accountsdb.

In `@magicblock-processor/src/scheduler/mod.rs`:
- Around line 576-583: The code currently swallows errors from
self.ledger.write_block in the primary path and then always advances the
accountsdb slot via self.accountsdb.set_slot(slot); change the control flow so
that when self.coordinator.is_primary() and self.ledger.write_block(block)
returns Err you DO NOT call self.accountsdb.set_slot(slot) — instead return or
propagate the error (or early-return) so the caller can short-circuit
update_sysvars/metrics; specifically replace the inspect_err swallow on
self.ledger.write_block with a match or ? propagation on the Result from
self.ledger.write_block, and only call self.accountsdb.set_slot(slot) on Ok,
ensuring update_sysvars is skipped when finalization/persistence fails.

---

Outside diff comments:
In `@magicblock-api/src/magic_validator.rs`:
- Around line 244-261: The code panics by calling expect on
dispatch.replication_messages in the replication_service initialization; change
this to return an initialization error instead of aborting: check
dispatch.replication_messages with if let Some(messages_rx) =
dispatch.replication_messages.take() (or match), and if None return Err(...) (or
propagate a suitable InitError/anyhow::Error) from the surrounding function with
a clear message referencing the missing replication channel; then pass
messages_rx into ReplicationService::new as before. Ensure you update the
function signature to return a Result if needed and reference
dispatch.replication_messages, ReplicationService::new, and replication_service
in your fix.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6836f23c-1f5c-4481-a620-f94ec3f09ad2

📥 Commits

Reviewing files that changed from the base of the PR and between f9e4334 and 7018744.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • test-integration/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • Cargo.toml
  • magicblock-account-cloner/src/lib.rs
  • magicblock-accounts-db/src/lib.rs
  • magicblock-accounts-db/src/storage.rs
  • magicblock-aperture/src/requests/http/mod.rs
  • magicblock-aperture/tests/websocket.rs
  • magicblock-api/src/magic_validator.rs
  • magicblock-core/src/link.rs
  • magicblock-ledger/src/store/api.rs
  • magicblock-processor/src/scheduler/mod.rs
  • magicblock-replicator/Cargo.toml
  • magicblock-replicator/src/service/context.rs
  • magicblock-replicator/src/service/mod.rs
  • storage-proto/proto/transaction_by_addr.proto
  • storage-proto/src/convert.rs
💤 Files with no reviewable changes (1)
  • Cargo.toml

Comment thread magicblock-accounts-db/src/storage.rs Outdated
Comment thread magicblock-api/src/magic_validator.rs
Comment thread magicblock-processor/src/scheduler/mod.rs
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
magicblock-processor/src/scheduler/mod.rs (1)

577-584: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Still abort the slot transition on primary ledger write failure.

This now avoids advancing accountsdb, but transition_to_new_slot() still always runs update_sysvars() and metrics::set_slot() at Lines 513-515. A failed primary write_block() therefore still advances the scheduler’s in-memory slot/hash state and can even drive later snapshotting from an unpersisted block. Have finalize_block() return a Result/bool and short-circuit the caller on failure.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@magicblock-processor/src/scheduler/mod.rs` around lines 577 - 584,
finalize_block currently swallows ledger write failures and still lets
transition_to_new_slot() run update_sysvars() and metrics::set_slot(), so change
finalize_block to return Result<(), Error> (or bool) and propagate the error to
the caller; inside finalize_block ensure coordinator.is_primary() checks call
ledger.write_block(block) and return Err/false on failure (include the existing
inspect_err logging), and only when Ok(true) call accountsdb.set_slot(...) and
allow transition_to_new_slot() to run; update the caller to short-circuit
(return early) when finalize_block returns Err/false so update_sysvars(),
metrics::set_slot(), and any snapshot-driving logic do not run after a failed
write.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@magicblock-processor/src/scheduler/mod.rs`:
- Around line 577-584: finalize_block currently swallows ledger write failures
and still lets transition_to_new_slot() run update_sysvars() and
metrics::set_slot(), so change finalize_block to return Result<(), Error> (or
bool) and propagate the error to the caller; inside finalize_block ensure
coordinator.is_primary() checks call ledger.write_block(block) and return
Err/false on failure (include the existing inspect_err logging), and only when
Ok(true) call accountsdb.set_slot(...) and allow transition_to_new_slot() to
run; update the caller to short-circuit (return early) when finalize_block
returns Err/false so update_sysvars(), metrics::set_slot(), and any
snapshot-driving logic do not run after a failed write.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 81b48d06-487e-48a4-bf9a-072ccb3956fb

📥 Commits

Reviewing files that changed from the base of the PR and between 7018744 and c380fb5.

📒 Files selected for processing (5)
  • magicblock-accounts-db/src/storage.rs
  • magicblock-aperture/tests/setup.rs
  • magicblock-aperture/tests/websocket.rs
  • magicblock-api/src/magic_validator.rs
  • magicblock-processor/src/scheduler/mod.rs

@bmuddha bmuddha marked this pull request as ready for review May 26, 2026 16:55
@bmuddha bmuddha force-pushed the bmuddha/fixes/correctness branch from c380fb5 to 9c4c53e Compare May 27, 2026 05:19
@lucacillario lucacillario merged commit 4ba4a87 into master May 27, 2026
35 checks passed
@lucacillario lucacillario deleted the bmuddha/fixes/correctness branch May 27, 2026 08:44
thlorenz added a commit that referenced this pull request May 27, 2026
* master:
  fix: various correctness related enhancements  (#1248)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants