fix: various correctness related enhancements #1248
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: ⛔ Files ignored due to path filters (2)
📒 Files selected for processing (16)
💤 Files with no reviewable changes (1)
📝 WalkthroughWalkthroughThis PR replaces machine-id-based validator identity with identifiers derived from the validator public key and threads the validator identity into the replication service. It threads a WithEncoded wrapper through account cloner send paths, adds COMMIT_CANCELLED to the transaction error proto and codec with tests, replaces program-id-index validation with v0 transaction shape validation that rejects address-table lookups, fixes allocation cursor leaks and skips confined accounts in checksum, converts some channels to bounded for backpressure, refactors scheduler snapshot pausing and block finalization ordering, sequences durable-state flush after worker shutdown, and adds a manual-slot test env used by a deterministic slot subscription test. Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
magicblock-api/src/magic_validator.rs (1)
244-261:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReplace the
expecton the replication channel path.Line 246 still panics during validator startup if
dispatch.replication_messagesis missing. This wiring error should return an initialization error, not abort the process.As per coding guidelines,
Treat any usage of .unwrap() or .expect() in production Rust code as a MAJOR issue.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@magicblock-api/src/magic_validator.rs` around lines 244 - 261, The code panics by calling expect on dispatch.replication_messages in the replication_service initialization; change this to return an initialization error instead of aborting: check dispatch.replication_messages with if let Some(messages_rx) = dispatch.replication_messages.take() (or match), and if None return Err(...) (or propagate a suitable InitError/anyhow::Error) from the surrounding function with a clear message referencing the missing replication channel; then pass messages_rx into ReplicationService::new as before. Ensure you update the function signature to return a Result if needed and reference dispatch.replication_messages, ReplicationService::new, and replication_service in your fix.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@magicblock-accounts-db/src/storage.rs`:
- Line 239: The unconditional header.write_cursor.store(start_index,
Ordering::Relaxed) can race with other concurrent allocations; instead only roll
back the cursor if it still equals the value returned by the earlier fetch_add
reservation. Replace the unconditional store with a conditional compare_exchange
on header.write_cursor using the expected post-fetch_add value (the value
returned by fetch_add) so the reserving thread only restores start_index when
compare_exchange succeeds; keep appropriate memory orderings consistent with the
original fetch_add/loads.
In `@magicblock-api/src/magic_validator.rs`:
- Around line 1022-1038: The replication thread (self.replication_handle) must
be stopped and joined before flushing or shutting down durable state; change the
shutdown sequence so you first request cancellation of the replication service,
wait for and join/await self.replication_handle to complete (handle and log any
join/await errors), and only then call self.accountsdb.flush(),
self.ledger.flush(), and self.ledger.shutdown(true); apply the same reorder/fix
for the later shutdown block around lines 1040-1051 so no replication mutations
occur while flushing or shutting down the ledger/accountsdb.
In `@magicblock-processor/src/scheduler/mod.rs`:
- Around line 576-583: The code currently swallows errors from
self.ledger.write_block in the primary path and then always advances the
accountsdb slot via self.accountsdb.set_slot(slot); change the control flow so
that when self.coordinator.is_primary() and self.ledger.write_block(block)
returns Err you DO NOT call self.accountsdb.set_slot(slot) — instead return or
propagate the error (or early-return) so the caller can short-circuit
update_sysvars/metrics; specifically replace the inspect_err swallow on
self.ledger.write_block with a match or ? propagation on the Result from
self.ledger.write_block, and only call self.accountsdb.set_slot(slot) on Ok,
ensuring update_sysvars is skipped when finalization/persistence fails.
---
Outside diff comments:
In `@magicblock-api/src/magic_validator.rs`:
- Around line 244-261: The code panics by calling expect on
dispatch.replication_messages in the replication_service initialization; change
this to return an initialization error instead of aborting: check
dispatch.replication_messages with if let Some(messages_rx) =
dispatch.replication_messages.take() (or match), and if None return Err(...) (or
propagate a suitable InitError/anyhow::Error) from the surrounding function with
a clear message referencing the missing replication channel; then pass
messages_rx into ReplicationService::new as before. Ensure you update the
function signature to return a Result if needed and reference
dispatch.replication_messages, ReplicationService::new, and replication_service
in your fix.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6836f23c-1f5c-4481-a620-f94ec3f09ad2
⛔ Files ignored due to path filters (2)
Cargo.lockis excluded by!**/*.locktest-integration/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (15)
Cargo.tomlmagicblock-account-cloner/src/lib.rsmagicblock-accounts-db/src/lib.rsmagicblock-accounts-db/src/storage.rsmagicblock-aperture/src/requests/http/mod.rsmagicblock-aperture/tests/websocket.rsmagicblock-api/src/magic_validator.rsmagicblock-core/src/link.rsmagicblock-ledger/src/store/api.rsmagicblock-processor/src/scheduler/mod.rsmagicblock-replicator/Cargo.tomlmagicblock-replicator/src/service/context.rsmagicblock-replicator/src/service/mod.rsstorage-proto/proto/transaction_by_addr.protostorage-proto/src/convert.rs
💤 Files with no reviewable changes (1)
- Cargo.toml
There was a problem hiding this comment.
♻️ Duplicate comments (1)
magicblock-processor/src/scheduler/mod.rs (1)
577-584:⚠️ Potential issue | 🟠 Major | ⚡ Quick winStill abort the slot transition on primary ledger write failure.
This now avoids advancing
accountsdb, buttransition_to_new_slot()still always runsupdate_sysvars()andmetrics::set_slot()at Lines 513-515. A failed primarywrite_block()therefore still advances the scheduler’s in-memory slot/hash state and can even drive later snapshotting from an unpersisted block. Havefinalize_block()return aResult/booland short-circuit the caller on failure.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@magicblock-processor/src/scheduler/mod.rs` around lines 577 - 584, finalize_block currently swallows ledger write failures and still lets transition_to_new_slot() run update_sysvars() and metrics::set_slot(), so change finalize_block to return Result<(), Error> (or bool) and propagate the error to the caller; inside finalize_block ensure coordinator.is_primary() checks call ledger.write_block(block) and return Err/false on failure (include the existing inspect_err logging), and only when Ok(true) call accountsdb.set_slot(...) and allow transition_to_new_slot() to run; update the caller to short-circuit (return early) when finalize_block returns Err/false so update_sysvars(), metrics::set_slot(), and any snapshot-driving logic do not run after a failed write.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@magicblock-processor/src/scheduler/mod.rs`:
- Around line 577-584: finalize_block currently swallows ledger write failures
and still lets transition_to_new_slot() run update_sysvars() and
metrics::set_slot(), so change finalize_block to return Result<(), Error> (or
bool) and propagate the error to the caller; inside finalize_block ensure
coordinator.is_primary() checks call ledger.write_block(block) and return
Err/false on failure (include the existing inspect_err logging), and only when
Ok(true) call accountsdb.set_slot(...) and allow transition_to_new_slot() to
run; update the caller to short-circuit (return early) when finalize_block
returns Err/false so update_sysvars(), metrics::set_slot(), and any
snapshot-driving logic do not run after a failed write.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 81b48d06-487e-48a4-bf9a-072ccb3956fb
📒 Files selected for processing (5)
magicblock-accounts-db/src/storage.rsmagicblock-aperture/tests/setup.rsmagicblock-aperture/tests/websocket.rsmagicblock-api/src/magic_validator.rsmagicblock-processor/src/scheduler/mod.rs
c380fb5 to
9c4c53e
Compare
* master: fix: various correctness related enhancements (#1248)
Summary by CodeRabbit
New Features
Bug Fixes
Validation
Tests