feat(cluster): implement replica bootstrap #3163
Conversation
Codecov Report
❌ Your patch check has failed because the patch coverage (9.07%) is below the target coverage (50.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## master #3163 +/- ##
============================================
- Coverage 74.43% 71.09% -3.35%
Complexity 943 943
============================================
Files 1186 1190 +4
Lines 106343 102827 -3516
Branches 83377 79873 -3504
============================================
- Hits 79161 73107 -6054
- Misses 24434 26769 +2335
- Partials 2748 2951 +203
Force-pushed from 39ce6cb to a1d20d7
hubcio left a comment
out-of-diff findings:
core/partitions/src/messages_writer.rs:60-63 — let _ = file.sync_all().await.map_err(...) swallows IO error on the file_exists open path. EIO at boot indicates dying media, but bootstrap proceeds with a stale view of disk. on master HEAD already, but server-ng makes this path live for every partition. propagate the error.
core/partitions/src/iggy_index_writer.rs:53 — let _ = file.sync_all().await; same pattern, no map_err at all. propagate.
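Both `sync_all` findings reduce to the same one-line fix. A minimal sketch of the before/after, using synchronous `std::fs` in place of the async compio API (function names are illustrative, not the actual writer methods):

```rust
use std::fs::File;
use std::io;

// Anti-pattern from the review: `let _ =` discards the Result, so an EIO
// reported by the kernel during fsync is silently lost and bootstrap
// continues with a stale view of disk.
fn sync_swallowed(file: &File) {
    let _ = file.sync_all();
}

// Propagating version: `?` surfaces the IO error to the caller, which can
// abort bootstrap instead of proceeding on dying media.
fn sync_propagated(file: &File) -> io::Result<()> {
    file.sync_all()?;
    Ok(())
}
```

In the real writers the return type would thread through the existing `map_err(...)` conversion instead of raw `io::Result`.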
core/partitions/src/messages_writer.rs:137 — let chunk_vec: Vec<_> = chunk.to_vec(); allocates a Vec per chunk per save_frozen_batches. steady-state per-batch alloc on what is now the primary write path. cache a reusable Vec in the writer or refactor the compio iov call to take a borrowed slice.
core/shard/src/lib.rs:622 — let namespaces: Vec<_> = planes.1.0.namespaces().copied().collect(); allocates a fresh Vec per inbox-frame iteration. core/shard/src/router.rs:270 calls process_loopback after every dispatched frame in steady state. hoist a namespaces_buf: Vec<u64> like the existing loopback_buf, or short-circuit when partitions / loopback drained are empty.
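The hoist suggested for both allocation findings follows the same pattern as the existing `loopback_buf`: keep one buffer on the struct and refill it in place. A sketch with illustrative names (not the actual shard types):

```rust
// Hypothetical stand-in for the shard router; only the buffer-reuse
// pattern is the point here.
struct Router {
    namespaces_buf: Vec<u64>,
}

impl Router {
    fn process_loopback(&mut self, namespaces: impl Iterator<Item = u64>) -> usize {
        // clear() keeps the existing allocation; extend() refills it in
        // place, so steady-state iterations allocate nothing once the
        // buffer has grown to its working size.
        self.namespaces_buf.clear();
        self.namespaces_buf.extend(namespaces);
        self.namespaces_buf.len()
    }
}
```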
core/shard/src/router.rs:94-103 and 199-208 — shards_table.shard_for(...).unwrap_or_else(|| { warn!(...); 0 }) silently falls back to shard 0. single-shard server-ng masks this today; multi-shard makes it a silent routing-bug hider. drop the fallback or fail loud.
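The fail-loud alternative can be sketched as follows; `ShardsTable` and its method bodies are illustrative stand-ins, not the actual `core/shard` types:

```rust
// Hypothetical shards table: maps a routing key to a shard id.
struct ShardsTable {
    shards: Vec<u16>,
}

impl ShardsTable {
    fn shard_for(&self, key: u64) -> Option<u16> {
        self.shards
            .get((key as usize) % self.shards.len().max(1))
            .copied()
    }
}

// The reviewed pattern: a missing mapping silently degrades into
// "everything lands on shard 0", which a single-shard deployment masks.
fn route_silent(table: &ShardsTable, key: u64) -> u16 {
    table.shard_for(key).unwrap_or(0)
}

// Fail-loud alternative: surface the missing mapping to the caller so a
// multi-shard routing bug cannot hide.
fn route_loud(table: &ShardsTable, key: u64) -> Result<u16, String> {
    table
        .shard_for(key)
        .ok_or_else(|| format!("no shard mapping for key {key}"))
}
```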
core/binary_protocol/src/consensus/message.rs:347 — transmute_header does a 256B header copy + 256B memset + Message::try_from(owned) re-validate after the closure already wrote a known-valid header. that's a 3rd validation per request (1st transport recv, 2nd try_into_typed, 3rd here). reached per-client-message via the new make_deferred_*_handler closures at core/server-ng/src/bootstrap.rs:1086,1116. add a transmute_header_unchecked that skips the trailing TryFrom validation and rely on the closure invariant.
core/metadata/src/stm/stream.rs:615-676 — Snapshotable::to_snapshot reads round_robin_counter with Ordering::Relaxed during snapshot fill. concurrent producer increments race the snapshot read. single-shard today: not exploitable. document the invariant.
core/common/src/types/streaming_stats.rs counter ordering — counter fetch_add / fetch_sub use Ordering::AcqRel even though the counters synchronize no other data. Relaxed is correct; AcqRel forces unnecessary ldar / stlr fences on aarch64. per-batch produce path. confirmed via method names like _inconsistent and load_for_snapshot that no consumer relies on the AcqRel half.
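The ordering change is mechanical; a minimal sketch of a pure statistics counter with the proposed `Relaxed` ordering (struct and method names are illustrative):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// A counter that synchronizes no other data: only the increment itself
// needs to be atomic, so Relaxed suffices and avoids the acquire/release
// fences (ldar/stlr on aarch64) that AcqRel would emit on the hot
// per-batch produce path.
struct StreamingStats {
    messages: AtomicU64,
}

impl StreamingStats {
    fn record_batch(&self, n: u64) {
        self.messages.fetch_add(n, Ordering::Relaxed);
    }

    // Readers that tolerate a slightly stale value (metrics scrapes,
    // snapshot fills) can also load Relaxed.
    fn load_for_snapshot(&self) -> u64 {
        self.messages.load(Ordering::Relaxed)
    }
}
```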
Addressed all of the correctness issues; I've skipped the optimization nits, as this code will change quite a few times before it's used by the main server binary.
hubcio left a comment
additional findings cite code outside this PR's diff hunks:
core/journal/src/prepare_journal.rs:143 - open() does not detect WAL exceeding slot capacity
file not in this PR but the recovery rewrite exposes this gap. live append cannot wrap the 1024-slot ring (panic at prepare_journal.rs:411-443:422-431; SnapshotCoordinator::checkpoint_if_needed at core/metadata/src/impls/metadata.rs:243-260 with CHECKPOINT_MARGIN=64 forces snapshot near capacity). live runtime cannot lose ops. but the rebuild loop at lines 143-184 silently overwrites slot N % SLOT_COUNT when reading a WAL file containing more entries than the ring can hold (manual recovery, copy from peer, mismatched binary). iter_headers_from(0) then returns only the last 1024 slots. recovery.rs:142 (the recovery caller in this PR) does not detect this. fix: at open, bail when first slot's op > expected replay_from, or when slot count exceeds SLOT_COUNT. defensive; cheap.
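The proposed open-time guard is cheap to sketch. Constants and the function shape below are illustrative; the real checks would live in `prepare_journal.rs`'s `open()`:

```rust
// Illustrative ring size matching the 1024-slot WAL described above.
const SLOT_COUNT: u64 = 1024;

// Defensive validation at open: bail when the on-disk WAL holds more
// entries than the ring can represent, or when its first op is already
// past the expected replay point. Either condition indicates a
// hand-copied, peer-copied, or mismatched-binary WAL file, which the
// rebuild loop would otherwise silently wrap at slot N % SLOT_COUNT.
fn validate_wal(first_op: u64, entry_count: u64, replay_from: u64) -> Result<(), String> {
    if entry_count > SLOT_COUNT {
        return Err(format!(
            "WAL holds {entry_count} entries but the ring has {SLOT_COUNT} slots"
        ));
    }
    if first_op > replay_from {
        return Err(format!(
            "WAL starts at op {first_op}, past expected replay point {replay_from}"
        ));
    }
    Ok(())
}
```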
core/consensus/src/impls.rs:732 - set_view should take &self
view: Cell<u32> (line 504). set_last_prepare_checksum(&self) (line 779) and set_log_view(&self) (line 788) take &self to mutate identical Cell fields. set_view is the only odd one out at &mut self. Cell allows mutation through shared refs; &mut is not load-bearing. caller bootstrap.rs:251 has let mut consensus = VsrConsensus::new(...) solely because of this method, then later setters work on &self. fix: pub fn set_view(&self, view: u32). one-line.
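The one-line fix rests on `Cell` providing interior mutability through shared references. A self-contained sketch (struct pared down to the one field under discussion):

```rust
use std::cell::Cell;

// Minimal model of the consensus struct: Cell already permits mutation
// through &self, so the setter does not need &mut.
struct VsrConsensus {
    view: Cell<u32>,
}

impl VsrConsensus {
    // &self matches the sibling setters (set_last_prepare_checksum,
    // set_log_view) and lets the caller drop its `let mut` binding.
    fn set_view(&self, view: u32) {
        self.view.set(view);
    }

    fn view(&self) -> u32 {
        self.view.get()
    }
}
```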
core/consensus/src/impls.rs:750 - pipeline_mut is dead code
pub const fn pipeline_mut(&mut self) -> &mut RefCell<P>. rg pipeline_mut returns 1 match, the definition itself, with no callers anywhere in core/. unused / dead code should be removed. returning &mut RefCell is doubly redundant: RefCell already provides interior mutability via borrow_mut. the author TODO at line 741 already flags the borrow-across-await footgun on the related pipeline() accessor; same risk applies here, plus the &mut requirement. fix: delete.
```rust
let recovered = recover::<ServerNgMuxStateMachine>(Path::new(&config.system.path))
    .await
    .map_err(ServerNgError::MetadataRecovery)?;
let restored_op = recovered.last_applied_op.unwrap_or_else(|| {
```
recovery.rs:139-158 walks the WAL and applies every replayed entry, setting last_applied_op to journal tail M. bootstrap then calls restore_commit_state(M, M) declaring commit_min == commit_max == M. impls.rs:1110-1130, 1218, 1230 propagate commit_min via primary heartbeat and DVC; view_change_quorum.rs:88-95 dvc_max_commit propagates further into peer state via complete_view_change_as_primary.
WAL entries past the last quorum-confirmed op are NOT committed; bootstrap declares them so. multi-replica safety violation: a replica with uncommitted-but-replayed ops claims they're committed.
single-node trivially correct (quorum=1). latent landmine the moment cluster ships.
fix: persist a separate commit watermark (superblock) and only restore commit_min up to that watermark. replay-as-applied does not mean replay-as-committed.
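The proposed restore rule can be sketched as a clamp against the persisted watermark; all names here are illustrative, since the superblock watermark does not exist yet:

```rust
// Hypothetical recovered state: the journal tail M is what replay
// applied, the watermark is the last quorum-confirmed op read back from
// the (proposed) superblock.
struct RecoveredState {
    last_applied_op: u64,
    commit_watermark: u64,
}

// Instead of restore_commit_state(M, M), clamp commit_min to the
// watermark: ops in (watermark, tail] stay replayed-but-uncommitted,
// so a replica never advertises an unconfirmed op as committed.
fn restore_commit_range(state: &RecoveredState) -> (u64, u64) {
    let commit_min = state.last_applied_op.min(state.commit_watermark);
    (commit_min, commit_min)
}
```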
We don't persist the commit ranges yet. I've done this under the assumption that, for now, the cluster on bootstrap will start either with empty state, or with all replicas in exactly the same state at the shutdown/reboot moment (no state transfer support).
```rust
pub const STREAM_MASK: u64 = (1u64 << STREAM_BITS) - 1;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NamespaceCapacityError {
```
I think we should use derive(Error) from the thiserror crate.
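For reference, what `#[derive(thiserror::Error)]` with an `#[error("...")]` attribute generates is roughly the hand-written pair below; the variant name and message are hypothetical, since the enum's body is not shown in the diff:

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NamespaceCapacityError {
    // Hypothetical variant for illustration only.
    StreamLimitExceeded,
}

// Equivalent of thiserror's generated code: a Display impl plus the
// std::error::Error marker, so the enum composes with `?` and
// Box<dyn Error> instead of being a bare data enum.
impl fmt::Display for NamespaceCapacityError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::StreamLimitExceeded => write!(f, "stream limit exceeded"),
        }
    }
}

impl Error for NamespaceCapacityError {}
```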
Implement Replica bootstrap logic. Currently it does not handle cases where replicas are out of sync (needs State Transfer to be implemented).