Refactoring of the code maintaining ClusterNodes#6466
Conversation
…er fields via Deref Methods removed: is_ready, grpc_advertise_addr, ingester_status, indexing_tasks, indexing_capacity. All call sites updated to access fields directly through the existing Deref<Target = ClusterMember>. Note: indexing_capacity() was an alias for the indexing_cpu_capacity field.
The deprecated watch channel and background task maintained a redundant copy of the ready member list. The test-only helpers (ready_members, wait_for_ready_members) are reimplemented on top of the existing change_stream(), which already provides consistent snapshots plus incremental updates under the write lock. The test_single_node_cluster_readiness test is updated accordingly.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3496c2d5d0
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } | ||
| let members: Vec<ClusterMember> = ready_members.values().cloned().collect(); | ||
| if predicate(&members) { | ||
| return; | ||
| } |
There was a problem hiding this comment.
Evaluate readiness predicate on a full snapshot
wait_for_ready_members now runs predicate after every single replayed ClusterChange event, but change_stream() replays current ready nodes as one Add per node. That means the predicate can observe a transient partial set (e.g. return true after the first Add even though the full ready set does not satisfy it), and it is never evaluated for the initial empty set when no events arrive. This can make tests pass too early or time out incorrectly depending on predicate shape.
Useful? React with 👍 / 👎.
| if node.enabled_services().contains(&QuickwitService::Indexer) => | ||
| { | ||
| let node_id = node.node_id().to_owned(); | ||
| ClusterChange::Add(node) if node.is_service_enabled(QuickwitService::Indexer) => { |
There was a problem hiding this comment.
| ClusterChange::Add(node) if node.is_service_enabled(QuickwitService::Indexer) => { | |
| ClusterChange::Add(node) if node.is_indexer() => { |
There was a problem hiding this comment.
💡 Codex Review
https://github.com/quickwit-oss/quickwit/blob/00cae1e515334cf9282a833ee7f1ae83ba707490/quickwit-cluster/src/member.rs#L147-L148
Match is_ingester against Ingester service
ClusterMember::is_ingester currently checks for QuickwitService::Indexer, so any code using this helper will misclassify pure indexer nodes as ingesters and miss real ingester-only nodes. This can break service-specific control flow (e.g., ingest routing or pool membership decisions) once callers adopt the new helper; the check should use the ingester service variant instead of indexer.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
This PR refactors Quickwit’s cluster node/member plumbing to align with the updated chitchat API and reduce cloning overhead, while removing legacy readiness-tracking code that was only used in tests.
Changes:
- Switch
NodeIdtoArc<str>(and update call sites) to make node-id cloning cheap withchitchatnow emittingArc<str>. - Clarify the
ClusterNode/ClusterMemberrelationship by nesting member data withinClusterNodeand updating access patterns. - Remove the deprecated “ready members” watch-channel task and rework readiness waiting logic/tests; bump
chitchatto0.11.0(and lockfile updates).
Reviewed changes
Copilot reviewed 44 out of 45 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| quickwit/quickwit-serve/src/lib.rs | Update cluster change handling, logging, and NodeId conversions for refactored cluster node/member types. |
| quickwit/quickwit-serve/src/developer_api/debug.rs | Adjust debug client creation to new ClusterNode field access / NodeId type. |
| quickwit/quickwit-serve/src/datafusion_api/setup.rs | Update pool change logic to use grpc_advertise_addr field access. |
| quickwit/quickwit-proto/src/types/mod.rs | Refactor NodeId to Arc<str> and adapt trait impls accordingly. |
| quickwit/quickwit-metastore/src/tests/list_splits.rs | Update test to construct NodeId via new API. |
| quickwit/quickwit-janitor/src/actors/delete_task_pipeline.rs | Update pipeline id node-id construction to new NodeId API. |
| quickwit/quickwit-integration-tests/src/test_utils/cluster_sandbox.rs | Update test node config to build NodeId via new API. |
| quickwit/quickwit-ingest/src/ingest_v2/workbench.rs | Update tests to use new NodeId constructor. |
| quickwit/quickwit-ingest/src/ingest_v2/state.rs | Update tests to use new NodeId constructor. |
| quickwit/quickwit-ingest/src/ingest_v2/routing_table.rs | Update NodeId conversions and test fixtures to new API. |
| quickwit/quickwit-ingest/src/ingest_v2/router.rs | Update NodeId conversions and proto string conversions where needed. |
| quickwit/quickwit-ingest/src/ingest_v2/replication.rs | Update (leader/follower) NodeId conversions and proto fields to strings. |
| quickwit/quickwit-ingest/src/ingest_v2/models.rs | Update tests to use new NodeId constructor. |
| quickwit/quickwit-ingest/src/ingest_v2/ingester.rs | Update self-node id acquisition to new NodeId API usage. |
| quickwit/quickwit-ingest/src/ingest_v2/fetch.rs | Update NodeId handling in tests and ingester pool interactions. |
| quickwit/quickwit-ingest/src/ingest_v2/broadcast/local_shards.rs | Update event parsing to construct NodeId via new API. |
| quickwit/quickwit-ingest/src/ingest_v2/broadcast/capacity_score.rs | Update event parsing to construct NodeId via new API. |
| quickwit/quickwit-indexing/src/test_utils.rs | Update test sandbox node-id creation to new NodeId API. |
| quickwit/quickwit-indexing/src/source/mod.rs | Update tests for pipeline ids to use new NodeId constructor. |
| quickwit/quickwit-indexing/src/source/ingest/mod.rs | Update self node id acquisition and shard routing NodeId conversions. |
| quickwit/quickwit-indexing/src/merge_policy/mod.rs | Update tests to use new NodeId constructor. |
| quickwit/quickwit-indexing/src/actors/uploader.rs | Update tests to use new NodeId constructor. |
| quickwit/quickwit-indexing/src/actors/packager.rs | Update tests to use new NodeId constructor. |
| quickwit/quickwit-indexing/src/actors/merge_planner.rs | Update pipeline membership checks and tests for NodeId API changes. |
| quickwit/quickwit-indexing/src/actors/merge_pipeline.rs | Update tests to use new NodeId constructor. |
| quickwit/quickwit-indexing/src/actors/merge_executor.rs | Update split metadata conversion to build NodeId via new API. |
| quickwit/quickwit-indexing/src/actors/indexing_service.rs | Update tests to use new NodeId constructor. |
| quickwit/quickwit-indexing/src/actors/indexing_pipeline.rs | Update tests to use new NodeId constructor. |
| quickwit/quickwit-indexing/src/actors/indexer.rs | Update tests to use new NodeId constructor. |
| quickwit/quickwit-indexing/failpoints/mod.rs | Update tests to use new NodeId constructor. |
| quickwit/quickwit-control-plane/src/tests.rs | Update test change-stream logic for new cluster node/member APIs. |
| quickwit/quickwit-control-plane/src/model/shard_table.rs | Update unavailable-leader checks and tests for NodeId API changes. |
| quickwit/quickwit-control-plane/src/ingest/ingest_controller.rs | Update NodeId conversions, pool lookups, and tests for new NodeId API. |
| quickwit/quickwit-control-plane/src/indexing_scheduler/scheduling/mod.rs | Update tests to use new NodeId constructor. |
| quickwit/quickwit-control-plane/src/indexing_scheduler/mod.rs | Adjust comparisons to account for NodeId type changes. |
| quickwit/quickwit-control-plane/src/control_plane.rs | Update NodeId conversions and readiness-trigger logic field accesses. |
| quickwit/quickwit-config/src/node_config/serialize.rs | Update node-id resolution/serialization and tests for new NodeId type. |
| quickwit/quickwit-cluster/src/node.rs | Make ClusterNode wrap ClusterMember and adjust service helpers/fields. |
| quickwit/quickwit-cluster/src/member.rs | Add service helper methods; build member using Arc<str> node ids from chitchat. |
| quickwit/quickwit-cluster/src/grpc_service.rs | Ensure protobuf IDs use owned strings given chitchat node-id type changes. |
| quickwit/quickwit-cluster/src/grpc_gossip.rs | Update gRPC gossip encoding/decoding for Arc<str> node ids. |
| quickwit/quickwit-cluster/src/cluster.rs | Remove deprecated ready-members watch channel; rework readiness waiting for testsuite. |
| quickwit/quickwit-cluster/src/change.rs | Update change computation to use Arc<str>/NodeId refactoring and readiness field access. |
| quickwit/Cargo.toml | Bump chitchat dependency to 0.11.0. |
| quickwit/Cargo.lock | Lockfile updates from dependency bump(s). |
Comments suppressed due to low confidence (1)
quickwit/quickwit-serve/src/lib.rs:333
cluster.self_node_id().to_owned()will allocate/copy the node id becauseNodeIdRef::to_owned()has to build a freshArc<str>from&str. SinceChitchatId::node_idis already anArc<str>(and cheap to clone), consider building theNodeIdviaNodeId::from_arc_str(cluster.self_chitchat_id().node_id.clone())(or adding a helper onCluster) to better realize the intended cheap-clone benefit ofArc<str>.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let mut change_stream = self.change_stream(); | ||
| let mut ready_members: BTreeMap<NodeId, ClusterMember> = BTreeMap::new(); | ||
| timeout(timeout_after, async move { | ||
| while let Some(change) = change_stream.next().await { |
| .expect("`chitchat_id` should be a required field"); | ||
| let chitchat_id = ChitchatId { | ||
| node_id: proto_chitchat_id.node_id.clone(), | ||
| node_id: Arc::from(proto_chitchat_id.node_id.as_str()), |
| idle_shard_timeout: Duration, | ||
| ) -> IngestV2Result<Self> { | ||
| let self_node_id: NodeId = cluster.self_node_id().into(); | ||
| let self_node_id: NodeId = cluster.self_node_id().to_owned(); |
Uh oh!
There was an error while loading. Please reload this page.