Skip to content

Refactoring of the code maintaining ClusterNodes#6466

Open
fulmicoton wants to merge 9 commits into
mainfrom
paul.masurel/nodeid_as_arc_str
Open

Refactoring of the code maintaining ClusterNodes#6466
fulmicoton wants to merge 9 commits into
mainfrom
paul.masurel/nodeid_as_arc_str

Conversation

@fulmicoton
Copy link
Copy Markdown
Collaborator

@fulmicoton fulmicoton commented May 26, 2026

  • we now rely on Arc instead of String for NodeId. Chitchat now emits Arc, making cloning cheap
  • nesting ClusterMember within ClusterNode. Both struct had a confusing relationship. Now the relationship is clearer. ClusterNode is Member + a channel.
  • as a legacy code, we used to maintain a watch channel with the list of ClusterMember. That channel was maintained by own task, generating copies of all members upon cluster modifications. These members were only used in unit test. This PR removes this code that was marked as "to deprecate"

…er fields via Deref

Methods removed: is_ready, grpc_advertise_addr, ingester_status, indexing_tasks, indexing_capacity.
All call sites updated to access fields directly through the existing Deref<Target = ClusterMember>.
Note: indexing_capacity() was an alias for the indexing_cpu_capacity field.
The deprecated watch channel and background task maintained a redundant copy
of the ready member list. The test-only helpers (ready_members, wait_for_ready_members)
are reimplemented on top of the existing change_stream(), which already provides
consistent snapshots plus incremental updates under the write lock.
The test_single_node_cluster_readiness test is updated accordingly.
@fulmicoton fulmicoton changed the title Using Arc<str> for node ids. Refactoring of the code maintaining ClusterNodes May 26, 2026
@fulmicoton fulmicoton requested a review from guilload May 26, 2026 15:50
@fulmicoton fulmicoton marked this pull request as ready for review May 26, 2026 15:51
@fulmicoton fulmicoton requested review from a team as code owners May 26, 2026 15:51
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3496c2d5d0

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +127 to +131
}
let members: Vec<ClusterMember> = ready_members.values().cloned().collect();
if predicate(&members) {
return;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Evaluate readiness predicate on a full snapshot

wait_for_ready_members now runs predicate after every single replayed ClusterChange event, but change_stream() replays current ready nodes as one Add per node. That means the predicate can observe a transient partial set (e.g. return true after the first Add even though the full ready set does not satisfy it), and it is never evaluated for the initial empty set when no events arrive. This can make tests pass too early or time out incorrectly depending on predicate shape.

Useful? React with 👍 / 👎.

if node.enabled_services().contains(&QuickwitService::Indexer) =>
{
let node_id = node.node_id().to_owned();
ClusterChange::Add(node) if node.is_service_enabled(QuickwitService::Indexer) => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ClusterChange::Add(node) if node.is_service_enabled(QuickwitService::Indexer) => {
ClusterChange::Add(node) if node.is_indexer() => {

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

https://github.com/quickwit-oss/quickwit/blob/00cae1e515334cf9282a833ee7f1ae83ba707490/quickwit-cluster/src/member.rs#L147-L148
P2 Badge Match is_ingester against Ingester service

ClusterMember::is_ingester currently checks for QuickwitService::Indexer, so any code using this helper will misclassify pure indexer nodes as ingesters and miss real ingester-only nodes. This can break service-specific control flow (e.g., ingest routing or pool membership decisions) once callers adopt the new helper; the check should use the ingester service variant instead of indexer.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors Quickwit’s cluster node/member plumbing to align with the updated chitchat API and reduce cloning overhead, while removing legacy readiness-tracking code that was only used in tests.

Changes:

  • Switch NodeId to Arc<str> (and update call sites) to make node-id cloning cheap with chitchat now emitting Arc<str>.
  • Clarify the ClusterNode/ClusterMember relationship by nesting member data within ClusterNode and updating access patterns.
  • Remove the deprecated “ready members” watch-channel task and rework readiness waiting logic/tests; bump chitchat to 0.11.0 (and lockfile updates).

Reviewed changes

Copilot reviewed 44 out of 45 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
quickwit/quickwit-serve/src/lib.rs Update cluster change handling, logging, and NodeId conversions for refactored cluster node/member types.
quickwit/quickwit-serve/src/developer_api/debug.rs Adjust debug client creation to new ClusterNode field access / NodeId type.
quickwit/quickwit-serve/src/datafusion_api/setup.rs Update pool change logic to use grpc_advertise_addr field access.
quickwit/quickwit-proto/src/types/mod.rs Refactor NodeId to Arc<str> and adapt trait impls accordingly.
quickwit/quickwit-metastore/src/tests/list_splits.rs Update test to construct NodeId via new API.
quickwit/quickwit-janitor/src/actors/delete_task_pipeline.rs Update pipeline id node-id construction to new NodeId API.
quickwit/quickwit-integration-tests/src/test_utils/cluster_sandbox.rs Update test node config to build NodeId via new API.
quickwit/quickwit-ingest/src/ingest_v2/workbench.rs Update tests to use new NodeId constructor.
quickwit/quickwit-ingest/src/ingest_v2/state.rs Update tests to use new NodeId constructor.
quickwit/quickwit-ingest/src/ingest_v2/routing_table.rs Update NodeId conversions and test fixtures to new API.
quickwit/quickwit-ingest/src/ingest_v2/router.rs Update NodeId conversions and proto string conversions where needed.
quickwit/quickwit-ingest/src/ingest_v2/replication.rs Update (leader/follower) NodeId conversions and proto fields to strings.
quickwit/quickwit-ingest/src/ingest_v2/models.rs Update tests to use new NodeId constructor.
quickwit/quickwit-ingest/src/ingest_v2/ingester.rs Update self-node id acquisition to new NodeId API usage.
quickwit/quickwit-ingest/src/ingest_v2/fetch.rs Update NodeId handling in tests and ingester pool interactions.
quickwit/quickwit-ingest/src/ingest_v2/broadcast/local_shards.rs Update event parsing to construct NodeId via new API.
quickwit/quickwit-ingest/src/ingest_v2/broadcast/capacity_score.rs Update event parsing to construct NodeId via new API.
quickwit/quickwit-indexing/src/test_utils.rs Update test sandbox node-id creation to new NodeId API.
quickwit/quickwit-indexing/src/source/mod.rs Update tests for pipeline ids to use new NodeId constructor.
quickwit/quickwit-indexing/src/source/ingest/mod.rs Update self node id acquisition and shard routing NodeId conversions.
quickwit/quickwit-indexing/src/merge_policy/mod.rs Update tests to use new NodeId constructor.
quickwit/quickwit-indexing/src/actors/uploader.rs Update tests to use new NodeId constructor.
quickwit/quickwit-indexing/src/actors/packager.rs Update tests to use new NodeId constructor.
quickwit/quickwit-indexing/src/actors/merge_planner.rs Update pipeline membership checks and tests for NodeId API changes.
quickwit/quickwit-indexing/src/actors/merge_pipeline.rs Update tests to use new NodeId constructor.
quickwit/quickwit-indexing/src/actors/merge_executor.rs Update split metadata conversion to build NodeId via new API.
quickwit/quickwit-indexing/src/actors/indexing_service.rs Update tests to use new NodeId constructor.
quickwit/quickwit-indexing/src/actors/indexing_pipeline.rs Update tests to use new NodeId constructor.
quickwit/quickwit-indexing/src/actors/indexer.rs Update tests to use new NodeId constructor.
quickwit/quickwit-indexing/failpoints/mod.rs Update tests to use new NodeId constructor.
quickwit/quickwit-control-plane/src/tests.rs Update test change-stream logic for new cluster node/member APIs.
quickwit/quickwit-control-plane/src/model/shard_table.rs Update unavailable-leader checks and tests for NodeId API changes.
quickwit/quickwit-control-plane/src/ingest/ingest_controller.rs Update NodeId conversions, pool lookups, and tests for new NodeId API.
quickwit/quickwit-control-plane/src/indexing_scheduler/scheduling/mod.rs Update tests to use new NodeId constructor.
quickwit/quickwit-control-plane/src/indexing_scheduler/mod.rs Adjust comparisons to account for NodeId type changes.
quickwit/quickwit-control-plane/src/control_plane.rs Update NodeId conversions and readiness-trigger logic field accesses.
quickwit/quickwit-config/src/node_config/serialize.rs Update node-id resolution/serialization and tests for new NodeId type.
quickwit/quickwit-cluster/src/node.rs Make ClusterNode wrap ClusterMember and adjust service helpers/fields.
quickwit/quickwit-cluster/src/member.rs Add service helper methods; build member using Arc<str> node ids from chitchat.
quickwit/quickwit-cluster/src/grpc_service.rs Ensure protobuf IDs use owned strings given chitchat node-id type changes.
quickwit/quickwit-cluster/src/grpc_gossip.rs Update gRPC gossip encoding/decoding for Arc<str> node ids.
quickwit/quickwit-cluster/src/cluster.rs Remove deprecated ready-members watch channel; rework readiness waiting for testsuite.
quickwit/quickwit-cluster/src/change.rs Update change computation to use Arc<str>/NodeId refactoring and readiness field access.
quickwit/Cargo.toml Bump chitchat dependency to 0.11.0.
quickwit/Cargo.lock Lockfile updates from dependency bump(s).
Comments suppressed due to low confidence (1)

quickwit/quickwit-serve/src/lib.rs:333

  • cluster.self_node_id().to_owned() will allocate/copy the node id because NodeIdRef::to_owned() has to build a fresh Arc<str> from &str. Since ChitchatId::node_id is already an Arc<str> (and cheap to clone), consider building the NodeId via NodeId::from_arc_str(cluster.self_chitchat_id().node_id.clone()) (or adding a helper on Cluster) to better realize the intended cheap-clone benefit of Arc<str>.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +113 to +116
let mut change_stream = self.change_stream();
let mut ready_members: BTreeMap<NodeId, ClusterMember> = BTreeMap::new();
timeout(timeout_after, async move {
while let Some(change) = change_stream.next().await {
.expect("`chitchat_id` should be a required field");
let chitchat_id = ChitchatId {
node_id: proto_chitchat_id.node_id.clone(),
node_id: Arc::from(proto_chitchat_id.node_id.as_str()),
idle_shard_timeout: Duration,
) -> IngestV2Result<Self> {
let self_node_id: NodeId = cluster.self_node_id().into();
let self_node_id: NodeId = cluster.self_node_id().to_owned();
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants