Paul.masurel/search metrics#6416
Draft
fulmicoton wants to merge 12 commits intomainfrom
Draft
Conversation
09dced5 to
494c14e
Compare
There was a problem hiding this comment.
Pull request overview
This PR expands Quickwit’s search observability by introducing richer resource statistics across split/leaf/root search execution, including per-split storage download volume, and by refining Lambda invocation metrics.
Changes:
- Add a
CountingStoragewrapper to measure per-request downloaded bytes and GET request counts. - Replace/extend search “resource stats” in the search protobuf and propagate aggregated stats to root search responses.
- Track additional timing/counters in leaf search (permit wait time, wall time, lambda bottleneck) and refine Lambda metrics labeling.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
quickwit/quickwit-storage/src/lib.rs |
Exposes the new counting storage wrapper types from the storage crate. |
quickwit/quickwit-storage/src/counting_storage.rs |
Adds CountingStorage + DownloadCounters to instrument storage reads. |
quickwit/quickwit-serve/src/jaeger_api/rest_handler.rs |
Updates tests/struct construction for the new SearchResponse.resource_stats field. |
quickwit/quickwit-search/src/service.rs |
Adds resource_stats to scroll responses (currently set to None). |
quickwit/quickwit-search/src/scroll_context.rs |
Adapts scroll batching to the new (LeafSearchResponse, stats) return type. |
quickwit/quickwit-search/src/root.rs |
Computes root-level resource stats and threads them into SearchResponse. |
quickwit/quickwit-search/src/lib.rs |
Reworks resource-stats merging utilities for the new leaf/split stats model. |
quickwit/quickwit-search/src/leaf.rs |
Collects per-split stats (downloads, timings), wall time, and lambda bottleneck signals. |
quickwit/quickwit-search/src/leaf_cache.rs |
Overwrites cached responses’ stats to reflect cache-hit counters. |
quickwit/quickwit-search/src/collector.rs |
Switches resource stat merging to the new leaf/split resource stats types. |
quickwit/quickwit-search/src/cluster_client.rs |
Tracks leaf-search call counts including retries; merges updated stats on retry. |
quickwit/quickwit-proto/src/codegen/quickwit/quickwit.search.rs |
Updates generated Rust types for the modified search protobuf messages. |
quickwit/quickwit-proto/protos/quickwit/search.proto |
Introduces SplitResourceStats, LeafResourceStats, RootResourceStats; adds SearchResponse.resource_stats. |
quickwit/quickwit-proto/build.rs |
Ensures search.proto changes retrigger codegen; adds clippy attribute for large enum variant. |
quickwit/quickwit-lambda-client/src/metrics.rs |
Renames Lambda metrics label key from status to outcome. |
quickwit/quickwit-lambda-client/src/invoker.rs |
Classifies Lambda invocation outcomes (`success |
quickwit/quickwit-jaeger/src/lib.rs |
Updates tests/struct construction for the new SearchResponse.resource_stats field. |
quickwit/quickwit-common/src/lib.rs |
Adds is_running_in_lambda() helper used by leaf search instrumentation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+57
to
+61
| /// Wrap a base `Storage` with this proxy at the entry point of a request to | ||
| /// observe the per-request download volume. Cached layers (split cache, footer | ||
| /// cache, hotcache, byte-range cache) live BELOW this wrapper, so reads served | ||
| /// from cache do NOT contribute to the counters — that is the desired behavior | ||
| /// for a "downloaded from object storage" measurement. |
Comment on lines
+535
to
+538
| // Wrap storage at request scope so we observe per-split download volume. | ||
| // Cache layers (split cache, footer cache, hotcache, byte-range cache) live | ||
| // BELOW this wrapper, so reads served from cache do not contribute to the | ||
| // counters — that is the desired "downloaded from object storage" semantics. |
Comment on lines
1641
to
1650
| let search_permit_futures = searcher_context | ||
| .search_permit_provider | ||
| .get_permits_with_offload(permit_sizes, offload_threshold) | ||
| .await; | ||
|
|
||
| let splits_to_run_on_lambda: Vec<(SplitIdAndFooterOffsets, SearchRequest)> = | ||
| splits.drain(search_permit_futures.len()..).collect(); | ||
|
|
||
| let permit_requested_at = Instant::now(); | ||
| let splits_to_run_locally: Vec<LocalSearchTask> = splits |
Comment on lines
+788
to
+789
| // Each entry is (leaf response, number of leaf attempts made to obtain it). | ||
| // Metadata-count responses are synthesised locally and contribute 0 attempts. |
Comment on lines
+818
to
+820
| let leaf_num_calls_without_retries: u64 = | ||
| leaf_search_call_including_retries.load(Ordering::Relaxed) as u64; | ||
|
|
Comment on lines
44
to
+56
| leaf_search_requests_total: new_counter_vec( | ||
| "leaf_search_requests_total", | ||
| "Total number of Lambda leaf search invocations.", | ||
| "lambda", | ||
| &[], | ||
| ["status"], | ||
| ["outcome"], | ||
| ), | ||
| leaf_search_duration_seconds: new_histogram_vec( | ||
| "leaf_search_duration_seconds", | ||
| "Duration of Lambda leaf search invocations in seconds.", | ||
| "lambda", | ||
| &[], | ||
| ["status"], | ||
| ["outcome"], |
Comment on lines
109
to
+113
| self.search_request.search_after = Some(previous_last_hit); | ||
| let leaf_search_response: LeafSearchResponse = crate::root::search_partial_hits_phase( | ||
| searcher_context, | ||
| &self.indexes_metas_for_leaf_search, | ||
| &self.search_request, | ||
| &self.split_metadatas[..], | ||
| cluster_client, | ||
| ) | ||
| .await?; | ||
| let (leaf_search_response, _root_resource_stats): (LeafSearchResponse, _) = | ||
| crate::root::search_partial_hits_phase( | ||
| searcher_context, | ||
| &self.indexes_metas_for_leaf_search, |
| aggregation_postcard: None, | ||
| failed_splits: scroll_context.failed_splits, | ||
| num_successful_splits: scroll_context.num_successful_splits, | ||
| resource_stats: None, |
58de558 to
ec50de2
Compare
ec50de2 to
cf8145c
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Describe the proposed changes made in this PR.
How was this PR tested?
Describe how you tested this PR.