Conversation
Clean compilation with no warnings. Here's a summary of all changes made: ## Files Modified ### 1. `Cargo.toml` - Added dependencies: `tracing-opentelemetry = "0.32"`, `tracing-actix-web = "0.7"`, `opentelemetry`, `opentelemetry_sdk` (with `rt-tokio`), `opentelemetry-otlp` (with `grpc-tonic`, `http-proto`, `http-json`) — all from the same git rev as existing `opentelemetry-proto` - Added `tracing-subscriber` feature `"registry"` - Added `[patch.crates-io]` section to unify `opentelemetry` and `opentelemetry_sdk` types across all crates ### 2. `src/telemetry.rs` (NEW) - `init_otel_tracer() -> Option<SdkTracerProvider>` — reads `OTEL_EXPORTER_OTLP_ENDPOINT` env var; if unset returns `None` (OTel disabled). Supports gRPC, HTTP/protobuf, and HTTP/JSON (default) protocols via `OTEL_EXPORTER_OTLP_PROTOCOL`. Registers W3C `TraceContextPropagator` globally. ### 3. `src/lib.rs` - Added `pub mod telemetry;` ### 4. `src/main.rs` - `init_logger()` now returns `Option<SdkTracerProvider>` and wires the OTel tracing layer into the subscriber - `main()` captures the provider and calls `provider.shutdown()` before exit ### 5. `src/handlers/http/modal/mod.rs` - Replaced `actix_web::middleware::Logger::default()` with `tracing_actix_web::TracingLogger::default()` for automatic HTTP request tracing with W3C traceparent propagation ### 6. `src/handlers/http/query.rs` — 7 functions instrumented - **`query()`** — root span with `query.sql` and `query.streaming` fields - **`get_counts()`** — root span - **`handle_count_query()`** — child span with `table` field - **`handle_non_streaming_query()`** — child span - **`handle_streaming_query()`** — child span - **`into_query()`** — child span - **`get_records_and_fields()`** — child span - **`create_streams_for_distributed()`** — child span with `stream_count` field + Pattern 1 span propagation into `JoinSet::spawn` tasks ### 7. `src/query/mod.rs` — 4 functions instrumented - **`execute()`** — child span + **Pattern 2 W3C TraceContext propagation** across `QUERY_RUNTIME` (separate `Runtime::new()` — cross-OS-thread boundary). Injects context before spawn, extracts and sets parent inside the spawned closure. - **`Query::execute()`** — child span (`query.datafusion_execute`) - **`CountsRequest::get_bin_density()`** — child span with `stream` field - **`get_manifest_list()`** — child span with `stream` field ### 8. `src/storage/field_stats.rs` — 1 function instrumented - **`get_dataset_stats()`** — root span ### 9. `src/handlers/http/cluster/mod.rs` — 1 function instrumented - **`send_query_request()`** — child span Co-authored-by: otex-dev <dev@otex.dev>
WalkthroughAdds OpenTelemetry tracing integration to the application via new dependencies, a new telemetry module, and tracing instrumentation attributes across multiple functions. Implements OpenTelemetry initialization, context propagation across thread boundaries, and replaces HTTP logging middleware with tracing-actix-web. Changes
Sequence Diagram(s)sequenceDiagram
participant App as Application<br/>Startup
participant Main as main()
participant Telemetry as telemetry module
participant OTEL as OpenTelemetry<br/>Provider
participant Tracing as tracing-subscriber<br/>Registry
Main->>Main: Call init_logger()
Main->>Telemetry: init_otel_tracer()
Telemetry->>Telemetry: Check OTEL_EXPORTER_OTLP_ENDPOINT
Telemetry->>OTEL: Create SpanExporter<br/>(gRPC/HTTP)
OTEL-->>Telemetry: SpanExporter
Telemetry->>OTEL: Create SdkTracerProvider<br/>with BatchSpanProcessor
OTEL-->>Telemetry: SdkTracerProvider
Telemetry->>OTEL: set_tracer_provider(global)
Telemetry->>OTEL: set_text_map_propagator(W3C)
Telemetry-->>Main: Some(provider)
Main->>Tracing: Register tracer layer
Tracing-->>Main: OK
Main->>App: Application ready<br/>with tracing active
sequenceDiagram
participant Client as Client
participant Middleware as HTTP Middleware<br/>(TracingLogger)
participant Handler as Query Handler<br/>(instrumented)
participant Executor as Query Executor
participant OTEL as OTEL<br/>Exporter
Client->>Middleware: HTTP Request
Middleware->>Middleware: Create root span<br/>for request
Middleware->>Handler: invoke_handler()
Handler->>Handler: Create named span
Handler->>Executor: execute(query)
Executor->>Executor: Inject span context<br/>into HashMap
Executor->>Executor: Spawn QUERY_RUNTIME task<br/>with context
Executor->>Executor: Extract context in task
Executor->>Executor: Set span parent
Executor->>OTEL: Instrument future<br/>with span
OTEL-->>Executor: Span data
Executor-->>Handler: Result
Handler-->>Middleware: Response
Middleware->>OTEL: Export request span
OTEL-->>Client: HTTP Response
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/query/mod.rs (2)
81-98: Consider a more descriptive name for this type alias.
RBis cryptic. A name likeQueryResultBatchesorRecordBatchResultwould be more self-documenting and convey that this represents either collected batches or a streaming result.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/query/mod.rs` around lines 81 - 98, The type alias RB is cryptic; rename it to a descriptive identifier (e.g., QueryResultBatches or RecordBatchResult) and update all references accordingly. Locate the type alias declaration "pub type RB = Either<...>" and change the alias name, then update any usages (functions, structs, trait impls) that reference RB to the new name (ensure imports and re-exports are adjusted if RB was pub). Run cargo check/tests to confirm no remaining references to RB.
520-520: Consider consistent span naming with a common prefix.Other spans in this file use a
query.prefix (e.g.,query.execute,query.datafusion_execute). For consistency and easier filtering in trace backends, consider naming thesequery.get_bin_densityandquery.get_manifest_list.🔧 Suggested naming for consistency
-#[tracing::instrument(name = "get_bin_density", skip_all, fields(stream = %self.stream))] +#[tracing::instrument(name = "query.get_bin_density", skip_all, fields(stream = %self.stream))] pub async fn get_bin_density(-#[tracing::instrument(name = "get_manifest_list", skip(time_range, tenant_id), fields(stream = %stream_name))] +#[tracing::instrument(name = "query.get_manifest_list", skip(time_range, tenant_id), fields(stream = %stream_name))] pub async fn get_manifest_list(Also applies to: 726-726
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/query/mod.rs` at line 520, The tracing span names lack the common "query." prefix; update the tracing::instrument attributes for the functions like get_bin_density and get_manifest_list to use names such as "query.get_bin_density" and "query.get_manifest_list" (i.e., change #[tracing::instrument(name = "get_bin_density", ...)] to #[tracing::instrument(name = "query.get_bin_density", ...)] and similarly for get_manifest_list) so span names match the existing query.* convention for consistent filtering.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/handlers/http/query.rs`:
- Line 120: The tracing span currently emits the full SQL via the attribute on
the "query" handler (the tracing::instrument on the function using
query_request), which can leak sensitive data; remove the raw query text from
the span fields and instead record a non-sensitive identifier such as a
hash/fingerprint or metadata (e.g., query_hash, param_count, or a boolean for
streaming) derived from query_request.query; update the tracing attribute and
any span field population to use that safe field (compute the hash/fingerprint
in the handler using query_request.query and expose only that value in the span,
or omit the SQL entirely and keep query.streaming if needed).
In `@src/telemetry.rs`:
- Around line 34-35: The OTEL initialization currently only checks for
OTEL_EXPORTER_OTLP_ENDPOINT so a signal-specific env var like
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is ignored; update the gate in the OTEL init
path (where OTEL is enabled/disabled, e.g., in the function that reads env vars
to initialize OTEL) to proceed if either OTEL_EXPORTER_OTLP_ENDPOINT or
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is set, and ensure later logic still prefers
the signal-specific OTLP_TRACES endpoint when constructing the traces exporter
URL; change the boolean check that references only OTEL_EXPORTER_OTLP_ENDPOINT
to an existence check for either env var and preserve existing resolution logic
for trace endpoint selection.
---
Nitpick comments:
In `@src/query/mod.rs`:
- Around line 81-98: The type alias RB is cryptic; rename it to a descriptive
identifier (e.g., QueryResultBatches or RecordBatchResult) and update all
references accordingly. Locate the type alias declaration "pub type RB =
Either<...>" and change the alias name, then update any usages (functions,
structs, trait impls) that reference RB to the new name (ensure imports and
re-exports are adjusted if RB was pub). Run cargo check/tests to confirm no
remaining references to RB.
- Line 520: The tracing span names lack the common "query." prefix; update the
tracing::instrument attributes for the functions like get_bin_density and
get_manifest_list to use names such as "query.get_bin_density" and
"query.get_manifest_list" (i.e., change #[tracing::instrument(name =
"get_bin_density", ...)] to #[tracing::instrument(name =
"query.get_bin_density", ...)] and similarly for get_manifest_list) so span
names match the existing query.* convention for consistent filtering.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 7d2eb9b3-0884-4b11-9ba6-9582033fe14f
📒 Files selected for processing (9)
Cargo.tomlsrc/handlers/http/cluster/mod.rssrc/handlers/http/modal/mod.rssrc/handlers/http/query.rssrc/lib.rssrc/main.rssrc/query/mod.rssrc/storage/field_stats.rssrc/telemetry.rs
| Ok((Some(records), Some(fields))) | ||
| } | ||
|
|
||
| #[tracing::instrument(name = "query", skip(req, query_request), fields(otel.kind = "server", query.sql = %query_request.query, query.streaming = query_request.streaming))] |
There was a problem hiding this comment.
Do not emit raw SQL text into request spans.
Line 120 records full query text (query.sql), which can leak sensitive literals/identifiers into logs and OTLP backends.
🔒 Safer span fields
-#[tracing::instrument(name = "query", skip(req, query_request), fields(otel.kind = "server", query.sql = %query_request.query, query.streaming = query_request.streaming))]
+#[tracing::instrument(
+ name = "query",
+ skip(req, query_request),
+ fields(
+ otel.kind = "server",
+ query.streaming = query_request.streaming,
+ query.length = query_request.query.len()
+ )
+)]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #[tracing::instrument(name = "query", skip(req, query_request), fields(otel.kind = "server", query.sql = %query_request.query, query.streaming = query_request.streaming))] | |
| #[tracing::instrument( | |
| name = "query", | |
| skip(req, query_request), | |
| fields( | |
| otel.kind = "server", | |
| query.streaming = query_request.streaming, | |
| query.length = query_request.query.len() | |
| ) | |
| )] |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/handlers/http/query.rs` at line 120, The tracing span currently emits the
full SQL via the attribute on the "query" handler (the tracing::instrument on
the function using query_request), which can leak sensitive data; remove the raw
query text from the span fields and instead record a non-sensitive identifier
such as a hash/fingerprint or metadata (e.g., query_hash, param_count, or a
boolean for streaming) derived from query_request.query; update the tracing
attribute and any span field population to use that safe field (compute the
hash/fingerprint in the handler using query_request.query and expose only that
value in the span, or omit the SQL entirely and keep query.streaming if needed).
| /// Set a signal-specific var `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` to | ||
| /// supply a full URL without any path suffix being added. |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify docs vs implementation gate in src/telemetry.rs
rg -n 'OTEL_EXPORTER_OTLP_TRACES_ENDPOINT|OTEL_EXPORTER_OTLP_ENDPOINT' src/telemetry.rs -C2
# Expected: mention of OTEL_EXPORTER_OTLP_TRACES_ENDPOINT in docs and
# gate using only OTEL_EXPORTER_OTLP_ENDPOINT in code.Repository: parseablehq/parseable
Length of output: 1395
OTEL is disabled when only the signal-specific endpoint is configured.
Line 34–35 documents that OTEL_EXPORTER_OTLP_TRACES_ENDPOINT can be set to supply a full URL without path suffix. However, line 51 gates OTEL initialization on OTEL_EXPORTER_OTLP_ENDPOINT alone, making the documented trace-specific endpoint non-functional. Users setting only OTEL_EXPORTER_OTLP_TRACES_ENDPOINT will have OTEL disabled.
Per OpenTelemetry spec, signal-specific endpoints should override the general endpoint; the gate must check for either endpoint being set.
Proposed fix
- std::env::var("OTEL_EXPORTER_OTLP_ENDPOINT").ok()?;
+ let has_endpoint = std::env::var_os("OTEL_EXPORTER_OTLP_ENDPOINT").is_some();
+ let has_traces_endpoint = std::env::var_os("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT").is_some();
+ if !has_endpoint && !has_traces_endpoint {
+ return None;
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/telemetry.rs` around lines 34 - 35, The OTEL initialization currently
only checks for OTEL_EXPORTER_OTLP_ENDPOINT so a signal-specific env var like
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is ignored; update the gate in the OTEL init
path (where OTEL is enabled/disabled, e.g., in the function that reads env vars
to initialize OTEL) to proceed if either OTEL_EXPORTER_OTLP_ENDPOINT or
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is set, and ensure later logic still prefers
the signal-specific OTLP_TRACES endpoint when constructing the traces exporter
URL; change the boolean check that references only OTEL_EXPORTER_OTLP_ENDPOINT
to an existence check for either env var and preserve existing resolution logic
for trace endpoint selection.
Clean compilation with no warnings. Here's a summary of all changes made:
Files Modified
1.
Cargo.tomltracing-opentelemetry = "0.32",tracing-actix-web = "0.7",opentelemetry,opentelemetry_sdk(withrt-tokio),opentelemetry-otlp(withgrpc-tonic,http-proto,http-json) — all from the same git rev as existingopentelemetry-prototracing-subscriberfeature"registry"[patch.crates-io]section to unifyopentelemetryandopentelemetry_sdktypes across all crates2.
src/telemetry.rs(NEW)init_otel_tracer() -> Option<SdkTracerProvider>— readsOTEL_EXPORTER_OTLP_ENDPOINTenv var; if unset returnsNone(OTel disabled). Supports gRPC, HTTP/protobuf, and HTTP/JSON (default) protocols viaOTEL_EXPORTER_OTLP_PROTOCOL. Registers W3CTraceContextPropagatorglobally.3.
src/lib.rspub mod telemetry;4.
src/main.rsinit_logger()now returnsOption<SdkTracerProvider>and wires the OTel tracing layer into the subscribermain()captures the provider and callsprovider.shutdown()before exit5.
src/handlers/http/modal/mod.rsactix_web::middleware::Logger::default()withtracing_actix_web::TracingLogger::default()for automatic HTTP request tracing with W3C traceparent propagation6.
src/handlers/http/query.rs— 7 functions instrumentedquery()— root span withquery.sqlandquery.streamingfieldsget_counts()— root spanhandle_count_query()— child span withtablefieldhandle_non_streaming_query()— child spanhandle_streaming_query()— child spaninto_query()— child spanget_records_and_fields()— child spancreate_streams_for_distributed()— child span withstream_countfield + Pattern 1 span propagation intoJoinSet::spawntasks7.
src/query/mod.rs— 4 functions instrumentedexecute()— child span + Pattern 2 W3C TraceContext propagation acrossQUERY_RUNTIME(separateRuntime::new()— cross-OS-thread boundary). Injects context before spawn, extracts and sets parent inside the spawned closure.Query::execute()— child span (query.datafusion_execute)CountsRequest::get_bin_density()— child span withstreamfieldget_manifest_list()— child span withstreamfield8.
src/storage/field_stats.rs— 1 function instrumentedget_dataset_stats()— root span9.
src/handlers/http/cluster/mod.rs— 1 function instrumentedsend_query_request()— child spanCo-authored-by: otex-dev dev@otex.dev
Summary by CodeRabbit
Release Notes