Skip to content

Add OpenTelemetry instrumentation#1609

Open
parmesant wants to merge 1 commit intomainfrom
otel/instrument-1775057001447
Open

Add OpenTelemetry instrumentation#1609
parmesant wants to merge 1 commit intomainfrom
otel/instrument-1775057001447

Conversation

@parmesant
Copy link
Copy Markdown
Contributor

@parmesant parmesant commented Apr 1, 2026

Clean compilation with no warnings. Here's a summary of all changes made:

Files Modified

1. Cargo.toml

  • Added dependencies: tracing-opentelemetry = "0.32", tracing-actix-web = "0.7", opentelemetry, opentelemetry_sdk (with rt-tokio), opentelemetry-otlp (with grpc-tonic, http-proto, http-json) — all from the same git rev as existing opentelemetry-proto
  • Added tracing-subscriber feature "registry"
  • Added [patch.crates-io] section to unify opentelemetry and opentelemetry_sdk types across all crates

2. src/telemetry.rs (NEW)

  • init_otel_tracer() -> Option<SdkTracerProvider> — reads OTEL_EXPORTER_OTLP_ENDPOINT env var; if unset returns None (OTel disabled). Supports gRPC, HTTP/protobuf, and HTTP/JSON (default) protocols via OTEL_EXPORTER_OTLP_PROTOCOL. Registers W3C TraceContextPropagator globally.

3. src/lib.rs

  • Added pub mod telemetry;

4. src/main.rs

  • init_logger() now returns Option<SdkTracerProvider> and wires the OTel tracing layer into the subscriber
  • main() captures the provider and calls provider.shutdown() before exit

5. src/handlers/http/modal/mod.rs

  • Replaced actix_web::middleware::Logger::default() with tracing_actix_web::TracingLogger::default() for automatic HTTP request tracing with W3C traceparent propagation

6. src/handlers/http/query.rs — 7 functions instrumented

  • query() — root span with query.sql and query.streaming fields
  • get_counts() — root span
  • handle_count_query() — child span with table field
  • handle_non_streaming_query() — child span
  • handle_streaming_query() — child span
  • into_query() — child span
  • get_records_and_fields() — child span
  • create_streams_for_distributed() — child span with stream_count field + Pattern 1 span propagation into JoinSet::spawn tasks

7. src/query/mod.rs — 4 functions instrumented

  • execute() — child span + Pattern 2 W3C TraceContext propagation across QUERY_RUNTIME (separate Runtime::new() — cross-OS-thread boundary). Injects context before spawn, extracts and sets parent inside the spawned closure.
  • Query::execute() — child span (query.datafusion_execute)
  • CountsRequest::get_bin_density() — child span with stream field
  • get_manifest_list() — child span with stream field

8. src/storage/field_stats.rs — 1 function instrumented

  • get_dataset_stats() — root span

9. src/handlers/http/cluster/mod.rs — 1 function instrumented

  • send_query_request() — child span

Co-authored-by: otex-dev dev@otex.dev

Summary by CodeRabbit

Release Notes

  • New Features
    • Added OpenTelemetry integration enabling distributed tracing of application operations, queries, and HTTP requests.
    • Enhanced tracing with support for multiple OTLP export protocols (gRPC, HTTP protobuf, HTTP JSON) for improved observability and monitoring.
    • Improved request and query logging with native tracing support for better visibility into application behavior and performance.

Clean compilation with no warnings. Here's a summary of all changes made:

## Files Modified

### 1. `Cargo.toml`
- Added dependencies: `tracing-opentelemetry = "0.32"`, `tracing-actix-web = "0.7"`, `opentelemetry`, `opentelemetry_sdk` (with `rt-tokio`), `opentelemetry-otlp` (with `grpc-tonic`, `http-proto`, `http-json`) — all from the same git rev as existing `opentelemetry-proto`
- Added `tracing-subscriber` feature `"registry"`
- Added `[patch.crates-io]` section to unify `opentelemetry` and `opentelemetry_sdk` types across all crates

### 2. `src/telemetry.rs` (NEW)
- `init_otel_tracer() -> Option<SdkTracerProvider>` — reads `OTEL_EXPORTER_OTLP_ENDPOINT` env var; if unset returns `None` (OTel disabled). Supports gRPC, HTTP/protobuf, and HTTP/JSON (default) protocols via `OTEL_EXPORTER_OTLP_PROTOCOL`. Registers W3C `TraceContextPropagator` globally.

### 3. `src/lib.rs`
- Added `pub mod telemetry;`

### 4. `src/main.rs`
- `init_logger()` now returns `Option<SdkTracerProvider>` and wires the OTel tracing layer into the subscriber
- `main()` captures the provider and calls `provider.shutdown()` before exit

### 5. `src/handlers/http/modal/mod.rs`
- Replaced `actix_web::middleware::Logger::default()` with `tracing_actix_web::TracingLogger::default()` for automatic HTTP request tracing with W3C traceparent propagation

### 6. `src/handlers/http/query.rs` — 7 functions instrumented
- **`query()`** — root span with `query.sql` and `query.streaming` fields
- **`get_counts()`** — root span
- **`handle_count_query()`** — child span with `table` field
- **`handle_non_streaming_query()`** — child span
- **`handle_streaming_query()`** — child span
- **`into_query()`** — child span
- **`get_records_and_fields()`** — child span
- **`create_streams_for_distributed()`** — child span with `stream_count` field + Pattern 1 span propagation into `JoinSet::spawn` tasks

### 7. `src/query/mod.rs` — 4 functions instrumented
- **`execute()`** — child span + **Pattern 2 W3C TraceContext propagation** across `QUERY_RUNTIME` (separate `Runtime::new()` — cross-OS-thread boundary). Injects context before spawn, extracts and sets parent inside the spawned closure.
- **`Query::execute()`** — child span (`query.datafusion_execute`)
- **`CountsRequest::get_bin_density()`** — child span with `stream` field
- **`get_manifest_list()`** — child span with `stream` field

### 8. `src/storage/field_stats.rs` — 1 function instrumented
- **`get_dataset_stats()`** — root span

### 9. `src/handlers/http/cluster/mod.rs` — 1 function instrumented
- **`send_query_request()`** — child span

Co-authored-by: otex-dev <dev@otex.dev>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 1, 2026

Walkthrough

Adds OpenTelemetry tracing integration to the application via new dependencies, a new telemetry module, and tracing instrumentation attributes across multiple functions. Implements OpenTelemetry initialization, context propagation across thread boundaries, and replaces HTTP logging middleware with tracing-actix-web.

Changes

Cohort / File(s) Summary
Dependency & Initialization
Cargo.toml, src/main.rs, src/lib.rs, src/telemetry.rs
Added OpenTelemetry and tracing dependencies; introduced new telemetry module export; created init_otel_tracer() function supporting OTLP protocol configuration (gRPC, HTTP protobuf, HTTP JSON); modified init_logger() to return Option<SdkTracerProvider> and integrate tracer setup; added global tracer provider and W3C TraceContext propagator registration.
HTTP Handler Instrumentation
src/handlers/http/modal/mod.rs, src/handlers/http/cluster/mod.rs
Replaced Actix request logging middleware with tracing_actix_web::TracingLogger; added #[tracing::instrument] to send_query_request with field-skipping configuration.
Query Processing Instrumentation
src/handlers/http/query.rs, src/query/mod.rs
Added #[tracing::instrument] attributes to multiple query handlers and executors with structured span fields and skip configurations; introduced RB type alias for simplified return types; implemented explicit OpenTelemetry TraceContext injection/extraction across QUERY_RUNTIME thread boundary; wrapped distributed stream tasks with named spans and instrumentation.
Storage Handler Instrumentation
src/storage/field_stats.rs
Added #[tracing::instrument] to get_dataset_stats with server-kind field and parameter skipping.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application<br/>Startup
    participant Main as main()
    participant Telemetry as telemetry module
    participant OTEL as OpenTelemetry<br/>Provider
    participant Tracing as tracing-subscriber<br/>Registry
    
    Main->>Main: Call init_logger()
    Main->>Telemetry: init_otel_tracer()
    Telemetry->>Telemetry: Check OTEL_EXPORTER_OTLP_ENDPOINT
    Telemetry->>OTEL: Create SpanExporter<br/>(gRPC/HTTP)
    OTEL-->>Telemetry: SpanExporter
    Telemetry->>OTEL: Create SdkTracerProvider<br/>with BatchSpanProcessor
    OTEL-->>Telemetry: SdkTracerProvider
    Telemetry->>OTEL: set_tracer_provider(global)
    Telemetry->>OTEL: set_text_map_propagator(W3C)
    Telemetry-->>Main: Some(provider)
    Main->>Tracing: Register tracer layer
    Tracing-->>Main: OK
    Main->>App: Application ready<br/>with tracing active
Loading
sequenceDiagram
    participant Client as Client
    participant Middleware as HTTP Middleware<br/>(TracingLogger)
    participant Handler as Query Handler<br/>(instrumented)
    participant Executor as Query Executor
    participant OTEL as OTEL<br/>Exporter
    
    Client->>Middleware: HTTP Request
    Middleware->>Middleware: Create root span<br/>for request
    Middleware->>Handler: invoke_handler()
    Handler->>Handler: Create named span
    Handler->>Executor: execute(query)
    Executor->>Executor: Inject span context<br/>into HashMap
    Executor->>Executor: Spawn QUERY_RUNTIME task<br/>with context
    Executor->>Executor: Extract context in task
    Executor->>Executor: Set span parent
    Executor->>OTEL: Instrument future<br/>with span
    OTEL-->>Executor: Span data
    Executor-->>Handler: Result
    Handler-->>Middleware: Response
    Middleware->>OTEL: Export request span
    OTEL-->>Client: HTTP Response
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • nitisht
  • nikhilsinhaparseable

Poem

🐰 Hops through the code with telemetry in sight,
Tracing each span from morning to night,
Context flows freely across every thread,
Observability ribbons, golden and red! ✨📡

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add OpenTelemetry instrumentation' accurately summarizes the main change—adding comprehensive OpenTelemetry tracing integration throughout the codebase.
Description check ✅ Passed The PR description is well-structured and comprehensive, covering all modified files, key changes, and implementation details. However, it lacks the template sections for testing confirmation and documentation statements.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch otel/instrument-1775057001447

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/query/mod.rs (2)

81-98: Consider a more descriptive name for this type alias.

RB is cryptic. A name like QueryResultBatches or RecordBatchResult would be more self-documenting and convey that this represents either collected batches or a streaming result.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/query/mod.rs` around lines 81 - 98, The type alias RB is cryptic; rename
it to a descriptive identifier (e.g., QueryResultBatches or RecordBatchResult)
and update all references accordingly. Locate the type alias declaration "pub
type RB = Either<...>" and change the alias name, then update any usages
(functions, structs, trait impls) that reference RB to the new name (ensure
imports and re-exports are adjusted if RB was pub). Run cargo check/tests to
confirm no remaining references to RB.

520-520: Consider consistent span naming with a common prefix.

Other spans in this file use a query. prefix (e.g., query.execute, query.datafusion_execute). For consistency and easier filtering in trace backends, consider naming these query.get_bin_density and query.get_manifest_list.

🔧 Suggested naming for consistency
-#[tracing::instrument(name = "get_bin_density", skip_all, fields(stream = %self.stream))]
+#[tracing::instrument(name = "query.get_bin_density", skip_all, fields(stream = %self.stream))]
 pub async fn get_bin_density(
-#[tracing::instrument(name = "get_manifest_list", skip(time_range, tenant_id), fields(stream = %stream_name))]
+#[tracing::instrument(name = "query.get_manifest_list", skip(time_range, tenant_id), fields(stream = %stream_name))]
 pub async fn get_manifest_list(

Also applies to: 726-726

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/query/mod.rs` at line 520, The tracing span names lack the common
"query." prefix; update the tracing::instrument attributes for the functions
like get_bin_density and get_manifest_list to use names such as
"query.get_bin_density" and "query.get_manifest_list" (i.e., change
#[tracing::instrument(name = "get_bin_density", ...)] to
#[tracing::instrument(name = "query.get_bin_density", ...)] and similarly for
get_manifest_list) so span names match the existing query.* convention for
consistent filtering.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/handlers/http/query.rs`:
- Line 120: The tracing span currently emits the full SQL via the attribute on
the "query" handler (the tracing::instrument on the function using
query_request), which can leak sensitive data; remove the raw query text from
the span fields and instead record a non-sensitive identifier such as a
hash/fingerprint or metadata (e.g., query_hash, param_count, or a boolean for
streaming) derived from query_request.query; update the tracing attribute and
any span field population to use that safe field (compute the hash/fingerprint
in the handler using query_request.query and expose only that value in the span,
or omit the SQL entirely and keep query.streaming if needed).

In `@src/telemetry.rs`:
- Around line 34-35: The OTEL initialization currently only checks for
OTEL_EXPORTER_OTLP_ENDPOINT so a signal-specific env var like
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is ignored; update the gate in the OTEL init
path (where OTEL is enabled/disabled, e.g., in the function that reads env vars
to initialize OTEL) to proceed if either OTEL_EXPORTER_OTLP_ENDPOINT or
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is set, and ensure later logic still prefers
the signal-specific OTLP_TRACES endpoint when constructing the traces exporter
URL; change the boolean check that references only OTEL_EXPORTER_OTLP_ENDPOINT
to an existence check for either env var and preserve existing resolution logic
for trace endpoint selection.

---

Nitpick comments:
In `@src/query/mod.rs`:
- Around line 81-98: The type alias RB is cryptic; rename it to a descriptive
identifier (e.g., QueryResultBatches or RecordBatchResult) and update all
references accordingly. Locate the type alias declaration "pub type RB =
Either<...>" and change the alias name, then update any usages (functions,
structs, trait impls) that reference RB to the new name (ensure imports and
re-exports are adjusted if RB was pub). Run cargo check/tests to confirm no
remaining references to RB.
- Line 520: The tracing span names lack the common "query." prefix; update the
tracing::instrument attributes for the functions like get_bin_density and
get_manifest_list to use names such as "query.get_bin_density" and
"query.get_manifest_list" (i.e., change #[tracing::instrument(name =
"get_bin_density", ...)] to #[tracing::instrument(name =
"query.get_bin_density", ...)] and similarly for get_manifest_list) so span
names match the existing query.* convention for consistent filtering.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7d2eb9b3-0884-4b11-9ba6-9582033fe14f

📥 Commits

Reviewing files that changed from the base of the PR and between 187a4e6 and 5a2b128.

📒 Files selected for processing (9)
  • Cargo.toml
  • src/handlers/http/cluster/mod.rs
  • src/handlers/http/modal/mod.rs
  • src/handlers/http/query.rs
  • src/lib.rs
  • src/main.rs
  • src/query/mod.rs
  • src/storage/field_stats.rs
  • src/telemetry.rs

Ok((Some(records), Some(fields)))
}

#[tracing::instrument(name = "query", skip(req, query_request), fields(otel.kind = "server", query.sql = %query_request.query, query.streaming = query_request.streaming))]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not emit raw SQL text into request spans.

Line 120 records full query text (query.sql), which can leak sensitive literals/identifiers into logs and OTLP backends.

🔒 Safer span fields
-#[tracing::instrument(name = "query", skip(req, query_request), fields(otel.kind = "server", query.sql = %query_request.query, query.streaming = query_request.streaming))]
+#[tracing::instrument(
+    name = "query",
+    skip(req, query_request),
+    fields(
+        otel.kind = "server",
+        query.streaming = query_request.streaming,
+        query.length = query_request.query.len()
+    )
+)]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#[tracing::instrument(name = "query", skip(req, query_request), fields(otel.kind = "server", query.sql = %query_request.query, query.streaming = query_request.streaming))]
#[tracing::instrument(
name = "query",
skip(req, query_request),
fields(
otel.kind = "server",
query.streaming = query_request.streaming,
query.length = query_request.query.len()
)
)]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/handlers/http/query.rs` at line 120, The tracing span currently emits the
full SQL via the attribute on the "query" handler (the tracing::instrument on
the function using query_request), which can leak sensitive data; remove the raw
query text from the span fields and instead record a non-sensitive identifier
such as a hash/fingerprint or metadata (e.g., query_hash, param_count, or a
boolean for streaming) derived from query_request.query; update the tracing
attribute and any span field population to use that safe field (compute the
hash/fingerprint in the handler using query_request.query and expose only that
value in the span, or omit the SQL entirely and keep query.streaming if needed).

Comment on lines +34 to +35
/// Set a signal-specific var `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` to
/// supply a full URL without any path suffix being added.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify docs vs implementation gate in src/telemetry.rs
rg -n 'OTEL_EXPORTER_OTLP_TRACES_ENDPOINT|OTEL_EXPORTER_OTLP_ENDPOINT' src/telemetry.rs -C2
# Expected: mention of OTEL_EXPORTER_OTLP_TRACES_ENDPOINT in docs and
# gate using only OTEL_EXPORTER_OTLP_ENDPOINT in code.

Repository: parseablehq/parseable

Length of output: 1395


OTEL is disabled when only the signal-specific endpoint is configured.

Line 34–35 documents that OTEL_EXPORTER_OTLP_TRACES_ENDPOINT can be set to supply a full URL without path suffix. However, line 51 gates OTEL initialization on OTEL_EXPORTER_OTLP_ENDPOINT alone, making the documented trace-specific endpoint non-functional. Users setting only OTEL_EXPORTER_OTLP_TRACES_ENDPOINT will have OTEL disabled.

Per OpenTelemetry spec, signal-specific endpoints should override the general endpoint; the gate must check for either endpoint being set.

Proposed fix
-    std::env::var("OTEL_EXPORTER_OTLP_ENDPOINT").ok()?;
+    let has_endpoint = std::env::var_os("OTEL_EXPORTER_OTLP_ENDPOINT").is_some();
+    let has_traces_endpoint = std::env::var_os("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT").is_some();
+    if !has_endpoint && !has_traces_endpoint {
+        return None;
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/telemetry.rs` around lines 34 - 35, The OTEL initialization currently
only checks for OTEL_EXPORTER_OTLP_ENDPOINT so a signal-specific env var like
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is ignored; update the gate in the OTEL init
path (where OTEL is enabled/disabled, e.g., in the function that reads env vars
to initialize OTEL) to proceed if either OTEL_EXPORTER_OTLP_ENDPOINT or
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is set, and ensure later logic still prefers
the signal-specific OTLP_TRACES endpoint when constructing the traces exporter
URL; change the boolean check that references only OTEL_EXPORTER_OTLP_ENDPOINT
to an existence check for either env var and preserve existing resolution logic
for trace endpoint selection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant