[PECOBLR-1381] Implement telemetry Phase 6: Metric collection & aggregation#318
Closed
samikshya-db wants to merge 3 commits intoPECOBLR-1143-telemetry-phase4-5from
Closed
[PECOBLR-1381] Implement telemetry Phase 6: Metric collection & aggregation#318samikshya-db wants to merge 3 commits intoPECOBLR-1143-telemetry-phase4-5from
samikshya-db wants to merge 3 commits intoPECOBLR-1143-telemetry-phase4-5from
Conversation
…gation This commit implements Phase 6 (metric collection and aggregation) for the telemetry system. Phase 6: Metric Collection & Aggregation - Implement error classification (errors.go) - isTerminalError() for identifying non-retryable errors - classifyError() for categorizing errors for telemetry - HTTP error handling utilities - Implement telemetry interceptor (interceptor.go) - beforeExecute() / afterExecute() hooks for statement execution - Context-based metric tracking with metricContext - Latency measurement and tag collection - Connection event recording - Error swallowing with panic recovery - Implement metrics aggregator (aggregator.go) - Statement-level metric aggregation - Batch size and flush interval logic - Background flush goroutine with ticker - Thread-safe metric recording with mutex protection - Immediate flush for connection and terminal errors - Aggregated counts (chunks, bytes, polls) - Update telemetryClient (client.go) - Wire up aggregator with exporter - Automatic aggregator start in constructor - Graceful shutdown with 5s timeout - getInterceptor() for per-connection interceptors Architecture: - Each connection gets its own interceptor instance - All interceptors share the same aggregator (per host) - Aggregator batches metrics and flushes periodically - Exporter sends batched metrics to Databricks - Circuit breaker protects against endpoint failures Testing: - All 70+ existing tests continue to pass - Compilation verified, no breaking changes Note: Phase 7 (driver integration) will be completed separately to allow careful review and testing of hooks in connection.go and statement.go. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit implements Phase 7 (driver integration) for the telemetry system, completing the full telemetry pipeline from driver operations to export. Phase 7: Driver Integration - Add telemetry configuration to UserConfig - EnableTelemetry: User opt-in flag (respects server feature flags) - ForceEnableTelemetry: Force enable flag (bypasses server checks) - DSN parameter parsing in ParseDSN() - DeepCopy support for telemetry fields - Add telemetry support to connection - Add telemetry field to conn struct (*telemetry.Interceptor) - Initialize telemetry in connector.Connect() - Release telemetry resources in conn.Close() - Graceful shutdown with pending metric flush - Export telemetry types for driver use - Export Interceptor type (was interceptor) - Export GetInterceptor() method (was getInterceptor) - Export Close() method (was close) - Create driver integration helper (driver_integration.go) - InitializeForConnection(): One-stop initialization - ReleaseForConnection(): Resource cleanup - Encapsulates feature flag checks and client management - Reference counting for per-host resources Integration Flow: 1. User sets enableTelemetry=true or forceEnableTelemetry=true in DSN 2. connector.Connect() calls telemetry.InitializeForConnection() 3. Telemetry checks feature flags and returns Interceptor if enabled 4. Connection uses Interceptor for metric collection (Phase 8) 5. conn.Close() releases telemetry resources Architecture: - Per-connection: Interceptor instance - Per-host (shared): telemetryClient, aggregator, exporter - Global (singleton): clientManager, featureFlagCache, circuitBreakerManager Opt-In Priority (5 levels): 1. forceEnableTelemetry=true - Always enabled (testing/internal) 2. enableTelemetry=false - Always disabled (explicit opt-out) 3. enableTelemetry=true + server flag - User opt-in with server control 4. Server flag only - Default Databricks-controlled behavior 5. Default - Disabled (fail-safe) Testing: - All 70+ telemetry tests passing - No breaking changes to existing driver tests - Compilation verified across all packages - Graceful handling when telemetry disabled Note: Statement hooks (beforeExecute/afterExecute) will be added in follow-up for actual metric collection during query execution. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Collaborator
Author
|
Recreating with git stack for proper stacked PR management |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This stacked PR builds on #317 and implements Phase 6 (Metric Collection & Aggregation) and Phase 7 (Driver Integration) of the telemetry system. This completes the full telemetry pipeline from driver operations through aggregation to export.
Stack:
✅ Phase 6: Metric Collection & Aggregation (PECOBLR-1381)
New Components
1. Error Classification (
errors.go- 108 lines)isTerminalError()- Identifies non-retryable errorsclassifyError()- Categorizes errors for telemetry2. Telemetry Interceptor (
interceptor.go- 146 lines)BeforeExecute()/AfterExecute()hooks (exported)AddTag())3. Metrics Aggregator (
aggregator.go- 242 lines)4. Client Integration (
client.go- updated)GetInterceptor()for driver use✅ Phase 7: Driver Integration (PECOBLR-1382)
Configuration Support
UserConfig Extensions (
internal/config/config.go)EnableTelemetryfield (user opt-in, respects server)ForceEnableTelemetryfield (bypass server checks)enableTelemetry=true/false)forceEnableTelemetry=true)DeepCopy()support for new fieldsConnection Integration (
connection.go,connector.go)telemetry *telemetry.Interceptorfield toconnstructconnector.Connect()conn.Close()with resource releaseDriver Integration Helper (
driver_integration.go- 59 lines)InitializeForConnection()- One-stop initializationReleaseForConnection()- Resource cleanupType Exports
Interceptortype (wasinterceptor)GetInterceptor()method (wasgetInterceptor)Close()method (wasclose)📊 Integration Flow
Opt-In Priority (5 levels):
forceEnableTelemetry=true→ Always enabled (testing/internal)enableTelemetry=false→ Always disabled (explicit opt-out)enableTelemetry=true+ server flag → User opt-in with server control📈 Changes Summary
Phase 6 Files:
telemetry/errors.go(108 lines) - NEWtelemetry/interceptor.go(146 lines) - NEWtelemetry/aggregator.go(242 lines) - NEWtelemetry/client.go(+27/-9 lines) - MODIFIEDPhase 7 Files:
telemetry/driver_integration.go(59 lines) - NEWtelemetry/interceptor.go(exports) - MODIFIEDtelemetry/client.go(exports) - MODIFIEDinternal/config/config.go(+18 lines) - MODIFIEDconnection.go(+10 lines) - MODIFIEDconnector.go(+10 lines) - MODIFIEDtelemetry/DESIGN.md(marked Phase 6-7 complete) - MODIFIEDTotal: +1,073 insertions, -48 deletions across 13 files
✅ Testing Status
All tests passing ✅
Integration verified:
🎯 Completion Status
✅ Completed (Phases 1-7):
🔄 Optional Enhancements (Future):
🚀 Usage Example
Related Issues
Checklist