| 1 | +# Telemetry |
| 2 | + |
| 3 | +The driver emits anonymous usage and performance metrics to Databricks to help track driver |
| 4 | +adoption, identify performance regressions, and prioritize fixes. Telemetry is **enabled by |
| 5 | +default** and is additionally gated by a per-workspace server-side feature flag, so events are |
| 6 | +only exported when the workspace has telemetry turned on. No SQL text, parameter values, row |
| 7 | +data, table/column names, credentials, or IP addresses are ever collected. |
| 8 | + |
| 9 | +## What's collected |
| 10 | + |
| 11 | +Events are batched per host and exported to the Databricks control plane over HTTPS using the |
| 12 | +same auth as your queries. |
| 13 | + |
| 14 | +- **Connection** (`connection.open`): driver name and version, Node.js version, OS platform |
| 15 | + and version, boolean feature toggles (CloudFetch, LZ4, Arrow, direct results), and numeric |
| 16 | + configs (socket timeout, retry max, CloudFetch concurrency). |
| 17 | +- **Statement** (`statement.start` / `statement.complete`): randomly generated statement and |
| 18 | + session UUIDs, operation type (e.g. `SELECT`), latency, result format, poll count, chunk |
| 19 | + count, bytes downloaded. |
| 20 | +- **CloudFetch chunk** (`cloudfetch.chunk`): chunk index, download latency, byte size, |
| 21 | + compressed flag. |
| 22 | +- **Error**: error class name, sanitized message (no PII), HTTP status, terminal-vs-retryable |
| 23 | + flag. Stack traces are not transmitted. |
| 24 | + |
| 25 | +Correlation IDs (session ID, statement ID) are random UUIDs and are not tied to user identity. |
| 26 | +Workspace ID is included for aggregation. |
| 27 | + |
| 28 | +## Configuration |
| 29 | + |
| 30 | +Options are passed to `new DBSQLClient({...})` (and can be overridden per `connect()` call). |
| 31 | +See the JSDoc on `IDBSQLClientConnectionOptions` in |
| 32 | +[`lib/contracts/IDBSQLClient.ts`](../lib/contracts/IDBSQLClient.ts) for the authoritative |
| 33 | +defaults and full descriptions. |
| 34 | + |
| 35 | +| Option | Purpose | |
| 36 | +| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ | |
| 37 | +| `telemetryEnabled` | Master switch. `false` is a hard opt-out; `true` requests telemetry (still subject to the server flag). | |
| 38 | +| `telemetryAuthenticatedExport` | When `true`, exports go to the authenticated `/telemetry-ext` endpoint with full event context. When `false`, only error names go to the unauthenticated endpoint. | |
| 39 | +| `telemetryBatchSize` | Events accumulated before a flush. | |
| 40 | +| `telemetryFlushIntervalMs` | Periodic flush interval. | |
| 41 | +| `telemetryMaxRetries` | Retries per failed export. | |
| 42 | +| `telemetryCircuitBreakerThreshold` | Consecutive failures before the per-host breaker opens. | |
| 43 | +| `telemetryCircuitBreakerTimeout` | How long the breaker stays open before re-probing. | |
| 44 | +| `telemetryCloseTimeoutMs` | Upper bound on the final flush during `client.close()`. | |
| 45 | + |
| 46 | +### Basic example |
| 47 | + |
| 48 | +```javascript |
| 49 | +const { DBSQLClient } = require('@databricks/sql'); |
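| 50 | +// Run inside an async function; CommonJS modules have no top-level await. |
| 51 | +const client = new DBSQLClient(); // telemetry is on by default, no options required |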
| 52 | +await client.connect({ |
| 53 | + host: '********.databricks.com', |
| 54 | + path: '/sql/2.0/warehouses/****************', |
| 55 | + token: 'dapi********************************', |
| 56 | +}); |
| 57 | +``` |
| 58 | + |
| 59 | +### Disabling telemetry |
| 60 | + |
| 61 | +```javascript |
| 62 | +const client = new DBSQLClient({ telemetryEnabled: false }); |
| 63 | +``` |
| 64 | + |
| 65 | +## Opt-out |
| 66 | + |
| 67 | +Three independent ways to disable, in order of precedence (first match wins): |
| 68 | + |
| 69 | +1. **Environment variable**: `DATABRICKS_TELEMETRY_DISABLED` set to `1`, `true`, `yes`, or |
| 70 | + `on` (case-insensitive) disables telemetry process-wide regardless of any other setting. |
| 71 | +2. **Programmatic**: `telemetryEnabled: false` in `DBSQLClient` or `connect()` options is a |
| 72 | + hard opt-out for that client. |
| 73 | +3. **Server feature flag**: If the workspace's server-side flag is off, no events are exported |
| 74 | + even when the client requests them. |
| 75 | + |
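The precedence rules above can be sketched as a small resolver. This is a hypothetical illustration, not the driver's actual code: `isDisabledByEnv`, `resolveTelemetry`, the `serverFlagOn` parameter, and the `reason` strings are names made up for this example.

```javascript
// Hypothetical sketch of the documented opt-out precedence (first match wins).
const DISABLING_VALUES = new Set(['1', 'true', 'yes', 'on']);

// True when DATABRICKS_TELEMETRY_DISABLED holds one of the documented values,
// compared case-insensitively.
function isDisabledByEnv(env = process.env) {
  const raw = env.DATABRICKS_TELEMETRY_DISABLED;
  return typeof raw === 'string' && DISABLING_VALUES.has(raw.toLowerCase());
}

// Returns { enabled, reason }. `serverFlagOn` stands in for the per-workspace
// server-side feature flag; how the driver fetches it is not shown here.
function resolveTelemetry({ env = process.env, telemetryEnabled, serverFlagOn }) {
  if (isDisabledByEnv(env)) return { enabled: false, reason: 'env' };
  if (telemetryEnabled === false) return { enabled: false, reason: 'option' };
  if (!serverFlagOn) return { enabled: false, reason: 'server-flag' };
  return { enabled: true, reason: 'default' };
}
```

With `DATABRICKS_TELEMETRY_DISABLED=YES` in the environment, this sketch reports telemetry as disabled even if `telemetryEnabled: true` is passed and the server flag is on.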
| 76 | +## Multi-tenant / SaaS warning |
| 77 | + |
| 78 | +The driver maintains a singleton telemetry client per host (shared across all `DBSQLClient` |
| 79 | +instances pointing at the same workspace) to batch events and avoid rate limits. In a |
| 80 | +multi-tenant process where multiple tenants connect to the same host with different |
| 81 | +credentials, events buffered for tenant A may be flushed using whichever connection happens to |
| 82 | +own the authenticated export at the time. As a result, tenant A's telemetry payload could be |
| 83 | +sent under tenant B's auth headers. |
| 84 | + |
| 85 | +If you run a multi-tenant SaaS that proxies queries from distinct end-customers through one |
| 86 | +Node process to the same Databricks host, set `telemetryEnabled: false` (or |
| 87 | +`telemetryAuthenticatedExport: false`) to prevent cross-tenant attribution in telemetry. |
| 88 | + |
| 89 | +## Troubleshooting |
| 90 | + |
| 91 | +- **No events visible**: confirm `telemetryEnabled` is not `false`, `DATABRICKS_TELEMETRY_DISABLED` |
| 92 | + is unset, and the workspace feature flag is on. Look for the debug log |
| 93 | + `Telemetry disabled via feature flag`. |
| 94 | +- **Events suddenly stop**: the per-host circuit breaker has likely opened after repeated |
| 95 | + export failures. Look for `Circuit breaker transitioned to OPEN`; it re-probes automatically |
| 96 | + after `telemetryCircuitBreakerTimeout` (default 60s). |
| 97 | +- **Buffer pressure / dropped metrics**: check `client.getTelemetryStats().droppedMetrics`. If |
| 98 | + it climbs, increase `telemetryMaxPendingMetrics` or lower `telemetryFlushIntervalMs`. |
| 99 | +- **Shutdown delay**: `client.close()` waits up to `telemetryCloseTimeoutMs` (default 2s) for |
| 100 | + the final flush. Lower it if shutdown latency matters more than the last batch. |
| 101 | +- **Telemetry failures impacting the app**: they shouldn't. Exceptions are caught and logged |
| 102 | + at debug only; the driver continues regardless. File an issue if you see otherwise. |
| 103 | + |
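To make the "events suddenly stop" behaviour easier to reason about, here is a minimal circuit breaker sketch. It assumes a threshold of consecutive failures and an open timeout in milliseconds; the class, its method names, and the injectable `now` clock are hypothetical and do not reflect the driver's real state machine (see the design spec for that).

```javascript
// Hypothetical per-host circuit breaker; names and structure are assumptions.
class CircuitBreaker {
  constructor({ threshold, timeoutMs, now = Date.now }) {
    this.threshold = threshold; // consecutive failures before opening
    this.timeoutMs = timeoutMs; // how long to stay open before re-probing
    this.now = now;
    this.failures = 0;
    this.openedAt = null; // null means the breaker is closed
  }

  // Exports are allowed while closed, or as a probe once the timeout has elapsed.
  canExport() {
    if (this.openedAt === null) return true;
    return this.now() - this.openedAt >= this.timeoutMs;
  }

  // A successful export (including a probe) closes the breaker.
  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  // A failure at or past the threshold (re)opens the breaker and restarts the timeout.
  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }
}
```

This simplified version lets every caller through once the timeout elapses; a fuller implementation would typically admit a single probe at a time.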
| 104 | +## FAQ |
| 105 | + |
| 106 | +**Does telemetry affect query performance?** Event emission is non-blocking and exports are |
| 107 | +batched on a background timer. Overhead is well under 1% of query time in typical workloads. |
| 108 | + |
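As a rough illustration of non-blocking emission with batched export, the sketch below buffers events in memory, exports at most one batch at a time, and counts drops when the buffer is full. `TelemetryBuffer`, `exportBatch`, and `maxPending` are hypothetical names chosen for this example; the driver's internals may differ.

```javascript
// Hypothetical bounded, batching buffer; not the driver's implementation.
class TelemetryBuffer {
  constructor({ batchSize, maxPending, exportBatch }) {
    this.batchSize = batchSize;
    this.maxPending = maxPending;
    this.exportBatch = exportBatch; // async (events) => void, supplied by the caller
    this.pending = [];
    this.inFlight = null;
    this.droppedMetrics = 0;
  }

  // Synchronous: the caller never waits on the network.
  emit(event) {
    if (this.pending.length >= this.maxPending) {
      this.droppedMetrics += 1; // buffer full: drop rather than block
      return;
    }
    this.pending.push(event);
    if (this.pending.length >= this.batchSize) this.flush();
  }

  // Exports one batch at a time; export errors are swallowed.
  flush() {
    if (this.inFlight) return this.inFlight;
    if (this.pending.length === 0) return Promise.resolve();
    const batch = this.pending.splice(0, this.batchSize);
    this.inFlight = Promise.resolve()
      .then(() => this.exportBatch(batch))
      .catch(() => {})
      .then(() => { this.inFlight = null; });
    return this.inFlight;
  }

  // Periodic flush; unref() keeps the timer from holding the process open.
  start(flushIntervalMs) {
    this.timer = setInterval(() => this.flush(), flushIntervalMs);
    this.timer.unref();
  }

  stop() {
    clearInterval(this.timer);
    return this.flush();
  }
}
```

In this sketch a slow or stalled export causes `pending` to fill up and `droppedMetrics` to climb, which mirrors the buffer-pressure symptom described under Troubleshooting.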
| 109 | +**Can I see what's being sent?** Yes. Enable debug-level logging on the driver's logger; |
| 110 | +every export and circuit-breaker transition is logged. |
| 111 | + |
| 112 | +**Where does the data go?** To `/api/2.0/sql/telemetry-ext` (authenticated) or |
| 113 | +`/api/2.0/sql/telemetry-unauth` on the same Databricks host you're connected to. It stays in |
| 114 | +the same regional control plane as your queries. |
| 115 | + |
| 116 | +**Can I route telemetry to my own backend?** Not via configuration. Disable it and instrument |
| 117 | +your application using your own logger/metrics. |
| 118 | + |
| 119 | +**Can I disable telemetry for a single query?** No, the granularity is per-connection. Open a |
| 120 | +separate `DBSQLClient` with `telemetryEnabled: false` for the queries you want excluded. |
| 121 | + |
| 122 | +For implementation details (per-host management, circuit breaker state machine, exception |
| 123 | +handling policy), see [`spec/telemetry-design.md`](../spec/telemetry-design.md). |