
Commit 45563e5

Merge origin/main into sea-abstraction
Resolve conflicts between the SEA backend abstraction and the telemetry/SPOG changes from main while keeping Thrift-specific behavior behind backend adapters.
2 parents a9347e8 + e200a1b commit 45563e5

40 files changed

Lines changed: 4563 additions & 555 deletions

.eslintrc

Lines changed: 2 additions & 1 deletion
@@ -29,7 +29,8 @@
 "files": ["*.test.js", "*.test.ts"],
 "rules": {
 "no-unused-expressions": "off",
-"@typescript-eslint/no-unused-expressions": "off"
+"@typescript-eslint/no-unused-expressions": "off",
+"func-names": "off"
 }
 }
 ]

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -1,5 +1,12 @@
 # Release History

+## 1.15.0
+
+- Add SPOG routing support: parse `?o=<workspaceId>` from `httpPath` and inject `x-databricks-org-id` on Thrift, telemetry, and feature-flag requests. Expose `customHeaders` on `ConnectionOptions` for caller-supplied headers (databricks/databricks-sql-nodejs#391 by @samikshya-db)
+- Telemetry: enable by default with feature-flag-controlled priority, and fix the final flush being dropped on `client.close()` due to a close-ordering bug (databricks/databricks-sql-nodejs#327, #391 by @samikshya-db)
+- Fix Azure AD OAuth: tenant-aware discovery URL and correct scope resource (databricks/databricks-sql-nodejs#363 by @msrathore-db)
+- Fix: use a valid SPDX license identifier in `package.json` (databricks/databricks-sql-nodejs#389 by @sreekanth-db)
+
 ## 1.14.0

 - Add statement-level query tag support (databricks/databricks-sql-nodejs#366 by @sreekanth-db)
@@ -12,6 +19,13 @@
 - Add metric view metadata support (databricks/databricks-sql-nodejs#312 by @shivam2680)
 - Fix: Avoid calling require('lz4') if it's really not required (databricks/databricks-sql-nodejs#316 by @ikkala)
 - Add telemetry foundation (off by default) (databricks/databricks-sql-nodejs#324 by @samikshya-db)
+- Telemetry event emission and per-host aggregation (databricks/databricks-sql-nodejs#327 by @samikshya-db).
+  **Default change:** `telemetryEnabled` now defaults to `true` (gated by a remote feature flag).
+  To opt out programmatically, pass `telemetryEnabled: false` to `connect()`.
+  To disable globally without code changes, set the environment variable
+  `DATABRICKS_TELEMETRY_DISABLED` to one of `1`, `true`, `yes`, or `on`
+  (case-insensitive). Other values (empty, `0`, `false`, etc.) are ignored,
+  and the runtime config takes precedence.

 ## 1.12.0
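
The SPOG entry in the 1.15.0 notes says the driver reads a workspace ID from a `?o=<workspaceId>` query parameter on `httpPath` and sends it as the `x-databricks-org-id` header. A minimal sketch of just that parsing step follows; `orgIdHeaderFromHttpPath` is a hypothetical name, and treating a missing or empty `o` as "no header" is an assumption rather than documented driver behavior.

```javascript
// Hypothetical helper, not the driver's API: derive the SPOG routing header from httpPath.
function orgIdHeaderFromHttpPath(httpPath) {
  // httpPath is path-only (e.g. '/sql/1.0/warehouses/abc?o=42'), so resolve it
  // against a dummy base URL purely to get at its query string.
  const url = new URL(httpPath, 'https://placeholder.invalid');
  const orgId = url.searchParams.get('o');
  // Emit the header only when a non-empty workspace ID is present (assumption).
  return orgId ? { 'x-databricks-org-id': orgId } : {};
}

console.log(orgIdHeaderFromHttpPath('/sql/1.0/warehouses/abc?o=42')); // { 'x-databricks-org-id': '42' }
console.log(orgIdHeaderFromHttpPath('/sql/1.0/warehouses/abc')); // {}
```

The notes do not say how this header is merged with caller-supplied `customHeaders` or which side wins on a conflict, so the sketch stops at producing the header object.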

README.md

Lines changed: 12 additions & 0 deletions
@@ -51,6 +51,18 @@ client
 });
 ```

+## Telemetry
+
+The driver emits connection, statement, and CloudFetch metrics plus
+redacted error events to help Databricks improve driver reliability. No
+SQL text, parameter values, or row data is ever collected. Emission is
+gated by a server-side feature flag and can be disabled per connection
+with `telemetryEnabled: false` or globally with the
+`DATABRICKS_TELEMETRY_DISABLED` env var.
+
+See [docs/TELEMETRY.md](docs/TELEMETRY.md) for the full event payloads,
+tuning knobs, multi-tenant guidance, and troubleshooting.
+
 ## Run Tests

 ### Unit tests

docs/TELEMETRY.md

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
# Telemetry

The driver emits anonymous usage and performance metrics to Databricks to help track driver
adoption, identify performance regressions, and prioritize fixes. Telemetry is **enabled by
default** and is additionally gated by a per-workspace server-side feature flag, so events are
only exported when the workspace has telemetry turned on. No SQL text, parameter values, row
data, table/column names, credentials, or IP addresses are ever collected.

## What's collected

Events are batched per host and exported to the Databricks control plane over HTTPS.
Authenticated exports use the same credentials as your queries; see
`telemetryAuthenticatedExport` under Configuration.

- **Connection** (`connection.open`): driver version and name, Node.js version, OS platform and
  version, and boolean feature toggles (CloudFetch, LZ4, Arrow, direct results) plus numeric
  configs (socket timeout, retry max, CloudFetch concurrency).
- **Statement** (`statement.start` / `statement.complete`): randomly generated statement and
  session UUIDs, operation type (e.g. `SELECT`), latency, result format, poll count, chunk
  count, and bytes downloaded.
- **CloudFetch chunk** (`cloudfetch.chunk`): chunk index, download latency, byte size, and
  compressed flag.
- **Error**: error class name, sanitized message (no PII), HTTP status, and a
  terminal-vs-retryable flag. Stack traces are not transmitted.

Correlation IDs (session ID, statement ID) are random UUIDs and are not tied to user identity.
Workspace ID is included for aggregation.

## Configuration

Options are passed to `new DBSQLClient({...})` and can be overridden per `connect()` call.
See the JSDoc on `IDBSQLClientConnectionOptions` in
[`lib/contracts/IDBSQLClient.ts`](../lib/contracts/IDBSQLClient.ts) for the authoritative
defaults and full descriptions.

| Option | Purpose |
| --- | --- |
| `telemetryEnabled` | Master switch. `false` is a hard opt-out; `true` requests telemetry (still subject to the server flag). |
| `telemetryAuthenticatedExport` | When `true`, exports go to the authenticated `/telemetry-ext` endpoint with full event context. When `false`, only error names go to the unauthenticated `/telemetry-unauth` endpoint. |
| `telemetryBatchSize` | Events accumulated before a flush. |
| `telemetryFlushIntervalMs` | Periodic flush interval. |
| `telemetryMaxPendingMetrics` | Cap on buffered, not-yet-exported metrics; metrics beyond it are dropped and counted in `droppedMetrics` (see Troubleshooting). |
| `telemetryMaxRetries` | Retries per failed export. |
| `telemetryCircuitBreakerThreshold` | Consecutive failures before the per-host breaker opens. |
| `telemetryCircuitBreakerTimeout` | How long the breaker stays open before re-probing. |
| `telemetryCloseTimeoutMs` | Upper bound on the final flush during `client.close()`. |
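
The batching knobs above interact roughly like this: events accumulate until `telemetryBatchSize` is reached or the `telemetryFlushIntervalMs` timer fires, a pending cap (`telemetryMaxPendingMetrics`) bounds memory, and `telemetryCloseTimeoutMs` bounds the final flush. The class below is a minimal, hypothetical sketch of that behavior, not the driver's implementation: `TelemetryBuffer`, the `exportBatch` callback, and the drop-when-full policy are assumptions, and retries and the circuit breaker are left out.

```javascript
// Hypothetical sketch, not the driver's implementation: a per-host buffer that
// flushes on batch size or on a periodic timer, and drops events when full.
class TelemetryBuffer {
  constructor({ batchSize, flushIntervalMs, maxPending, exportBatch }) {
    this.batchSize = batchSize; // cf. telemetryBatchSize
    this.maxPending = maxPending; // cf. telemetryMaxPendingMetrics
    this.exportBatch = exportBatch; // caller-supplied (events) => void | Promise<void>
    this.pending = [];
    this.droppedMetrics = 0;
    // Periodic flush (cf. telemetryFlushIntervalMs); unref() so this timer
    // alone never keeps the Node.js process alive.
    this.timer = setInterval(() => this.flush(), flushIntervalMs);
    this.timer.unref();
  }

  add(event) {
    if (this.pending.length >= this.maxPending) {
      this.droppedMetrics += 1; // buffer full: drop rather than block the caller
      return;
    }
    this.pending.push(event);
    if (this.pending.length >= this.batchSize) this.flush();
  }

  flush() {
    if (this.pending.length === 0) return Promise.resolve();
    const batch = this.pending.splice(0, this.pending.length);
    try {
      // Export errors (sync throws or rejections) are swallowed so telemetry
      // never surfaces failures to the application.
      return Promise.resolve(this.exportBatch(batch)).catch(() => {});
    } catch (_err) {
      return Promise.resolve();
    }
  }

  async close(closeTimeoutMs) {
    clearInterval(this.timer);
    // Wait for the final flush, but never longer than closeTimeoutMs
    // (cf. telemetryCloseTimeoutMs).
    let timeoutId;
    const timeout = new Promise((resolve) => {
      timeoutId = setTimeout(resolve, closeTimeoutMs);
    });
    await Promise.race([this.flush(), timeout]);
    clearTimeout(timeoutId);
  }
}
```

Dropping instead of blocking keeps `add()` cheap on the query path, which is consistent with emission being non-blocking; the trade-off is that sustained export slowness shows up as a rising `droppedMetrics` count rather than as added query latency.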
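
### Basic example

```javascript
const { DBSQLClient } = require('@databricks/sql');

async function main() {
  // No telemetry options: telemetry is on by default, subject to the workspace feature flag.
  const client = new DBSQLClient();
  await client.connect({
    host: '********.databricks.com',
    path: '/sql/1.0/warehouses/****************',
    token: 'dapi********************************',
  });
  // ... run queries ...
  // close() performs the final telemetry flush, bounded by telemetryCloseTimeoutMs.
  await client.close();
}

main().catch(console.error);
```

### Disabling telemetry

```javascript
const { DBSQLClient } = require('@databricks/sql');
const client = new DBSQLClient({ telemetryEnabled: false }); // hard opt-out for this client
```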

## Opt-out

Three independent ways to disable telemetry, in order of precedence (first match wins):

1. **Environment variable**: `DATABRICKS_TELEMETRY_DISABLED` set to `1`, `true`, `yes`, or
   `on` (case-insensitive) disables telemetry process-wide regardless of any other setting.
   Any other value is ignored.
2. **Programmatic**: `telemetryEnabled: false` in `DBSQLClient` or `connect()` options is a
   hard opt-out for that client.
3. **Server feature flag**: if the workspace's server-side flag is off, no events are exported
   even when the client requests them.
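
The precedence list above reads naturally as a small decision function. This sketch is an assumption about how the three checks compose, not the driver's code: `isTelemetryActive` and its parameters are hypothetical, the accepted environment values come from the list above and the 1.14.0 changelog entry, and no whitespace trimming is applied to the variable.

```javascript
// Hypothetical sketch of the opt-out precedence described above.
const DISABLING_ENV_VALUES = new Set(['1', 'true', 'yes', 'on']);

function isTelemetryActive({ env = process.env, telemetryEnabled, serverFlagEnabled }) {
  // 1. Environment variable: a recognized value (case-insensitive) wins over everything.
  const raw = env.DATABRICKS_TELEMETRY_DISABLED;
  if (raw !== undefined && DISABLING_ENV_VALUES.has(raw.toLowerCase())) return false;
  // Any other value ('', '0', 'false', ...) is ignored and the checks below decide.

  // 2. Programmatic opt-out: only an explicit false disables; undefined keeps the default (on).
  if (telemetryEnabled === false) return false;

  // 3. Server-side feature flag: export only when the workspace has it turned on.
  return serverFlagEnabled === true;
}
```

For example, `DATABRICKS_TELEMETRY_DISABLED=On` disables telemetry even for a client that passes `telemetryEnabled: true`, while `DATABRICKS_TELEMETRY_DISABLED=0` is ignored and leaves the decision to steps 2 and 3.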

## Multi-tenant / SaaS warning

The driver maintains a singleton telemetry client per host (shared across all `DBSQLClient`
instances pointing at the same workspace) to batch events and avoid rate limits. In a
multi-tenant process where multiple tenants connect to the same host with different
credentials, events buffered for tenant A may be flushed using whichever connection happens to
own the authenticated export at the time, so tenant B's auth headers could carry tenant A's
telemetry payload.

If you run a multi-tenant SaaS that proxies queries from distinct end customers through one
Node process to the same Databricks host, set `telemetryEnabled: false` (or
`telemetryAuthenticatedExport: false`) on every `DBSQLClient` that connects to that host to
prevent cross-tenant attribution in telemetry.
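
The sharing described above can be pictured as a registry keyed by host alone. The following is a hypothetical sketch: `PerHostTelemetryRegistry`, its `acquire`/`release` methods, the `createClient` factory, and the reference counting are assumptions for illustration, not the driver's classes.

```javascript
// Hypothetical sketch: one telemetry client per host, shared by every DBSQLClient
// that connects to that host, regardless of which credentials it uses.
class PerHostTelemetryRegistry {
  constructor(createClient) {
    this.createClient = createClient; // (host) => telemetry client
    this.entries = new Map(); // host -> { client, refCount }
  }

  acquire(host) {
    let entry = this.entries.get(host);
    if (!entry) {
      entry = { client: this.createClient(host), refCount: 0 };
      this.entries.set(host, entry);
    }
    entry.refCount += 1;
    return entry.client;
  }

  release(host) {
    const entry = this.entries.get(host);
    if (!entry) return;
    entry.refCount -= 1;
    // Tear down the shared client only when the last user of this host is gone.
    if (entry.refCount === 0) this.entries.delete(host);
  }
}

const registry = new PerHostTelemetryRegistry((host) => ({ host }));
const forTenantA = registry.acquire('example.cloud.databricks.com');
const forTenantB = registry.acquire('example.cloud.databricks.com');
console.log(forTenantA === forTenantB); // true: both tenants share one telemetry client
```

Because the key is the host and not the credentials, nothing in such a registry can tell tenant A's events from tenant B's, which is why disabling telemetry (or authenticated export) is the recommended mitigation.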

## Troubleshooting

- **No events visible**: confirm `telemetryEnabled` is not `false`, `DATABRICKS_TELEMETRY_DISABLED`
  is unset (or set to a value other than `1`, `true`, `yes`, or `on`), and the workspace feature
  flag is on. Look for the debug log `Telemetry disabled via feature flag`.
- **Events suddenly stop**: the per-host circuit breaker has likely opened after repeated
  export failures. Look for `Circuit breaker transitioned to OPEN`; it re-probes automatically
  after `telemetryCircuitBreakerTimeout` (default 60s).
- **Buffer pressure / dropped metrics**: check `client.getTelemetryStats().droppedMetrics`. If
  it climbs, increase `telemetryMaxPendingMetrics` or lower `telemetryFlushIntervalMs`.
- **Shutdown delay**: `client.close()` waits up to `telemetryCloseTimeoutMs` (default 2s) for
  the final flush. Lower it if shutdown latency matters more than the last batch.
- **Telemetry failures affecting the app**: this should not happen. Telemetry exceptions are
  caught and logged at debug level only, and the driver continues regardless. File an issue if
  you see otherwise.
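
The circuit-breaker behavior referenced above (open after `telemetryCircuitBreakerThreshold` consecutive failures, re-probe after `telemetryCircuitBreakerTimeout`) follows the common closed/open/half-open pattern. The sketch below shows that pattern with an injectable clock so it can be tested; the `HALF_OPEN` state name, the class shape, and the rule that any failure while half-open reopens the breaker are assumptions, so treat `spec/telemetry-design.md` as the source of truth for the driver's actual state machine.

```javascript
// Hypothetical sketch of a per-host circuit breaker for telemetry exports.
class ExportCircuitBreaker {
  constructor({ threshold, timeoutMs, now = () => Date.now() }) {
    this.threshold = threshold; // cf. telemetryCircuitBreakerThreshold
    this.timeoutMs = timeoutMs; // cf. telemetryCircuitBreakerTimeout
    this.now = now; // injectable clock for tests
    this.state = 'CLOSED';
    this.consecutiveFailures = 0;
    this.openedAt = 0;
  }

  // Returns true if an export attempt may be made right now.
  canExport() {
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.timeoutMs) {
      this.state = 'HALF_OPEN'; // timeout elapsed: allow probe exports again
    }
    return this.state !== 'OPEN';
  }

  recordSuccess() {
    this.consecutiveFailures = 0;
    this.state = 'CLOSED';
  }

  recordFailure() {
    this.consecutiveFailures += 1;
    // A failed probe reopens immediately; otherwise open once the threshold is hit.
    if (this.state === 'HALF_OPEN' || this.consecutiveFailures >= this.threshold) {
      this.state = 'OPEN';
      this.openedAt = this.now();
    }
  }
}
```

An exporter built on this would call `canExport()` before each attempt and report the outcome with `recordSuccess()` or `recordFailure()`; while the breaker is open, batches are simply not sent.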

## FAQ

**Does telemetry affect query performance?** Event emission is non-blocking and exports are
batched on a background timer. Overhead is well under 1% of query time in typical workloads.

**Can I see what's being sent?** Yes, enable debug-level logging on the driver's logger.
Every export and circuit-breaker transition is logged.

**Where does the data go?** To `/api/2.0/sql/telemetry-ext` (authenticated) or
`/api/2.0/sql/telemetry-unauth` on the same Databricks host you're connected to. It stays in
the same regional control plane as your queries.
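
The two endpoints correspond to the `telemetryAuthenticatedExport` option. Here is a hypothetical sketch of how an exporter might pick the URL and trim the payload; the function name and the event shape (`{ type, errorName, ... }`) are assumptions, and a real authenticated request would also carry the connection's auth headers.

```javascript
// Hypothetical sketch: choose the telemetry endpoint and payload for a batch.
function buildExportRequest(host, authenticatedExport, events) {
  if (authenticatedExport) {
    // Authenticated endpoint: full event context, sent with the connection's credentials.
    return { url: `https://${host}/api/2.0/sql/telemetry-ext`, authenticated: true, events };
  }
  // Unauthenticated endpoint: keep only error names and drop everything else.
  const errorNames = events
    .filter((event) => event.type === 'error')
    .map((event) => ({ type: 'error', errorName: event.errorName }));
  return { url: `https://${host}/api/2.0/sql/telemetry-unauth`, authenticated: false, events: errorNames };
}
```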

**Can I route telemetry to my own backend?** Not via configuration. Disable it and instrument
your application using your own logger/metrics.

**Can I disable telemetry for a single query?** No, the granularity is per connection. Open a
separate `DBSQLClient` with `telemetryEnabled: false` for the queries you want excluded.

For implementation details (per-host management, circuit breaker state machine, exception
handling policy), see [`spec/telemetry-design.md`](../spec/telemetry-design.md).
