Skip to content

Commit cc97b18

Browse files
committed
fix(sea): F1/F2 fixups — u32 upper-bound + mock test + skip-gate + strict typeId asserts
DA round 1 findings on F1 (max_connections) and F2 (complex types as Arrow), addressed in a single fixup because both touch the F1 NodeJS branch. F1 — M1 upper-bound: the JS-layer validation accepted any positive integer but the napi binding accepts `u32`. A value > 2^32 - 1 would either silently truncate at FFI or throw a cryptic kernel error. Add an explicit upper-bound check with a clear `HiveDriverError` message that points at typical pool sizes. Three new unit tests: accept MAX_U32, reject MAX_U32+1, reject Number.MAX_SAFE_INTEGER. F1 — M2 mock test: the existing unit suite verified the buildSeaConnectionOptions output shape but did not exercise the end-to-end SeaBackend.connect → openSession → napi.openSession round-trip. Add two mock-binding tests via the existing `makeFakeBinding` helper that: 1. Confirm maxConnections is forwarded to the napi openSession call 2. Confirm maxConnections is `undefined` (not 0) when omitted, so the napi binding's `Option<u32>` reads None and the kernel default of 100 applies. F2 — H1 skip-gate: the e2e test imported `getSeaNative` at module top-level, which transitively ran `SeaNativeLoader.ts`'s module-level `require('../../native/sea/index.js')`. When the `.node` artifact isn't built (`yarn build:native` not run), that require throws MODULE_NOT_FOUND BEFORE mocha can invoke `before()`, crashing test discovery for the whole suite. Fix: 1. Verify the native artifact exists in `before()` and skip with a clear "run yarn build:native" message if absent 2. Lazy-require the napi shim via `createRequire(import.meta.url)` inside `loadBinding()` so the require lives outside module-load path. `createRequire` (vs bare `require`) handles the ESM reparse path mocha 11+ takes for ts-node-emitted files (MODULE_TYPELESS_PACKAGE_JSON warning). F2 — M2 strict assertions: prior regex-based assertions (`/List/i`, `/Map|List/i`, etc.) were too permissive — they would accept LargeList or FixedSizeList where the kernel must return plain `List`. Switch to arrow-js `Type` enum comparisons via `.type.typeId`. Adds: - Strict typeId equality for c_arr (List), c_map (Map), c_struct (Struct), c_arr_struct (List) - Nested-structural check that c_arr_struct's child is Struct - Row-count sanity assert (literal SELECT → 1 row) F2 — live e2e: ran against pecotesting (`DATABRICKS_PECOTESTING_*`) with the freshly-built napi binary. All assertions passed in 655ms. The kernel default `complex_types_as_json=false` is verified to produce native Arrow shapes for ARRAY/MAP/STRUCT and nested ARRAY<STRUCT> end-to-end. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
1 parent 3e7852f commit cc97b18

3 files changed

Lines changed: 199 additions & 18 deletions

File tree

lib/sea/SeaAuth.ts

Lines changed: 11 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -173,14 +173,23 @@ export function buildSeaConnectionOptions(options: ConnectionOptions): SeaNative
173173
};
174174
// Connection-level pool size — forwarded to the napi binding,
175175
// ignored when undefined so the kernel default (100) applies.
176-
// Validate at the JS layer (positive integer) so a misuse fails
177-
// here instead of inside the kernel with a cryptic message.
176+
// Validate at the JS layer (positive integer, ≤ 2^32 - 1 since the
177+
// napi binding accepts `u32`) so a misuse fails here with a clear
178+
// `HiveDriverError` instead of either silently truncating at the
179+
// FFI boundary or surfacing a cryptic kernel-side error. Upper-
180+
// bound check addresses DA round-1 M1 finding.
181+
const MAX_U32 = 0xff_ff_ff_ff; // 4_294_967_295
178182
if (options.maxConnections !== undefined) {
179183
if (!Number.isInteger(options.maxConnections) || options.maxConnections < 1) {
180184
throw new HiveDriverError(
181185
`SEA backend: \`maxConnections\` must be a positive integer; got ${options.maxConnections}.`,
182186
);
183187
}
188+
if (options.maxConnections > MAX_U32) {
189+
throw new HiveDriverError(
190+
`SEA backend: \`maxConnections\` exceeds the napi u32 limit (${MAX_U32}); got ${options.maxConnections}. Typical pool sizes are 10-500.`,
191+
);
192+
}
184193
base.maxConnections = options.maxConnections;
185194
}
186195

tests/e2e/sea/complex-types-e2e.test.ts

Lines changed: 99 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -34,8 +34,22 @@
3434
*/
3535

3636
import { expect } from 'chai';
37-
import { tableFromIPC } from 'apache-arrow';
38-
import { getSeaNative } from '../../../lib/sea/SeaNativeLoader';
37+
import { tableFromIPC, Type as ArrowType } from 'apache-arrow';
38+
import * as fs from 'fs';
39+
import * as path from 'path';
40+
import { createRequire } from 'module';
41+
42+
// Prerequisites for the live e2e:
43+
// 1. `DATABRICKS_PECOTESTING_*` env vars set (host, http_path, token)
44+
// 2. `yarn build:native` has produced `native/sea/index.linux-x64-gnu.node`
45+
//
46+
// SeaNativeLoader.ts does a module-level `const native =
47+
// require('../../native/sea/index.js')` which crashes file evaluation
48+
// (MODULE_NOT_FOUND) when the `.node` artifact is absent, BEFORE mocha
49+
// can run the `before()` skip-gate. So we (a) verify both prereqs from
50+
// the suite's `before()` and skip-and-explain, and (b) defer the
51+
// `getSeaNative` import to inside the test bodies via a synchronous
52+
// import-on-demand. DA round-1 H1 fixup.
3953

4054
interface NativeBinding {
4155
openSession(opts: {
@@ -71,11 +85,46 @@ describe('SEA complex types — native Arrow default', function suite() {
7185
if (!hostName || !httpPath || !token) {
7286
// eslint-disable-next-line no-invalid-this
7387
this.skip();
88+
return;
89+
}
90+
// Verify the native artifact exists before any test in the suite
91+
// attempts to import `getSeaNative` from SeaNativeLoader (whose
92+
// module-level require would crash the whole file load on
93+
// MODULE_NOT_FOUND). Skip-with-message so a developer sees the
94+
// actionable instruction instead of a cryptic crash.
95+
//
96+
// Use `process.cwd()` instead of `__dirname` because mocha may
97+
// reparse this file as ESM (MODULE_TYPELESS_PACKAGE_JSON path)
98+
// where `__dirname` is undefined. mocha always runs with cwd at
99+
// the package root, so this resolves consistently.
100+
const nodeArtifact = path.resolve(
101+
process.cwd(),
102+
'native/sea/index.linux-x64-gnu.node',
103+
);
104+
if (!fs.existsSync(nodeArtifact)) {
105+
// eslint-disable-next-line no-console
106+
console.warn(
107+
`[sea complex-types e2e] skipping: native binary not built. ` +
108+
`Run \`yarn build:native\` (or set DATABRICKS_SQL_KERNEL_REPO + yarn build:native) first.`,
109+
);
110+
// eslint-disable-next-line no-invalid-this
111+
this.skip();
74112
}
75113
});
76114

115+
// Build a `require` function from this module's URL so the call
116+
// works under both CJS and ESM reparse paths. mocha 11+ may reparse
117+
// ts-node-emitted files as ESM (MODULE_TYPELESS_PACKAGE_JSON), in
118+
// which case the bare `require` symbol is undefined.
119+
// eslint-disable-next-line @typescript-eslint/naming-convention
120+
const requireFromHere = createRequire(import.meta.url);
121+
122+
function loadBinding(): NativeBinding {
123+
return requireFromHere('../../../native/sea/index.js') as NativeBinding;
124+
}
125+
77126
it('ARRAY / MAP / STRUCT come back as native Arrow shapes', async () => {
78-
const binding = getSeaNative() as unknown as NativeBinding;
127+
const binding = loadBinding();
79128
const connection = await binding.openSession({
80129
hostName: hostName as string,
81130
httpPath: httpPath as string,
@@ -109,22 +158,56 @@ describe('SEA complex types — native Arrow default', function suite() {
109158
expect(structField, 'c_struct field present').to.not.equal(undefined);
110159
expect(arrStructField, 'c_arr_struct field present').to.not.equal(undefined);
111160

112-
// Arrow type ids per arrow-js — these are the structural checks
113-
// that distinguish "native Arrow" from "JSON Utf8". Arrow type
114-
// names are stable across arrow-js minor versions.
115-
expect(arrField!.type.toString()).to.match(/List/i, 'c_arr should be List');
116-
expect(mapField!.type.toString()).to.match(/Map|List/i, 'c_map should be Map (or List of Struct of key/value)');
117-
expect(structField!.type.toString()).to.match(/Struct/i, 'c_struct should be Struct');
118-
expect(arrStructField!.type.toString()).to.match(/List/i, 'c_arr_struct should be List of Struct');
119-
120-
// Sanity-check: NONE of the complex columns should be Utf8 — that
121-
// would indicate complex_types_as_json was inadvertently enabled.
161+
// Strict typeId checks (DA round-1 M2 fixup — prior regex
162+
// assertions matched too permissively, e.g. `/List/i` would
163+
// accept LargeList or FixedSizeList too. Arrow `Type` is a
164+
// numeric enum from arrow-js; comparing typeId is stable
165+
// across arrow-js minor versions and a structural match.
166+
//
167+
// The server returns ARRAY columns as Arrow `List` (typeId 12),
168+
// MAP as Arrow `Map` (typeId 17), STRUCT as Arrow `Struct`
169+
// (typeId 13). Nested ARRAY<STRUCT> is `List` whose child is
170+
// `Struct`.
171+
expect(arrField!.type.typeId).to.equal(
172+
ArrowType.List,
173+
`c_arr typeId should be List(${ArrowType.List}), got ${arrField!.type.typeId} (${arrField!.type})`,
174+
);
175+
expect(mapField!.type.typeId).to.equal(
176+
ArrowType.Map,
177+
`c_map typeId should be Map(${ArrowType.Map}), got ${mapField!.type.typeId} (${mapField!.type})`,
178+
);
179+
expect(structField!.type.typeId).to.equal(
180+
ArrowType.Struct,
181+
`c_struct typeId should be Struct(${ArrowType.Struct}), got ${structField!.type.typeId} (${structField!.type})`,
182+
);
183+
expect(arrStructField!.type.typeId).to.equal(
184+
ArrowType.List,
185+
`c_arr_struct typeId should be List(${ArrowType.List}), got ${arrStructField!.type.typeId} (${arrStructField!.type})`,
186+
);
187+
188+
// Nested-structural check: c_arr_struct should be List<Struct>.
189+
// Drilling into `children[0].type.typeId` catches a regression
190+
// where the kernel might wrap a Struct in something else (e.g.
191+
// FixedSizeList).
192+
const arrStructChildType = arrStructField!.type.children[0].type;
193+
expect(arrStructChildType.typeId).to.equal(
194+
ArrowType.Struct,
195+
`c_arr_struct child typeId should be Struct(${ArrowType.Struct}), got ${arrStructChildType.typeId}`,
196+
);
197+
198+
// Negative assertion: NONE of the complex columns should be Utf8
199+
// — that would indicate `complex_types_as_json` was inadvertently
200+
// enabled on the kernel side. Using typeId here too rather than
201+
// a string regex.
122202
for (const f of [arrField!, mapField!, structField!, arrStructField!]) {
123-
expect(f.type.toString()).to.not.match(
124-
/^Utf8$/,
125-
`${f.name} must not be a JSON string column`,
203+
expect(f.type.typeId).to.not.equal(
204+
ArrowType.Utf8,
205+
`${f.name} must not be a JSON string column (Utf8 typeId)`,
126206
);
127207
}
208+
209+
// Row-count sanity: SELECT-of-literals yields exactly one row.
210+
expect(table.numRows).to.equal(1, 'literal SELECT should yield one row');
128211
} finally {
129212
if (statement !== null) {
130213
try {

tests/unit/sea/auth-max-connections.test.ts

Lines changed: 89 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,8 @@ import { expect } from 'chai';
1616
import { buildSeaConnectionOptions } from '../../../lib/sea/SeaAuth';
1717
import { ConnectionOptions } from '../../../lib/contracts/IDBSQLClient';
1818
import HiveDriverError from '../../../lib/errors/HiveDriverError';
19+
import SeaBackend from '../../../lib/sea/SeaBackend';
20+
import { makeFakeBinding } from './_helpers/fakeBinding';
1921

2022
describe('SeaAuth — maxConnections plumbing', () => {
2123
describe('buildSeaConnectionOptions forwards maxConnections', () => {
@@ -125,5 +127,92 @@ describe('SeaAuth — maxConnections plumbing', () => {
125127
expect(native.authMode).to.equal('OAuthU2m');
126128
expect(native.maxConnections).to.equal(25);
127129
});
130+
131+
// ─── DA round-1 M1 fixup — upper-bound (u32 ceiling) ───────────
132+
133+
it('accepts the maximum u32 value (4_294_967_295)', () => {
134+
const MAX_U32 = 4_294_967_295;
135+
const opts: ConnectionOptions = {
136+
host: 'example.cloud.databricks.com',
137+
path: '/sql/1.0/warehouses/abc',
138+
token: 'dapi-fake-pat',
139+
maxConnections: MAX_U32,
140+
};
141+
142+
const native = buildSeaConnectionOptions(opts);
143+
expect(native.maxConnections).to.equal(MAX_U32);
144+
});
145+
146+
it('rejects values exceeding the napi u32 ceiling', () => {
147+
const opts: ConnectionOptions = {
148+
host: 'example.cloud.databricks.com',
149+
path: '/sql/1.0/warehouses/abc',
150+
token: 'dapi-fake-pat',
151+
maxConnections: 4_294_967_296, // 2^32, one over u32 max
152+
};
153+
154+
expect(() => buildSeaConnectionOptions(opts)).to.throw(
155+
HiveDriverError,
156+
/maxConnections.*exceeds.*u32 limit/,
157+
);
158+
});
159+
160+
it('rejects 2^53 - 1 (safe integer ceiling — would silently truncate at FFI)', () => {
161+
const opts: ConnectionOptions = {
162+
host: 'example.cloud.databricks.com',
163+
path: '/sql/1.0/warehouses/abc',
164+
token: 'dapi-fake-pat',
165+
maxConnections: Number.MAX_SAFE_INTEGER,
166+
};
167+
168+
// The JS layer rejects before the FFI boundary so the napi
169+
// binding never sees a value it can't faithfully represent.
170+
expect(() => buildSeaConnectionOptions(opts)).to.throw(
171+
HiveDriverError,
172+
/maxConnections.*exceeds.*u32 limit/,
173+
);
174+
});
175+
});
176+
177+
// ─── DA round-1 M2 fixup — mock-binding round-trip ──────────────
178+
179+
describe('SeaBackend forwards maxConnections to the napi openSession call', () => {
180+
it('passes the user-supplied maxConnections value through', async () => {
181+
const { binding, calls } = makeFakeBinding();
182+
const backend = new SeaBackend({ nativeBinding: binding });
183+
184+
await backend.connect({
185+
host: 'example.cloud.databricks.com',
186+
path: '/sql/1.0/warehouses/abc',
187+
token: 'dapi-fake-pat',
188+
maxConnections: 250,
189+
});
190+
await backend.openSession({});
191+
192+
const openCall = calls.find((c) => c.method === 'openSession');
193+
expect(openCall, 'openSession must be called on the binding').to.not.equal(undefined);
194+
const opts = openCall!.args[0] as { maxConnections?: number; authMode?: string };
195+
expect(opts.maxConnections).to.equal(250);
196+
expect(opts.authMode).to.equal('Pat');
197+
});
198+
199+
it('omits maxConnections from the openSession call when not supplied', async () => {
200+
const { binding, calls } = makeFakeBinding();
201+
const backend = new SeaBackend({ nativeBinding: binding });
202+
203+
await backend.connect({
204+
host: 'example.cloud.databricks.com',
205+
path: '/sql/1.0/warehouses/abc',
206+
token: 'dapi-fake-pat',
207+
});
208+
await backend.openSession({});
209+
210+
const openCall = calls.find((c) => c.method === 'openSession');
211+
expect(openCall).to.not.equal(undefined);
212+
const opts = openCall!.args[0] as { maxConnections?: number };
213+
// `undefined` (not `0`) — the napi binding's `Option<u32>` reads
214+
// this as None and applies the kernel default of 100.
215+
expect(opts.maxConnections).to.equal(undefined);
216+
});
128217
});
129218
});

0 commit comments

Comments
 (0)