Skip to content

JiT Worker in CLI and RPC infrastructure#2109

Open
rafal-hawrylak wants to merge 1 commit intomainfrom
cli_jit_workers
Open

JiT Worker in CLI and RPC infrastructure#2109
rafal-hawrylak wants to merge 1 commit intomainfrom
cli_jit_workers

Conversation

@rafal-hawrylak
Copy link
Copy Markdown
Collaborator

This change establishes the foundation for executing JiT compilation in an isolated environment. It introduces the worker process management, the RPC bridge for database access during JiT, and the necessary Bazel targets.

@rafal-hawrylak rafal-hawrylak marked this pull request as ready for review March 9, 2026 19:59
@rafal-hawrylak rafal-hawrylak requested a review from a team as a code owner March 9, 2026 19:59
@rafal-hawrylak rafal-hawrylak enabled auto-merge (squash) March 9, 2026 19:59
@rafal-hawrylak rafal-hawrylak self-assigned this Mar 9, 2026
@@ -0,0 +1,101 @@
import { IDbAdapter, IDbClient } from "df/cli/api/dbadapters";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify: is this protobuf bridge protocol the same for GCP and CLI implementations?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the CLI and GCP implementations follow the dataform.DbAdapter service defined in jit.proto.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment below on the issue with "Execute" compatibility

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL on the approach with executeRaw

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return new Uint8Array();
}

function mapRowToProto(row: { [key: string]: any }): google.protobuf.IStruct {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jit.proto declares that struct contains:

// Rows. For BigQuery, see
  // https://docs.cloud.google.com/bigquery/docs/reference/rest/v2/jobs/getQueryResults.

In other words, right now we expect a raw API result of "f,v" JSON struct, not bespoke conversion.

The BigQuery client for Node is strange in respect for this - it forcefully decodes those and removes rows from the original request. We could either:
a) Use googleapis package client instead for JiT
b) Come up with another protocol for encoding rows and implement it both here and in GCP. Let me know on chat if you need code pointers for GCP part.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL on the approach with executeRaw

Copy link
Copy Markdown
Collaborator

@ikholopov-omni ikholopov-omni Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has exactly the same problem. query() inside rawExecute already removes rows from response and only returns decoded ones as first component in the tuple [rows, _, response]

@rafal-hawrylak rafal-hawrylak force-pushed the cli_jit_workers branch 10 times, most recently from a05cf58 to be56edb Compare March 12, 2026 15:50
@rafal-hawrylak rafal-hawrylak force-pushed the cli_jit_workers branch 5 times, most recently from 9ca1f45 to 0fe90aa Compare March 27, 2026 18:00

async function handleListTables(dbadapter: IDbAdapter, request: Uint8Array): Promise<Uint8Array> {
const listTablesRequest = dataform.ListTablesRequest.decode(request);
const targets = await dbadapter.tables(listTablesRequest.database, listTablesRequest.schema);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the ListTables API could potentially return all needed metadata without the need to request it separately for each table?

Copy link
Copy Markdown
Collaborator Author

@rafal-hawrylak rafal-hawrylak Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can hide it in bigquery adapter. Eventually, we can't avoid api requests per table to bigquery and that's because to fetch metadata there is no suitable api in bigquery client. We may get most of the metadata directly from INFORMATION_SCHEMA.TABLE and INFORMATION_SCHEMA.COLUMNS but we'd miss lastModifiedTime and labels. Labels though can be obtained from INFORMATION_SCHEMA.TABLE_OPTIONS but this is not stightforward, requires parsing and what the most important this is low level information for which we'd couple bigquery implementation knowledge into our code.

@rafal-hawrylak rafal-hawrylak force-pushed the cli_jit_workers branch 5 times, most recently from c6650a7 to 22d3661 Compare March 29, 2026 08:40
options?: { concurrencyLimit?: number; bigqueryClient?: BigQuery }
) {
this.bigQueryCredentials = credentials;
this.bigqueryClient = options?.bigqueryClient;
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kolina like this? Honestly I don't like having the field in class just for tests. Maybe extract logic of creating BigQuery clients outside BigQueryDbAdapter to BigQueryClientProvider - it would be easily mockable and we'd use the benefits of dependency injection?

@rafal-hawrylak rafal-hawrylak force-pushed the cli_jit_workers branch 2 times, most recently from d512e48 to 38433f2 Compare March 30, 2026 06:56
* feat: add worker process management for JiT compilation

- Introduce base worker and JiT-specific child process logic

- Implement RPC bridge for database access during JiT

- Add VM scripts and loader for isolated execution

- Add unit tests for the RPC mechanism
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants