JiT Worker in CLI and RPC infrastructure#2109
Conversation
a08d4da to
01fa6ba
Compare
| @@ -0,0 +1,101 @@ | |||
| import { IDbAdapter, IDbClient } from "df/cli/api/dbadapters"; | |||
There was a problem hiding this comment.
To clarify: is this protobuf bridge protocol the same for GCP and CLI implementations?
There was a problem hiding this comment.
Both the CLI and GCP implementations follow the dataform.DbAdapter service defined in jit.proto.
- They both use the same binary "payloads" for the actual data (e.g., ExecuteRequest, ExecuteResponse).
- In CLI all 4 methods are implemented (Execute, ListTables, GetTable, and DeleteTable) and in GCP only Execute as of now (https://source.corp.google.com/piper///depot/google3/cloud/dataform/compilation/jitrunner/worker/engine.cc;rcl=881961593;l=163)
There was a problem hiding this comment.
See my comment below on the issue with "Execute" compatibility
There was a problem hiding this comment.
PTAL on the approach with executeRaw
There was a problem hiding this comment.
This one is waiting for:
- this to be released: feat(bigquery): allow the user to ask for skipping parsing rows when quering googleapis/google-cloud-node#7848
- this to be reviewed, merged and released: fix(bigquery): allow timestamp format overrides in createQueryStream googleapis/google-cloud-node#7905
01fa6ba to
33b9f19
Compare
cli/api/commands/jit/rpc.ts
Outdated
| return new Uint8Array(); | ||
| } | ||
|
|
||
| function mapRowToProto(row: { [key: string]: any }): google.protobuf.IStruct { |
There was a problem hiding this comment.
jit.proto declares that struct contains:
// Rows. For BigQuery, see
// https://docs.cloud.google.com/bigquery/docs/reference/rest/v2/jobs/getQueryResults.
In other words, right now we expect a raw API result of "f,v" JSON struct, not bespoke conversion.
The BigQuery client for Node is strange in respect for this - it forcefully decodes those and removes rows from the original request. We could either:
a) Use googleapis package client instead for JiT
b) Come up with another protocol for encoding rows and implement it both here and in GCP. Let me know on chat if you need code pointers for GCP part.
There was a problem hiding this comment.
PTAL on the approach with executeRaw
There was a problem hiding this comment.
It has exactly the same problem. query() inside rawExecute already removes rows from response and only returns decoded ones as first component in the tuple [rows, _, response]
a05cf58 to
be56edb
Compare
9ca1f45 to
0fe90aa
Compare
cli/api/commands/jit/rpc.ts
Outdated
|
|
||
| async function handleListTables(dbadapter: IDbAdapter, request: Uint8Array): Promise<Uint8Array> { | ||
| const listTablesRequest = dataform.ListTablesRequest.decode(request); | ||
| const targets = await dbadapter.tables(listTablesRequest.database, listTablesRequest.schema); |
There was a problem hiding this comment.
I think the ListTables API could potentially return all needed metadata without the need to request it separately for each table?
There was a problem hiding this comment.
Yes, we can hide it in bigquery adapter. Eventually, we can't avoid api requests per table to bigquery and that's because to fetch metadata there is no suitable api in bigquery client. We may get most of the metadata directly from INFORMATION_SCHEMA.TABLE and INFORMATION_SCHEMA.COLUMNS but we'd miss lastModifiedTime and labels. Labels though can be obtained from INFORMATION_SCHEMA.TABLE_OPTIONS but this is not stightforward, requires parsing and what the most important this is low level information for which we'd couple bigquery implementation knowledge into our code.
c6650a7 to
22d3661
Compare
| options?: { concurrencyLimit?: number; bigqueryClient?: BigQuery } | ||
| ) { | ||
| this.bigQueryCredentials = credentials; | ||
| this.bigqueryClient = options?.bigqueryClient; |
There was a problem hiding this comment.
@kolina like this? Honestly I don't like having the field in class just for tests. Maybe extract logic of creating BigQuery clients outside BigQueryDbAdapter to BigQueryClientProvider - it would be easily mockable and we'd use the benefits of dependency injection?
d512e48 to
38433f2
Compare
* feat: add worker process management for JiT compilation - Introduce base worker and JiT-specific child process logic - Implement RPC bridge for database access during JiT - Add VM scripts and loader for isolated execution - Add unit tests for the RPC mechanism
38433f2 to
e687190
Compare
This change establishes the foundation for executing JiT compilation in an isolated environment. It introduces the worker process management, the RPC bridge for database access during JiT, and the necessary Bazel targets.