| 1 | +# Add database engine plugins (external engines) |
| 2 | + |
| 3 | +This PR adds support for **database engine plugins**: external processes that implement a single `Parse` RPC and allow sqlc to work with databases that are not built-in (e.g. CockroachDB, TiDB, or custom SQL dialects). The plugin contract is deliberately minimal: no AST, no compiler in the middle, and a straight path from plugin output to codegen. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## Pipeline: built-in engine vs external plugin |
| 8 | + |
| 9 | +### Built-in engine (PostgreSQL, MySQL, SQLite) |
| 10 | + |
| 11 | +```mermaid |
| 12 | +flowchart LR |
| 13 | + subgraph input |
| 14 | + schema[schema.sql] |
| 15 | + queries[queries.sql] |
| 16 | + end |
| 17 | + |
| 18 | + subgraph sqlc_core |
| 19 | + parser[Parser] |
| 20 | + ast[(AST)] |
| 21 | + compiler[Compiler] |
| 22 | + catalog[(Catalog)] |
| 23 | + codegen_input[Queries + types] |
| 24 | + end |
| 25 | +
|
| 26 | + subgraph output |
| 27 | + codegen[Codegen plugin] |
| 28 | + end |
| 29 | +
|
| 30 | + schema --> parser |
| 31 | + queries --> parser |
| 32 | + parser --> ast |
| 33 | + ast --> compiler |
| 34 | + schema --> catalog |
| 35 | + catalog --> compiler |
| 36 | + compiler --> codegen_input |
| 37 | + codegen_input --> codegen |
| 38 | +``` |
| 39 | + |
| 40 | +- Parser produces an **intermediate AST**. |
| 41 | +- **Compiler** resolves types, expands `*`, validates against catalog, produces Queries. |
| 42 | +- Codegen receives already-compiled queries and types. |
| 43 | + |
| 44 | +### External engine plugin |
| 45 | + |
| 46 | +```mermaid |
| 47 | +flowchart LR |
| 48 | + subgraph input |
| 49 | + schema[schema.sql or connection] |
| 50 | + queries[queries.sql] |
| 51 | + end |
| 52 | +
|
| 53 | + subgraph sqlc |
| 54 | + adapter[engine process runner] |
| 55 | + end |
| 56 | +
|
| 57 | + subgraph plugin["Engine plugin (external process)"] |
| 58 | + parse[Parse] |
| 59 | + end |
| 60 | +
|
| 61 | + subgraph plugin_output["Plugin returns"] |
| 62 | + sql[SQL text] |
| 63 | + params[parameters] |
| 64 | + cols[columns] |
| 65 | + end |
| 66 | +
|
| 67 | + subgraph codegen_path["To codegen"] |
| 68 | + codegen_input[SQL + params + columns] |
| 69 | + codegen[Codegen plugin] |
| 70 | + end |
| 71 | +
|
| 72 | + schema --> adapter |
| 73 | + queries --> adapter |
| 74 | + adapter -->|"ParseRequest{sql, schema_sql | connection_params}"| parse |
| 75 | + parse -->|"ParseResponse{sql, parameters, columns}"| plugin_output |
| 76 | + plugin_output --> codegen_input |
| 77 | + codegen_input --> codegen |
| 78 | +``` |
| 79 | + |
| 80 | +- **No intermediate AST**: the plugin returns already “resolved” data (SQL text, parameters, columns). |
| 81 | +- **No compiler** for the plugin path: type resolution, `*` expansion, and validation are the plugin’s job. sqlc does not run the built-in compiler on plugin output. |
| 82 | +- Data from the plugin is passed through to the **codegen plugin** as-is (or after a thin adapter that today still produces a synthetic `[]ast.Statement` for compatibility; the useful payload is `sql` + `parameters` + `columns`). |
| 83 | + |
| 84 | +So: for external engines, the pipeline is effectively **schema + queries → engine plugin (Parse) → (sql, parameters, columns) → codegen**, with no AST and no compiler in between. |
| 85 | + |
| 86 | +### Where the branch is taken (generate only) |
| 87 | + |
| 88 | +The choice between “built-in engine” and “external plugin” is made **once per `sql[]` block**, before any compiler is created for that block. In the current implementation the branch lives in **`internal/cmd/process.go`**: built-in engines go through parse → compiler, while plugin engines go through **`runPluginQuerySet`** in **`plugin_engine_path.go`** (engine process runner, no compiler). Vet has no plugin-specific logic; for plugin-engine blocks it fails with the compiler error "unknown engine". |
| 89 | + |
| 90 | +```mermaid |
| 91 | +flowchart TB |
| 92 | + process["processQuerySets()"] --> branch{"engine for this sql[]"} |
| 93 | + branch -->|"sqlite / mysql / postgresql"| parse_path["parse() → NewCompiler() → Result"] |
| 94 | + branch -->|"name in engines"| plugin_path["runPluginQuerySet()"] |
| 95 | + plugin_path --> runner["engine process runner → external process"] |
| 96 | + runner --> to_result["pluginResponseToCompilerQuery → compiler.Result"] |
| 97 | + parse_path --> result["ProcessResult → codegen"] |
| 98 | + to_result --> result |
| 99 | +``` |
| 100 | + |
| 101 | +**Call flow (built-in path)** |
| 102 | + |
| 103 | +1. **`internal/cmd/generate.go`** |
| 104 | + For each entry in `sql[]`, `parse()` is called with that block’s `config.SQL` (which includes `conf.Engine` = value of `engine: ...`). |
| 105 | + |
| 106 | +2. **`parse()`** calls **`compiler.NewCompiler(sql, combo, parserOpts)`** |
| 107 | + So every SQL block gets its own compiler, and the engine is selected inside `NewCompiler`. |
| 108 | + |
| 109 | +3. **`internal/compiler/engine.go`**, **`NewCompiler(conf config.SQL, combo config.CombinedSettings, ...)`** |
| 110 | + **Current code**: the plugin branch is in **`process.go`**, not here. **`NewCompiler`** only has cases for sqlite/mysql/postgresql; its `default` case returns "unknown engine". The legacy snippet below shows where the branch used to live: |
| 111 | + |
| 112 | + ```go |
| 113 | + switch conf.Engine { |
| 114 | + case config.EngineSQLite: |
| 115 | + // built-in: c.parser = sqlite.NewParser(), c.catalog = sqlite.NewCatalog(), ... |
| 116 | + case config.EngineMySQL: |
| 117 | + // built-in: dolphin parser + catalog |
| 118 | + case config.EnginePostgreSQL: |
| 119 | + // built-in: postgresql parser + catalog |
| 120 | + default: |
| 121 | + // “Other” engine name → treat as plugin |
| 122 | + if enginePlugin, found := config.FindEnginePlugin(&combo.Global, string(conf.Engine)); found { |
| 123 | + eng, _ := createPluginEngine(enginePlugin, combo.Dir) // plugin.NewPluginEngine or WASM |
| 124 | + c.parser = eng.Parser() // ProcessRunner, which calls the external process |
| 125 | + c.catalog = eng.Catalog() |
| 126 | + // ... |
| 127 | + } else { |
| 128 | + return nil, fmt.Errorf("unknown engine: %s ... add it to the 'engines' section ...", conf.Engine) |
| 129 | + } |
| 130 | + } |
| 131 | + ``` |
| 132 | + |
| 133 | +- **Built-in path**: `conf.Engine` is `"sqlite"`, `"mysql"`, or `"postgresql"` → the switch hits one of the first three cases; parser and catalog are the in-tree implementations. |
| 134 | +- **Plugin engines**: the compiler does *not* load plugin engines. For `engine: myplugin` (a name declared under `engines:`), **generate** uses the plugin path in cmd (`runPluginQuerySet` → engine process runner in **`plugin_engine_path.go`**); **vet** has no plugin-specific code and fails with the compiler error "unknown engine". |
| 135 | + |
| 136 | +**Summary:** Built-in path = **`internal/compiler/engine.go`**; plugin path = **`internal/cmd/plugin_engine_path.go`**. |
| 137 | + |
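The branch described above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the code in `process.go`: `EnginePlugin`, `builtin`, and `choosePath` are hypothetical names, and the returned strings merely label which path a `sql[]` block would take.

```go
package main

import "fmt"

// EnginePlugin is a hypothetical stand-in for one entry under `engines:`.
type EnginePlugin struct {
	Name string
	Cmd  string
}

// builtin lists the engine names that go through the in-tree compiler.
var builtin = map[string]bool{"sqlite": true, "mysql": true, "postgresql": true}

// choosePath reports which path a sql[] block would take for an engine name:
// "compiler" for built-in engines, "plugin:<cmd>" for a declared plugin,
// and an "unknown engine" error otherwise.
func choosePath(engine string, engines []EnginePlugin) (string, error) {
	if builtin[engine] {
		return "compiler", nil
	}
	for _, e := range engines {
		if e.Name == engine {
			return "plugin:" + e.Cmd, nil
		}
	}
	return "", fmt.Errorf("unknown engine: %s", engine)
}

func main() {
	engines := []EnginePlugin{{Name: "mydb", Cmd: "sqlc-engine-mydb"}}
	for _, name := range []string{"postgresql", "mydb", "oracle"} {
		path, err := choosePath(name, engines)
		fmt.Println(name, path, err)
	}
}
```

The point of the sketch is ordering: the name lookup happens before any compiler exists, so a plugin engine never reaches `NewCompiler` on the generate path.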
| 138 | +--- |
| 139 | + |
| 140 | +## No intermediate AST for external plugins |
| 141 | + |
| 142 | +The plugin does **not** return an AST or “statements + AST”: |
| 143 | + |
| 144 | +- **Request**: query text + schema (or connection). |
| 145 | +- **Response**: `sql` (possibly with `*` expanded), `parameters`, `columns`. |
| 146 | + |
| 147 | +The plugin is the single place that defines how the query is interpreted. sqlc does not parse or analyze that SQL again; it forwards the plugin’s `ParseResponse` toward codegen. Any internal use of `[]ast.Statement` for the plugin path is a compatibility shim; the semantics are driven by the plugin’s `sql` / `parameters` / `columns`. |
| 148 | + |
| 149 | +--- |
| 150 | + |
| 151 | +## No compiler for external plugins |
| 152 | + |
| 153 | +The built-in **compiler** (catalog, type resolution, validation, expansion of `*`) is **not** used for external engine plugins: |
| 154 | + |
| 155 | +- The plugin is responsible for: |
| 156 | + - Resolving parameter and column types (using schema or DB). |
| 157 | + - Expanding `SELECT *` if desired. |
| 158 | + - Emitting whatever shape of `parameters` and `columns` the codegen expects. |
| 159 | +- sqlc does not run the compiler on plugin output; it passes that output through to codegen. So “compiler” is only in the built-in-engine path. |
| 160 | + |
| 161 | +--- |
| 162 | + |
| 163 | +## What is sent to and returned from the plugin |
| 164 | + |
| 165 | +**Invocation**: one RPC, `Parse`, over stdin/stdout (protobuf). |
| 166 | +Example: `sqlc-engine-mydb parse` with `ParseRequest` on stdin and `ParseResponse` on stdout. |
| 167 | + |
| 168 | +### Sent to the plugin (`ParseRequest`) |
| 169 | + |
| 170 | +| Field | Description | |
| 171 | +|-------------------|-------------| |
| 172 | +| `sql` | Query text to parse (from `queries.sql` or the current batch). | |
| 173 | +| `schema_sql` | *(optional)* Contents of the schema file(s), e.g. concatenated `schema.sql`. | |
| 174 | +| `connection_params` | *(optional)* DSN + options for “database-only” mode when schema is taken from the DB. | |
| 175 | + |
| 176 | +Exactly one of `schema_sql` or `connection_params` is used per request, depending on how the project is configured (see below). |
| 177 | + |
| 178 | +### Returned from the plugin (`ParseResponse`) |
| 179 | + |
| 180 | +| Field | Description | |
| 181 | +|-------------|-------------| |
| 182 | +| `sql` | Processed SQL. Can be the same as input, or e.g. `SELECT *` expanded to explicit columns. | |
| 183 | +| `parameters`| List of parameters: name, position, `data_type`, nullable, is_array, array_dims. | |
| 184 | +| `columns` | List of result columns: name, `data_type`, nullable, is_array, array_dims, optional table/schema. | |
| 185 | + |
| 186 | +These three are enough for codegen to generate type-safe code without an AST or compiler step. |
| 187 | + |
| 188 | +--- |
| 189 | + |
| 190 | +## How the schema is passed into the plugin |
| 191 | + |
| 192 | +Schema is provided to the plugin in one of two ways, via `ParseRequest.schema_source`, which carries either `schema_sql` or `connection_params`: |
| 193 | + |
| 194 | +1. **Schema-based (files)** |
| 195 | + - sqlc reads the configured schema files (e.g. `schema: "schema.sql"`) and passes their contents as **`schema_sql`** (a string) in `ParseRequest`. |
| 196 | + - The plugin parses this SQL (e.g. `CREATE TABLE ...`) and uses it to resolve types, expand `*`, etc. |
| 197 | + |
| 198 | +2. **Database-only** |
| 199 | + - When schema is not from files, sqlc can pass **`connection_params`** (DSN + optional extra options) in `ParseRequest`. |
| 200 | + - The plugin connects to the DB and uses live metadata (e.g. `INFORMATION_SCHEMA` / `pg_catalog`) to resolve types and columns. |
| 201 | + |
| 202 | +So: **schema** is either “schema.sql as text” or “connection params to the database”; the plugin chooses how to use it. |
| 203 | + |
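The two modes above amount to a small selection step on the sqlc side. The following Go sketch shows one way that step could look; `SchemaSource` and `buildSchemaSource` are hypothetical names, the newline used to join files is an assumption, and `connection_params` is again reduced to a DSN string.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// SchemaSource is a hypothetical stand-in for the schema part of ParseRequest.
// Exactly one of the two fields is expected to be set.
type SchemaSource struct {
	SchemaSQL        string
	ConnectionParams string
}

// buildSchemaSource prefers schema file contents; only when no files are
// configured does it fall back to connection params (database-only mode).
func buildSchemaSource(schemaFiles []string, dsn string) (SchemaSource, error) {
	switch {
	case len(schemaFiles) > 0:
		return SchemaSource{SchemaSQL: strings.Join(schemaFiles, "\n")}, nil
	case dsn != "":
		return SchemaSource{ConnectionParams: dsn}, nil
	default:
		return SchemaSource{}, errors.New("no schema files and no connection params configured")
	}
}

func main() {
	src, err := buildSchemaSource([]string{"CREATE TABLE users (id bigint);"}, "")
	fmt.Println(src.SchemaSQL != "", src.ConnectionParams == "", err)
}
```

Giving files precedence matches the wording "when schema is not from files"; a project that wants both would need a rule that this sketch does not attempt to define.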
| 204 | +--- |
| 205 | + |
| 206 | +## Changes in `sqlc.yaml` |
| 207 | + |
| 208 | +### New top-level `engines` |
| 209 | + |
| 210 | +Plugins are declared under `engines` and referenced by name in `sql[].engine`: |
| 211 | + |
| 212 | +```yaml |
| 213 | +version: "2" |
| 214 | + |
| 215 | +engines: |
| 216 | + - name: mydb |
| 217 | + process: |
| 218 | + cmd: sqlc-engine-mydb |
| 219 | + env: |
| 220 | + - MYDB_DSN |
| 221 | + |
| 222 | +sql: |
| 223 | + - engine: mydb |
| 224 | + schema: "schema.sql" |
| 225 | + queries: "queries.sql" |
| 226 | + codegen: |
| 227 | + - plugin: go |
| 228 | + out: db |
| 229 | +``` |
| 230 | + |
| 231 | +- **`engines`**: list of named engines. Each has `name` and either `process.cmd` (and optionally `env`) or a WASM config. |
| 232 | +- **`sql[].engine`**: for that SQL block, use the engine named `mydb` (which triggers the plugin) instead of `postgresql` / `mysql` / `sqlite`. |
| 233 | + |
| 234 | +So the only new concept in config is “define engines (including plugins) by name, then point `sql[].engine` at them.” Schema and queries are still configured per `sql[]` block as today. |
| 235 | + |
| 236 | +--- |
| 237 | + |
| 238 | +## Who handles sqlc placeholders in queries |
| 239 | + |
| 240 | +Support for sqlc-style placeholders (`sqlc.arg()`, `sqlc.narg()`, `sqlc.slice()`, `sqlc.embed()`, etc.) is **entirely up to the plugin**: |
| 241 | + |
| 242 | +- The plugin receives the raw query text (including those macros) in `ParseRequest.sql`. |
| 243 | +- It can parse and interpret them and reflect the result in `parameters` (and, if needed, in `sql` or in how it uses schema). There is no separate “sqlc placeholder” pass in the core for the plugin path. |
| 244 | +- If the plugin does not handle a placeholder, that placeholder will not be turned into proper parameters/columns by sqlc; the pipeline does not add a generic placeholder expander for external engines. |
| 245 | + |
| 246 | +So: **the database engine plugin is responsible for understanding and handling sqlc placeholders** for its engine. |
| 247 | + |
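As an illustration of what "handling placeholders in the plugin" might involve, the sketch below rewrites `sqlc.arg(name)` and `sqlc.narg(name)` into positional markers and collects the matching parameter list. It is a deliberately naive approach under stated assumptions: `rewriteArgs` and `Param` are hypothetical names, the `$1`-style output is a PostgreSQL-like convention chosen for the example, and a regular expression will also match macros inside string literals or comments, which a real plugin would avoid by using its own SQL parser.

```go
package main

import (
	"fmt"
	"regexp"
)

// argMacro matches sqlc.arg(name) and sqlc.narg(name), with optional single quotes.
var argMacro = regexp.MustCompile(`sqlc\.(arg|narg)\(\s*'?([A-Za-z_][A-Za-z0-9_]*)'?\s*\)`)

// Param is a hypothetical, reduced form of a ParseResponse parameter.
type Param struct {
	Name     string
	Position int
	Nullable bool
}

// rewriteArgs replaces each macro with a positional marker ($1, $2, ...).
// A name that appears more than once reuses its first position.
func rewriteArgs(sql string) (string, []Param) {
	index := map[string]int{} // parameter name -> position
	var params []Param
	out := argMacro.ReplaceAllStringFunc(sql, func(m string) string {
		sub := argMacro.FindStringSubmatch(m)
		kind, name := sub[1], sub[2]
		pos, seen := index[name]
		if !seen {
			pos = len(params) + 1
			index[name] = pos
			params = append(params, Param{Name: name, Position: pos, Nullable: kind == "narg"})
		}
		return fmt.Sprintf("$%d", pos)
	})
	return out, params
}

func main() {
	sql, params := rewriteArgs("SELECT * FROM users WHERE id = sqlc.arg(id) AND org = sqlc.narg('org')")
	fmt.Println(sql)
	fmt.Printf("%+v\n", params)
}
```

The rewritten text would go into `ParseResponse.sql` and the collected entries into `parameters`, once the plugin has filled in their `data_type` from the schema.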
| 248 | +--- |
| 249 | + |
| 250 | +## Summary for maintainers |
| 251 | + |
| 252 | +- **One RPC**: `Parse(sql, schema_sql | connection_params) → (sql, parameters, columns)`. |
| 253 | +- **No AST, no compiler** on the plugin path; data flows from plugin to codegen. |
| 254 | +- **Schema** is passed either as `schema_sql` (file contents) or as `connection_params` (DSN) in `ParseRequest`. |
| 255 | +- **Config**: `engines[]` + `sql[].engine: <name>`; existing `schema` / `queries` / `codegen` stay as-is. |
| 256 | +- **Placeholders**: handled inside the plugin; core does not add a generic layer for external engines. |
| 257 | + |
| 258 | +This keeps the plugin API small and leaves type resolution and dialect behavior inside the plugin, while still allowing sqlc to drive generation from a single, well-defined contract. |