Report stable physical catalog for DuckLake-backed sessions (single- and multi-tenant) #644

Open
fuziontech wants to merge 2 commits into main from fix/single-tenant-stable-catalog
@fuziontech fuziontech commented May 29, 2026

Problem

A serverless / single-tenant duckling failed its nightly SQLMesh run with Catalog "portola" does not exist, and SQLMesh also wanted to rebuild the entire warehouse. Root cause:

  • The client-visible catalog name (current_database() / pg_database) is installed from the connection's startup dbname (InitSessionDatabaseMetadata).
  • That dbname changed: ducklake → portola.
  • SQLMesh fully-qualifies every model as catalog.schema.object and persists the catalog in state, so a changed default catalog reads as a brand-new warehouse → full rebuild of ~140 models. Mid-transition you also get Catalog X does not exist when a reference under the other name isn't translated.

Fix

Install current_database()/pg_database from a stable catalog name (sessionmeta.ReportedDatabaseName), in precedence order:

  1. a configured default catalog (e.g. iceberg) — that session's search_path/USE points there, so it's what current_database() must report;
  2. else, a DuckLake-backed session → the physical catalog ducklake (gated on the actual runtime DuckLake attachment);
  3. else, the connection dbname unchanged (plain DuckDB / no default).
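
The precedence order above can be sketched as a small pure function. This is a minimal illustration only: the lowercase function name, its parameters, and the string-plus-bool inputs are assumptions made for the example, and the real sessionmeta.ReportedDatabaseName may take different arguments.

```go
package main

import "fmt"

// physicalDuckLakeCatalog mirrors the shared constant named in this PR.
const physicalDuckLakeCatalog = "ducklake"

// reportedDatabaseName is a hypothetical stand-in for
// sessionmeta.ReportedDatabaseName; the real signature is not shown here.
//
//   - connDBName: the dbname sent in the connection startup packet
//   - defaultCatalog: a configured default catalog, "" when none is set
//   - duckLakeAttached: whether DuckLake is actually attached at runtime
func reportedDatabaseName(connDBName, defaultCatalog string, duckLakeAttached bool) string {
	// 1. A configured default catalog wins, because search_path/USE points there.
	if defaultCatalog != "" {
		return defaultCatalog
	}
	// 2. A DuckLake-backed session reports the physical catalog.
	if duckLakeAttached {
		return physicalDuckLakeCatalog
	}
	// 3. Plain DuckDB with no default: report the connection dbname unchanged.
	return connDBName
}

func main() {
	fmt.Println(reportedDatabaseName("portola", "", true))        // ducklake
	fmt.Println(reportedDatabaseName("portola", "iceberg", true)) // iceberg
	fmt.Println(reportedDatabaseName("analytics", "", false))     // analytics
}
```

Keeping the decision in one side-effect-free function is what lets both entry points (control-plane handler and standalone serve()) share it and lets the cases be covered by a table-driven unit test.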

The connection dbname still works as an alias for DuckLake sessions: the logical-catalog transform rewrites <dbname>.schema.table → ducklake.schema.table, USE "<dbname>" → ducklake.main, and ducklake.* resolves directly. Both names therefore resolve, and a half-migrated SQLMesh state can't strand on Catalog X does not exist.
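
The alias behavior described above can be illustrated on already-parsed names. The sketch below is not the transpiler's code: the tableRef type and both function names are hypothetical, the real transform operates on a parsed SQL tree, and exact case-sensitive matching is an assumption.

```go
package main

import "fmt"

// tableRef is a hypothetical, simplified three-part table reference.
type tableRef struct {
	Catalog, Schema, Table string
}

// resolveCatalogAlias maps the logical (connection) dbname onto the physical
// catalog. References that already use the physical name, or that name some
// other catalog, are returned unchanged.
func resolveCatalogAlias(ref tableRef, logicalName, physicalName string) tableRef {
	if ref.Catalog == logicalName {
		ref.Catalog = physicalName
	}
	return ref
}

// rewriteUseTarget maps USE "<dbname>" onto "<physical>.main"; any other
// target is left as-is.
func rewriteUseTarget(target, logicalName, physicalName string) string {
	if target == logicalName {
		return physicalName + ".main"
	}
	return target
}

func main() {
	viaAlias := resolveCatalogAlias(tableRef{"portola", "analytics", "orders"}, "portola", "ducklake")
	direct := resolveCatalogAlias(tableRef{"ducklake", "analytics", "orders"}, "portola", "ducklake")
	fmt.Println(viaAlias == direct) // both names land on the same physical reference
	fmt.Println(rewriteUseTarget("portola", "portola", "ducklake"))
}
```

Because the alias and the physical name resolve to the same reference, state persisted under either name keeps working during a migration.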

Reporting ducklake for single- and multi-tenant DuckLake sessions makes the catalog identity consistent across deployments, so an org graduating from a single-tenant duckling to a multi-tenant worker pod keeps the same catalog name and does not churn SQLMesh. iceberg-default sessions report iceberg.

This intentionally changes the client-visible behavior of the logical-catalog feature for DuckLake sessions: current_database()/pg_database/information_schema/SHOW DATABASES now report the stable catalog (ducklake, or iceberg for iceberg-default users) instead of the connection/logical dbname. Org identity for observability remains via orgID (logged separately). The feature tests asserting the old contract are updated (alias-execution assertions kept; only metadata-reporting expectations flip):

  • tests/integration/logical_catalog_mapping_test.go, logical_database_catalog_mapping_test.go, logical_database_catalog_test.go
  • tests/k8s/sni_test.go

Second commit (public→main for direct physical-catalog refs): now that clients read current_database()=ducklake and may emit ducklake.public.<table>, the LogicalCatalogTransform maps public → main for the physical catalog too (it previously only did so for the logical name; PublicSchemaTransform leaves 3-part names alone). Gated on PhysicalCatalogName, so postgres.public.* etc. are untouched.
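
A rough sketch of that gating, assuming a simplified reference type: qualifiedName and mapPublicToMain are hypothetical names, and the real logic lives in LogicalCatalogTransform.rewriteRangeVar and works on the parsed statement rather than on a struct like this.

```go
package main

import "fmt"

// qualifiedName is a hypothetical, simplified three-part table reference.
type qualifiedName struct {
	Catalog, Schema, Table string
}

// mapPublicToMain rewrites the catalog to the physical name and maps the
// "public" schema to "main", but only when the reference names the logical
// or the physical DuckLake catalog. Everything else passes through.
func mapPublicToMain(ref qualifiedName, logicalName, physicalName string) qualifiedName {
	// References without a catalog are out of scope in this sketch
	// (the PR leaves those to PublicSchemaTransform).
	if ref.Catalog == "" {
		return ref
	}
	// Gate: unrelated external catalogs such as postgres are untouched.
	if ref.Catalog != logicalName && ref.Catalog != physicalName {
		return ref
	}
	ref.Catalog = physicalName
	if ref.Schema == "public" {
		ref.Schema = "main" // DuckLake's default schema is "main", not "public"
	}
	return ref
}

func main() {
	fmt.Println(mapPublicToMain(qualifiedName{"ducklake", "public", "orders"}, "portola", "ducklake"))
	fmt.Println(mapPublicToMain(qualifiedName{"portola", "public", "orders"}, "portola", "ducklake"))
	fmt.Println(mapPublicToMain(qualifiedName{"postgres", "public", "users"}, "portola", "ducklake"))
}
```

The first two calls both yield ducklake.main.orders, while the postgres reference is returned exactly as given.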

Scope / consistency

Gating on runtime attachment leaves non-DuckLake sessions (e.g. iceberg-only orgs with no ducklake attached) reporting their own dbname. Covers both entry points: control-plane handler + standalone serve(). The change is control-plane behavior only — the worker just executes the macro SQL the control plane sends.

Verified live on portola (single-tenant duckling)

Deployed 4d6fed5 (currently-running base) + this fix via a graceful control-plane reload (systemctl reload → SIGUSR1/tableflip), worker left untouched:

  • dbname=portola → SELECT current_database(): portola → ducklake
  • dbname=ducklake → ducklake (unchanged) ✅
  • dbname=portola → SHOW DATABASES → ducklake
  • Zero disruption to in-flight Fivetran/SQLMesh imports (24+ real queries completed across the reload; worker pid unchanged).

Deployment notes

  • Behavior change is control-plane only; the worker binary is unaffected, so it can roll out via a control-plane reload without restarting workers (that's how it was validated on portola).
  • Rollback: restore the previous binary and systemctl reload duckgres.

Tests

  • TestReportedDatabaseName: DuckLake → ducklake (single- and multi-tenant); iceberg default → iceberg (incl. wins-over-ducklake); non-DuckLake → connection dbname.
  • TestTranspile_LogicalDatabaseCatalogMapping/physical_ducklake_public_schema_maps_to_main for the public→main fix.
  • Updated logical-catalog integration + k8s SNI tests to the new contract.
  • go build/go vet; transpiler, server, sessionmeta suites pass.

Test plan

  • go test ./server/sessionmeta/... ./transpiler/...
  • go test ./server/ -run 'Logical|DirectQueryRewrite|SessionDatabase|InitSession'
  • go build ./..., go vet
  • CI: integration-tests + k8s-integration-tests green
  • On the affected duckling: connect database=portola → current_database() = ducklake; SHOW DATABASES = ducklake (verified live, see above)

Out of scope

  • Why the connection now needs database=portola (this PR makes the catalog name stable regardless).
  • The JSON ->/->> Conversion Error: Failed to cast value to numerical errors from the same run — separate issue (DuckDB JSON-path, likely 1.5.3 fallout).

🤖 Generated with Claude Code

@fuziontech fuziontech force-pushed the fix/single-tenant-stable-catalog branch from b11465b to 3c34fb7 Compare May 29, 2026 22:05
@fuziontech fuziontech changed the title Report stable physical catalog for single-tenant DuckLake (stop SQLMesh full-rebuild on dbname change) Report stable physical catalog for DuckLake-backed sessions (single- and multi-tenant) May 29, 2026
@fuziontech fuziontech force-pushed the fix/single-tenant-stable-catalog branch from 3c34fb7 to c11a930 Compare May 29, 2026 22:44
A serverless duckling broke its nightly SQLMesh run after its connection
dbname changed (ducklake -> portola): current_database()/pg_database were
installed from the connection's startup dbname, and SQLMesh fully-qualifies
every model as catalog.schema.object and persists the catalog in its state.
A changed default catalog reads as a brand-new warehouse, so SQLMesh wants
to rebuild every model; mid-transition you also get "Catalog X does not
exist" when a reference under the other name isn't translated.

Install current_database()/pg_database from a stable catalog name chosen by
sessionmeta.ReportedDatabaseName, in precedence order:
  1. a configured default catalog (e.g. "iceberg") — that session's
     search_path/USE points there, so it is what current_database() must
     report;
  2. else, for a DuckLake-backed session, the physical catalog "ducklake"
     (gated on the actual runtime DuckLake attachment);
  3. else, the connection dbname unchanged (plain DuckDB / no default).

The connection dbname still works as an alias for DuckLake sessions: the
logical-catalog transform rewrites <dbname>.schema.table ->
ducklake.schema.table, the USE rewriter maps USE "<dbname>" -> ducklake.main,
and ducklake.* references resolve directly — so both names resolve and a
half-migrated SQLMesh state can't strand on "Catalog X does not exist".

Reporting "ducklake" for both single- and multi-tenant DuckLake sessions
makes the catalog identity consistent across deployments: an org graduating
from a single-tenant duckling to a multi-tenant worker pod keeps the same
catalog name and does not churn SQLMesh. iceberg-default sessions report
"iceberg" so their catalog identity is likewise stable and correct.

This intentionally changes the client-visible behavior of the logical
database catalog feature for DuckLake sessions: current_database()/
pg_database/information_schema/SHOW DATABASES now report the stable catalog
("ducklake", or "iceberg" for iceberg-default users) instead of the
connection/logical dbname. Org identity for observability is still available
via orgID (logged separately). Tests asserting the old contract are updated:
- tests/integration/logical_catalog_mapping_test.go
- tests/integration/logical_database_catalog_mapping_test.go
- tests/integration/logical_database_catalog_test.go
- tests/k8s/sni_test.go
The alias-execution assertions in those tests are kept (the connection
dbname still resolves for writes), only the metadata-reporting expectations
flip.

Also: use the shared physicalDuckLakeCatalog/PhysicalDuckLakeCatalog const
at the HasAttachedCatalog call sites and newTranspiler's PhysicalCatalogName
(they must stay in lockstep for the alias guarantee), and refresh the stale
control.go comment that claimed current_database() surfaces the per-org
routing database.

Both entry points covered: the control-plane connection handler and the
standalone serve() path; each detects DuckLake attachment before installing
session metadata. Adds sessionmeta.ReportedDatabaseName (+ shared
PhysicalDuckLakeCatalog const) with unit coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@fuziontech fuziontech force-pushed the fix/single-tenant-stable-catalog branch from c11a930 to f689fd9 Compare May 29, 2026 22:59
Now that current_database() reports the physical catalog name ("ducklake")
for DuckLake-backed sessions, pg-compat clients that build
catalog.schema.object from current_database() emit ducklake.public.<table>.
Previously the public->main mapping only ran for the logical catalog name
(PublicSchemaTransform deliberately leaves 3-part names alone, and
LogicalCatalogTransform only matched the logical name), so ducklake.public.*
fell through unrewritten and failed — DuckLake's default schema is "main",
not "public".

Extend LogicalCatalogTransform.rewriteRangeVar to also map "public" -> "main"
when a reference already uses the physical catalog name, mirroring the
logical-name case. Unrelated external catalogs (e.g. postgres.public.*) are
untouched because the match is gated on PhysicalCatalogName.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>