Skip to content

fix: enforce 1:1 worker-to-session with full DuckDB state reset#317

Open
EDsCODE wants to merge 5 commits intomainfrom
fix/worker-session-isolation
Open

fix: enforce 1:1 worker-to-session with full DuckDB state reset#317
EDsCODE wants to merge 5 commits intomainfrom
fix/worker-session-isolation

Conversation

@EDsCODE
Copy link
Contributor

@EDsCODE EDsCODE commented Mar 12, 2026

Summary

  • Workers now get a completely fresh DuckDB instance between sessions (~90ms recycle). Previously only temp tables/views were cleaned up, which leaked SET variables, prepared statements, and other session state to the next connection on the same worker.
  • Removed least-loaded worker routing from AcquireWorker. When at capacity, workers wait with backoff until one becomes idle (1:1 model).
  • ReleaseWorker moved after gRPC DestroySession so the worker is only available once its new DB is ready.

Test plan

  • Unit tests pass (controlplane, duckdbservice)
  • QA: verified SET variable (default_null_order) no longer leaks between sessions in control-plane mode
  • QA: confirmed DB recycle takes ~90ms per session end
  • Deploy to staging and verify under real traffic

🤖 Generated with Claude Code

EDsCODE and others added 4 commits March 12, 2026 12:12
Workers now get a completely fresh DuckDB instance between sessions.
Previously, only temp tables/views were cleaned up, which leaked SET
variables, prepared statements, and other session state to the next
connection on the same worker.

Changes:
- DestroySession closes the old *sql.DB and eagerly creates a fresh
  one via CreateDBConnection (~90ms), guaranteeing clean state
- Remove least-loaded worker routing from AcquireWorker; when at
  capacity, wait with backoff until a worker becomes idle
- Move ReleaseWorker after gRPC DestroySession so the worker is only
  available once its new DB is ready
- Remove cleanupSessionState (no longer needed)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the ~90ms close-and-reopen DuckDB approach with an exhaustive
in-place cleanup that takes ~15-20ms. Uses allowlists of warmup-created
objects (macros, views, tables) to distinguish system state from user
state. On session destroy:

1. RESET all DuckDB settings
2. Drop temp objects (tables, views)
3. Drop user objects not in allowlists (tables, views, macros, sequences, types)
4. Drop user schemas
5. Detach user databases
6. Drop user secrets
7. Re-apply warmup settings (threads, memory_limit, paths, DuckLake)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@EDsCODE EDsCODE marked this pull request as ready for review March 12, 2026 21:23
Merge origin/main into fix/worker-session-isolation. Resolved conflicts:
- k8s_pool.go: removed re-introduced least-loaded routing from PR #315,
  kept 1:1 idle-or-spawn-or-wait logic
- session_mgr.go: took main's improved comments for destroy/release ordering
- service.go: removed cleanupSessionState (superseded by resetSessionState)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant