Skip to content

feat(server/idempotency): complete PgBackend for multi-worker durable replay#555

Merged
bokelley merged 2 commits into
mainfrom
bokelley/pg-backend
May 4, 2026
Merged

feat(server/idempotency): complete PgBackend for multi-worker durable replay#555
bokelley merged 2 commits into
mainfrom
bokelley/pg-backend

Conversation

@bokelley
Copy link
Copy Markdown
Contributor

@bokelley bokelley commented May 4, 2026

Summary

Closes #548. Replaces the NotImplementedError scaffold with a real psycopg3-async-pool-backed PgBackend. Multi-worker AdCP sellers can now declare IdempotencySupported(supported=True) and actually dedupe across workers.

  • PgBackend(pool=...) — async pool, per-call connection acquisition (matches PgWebhookDeliverySupervisor / PgReplayStore)
  • await create_schema() — idempotent CREATE TABLE IF NOT EXISTS + INDEX. JSONB response, COLLATE "C" on PK text columns
  • First-writer-wins under concurrent put: ON CONFLICT … DO UPDATE … WHERE existing.expires_at <= now() — concurrent writers racing the same fresh slot can't violate the cache invariant "same (scope, key) → same payload_hash"
  • Naive datetime → ValueError with schema-drift hint (fail-fast over silent UTC coercion)
  • ASCII identifier regex on table_name; adcp[pg] extra; ImportError when missing

Security hardening alongside the backend

  • IdempotencyStore._extract_scope_key rejects tenant_id / caller_identity containing the U+001E scope separator. Without this, tenant="A\x1eB" + principal="X" collides with tenant="A" + principal="B\x1eX" — multi-tenant isolation defeated
  • Log lines use _scope_log_id(scope_key) — sha256-truncated identifier instead of the raw principal id. PII no longer lands in centralized log sinks

Docstring caveats

  • Atomicity (v1): cache write commits on a fresh pool connection (not handler's tx). Non-idempotent handler side effects need handler-level dedup on a unique constraint, or the co-tx follow-on
  • Schema bootstrap: CREATE TABLE IF NOT EXISTS no-ops on a pre-existing wrong-shape table — Alembic-first deployments must copy the DDL verbatim
  • JSON-safe contract: response dict must serialize via json.dumps; no datetime/Decimal/set/bytes
  • DoS: no row cap, only TTL. Per-principal rate limiting at the auth tier is the AdCP spec's expectation; schedule delete_expired() as a sweep

Test plan

  • 15 unit tests with mocked psycopg pool — tests/test_pg_idempotency_backend.py
  • 9 conformance tests against real Postgres (skipped without ADCP_PG_TEST_URL): first-writer-wins concurrent put, expired-row overwrite, scope isolation, end-to-end IdempotencyStore replay through PgBackend
  • 4 scope-separator-rejection unit tests in tests/test_server_idempotency.py
  • CI workflow extended to run the new conformance suite in the Postgres-16 job
  • Adjacent + full unit suite green (3511 passed, 0 failures)
  • ruff check + mypy clean

Reviewed by

Pre-PR review by code-reviewer, security-reviewer, python-expert. Material concerns addressed:

  • ON CONFLICT DO UPDATE race (python-expert) — added WHERE expires_at <= now() for first-writer-wins (was last-writer-wins)
  • \x1e separator confusion (security-expert) — added validator in _extract_scope_key, fail-closed on either tenant or principal containing the separator
  • PII in logs (security-expert) — sha256-truncated _scope_log_id replaces raw scope_key in WARNING/DEBUG lines
  • Naive datetime (python-expert) — fail-fast with schema-drift hint instead of silent UTC coercion
  • Docstring expansions for atomicity caveat, Alembic-first drift, JSON-safe contract, DoS expectations

🤖 Generated with Claude Code

bokelley and others added 2 commits May 3, 2026 22:58
… replay

Replaces the NotImplementedError scaffold with a real psycopg3-async-pool
implementation. Multi-worker AdCP sellers can now declare
IdempotencySupported(supported=True) and actually dedupe across workers.

* PgBackend(pool=...) — async pool, per-call connection acquisition.
* await create_schema() — idempotent CREATE TABLE IF NOT EXISTS +
  CREATE INDEX. JSONB response, COLLATE "C" on text PK columns to
  prevent locale-driven cross-tenant collisions.
* ON CONFLICT (scope_key, key) DO UPDATE … WHERE existing.expires_at <= now()
  is FIRST-WRITER-WINS under concurrent put — concurrent writers
  racing the same fresh slot cannot violate the cache invariant
  "same (scope, key) → same payload_hash".
* Naive datetime from expires_at raises ValueError with a schema-drift
  hint instead of silent UTC coercion (fail-fast policy).
* Validates table_name through the same ASCII identifier regex as the
  other PG backends.
* Optional adcp[pg] extra; ImportError with install hint when missing.

Security hardening alongside the backend:

* IdempotencyStore._extract_scope_key rejects tenant_id /
  caller_identity values containing the U+001E scope separator.
  Without this, tenant="A\\x1eB" + principal="X" would collide with
  tenant="A" + principal="B\\x1eX" — multi-tenant isolation defeated.
* Log lines use _scope_log_id(scope_key) — sha256-truncated identifier,
  not the raw principal id. PII / commercial identity data no longer
  lands in centralized log sinks verbatim.

Docstring caveats covered: atomicity (v1 separate-tx commit; non-
idempotent handler side effects need handler-level dedup or co-tx
follow-on); schema bootstrap (CREATE TABLE IF NOT EXISTS no-ops on
pre-existing wrong-shape tables — Alembic-first deployments must copy
the DDL verbatim); JSON-safe contract; per-principal rate limiting at
the auth tier and delete_expired sweeps.

Tests:
* 15 unit tests with mocked psycopg pool.
* 9 conformance tests against real Postgres (skipped without
  ADCP_PG_TEST_URL): first-writer-wins concurrent put, expired-row
  overwrite, end-to-end IdempotencyStore replay, scope_key isolation.
* 4 scope-separator-rejection unit tests.
* CI workflow extended to run the new conformance suite.

Closes #548.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous test_put_overwrites_via_on_conflict asserted last-writer-wins
behavior. The new ON CONFLICT … WHERE expires_at <= now() guard makes a
sequential put against a non-expired slot a no-op. Update the test name
+ assertion to match the contract; the underlying SQL is correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley bokelley merged commit 1123458 into main May 4, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: complete PgBackend for IdempotencyStore (multi-worker durable dedup)

1 participant