
feat(replica): redaction via dbt-masking manifests + postgresql_anonymizer#40

Open
passcod wants to merge 11 commits into main from feat/replica-redaction


@passcod passcod commented May 13, 2026

Summary

Adds a new spec.redaction field to PostgresPhysicalReplica. When set, after each restore reaches the Ready phase the operator fetches a dbt-shaped masking manifest, installs the postgresql_anonymizer extension into the restore Pod, applies a SECURITY LABEL per masked column, runs anon.anonymize_database(), and re-enables read-only — all before allowing the switchover to make the new restore live.

The contract follows Tamanu's masking spec verbatim: short-form (masking: email) and extended-form (masking: { kind: integer, range: "20-50" }) are both accepted, all canonical kinds are handled (truncate, date, datetime, text, string, email, name, phone, place, url, zero, empty, nil, default, integer, float, money), and unknown kinds are tolerated as partial errors. Nulls are preserved via CASE wrappers; type-dependent kinds (zero, empty, default, nil) consult information_schema.columns + pg_attrdef so they dispatch correctly across text / numeric / bytea / json types. Mask-to-anon mapping was cross-checked against the dalibo docs.
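As a rough illustration of the null-preserving wrapper and the "unknown kinds are tolerated" rule, here is a minimal sketch. It is not the code in mask.rs: the function names, the two kinds shown, and the default integer range are assumptions made for the example. Only `anon.fake_email()` and `anon.random_int_between()` come from postgresql_anonymizer itself.

```rust
/// Quote a PostgreSQL identifier, doubling any embedded double quotes.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Wrap a masking expression so that NULL inputs stay NULL.
fn null_preserving(column: &str, masked: &str) -> String {
    format!(
        "CASE WHEN {} IS NULL THEN NULL ELSE {} END",
        quote_ident(column),
        masked
    )
}

/// Map a mask kind (plus an optional integer range) to a masking expression.
/// Unknown kinds return Err so the caller can count them as tolerated errors
/// instead of aborting the run. Only two kinds are shown here.
fn mask_expr(column: &str, kind: &str, range: Option<(i64, i64)>) -> Result<String, String> {
    let masked = match kind {
        "email" => "anon.fake_email()".to_string(),
        "integer" => {
            // Hypothetical default range, used only when the manifest gives none.
            let (lo, hi) = range.unwrap_or((0, 100));
            format!("anon.random_int_between({}, {})", lo, hi)
        }
        other => return Err(format!("unknown mask kind: {}", other)),
    };
    Ok(null_preserving(column, &masked))
}
```

With this sketch, `mask_expr("age", "integer", Some((20, 50)))` yields `CASE WHEN "age" IS NULL THEN NULL ELSE anon.random_int_between(20, 50) END`, while an unrecognised kind yields an `Err` that a caller could count towards a partial result.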

The extension is installed by the postgres container's own command wrapper: it runs as root, apt-installs postgresql_anonymizer_$N from Dalibo Labs (cached on the restore PVC at /pgdata/.anon-cache/ so subsequent pod starts skip the network), copies the files into /usr/share/postgresql/$N/extension and /usr/lib/postgresql/$N/lib of its writable layer, then drops to UID 999 via gosu postgres before execing postgres. No PG-version gate — works on any major the operator otherwise supports.

CRD changes

New spec field on PostgresPhysicalReplica:

spec:
  redaction:
    manifestUrl: "https://docs.data.bes.au/tamanu/v{version}/manifest.json"
    versionQuery: "SELECT value FROM local_system_facts WHERE key = 'currentVersion'"
    versionFallbackToBase: true
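
To make the interplay of manifestUrl and versionFallbackToBase concrete, here is a small sketch of how the candidate URLs could be derived. The PR does not define what "base version" means, so the rule used here (keep major.minor, zero the patch) is purely an assumption, as are the function names.

```rust
/// Substitute the `{version}` placeholder in a manifest URL template.
fn manifest_url(template: &str, version: &str) -> String {
    template.replace("{version}", version)
}

/// Hypothetical "base version" rule: keep major.minor and zero the patch.
/// The real rule used by the operator may differ.
fn base_version(version: &str) -> Option<String> {
    let mut parts = version.split('.');
    let major = parts.next()?;
    let minor = parts.next()?;
    if major.is_empty() || minor.is_empty() {
        return None;
    }
    Some(format!("{}.{}.0", major, minor))
}

/// URLs to try, in order: the exact version first, then the base version
/// when fallback is enabled and the base differs from the exact version.
fn candidate_urls(template: &str, version: &str, fallback_to_base: bool) -> Vec<String> {
    let mut urls = vec![manifest_url(template, version)];
    if fallback_to_base {
        if let Some(base) = base_version(version) {
            if base != version {
                urls.push(manifest_url(template, &base));
            }
        }
    }
    urls
}
```

Under that assumption, version `2.34.5` with the template above would try `.../v2.34.5/manifest.json` first and `.../v2.34.0/manifest.json` second.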

New status fields: redactionPhase (active -> complete/partial/failed: …), redactionVersion, redactionColumnsApplied.

Behaviour

  • Writable during redaction, read-only after — operator flips default_transaction_read_only = on at the DB level and demotes the analytics user back to NOSUPERUSER + pg_read_all_data when spec.readOnly is true.
  • Per-statement-error tolerance — missing columns / unknown kinds / non-nullable nil etc. don't abort; the run reports partial with a count.
  • failed: is sticky — broken manifests don't loop; the next scheduled restore clears the phase.
  • Sweep gates on redaction settling — same pattern as schemaMigrationPhase. Schema migration runs after redaction so persistent_schemas views regenerate against redacted source data.
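
The phase rules above can be sketched as a small state type. This is an illustration only, not the reconciler's code: the type and method names are hypothetical, and the rule that a run starts only when no phase is recorded is an assumption drawn from "failed: is sticky" and "the next scheduled restore clears the phase".

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq)]
enum RedactionPhase {
    Active,
    Complete,
    Partial,
    Failed(String),
}

impl RedactionPhase {
    /// Outcome of a finished run: any tolerated per-statement error
    /// downgrades the result from Complete to Partial.
    fn from_run(statement_errors: usize) -> Self {
        if statement_errors == 0 {
            RedactionPhase::Complete
        } else {
            RedactionPhase::Partial
        }
    }

    /// The stale-restore sweep waits until the phase is no longer Active.
    fn is_settled(&self) -> bool {
        match self {
            RedactionPhase::Active => false,
            _ => true,
        }
    }

    /// Assumed rule: start a run only when no phase is recorded, so a
    /// sticky Failed phase is not retried until a new restore clears it.
    fn should_start(current: Option<&RedactionPhase>) -> bool {
        current.is_none()
    }
}

impl fmt::Display for RedactionPhase {
    /// Render the phase the way the status field is described above.
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            RedactionPhase::Active => write!(f, "active"),
            RedactionPhase::Complete => write!(f, "complete"),
            RedactionPhase::Partial => write!(f, "partial"),
            RedactionPhase::Failed(reason) => write!(f, "failed: {}", reason),
        }
    }
}
```

Carrying the failure reason inside the variant keeps the `failed: …` status string and the sticky check in one place.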

Tests

  • ~40 new unit tests covering manifest parsing (both shapes, table-level truncate, missing schema/name, unknown kinds), range parsing (last-dash split, floats), fragment building for every canonical kind including type-dispatched cases, spec validation, version resolution, and restore Pod wiring (including a PG-16 case to confirm there's no version gate).
  • New end-to-end integration test (tests/redaction.rs) and matrix entry. The CI workflow builds a kopia snapshot fixture with a Tamanu-shaped local_system_facts table + a users table + sync_lookup, deploys an in-namespace nginx-served static manifest, and asserts each mask kind took effect, sync_lookup was truncated, unmarked columns kept their values, read-only was re-enabled, and analytics was demoted from SUPERUSER.
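
For reviewers unfamiliar with the "last-dash split" mentioned in the unit-test bullet, the idea can be sketched as follows. This is an assumption about the intent (splitting on the final dash lets the lower bound be negative), not the parser from this PR. The function name and the rejection of reversed or non-finite bounds are choices made for the example.

```rust
/// Parse a "lo-hi" range by splitting on the LAST dash, so that a negative
/// lower bound such as "-10-5" still parses. Accepts integers and floats.
fn parse_range(raw: &str) -> Option<(f64, f64)> {
    let (lo, hi) = raw.trim().rsplit_once('-')?;
    let lo: f64 = lo.trim().parse().ok()?;
    let hi: f64 = hi.trim().parse().ok()?;
    // Reject NaN/infinite bounds and reversed ranges (a choice made for this sketch).
    if lo.is_finite() && hi.is_finite() && lo <= hi {
        Some((lo, hi))
    } else {
        None
    }
}
```

One limitation of this approach is that a negative upper bound ("-10--5") does not parse, because the split leaves a trailing dash on the lower half.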

Open items

  • Network dependency on Dalibo Labs apt repo — the first start of each restore Pod hits the repo; subsequent starts use the PVC cache. The Pod fails to start if the repo is unreachable.

🤖 Generated with Claude Code

passcod added 11 commits May 13, 2026 20:52
Introduce `spec.redaction` on PostgresPhysicalReplica with manifest URL,
version discovery (literal or SQL query), base-version fallback, and
extension image override. Add status fields tracking redaction phase,
resolved version, and column count.

No behaviour yet — wiring follows in subsequent commits.
…y layer

Introduces the redaction module:
- manifest.rs: parses Tamanu/dbt manifests into ColumnMask / TableMask
- mask.rs: 13-kind registry mapping each canonical mask to a SECURITY
  LABEL fragment for postgresql_anonymizer; type-dispatched zero/empty/
  default/nil; null-preserving CASE wrappers
- apply.rs: applies the parsed manifest against a live restore DB
  (CREATE EXTENSION, TRUNCATE for table-level, SECURITY LABEL per
  column, anon.anonymize_database, ALTER DATABASE SET read-only)
- redaction.rs: orchestrates fetch -> parse -> apply with version
  resolution (literal/SQL query) and base-version fallback

Reconciler wiring follows in a subsequent commit.
… redaction is set

When spec.redaction is configured on the replica, the restore Pod gets:
- an image volume sourcing the postgresql_anonymizer extension files
  (defaulting to registry.gitlab.com/dalibo/postgresql_anon:latest)
- a read-only mount at /extensions/anon
- extension_control_path / dynamic_library_path GUCs in postgresql.conf
  pointing at the mounted paths (PG 18+ features)

PG version < 18 is rejected at build time. Read-only enforcement is
also deferred (effective_read_only=false) so the redaction step can
write; the redaction module re-enables it at the database level when
done.
Drives spec.redaction through the reconcile loop:
- redaction runs against the switching restore before schema_migration
  so persistent_schemas dbt views regenerate against redacted source
  data
- redactionPhase tracks the run (active -> complete/partial/failed),
  with failed:* sticky to avoid retry loops on broken manifests
- stale-restore sweep waits on redaction settling, same gate as schema
  migration
- redactionPhase/Version/ColumnsApplied get reset along with the schema
  fields when the sweep removes the previous restore
- on success, when spec.read_only is true: ALTER DATABASE … SET
  default_transaction_read_only = on, demote analytics to
  NOSUPERUSER, and GRANT pg_read_all_data — matching the role posture
  the init script applies when effective_read_only is true
Adds an integration test that exercises the whole redaction pipeline
against a kind cluster:

- tests/fixtures/setup-kopia-repo-pg18.yaml: snapshots a PG 18 db
  with a Tamanu-shaped local_system_facts table plus users and
  sync_lookup test tables
- tests/fixtures/manifest-server.yaml: nginx + ConfigMap that serves
  a static dbt-shaped manifest covering email/name/date/phone/
  integer-range column masks and table-level truncate
- tests/fixtures/Dockerfile.anon-pg18: minimal PG-18 anon extension
  image built from Dalibo's apt repo (their published :stable is
  built against PG 16, so we have to roll our own for now)
- tests/redaction.rs: drives the workflow and asserts each mask kind
  took effect, sync_lookup was truncated, unmarked columns are
  unchanged, read-only is re-enabled, and analytics is demoted from
  SUPERUSER
- .github/workflows/integration.yml: new 'redaction' matrix entry,
  PG-18 image pre-pull, anon-image build, and PG-18 kopia setup

Also drive-by fixes from finding real defaults: the extension default
image name was registry.gitlab.com/dalibo/postgresql_anon — the
canonical name is .../postgresql_anonymizer:stable. The
extension_control_path / dynamic_library_path GUCs now reference the
Debian PG layout (.../usr/share/postgresql/N/extension and
.../usr/lib/postgresql/N/lib) inside the image mount.
…volume

Replaces the image-volume mechanism with an install-anon init container
that apt-installs postgresql_anonymizer_$N from Dalibo Labs and stages
the extension files under /pgdata/extensions/anon on the restore PVC.

Why: the published dalibo image is built against PG 16, and shipping a
pre-built PG-18 image (or one per PG major) is operationally heavy.
apt install completes in ~30s and runs once per restore, which is
negligible against a 20-minute restore. The install is idempotent on
pod restarts.

Drops the extensionImage spec field, DEFAULT_ANON_IMAGE constant, and
the Dockerfile.anon-pg18 / Docker build step from CI. Restore PG version
gate stays at 18+ because extension_control_path is a PG 18 GUC; older
PG would need files overlaid into the system extension dirs at pod
start, which is a separate change.
…ntainer prelude

Moves the anon-extension install from a separate init container to the
postgres container's own command wrapper. The container runs as root
for the prelude (which apt-installs postgresql_anonymizer_$N from
Dalibo Labs and copies anon.{control,sql,so} into the standard system
extension dirs of its own writable layer), then drops to UID 999 via
`gosu postgres` before exec'ing postgres.

This unlocks all PG majors the operator otherwise supports: postgres
finds anon at /usr/share/postgresql/$N/extension and /usr/lib/
postgresql/$N/lib (no extension_control_path GUC needed) so the PG-18
restriction goes away.

Drops:
- the install-anon init container
- extension_control_path / dynamic_library_path GUCs in postgresql.conf
- the PG-18 build-time gate in restore/builders.rs
- the PG-18 reconcile-time gate in replica/redaction.rs

Adds:
- a per-container securityContext (runAsUser=0) on the postgres
  container when redaction is set, overriding the pod-level UID 999
- REDACTION_ENABLED=1 env var to drive the prelude
- a /pgdata/.anon-cache PVC cache so the apt-install runs once per
  restore (not per pod restart)
anon doesn't ship a fake_name() function — only fake_first_name() and
fake_last_name() per the dalibo docs at
https://postgresql-anonymizer.readthedocs.io/en/stable/masking_functions/

The name-mask CASE composed the with-space branch as fake_name(), which
would have failed at SECURITY LABEL time with a tolerated error. Compose
first || ' ' || last for the with-space case; single names still use
fake_first_name().

Verified the rest of the registry against the docs at the same time;
no other functions needed adjustment.
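
A minimal sketch of the corrected name mask, for illustration. The function name and the `position(' ' in …)` test for "has a space" are assumptions. Only `anon.fake_first_name()` and `anon.fake_last_name()` are taken from the dalibo docs cited above.

```rust
/// Build a null-preserving name mask: values containing a space get
/// "first last", single-word values get a first name only.
fn name_mask_expr(column: &str) -> String {
    // Quote the identifier, doubling any embedded double quotes.
    let col = format!("\"{}\"", column.replace('"', "\"\""));
    format!(
        "CASE WHEN {c} IS NULL THEN NULL \
         WHEN position(' ' in {c}) > 0 \
         THEN anon.fake_first_name() || ' ' || anon.fake_last_name() \
         ELSE anon.fake_first_name() END",
        c = col
    )
}
```

If the expression is embedded in a single-quoted SECURITY LABEL string, the `' '` literal would need its quotes doubled.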