
docs: add secret rotation design (OpenBao-native) #1603

Open
devantler wants to merge 3 commits into main from docs/secret-rotation-design


@devantler
Contributor

Why

Planning artifact for automated secret rotation, before any prod-affecting implementation — the flagship change (OpenBao Database engine credential handover on the prod fleetdm MySQL) is high-risk and stateful, so it warrants a reviewed design first.

Key validated finding

Setting a non-zero refreshInterval on the generator-backed PushSecrets would not rotate them. According to the ESO v2.5.0 source, the PushSecret reconciler persists a GeneratorState (statemanager) and reuses it, so generator output stays stable across reconciles. This was corroborated empirically (values unchanged for days; zero live GeneratorState instances). Rotation needs an explicit mechanism, not a config flip.
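
The caching behaviour described in this finding can be pictured with a toy model. This is only a sketch of the idea, not ESO code: `CachingReconciler` and its `_state` attribute are hypothetical names standing in for the PushSecret reconciler and its persisted GeneratorState.

```python
import secrets


class CachingReconciler:
    """Toy model: generate once, persist the result as state, then reuse it."""

    def __init__(self):
        # Stands in for the persisted GeneratorState (hypothetical name).
        self._state = None

    def reconcile(self):
        # Only the first reconcile generates a value; later ones reuse the state.
        if self._state is None:
            self._state = secrets.token_hex(16)
        return self._state


reconciler = CachingReconciler()
# Five reconciles, as if refreshInterval had fired five times.
values = {reconciler.reconcile() for _ in range(5)}
print(len(values))  # prints 1: every reconcile returned the same value
```

In this model, reconciling more often changes nothing, which is why a shorter interval alone would not produce a new secret.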

Design summary

  • OpenBao-native (chosen): Database secrets engine + static roles (which rotate an existing DB user's password on a rotation_period) for the fleetdm MySQL fleet user (Phase 1) and Valkey (Phase 2). Static roles fit apps that read a credential from a Secret at startup; dynamic roles don't.
  • Doc covers: engine/connection/role config in the vault-config Job, the credential-handover sequence (fresh + existing cluster), risks + rollback, and validation (CI's local-cluster system test exercises the chain).
  • Not OpenBao-native (no engine for arbitrary strings): cookie secret (migrate to OpenBao + scheduled rotation; no skew), OIDC client secrets (unify Dex+clients first, coordinated rotation), provider tokens, and the manual roots (Age key, OpenBao unseal/root) — each with a recommended approach.

Status

Proposal only — no manifests changed. Ready to implement Phase 1 as a follow-up PR on approval.

🤖 Generated with Claude Code

Design for automated secret rotation, preferring OpenBao-native mechanisms.

Key validated finding: a non-zero `refreshInterval` does NOT rotate
generator-backed secrets — ESO v2.5.0's PushSecret reconciler persists a
GeneratorState and reuses it, keeping the value stable. Rotation needs an
explicit mechanism.

Chosen approach: OpenBao Database secrets engine with static roles (rotates an
existing DB user's password on a schedule) for fleetdm MySQL/Valkey — the
natural OpenBao-native fit. Document covers the engine/role config, the
credential-handover sequence, risks + rollback, and CI/local validation, plus
why cookie/OIDC/provider/root secrets are not OpenBao-native and how to handle
them. Ships in phases, each as its own reviewed PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 27, 2026 21:24
Contributor

Copilot AI left a comment


Pull request overview

Adds a new design document describing a phased approach for automated secret rotation, prioritizing OpenBao-native mechanisms (Database secrets engine with static roles) and documenting risks, rollback, and validation before any production-affecting implementation.

Changes:

  • Introduces a proposal document outlining current secret sourcing, a validated ESO generator non-rotation finding, and a feasibility matrix by secret class.
  • Specifies a Phase 1 design for rotating FleetDM MySQL fleet credentials via OpenBao Database engine static roles, including handover sequencing and rollback steps.
  • Outlines follow-up phases (Valkey, oauth2-proxy cookie secret, OIDC coordination, provider tokens/roots) and recommended rollout order.

Comment thread docs/secret-rotation.md Outdated
Comment on lines +60 to +63
2. **Consume the rotated credential**: replace the `mysql` `ExternalSecret`'s
`mysql-password` source from KV (`apps/fleetdm/mysql`) with
`database/static-creds/fleet` (password field). Keep `mysql-root-password` /
`mysql-replication-password` on KV for now (root rotation is out of scope).
@devantler devantler marked this pull request as ready for review May 27, 2026 21:28
@devantler devantler enabled auto-merge May 27, 2026 21:29
Validation found ESO's Vault provider supports KV only — the database engine
cannot be read by a plain ExternalSecret. Correct Phase 1 to consume
database/static-creds/fleet via the VaultDynamicSecret generator
(dataFrom.sourceRef.generatorRef).

Add the decisive open question that gates implementation: whether the read
generator re-reads on each refresh or caches via GeneratorState (as the
PushSecret+Password generator does). If it caches, rotation silently fails and
fleet loses DB access at the first rotation — a time-bomb CI cannot catch. Must
be validated on a local cluster by forcing a rotation before prod. Also document
the inherent post-rotation propagation window.
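
A rough way to reason about that propagation window, as a sketch under stated assumptions: the synced Secret is refreshed at fixed intervals counted from time 0, and the rotation lands at an arbitrary moment. The function name `stale_seconds` and the timing model are illustrative assumptions, not something taken from ESO or OpenBao, and the model ignores any extra time the workload needs to pick up the updated Secret.

```python
def stale_seconds(rotation_at: int, refresh_interval: int) -> int:
    """Seconds the synced Secret still holds the old password after a rotation.

    Assumes refreshes happen at 0, refresh_interval, 2 * refresh_interval, ...
    """
    if refresh_interval <= 0:
        raise ValueError("refresh_interval must be positive")
    if rotation_at < 0:
        raise ValueError("rotation_at must not be negative")
    # Round rotation_at up to the next refresh tick using integer arithmetic.
    next_refresh = -(-rotation_at // refresh_interval) * refresh_interval
    return next_refresh - rotation_at


# Rotation one second after a refresh: almost a full interval of staleness.
print(stale_seconds(3601, 3600))  # prints 3599
# Rotation exactly on a refresh tick: no staleness in this model.
print(stale_seconds(7200, 3600))  # prints 0
```

Under these assumptions the worst case approaches one full refresh interval, so the interval chosen for the consuming resource bounds how long the old password can linger.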

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…gates)

Isolated kind spike (ESO v2.5.0): changing the source value propagated to the
synced Secret within one refresh cycle. Unlike the Password generator (caches via
GeneratorState), the read-generator re-reads — so static-role rotation propagates
and there is no silent-rotation time-bomb. Phase 1 unblocked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 27, 2026 21:44
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 10 comments.

Comment thread docs/secret-rotation.md
- **OpenBao = KV v2 + Kubernetes auth only** (configured by the `vault-config`
Job). No Database secrets engine is enabled yet.
- **Two sourcing paths coexist (mid-migration):**
- **OpenBao → ESO**: generators (`vault-seed/generators.yaml`) seed OpenBao KV
Comment thread docs/secret-rotation.md
alertmanager.
- **SOPS → Flux postBuild substitution**: `${dex_client_secret}`,
`${flux_web_client_secret}`, `${oauth2_proxy_cookie_secret}` etc. are still
read from `clusters/*/variables/variables-cluster-secret.enc.yaml`. **Dex
Comment thread docs/secret-rotation.md

### Validated finding — `refreshInterval` does **not** rotate generators

The `push-generated-secrets.yaml` PushSecrets use `refreshInterval: "0"`, and the
Comment thread docs/secret-rotation.md

| Class | Secrets | OpenBao-native fit | Plan |
| --- | --- | --- | --- |
| **Database creds** | fleetdm MySQL `fleet` user, Valkey | ✅ **Database engine, static roles** | Phase 1–2 below |
Comment thread docs/secret-rotation.md

### Validation

- `kubectl kustomize k8s/clusters/{local,prod}/` build; `ksail workload validate`.
Comment thread docs/secret-rotation.md
Comment on lines +43 to +44
OpenBao's **Database secrets engine** supports **static roles** for MySQL/MariaDB
and Valkey: OpenBao stores and **automatically rotates the password of an
Comment thread docs/secret-rotation.md
Comment on lines +114 to +117
### Phase 2 — fleetdm Valkey

Same pattern using the Valkey database plugin and a static role for the redis
user, once Phase 1 is proven.
Comment thread docs/secret-rotation.md
## Rollout order

1. Phase 1 — fleetdm MySQL static-role rotation (this design's flagship).
2. Phase 2 — fleetdm Valkey static-role rotation.
Comment thread docs/secret-rotation.md
Comment on lines +23 to +27
The `push-generated-secrets.yaml` PushSecrets use `refreshInterval: "0"`, and the
comment implies a non-zero value would rotate. **It would not.** ESO **v2.5.0**'s
PushSecret reconciler persists a `GeneratorState` (statemanager) and reuses the
prior state, keeping generator output **stable** across reconciles. Empirically:
generated values have been unchanged for days and there are no live
Comment thread docs/secret-rotation.md
Comment on lines +60 to +63
2. **Consume the rotated credential** — *not* a plain `ExternalSecret`. ESO's
Vault provider supports **KV only** ("The KV Secrets Engine is the only one
supported by this provider"), so the database engine must be read via the
**`VaultDynamicSecret` generator** (`generators.external-secrets.io`). A
