docs: add secret rotation design (OpenBao-native) #1603
Open
devantler wants to merge 3 commits into
Design for automated secret rotation, preferring OpenBao-native mechanisms.

Key validated finding: a non-zero `refreshInterval` does NOT rotate generator-backed secrets. ESO v2.5.0's PushSecret reconciler persists a GeneratorState and reuses it, keeping the value stable. Rotation needs an explicit mechanism.

Chosen approach: OpenBao Database secrets engine with static roles (rotates an existing DB user's password on a schedule) for fleetdm MySQL/Valkey, the natural OpenBao-native fit.

The document covers the engine/role config, the credential-handover sequence, risks + rollback, and CI/local validation, plus why cookie/OIDC/provider/root secrets are not OpenBao-native and how to handle them. Ships in phases, each as its own reviewed PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Adds a new design document describing a phased approach for automated secret rotation, prioritizing OpenBao-native mechanisms (Database secrets engine with static roles) and documenting risks, rollback, and validation before any production-affecting implementation.
Changes:
- Introduces a proposal document outlining current secret sourcing, a validated ESO generator non-rotation finding, and a feasibility matrix by secret class.
- Specifies a Phase 1 design for rotating FleetDM MySQL `fleet` credentials via OpenBao Database engine static roles, including handover sequencing and rollback steps.
- Outlines follow-up phases (Valkey, oauth2-proxy cookie secret, OIDC coordination, provider tokens/roots) and recommended rollout order.
Comment on lines +60 to +63
2. **Consume the rotated credential**: replace the `mysql` `ExternalSecret`'s
   `mysql-password` source from KV (`apps/fleetdm/mysql`) with
   `database/static-creds/fleet` (password field). Keep `mysql-root-password` /
   `mysql-replication-password` on KV for now (root rotation is out of scope).
Validation found ESO's Vault provider supports KV only: the database engine cannot be read by a plain ExternalSecret. Correct Phase 1 to consume database/static-creds/fleet via the VaultDynamicSecret generator (dataFrom.sourceRef.generatorRef).

Add the decisive open question that gates implementation: whether the read generator re-reads on each refresh or caches via GeneratorState (as the PushSecret+Password generator does). If it caches, rotation silently fails and fleet loses DB access at the first rotation, a time-bomb CI cannot catch. Must be validated on a local cluster by forcing a rotation before prod. Also document the inherent post-rotation propagation window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…gates)

Isolated kind spike (ESO v2.5.0): changing the source value propagated to the synced Secret within one refresh cycle. Unlike the Password generator (caches via GeneratorState), the read-generator re-reads, so static-role rotation propagates and there is no silent-rotation time-bomb. Phase 1 unblocked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
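
For readers following the thread, the read path discussed in these commit messages could look roughly like the sketch below. It is illustrative only and is not taken from the PR. The resource names, namespace, server address, auth role, service account, `refreshInterval` value, and both `apiVersion` strings are assumptions that would need checking against the CRDs installed in the cluster. The only parts grounded in the discussion are the `database/static-creds/fleet` path, the `VaultDynamicSecret` generator kind, the `dataFrom.sourceRef.generatorRef` wiring, and the `mysql-password` target key.

```yaml
# Hypothetical sketch: every name, namespace, and address below is an assumption.
apiVersion: generators.external-secrets.io/v1alpha1   # assumed CRD version
kind: VaultDynamicSecret
metadata:
  name: fleet-mysql-static-creds        # hypothetical generator name
  namespace: fleetdm                    # assumed namespace
spec:
  path: database/static-creds/fleet     # static-role read path from the design
  method: GET
  provider:
    server: http://openbao.openbao.svc:8200   # assumed in-cluster address
    auth:
      kubernetes:
        mountPath: kubernetes
        role: external-secrets          # assumed Kubernetes auth role
        serviceAccountRef:
          name: external-secrets        # assumed service account
---
apiVersion: external-secrets.io/v1      # assumed CRD version
kind: ExternalSecret
metadata:
  name: mysql
  namespace: fleetdm                    # assumed namespace
spec:
  refreshInterval: 1m                   # illustrative; bounds the propagation window
  target:
    name: mysql
    template:
      data:
        # Map the engine's `password` field onto the key the chart reads.
        mysql-password: "{{ .password }}"
  dataFrom:
    - sourceRef:
        generatorRef:
          apiVersion: generators.external-secrets.io/v1alpha1
          kind: VaultDynamicSecret
          name: fleet-mysql-static-creds
```

The sketch omits the KV-backed `mysql-root-password` and `mysql-replication-password` keys for brevity; a real manifest would need to keep them in the same target `Secret`. The refresh interval matters because it sets the upper bound on how long the synced `Secret` can lag behind a rotation.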
- **OpenBao = KV v2 + Kubernetes auth only** (configured by the `vault-config`
  Job). No Database secrets engine is enabled yet.
- **Two sourcing paths coexist (mid-migration):**
  - **OpenBao → ESO**: generators (`vault-seed/generators.yaml`) seed OpenBao KV
    […] alertmanager.
  - **SOPS → Flux postBuild substitution**: `${dex_client_secret}`,
    `${flux_web_client_secret}`, `${oauth2_proxy_cookie_secret}` etc. are still
    read from `clusters/*/variables/variables-cluster-secret.enc.yaml`. **Dex

### Validated finding — `refreshInterval` does **not** rotate generators

The `push-generated-secrets.yaml` PushSecrets use `refreshInterval: "0"`, and the

| Class | Secrets | OpenBao-native fit | Plan |
| --- | --- | --- | --- |
| **Database creds** | fleetdm MySQL `fleet` user, Valkey | ✅ **Database engine, static roles** | Phase 1–2 below |

### Validation

- `kubectl kustomize k8s/clusters/{local,prod}/` build; `ksail workload validate`.
Comment on lines +43 to +44
OpenBao's **Database secrets engine** supports **static roles** for MySQL/MariaDB
and Valkey: OpenBao stores and **automatically rotates the password of an
Comment on lines +114 to +117
### Phase 2 — fleetdm Valkey

Same pattern using the Valkey database plugin and a static role for the redis
user, once Phase 1 is proven.

## Rollout order

1. Phase 1 — fleetdm MySQL static-role rotation (this design's flagship).
2. Phase 2 — fleetdm Valkey static-role rotation.
Comment on lines +23 to +27
The `push-generated-secrets.yaml` PushSecrets use `refreshInterval: "0"`, and the
comment implies a non-zero value would rotate. **It would not.** ESO **v2.5.0**'s
PushSecret reconciler persists a `GeneratorState` (statemanager) and reuses the
prior state, keeping generator output **stable** across reconciles. Empirically:
generated values have been unchanged for days and there are no live
Comment on lines +60 to +63
2. **Consume the rotated credential** — *not* a plain `ExternalSecret`. ESO's
   Vault provider supports **KV only** ("The KV Secrets Engine is the only one
   supported by this provider"), so the database engine must be read via the
   **`VaultDynamicSecret` generator** (`generators.external-secrets.io`). A
Why
Planning artifact for automated secret rotation, before any prod-affecting implementation — the flagship change (OpenBao Database engine credential handover on the prod fleetdm MySQL) is high-risk and stateful, so it warrants a reviewed design first.
Key validated finding
Setting a non-zero `refreshInterval` on the generator-backed PushSecrets would not rotate them. Reading ESO v2.5.0 source: the PushSecret reconciler persists a `GeneratorState` (statemanager) and reuses it, so generator output stays stable across reconciles. Corroborated empirically (values unchanged for days; zero live `GeneratorState` instances). Rotation needs an explicit mechanism, not a config flip.
Design summary
- […] `rotation_period`) for fleetdm MySQL `fleet` user (Phase 1) and Valkey (Phase 2). Static roles fit apps that read a credential from a `Secret` at startup; dynamic roles don't.
- […] `vault-config` Job, the credential-handover sequence (fresh + existing cluster), risks + rollback, and validation (CI's local-cluster system test exercises the chain).
Status
Proposal only — no manifests changed. Ready to implement Phase 1 as a follow-up PR on approval.
🤖 Generated with Claude Code