@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-05-06

@@ -0,0 +1,34 @@
## Why

The platform has three related governance gaps that compound as the fleet grows toward ~80 tenants.

First, the `nextcloud-platform` AppProject is bootstrap-only: its YAML in Git is the source of truth, but applying it requires a manual `kubectl apply`. This caused real drift. Commit 939db3d (4 May 2026) added a deny window for `nc-*` apps to block surprise auto-sync of shared values during office hours, but the fix sat unapplied for 2 days. On 6 May 2026, mid-day, a `common.yaml` change auto-synced to all tenants because the live AppProject still had only the platform-Application deny window.

Second, app version pinning is per-tenant only. The OPENCATALOGI/OPENCONNECTOR/OPENREGISTER env vars are constructed inside the ApplicationSet `goTemplate` (which can only see the tenant yaml), so versions cannot be pinned centrally in `common.yaml` or `env/*.yaml`, and every version bump requires editing 21+ tenant files.

Third, once versions *are* centrally pinned, a single `common.yaml` bump fans out to all tenants in one go: with v1.0.0 of the Conduction apps approaching, a simultaneous `occ upgrade` across the fleet would saturate S3, Redis, and PgBouncer.

## What Changes

- **Bootstrap Application for AppProject**: add an Argo CD Application in the built-in `default` project that watches `nextcloud-platform/argo/projects/` and self-applies the `nextcloud-platform` AppProject. After a one-time bootstrap, AppProject changes go via Git push.
- **Move version env-var construction from goTemplate to Helm-side `extraEnv`** in `common.yaml`. Define a structured `appVersions: {opencatalogi, openconnector, openregister}` key that Helm's layered merge resolves: `common.yaml` → `env/{accept,prod}.yaml` → `values/tenants/tenant-X.yaml`, last-wins.
- **BREAKING (semantic):** a missing version on a tenant changes meaning. Today, an unset `tenant.apps.versions.X` yields an empty env var, and the install hook installs "latest/whatever the bootstrap script does". After this change, an unset value inherits from the env layer, then from common (which will be pinned to v1.0.0 once it lands). Tenants implicitly tracking head will start tracking the central pin.
- **Deprecate** `tenant.apps.versions.*` field for one cycle; keep it readable as override fallback during the migration, then remove.
- **Add `nextcloud.platform/wave: "{{ .tenant.wave }}"` label** to Applications generated by the ApplicationSet so operators can do `argocd app sync -l nextcloud.platform/wave=N` for serialized wave-by-wave promotion.
- **Document phased promotion protocol** in `docs/ROLLOUTS.md`: bump `canary-overrides.yaml` (wave 0 only) → validate → bump `env/accept.yaml` → validate accept → bump `env/prod.yaml` and sync wave-by-wave with manual gates between waves.
- **Capacity check tasks** (no code changes): verify Argo controller `processors`, S3 connection limits, Redis maxclients, PgBouncer `default_pool_size` against fleet-wide upgrade burst.
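
The wave label from the list above could be attached in the ApplicationSet template roughly as follows. This is a fragment, not a complete manifest: the generator and the rest of the template are omitted, and it assumes each tenant file exposes a `tenant.wave` field, as the label expression implies.

```yaml
# Sketch of the relevant part of
# nextcloud-platform/argo/applicationsets/nextcloud-tenants.yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
spec:
  goTemplate: true
  template:
    metadata:
      labels:
        # Quoted so the rendered wave number is emitted as a string,
        # which is what Kubernetes label values must be.
        nextcloud.platform/wave: "{{ .tenant.wave }}"
```

With the label in place, one wave can be selected at a time, for example `argocd app sync -l nextcloud.platform/wave=0`.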

## Capabilities

### New Capabilities
- `argo-self-managed-appproject`: How the `nextcloud-platform` AppProject is itself managed by Argo CD via a bootstrap Application, eliminating manual `kubectl apply` drift.
- `app-version-pinning`: How Conduction app versions (opencatalogi/openconnector/openregister) are pinned and inherited via Helm's layered values merge across `common.yaml` → `env/*.yaml` → `tenants/tenant-*.yaml`.
- `wave-gated-rollout`: How fleet-wide changes propagate in serialized waves with operator-gated promotion, using sync-wave annotations and a new platform-wave label.

### Modified Capabilities
<!-- None. No existing specs in openspec/specs/ — this is the first change in this repo. -->

## Impact

- **Argo CD config**: new `Application` manifest under `nextcloud-platform/argo/bootstrap/` (or similar), one-time `kubectl apply` to land it. AppProject afterwards is GitOps-managed.
- **ApplicationSet template** (`nextcloud-platform/argo/applicationsets/nextcloud-tenants.yaml`): remove version env-var synthesis from `goTemplate`, add `nextcloud.platform/wave` label.
- **Helm values** (`nextcloud-platform/values/common.yaml`): add `appVersions:` block + `extraEnv` entries that read from it with tenant-override fallback. Touches `env/accept.yaml` and `env/prod.yaml` if/when env-level overrides are needed at v1.0.0 time.
- **Tenant yaml files** (`nextcloud-platform/values/tenants/tenant-*.yaml`): batch migration from `tenant.apps.versions.*` to top-level `appVersions.*` (or strip entirely where the tenant should follow the platform default). Honor the existing `*vng*` validation exception.
- **Docs**: `docs/ROLLOUTS.md` gains the phased promotion protocol; `docs/ADDING-TENANT.md` updated to reflect new version-pinning location.
- **Validation script** (`scripts/validate-values.sh`): teach it about `appVersions.*` so missing/malformed entries are caught.
- **Operational runbook**: capacity-check tasks for Argo controller processors, S3, Redis, PgBouncer.
- **No data plane changes** — no DB migrations, no PVC moves, no namespace renames. Workloads keep running across the rollout.
@@ -0,0 +1,77 @@
## ADDED Requirements

### Requirement: App versions are defined under a top-level `appVersions` key

The Helm values for tenant deployments SHALL accept a top-level `appVersions` map with the keys `opencatalogi`, `openconnector`, and `openregister`, each holding a string version (e.g. `"1.0.0"`). This key MAY appear in any layer of the layered values architecture: `common.yaml`, `env/{accept,prod}.yaml`, or `values/tenants/tenant-*.yaml`.

#### Scenario: appVersions defined in common.yaml is the platform default
- **WHEN** `nextcloud-platform/values/common.yaml` defines `appVersions: {opencatalogi: "1.0.0", openconnector: "1.0.0", openregister: "1.0.0"}`
- **AND** no other layer overrides it
- **THEN** every tenant's Nextcloud pod MUST receive `OPENCATALOGI_VERSION=1.0.0`, `OPENCONNECTOR_VERSION=1.0.0`, `OPENREGISTER_VERSION=1.0.0` as environment variables
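
As a minimal sketch, the platform default from this scenario would look like the fragment below in `common.yaml`. The surrounding keys of the real file are not shown.

```yaml
# nextcloud-platform/values/common.yaml (fragment)
appVersions:
  # Quoted so YAML keeps each version as a string; an unquoted
  # value such as 1.0 would otherwise be parsed as a number.
  opencatalogi: "1.0.0"
  openconnector: "1.0.0"
  openregister: "1.0.0"
```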

#### Scenario: env-level override applies to all tenants in that environment
- **WHEN** `common.yaml` sets `appVersions.opencatalogi: "1.0.0"`
- **AND** `env/accept.yaml` sets `appVersions.opencatalogi: "1.1.0-rc"`
- **THEN** every tenant with `tenant.environment: accept` MUST receive `OPENCATALOGI_VERSION=1.1.0-rc`
- **AND** every tenant with `tenant.environment: prod` MUST receive `OPENCATALOGI_VERSION=1.0.0`

#### Scenario: tenant-level override beats env and common
- **WHEN** `common.yaml` sets `appVersions.opencatalogi: "1.0.0"`
- **AND** `env/accept.yaml` sets `appVersions.opencatalogi: "1.1.0-rc"`
- **AND** `values/tenants/tenant-X-accept.yaml` sets `appVersions.opencatalogi: "1.0.5-hotfix"`
- **THEN** tenant `X-accept` MUST receive `OPENCATALOGI_VERSION=1.0.5-hotfix`
- **AND** all other accept tenants MUST receive `OPENCATALOGI_VERSION=1.1.0-rc`
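
The three layers in this scenario can be pictured as the following value fragments, shown here as separate YAML documents for illustration only (in the repository they live in separate files).

```yaml
# common.yaml: platform default
appVersions:
  opencatalogi: "1.0.0"
---
# env/accept.yaml: applies to every accept tenant
appVersions:
  opencatalogi: "1.1.0-rc"
---
# values/tenants/tenant-X-accept.yaml: applies to this tenant only
appVersions:
  opencatalogi: "1.0.5-hotfix"
```

Helm merges maps key by key, so a layer that overrides only `opencatalogi` leaves the `openconnector` and `openregister` values from earlier layers untouched.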

#### Scenario: Layered merge order is common → env → db → tenant, last-wins
- **WHEN** the same `appVersions.X` key is set in multiple layers
- **THEN** the value from the layer applied last in the Helm merge MUST win, following the documented order in `CLAUDE.md`: `common.yaml`, then `env/<env>.yaml`, then `db/<dbType>.yaml`, then `values/tenants/tenant-<name>.yaml`
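
One way this order might be expressed, assuming the ApplicationSet passes the layers through the Application's `spec.source.helm.valueFiles` list. The actual wiring is not shown in this change, so the file paths and the `.tenant.dbType` and `.tenant.name` fields below are hypothetical.

```yaml
# Hypothetical fragment of the ApplicationSet template's source block.
spec:
  template:
    spec:
      source:
        helm:
          # Helm applies value files in the order listed;
          # a later file overrides keys set by an earlier one.
          valueFiles:
            - values/common.yaml
            - "values/env/{{ .tenant.environment }}.yaml"
            - "values/db/{{ .tenant.dbType }}.yaml"
            - "values/tenants/tenant-{{ .tenant.name }}.yaml"
```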

### Requirement: Version env vars are emitted by Helm, not by the ApplicationSet goTemplate

The `OPENCATALOGI_VERSION`, `OPENCONNECTOR_VERSION`, and `OPENREGISTER_VERSION` environment variables on the Nextcloud pod MUST be defined via Helm-side `extraEnv` entries in `nextcloud-platform/values/common.yaml`. They MUST NOT be constructed inside the ApplicationSet `goTemplate` `values:` block.

#### Scenario: ApplicationSet goTemplate no longer references app versions
- **WHEN** `nextcloud-platform/argo/applicationsets/nextcloud-tenants.yaml` is inspected
- **THEN** its `goTemplate` values block MUST NOT contain `OPENCATALOGI_VERSION`, `OPENCONNECTOR_VERSION`, or `OPENREGISTER_VERSION`

#### Scenario: common.yaml contains the version env-var definitions
- **WHEN** `nextcloud-platform/values/common.yaml` is inspected
- **THEN** `nextcloud.extraEnv` MUST contain entries for `OPENCATALOGI_VERSION`, `OPENCONNECTOR_VERSION`, and `OPENREGISTER_VERSION`
- **AND** each entry's `value` MUST be a Helm template expression that resolves to `.Values.tenant.apps.versions.<app>` when that legacy field is set, and otherwise to `.Values.appVersions.<app>`
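
A possible shape for one such entry is sketched below. It assumes the chart renders `extraEnv` values through Helm's `tpl` function (plain values files are not templated otherwise) and that `appVersions` is always defined in `common.yaml`; neither assumption is confirmed by this change. `dig` and `default` are standard Helm (Sprig) template functions.

```yaml
# nextcloud-platform/values/common.yaml (fragment, sketch)
nextcloud:
  extraEnv:
    - name: OPENCATALOGI_VERSION
      # Use the legacy tenant.apps.versions.opencatalogi when it is set;
      # dig returns "" when any key on that path is missing, and default
      # then falls back to the layered appVersions.opencatalogi value.
      value: '{{ dig "apps" "versions" "opencatalogi" "" (.Values.tenant | default dict) | default .Values.appVersions.opencatalogi }}'
```

The entries for `OPENCONNECTOR_VERSION` and `OPENREGISTER_VERSION` would follow the same pattern with the app name swapped.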

### Requirement: Legacy `tenant.apps.versions.*` field remains an override for one deprecation cycle

During the deprecation cycle, the Helm template for each version env var MUST honor `.Values.tenant.apps.versions.<app>` as a final override that wins over `.Values.appVersions.<app>`. This preserves any pre-existing per-tenant pin until the migration is complete.

#### Scenario: Legacy pin still wins during deprecation cycle
- **WHEN** `common.yaml` sets `appVersions.opencatalogi: "1.0.0"`
- **AND** `values/tenants/tenant-Y-prod.yaml` still uses the legacy form `tenant.apps.versions.opencatalogi: "0.9.5"`
- **THEN** tenant `Y-prod` MUST receive `OPENCATALOGI_VERSION=0.9.5`
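
For illustration, the legacy pin from this scenario and its migrated equivalent might look like this. Both forms resolve to `OPENCATALOGI_VERSION=0.9.5` during the deprecation cycle.

```yaml
# values/tenants/tenant-Y-prod.yaml, legacy form (deprecated, still honored)
tenant:
  apps:
    versions:
      opencatalogi: "0.9.5"
---
# The same pin after migration to the top-level key
appVersions:
  opencatalogi: "0.9.5"
```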

#### Scenario: Documentation flags legacy field as deprecated
- **WHEN** `docs/ADDING-TENANT.md` or the tenant template (`nextcloud-platform/values/templates/tenant-template.yaml`) is inspected
- **THEN** any reference to `tenant.apps.versions.*` MUST be marked as deprecated
- **AND** the recommended location for setting versions MUST be documented as `appVersions.*` in `common.yaml` or `env/*.yaml`

### Requirement: Validation script enforces appVersions presence in common.yaml

`scripts/validate-values.sh` MUST fail when `nextcloud-platform/values/common.yaml` does not define all three of `appVersions.opencatalogi`, `appVersions.openconnector`, `appVersions.openregister` as non-empty strings.

#### Scenario: validate-values.sh blocks empty appVersions
- **WHEN** an operator removes one of the three `appVersions.*` keys from `common.yaml`
- **AND** runs `./scripts/validate-values.sh`
- **THEN** the script MUST exit non-zero with a message identifying the missing key

#### Scenario: vng tenant exception preserved
- **WHEN** `validate-values.sh` runs against the full `values/tenants/` directory
- **THEN** any failure originating in a file matching `*vng*` MUST be ignored, consistent with the existing project convention

### Requirement: Missing version means inherit, not "latest"

After this change is fully migrated (i.e., legacy fallback removed in the follow-on cleanup commit), an unset `appVersions.<app>` at every layer MUST cause the corresponding env var to be empty. The install hook in `common.yaml` MAY treat an empty value as "use the chart default" or "fail loudly" — operators MUST NOT rely on empty meaning "install latest from upstream".

#### Scenario: Empty version is no longer an implicit "latest"
- **WHEN** the deprecation cycle ends and the legacy fallback is removed
- **AND** `common.yaml` defines `appVersions.opencatalogi: "1.0.0"`
- **THEN** every tenant inherits `OPENCATALOGI_VERSION=1.0.0` unless an env or tenant layer explicitly overrides
- **AND** there is no path by which an unset version resolves to a moving "latest" tag without operator intent
@@ -0,0 +1,54 @@
## ADDED Requirements

### Requirement: AppProject is GitOps-managed via a bootstrap Application

The platform SHALL include an Argo CD `Application` resource, named `appproject-bootstrap`, that watches the directory `nextcloud-platform/argo/projects/` in this repository and applies its contents to the `argocd` namespace. After a one-time `kubectl apply` of the bootstrap Application itself, all subsequent changes to `nextcloud-platform/argo/projects/nextcloud-platform.yaml` (or any other file under that path) MUST take effect via Git push without further manual `kubectl apply`.

#### Scenario: AppProject change in Git is auto-applied to cluster
- **WHEN** an operator edits `nextcloud-platform/argo/projects/nextcloud-platform.yaml` and pushes to `upstream/main`
- **THEN** within Argo CD's normal refresh interval (≤3 minutes by default), the live `AppProject` named `nextcloud-platform` in the `argocd` namespace MUST reflect the committed YAML
- **AND** no manual `kubectl apply` step is required for the change to take effect

#### Scenario: Bootstrap Application is itself bootstrap-only
- **WHEN** the bootstrap Application manifest is changed in Git
- **THEN** the change MUST require a manual `kubectl apply` to take effect, because the bootstrap Application is the recursion-base and cannot manage itself

### Requirement: Bootstrap Application uses the built-in `default` project

The bootstrap Application's `spec.project` field MUST be set to `default` (the built-in Argo CD project). It MUST NOT use the `nextcloud-platform` AppProject.

#### Scenario: Bootstrap Application targets argocd namespace under default project
- **WHEN** the bootstrap Application is inspected
- **THEN** `spec.project` equals `default`
- **AND** `spec.destination.namespace` equals `argocd`
- **AND** the `nextcloud-platform` AppProject's `spec.destinations` list does NOT include the `argocd` namespace

### Requirement: Bootstrap Application disables auto-prune and preserves resources on deletion

The bootstrap Application's `spec.syncPolicy` MUST enable `automated` sync but MUST disable `prune` (set to `false` or omitted). The Application MUST NOT carry the `resources-finalizer.argocd.argoproj.io` finalizer, so that deleting the Application is non-cascading and accidental deletion does not delete the live AppProject. (`preserveResourcesOnDeletion` is an ApplicationSet `syncPolicy` field, not an Application field, so it cannot be set here.)

#### Scenario: Stale AppProject manifest in Git does not delete live AppProject
- **WHEN** the file `nextcloud-platform/argo/projects/nextcloud-platform.yaml` is removed from Git
- **AND** Argo CD refreshes
- **THEN** the live `AppProject` named `nextcloud-platform` in the `argocd` namespace MUST remain present (no auto-prune)

#### Scenario: Deleting bootstrap Application preserves AppProject
- **WHEN** an operator runs `kubectl delete application appproject-bootstrap -n argocd`
- **THEN** the live `AppProject` named `nextcloud-platform` MUST remain present
- **AND** all tenant Applications referencing project `nextcloud-platform` MUST continue to operate
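
Taken together, the requirements above suggest a manifest along these lines. It is a sketch under stated assumptions: the `repoURL` is a placeholder, `targetRevision: main` is inferred from the `upstream/main` push target, and the in-cluster destination server is assumed.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: appproject-bootstrap
  namespace: argocd
  # No resources-finalizer.argocd.argoproj.io finalizer is set, so
  # deleting this Application does not cascade to the AppProject.
spec:
  project: default            # built-in project, not nextcloud-platform
  source:
    repoURL: https://example.com/your-org/your-repo.git  # placeholder
    targetRevision: main      # assumption
    path: nextcloud-platform/argo/projects
  destination:
    server: https://kubernetes.default.svc  # assumption: in-cluster
    namespace: argocd
  syncPolicy:
    automated:
      prune: false            # removing a file in Git never deletes the live AppProject
```

The file would be applied once by hand with `kubectl apply -f`; after that, only changes to the bootstrap manifest itself need another manual apply.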

### Requirement: Bootstrap Application is auditable in the same repo as the AppProject

The bootstrap Application manifest MUST be checked in under `nextcloud-platform/argo/bootstrap/` (or a clearly named sibling directory under `nextcloud-platform/argo/`) so that operators can locate it without out-of-band knowledge.

#### Scenario: Operator finds bootstrap Application via repo layout
- **WHEN** an operator searches the repository for the manifest that bootstraps the `nextcloud-platform` AppProject
- **THEN** they find a single YAML file under `nextcloud-platform/argo/bootstrap/` (or equivalent named directory) containing an Argo CD `Application` resource with `metadata.name: appproject-bootstrap`

### Requirement: Manual `kubectl apply` remains a documented fallback during one deprecation cycle

For the first rollout cycle after this change lands, the documentation (`docs/SECRETS.md`, `docs/ROLLOUTS.md`, or a dedicated runbook) MUST retain instructions for manually applying the AppProject as a fallback in case the bootstrap Application is unavailable.

#### Scenario: Operator can recover when bootstrap Application is broken
- **WHEN** the bootstrap Application is in `Unknown` or `Degraded` state and the AppProject must be updated urgently
- **THEN** the documentation MUST instruct the operator to run `kubectl apply -f nextcloud-platform/argo/projects/nextcloud-platform.yaml` directly, with no other prerequisites