Summary
When the operator falls into the Update → Recreate StatefulSet path and r.sts.Update() inside doDeleteStatefulSet returns any non-nil error, three compounding bugs in pkg/controller/common/statefulset/statefulset-reconciler.go cause the reconcile to end with the StatefulSet stuck at spec.replicas: 0, no further retry, and the CHI marked Completed. The only recovery is to bounce the operator pod or kubectl scale --replicas=1 manually.
We hit this in production during a chart upgrade 0.24.2 → 0.26.3 (full sanitized trace attached as operator-logs.txt). The bugs are version-agnostic: the affected file is byte-identical between release-0.26.1, release-0.26.2, release-0.26.3, release-0.27.0, and master.
This issue is intermittent. The buggy code path is reached deterministically on every reconcile that takes the Update → Recreate branch, but whether the cascade pins a stuck end-state depends on whether r.sts.Update() at line 527 inside doDeleteStatefulSet returns a non-nil error in that particular call. The same trigger sequence on the same cluster will sometimes produce a clean recreate and sometimes the stuck cascade. Our reproduction (see below) reflects this intermittency by retrying until the cascade fires; production hit it on the first attempt during a chart upgrade 0.24.2 → 0.26.3.
Affected versions
- Reproduced: 0.26.3
- Code-path verified unchanged in: 0.27.0, master (HEAD)
- Likely present since
recreateStatefulSet was introduced in this form (≥ 0.24.x)
Root cause: three compounding bugs
All line numbers are against pkg/controller/common/statefulset/statefulset-reconciler.go at tag release-0.26.3. Same lines on master.
The three bugs compound: any single one in isolation would either be benign or self-recover. Together, they produce silent end-of-reconcile at Replicas=0 with CHI=Completed. Bug 2 is the chain's first link — fixing Bug 2 alone breaks the cascade and resolves the production failure (see Proposed fix below). Bugs 1 and 3 are described here for completeness.
Bug 1 — recreateStatefulSet discards the error from doDeleteStatefulSet (line 195)
func (r *Reconciler) recreateStatefulSet(ctx context.Context, host *api.Host, register bool, opts *ReconcileOptions) error {
...
_ = r.doDeleteStatefulSet(ctx, host) // <-- error silently discarded
_ = r.storage.ReconcilePVCs(ctx, host, api.DesiredStatefulSet)
return r.createStatefulSet(ctx, host, register, opts)
}
If the graceful delete fails, the recreate proceeds anyway. The original STS is still in the cluster, so the next call will predictably fail with AlreadyExists.
Bug 2 — doDeleteStatefulSet aborts before reaching Delete if the scale-to-0 Update returns any error (lines 525-530)
var zero int32 = 0
host.Runtime.CurStatefulSet.Spec.Replicas = &zero
if _, err := r.sts.Update(ctx, host.Runtime.CurStatefulSet); err != nil {
log.V(1).M(host).Error("UNABLE to update StatefulSet %s/%s", namespace, name)
return err // <-- returns here, never reaches r.sts.Delete() at line 536
}
The function name says doDeleteStatefulSet, the contract is "the STS is gone after I return", but the body abandons the goal on a transient Update error and never falls through to r.sts.Delete() — which is what would actually fulfill the contract.
Bug 3 — doCreateStatefulSet collapses every Create error (incl. AlreadyExists) to ErrCRUDRecreate, which is silently mapped to nil (lines 411-414 and 388-391)
// doCreateStatefulSet
if _, err := r.sts.Create(ctx, statefulSet); err != nil {
log.V(1).M(host).F().Error("StatefulSet create failed. err: %v", err)
return common.ErrCRUDRecreate
}
// shouldAbortOrContinueCreateStatefulSet
case common.ErrCRUDRecreate:
r.a.V(1).M(host).Warning("Got recreate action. Ignore and continue for now")
return nil
The parent caller reconcileHostStatefulSet (pkg/controller/chi/worker-reconciler-chi.go:413) registers the STS as successfully reconciled and the CHI moves to Completed despite the host being at Replicas=0.
Production trace
Full sanitized log: operator-logs.txt. Key sequence (line numbers in [brackets] reference the source file above):
hostScaleDown(): Host shutdown via scale down: 0-0
diff item [15]:'.Replicas' = '0' # STS now at Replicas=0
updateStatefulSet(...) - started
updateStatefulSet(...) switch from Update to Recreate # recreateStatefulSet begins [186]
doDeleteStatefulSet(): .../chi-clickhouse-chcluster1-0-0 [508]
UNABLE to update StatefulSet .../chi-clickhouse-chcluster1-0-0 [528] # Bug 2 fires
# r.sts.Update() returned a non-nil error;
# function returned, never called Delete()
createStatefulSet(...) - started
doCreateStatefulSet(): ...
StatefulSet create failed. err: statefulsets.apps "..." already exists [412] # consequence of Bugs 1+2
Got recreate action. Ignore and continue for now [390] # Bug 3 fires
finalizeReconcileAndMarkCompleted(): reconcile completed successfully # CHI -> Completed
Note: the underlying error type returned by r.sts.Update() at line 527 in our production trace is not directly visible in the log because the logging at line 528 (log.V(1).M(host).Error("UNABLE to update StatefulSet %s/%s", namespace, name)) only formats namespace/name and silently discards the err value. Consequently, while the cascade signature is unambiguous (Bug 2 fired), the specific apiserver error class (409 / 422 / 403 / 503 / network) at this call site is unknown from the log alone. Bug 2's return err is unconditional — the cascade fires identically for any non-nil error, so the proposed fix is correct regardless. Logging the underlying error here would be a useful diagnostic improvement on its own.
What the production log does prove: Operation cannot be fulfilled ... the object has been modified (the apiserver's verbatim 409 Conflict message) occurs on this cluster at runtime — see cr.go:82 statusUpdateRetry() at log line 2065. This site catches and retries it correctly. The doDeleteStatefulSet site is identical machinery without retry-on-conflict logic.
End state, persisted across operator reconciles:
$ kubectl -n default get sts chi-clickhouse-chcluster1-0-0
NAME READY AGE
chi-clickhouse-chcluster1-0-0 0/0 28d
$ kubectl -n default get chi clickhouse -o jsonpath='{.status.status}'
Completed
The operator never picks the CHI up again. Bouncing the operator pod recovers it.
What triggers this in production
Two contributing factors that together produced the stuck end-state in our case:
- Operator chart upgrade with pod-template diff (
0.24.2 → 0.26.3). All existing CHIs reconcile at startup of the new operator pod; the chart upgrade introduced fields that landed in the immutable region of StatefulSet.spec (selector, serviceName, volumeClaimTemplates), which made doUpdateStatefulSet fail at line 470 and the operator switch to the recreate path. This is directly observed in the production operator log.
- A non-nil error returned by
r.sts.Update() at line 527 during doDeleteStatefulSet. The log confirms this failure occurred (UNABLE to update StatefulSet, line 528) but does not reveal the underlying error class. The most likely candidate is a 409 Conflict race against kube-controller-manager writing .status.observedGeneration / .status.currentRevision / .status.replicas during pod termination after hostScaleDown set Replicas=0 ~9 seconds earlier. The operator's Get → Update window inside doDeleteStatefulSet (lines 512-527) was 261 ms in our production trace (10:22:20.325250 → 10:22:20.586212), wide enough for multiple status writes by kube-controller-manager to land inside it. The 409-via-RV-race is a strongly-supported hypothesis, not a directly-confirmed fact. Other plausible candidates (422 Invalid, 403 Forbidden from a webhook, 503 apiserver/etcd transient, network timeout) are all consistent with the same log evidence and produce the same downstream cascade because Bug 2 doesn't differentiate.
Whatever the specific apiserver error was, factor #2 is intermittent: same trigger #1 on the same cluster sometimes produces a clean recreate and sometimes the stuck cascade, depending on the millisecond-scale interleaving inside doDeleteStatefulSet's Get → Update window. Our reproduction (below) confirms the cascade fires from 409 Conflict specifically, but the proposed fix corrects Bug 2 for any non-nil error class.
Reproduction (kind cluster)
repro/ — full reproduction with real 409 Conflict. Topology matches production: 1 CHI + 1 CHK on stock operator 0.26.3.
The setup:
- Pre-stage
STS to 0 replicas (simulates production's hostSoftwareRestart-fail → hostScaleDown outcome on operator startup)
- Run a Go racer (
rv-racer/) doing 32 concurrent Get → mutate annotations → Update loops on the STS — faithful proxy for kube-controller-manager's STS controller writing .status fields during pod transitions
- Patch
CHI.spec.templates.volumeClaimTemplates[0].spec.resources.requests.storage (immutable on STS) — forces doUpdateStatefulSet to fail 422 Invalid, switching operator to recreate path
recreateStatefulSet → doDeleteStatefulSet → Get (RV=N) → racer's update lands (RV=N+1) → operator's Update → real 409 Conflict → Bug 2 fires → cascade
Because the bug is intermittent (the racer must land an Update in the operator's ~5-30 ms Get → Update window), the script retries up to 8 times until the cascade fires; typically fires on attempt 1-3.
A control test (run-control.sh) runs the same trigger sequence with the racer disabled — it recovers cleanly. This A/B proves the 409 is the differentiator between healthy reconcile and the stuck cascade.
Side-by-side cascade comparison
| Event |
Source line |
Production |
Kind reproduction |
switch from Update to Recreate |
:292 |
I0519 10:22:20.309900 |
I0523 05:11:17.096780 |
doDeleteStatefulSet started |
:508 |
I0519 10:22:20.586211 |
I0523 05:11:17.103113 |
UNABLE to update StatefulSet |
:528 |
E0519 10:22:20.586212 |
E0523 05:11:17.131153 |
StatefulSet create failed ... already exists |
:412 |
E0519 10:22:20.911908 |
E0523 05:11:17.175664 |
Got recreate action. Ignore and continue for now |
:390 |
W0519 10:22:22.067685 |
W0523 05:11:17.241865 |
reconcile completed successfully |
worker.go:415 |
I0519 10:54:08.271354 |
I0523 05:12:30.431607 |
(Full repro log: repro/minimal/operator.log. Cascade lines only: repro/minimal/cascade.log.)
Proposed fix
The minimal correct fix is a single change to doDeleteStatefulSet. A second optional change to recreateStatefulSet adds defense-in-depth without altering existing CRUD-error semantics.
Patch A (canonical) — doDeleteStatefulSet must honor its contract
The function's name and contract say "the StatefulSet is gone after I return". Today, if the courtesy scale-to-0 Update fails for any reason, the function returns at line 529 and never calls Delete — the very call that would have actually removed the object. Fall through to Delete regardless of the Update outcome. Delete is what guarantees the contract; the scale-to-0 is a graceful-shutdown courtesy that's safe to skip when it fails.
func (r *Reconciler) doDeleteStatefulSet(ctx context.Context, host *api.Host) error {
name := r.namer.Name(interfaces.NameStatefulSet, host)
namespace := host.Runtime.Address.Namespace
log.V(1).M(host).F().Info("%s/%s", namespace, name)
cur, err := r.sts.Get(ctx, host)
if err != nil {
if apiErrors.IsNotFound(err) {
log.V(1).M(host).Info("NEUTRAL not found StatefulSet %s/%s", namespace, name)
return nil
}
log.V(1).M(host).F().Error("FAIL get StatefulSet %s/%s err:%v", namespace, name, err)
return err
}
host.Runtime.CurStatefulSet = cur
// Courtesy scale-to-0 for graceful pod termination.
// Skip if already at 0; on any failure, proceed to Delete which terminates pods anyway.
if cur.Spec.Replicas == nil || *cur.Spec.Replicas != 0 {
var zero int32 = 0
cur.Spec.Replicas = &zero
if _, uerr := r.sts.Update(ctx, cur); uerr != nil {
// INTENTIONAL: do not return here. The function contract is "STS is gone" and
// r.sts.Delete below is what fulfills it. Update-to-zero failures are common
// (e.g. 409 Conflict from kube-controller-manager writing .status fields during
// pod transitions) and must not block the actual delete.
log.V(1).M(host).Warning(
"Scale-to-0 update failed (%v) - proceeding to Delete which will terminate pods", uerr)
} else {
_ = r.hostObjectsPoller.WaitHostStatefulSetReady(ctx, host)
}
} else {
log.V(1).M(host).Info("StatefulSet %s/%s already at Replicas=0, skipping scale-down", namespace, name)
}
if derr := r.sts.Delete(ctx, namespace, name); derr != nil && !apiErrors.IsNotFound(derr) {
log.V(1).M(host).F().Error("FAIL delete StatefulSet %s/%s err: %v", namespace, name, derr)
return derr
}
log.V(1).M(host).Info("OK delete StatefulSet %s/%s", namespace, name)
return nil
}
This single change resolves the bug end-to-end: Delete removes the StatefulSet, the subsequent r.sts.Create() succeeds (no AlreadyExists), the host completes its recreate cleanly, and the CHI legitimately moves to Completed. Bugs 1 and 3 become academic because their cascade preconditions are no longer met.
The fix is correct against all error types (409 Conflict, 422 Invalid, 403 Forbidden, 503 Unavailable, network timeouts) since all are handled identically by deferring to Delete. It preserves graceful shutdown when the courtesy Update succeeds, and doesn't change CRUD error propagation — no risk of accidentally flipping CHIs to ReconcileAbort.
Patch B (optional, defense-in-depth) — recreateStatefulSet shouldn't silently swallow delete errors
With Patch A in place, doDeleteStatefulSet only returns a non-nil error if Get or Delete itself failed in an unrecoverable way. That signal shouldn't be discarded:
func (r *Reconciler) recreateStatefulSet(ctx context.Context, host *api.Host, register bool, opts *ReconcileOptions) error {
if util.IsContextDone(ctx) {
return nil
}
if err := r.doDeleteStatefulSet(ctx, host); err != nil {
r.a.V(1).M(host).Warning(
"Recreate deferred: delete failed: %v - will retry on next reconcile", err)
return err // plain error — propagates to ReconcileStatefulSet caller, NOT ErrCRUDAbort
}
_ = r.storage.ReconcilePVCs(ctx, host, api.DesiredStatefulSet)
return r.createStatefulSet(ctx, host, register, opts)
}
Returning a plain error (not common.ErrCRUDAbort) keeps the failure transient — the next reconcile pass will retry, matching the contract for create/update failures handled via OnStatefulSetCreateFailed/OnStatefulSetUpdateFailed. Returning ErrCRUDAbort here would flip the CHI to ReconcileAbort and require manual intervention, which is the wrong policy for transient errors.
Happy to open a PR with Patch A (and optionally Patch B) plus a unit test that constructs a fake IKubeSTS returning Conflict on the scale-to-0 Update and asserts that doDeleteStatefulSet (a) still calls Delete, (b) returns nil, and that recreateStatefulSet end-to-end produces a fresh StatefulSet rather than leaving the cluster at Replicas=0.
Environment
- Operator: 0.26.3 (image tag pinned, upstream chart, no patches)
- ClickHouse: 25.8.17
- Kubernetes: 1.27
reconcile.statefulSet.recreate.onUpdateFailure: recreate (upstream default)
reconcile.statefulSet.update.onFailure: abort (upstream default)
- CHI: 1 shard × 1 replica, separate CHK with 3 keepers
Attached files (All part of the attached zip)
clickhouse.yaml — sanitized ClickHouseInstallation manifest reconciled in the production trace.
operator-values.yaml — sanitized values.yaml for the operator chart.
operator-logs.txt — sanitized operator pod log covering the failing reconcile pass.
repro/ — kind reproduction (Go racer + reproduction script + control test + captured operator logs + stuck CHI/STS yaml).
In the three top-level attachments, only environment-specific identifiers were renamed (namespace, internal user name, pod IPs, image registry, internal Secret/Volume names). Line numbers, timestamps, message text, and diff structure are preserved verbatim.
issue.zip
Summary
When the operator falls into the
Update → RecreateStatefulSet path andr.sts.Update()insidedoDeleteStatefulSetreturns any non-nil error, three compounding bugs inpkg/controller/common/statefulset/statefulset-reconciler.gocause the reconcile to end with the StatefulSet stuck atspec.replicas: 0, no further retry, and the CHI markedCompleted. The only recovery is to bounce the operator pod orkubectl scale --replicas=1manually.We hit this in production during a chart upgrade
0.24.2 → 0.26.3(full sanitized trace attached asoperator-logs.txt). The bugs are version-agnostic: the affected file is byte-identical betweenrelease-0.26.1,release-0.26.2,release-0.26.3,release-0.27.0, andmaster.This issue is intermittent. The buggy code path is reached deterministically on every reconcile that takes the
Update → Recreatebranch, but whether the cascade pins a stuck end-state depends on whetherr.sts.Update()at line 527 insidedoDeleteStatefulSetreturns a non-nil error in that particular call. The same trigger sequence on the same cluster will sometimes produce a clean recreate and sometimes the stuck cascade. Our reproduction (see below) reflects this intermittency by retrying until the cascade fires; production hit it on the first attempt during a chart upgrade0.24.2 → 0.26.3.Affected versions
recreateStatefulSetwas introduced in this form (≥ 0.24.x)Root cause: three compounding bugs
All line numbers are against
pkg/controller/common/statefulset/statefulset-reconciler.goat tagrelease-0.26.3. Same lines onmaster.The three bugs compound: any single one in isolation would either be benign or self-recover. Together, they produce silent end-of-reconcile at
Replicas=0withCHI=Completed. Bug 2 is the chain's first link — fixing Bug 2 alone breaks the cascade and resolves the production failure (seeProposed fixbelow). Bugs 1 and 3 are described here for completeness.Bug 1 —
recreateStatefulSetdiscards the error fromdoDeleteStatefulSet(line 195)If the graceful delete fails, the recreate proceeds anyway. The original STS is still in the cluster, so the next call will predictably fail with
AlreadyExists.Bug 2 —
doDeleteStatefulSetaborts before reachingDeleteif the scale-to-0Updatereturns any error (lines 525-530)The function name says
doDeleteStatefulSet, the contract is "the STS is gone after I return", but the body abandons the goal on a transientUpdateerror and never falls through tor.sts.Delete()— which is what would actually fulfill the contract.Bug 3 —
doCreateStatefulSetcollapses everyCreateerror (incl.AlreadyExists) toErrCRUDRecreate, which is silently mapped tonil(lines 411-414 and 388-391)The parent caller
reconcileHostStatefulSet(pkg/controller/chi/worker-reconciler-chi.go:413) registers the STS as successfully reconciled and the CHI moves toCompleteddespite the host being atReplicas=0.Production trace
Full sanitized log:
operator-logs.txt. Key sequence (line numbers in[brackets]reference the source file above):Note: the underlying error type returned by
r.sts.Update()at line 527 in our production trace is not directly visible in the log because the logging at line 528 (log.V(1).M(host).Error("UNABLE to update StatefulSet %s/%s", namespace, name)) only formatsnamespace/nameand silently discards theerrvalue. Consequently, while the cascade signature is unambiguous (Bug 2 fired), the specific apiserver error class (409 / 422 / 403 / 503 / network) at this call site is unknown from the log alone. Bug 2'sreturn erris unconditional — the cascade fires identically for any non-nil error, so the proposed fix is correct regardless. Logging the underlying error here would be a useful diagnostic improvement on its own.What the production log does prove:
Operation cannot be fulfilled ... the object has been modified(the apiserver's verbatim409 Conflictmessage) occurs on this cluster at runtime — seecr.go:82statusUpdateRetry()at log line 2065. This site catches and retries it correctly. ThedoDeleteStatefulSetsite is identical machinery without retry-on-conflict logic.End state, persisted across operator reconciles:
The operator never picks the CHI up again. Bouncing the operator pod recovers it.
What triggers this in production
Two contributing factors that together produced the stuck end-state in our case:
0.24.2 → 0.26.3). All existing CHIs reconcile at startup of the new operator pod; the chart upgrade introduced fields that landed in the immutable region ofStatefulSet.spec(selector, serviceName, volumeClaimTemplates), which madedoUpdateStatefulSetfail at line 470 and the operator switch to the recreate path. This is directly observed in the production operator log.r.sts.Update()at line 527 duringdoDeleteStatefulSet. The log confirms this failure occurred (UNABLE to update StatefulSet, line 528) but does not reveal the underlying error class. The most likely candidate is a409 Conflictrace againstkube-controller-managerwriting.status.observedGeneration/.status.currentRevision/.status.replicasduring pod termination afterhostScaleDownsetReplicas=0~9 seconds earlier. The operator'sGet → Updatewindow insidedoDeleteStatefulSet(lines 512-527) was 261 ms in our production trace (10:22:20.325250 → 10:22:20.586212), wide enough for multiple status writes bykube-controller-managerto land inside it. The 409-via-RV-race is a strongly-supported hypothesis, not a directly-confirmed fact. Other plausible candidates (422 Invalid,403 Forbiddenfrom a webhook,503apiserver/etcd transient, network timeout) are all consistent with the same log evidence and produce the same downstream cascade because Bug 2 doesn't differentiate.Whatever the specific apiserver error was, factor #2 is intermittent: same trigger #1 on the same cluster sometimes produces a clean recreate and sometimes the stuck cascade, depending on the millisecond-scale interleaving inside
doDeleteStatefulSet'sGet → Updatewindow. Our reproduction (below) confirms the cascade fires from409 Conflictspecifically, but the proposed fix corrects Bug 2 for any non-nil error class.Reproduction (kind cluster)
repro/— full reproduction with real 409 Conflict. Topology matches production: 1 CHI + 1 CHK on stock operator0.26.3.The setup:
STSto 0 replicas (simulates production'shostSoftwareRestart-fail →hostScaleDownoutcome on operator startup)rv-racer/) doing 32 concurrentGet → mutate annotations → Updateloops on the STS — faithful proxy forkube-controller-manager's STS controller writing.statusfields during pod transitionsCHI.spec.templates.volumeClaimTemplates[0].spec.resources.requests.storage(immutable on STS) — forcesdoUpdateStatefulSetto fail422 Invalid, switching operator to recreate pathrecreateStatefulSet → doDeleteStatefulSet → Get(RV=N) → racer's update lands (RV=N+1) → operator'sUpdate→ real 409 Conflict → Bug 2 fires → cascadeBecause the bug is intermittent (the racer must land an Update in the operator's ~5-30 ms
Get → Updatewindow), the script retries up to 8 times until the cascade fires; typically fires on attempt 1-3.A control test (
run-control.sh) runs the same trigger sequence with the racer disabled — it recovers cleanly. This A/B proves the 409 is the differentiator between healthy reconcile and the stuck cascade.Side-by-side cascade comparison
switch from Update to Recreate:292doDeleteStatefulSetstarted:508UNABLE to update StatefulSet:528StatefulSet create failed ... already exists:412Got recreate action. Ignore and continue for now:390reconcile completed successfullyworker.go:415(Full repro log:
repro/minimal/operator.log. Cascade lines only:repro/minimal/cascade.log.)Proposed fix
The minimal correct fix is a single change to
doDeleteStatefulSet. A second optional change torecreateStatefulSetadds defense-in-depth without altering existing CRUD-error semantics.Patch A (canonical) —
doDeleteStatefulSetmust honor its contractThe function's name and contract say "the StatefulSet is gone after I return". Today, if the courtesy scale-to-0
Updatefails for any reason, the function returns at line 529 and never callsDelete— the very call that would have actually removed the object. Fall through toDeleteregardless of theUpdateoutcome.Deleteis what guarantees the contract; the scale-to-0 is a graceful-shutdown courtesy that's safe to skip when it fails.This single change resolves the bug end-to-end:
Deleteremoves the StatefulSet, the subsequentr.sts.Create()succeeds (noAlreadyExists), the host completes its recreate cleanly, and the CHI legitimately moves toCompleted. Bugs 1 and 3 become academic because their cascade preconditions are no longer met.The fix is correct against all error types (409 Conflict, 422 Invalid, 403 Forbidden, 503 Unavailable, network timeouts) since all are handled identically by deferring to
Delete. It preserves graceful shutdown when the courtesyUpdatesucceeds, and doesn't change CRUD error propagation — no risk of accidentally flipping CHIs toReconcileAbort.Patch B (optional, defense-in-depth) —
recreateStatefulSetshouldn't silently swallow delete errorsWith Patch A in place,
doDeleteStatefulSetonly returns a non-nil error ifGetorDeleteitself failed in an unrecoverable way. That signal shouldn't be discarded:Returning a plain
error(notcommon.ErrCRUDAbort) keeps the failure transient — the next reconcile pass will retry, matching the contract for create/update failures handled viaOnStatefulSetCreateFailed/OnStatefulSetUpdateFailed. ReturningErrCRUDAborthere would flip the CHI toReconcileAbortand require manual intervention, which is the wrong policy for transient errors.Happy to open a PR with Patch A (and optionally Patch B) plus a unit test that constructs a fake
IKubeSTSreturningConflicton the scale-to-0Updateand asserts thatdoDeleteStatefulSet(a) still callsDelete, (b) returnsnil, and thatrecreateStatefulSetend-to-end produces a fresh StatefulSet rather than leaving the cluster atReplicas=0.Environment
reconcile.statefulSet.recreate.onUpdateFailure: recreate(upstream default)reconcile.statefulSet.update.onFailure: abort(upstream default)Attached files (All part of the attached zip)
clickhouse.yaml— sanitizedClickHouseInstallationmanifest reconciled in the production trace.operator-values.yaml— sanitizedvalues.yamlfor the operator chart.operator-logs.txt— sanitized operator pod log covering the failing reconcile pass.repro/— kind reproduction (Go racer + reproduction script + control test + captured operator logs + stuck CHI/STS yaml).In the three top-level attachments, only environment-specific identifiers were renamed (namespace, internal user name, pod IPs, image registry, internal Secret/Volume names). Line numbers, timestamps, message text, and diff structure are preserved verbatim.
issue.zip