During restart fallback, the operator can leave a host without its StatefulSet and pod while still reporting reconcile success.
Observed flow for a 2-replica cluster, host `0-0`:

- The operator attempted a software restart, but ClickHouse was already unreachable with `connection refused`.
- It fell back to host shutdown via StatefulSet scale down / recreate.
- The StatefulSet update switched from `Update` to `Recreate`.
- The operator deleted the host StatefulSet `chi-test-0-0`, but the delete wait timed out.
- Despite the failed/timed-out delete, the operator continued into the create path.
- The replacement StatefulSet create failed because Kubernetes still reported the old StatefulSet as being deleted: `object is being deleted: statefulsets.apps "chi-test-0-0" already exists`.
- That create failure was converted into a recreate action and ignored with `Got recreate action. Ignore and continue for now`.
- The host was then marked as successfully shut down/reconciled, and the overall reconcile completed successfully.
- Later discovery showed `No cur StatefulSet available ... not found` for host `0-0`, while only host `0-1` still had a current StatefulSet.
- The cluster temporarily had only one pod IP, confirming that host `0-0` had no running replacement pod.

Expected behavior:
If StatefulSet delete times out or replacement StatefulSet creation fails, reconcile must abort and report failure. The operator must not continue as if the host was successfully reconciled, because the host may be left without its StatefulSet and pod.
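
The expected behavior can be sketched roughly as follows. This is a minimal illustration, not the operator's actual code: `stsClient`, `fakeClient`, `recreateStatefulSet`, and the two error values are hypothetical names introduced only for this example. The point is that both the delete step and the create step return their error to the caller instead of being downgraded to a warning.

```go
package main

import (
	"errors"
	"fmt"
)

// stsClient is a hypothetical stand-in for the operator's StatefulSet
// operations. It is NOT the operator's real interface.
type stsClient interface {
	Delete(name string) error // returns an error if the delete wait times out
	Create(name string) error // returns an error if the object cannot be created
}

// Hypothetical error values that mirror the two failures seen in the logs.
var (
	errDeleteTimeout = errors.New("wait timeout")
	errBeingDeleted  = errors.New("object is being deleted")
)

// recreateStatefulSet deletes and then recreates a StatefulSet.
// Any failure is returned so the caller can abort the reconcile.
func recreateStatefulSet(c stsClient, name string) error {
	if err := c.Delete(name); err != nil {
		// Do not fall through to create: the old object may still exist.
		return fmt.Errorf("delete StatefulSet %s: %w", name, err)
	}
	if err := c.Create(name); err != nil {
		// Do not ignore: the host would be left without a StatefulSet.
		return fmt.Errorf("create StatefulSet %s: %w", name, err)
	}
	return nil
}

// fakeClient simulates the cluster for the example.
type fakeClient struct {
	deleteErr error
	createErr error
	created   []string
}

func (f *fakeClient) Delete(string) error { return f.deleteErr }

func (f *fakeClient) Create(name string) error {
	if f.createErr != nil {
		return f.createErr
	}
	f.created = append(f.created, name)
	return nil
}

func main() {
	// Delete wait times out: recreate must fail and create must not run.
	stuck := &fakeClient{deleteErr: errDeleteTimeout}
	fmt.Println("delete timeout:", recreateStatefulSet(stuck, "chi-test-0-0"))
	fmt.Println("create attempted:", len(stuck.created) > 0)

	// Old object still terminating: the create failure must be reported.
	busy := &fakeClient{createErr: errBeingDeleted}
	fmt.Println("create failure:", recreateStatefulSet(busy, "chi-test-0-0"))
}
```

In this sketch the caller sees a non-nil error in both failure cases, so it has what it needs to mark the host reconcile as failed rather than logging `Host shutdown success`.
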
Impact:
A restart fallback can silently leave a ClickHouse host missing its StatefulSet and pod while the CHI reconcile reports success. This makes the failure easy to miss and may require a later manual or follow-up reconcile to recover the missing replica.
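
Because the reconcile reports success, the gap only shows up when someone compares the hosts the CHI should have with the StatefulSets that actually exist. A minimal sketch of such a comparison is below. `missingStatefulSets` is a hypothetical helper, and the name `chi-test-0-1` is assumed by analogy with `chi-test-0-0`; in practice the two lists would come from the CHI spec and from the Kubernetes API.

```go
package main

import (
	"fmt"
	"sort"
)

// missingStatefulSets returns the expected StatefulSet names that are
// absent from the set of names actually present in the cluster.
// Hypothetical helper for illustration only.
func missingStatefulSets(expected, present []string) []string {
	have := make(map[string]bool, len(present))
	for _, name := range present {
		have[name] = true
	}
	var missing []string
	for _, name := range expected {
		if !have[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing) // stable output for reporting
	return missing
}

func main() {
	// State after the false-success reconcile described in this issue:
	// two hosts expected, only one StatefulSet present.
	expected := []string{"chi-test-0-0", "chi-test-0-1"}
	present := []string{"chi-test-0-1"}

	if missing := missingStatefulSets(expected, present); len(missing) > 0 {
		fmt.Println("hosts without a StatefulSet:", missing)
	}
}
```
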
Minimal Logs:
# Minimal evidence log - StatefulSet recreate false success
Context: host `0-0` was force-restarted. Software restart failed because ClickHouse was unreachable, so the operator fell back to StatefulSet recreate.
```log
I0528 09:00:35.406748 1 worker.go:185] shouldForceRestartHost():Host:0-0[0/0]:example-namespace/example-chi:RollingUpdate requires force restart. Host: 0-0
I0528 09:00:50.827881 1 worker-reconciler-chi.go:528] hostSoftwareRestart():Host:0-0[0/0]:example-namespace/example-chi:Host software restart start. Host: 0-0
I0528 09:00:51.017828 1 schemer.go:182] HostShutdown():Host:0-0[0/0]:example-namespace/example-chi:Host shutdown: 0-0
E0528 09:00:51.019670 1 connection.go:267] Exec():FAILED Exec(https://example.invalid/redacted doRequest: transport failed to send a request to ClickHouse: dial tcp 10.0.0.15:8123: connect: connection refused for SQL: SYSTEM SHUTDOWN
I0528 09:00:51.019959 1 worker-reconciler-chi.go:540] hostSoftwareRestart():Host:0-0[0/0]:example-namespace/example-chi:Host software restart abort 2. Host: 0-0 err: doRequest: transport failed to send a request to ClickHouse: dial tcp 10.0.0.15:8123: connect: connection refused
I0528 09:00:51.020027 1 worker-reconciler-chi.go:605] hostScaleDown():Host:0-0[0/0]:example-namespace/example-chi:Reconcile host. Host shutdown via scale down: 0-0
I0528 09:00:51.036949 1 statefulset-reconciler.go:163] ReconcileStatefulSet():Host:0-0[0/0]:example-namespace/example-chi:Need to reconcile MODIFIED StatefulSet: example-namespace/chi-example-chi-example-chi-0-0
I0528 09:00:56.833394 1 statefulset-reconciler.go:292] updateStatefulSet():Host:0-0[0/0]:example-namespace/example-chi:Update StatefulSet(example-namespace/chi-example-chi-example-chi-0-0) switch from Update to Recreate
I0528 09:00:56.843885 1 statefulset-reconciler.go:508] doDeleteStatefulSet():Host:0-0[0/0]:example-namespace/example-chi:example-namespace/chi-example-chi-example-chi-0-0
I0528 09:11:04.728203 1 poller.go:108] Poll():delete StatefulSet: example-namespace/chi-example-chi-example-chi-0-0:poll(delete StatefulSet: example-namespace/chi-example-chi-example-chi-0-0) - TIMEOUT reached
E0528 09:11:04.728251 1 statefulset-reconciler.go:541] doDeleteStatefulSet():Host:0-0[0/0]:example-namespace/example-chi:FAIL delete StatefulSet example-namespace/chi-example-chi-example-chi-0-0 err: poll(delete StatefulSet: example-namespace/chi-example-chi-example-chi-0-0) - wait timeout
I0528 09:11:07.537836 1 statefulset-reconciler.go:341] createStatefulSet():Host:0-0[0/0]:example-namespace/example-chi:Create StatefulSet: example-namespace/chi-example-chi-example-chi-0-0 - started
E0528 09:11:07.749947 1 statefulset-reconciler.go:412] doCreateStatefulSet():Host:0-0[0/0]:example-namespace/example-chi:StatefulSet create failed. err: object is being deleted: statefulsets.apps "chi-example-chi-example-chi-0-0" already exists
W0528 09:11:07.749990 1 statefulset-reconciler.go:390] Host:0-0[0/0]:example-namespace/example-chi:Got recreate action. Ignore and continue for now
I0528 09:11:07.755586 1 worker-reconciler-chi.go:614] hostScaleDown():Host:0-0[0/0]:example-namespace/example-chi:Host shutdown success. Host: 0-0
I0528 09:11:23.818147 1 worker-reconciler-chi.go:1065] reconcileHostIncludeIntoAllActivities():Host:0-0[0/0]:example-namespace/example-chi:Reconcile Host completed. Host: 0-0 ClickHouse version running: 25.8.16[25.8.16.10002/parsed from the tag: '25.8.16.10002']
I0528 09:12:58.868275 1 worker.go:415] finalizeReconcileAndMarkCompleted():CHI:example-namespace/example-chi:reconcile completed successfully, task id: task-a
I0528 09:30:18.763933 1 worker-boilerplate.go:156] processReconcilePod():unknown:Delete Pod. example-namespace/chi-example-chi-example-chi-0-0-0
I0528 09:33:25.142783 1 statefulset-reconciler.go:109] unknown:No cur StatefulSet available and the reason is - not found. Either new one or a deleted sts: example-namespace/chi-example-chi-example-chi-0-0
W0528 09:33:25.142815 1 statefulset-reconciler.go:111] unknown:No cur StatefulSet available but host has an ancestor. Found deleted sts. for: example-namespace/chi-example-chi-example-chi-0-0
I0528 09:33:25.209389 1 worker-reconciler-chi.go:166] CHI:example-namespace/example-chi:IPs of the CR example-namespace/example-chi: len: 1 [10.0.0.12]
I0528 09:39:33.693460 1 worker.go:552] Host:0-0[0/0]:example-namespace/example-chi:Host status: modified. Host: ns:example-namespace|chi:example-chi|clu:example-chi|sha:0|rep:0|host:0-0
I0528 09:39:50.694415 1 statefulset-reconciler.go:341] createStatefulSet():Host:0-0[0/0]:example-namespace/example-chi:Create StatefulSet: example-namespace/chi-example-chi-example-chi-0-0 - started
I0528 09:39:50.942331 1 worker-boilerplate.go:146] processReconcilePod():unknown:Add Pod. example-namespace/chi-example-chi-example-chi-0-0-0
I0528 09:41:01.773120 1 statefulset-reconciler.go:447] waitHostStatefulSetToLaunch():Host:0-0[0/0]:example-namespace/example-chi:Host sts ready. Host: 0-0
I0528 09:41:01.773142 1 statefulset-reconciler.go:452] waitHostStatefulSetToLaunch():Host:0-0[0/0]:example-namespace/example-chi:Host launched. Host: 0-0
I0528 09:41:01.858894 1 statefulset-reconciler.go:368] shouldAbortOrContinueCreateStatefulSet():Host:0-0[0/0]:example-namespace/example-chi:Create StatefulSet: example-namespace/chi-example-chi-example-chi-0-0 - completed
```
Bug signal, in order:

- Software restart failed.
- The operator fell back to StatefulSet recreate.
- The StatefulSet delete wait timed out.
- The replacement StatefulSet create failed because the old STS was still deleting.
- The create failure was ignored as a recreate action.
- The host and CHI reconcile were marked successful.
- Later discovery showed host `0-0` had no current StatefulSet and only one pod IP remained.
- A later manual/follow-up reconcile recreated the missing StatefulSet/pod.