It seems that the ingress update disrupts the operator's lease renewal, causing the operator to exit and delaying the CR reconciliation.
The operator exits while it is waiting for the ingress to be updated:
2023-09-11T07:56:23Z INFO Waiting for ingress to update {"controller": "clusterrelocation", "controllerGroup": "rhsyseng.github.io", "controllerKind": "ClusterRelocation", "ClusterRelocation": {"name":"cluster"}, "namespace": "", "name": "cluster", "reconcileID": "0d68e259-8df3-4af2-a8cb-3cc0015b9c64"}
2023-09-11T07:56:33Z ERROR Reconciler error {"controller": "clusterrelocation", "controllerGroup": "rhsyseng.github.io", "controllerKind": "ClusterRelocation", "ClusterRelocation": {"name":"cluster"}, "namespace": "", "name": "cluster", "reconcileID": "0d68e259-8df3-4af2-a8cb-3cc0015b9c64", "error": "dial tcp 192.168.127.10:443: connect: connection refused"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235
2023-09-11T07:56:33Z INFO validation succeeded {"controller": "clusterrelocation", "controllerGroup": "rhsyseng.github.io", "controllerKind": "ClusterRelocation", "ClusterRelocation": {"name":"cluster"}, "namespace": "", "name": "cluster", "reconcileID": "c2eb83a7-1de7-4d10-b0a7-22c3919ea01d"}
2023-09-11T07:56:33Z INFO TLS cert already exists for Ingresses {"controller": "clusterrelocation", "controllerGroup": "rhsyseng.github.io", "controllerKind": "ClusterRelocation", "ClusterRelocation": {"name":"cluster"}, "namespace": "", "name": "cluster", "reconcileID": "c2eb83a7-1de7-4d10-b0a7-22c3919ea01d"}
2023-09-11T07:56:33Z INFO Using user provided API certificate {"controller": "clusterrelocation", "controllerGroup": "rhsyseng.github.io", "controllerKind": "ClusterRelocation", "ClusterRelocation": {"name":"cluster"}, "namespace": "", "name": "cluster", "reconcileID": "c2eb83a7-1de7-4d10-b0a7-22c3919ea01d", "namespace": "relocation", "name": "new-api-certs"}
2023-09-11T07:56:33Z ERROR Reconciler error {"controller": "clusterrelocation", "controllerGroup": "rhsyseng.github.io", "controllerKind": "ClusterRelocation", "ClusterRelocation": {"name":"cluster"}, "namespace": "", "name": "cluster", "reconcileID": "c2eb83a7-1de7-4d10-b0a7-22c3919ea01d", "error": "dial tcp 192.168.127.10:443: connect: connection refused"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235
E0911 07:56:52.138017 1 leaderelection.go:330] error retrieving resource lock openshift-operators/f4de3632.rhsyseng.github.io: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-operators/leases/f4de3632.rhsyseng.github.io": dial tcp 172.30.0.1:443: connect: connection refused
E0911 07:57:02.139324 1 leaderelection.go:330] error retrieving resource lock openshift-operators/f4de3632.rhsyseng.github.io: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-operators/leases/f4de3632.rhsyseng.github.io": dial tcp 172.30.0.1:443: connect: connection refused
E0911 07:58:23.958624 1 leaderelection.go:330] error retrieving resource lock openshift-operators/f4de3632.rhsyseng.github.io: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-operators/leases/f4de3632.rhsyseng.github.io": dial tcp 172.30.0.1:443: connect: connection refused
E0911 07:58:33.960122 1 leaderelection.go:330] error retrieving resource lock openshift-operators/f4de3632.rhsyseng.github.io: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-operators/leases/f4de3632.rhsyseng.github.io": dial tcp 172.30.0.1:443: connect: connection refused
Once the new instance starts, it hangs for about 69 seconds (07:58:56 to 08:00:05) while trying to acquire the lease:
2023-09-11T07:58:56Z INFO setup starting manager
I0911 07:58:56.215805 1 leaderelection.go:248] attempting to acquire leader lease openshift-operators/f4de3632.rhsyseng.github.io...
2023-09-11T07:58:56Z INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}
2023-09-11T07:58:56Z INFO Starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
I0911 08:00:05.290010 1 leaderelection.go:258] successfully acquired lease openshift-operators/f4de3632.rhsyseng.github.io
2023-09-11T08:00:05Z DEBUG events cluster-relocation-operator-controller-manager-75666d5c5-tmn66_aca225e6-ef45-4574-b015-8132b0091818 became leader
Expected Behavior
Current Behavior
Possible Solution
Steps to Reproduce (for bugs)
Context
This issue delays the ClusterRelocation CR reconciliation.
I applied the CR on a stable cluster that had been installed hours earlier.
Regression
Unsure
Your Environment
cluster-relocation-operator: latest operator from OperatorHub
4.10