Describe the bug
Still related to the recently closed #677: it is possible to get the operator into a state where leader election with high availability does not behave as expected.
To reproduce
- deploy an operator in Kubernetes with two pods
- let the operator do some work on custom resources in a namespace other than its own
- apply a network policy to the operator's namespace that stops all network traffic in that namespace:
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-kube-api
  namespace: my-operator-namespace
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 10.0.0.1/32
    ports:
    - protocol: TCP
      port: 443
EOF
- wait until the timeout errors appear (takes about 10 minutes)
- remove the network policy
At this point, either both pods are acting as leaders and both are processing resources, or neither pod does any work until one of them is restarted.
Expected behavior
When the deny network policy is applied for 15+ minutes and then removed, only one pod should continue processing while the other pod stays idle.
Alternatively, it would also be acceptable for the process to exit with an error after a number of unsuccessful retries, but this is up for a wider discussion.
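To make the expected behavior concrete, here is a minimal, self-contained simulation of lease-based leader election. It is only a sketch under assumptions, not the operator's actual implementation: `Lease`, `Candidate`, `simulate`, the pod names, and the 15 s / 10 s timings are all hypothetical, and the in-memory lease ignores the optimistic concurrency a real Kubernetes Lease would need. The point is the invariant: a leader that cannot renew steps down after its renew deadline, and once the API is reachable again exactly one pod re-acquires the lease.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lease:
    """In-memory stand-in for a Kubernetes Lease object (hypothetical)."""
    holder: Optional[str] = None
    renewed_at: float = 0.0


class Candidate:
    """One operator pod taking part in the election (hypothetical model)."""

    def __init__(self, name: str, lease: Lease,
                 lease_duration: float = 15.0, renew_deadline: float = 10.0):
        self.name = name
        self.lease = lease
        self.lease_duration = lease_duration  # assumed value, in seconds
        self.renew_deadline = renew_deadline  # assumed value, in seconds
        self.is_leader = False
        self.last_renew = 0.0

    def tick(self, now: float, api_reachable: bool) -> None:
        if not api_reachable:
            # Cannot renew: step down once the renew deadline has passed,
            # rather than keep acting on a lease that cannot be confirmed.
            if self.is_leader and now - self.last_renew > self.renew_deadline:
                self.is_leader = False
            return
        expired = now - self.lease.renewed_at > self.lease_duration
        if self.lease.holder in (None, self.name) or expired:
            # Acquire a free or expired lease, or renew our own.
            self.lease.holder = self.name
            self.lease.renewed_at = now
            self.last_renew = now
            self.is_leader = True
        else:
            # Another pod holds a valid lease: stay (or become) idle.
            self.is_leader = False


def simulate(outage_start: float, outage_end: float, total: float,
             step: float = 1.0) -> List[List[str]]:
    """Return the list of leader names observed at each tick."""
    lease = Lease()
    pods = [Candidate("pod-a", lease), Candidate("pod-b", lease)]
    history = []
    t = 0.0
    while t < total:
        reachable = not (outage_start <= t < outage_end)
        for pod in pods:
            pod.tick(t, reachable)
        history.append([p.name for p in pods if p.is_leader])
        t += step
    return history


if __name__ == "__main__":
    # 15-minute outage starting at t=5s, observed for 920 simulated seconds.
    history = simulate(outage_start=5, outage_end=905, total=920)
    print("max simultaneous leaders:", max(len(l) for l in history))
    print("leaders at the end:", history[-1])
```

In this model there is never more than one leader at any tick, and there is exactly one again after the outage ends. That is the behavior this issue expects from the real operator once the network policy is removed.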
Screenshots
No response
Additional Context
No response