Description
Feature (What you would like to be added):
Currently, the readinessProbe of etcd points to the `/healthz` endpoint of the HTTP server running in the backup sidecar.
This behaviour needs to be updated or improved: the readinessProbe of a clustered etcd should depend on whether an etcd leader is present, and only then should the member serve incoming write requests.
Motivation (Why is this needed?):
Approach/Hint to the implement solution (optional):
Approaches:
- `ETCDCTL_API=3 etcdctl endpoint health --endpoints=${ENDPOINTS} --command-timeout=Xs`
  The `etcdctl endpoint health` command performs a GET on the "health" key (source). It fails when there is no etcd leader or when quorum is lost, because the GET request cannot succeed without a leader.
  Advantages of this Method (`etcdctl endpoint health`):
  - A snapshotter failure no longer fails the readinessProbe of etcd, so we don't have to worry about such failures causing an outage.
  - If there is no quorum, kubelet will mark the `etcd-members` as `NotReady`, and they won't be able to serve write or read requests.
  Disadvantages of this Method (`etcdctl endpoint health`):
  - The owner check feature depends on the `/healthz` endpoint of the HTTP server: when the owner check fails, it fails the readinessProbe of etcd by setting the HTTP status to 503. However, the owner check in the multi-node scenario is already being discussed here.
  - It completely decouples the snapshotter of the backup sidecar from the readinessProbe of etcd, so the backup sidecar won't be able to control when to let traffic in.
- `/health` endpoint of etcd.
  The `/health` endpoint returns `false` if one of the following conditions is met (source):
  - there is no etcd leader, or a leader election is currently in progress.
  - the latency of a QGET request exceeds 1 second.
  Advantages and Disadvantages of Method 2 (`/health` endpoint):
  - Similar to method 1.
- Use the `/healthz` endpoint of the HTTP server running in the backup sidecar, modified so that whenever a `backup-restore leader` is elected it sets the `HTTP server status to 200` for itself as well as for all `backup-restore followers`, and sets the `HTTP server status to 503` when there is no etcd leader present.
  Advantages of this Method (`/healthz`):
  - We still have some coupling between the `snapshotter` of the backup sidecar and the `readinessProbe` of etcd, so the backup sidecar will be able to control when to let traffic in for etcd.
  Disadvantages of this Method (`/healthz`):
  - It will take time to implement and to handle edge cases.
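For illustration only, the first two approaches could be wrapped into small shell helpers that an exec-style probe might call. This is a rough sketch, not the project's implementation: the function names (`probe_etcdctl`, `health_body_ok`, `probe_health_endpoint`) are hypothetical, and it assumes that `etcdctl endpoint health` exits non-zero for an unhealthy endpoint and that the `/health` response body is JSON carrying a `health` field set to the string `true` when healthy.

```shell
#!/bin/sh
# Sketch of readiness checks for approaches 1 and 2.
# All function names here are hypothetical and chosen for illustration.

# Approach 1: rely on the exit status of `etcdctl endpoint health`.
# Assumption: the command exits non-zero when the endpoint is unhealthy
# (for example, no leader or lost quorum).
# $1 = endpoints, $2 = command timeout (for example "5s").
probe_etcdctl() {
  ETCDCTL_API=3 etcdctl endpoint health --endpoints="$1" --command-timeout="$2"
}

# Approach 2 helper: read a /health response body on stdin and succeed
# only if it reports health as the string "true".
# Assumption: the body looks like {"health":"true"}; optional whitespace
# around the colon is tolerated.
health_body_ok() {
  grep -Eq '"health"[[:space:]]*:[[:space:]]*"true"'
}

# Approach 2: fetch the /health endpoint and check the body.
# If curl fails (connection error or HTTP error status), the body is
# empty, so health_body_ok fails and the probe reports not ready.
# $1 = full URL of the /health endpoint.
probe_health_endpoint() {
  curl --silent --fail --max-time 2 "$1" | health_body_ok
}
```

A probe could then run something like `probe_health_endpoint "http://localhost:2379/health"`, where the URL is an assumption (a TLS-enabled cluster would need `https` plus the matching certificate options). Either helper keeps the probe independent of the backup sidecar, which is exactly the decoupling listed as a disadvantage above.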
Future Scope:
- Go with method 2 as it gives us the flexibility to set the `readinessProbe` from backup-sidecar, and switch to gRPC instead of sending REST requests.