Sentinel client not able to recover after all sentinels went down #3237

@nponiros

Description

I am running 3 sentinels as a StatefulSet in OpenShift. As part of my testing I shut down all sentinels one by one until all were down, and then started them up again. After the sentinels started, the Node.js client was not able to reconnect. I printed out the trace and added it below. From what I can tell, the following happened:

  1. Initially the Node.js client connected to redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local:26379.
  2. It got the sentinel data and replaced sentinelRootNodes with the data it received, keeping the node it was connected to. The data it received contains IPs instead of host names.
  3. The sentinels start going down.
  4. The still-connected sentinel client keeps updating sentinelRootNodes until it can no longer connect. At that point it seems that sentinelRootNodes contains redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local:26379 and one other root node with IP 10.242.67.238.
  5. The redis client tries to connect to both redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local and 10.242.67.238, and fails both times.
  6. The originally connected sentinel node (redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local) fails and gets removed from sentinelRootNodes.
  7. sentinelRootNodes now contains only the node with IP 10.242.67.238, and the client keeps trying to connect to it.
  8. The sentinels start coming back up, but with different IPs than before.
  9. The redis client can't recover because it still uses an old IP and no longer has any node in sentinelRootNodes that uses a host name.
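
The sequence above can be modelled with a small stand-alone sketch. It does not use node-redis at all: `replaceWithDiscovered`, `dropFailed`, and `hasHostName` are hypothetical helpers that only mimic the list updates as I understand them from the trace, and the node strings are the ones that appear in the log.

```javascript
// Hypothetical model of how sentinelRootNodes appears to evolve.
// This is NOT node-redis code; it only mirrors the steps listed above.
const seed =
  'redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local:26379';

// Step 2: the discovered addresses replace the list, keeping the connected node.
function replaceWithDiscovered(connected, discovered) {
  return [connected, ...discovered.filter((node) => node !== connected)];
}

// Step 6: a node that fails to connect is dropped from the list.
function dropFailed(nodes, failed) {
  return nodes.filter((node) => node !== failed);
}

// True if at least one entry is not a bare IPv4 address,
// i.e. it could still be re-resolved through DNS.
function hasHostName(nodes) {
  const ipv4WithPort = /^\d{1,3}(\.\d{1,3}){3}:\d+$/;
  return nodes.some((node) => !ipv4WithPort.test(node));
}

// Steps 1-4: connected via host name, the other sentinel is known by IP only.
const afterStep4 = replaceWithDiscovered(seed, ['10.242.67.238:26379']);

// Steps 5-6: the host name stops resolving and is removed from the list.
const afterStep6 = dropFailed(afterStep4, seed);

console.log(afterStep4.length, hasHostName(afterStep4));
console.log(afterStep6, hasHostName(afterStep6));
```

Once `hasHostName` returns false, nothing left in the list can follow the sentinels to their new pod IPs, which matches step 9.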

Note that I also tested with version 5.8.3, and that one works because the originally connected node (step 6 above) does not get removed from sentinelRootNodes. I'm not sure whether this can be reproduced without Kubernetes or OpenShift, where the sentinels can be shut down one by one, but this is what I use to connect:

import { createSentinel } from 'redis';

// Seed sentinels, addressed by their StatefulSet host names.
const sentinelRootNodes = [
  {host: 'redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local', port: 26379},
  {host: 'redis-sentinel-1.redis-sentinel-svc.namespace.svc.cluster.local', port: 26379},
  {host: 'redis-sentinel-2.redis-sentinel-svc.namespace.svc.cluster.local', port: 26379},
];

const settings = {
  name: 'NAME', // sentinel name of the monitored database
  sentinelRootNodes,
  nodeClientOptions: { password: 'PASSWORD' }
};

const redisClient = createSentinel(settings);

// Errors are intentionally ignored in this minimal reproduction.
redisClient.on('error', (err) => {});

redisClient.connect();

Any advice is appreciated. Let me know if you need more information.
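
For what it's worth, the 5.8.3 behaviour I describe above could be sketched as pruning that never drops the configured seed nodes. This is only an illustration under that assumption: `pruneKeepingSeeds` is a hypothetical helper, not an existing node-redis function or option, and I have not checked how either version implements this internally.

```javascript
// Hypothetical helper: remove failed nodes, but always keep the configured seeds
// so that at least one host name stays available for DNS re-resolution.
function pruneKeepingSeeds(nodes, failed, seeds) {
  const key = (node) => `${node.host}:${node.port}`;
  const seedKeys = new Set(seeds.map(key));
  const failedKeys = new Set(failed.map(key));
  return nodes.filter(
    (node) => seedKeys.has(key(node)) || !failedKeys.has(key(node))
  );
}

const configuredSeeds = [
  { host: 'redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local', port: 26379 },
];

// State from step 4: the seed plus one sentinel discovered by IP.
const currentNodes = [
  ...configuredSeeds,
  { host: '10.242.67.238', port: 26379 },
];

// Steps 5-6: both nodes fail, but only the discovered IP is dropped.
const remainingNodes = pruneKeepingSeeds(currentNodes, currentNodes, configuredSeeds);
console.log(remainingNodes);
```

With the host name still in the list, a later connect attempt could resolve it to the new pod IP once the sentinel is back.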

Node.js Version

24.13.0

Redis Server Version

7.4.8

Node Redis Version

5.12.0

Platform

Linux

Logs

TRACE:  starting connect loop
TRACE:  observe: trying to connect to sentinel: redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local:26379
TRACE:  observe: connected to sentinel
TRACE:  observe: got all sentinel data
TRACE:  observe: destroying sentinel client
TRACE:  analyze: master node has changed to 10.242.70.253:6379 from undefined:undefined
TRACE:  analyze: sentinel node has changed to redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local:26379
TRACE:  transform: enter
TRACE:  transform: opening a new sentinel
TRACE:  transform: not destroying old sentinel as not open
TRACE:  transform: creating new sentinel to redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local:26379
TRACE:  transform: adding sentinel client connect() to promise list
TRACE:  created sentinel client to redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local:26379
TRACE:  transform: emiting topology-change event for sentinel_change
TRACE:  RedisSentinel: re-emit for topology-change for SENTINEL_CHANGE event returned false
TRACE:  transform: opening a new master
TRACE:  transform: destroying old masters if open
TRACE:  transform: creating all master clients and adding connect promises
TRACE:  created master client to 10.242.70.253:6379
TRACE:  transform: adding promise to change #pubSubProxy node
TRACE:  transform: emiting topology-change event for master_change
TRACE:  RedisSentinel: re-emit for topology-change for MASTER_CHANGE event returned false
TRACE:  RedisSentinel: re-emit for topology-change for SENTINE_LIST_CHANGE event returned false
TRACE:  transform: exit
TRACE:  #connect: returning
TRACE:  finished connect
TRACE:  attemping to send command to 10.242.70.253:6379
TRACE:  pubsub control channel message on +sdown
TRACE:  starting connect loop
TRACE:  observe: trying to connect to sentinel: redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local:26379
TRACE:  pubsub control channel message on +sdown
TRACE:  observe: connected to sentinel
TRACE:  observe: got all sentinel data
TRACE:  observe: destroying sentinel client
TRACE:  analyze: master node hasn't changed from 10.242.70.253:6379
TRACE:  analyze: sentinel node hasn't changed
TRACE:  transform: enter
TRACE:  RedisSentinel: re-emit for topology-change for SENTINE_LIST_CHANGE event returned false
TRACE:  transform: exit
TRACE:  #connect: anotherReset is true, so continuing
TRACE:  finished connect
TRACE:  starting connect loop
TRACE:  observe: trying to connect to sentinel: redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local:26379
TRACE:  observe: connected to sentinel
TRACE:  observe: got all sentinel data
TRACE:  observe: destroying sentinel client
TRACE:  analyze: master node hasn't changed from 10.242.70.253:6379
TRACE:  analyze: sentinel node hasn't changed
TRACE:  transform: enter
TRACE:  transform: exit
TRACE:  #connect: returning
TRACE:  finished connect
TRACE:  finished reconfgure
TRACE:  attemping to send command to 10.242.70.253:6379
TRACE:  pubsub control channel message on +sdown
TRACE:  starting connect loop
TRACE:  observe: trying to connect to sentinel: redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local:26379
TRACE:  pubsub control channel message on +sdown
TRACE:  observe: error Error: getaddrinfo ENOTFOUND redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local
TRACE:  observe: trying to connect to sentinel: 10.242.67.238:26379
TRACE:  observe: error Error: Connection timeout
TRACE:  observe: none of the sentinels are available
TRACE:  #connect: exception None of the sentinels are available
Error: None of the sentinels are available
    at RedisSentinelInternal.observe (/app/api/node_modules/@redis/client/dist/lib/sentinel/index.js:808:15)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async #connect (/app/api/node_modules/@redis/client/dist/lib/sentinel/index.js:553:51)
TRACE:  finished connect
TRACE:  starting connect loop
TRACE:  observe: trying to connect to sentinel: redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local:26379
TRACE:  observe: error Error: getaddrinfo ENOTFOUND redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local
TRACE:  observe: trying to connect to sentinel: 10.242.67.238:26379
TRACE:  observe: error Error: Connection timeout
TRACE:  observe: none of the sentinels are available
TRACE:  #connect: exception None of the sentinels are available
Error: None of the sentinels are available
    at RedisSentinelInternal.observe (/app/api/node_modules/@redis/client/dist/lib/sentinel/index.js:808:15)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async #connect (/app/api/node_modules/@redis/client/dist/lib/sentinel/index.js:553:51)
TRACE:  finished connect
TRACE:  starting connect loop
TRACE:  observe: trying to connect to sentinel: redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local:26379
TRACE:  observe: error Error: getaddrinfo ENOTFOUND redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local
TRACE:  observe: trying to connect to sentinel: 10.242.67.238:26379
TRACE:  observe: error Error: Connection timeout
TRACE:  observe: none of the sentinels are available
TRACE:  #connect: exception None of the sentinels are available
Error: None of the sentinels are available
    at RedisSentinelInternal.observe (/app/api/node_modules/@redis/client/dist/lib/sentinel/index.js:808:15)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async #connect (/app/api/node_modules/@redis/client/dist/lib/sentinel/index.js:553:51)
TRACE:  finished connect
TRACE:  starting connect loop
TRACE:  observe: trying to connect to sentinel: redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local:26379
TRACE:  observe: trying to connect to sentinel: 10.242.67.238:26379
TRACE:  observe: error Error: Connection timeout
TRACE:  observe: none of the sentinels are available
TRACE:  #connect: exception None of the sentinels are available
Error: None of the sentinels are available
    at RedisSentinelInternal.observe (/app/api/node_modules/@redis/client/dist/lib/sentinel/index.js:808:15)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async #connect (/app/api/node_modules/@redis/client/dist/lib/sentinel/index.js:553:51)
TRACE:  finished connect
TRACE:  starting connect loop
TRACE:  observe: trying to connect to sentinel: redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local:26379
TRACE:  observe: error Error: getaddrinfo ENOTFOUND redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local
TRACE:  observe: trying to connect to sentinel: 10.242.67.238:26379
TRACE:  observe: error Error: Connection timeout
TRACE:  observe: none of the sentinels are available
TRACE:  #connect: exception None of the sentinels are available
Error: None of the sentinels are available
    at RedisSentinelInternal.observe (/app/api/node_modules/@redis/client/dist/lib/sentinel/index.js:808:15)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async #connect (/app/api/node_modules/@redis/client/dist/lib/sentinel/index.js:553:51)
TRACE:  finished connect
TRACE:  starting connect loop
TRACE:  observe: trying to connect to sentinel: 10.242.67.238:26379
TRACE:  observe: error Error: Connection timeout
TRACE:  observe: none of the sentinels are available
TRACE:  #connect: exception None of the sentinels are available
Error: None of the sentinels are available
    at RedisSentinelInternal.observe (/app/api/node_modules/@redis/client/dist/lib/sentinel/index.js:808:15)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async #connect (/app/api/node_modules/@redis/client/dist/lib/sentinel/index.js:553:51)
TRACE:  finished connect
TRACE:  starting connect loop
TRACE:  observe: trying to connect to sentinel: 10.242.67.238:26379
TRACE:  observe: error Error: Connection timeout
TRACE:  observe: none of the sentinels are available
TRACE:  #connect: exception None of the sentinels are available
Error: None of the sentinels are available
    at RedisSentinelInternal.observe (/app/api/node_modules/@redis/client/dist/lib/sentinel/index.js:808:15)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async #connect (/app/api/node_modules/@redis/client/dist/lib/sentinel/index.js:553:51)
TRACE:  finished connect
TRACE:  starting connect loop
TRACE:  observe: trying to connect to sentinel: 10.242.67.238:26379
TRACE:  observe: error Error: Connection timeout
TRACE:  observe: none of the sentinels are available
TRACE:  #connect: exception None of the sentinels are available
Error: None of the sentinels are available
    at RedisSentinelInternal.observe (/app/api/node_modules/@redis/client/dist/lib/sentinel/index.js:808:15)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async #connect (/app/api/node_modules/@redis/client/dist/lib/sentinel/index.js:553:51)
TRACE:  finished connect
TRACE:  starting connect loop
TRACE:  observe: trying to connect to sentinel: 10.242.67.238:26379
TRACE:  observe: error Error: Connection timeout
TRACE:  observe: none of the sentinels are available
TRACE:  #connect: exception None of the sentinels are available
Error: None of the sentinels are available
    at RedisSentinelInternal.observe (/app/api/node_modules/@redis/client/dist/lib/sentinel/index.js:808:15)
    at process.processTicksAndRejections (node:internal/process/task_queues:104:5)
    at async #connect (/app/api/node_modules/@redis/client/dist/lib/sentinel/index.js:553:51)
TRACE:  finished connect
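
To make the change in the trace easier to see, the sentinels tried in each connect loop can be extracted with a small script. This is a minimal sketch: `sentinelsTriedPerLoop` is a hypothetical helper, and it only assumes the `starting connect loop` and `observe: trying to connect to sentinel:` line formats shown in the trace. The sample lines are copied from the trace above.

```javascript
// Group the sentinel addresses attempted in each connect loop of a trace.
function sentinelsTriedPerLoop(trace) {
  const loops = [];
  for (const line of trace.split('\n')) {
    if (line.includes('starting connect loop')) {
      loops.push([]); // a new connect loop begins
      continue;
    }
    const match = line.match(/observe: trying to connect to sentinel: (\S+)/);
    if (match && loops.length > 0) {
      loops[loops.length - 1].push(match[1]);
    }
  }
  return loops;
}

// Two loops taken from the trace: before and after the host name disappears.
const sampleTrace = [
  'TRACE:  starting connect loop',
  'TRACE:  observe: trying to connect to sentinel: redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local:26379',
  'TRACE:  observe: error Error: getaddrinfo ENOTFOUND redis-sentinel-0.redis-sentinel-svc.namespace.svc.cluster.local',
  'TRACE:  observe: trying to connect to sentinel: 10.242.67.238:26379',
  'TRACE:  observe: error Error: Connection timeout',
  'TRACE:  starting connect loop',
  'TRACE:  observe: trying to connect to sentinel: 10.242.67.238:26379',
  'TRACE:  observe: error Error: Connection timeout',
].join('\n');

const triedPerLoop = sentinelsTriedPerLoop(sampleTrace);
console.log(triedPerLoop);
```

In the sample, the first loop tries the host name and the IP, while the second loop tries only the IP, which is the point where the host name has been removed from sentinelRootNodes.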
