Skip to content

Improve detection of connect/handshake failures and the retry-policy#3038

Merged
mgravell merged 3 commits intomainfrom
marc/ddos
Mar 19, 2026
Merged

Improve detection of connect/handshake failures and the retry-policy#3038
mgravell merged 3 commits intomainfrom
marc/ddos

Conversation

@mgravell
Copy link
Collaborator

@mgravell mgravell commented Mar 19, 2026

This PR investigates and resolves #3036, but also addresses some other related failure modes.

The fundamental issue in ^^^ was:

  • in PhysicalBridge, when invoking the retry-policy, we pass a count of how many retries we've already done, which is used by the policy, for example to implement exponential decay
  • this counter is reset when the shouldResetConnectionRetryCount flag is true
  • the code was incorrectly setting this flag to true even in "establishing" cases, due to an otherwise-intentional case fall-thru

So the simple part of this fix is simply: move the assignment of shouldResetConnectionRetryCount so that we only reset the retry count after a successful handshake (ConnectionEstablished).


This fix was tested using the in-process-server infrastructure, with failure modes:

  • actively (fault) reject connections
  • passively (no completion) reject connections
  • accept connections but never return responses
  • accept connections but return garbage responses

This testing highlit a second failure mode, whereby the "never return responses" scenario failed to reconnect. This turned out to be because our message timeout logic only considered "async" scenarios, since "sync" timeout is active from the perspective of the caller (Monitor.Wait). This means that if the handshake "critical tracer" (the echo or ping that we use with a payload that gets validated) never responds... well, nothing happens. The handshake is neither sync nor async, because it is F+F without a result-box.

We therefore extend the timeout logic to explicitly identify this handshake scenario (via a new pseudo-CommandFlags value), and burn the connection accordingly.

@mgravell mgravell merged commit f9af64f into main Mar 19, 2026
8 checks passed
@mgravell mgravell deleted the marc/ddos branch March 19, 2026 16:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SE.Redis Connection Storms & High CPU

2 participants