ZOOKEEPER-5033: Quorum SASL authentication fails permanently after Login TGT refresh thread exits #2367

Open
JHSUYU wants to merge 1 commit into apache:master from JHSUYU:ZOOKEEPER-5033

JHSUYU commented Mar 25, 2026

JIRA: https://issues.apache.org/jira/browse/ZOOKEEPER-5033

Problem

When the Login TGT refresh thread silently exits (due to clock skew, KDC unavailability, etc.), the Kerberos credentials in the Subject expire. Subsequent reconnection attempts fail permanently because no code path triggers a re-login. The authLearner object is created once in QuorumPeer.initialize() and reused for all retry attempts with the same stale Subject.
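To make the stale-credential state concrete, the following is a minimal diagnostic sketch, not part of this PR. It uses only standard JDK classes (`javax.security.auth.Subject`, `javax.security.auth.kerberos.KerberosTicket`) to check whether a Subject still holds an unexpired TGT. The class name `TgtCheckSketch` and the helper `hasCurrentTgt` are hypothetical names chosen for illustration.

```java
import java.util.Set;
import javax.security.auth.Subject;
import javax.security.auth.kerberos.KerberosPrincipal;
import javax.security.auth.kerberos.KerberosTicket;

/** Illustrative sketch only: this helper is not a ZooKeeper class. */
class TgtCheckSketch {

    /** Returns true if the Subject holds at least one unexpired ticket-granting ticket. */
    static boolean hasCurrentTgt(Subject subject) {
        Set<KerberosTicket> tickets = subject.getPrivateCredentials(KerberosTicket.class);
        for (KerberosTicket ticket : tickets) {
            if (ticket.isDestroyed()) {
                continue; // destroyed tickets carry no usable data
            }
            KerberosPrincipal server = ticket.getServer();
            // A TGT is the ticket issued for the krbtgt service principal.
            boolean isTgt = server != null && server.getName().startsWith("krbtgt/");
            if (isTgt && ticket.isCurrent()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // An empty Subject stands in for one whose credentials are gone.
        System.out.println(hasCurrentTgt(new Subject())); // prints "false"
    }
}
```

A check like this only reports the state of the Subject. It does not repair it, which is why a re-login path is still needed.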

Fix

  • Add Login.forceReLogin(), which performs a re-login immediately (bypassing the minimum time check and Kerberos guard) to refresh stale credentials from the JAAS config/keytab
  • Call forceReLogin() in SaslQuorumAuthLearner.authenticate() and SaslQuorumAuthServer.authenticate() on SASL failure, so the next authentication attempt uses fresh credentials
  • Add unit test SaslQuorumAuthReLoginTest that verifies credential recovery after corruption (passes with fix, fails without)
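The control flow described in the bullets above can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the code in this PR: `StubLogin`, `AuthFailedException`, `saslExchange`, and `ReLoginSketch` are hypothetical stand-ins, and the real `Login`, `SaslQuorumAuthLearner`, and `SaslQuorumAuthServer` classes are not reproduced here. It only shows the pattern of refreshing credentials on failure so that the next attempt can succeed.

```java
/** Illustrative sketch only: none of these types are ZooKeeper classes. */
class ReLoginSketch {

    /** Hypothetical stand-in for a login whose credentials can go stale. */
    static final class StubLogin {
        private boolean valid = true;
        private int reLogins = 0;

        /** Simulates the TGT expiring after the refresh thread has exited. */
        void expire() { valid = false; }

        /** Simulates an unconditional re-login (no minimum-interval check). */
        void forceReLogin() { valid = true; reLogins++; }

        boolean isValid() { return valid; }
        int reLoginCount() { return reLogins; }
    }

    /** Hypothetical exception standing in for a SASL authentication failure. */
    static final class AuthFailedException extends Exception {
        AuthFailedException(String message) { super(message); }
    }

    /** Stand-in for the SASL exchange: it fails whenever credentials are stale. */
    private static void saslExchange(StubLogin login) throws AuthFailedException {
        if (!login.isValid()) {
            throw new AuthFailedException("credentials expired");
        }
    }

    /**
     * One authentication attempt. On failure, refresh the credentials so the
     * caller's next attempt starts from a fresh login, then rethrow.
     */
    static void authenticate(StubLogin login) throws AuthFailedException {
        try {
            saslExchange(login);
        } catch (AuthFailedException e) {
            login.forceReLogin();
            throw e;
        }
    }

    public static void main(String[] args) {
        StubLogin login = new StubLogin();
        login.expire();
        for (int attempt = 1; attempt <= 2; attempt++) {
            try {
                authenticate(login);
                System.out.println("attempt " + attempt + ": ok");
            } catch (AuthFailedException e) {
                System.out.println("attempt " + attempt + ": failed (" + e.getMessage() + ")");
            }
        }
    }
}
```

In this sketch the first attempt fails and triggers the re-login, and the second attempt succeeds. Without the `forceReLogin()` call in the catch block, every retry would see the same stale state, which mirrors the permanent failure described under Problem.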

Test

  • SaslQuorumAuthReLoginTest.testReLoginOnSaslAuthFailure — verifies that after credential corruption and auth failure, forceReLogin() restores valid credentials for the next attempt
  • Existing QuorumDigestAuthTest (6 tests) — all pass, no regressions

…gin TGT refresh thread exits

Add forceReLogin() to Login that re-logins immediately without the
minimum time check, and call it from SaslQuorumAuthLearner and
SaslQuorumAuthServer when authentication fails. This ensures the next
authentication attempt uses fresh credentials after TGT expiration.

JHSUYU commented Mar 25, 2026

@symat @eolivelli Hi, could you take a look to see if this fix makes sense? Thanks! Happy to modify it based on any feedback.
