fix: resolve readiness WaitReady blocking for 5 minutes on startup#408
fix: resolve readiness WaitReady blocking for 5 minutes on startup#408
Conversation
Backport fixes from upstream apache/apisix-ingress-controller#2663. Root cause: readiness.Start() is asynchronous. If a controller's reconcile loop calls Done() before Start() finishes registering resources, Done() finds no state entry and returns early. The resource is never removed from state, causing WaitReady to block until the 5-minute timeout. Changes: - Done() now waits for Start() to complete (<-r.started) before operating on state, eliminating the race condition - WaitReady() returns false on timeout instead of true (semantic fix: timed-out != ready) - Remove unnecessary mutex in registerState() since Done() is now guaranteed to run after Start() closes r.started - Add log statements for easier debugging of readiness lifecycle
|
Caution Review failedPull request was closed or merged during review No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThe readiness manager now synchronizes initialization and resource cleanup: ChangesReadiness Manager Lifecycle Fixes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Pull request overview
This PR backports an upstream fix to the readiness manager to prevent startup full-sync from being blocked for the full 5-minute timeout due to a race between asynchronous Start() initialization and early Done() calls from controller reconciles.
Changes:
- Gate
Done()execution on readiness initialization by waiting onr.startedbefore accessingr.state. - Fix
WaitReady()timeout semantics to returnfalse(timeout ≠ ready). - Adjust readiness lifecycle logging and remove the mutex from
registerState().
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| r.isReady.Store(true) | ||
| close(r.done) | ||
| } | ||
| r.log.Info("readiness manager started") |
| if r.IsReady() { | ||
| return | ||
| } | ||
| <-r.started | ||
|
|
| r.mu.Lock() | ||
| defer r.mu.Unlock() | ||
| gvk := types.GvkOf(obj) | ||
| r.log.Info("marking resource as done", "gvk", gvk, "name", nn, "state_count", len(r.state[gvk])) |
conformance test report - apisix-standalone modeapiVersion: gateway.networking.k8s.io/v1
date: "2026-05-08T03:51:52Z"
gatewayAPIChannel: experimental
gatewayAPIVersion: v1.3.0
implementation:
contact: null
organization: APISIX
project: apisix-ingress-controller
url: https://github.com/apache/apisix-ingress-controller.git
version: v2.0.0
kind: ConformanceReport
mode: default
profiles:
- core:
result: success
statistics:
Failed: 0
Passed: 12
Skipped: 0
name: GATEWAY-GRPC
summary: Core tests succeeded.
- core:
result: partial
skippedTests:
- HTTPRouteHTTPSListener
statistics:
Failed: 0
Passed: 32
Skipped: 1
extended:
result: partial
skippedTests:
- HTTPRouteRedirectPortAndScheme
statistics:
Failed: 0
Passed: 11
Skipped: 1
supportedFeatures:
- GatewayAddressEmpty
- GatewayPort8080
- HTTPRouteBackendProtocolWebSocket
- HTTPRouteDestinationPortMatching
- HTTPRouteHostRewrite
- HTTPRouteMethodMatching
- HTTPRoutePathRewrite
- HTTPRoutePortRedirect
- HTTPRouteQueryParamMatching
- HTTPRouteRequestMirror
- HTTPRouteResponseHeaderModification
- HTTPRouteSchemeRedirect
unsupportedFeatures:
- GatewayHTTPListenerIsolation
- GatewayInfrastructurePropagation
- GatewayStaticAddresses
- HTTPRouteBackendProtocolH2C
- HTTPRouteBackendRequestHeaderModification
- HTTPRouteBackendTimeout
- HTTPRouteParentRefPort
- HTTPRoutePathRedirect
- HTTPRouteRequestMultipleMirrors
- HTTPRouteRequestPercentageMirror
- HTTPRouteRequestTimeout
name: GATEWAY-HTTP
summary: Core tests partially succeeded with 1 test skips. Extended tests partially
succeeded with 1 test skips.
- core:
result: partial
skippedTests:
- TLSRouteSimpleSameNamespace
statistics:
Failed: 0
Passed: 10
Skipped: 1
name: GATEWAY-TLS
summary: Core tests partially succeeded with 1 test skips. |
conformance test report - apisix modeapiVersion: gateway.networking.k8s.io/v1
date: "2026-05-08T03:52:31Z"
gatewayAPIChannel: experimental
gatewayAPIVersion: v1.3.0
implementation:
contact: null
organization: APISIX
project: apisix-ingress-controller
url: https://github.com/apache/apisix-ingress-controller.git
version: v2.0.0
kind: ConformanceReport
mode: default
profiles:
- core:
result: success
statistics:
Failed: 0
Passed: 12
Skipped: 0
name: GATEWAY-GRPC
summary: Core tests succeeded.
- core:
failedTests:
- HTTPRouteInvalidBackendRefUnknownKind
result: failure
skippedTests:
- HTTPRouteHTTPSListener
statistics:
Failed: 1
Passed: 31
Skipped: 1
extended:
result: partial
skippedTests:
- HTTPRouteRedirectPortAndScheme
statistics:
Failed: 0
Passed: 11
Skipped: 1
supportedFeatures:
- GatewayAddressEmpty
- GatewayPort8080
- HTTPRouteBackendProtocolWebSocket
- HTTPRouteDestinationPortMatching
- HTTPRouteHostRewrite
- HTTPRouteMethodMatching
- HTTPRoutePathRewrite
- HTTPRoutePortRedirect
- HTTPRouteQueryParamMatching
- HTTPRouteRequestMirror
- HTTPRouteResponseHeaderModification
- HTTPRouteSchemeRedirect
unsupportedFeatures:
- GatewayHTTPListenerIsolation
- GatewayInfrastructurePropagation
- GatewayStaticAddresses
- HTTPRouteBackendProtocolH2C
- HTTPRouteBackendRequestHeaderModification
- HTTPRouteBackendTimeout
- HTTPRouteParentRefPort
- HTTPRoutePathRedirect
- HTTPRouteRequestMultipleMirrors
- HTTPRouteRequestPercentageMirror
- HTTPRouteRequestTimeout
name: GATEWAY-HTTP
summary: Core tests failed with 1 test failures. Extended tests partially succeeded
with 1 test skips.
- core:
result: partial
skippedTests:
- TLSRouteSimpleSameNamespace
statistics:
Failed: 0
Passed: 10
Skipped: 1
name: GATEWAY-TLS
summary: Core tests partially succeeded with 1 test skips. |
conformance test reportapiVersion: gateway.networking.k8s.io/v1
date: "2026-05-08T04:13:28Z"
gatewayAPIChannel: experimental
gatewayAPIVersion: v1.3.0
implementation:
contact: null
organization: APISIX
project: apisix-ingress-controller
url: https://github.com/apache/apisix-ingress-controller.git
version: v2.0.0
kind: ConformanceReport
mode: default
profiles:
- core:
failedTests:
- GRPCExactMethodMatching
- GRPCRouteHeaderMatching
- GRPCRouteListenerHostnameMatching
- GatewayModifyListeners
result: failure
statistics:
Failed: 4
Passed: 8
Skipped: 0
name: GATEWAY-GRPC
summary: Core tests failed with 4 test failures.
- core:
failedTests:
- GatewayModifyListeners
result: failure
skippedTests:
- HTTPRouteHTTPSListener
statistics:
Failed: 1
Passed: 31
Skipped: 1
extended:
failedTests:
- HTTPRouteBackendProtocolWebSocket
result: failure
skippedTests:
- HTTPRouteRedirectPortAndScheme
statistics:
Failed: 1
Passed: 10
Skipped: 1
supportedFeatures:
- GatewayAddressEmpty
- GatewayPort8080
- HTTPRouteBackendProtocolWebSocket
- HTTPRouteDestinationPortMatching
- HTTPRouteHostRewrite
- HTTPRouteMethodMatching
- HTTPRoutePathRewrite
- HTTPRoutePortRedirect
- HTTPRouteQueryParamMatching
- HTTPRouteRequestMirror
- HTTPRouteResponseHeaderModification
- HTTPRouteSchemeRedirect
unsupportedFeatures:
- GatewayHTTPListenerIsolation
- GatewayInfrastructurePropagation
- GatewayStaticAddresses
- HTTPRouteBackendProtocolH2C
- HTTPRouteBackendRequestHeaderModification
- HTTPRouteBackendTimeout
- HTTPRouteParentRefPort
- HTTPRoutePathRedirect
- HTTPRouteRequestMultipleMirrors
- HTTPRouteRequestPercentageMirror
- HTTPRouteRequestTimeout
name: GATEWAY-HTTP
summary: Core tests failed with 1 test failures. Extended tests failed with 1 test
failures.
- core:
failedTests:
- GatewayModifyListeners
- TLSRouteSimpleSameNamespace
result: failure
statistics:
Failed: 2
Passed: 9
Skipped: 0
name: GATEWAY-TLS
summary: Core tests failed with 2 test failures. |
Root Cause
readiness.Start()is asynchronous. A controller's reconcile loop can callDone()beforeStart()finishes registering resources intor.state. When this happens:Done()checksr.state[gvk]— entry doesn't exist yet (race condition)Done()returns early without doing anythingStart()later registers the resource intor.stateDone()again for that resourceWaitReadyblocks until the full 5-minute timeoutChanges
Done()now blocks on<-r.startedbefore touchingr.state, ensuringStart()has finished registering all resources before anyDone()call can proceedWaitReady()returnsfalseon timeout instead oftrue— timed out ≠ readyregisterState()— safe becauseDone()is now guaranteed to run afterStart()closesr.started, soregisterState()is always called before any concurrentDone()accessTesting
go build ./...passesgo vet ./...passesSummary by CodeRabbit
Bug Fixes
Chores