fix: requeue quickly after 409 conflict in HTTPProxy reconcile #169
Closed
drewr wants to merge 1 commit into
When a new HTTPProxy is created, concurrent reconciles race to write the child Gateway, HTTPRoute, and EndpointSlice resources. The resulting 409 Conflict errors were returned as plain errors to controller-runtime, which applied exponential backoff. After ~15 conflicts in the initial burst the backoff reached 3-4 minutes, silencing the controller until the next periodic tick.

This was observed on tunnel creation: the UI toggle stayed grey for ~3m47s before Programmed=True was set on the HTTPProxy (network-services-operator#166).

Fix: replace the exponential-backoff path for 409 Conflict errors on child resource updates and on the HTTPProxy status update with an explicit RequeueAfter of retryAfterConflict (1s). This matches the pattern already used in gateway_dns_controller.go and result.go.

Changes:
- Rename the unnamed ctrl.Result return to a named 'result' variable so the deferred status-update block can set result.RequeueAfter on conflict without joining an error that would re-enter the backoff queue
- Rename controllerutil.OperationResult locals from 'result' to 'opResult' to avoid shadowing the named return
- Add IsConflict guard in the Gateway, HTTPRoute, and EndpointSlice CreateOrUpdate error paths
- Add TestHTTPProxyReconcileConflictRequeue covering all three write paths
Contributor, Author
@savme Seeing if this passes the smell test on some issues I've been having lately with creating tunnels. Thanks!
Contributor
This feels heavy handed. We need to understand why conflicts are happening before we just throw requeues at the problem. Seems like we should be using server side apply or better conflict resolution.
Contributor, Author
Copy that. Incoming.
Summary
Fixes #166.
When a new tunnel (HTTPProxy) is created, concurrent reconciles race to write the child Gateway, HTTPRoute, and EndpointSlice. The resulting 409 Conflict errors were passed back to controller-runtime as plain errors, which applied exponential backoff — after ~15 conflicts in the initial burst, the backoff reached 3-4 minutes. The controller went silent until the next periodic tick.
This backoff was the root cause of the ~3m47s delay between tunnel creation and Programmed=True being set on the HTTPProxy, which is when the UI toggle turns green.
Before
After
Each 409 Conflict returns ctrl.Result{RequeueAfter: 1s} instead of entering the exponential-backoff queue. The conflict resolves within one retry cycle; Programmed=True should be set within seconds of the child resources being accepted.
Changes
- Rename the ctrl.Result return from _ ctrl.Result to result ctrl.Result so the deferred status-update block can set result.RequeueAfter on conflict without joining an error that re-enters the backoff queue.
- Add an apierrors.IsConflict guard in the Gateway, HTTPRoute, and EndpointSlice CreateOrUpdate error paths. It returns ctrl.Result{RequeueAfter: retryAfterConflict} (1s), matching the pattern in gateway_dns_controller.go and result.go.
- Rename result locals (the controllerutil.OperationResult string type) to opResult to avoid shadowing the named return.
- Add TestHTTPProxyReconcileConflictRequeue covering all three write paths (Gateway update, HTTPRoute update, status update) with injected 409s.
Testing
All three subtests pass; full controller suite clean.
Related