FIX: Reconcile traefik service with canary at 0 #1692
Open
joaosilva15 wants to merge 1 commit into
Setting the weight to 100 on both services sends 50% of the traffic to each service. This made our canary enter an infinite loop while promoting a new version, and the traefik service got altered.

The traefik service should not be changed, as it is managed by flagger, but getting stuck in an infinite loop is not great. The loop happened because, during promotion with `StepWeightPromotion`, the weights are reset when the traefik service gets reconciled. After that, `GetRoutes` makes [this calculation](https://github.com/fluxcd/flagger/blob/9a224a0c906354fcfcbc01d4d2df987389301e68/pkg/router/traefik.go#L163-L164) for the weights, which returns 0 for the canary, and the controller is then unable to exit [this section](https://github.com/fluxcd/flagger/blob/v1.36.1/pkg/controller/scheduler.go#L491-L546).

Besides this change, do you know why we are treating the weights as percentages? Should I also change the `GetRoutes` function to calculate the percentage from the weights, or is it coded like that because flagger is expected to keep the weights within those constraints?

Signed-off-by: Joao Pedro Silva <jp.silva15@gmail.com>
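To make the arithmetic concrete, here is a minimal sketch of the two readings of the weights. It is not copied from flagger: `canaryFromPrimary` and `normalizedShares` are hypothetical names, and the first function assumes the linked calculation treats the primary weight as a percentage and gives the canary the remainder of 100, which is consistent with the canary coming back as 0 when both weights are 100.

```go
package main

import "fmt"

// canaryFromPrimary models the assumed current behaviour: the primary weight
// is read as a percentage and the canary gets whatever is left of 100.
// Hypothetical helper, not flagger's actual code.
func canaryFromPrimary(primaryWeight int) (primary, canary int) {
	return primaryWeight, 100 - primaryWeight
}

// normalizedShares is the alternative raised in the question: derive the
// percentages from both weights, so 100/100 becomes a 50/50 split.
// Hypothetical helper, not flagger's actual code.
func normalizedShares(primaryWeight, canaryWeight int) (primary, canary int) {
	total := primaryWeight + canaryWeight
	if total == 0 {
		// No weight configured on either service: report no traffic.
		return 0, 0
	}
	primary = primaryWeight * 100 / total
	// Give the canary the remainder so the two shares always sum to 100.
	return primary, 100 - primary
}

func main() {
	// Both services reset to weight 100, as in the scenario described above.
	p, c := canaryFromPrimary(100)
	fmt.Printf("percentage reading: primary=%d canary=%d\n", p, c)

	p, c = normalizedShares(100, 100)
	fmt.Printf("normalized reading: primary=%d canary=%d\n", p, c)
}
```

With both weights at 100, the percentage reading reports the canary at 0 even though the proxy would split traffic evenly, while the normalized reading reports 50 for each. Which of the two is appropriate depends on whether flagger is expected to always keep the weights summing to 100.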
f4b2c37 to 286d005