FIX: Reconcile Traefik service with canary at 0 #1692

Open
joaosilva15 wants to merge 1 commit into fluxcd:main from joaosilva15:reconcile-traefik-weight-0

@joaosilva15

Setting the weight to 100 on both services makes 50% of the traffic go
to each service. This made our canary enter an infinite loop while
promoting a new version, and the Traefik service got altered.

The Traefik service should not be changed, as it is managed by Flagger,
but getting stuck in an infinite loop is not great. The loop happened
because, during promotion with `StepWeightPromotion`, the weights are
reset when the Traefik service gets reconciled. After that, `GetRoutes`
makes [this
calculation](https://github.com/fluxcd/flagger/blob/9a224a0c906354fcfcbc01d4d2df987389301e68/pkg/router/traefik.go#L163-L164)
for the weights, which returns 0 for the canary, and the canary is then
unable to exit
[this section of the scheduler](https://github.com/fluxcd/flagger/blob/v1.36.1/pkg/controller/scheduler.go#L491-L546).
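
To make the mismatch concrete, here is a minimal sketch, not the Flagger code itself. `trafficShare` and `percentStyleRoutes` are hypothetical names, and the "canary = 100 - primary" rule is my assumption about what the linked calculation does. It compares the traffic split implied by relative weights with the weights reported when the primary weight is read as a percentage.

```go
package main

import "fmt"

// trafficShare returns the percentage of requests a backend receives when
// traffic is split in proportion to weights: its weight over the sum of all.
func trafficShare(weight, total int) float64 {
	if total == 0 {
		return 0
	}
	return 100 * float64(weight) / float64(total)
}

// percentStyleRoutes is a hypothetical stand-in for the linked calculation.
// ASSUMPTION: the primary weight is read as a percentage and the canary
// weight is derived as the remainder of 100.
func percentStyleRoutes(primaryWeight int) (primary, canary int) {
	return primaryWeight, 100 - primaryWeight
}

func main() {
	// Weights after the reconcile described above: 100 on both services.
	primaryW, canaryW := 100, 100
	total := primaryW + canaryW

	fmt.Printf("actual share: primary=%.0f%% canary=%.0f%%\n",
		trafficShare(primaryW, total), trafficShare(canaryW, total))

	p, c := percentStyleRoutes(primaryW)
	fmt.Printf("reported weights: primary=%d canary=%d\n", p, c)
}
```

Under these assumptions the services each receive 50% of the traffic while the reported canary weight is 0, which is the state the promotion cannot get out of.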

Besides this change, do you know why we are treating the weights as
percentages? Should I also change the `GetRoutes` function to calculate
the percentage based on the weights, or is it coded like that because
Flagger is expected to keep the weights within those constraints?

Signed-off-by: Joao Pedro Silva <jp.silva15@gmail.com>
@joaosilva15 force-pushed the reconcile-traefik-weight-0 branch from f4b2c37 to 286d005 on July 31, 2024 15:52