Skip JMX version fetch when broker status is already current #249
Open
hvan wants to merge 1 commit into
Adds brokerNeedsVersionUpdate to guard the BrokersState write so that JMX fetches are skipped for brokers whose recorded image and version already match the desired image, reducing unnecessary reconcile work.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The PR addresses two performance problems in the broker reconcile loop:
1. Eliminating redundant JMX calls
Previously, every reconcile cycle made a JMX HTTP call to every broker, even if nothing had changed. Because the reconciler runs continuously, this produced constant network traffic to all brokers. The new brokerNeedsVersionUpdate guard checks the cluster's stored state first: if the broker's recorded image and version already match the desired image, the JMX call is skipped entirely. Only brokers that are new, have an empty version, or had their image changed trigger a fetch.
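The guard described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the BrokerStatus type, the map keyed by broker ID, the function signature, and the image names are all assumptions made for the example. Only the name brokerNeedsVersionUpdate and the three trigger conditions (new broker, empty version, changed image) come from the description.

```go
package main

import "fmt"

// BrokerStatus is a hypothetical stand-in for the per-broker state
// recorded in the cluster status; the real type in the PR may differ.
type BrokerStatus struct {
	Image   string
	Version string
}

// brokerNeedsVersionUpdate reports whether a JMX fetch is needed for the
// given broker. The signature is assumed for illustration.
func brokerNeedsVersionUpdate(recorded map[string]BrokerStatus, brokerID, desiredImage string) bool {
	st, ok := recorded[brokerID]
	if !ok {
		return true // new broker: nothing recorded yet
	}
	if st.Version == "" {
		return true // version was never fetched successfully
	}
	return st.Image != desiredImage // image changed since the last fetch
}

func main() {
	// Hypothetical recorded state: broker 0 is current, broker 1 has no
	// version yet, broker 2 is not recorded at all.
	recorded := map[string]BrokerStatus{
		"0": {Image: "example/kafka:a", Version: "1.0"},
		"1": {Image: "example/kafka:a", Version: ""},
	}
	for _, id := range []string{"0", "1", "2"} {
		fmt.Printf("broker %s needs fetch: %v\n", id,
			brokerNeedsVersionUpdate(recorded, id, "example/kafka:a"))
	}
}
```

In this sketch only broker 0 is skipped; changing the desired image would cause it to be fetched again as well.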
2. Parallelizing the JMX calls that do happen
Previously, the JMX calls that did run were sequential: broker 0 had to finish before broker 1 started, and so on. In a cluster with N brokers, total wait time was N × (JMX latency). The refactored updateStatusWithDockerImageAndVersion fans out all necessary JMX calls concurrently using goroutines and collects results over a channel, so the total wait time is bounded by the slowest single broker rather than the sum of all of them.
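The fan-out pattern described above might look something like the sketch below. It is an illustration under assumptions, not the body of updateStatusWithDockerImageAndVersion: the versionResult type, the fetchVersions helper, and the injected fetch function are hypothetical names introduced here so the example runs without a real JMX endpoint.

```go
package main

import (
	"fmt"
	"time"
)

// versionResult carries the outcome of one broker's fetch (hypothetical type).
type versionResult struct {
	BrokerID string
	Version  string
	Err      error
}

// fetchVersions starts one goroutine per broker and collects every result
// over a channel. The fetch function is injected so that a fake can replace
// the real JMX call in this sketch.
func fetchVersions(brokerIDs []string, fetch func(id string) (string, error)) map[string]versionResult {
	// Buffered to len(brokerIDs) so no goroutine blocks on send.
	results := make(chan versionResult, len(brokerIDs))
	for _, id := range brokerIDs {
		go func(id string) {
			v, err := fetch(id)
			results <- versionResult{BrokerID: id, Version: v, Err: err}
		}(id)
	}
	// Receive exactly one result per broker; order of arrival is arbitrary.
	out := make(map[string]versionResult, len(brokerIDs))
	for range brokerIDs {
		r := <-results
		out[r.BrokerID] = r
	}
	return out
}

func main() {
	// Fake fetch that simulates 50ms of JMX latency per broker.
	fake := func(id string) (string, error) {
		time.Sleep(50 * time.Millisecond)
		return "v" + id, nil
	}
	start := time.Now()
	res := fetchVersions([]string{"0", "1", "2"}, fake)
	fmt.Println(len(res), "results after", time.Since(start).Round(10*time.Millisecond))
}
```

With three brokers at 50ms each, the elapsed time in this sketch should be close to one broker's latency rather than three times that. Per-broker errors are kept in the result so that one failing broker does not discard the others.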
Combined effect: In steady state (no image changes), zero JMX calls are made. During a rolling upgrade, only the brokers being updated trigger calls, and those calls run in parallel. The 30-second HTTP timeout added alongside these changes prevents any one stalled broker from blocking the reconciler indefinitely.
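To illustrate the timeout mentioned above, here is a small sketch of an HTTP client with a bounded request time. The newJMXClient helper and the jmxTimeout constant are hypothetical names, and the PR may configure its timeout differently; the demo uses a much shorter timeout against a local test server so it finishes quickly.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// jmxTimeout mirrors the 30s value mentioned in the PR description.
const jmxTimeout = 30 * time.Second

// newJMXClient returns an HTTP client whose requests fail once the given
// timeout elapses, instead of waiting forever (hypothetical helper).
func newJMXClient(timeout time.Duration) *http.Client {
	return &http.Client{Timeout: timeout}
}

func main() {
	// Local stand-in for a stalled broker: it responds only after 500ms.
	stalled := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(500 * time.Millisecond)
	}))
	defer stalled.Close()

	// A short timeout keeps the demo fast; the reconciler would use jmxTimeout.
	client := newJMXClient(100 * time.Millisecond)
	resp, err := client.Get(stalled.URL)
	if err == nil {
		resp.Body.Close()
	}
	fmt.Println("request timed out:", err != nil)
}
```

http.Client.Timeout covers the whole exchange, including connecting and reading the response body, so a single value is enough to bound each call in the fan-out.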
Summary
JMX fetches for brokers whose recorded image and version already match the desired image are skipped, reducing unnecessary network calls on every reconcile loop.
Test plan
isolation.