console: worker tree map in the console for cluster/ objects page #36645
Draft
leedqin wants to merge 2 commits into
Surfaces an interactive heatmap on the cluster Overview page that answers "where is CPU going?" and "is it skewed across workers?", mirroring step 1 and step 3 of the dataflow-troubleshooting docs ladder.

A "Where is CPU going?" button on the Resource Usage section opens a side drawer with per-replica tabs. Each tab renders one row per dataflow on the cluster, one cell per worker, colored by per-row elapsed_ns. Horizontal patterns reveal object-level skew (often a bad GROUP BY key); vertical patterns and the cluster-wide footer row reveal worker-level issues (a noisy neighbor). A skew badge (max/min) plus a tooltip showing ratio-to-average match the docs' canonical skew metric (>2 threshold).

Driven by mz_introspection.mz_scheduling_elapsed_per_worker joined to mz_dataflow_operator_dataflows and mz_compute_exports so each row links back to its maintained-object detail page. The button is hidden for system clusters and clusters with replication_factor=0.

The heat gradient is theme-aware via useColorModeValue: pale slate base in light mode, dark slate base in dark mode, sharing a warm mid-stop and red high-stop. Colorblind-safe (amber to red).

The WorkerSkewHeatmap component is built generic on a HeatmapRow contract so the upcoming object-detail Performance tab can reuse it with operator-scoped rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
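For reviewers, the per-row metrics described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the `HeatmapRow` fields, the `summarizeRow` helper, and the `SkewSummary` shape are hypothetical names, normalizing by the row maximum is an assumed reading of "per-row", and applying the >2 threshold to the ratio-to-average is likewise an assumption.

```typescript
// Hypothetical row shape: one row per dataflow, one elapsed_ns value per worker.
interface HeatmapRow {
  id: string;
  label: string;
  elapsedNsByWorker: number[];
}

// Hypothetical summary of one row, covering the badge, tooltip, and cell colors.
interface SkewSummary {
  maxOverMin: number | null; // badge value; null when there is no data or min is 0
  ratioToAverage: number[]; // tooltip value per worker
  intensity: number[]; // per-row normalized color position in [0, 1]
  skewed: boolean; // true when any worker exceeds the threshold
}

// Assumption: the docs' ">2" threshold is applied to ratio-to-average.
const SKEW_THRESHOLD = 2;

function summarizeRow(row: HeatmapRow): SkewSummary {
  const values = row.elapsedNsByWorker;
  if (values.length === 0) {
    return { maxOverMin: null, ratioToAverage: [], intensity: [], skewed: false };
  }
  let max = values[0];
  let min = values[0];
  let sum = 0;
  for (const v of values) {
    if (v > max) max = v;
    if (v < min) min = v;
    sum += v;
  }
  const avg = sum / values.length;
  // Ratio of each worker's elapsed time to the row average.
  const ratioToAverage = values.map((v) => (avg > 0 ? v / avg : 0));
  // Normalize within the row so a quiet dataflow still shows its own skew.
  const intensity = values.map((v) => (max > 0 ? v / max : 0));
  const maxOverMin = min > 0 ? max / min : null;
  const skewed = ratioToAverage.some((r) => r > SKEW_THRESHOLD);
  return { maxOverMin, ratioToAverage, intensity, skewed };
}
```

With this sketch, a row such as `[100, 100, 100, 500]` is flagged as skewed because the last worker runs at 2.5 times the row average, while an evenly loaded row is not.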
Completes the dataflow-troubleshooting funnel on the maintained-object detail drawer. Where the cluster heatmap answers "which dataflow on this cluster is hot or skewed?", this surface answers the natural next question: "within this object's dataflow, which operator is the bottleneck, and on which workers?"

Adds a Performance tab to ObjectDetailPanel (next to Definition and Freshness). The tab contains a replica selector defaulting to the first ready replica, with the same per-row-normalized heatmap as the cluster drawer but rows scoped to operators inside this object's dataflow. Rows are joined via mz_compute_exports.export_id = object GlobalId, and structural operators (BuildRegion, InputRegion, LogOperatorHydration, etc.) are filtered out so the surface only shows operators a user can reason about.

The WorkerSkewHeatmap component lands generic on a HeatmapRow contract in the previous commit; this commit defines an OperatorRow that satisfies it, marking rows as non-clickable (operators have no further drill target).

Empty states cover (a) objects not bound to a cluster (tables) and (b) clusters with no replicas. Uses the Console Alert wrapper to match existing conventions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
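As a rough illustration of the operator-scoped rows and the replica default described above, the shaping step might look like the sketch below. Everything here is hypothetical and not taken from the PR's diff: the `OperatorSample` input shape, the `clickable` field, and the helper names are invented for illustration. The structural-operator list contains only the three names this description mentions (the real list is longer, per the "etc."), and exact-name matching is an assumption.

```typescript
// Hypothetical version of the HeatmapRow contract with a click flag.
interface HeatmapRow {
  id: string;
  label: string;
  elapsedNsByWorker: number[];
  clickable: boolean;
}

// Operator rows satisfy the contract but are never clickable.
type OperatorRow = HeatmapRow & { clickable: false };

// Hypothetical flat query result: one sample per (operator, worker).
interface OperatorSample {
  operatorId: string;
  operatorName: string;
  workerId: number;
  elapsedNs: number;
}

// Only the names mentioned in the description; the real list is longer.
const STRUCTURAL_OPERATORS: string[] = ["BuildRegion", "InputRegion", "LogOperatorHydration"];

function isStructuralOperator(name: string): boolean {
  // Assumption: structural operators are identified by exact name.
  return STRUCTURAL_OPERATORS.indexOf(name) !== -1;
}

function toOperatorRows(samples: OperatorSample[], workerCount: number): OperatorRow[] {
  const rows: OperatorRow[] = [];
  const byId: { [id: string]: OperatorRow | undefined } = {};
  for (const s of samples) {
    if (isStructuralOperator(s.operatorName)) continue;
    if (s.workerId < 0 || s.workerId >= workerCount) continue;
    let row = byId[s.operatorId];
    if (row === undefined) {
      const cells: number[] = [];
      for (let i = 0; i < workerCount; i++) cells.push(0);
      row = { id: s.operatorId, label: s.operatorName, elapsedNsByWorker: cells, clickable: false };
      byId[s.operatorId] = row;
      rows.push(row); // keep first-seen order
    }
    row.elapsedNsByWorker[s.workerId] += s.elapsedNs;
  }
  return rows;
}

// Hypothetical replica shape; returns the first ready replica, if any.
interface Replica {
  name: string;
  ready: boolean;
}

function pickDefaultReplica(replicas: Replica[]): Replica | undefined {
  for (const r of replicas) {
    if (r.ready) return r;
  }
  return undefined;
}
```

Returning `undefined` from `pickDefaultReplica` leaves room for the caller to render an empty state, for example when the cluster has no replicas.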
Showing Cluster CPU worker skew
By operator in the object details