[DBMON-6141] Adding support for single connection to self-hosted cluster #22970
sangeetashivaji wants to merge 9 commits into master
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7b15b1c36d
    result_row = {
        'normalized_query_hash': str(normalized_query_hash),
        'server_node': str(server_node) if server_node else '',
Remove stale server_node from merged statement metrics
This row now carries server_node, but ClickhouseStatementMetrics._merge_rows_across_nodes still collapses multiple node rows into one row per normalized_query_hash by summing metrics across all nodes. The merged row therefore keeps a single node label (from the max-count row) while count, total_time, read/write bytes, etc. represent cluster-wide totals, which misattributes data whenever the same query runs on more than one node.
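The two consistent ways out of this mismatch can be sketched as follows. This is a minimal illustration, not the actual `_merge_rows_across_nodes` implementation: the function names, the `SUM_FIELDS` tuple, and the row shape are all hypothetical. Either the merge key includes `server_node`, so per-node rows stay separate, or the merge stays cluster-wide and the node label is dropped from the merged row.

```python
# Hypothetical subset of the metric columns that get summed during a merge.
SUM_FIELDS = ("count", "total_time")


def _merge(rows, key_fn, drop_node):
    """Sum SUM_FIELDS over rows that share the same key (illustrative helper)."""
    merged = {}
    for row in rows:
        key = key_fn(row)
        if key not in merged:
            first = dict(row)  # copy so the caller's rows are not mutated
            if drop_node:
                first.pop("server_node", None)
            merged[key] = first
        else:
            for field in SUM_FIELDS:
                merged[key][field] += row[field]
    return list(merged.values())


def merge_per_node(rows):
    """Option A: keep server_node by making it part of the merge key."""
    return _merge(
        rows,
        lambda r: (r["normalized_query_hash"], r.get("server_node", "")),
        drop_node=False,
    )


def merge_cluster_wide(rows):
    """Option B: merge across nodes and remove the now-meaningless node label."""
    return _merge(rows, lambda r: r["normalized_query_hash"], drop_node=True)
```

With option A, a query that ran on two nodes yields two rows, each carrying its own totals. With option B it yields one row of cluster-wide totals and no `server_node` field, so nothing is misattributed.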
    if self._config.cluster_name:
        self.tag_manager.set_tag("clustername", self._config.cluster_name, replace=True)
Populate clustername before computing database_instance
database_instance is set before the new clustername tag is added, and database_identifier is computed/cached from the tags available at that moment. With cluster_name configured, templates that include $clustername will not resolve correctly, which prevents users from distinguishing multiple cluster configurations on the same endpoint and can collapse identities unexpectedly.
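The ordering problem can be reproduced in isolation. The sketch below assumes the identifier template is resolved with something like `string.Template.safe_substitute` over the current tag list; that is an assumption for illustration, and `resolve_identifier`, the template string, and the tag values are hypothetical rather than taken from the integration.

```python
from string import Template


def resolve_identifier(template, tags):
    """Resolve $placeholders from 'key:value' tags (hypothetical helper)."""
    tag_map = {}
    for tag in tags:
        key, _, value = tag.partition(":")
        tag_map[key] = value
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return Template(template).safe_substitute(tag_map)


template = "$resolved_hostname/$clustername"  # illustrative template only
tags = ["resolved_hostname:ch-node-1"]

# Identifier computed (and cached) before the clustername tag exists.
computed_too_early = resolve_identifier(template, tags)

# Identifier computed after the tag has been set.
tags.append("clustername:dbm_cluster")
computed_after_tag = resolve_identifier(template, tags)
```

Here `computed_too_early` still contains the literal `$clustername`, while `computed_after_tag` resolves it. Under this assumption, setting the tag before the identifier is first computed would avoid the stale value.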
✅ Tests: all green. No new flaky tests detected. Commit SHA: 972e974
Codecov Report: ✅ All modified and coverable lines are covered by tests.
What does this PR do?
Adds support for monitoring self-hosted multi-node ClickHouse clusters through a single agent connection. Introduces a new cluster_name config option that, when set, causes the agent to connect to one node but collect metrics and samples from all nodes in the cluster via clusterAllReplicas("cluster", "table").
Also adds per-node hostname attribution to query metrics, activity samples, and query completions, so each query is tagged with the specific ClickHouse node it ran on (server_node/hostname fields).
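As a rough sketch of the behaviour described above, the snippet below shows how a table reference could be wrapped in `clusterAllReplicas` when a cluster name is configured, and how `hostName()` could supply the per-node label. `table_ref`, `build_metrics_query`, and the cluster-name validation rule are hypothetical and are not the PR's actual code; `clusterAllReplicas`, `hostName()`, and `system.query_log` are ClickHouse built-ins.

```python
import re

# Conservative allow-list for cluster names (an assumption of this sketch);
# it keeps a misconfigured value from being spliced into the SQL text.
_CLUSTER_NAME_RE = re.compile(r"^[A-Za-z0-9_]+$")


def table_ref(table, cluster_name=None):
    """Return the table as-is, or wrapped so it is read from every replica."""
    if not cluster_name:
        return table
    if not _CLUSTER_NAME_RE.match(cluster_name):
        raise ValueError(f"unsupported cluster name: {cluster_name!r}")
    return f"clusterAllReplicas('{cluster_name}', {table})"


def build_metrics_query(cluster_name=None):
    """Build a per-node query-count statement (illustrative columns only)."""
    source = table_ref("system.query_log", cluster_name)
    return (
        "SELECT hostName() AS server_node, normalized_query_hash, count() AS count "
        f"FROM {source} "
        "GROUP BY server_node, normalized_query_hash"
    )
```

Calling `build_metrics_query("dbm_cluster")` produces a statement that reads from all replicas through one connection, while `build_metrics_query()` falls back to the local table only.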
Motivation
Previously, clusterAllReplicas was only supported via single_endpoint_mode, which was designed for ClickHouse Cloud and hardcoded the cluster name to 'default'. Self-hosted clusters have named clusters (e.g. dbm_cluster) and had no way to use the single-connection monitoring pattern, so they had to configure a separate agent instance per node.
This PR decouples the two use cases:
Review checklist (to be filled by reviewers)
- Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
- To backport this PR to another branch, add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged.