Refine PrometheusFailedRate signal and incident metadata #5477
rhamitarora wants to merge 1 commit into
Harden the PrometheusFailedRate rule with scoped and backward-compatible remote-write metrics, safer ratio math, and explicit correlation fields so incidents group by failing endpoint. Update alert tests and runbook URLs to match the new behavior and improve on-call triage.

Co-authored-by: Cursor <cursoragent@cursor.com>
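To illustrate what "safer ratio math" and grouping by failing endpoint could look like, here is a minimal Python sketch. It is not the rule from this PR: the function names, the `(endpoint, failed_rate, succeeded_rate)` input shape, and the `0.01` threshold are all hypothetical. It only models the idea of computing a failed-sample ratio per endpoint while skipping endpoints that have no traffic, so that a zero denominator never produces a spurious value.

```python
from collections import defaultdict


def failed_ratio_by_endpoint(samples):
    """Return {endpoint: failed / (failed + succeeded)} for endpoints with traffic.

    `samples` is an iterable of (endpoint, failed_rate, succeeded_rate) tuples.
    The tuple shape is an assumption made for this sketch.
    """
    failed = defaultdict(float)
    total = defaultdict(float)
    for endpoint, failed_rate, succeeded_rate in samples:
        failed[endpoint] += failed_rate
        total[endpoint] += failed_rate + succeeded_rate

    ratios = {}
    for endpoint, denominator in total.items():
        if denominator <= 0:
            # No samples sent at all: skip rather than divide by zero.
            continue
        ratios[endpoint] = failed[endpoint] / denominator
    return ratios


def firing_endpoints(samples, threshold=0.01):
    """Return the sorted endpoints whose failed ratio exceeds `threshold`.

    The default threshold is a placeholder, not a value taken from the PR.
    """
    ratios = failed_ratio_by_endpoint(samples)
    return sorted(e for e, ratio in ratios.items() if ratio > threshold)
```

Keying the result by endpoint mirrors the stated goal of grouping incidents by the failing endpoint: each entry can be triaged on its own instead of being folded into one aggregate ratio.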
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rhamitarora
Hi @rhamitarora. Thanks for your PR. I'm waiting for an Azure member to verify that this patch is reasonable to test.

If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.