
TRACING-6127: feat: add span_kind filter, show p95 latency instead of avg operation duration #1044

Merged
openshift-merge-bot[bot] merged 2 commits into rhobs:main from
andreasgerstmayr:apm-dashboard-span-king
Apr 17, 2026

@andreasgerstmayr
Contributor

andreasgerstmayr commented Mar 27, 2026

Follow-up changes to #1043 (this PR depends on #1043):

  • add Span Kind variable to filter by span kind (defaults to SPAN_KIND_SERVER to avoid double-counting)
  • rename "Duration" to "Latency", and use P95 histogram quantile instead of average operation duration
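Why a SPAN_KIND_SERVER default avoids double-counting can be illustrated with a small Go sketch. It uses a hypothetical span type and made-up data, not the dashboard code: when both sides of a call are instrumented, the caller records a CLIENT span and the callee a SERVER span for the same request, so counting every kind reports that call twice.

```go
package main

import "fmt"

// span is a hypothetical, simplified stand-in for one recorded span.
type span struct {
	Service string
	Kind    string // e.g. "SPAN_KIND_SERVER" or "SPAN_KIND_CLIENT"
}

// exampleSpans returns made-up spans for two requests: a user calling
// "frontend", and "frontend" calling "checkout".
func exampleSpans() []span {
	return []span{
		{Service: "frontend", Kind: "SPAN_KIND_SERVER"}, // frontend handles the user request
		{Service: "frontend", Kind: "SPAN_KIND_CLIENT"}, // frontend calls checkout
		{Service: "checkout", Kind: "SPAN_KIND_SERVER"}, // checkout handles that same call
	}
}

// totalCalls counts spans of the given kind; an empty kind keeps every
// kind, similar to selecting span_kind=~".*".
func totalCalls(spans []span, kind string) int {
	n := 0
	for _, s := range spans {
		if kind == "" || s.Kind == kind {
			n++
		}
	}
	return n
}

func main() {
	spans := exampleSpans()
	fmt.Println("all kinds:", totalCalls(spans, ""))
	fmt.Println("server only:", totalCalls(spans, "SPAN_KIND_SERVER"))
}
```

With all kinds the sketch counts 3 calls for 2 actual requests, because the frontend-to-checkout call shows up once as CLIENT and once as SERVER; filtering to SPAN_KIND_SERVER counts each handled request once.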

@openshift-ci-robot
Collaborator

openshift-ci-robot commented Mar 27, 2026

@andreasgerstmayr: This pull request references TRACING-6127 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.


In response to this:

Follow-up changes to #1043 (this PR depends on #1043):

  • add Span Kind variable to filter by span kind (defaults to SPAN_KIND_SERVER to avoid double-counting)
  • rename "Duration" to "Latency", and use P95 histogram quantile instead of average operation duration
  • update error rate unit

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci

openshift-ci bot commented Mar 27, 2026

Hi @andreasgerstmayr. Thanks for your PR.

I'm waiting for a rhobs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai

coderabbitai bot commented Mar 27, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Review skipped — only excluded labels are configured. (1)
  • work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: e289f225-9716-48f3-8fcf-f019617fea50

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Renamed service panel from "Duration" to "Latency" and switched P95 latency computation to histogram_quantile; updated operations table units/headers and request-rate series name; added a multi-select span_kind dashboard variable and wired span_kind=~"${span_kind}" into variable matchers.
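The span_kind=~"${span_kind}" matcher relies on PromQL regex semantics: a =~ matcher must match the whole label value, which is why ".*" works as the "all" value and a bare "SERVER" would match nothing. The Go sketch below imitates that anchoring with the standard regexp package (PromQL regexes use the same RE2 syntax). The helper name is hypothetical, and the assumption that a multi-select choice is rendered as a "|"-joined alternation is ours, not taken from the PR.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchesSpanKind imitates a PromQL =~ matcher: the pattern has to match the
// whole label value, so it is wrapped in ^(?:...)$ before matching.
func matchesSpanKind(pattern, spanKind string) bool {
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	return re.MatchString(spanKind)
}

func main() {
	kinds := []string{"SPAN_KIND_SERVER", "SPAN_KIND_CLIENT", "SPAN_KIND_INTERNAL"}
	patterns := []string{
		"SPAN_KIND_SERVER", // the dashboard default
		strings.Join([]string{"SPAN_KIND_SERVER", "SPAN_KIND_CLIENT"}, "|"), // assumed multi-select rendering
		".*", // the "all" value
	}
	for _, p := range patterns {
		var matched []string
		for _, k := range kinds {
			if matchesSpanKind(p, k) {
				matched = append(matched, k)
			}
		}
		fmt.Printf("%s -> %v\n", p, matched)
	}
}
```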

Changes

Cohort: APM dashboard changes
File(s): pkg/controllers/uiplugin/apm.go
Summary: Renamed service panel group metric header (Duration → Latency); replaced P95 duration ratio (_sum/_count) with histogram_quantile(.95, sum(rate(..._bucket...)) by (span_name, le)) and set series name to "P95 Latency"; changed operations table "Error rate" unit to RequestsPerSecondsUnit and request-rate series name to "Request rate"; renamed "Duration" column to "P95 Latency" (ms); added span_kind multi-select dashboard variable (default SPAN_KIND_SERVER, all .*) and added span_kind=~"${span_kind}" to variableMatchers.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage: ⚠️ Warning
  Explanation: Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.
  Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Title check: ✅ Passed
  Explanation: The title accurately describes the main changes: adding span_kind filter and switching from average operation duration to P95 latency, which aligns with the changeset in apm.go.
Description check: ✅ Passed
  Explanation: The pull request description clearly relates to the changeset, mentioning span kind variable addition, renaming Duration to Latency, and using P95 histogram quantile instead of average operation duration.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
pkg/controllers/uiplugin/apm.go (1)

101-107: Minor: Error rate unit displays as "req/s" which may be slightly misleading.

Using RequestsPerSecondsUnit for error rate will display as "req/s", but errors aren't requests. This is a minor semantic mismatch. If Perses provides a more generic "per second" unit, that would be more accurate. Otherwise, this is acceptable for consistency with the request rate column.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/uiplugin/apm.go` around lines 101 - 107, The Error rate
column is using common.RequestsPerSecondsUnit (which renders as "req/s") —
update the Format.Unit for the metric with Header "Error rate" (the block where
Name is "value `#2`" and Format is a *common.Format) to use a more generic
per-second unit if Perses exposes one (e.g., common.PerSecondUnit or similar)
instead of RequestsPerSecondsUnit; if no generic unit exists, leave as-is for
consistency but add a short inline comment explaining the semantic mismatch.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4943b1c9-76cd-4c01-8bef-909e9d321b50

📥 Commits

Reviewing files that changed from the base of the PR and between cf377a3 and a946f4d.

📒 Files selected for processing (1)
  • pkg/controllers/uiplugin/apm.go

@jgbernalp
Member

/ok-to-test

@jgbernalp
Member

/hold
Wait until #1043 is merged; feel free to unhold once it is.

@jgbernalp
Member

/lgtm

@openshift-ci

openshift-ci bot commented Apr 7, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreasgerstmayr, jgbernalp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

… duration

* add Span Kind variable to filter by span kind (defaults to
  SPAN_KIND_SERVER to avoid double-counting)
* rename "Duration" to "Latency", and use P95 histogram quantile
  instead of average operation duration
* update error rate unit

Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
andreasgerstmayr force-pushed the apm-dashboard-span-king branch from ad50543 to 5306ca9 on April 13, 2026 14:16

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
pkg/controllers/uiplugin/apm.go (1)

183-185: Scope span_kind label-values query to active filters.

Consider including namespace="$namespace", service="$collector", and service_name="$service" in this matcher so variable options stay contextual and avoid broad cluster-wide scans.

Proposed diff
 				labelvalues.PrometheusLabelValues("span_kind",
-					labelvalues.Matchers(`{__name__=~"traces_span_metrics_calls(_total)?"}`),
+					labelvalues.Matchers(`{__name__=~"traces_span_metrics_calls(_total)?", namespace="$namespace", service="$collector", service_name="$service"}`),
 				),
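The scoped matcher proposed in the diff can be assembled from label/operator/value triples. The Go sketch below is plain string building with hypothetical helper names (it is not the labelvalues API); it reproduces the suggested selector with a deterministic label order and leaves $namespace, $collector and $service for the dashboard to interpolate at query time.

```go
package main

import (
	"fmt"
	"strings"
)

// labelMatcher is a hypothetical helper type: one label, operator and value.
type labelMatcher struct {
	Name, Op, Value string
}

// selector renders matchers as a PromQL series selector. %q double-quotes
// each value; for the plain ASCII values used here that matches PromQL
// string syntax.
func selector(matchers []labelMatcher) string {
	parts := make([]string, 0, len(matchers))
	for _, m := range matchers {
		parts = append(parts, fmt.Sprintf("%s%s%q", m.Name, m.Op, m.Value))
	}
	return "{" + strings.Join(parts, ", ") + "}"
}

// scopedSpanKindSelector builds the matcher suggested in the review: the
// span-metrics calls series, limited to the currently selected namespace,
// collector and service dashboard variables.
func scopedSpanKindSelector() string {
	return selector([]labelMatcher{
		{"__name__", "=~", "traces_span_metrics_calls(_total)?"},
		{"namespace", "=", "$namespace"},
		{"service", "=", "$collector"},
		{"service_name", "=", "$service"},
	})
}

func main() {
	fmt.Println(scopedSpanKindSelector())
}
```

Using a slice rather than a map keeps the label order stable, so the generated selector is byte-for-byte identical to the one in the proposed diff.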
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/uiplugin/apm.go` around lines 183 - 185, The Prometheus
label-values query for "span_kind" is currently unscoped and can return
cluster-wide values; update the call to labelvalues.PrometheusLabelValues used
with labelvalues.Matchers so the matcher includes the active filter labels
namespace="$namespace", service="$collector", and service_name="$service" (i.e.,
add these matchers alongside `{__name__=~"traces_span_metrics_calls(_total)?"}`)
to ensure span_kind options are contextual to the selected
namespace/collector/service; locate the invocation of
labelvalues.PrometheusLabelValues and adjust the labelvalues.Matchers expression
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c8b66647-736c-42fd-82b2-5b86be1ae9d3

📥 Commits

Reviewing files that changed from the base of the PR and between a946f4d and 5306ca9.

📒 Files selected for processing (1)
  • pkg/controllers/uiplugin/apm.go

Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
@openshift-ci-robot
Collaborator

openshift-ci-robot commented Apr 13, 2026

@andreasgerstmayr: This pull request references TRACING-6127 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.


In response to this:

Follow-up changes to #1043 (this PR depends on #1043):

  • add Span Kind variable to filter by span kind (defaults to SPAN_KIND_SERVER to avoid double-counting)
  • rename "Duration" to "Latency", and use P95 histogram quantile instead of average operation duration

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@andreasgerstmayr
Contributor Author

/retest

@jgbernalp
Member

/unhold

@jgbernalp
Member

/test observability-operator-e2e

@jan--f
Collaborator

jan--f commented Apr 17, 2026

/retest
/lgtm

openshift-ci bot added the lgtm label Apr 17, 2026
openshift-merge-bot bot merged commit 6fa1c36 into rhobs:main Apr 17, 2026
12 checks passed

4 participants