Skip to content

Label Prometheus metrics by workflow name#517

Open
cnicolov wants to merge 1 commit into
openworkflowdev:mainfrom
cnicolov:add-workflow-name-prometheus-label
Open

Label Prometheus metrics by workflow name#517
cnicolov wants to merge 1 commit into
openworkflowdev:mainfrom
cnicolov:add-workflow-name-prometheus-label

Conversation

@cnicolov
Copy link
Copy Markdown

Summary

  • Adds a backend aggregation for workflow run counts grouped by workflow name and status.
  • Emits workflow_name alongside status on openworkflow_workflow_runs Prometheus samples.
  • Updates dashboard/backend tests and Prometheus docs for the new label.

Verification

  • npx npm@11.8.0 run format
  • npm_config_engine_strict=false npx npm@11.8.0 run build
  • npm_config_engine_strict=false npx npm@11.8.0 run typecheck
  • npm_config_engine_strict=false npx npm@11.8.0 run lint
  • npm_config_engine_strict=false npx npm@11.8.0 run knip
  • npm_config_engine_strict=false npx npm@11.8.0 run lint:duplication
  • npm_config_engine_strict=false npx npm@11.8.0 run test:coverage

Notes

  • npm_config_engine_strict=false was needed locally because cspell declares Node >=22.18.0 while this machine has Node 22.16.0.
  • npm run lint:spell could not run locally for the same Node version reason.

@jamescmartinez
Copy link
Copy Markdown
Contributor

The query itself looks performant and supported by the existing indexes, but adding workflow_name as a Prom label creates a cardinality risk. The Prom metrics are already pretty heavy as-is, and the full counts we're doing for the current metrics are causing problems in prod.

I'm going to do some more tests on a bigger deployment this week to see how this works in prod.

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented May 17, 2026

Open in StackBlitz

npm i https://pkg.pr.new/openworkflowdev/openworkflow/@openworkflow/cli@517
npm i https://pkg.pr.new/openworkflowdev/openworkflow/@openworkflow/dashboard@517
npm i https://pkg.pr.new/openworkflowdev/openworkflow@517

commit: 1f853cc

@codecov
Copy link
Copy Markdown

codecov Bot commented May 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@cnicolov
Copy link
Copy Markdown
Author

The query itself looks performant and supported by the existing indexes, but adding workflow_name as a Prom label creates a cardinality risk. The Prom metrics are already pretty heavy as-is, and the full counts we're doing for the current metrics are causing problems in prod.

I'm going to do some more tests on a bigger deployment this week to see how this works in prod.

Hey @jamescmartinez what do you think to make it configurable and off by default?

@jamescmartinez
Copy link
Copy Markdown
Contributor

I've decided to hold off on adding this until v1, rather than add it behind a config flag in the current v0.x dashboard.

The per-workflow breakdown is useful, but workflow_name is a cardinality dimension, and Prom treats the full label set as part of the identity. When I tested this, because the shape of the returned series changed, my existing dashboard changed from one aggregate series into one series per workflow and created a bit of a mess.

Since the Go server/controller work is where the long-term metrics surface will live, we should add this there with explicit cardinality rules instead of extending the (already too heavy) full count 0.x dashboard metrics now.

I'm going to keep this PR open and it as part of the v1 telemetry design.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants