Label Prometheus metrics by workflow name #517
The query itself looks performant and is supported by the existing indexes, but I'm going to run some more tests on a bigger deployment this week to see how this works in prod.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Hey @jamescmartinez, what do you think about making it configurable and off by default?
I've decided to hold off on adding this until v1, rather than add it behind a config flag in the current v0.x dashboard. The per-workflow breakdown is useful, but since the Go server/controller work is where the long-term metrics surface will live, we should add this there with explicit cardinality rules instead of extending the (already too heavy) full-count 0.x dashboard metrics now. I'm going to keep this PR open and revisit it as part of the v1 telemetry design.
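One possible shape for an "explicit cardinality rule" is sketched below, purely as an illustration: keep only the busiest workflow names as distinct label values and fold the rest into a single overflow bucket. The `RunCount` shape, the `capWorkflowCardinality` function, and the `"other"` overflow label are all hypothetical and are not taken from this PR or from the v1 design.

```typescript
// Hypothetical row shape: one aggregated count per (workflow name, status) pair.
interface RunCount {
  workflowName: string;
  status: string;
  count: number;
}

// Keep the `maxWorkflows` workflow names with the highest total run count and
// fold every other name into a single overflow bucket. This bounds the number
// of distinct workflow_name label values to maxWorkflows + 1.
// Note: a real workflow that happens to be named like the overflow label would
// be merged into the bucket; a real rule would need to handle that collision.
function capWorkflowCardinality(
  rows: RunCount[],
  maxWorkflows: number,
  overflowLabel: string = "other",
): RunCount[] {
  // Total runs per workflow name, across all statuses.
  const totals = new Map<string, number>();
  for (const row of rows) {
    totals.set(row.workflowName, (totals.get(row.workflowName) ?? 0) + row.count);
  }

  // Sort by total descending, breaking ties by name so the result is stable.
  const kept = new Set<string>(
    Array.from(totals.entries())
      .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
      .slice(0, maxWorkflows)
      .map((entry) => entry[0]),
  );

  // Re-aggregate counts under the (possibly rewritten) workflow name.
  const merged = new Map<string, RunCount>();
  for (const row of rows) {
    const workflowName = kept.has(row.workflowName) ? row.workflowName : overflowLabel;
    const key = JSON.stringify([workflowName, row.status]);
    const existing = merged.get(key);
    if (existing) {
      existing.count += row.count;
    } else {
      merged.set(key, { workflowName, status: row.status, count: row.count });
    }
  }
  return Array.from(merged.values());
}
```

With a cap like this the number of time series grows with the cap and the number of statuses rather than with the number of registered workflows, at the cost of losing detail for low-volume workflows.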
Summary
- Adds `workflow_name` alongside `status` on `openworkflow_workflow_runs` Prometheus samples.

Verification
- `npx npm@11.8.0 run format`
- `npm_config_engine_strict=false npx npm@11.8.0 run build`
- `npm_config_engine_strict=false npx npm@11.8.0 run typecheck`
- `npm_config_engine_strict=false npx npm@11.8.0 run lint`
- `npm_config_engine_strict=false npx npm@11.8.0 run knip`
- `npm_config_engine_strict=false npx npm@11.8.0 run lint:duplication`
- `npm_config_engine_strict=false npx npm@11.8.0 run test:coverage`

Notes
- `npm_config_engine_strict=false` was needed locally because cspell declares Node `>=22.18.0` while this machine has Node `22.16.0`.
- `npm run lint:spell` could not run locally for the same Node version reason.
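
To make the change in the Summary concrete, here is a minimal sketch of how samples carrying both labels could be rendered in the Prometheus text exposition format. It is not the code from this PR: the `RunCount` shape and the `renderWorkflowRunSamples` and `escapeLabelValue` helpers are hypothetical names, and `HELP`/`TYPE` metadata lines are left out because the metric type is not stated here.

```typescript
// Hypothetical row shape: one aggregated count per (workflow name, status) pair.
interface RunCount {
  workflowName: string;
  status: string;
  count: number;
}

// Escape a label value for the Prometheus text exposition format:
// backslash, double quote, and line feed must be backslash-escaped.
function escapeLabelValue(value: string): string {
  return value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n");
}

// Render one openworkflow_workflow_runs sample line per row, labelled with
// both workflow_name and status.
function renderWorkflowRunSamples(rows: RunCount[]): string {
  const lines: string[] = [];
  for (const row of rows) {
    const labels =
      `workflow_name="${escapeLabelValue(row.workflowName)}",` +
      `status="${escapeLabelValue(row.status)}"`;
    lines.push(`openworkflow_workflow_runs{${labels}} ${row.count}`);
  }
  return lines.join("\n") + "\n";
}
```

Each distinct `(workflow_name, status)` pair becomes its own time series, which is why the conversation on this PR raises cardinality as the main concern.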