Update Logstash single pipeline dashboard with batch byte_size metrics for p50, p90, and max #18311
… size and p50, p90 for the batch's event count, over the last 1-minute window
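The thread doesn't show how Logstash computes these values. As a minimal sketch of what "p50, p90 and max over the last 1-minute window" means, the hypothetical `BatchSizeWindow` class below keeps raw `(timestamp, byte_size)` samples for 60 seconds and uses nearest-rank percentiles. That estimator is an assumption: Logstash may use a histogram or another approximation.

```python
import math
from collections import deque

WINDOW_SECONDS = 60  # the "last 1 minute" window


class BatchSizeWindow:
    """Hypothetical helper: keeps batch byte-size samples from the last minute."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp_seconds, byte_size), oldest first

    def record(self, ts, byte_size):
        self.samples.append((ts, byte_size))
        self._evict(ts)

    def _evict(self, now):
        # Drop samples that fell out of the window (older than now - window).
        while self.samples and self.samples[0][0] <= now - self.window_seconds:
            self.samples.popleft()

    def summary(self, now):
        self._evict(now)
        values = sorted(size for _, size in self.samples)
        if not values:
            return {}

        def nearest_rank(p):
            # Smallest value such that at least p% of the samples are <= it.
            rank = math.ceil(p * len(values) / 100)
            return values[max(rank, 1) - 1]

        return {"p50": nearest_rank(50), "p90": nearest_rank(90), "max": values[-1]}


# Usage: ten batches, one per second, of 100..1000 bytes.
w = BatchSizeWindow()
for i in range(10):
    w.record(float(i), (i + 1) * 100)
print(w.summary(9.0))   # all ten samples are still in the window
print(w.summary(65.0))  # samples at t <= 5 have aged out
```

A sliding window (rather than a lifetime aggregate) is what makes all three series describe the same recent period.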
✅ Vale Linting Results: No issues found on modified lines! The Vale linter checks documentation changes against the Elastic Docs style guide. To use Vale locally or report issues, refer to Elastic style guide for Vale.
I imagined that if we keep all the percentiles plus current and average as lines, the graph is hard to understand: you have to consider five lines, and understanding how they relate to each other is not easy.
What if we made the shaded area between … But that isn't 100% bulletproof. It feels like we could split this into two separate charts: one for Current vs. Average, and another strictly for Percentiles (p50, p90, Max).
@perk, so your suggestion is to keep it as it is and only change the shaded areas, keeping the shading only for the strip between … In that case I don't know how to do it; I think we need to ask someone from Kibana how to do that (who?).
It could be, but at that point I think it becomes difficult to correlate the current value with the percentile measures (and …
Yes, I think that would be clearer than the current dashboard. But I'm not sure it's feasible either. Is it?
Fair question. I think the ability to correlate on a single dashboard is a good thing in general.
By playing with color, stacking, and a custom formula, I was able to draw a colored stripe that represents the p50-p90 range, as in the following image:
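The exact Lens settings aren't included in the thread. A common way to get such a stripe from a stacked area chart is to stack two layers: a bottom layer equal to p50, drawn fully transparent, and a top layer computed by a formula as p90 minus p50, drawn in color. The colored layer then starts at p50 and ends exactly at p90. The hypothetical `band_series` helper below only sketches that arithmetic; it is not Kibana code.

```python
def band_series(p50, p90):
    """Hypothetical helper: split p50/p90 into two stacked-area layers.

    Layer 1 (rendered transparent) is p50 itself.
    Layer 2 (rendered colored) is p90 - p50, so its stacked top edge lands on p90.
    """
    if len(p50) != len(p90):
        raise ValueError("p50 and p90 series must have the same length")
    base = list(p50)
    band = [hi - lo for lo, hi in zip(p50, p90)]
    return base, band


# Usage with made-up byte sizes for three time buckets.
base, band = band_series([100, 120, 110], [300, 350, 320])
print(base, band)
```

Because stacking adds the layers point by point, `base + band` reproduces p90 in every bucket. Keeping the bottom layer invisible is what turns the top layer into a floating stripe.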
@perk I have one doubt: how useful is it to graph the "average lifetime" value for both the event count and the byte size? If there is an increase or decrease, the time the metric takes to reach a "stable value" is proportional to how long the LS process has been running. For example, suppose it has been running for some days with a fairly steady flow that produced an average of 100KB. If the size increases to 1MB, the lifetime average takes some time to get close to the 1MB value. I would suggest using the last-1-minute average instead, so that all data series reflect the same time window. Let me know what you think.
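To make the convergence argument concrete, here is a toy simulation. It is not based on real Logstash data. It assumes one batch per second, a steady 100,000-byte batch size for a few days, and then a step to 1,000,000 bytes. The function name `simulate` and all sizes are illustrative.

```python
from collections import deque


def simulate(days_steady, step_minutes, before=100_000, after=1_000_000):
    """Compare the lifetime average with the last-1-minute average of batch size.

    Assumptions (illustrative only): one batch per second, a constant `before`
    size for `days_steady` days, then a constant `after` size for `step_minutes`.
    """
    steady_samples = days_steady * 86_400
    # Steady phase, accumulated in closed form instead of looping over every second.
    total = before * steady_samples
    count = steady_samples
    window = deque([before] * 60, maxlen=60)  # last 60 one-second samples
    # Step phase: every new batch updates both aggregates.
    for _ in range(step_minutes * 60):
        total += after
        count += 1
        window.append(after)
    lifetime_avg = total / count
    last_minute_avg = sum(window) / len(window)
    return lifetime_avg, last_minute_avg


lifetime, last_minute = simulate(days_steady=3, step_minutes=10)
print(round(lifetime), round(last_minute))
```

Ten minutes after the step, the last-minute average already reports 1MB. The lifetime average is still close to 102KB, and it would move even more slowly the longer the process had been up.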
💚 Build Succeeded
cc @andsel


Proposed commit message
Update the CEL script to store the p50, p90, and max byte-size metrics of the batch for the last minute.
Update the Single Pipeline View to include these new values.
Checklist
changelog.yml file.
Author's Checklist
How to test this PR locally
Follow the test plan outlined in #17009
Related issues
Screenshots