
Improve Summary quantiles with DataSketches #2084

@DCjanus

Description


Summary.observe() can become expensive when a Summary with quantiles is observed at high frequency. At that rate the current quantile path becomes a visible cost on hot request paths.

We saw this in ZooKeeper's Prometheus metrics path. In an internal ZooKeeper 3.9.2 fork, a version inspired by ZooKeeper's unmerged DataSketches Summary PR improved peak throughput by about 2x.

DataSketches KLL may be a useful way to improve this in client_java. The goal would be to reduce the cost of the observe path while keeping the external Summary behavior as close as practical.

This would not have to replace the current CKMS-based Summary immediately. DataSketches KLL has a different accuracy model, memory cost, and quantile visibility behavior, so an explicit opt-in path may be a better first step.
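To make the opt-in idea concrete, here is a minimal sketch of one possible seam. It assumes a small estimator interface that the existing CKMS-based code and a KLL-backed alternative could both implement. The names `QuantileEstimator`, `ExactQuantileEstimator`, and `QuantileEstimators` are hypothetical and do not exist in client_java. The exact estimator is only a stand-in reference implementation, not a proposal for production use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical abstraction, not part of client_java today.
// A CKMS-backed and a KLL-backed estimator could both implement it.
interface QuantileEstimator {
    void observe(double value);

    // rank is in [0, 1]; returns NaN when nothing has been observed.
    double quantile(double rank);
}

// Stand-in reference implementation: keeps every observation and sorts on read.
// Useful as a correctness baseline in tests, far too costly for a hot path.
final class ExactQuantileEstimator implements QuantileEstimator {
    private final List<Double> values = new ArrayList<>();

    @Override
    public synchronized void observe(double value) {
        values.add(value);
    }

    @Override
    public synchronized double quantile(double rank) {
        if (rank < 0.0 || rank > 1.0) {
            throw new IllegalArgumentException("rank must be in [0, 1]: " + rank);
        }
        if (values.isEmpty()) {
            return Double.NaN;
        }
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        // Nearest-rank definition: smallest value with at least rank * n items at or below it.
        int index = (int) Math.ceil(rank * sorted.size()) - 1;
        return sorted.get(Math.max(0, index));
    }
}

// Hypothetical opt-in switch: callers name the estimator they want,
// so the default behavior stays unchanged unless they ask for something else.
final class QuantileEstimators {
    private QuantileEstimators() {}

    static QuantileEstimator create(String kind) {
        switch (kind) {
            case "exact":
                return new ExactQuantileEstimator();
            default:
                throw new IllegalArgumentException("unknown estimator: " + kind);
        }
    }
}
```

With a seam like this, a DataSketches-backed estimator could live in a separate artifact and be checked against the exact baseline for accuracy, while benchmarks compare the cost of `observe` between implementations.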

Initial questions:

  • Does using DataSketches for Summary quantiles sound like a direction worth exploring?
  • If so, would a separate opt-in artifact be a reasonable way to introduce it?
  • What behavior details and benchmark data would be most useful before going further?
