Skip to content

Add Granular Metrics to RepartitionExec #21148

@gene-bordegaray

Description

@gene-bordegaray

Is your feature request related to a problem or challenge?

There has been notice that RepartitionExec is quite expensive in certain queries / scenarios recently:

It has been difficult to investigate / isolate the reason for this due to lack of granularity of metrics provided in the RepartitionExec operator. As of now we are only provided:

  • send_time: time spent pulling the next batch from input stream (mixed spill, channel send, etc.)
  • repartition_time: big bucket for repartition work (mixed routing and rebuilding batches from routed indices)
  • fetch_time: per output partition, covered the whole public batch path

Describe the solution you'd like

I would like to introduce more granular metrics that will isolate where repartition is spending its time:

  • fetch_time: unchanged
  • repartition_time: now the end-to-end total repartition time
  • route_time: the time to distribute row indices to output partitions
  • batch_build_time: the time to build the record batches
  • channel_wait_time: per output partition, the time waiting for channel capacity / send(...) to complete
  • spill_write_time: per output partition, the time writing spilled batches
  • spill_read_wait_time: per output partition, time the consumer side waits for a spilled batch to become readable

Describe alternatives you've considered

I have considered other metrics but want to leave hot-path / overhead as small as possible for collection while still gaining good insight into the operator

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions