Skip to content

Sorting spitballing #321

@kwmsmith

Description

@kwmsmith

Some sorting ideas for consideration:

  • It is fairly reliable that communication overhead and latency are the dominant factors, so we want to minimize the total amount of communication.
  • In general, sorting requires an all-to-all communication step: every worker has to send and receive data to and from every other worker, so there is potentially a lot of communication.
  • We want to minimize the all-to-all communication as much as possible.
  • We also assume that sorting the local array is efficient and a solved problem.
  • If we can get all the right data to each worker, then sort the data locally with a local sort, then we're done. So the problem reduces to getting the right data to each worker.
  • If we allow the sorted array to have an irregular block distribution that does not match the distribution of the original array, then that gives a lot of flexibility.

Assume we have n workers that share a block-distributed distarray. Assume we have some way to choose n-1 pivots that partition the global array into n sections such that the number of elements in section i equals the number of elements on worker with rank i. Then the sort can proceed as follows:

  • Partition each worker's localarray into n sections using the `n-1

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions