Limited control for offloading, suggestion to improve #5

@a-roberts

Hi, I think we should have thresholds for offloading, both to prevent exceptions and to avoid using the GPU when it's not ideal. We could rely on user code for this, but let's make it easy for our users.

If my input data is too big for the current partitioning strategy, I currently get an exception and need to shut down my cluster and set up a partitioning strategy specific to how much memory is available on a GPU. I may also have many GPUs that all have different amounts of memory.

If my use case involves Spark streaming, where we don't know how much data we're going to get, I am not convinced the current solution is robust enough to handle different data sizes.

I propose we add a new system property that lets users set the threshold, alongside the property we already have for controlling the offloading.

Because users can have different GPUs in their cluster, what do we think about having key-value pairs, perhaps in a new file under the conf directory, e.g.:

K40m: 100, 100000
K80m: 50, 200000

The format here is:
GPU model name: minimum offload amount, maximum offload amount.
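
To make the proposed format concrete, here is a minimal sketch of a parser for it. It is written in Python purely for illustration; the names `OffloadRange` and `parse_thresholds` are hypothetical, and the validation rules (non-negative integers, minimum not above maximum, blank lines ignored) are assumptions rather than anything decided in this issue.

```python
from typing import Dict, NamedTuple


class OffloadRange(NamedTuple):
    """Hypothetical holder for one GPU model's offload bounds."""
    minimum: int  # smallest partition size worth sending to the GPU
    maximum: int  # largest partition size that should be sent to the GPU


def parse_thresholds(text: str) -> Dict[str, OffloadRange]:
    """Parse lines of the form 'MODEL: MIN, MAX' into a per-model table."""
    thresholds: Dict[str, OffloadRange] = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are allowed and ignored
        model, sep, amounts = line.partition(":")
        parts = [p.strip() for p in amounts.split(",")]
        if not sep or not model.strip() or len(parts) != 2:
            raise ValueError(
                f"line {line_no}: expected 'MODEL: MIN, MAX', got {raw!r}"
            )
        minimum, maximum = int(parts[0]), int(parts[1])
        if minimum < 0 or minimum > maximum:
            raise ValueError(f"line {line_no}: need 0 <= MIN <= MAX")
        thresholds[model.strip()] = OffloadRange(minimum, maximum)
    return thresholds


if __name__ == "__main__":
    example = "K40m: 100, 100000\nK80m: 50, 200000\n"
    for name, limits in parse_thresholds(example).items():
        print(name, limits.minimum, limits.maximum)
```

Failing loudly on a malformed line at startup seems preferable to silently skipping it, since a typo in the file would otherwise bring back the always-offload behaviour without any warning.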

We can parse this file once at startup and store the values. Our code would then check whether the current system property is set and, if so, check how big the partition is before loading our library and calling our CUDA code.
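
The check described above could look roughly like the sketch below, again in Python for illustration only. The function name `should_offload`, the `offload_enabled` flag standing in for the existing system property, and the choice to keep today's always-offload behaviour for GPU models missing from the file are all assumptions. The issue does not say what unit partition size is measured in, so the sketch treats it as an opaque integer.

```python
from typing import Dict, Tuple


def should_offload(
    partition_size: int,
    gpu_model: str,
    thresholds: Dict[str, Tuple[int, int]],
    offload_enabled: bool,
) -> bool:
    """Decide whether a partition should be sent to the GPU.

    thresholds maps a GPU model name to (minimum, maximum) offload amounts,
    as parsed once at startup from the proposed conf file.
    """
    if not offload_enabled:
        return False  # offloading switched off: always stay on the CPU
    limits = thresholds.get(gpu_model)
    if limits is None:
        # Assumption: no entry for this model means no thresholds apply,
        # so fall back to the current behaviour of always offloading.
        return True
    minimum, maximum = limits
    # Offload only when the partition is big enough to be worth it
    # and small enough to fit within the configured limit.
    return minimum <= partition_size <= maximum


if __name__ == "__main__":
    table = {"K40m": (100, 100000), "K80m": (50, 200000)}
    for size in (10, 5000, 150000):
        print(size, should_offload(size, "K40m", table, True))
```

Running this check before the native library is loaded means a partition that falls outside the range never touches the GPU code path at all.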

The offload amounts refer to how big the partitions are. So users can say "well actually I know I'll only have a few very large partitions so let's set this property and only send work to our GPU when it's of the right size".

If we don't do this, we're going to hit OOMs for partitions too big to fit on the GPU, or use the GPU when the problem size is too small. We know the size of the data is important for GPU processing, as evidenced by our IBM Java class library work.

Currently we're always going to try to use the GPU, so I think we should add a fallback to give users control of the offloading.
