
[integration] Support ray DAG as a backend for lineapy pipeline #665

@fishbone

LineaPy is really impressive: it covers the flow end to end. I noticed the to_pipeline feature, which makes it very convenient to productionize the DF.

Ray is a distributed computation framework that can serve as a backend for scaling up Python workloads (https://docs.ray.io/en/master/).

Ray is currently introducing Ray DAG, a Ray-style way to define a computation graph: the data passed between functions are object refs rather than the actual Python data. A graph defined this way can be executed either in normal Ray mode or in workflow mode. The former simply runs the pipeline in a distributed way; the latter also checkpoints each step, which offers durability and fault tolerance.

I'm wondering whether Ray DAG could be added as a backend for LineaPy. One benefit of this integration is that workloads could be scaled up easily by using Ray's functionality.

I haven't dug deep into the implementation, but it seems to_pipeline is implemented as a plugin, and the actual work is converting the internally defined Graph into code for the target backend. I expect the implementation to be straightforward.
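
To make the idea concrete, here is a rough sketch of what such a conversion could look like. It does not use LineaPy's real Graph or plugin interface: `Node`, `topo_order`, and `to_ray_dag_source` are hypothetical names made up for illustration, and the sketch assumes the graph has a single final step. The generated text relies only on `ray.remote`, `.bind()`, `.execute()`, and `ray.get()` from Ray.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for one step of a pipeline graph (not LineaPy's Graph)."""
    name: str          # used as both the function name and the argument name downstream
    body: str          # a Python expression that computes this step's result
    deps: tuple = ()   # names of the upstream steps this step consumes


def topo_order(nodes):
    """Order nodes so that every node comes after all of its dependencies."""
    by_name = {n.name: n for n in nodes}
    ordered, seen = [], set()

    def visit(node):
        if node.name in seen:
            return
        seen.add(node.name)
        for dep in node.deps:
            visit(by_name[dep])
        ordered.append(node)

    for n in nodes:
        visit(n)
    return ordered


def to_ray_dag_source(nodes):
    """Emit Python source that wires the steps together as a Ray DAG."""
    ordered = topo_order(nodes)
    lines = ["import ray", ""]
    # One remote function per step; its parameters are named after its upstream steps.
    for n in ordered:
        lines += [
            "@ray.remote",
            f"def {n.name}({', '.join(n.deps)}):",
            f"    return {n.body}",
            "",
        ]
    # bind() builds the graph lazily; nothing runs until execute() is called.
    for n in ordered:
        args = ", ".join(f"{d}_node" for d in n.deps)
        lines.append(f"{n.name}_node = {n.name}.bind({args})")
    # Assumption: the last step in topological order is the only output.
    sink = ordered[-1].name
    lines.append(f"result = ray.get({sink}_node.execute())")
    return "\n".join(lines)


graph = [
    Node("load", "[1, 2, 3]"),
    Node("double", "[x * 2 for x in load]", deps=("load",)),
    Node("total", "sum(double)", deps=("double",)),
]
source = to_ray_dag_source(graph)
print(source)
```

The printed script is what a Ray backend would write out for this toy graph. A real plugin would presumably start from LineaPy's own graph representation and reuse whatever code generation the existing backends share; running the same DAG in workflow mode would only change the final line of the generated script.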
