Higher-order functions and async UDFs don't work together #22091

@gstvg

Description

Describe the bug

Async UDFs are extracted from their original position and evaluated in a dedicated execution plan node (AsyncFuncExec), where lambda variables referenced by the async UDF's arguments aren't available. For example, select array_transform([-2], v -> async_abs(v)) is planned into:

logical_plan
01)Projection: array_transform(List([-2]), (v) -> async_abs(v)) AS array_transform(make_array(Int64(-2)),(v) -> async_abs(v))
02)--EmptyRelation: rows=1
physical_plan
01)ProjectionExec: expr=[array_transform([-2], (v) -> __async_fn_0@0) as array_transform(make_array(Int64(-2)),(v) -> async_abs(v))]
02)--RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
03)----AsyncFuncExec: async_expr=[async_expr(name=__async_fn_0, expr=async_abs(v@0))]
04)------PlaceholderRowExec

The async_abs(v@0) expression inside AsyncFuncExpr references the lambda variable v@0, which isn't available there.
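To make the first failure concrete, here is a minimal Rust sketch of the extraction step, assuming a toy expression tree. The Expr enum and the extract_async, columns and leaked_lambda_vars functions are hypothetical and only loosely mimic what the plan above shows: each async call is replaced by a generated __async_fn_N column and moved elsewhere. Any lambda parameter the moved call still references is a value that no longer exists where the call is evaluated.

```rust
use std::collections::HashSet;

// Hypothetical toy expression tree for illustration only; this is NOT
// DataFusion's `Expr` type or its actual async UDF extraction code.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Col(String),
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
    // Call to an async scalar UDF, e.g. async_abs(v).
    AsyncCall(String, Vec<Expr>),
    // Lambda with parameter names and a body, e.g. (v) -> async_abs(v).
    Lambda(Vec<String>, Box<Expr>),
}

// Replace each async call with a reference to a generated `__async_fn_N`
// column and record the hoisted call (loosely mimicking AsyncFuncExec).
fn extract_async(expr: &Expr, out: &mut Vec<(String, Expr)>) -> Expr {
    match expr {
        Expr::AsyncCall(..) => {
            let name = format!("__async_fn_{}", out.len());
            out.push((name.clone(), expr.clone()));
            Expr::Col(name)
        }
        Expr::Add(l, r) => Expr::Add(
            Box::new(extract_async(l, out)),
            Box::new(extract_async(r, out)),
        ),
        Expr::Lambda(params, body) => {
            Expr::Lambda(params.clone(), Box::new(extract_async(body, out)))
        }
        other => other.clone(),
    }
}

// Collect every column name an expression references.
fn columns(expr: &Expr, acc: &mut HashSet<String>) {
    match expr {
        Expr::Col(c) => {
            acc.insert(c.clone());
        }
        Expr::Lit(_) => {}
        Expr::Add(l, r) => {
            columns(l, acc);
            columns(r, acc);
        }
        Expr::AsyncCall(_, args) => {
            for a in args {
                columns(a, acc);
            }
        }
        Expr::Lambda(_, body) => columns(body, acc),
    }
}

// For each async call hoisted out of `lambda`, list the lambda parameters
// it still references (values that do not exist in the node it moved to).
fn leaked_lambda_vars(lambda: &Expr) -> Vec<(String, Vec<String>)> {
    let params: HashSet<String> = match lambda {
        Expr::Lambda(ps, _) => ps.iter().cloned().collect(),
        _ => HashSet::new(),
    };
    let mut extracted = Vec::new();
    extract_async(lambda, &mut extracted);
    extracted
        .into_iter()
        .map(|(name, call)| {
            let mut cols = HashSet::new();
            columns(&call, &mut cols);
            let mut leaked: Vec<String> = cols.intersection(&params).cloned().collect();
            leaked.sort();
            (name, leaked)
        })
        .collect()
}

fn main() {
    // (v) -> async_abs(v)
    let uses_v = Expr::Lambda(
        vec!["v".into()],
        Box::new(Expr::AsyncCall("async_abs".into(), vec![Expr::Col("v".into())])),
    );
    // (v) -> v + async_abs(-2)
    let no_v = Expr::Lambda(
        vec!["v".into()],
        Box::new(Expr::Add(
            Box::new(Expr::Col("v".into())),
            Box::new(Expr::AsyncCall("async_abs".into(), vec![Expr::Lit(-2)])),
        )),
    );
    println!("{:?}", leaked_lambda_vars(&uses_v)); // [("__async_fn_0", ["v"])]
    println!("{:?}", leaked_lambda_vars(&no_v)); // [("__async_fn_0", [])]
}
```

For (v) -> async_abs(v) the hoisted call still needs v; for (v) -> v + async_abs(-2) it needs no lambda variable, so that query fails for a different reason.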

Combining a lambda variable with an async UDF that doesn't itself use lambda variables doesn't work either: select array_transform([1], v -> v + async_abs(-2))

logical_plan
01)Projection: array_transform(List([1]), (v) -> v + async_abs(Int64(-2))) AS array_transform(make_array(Int64(1)),(v) -> v + async_abs(Int64(-2)))
02)--EmptyRelation: rows=1
physical_plan
01)ProjectionExec: expr=[array_transform([1], (v) -> v@0 + __async_fn_0@0) as array_transform(make_array(Int64(1)),(v) -> v + async_abs(Int64(-2)))]
02)--RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
03)----AsyncFuncExec: async_expr=[async_expr(name=__async_fn_0, expr=async_abs(-2))]
04)------PlaceholderRowExec

Because the new __async_fn_0 column isn't present in the schema during physical planning, the lambda variable v and __async_fn_0 are both assigned index 0 (v@0 + __async_fn_0@0), which causes an error during execution (I believe this is similar to #18149).
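A second toy sketch shows how the collision in v@0 + __async_fn_0@0 can arise. It assumes that lambda parameters are resolved against the lambda's own schema while other columns are resolved against AsyncFuncExec's output; the names bind and colliding_indices are hypothetical and not DataFusion's planner API. Each schema has a single column, so both names land on index 0.

```rust
use std::collections::HashMap;

// Hypothetical toy model of column binding; not DataFusion code.
// Resolve each name to its position in `lambda_schema` if it is a lambda
// parameter, otherwise to its position in `input_schema` (assumed here to
// be the output of the AsyncFuncExec node that computes __async_fn_0).
fn bind(names: &[&str], lambda_schema: &[&str], input_schema: &[&str]) -> Vec<(String, usize)> {
    names
        .iter()
        .map(|n| {
            let idx = lambda_schema
                .iter()
                .position(|c| c == n)
                .or_else(|| input_schema.iter().position(|c| c == n))
                .expect("unknown column");
            (n.to_string(), idx)
        })
        .collect()
}

// Return the indices that two or more different names were bound to.
// In this toy model the lambda body is evaluated against one batch, so a
// shared index means the two references are ambiguous.
fn colliding_indices(bound: &[(String, usize)]) -> Vec<usize> {
    let mut by_index: HashMap<usize, Vec<&str>> = HashMap::new();
    for (name, idx) in bound {
        let names = by_index.entry(*idx).or_default();
        if !names.contains(&name.as_str()) {
            names.push(name.as_str());
        }
    }
    let mut out: Vec<usize> = by_index
        .into_iter()
        .filter(|(_, names)| names.len() > 1)
        .map(|(idx, _)| idx)
        .collect();
    out.sort();
    out
}

fn main() {
    // Body of (v) -> v + __async_fn_0 after the async call was hoisted.
    let body = ["v", "__async_fn_0"];
    // The lambda schema holds only its parameter; AsyncFuncExec over
    // PlaceholderRowExec outputs only the generated column.
    let bound = bind(&body, &["v"], &["__async_fn_0"]);
    println!("{bound:?}"); // [("v", 0), ("__async_fn_0", 0)]
    println!("colliding indices: {:?}", colliding_indices(&bound)); // [0]
}
```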

To Reproduce

Run either of these queries (async_udf stands for any async scalar UDF):

SELECT array_transform([1], v -> async_udf(v))
SELECT array_transform([1], v -> v + async_abs(-2))

Expected behavior

Async UDFs should work inside lambda bodies, whether or not their arguments reference lambda variables.

Additional context

Async UDFs PR #14837
Higher-order functions PR #21679

Metadata

Assignees: No one assigned

Labels: bug (Something isn't working)
