Describe the bug
Async UDFs are pulled out of their original position and evaluated in a dedicated execution plan (AsyncFuncExec), where lambda variables referenced by the async UDF's arguments aren't available. For example, select array_transform([-2], v -> async_abs(v)) is planned into:
logical_plan
01)Projection: array_transform(List([-2]), (v) -> async_abs(v)) AS array_transform(make_array(Int64(-2)),(v) -> async_abs(v))
02)--EmptyRelation: rows=1
physical_plan
01)ProjectionExec: expr=[array_transform([-2], (v) -> __async_fn_0@0) as array_transform(make_array(Int64(-2)),(v) -> async_abs(v))]
02)--RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
03)----AsyncFuncExec: async_expr=[async_expr(name=__async_fn_0, expr=async_abs(v@0))]
04)------PlaceholderRowExec
The async_abs(v@0) expression evaluated by AsyncFuncExec contains a lambda variable, v@0, which isn't available there.
Combining a lambda variable with an async UDF that doesn't itself reference any lambda variable doesn't work either. select array_transform([1], v -> v + async_abs(-2)) is planned into:
logical_plan
01)Projection: array_transform(List([1]), (v) -> v + async_abs(Int64(-2))) AS array_transform(make_array(Int64(1)),(v) -> v + async_abs(Int64(-2)))
02)--EmptyRelation: rows=1
physical_plan
01)ProjectionExec: expr=[array_transform([1], (v) -> v@0 + __async_fn_0@0) as array_transform(make_array(Int64(1)),(v) -> v + async_abs(Int64(-2)))]
02)--RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
03)----AsyncFuncExec: async_expr=[async_expr(name=__async_fn_0, expr=async_abs(-2))]
04)------PlaceholderRowExec
Because the new __async_fn_0 column isn't present in the schema during physical planning, the lambda variable gets the same index as that column (v@0 + __async_fn_0@0), which produces an error during execution (I believe this is similar to #18149).
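The index collision can be illustrated with a small toy model. This is only a sketch under an assumption: that a lambda variable is given the next free index after the columns of the schema visible at planning time. The helpers `index_of` and `lambda_var_index` are hypothetical and are not DataFusion APIs.

```rust
// Toy model only: these helpers are hypothetical and are not DataFusion APIs.

/// Resolve a column name to its index in a flat list of field names.
fn index_of(schema: &[&str], name: &str) -> Option<usize> {
    schema.iter().position(|f| *f == name)
}

/// Assumed rule: a lambda variable takes the first index after the
/// columns of the schema that is visible when the lambda is planned.
fn lambda_var_index(visible_schema: &[&str]) -> usize {
    visible_schema.len()
}

fn main() {
    // Schema visible while the projection is planned: the placeholder row
    // exposes no columns and __async_fn_0 has not been added yet.
    let planning_schema: Vec<&str> = vec![];
    let v_index = lambda_var_index(&planning_schema);

    // Schema actually produced by the async node at execution time.
    let exec_schema = vec!["__async_fn_0"];
    let async_index = index_of(&exec_schema, "__async_fn_0").unwrap();

    // Both references resolve to the same slot.
    println!("v@{v_index} + __async_fn_0@{async_index}");
    assert_eq!(v_index, async_index);

    // If the async column were visible during planning, the indices
    // would no longer collide under the assumed rule.
    assert_ne!(lambda_var_index(&exec_schema), async_index);
}
```

Under this assumption, planning the lambda against a schema that already includes __async_fn_0 would give the lambda variable a distinct index.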
To Reproduce
SELECT array_transform([1], v -> async_udf(v)) or select array_transform([1], v -> v + async_abs(-2))
Expected behavior
Async UDFs should work inside lambdas whether or not they reference lambda variables.
Additional context
Async UDFs PR #14837
Higher-order function PR #21679