Is your feature request related to a problem or challenge?
Is your feature request related to a problem or challenge?
DataFusion's lag and lead window functions only accept a scalar value as the third (default) argument. The implementation in datafusion/functions-window/src/lead_lag.rs extracts the default
at planning time via get_scalar_value_from_args(input_exprs, 2) and stores it as a ScalarValue on the WindowUDFFieldArgs / partition evaluator state.
This means queries like:
SELECT
b,
LAG(a, 1, c) OVER (ORDER BY b) AS lg,
LEAD(a, 1, c) OVER (ORDER BY b) AS ld
FROM t
…where the default expression is a column reference (or any non-literal expression), cannot be planned natively in DataFusion. Spark accepts an arbitrary Expression here — the value is meant to be
evaluated per row when the offset row does not exist — and downstream projects building Spark compatibility on DataFusion (e.g. Apache DataFusion Comet) currently have to fall back to Spark for this
pattern.
Allow the third argument of lag / lead to be any expression, evaluated per row in the partition evaluator:
- In parse_default_value / partition-evaluator construction, accept an arbitrary PhysicalExpr for the default rather than coercing it to a ScalarValue.
- In evaluate_all / evaluate_all_with_ignore_null (and the shift_with_default_value path), when the offset row does not exist, take the default from the per-row evaluation of the default-expression
column instead of cloning a single ScalarValue.
- Preserve the fast path: when the default expression is a literal (the common case today), continue to materialize it once as a scalar to avoid overhead.
- Update the field/return-type derivation in field so the default-expression's data type still drives result-type unification (the existing NULL_FIELD fallback already covers the Literal(NULL)
default).
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response
Is your feature request related to a problem or challenge?
Is your feature request related to a problem or challenge?
DataFusion's
lagandleadwindow functions only accept a scalar value as the third (default) argument. The implementation indatafusion/functions-window/src/lead_lag.rsextracts the defaultat planning time via
get_scalar_value_from_args(input_exprs, 2)and stores it as aScalarValueon theWindowUDFFieldArgs/ partition evaluator state.This means queries like:
…where the default expression is a column reference (or any non-literal expression), cannot be planned natively in DataFusion. Spark accepts an arbitrary Expression here — the value is meant to be
evaluated per row when the offset row does not exist — and downstream projects building Spark compatibility on DataFusion (e.g. Apache DataFusion Comet) currently have to fall back to Spark for this
pattern.
Allow the third argument of lag / lead to be any expression, evaluated per row in the partition evaluator:
column instead of cloning a single ScalarValue.
default).
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response