Skip to content

Support ColumnExpr in LAG/LEAD for default parameter #22082

@comphead

Description

@comphead

Is your feature request related to a problem or challenge?

Is your feature request related to a problem or challenge?

DataFusion's lag and lead window functions only accept a scalar value as the third (default) argument. The implementation in datafusion/functions-window/src/lead_lag.rs extracts the default
at planning time via get_scalar_value_from_args(input_exprs, 2) and stores it as a ScalarValue on the WindowUDFFieldArgs / partition evaluator state.

This means queries like:

SELECT
  b,
  LAG(a, 1, c)  OVER (ORDER BY b) AS lg,
  LEAD(a, 1, c) OVER (ORDER BY b) AS ld                                                                                                                                                                  
FROM t     

…where the default expression is a column reference (or any non-literal expression), cannot be planned natively in DataFusion. Spark accepts an arbitrary Expression here — the value is meant to be
evaluated per row when the offset row does not exist — and downstream projects building Spark compatibility on DataFusion (e.g. Apache DataFusion Comet) currently have to fall back to Spark for this
pattern.

Allow the third argument of lag / lead to be any expression, evaluated per row in the partition evaluator:

  1. In parse_default_value / partition-evaluator construction, accept an arbitrary PhysicalExpr for the default rather than coercing it to a ScalarValue.
  2. In evaluate_all / evaluate_all_with_ignore_null (and the shift_with_default_value path), when the offset row does not exist, take the default from the per-row evaluation of the default-expression
    column instead of cloning a single ScalarValue.
  3. Preserve the fast path: when the default expression is a literal (the common case today), continue to materialize it once as a scalar to avoid overhead.
  4. Update the field/return-type derivation in field so the default-expression's data type still drives result-type unification (the existing NULL_FIELD fallback already covers the Literal(NULL)
    default).

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request
No fields configured for Feature.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions