
[Bug] repeat throws on negative count where Spark returns empty string #4462

@andygrove


Describe the bug

CometStringRepeat delegates to DataFusion's repeat, which throws on a negative n, whereas Spark's UTF8String.repeat returns the empty string for n <= 0. Comet currently reports this expression as Compatible (with a getCompatibleNotes caveat), so a query that calls repeat(s, -1) fails with a runtime exception under Comet instead of returning the empty string that Spark would produce.

Surfaced by the string-expressions audit in #4461.

Steps to reproduce

SELECT repeat('abc', -1);

Spark: returns ''.
Comet: throws ArrowError("Invalid argument error: repeat requires a non-negative number of repetitions") at execution.
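
The divergence can be modelled in a few lines of plain Rust. This is only a sketch of the two behaviors as described in this issue, not the actual Spark or DataFusion code; the function names spark_style_repeat and strict_repeat are hypothetical, and the error text is copied from the message above.

```rust
/// Models the Spark behavior described in this issue:
/// any count <= 0 yields the empty string.
/// (Hypothetical helper, not Spark's UTF8String implementation.)
fn spark_style_repeat(s: &str, n: i64) -> String {
    if n <= 0 {
        return String::new();
    }
    s.repeat(n as usize)
}

/// Models the strict behavior reported for the Comet path:
/// a negative count is rejected with an error.
/// (Hypothetical helper, not DataFusion's implementation.)
fn strict_repeat(s: &str, n: i64) -> Result<String, String> {
    if n < 0 {
        return Err("repeat requires a non-negative number of repetitions".to_string());
    }
    Ok(s.repeat(n as usize))
}
```

For non-negative counts the two helpers agree; they differ only when n < 0, which is the case this issue is about. Clamping the count to zero before calling the strict version would be one way to make the two match.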

Expected behavior

Either match Spark by returning '', or promote CometStringRepeat to Incompatible(Some(...)) so that the expression falls back to Spark unless it is explicitly enabled via spark.comet.expression.StringRepeat.allowIncompatible=true.
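
The second option amounts to a gating decision, sketched below in Rust purely as a model. Comet's real support levels are defined in Scala; the SupportLevel enum and the runs_natively function here are hypothetical stand-ins, and allow_incompatible stands for the value of the allowIncompatible setting named above.

```rust
/// Hypothetical stand-in for Comet's support levels
/// (the real definitions live in the Scala serde code).
#[derive(Debug, Clone, PartialEq)]
enum SupportLevel {
    Compatible,
    /// Optional note explaining the known divergence from Spark.
    Incompatible(Option<String>),
}

/// Returns true if the expression would take the Comet path,
/// false if it would fall back to Spark.
/// `allow_incompatible` models the per-expression allowIncompatible flag.
fn runs_natively(level: &SupportLevel, allow_incompatible: bool) -> bool {
    match level {
        // Compatible expressions are always accepted, which is why the
        // negative-count divergence is currently hit silently.
        SupportLevel::Compatible => true,
        // Incompatible expressions are accepted only on explicit opt-in.
        SupportLevel::Incompatible(_) => allow_incompatible,
    }
}
```

Under this model, marking the expression Incompatible makes fallback the default, and only users who opt in are exposed to the exception on negative counts.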

Additional context

  • Comet serde: spark/src/main/scala/org/apache/comet/serde/strings.scala (CometStringRepeat)
  • Spark reference: UTF8String.repeat(n) short-circuits for n <= 0
  • The current getCompatibleNotes text mentions the divergence, but the support level is still Compatible, so the Comet path is taken silently.
