[Bug] replace returns wrong result for empty-string search #4497

@andygrove

Describe the bug

replace(str, search, replacement) diverges from Spark when search is an empty string literal. Spark returns str unchanged (StringReplace.eval short-circuits when search.numBytes == 0). Comet delegates to DataFusion's replace, which inserts replacement between every pair of characters and at both ends of the string.

How to Reproduce

SELECT replace('hello', '', 'x');

| Engine | Result |
|---|---|
| Spark | hello |
| Comet (DataFusion) | xhxexlxlxox |
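
For illustration, the two behaviors can be sketched in plain Rust. This is a minimal model, not Comet's or DataFusion's actual code: `naive_replace` and `spark_like_replace` are hypothetical names, and the DataFusion-style result is modeled with the standard library's `str::replace`, whose empty pattern matches at every character boundary.

```rust
/// Models the diverging behavior with Rust's `str::replace`:
/// an empty pattern matches before each character and at the end,
/// so the replacement is inserted at every boundary.
fn naive_replace(s: &str, search: &str, replacement: &str) -> String {
    s.replace(search, replacement)
}

/// Models the Spark behavior described above: an empty search string
/// short-circuits and the input is returned unchanged.
/// (Hypothetical helper, for illustration only.)
fn spark_like_replace(s: &str, search: &str, replacement: &str) -> String {
    if search.is_empty() {
        return s.to_string();
    }
    s.replace(search, replacement)
}
```

With these helpers, `naive_replace("hello", "", "x")` yields `xhxexlxlxox`, while `spark_like_replace("hello", "", "x")` yields `hello`. Both agree for any non-empty search string.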

Additional context

Surfaced by the string-expressions audit (#4461) follow-up. Issue #3344 covered the same divergence but the body had the expected/actual values swapped, leading to it being closed as already-fixed. This issue restates the divergence with the correct direction.

Workaround: the audit-comet-expression follow-up marks replace as Incompatible(Some(reason)) only when searchExpr is a literal empty string, so the dispatcher falls back to Spark for that specific case unless the user opts in with spark.comet.expression.StringReplace.allowIncompatible=true.
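
The fallback decision described in the workaround can be modeled roughly as follows. This is a sketch under assumptions: the `SupportLevel` enum, `replace_support_level`, and `runs_natively` are hypothetical stand-ins written in Rust for illustration, not Comet's real types or dispatcher, and the reason string is invented.

```rust
// Hypothetical model of the support-level check; not Comet's real API.
#[derive(Debug, PartialEq)]
enum SupportLevel {
    Compatible,
    Incompatible(Option<String>),
}

/// `search_literal` is `Some(value)` when the search argument is a string
/// literal and `None` when it is any other expression (assumed encoding).
fn replace_support_level(search_literal: Option<&str>) -> SupportLevel {
    match search_literal {
        // Only the literal empty string is flagged.
        Some("") => SupportLevel::Incompatible(Some(
            "empty search string differs from Spark".to_string(),
        )),
        _ => SupportLevel::Compatible,
    }
}

/// Incompatible expressions fall back to Spark unless the user opts in,
/// modeled here by the `allow_incompatible` flag.
fn runs_natively(level: &SupportLevel, allow_incompatible: bool) -> bool {
    match level {
        SupportLevel::Compatible => true,
        SupportLevel::Incompatible(_) => allow_incompatible,
    }
}
```

In this model, `allow_incompatible` plays the role of spark.comet.expression.StringReplace.allowIncompatible: with the default `false`, an empty literal search falls back to Spark, and every other call still runs natively.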
