Describe the bug
replace(str, search, replacement) diverges from Spark when search is the empty literal string. Spark returns str unchanged (short-circuit on search.numBytes == 0 in StringReplace.eval). Comet delegates to DataFusion's replace, which inserts replacement between every character and at both boundaries.
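The two behaviors described above can be modeled in a few lines. This is only an illustrative sketch, not Spark's or DataFusion's actual code: the function names `spark_replace` and `datafusion_style_replace` are hypothetical, and Python's built-in `str.replace` stands in for DataFusion's `replace` because it treats an empty search string the same way this report describes.

```python
def spark_replace(s: str, search: str, replacement: str) -> str:
    """Model of the Spark behavior in this report: an empty search is a no-op."""
    # Mirrors the short-circuit on a zero-byte search string.
    if len(search.encode("utf-8")) == 0:
        return s
    return s.replace(search, replacement)


def datafusion_style_replace(s: str, search: str, replacement: str) -> str:
    """Stand-in for the Comet/DataFusion behavior (assumption: same as str.replace)."""
    # With an empty search, str.replace inserts the replacement between
    # every character and at both ends of the input.
    return s.replace(search, replacement)


if __name__ == "__main__":
    print(spark_replace("hello", "", "x"))             # hello
    print(datafusion_style_replace("hello", "", "x"))  # xhxexlxlxox
```

For any non-empty search string the two functions agree, so the divergence is confined to the empty-search case.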
How to Reproduce
SELECT replace('hello', '', 'x');
| Engine | Result |
| --- | --- |
| Spark | hello |
| Comet (DataFusion) | xhxexlxlxox |
Additional context
Surfaced by the string-expressions audit (#4461) follow-up. Issue #3344 covered the same divergence but the body had the expected/actual values swapped, leading to it being closed as already-fixed. This issue restates the divergence with the correct direction.
Workaround: the audit-comet-expression follow-up marks replace as Incompatible(Some(reason)) only when searchExpr is a literal empty string, so the dispatcher falls back to Spark for that specific case unless the user opts in with spark.comet.expression.StringReplace.allowIncompatible=true.
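The fallback rule in the workaround can be sketched as a small decision function. This is not Comet's implementation: `Literal`, `Column`, and `runs_natively` are hypothetical names chosen for illustration, and the config is modeled as a plain dict. Only the config key is taken from this issue.

```python
from dataclasses import dataclass
from typing import Optional

# Config key quoted from the workaround above.
ALLOW_KEY = "spark.comet.expression.StringReplace.allowIncompatible"


@dataclass
class Literal:
    """Hypothetical stand-in for a literal search expression."""
    value: Optional[str]


@dataclass
class Column:
    """Hypothetical stand-in for a non-literal search expression."""
    name: str


def runs_natively(search_expr, conf: dict) -> bool:
    """Return True if replace would run natively, False if it falls back to Spark."""
    is_empty_literal = isinstance(search_expr, Literal) and search_expr.value == ""
    if not is_empty_literal:
        # Only a literal empty search string is flagged as incompatible.
        return True
    # Flagged case: run natively only if the user explicitly opted in.
    return conf.get(ALLOW_KEY, "false").lower() == "true"
```

Under this model, `runs_natively(Literal(""), {})` is `False` (fall back to Spark), while the same call with the opt-in key set to `"true"` is `True`. A non-literal search expression is not flagged by this rule.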