[Bug] str_to_map does not honour Spark 4.1.1 legacy.truncateForEmptyRegexSplit #4477

@andygrove

Describe the bug

Spark 4.1.1 added a legacySplitTruncate flag to StringToMap (and several other split-based expressions) that is driven by spark.sql.legacy.truncateForEmptyRegexSplit. When the flag is enabled (legacy mode), CollationAwareUTF8String.splitSQL truncates trailing empty matches from the split result. Comet's native str_to_map does not honour this flag and always behaves as if it were false (the non-legacy default).

Surfaced by the map-expressions audit (collection PR queue).

Steps to reproduce

SET spark.sql.legacy.truncateForEmptyRegexSplit = true;
SELECT str_to_map('a:1,,b:2,', ',', ':');

With the legacy flag enabled, Spark 4.1.1 drops the trailing empty entry ("" -> NULL) produced by the final comma; the non-legacy default and Comet both keep it.
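
To make the difference concrete, here is a toy model in plain Python. It is not Spark's or Comet's implementation: it assumes literal delimiters (Spark treats them as regexes), assumes "truncate" means dropping trailing empty strings from the pair split as described above, and returns entries as a list so that duplicate-key handling stays out of the picture. The function names and the `legacy_truncate` parameter are hypothetical.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, Optional[str]]


def split_pairs(text: str, pair_delim: str, truncate_trailing_empty: bool) -> List[str]:
    """Split on a literal delimiter, optionally dropping trailing empty pieces."""
    parts = text.split(pair_delim)
    if truncate_trailing_empty:
        # Assumed legacy behaviour: strip empty strings from the end only.
        while parts and parts[-1] == "":
            parts.pop()
    return parts


def str_to_map_entries(
    text: str,
    pair_delim: str = ",",
    kv_delim: str = ":",
    legacy_truncate: bool = False,
) -> List[Entry]:
    """Model of str_to_map that returns (key, value) entries in order."""
    entries: List[Entry] = []
    for part in split_pairs(text, pair_delim, legacy_truncate):
        key, sep, value = part.partition(kv_delim)
        # A piece without the key/value delimiter maps its key to None (NULL).
        entries.append((key, value if sep else None))
    return entries


if __name__ == "__main__":
    print(str_to_map_entries("a:1,,b:2,"))
    print(str_to_map_entries("a:1,,b:2,", legacy_truncate=True))
```

In this model the two calls differ only in the final `("", None)` entry; the interior empty entry from `,,` is present either way.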

Expected behavior

Either propagate the flag through the proto so the native impl can branch on it, or downgrade CometStrToMap to Incompatible when spark.sql.legacy.truncateForEmptyRegexSplit=true on Spark 4.1.1+.
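
The second option (downgrading) amounts to a small decision rule. The sketch below models that rule in plain Python for illustration only; it is not Comet code, `str_to_map_support` and `parse_version` are hypothetical names, and it assumes plain numeric version strings and a default of `false` for the flag.

```python
from typing import Mapping, Tuple

LEGACY_KEY = "spark.sql.legacy.truncateForEmptyRegexSplit"


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse 'major.minor.patch' into a comparable tuple (numeric parts only)."""
    return tuple(int(part) for part in version.split(".")[:3])


def str_to_map_support(spark_version: str, conf: Mapping[str, str]) -> str:
    """Return the support level the issue proposes for str_to_map."""
    # Assumption: an unset flag means the non-legacy default (false).
    legacy = conf.get(LEGACY_KEY, "false").strip().lower() == "true"
    if legacy and parse_version(spark_version) >= (4, 1, 1):
        # The native path cannot reproduce legacy truncation, so fall back.
        return "Incompatible"
    return "Compatible"
```

Propagating the flag through the proto would avoid the fallback entirely, at the cost of a serde change and a native branch.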

Additional context

  • Comet wiring: QueryPlanSerde.scala -> classOf[StringToMap] -> CometStrToMap (a one-liner over CometScalarFunction("str_to_map")).
  • Spark 4.1.1 change: complexTypeCreator.scala:604-606:
    private lazy val legacySplitTruncate =
      SQLConf.get.getConf(SQLConf.LEGACY_TRUNCATE_FOR_EMPTY_REGEX_SPLIT)
  • The same flag also affects StringSplit (already-tracked behaviour difference there) and other split-based ops in 4.1.1.
