[Doc] decode does not appear in auto-generated compatibility docs #4466

@andygrove

Describe the bug

Comet handles decode through CommonStringExprs.stringDecode, which is called from versionSpecificExprToProtoInternal in the per-Spark-version CometExprShim. No CometExpressionSerde[StringDecode] (or equivalent) is registered in the stringExpressions map in QueryPlanSerde.scala.

Consequence: decode does not appear anywhere in the auto-generated docs/source/user-guide/compatibility.md, under either "incompatible" or "unsupported" reasons. Users cannot learn from the docs what Comet supports for decode (only a literal 'utf-8' charset, matched case-insensitively) or what it does not (every other charset, and all four-argument Spark 4.0 cases described in #4465).

Surfaced by the string-expressions audit in #4461.

Expected behavior

Register a CometExpressionSerde[StringDecode] (or a per-version equivalent that the shims dispatch to) so the doc generator picks the expression up and includes both the supported charset restriction and the legacy-flag gap as getUnsupportedReasons() entries.
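
A rough sketch of the shape this could take, using self-contained stand-in types rather than Comet's or Spark's real classes. `Expr`, `Literal`, `ColumnRef`, `StringDecode`, `ExpressionSerdeSketch`, and `isSupported` are all hypothetical names invented for illustration; only `getUnsupportedReasons()` is a name taken from this issue, and its real signature may differ.

```scala
// Stand-in expression types; these are NOT Spark's Catalyst classes.
sealed trait Expr
final case class Literal(value: String) extends Expr
final case class ColumnRef(name: String) extends Expr
final case class StringDecode(bin: Expr, charset: Expr) extends Expr

// Hypothetical stand-in for a serde trait that exposes documented reasons.
trait ExpressionSerdeSketch[T <: Expr] {
  def getUnsupportedReasons(): Seq[String]
  def isSupported(expr: T): Boolean
}

object StringDecodeSerdeSketch extends ExpressionSerdeSketch[StringDecode] {
  // Reasons a doc generator could pick up once the serde is registered.
  override def getUnsupportedReasons(): Seq[String] = Seq(
    "Only a literal 'utf-8' charset (case-insensitive) is supported",
    "Four-argument Spark 4.0 cases are not supported (see #4465)")

  // Accept only a literal charset equal to 'utf-8', ignoring case.
  override def isSupported(expr: StringDecode): Boolean = expr.charset match {
    case Literal(cs) => cs.equalsIgnoreCase("utf-8")
    case _ => false
  }
}
```

With this shape, the restriction that today lives only inside the shim helper would be stated once, next to the check that enforces it.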

Additional context

  • Shim: spark/src/main/spark-{3.4,3.5,4.0,...}/org/apache/comet/shims/CometExprShim.scala
  • Helper: CommonStringExprs.stringDecode in spark/src/main/scala/org/apache/comet/serde/strings.scala
  • Doc generator: GenerateDocs reads get*Reasons() from registered CometExpressionSerde instances; helpers called from shims are invisible to it.
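
To illustrate why the shim path is invisible, here is a toy model of a generator that only walks a registry map. It does not reproduce GenerateDocs or QueryPlanSerde.scala; `SerdeSketch`, `DocGenSketch`, and the `"upper"` entry are hypothetical stand-ins.

```scala
// Hypothetical stand-in for a registered serde that can report reasons.
trait SerdeSketch {
  def getUnsupportedReasons(): Seq[String]
}

object DocGenSketch {
  // Toy registry: expressions that have a registered serde.
  val registered: Map[String, SerdeSketch] = Map(
    "upper" -> new SerdeSketch {
      def getUnsupportedReasons(): Seq[String] = Seq.empty
    })

  // Stand-in for the shim path: handled at runtime, but absent from the registry.
  def shimHandles(exprName: String): Boolean = exprName == "decode"

  // The generator sees only registry keys, never shim-only helpers.
  def documentedExpressions(registry: Map[String, SerdeSketch]): Set[String] =
    registry.keySet

  // Registering a serde for decode is what makes it visible to the generator.
  val withDecodeRegistered: Map[String, SerdeSketch] =
    registered + ("decode" -> new SerdeSketch {
      def getUnsupportedReasons(): Seq[String] =
        Seq("Only a literal 'utf-8' charset (case-insensitive) is supported")
    })
}
```

In this model, `decode` is handled by the shim but is missing from `documentedExpressions(registered)`. It appears only once it is added to the registry.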
