Describe the bug
decode is handled in Comet via CommonStringExprs.stringDecode called from versionSpecificExprToProtoInternal in the per-Spark-version CometExprShim. There is no CometExpressionSerde[StringDecode] (or equivalent) registered in QueryPlanSerde.scala's stringExpressions map.
Consequence: decode never appears in the auto-generated docs/source/user-guide/compatibility.md, under either the "incompatible" or the "unsupported" reasons. Users have no docs-side way to discover what Comet supports for decode (only a literal 'utf-8' charset, matched case-insensitively) or what it does not (every other charset, and all four-arg Spark 4.0 cases described in #4465).
Surfaced by the string-expressions audit in #4461.
Expected behavior
Register a CometExpressionSerde[StringDecode] (or a per-version equivalent that the shims dispatch to) so the doc generator picks the expression up and includes both the supported charset restriction and the legacy-flag gap as getUnsupportedReasons() entries.
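As a rough illustration of the shape this could take, here is a standalone sketch. It does not use Comet's real types: `SerdeSketch`, `DecodeCall`, and `DecodeSerdeSketch` are hypothetical stand-ins, and the only detail borrowed from this issue is the `getUnsupportedReasons()` hook and the restrictions listed above. The actual CometExpressionSerde trait will differ in its signatures.

```scala
// Standalone sketch: none of these types are Comet's real API.
// DecodeCall is a hypothetical stand-in for Spark's StringDecode expression.
// literalCharset is None when the charset argument is not a literal.
final case class DecodeCall(literalCharset: Option[String], fourArgForm: Boolean)

// Hypothetical stand-in for a serde trait that exposes a documentation hook.
trait SerdeSketch[T] {
  def getUnsupportedReasons(): Seq[String]
  def isSupported(expr: T): Boolean
}

object DecodeSerdeSketch extends SerdeSketch[DecodeCall] {
  // Reasons a doc generator could list for decode (wording is illustrative).
  override def getUnsupportedReasons(): Seq[String] = Seq(
    "Only a literal 'utf-8' charset (case-insensitive) is supported",
    "Four-argument Spark 4.0 forms are not supported"
  )

  // Mirrors the restrictions described in this issue.
  override def isSupported(expr: DecodeCall): Boolean =
    !expr.fourArgForm && expr.literalCharset.exists(_.equalsIgnoreCase("utf-8"))
}
```

The point of the sketch is that the restrictions live on an object the generator can enumerate, rather than inside a helper that only the shims call.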
Additional context
- Shim: spark/src/main/spark-{3.4,3.5,4.0,...}/org/apache/comet/shims/CometExprShim.scala
- Helper: CommonStringExprs.stringDecode in spark/src/main/scala/org/apache/comet/serde/strings.scala
- Doc generator: GenerateDocs reads get*Reasons() from registered CometExpressionSerde instances; helpers called from shims are invisible to it.
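The visibility problem can be modeled in a few lines. This is a simplified sketch, not GenerateDocs itself: `DocumentedSerde`, `DocGenSketch`, `render`, and `stringDecodeHelper` are hypothetical names, and the real generator's traversal and output format are assumed rather than known.

```scala
// Standalone model; these names are hypothetical, not Comet's real classes.
trait DocumentedSerde {
  def name: String
  def getUnsupportedReasons(): Seq[String]
}

object DocGenSketch {
  // A helper invoked directly by a shim: it carries the charset rule,
  // but it is not an entry in any registry, so nothing can enumerate it.
  def stringDecodeHelper(charset: String): Boolean =
    charset.equalsIgnoreCase("utf-8")

  // The generator only walks the serdes it is handed.
  def render(registry: Seq[DocumentedSerde]): Seq[String] =
    for {
      serde  <- registry
      reason <- serde.getUnsupportedReasons()
    } yield s"${serde.name}: $reason"
}
```

Under this model, decode produces output only once an object for it is added to the sequence passed to `render`; the helper's logic alone never reaches the generated page.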