Skip to content

[Bug] CometCaseConversionBase gates compat inside convert() instead of getSupportLevel #4467

@andygrove

Description

@andygrove

Describe the bug

CometCaseConversionBase (parent of CometUpper and CometLower) gates compatibility inside convert(...):

override def convert(expr: T, inputs: Seq[Attribute], binding: Boolean): Option[Expr] = {
  if (!CometConf.COMET_CASE_CONVERSION_ENABLED.get()) {
    withInfo(expr, "Comet is not compatible with Spark for case conversion in " +
      s"locale-specific cases. Set ${CometConf.COMET_CASE_CONVERSION_ENABLED.key}=true " +
      "to enable it anyway.")
    return None
  }
  super.convert(expr, inputs, binding)
}

It never overrides getSupportLevel, so the class-level support level defaults to Compatible(None) even though getIncompatibleReasons() is populated. The Step 5 consistency rule in the audit-comet-expression skill calls this out specifically as the canonical antipattern to avoid. Surfaced by the string-expressions audit in #4461.

Concrete consequences:

  1. The dispatcher's allowIncompatible machinery is bypassed. A user who flips the standard spark.comet.expr.allowIncompatible=true (or the per-expression spark.comet.expression.Upper.allowIncompatible) still has lower/upper fall back, because the gate is the older bespoke spark.comet.caseConversion.enabled.
  2. EXPLAIN and the auto-generated compatibility docs disagree. EXPLAIN shows the bespoke message above; the compat doc lists the populated getIncompatibleReasons() text instead.
  3. initcap uses the standard Incompatible(Some(...)) flow; lower/upper use the bespoke flow. The asymmetry is confusing for users who have just opted into all string incompatibilities.

Expected behavior

Override getSupportLevel to return Incompatible(Some(reason)) from a shared private val, drop the in-convert conf check, and either deprecate spark.comet.caseConversion.enabled in favor of the standard per-expression allowIncompatible flag or have the support level read the standard flag.

Additional context

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions