chore(audit): audit json expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 #4470
Open
andygrove wants to merge 1 commit into
…, 4.1.1

Add per-version audit sub-bullets to `get_json_object` in `docs/source/contributor-guide/spark_expressions_support.md`. Spark 3.4.3 and 3.5.8 use a `BinaryExpression with CodegenFallback` with inline Jackson-based eval; Spark 4.0 extracts the eval into a `GetJsonObjectEvaluator` helper and widens `inputTypes` to `StringTypeWithCollation` (`DefaultStringProducingExpression` trait added). 4.1 is identical to 4.0.

Apply the one support-level consistency fix surfaced by the audit:

- `CometGetJsonObject`: extract the duplicate single-quote / control-character incompatibility reason into a shared `private val` so the doc generator and the EXPLAIN dispatcher cannot drift.

No new tracking issues filed. The known incompatibilities (single-quoted JSON, unescaped control characters) are already declared via `getSupportLevel` and `getIncompatibleReasons`. Spark 4.0 collation propagation is covered by the umbrella apache#2190.
This was referenced May 27, 2026
Which issue does this PR close?
Closes #.
Rationale for this change
Continuation of the per-category expression audit. Same pattern as #4469 (struct), #4461 (string), and earlier audits, using the updated `audit-comet-expression` skill in #4468 (now also covers Spark 4.1.1).
What changes are included in this PR?
Support-doc audit notes
Add per-version audit sub-bullets to `get_json_object` in `docs/source/contributor-guide/spark_expressions_support.md`. Spark 3.4.3 and 3.5.8 use a `BinaryExpression with CodegenFallback` with inline Jackson-based eval. Spark 4.0 extracts the eval into a `GetJsonObjectEvaluator` helper, mixes in `DefaultStringProducingExpression`, and widens `inputTypes` to `StringTypeWithCollation(supportsTrimCollation = true)`. Spark 4.1.1 is identical to 4.0.
Support-level consistency fix (in `strings.scala`)
`CometGetJsonObject`: extract the duplicate single-quote / control-character incompatibility reason into a shared `private val` so the doc generator and the EXPLAIN dispatcher cannot drift.
Tracking issues filed for follow-up
None. The known incompatibilities (single-quoted JSON, unescaped control characters) are already declared via `getSupportLevel` and `getIncompatibleReasons`. Non-default Spark 4.0 string collations are covered by the umbrella #2190 (referenced from the support-doc sub-bullet).
Audit process
Audited directly using the `audit-comet-expression` skill (4 Spark versions). One backing serde, so no parallel subagents were needed.
How are these changes tested?
`./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite string/get_json_object" -Dtest=none` (2 tests pass; existing `get_json_object.sql` already covers single-character, nested-field, wildcard, deep-nested, unicode, emoji, mixed-script, escaped-quote, and dictionary-encoded inputs).
`make core` succeeds with the serde change.
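For reviewers who want a picture of the support-level consistency fix in `CometGetJsonObject`, the idea can be sketched roughly as below. This is a minimal, self-contained illustration and not the code from `strings.scala`: the `SupportLevel` types are simplified stand-ins, the two methods are reduced to a no-argument form, and the name `incompatReason` and its wording are invented for the example.

```scala
// Simplified stand-ins for Comet's support-level types (assumed shape, for illustration only).
sealed trait SupportLevel
case class Compatible(notes: Option[String] = None) extends SupportLevel
case class Incompatible(notes: Option[String] = None) extends SupportLevel

object GetJsonObjectSerdeSketch {
  // Single source of truth for the incompatibility text.
  // The wording here is illustrative, not the string used in the PR.
  private val incompatReason: String =
    "Single-quoted JSON and unescaped control characters may not match Spark's results"

  // In this sketch, the path that feeds EXPLAIN output reads the shared value.
  def getSupportLevel: SupportLevel = Incompatible(Some(incompatReason))

  // In this sketch, the path that feeds the generated docs reads the same value.
  def getIncompatibleReasons: Seq[String] = Seq(incompatReason)
}
```

Because both methods read the same `private val`, editing the reason in one place updates both consumers, which is the drift the fix is meant to rule out.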