apache · andygrove · May 27, 2026 · May 27, 2026
diff --git a/.claude/skills/audit-comet-expression/SKILL.md b/.claude/skills/audit-comet-expression/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: audit-comet-expression
-description: Audit an existing Comet expression for correctness and test coverage. Studies the Spark implementation across versions 3.4.3, 3.5.8, and 4.0.1, reviews the Comet and DataFusion implementations, identifies missing test coverage, and offers to implement additional tests.
+description: Audit an existing Comet expression for correctness and test coverage. Studies the Spark implementation across versions 3.4.3, 3.5.8, 4.0.1, and 4.1.1, reviews the Comet and DataFusion implementations, identifies missing test coverage, and offers to implement additional tests.
 argument-hint: <expression-name>
 ---
 
@@ -10,7 +10,7 @@ Audit the Comet implementation of the `$ARGUMENTS` expression for correctness an
 
 This audit covers:
 
-1. Spark implementation across versions 3.4.3, 3.5.8, and 4.0.1
+1. Spark implementation across versions 3.4.3, 3.5.8, 4.0.1, and 4.1.1
 2. Comet Scala serde implementation
 3. Comet Rust / DataFusion implementation
 4. Existing test coverage (Comet SQL Tests and Comet Scala Tests)
@@ -24,7 +24,7 @@ Clone specific Spark version tags (use shallow clones to avoid polluting the wor
 
 ```bash
 set -eu -o pipefail
-for tag in v3.4.3 v3.5.8 v4.0.1; do
+for tag in v3.4.3 v3.5.8 v4.0.1 v4.1.1; do
   dir="/tmp/spark-${tag}"
   if [ ! -d "$dir" ]; then
     git clone --depth 1 --branch "$tag" https://github.com/apache/spark.git "$dir"
@@ -37,7 +37,7 @@ done
 Search the Catalyst SQL expressions source:
 
 ```bash
-for tag in v3.4.3 v3.5.8 v4.0.1; do
+for tag in v3.4.3 v3.5.8 v4.0.1 v4.1.1; do
   dir="/tmp/spark-${tag}"
   echo "=== $tag ==="
   find "$dir/sql/catalyst/src/main/scala" -name "*.scala" | \
@@ -48,7 +48,7 @@ done
 If the expression is not found in catalyst, also check core:
 
 ```bash
-for tag in v3.4.3 v3.5.8 v4.0.1; do
+for tag in v3.4.3 v3.5.8 v4.0.1 v4.1.1; do
   dir="/tmp/spark-${tag}"
   echo "=== $tag ==="
   find "$dir/sql" -name "*.scala" | \
@@ -73,6 +73,7 @@ Produce a concise diff summary of what changed between:
 
 - 3.4.3 → 3.5.8
 - 3.5.8 → 4.0.1
+- 4.0.1 → 4.1.1
 
 Pay attention to:
 
@@ -87,7 +88,7 @@ Pay attention to:
 ## Step 2: Locate the Spark Tests
 
 ```bash
-for tag in v3.4.3 v3.5.8 v4.0.1; do
+for tag in v3.4.3 v3.5.8 v4.0.1 v4.1.1; do
   dir="/tmp/spark-${tag}"
   echo "=== $tag ==="
   find "$dir/sql" -name "*.scala" -path "*/test/*" | \
@@ -411,6 +412,11 @@ For an untested case, run it manually first to determine the current
 behaviour, then commit either a regression test (passes) or a
 `query ignore(<issue-url>)` test (fails).
 
+Every item in this bucket either becomes an inline fix + test, or a
+filed GitHub issue + ignored regression test, in the audit PR. Step 7
+spells out the workflow: never leave a high-priority finding as PR-body
+prose only.
+
 ### Medium priority: missing test coverage
 
 Low-risk coverage gaps: additional input permutations on already-tested
@@ -429,9 +435,13 @@ etc. These come from the Step 5 consistency audit.
 
 High-priority findings (correctness divergences and high-risk coverage
 gaps) and consistency issues from Step 5 / Step 6 must not be left as
-prose. Apply them in the same PR as the audit. Only low-risk missing
-coverage requires the user's go-ahead, because adding tests for cases
-that already work on well-exercised paths is incremental polish.
+prose. Apply them in the same PR as the audit. Anything you cannot fix
+inline (because it needs a semantics decision, native code change, or
+larger design work) must still be captured as a GitHub issue per the
+"Findings that need follow-up" section below: prose recommendations in
+the PR body alone are insufficient. Only low-risk missing coverage
+requires the user's go-ahead, because adding tests for cases that
+already work on well-exercised paths is incremental polish.
 
 ### High-priority findings: capture as tests
 
@@ -502,13 +512,53 @@ need user approval. The classes of fix are:
 
 Each fix is one of these patterns. If a finding requires a semantics
 decision (e.g. is a specific branch really `Unsupported`, or is it
-`Incompatible`?), do not guess: leave it as a prose recommendation in
-the PR description and call it out for the reviewer.
+`Incompatible`?), do not guess: **file a GitHub issue per the
+"Findings that need follow-up" section below** and link it from the PR
+description. Do not leave the recommendation as prose only: prose in a
+PR description gets buried as soon as the PR merges.
 
 After every fix, build the affected module to make sure the edit
 compiles. Do not run the full suite; targeted tests suffice if the
 fix could plausibly affect behaviour.
 
+### Findings that need follow-up: always file a tracking issue
+
+Any high-priority finding (correctness divergence, robustness gap,
+behavioural difference from Spark, missing-collation guard, etc.) that
+this PR does not fix inline **must** be filed as a GitHub issue before
+the PR is opened. This includes:
+
+- Semantics decisions the audit surfaces but should not unilaterally
+  resolve (e.g. promote `Compatible` to `Incompatible`, change a
+  default).
+- Architectural concerns that span multiple expressions (e.g. Spark 4.0
+  collation propagation across an entire family).
+- Bugs that are fixable in principle but need more design or native
+  changes than fit in the audit PR.
+- Documentation gaps surfaced by the audit (e.g. an expression that
+  doesn't appear in the auto-generated compatibility doc).
+
+For each follow-up:
+
+1. Search for an existing issue first
+   (`gh issue list --search "<expression> <symptom> in:title,body" --state all --limit 5`).
+   If one exists, link it from the PR description and the support-doc
+   sub-bullet, and stop.
+2. If no issue exists, file one with `gh issue create` using the
+   `correctness` label (or `documentation` for doc-only gaps) plus any
+   relevant area labels (e.g. `spark 4.0`). Title format:
+   `[Bug] <expression> <one-line symptom>`. Body includes: Spark version
+   range affected, a minimal repro, the divergent result, the relevant
+   Comet file/line, and a one-line note that the issue was surfaced by
+   this audit PR.
+3. Reference the issue number from both the support-doc sub-bullet and
+   the PR description "What changes are included" section so reviewers
+   can see what work the audit intentionally deferred.
+
+Prose-only "future work" notes in the PR description are not enough.
+The whole point of the audit is to leave behind durable artefacts; an
+unfiled finding evaporates after the PR merges.
+
 ### Low-risk missing test coverage: ask the user
 
 This is the only Step 7 category that pauses for user input. It only
@@ -586,7 +636,7 @@ entry in `docs/source/contributor-guide/spark_expressions_support.md`.
 
 Add one sub-bullet per Spark version checked, each including:
 
-- Spark version (e.g. 3.4.3, 3.5.8, 4.0.1)
+- Spark version (e.g. 3.4.3, 3.5.8, 4.0.1, 4.1.1)
 - Today's date
 - A brief note for any version-specific finding (behavioral difference, known incompatibility); omit if nothing notable
 
@@ -597,7 +647,7 @@ Add one sub-bullet per Spark version checked, each including:
 Present the audit as:
 
 1. **Expression Summary** - Brief description of what `$ARGUMENTS` does, its input/output types, and null behavior
-2. **Spark Version Differences** - Summary of any behavioral or API differences across Spark 3.4.3, 3.5.8, and 4.0.1
+2. **Spark Version Differences** - Summary of any behavioral or API differences across Spark 3.4.3, 3.5.8, 4.0.1, and 4.1.1
 3. **Comet Implementation Notes** - Summary of how Comet implements this expression and any concerns
 4. **Coverage Gap Analysis** - The gap table from Step 5, plus implementation gaps
 5. **Recommendations** - Prioritized list from Step 6