Skip to content

docs: clarify support-level and reason consistency in audit-comet-expression skill#4447

Merged
andygrove merged 4 commits into
apache:mainfrom
andygrove:audit-skill-support-level
May 27, 2026
Merged

docs: clarify support-level and reason consistency in audit-comet-expression skill#4447
andygrove merged 4 commits into
apache:mainfrom
andygrove:audit-skill-support-level

Conversation

@andygrove
Copy link
Copy Markdown
Member

@andygrove andygrove commented May 27, 2026

Which issue does this PR close?

Closes #.

Rationale for this change

The audit-comet-expression skill produced inconsistent guidance on when to use Unsupported vs Incompatible, and how getSupportLevel, getIncompatibleReasons, getUnsupportedReasons, and convert need to stay aligned. As a result, audits flagged some real issues but missed others, and the recommended fixes were vague.

While exercising the updated skill on a multi-expression audit (#4448), two further gaps in the workflow became obvious: correctness divergences need to be captured as regression tests in the same PR, and mechanical consistency fixes should be applied automatically rather than gating on user approval. Both changes are folded in here so the skill is consistent end-to-end before the wider audit lands.

What changes are included in this PR?

Updates to .claude/skills/audit-comet-expression/SKILL.md:

Step 3 gains a dedicated "Audit getSupportLevel, getIncompatibleReasons, getUnsupportedReasons, and convert" subsection covering:

  • Decision rule for Unsupported vs Incompatible (wrong answer vs fallback/error).
  • Runtime vs docs split: support-level notes flow into EXPLAIN via the dispatcher in QueryPlanSerde.exprToProtoInternal; get*Reasons() is read only by GenerateDocs.
  • Six consistency rules (cover every branch, no dead reasons, share via private val, prefer Some(reason) over None, gate decisions in getSupportLevel rather than convert).
  • Wording guidelines for reason strings (user-observable effect first, sentence case, backtick configs/types, link tracking issues).
  • Real antipattern callouts naming specific serdes.

Step 5 gains a structured 9-item support-level consistency audit checklist that produces findings against the four methods.

Step 6 reorganises the priorities into three named buckets: correctness divergences, missing coverage, and consistency issues.

Step 7 is restructured so the audit produces concrete output without pausing for user approval on mechanical steps:

  • Correctness findings are captured as SQL file tests in the same PR. Walks through the "search for existing issue, file if missing" workflow and the trivial-fix-vs-query ignore(<url>) decision rule.
  • Support-level consistency fixes are applied automatically. The classes of fix (extract private val, switch Incompatible(None) to Some(reason), add missing get*Reasons, move a check from convert into getSupportLevel, hoist a shared reason into a private companion) are all mechanical and do not need user approval.
  • Only the "missing test coverage" bucket still pauses for user input, because adding tests for cases that already work is incremental polish.

How are these changes tested?

This change is documentation for a Claude skill and has no code or test impact. The Step 6/7 changes were validated by applying the skill to a 38-function audit (#4448), which exercised every branch (correctness findings captured as query ignore(...) tests, consistency fixes applied as mechanical edits across 7 serdes, doc audit sub-bullets added).

andygrove added 2 commits May 27, 2026 07:44
…ression skill

Expand the audit skill so it produces consistent, accurate findings about
`getSupportLevel`, `getIncompatibleReasons`, `getUnsupportedReasons`, and
`convert`.

Step 3 gains a dedicated subsection covering:

- A decision rule for `Unsupported` vs `Incompatible` (wrong answer vs
  fallback/error).
- The runtime-vs-docs split: support-level `notes` flow into EXPLAIN via
  the dispatcher, while `get*Reasons()` is read only by `GenerateDocs`.
- Six consistency rules covering every branch, dead-reason detection,
  shared `private val` for reason strings, `Incompatible(Some(reason))`
  preference, and gating decisions in `getSupportLevel` rather than
  `convert`.
- Wording guidelines for reason strings (user-observable effect first,
  sentence case, backtick configs/types, tracking issue links).
- Real antipattern callouts naming specific serdes (`CometHour`,
  `CometMinute`, `CometSecond`, `CometInitCap`).

Step 5 gains a structured 9-item support-level consistency audit
checklist that produces findings against the four methods.

Step 7 splits into two asks: one for missing tests, one for consistency
fixes, so the user can opt in to either independently.
…omatically

Tightens Step 6 / Step 7 of the audit-comet-expression skill so the
workflow produces concrete output without pausing for user approval on
mechanical steps:

- Step 6 reorganises the priorities into three named buckets:
  correctness divergences, missing coverage, and consistency issues.
- Step 7 requires correctness findings to be captured as SQL file
  tests in the same PR as the audit. Walks through the
  "search for existing issue, file if missing" workflow and the
  trivial-fix-vs-`query ignore(<url>)` decision rule, with a concrete
  example.
- Step 7 also applies every Step 5 consistency finding automatically.
  These are mechanical edits (extract `private val`, switch
  `Incompatible(None)` to `Some(reason)`, add missing `get*Reasons`,
  move a check from `convert` into `getSupportLevel`, hoist a shared
  reason into a private companion) and do not need user approval.
- Only "missing test coverage" still pauses for user input, because
  adding tests for cases that already work is incremental polish.

Surfaced while applying the skill to a multi-expression audit in
apache#4448; both behaviours felt obviously right when the audit ran on a
larger surface and several recurring patterns made the asking step
feel like friction rather than safety.
etc.). Each one becomes a captured test in Step 7.

### Medium priority
### Medium priority: missing test coverage
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this prob should be a high priority? as missing tests can lead to correctness divergence

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — split the bucket in 5e9ccc3. High priority now covers correctness divergences and untested cases on a known-fragile axis (overflow, NULL handling, timezones, ANSI, leap years, version-specific semantics). Those get captured as tests automatically in Step 7 (regression test if it passes today, query ignore(<url>) if it fails). Medium priority is reserved for low-risk gaps on well-exercised code paths.

Copy link
Copy Markdown
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andygrove

Address PR review feedback (apache#4447): missing tests can mask correctness
divergences. Split the Step 6 'missing test coverage' bucket — gaps on
known-fragile axes (overflow, NULL handling, timezones, ANSI mode, leap
years, version-specific semantics) are now High priority and captured
as tests automatically; only low-risk gaps on well-exercised paths stay
Medium priority and require user approval.

Step 7's 'Correctness divergences: capture as tests' is renamed to
'High-priority findings: capture as tests' and now covers both confirmed
divergences and high-risk untested cases (run the case manually first,
then commit either a regression test or a query ignore(<url>) test).
@andygrove andygrove merged commit a95a2d4 into apache:main May 27, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants