[SPARK-57053][SQL][TESTS] Expand time-window.sql SQL test coverage #56098

Open: vladimirg-db wants to merge 1 commit into apache:master from vladimirg-db:vladimirg-db/time-window-tests-import

### What changes were proposed in this pull request?

Adds ~70 new positive and negative test cases to `time-window.sql`,
covering gaps in the existing tumbling/sliding/session window coverage:

- Tumbling and sliding window edge cases: NTZ time column, subsecond
  precision, null timestamps, negative/zero/non-literal/unparseable
  duration, slide >= duration, startTime > slide, ROLLUP/CUBE/GROUPING
  SETS variants for sliding windows.
- `window_time()` function: basic event-time extraction, NTZ, wrong
  argument type/arity, and struct-input rejection paths.
- `session_window` parity with tumbling/sliding coverage: basic
  aggregation, conditional gap duration (from the docs example),
  column-valued gap duration, NTZ, null timestamps, subsecond,
  negative/zero gap, wrong arg count, multiple-window error, and
  acceptance/rejection of `session_window` in WHERE / QUALIFY / SELECT *.
- Nested and stacked windows: nested window in GROUP BY, nested literal
  windows in SELECT/GROUP BY ALL, nested window via subquery, stacked
  sibling windows.
- Window placement: in simple projection, in SELECT with GROUP BY ALL,
  in select+group by combinations, in EXISTS subquery, in PIVOT
  aggregate.
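
For reviewers less familiar with the duration / slide / startTime
parameters exercised above, here is a minimal pure-Python sketch of how
a timestamp maps to tumbling or sliding windows. It is a model for
illustration only, not Spark's implementation: the name
`assign_windows` is hypothetical, naive `datetime` values stand in for
NTZ timestamps, and the three `ValueError` checks are assumptions about
which parameter combinations get rejected (the goldens are the
authority on what Spark actually does).

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def assign_windows(ts, duration, slide=None, start_time=timedelta(0)):
    """Return the [start, end) windows that contain ts, earliest first."""
    # A tumbling window is modelled as a sliding window with slide == duration.
    slide = duration if slide is None else slide

    # Assumed validation rules for this sketch (not taken from Spark's code).
    if duration <= timedelta(0) or slide <= timedelta(0):
        raise ValueError("duration and slide must be positive")
    if slide > duration:
        raise ValueError("slide must not exceed duration")
    if abs(start_time) >= slide:
        raise ValueError("start_time must be smaller than slide")

    # Latest window start at or before ts, aligned to start_time + k * slide.
    offset = (ts - EPOCH - start_time) % slide
    start = ts - offset

    # Walk backwards one slide at a time while the window still covers ts.
    windows = []
    while start + duration > ts:
        windows.append((start, start + duration))
        start -= slide
    return list(reversed(windows))
```

With a 10-minute duration and a 5-minute slide, a row at 00:07:30 falls
into two windows, `[00:00, 00:10)` and `[00:05, 00:15)`. Omitting the
slide yields the single tumbling window `[00:00, 00:10)`.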

Most query texts are imported verbatim from the Reyden
`time-window-tests` golden file. One Reyden-only case
(`window_overlap_exceeds_max`, an internal cap that Spark does not
enforce) is skipped because Spark accepts the query and generates an
extremely large `Expand` plan. Two cases whose Reyden names imply
rejection but which Spark accepts are renamed to drop the `_rejected`
suffix.
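
To give a rough sense of why the skipped case produces such a large
`Expand` plan: to my understanding, a sliding window fans each input row
out into roughly `ceil(duration / slide)` candidate windows. The sketch
below only does that arithmetic; the helper name is hypothetical and
the example durations are illustrative, not the values used by
`window_overlap_exceeds_max`.

```python
from datetime import timedelta


def candidate_windows_per_row(duration, slide):
    """Ceiling of duration / slide, computed exactly on timedeltas."""
    if duration <= timedelta(0) or slide <= timedelta(0):
        raise ValueError("duration and slide must be positive")
    # timedelta // timedelta yields an int; negating twice turns floor into ceil.
    return -(-duration // slide)


# Illustrative values only.
print(candidate_windows_per_row(timedelta(minutes=10), timedelta(minutes=5)))
print(candidate_windows_per_row(timedelta(hours=1), timedelta(seconds=1)))
```

The first call gives 2 and the second gives 3600, so a very small slide
relative to the duration multiplies the per-row fan-out accordingly.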

### Why are the changes needed?

`time-window.sql` previously covered only ~23 cases, all GROUP BY
variants of `window()` / `session_window()`. Resolver work on the
single-pass analyzer surfaced multiple uncovered code paths
(`window_time()` extraction, NTZ inputs, struct-input handling, session
window with conditional/column gap). This PR expands the suite so
future analyzer changes have a broader regression surface.
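
As background for the `session_window` and `window_time()` paths
mentioned above, here is a small pure-Python model of the semantics as
I understand them from the Spark docs: each event opens a
`[ts, ts + gap)` session, overlapping sessions merge, and
`window_time` is the window end minus one microsecond. The function
names mirror the SQL functions but are hypothetical Python helpers. A
fixed gap is assumed (no conditional or column-valued gap), and the
rejection of non-positive gaps is a choice made for this sketch, not a
statement about Spark's behavior.

```python
from datetime import datetime, timedelta


def session_windows(timestamps, gap):
    """Merge events into [start, end) sessions using a fixed gap."""
    if gap <= timedelta(0):
        # Sketch-only choice; Spark's handling is what the new tests pin down.
        raise ValueError("gap must be positive")
    sessions = []
    for ts in sorted(timestamps):
        # An event strictly inside the current session extends it. An event
        # exactly at the session end starts a new session in this sketch.
        if sessions and ts < sessions[-1][1]:
            start, end = sessions[-1]
            sessions[-1] = (start, max(end, ts + gap))
        else:
            sessions.append((ts, ts + gap))
    return sessions


def window_time(window):
    """Event time of a window: its exclusive end minus one microsecond."""
    _, end = window
    return end - timedelta(microseconds=1)
```

For events at 00:00, 00:03 and 00:20 with a 5-minute gap, the model
yields the sessions `[00:00, 00:08)` and `[00:20, 00:25)`, and
`window_time` of the first session is 00:07:59.999999.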

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests (`time-window.sql` + `time-window.sql_analyzer_test`).
Goldens regenerated with `SPARK_GENERATE_GOLDEN_FILES=1`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-7)