Skip to content

Unified SQL/PPL breaking changes analysis#12

Draft
dai-chen wants to merge 18 commits into
feature/unified-calcite-sql-supportfrom
poc/gap-analysis-on-formal
Draft

Unified SQL/PPL breaking changes analysis#12
dai-chen wants to merge 18 commits into
feature/unified-calcite-sql-supportfrom
poc/gap-analysis-on-formal

Conversation

@dai-chen
Copy link
Copy Markdown
Owner

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

dai-chen added 13 commits March 24, 2026 13:46
Add SQL support to the unified query API using Calcite's native parser
pipeline (SqlParser → SqlValidator → SqlToRelConverter → RelNode),
bypassing the ANTLR parser used by PPL.

Changes:
- UnifiedQueryPlanner: use PlanningStrategy to dispatch SQLStrategy vs
  PPLStrategy with no conditionals in plan()
- SQLStrategy: Calcite Planner with try-with-resources for SQL
- UnifiedQueryContext: SqlParser.Config with Casing.UNCHANGED to
  preserve lowercase OpenSearch index names

Signed-off-by: Chen Dai <daichen@amazon.com>
…guage support

- Refactor UnifiedQueryTestBase with queryType() hook for subclass override
- Add UnifiedSqlQueryPlannerTest covering SELECT, WHERE, GROUP BY, JOIN,
  ORDER BY, subquery, case sensitivity, namespaces, and error handling
- Update UnifiedQueryContextTest to verify SQL context creation

Signed-off-by: Chen Dai <daichen@amazon.com>
Add language (PPL/SQL) and queryPattern param dimensions for
side-by-side comparison of equivalent queries across both languages.
Remove separate UnifiedSqlQueryBenchmark in favor of unified class.

Signed-off-by: Chen Dai <daichen@amazon.com>
- Fix schema caching: share single HashMap across getTableMap() calls to avoid
  redundant REST _mapping fetches and race conditions under concurrency
- Unwrap exception cause chain in GapResult.failure() to report root cause
  (e.g. 'Unknown function: query') instead of wrapper ('Failed to plan query')
…data

- Fix gap analyzer to unescape JSON-encoded test queries before passing to
  unified pipeline (test queries are pre-escaped for REST buildRequest() JSON
  template but unified path bypasses the JSON round-trip)
- PPL: 54.7% -> 97.7% success (37 false positives eliminated, 2 real failures)
- SQL report rewritten: covers V2 + legacy tests (807 queries), corrected
  categories (removed misleading CalciteRexNodeVisitor references, fixed ISNULL
  categorization, added legacy test coverage including JOIN/SHOW/DESCRIBE)
…overage

- Fix build.gradle to forward unified.gap.analysis system property to test JVM
- PPL: 1593 queries (all PPL IT tests), 98.3% success, 27 failures
- SQL: 807 queries (V2 + legacy), 0.4% success via Calcite native SQL parser
- Add approach section explaining shadow mode interception
- Correct SQL categories for Calcite ANSI SQL path (hyphenated names, backticks,
  missing functions) vs previous ANTLR-based path
- Note on PPL vs SQL relevance function behavior difference
@dai-chen dai-chen force-pushed the feature/unified-calcite-sql-support branch from 2cd1e28 to dbde012 Compare March 25, 2026 00:56
- Add quoteHyphenatedTableNames() to wrap names like opensearch-sql_test_index_*
  in double quotes so Calcite's parser treats them as identifiers
- SQL: 807 queries, 575 success (71.3%), up from 3 success (0.4%)
- Remaining 232 failures: missing functions (86), EXPR_TIMESTAMP (21),
  comma-join syntax (21), string-as-identifier (16), backtick (8), etc.
@dai-chen dai-chen force-pushed the feature/unified-calcite-sql-support branch from dbde012 to d1dafb6 Compare March 25, 2026 19:02
- Add RESULT_MISMATCH phase to detect queries that succeed in both V2
  and Calcite but return different data
- Compare V2 JDBC datarows against Calcite ResultSet with lenient
  normalization (float rounding, order-independent for non-ORDER BY)
- Pass V2 response through SQLIntegTestCase and PPLIntegTestCase replay
- New report: 548/807 success (67.9%), 27 queries reclassified from
  success to different-result failures across 6 sub-categories:
  extra metadata columns, JOIN row explosion, column truncation,
  numeric precision, geo-point format, LIKE escape behavior
PPL result comparison found 146 queries returning different results:
- Numeric precision truncation (68): V2 truncates floats to ints
- Cast/type conversion (12): same truncation in cast results
- Stats eval GROUP BY NULL (12): Calcite loses group-by values
- Geo-point format (8): JSON vs WKT serialization
- Double field formatting (8): V2 strips trailing .00
- Trendline SMA (5): first-row NULL vs value behavior
- top/rare extra column (5): Calcite includes count column
- rename/eval column order (5): in-place vs append behavior
- dayname/monthname locale (4): English vs JVM default locale
- STDDEV_SAMP NULL vs 0 (4): single-value group handling
- Other (15): regexp type, str_to_date, settings, etc.

Overall: 1420/1593 success (89.1%), down from 98.3%
@dai-chen dai-chen force-pushed the feature/unified-calcite-sql-support branch from d1dafb6 to e40e02a Compare March 25, 2026 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant