Unified SQL/PPL breaking changes analysis#12
Draft
dai-chen wants to merge 18 commits into
Draft
Conversation
Add SQL support to the unified query API using Calcite's native parser pipeline (SqlParser → SqlValidator → SqlToRelConverter → RelNode), bypassing the ANTLR parser used by PPL. Changes: - UnifiedQueryPlanner: use PlanningStrategy to dispatch SQLStrategy vs PPLStrategy with no conditionals in plan() - SQLStrategy: Calcite Planner with try-with-resources for SQL - UnifiedQueryContext: SqlParser.Config with Casing.UNCHANGED to preserve lowercase OpenSearch index names Signed-off-by: Chen Dai <daichen@amazon.com>
…guage support - Refactor UnifiedQueryTestBase with queryType() hook for subclass override - Add UnifiedSqlQueryPlannerTest covering SELECT, WHERE, GROUP BY, JOIN, ORDER BY, subquery, case sensitivity, namespaces, and error handling - Update UnifiedQueryContextTest to verify SQL context creation Signed-off-by: Chen Dai <daichen@amazon.com>
Add language (PPL/SQL) and queryPattern param dimensions for side-by-side comparison of equivalent queries across both languages. Remove separate UnifiedSqlQueryBenchmark in favor of unified class. Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
…ugh unified pipeline
…hrough unified pipeline
- Fix schema caching: share single HashMap across getTableMap() calls to avoid
redundant REST _mapping fetches and race conditions under concurrency
- Unwrap exception cause chain in GapResult.failure() to report root cause
(e.g. 'Unknown function: query') instead of wrapper ('Failed to plan query')
…data - Fix gap analyzer to unescape JSON-encoded test queries before passing to unified pipeline (test queries are pre-escaped for REST buildRequest() JSON template but unified path bypasses the JSON round-trip) - PPL: 54.7% -> 97.7% success (37 false positives eliminated, 2 real failures) - SQL report rewritten: covers V2 + legacy tests (807 queries), corrected categories (removed misleading CalciteRexNodeVisitor references, fixed ISNULL categorization, added legacy test coverage including JOIN/SHOW/DESCRIBE)
…overage - Fix build.gradle to forward unified.gap.analysis system property to test JVM - PPL: 1593 queries (all PPL IT tests), 98.3% success, 27 failures - SQL: 807 queries (V2 + legacy), 0.4% success via Calcite native SQL parser - Add approach section explaining shadow mode interception - Correct SQL categories for Calcite ANSI SQL path (hyphenated names, backticks, missing functions) vs previous ANTLR-based path - Note on PPL vs SQL relevance function behavior difference
2cd1e28 to
dbde012
Compare
- Add quoteHyphenatedTableNames() to wrap names like opensearch-sql_test_index_* in double quotes so Calcite's parser treats them as identifiers - SQL: 807 queries, 575 success (71.3%), up from 3 success (0.4%) - Remaining 232 failures: missing functions (86), EXPR_TIMESTAMP (21), comma-join syntax (21), string-as-identifier (16), backtick (8), etc.
dbde012 to
d1dafb6
Compare
- Add RESULT_MISMATCH phase to detect queries that succeed in both V2 and Calcite but return different data - Compare V2 JDBC datarows against Calcite ResultSet with lenient normalization (float rounding, order-independent for non-ORDER BY) - Pass V2 response through SQLIntegTestCase and PPLIntegTestCase replay - New report: 548/807 success (67.9%), 27 queries reclassified from success to different-result failures across 6 sub-categories: extra metadata columns, JOIN row explosion, column truncation, numeric precision, geo-point format, LIKE escape behavior
PPL result comparison found 146 queries returning different results: - Numeric precision truncation (68): V2 truncates floats to ints - Cast/type conversion (12): same truncation in cast results - Stats eval GROUP BY NULL (12): Calcite loses group-by values - Geo-point format (8): JSON vs WKT serialization - Double field formatting (8): V2 strips trailing .00 - Trendline SMA (5): first-row NULL vs value behavior - top/rare extra column (5): Calcite includes count column - rename/eval column order (5): in-place vs append behavior - dayname/monthname locale (4): English vs JVM default locale - STDDEV_SAMP NULL vs 0 (4): single-value group handling - Other (15): regexp type, str_to_date, settings, etc. Overall: 1420/1593 success (89.1%), down from 98.3%
d1dafb6 to
e40e02a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
[Describe what this change achieves]
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.