Skip to content

fix(parser): support qualified names and nested functions in SQL#99

Open
cpsievert wants to merge 1 commit intoposit-dev:mainfrom
cpsievert:fix/parser-function-args
Open

fix(parser): support qualified names and nested functions in SQL#99
cpsievert wants to merge 1 commit intoposit-dev:mainfrom
cpsievert:fix/parser-function-args

Conversation

@cpsievert
Copy link
Contributor

Summary

Fixes parsing limitations that caused common SQL patterns to fail. The tree-sitter grammar's positional_arg rule was too restrictive, preventing:

  • Table alias prefixes: SELECT p.name, SUM(s.quantity) FROM sales s JOIN products p
  • Cross-table arithmetic: SUM(quantity * price) across joined tables
  • Nested function calls: ROUND(AVG(price), 2), COALESCE(NULLIF(TRIM(x), ''), 'default')
  • Full table qualifiers: products.product_name instead of just product_name

Root Cause

The positional_arg rule in grammar.js only accepted:

positional_arg: $ => choice(
  $.identifier,  // Simple column name only
  $.number,
  $.string,
  '*'
),

This meant function arguments like SUM(s.quantity) failed because s.quantity is a qualified_name, not a simple identifier. Similarly, ROUND(AVG(x), 2) failed because AVG(x) is a function_call, not one of the accepted types.

Investigation

Using tree-sitter parse, I traced the exact failure points:

# SUM(s.quantity) - the "s." caused an ERROR node
(function_call [0, 15] - [0, 30]
  (identifier [0, 15] - [0, 18])
  (ERROR [0, 19] - [0, 21])  ← "s." not parseable
  (function_args ...))

# ROUND(AVG(price), 2) - nested function not recognized
(function_call [0, 7] - [0, 16]
  (identifier [0, 7] - [0, 12])
  (function_args ...))
(subquery [0, 16] - [0, 23]  ← "(price)" parsed as subquery, not AVG()
  (subquery_body ...))

Solution

Extended positional_arg to support complex expressions:

positional_arg: $ => prec.left(choice(
  $.qualified_name,  // table.column references
  $.number,
  $.string,
  '*',
  $.function_call,   // nested functions
  // arithmetic/comparison expressions
  seq($.positional_arg, choice('+', '-', '*', '/', ...), $.positional_arg),
  seq('(', $.positional_arg, ')')  // parenthesized expressions
)),

Test plan

  • Added 6 regression tests covering all reported patterns
  • All 379 existing Rust tests pass
  • All 53 tree-sitter grammar tests pass
  • End-to-end execution verified with DuckDB

New tests added:

Test Pattern
test_table_alias_prefixes_in_select SELECT p.name, SUM(s.quantity)
test_cross_table_arithmetic SUM(quantity * price)
test_nested_function_calls ROUND(AVG(price), 2)
test_full_table_name_qualifiers products.product_name
test_simple_cross_table_multiplication a.x * b.y
test_deeply_nested_functions COALESCE(NULLIF(TRIM(name), ''), 'Unknown')

🤖 Generated with Claude Code

The tree-sitter grammar's `positional_arg` rule was too restrictive,
only accepting simple identifiers, numbers, strings, and wildcards.
This caused parse failures for common SQL patterns:

1. Table alias prefixes: `SELECT p.name, SUM(s.quantity)`
2. Cross-table arithmetic: `SUM(quantity * price)`
3. Nested function calls: `ROUND(AVG(price), 2)`
4. Full table qualifiers: `products.product_name`

The fix extends `positional_arg` to also accept:
- `qualified_name` for table.column references
- `function_call` for nested functions
- Binary arithmetic/comparison expressions
- Parenthesized expressions

Adds 6 regression tests covering all reported limitations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Collaborator

@georgestagg georgestagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, nice spot!

Would you mind tweaking where the tests live before merging?

Comment on lines +3137 to +3142

// ========================================
// Parser Limitation Tests (from ggsql-parser-limitations.md)
// Tests to verify and fix reported parsing issues
// ========================================

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this comment adds anything.

I don't mind comments naming groups of specific types of tests, but I'd like to avoid littering the code with comments referencing specific planning files (ggsql-parser-limitations.md), or general statements like "reported parsing issues".

Comment on lines +3143 to +3236
#[test]
fn test_table_alias_prefixes_in_select() {
// Issue 1: Table alias prefixes in SELECT clause
// Query like `SELECT p.product_name FROM products p` should parse
let query = r#"
SELECT p.product_name, SUM(s.quantity) as total
FROM sales s JOIN products p ON s.product_id = p.product_id
GROUP BY p.product_name
VISUALISE
DRAW bar MAPPING product_name AS x, total AS y
"#;

let result = parse_test_query(query);
assert!(result.is_ok(), "Parse failed: {:?}", result);
}

#[test]
fn test_cross_table_arithmetic() {
// Issue 2: Cross-table arithmetic expressions
// `quantity * price` across joined tables should work
let query = r#"
WITH t AS (
SELECT region, SUM(quantity * price) as revenue
FROM sales JOIN products ON sales.product_id = products.product_id
GROUP BY region
)
VISUALISE FROM t
DRAW bar MAPPING region AS x, revenue AS y
"#;

let result = parse_test_query(query);
assert!(result.is_ok(), "Parse failed: {:?}", result);
}

#[test]
fn test_nested_function_calls() {
// Issue 3: Nested function calls
// `ROUND(AVG(price), 2)` should parse
let query = r#"
SELECT category, ROUND(AVG(price), 2) as avg_price
FROM products
GROUP BY category
VISUALISE
DRAW bar MAPPING category AS x, avg_price AS y
"#;

let result = parse_test_query(query);
assert!(result.is_ok(), "Parse failed: {:?}", result);
}

#[test]
fn test_full_table_name_qualifiers() {
// Issue 4: Full table name qualifiers
// `products.product_name` (full table name, not alias) should work
let query = r#"
SELECT products.product_name, SUM(sales.quantity) as total
FROM sales
JOIN products ON sales.product_id = products.product_id
GROUP BY products.product_name
VISUALISE
DRAW bar MAPPING product_name AS x, total AS y
"#;

let result = parse_test_query(query);
assert!(result.is_ok(), "Parse failed: {:?}", result);
}

#[test]
fn test_simple_cross_table_multiplication() {
// Simplified version of cross-table arithmetic
let query = r#"
SELECT a.x * b.y as result
FROM a JOIN b ON a.id = b.id
VISUALISE
DRAW point MAPPING result AS x
"#;

let result = parse_test_query(query);
assert!(result.is_ok(), "Parse failed: {:?}", result);
}

#[test]
fn test_deeply_nested_functions() {
// Multiple levels of nesting
let query = r#"
SELECT COALESCE(NULLIF(TRIM(name), ''), 'Unknown') as clean_name
FROM data
VISUALISE
DRAW bar MAPPING clean_name AS x
"#;

let result = parse_test_query(query);
assert!(result.is_ok(), "Parse failed: {:?}", result);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like these tests! But I think they should live in the parser.

Can we move them into the tree-sitter-ggsql/test/corpus/basic.txt file, converted into the format required there?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants