Add BigQuery CREATE EXTERNAL TABLE WITH PARTITION COLUMNS parsing support#66
Add BigQuery CREATE EXTERNAL TABLE WITH PARTITION COLUMNS parsing support#66mesmacosta wants to merge 2 commits intotobilg:mainfrom
Conversation
- Parse CREATE EXTERNAL TABLE with column definitions, WITH PARTITION COLUMNS, WITH CONNECTION, and OPTIONS clauses - Add is_bigquery_external guard to prevent generic with_properties from consuming the WITH token needed by BigQuery-specific clauses - Add BigQuery EXTERNAL TABLE generation (builder + generator) - Add EXTERNAL to BigQuery dialect's special_modifier list - Add 19 tests covering parsing, roundtrip, and WITH syntax compatibility (CTEs, PARTITION BY, CLUSTER BY, OR REPLACE, etc.) - All 916 tests pass with no regressions
|
Nice work! Clean early-return pattern that keeps the BigQuery-specific logic well-scoped. A few things to address: Must fix
self.advance(); // consume PARTITION
self.match_identifier("COLUMNS"); // ← return value discarded
if self.check(TokenType::LParen) {If someone writes if !self.match_identifier("COLUMNS") {
// error or handle bare WITH PARTITION case if BigQuery allows it
}Suggestions
Nits
What looks good
|
1a9add1 to
0f99e5f
Compare
|
@tobilg addressed the review items. |
|
Will have a look, thank you! |
|
@tobilg did you get a chance to look at this? |
|
Hey @tobilg (I work with Marcelo), as I stated here (#28 (comment)) we really like this library and it's potential and we could see ourselves contribute - hence this PR. One observation however is that while issues seem to get merged fast, there is currently 3 PRs that are moving very slow, comparably, even if contributors seem to turn around fast. In terms of adoption and "feeling good" about using a library like this I feel a bit uneasy. Our options are either keep using sqlparser-rs which is stable both in terms of functionality, adoption and community. My assumption is that this at the moment lives mainly on your computer and it's much easier to feed the issues via MCP or CLI directly to an LLM and crank out code vs integrating external code. This is not a critique per se, we know where this library comes from and it's an impressive piece of work. But it would be good to understand your plans and thinking going forward. Will it stay a "pet project" along with other initiatives or would you look to add other maintainers etc? |
|
Hey @MartinSahlen and @mesmacosta, thanks for the PR and your contributions. Sorry for the delay. During the last few weeks this repo was hammered with automated GH issues (assumingly) created by clankers, and I maintain multiple dozens of other projects, many of which are related to DuckDB which had two releases in short term as well that had me busy updating them as well. This PR was stale and not mergeable since nearly three weeks if Claude Code is correct. I merged the PRs contents to the main branch this morning, this should be available in the next version I was on holidays for one week and didn't touch a computer in this time. I attribute multiple hours a week to my OSS projects, and kept doing this for the last 8-10 years. I pay a monthly 3-digit dollar amount for my AI subscriptions, and a 2-digit dollar amount for running CI and test environments, all of it used to create and maintain my OSS projects, for free. Regarding the question whether to use sqlparser-rs or polyglot, I think this is up to you guys to decide. I try my best to do many releases, some with very noticeable improvements regarding speed etc., but the project is currently just me with Claude Code & Codex, some occasional PRs and a lot of (automated) GH issue reports of sometimes questionable quality. All I can offer is best effort, as I do have a full day job... I'm open to onboard other maintainers, but so far nobody came forward with such an idea. |
|
All good, as you understand this was not meant as harsh critique, more as an outsiders perspective and genuine question. Hence my question of where this sits in your "portfolio" of other OSS work and projects, and how you see it evolve. |
Summary
BigQuery's
CREATE EXTERNAL TABLEsyntax supports clauses likeWITH PARTITION COLUMNSandWITH CONNECTIONthat use theWITHkeyword differently from the genericWITHproperties parser. The generic parser was causing parse failures whenever column definitions or partition columns were present.Example: failing query before this fix
Fix:
WITHtoken consumption site, with a clean early returnEXTERNALto BigQuery'sspecial_modifierdialect listwith_partition_columns,with_connection)This does not affect parsing of regular BigQuery
CREATE TABLEstatements, CTEs inAS SELECT, or any other dialect'sWITHhandling — the logic only activates whentable_modifier == "EXTERNAL"ANDdialect == BigQuery.Changes
parser.rs— BigQuery EXTERNAL TABLE parsing + 19 new testsexpressions.rs— Newwith_partition_columnsandwith_connectionfields onCreateTablebuilder.rs— Builder support for new fieldsgenerator.rs— SQL generation for BigQuery EXTERNAL TABLE syntaxdialects/mod.rs— AddEXTERNALto BigQuery's special modifiersTest plan