feat(gf-cypher): replace LALRPOP with recursive-descent + Pratt parser scaffold#654
Conversation
…r scaffold (#554) LALRPOP produced 50 unresolvable structural shift/reduce conflicts in LR(1) mode after full openCypher operator coverage. State-merging contaminated FOLLOW sets across unrelated grammar contexts — not fixable within LALRPOP. Replace with hand-written recursive-descent (clauses) + Pratt (expressions), the architecture used by Neo4j, PostgreSQL, and SQLite. Conflict-free by construction; no build-time grammar generation. Changes: - Remove lalrpop, lalrpop-util, logos from gf-cypher Cargo.toml - Delete grammar.lalrpop and build.rs (no-op remnant also removed) - Add lexer.rs: hand-written Tok lexer with compound tokens (NOT IN, IS NULL, IS NOT NULL, STARTS WITH, ENDS WITH) - Add parser/mod.rs: TokenStream scaffold (peek, peek_n, eat, eat_if, advance, span helpers) — 11 unit tests passing - lib.rs: delegates directly to parser::parse() - gf-ast: add UnionClause, FunctionCall::star field, UnterminatedBlockComment error variant - ADR 0003: rewritten to document full history (LALR→LR(1)→RD+Pratt) and final decision rationale - Docs: 7-verb API surface, algorithms catalog, RD+Pratt parser diagram, Swift/Kotlin UniFFI bindings, roadmap, api reference all updated - CLAUDE.md: updated for current architecture Implementation continues in child issues: - #650 Pratt expression parser (expr.rs) - #651 Pattern parser (patterns.rs) - #652 Read-only clauses (clauses.rs) - #653 Write clauses + final wire-up Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
WalkthroughAdds a hand-written Cypher lexer (Tok, Lexer) with span-aware errors, extends gf-ast for ChangesLexer and Parser Foundation
Sequence Diagram(s)sequenceDiagram
participant Source as Source Text
participant Lexer as gf_cypher::lexer::Lexer
participant TokenStream as gf_cypher::parser::TokenStream
participant Parser as gf_cypher::parser::parse
Source->>Lexer: provide characters
Lexer->>Lexer: skip whitespace/comments, lex token
Lexer->>TokenStream: emit Spanned<Tok> (start,tok,end)
TokenStream->>Parser: peek/eat/advance tokens
Parser->>TokenStream: request spans (span_from/current_span)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #654 +/- ##
=======================================
Coverage 84.09% 84.09%
=======================================
Files 49 49
Lines 12921 12921
Branches 3628 3628
=======================================
Hits 10866 10866
Misses 1270 1270
Partials 785 785
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report in Codecov by Sentry.
|
- cargo fmt: lexer.rs enum variants one-per-line (CI fmt check) - mkdocs.yml: add algorithms.md and ADR 0003 to nav (docs build) - overview.md: fix broken ../adr/0004-uniffi-bindings.md link → 0002 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cargo fmt expands Ok(Self { tokens, pos: 0, input }) to multi-line form
when line length exceeds the configured threshold on CI.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
crates/gf-ast/src/tests.rs (1)
237-251: ⚡ Quick winAdd explicit tests for
f(*)andUNIONAST paths.Line 241 only exercises
star: false. Please add one roundtrip test forFunctionCall { star: true, args: vec![] }and one forAstClause::Unionspan + JSON roundtrip.Proposed test additions
#[test] fn ast_function_call_roundtrip() { @@ assert_eq!(e, back); } + +#[test] +fn ast_function_call_star_roundtrip() { + let e = Expr::FunctionCall(FunctionCall { + name: vec!["count".into()], + distinct: false, + star: true, + args: vec![], + span: Span::new(0, 8), + }); + let json = serde_json::to_string(&e).unwrap(); + let back: Expr = serde_json::from_str(&json).unwrap(); + assert_eq!(e, back); +} + +#[test] +fn ast_union_clause_roundtrip_and_span() { + let clause = AstClause::Union(UnionClause { + all: true, + span: Span::new(5, 14), + }); + assert_eq!(clause.span(), Span::new(5, 14)); + let json = serde_json::to_string(&clause).unwrap(); + let back: AstClause = serde_json::from_str(&json).unwrap(); + assert_eq!(clause, back); +}Based on learnings: Write comprehensive unit tests for each layer (parser, planner, executor), integration tests for end-to-end behavior, and aim for 100% coverage on new code (90% minimum).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gf-ast/src/tests.rs` around lines 237 - 251, The existing ast_function_call_roundtrip test only covers FunctionCall with star: false; add two tests: (1) a new test (e.g., ast_function_call_star_roundtrip) that constructs Expr::FunctionCall(FunctionCall { name: vec!["f".into()], distinct: false, star: true, args: vec![], span: Span::new(...)}), serializes with serde_json::to_string and deserializes with serde_json::from_str and asserts equality; (2) a new test (e.g., ast_clause_union_roundtrip) that constructs an AstClause::Union with an appropriate Span, serializes/deserializes via serde_json and asserts equality to exercise the UNION AST path; reuse existing patterns from ast_function_call_roundtrip (serde_json::to_string/from_str and assert_eq!) and import/construct Span as done elsewhere in tests.rs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gf-cypher/src/lexer.rs`:
- Around line 217-231: The Unicode-escape branch handling Some('u') in lexer.rs
currently ignores malformed escapes; change it to reject invalid escapes by
returning a lexer error instead of silently skipping: ensure you collect exactly
four hex digits (fail if fewer than 4), validate u32::from_str_radix(&hex, 16)
succeeds, and validate char::from_u32(n) yields Some(ch); on any failure
propagate/return the lexical error using the lexer’s existing error path (the
surrounding scanning function’s error/Err return), rather than pushing a
replacement or continuing.
In `@crates/gf-cypher/src/parser/mod.rs`:
- Around line 10-13: The stub parser error uses a fixed Span::new(0, 0) which
misreports EOF location; update the ParseError::new call (the UnexpectedEof
construction in crates/gf-cypher/src/parser/mod.rs) to use the actual EOF
position by replacing Span::new(0, 0) with a span anchored at the end of the
input (e.g., Span::new(input.len(), input.len())), keeping the rest of the
ParseError::new(ParseErrorKind::UnexpectedEof { ... }, ..., "parser not yet
implemented") intact.
---
Nitpick comments:
In `@crates/gf-ast/src/tests.rs`:
- Around line 237-251: The existing ast_function_call_roundtrip test only covers
FunctionCall with star: false; add two tests: (1) a new test (e.g.,
ast_function_call_star_roundtrip) that constructs
Expr::FunctionCall(FunctionCall { name: vec!["f".into()], distinct: false, star:
true, args: vec![], span: Span::new(...)}), serializes with
serde_json::to_string and deserializes with serde_json::from_str and asserts
equality; (2) a new test (e.g., ast_clause_union_roundtrip) that constructs an
AstClause::Union with an appropriate Span, serializes/deserializes via
serde_json and asserts equality to exercise the UNION AST path; reuse existing
patterns from ast_function_call_roundtrip (serde_json::to_string/from_str and
assert_eq!) and import/construct Span as done elsewhere in tests.rs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3cb7c02c-7616-4b3e-998b-53884bf3ca9d
⛔ Files ignored due to path filters (15)
CLAUDE.mdis excluded by!**/*.mdCargo.lockis excluded by!**/*.lock,!**/*.lockdocs/adr/0002-rust-core.mdis excluded by!**/*.md,!**/docs/**docs/adr/0003-lr1-grammar.mdis excluded by!**/*.md,!**/docs/**docs/architecture/algorithms.mdis excluded by!**/*.md,!**/docs/**docs/architecture/ast-and-planning.mdis excluded by!**/*.md,!**/docs/**docs/architecture/execution-model.mdis excluded by!**/*.md,!**/docs/**docs/architecture/overview.mdis excluded by!**/*.md,!**/docs/**docs/development/contributing.mdis excluded by!**/*.md,!**/docs/**docs/development/workflow.mdis excluded by!**/*.md,!**/docs/**docs/index.mdis excluded by!**/*.md,!**/docs/**docs/reference/RELEASE_TRACKING.mdis excluded by!**/*.md,!**/docs/**docs/reference/api.mdis excluded by!**/*.md,!**/docs/**docs/reference/opencypher-compatibility.mdis excluded by!**/*.md,!**/docs/**docs/releases/roadmap.mdis excluded by!**/*.md,!**/docs/**
📒 Files selected for processing (8)
crates/gf-ast/src/ast.rscrates/gf-ast/src/lib.rscrates/gf-ast/src/parse_error.rscrates/gf-ast/src/tests.rscrates/gf-cypher/Cargo.tomlcrates/gf-cypher/src/lexer.rscrates/gf-cypher/src/lib.rscrates/gf-cypher/src/parser/mod.rs
| Err(ParseError::new( | ||
| ParseErrorKind::UnexpectedEof { expected: vec![] }, | ||
| Span::new(0, 0), | ||
| "parser not yet implemented", |
There was a problem hiding this comment.
Use EOF span in the stub error for accurate diagnostics.
At Line 12, returning Span::new(0, 0) for all inputs makes downstream error reporting misleading. Use input.len() for the placeholder error span.
Proposed fix
pub fn parse(input: &str) -> Result<AstQuery, ParseError> {
let _ts = TokenStream::new(input)?;
+ let eof = input.len();
Err(ParseError::new(
ParseErrorKind::UnexpectedEof { expected: vec![] },
- Span::new(0, 0),
+ Span::new(eof, eof),
"parser not yet implemented",
))
}🧰 Tools
🪛 GitHub Actions: PR `#654` / 0_Analyze (rust).txt
[warning] 11-11: macro expansion failed for 'vec'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gf-cypher/src/parser/mod.rs` around lines 10 - 13, The stub parser
error uses a fixed Span::new(0, 0) which misreports EOF location; update the
ParseError::new call (the UnexpectedEof construction in
crates/gf-cypher/src/parser/mod.rs) to use the actual EOF position by replacing
Span::new(0, 0) with a span anchored at the end of the input (e.g.,
Span::new(input.len(), input.len())), keeping the rest of the
ParseError::new(ParseErrorKind::UnexpectedEof { ... }, ..., "parser not yet
implemented") intact.
- ast-and-planning.md: fix broken backtick nesting in Pratt binding powers line - api.md: common parameters table now shows per-verb availability instead of claiming all three params are shared by all verbs - adr/0002-rust-core.md: "four analyst-intent verbs" → seven (correct count) - algorithms.md: "four analyst-intent verbs" → five (algorithm verbs only, correct count); add forge.similar() to overview code block Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tly dropping Invalid hex digits or surrogate code points now return ParseError with the offending escape span. Previously \uZZZZ would silently produce an empty string fragment. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| for _ in 0..4 { | ||
| match self.advance() { | ||
| Some(h) => hex.push(h), | ||
| None => break, | ||
| } | ||
| } |
There was a problem hiding this comment.
🟡 Medium src/lexer.rs:288
In read_string, the \uXXXX escape handler consumes any 4 characters rather than exactly 4 hex digits. For input like "\uAB", the closing quote is swallowed as part of the hex sequence, producing hex = "AB\" and corrupting the lexer state. The loop should validate each character is a hex digit and break early if not.
- for _ in 0..4 {
- match self.advance() {
- Some(h) => hex.push(h),
- None => break,
- }
- }
+ for _ in 0..4 {
+ match self.peek() {
+ Some(h) if h.is_ascii_hexdigit() => {
+ self.advance();
+ hex.push(h);
+ }
+ _ => break,
+ }
+ }🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/gf-cypher/src/lexer.rs around lines 288-293:
In `read_string`, the `\uXXXX` escape handler consumes any 4 characters rather than exactly 4 hex digits. For input like `"\uAB"`, the closing quote is swallowed as part of the hex sequence, producing `hex = "AB\"` and corrupting the lexer state. The loop should validate each character is a hex digit and break early if not.
Evidence trail:
crates/gf-cypher/src/lexer.rs lines 284-306 at REVIEWED_COMMIT: The `for _ in 0..4` loop (line 288) calls `self.advance()` and pushes any `Some(h)` character without checking `h.is_ascii_hexdigit()`. Line 290 `Some(h) => hex.push(h)` accepts any character. Line 275 `Some(c) if c == quote => break` is the quote-termination check but it's in the outer loop and never reached during the inner `for` loop's execution.
* feat(gf-cypher): Pratt expression parser — all Expr variants (#650) Implements parse_expr(ts, min_bp) in crates/gf-cypher/src/parser/expr.rs. Binding power table: OR(10/11), XOR(20/21), AND(30/31), NOT prefix(35), comparisons(40/41), IS NULL/NOT IN/STARTS WITH/etc(40/40), +/-(50/51), */%( 60/61), ^(70/70 right-assoc), ./[(80/81 postfix) Produces all Expr AST variants: Literal, Var, Param, BinaryOp, UnaryOp, FunctionCall, Property, List, Map, Case, ListComprehension, IsNull, InList, StringOp, RegexMatch, Parenthesized - COUNT(*), COUNT(DISTINCT x), keyword-named functions (exists, any, none, single, filter, extract, reduce, shortestPath) - Simple and searched CASE expressions - List comprehensions with optional WHERE filter and | projection - Subscript [idx] and slice [lo..hi] encoded as synthetic _subscript and _slice FunctionCall nodes until AST gains a dedicated Slice node - eat_ident: error span now references the actual bad token - parse_map_literal: duplicate keys are now a ParseError 49 tests passing (11 TokenStream + 38 expression). Part of #554. Depends on #650 scaffolding (merged in #654). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(parser): extend eat_ident keyword-as-property coverage Add NOT, AND, OR, XOR, OPTIONAL, ANY, NONE, SINGLE, FILTER, EXTRACT, REDUCE, SHORTESTPATH, ALLSHORTESTPATHS to tok_as_keyword_str so that property accesses like n.not or n.any parse correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(parser): use exact token end positions for literal spans Atomic literal/param prefixes now capture (l, _, r) from advance() and use Span::new(l, r) directly, so spans don't bleed into trailing whitespace before the next token. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
gf-cypherafter 50 unresolvable structural shift/reduce conflicts in LR(1) mode (see ADR 0003 for full history)Toklexer with compound tokens (NOT IN,IS NULL,STARTS WITH, etc.)TokenStreamscaffold (peek,peek_n,eat,eat_if,advance, span helpers) — 11 unit testsgf-astwithUnionClause,FunctionCall::star,UnterminatedBlockCommentparser::parse()currently returnsErr(unimplemented)— the clause and expression parsers follow in child issues.Child issues (implement in order after this merges)
expr.rs)patterns.rs)clauses.rs)grammar.lalrpopTest plan
cargo build -p gf-cypher— clean, no conflict warnings, no grammar generation stepcargo test -p gf-cypher— 11 TokenStream unit tests passingcargo test --workspace— 66 tests, 0 failuresCloses #554 (tracking — child issues carry the implementation)
🤖 Generated with Claude Code
Note
Replace LALRPOP with hand-written recursive-descent + Pratt parser in
gf-cypherlalrpop-utilandlogosdependencies fromgf-cypher/Cargo.tomland replaces them with a hand-writtenLexerandTokenStream.NOT IN,IS NULL,STARTS WITH, etc.) with precise span-based error reporting.TokenStreameagerly lexes input into aVecof spanned tokens and provides lookahead, consume, and error helpers to support the upcoming recursive-descent clause parser and Pratt expression parser.UnionClause,AstClause::Union, astarfield onFunctionCall, and anUnterminatedBlockCommentparse error kind.parser::parseis a stub that always returnsUnexpectedEof; no Cypher queries can be parsed yet.Macroscope summarized 0533882.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation