Skip to content

Support pattern expressions as boolean expressions#2360

Open
gregfelice wants to merge 4 commits into
apache:masterfrom
gregfelice:feature_pattern_in_where
Open

Support pattern expressions as boolean expressions#2360
gregfelice wants to merge 4 commits into
apache:masterfrom
gregfelice:feature_pattern_in_where

Conversation

@gregfelice
Copy link
Copy Markdown
Contributor

@gregfelice gregfelice commented Mar 25, 2026

Summary

Adds support for openCypher pattern expressions as general boolean expressions. Pattern expressions are now accepted anywhere a regular expression is valid — in WHERE, RETURN, WITH, SET, CASE, and as operands of boolean combinators — not just in WHERE.

Note on scope: An earlier version of this PR was titled "Support pattern expressions in WHERE clause via GLR parser." While implementing, Copilot correctly pointed out that adding anonymous_path to expr_atom actually enables pattern expressions in every expression context, not only WHERE. Rather than restrict the grammar with a WHERE-specific nonterminal, this PR now explicitly embraces the broader surface area — matching what openCypher defines and what Neo4j users expect — and covers the new contexts with regression tests. The PR title and description have been updated accordingly.

// WHERE (original motivation — still works)
MATCH (a:Person), (b:Person)
WHERE (a)-[:KNOWS]->(b)
RETURN a.name, b.name

// NOT pattern expression in WHERE
MATCH (a:Person)
WHERE NOT (a)-[:KNOWS]->(:Person)
RETURN a.name

// RETURN — project a boolean
MATCH (a:Person)
RETURN a.name, (a)-[:KNOWS]->(:Person) AS knows_someone

// CASE WHEN
MATCH (a:Person)
RETURN a.name,
       CASE WHEN (a)-[:KNOWS]->(:Person) THEN 'social' ELSE 'loner' END

// Combined with AND / OR
MATCH (a:Person)
RETURN a.name,
       (a)-[:KNOWS]->(:Person) AND (a)-[:WORKS_WITH]->(:Person) AS has_both

// SET — persist a computed boolean as a property
MATCH (a:Person)
SET a.is_social = (a)-[:KNOWS]->(:Person)
RETURN a.name, a.is_social

// WITH pipeline
MATCH (a:Person)
WITH a.name AS name, (a)-[:KNOWS]->(:Person) AS knows
WHERE knows
RETURN name

Implementation

Grammar (cypher_gram.y)

  • Switched the parser to GLR mode (%glr-parser) to handle the inherent ambiguity between parenthesized expressions and graph patterns. Both start with (, so a single-token lookahead cannot decide which production applies until several tokens later. GLR forks at the conflict point and discards the failing alternative.
  • Added anonymous_path to expr_atom with a %dprec 1 annotation. A bare (a) prefers the expression-variable interpretation (%dprec 2 on '(' expr ')'), so single-node pattern expressions still resolve as plain variable references.
  • Added make_exists_pattern_sublink() to wrap the pattern in an EXISTS subquery — a bare pattern (a)-[:KNOWS]->(b) in expression context is semantically equivalent to EXISTS((a)-[:KNOWS]->(b)).
  • Added a helper make_explain_stmt() and placed it immediately above its docstring comment so the grammar file stays readable.
  • Replaced (rhs)[1] with YYRHSLOC(rhs, 1) in the YYLLOC_DEFAULT macro — a required GLR-correctness fix, not stylistic. GLR's rhs stack has a different layout than LALR(1)'s; the raw index lands on the wrong slot under GLR and produces incorrect source locations.
  • Conflict budget pinned with %expect/%expect-rr. GLR produces 7 shift/reduce and 3 reduce/reduce conflicts (all arising from the ( ambiguity between path_node and parenthesized expr, plus expr_var/var_name_opt overlap on )/}/=). These are handled correctly at runtime by GLR + %dprec. The grammar declares %expect 7 and %expect-rr 3 so any deviation up or down fails the build and forces an audit of the new conflicts. The Makefile keeps -Werror and scopes the relaxation to -Wno-error=conflicts-sr -Wno-error=conflicts-rr, so other Bison warning categories (unused rules, undeclared types, etc.) still fail the build. Earlier rounds of this PR tried %expect-less variants for "Bison version portability," but Bison's %expect is exact-match — so the right answer is to pin the totals and update them deliberately if a future Bison reports different counts, not to remove the alarm.

Parser (cypher_analyze.c, cypher_clause.c)

  • Existing EXISTS sublink handling already covers the pattern case — no new transform code needed on the analyze/clause side.

Files changed

File Change
src/backend/parser/cypher_gram.y %glr-parser, %dprec annotations, anonymous_path in expr_atom, make_exists_pattern_sublink(), GLR comment block explaining why conflicts are expected
Makefile BISONFLAGS += -Werror -Wno-error=conflicts-sr -Wno-error=conflicts-rr — keep -Werror for other warning categories, scope relaxation to the two known-noise conflict categories; register pattern_expression regression test
regress/sql/pattern_expression.sql New regression suite: WHERE, NOT, nested, multi-hop, RETURN projection, mixed projections, CASE WHEN, AND/OR combinators, SET, WITH pipeline, regression guards for parenthesized arithmetic
regress/expected/pattern_expression.out Expected output for above

Test plan

  • Basic pattern in WHERE: WHERE (a)-[:KNOWS]->(b)
  • NOT pattern in WHERE: WHERE NOT (a)-[:KNOWS]->(:Person)
  • Pattern in WHERE with label: WHERE (a)-[:KNOWS]->(:Person)
  • Pattern via EXISTS subquery (original syntax still works)
  • Pattern in RETURN with AS alias → boolean column
  • Pattern in RETURN without alias (positional column)
  • Multiple pattern expressions in same RETURN
  • Pattern in CASE WHEN ... THEN ... ELSE ... END
  • Pattern combined with AND and OR in RETURN
  • Pattern in SET a.flag = (a)-[:R]->(:L)
  • Pattern in WITH projection + subsequent WHERE filter
  • Regression guards: RETURN (1 + 2), RETURN (n.name) — parenthesized expressions still work
  • All 32 regression tests pass (31 existing + pattern_expression)
  • Build is clean under -Werror with %expect 7 %expect-rr 3 pinned (Bison 3.8.2); any conflict-count drift now fails the build deliberately

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds openCypher-compatible “pattern expressions” as boolean predicates in WHERE by resolving the (a) ambiguity (parenthesized expression vs node pattern) using Bison GLR parsing.

Changes:

  • Switch cypher_gram.y to %glr-parser, add %dprec annotations, and introduce make_exists_pattern_sublink() to share EXISTS(pattern) wrapping logic.
  • Allow anonymous_path to appear as an expr_atom, translating bare patterns into an EXISTS sublink boolean.
  • Add regression coverage for pattern expressions (regress/sql + regress/expected) and register the new test in REGRESS.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
src/backend/parser/cypher_gram.y Enables bare pattern parsing in expression context via GLR + %dprec, refactors EXISTS wrapping into a helper.
regress/sql/pattern_expression.sql New regression test cases covering pattern expressions in WHERE and boolean combinations.
regress/expected/pattern_expression.out Expected output for the new regression test.
Makefile Adds pattern_expression to the regression test run list.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/backend/parser/cypher_gram.y Outdated
Comment thread src/backend/parser/cypher_gram.y
Comment thread regress/sql/pattern_expression.sql
@jrgemignani
Copy link
Copy Markdown
Contributor

@gregfelice Please see the above comments by Copilot

@gregfelice
Copy link
Copy Markdown
Contributor Author

Addressed all 3 Copilot suggestions:

  1. Misplaced comment — Moved "Helper function to create an ExplainStmt node" from above make_exists_pattern_sublink() to directly above make_explain_stmt() where it belongs.

  2. %expect/%expect-rr documentation — Added block comment explaining the conflict budget: 7 shift/reduce from path extension vs arithmetic operators on -/<, 3 reduce/reduce from expr_var vs var_name_opt on )/}/=. All resolved by GLR forking + %dprec annotations. Noted to update counts if grammar rules change.

  3. Misleading test comment — Changed "Regular expressions still work" to "Regular (non-pattern) expressions still work" to avoid confusion with regex.

Regression test passes (pattern_expression: ok).

@jrgemignani
Copy link
Copy Markdown
Contributor

@gregfelice Did you push them?

@gregfelice
Copy link
Copy Markdown
Contributor Author

@jrgemignani Pushed now — sorry about that! All 3 Copilot suggestions addressed in commit 5d11080.

And noted on the workflow — I'll reply directly to Copilot's comments going forward instead of posting standalone. Thanks for the heads up.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/backend/parser/cypher_gram.y
Comment thread src/backend/parser/cypher_gram.y
@jrgemignani
Copy link
Copy Markdown
Contributor

@gregfelice Please see Copilot above

@gregfelice
Copy link
Copy Markdown
Contributor Author

gregfelice commented Apr 6, 2026 via email

@gregfelice gregfelice changed the title Support pattern expressions in WHERE clause via GLR parser Support pattern expressions as boolean expressions Apr 15, 2026
@gregfelice
Copy link
Copy Markdown
Contributor Author

@jrgemignani — both Copilot items addressed and threads resolved. The %expect concern is moot (we use -Wno-conflicts-sr -Wno-conflicts-rr via BISONFLAGS instead), and the broader pattern-expression surface area is intentional, documented in the PR description, and covered by regression tests for WHERE, RETURN, CASE WHEN, SET, WITH, and boolean combinators. Ready for re-review. Thanks!

@gregfelice gregfelice force-pushed the feature_pattern_in_where branch from d05092d to 908adc6 Compare April 20, 2026 14:50
@gregfelice
Copy link
Copy Markdown
Contributor Author

Rebased onto current master (new HEAD 908adc61) and force-pushed. Conflicts were all alphabetical-merge reconciliation with #2359: Makefile regression-test list, helper function declarations in cypher_gram.y, and the predicate-function body placement. Dropped one duplicate /* Helper function to create an ExplainStmt node */ comment that came along with the HEAD side of the conflict. No logic changes from the rebase.

Local PG18-cassert installcheck is green on all 34 AGE tests including the new pattern_expression and upstream's new predicate_functions (only pgvector fails, environmental — extension not installed on my cassert PG).

CI on the new SHA is in action_required pending maintainer approval. @MuhammadTahaNaveed / @jrgemignani — would appreciate a kick when convenient. Thanks!

gregfelice and others added 3 commits April 23, 2026 13:57
…che#1577)

Enable bare graph patterns as boolean expressions in WHERE clauses:

  MATCH (a:Person), (b:Person)
  WHERE (a)-[:KNOWS]->(b)        -- now valid, equivalent to EXISTS(...)
  RETURN a.name, b.name

Previously, this required wrapping in EXISTS():
  WHERE EXISTS((a)-[:KNOWS]->(b))

The bare pattern syntax is standard openCypher and is used extensively
in Neo4j.  Its absence was the most frequently cited migration blocker.

Implementation approach:
- Switch the Cypher parser from LALR(1) to Bison GLR mode.  GLR handles
  the inherent ambiguity between parenthesized expressions '(' expr ')'
  and graph path nodes '(' var_name label_opt props ')' by forking the
  parse stack and discarding the failing path.
- Add anonymous_path as an expr_atom alternative with %dprec 1 (lower
  priority than expression path at %dprec 2).  The action wraps the
  pattern in a cypher_sub_pattern + EXISTS SubLink, reusing the same
  transform_cypher_sub_pattern() machinery as explicit EXISTS().
- Extract make_exists_pattern_sublink() helper shared by both
  EXISTS(pattern) and bare pattern rules.
- Fix YYLLOC_DEFAULT to use YYRHSLOC() for GLR compatibility.
- %dprec annotations on expr_var/var_name_opt resolve the reduce/reduce
  conflict between expression variables and pattern node variables.

Conflict budget: 7 shift/reduce (path extension vs arithmetic on -/<),
3 reduce/reduce (expr_var vs var_name_opt on )/}/=).  All are expected
and handled correctly by GLR forking + %dprec disambiguation.

All 32 regression tests pass (31 existing + 1 new).  New
pattern_expression test covers: bare patterns, NOT patterns, labeled
nodes, AND/OR combinations, left-directed patterns, anonymous nodes,
multi-hop patterns, EXISTS() backward compatibility, and non-pattern
expression regression checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Move "Helper function to create an ExplainStmt node" comment from
   above make_exists_pattern_sublink() to above make_explain_stmt()
   where it belongs.

2. Add block comment documenting the %expect/%expect-rr conflict
   budget: 7 S/R from path vs arithmetic on - and <, 3 R/R from
   expr_var vs var_name_opt on ) } =.

3. Clarify test comment: "Regular expressions" -> "Regular (non-pattern)
   expressions" to avoid confusion with regex.

Regression test: pattern_expression OK.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Pattern expressions are now accepted anywhere an expr is valid
  (RETURN, WITH, SET, CASE, boolean combinations), not only WHERE.
  This matches openCypher semantics and documents the broader surface
  area that was already implicitly enabled by adding anonymous_path
  to expr_atom.  Added regression tests for each new context:
  RETURN projection (bare and AS-aliased), mixed with other
  projections, CASE WHEN, boolean AND/OR combinators, SET to
  persist a computed boolean property, and WITH ... WHERE pipeline.

- Remove the hardcoded `%expect 7` / `%expect-rr 3` conflict budget
  from cypher_gram.y.  The exact conflict counts can drift across
  Bison versions and distros, which would break builds even though
  the grammar is correct (GLR handles the conflicts at runtime via
  fork + %dprec).  Instead, pass -Wno-conflicts-sr / -Wno-conflicts-rr
  via BISONFLAGS in the Makefile so the build stays clean without
  binding us to a specific Bison release.  Kept a block comment in
  the grammar explaining why GLR conflicts are expected and how
  they resolve.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@gregfelice gregfelice force-pushed the feature_pattern_in_where branch from 908adc6 to 61426cf Compare April 23, 2026 17:59
@gregfelice
Copy link
Copy Markdown
Contributor Author

Rebased onto current master (new HEAD 61426cf0) and force-pushed to clear mergeability drift. Three commits replay on top of master (84e29542). Only conflict was the Makefile bison-flags line: master still carries -Werror, this branch replaces it with -Wno-conflicts-sr -Wno-conflicts-rr plus the explanatory comment block (the whole reason for the PR, since pattern-expression grammar introduces inherent GLR conflicts). No logic changes from the rebase.

Local PG18-cassert installcheck green on all 34 AGE tests including pattern_expression (51 ms).

CI on the new SHA will sit in action_required pending maintainer approval — would you mind kicking it when you have a moment? Thanks.

@jrgemignani
Copy link
Copy Markdown
Contributor

@gregfelice

From Opus 4.7 -

Deep review: PR #2360 — Pattern expressions as boolean expressions

I read the PR description and the actual diff (cypher_gram.y, Makefile, regression test). Status: 3 commits, force-pushed onto current master (84e29542), Copilot's 3+2+0 rounds resolved, jrgemignani approved. Here's my analysis.

What it does (correctly)

The change is fundamentally sound:

  1. Switches cypher_gram.y to %glr-parser. Right call — the (a) ambiguity (parenthesized expr vs. node pattern) genuinely is unresolvable at one-token lookahead. LALR(1) can't see far enough to know whether ( introduces (1+2) or (a)-[:R]->(b). GLR is the textbook tool.
  2. Adds anonymous_path %dprec 1 to expr_atom, alongside the existing '(' expr ')' %dprec 2. Bare (a) tied to both productions — %dprec 2 > %dprec 1 correctly disambiguates to expression-variable.
  3. Refactors EXISTS-wrapping into make_exists_pattern_sublink() — and re-uses it for both the existing EXISTS(pattern) production and the new bare-pattern production. The refactor is byte-identical to what the inline code did (same cypher_sub_pattern, CSP_EXISTS, EXISTS_SUBLINK, node_to_agtype(..., "boolean", ...)). Zero semantic change for the existing EXISTS path. ✓
  4. YYRHSLOC(rhs, 1) macro fix — silently correct and necessary. GLR has a different rhs stack layout than LALR(1); (rhs)[1] indexes the wrong slot under GLR. This isn't mentioned in the description but it's a real correctness fix the PR depends on.
  5. Tests cover meaningful surface area — WHERE, NOT, AND/OR, RETURN (alias + positional), mixed projections, CASE WHEN, SET, WITH pipeline, backward-compat with EXISTS(...). Good coverage of the new contexts.

Concerns worth raising

1. Removing -Werror is heavier than needed

The Makefile change is:

- src/backend/parser/cypher_gram.c: BISONFLAGS += --defines=... -Werror
+ src/backend/parser/cypher_gram.c: BISONFLAGS += --defines=... -Wno-conflicts-sr -Wno-conflicts-rr

This drops -Werror entirely, not just the two conflict categories. Any other Bison warning (deprecated %name-prefix=, undeclared types, unused rules, etc.) will silently be ignored going forward.

Suggested: keep -Werror, add only the two no-error opts:

BISONFLAGS += --defines=... -Werror -Wno-error=conflicts-sr -Wno-error=conflicts-rr

Bison 3.4+ supports -Wno-error=<category>. Keeps the safety net on everything else.

2. No %expect budget at all is fragile

The description's argument — "exact counts can drift across Bison versions" — is correct. But going to "any number of conflicts is fine" loses the alarm bell that catches new, unintentional conflicts a future grammar change might introduce. A new rule that adds 12 more conflicts would silently slip in.

Suggested: %expect 7 / %expect-rr 3 with a code comment "expected to drift on Bison upgrades; update as needed." This is the standard PostgreSQL approach in gram.y (which has %expect 0 and updates when needed). The point of %expect is not to pin a version, it's to fail loudly when unexpected conflicts appear. If the worry is distro drift, the right answer is a wider %expect value (e.g., %expect 12 %expect-rr 5) — generous enough to absorb Bison-version churn, tight enough to flag a new grammar bug.

3. GLR runtime cost on the parse hot path

GLR forks parse stacks at every conflict point. On ambiguous prefixes it does several full reductions before discarding losers. AGE parses Cypher on every single cypher() call, often inside tight per-row contexts (UDF arguments, prepared statements re-planned, etc.). I don't see any parse-time benchmark in the PR.

Concretely: a MATCH (a) query enters the ( ambiguity at character 7, and GLR will explore both '(' expr ')' and anonymous_path until it sees the closing ) and what follows. For short read-only queries, parser time is non-trivial relative to plan+execute.

Recommended diligence before merge:

  • Microbench: pgbench -f script.sql -T 30 -c 4 running 1000× of typical Cypher query (MATCH (a:Person {id: $1}) RETURN a) on LALR build vs. GLR build. Look for backend CPU% in parse.
  • Spot-check EXPLAIN (ANALYZE, BUFFERS) SELECT cypher(...) planning time, before/after.

If parse time regresses meaningfully (say >5%) on simple queries, this is worth knowing and documenting, even if it's accepted as the cost of the feature. If it's noise (<1%), it's a non-issue.

4. Pattern expression semantics — coverage gaps

The grammar change makes anonymous_path legal anywhere an expr_atom is. That opens the door wider than the tests probe:

  • Single-node labeled patterns in expression position: RETURN (a:Person) is, in openCypher, the pattern "does a have label Person?" — a boolean. Under this PR, (a:Person) would be parsed by GLR — does it pick the (expr) branch (which would fail because a:Person isn't an expression) or the anonymous_path branch (which would succeed)? Per %dprec, both fork; the failing fork is dropped; we get the pattern interpretation. The test suite doesn't include this case. Worth adding.
  • In list/map literals: RETURN [(a)-[:R]->(b), (a)-[:S]->(c)] — list of booleans. RETURN {flag: (a)-[:R]->(b)} — map containing a boolean. The grammar now allows these. Untested.
  • In function arguments: RETURN count((a)-[:R]->(b)). Should work (boolean is countable). Untested.
  • In OPTIONAL MATCH WHERE: OPTIONAL MATCH (a) WHERE (a)-[:R]->(:L). Untested.
  • Nested EXISTS: WHERE EXISTS((a)-[:R]->(b)) AND (a)-[:S]->(c). The PR's "mixed" test doesn't combine the two syntaxes in one predicate. Untested.

None of these is a blocker — they're follow-up coverage. But the PR description claims "matching what openCypher defines and what Neo4j users expect," and these are exactly the cases Neo4j users will try.

5. Performance of the resulting plan

Bare pattern → EXISTS sublink → planner builds a correlated subquery. In a RETURN a.name, (a)-[:KNOWS]->(:Person) AS knows over a million rows, the planner has to either:

  • (a) Evaluate the EXISTS sublink per row (O(N) sublink evals), or
  • (b) Convert it to a semi-join (PG's pull_up_sublinks).

For Cypher, the pattern is a SRF (cypher's own scan), not a SQL subquery the planner can semi-join in the usual way. So path (a) is what will happen, and on large data it will be slow.

That's the correct semantics. But it's worth documenting in the PR or a follow-up that pattern-expressions in projection contexts are O(rows × subquery cost), distinct from pattern-expressions in WHERE (where the planner might pull it up). Tests use 4 rows — invisible. Real workloads would notice.

6. The expr_func_subexpr: EXISTS '(' anonymous_path ')' production stays

It's now redundant with anonymous_path %dprec 1 in expr_atom. Both produce the same make_exists_pattern_sublink(...) output. The grammar will accept both spellings. Keeping the explicit form is fine for backward compat (users may have written WHERE EXISTS((...)) for years), but it does add to the conflict count. Could be noted in the description as "intentionally redundant for backward compatibility."

7. var_name_opt %dprec 1 on both alternatives

var_name_opt:
    /* empty */ %dprec 1
        { $$ = NULL; }
    | var_name %dprec 1
    ;

Equal priorities on both alternatives is a no-op for %dprec. Bison will treat this as "no preference" and discard the failing fork. That's fine, but the annotation is misleading — the priorities don't disambiguate, they're inert. Either remove them or set unequal values. Minor.

Verdict

Direction: correct. GLR is the right tool, the refactor preserves semantics exactly, the test coverage spans the new contexts, and Copilot's review rounds caught the small stuff.

Items I'd want addressed before merge:

  1. Keep -Werror, scope down to -Wno-error=conflicts-{sr,rr} rather than nuking -Werror.
  2. Re-introduce some %expect budget (generous, e.g. %expect 10 %expect-rr 5) so future grammar changes that add unintended conflicts get flagged.
  3. Quick parse-time microbench (LALR vs. GLR) on a couple of representative read-only Cypher queries. Even a one-line "no measurable regression on MATCH (a:L {p: $1}) RETURN a" in the PR description would satisfy.

Items I'd address as a follow-up (not blocking):

  1. Test coverage gaps: RETURN (a:Label), pattern in list/map literals, pattern in function args, OPTIONAL MATCH WHERE.
  2. Document the EXISTS-sublink-per-row cost in projection contexts so users aren't surprised on large data.
  3. Note the YYRHSLOC fix in the PR body — it's not just a stylistic change, it's required for GLR correctness.

Items that are fine as-is:

  1. The make_exists_pattern_sublink refactor (semantics preserved byte-for-byte).
  2. The conflict explanation block comment (genuinely helpful for future grammar maintainers).
  3. The decision to broaden scope from "WHERE only" to "all expression contexts" — Copilot was right, the grammar change inherently does this and a WHERE-only restriction would have required a parallel grammar branch.

Overall, this is a high-quality feature PR that solves a real openCypher-compat gap, and the maintainer approval is well-earned. The Makefile -Werror removal and absent %expect budget are the two items I'd push back on; everything else is polish.

@jrgemignani jrgemignani self-requested a review May 13, 2026 15:11
Copy link
Copy Markdown
Contributor

@jrgemignani jrgemignani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the Opus 4.7 review

@jrgemignani
Copy link
Copy Markdown
Contributor

@gregfelice I'm being extra careful due to this modifying the grammar.

Reverts the broad `-Werror` drop and the no-%expect approach from the
prior round on jrgemignani's request.  The earlier framing — that
conflict counts drift across Bison versions, so %expect is fragile —
overcorrected: it removed the only build-time alarm bell for unintended
new conflicts.

Makefile: keep -Werror so any unexpected Bison warning (unused rules,
undeclared types, etc.) still fails the build; downgrade only the two
conflict categories to plain warnings via -Wno-error=conflicts-sr
-Wno-error=conflicts-rr.  pgxs auto-adds -Wno-deprecated, so existing
%name-prefix= / %pure-parser directives remain non-erroring.

cypher_gram.y: add `%expect 7` and `%expect-rr 3` matching the
Bison 3.8.2 totals.  Bison treats %expect as exact-match, not as a
ceiling — any deviation fails the build and forces an audit of the new
conflicts.  Comment updated to reflect that future Bison versions
reporting different counts should bump the numbers explicitly with a
version note in the commit message, rather than removing the directive.

No grammar or runtime change.  Cassert installcheck 34/34 AGE tests
green.
@gregfelice
Copy link
Copy Markdown
Contributor Author

@jrgemignani — thanks for the careful review. Addressed all three blocking asks in 553e9761, with one factual correction to the underlying reasoning that I want to surface explicitly:

1. Makefile -Werror scope — kept -Werror, downgraded only the two conflict categories via -Wno-error=conflicts-sr -Wno-error=conflicts-rr. pgxs auto-adds -Wno-deprecated so the existing %name-prefix= / %pure-parser directives still build clean. Any other Bison warning (unused rules, undeclared types, etc.) now fails the build again — same coverage as master, minus the two known-noise categories.

2. %expect / %expect-rr budget — added %expect 7 and %expect-rr 3 matching the Bison 3.8.2 totals. Caveat: Bison treats %expect as exact-match, not as a ceiling, so the "wider budget to absorb Bison drift" framing in the review doesn't actually work — %expect 12 on a 7-conflict grammar fails the build with found 7, expected 12 (verified empirically). Exact pinning is actually a stronger signal than a generous bound: it catches both new conflicts AND silent disappearance of expected ones (either is a surprise worth auditing). The grammar comment documents that future Bison versions reporting different counts should bump these numbers explicitly with a version note in the commit message.

3. Parse-time microbench — Phase 3.5 release-build perf preflight — built a user-owned PG18 release cluster (-O2, no cassert) and ran a pgbench equivalent of the workload-mapping-prescribed smoke (W01 + W04 for parser-only changes), 5 iterations × 15 s × 1 client per side:

LALR (master) mean GLR (this PR) mean delta
W04-style: MATCH (a:L {p: 42}) RETURN a (1000-node graph) 4944 TPS (4904–4976, ±1.5%) 4979 TPS (4961–4999, ±0.8%) +0.7%
W01-style: CREATE (:L {p: 99}) (in rolled-back tx) 11064 TPS (10749–11353, ±5.6%) 11066 TPS (10920–11176, ±2.3%) +0.02%

Both deltas well below the 5% gate, and W01's LALR-side range (5.6%) is itself wider than the LALR-vs-GLR delta. No measurable parse-time regression on representative read-only or write Cypher.

Cassert installcheck: 34/34 AGE tests green (pgvector skipped — vector extension not installed in my local cassert PG; unrelated to this PR).

I also updated the PR description to (a) note the YYRHSLOC(rhs, 1) replacement is a GLR-stack correctness fix, not stylistic, and (b) reflect the restored -Werror + %expect 7/3 approach.

Non-blocking follow-ups from the review — your call on bundling:

  • Test coverage for RETURN (a:Label), list/map literals, function args, OPTIONAL MATCH WHERE — happy to fold into this PR's pattern_expression regression suite as a second commit if you want them here, otherwise I'll open a follow-up.
  • Doc note on EXISTS-sublink-per-row cost in projection contexts — better as a standalone docs commit so it's discoverable from the Cypher reference, not just this PR's history.

@jrgemignani
Copy link
Copy Markdown
Contributor

@gregfelice Ty. Just me - I prefer the builds to fail on anything, warnings included :) And grammars are tricky.

@jrgemignani
Copy link
Copy Markdown
Contributor

jrgemignani commented May 14, 2026

@gregfelice I would add the non-blocking follow ups that you listed. Follow ups sometimes get forgotten. Otherwise, everything else looks good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants