Skip to content

PHQL parser fixes#5

Merged
niden merged 4 commits intophalcon:1.0.xfrom
niden-code:1.0.x
Apr 9, 2026
Merged

PHQL parser fixes#5
niden merged 4 commits intophalcon:1.0.xfrom
niden-code:1.0.x

Conversation

@niden
Copy link
Copy Markdown
Member

@niden niden commented Apr 9, 2026

Summary

Pure-PHP PHQL scanner and parser port (phalcon-phql) had several bugs preventing correct tokenization and AST generation. This PR fixes all of them; all 136 @group phql tests now pass (1 skipped) and the full MySQL database suite (775 tests) is unaffected.


Fixes

1. Scanner — PHP 8 loose comparison: digit '0''8' matched control-character cases

File: src/Scanner/Scanner.php

In PHP 8, comparing a numeric string to an integer uses numeric equality: "0" == 0x00true. The re2c-generated state machine used integer hex literals (case 0x00: through case 0x08:) in switch statements keyed on a single-character string. This caused digit characters '0''8' to match control-character cases instead of the default: fall-through, routing them to error or EOF states.

Fix: Changed all case 0x00:case 0x08: to case "\x00":case "\x08": (string literals, compared lexicographically). Changed if ($yych <= 0x00) to if ($yych === "\x00").


2. Scanner — break 2 in states 8 and 12 exited while(true) prematurely

File: src/Scanner/Scanner.php

The inner scanning loop is structured as:

while (true) {          // (A)
    switch ($yystate) { // (B)
        case 0:
            switch ($yych) { // (C) — nested switch present
                case '"': $yystate = 8; break 2; // exits (C)+(B), stays in (A) ✓
            }
        case 8:
            $yystate = 68;
            break 2; // exits (B)+(A) — no nested switch, so while(true) exits ✗
    }
}

States 8 and 12 (the entry points for "…" and '…' string literals) had break 2 directly inside switch ($yystate) with no inner switch. This exited both the outer switch and while(true), sending control back to the outer while (RETCODE_IMPOSSIBLE == $status) which reset $yystate = 0. The scanner restarted at state 0 with the cursor already past the opening quote, treating the string contents as a bare identifier.

Fix: Changed break 2 → break in cases 2, 8, and 12 so the inner while(true) continues to the next state.


  1. Scanner — State 70 included the opening quote in the string value

File: src/Scanner/Scanner.php

State 70 emits the PHQL_T_STRING token. It extracted the value as:

substr($yyinput, $q, $yycursor - $q - 1)

$q is the cursor position at the start of scanForToken() — i.e. the position of the opening " or '. In the C re2c original, YYMARKER (q) is advanced past the opening quote inside pp11/pp16 (ppch = *(PPMARKER = ++PPCURSOR)), so the value is extracted without the leading delimiter.

The PHP equivalent sets $yymarker = $yycursor in states 8/12 after state 0 has already advanced the cursor past the opening quote. Using $yymarker instead of $q gives the correct content.

Fix:

// Before
$token->value = substr($yyinput, $q, $yycursor - $q - 1);

// After
$token->value = substr($yyinput, $yymarker, $yycursor - $yymarker - 1);

  1. Scanner — SPLACEHOLDER/BPLACEHOLDER values included leading :

File: src/Scanner/Scanner.php

Cases 142 (PHQL_T_SPLACEHOLDER) and 194 (PHQL_T_BPLACEHOLDER) extracted the placeholder value starting from $q (the position of the leading :). Query.php prepends ":" when building the placeholder, producing ::name instead of :name.

Fix: Changed extraction offset from $q to $q + 1 and length from

$yycursor - $q - 1 to $yycursor - $q - 2.                     

  1. Scanner — PHQL_T_EQUALS and PHQL_T_LESS were string opcodes

File: src/Scanner/Opcode.php

// Before                                                     
public const PHQL_T_EQUALS = '=';
public const PHQL_T_LESS   = '<';

// After
public const PHQL_T_EQUALS = 61; // ord('=')
public const PHQL_T_LESS   = 60; // ord('<')

Query.php uses integer constants (case self::PHQL_T_EQUALS:) in strict PHP 8 switch comparisons. String opcodes never matched, causing = and < expressions to throw "Unknown expression type".


  1. Parser — distinct_all and distinct_or_null empty productions returned wrong values

File: resources/files/parser.php

  • Case 9 (distinct_all ::= — no DISTINCT/ALL in SELECT): returned [], causing isset($expr["distinct"]) to be true and prepending ALL to every SELECT. Fixed to return null.

  • Case 135 (distinct_or_null ::= — no DISTINCT in function call): returned [], causing COUNT(*) to render as COUNT(DISTINCT *). Fixed to return null.

  • Case 134 (distinct_or_null ::= DISTINCT): returned 0 (falsy), causing COUNT(DISTINCT col) to drop DISTINCT. Fixed to return true.

  • Property $output type widened from array to mixed to accommodate null and bool reduction values.


  1. Parser — Optional clauses returned [] instead of null

File: resources/files/parser.php

Cases 69, 71, 78, 83, 85, 89, 91, and 137 correspond to empty optional productions: where_clause ::=, limit_clause ::=, order_clause ::=, group_clause ::=, having_clause ::=, etc. These returned [].

The phql_ret_delete_statement, phql_ret_update_statement, and phql_ret_select_statement helpers guard with if ($x !== null) before adding keys to the AST — matching the C original's if (Z_TYPE_P(W) != IS_UNDEF). An empty array bypasses that guard, inserting "where" => [] into the AST. Query.php then calls getExpression([]) which throws "Unknown expression".

Fix: All eight cases now return null, matching cphalcon's ZVAL_UNDEF.


Test results

Tests: 136, Assertions: 780, Skipped: 1      (@group phql, MySQL)
Tests: 775, Assertions: 2970, Skipped: 106   (full database suite, MySQL)

@niden niden requested a review from Jeckerson April 9, 2026 18:52
@niden niden self-assigned this Apr 9, 2026
@niden niden added the bug Something isn't working label Apr 9, 2026
@niden niden added this to Phalcon v6 Apr 9, 2026
@github-project-automation github-project-automation Bot moved this to In progress in Phalcon v6 Apr 9, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 46.61017% with 63 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (1.0.x@84228f5). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/Parser.php 0.00% 35 Missing ⚠️
src/Scanner/Scanner.php 62.06% 22 Missing ⚠️
resources/files/parser.php 82.35% 3 Missing ⚠️
src/Parser/Status.php 60.00% 2 Missing ⚠️
src/Parser/Parser.php 66.66% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             1.0.x       #5   +/-   ##
========================================
  Coverage         ?   32.90%           
  Complexity       ?     5600           
========================================
  Files            ?        7           
  Lines            ?     8683           
  Branches         ?        0           
========================================
  Hits             ?     2857           
  Misses           ?     5826           
  Partials         ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@niden niden merged commit 2b5d521 into phalcon:1.0.x Apr 9, 2026
13 checks passed
@github-project-automation github-project-automation Bot moved this from In progress to Implemented in Phalcon v6 Apr 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

Status: Implemented

Development

Successfully merging this pull request may close these issues.

2 participants