Treat lookahead/lookbehind as zero-width in RegexGroupParser and look through non-capturing groups in getRootAlternation#5603
Open
phpstan-bot wants to merge 1 commit intophpstan:2.1.xfrom
Conversation
…ok through non-capturing groups in `getRootAlternation` - In `walkGroupAst`, skip recursion into `#lookahead`, `#negativelookahead`, `#lookbehind`, `#negativelookbehind` children since they are zero-width assertions that do not contribute characters to the match - In `isMaybeEmptyNode`, return true for lookahead/lookbehind nodes (always zero-width) and handle `#alternation` correctly: an alternation is maybe-empty if ANY branch is maybe-empty, not ALL (fixing the generic recursion which required all children to be maybe-empty) - Isolate `$isNonFalsy` per alternation branch in `isMaybeEmptyNode` to prevent a non-falsy literal in one branch from incorrectly marking the entire alternation as non-falsy - In `getRootAlternation`, look through `#noncapturing` groups to find the alternation inside, enabling union type creation for patterns like `((?:|\d+))` → `''|numeric-string`
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Regex capturing groups containing lookahead/lookbehind assertions or empty alternation branches inside non-capturing groups were incorrectly inferred as
non-empty-stringornon-falsy-stringwhen they could match empty strings. This fix teachesRegexGroupParserthat lookahead/lookbehind assertions are zero-width (don't consume characters) and that alternation branches should use "any" semantics for maybe-empty checks.Changes
src/Type/Regex/RegexGroupParser.php:walkGroupAst: Added early return for#lookahead,#negativelookahead,#lookbehind,#negativelookbehindnodes to prevent their content from influencingnonEmpty/nonFalsy/numericstateisMaybeEmptyNode: Added early returntruefor lookahead/lookbehind nodes (zero-width assertions). Added special handling for#alternationnodes: an alternation is maybe-empty if ANY branch is maybe-empty (previously used the generic recursion which required ALL children to be maybe-empty). Isolated$isNonFalsyper alternation branch to prevent cross-branch leakagegetRootAlternation: Extended to look through#noncapturinggroups for both#capturingand#namedcapturinggroups, enabling union type creation for patterns like((?:|\d+))→''|numeric-stringtests/PHPStan/Analyser/nsrt/preg_match_shapes.php:bug12840test function with 9 test cases covering: empty alternation in non-capturing group, lookahead/lookbehind/negative variants in alternation, lone lookahead in group, two non-capturing groups with empty alternation, named capturing groups with non-capturing wrappers, and lookahead in concatenation with literalRoot cause
Three interrelated issues in
RegexGroupParser:Lookahead/lookbehind treated as character-consuming:
walkGroupAstrecursed into their children, causing the content of zero-width assertions (e.g., thexin(?=x)) to setnonEmpty=yesas if the assertion consumed characters.isMaybeEmptyNodesimilarly didn't recognize them as zero-width.isMaybeEmptyNodeused wrong semantics for alternation: The generic recursion checked if ALL children were maybe-empty (correct for concatenation) but alternation only needs ANY branch to be maybe-empty. This meant(?:(?=x)|y)was considered non-empty because theybranch was non-empty, even though the(?=x)branch is zero-width.getRootAlternationdidn't see through non-capturing groups: Patterns like((?:|\d+))couldn't use the union-type code path (which creates''|numeric-string) because the alternation was wrapped in a#noncapturingnode.Test
bug12840function intests/PHPStan/Analyser/nsrt/preg_match_shapes.phpwith 9assertTypecalls covering:((?:|\d+))→''|numeric-string(?=...),(?<=...),(?!...),(?<!...)→string(notnon-falsy-string)((?=x))→string(notnon-empty-string)(?P<g>(?:|\d+))→''|numeric-string((?=x)a)→non-falsy-string(correctly preserved)Fixes phpstan/phpstan#12840