The cut operator (`^`) is a backtracking fence. Once the expression
to its left succeeds, we are committed to the enclosing alternative;
the remainder of that alternative must then parse successfully, or
parsing fails outright instead of backtracking into another
alternative. See *Packrat Parsers Can Handle Practical Grammars in Mostly
Constant Space*, Mizushima et al.,
<https://kmizu.github.io/papers/paste513-mizushima.pdf>.
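The commitment behaviour can be sketched as a tiny ordered-choice parser. This is a hypothetical illustration, not the Reference's own tooling: `Outcome`, `lit`, `seq`, `alt`, `cut`, `without_cut`, and `with_cut` are names invented here. The key idea is a third result, `Error`, which an alternation must propagate instead of treating it as an ordinary failure.

```rust
/// Result of running a parser at some input position (hypothetical sketch).
#[derive(Debug, PartialEq)]
enum Outcome<'a> {
    /// Matched; carries the unconsumed remainder of the input.
    Match(&'a str),
    /// Did not match; an enclosing alternation may try its next alternative.
    Fail,
    /// Failed after passing a cut; no other alternative may be tried.
    Error,
}

/// Matches the literal `s` at the start of `input`.
fn lit<'a>(input: &'a str, s: &str) -> Outcome<'a> {
    match input.strip_prefix(s) {
        Some(rest) => Outcome::Match(rest),
        None => Outcome::Fail,
    }
}

/// `a b`: plain sequence; a failure of `b` is an ordinary, backtrackable failure.
fn seq<'a>(
    input: &'a str,
    a: impl Fn(&'a str) -> Outcome<'a>,
    b: impl Fn(&'a str) -> Outcome<'a>,
) -> Outcome<'a> {
    match a(input) {
        Outcome::Match(rest) => b(rest),
        other => other,
    }
}

/// `head ^ tail`: once `head` matches, a failure of `tail` becomes an `Error`.
fn cut<'a>(
    input: &'a str,
    head: impl Fn(&'a str) -> Outcome<'a>,
    tail: impl Fn(&'a str) -> Outcome<'a>,
) -> Outcome<'a> {
    match head(input) {
        Outcome::Match(rest) => match tail(rest) {
            Outcome::Fail => Outcome::Error,
            other => other,
        },
        other => other,
    }
}

/// `a | b`: ordered choice; only a plain `Fail` lets us try `b`.
fn alt<'a>(
    input: &'a str,
    a: impl Fn(&'a str) -> Outcome<'a>,
    b: impl Fn(&'a str) -> Outcome<'a>,
) -> Outcome<'a> {
    match a(input) {
        Outcome::Fail => b(input),
        other => other,
    }
}

/// `ab` `c` | `abd`: free to backtrack into the second alternative.
fn without_cut(input: &str) -> Outcome<'_> {
    alt(input, |i| seq(i, |i| lit(i, "ab"), |i| lit(i, "c")), |i| lit(i, "abd"))
}

/// `ab` ^ `c` | `abd`: after `ab` matches, we are committed to the first alternative.
fn with_cut(input: &str) -> Outcome<'_> {
    alt(input, |i| cut(i, |i| lit(i, "ab"), |i| lit(i, "c")), |i| lit(i, "abd"))
}

fn main() {
    for input in ["abc", "abd"] {
        println!("{input}: without cut {:?}, with cut {:?}", without_cut(input), with_cut(input));
    }
}
```

On `abd`, `without_cut` backtracks and matches via the second alternative, while `with_cut` reports `Error` because `ab` already matched before `c` failed. Input that fails before the cut (e.g. `xyz`) still falls through to the next alternative as usual.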
This operator solves a problem for us with C string literals. These
literals cannot contain a null escape. But if we simply fail to lex
the literal (e.g. `c"\0"`), we may instead lex it successfully as two
separate tokens (`c "\0"`), and that would be incorrect.
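The mis-lexing can be reproduced with a toy lexer. This is a hypothetical sketch, not rustc's lexer: `Token`, `LexError`, `scan_quoted`, and `lex` are invented names, escapes are left unprocessed, and only the `\0` escape is checked (other ways of writing a NUL are ignored). The `use_cut` flag toggles whether matching `c"` commits us to the C string alternative.

```rust
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Ident(&'a str),
    Str(&'a str),  // body between the quotes, escapes left unprocessed
    CStr(&'a str), // body of a `c"..."` literal
}

#[derive(Debug, PartialEq)]
enum LexError {
    NulInCStr,
    Unterminated,
    Unexpected(char),
}

/// Scans a quoted body starting just after the opening `"`.
/// Returns (body, rest after closing quote, saw a `\0` escape), or None if unterminated.
fn scan_quoted(input: &str) -> Option<(&str, &str, bool)> {
    let bytes = input.as_bytes();
    let (mut i, mut saw_nul) = (0, false);
    while i < bytes.len() {
        match bytes[i] {
            b'"' => return Some((&input[..i], &input[i + 1..], saw_nul)),
            b'\\' if i + 1 < bytes.len() => {
                saw_nul |= bytes[i + 1] == b'0';
                i += 2; // skip the escaped character
            }
            _ => i += 1,
        }
    }
    None
}

fn lex(mut input: &str, use_cut: bool) -> Result<Vec<Token<'_>>, LexError> {
    let mut tokens = Vec::new();
    while let Some(c) = input.chars().next() {
        if c.is_whitespace() {
            input = &input[c.len_utf8()..];
            continue;
        }
        // Alternative 1: C string literal. The cut sits right after `c"`.
        if let Some(after) = input.strip_prefix("c\"") {
            match (scan_quoted(after), use_cut) {
                (Some((body, rest, false)), _) => {
                    tokens.push(Token::CStr(body));
                    input = rest;
                    continue;
                }
                (Some(_), true) => return Err(LexError::NulInCStr),
                (None, true) => return Err(LexError::Unterminated),
                // Without the cut we silently fall through to the other alternatives.
                (_, false) => {}
            }
        }
        // Alternative 2: plain string literal.
        if let Some(after) = input.strip_prefix('"') {
            let (body, rest, _) = scan_quoted(after).ok_or(LexError::Unterminated)?;
            tokens.push(Token::Str(body));
            input = rest;
            continue;
        }
        // Alternative 3: identifier.
        if c.is_ascii_alphabetic() || c == '_' {
            let end = input
                .find(|ch: char| !(ch.is_ascii_alphanumeric() || ch == '_'))
                .unwrap_or(input.len());
            tokens.push(Token::Ident(&input[..end]));
            input = &input[end..];
            continue;
        }
        return Err(LexError::Unexpected(c));
    }
    Ok(tokens)
}

fn main() {
    let src = r#"c"\0""#;
    println!("{:?}", lex(src, false)); // Ok([Ident("c"), Str("\\0")])
    println!("{:?}", lex(src, true)); // Err(NulInCStr)
}
```

Without the cut, the rejected C string quietly becomes an identifier `c` followed by an ordinary string; with it, the input is reported as an error, which is the behaviour the commit describes.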
As long as we only use cut to express constraints that can be
expressed in a regular language and we keep our alternations disjoint,
the grammar can still be mechanically converted to a CFG.
Let's add the cut operator to our grammar and use it for C string
literals and some similar constructs.
In the railroad diagrams, we'll render the cut as a "no backtracking"
box around the expression or sequence of expressions after the cut.
The idea is that once you enter the box the only way out is forward.
The general format is a series of productions separated by blank lines. The expressions are as follows:
@@ -110,6 +113,7 @@ The general format is a series of productions separated by blank lines. The expr
| Prose |\<any ASCII character except CR\>| An English description of what should be matched, surrounded in angle brackets. |
| Group | (\`,\` Parameter)+ | Groups an expression for the purpose of precedence, such as applying a repetition operator to a sequence of other expressions. |
| NegativeExpression |~\[\`\` LF\]| Matches anything except the given Charset, Terminal, or Nonterminal. |
+| Cut | Expr1 ^ Expr2 \| Expr3 | The cut operator. Commits to the current alternative if the preceding expression matches. |
| Sequence |\`fn\` Name Parameters | A sequence of expressions that must match in order. |
| Alternation | Expr1 \| Expr2 | Matches only one of the given expressions, separated by the vertical pipe character. |
| Suffix |\_except \[LazyBooleanExpression\]\_| Adds a suffix to the previous expression to provide an additional English description, rendered in subscript. This can contain limited markdown, but try to avoid anything except basics like links. |
Below each grammar block is a button to toggle the display of a [syntax diagram]. A square element is a non-terminal rule, and a rounded rectangle is a terminal.