22 commits
f97c6f4
Add support PSLR(1) parser generation
ydah Jan 2, 2026
e033a28
Add `yy_state_accepts_token` function and `YYSETSTATE_CONTEXT` macro …
ydah Mar 7, 2026
169b980
Wire PSLR generation into parser output
ydah Mar 8, 2026
4fc9591
Fail fast on unresolved PSLR inadequacies
ydah Mar 8, 2026
2532866
Split PSLR states by scanner profile
ydah Mar 8, 2026
281cf6b
Track PSLR inadequacies with full propagated lookaheads
ydah Mar 8, 2026
acef81a
Fail before emitting unresolved PSLR parsers
ydah Mar 8, 2026
3c3ed30
Handle PSLR pure-reduce scanner profiles
ydah Mar 8, 2026
405c733
Cover PSLR split paths end to end
ydah Mar 8, 2026
7f61df0
Finish chained PSLR state splits
ydah Mar 8, 2026
0107486
Add chained PSLR shift-angle regressions
ydah Mar 8, 2026
c9d06a8
Add mixed PSLR family regressions
ydah Mar 8, 2026
708a126
Add table-driven PSLR family coverage
ydah Mar 8, 2026
0ed2b8b
Add PSLR lexer bridge macros
ydah Mar 8, 2026
f93774e
Add PSLR growth diagnostics
ydah Mar 8, 2026
0025f78
Automate PSLR family exploration
ydah Mar 8, 2026
510b37a
Add PSLR function declarations block before tables and functions in y…
ydah Mar 8, 2026
1097230
Add `pslr_item_lookahead_set` to State for PSLR-specific lookaheads, …
ydah Mar 8, 2026
164fbca
Fix spec stubs and grammar fixtures to match renamed `pslr_scanner_en…
ydah Mar 8, 2026
f07e454
Add `YY_DECL` definition to PSLR integration fixture grammars to decl…
ydah Mar 8, 2026
f149937
Add PSLR-related RBS signatures, fix `replace_term_attributes` keywor…
ydah Mar 8, 2026
e344991
Enhance PSLR(1) parser support in documentation with new directives a…
ydah Mar 8, 2026
39 changes: 39 additions & 0 deletions NEWS.md
@@ -40,6 +40,45 @@ program: args_list(f_opt(number), opt_tail(string), number)

https://github.com/ruby/lrama/pull/779

### [EXPERIMENTAL] Support the generation of the PSLR(1) parser described in this dissertation

Added experimental support for generating the PSLR(1) parser described in this dissertation.
https://open.clemson.edu/all_dissertations/519/

This adds the following PSLR-related grammar directives and integration points:

- `%define lr.type pslr` enables PSLR parser generation
- `%token-pattern` declares token candidates and their regular expressions for PSLR-aware lexical disambiguation
- `%lex-prec` declares how overlapping token patterns should be prioritized
- `%define api.pslr.state-member` names the parser-state field to be shared with the lexer when using the generated helper macros

Typical usage looks like this:

```yacc
%define api.pure
%define lr.type pslr
%define api.pslr.state-member current_state

%parse-param {struct parse_params *p}
%lex-param {struct parse_params *p}

%token-pattern RSHIFT />>/ "right shift"
%token-pattern RANGLE />/ "right angle"
%token-pattern ID /[a-z]+/

%lex-prec RANGLE -s RSHIFT
```

In this setup, `%token-pattern` lists the tokens that the PSLR scanner should consider, and `%lex-prec`
resolves conflicts between overlapping matches. For example, `%lex-prec RANGLE -s RSHIFT` tells Lrama to
prefer the shorter `RANGLE` match (`>`) over the longer `RSHIFT` match (`>>`) when both patterns match the input.
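The resolution step can be pictured with a small Ruby sketch. This is a hypothetical model, not Lrama's scanner: it tries every pattern at the current position, then picks a winner using `%lex-prec` triples written with the operator symbols from `Lrama::Grammar::LexPrec` (`:higher`, `:shorter`). The fallback to the longest match (and to the earlier-defined pattern on a tie) is an assumption, `:same` (lex-tie) is not modeled, and the parser state is ignored entirely.

```ruby
require "strscan"

# Hypothetical patterns mirroring the %token-pattern example (definition order).
PATTERNS = { "RSHIFT" => />>/, "RANGLE" => />/, "ID" => /[a-z]+/ }.freeze

# %lex-prec RANGLE -s RSHIFT as a [left, operator, right] triple.
RULES = [["RANGLE", :shorter, "RSHIFT"]].freeze

# All [name, lexeme] pairs whose pattern matches at the start of +input+.
def candidates(input)
  ss = StringScanner.new(input)
  PATTERNS.filter_map do |name, re|
    text = ss.check(re) # anchored at the scanner position, does not advance
    [name, text] if text
  end
end

# Should candidate +a+ win over candidate +b+? (assumed policy, see above)
def prefer?(a, b, rules)
  a_name, a_text = a
  b_name, b_text = b
  return true  if rules.include?([a_name, :higher, b_name])
  return false if rules.include?([b_name, :higher, a_name])
  return true  if rules.include?([a_name, :shorter, b_name]) && a_text.length < b_text.length
  return false if rules.include?([b_name, :shorter, a_name]) && b_text.length < a_text.length

  a_text.length > b_text.length # default: longest match, earlier pattern on ties
end

# Returns the winning [name, lexeme] pair, or nil if nothing matches.
def resolve(input, rules)
  candidates(input).reduce { |best, cand| prefer?(cand, best, rules) ? cand : best }
end

p resolve(">>", [])    # => ["RSHIFT", ">>"]
p resolve(">>", RULES) # => ["RANGLE", ">"]
```

Without the rule, plain longest match picks `RSHIFT`; the `-s` rule flips the choice to `RANGLE`.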

When the parser and lexer share a context through `%parse-param` / `%lex-param`, the generated header also
provides helpers such as `YYPSLR_PSEUDO_SCAN(...)`, so the lexer can choose a token based on the current parser
state.
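Why the parser state matters can be shown with another Ruby sketch: the candidate tokens are first restricted to those the current parser state can accept, and only then is the longest match taken. The state numbers, the `ACCEPTS` table, and the `state_accepts_token?` and `pseudo_scan` methods are hypothetical stand-ins; they are not the generated `YYPSLR_PSEUDO_SCAN(...)` macro or Lrama's parse tables.

```ruby
require "strscan"

# Hypothetical patterns; Hash order is the definition order.
PATTERNS = { "RSHIFT" => />>/, "RANGLE" => />/, "ID" => /[a-z]+/ }.freeze

# Hypothetical acceptable-token sets per parser state. A generated parser
# would derive this information from its tables instead.
ACCEPTS = {
  3 => %w[RANGLE ID],        # e.g. while closing nested angle brackets
  7 => %w[RSHIFT RANGLE ID]  # e.g. after an expression operand
}.freeze

def state_accepts_token?(state, name)
  ACCEPTS.fetch(state, []).include?(name)
end

# Scans one token at the start of +input+ for +state+: patterns the state
# cannot accept are skipped, then the longest remaining match wins.
def pseudo_scan(input, state)
  ss = StringScanner.new(input)
  matches = PATTERNS.filter_map do |name, re|
    next unless state_accepts_token?(state, name)

    text = ss.check(re)
    [name, text] if text
  end
  matches.max_by { |_name, text| text.length }
end

p pseudo_scan(">>", 3) # => ["RANGLE", ">"]
p pseudo_scan(">>", 7) # => ["RSHIFT", ">>"]
```

The same input `>>` scans as `>` in a state that cannot accept `RSHIFT`, and as `>>` in a state that accepts both.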

PSLR(1) support is still experimental. If you find any bugs, please report them to us. Thank you.

## Lrama 0.7.1 (2025-12-24)

### Optimize IELR
2 changes: 2 additions & 0 deletions lib/lrama.rb
@@ -15,8 +15,10 @@
require_relative "lrama/output"
require_relative "lrama/parser"
require_relative "lrama/reporter"
require_relative "lrama/scanner_fsa"
require_relative "lrama/state"
require_relative "lrama/states"
require_relative "lrama/length_precedences"
require_relative "lrama/tracer"
require_relative "lrama/version"
require_relative "lrama/warnings"
8 changes: 6 additions & 2 deletions lib/lrama/command.rb
@@ -30,11 +30,11 @@ def execute_command_workflow
text = read_input
grammar = build_grammar(text)
states, context = compute_status(grammar)
states.validate!(@logger)
render_reports(states) if @options.report_file
@tracer.trace(grammar)
render_diagram(grammar)
render_output(context, grammar)
states.validate!(@logger)
@warnings.warn(grammar, states)
end

@@ -84,7 +84,11 @@ def prepare_grammar(grammar)
def compute_status(grammar)
states = Lrama::States.new(grammar, @tracer)
states.compute
states.compute_ielr if grammar.ielr_defined?
if grammar.pslr_defined?
states.compute_pslr
elsif grammar.ielr_defined?
states.compute_ielr
end
[states, Lrama::Context.new(states)]
end

111 changes: 111 additions & 0 deletions lib/lrama/grammar.rb
@@ -20,6 +20,8 @@
require_relative "grammar/symbols"
require_relative "grammar/type"
require_relative "grammar/union"
require_relative "grammar/token_pattern"
require_relative "grammar/lex_prec"
require_relative "lexer"

module Lrama
@@ -40,6 +42,11 @@ class Grammar
# def nterms: () -> Array[Grammar::Symbol]
# def find_symbol_by_s_value!: (::String s_value) -> Grammar::Symbol
# def ielr_defined?: () -> bool
# def pslr_defined?: () -> bool
# def token_patterns: () -> Array[Grammar::TokenPattern]
# def lex_prec: () -> Grammar::LexPrec
# def pslr_max_states: () -> Integer?
# def pslr_max_state_ratio: () -> Float?
# end
#
# include Symbols::Resolver::_DelegatedMethods
@@ -68,6 +75,8 @@ class Grammar
# @union: Union
# @precedences: Array[Precedence]
# @start_nterm: Lrama::Lexer::Token::Base?
# @token_patterns: Array[Grammar::TokenPattern]
# @lex_prec: Grammar::LexPrec

extend Forwardable

@@ -100,6 +109,8 @@ class Grammar
attr_accessor :locations #: bool
attr_accessor :define #: Hash[String, String]
attr_accessor :required #: bool
attr_reader :token_patterns #: Array[Grammar::TokenPattern]
attr_reader :lex_prec #: Grammar::LexPrec

def_delegators "@symbols_resolver", :symbols, :nterms, :terms, :add_nterm, :add_term, :find_term_by_s_value,
:find_symbol_by_number!, :find_symbol_by_id!, :token_to_symbol,
@@ -133,6 +144,9 @@ def initialize(rule_counter, locations, define = {})
@required = false
@precedences = []
@start_nterm = nil
@token_patterns = []
@lex_prec = Grammar::LexPrec.new
@token_pattern_counter = 0

append_special_symbols
end
@@ -277,6 +291,7 @@ def validate!
validate_no_precedence_for_nterm!
validate_rule_lhs_is_nterm!
validate_duplicated_precedence!
validate_pslr_configuration!
end

# @rbs (Grammar::Symbol sym) -> Array[Rule]
@@ -304,8 +319,104 @@ def ielr_defined?
@define.key?('lr.type') && @define['lr.type'] == 'ielr'
end

# @rbs () -> bool
def pslr_defined?
@define.key?('lr.type') && @define['lr.type'] == 'pslr'
end

# @rbs () -> String?
def pslr_state_member
@define['api.pslr.state-member']
end

# @rbs () -> Integer?
def pslr_max_states
parse_pslr_positive_integer('pslr.max-states')
end

# @rbs () -> Float?
def pslr_max_state_ratio
parse_pslr_positive_float('pslr.max-state-ratio')
end

# Add a token pattern from %token-pattern directive
# @rbs (id: Lexer::Token::Ident, pattern: Lexer::Token::Regex, ?alias_name: String?, ?tag: Lexer::Token::Tag?, lineno: Integer) -> Grammar::TokenPattern
def add_token_pattern(id:, pattern:, alias_name: nil, tag: nil, lineno:)
token_pattern = Grammar::TokenPattern.new(
id: id,
pattern: pattern,
alias_name: alias_name,
tag: tag,
lineno: lineno,
definition_order: @token_pattern_counter
)
@token_pattern_counter += 1
@token_patterns << token_pattern

# Also register as a terminal symbol
add_term(id: id, alias_name: alias_name, tag: tag)

token_pattern
end

# Add a lex-prec rule from %lex-prec directive
# @rbs (left_token: Lexer::Token::Ident, operator: Symbol, right_token: Lexer::Token::Ident, lineno: Integer) -> Grammar::LexPrec::Rule
def add_lex_prec_rule(left_token:, operator:, right_token:, lineno:)
@lex_prec.add_rule(
left_token: left_token,
operator: operator,
right_token: right_token,
lineno: lineno
)
end

# Find a token pattern by its name
# @rbs (String name) -> Grammar::TokenPattern?
def find_token_pattern(name)
@token_patterns.find { |tp| tp.name == name }
end

private

# @rbs () -> void
def validate_pslr_configuration!
return unless pslr_defined?

member = pslr_state_member
if member && member !~ /\A[a-zA-Z_][a-zA-Z0-9_]*\z/
raise %(%define api.pslr.state-member must be a valid C identifier, got "#{member}".)
end

pslr_max_states
pslr_max_state_ratio
end

# @rbs (String key) -> Integer?
def parse_pslr_positive_integer(key)
value = @define[key]
return nil if value.nil? || value.empty?

parsed = Integer(value, 10)
raise %(%define #{key} must be greater than 0, got "#{value}".) unless 0 < parsed

parsed
rescue ArgumentError
raise %(%define #{key} must be an integer, got "#{value}".)
end

# @rbs (String key) -> Float?
def parse_pslr_positive_float(key)
value = @define[key]
return nil if value.nil? || value.empty?

parsed = Float(value)
raise %(%define #{key} must be greater than or equal to 1.0, got "#{value}".) unless 1.0 <= parsed

parsed
rescue ArgumentError
raise %(%define #{key} must be a number, got "#{value}".)
end

# @rbs () -> void
def sort_precedence
@precedences.sort_by! do |prec|
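For reference, the values accepted by `%define pslr.max-states` and `%define pslr.max-state-ratio` can be checked with standalone copies of `parse_pslr_positive_integer` and `parse_pslr_positive_float` from the hunk above. The method names `positive_integer`, `positive_float`, and `error_for` are hypothetical; the copies take the raw `%define` string as an argument instead of reading `@define`.

```ruby
# Standalone copy of parse_pslr_positive_integer (takes +value+ directly).
def positive_integer(key, value)
  return nil if value.nil? || value.empty?

  parsed = Integer(value, 10)
  raise %(%define #{key} must be greater than 0, got "#{value}".) unless 0 < parsed

  parsed
rescue ArgumentError
  # Only Integer() raises ArgumentError; the RuntimeError above passes through.
  raise %(%define #{key} must be an integer, got "#{value}".)
end

# Standalone copy of parse_pslr_positive_float.
def positive_float(key, value)
  return nil if value.nil? || value.empty?

  parsed = Float(value)
  raise %(%define #{key} must be greater than or equal to 1.0, got "#{value}".) unless 1.0 <= parsed

  parsed
rescue ArgumentError
  raise %(%define #{key} must be a number, got "#{value}".)
end

# Hypothetical helper: returns the error message, or nil if no error is raised.
def error_for
  yield
  nil
rescue RuntimeError => e
  e.message
end

p positive_integer("pslr.max-states", "500") # => 500
p positive_integer("pslr.max-states", nil)   # => nil (not configured)
puts error_for { positive_integer("pslr.max-states", "0") }
# %define pslr.max-states must be greater than 0, got "0".
puts error_for { positive_integer("pslr.max-states", "1.5") }
# %define pslr.max-states must be an integer, got "1.5".
p positive_float("pslr.max-state-ratio", "2.5") # => 2.5
puts error_for { positive_float("pslr.max-state-ratio", "0.5") }
# %define pslr.max-state-ratio must be greater than or equal to 1.0, got "0.5".
```

Because `validate_pslr_configuration!` calls both readers, a malformed value is reported while the grammar is validated rather than later.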
98 changes: 98 additions & 0 deletions lib/lrama/grammar/lex_prec.rb
@@ -0,0 +1,98 @@
# rbs_inline: enabled
# frozen_string_literal: true

module Lrama
class Grammar
# Represents lexical precedence rules defined by %lex-prec directive
# Based on Definition 3.2.3, 3.2.4, 3.2.10 from the PSLR dissertation
#
# Example: %lex-prec RANGLE -s RSHIFT # RANGLE is shorter than RSHIFT
# %lex-prec IF - ID # IF has higher priority than ID (same length)
class LexPrec
# Precedence relation types
# "," : Same priority (lex-tie)
# "-" : Left has higher priority than right
# "-s" : Left (the shorter match) has priority over right
SAME_PRIORITY = :same #: Symbol
HIGHER = :higher #: Symbol
SHORTER = :shorter #: Symbol

# Represents a single precedence rule
class Rule
attr_reader :left_token #: Lexer::Token::Ident
attr_reader :operator #: Symbol
attr_reader :right_token #: Lexer::Token::Ident
attr_reader :lineno #: Integer

# @rbs (left_token: Lexer::Token::Ident, operator: Symbol, right_token: Lexer::Token::Ident, lineno: Integer) -> void
def initialize(left_token:, operator:, right_token:, lineno:)
@left_token = left_token
@operator = operator
@right_token = right_token
@lineno = lineno
end

# @rbs () -> String
def left_name
@left_token.s_value
end

# @rbs () -> String
def right_name
@right_token.s_value
end
end

attr_reader :rules #: Array[Rule]

# @rbs () -> void
def initialize
@rules = []
end

# @rbs (left_token: Lexer::Token::Ident, operator: Symbol, right_token: Lexer::Token::Ident, lineno: Integer) -> Rule
def add_rule(left_token:, operator:, right_token:, lineno:)
rule = Rule.new(
left_token: left_token,
operator: operator,
right_token: right_token,
lineno: lineno
)
@rules << rule
rule
end

# Check if token t1 has higher priority than t2
# Based on Definition 3.2.4
# @rbs (String t1, String t2) -> bool
def higher_priority?(t1, t2)
@rules.any? do |rule|
rule.operator == HIGHER &&
rule.left_name == t1 &&
rule.right_name == t2
end
end

# Check if token t1 has shorter-match priority over t2
# Based on Definition 3.2.15
# @rbs (String t1, String t2) -> bool
def shorter_priority?(t1, t2)
@rules.any? do |rule|
rule.operator == SHORTER &&
rule.left_name == t1 &&
rule.right_name == t2
end
end

# Check if tokens t1 and t2 are in a lex-tie relationship
# @rbs (String t1, String t2) -> bool
def same_priority?(t1, t2)
@rules.any? do |rule|
rule.operator == SAME_PRIORITY &&
((rule.left_name == t1 && rule.right_name == t2) ||
(rule.left_name == t2 && rule.right_name == t1))
end
end
end
end
end
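The three `LexPrec` queries differ in direction: `higher_priority?` and `shorter_priority?` only match a rule written as `t1 OP t2`, while `same_priority?` also matches the rule written the other way round. The Ruby sketch below illustrates this with plain strings instead of `Lexer::Token::Ident`. The `parse_lex_prec` helper and its whitespace-separated `LEFT OP RIGHT` input format are assumptions for illustration (not Lrama's grammar parser), and `TYPE_NAME` is a hypothetical token name.

```ruby
# Operator spellings documented in the LexPrec comments, mapped to its symbols.
OPERATORS = { "," => :same, "-" => :higher, "-s" => :shorter }.freeze

Rule = Struct.new(:left, :operator, :right)

# Hypothetical helper: parses one "LEFT OP RIGHT" triple separated by whitespace.
def parse_lex_prec(body)
  left, op, right = body.split
  operator = OPERATORS.fetch(op) { raise ArgumentError, "unknown lex-prec operator: #{op.inspect}" }
  Rule.new(left, operator, right)
end

# Same shape as LexPrec#higher_priority?, #shorter_priority?, and
# #same_priority?: directional for :higher/:shorter, symmetric for :same.
def holds?(rules, operator, t1, t2)
  rules.any? do |r|
    next false unless r.operator == operator

    forward = r.left == t1 && r.right == t2
    operator == :same ? forward || (r.left == t2 && r.right == t1) : forward
  end
end

rules = ["RANGLE -s RSHIFT", "IF - ID", "TYPE_NAME , ID"].map { |s| parse_lex_prec(s) }
p holds?(rules, :shorter, "RANGLE", "RSHIFT") # => true
p holds?(rules, :shorter, "RSHIFT", "RANGLE") # => false
p holds?(rules, :same, "ID", "TYPE_NAME")     # => true
```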
16 changes: 13 additions & 3 deletions lib/lrama/grammar/symbols/resolver.rb
@@ -52,15 +52,17 @@ def sort_by_number!
def add_term(id:, alias_name: nil, tag: nil, token_id: nil, replace: false)
if token_id && (sym = find_symbol_by_token_id(token_id))
if replace
sym.id = id
sym.alias_name = alias_name
sym.tag = tag
replace_term_attributes(sym, id: id, alias_name: alias_name, tag: tag, token_id: token_id)
end

return sym
end

if (sym = find_symbol_by_id(id))
if replace
replace_term_attributes(sym, id: id, alias_name: alias_name, tag: tag, token_id: token_id)
end

return sym
end

@@ -229,6 +231,14 @@ def find_nterm_by_id!(id)
end || (raise "Symbol not found. #{id}")
end

# @rbs (Grammar::Symbol sym, id: Lexer::Token::Base, ?alias_name: String?, ?tag: Lexer::Token::Tag?, ?token_id: Integer?) -> void
def replace_term_attributes(sym, id:, alias_name: nil, tag: nil, token_id: nil)
sym.id = id
sym.alias_name = alias_name
sym.tag = tag
sym.token_id = token_id if token_id
end

# @rbs () -> void
def fill_terms_number
# Character literal in grammar file has