A complete bottom-up parser implementation in C++ supporting both SLR(1) and LR(1) parsing strategies, with full canonical item set construction, parsing table generation, conflict detection, parse tree output, and detailed trace files.
Team: 23i-0747 · 23i-0062 · Section A
- Overview
- Architecture
- Features
- Project Structure
- Grammar Input Format
- Sample Grammars
- Build & Run
- Output Files
- Screenshots
- Parsing Table Examples
- Parse Tree Examples
- Conflict Analysis
- SLR(1) vs LR(1) Comparison
- Known Limitations
This project implements two classical bottom-up parsing algorithms from scratch:
- SLR(1) — Simple LR with 1 token of lookahead, using FOLLOW sets to resolve reduce actions
- LR(1) — Canonical LR with 1 token of lookahead, using precise per-item lookahead sets
Both parsers share a common grammar front-end and produce identical parse trees for accepted inputs. The key difference is in conflict resolution power: LR(1) avoids many conflicts that SLR(1) cannot resolve, as demonstrated with the included grammar_with_conflict.txt.
┌─────────────────────────────────────────────────────────────┐
│ main.cpp │
│ Orchestrates all grammars, parsers, and output files │
└────────────┬───────────────────────────┬────────────────────┘
│ │
┌───────▼───────┐ ┌───────▼───────┐
│ slr_parser │ │ lr1_parser │
│ parse() │ │ parseLR1() │
└───────┬───────┘ └───────┬───────┘
│ │
└──────────┬────────────────┘
│
┌─────────▼──────────┐
│ parsing_table │
│ buildSLRTable() │
│ buildLR1Table() │
└─────────┬──────────┘
│
┌─────────▼──────────┐
│ items │
│ Canonical Item │
│ Set Construction │
│ (SLR & LR1) │
└─────────┬──────────┘
│
┌─────────▼──────────┐
│ grammar │
│ readGrammar() │
│ augmentGrammar() │
│ computeFirst() │
│ computeFollow() │
└────────────────────┘
Data flow:
grammar.cppreads and augments the grammar, then computes FIRST and FOLLOW setsitems.cppconstructs the canonical collection of LR(0) items (for SLR) and LR(1) items (for canonical LR)parsing_table.cppfills the ACTION and GOTO tables, recording any conflictsslr_parser.cpp/lr1_parser.cppdrive the parse stack looptree.cppbuilds and prints the parse tree from the node stack
- Grammar augmentation with fresh start symbol (collision-safe)
- FIRST and FOLLOW set computation
- LR(0) canonical item set construction (for SLR)
- LR(1) canonical item set construction with per-item lookahead propagation
- SLR(1) and LR(1) ACTION/GOTO table construction
- Shift-reduce and reduce-reduce conflict detection and reporting
- Parse stack simulation with step-by-step trace output
- Parse tree construction and indented text output
- Multi-grammar batch run producing 8 structured output files
- SLR(1) vs LR(1) comparison report
.
├── grammar.h / grammar.cpp # Grammar I/O, augmentation, FIRST, FOLLOW
├── items.h / items.cpp # LR(0) and LR(1) item set construction
├── parsing_table.h / parsing_table.cpp # ACTION/GOTO table construction & conflict detection
├── slr_parser.h / slr_parser.cpp # SLR(1) parse loop
├── lr1_parser.h / lr1_parser.cpp # LR(1) parse loop
├── tree.h / tree.cpp # Parse tree node and printer
├── stack.h / stack.cpp # Stack placeholder (uses STL vector internally)
├── main.cpp # Batch runner — all grammars, all outputs
├── input/
│ ├── grammar1.txt # Simple expression grammar (addition)
│ ├── grammar2.txt # Full arithmetic grammar (precedence)
│ ├── grammar3.txt # Dangling-else grammar (SLR conflict)
│ └── grammar_with_conflict.txt # Classic lvalue/rvalue grammar (SLR conflict)
├── output/ # All generated output files
└── assets/ # Screenshots for this README
- One rule per line:
NonTerminal -> alt1 | alt2 - Use
->as the arrow - Separate alternatives with
| - Non-terminals must be multi-character names starting with uppercase (e.g.,
Expr,Stmt,Factor) - Terminals are lowercase words, operators, or single symbols
- Use
epsilonfor the empty string
Expr -> Expr + Term | Term
Term -> Term * Factor | Factor
Factor -> ( Expr ) | id
Expr -> Expr + Term | Term
Term -> Factor
Factor -> id
Accepts: id, id + id, id + id + id
Expr -> Expr + Term | Term
Term -> Term * Factor | Factor
Factor -> ( Expr ) | id
Accepts: id, id + id * id, ( id + id ) * id
Stmt -> if Expr then Stmt | if Expr then Stmt else Stmt | other
Expr -> id
This grammar produces a shift-reduce conflict in SLR(1) but is handled correctly by LR(1).
Start -> L = R | R
L -> * R | id
R -> L
Demonstrates the canonical example from the Dragon Book where SLR(1) fails but LR(1) succeeds.
Requirements: g++ with C++11 support
# Build
make
# Or manually
g++ -std=c++11 -Wall -o parser \
src/main.cpp src/grammar.cpp src/items.cpp \
src/parsing_table.cpp src/slr_parser.cpp \
src/lr1_parser.cpp src/tree.cpp src/stack.cpp
# Run (processes all 4 grammars)
./parserExpected console output:
Done: grammar1.txt SLR-states=9 LR1-states=9 SLR-conflicts=0 LR1-conflicts=0
Done: grammar2.txt SLR-states=12 LR1-states=12 SLR-conflicts=0 LR1-conflicts=0
Done: grammar3.txt SLR-states=11 LR1-states=11 SLR-conflicts=1 LR1-conflicts=0
Done: grammar_with_conflict.txt SLR-states=10 LR1-states=10 SLR-conflicts=1 LR1-conflicts=0
All output files written to output/
All files are written to output/ after a single run:
| File | Description |
|---|---|
augmented_grammar.txt |
Numbered productions after augmentation |
slr_items.txt |
All LR(0) canonical item states |
slr_parsing_table.txt |
SLR(1) ACTION and GOTO tables + conflict list |
slr_trace.txt |
Step-by-step parse trace for all valid and invalid inputs |
lr1_items.txt |
All LR(1) canonical item states with lookaheads |
lr1_parsing_table.txt |
LR(1) ACTION and GOTO tables + conflict list |
lr1_trace.txt |
Step-by-step parse trace for all valid and invalid inputs |
parse_trees.txt |
Indented parse trees for all accepted inputs |
comparison.txt |
Side-by-side SLR vs LR(1) state count and conflict summary |
State 0:
ACTION: ($,r2) (id,s5)
GOTO: (Expr,1) (Factor,3) (Term,2)
State 1:
ACTION: ($,acc) (+,s6)
GOTO:
State 5:
ACTION: ($,r3) (+,r3)
GOTO:
s5→ shift to state 5r2→ reduce by production 2acc→ accept
State 0:
ACTION: (* ,s4) (id,s5)
GOTO: (L,2) (R,1) (Start,0)
State 3:
ACTION: ($,r1) ← reduce only on $, not =
GOTO:
The LR(1) table correctly restricts the reduce action to $ only, avoiding the SLR(1) conflict where FOLLOW(R) = {$, =} causes a false reduce on =.
Expr
Expr
Term
Factor
id
+
Term
Term
Factor
id
*
Factor
id
This tree correctly reflects operator precedence — * binds tighter than +.
Stmt
if
Expr
id
then
Stmt
if
Expr
id
then
Stmt
other
else
Stmt
other
The else is correctly associated with the inner if, matching standard language semantics.
SLR(1) Conflict:
In state S after parsing if Expr then Stmt, the parser sees else and must choose:
- Shift
else(associate with innerif) - Reduce using
Stmt -> if Expr then Stmt(associate with outerif)
Since else ∈ FOLLOW(Stmt), SLR(1) puts both actions in the table → shift-reduce conflict.
LR(1) Resolution:
LR(1) tracks exact lookaheads per item. In the state reached after the inner then Stmt, the reduce item carries lookahead $ and else only where it's unambiguous. The shift wins on else with full context, resolving the conflict.
SLR(1) Conflict:
After reading * and reducing L -> * R, the parser is in a state with item R -> L . and considers reducing. But FOLLOW(R) = {$, =}, so SLR places r on = as well as $. However, shifting = is also valid here → shift-reduce conflict.
LR(1) Resolution:
In LR(1), the item R -> L . , {$} carries only $ as its lookahead (derived from context — R on the RHS of Start -> L = R never appears before =). So LR(1) places reduce only on $, and shift on = — no conflict.
SLR(1) conflicts : 1 (state S, token '=', s? vs r?)
LR(1) conflicts : 0
| Grammar | SLR States | LR(1) States | SLR Conflicts | LR(1) Conflicts | Verdict |
|---|---|---|---|---|---|
| grammar1.txt | 9 | 9 | 0 | 0 | Both parse it |
| grammar2.txt | 12 | 12 | 0 | 0 | Both parse it |
| grammar3.txt | 11 | 11 | 1 | 0 | LR(1) only |
| grammar_with_conflict.txt | 10 | 10 | 1 | 0 | LR(1) only |
Key takeaways:
- For unambiguous non-conflicting grammars, SLR(1) and LR(1) produce the same number of states and identical tables
- SLR(1) uses FOLLOW sets as a blunt instrument; LR(1) uses per-item lookaheads for surgical precision
- Every SLR(1) grammar is also LR(1), but not vice versa — LR(1) strictly subsumes SLR(1)
- The extra power of LR(1) comes at the cost of potentially many more states (LALR(1) merges them, at the cost of some conflicts)
- Non-terminal names must be multi-character (single letters like
E,T,Fare not supported — they are ambiguous with terminals) - Grammar file path is hardcoded per run; edit
main.cppto change inputs - Parse tree output is indented text only — no graphical rendering
- LALR(1) is not implemented (LR(1) state merging is future work)
- Error recovery is not implemented — the parser halts on the first error








