SLR(1) & LR(1) Parser — CS4031 Compiler Construction

A complete bottom-up parser implementation in C++ supporting both SLR(1) and LR(1) parsing strategies, with full canonical item set construction, parsing table generation, conflict detection, parse tree output, and detailed trace files.

Team: 23i-0747 · 23i-0062 · Section A

Overview

This project implements two classical bottom-up parsing algorithms from scratch:

SLR(1) — Simple LR with 1 token of lookahead, using FOLLOW sets to resolve reduce actions
LR(1) — Canonical LR with 1 token of lookahead, using precise per-item lookahead sets

Both parsers share a common grammar front-end and produce identical parse trees for accepted inputs. The key difference is in conflict resolution power: LR(1) avoids many conflicts that SLR(1) cannot resolve, as demonstrated with the included grammar_with_conflict.txt.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                        main.cpp                             │
│  Orchestrates all grammars, parsers, and output files       │
└────────────┬───────────────────────────┬────────────────────┘
             │                           │
     ┌───────▼───────┐           ┌───────▼───────┐
     │  slr_parser   │           │  lr1_parser   │
     │  parse()      │           │  parseLR1()   │
     └───────┬───────┘           └───────┬───────┘
             │                           │
             └──────────┬────────────────┘
                        │
              ┌─────────▼──────────┐
              │   parsing_table    │
              │  buildSLRTable()   │
              │  buildLR1Table()   │
              └─────────┬──────────┘
                        │
              ┌─────────▼──────────┐
              │      items         │
              │  Canonical Item    │
              │  Set Construction  │
              │  (SLR & LR1)       │
              └─────────┬──────────┘
                        │
              ┌─────────▼──────────┐
              │      grammar       │
              │  readGrammar()     │
              │  augmentGrammar()  │
              │  computeFirst()    │
              │  computeFollow()   │
              └────────────────────┘

Data flow:

grammar.cpp reads and augments the grammar, then computes FIRST and FOLLOW sets
items.cpp constructs the canonical collection of LR(0) items (for SLR) and LR(1) items (for canonical LR)
parsing_table.cpp fills the ACTION and GOTO tables, recording any conflicts
slr_parser.cpp / lr1_parser.cpp drive the parse stack loop
tree.cpp builds and prints the parse tree from the node stack

Features

Grammar augmentation with fresh start symbol (collision-safe)
FIRST and FOLLOW set computation
LR(0) canonical item set construction (for SLR)
LR(1) canonical item set construction with per-item lookahead propagation
SLR(1) and LR(1) ACTION/GOTO table construction
Shift-reduce and reduce-reduce conflict detection and reporting
Parse stack simulation with step-by-step trace output
Parse tree construction and indented text output
Multi-grammar batch run producing 8 structured output files
SLR(1) vs LR(1) comparison report

Project Structure

.
├── grammar.h / grammar.cpp         # Grammar I/O, augmentation, FIRST, FOLLOW
├── items.h / items.cpp             # LR(0) and LR(1) item set construction
├── parsing_table.h / parsing_table.cpp  # ACTION/GOTO table construction & conflict detection
├── slr_parser.h / slr_parser.cpp   # SLR(1) parse loop
├── lr1_parser.h / lr1_parser.cpp   # LR(1) parse loop
├── tree.h / tree.cpp               # Parse tree node and printer
├── stack.h / stack.cpp             # Stack placeholder (uses STL vector internally)
├── main.cpp                        # Batch runner — all grammars, all outputs
├── input/
│   ├── grammar1.txt                # Simple expression grammar (addition)
│   ├── grammar2.txt                # Full arithmetic grammar (precedence)
│   ├── grammar3.txt                # Dangling-else grammar (SLR conflict)
│   └── grammar_with_conflict.txt  # Classic lvalue/rvalue grammar (SLR conflict)
├── output/                         # All generated output files
└── assets/                         # Screenshots for this README

Grammar Input Format

One rule per line: NonTerminal -> alt1 | alt2
Use -> as the arrow
Separate alternatives with |
Non-terminals must be multi-character names starting with uppercase (e.g., Expr, Stmt, Factor)
Terminals are lowercase words, operators, or single symbols
Use epsilon for the empty string

Expr -> Expr + Term | Term
Term -> Term * Factor | Factor
Factor -> ( Expr ) | id

Sample Grammars

Grammar 1 — Simple Addition

Expr -> Expr + Term | Term
Term -> Factor
Factor -> id

Accepts: id, id + id, id + id + id

Grammar 2 — Full Arithmetic (with precedence)

Expr -> Expr + Term | Term
Term -> Term * Factor | Factor
Factor -> ( Expr ) | id

Accepts: id, id + id * id, ( id + id ) * id

Grammar 3 — Dangling Else (classic SLR conflict)

Stmt -> if Expr then Stmt | if Expr then Stmt else Stmt | other
Expr -> id

This grammar produces a shift-reduce conflict in SLR(1) but is handled correctly by LR(1).

Grammar 4 — Lvalue/Rvalue (SLR conflict, LR(1) clean)

Start -> L = R | R
L -> * R | id
R -> L

Demonstrates the canonical example from the Dragon Book where SLR(1) fails but LR(1) succeeds.

Build & Run

Requirements: g++ with C++11 support

# Build
make

# Or manually
g++ -std=c++11 -Wall -o parser \
    src/main.cpp src/grammar.cpp src/items.cpp \
    src/parsing_table.cpp src/slr_parser.cpp \
    src/lr1_parser.cpp src/tree.cpp src/stack.cpp

# Run (processes all 4 grammars)
./parser

Expected console output:

Done: grammar1.txt              SLR-states=9   LR1-states=9   SLR-conflicts=0  LR1-conflicts=0
Done: grammar2.txt              SLR-states=12  LR1-states=12  SLR-conflicts=0  LR1-conflicts=0
Done: grammar3.txt              SLR-states=11  LR1-states=11  SLR-conflicts=1  LR1-conflicts=0
Done: grammar_with_conflict.txt SLR-states=10  LR1-states=10  SLR-conflicts=1  LR1-conflicts=0

All output files written to output/

Output Files

All files are written to output/ after a single run:

File	Description
`augmented_grammar.txt`	Numbered productions after augmentation
`slr_items.txt`	All LR(0) canonical item states
`slr_parsing_table.txt`	SLR(1) ACTION and GOTO tables + conflict list
`slr_trace.txt`	Step-by-step parse trace for all valid and invalid inputs
`lr1_items.txt`	All LR(1) canonical item states with lookaheads
`lr1_parsing_table.txt`	LR(1) ACTION and GOTO tables + conflict list
`lr1_trace.txt`	Step-by-step parse trace for all valid and invalid inputs
`parse_trees.txt`	Indented parse trees for all accepted inputs
`comparison.txt`	Side-by-side SLR vs LR(1) state count and conflict summary

Screenshots

Console Output

SLR(1) Item States — Grammar 2

LR(1) Item States with Lookaheads — Grammar 4

SLR(1) Parsing Table — Grammar 1

Parse Trace — Valid Input

Parse Trace — Invalid Input

Parse Tree — Grammar 2

Conflict Detection — SLR(1) vs LR(1) on Grammar 4

Comparison Report

Parsing Table Examples

SLR(1) Table — Grammar 1 (excerpt)

State 0:
  ACTION: ($,r2) (id,s5)
  GOTO:   (Expr,1) (Factor,3) (Term,2)

State 1:
  ACTION: ($,acc) (+,s6)
  GOTO:

State 5:
  ACTION: ($,r3) (+,r3)
  GOTO:

s5 → shift to state 5
r2 → reduce by production 2
acc → accept

LR(1) Table — Grammar 4 (excerpt, no conflicts)

State 0:
  ACTION: (* ,s4) (id,s5)
  GOTO:   (L,2) (R,1) (Start,0)

State 3:
  ACTION: ($,r1)        ← reduce only on $, not =
  GOTO:

The LR(1) table correctly restricts the reduce action to $ only, avoiding the SLR(1) conflict where FOLLOW(R) = {$, =} causes a false reduce on =.

Parse Tree Examples

Input: `id + id * id` (Grammar 2)

Expr
  Expr
    Term
      Factor
        id
  +
  Term
    Term
      Factor
        id
    *
    Factor
      id

This tree correctly reflects operator precedence — * binds tighter than +.

Input: `if id then if id then other else other` (Grammar 3)

Stmt
  if
  Expr
    id
  then
  Stmt
    if
    Expr
      id
    then
    Stmt
      other
    else
    Stmt
      other

The else is correctly associated with the inner if, matching standard language semantics.

Conflict Analysis

Grammar 3 — Dangling Else

SLR(1) Conflict:

In state S after parsing if Expr then Stmt, the parser sees else and must choose:

Shift else (associate with inner if)
Reduce using Stmt -> if Expr then Stmt (associate with outer if)

Since else ∈ FOLLOW(Stmt), SLR(1) puts both actions in the table → shift-reduce conflict.

LR(1) Resolution:

LR(1) tracks exact lookaheads per item. In the state reached after the inner then Stmt, the reduce item carries lookahead $ and else only where it's unambiguous. The shift wins on else with full context, resolving the conflict.

Grammar 4 — Lvalue/Rvalue (Dragon Book Example)

SLR(1) Conflict:

After reading * and reducing L -> * R, the parser is in a state with item R -> L . and considers reducing. But FOLLOW(R) = {$, =}, so SLR places r on = as well as $. However, shifting = is also valid here → shift-reduce conflict.

LR(1) Resolution:

In LR(1), the item R -> L . , {$} carries only $ as its lookahead (derived from context — R on the RHS of Start -> L = R never appears before =). So LR(1) places reduce only on $, and shift on = — no conflict.

SLR(1) conflicts : 1   (state S, token '=', s? vs r?)
LR(1)  conflicts : 0

SLR(1) vs LR(1) Comparison

Grammar	SLR States	LR(1) States	SLR Conflicts	Verdict
grammar1.txt	9	9	0	Both parse it
grammar2.txt	12	12	0	Both parse it
grammar3.txt	11	11	1	LR(1) only
grammar_with_conflict.txt	10	10	1	LR(1) only

Key takeaways:

For unambiguous non-conflicting grammars, SLR(1) and LR(1) produce the same number of states and identical tables
SLR(1) uses FOLLOW sets as a blunt instrument; LR(1) uses per-item lookaheads for surgical precision
Every SLR(1) grammar is also LR(1), but not vice versa — LR(1) strictly subsumes SLR(1)
The extra power of LR(1) comes at the cost of potentially many more states (LALR(1) merges them, at the cost of some conflicts)

Known Limitations

Non-terminal names must be multi-character (single letters like E, T, F are not supported — they are ambiguous with terminals)
Grammar file path is hardcoded per run; edit main.cpp to change inputs
Parse tree output is indented text only — no graphical rendering
LALR(1) is not implemented (LR(1) state merging is future work)
Error recovery is not implemented — the parser halts on the first error

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
assets		assets
docs		docs
input		input
output		output
src		src
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

SLR(1) & LR(1) Parser — CS4031 Compiler Construction

Table of Contents

Overview

Architecture

Features

Project Structure

Grammar Input Format

Sample Grammars

Grammar 1 — Simple Addition

Grammar 2 — Full Arithmetic (with precedence)

Grammar 3 — Dangling Else (classic SLR conflict)

Grammar 4 — Lvalue/Rvalue (SLR conflict, LR(1) clean)

Build & Run

Output Files

Screenshots

Console Output

SLR(1) Item States — Grammar 2

LR(1) Item States with Lookaheads — Grammar 4

SLR(1) Parsing Table — Grammar 1

Parse Trace — Valid Input

Parse Trace — Invalid Input

Parse Tree — Grammar 2

Conflict Detection — SLR(1) vs LR(1) on Grammar 4

Comparison Report

Parsing Table Examples

SLR(1) Table — Grammar 1 (excerpt)

LR(1) Table — Grammar 4 (excerpt, no conflicts)

Parse Tree Examples

Input: id + id * id (Grammar 2)

Input: if id then if id then other else other (Grammar 3)

Conflict Analysis

Grammar 3 — Dangling Else

Grammar 4 — Lvalue/Rvalue (Dragon Book Example)

SLR(1) vs LR(1) Comparison

Known Limitations

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Input: `id + id * id` (Grammar 2)

Input: `if id then if id then other else other` (Grammar 3)

Packages