Skip to content

Rafique610/bottom-up-parser-engine

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SLR(1) & LR(1) Parser — CS4031 Compiler Construction

A complete bottom-up parser implementation in C++ supporting both SLR(1) and LR(1) parsing strategies, with full canonical item set construction, parsing table generation, conflict detection, parse tree output, and detailed trace files.

Team: 23i-0747 · 23i-0062 · Section A


Table of Contents


Overview

This project implements two classical bottom-up parsing algorithms from scratch:

  • SLR(1) — Simple LR with 1 token of lookahead, using FOLLOW sets to resolve reduce actions
  • LR(1) — Canonical LR with 1 token of lookahead, using precise per-item lookahead sets

Both parsers share a common grammar front-end and produce identical parse trees for accepted inputs. The key difference is in conflict resolution power: LR(1) avoids many conflicts that SLR(1) cannot resolve, as demonstrated with the included grammar_with_conflict.txt.


Architecture

┌─────────────────────────────────────────────────────────────┐
│                        main.cpp                             │
│  Orchestrates all grammars, parsers, and output files       │
└────────────┬───────────────────────────┬────────────────────┘
             │                           │
     ┌───────▼───────┐           ┌───────▼───────┐
     │  slr_parser   │           │  lr1_parser   │
     │  parse()      │           │  parseLR1()   │
     └───────┬───────┘           └───────┬───────┘
             │                           │
             └──────────┬────────────────┘
                        │
              ┌─────────▼──────────┐
              │   parsing_table    │
              │  buildSLRTable()   │
              │  buildLR1Table()   │
              └─────────┬──────────┘
                        │
              ┌─────────▼──────────┐
              │      items         │
              │  Canonical Item    │
              │  Set Construction  │
              │  (SLR & LR1)       │
              └─────────┬──────────┘
                        │
              ┌─────────▼──────────┐
              │      grammar       │
              │  readGrammar()     │
              │  augmentGrammar()  │
              │  computeFirst()    │
              │  computeFollow()   │
              └────────────────────┘

Data flow:

  1. grammar.cpp reads and augments the grammar, then computes FIRST and FOLLOW sets
  2. items.cpp constructs the canonical collection of LR(0) items (for SLR) and LR(1) items (for canonical LR)
  3. parsing_table.cpp fills the ACTION and GOTO tables, recording any conflicts
  4. slr_parser.cpp / lr1_parser.cpp drive the parse stack loop
  5. tree.cpp builds and prints the parse tree from the node stack

Features

  • Grammar augmentation with fresh start symbol (collision-safe)
  • FIRST and FOLLOW set computation
  • LR(0) canonical item set construction (for SLR)
  • LR(1) canonical item set construction with per-item lookahead propagation
  • SLR(1) and LR(1) ACTION/GOTO table construction
  • Shift-reduce and reduce-reduce conflict detection and reporting
  • Parse stack simulation with step-by-step trace output
  • Parse tree construction and indented text output
  • Multi-grammar batch run producing 8 structured output files
  • SLR(1) vs LR(1) comparison report

Project Structure

.
├── grammar.h / grammar.cpp         # Grammar I/O, augmentation, FIRST, FOLLOW
├── items.h / items.cpp             # LR(0) and LR(1) item set construction
├── parsing_table.h / parsing_table.cpp  # ACTION/GOTO table construction & conflict detection
├── slr_parser.h / slr_parser.cpp   # SLR(1) parse loop
├── lr1_parser.h / lr1_parser.cpp   # LR(1) parse loop
├── tree.h / tree.cpp               # Parse tree node and printer
├── stack.h / stack.cpp             # Stack placeholder (uses STL vector internally)
├── main.cpp                        # Batch runner — all grammars, all outputs
├── input/
│   ├── grammar1.txt                # Simple expression grammar (addition)
│   ├── grammar2.txt                # Full arithmetic grammar (precedence)
│   ├── grammar3.txt                # Dangling-else grammar (SLR conflict)
│   └── grammar_with_conflict.txt  # Classic lvalue/rvalue grammar (SLR conflict)
├── output/                         # All generated output files
└── assets/                         # Screenshots for this README

Grammar Input Format

  • One rule per line: NonTerminal -> alt1 | alt2
  • Use -> as the arrow
  • Separate alternatives with |
  • Non-terminals must be multi-character names starting with uppercase (e.g., Expr, Stmt, Factor)
  • Terminals are lowercase words, operators, or single symbols
  • Use epsilon for the empty string
Expr -> Expr + Term | Term
Term -> Term * Factor | Factor
Factor -> ( Expr ) | id

Sample Grammars

Grammar 1 — Simple Addition

Expr -> Expr + Term | Term
Term -> Factor
Factor -> id

Accepts: id, id + id, id + id + id

Grammar 2 — Full Arithmetic (with precedence)

Expr -> Expr + Term | Term
Term -> Term * Factor | Factor
Factor -> ( Expr ) | id

Accepts: id, id + id * id, ( id + id ) * id

Grammar 3 — Dangling Else (classic SLR conflict)

Stmt -> if Expr then Stmt | if Expr then Stmt else Stmt | other
Expr -> id

This grammar produces a shift-reduce conflict in SLR(1) but is handled correctly by LR(1).

Grammar 4 — Lvalue/Rvalue (SLR conflict, LR(1) clean)

Start -> L = R | R
L -> * R | id
R -> L

Demonstrates the canonical example from the Dragon Book where SLR(1) fails but LR(1) succeeds.


Build & Run

Requirements: g++ with C++11 support

# Build
make

# Or manually
g++ -std=c++11 -Wall -o parser \
    src/main.cpp src/grammar.cpp src/items.cpp \
    src/parsing_table.cpp src/slr_parser.cpp \
    src/lr1_parser.cpp src/tree.cpp src/stack.cpp

# Run (processes all 4 grammars)
./parser

Expected console output:

Done: grammar1.txt              SLR-states=9   LR1-states=9   SLR-conflicts=0  LR1-conflicts=0
Done: grammar2.txt              SLR-states=12  LR1-states=12  SLR-conflicts=0  LR1-conflicts=0
Done: grammar3.txt              SLR-states=11  LR1-states=11  SLR-conflicts=1  LR1-conflicts=0
Done: grammar_with_conflict.txt SLR-states=10  LR1-states=10  SLR-conflicts=1  LR1-conflicts=0

All output files written to output/

Output Files

All files are written to output/ after a single run:

File Description
augmented_grammar.txt Numbered productions after augmentation
slr_items.txt All LR(0) canonical item states
slr_parsing_table.txt SLR(1) ACTION and GOTO tables + conflict list
slr_trace.txt Step-by-step parse trace for all valid and invalid inputs
lr1_items.txt All LR(1) canonical item states with lookaheads
lr1_parsing_table.txt LR(1) ACTION and GOTO tables + conflict list
lr1_trace.txt Step-by-step parse trace for all valid and invalid inputs
parse_trees.txt Indented parse trees for all accepted inputs
comparison.txt Side-by-side SLR vs LR(1) state count and conflict summary

Screenshots

Console Output

Console Output

SLR(1) Item States — Grammar 2

SLR Item States

LR(1) Item States with Lookaheads — Grammar 4

LR1 Item States

SLR(1) Parsing Table — Grammar 1

SLR Parsing Table

Parse Trace — Valid Input

Valid Trace

Parse Trace — Invalid Input

Invalid Trace

Parse Tree — Grammar 2

Parse Tree

Conflict Detection — SLR(1) vs LR(1) on Grammar 4

Conflict Detection

Comparison Report

Comparison


Parsing Table Examples

SLR(1) Table — Grammar 1 (excerpt)

State 0:
  ACTION: ($,r2) (id,s5)
  GOTO:   (Expr,1) (Factor,3) (Term,2)

State 1:
  ACTION: ($,acc) (+,s6)
  GOTO:

State 5:
  ACTION: ($,r3) (+,r3)
  GOTO:
  • s5 → shift to state 5
  • r2 → reduce by production 2
  • acc → accept

LR(1) Table — Grammar 4 (excerpt, no conflicts)

State 0:
  ACTION: (* ,s4) (id,s5)
  GOTO:   (L,2) (R,1) (Start,0)

State 3:
  ACTION: ($,r1)        ← reduce only on $, not =
  GOTO:

The LR(1) table correctly restricts the reduce action to $ only, avoiding the SLR(1) conflict where FOLLOW(R) = {$, =} causes a false reduce on =.


Parse Tree Examples

Input: id + id * id (Grammar 2)

Expr
  Expr
    Term
      Factor
        id
  +
  Term
    Term
      Factor
        id
    *
    Factor
      id

This tree correctly reflects operator precedence — * binds tighter than +.

Input: if id then if id then other else other (Grammar 3)

Stmt
  if
  Expr
    id
  then
  Stmt
    if
    Expr
      id
    then
    Stmt
      other
    else
    Stmt
      other

The else is correctly associated with the inner if, matching standard language semantics.


Conflict Analysis

Grammar 3 — Dangling Else

SLR(1) Conflict:

In state S after parsing if Expr then Stmt, the parser sees else and must choose:

  • Shift else (associate with inner if)
  • Reduce using Stmt -> if Expr then Stmt (associate with outer if)

Since else ∈ FOLLOW(Stmt), SLR(1) puts both actions in the table → shift-reduce conflict.

LR(1) Resolution:

LR(1) tracks exact lookaheads per item. In the state reached after the inner then Stmt, the reduce item carries lookahead $ and else only where it's unambiguous. The shift wins on else with full context, resolving the conflict.


Grammar 4 — Lvalue/Rvalue (Dragon Book Example)

SLR(1) Conflict:

After reading * and reducing L -> * R, the parser is in a state with item R -> L . and considers reducing. But FOLLOW(R) = {$, =}, so SLR places r on = as well as $. However, shifting = is also valid here → shift-reduce conflict.

LR(1) Resolution:

In LR(1), the item R -> L . , {$} carries only $ as its lookahead (derived from context — R on the RHS of Start -> L = R never appears before =). So LR(1) places reduce only on $, and shift on = — no conflict.

SLR(1) conflicts : 1   (state S, token '=', s? vs r?)
LR(1)  conflicts : 0

SLR(1) vs LR(1) Comparison

Grammar SLR States LR(1) States SLR Conflicts LR(1) Conflicts Verdict
grammar1.txt 9 9 0 0 Both parse it
grammar2.txt 12 12 0 0 Both parse it
grammar3.txt 11 11 1 0 LR(1) only
grammar_with_conflict.txt 10 10 1 0 LR(1) only

Key takeaways:

  • For unambiguous non-conflicting grammars, SLR(1) and LR(1) produce the same number of states and identical tables
  • SLR(1) uses FOLLOW sets as a blunt instrument; LR(1) uses per-item lookaheads for surgical precision
  • Every SLR(1) grammar is also LR(1), but not vice versa — LR(1) strictly subsumes SLR(1)
  • The extra power of LR(1) comes at the cost of potentially many more states (LALR(1) merges them, at the cost of some conflicts)

Known Limitations

  • Non-terminal names must be multi-character (single letters like E, T, F are not supported — they are ambiguous with terminals)
  • Grammar file path is hardcoded per run; edit main.cpp to change inputs
  • Parse tree output is indented text only — no graphical rendering
  • LALR(1) is not implemented (LR(1) state merging is future work)
  • Error recovery is not implemented — the parser halts on the first error

About

A C++ implementation of SLR(1) and Canonical LR(1) bottom-up parsers featuring grammar augmentation, FIRST/FOLLOW computation, canonical item collection generation, parsing table construction, conflict detection, parse trace visualization, and parse tree generation for context-free grammars.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors