26 changes: 26 additions & 0 deletions .gitignore
@@ -0,0 +1,26 @@
# IDE
.idea/
.vscode/
*.swp
*.swo

# Claude Code
.claude/
memory/

# Generated compiler output
*.int
*.asm
*.c

# Python
__pycache__/
*.py[cod]
*.egg-info/
dist/
build/

# OS
.DS_Store
Thumbs.db

55 changes: 55 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,55 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Running the Compiler

```bash
python3 starlet.py <input.stl>
```

This produces three output files alongside the input:
- `<name>.int` — intermediate quad representation
- `<name>.c` — C backend (goto-based)
- `<name>.asm` — MIPS32 assembly backend

Example files are in `Examples/` (`.stl` extension).

## Architecture

The entire compiler lives in `starlet.py` as a single file with heavy use of global state. The four stages flow linearly: lex → parse → IR → code gen.

### Lexer (`lex()`)
FSM tokenizer. Reads `data` (the opened source file handle) one character at a time. States 1–9 handle identifiers, integers, multi-char operators (`<=`, `:=`, etc.), and two comment styles (`//` line, `/* */` block). Each call returns the next token as a string; the parser calls `lex()` to advance and stores the result in the global `token`.
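
As a rough illustration of the character-at-a-time approach, here is a minimal sketch of such a tokenizer. It is not the code in `starlet.py`: the helper names `make_lexer`, `peek`, and `tokenize` are hypothetical, the numbered FSM states are collapsed into plain branches, and the operator set is an assumption.

```python
import io

# Assumed set of two-character operators; the real lexer may accept others.
TWO_CHAR_OPS = {"<=", ">=", "<>", ":="}


def make_lexer(data):
    """Build a lex() function over `data`, a seekable text stream."""

    def peek():
        # Look at the next character without consuming it.
        pos = data.tell()
        ch = data.read(1)
        data.seek(pos)
        return ch

    def lex():
        ch = data.read(1)
        while True:
            while ch and ch.isspace():           # skip whitespace
                ch = data.read(1)
            if ch == "/" and peek() == "/":      # line comment
                while ch and ch != "\n":
                    ch = data.read(1)
                continue
            if ch == "/" and peek() == "*":      # block comment
                data.read(1)                     # consume '*'
                prev, ch = "", data.read(1)
                while ch and not (prev == "*" and ch == "/"):
                    prev, ch = ch, data.read(1)
                ch = data.read(1)
                continue
            break
        if not ch:
            return ""                            # end of input
        if ch.isalpha():                         # identifier or keyword
            tok = ch
            while peek().isalnum():
                tok += data.read(1)
            return tok
        if ch.isdigit():                         # integer literal
            tok = ch
            while peek().isdigit():
                tok += data.read(1)
            return tok
        if ch + peek() in TWO_CHAR_OPS:          # multi-char operator
            return ch + data.read(1)
        return ch                                # single-char token

    return lex


def tokenize(source):
    """Collect every token of `source` into a list (helper for experiments)."""
    lex = make_lexer(io.StringIO(source))
    out = []
    while tok := lex():
        out.append(tok)
    return out
```

Calling `tokenize("x := y1 <= 10")` returns the tokens as a list of strings, which is handy for checking lexer behaviour in isolation.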

### Parser
Recursive descent. `program()` is the entry point. All grammar rules are functions (`block`, `statements`, `statement`, `expression`, `term`, `factor`, etc.). The parser both validates syntax and drives IR generation in the same pass — no separate AST is built.
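
The single-pass style can be sketched as follows: each grammar function consumes tokens and returns the "place" (variable or temporary name) holding its result, emitting quads as a side effect. This is only an illustration under assumptions. It works on a pre-split token list instead of calling `lex()`, and the helpers `new_temp`, `peek`, `advance`, and `compile_expression` are hypothetical names, as is the `T_n` naming of temporaries.

```python
quads = []        # emitted quads, in order
tokens = []       # token list being parsed
pos = 0           # index of the current token
temp_count = 0    # counter for temporaries


def peek():
    return tokens[pos] if pos < len(tokens) else ""


def advance():
    global pos
    tok = peek()
    pos += 1
    return tok


def new_temp():
    global temp_count
    temp_count += 1
    return f"T_{temp_count}"


def factor():
    if peek() == "(":
        advance()
        place = expression()
        if advance() != ")":
            raise SyntaxError("expected ')'")
        return place
    tok = advance()
    if not tok.isalnum():
        raise SyntaxError(f"unexpected token {tok!r}")
    return tok


def term():
    left = factor()
    while peek() in ("*", "/"):
        op = advance()
        right = factor()
        result = new_temp()
        quads.append([op, left, right, result])   # emit while parsing
        left = result
    return left


def expression():
    left = term()
    while peek() in ("+", "-"):
        op = advance()
        right = term()
        result = new_temp()
        quads.append([op, left, right, result])
        left = result
    return left


def compile_expression(src_tokens):
    """Parse one expression; return (result place, list of quads)."""
    global tokens, pos, temp_count
    tokens, pos, temp_count = list(src_tokens), 0, 0
    quads.clear()
    return expression(), list(quads)
```

For `a + b * c`, the multiplication quad is emitted first because `term()` returns before `expression()` emits the addition; operator precedence comes from the call structure, with no tree in between.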

### Intermediate Representation (IR)
Quads stored in `quadDict: {int → [op, x, y, z]}`, numbered by `nextLabel`. Key functions:
- `gen_quad(op, x, y, z)` — emits a quad
- `make_list(label)`, `merge(l1, l2)`, `backpatch(labellist, target)` — implement boolean short-circuit and control flow via forward-reference patching; jump destinations are filled in after the target quad is known

Quad operations: `:=`, `+`, `-`, `*`, `/`, comparison ops (`=`, `<>`, `<`, `<=`, `>`, `>=`), `jump`, `par`/`call`/`retv`, `out`, `inp`, `halt`, `begin_block`, `end_block`.
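
A minimal sketch of how these helpers could fit together is shown below. The function and variable names come from the description above, but the bodies are assumptions: in particular, that labels start at 0, that `gen_quad` returns the label it used, and that `_` marks an empty field.

```python
quadDict = {}     # label -> [op, x, y, z]
nextLabel = 0     # label the next quad will receive (assumed to start at 0)


def gen_quad(op, x, y, z):
    """Emit a quad and return its label (the return value is an assumption)."""
    global nextLabel
    label = nextLabel
    quadDict[label] = [op, x, y, z]
    nextLabel += 1
    return label


def make_list(label):
    return [label]


def merge(l1, l2):
    return l1 + l2


def backpatch(labellist, target):
    """Fill in the jump destination (field z) of every quad in labellist."""
    for label in labellist:
        quadDict[label][3] = target


# Hand-written trace for: if a < b then x := 1
true_list = make_list(gen_quad("<", "a", "b", "_"))     # jump if condition holds
false_list = make_list(gen_quad("jump", "_", "_", "_"))  # jump otherwise
backpatch(true_list, nextLabel)       # 'then' body starts at the next quad
gen_quad(":=", "1", "_", "x")
backpatch(false_list, nextLabel)      # skip past the body when false
gen_quad("halt", "_", "_", "_")
```

After the trace runs, the two jump quads hold concrete targets even though neither target existed when the jumps were emitted, which is the point of the forward-reference patching.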

### Symbol Table
`scopes_list` is a global stack of `Scope` objects. Each `Scope` tracks its `nesting_level`, a reference to its `enclosing_scope`, and an `entities` list. Stack pointer offsets begin at 12 and grow by 4 per entity (`Scope.get_sp()`).

Entity subclasses: `Variable` (VAR), `Function` (FUNC), `Parameter` (PAR), `TempVariable` (TMPVAR). Functions store their `arguments` list (parameter modes), `start_quad`, and `framelength`.

Key lookup: `testing(name)` walks the static chain via `enclosing_scope` links. `search_entity(name, type)` scans `scopes_list` front-to-back.
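
The offset rule and the static-chain lookup can be sketched like this. Only `Scope`, `nesting_level`, `enclosing_scope`, `entities`, and `get_sp()` are names from the description above; the `Entity` fields, `Scope.add`, and `lookup` are hypothetical, and the sketch assumes every entity occupies one 4-byte slot.

```python
class Entity:
    """Simplified stand-in for Variable / Parameter / TempVariable."""

    def __init__(self, name, offset):
        self.name = name
        self.offset = offset


class Scope:
    def __init__(self, nesting_level=0, enclosing_scope=None):
        self.nesting_level = nesting_level
        self.enclosing_scope = enclosing_scope
        self.entities = []

    def get_sp(self):
        # First free slot: offsets begin at 12 and grow by 4 per entity.
        return 12 + 4 * len(self.entities)

    def add(self, name):
        # Hypothetical helper: allocate the next slot for a new entity.
        entity = Entity(name, self.get_sp())
        self.entities.append(entity)
        return entity


def lookup(scope, name):
    """Walk enclosing_scope links; return (entity, nesting_level) or raise."""
    while scope is not None:
        for entity in scope.entities:
            if entity.name == name:
                return entity, scope.nesting_level
        scope = scope.enclosing_scope
    raise NameError(f"undeclared identifier: {name}")


# Usage: a global scope with two variables and one nested function scope.
global_scope = Scope(nesting_level=0)
global_scope.add("a")
global_scope.add("b")
inner = Scope(nesting_level=1, enclosing_scope=global_scope)
inner.add("a")                      # shadows the global 'a'
```

Looking up `a` from `inner` finds the level-1 entity first, while `b` resolves to the level-0 entity at offset 16; the level difference is what the backend later uses to decide how many static links to follow.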

### MIPS Backend (`write_to_asm`)
Called once per quad during `block()` (not after the full parse). MIPS frame layout:
- `$sp` → saved `$ra`
- `-4($sp)` → static link (access link to enclosing frame)
- `-8($sp)` → return value address
- `-12($sp)` onward → parameters and locals

`$s0` holds the base of the global (level-0) activation record. `gnvlcode(v)` walks the static chain through `-4($t0)` links to compute the address of a non-local variable. `loadvr(v, r)` / `storerv(r, v)` dispatch on entity type and nesting level to emit the correct load/store.
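
The static-chain walk might emit instructions along these lines. This is a sketch, not the real `gnvlcode`: the function name `gnvlcode_sketch` and its explicit `levels_up` / `offset` arguments are hypothetical (the real function takes a variable name and presumably derives both from the symbol table), and the exact instruction choice is an assumption based on the frame layout above.

```python
def gnvlcode_sketch(levels_up, offset):
    """Return MIPS lines leaving the address of a non-local variable in $t0.

    levels_up: how many frames above the current one the variable lives
               (must be >= 1, since the variable is non-local).
    offset:    the variable's positive slot offset within its frame.
    """
    if levels_up < 1:
        raise ValueError("non-local access needs at least one static link hop")
    lines = ["lw $t0, -4($sp)"]                   # static link of current frame
    for _ in range(levels_up - 1):
        lines.append("lw $t0, -4($t0)")           # follow one more static link
    lines.append(f"addi $t0, $t0, -{offset}")     # slots sit below the frame base
    return lines
```

For a variable two levels up at offset 16, this yields one load from `$sp`, one further hop through `$t0`, and a final `addi` that turns the frame base into the variable's address.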

### Parameter Passing
Three modes throughout the compiler:
- `in` / `CV` — call by value
- `inout` / `REF` — call by reference (address passed)
- `inandout` / `RET` — call by return (caller passes address of result variable)
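
As an illustration, the mapping from source-level modes to quad tags could look like the sketch below. The `MODE_TAGS` table mirrors the list above; the helper `par_quads` and the exact field layout of the `par` and `call` quads are assumptions, not taken from `starlet.py`.

```python
# Source keyword -> tag carried in the 'par' quad (from the list above).
MODE_TAGS = {"in": "CV", "inout": "REF", "inandout": "RET"}


def par_quads(func_name, actuals):
    """Build the quads for one call.

    actuals: list of (mode keyword, argument name) pairs in call order.
    The quad layouts used here are assumed for illustration.
    """
    quads = [["par", name, MODE_TAGS[mode], "_"] for mode, name in actuals]
    quads.append(["call", "_", "_", func_name])
    return quads
```

A call such as `par_quads("f", [("in", "a"), ("inout", "b")])` produces one `par` quad per argument followed by a single `call` quad.
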
5 changes: 5 additions & 0 deletions pyproject.toml
@@ -0,0 +1,5 @@
[project]
name = "compiler"
version = "0.1.0"
requires-python = ">=3.13"
dependencies = []