LeanPython: Python Interpreter in Lean4

Vision

Build a drop-in Python 3.12 interpreter written in Lean4, with the primary goal of interpreting the leanSpec Ethereum consensus specification. The interpreter serves as a bridge: Python specs go in, Lean4-verified execution comes out.

Target Codebase Analysis

The leanSpec repository contains ~152 Python source files and ~176 test files implementing the Lean Ethereum consensus protocol. The Python features required are substantial but bounded:

Python Features Required (by priority)

Must Have (used pervasively in leanSpec):

Classes with single and multiple inheritance (class State(Container))
__init__, __new__, __slots__, ClassVar
All arithmetic/comparison/bitwise dunder methods (operator overloading)
@classmethod, @staticmethod, @property
@override, @singledispatch, @lru_cache, @dataclass(frozen=True, slots=True)
Exception handling (try/except/finally/raise/custom hierarchies)
List, dict, set, and generator comprehensions
Generators and iterators (yield, __iter__, __next__)
from __future__ import annotations (PEP 563 deferred evaluation)
Type annotations (runtime access via __annotations__, get_type_hints())
f-strings with nested expressions
Tuple/list/dict unpacking (*args, **kwargs, starred assignments)
Slicing (a[1:3], a[i*n:(i+1)*n])
Walrus operator (:=)
Context managers (with statement)
Module system (import, from ... import, relative imports, packages)
Built-in types: int (arbitrary precision), str, bytes, bytearray, bool, float, list, dict, tuple, set, frozenset, memoryview
Built-in functions: len, range, enumerate, zip, map, filter, sorted, reversed, any, all, sum, min, max, abs, pow, hash, id, type, isinstance, issubclass, getattr, setattr, hasattr, super, iter, next, print, repr, str, int, bytes, list, dict, tuple, set, frozenset, bool, float
NamedTuple
assert statements
String methods (.removeprefix(), .startswith(), .hex(), etc.)
int.from_bytes(), int.to_bytes()

Should Have (used in significant portions):

async/await, asyncio.create_task(), async context managers
Callable, Awaitable, Protocol (structural typing at runtime)
TYPE_CHECKING conditional imports
abc.ABC, abc.abstractmethod
collections.defaultdict, collections.OrderedDict, collections.deque
functools.singledispatch with .register
functools.lru_cache
itertools.accumulate, itertools.count
struct.pack, struct.unpack
hashlib.sha256, hashlib.shake_128
hmac
secrets.token_bytes
math.ceil, math.log2
bisect.bisect_left
io.BytesIO
pathlib.Path
logging module
json, yaml serialization
base64 encoding
ipaddress module
signal handling
threading.Lock

Nice to Have (infrastructure, not core spec):

sqlite3
httpx, aiohttp (HTTP client/server)
aioquic (QUIC protocol)
numpy, numba (numerical/JIT)
pydantic (validation framework)
prometheus_client (metrics)
argparse (CLI)

What Pydantic Means for Us

Pydantic is the most challenging dependency. Every container in leanSpec is a Pydantic BaseModel. Two strategies:

Pydantic-in-Lean4: Reimplement the subset of Pydantic used (model definition, model_copy, model_fields, model_dump, field validators, frozen models). This is the target approach.
FFI to Python Pydantic: Call out to CPython for Pydantic operations. This is the escape hatch if (1) proves too costly.

The subset of Pydantic actually used is narrow:

BaseModel with model_config = ConfigDict(frozen=True, extra="forbid")
model_copy(update={...})
model_fields (field introspection)
model_dump(mode="json")
field_validator, field_serializer, model_validator
__get_pydantic_core_schema__ for custom types

Architecture

LeanPython/
  src/
    LeanPython/
      Lexer/           -- Tokenizer (Python 3.12 tokens)
      Parser/          -- PEG parser producing AST
      AST/             -- Python AST node types
      Interpreter/     -- Tree-walking interpreter
        Core/          -- Expression/statement evaluation
        Object/        -- Python object model
        Module/        -- Import system
        Builtins/      -- Built-in functions and types
      Runtime/         -- Runtime support
        Types/         -- int, str, bytes, list, dict, ...
        Exceptions/    -- Exception hierarchy
        StdLib/        -- Standard library modules
        Pydantic/      -- Pydantic compatibility layer
      FFI/             -- Optional CPython FFI bridge
    Tests/             -- Lean4 test infrastructure
  lakefile.lean        -- Build configuration
  lean-toolchain       -- Lean4 version pin

Design Decisions

Tree-walking interpreter (not bytecode VM). Simpler to implement, debug, and prove properties about. Performance is secondary to correctness.
Immutable value semantics where possible, leveraging Lean4's strengths. Python's mutable semantics are modeled explicitly with reference cells.
Arbitrary-precision integers use Lean4's native Nat/Int.
Strings use Lean4's native String with Python-compatible methods.
Bytes use ByteArray.

Phases

Phase 0: Project Scaffolding

Goal: Set up the Lean4 project, CI, and basic infrastructure.

Deliverables:

lakefile.lean with project structure
lean-toolchain pinned to stable Lean4
CI pipeline (GitHub Actions) running lake build and lake test
Basic project structure with placeholder modules

Testing:

lake build succeeds
lake test runs (even if no tests yet)
CI green on push

Phase 1: Lexer

Goal: Tokenize Python 3.12 source code into a token stream.

Scope:

All Python 3.12 token types (identifiers, keywords, literals, operators, delimiters, INDENT/DEDENT, NEWLINE, ENDMARKER)
String literals: single/double/triple-quoted, f-strings, b-strings, r-strings
Numeric literals: integers (dec, hex, oct, bin), floats, complex
Continuation lines (explicit \ and implicit inside brackets)
Comments (discarded)
Encoding detection (UTF-8 default)
INDENT/DEDENT generation from leading whitespace

Key types:

inductive TokenKind where
  | name : String -> TokenKind
  | integer : Int -> TokenKind
  | float_ : Float -> TokenKind
  | string : String -> TokenKind
  | fstring : List FStringPart -> TokenKind
  | bytes : ByteArray -> TokenKind
  | keyword : Keyword -> TokenKind
  | operator : Operator -> TokenKind
  | delimiter : Delimiter -> TokenKind
  | indent | dedent | newline | endmarker
  | comment : String -> TokenKind

structure Token where
  kind : TokenKind
  span : SourceSpan

Testing strategy:

Unit tests: Tokenize individual token types and verify output.
- Every keyword, operator, delimiter
- String edge cases: escapes, raw strings, multiline, f-string nesting
- Numeric edge cases: 0x, 0o, 0b, underscores, leading zeros
- INDENT/DEDENT: tabs, spaces, mixed (error), nested blocks
Round-trip tests: Tokenize then reconstruct source; verify equivalence.
CPython conformance: Tokenize files from references/cpython/Lib/test/ and compare token streams against CPython's tokenize module output.
leanSpec smoke test: Successfully tokenize all 152 source files in references/leanSpec/src/lean_spec/ without errors.

Acceptance criteria:

All 152 leanSpec source files tokenize without errors.
Token stream matches CPython's tokenize output for a curated test suite.

Phase 2: Parser

Goal: Parse the token stream into a Python 3.12 AST.

Scope:

PEG parser following references/cpython/Grammar/python.gram
Full expression grammar (atoms, primaries, unary/binary/ternary ops, comprehensions, lambda, yield, named expressions)
Full statement grammar (simple statements, compound statements)
Decorators
Type annotations (stored but not evaluated at parse time)
match/case (structural pattern matching) -- parse only, interpret later
Error recovery with source locations

Key AST types (mirroring CPython's ast module):

inductive Expr where
  | name : String -> Expr
  | constant : Constant -> Expr
  | binOp : Expr -> BinOp -> Expr -> Expr
  | unaryOp : UnaryOp -> Expr -> Expr
  | boolOp : BoolOp -> List Expr -> Expr
  | compare : Expr -> List (CmpOp × Expr) -> Expr
  | call : Expr -> List Expr -> List Keyword -> Expr
  | attribute : Expr -> String -> Expr
  | subscript : Expr -> Expr -> Expr
  | starred : Expr -> Expr
  | ifExp : Expr -> Expr -> Expr -> Expr
  | lambda_ : Arguments -> Expr -> Expr
  | listComp : Expr -> List Comprehension -> Expr
  | dictComp : Expr -> Expr -> List Comprehension -> Expr
  | setComp : Expr -> List Comprehension -> Expr
  | generatorExp : Expr -> List Comprehension -> Expr
  | await_ : Expr -> Expr
  | yield_ : Option Expr -> Expr
  | yieldFrom : Expr -> Expr
  | fstring : List FStringPart -> Expr
  | namedExpr : String -> Expr -> Expr
  | slice : Option Expr -> Option Expr -> Option Expr -> Expr
  | tuple_ : List Expr -> Expr
  | list_ : List Expr -> Expr
  | set_ : List Expr -> Expr
  | dict_ : List (Option Expr × Expr) -> Expr
  -- ... etc

inductive Stmt where
  | expr : Expr -> Stmt
  | assign : List Expr -> Expr -> Stmt
  | augAssign : Expr -> BinOp -> Expr -> Stmt
  | annAssign : Expr -> Expr -> Option Expr -> Stmt
  | return_ : Option Expr -> Stmt
  | delete : List Expr -> Stmt
  | raise_ : Option Expr -> Option Expr -> Stmt
  | assert_ : Expr -> Option Expr -> Stmt
  | import_ : List Alias -> Stmt
  | importFrom : Option String -> List Alias -> Option Int -> Stmt
  | if_ : Expr -> List Stmt -> List Stmt -> Stmt
  | while_ : Expr -> List Stmt -> List Stmt -> Stmt
  | for_ : Expr -> Expr -> List Stmt -> List Stmt -> Stmt
  | with_ : List WithItem -> List Stmt -> Stmt
  | try_ : List Stmt -> List ExceptHandler -> List Stmt -> List Stmt -> Stmt
  | functionDef : FunctionDef -> Stmt
  | classDef : ClassDef -> Stmt
  | global : List String -> Stmt
  | nonlocal : List String -> Stmt
  | pass_ | break_ | continue_ : Stmt
  | asyncFor : Expr -> Expr -> List Stmt -> List Stmt -> Stmt
  | asyncWith : List WithItem -> List Stmt -> Stmt
  | asyncFunctionDef : FunctionDef -> Stmt
  | match_ : Expr -> List MatchCase -> Stmt
  -- ... etc

Testing strategy:

Unit tests: Parse individual constructs and verify AST structure.
- Every statement type, every expression type
- Operator precedence and associativity
- Decorator chains
- Complex comprehensions with nested for/if
- f-string parsing with nested expressions
- Star expressions, unpacking assignments
- match/case patterns
CPython conformance: Parse files and compare AST against ast.dump() output from CPython (modulo field naming).
Negative tests: Verify parse errors for invalid syntax with clear messages.
leanSpec smoke test: Parse all 152 source files without errors.

Acceptance criteria:

All 152 leanSpec source files parse without errors.
AST structure matches CPython's ast.parse() for a curated test suite.

Phase 3: Core Interpreter -- Expressions and Primitive Types

Goal: Evaluate Python expressions with primitive types.

Scope:

Python object model foundation:
- PyObject base type with __class__, __dict__
- Type/class hierarchy: object, type
- Attribute lookup (instance -> class -> MRO)
- Descriptor protocol (__get__, __set__, __delete__)
Primitive types with full method sets:
- int (arbitrary precision, all operators, to_bytes, from_bytes, bit_length)
- bool (subclass of int)
- float (IEEE 754 double, all operators)
- str (full method set: format, join, split, strip, replace, startswith, endswith, removeprefix, encode, hex, f-string formatting)
- bytes / bytearray (full method set: hex, fromhex, concatenation, slicing)
- None, Ellipsis, NotImplemented
Container types:
- list (append, extend, insert, pop, remove, sort, reverse, slicing, * repeat)
- tuple (indexing, slicing, unpacking)
- dict (get, setdefault, items, keys, values, update, | merge, comprehension)
- set / frozenset (add, remove, union, intersection, difference, symmetric_difference)
Expression evaluation:
- All binary operators with type dispatch
- All unary operators
- Comparison chaining (a < b <= c)
- Boolean operators (and, or, not) with short-circuit
- Conditional expression (x if cond else y)
- Attribute access and method calls
- Subscript and slicing
- Starred unpacking
- Walrus operator
- All comprehension types
Variable scoping (LEGB: Local, Enclosing, Global, Builtin)
Assignment (simple, augmented, annotated, unpacking)
del statement

Testing strategy:

Unit tests per type: Exercise every method and operator.
- int: arithmetic, bitwise, comparison, to_bytes/from_bytes, edge cases (very large numbers, negative, zero)
- str: all methods used in leanSpec, f-string formatting, encoding
- bytes: slicing, hex(), fromhex(), concatenation
- list: mutations, slicing, comprehensions
- dict: comprehensions, | operator, iteration order
- set: set operations, membership testing
Operator dispatch tests: Mixed-type operations, NotImplemented fallback, reflected operators (__radd__ etc.)
Scoping tests: Closures, nested functions, global/nonlocal.
leanSpec micro-tests: Extract small expressions from leanSpec and verify evaluation matches CPython. Examples:
- 2**cls.BITS - 1 for various BITS values
- int.from_bytes(data, "little")
- List/dict comprehensions from hash.py
- Slice operations from state.py

Acceptance criteria:

All primitive type operations produce results matching CPython.
Comprehensions, slicing, and unpacking work correctly.
Scoping rules match CPython behavior.

Phase 4: Control Flow and Functions

Goal: Execute function definitions, calls, and all control flow.

Scope:

Function definitions (positional, keyword, default, *args, **kwargs)
Function calls with full argument binding
return, yield, yield from
Generator protocol (__iter__, __next__, StopIteration)
if/elif/else
while with break/continue/else
for with break/continue/else
try/except/else/finally
raise with chaining (raise X from Y)
assert
with statement (context manager protocol)
del statement
pass, break, continue
Lambda expressions
Closures and free variable capture
global and nonlocal declarations

Testing strategy:

Function call tests:
- Positional, keyword, default, *args, **kwargs in all combinations
- Argument binding edge cases (too few, too many, duplicate keywords)
Generator tests:
- Simple generators, yield from, generator expressions
- send(), throw(), close() on generators
- Generator as iterator in for loops
Exception tests:
- try/except with type matching
- Exception hierarchy (except BaseException, except Exception)
- except ... as e binding
- finally always runs (even with return/break)
- raise from chaining
- assert with and without message
Context manager tests:
- __enter__/__exit__ protocol
- Exception propagation through __exit__
- Multiple with items
leanSpec function tests: Extract functions from leanSpec and execute.
- BaseUint.__new__ with validation
- Fp.__add__, Fp.__mul__, Fp.inverse (field arithmetic)
- Generator methods like Store.ancestors
- State.process_slots loop

Acceptance criteria:

All control flow constructs work correctly.
Generators are fully functional.
Exception handling matches CPython semantics.
Context managers work with proper cleanup.

Phase 5: Object Model -- Classes, Inheritance, Descriptors

Goal: Full Python class system.

Scope:

Class definition with decorators
Single and multiple inheritance
MRO computation (C3 linearization)
super() with implicit and explicit forms
Instance creation (__new__ + __init__)
Attribute access through MRO
Descriptor protocol (__get__, __set__, __delete__, __set_name__)
@classmethod, @staticmethod, @property
@dataclass with frozen=True, slots=True
Metaclasses (type as default metaclass)
__slots__
isinstance(), issubclass()
__repr__, __str__, __hash__, __eq__
__getattr__, __getattribute__, __setattr__, __delattr__
__contains__, __len__, __iter__, __getitem__, __setitem__
__enter__, __exit__
__init_subclass__
Class variables vs instance variables
ClassVar type hint (runtime semantics)
type() for dynamic class creation

Testing strategy:

Inheritance tests:
- Diamond inheritance with C3 MRO
- super() in single and multiple inheritance
- Method resolution through MRO chain
- class Uint64(BaseUint) with BITS = 64 -- from leanSpec
- class Container(SSZModel) -- from leanSpec
Descriptor tests:
- @property getter/setter/deleter
- @classmethod and @staticmethod
- Custom descriptors
Operator overloading tests:
- All dunder methods from BaseUint (add, sub, mul, etc.)
- Fp field arithmetic through operator overloading
- Comparison operators, hashing
Dataclass tests:
- @dataclass(frozen=True, slots=True)
- Field defaults and field() factory
- __post_init__
leanSpec class tests:
- Construct Uint64(42), verify all operators
- Construct Fp(value=7), verify field arithmetic
- Construct Bytes32.zero(), verify methods
- Construct Container subclasses with field access

Acceptance criteria:

C3 MRO matches CPython for all leanSpec class hierarchies.
Operator overloading works for BaseUint, Fp, and BaseBytes.
@dataclass produces correct frozen, slotted classes.
super() works correctly in all leanSpec usage patterns.

Phase 6: Module System

Goal: Import and execute Python modules and packages.

Scope:

import module
from module import name
from module import * (with __all__)
Relative imports (from . import x, from ..sub import y)
Package __init__.py execution
sys.path search and module resolution
Module caching (sys.modules)
__name__, __file__, __package__ attributes
from __future__ import annotations (disable annotation evaluation)
TYPE_CHECKING constant (always False at runtime)
Circular import handling

Testing strategy:

Import resolution tests:
- Absolute imports
- Relative imports at various depths
- Package __init__.py re-exports
- from __future__ import annotations
Module execution tests:
- Module-level code execution
- Module-level constants and Final annotation
- __all__ filtering for import *
Circular import tests:
- Detect and handle circular imports gracefully
- TYPE_CHECKING conditional imports don't trigger at runtime
leanSpec import tests:
- Import lean_spec.types.uint and verify Uint64 is accessible
- Import lean_spec.types.container and verify Container is accessible
- Import lean_spec.subspecs.ssz.hash and verify hash_tree_root works
- Import chain: state.py -> checkpoint.py -> slot.py -> uint.py
- Full leanSpec import graph resolves without errors

Acceptance criteria:

All leanSpec import statements resolve correctly.
Relative imports work within the leanSpec package hierarchy.
Module-level code executes in correct order.

Phase 7: Standard Library

Goal: Implement the standard library modules used by leanSpec.

This phase is split into sub-phases, each adding a group of related modules.

Phase 7a: Core Utilities

functools: singledispatch (with .register), lru_cache, reduce
itertools: accumulate, count, chain
collections: defaultdict, OrderedDict, deque
collections.abc: Iterable, Iterator, Mapping, Sequence, Set, MutableMapping, MutableSequence, MutableSet, Callable, Hashable
typing: Any, ClassVar, Final, Self, Literal, TypeAlias, SupportsInt, SupportsIndex, IO, Protocol, runtime_checkable, TYPE_CHECKING, override, NoReturn, NamedTuple, get_type_hints
abc: ABC, abstractmethod
dataclasses: dataclass, field, fields
enum: Enum, IntEnum
math: ceil, log2, floor, sqrt, inf, nan
copy: copy, deepcopy
operator: itemgetter, attrgetter

Phase 7b: Data Handling

struct: pack, unpack, calcsize
io: BytesIO, StringIO
json: dumps, loads
base64: b64encode, b64decode, b16encode, b16decode
bisect: bisect_left, bisect_right, insort
re: basic regex (if needed)

Phase 7c: Cryptography and Hashing

hashlib: sha256, shake_128, new
hmac: new, digest
secrets: token_bytes, randbelow

Phase 7d: System and OS

os: environ, path, getcwd
sys: path, modules, argv, stdout, stderr, exit
pathlib: Path
logging: Logger, getLogger, basicConfig, handlers
time: time, monotonic, sleep
datetime: datetime, timedelta, timezone
tempfile: NamedTemporaryFile, mkdtemp
signal: signal, SIGINT, SIGTERM
threading: Lock, RLock

Testing strategy per sub-phase:

Unit tests: Each function/class tested independently against CPython.
Conformance tests: Run the same test with both CPython and LeanPython, compare output.
leanSpec integration: After each sub-phase, attempt to import and run progressively more of leanSpec.

Acceptance criteria:

functools.singledispatch works with leanSpec's hash_tree_root pattern.
hashlib.sha256 produces correct digests.
struct.pack/unpack handles little-endian byte encoding.
All standard library functions used in leanSpec produce CPython-identical results.

Phase 8: Pydantic Compatibility Layer

Goal: Implement the subset of Pydantic used by leanSpec.

This is the most challenging phase. Pydantic is deeply integrated into leanSpec -- every container, every type, every validation path uses it.

Scope:

BaseModel with:
- Field declaration via class annotations
- model_config = ConfigDict(frozen=True, extra="forbid", strict=True)
- model_copy(update={...}) (immutable copy with field updates)
- model_fields (field name -> FieldInfo mapping)
- model_dump(mode="json") (serialization to dict)
- __init__ validation (type checking, coercion)
- __eq__, __hash__, __repr__ generation
- __iter__ over field values
Field validators:
- @field_validator("field_name", mode="before"|"after")
- @model_validator(mode="before"|"after")
- @field_serializer("field_name")
Custom type schemas:
- __get_pydantic_core_schema__ hook
- core_schema.union_schema, core_schema.is_instance_schema, core_schema.chain_schema, core_schema.int_schema, core_schema.no_info_plain_validator_function, core_schema.plain_serializer_function_ser_schema
ConfigDict options: frozen, extra, strict, populate_by_name
Alias generation (CamelCase via alias_generator)

Testing strategy:

Model definition tests: Define models, verify field access, immutability, extra field rejection.
Validation tests: Valid/invalid inputs, type coercion, custom validators.
Serialization tests: model_dump output matches CPython/Pydantic output.
model_copy tests: Verify immutable updates produce correct new instances.
leanSpec container tests:
- Define Config, Checkpoint, Slot, ValidatorIndex containers
- Construct, validate, serialize
- model_copy(update={...}) on State (as used in process_slots)
- model_fields iteration (as used in Container.serialize)
- __get_pydantic_core_schema__ for BaseUint types

Acceptance criteria:

All leanSpec container types can be defined and instantiated.
model_copy produces correct immutable updates.
model_fields returns correct field metadata.
Validation rejects invalid inputs as Pydantic does.

Phase 9: leanSpec Integration

Goal: Execute the core leanSpec consensus specification end-to-end.

Scope (incremental):

Phase 9a: Types

Import and use all SSZ types: Uint8..Uint64, Boolean, Bytes32, Fp
Import and use collection types: SSZList, SSZVector, BaseBitlist
Import and use containers: Container base and all field types
SSZ serialization/deserialization round-trip

Phase 9b: SSZ Merkleization

hash_tree_root for all types via singledispatch
merkleize, mix_in_length, pack_bytes, pack_bits
Correct Merkle roots matching CPython output

Phase 9c: Consensus Containers

State, Block, BlockHeader, BlockBody
Checkpoint, Slot, ValidatorIndex
Attestation, AggregatedAttestation, AggregationBits
All container methods

Phase 9d: State Transition

State.process_slots (slot advancement)
State.process_block_header (block header validation)
State.process_attestations (justification/finalization)
State.state_transition (full STF)
State.build_block (block building)

Phase 9e: Fork Choice

Store with block/state management
Store.ancestors generator
Fork choice algorithm

Phase 9f: Cryptographic Subspecs

XMSS signature scheme
Poseidon2 hash function
KoalaBear field arithmetic (already covered by Fp tests)

Testing strategy:

Golden test vectors: For each sub-phase, generate test vectors from CPython/leanSpec and verify LeanPython produces identical results.
State transition tests: Run leanSpec's own tests/lean_spec/ test suite through LeanPython and compare results.
Fixture generation: Run tests/consensus/ fillers and verify the generated JSON fixtures match CPython output byte-for-byte.
Hash comparison: Every hash_tree_root call produces the same Bytes32 as CPython.
End-to-end: Process a sequence of blocks through state_transition and verify final state matches CPython.

Acceptance criteria:

All leanSpec unit tests pass when run through LeanPython.
Generated consensus fixtures match CPython output.
State transition for multi-block sequences produces identical states.

Phase 10: Async Runtime (Optional)

Goal: Support async/await for leanSpec's networking and node code.

Scope:

Event loop (basic asyncio subset)
async def, await, async for, async with
asyncio.create_task, asyncio.gather, asyncio.sleep
asyncio.Queue, asyncio.Event
Task cancellation

This phase is optional for the core consensus spec interpretation but required for the full node implementation (networking, API, validator service).

Testing strategy:

Coroutine tests: Basic async function execution, awaiting results.
Task tests: Concurrent task execution, cancellation.
leanSpec node tests: Run the simplified node startup sequence.

Testing Infrastructure

Test Harness

The test infrastructure supports three modes:

Lean4 Unit Tests (lake test): Native Lean4 tests for interpreter internals (lexer, parser, evaluator).

Python Conformance Tests: A test runner that executes .py test files through both CPython and LeanPython, comparing outputs. Directory structure:

tests/
  conformance/
    expressions/     -- arithmetic, comparisons, boolean ops
    statements/      -- assignments, control flow, imports
    functions/       -- calls, generators, closures
    classes/         -- inheritance, descriptors, operators
    stdlib/          -- standard library module tests
    pydantic/        -- pydantic compatibility tests
    leanspec/        -- extracted leanSpec test cases

leanSpec Test Suite: Run leanSpec's own tests/lean_spec/ tests directly through LeanPython.

Conformance Testing Process

For each phase, conformance tests follow this pattern:

1. Write Python test file (test_foo.py)
2. Run with CPython -> capture output/state
3. Run with LeanPython -> capture output/state
4. Assert outputs match

For deterministic tests (no randomness, no I/O), outputs must match exactly. For non-deterministic tests, structural equivalence is checked.

Continuous Integration

Every PR runs: lake build, lake test, conformance tests for completed phases.
Nightly: Full leanSpec test suite execution (longer running).
Coverage tracking: Which leanSpec files can be fully interpreted.

Progress Tracking

Phase	Description	Status
0	Project scaffolding	Done
1	Lexer	Done
2	Parser	Done
3	Core interpreter (expressions, types)	Done
4	Control flow and functions	Done
5	Object model (classes, inheritance)	Done (5A: core, 5B: dunders/decorators, 5C: C3 MRO/__new__/attr hooks/dataclass)
6	Module system	Done
7a	Stdlib: core utilities	Done (math, copy, functools, abc, dataclasses, operator, collections, itertools, enum, typing)
7b	Stdlib: data handling	Done (struct, io.BytesIO/StringIO, bisect, base64, json, re stub)
7c	Stdlib: cryptography	Done (hashlib sha256/shake_128, hmac, secrets)
7d	Stdlib: system/OS	Done (sys, os/os.path, time, datetime, pathlib, logging, signal stub, threading stub, tempfile stub)
8a	Pydantic: foundation	Done (BaseModel, ConfigDict, model_copy, model_dump, model_fields, frozen, inheritance)
8b	Pydantic: validators, serializers, core_schema	Done (field_validator, model_validator before/after, field_serializer, model_serializer, pydantic_core.core_schema module, evalCoreSchema engine, validator inheritance)
8c	Pydantic: advanced features for leanSpec	Done (extra="forbid" enforcement, alias_generator/to_camel, populate_by_name, model_dump by_alias, ConfigDict MRO inheritance, get_pydantic_core_schema hook, validate_default/arbitrary_types_allowed parsing)
8d-e	Pydantic: remaining advanced features	Not started
9a	leanSpec: types	Done (built-in type subclassing, BaseUint/BaseBytes/Boolean, Container serialize/deserialize, SSZVector/SSZList, BaseBitvector encode/decode round-trip, BaseBitlist with delimiter bit encode/decode, BaseByteList variable-length byte list, f-string support, Pydantic model_fields introspection)
9b	leanSpec: SSZ merkleization	Done (functools.singledispatch with .register, bytes slicing, hash_nodes/pack_bytes/merkleize/mix_in_length pipeline, pack_bits, hash_tree_root for BaseUint/Boolean/BaseBytes/BaseByteList/BaseBitvector/BaseBitlist/SSZVector/SSZList/Container via singledispatch, issubclass with tuple, bytes.join)
9c	leanSpec: consensus containers	In progress (Slot/ValidatorIndex Uint64 subclasses with methods, math.isqrt, Checkpoint nested Container, Config/BlockHeader containers, model_copy on nested containers, default field values, SSZList with Container elements composite hash_tree_root, BlockBody/Block nested Container+SSZList, State.generate_genesis classmethod factory, hash_tree_root verified byte-for-byte against CPython for all consensus containers)
9d	leanSpec: state transition	Not started
9e	leanSpec: fork choice	Not started
9f	leanSpec: cryptographic subspecs	Not started
10	Async runtime (optional)	Not started

Risk Assessment

Risk	Impact	Mitigation
Pydantic complexity	High	Start with minimal subset; FFI escape hatch
Python semantics edge cases	Medium	Extensive conformance testing against CPython
Performance (tree-walking)	Medium	Acceptable for spec interpretation; optimize later if needed
Lean4 library gaps	Low	Use FFI for truly missing functionality
leanSpec evolution	Medium	Pin to specific commit; track upstream changes

Non-Goals

Full CPython compatibility (only the subset used by leanSpec)
Performance parity with CPython
C extension module support
eval()/exec() (not used in leanSpec)
Python 2 compatibility
REPL/interactive mode (initially)
Formal verification of the interpreter (future work)
The networking/API/storage infrastructure in leanSpec (Phase 10 scope)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LeanPython: Python Interpreter in Lean4

Vision

Target Codebase Analysis

Python Features Required (by priority)

What Pydantic Means for Us

Architecture

Design Decisions

Phases

Phase 0: Project Scaffolding

Phase 1: Lexer

Phase 2: Parser

Phase 3: Core Interpreter -- Expressions and Primitive Types

Phase 4: Control Flow and Functions

Phase 5: Object Model -- Classes, Inheritance, Descriptors

Phase 6: Module System

Phase 7: Standard Library

Phase 7a: Core Utilities

Phase 7b: Data Handling

Phase 7c: Cryptography and Hashing

Phase 7d: System and OS

Phase 8: Pydantic Compatibility Layer

Phase 9: leanSpec Integration

Phase 9a: Types

Phase 9b: SSZ Merkleization

Phase 9c: Consensus Containers

Phase 9d: State Transition

Phase 9e: Fork Choice

Phase 9f: Cryptographic Subspecs

Phase 10: Async Runtime (Optional)

Testing Infrastructure

Test Harness

Conformance Testing Process

Continuous Integration

Progress Tracking

Risk Assessment

Non-Goals

FilesExpand file tree

PLAN.md

Latest commit

History

PLAN.md

File metadata and controls

LeanPython: Python Interpreter in Lean4

Vision

Target Codebase Analysis

Python Features Required (by priority)

What Pydantic Means for Us

Architecture

Design Decisions

Phases

Phase 0: Project Scaffolding

Phase 1: Lexer

Phase 2: Parser

Phase 3: Core Interpreter -- Expressions and Primitive Types

Phase 4: Control Flow and Functions

Phase 5: Object Model -- Classes, Inheritance, Descriptors

Phase 6: Module System

Phase 7: Standard Library

Phase 7a: Core Utilities

Phase 7b: Data Handling

Phase 7c: Cryptography and Hashing

Phase 7d: System and OS

Phase 8: Pydantic Compatibility Layer

Phase 9: leanSpec Integration

Phase 9a: Types

Phase 9b: SSZ Merkleization

Phase 9c: Consensus Containers

Phase 9d: State Transition

Phase 9e: Fork Choice

Phase 9f: Cryptographic Subspecs

Phase 10: Async Runtime (Optional)

Testing Infrastructure

Test Harness

Conformance Testing Process

Continuous Integration

Progress Tracking

Risk Assessment

Non-Goals