This is a research platform for execution-grounded prompt evolution.
Framing: evolutionary software optimization, NOT AGI/sentience claims.
Repository is public at NullLabTests/grounded_evolution.
- Type hints on all function signatures and public variables
- No comments unless absolutely necessary (the code should explain itself)
- Max line length: loosely 120 (no hard enforcement, follow existing style)
- Imports: stdlib, then blank line, then third-party, then blank line, then local (stdlib comes first; no `isort`/`ruff` ordering enforced, so be practical)
- Use `Any` from `typing` for dynamic types, never bare generics or omitted annotations
- Prefer `Path` from `pathlib` over `os.path`
- File-level docstrings on every `.py` file
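The conventions above can be illustrated with a small module. This is a minimal sketch, not code from the repository: `load_population` and `DEFAULT_POPULATION_PATH` are hypothetical names chosen only to show the docstring, import grouping, type hints, `Any`, and `Path` rules together.

```python
"""Hypothetical example module illustrating the style conventions above."""

import json
from pathlib import Path
from typing import Any

# Third-party imports would go here, after a blank line (none needed in this sketch).

# Local imports would follow, after another blank line.

# Hypothetical default location; not a path taken from the repository.
DEFAULT_POPULATION_PATH: Path = Path("population.json")


def load_population(path: Path = DEFAULT_POPULATION_PATH) -> list[dict[str, Any]]:
    # A missing file is treated as an empty population (an assumption of this sketch).
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))
```

The comments here exist only to label assumptions; under the project's own rule they would normally be omitted.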
- `generator.py`: LLM code generation, returns `(text, usage_dict)` tuple
- `evaluator/runtime_evaluator.py`: execution-grounded validation (AST, pytest, hidden tests)
- `mutation_engine.py`: prompt mutation/crossover operators
- `mutation.py` / `evaluate.py` / `evolve_forever.py` / `auto_evolve.py`: lexical-only loop (legacy)
- `population_manager.py`: JSON-based population persistence
- `infinite_research_loop.py`: main grounded loop (calls generator → evaluator → population_manager)
- `run_experiment.py`: orchestrated ablation experiments
- `benchmarks/tasks.json`: 3 benchmark definitions with inline hidden test files
- `experiments/`: all experiment output (logs, archives, ablation runs)
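The generator → evaluator → population_manager flow can be sketched as a single iteration. Only the `(text, usage_dict)` return shape comes from the description above; `run_iteration`, the float score, the record layout, and the stand-in `fake_generate` / `fake_evaluate` functions are assumptions for illustration and do not reflect the real module signatures.

```python
"""Sketch of one grounded-loop iteration with stand-in components."""

from typing import Any, Callable

# Assumed shapes: the generator returns (text, usage_dict); the evaluator returns a float.
Generator = Callable[[str], tuple[str, dict[str, Any]]]
Evaluator = Callable[[str], float]


def run_iteration(
    prompt: str,
    generate: Generator,
    evaluate: Evaluator,
    population: list[dict[str, Any]],
) -> dict[str, Any]:
    # Generator step: produce code text plus usage metadata.
    text, usage = generate(prompt)
    # Evaluator step: assumed here to reduce to a single score.
    score = evaluate(text)
    record = {"prompt": prompt, "score": score, "usage": usage}
    # Population step: an in-memory list stands in for JSON persistence.
    population.append(record)
    return record


def fake_generate(prompt: str) -> tuple[str, dict[str, Any]]:
    # Stand-in for an LLM call; makes no network request.
    return f"# code for: {prompt}", {"total_tokens": 0}


def fake_evaluate(text: str) -> float:
    # Stand-in for execution-grounded validation.
    return 1.0 if text.startswith("#") else 0.0
```

Passing the components as callables keeps the sketch testable without an API key.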
- Lexical loop (`evaluate.py` / `evolve_forever.py`): keyword-matching fitness. Currently at 218 prompts, best score 1000/1000. Less important now.
- Grounded loop (`infinite_research_loop.py` / `generator.py` / `runtime_evaluator.py`): real code execution fitness. This is the primary focus.
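For contrast with execution-grounded fitness, a keyword-matching score might look like the sketch below. This is an assumption about the general technique, not the logic in `evaluate.py`: the function name `keyword_fitness`, the keyword list, and the linear scaling to a 1000-point maximum are all hypothetical.

```python
"""Sketch of a keyword-matching fitness function."""


def keyword_fitness(prompt: str, keywords: list[str], max_score: int = 1000) -> int:
    # With no keywords there is nothing to match, so the score is zero.
    if not keywords:
        return 0
    lowered = prompt.lower()
    # Count keywords that appear anywhere in the prompt, case-insensitively.
    hits = sum(1 for keyword in keywords if keyword.lower() in lowered)
    # Scale the hit fraction to the assumed 0..max_score range.
    return round(max_score * hits / len(keywords))
```

A score like this rewards wording alone, which is why the grounded loop, which actually runs the generated code, is the primary focus.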
- `LLM_API_KEY`: required for grounded loop
- `LLM_MODEL`: model name (default: `mistral-large-latest`)
- `LLM_BASE_URL`: API base URL (default: `https://api.mistral.ai/v1`)
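One way to read these variables with their documented defaults is sketched below. The variable names and default values come from the list above; `LLMConfig` and `load_llm_config` are hypothetical helpers, and the repository may read the environment differently.

```python
"""Sketch of reading the LLM settings from the environment."""

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LLMConfig:
    api_key: str
    model: str
    base_url: str


def load_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    # LLM_API_KEY has no default: the grounded loop cannot run without it.
    api_key = env.get("LLM_API_KEY", "")
    if not api_key:
        raise RuntimeError("LLM_API_KEY is required for the grounded loop")
    return LLMConfig(
        api_key=api_key,
        model=env.get("LLM_MODEL", "mistral-large-latest"),
        base_url=env.get("LLM_BASE_URL", "https://api.mistral.ai/v1"),
    )
```

Accepting a mapping instead of reading `os.environ` directly lets the function be exercised with a plain dict, for example `load_llm_config({"LLM_API_KEY": "test-key"})`.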
- No test suite for the project itself yet (TODO for future)
- Hidden benchmark tests live in `benchmarks/tasks.json` as `hidden_test_files` dict
- Rust-based tests (`cargo test`) exist in the `generated_projects/` output (not our code)
- Auto-commits on score improvement from the grounded loop
- Manual commits for structural changes (new features, refactors, docs)
- Commit messages: concise, descriptive, no emoji
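The auto-commit rule can be sketched as a guard around two standard git commands. This is an illustration under assumptions, not the loop's actual implementation: the function names, the strict "greater than" comparison, and the message wording are all hypothetical.

```python
"""Sketch of committing only when the best score improves."""

import subprocess
from pathlib import Path


def should_commit(new_score: float, best_score: float) -> bool:
    # Assumption: only a strict improvement triggers a commit.
    return new_score > best_score


def commit_message(new_score: float, best_score: float) -> str:
    # Concise, descriptive, no emoji, per the commit message rule.
    return f"Improve best grounded score from {best_score:.3f} to {new_score:.3f}"


def auto_commit(repo: Path, new_score: float, best_score: float) -> bool:
    if not should_commit(new_score, best_score):
        return False
    # Stage everything, then commit; check=True raises if git fails.
    subprocess.run(["git", "add", "-A"], cwd=repo, check=True)
    subprocess.run(
        ["git", "commit", "-m", commit_message(new_score, best_score)],
        cwd=repo,
        check=True,
    )
    return True
```

Separating the decision from the git calls makes the rule easy to test without touching a repository.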
- Check if the feature already exists (grep for related terms)
- Follow the existing pattern (if it's a mutation, add to `mutation_engine.py`)
- Type hints everywhere
- Add the new feature to `run_experiment.py` if it's an experimental variable
- Update EXPERIMENT_DESIGN.md if the experiment protocol changes