3 changes: 3 additions & 0 deletions CLAUDE.md
@@ -75,11 +75,14 @@ python scripts/visualizer.py --path examples/function_minimization/openevolve_ou

5. **Iteration (`openevolve/iteration.py`)**: Worker process that samples from islands, generates mutations via LLM, evaluates programs, and stores artifacts.

6. **Repair Subagent (`openevolve/evaluator.py`)**: When an evaluator raises `EvaluatorRepairRequest` (e.g. on compilation failure), the evaluator asks a dedicated LLM ensemble to fix the code and re-evaluates it. Configured via `EvaluatorConfig.repair_on_failure`, `max_repair_attempts`, and `repair_diff_based`. Uses `repair_models` from `LLMConfig` (falls back to `evaluator_models` then `models`). Repair history is stored as artifacts.

### Key Architectural Patterns

- **Island-Based Evolution**: Multiple populations evolve separately with periodic migration
- **MAP-Elites**: Maintains diversity by mapping programs to feature grid cells
- **Artifact System**: Side-channel for programs to return debugging data, stored as JSON or files
- **LLM Repair Loop**: Evaluators can raise `EvaluatorRepairRequest` to trigger LLM-based code repair before discarding broken programs
- **Process Worker Pattern**: Each iteration runs in fresh process with database snapshot
- **Double-Selection**: Programs for inspiration differ from those shown to LLM
- **Lazy Migration**: Islands migrate based on generation counts, not iterations
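
The LLM Repair Loop pattern above can be sketched in miniature. Helper names here are hypothetical stand-ins; the real implementation lives in `openevolve/evaluator.py`:

```python
# Simplified sketch of the repair loop; EvaluatorRepairRequest is a local
# stand-in for openevolve.evaluation_result.EvaluatorRepairRequest, and
# repair_llm stands in for the configured repair ensemble.

class EvaluatorRepairRequest(Exception):
    def __init__(self, message, broken_code, fallback_metrics=None):
        super().__init__(message)
        self.broken_code = broken_code
        self.fallback_metrics = fallback_metrics or {"combined_score": 0.0}


def evaluate_with_repair(evaluate, repair_llm, code, max_attempts=2):
    """Run the evaluator; on EvaluatorRepairRequest, let the LLM patch the
    code and retry, falling back to penalty metrics when attempts run out."""
    try:
        return evaluate(code), code
    except EvaluatorRepairRequest as req:
        for _ in range(max_attempts):
            code = repair_llm(req.broken_code, str(req))  # LLM proposes a fix
            try:
                return evaluate(code), code  # store the repaired version
            except EvaluatorRepairRequest as retry_req:
                req = retry_req  # still broken; try again with new context
        return req.fallback_metrics, code  # give up: penalty metrics
```

The key design point is the return value: when repair succeeds, the repaired code (not the broken original) is what gets recorded, so future evolution branches from working programs.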
41 changes: 41 additions & 0 deletions README.md
@@ -468,6 +468,9 @@ evaluator:
enable_artifacts: true # Error feedback to LLM
cascade_evaluation: true # Multi-stage testing
use_llm_feedback: true # AI code quality assessment
repair_on_failure: true # LLM repair on EvaluatorRepairRequest
max_repair_attempts: 2 # Retry limit per broken program
repair_diff_based: false # true=SEARCH/REPLACE diffs, false=full rewrite

prompt:
# Sophisticated inspiration system
@@ -720,6 +723,44 @@ return EvaluationResult(

This creates a **feedback loop** where each generation learns from previous mistakes!
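
The metrics-plus-artifacts shape of that feedback loop can be sketched with a minimal stand-in class (field names assumed from this diff; the real `EvaluationResult` lives in `openevolve/evaluation_result.py`):

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class EvaluationResult:
    # Minimal stand-in: numeric scores plus a free-form artifact side-channel.
    metrics: Dict[str, float]
    artifacts: Dict[str, Union[str, bytes]] = field(default_factory=dict)


def evaluate(program_output: str, stderr: str) -> EvaluationResult:
    # Toy evaluator: score the program, but also ship its stderr back so
    # the next generation's prompt can include the failure details.
    score = 1.0 if program_output == "42" else 0.0
    return EvaluationResult(
        metrics={"combined_score": score},
        artifacts={"stderr": stderr},
    )
```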

### LLM-Based Code Repair

When evolved code has a correctable error (e.g. a compilation failure), your evaluator can raise `EvaluatorRepairRequest` to trigger an automatic LLM repair attempt instead of discarding the program:

```python
from openevolve.evaluation_result import EvaluatorRepairRequest

def evaluate(program_path):
result = compile(program_path)
if result.returncode != 0:
with open(program_path) as f:
code = f.read()
raise EvaluatorRepairRequest(
message="Compilation failed",
broken_code=code,
repair_context=result.stderr,
language="cpp",
fallback_metrics={"combined_score": 0.0}, # used if repair fails
)
# ... normal evaluation ...
```

Enable repair in your config:

```yaml
evaluator:
repair_on_failure: true
max_repair_attempts: 2
repair_diff_based: false # true for SEARCH/REPLACE diffs, false for full rewrite

llm:
repair_models: # optional — falls back to evaluator_models, then models
- name: "your-repair-model"
weight: 1.0
```

Repair history is stored in program artifacts and displayed in the visualizer.
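
The model fallback noted in the config comment above reduces to a simple chain, sketched here (the real logic lives in `LLMConfig.rebuild_models` in `openevolve/config.py`):

```python
def resolve_repair_models(repair_models, evaluator_models, models):
    # Fallback chain: repair_models -> evaluator_models -> models.
    # An empty list is falsy, so the first non-empty ensemble wins.
    return list(repair_models or evaluator_models or models)
```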

## Visualization

**Real-time evolution tracking** with interactive web interface:
24 changes: 22 additions & 2 deletions openevolve/config.py
@@ -109,9 +109,13 @@ class LLMConfig(LLMModelConfig):
# n-model configuration for evolution LLM ensemble
models: List[LLMModelConfig] = field(default_factory=list)

- # n-model configuration for evaluator LLM ensemble
+ # n-model configuration for evaluator LLM ensemble (LLM feedback scoring)
evaluator_models: List[LLMModelConfig] = field(default_factory=lambda: [])

# n-model configuration for repair LLM ensemble.
# Falls back to evaluator_models (then models) when not set.
repair_models: List[LLMModelConfig] = field(default_factory=lambda: [])

# Backwards compatibility with primary_model(_weight) options
primary_model: str = None
primary_model_weight: float = None
@@ -184,7 +188,7 @@ def __post_init__(self):

def update_model_params(self, args: Dict[str, Any], overwrite: bool = False) -> None:
"""Update model parameters for all models"""
- for model in self.models + self.evaluator_models:
+ for model in self.models + self.evaluator_models + self.repair_models:
for key, value in args.items():
if overwrite or getattr(model, key, None) is None:
setattr(model, key, value)
@@ -194,6 +198,7 @@ def rebuild_models(self) -> None:
# Clear existing models lists
self.models = []
self.evaluator_models = []
self.repair_models = []

# Re-run model generation logic from __post_init__
if self.primary_model:
@@ -220,6 +225,10 @@ def rebuild_models(self) -> None:
if not self.evaluator_models:
self.evaluator_models = self.models.copy()

# If no repair models are defined, fall back to evaluator_models
if not self.repair_models:
self.repair_models = self.evaluator_models.copy()

# Update models with shared configuration values
shared_config = {
"api_base": self.api_base,
@@ -383,6 +392,17 @@ class EvaluatorConfig:
enable_artifacts: bool = True
max_artifact_storage: int = 100 * 1024 * 1024 # 100MB per program

# LLM-based repair on EvaluatorRepairRequest
# When a user evaluator raises EvaluatorRepairRequest (e.g. on compile
# failure) OpenEvolve will ask the LLM to fix the code and re-evaluate,
# storing the repaired version in the database rather than the broken
# original.
repair_on_failure: bool = False
max_repair_attempts: int = 2
# True → ask the LLM for SEARCH/REPLACE diffs (uses repair_diff_user template)
# False → ask the LLM for a full rewrite (uses repair_full_rewrite_user template)
repair_diff_based: bool = False


@dataclass
class EvolutionTraceConfig:
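The `repair_diff_based` flag above selects between full rewrites and SEARCH/REPLACE diffs. A minimal sketch of applying such a diff, assuming the common `<<<<<<< SEARCH` / `=======` / `>>>>>>> REPLACE` block convention (the actual repair templates referenced in the comments may use different delimiters):

```python
import re

def apply_search_replace(source: str, diff: str) -> str:
    """Apply SEARCH/REPLACE blocks from an LLM repair response.
    Assumes the common 7-character delimiter convention; a sketch only."""
    pattern = re.compile(
        r"<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)\n>>>>>>> REPLACE",
        re.DOTALL,  # blocks may span multiple lines
    )
    for search, replace in pattern.findall(diff):
        source = source.replace(search, replace)
    return source
```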
5 changes: 5 additions & 0 deletions openevolve/controller.py
@@ -112,6 +112,9 @@ def __init__(
for model_cfg in self.config.llm.evaluator_models:
if not hasattr(model_cfg, "random_seed") or model_cfg.random_seed is None:
model_cfg.random_seed = llm_seed
for model_cfg in self.config.llm.repair_models:
if not hasattr(model_cfg, "random_seed") or model_cfg.random_seed is None:
model_cfg.random_seed = llm_seed

logger.info(f"Set random seed to {self.config.random_seed} for reproducibility")
logger.debug(f"Generated LLM seed: {llm_seed}")
@@ -139,6 +142,7 @@ def __init__(
# Initialize components
self.llm_ensemble = LLMEnsemble(self.config.llm.models)
self.llm_evaluator_ensemble = LLMEnsemble(self.config.llm.evaluator_models)
self.llm_repair_ensemble = LLMEnsemble(self.config.llm.repair_models)

self.prompt_sampler = PromptSampler(self.config.prompt)
self.evaluator_prompt_sampler = PromptSampler(self.config.prompt)
@@ -158,6 +162,7 @@ def __init__(
self.evaluator_prompt_sampler,
database=self.database,
suffix=Path(self.initial_program_path).suffix,
repair_llm_ensemble=self.llm_repair_ensemble,
)
self.evaluation_file = evaluation_file

2 changes: 1 addition & 1 deletion openevolve/database.py
@@ -1435,7 +1435,7 @@ def _sample_from_island_weighted(self, island_id: int) -> Program:
Parent program selected using fitness-weighted sampling
"""
island_id = island_id % len(self.islands)
- island_programs = list(self.islands[island_id])
+ island_programs = sorted(self.islands[island_id])

if not island_programs:
# Island is empty, fall back to any available program
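The one-line `list` → `sorted` change above is a reproducibility fix: Python sets iterate in hash order, which can differ between runs, so seeded sampling over `list(...)` of a set was not deterministic. A minimal illustration (names hypothetical):

```python
import random

def sample_parent(island_program_ids, seed):
    # Sorting first pins the iteration order of the set, so a seeded RNG
    # picks the same parent on every run regardless of hash randomization.
    rng = random.Random(seed)
    return rng.choice(sorted(island_program_ids))
```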
47 changes: 46 additions & 1 deletion openevolve/evaluation_result.py
@@ -4,7 +4,52 @@

import json
from dataclasses import dataclass, field
- from typing import Dict, Union
+ from typing import Dict, Optional, Union


class EvaluatorRepairRequest(Exception):
"""
Raised by a user evaluator to request an LLM-based code repair attempt.

Raise this instead of returning a zero score when the generated code has a
correctable error (e.g. a compilation failure). OpenEvolve will attempt to
repair the code using the configured LLM before recording it in the database,
so that future evolution branches from working code rather than the broken
original.

Args:
message: Human-readable error description (shown in repair history
and logged).
broken_code: The full source that failed. Must be the complete file,
not just the error region, so the repair LLM has full
context.
repair_context: Optional extra information for the repair prompt (e.g.
full compiler stderr, runtime traceback). Defaults to
the same text as *message*.
language: Source-language identifier used in the prompt code fence
(e.g. ``"cpp"``, ``"python"``). Defaults to
``"python"``.
fallback_metrics: Metrics dict to use if repair is disabled or all repair
attempts are exhausted. Should include all feature
dimensions required by the MAP-Elites database, set to
appropriate penalty values, plus ``combined_score: 0.0``.
When ``None``, a minimal ``{"combined_score": 0.0}`` is
used.
"""

def __init__(
self,
message: str,
broken_code: str,
repair_context: str = "",
language: str = "python",
fallback_metrics: Optional[Dict[str, float]] = None,
) -> None:
super().__init__(message)
self.broken_code = broken_code
self.repair_context = repair_context or message
self.language = language
self.fallback_metrics: Dict[str, float] = fallback_metrics or {"combined_score": 0.0}


@dataclass
Expand Down