"Give an AI agent a real project and let it experiment autonomously." — Inspired by karpathy/autoresearch
The original karpathy/autoresearch has an AI agent research neural-network training (nanochat), modifying only train.py and tracking a single metric, val_bpb.
ProjectEvolve extends this idea to any project:
- Any programming language (Python, JavaScript, Go, Rust, ...)
- Any task types (backend, frontend, DevOps, documentation, ...)
- Any files and directories (full freedom of action)
- Cross-platform (Windows, Linux, macOS)
- Knowledge persistence across runs
What it inherits: the agent works autonomously, iteratively improves the project, keeps successful changes, and discards failures.
ProjectEvolve is a universal tool for running an AI agent on any project. The agent autonomously analyzes code, proposes improvements, makes changes, and learns from previous experiments.
- Analyzes — studies project structure, code, documentation
- Proposes — generates improvement ideas
- Implements — makes changes to code/structure/docs
- Tests — ensures nothing breaks
- Accumulates — next iteration sees previous results
- Repeats — cycle continues autonomously
- 🔄 Autonomous experiments — AI independently analyzes, proposes, and implements improvements
- 📚 Knowledge accumulation — each iteration sees previous results, building project knowledge
- ⚡ Universality — works with Python, JavaScript, Go, Rust, and any other technology
- 🎨 Flexible setup — simple questionnaire adapts to project
- 🌐 Cross-platform — Windows, Linux, macOS
- 🔧 Zero maintenance — agent handles everything
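The knowledge-accumulation idea can be sketched in a few lines. The file names match the project-structure section below (`accumulation_context.md`, `output_N.md`); the function itself is an illustrative sketch, not the tool's actual code.

```python
from pathlib import Path

def accumulate_context(project: Path, iteration: int) -> str:
    """Append the latest experiment output to the accumulated context,
    then return the combined context for the next prompt."""
    exp_dir = project / ".autoresearch" / "experiments"
    exp_dir.mkdir(parents=True, exist_ok=True)
    context_file = exp_dir / "accumulation_context.md"
    output_file = exp_dir / f"output_{iteration}.md"

    if output_file.exists():
        result = output_file.read_text(encoding="utf-8")
        with context_file.open("a", encoding="utf-8") as f:
            f.write(f"\n## Experiment {iteration}\n\n{result}\n")

    return context_file.read_text(encoding="utf-8") if context_file.exists() else ""
```

Each iteration appends its output, so iteration N sees everything from iterations 1..N-1.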
┌─────────────────┐ ┌──────────────┐ ┌─────────────┐
│ Your Project │─────▶│ ProjectEvolve│─────▶│ AI Agent │
│ (any language) │ │ (script) │ │ (Claude) │
└─────────────────┘ └──────────────┘ └─────────────┘
│ │
▼ ▼
┌──────────────┐ ┌─────────────┐
│ Configuration│ │ Experiment │
│ .autoresearch│ │ #1, #2, #3 │
└──────────────┘ └─────────────┘
│ │
▼ ▼
┌──────────────┐ ┌─────────────┐
│ Improvements│◀─────│ Context │
│ code/docs │ │ accumulates│
└──────────────┘ └─────────────┘
- 🔍 Analyze — studies project structure, code, documentation
- 💡 Propose — generates improvement ideas
- 🔨 Implement — makes changes to code, structure, documentation
- 🧪 Quality Loop — built-in self-testing with quantitative metrics
- 📊 Evaluate — automatic scoring (0.0-1.0) with pass/fail decisions
- 📝 Document — updates README, creates new documentation
- 🔄 Iterate — each iteration learns from previous ones
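The cycle above can be sketched as a plain loop. Every callable here is a hypothetical stand-in for the real step, and the +0.05 keep threshold mirrors the Quality Loop decision rules described below.

```python
def run_experiments(propose, implement, evaluate, iterations=3, min_gain=0.05):
    """Sketch of the experiment loop. `propose`, `implement`, and `evaluate`
    are illustrative stand-ins for the analyze/propose/implement/score steps."""
    kept, context, baseline = [], [], 0.0
    for i in range(1, iterations + 1):
        idea = propose(context)              # idea generation sees accumulated context
        changes = implement(idea)            # apply the change to the project
        score = evaluate()                   # quality-loop score in [0.0, 1.0]
        if score >= baseline + min_gain:     # keep only clear improvements
            kept.append(changes)
            baseline = score                 # raised baseline for the next iteration
        # a failed change would be rolled back here (e.g. via git)
        context.append((i, idea, score))     # knowledge accumulates across iterations
    return kept, context
```

With scores 0.2, 0.1, 0.4, the loop keeps experiments 1 and 3 and discards experiment 2, since 0.1 does not beat the raised 0.2 baseline.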
| Platform | Support | Installation |
|---|---|---|
| Windows | ✅ Full | autoresearch.bat |
| Linux | ✅ Full | python autoresearch.py |
| macOS | ✅ Full | python autoresearch.py |
- Python 3.10+
- Claude CLI (Anthropic)
- Git (optional)
ProjectEvolve includes a built-in self-testing system inspired by quality gates:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Generate │─────▶│ Apply │─────▶│ Evaluate │
│ Idea │ │ Changes │ │ (Score) │
└──────────────┘ └──────────────┘ └──────┬───────┘
│
▼
┌──────────────┐
│ Decision │
│ KEEP/DISCARD│
└──────────────┘
│
┌─────────────────────┘
│ (if kept)
▼
┌──────────────┐
│ Next Iter. │
└──────────────┘
- Universal — works with Python, JavaScript, Go, Rust, Ruby, Java, any language
- Auto-detect — automatically finds test commands (`npm test`, `pytest`, `cargo test`, etc.)
- Quantitative — scores 0.0-1.0 with pass/fail decisions
- Two-phase — Phase A (base quality, 70% threshold) → Phase B (strict quality, 85% threshold)
- Automatic — runs tests after each experiment, decides to keep or discard changes
# Standalone quality check
python F:/IdeaProjects/autoresearch/utils/quality_loop.py --project /path/to/project
# Custom thresholds
python utils/quality_loop.py --project . --threshold-a 0.7 --threshold-b 0.85
# JSON output for parsing
python utils/quality_loop.py --project . --json

Configuration file .autoresearch/quality.yml is created automatically:
metrics:
  tests:
    enabled: true
    command: ""        # Auto-detect: npm test, pytest, cargo test, etc.
  build:
    enabled: false
    command: ""        # Auto-detect: npm run build, cargo build, etc.
thresholds:
  a:
    min_score: 0.7     # Phase A threshold
    required_checks: ["tests"]
  b:
    min_score: 0.85    # Phase B threshold
    required_checks: ["tests", "build"]

Keep changes if:
- ✅ Score ≥ baseline + 0.05 (improvement)
- ✅ All required checks pass
- ✅ No critical failures
Discard changes if:
- ❌ Score decreased
- ❌ Critical tests fail
- ❌ Violates project constraints
Manual review if:
- ⚠️ Score ≈ baseline (minimal change)
- ⚠️ Some non-critical tests fail
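The three rule sets above reduce to a small decision function. This is a sketch of the documented rules (the +0.05 improvement margin comes from the list above); the function name and signature are illustrative.

```python
def decide(score: float, baseline: float, checks_passed: bool,
           critical_failure: bool, min_gain: float = 0.05) -> str:
    """Apply the keep/discard/review rules: returns 'keep', 'discard',
    or 'review' for one experiment's result."""
    if critical_failure or score < baseline:
        return "discard"                    # regression or critical test failure
    if checks_passed and score >= baseline + min_gain:
        return "keep"                       # clear improvement, all checks green
    return "review"                         # near-baseline or non-critical failures
```

For a baseline of 0.7: a score of 0.80 with passing checks is kept, 0.60 is discarded, and 0.72 (within the 0.05 margin) goes to manual review.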
ProjectEvolve requires Claude Code to run with appropriate permissions.
- ✅ "bypass permissions on" — Recommended! No approvals needed, full autonomy
- ⚠️ Other modes (auto/manual) — may require permission approvals during execution
- ❌ Risk: agent may hang waiting for the user to approve tool usage
ProjectEvolve agent needs these tools to be approved (if not in bypass mode):
Core tools:
- `Edit` — modify files
- `Read` — read file contents
- `Write` — create new files
- `Glob` — find files by pattern
- `Grep` — search file contents
Optional tools:
- `Bash` — execute shell commands (for Quality Loop)
- `Agent` — spawn sub-agents
Option 1: Bypass Mode (Recommended)
{
"permissionMode": "bypass"
}

Option 2: Auto-Approve Safe Tools
{
"permissionMode": "auto",
"autoApproveSafeTools": true,
"alwaysAllowTools": ["Edit", "Read", "Write", "Glob", "Grep"]
}

If the agent hangs during experiment execution:
- Check if a permission prompt is waiting for approval
- Approve the required tool (Edit, Read, Write, etc.)
- Or switch to bypass mode for full autonomy
Note: All scripts have a 30-minute timeout per experiment. If Claude CLI hangs (e.g., on permission prompts), the experiment will time out and continue to the next iteration. Check logs for timeout errors.
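The 30-minute per-experiment timeout described above can be implemented with a plain subprocess call. This is a sketch under stated assumptions: the command and the "ok"/"failed"/"timeout" return values are illustrative, not the scripts' actual interface.

```python
import subprocess

def run_with_timeout(cmd: list[str], timeout_s: int = 30 * 60) -> str:
    """Run one experiment command with a hard timeout. On timeout the
    run is marked 'timeout' and the loop can move to the next iteration."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_s)
        return "ok" if result.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        return "timeout"   # logged, then the next iteration starts
```

Because `subprocess.run` raises `TimeoutExpired` rather than blocking forever, a hung permission prompt inside Claude CLI cannot stall the whole session.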
For AI agent: Read INSTALL.md and configure the system:
# AI agent sets up the environment
# (install Python, Node.js, Claude CLI, create directories)

The AI agent will:
- ✅ Detect OS (Windows/Linux/macOS)
- ✅ Install missing dependencies
- ✅ Create required directories
- ✅ Verify installation
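The detect/create/verify steps can be sketched as follows. The directory names match the project-structure section below; the function itself is an illustrative sketch, not part of INSTALL.md.

```python
import platform
from pathlib import Path

def prepare_environment(project: Path) -> dict:
    """Detect the OS and create the directories the tool expects,
    then verify they exist."""
    info = {"os": platform.system()}   # 'Windows', 'Linux', or 'Darwin'
    for sub in ("experiments", "logs"):
        (project / ".autoresearch" / sub).mkdir(parents=True, exist_ok=True)
    info["dirs_ok"] = all(
        (project / ".autoresearch" / sub).is_dir() for sub in ("experiments", "logs")
    )
    return info
```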
See: INSTALL.md — cross-platform setup instructions for the AI agent.
After environment setup, run the script:
# Basic run (10 iterations, 5 min interval)
python F:/IdeaProjects/autoresearch/autoresearch.py --project /path/to/project
# With parameters
python F:/IdeaProjects/autoresearch/autoresearch.py --project . --iter 50 --timeout 2
# Windows (via bat-file)
F:/IdeaProjects/autoresearch/autoresearch.bat . 50 2

Parameters:
- `--project` — path to your project
- `--iter` — number of iterations (default: 10)
- `--timeout` — interval between iterations in minutes (default: 5)
autoresearch/
├── autoresearch.py # Main script
├── autoresearch.bat # Windows launcher
├── INSTALL.md # Installation guide (for AI)
├── README.md # This file (English main)
├── README_RU.md # Russian version (full)
├── QUICKSTART.md # Quick guide
├── config/
│ ├── default_prompt.md # Agent prompt template
│ └── quality.yml # Quality gate configuration
├── utils/
│ ├── cli_setup.py # Interactive setup
│ └── quality_loop.py # Quality loop implementation
└── .gitignore # Git ignore
your-project/
├── .autoresearch/
│ ├── .autoresearch.json # Project configuration
│ ├── quality.yml # Quality gate configuration (auto-created)
│ ├── experiments/
│ │ ├── prompt_1.md
│ │ ├── output_1.md
│ │ ├── accumulation_context.md # Accumulated context
│ │ ├── last_experiment.md # Last experiment
│ │ ├── changes_log.md # Changes log
│ │ └── summary.json # Final summary
│ └── logs/
│ └── autoresearch.log # Run logs
========================================================================
ProjectEvolve - First Time Setup
========================================================================
Project: /path/to/your-project
Project name: My Awesome App
Short description: Web app for task management
Project goals (one per line):
> Improve performance
> Add tests
> Update documentation
> [Enter]
Constraints (optional):
> Don't change API
> [Enter]
✓ Configuration saved!
{
"name": "My Awesome App",
"description": "Web app for task management",
"goals": [
"Improve performance",
"Add tests",
"Update documentation"
],
"constraints": [
"Don't change API"
],
"tech_stack": ["Python", "FastAPI", "PostgreSQL"],
"focus_areas": ["performance", "testing", "documentation"]
}

# Short form: 3 experiments, 1 minute interval
python F:/IdeaProjects/autoresearch/autoresearch.py . 3 1
# Long form: same as above
python F:/IdeaProjects/autoresearch/autoresearch.py --project . --iter 3 --timeout 1

# 50 experiments, 10-minute interval
python F:/IdeaProjects/autoresearch/autoresearch.py --project . --iter 50 --timeout 10

# Initial configuration
python F:/IdeaProjects/autoresearch/autoresearch.py --project /path/to/project --configure
# Later — run
python F:/IdeaProjects/autoresearch/autoresearch.py --project /path/to/project --iter 10

# Continue from Experiment 25 (after previous session ended at 24)
python F:/IdeaProjects/autoresearch/autoresearch.py --project . --iter 10 --start-from 25
# This will run Experiments 25-34 (10 experiments starting from 25)
# The agent will still see accumulated context from all previous experiments

# Auto-detects next experiment number (if output_1.md exists, starts from 2)
python F:/IdeaProjects/autoresearch/autoresearch.py . 10 1

# Or without --project parameter (uses current directory)
python F:/IdeaProjects/autoresearch/autoresearch.py 10 1

npm install -g @anthropic-ai/claude-code

Install Python 3.10+ and add to PATH.
Increase interval between iterations (--timeout).
python F:/IdeaProjects/autoresearch/autoresearch.py --project . --reconfigure

- 📖 INSTALL.md — Installation guide (for AI agent)
- ⚡ QUICKSTART.md — Quick guide
- 🇷🇺 README_RU.md — Русская версия
Contributions welcome! Create issues and pull requests.
- 🌐 Web UI for experiment monitoring
- 📊 Progress visualization
- 🔔 Completion notifications
- 📈 Metrics and analytics
- 🔄 CI/CD integration
MIT License — freely use in any project.
If you find this project useful, please give it a star on GitHub!
Made with ❤️ for autonomous project research