MemGovern enhances SWE-Agent by injecting governance-aware experience memories into the agent's reasoning loop. When facing a new GitHub issue, the agent retrieves similar past experiences and learns from successful resolution patterns.
🐛 New Issue → 🔍 Memory Retrieval → 📚 Experience Injection → 🧠 Enhanced Reasoning → ✅ Better Patches
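The retrieve → inject loop above can be sketched in a few lines of Python. This is purely illustrative: the function names are hypothetical, and naive keyword overlap stands in for the semantic vector search the real system performs via ChromaDB.

```python
# Illustrative sketch of the MemGovern loop (hypothetical names; the real
# system retrieves via ChromaDB semantic search, not keyword overlap).

def retrieve_memories(issue_text, experience_db, top_k=2):
    """Rank stored experiences by keyword overlap with the new issue."""
    issue_words = set(issue_text.lower().split())
    scored = [
        (len(issue_words & set(exp["bug_description"].lower().split())), exp)
        for exp in experience_db
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [exp for score, exp in scored[:top_k] if score > 0]

def inject_experiences(issue_text, memories):
    """Prepend retrieved fix experiences to the agent's problem statement."""
    context = "\n".join(f"- Past fix: {m['fix_experience']}" for m in memories)
    return f"Relevant experience:\n{context}\n\nIssue:\n{issue_text}"

db = [
    {"bug_description": "null pointer crash in parser",
     "fix_experience": "guard against None input"},
    {"bug_description": "timeout in network retry loop",
     "fix_experience": "cap retry backoff"},
]
prompt = inject_experiences("parser crash on None",
                            retrieve_memories("parser crash on None", db))
```

The enriched prompt then flows into the agent's normal reasoning loop in place of the raw issue text.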
| Model | SWE-Agent | MemGovern (Ours) | Improvement |
|---|---|---|---|
| Claude-4-Sonnet | 66.6% | 69.8% | +3.2 |
| GPT5-Medium | 65.0% | 67.4% | +2.4 |
| DeepSeek-V3.1T | 62.8% | 65.8% | +3.0 |
| Qwen3-235B | 47.2% | 55.4% | +8.2 |
| Kimi-K2-Instruct | 43.8% | 51.8% | +8.0 |
| GPT-4o | 23.2% | 32.6% | +9.4 |
| GPT-4o-Mini | 14.0% | 17.2% | +3.2 |
From "Solving from Scratch" → To "Learning from Experience"
MemGovern/
├── data/ # 📦 Experience DB artifacts (Git LFS)
│ └── agentic_exp_data_1220_13w_DSnewPrompt/
│ ├── experience_data.json
│ └── chroma_db_experience/
├── trajectories/ # 🗂️ Model trajectory archives (Git LFS)
│ ├── gpt4o_*.tar.gz
│ ├── gemini3_pro_trajectory.tar.gz
│ └── ...
├── config/ # ⚙️ SWE-Agent compatible YAML configs
│ ├── benchmarks/ # Benchmark sweep configurations
│ ├── demo/ # Lightweight demo presets
│ ├── human/ # Human study protocols
│ └── exotic/ # Ablation experiment settings
├── tools/ # 🔧 Memory pipeline utilities
│ ├── experience_server.py
│ ├── issue_memory_rag/
│ ├── exp_search/
│ └── ...
├── scripts/ # 📜 Data collection scripts
│ ├── github_scraper.py
│ └── experience_process.py
├── figs/ # 🖼️ Publication-ready figures
└── requirements.txt # 📦 Runtime deps (installs SWE-agent + utilities)
MemGovern is implemented as memory tools + configs on top of SWE-agent. A full run uses two terminals:
- Terminal A: start the Experience Server (vector search + experience lookup)
- Terminal B: run SWE-agent on SWE-bench with a MemGovern config that calls the server tools
Requirements: Linux (or WSL2), Python ≥ 3.11, Git, Docker.
WSL2 note: Windows drives are mounted under `/mnt/` (e.g., `E:\` → `/mnt/e/`).
git clone https://github.com/QuantaAlpha/MemGovern.git
cd MemGovern
python3 -m venv SWE
source SWE/bin/activate
pip install -U pip
pip install -r requirements.txt

The Experience Server needs two artifacts:
- `experience_data.json`: governed experience cards (key → structured fields, including `bug_description`/`fix_experience`)
- `chroma_db_experience/`: a persistent ChromaDB store used for semantic retrieval
In this repository, we provide them under:
`data/agentic_exp_data_1220_13w_DSnewPrompt/` (tracked via Git LFS)
Place them in a directory (example layout):
<EXPERIENCE_DATA_DIR>/
├── experience_data.json
└── chroma_db_experience/
├── chroma.sqlite3
└── <uuid>/
├── data_level0.bin
├── header.bin
├── index_metadata.pickle
├── length.bin
└── link_lists.bin
Notes:
- These artifacts are large; we recommend hosting them via Git LFS or a separate dataset release.
- Retrieval quality depends on using the same embedding model at serving time as was used to build the ChromaDB store.
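Before starting the server, it can be worth verifying the layout matches the tree above. The helper below is a convenience sketch, not part of the MemGovern codebase; it checks only the two top-level artifacts named in this README.

```python
from pathlib import Path

# Expected artifact paths from the layout above; this checker is an
# illustrative convenience, not part of the MemGovern codebase.
EXPECTED = [
    "experience_data.json",
    "chroma_db_experience/chroma.sqlite3",
]

def missing_artifacts(exp_dir):
    """Return the expected artifact paths that are absent under exp_dir."""
    root = Path(exp_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Run it against your `<EXPERIENCE_DATA_DIR>` and an empty list means both artifacts are in place.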
In our internal runs, we keep these files under a folder named `agentic_exp_data_1220_13w_DSnewPrompt/`.
cd <MEMGOVERN_ROOT>/data/agentic_exp_data_1220_13w_DSnewPrompt
source <MEMGOVERN_ROOT>/SWE/bin/activate
export DB_DIR="$PWD/chroma_db_experience"
export JSON_DATA_PATH="$PWD/experience_data.json"
export MODEL_PATH="<PATH_OR_MODEL_ID_FOR_SENTENCE_TRANSFORMERS>"
export HOST="0.0.0.0"
export PORT="9030"
python <MEMGOVERN_ROOT>/tools/experience_server.py

How to confirm it is running
In another shell:
curl -s http://localhost:9030/health

You should also see log lines like:

[TOOL] /search ...
[TOOL] /get_experience ...

when the agent invokes the tools (this is the evidence we use to confirm an end-to-end run).
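Conceptually, the `/get_experience` endpoint resolves a key against the JSON store loaded from `JSON_DATA_PATH` and returns the governed card fields. The sketch below is a simplified stand-in (see `tools/experience_server.py` for the actual implementation; the field filtering shown is an assumption based on the card schema described above):

```python
# Simplified sketch of /get_experience semantics (hypothetical; the real
# handler lives in tools/experience_server.py).

def get_experience(store, key):
    """Look up an experience card by key and return its governed fields."""
    card = store.get(key)
    if card is None:
        return {"error": f"unknown experience key: {key}"}
    # Expose only the structured fields the agent is meant to read.
    return {k: card[k] for k in ("bug_description", "fix_experience") if k in card}
```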
Before running, edit config/dsv31t_agenticMemSearch_1220_13w.yaml and replace:
- `agent.model.api_base`: YOUR_API_BASE
- `agent.model.api_key`: YOUR_API_KEY
cd <MEMGOVERN_ROOT>
source SWE/bin/activate
sweagent run-batch \
--config config/dsv31t_agenticMemSearch_1220_13w.yaml \
--instances.type swe_bench \
--instances.subset verified \
--instances.split test \
--num_workers 12 \
  --instances.shuffle=False

About the config → server wiring
config/dsv31t_agenticMemSearch_1220_13w.yaml sets tool endpoints:
- `GRAPH_EXP_SEARCH_URL`: http://host.docker.internal:9030/search
- `GRAPH_EXP_READ_URL`: http://host.docker.internal:9030/get_experience
This is the recommended setup when SWE-agent runs tasks inside Docker and the Experience Server runs on the host.
After the run finishes, evaluate the produced predictions:
python -m swebench.harness.run_evaluation \
--predictions_path <PATH_TO_PREDS_JSON> \
--dataset_name princeton-nlp/SWE-bench_Verified \
--run_id <RUN_ID> \
  --max_workers 8

The predictions file is typically named `preds.json` under your run's `trajectories/` output directory. If `python -m swebench...` is not available in your environment, install the SWE-bench harness following the official SWE-bench instructions.
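A quick sanity check on the predictions file before evaluation can save a wasted harness run. The snippet below is a sketch, assuming each record carries an `instance_id` and a `model_patch` field (the fields the harness consumes); adjust if your `preds.json` layout differs.

```python
import json

# Sanity-check a predictions file before evaluation. Assumes records carry
# "instance_id" and "model_patch"; adapt if your preds.json differs.

def check_preds(path):
    """Report total predictions and instance_ids with empty patches."""
    with open(path) as f:
        preds = json.load(f)
    # Some tools emit a dict keyed by instance_id, others a flat list.
    records = list(preds.values()) if isinstance(preds, dict) else preds
    empty = [r["instance_id"] for r in records if not r.get("model_patch")]
    return {"total": len(records), "empty_patches": empty}
```

A nonzero `empty_patches` list usually means some runs exited without producing a patch and will score as unresolved.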
Scrape GitHub PR data (metadata + patch + comments):
export GITHUB_TOKEN=your_github_token
python scripts/github_scraper.py \
--csv-path <PATH_TO_INPUT_CSV> \
--output-dir <OUTPUT_DIR> \
  --chunk-size 200

We provide experience_process.py to transform issue/PR/patch fields into governed experience cards using an LLM.
It reads an input parquet table and writes JSONL/parquet with the Experience Card fields.
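The transformation can be sketched as below, with the LLM call stubbed out (here `summarize()` just truncates; the real script prompts an LLM). The two field names follow the Experience Card schema noted earlier; everything else is a hypothetical simplification.

```python
# Sketch of the card-building step in experience_process.py, with the LLM
# summarization stubbed out as truncation (hypothetical simplification).

def summarize(text, limit=80):
    """Stand-in for the LLM summarization step."""
    return text if len(text) <= limit else text[: limit - 3] + "..."

def build_experience_card(issue_title, issue_body, patch):
    """Condense raw issue/PR/patch fields into a governed experience card."""
    return {
        "bug_description": summarize(f"{issue_title}: {issue_body}"),
        "fix_experience": summarize(f"Patch applied: {patch}"),
    }
```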
export API_KEY=your_llm_key
export BASE_URL=your_llm_base_url # optional if using OpenAI default
export MODEL=your_model_name
python scripts/experience_process.py \
--input <INPUT_PARQUET> \
--output <OUTPUT_JSONL_OR_PARQUET> \
--output-format jsonl \
  --max-workers 200

Launch the memory retrieval service (see "Reproducing MemGovern" above). The server reads these env vars:

- `DB_DIR`
- `JSON_DATA_PATH`
- `MODEL_PATH`
- `HOST` (default `0.0.0.0`)
- `PORT` (default `9030`)
| Config | Use Case |
|---|---|
| `config/benchmarks/*.yaml` | Full benchmark sweeps with different governance settings |
| `config/demo/*.yaml` | Quick demos with minimal latency |
| `config/human/*.yaml` | Human evaluation study protocols |
| `config/exotic/*.yaml` | Ablation: windowed replace, late reproduction |
We welcome contributions of all kinds—new configs, tools, bug fixes, or documentation improvements!
- 🐛 Bug Reports: Open an issue
- 💡 New Configs: Add timestamped YAML files under `config/`
- 🔧 New Tools: Extend the `tools/` directory with your utilities
- 📊 Trajectories: Share model runs via Git LFS

Note: Large files (>50 MB) should use Git LFS. Run `git lfs ls-files` before committing.
Special thanks to:
- SWE-Agent - The foundation agent framework
- RepoMaster - Autonomous repository exploration
- SWE-Bench - The evaluation benchmark
- ChromaDB - Vector database for memory retrieval
QuantaAlpha was founded in April 2025 by researchers from Tsinghua University, Peking University, CAS, CMU, HKUST, and more.
🌟 Our mission: Explore the "quantum" of intelligence and pioneer the "alpha" frontier of agent research.
✨ Research Directions:
- CodeAgent: End-to-end autonomous task execution
- DeepResearch: Deep reasoning & retrieval-augmented intelligence
- Agentic RL: Agent-based reasoning and reinforcement learning
- Self-evolution: Multi-agent coordination and learning
🔗 Team Homepage: QuantaAlpha
📧 Email: quantaalpha.ai@gmail.com
⭐ If MemGovern helps your research, please give us a star!
Made with ❤️ by the QuantaAlpha Team



