🧠 Structured self-evolving biomedical multi-hop reasoning with adaptive retrieval
SSE-Bio is an agent framework for biomedical multi-hop question answering. It is designed for settings where a model must resolve intermediate entities, retrieve supporting evidence only when needed, and refine its reasoning process without drifting into unconstrained prompt rewriting.
- Structured self-evolution rather than free-form workflow rewriting
- Adaptive retrieval over knowledge triplets and prior templates
- Proxy-only training with SFT -> GRPO
- Biomedical multi-hop QA support for BioHopR, MedHop, and HLE: Biomedicine
SSE-Bio is built around four components:
| Component | Responsibility |
|---|---|
| Manager | Maintains the structured state summary as short-term memory, reuses template memory as long-term memory, and converts the current state into a query-specific plan |
| Proxy | Explicitly controls retrieval by deciding whether knowledge triplets and/or prior templates should be retrieved at the current step |
| Execution (Dev) | Executes the current plan with the retrieved evidence, and produces the current reasoning trajectory and answer candidate |
| Critic | Assesses whether the trajectory and answer are coherent and sufficiently supported, and returns structured feedback for refinement |
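
The interaction between the four components can be sketched roughly as the loop below. This is a minimal illustration, not the code in `sse_bio/system.py`: the `State` class, the function names, and every stubbed rule are hypothetical stand-ins chosen only to show the order in which the roles act.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Hypothetical structured state summary (short-term memory)."""
    question: str
    resolved: list = field(default_factory=list)  # intermediate entities found so far
    feedback: str = ""                            # latest Critic feedback


def manager(state):
    # Convert the current state into a query-specific plan (stubbed).
    return f"resolve next hop for: {state.question}"


def proxy(state):
    # Decide what to retrieve at this step (a fixed rule here, not a trained policy).
    return {"triplets": not state.resolved, "templates": bool(state.feedback)}


def execution(plan, evidence):
    # Produce a reasoning trajectory and an answer candidate (stubbed).
    answer = evidence[0] if evidence else None
    return [plan], answer


def critic(trajectory, answer):
    # Return structured feedback; accept only answers backed by evidence (stubbed).
    return {"accept": answer is not None, "note": "" if answer else "no evidence"}


def run(question, retrieve, max_steps=3):
    state = State(question)
    for _ in range(max_steps):
        plan = manager(state)
        action = proxy(state)
        evidence = retrieve(question) if action["triplets"] else []
        trajectory, answer = execution(plan, evidence)
        verdict = critic(trajectory, answer)
        if verdict["accept"]:
            return answer
        state.feedback = verdict["note"]  # update the state only; the scaffold is unchanged
    return None
```

Calling `run("q", lambda q: ["diabetes"])` returns the first retrieved item, while a retriever that returns nothing exhausts `max_steps` and yields `None`.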
- Structural constraints prevent self-evolution from drifting into unconstrained prompt mutation.
- Knowledge triplets ground each step in biomedical evidence.
- Prior templates store reusable reasoning guidance rather than factual shortcuts.
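
To make the grounding idea concrete, here is a small sketch of multi-hop resolution over triplets. The `(head, relation, tail)` layout, the relation names, and the toy entities are assumptions for illustration; they are not the schema used by `sse_bio/triplet_store.py` or real biomedical data.

```python
from collections import defaultdict

# Toy triplets; entities and relations are illustrative placeholders.
TRIPLETS = [
    ("drug_A", "associated_with", "phenotype_X"),
    ("phenotype_X", "related_to", "disease_1"),
    ("phenotype_X", "related_to", "disease_2"),
]


def build_index(triplets):
    """Index tails by (head, relation) for constant-time hop lookups."""
    index = defaultdict(list)
    for head, relation, tail in triplets:
        index[(head, relation)].append(tail)
    return index


def follow(index, start, relations):
    """Follow a chain of relations from `start` and return the sorted end entities."""
    frontier = {start}
    for relation in relations:
        frontier = {t for e in frontier for t in index.get((e, relation), [])}
    return sorted(frontier)
```

A two-hop query such as `follow(build_index(TRIPLETS), "drug_A", ["associated_with", "related_to"])` resolves the intermediate phenotype first and then collects the diseases linked to it.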
The key idea is local repair. Instead of rewriting the whole reasoning scaffold after a failure, SSE-Bio revises only the current state, the routing decision, or a template-level constraint.
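
One way to picture local repair is as an update that may touch only a whitelisted part of the scaffold. The sketch below assumes the scaffold is a plain dictionary; the field names (`state`, `routing`, `template_constraint`, `plan_schema`) are hypothetical and do not mirror the operators in `sse_bio/structure.py`.

```python
import copy

# Fields that a local repair is allowed to change (names are illustrative).
REPAIRABLE = ("state", "routing", "template_constraint")


def local_repair(scaffold, target, new_value):
    """Return a copy of `scaffold` with exactly one repairable field replaced."""
    if target not in REPAIRABLE:
        raise ValueError(f"{target!r} is not a locally repairable field")
    repaired = copy.deepcopy(scaffold)
    repaired[target] = new_value
    return repaired


scaffold = {
    "plan_schema": ["resolve entity", "retrieve", "answer"],  # fixed structure
    "state": {"resolved": []},
    "routing": "none",
    "template_constraint": None,
}
repaired = local_repair(scaffold, "routing", "triplets")
```

Everything outside the targeted field is carried over unchanged, and an attempt to rewrite `plan_schema` is rejected rather than applied.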
Only the Proxy is trained. The Manager, Execution, Critic, retrievers, and reasoning environment remain fixed.
The Proxy is first initialized with supervised fine-tuning (SFT) on retrieval-decision pseudo-labels. For a given structured state, the system compares alternative retrieval branches and uses the action with the highest downstream composite reward as the supervision target.
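
The pseudo-labeling step can be sketched as an argmax over retrieval branches. The four-way action encoding and the `rollout_reward` callback below are assumptions made for this example; the actual export logic in `sse_bio/training/` may differ.

```python
# Hypothetical encoding of the "triplets and/or templates" decision.
ACTIONS = ("none", "triplets", "templates", "both")


def pseudo_label(state, rollout_reward, actions=ACTIONS):
    """Score every retrieval branch from the same state and keep the best one.

    `rollout_reward(state, action)` is assumed to run the fixed pipeline with
    that retrieval action and return its downstream composite reward.
    """
    rewards = {a: rollout_reward(state, a) for a in actions}
    best = max(actions, key=lambda a: rewards[a])  # ties go to the earlier action
    return {"state": state, "label": best, "rewards": rewards}


# Toy reward table standing in for real rollouts.
fake_rewards = {"none": 0.1, "triplets": 0.8, "templates": 0.3, "both": 0.8}
example = pseudo_label("state-0", lambda s, a: fake_rewards[a])
```

Breaking ties toward the earlier action is a design choice in this sketch: it prefers the cheaper retrieval option when two branches score the same.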
The Proxy is then refined with Group Relative Policy Optimization (GRPO) over decision-contrastive trajectory groups. Alternative retrieval actions are expanded from the same structured state, partially pruned by an intermediate answer-grounded reward, and then optimized relative to one another within the group.
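
The group-relative part of this step can be illustrated with the commonly used GRPO normalization, where each branch's reward is standardized against the other branches expanded from the same state. The pruning rule, the `intermediate` field name, and the use of a population standard deviation are assumptions of this sketch, not details confirmed by the repository.

```python
from statistics import mean, pstdev


def prune(group, keep):
    """Keep the `keep` branches with the highest intermediate reward."""
    ranked = sorted(group, key=lambda b: b["intermediate"], reverse=True)
    return ranked[:keep]


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four branches expanded from the same structured state (toy numbers).
group = [
    {"action": "none", "intermediate": 0.1, "final": 0.0},
    {"action": "triplets", "intermediate": 0.7, "final": 1.0},
    {"action": "templates", "intermediate": 0.2, "final": 0.0},
    {"action": "both", "intermediate": 0.6, "final": 0.0},
]
kept = prune(group, keep=2)
advantages = group_relative_advantages([b["final"] for b in kept])
```

Because advantages are computed within the group, a branch is rewarded only for doing better than its siblings from the same state, which is what makes the comparison decision-contrastive.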
The training reward combines:
- final answer correctness
- evidence-supported reasoning behavior
This encourages retrieval decisions that are both effective and grounded.
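
As a rough illustration, the two signals could be merged into one scalar with a weighted sum. The weights, the step-level notion of "supported", and the function name are hypothetical; the reward actually used for training may be defined differently.

```python
def composite_reward(answer_correct, supported_steps, total_steps,
                     w_answer=0.7, w_evidence=0.3):
    """Weighted sum of answer correctness and the share of evidence-supported steps.

    The weights are placeholder values chosen for this example.
    """
    evidence_score = supported_steps / total_steps if total_steps else 0.0
    return w_answer * float(answer_correct) + w_evidence * evidence_score
```

Under these assumed weights, a correct answer whose reasoning steps are only half supported scores lower than a correct and fully supported one, so the Proxy is not rewarded for answers that happen to be right without evidence.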
SSE-Bio includes evaluation entrypoints for:
- BioHopR
- MedHop
- Humanity's Last Exam: Biomedicine

Install dependencies and activate the environment:

    uv sync
    source .venv/bin/activate

Two configs are included:

- config.toml.example: default full configuration
- config.opensource.toml: open-source runnable configuration

Run a single question:

    python run_sse_bio.py run \
      "Name all diseases related to a phenotype associated with a given drug." \
      --triplets-path path/to/biomedical_triplets.jsonl \
      --config config.opensource.toml

Evaluate on BioHopR:

    python run_biohopr_eval.py evaluate data/biohopr_bundle \
      --triplets-path path/to/biomedical_triplets.jsonl \
      --config config.opensource.toml \
      --output-path outputs/biohopr_eval.jsonl

Build SFT data:

    python run_proxy_sft.py build-data data/biohopr_bundle \
      --split train \
      --output-path data/proxy_train.jsonl

Train SFT:

    python run_proxy_sft.py train data/proxy_train.jsonl \
      --model Qwen/Qwen2.5-72B-Instruct \
      --output-dir outputs/proxy_sft

Build GRPO data:

    python run_proxy_grpo.py build-data data/biohopr_bundle \
      --split train \
      --output-path data/proxy_grpo.jsonl

Train GRPO:

    python run_proxy_grpo.py train data/proxy_grpo.jsonl \
      --model outputs/proxy_sft \
      --output-dir outputs/proxy_grpo

| Path | Purpose |
|---|---|
| sse_bio/ | Core package |
| sse_bio/system.py | End-to-end inference loop |
| sse_bio/agents.py | Manager, proxy, execution, and critic wrappers |
| sse_bio/structure.py | Structured controller and local update operators |
| sse_bio/experience_manager.py | Prior template retrieval and persistence |
| sse_bio/triplet_store.py | Biomedical triplet ingestion and retrieval |
| sse_bio/training/ | Proxy SFT, GRPO, rewards, and training-data export |
| sse_bio/eval/ | Benchmark runners and metrics |
| scripts/data/ | Dataset download helpers |
| scripts/hpc/ | Generic cluster launch scripts for proxy training |

If you use SSE-Bio in academic work, please cite the corresponding paper.
