SSE-Bio

🧠 Structured self-evolving biomedical multi-hop reasoning with adaptive retrieval

SSE-Bio is an agent framework for biomedical multi-hop question answering. It is designed for settings where a model must resolve intermediate entities, retrieve supporting evidence only when needed, and refine its reasoning process without drifting into unconstrained prompt rewriting.

✨ Highlights

Structured self-evolution rather than free-form workflow rewriting
Adaptive retrieval over knowledge triplets and prior templates
Proxy-only training in two stages: SFT, then GRPO
Biomedical multi-hop QA support for BioHopR, MedHop, and Humanity's Last Exam (HLE): Biomedicine

🧩 Method at a Glance

SSE-Bio is built around four components:

Component	Responsibility
`Manager`	Maintains the structured state summary as short-term memory, reuses template memory as long-term memory, and converts the current state into a query-specific plan
`Proxy`	Explicitly controls retrieval by deciding whether knowledge triplets and/or prior templates should be retrieved at the current step
`Execution (Dev)`	Executes the current plan with the retrieved evidence, and produces the current reasoning trajectory and answer candidate
`Critic`	Assesses whether the trajectory and answer are coherent and sufficiently supported, and returns structured feedback for refinement
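As an illustration of the Proxy's role, the retrieval decision can be modeled as a pair of on/off switches, one per evidence source. This is a minimal sketch, not the repository's API: the `RetrievalDecision` class, its field names, and `ACTION_SPACE` are hypothetical names introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RetrievalDecision:
    """Hypothetical Proxy output: which sources to query at the current step."""

    use_triplets: bool   # retrieve knowledge triplets?
    use_templates: bool  # retrieve prior templates?

    def sources(self) -> list[str]:
        """List the sources selected by this decision, in a fixed order."""
        selected = []
        if self.use_triplets:
            selected.append("triplets")
        if self.use_templates:
            selected.append("templates")
        return selected


# Every on/off combination of the two sources: none, templates, triplets, both.
ACTION_SPACE = [RetrievalDecision(t, p) for t, p in product([False, True], repeat=2)]
```

Under this assumption the Proxy chooses among four actions per step, including the option to retrieve nothing.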

Core Commitments

Structural constraints prevent self-evolution from drifting into unconstrained prompt mutation.
Knowledge triplets ground each step in biomedical evidence.
Prior templates store reusable reasoning guidance rather than factual shortcuts.
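To make the grounding idea concrete, here is a small sketch of how knowledge triplets could support a two-hop lookup. It does not reflect the actual `sse_bio/triplet_store.py` implementation: the `TripletIndex` class, the relation names, and the placeholder entities are all assumptions chosen for illustration.

```python
from collections import defaultdict


class TripletIndex:
    """Toy in-memory index over (head, relation, tail) triplets (hypothetical)."""

    def __init__(self, triplets):
        self._by_head = defaultdict(list)
        for head, relation, tail in triplets:
            self._by_head[head].append((relation, tail))

    def neighbors(self, head, relation):
        """Return the tails reachable from `head` via `relation`."""
        return [t for r, t in self._by_head.get(head, []) if r == relation]

    def two_hop(self, start, rel1, rel2):
        """Follow rel1 then rel2 from `start`, keeping first-seen order."""
        results = []
        for mid in self.neighbors(start, rel1):
            for end in self.neighbors(mid, rel2):
                if end not in results:
                    results.append(end)
        return results


# Placeholder entities only; these are not real biomedical facts.
TOY_TRIPLETS = [
    ("drug_A", "associated_with", "phenotype_X"),
    ("phenotype_X", "related_to", "disease_1"),
    ("phenotype_X", "related_to", "disease_2"),
]
```

With this toy data, `TripletIndex(TOY_TRIPLETS).two_hop("drug_A", "associated_with", "related_to")` resolves the intermediate phenotype first and then collects the diseases linked to it.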

🔄 Inference Flow

The key idea is local repair: after a failure, SSE-Bio revises only the current state, the routing decision, or a template-level constraint, rather than rewriting the whole reasoning scaffold.
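The local-repair loop might look roughly like the sketch below. It is an assumption-laden illustration rather than the code in `sse_bio/system.py`: the `State` fields, the feedback dictionary keys (`target`, `value`), and the `execute`/`critique` callables are hypothetical stand-ins for the Execution and Critic components.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Hypothetical structured state tracked across refinement rounds."""

    summary: str
    routing: str = "none"
    constraints: list = field(default_factory=list)


def local_repair(state, feedback):
    """Apply one targeted update; everything else in the state is untouched."""
    target, value = feedback["target"], feedback["value"]
    if target == "state":
        state.summary = value
    elif target == "routing":
        state.routing = value
    elif target == "template_constraint":
        state.constraints.append(value)
    else:
        raise ValueError(f"unknown repair target: {target}")
    return state


def run(question, execute, critique, max_rounds=3):
    """Execute, critique, and repair locally until the critic accepts."""
    state = State(summary=question)
    answer = None
    for _ in range(max_rounds):
        answer = execute(state)
        feedback = critique(state, answer)
        if feedback is None:  # critic accepted the trajectory
            break
        state = local_repair(state, feedback)
    return answer, state
```

Each round changes at most one of the three repair targets, so a rejected answer never triggers a rewrite of the full scaffold.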

🧪 Training

Only the Proxy is trained. The Manager, Execution, Critic, retrievers, and reasoning environment remain fixed.

Stage 1: Supervised Fine-Tuning

The Proxy is first fine-tuned on retrieval-decision pseudo-labels. For a given structured state, the system runs each alternative retrieval branch and uses the action with the highest downstream composite reward as the supervision target.
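The pseudo-labeling rule described above amounts to an argmax over branch rewards. The sketch below assumes rewards are already computed per action; the function names, the action labels, and the `prompt`/`target` record layout are hypothetical and may differ from what `run_proxy_sft.py build-data` emits.

```python
def select_pseudo_label(branch_rewards):
    """Return the action with the highest reward (first one wins on ties)."""
    if not branch_rewards:
        raise ValueError("at least one retrieval branch is required")
    return max(branch_rewards, key=branch_rewards.get)


def build_sft_example(state_text, branch_rewards):
    """Pair a serialized structured state with its pseudo-label."""
    return {"prompt": state_text, "target": select_pseudo_label(branch_rewards)}


# Illustrative reward values only; not measured results.
example = build_sft_example(
    "state: resolve phenotype for drug_A",
    {"none": 0.2, "triplets": 0.9, "templates": 0.4, "both": 0.7},
)
```

Here `example["target"]` is `"triplets"`, the branch with the largest illustrative reward.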

Stage 2: GRPO

The Proxy is then refined with Group Relative Policy Optimization (GRPO) over decision-contrastive trajectory groups. Alternative retrieval actions are expanded from the same structured state, pruned in part using an intermediate answer-grounded reward, and the surviving trajectories are optimized relative to one another.
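The comparative part of GRPO is commonly implemented by standardizing rewards within each group. The sketch below shows that generic step only; it omits the pruning and policy update, and the choice of population standard deviation and the `eps` value are assumptions, not details confirmed by this repository.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of trajectories.

    Each trajectory in the group starts from the same structured state, so
    the advantage measures how a retrieval action did relative to its
    siblings rather than in absolute terms.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for four branches expanded from one state.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Branches that beat the group mean receive positive advantages and the rest negative ones; a group with identical rewards yields all zeros and so contributes no learning signal.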

Reward Signal

Training combines:

final answer correctness
evidence-supported reasoning behavior

This encourages retrieval decisions that are both effective and grounded.
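One simple way to combine the two signals is a weighted sum, sketched below. The weights, the step-counting definition of evidence support, and the function name are assumptions made for illustration; the actual reward in `sse_bio/training/` may be defined differently.

```python
def composite_reward(answer_correct, supported_steps, total_steps,
                     w_answer=0.7, w_evidence=0.3):
    """Blend answer correctness with the share of evidence-backed steps.

    The default weights are placeholders, not values from the project.
    """
    if supported_steps < 0 or supported_steps > total_steps:
        raise ValueError("supported_steps must be between 0 and total_steps")
    # Fraction of reasoning steps backed by retrieved evidence.
    evidence = supported_steps / total_steps if total_steps else 0.0
    return w_answer * float(answer_correct) + w_evidence * evidence
```

Under these placeholder weights, a correct answer with half of its steps supported scores higher than a correct answer with no supported steps, which is the behavior the reward is meant to encourage.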

📚 Benchmarks

SSE-Bio includes evaluation entrypoints for:

BioHopR
MedHop
Humanity's Last Exam: Biomedicine

🚀 Quick Start

Installation

uv sync
source .venv/bin/activate

Configuration

Two configs are included:

config.toml.example: default full configuration
config.opensource.toml: open-source runnable configuration

Run One Example

python run_sse_bio.py run \
  "Name all diseases related to a phenotype associated with a given drug." \
  --triplets-path path/to/biomedical_triplets.jsonl \
  --config config.opensource.toml

Evaluate on BioHopR

python run_biohopr_eval.py evaluate data/biohopr_bundle \
  --triplets-path path/to/biomedical_triplets.jsonl \
  --config config.opensource.toml \
  --output-path outputs/biohopr_eval.jsonl

Train the Proxy

Build SFT data:

python run_proxy_sft.py build-data data/biohopr_bundle \
  --split train \
  --output-path data/proxy_train.jsonl

Train SFT:

python run_proxy_sft.py train data/proxy_train.jsonl \
  --model Qwen/Qwen2.5-72B-Instruct \
  --output-dir outputs/proxy_sft

Build GRPO data:

python run_proxy_grpo.py build-data data/biohopr_bundle \
  --split train \
  --output-path data/proxy_grpo.jsonl

Train GRPO:

python run_proxy_grpo.py train data/proxy_grpo.jsonl \
  --model outputs/proxy_sft \
  --output-dir outputs/proxy_grpo

🗂 Repository Layout

Path	Purpose
`sse_bio/`	Core package
`sse_bio/system.py`	End-to-end inference loop
`sse_bio/agents.py`	Manager, Proxy, Execution, and Critic wrappers
`sse_bio/structure.py`	Structured controller and local update operators
`sse_bio/experience_manager.py`	Prior template retrieval and persistence
`sse_bio/triplet_store.py`	Biomedical triplet ingestion and retrieval
`sse_bio/training/`	Proxy SFT, GRPO, rewards, and training-data export
`sse_bio/eval/`	Benchmark runners and metrics
`scripts/data/`	Dataset download helpers
`scripts/hpc/`	Generic cluster launch scripts for proxy training

📎 Citation

If you use SSE-Bio in academic work, please cite the corresponding paper.
