WorldMind is a framework for aligning agentic world models through knowledgeable experience learning, enabling agents to learn directly from the environment.
Overview • Installation • Quick Start • Environments • Plugin • Project Structure • Results • Citation • Acknowledgments
WorldMind introduces a paradigm shift in how embodied AI agents learn and adapt. Unlike traditional approaches that rely on extensive environment interaction or domain-specific fine-tuning, WorldMind operates as a training-free framework that enables agents to:
- Learn from Experience: Extract reusable symbolic knowledge from both successful task completions and prediction errors without gradient updates.
- Generalize Across Tasks: Apply learned causal rules and heuristics to novel situations through semantic similarity-based retrieval.
- Continuously Improve: Accumulate and refine the World Knowledge Repository (WKR) throughout deployment; a toy sketch of such a repository appears below.
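Conceptually, the WKR is a store of natural-language experience entries retrieved by semantic similarity over task descriptions. The sketch below is purely illustrative: `KnowledgeEntry`, `WKR`, and the toy `embed` helper are hypothetical stand-ins for the real implementation in `embodiedbench/worldmind`, not its actual API.

```python
# Minimal, illustrative sketch of a World Knowledge Repository (WKR).
# All names here (KnowledgeEntry, WKR, embed) are hypothetical.
from dataclasses import dataclass, field

import numpy as np


def embed(text: str) -> np.ndarray:
    """Stand-in for a real sentence-embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))  # deterministic toy vector
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)


@dataclass
class KnowledgeEntry:
    kind: str          # "goal" (heuristic) or "process" (causal rule)
    text: str          # the distilled symbolic knowledge
    source_task: str   # task instruction the entry was extracted from
    vector: np.ndarray = field(init=False)

    def __post_init__(self):
        self.vector = embed(self.source_task)


class WKR:
    def __init__(self):
        self.entries: list[KnowledgeEntry] = []

    def add(self, entry: KnowledgeEntry) -> None:
        self.entries.append(entry)

    def retrieve(self, task: str, kind: str, top_k: int = 2) -> list[KnowledgeEntry]:
        """Return the top-k entries of the given kind by cosine similarity."""
        q = embed(task)
        candidates = [e for e in self.entries if e.kind == kind]
        candidates.sort(key=lambda e: float(q @ e.vector), reverse=True)
        return candidates[:top_k]
```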
| Feature | Description |
|---|---|
| Experience Learning | Combines Goal Experience (heuristics) from successful trajectories with Process Experience (causal boundaries) from prediction errors |
| Experience-Driven Alignment | Uses State Abstraction and Verifier components to align world model predictions with actual environment dynamics |
| Universal Adaptability | Generalizes across diverse embodied environments (ALFRED, Habitat, Navigation) and tasks without task-specific fine-tuning |
| Modular Plugin | Standalone plugin for easy integration into existing agent systems |
WorldMind follows a two-stage approach to world model alignment:
Stage 1 extracts knowledge during task execution (World Knowledge Building):
- Goal Experience: From successful trajectories, distill procedural heuristics that steer the agent toward optimal task execution.
- Process Experience: Employ a Predict-Act-Verify loop. When a Verifier detects a semantic discrepancy between the predicted and actual abstract states, a Self-Reflexion mechanism synthesizes corrective causal rules.
Stage 2 applies learned knowledge to new tasks (Inference via Constrained Simulation):
- Retrieve relevant Process and Goal experiences via semantic similarity.
- Gated Simulation: Simulate outcomes only when target objects are grounded in the current observation, improving inference efficiency.
- Augment world model prompts with the retrieved knowledge to constrain planning within physical feasibility. A combined sketch of both stages follows this list.
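To make the two stages concrete, here is a hedged end-to-end sketch. Every interface below (`plan`, `is_grounded`, `predict_next_state`, `verify`, `reflect`, `summarize`, and the `wkr` object from the earlier sketch) is a hypothetical placeholder for the corresponding WorldMind component, not the framework's actual code.

```python
# Illustrative sketch of Stage 1 (Predict-Act-Verify) and Stage 2 (gated,
# knowledge-constrained simulation). All helper interfaces are hypothetical.

def run_task(task, env, world_model, wkr, max_steps=30):
    # Stage 2: retrieve relevant experience and inject it into the prompt.
    rules = wkr.retrieve(task, kind="process", top_k=2)
    heuristics = wkr.retrieve(task, kind="goal", top_k=2)
    knowledge = "\n".join(e.text for e in rules + heuristics)

    obs = env.reset(task)
    trajectory = []
    for _ in range(max_steps):
        action = world_model.plan(task, obs, knowledge)

        # Gated Simulation: only roll out the world model when the action's
        # target object is grounded in the current observation.
        predicted = None
        if world_model.is_grounded(action, obs):
            predicted = world_model.predict_next_state(obs, action)

        obs, feedback, done = env.step(action)
        trajectory.append((action, feedback, obs))

        # Stage 1, Process Experience: the Verifier compares predicted and
        # actual abstract states; on a semantic mismatch, Self-Reflexion
        # distills a corrective causal rule into the WKR.
        if predicted is not None and not world_model.verify(predicted, obs):
            wkr.add(world_model.reflect(action, predicted, obs))

        if done:
            # Stage 1, Goal Experience: distill a procedural heuristic
            # from the successful trajectory.
            wkr.add(world_model.summarize(task, trajectory))
            break
```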
Note: We need to set up two conda environments:

- `worldmind` for EB-ALFRED and EB-Habitat
- `worldmind_nav` for EB-Navigation

Please use SSH instead of HTTPS for downloads to avoid errors during `git lfs pull`.
```bash
git clone https://github.com/zjunlp/WorldMind.git
cd WorldMind

# Create environment named 'worldmind'
conda env create -f conda_envs/environment.yaml
conda activate worldmind
pip install -e .

# Create environment named 'worldmind_nav'
conda env create -f conda_envs/environment_eb-nav.yaml
conda activate worldmind_nav
pip install -e .
```

For headless servers, start the X server in a separate tmux window:

```bash
conda activate worldmind
python -m embodiedbench.envs.eb_alfred.scripts.startx 1
```

### EB-ALFRED (Household Tasks)

1. Download Data:

```bash
conda activate worldmind
git clone https://huggingface.co/datasets/EmbodiedBench/EB-ALFRED
mv EB-ALFRED embodiedbench/envs/eb_alfred/data/json_2.1.0
```

2. Verify Installation:

```bash
conda activate worldmind
# Remember to start the headless server first!
python -m embodiedbench.envs.eb_alfred.EBAlfEnv
```

### EB-Habitat (Rearrangement Tasks)

1. Install Habitat-Sim & Lab:

```bash
conda activate worldmind
# Install Habitat-Sim with Bullet physics support
conda install -y habitat-sim==0.3.0 withbullet headless -c conda-forge -c aihabitat
# Install Habitat-Lab
cd ./habitat-lab
pip install -e habitat-lab
cd ..
```

2. Download Data: Download the YCB and ReplicaCAD datasets for the Language Rearrangement task.

```bash
conda install -y -c conda-forge git-lfs
python -m habitat_sim.utils.datasets_download --uids rearrange_task_assets
mv data embodiedbench/envs/eb_habitat
```

Note: After the above step, there should be a `data` folder under `embodiedbench/envs/eb_habitat`.

3. Verify Installation: Run the following command to ensure the EB-Habitat environment is working correctly.

```bash
python -m embodiedbench.envs.eb_habitat.EBHabEnv
```

### EB-Navigation (Vision-and-Language Navigation)

Verify Installation: Run the following command to ensure the EB-Navigation environment is working correctly.

```bash
conda activate worldmind_nav
python -m embodiedbench.envs.eb_navigation.EBNavEnv
```

We provide a universal run script `run.sh` for easy experiment execution. Simply configure the script and run:

```bash
#!/bin/bash
# WorldMind Universal Run Script
# Supports all three environments: Alfred (eb-alf), Habitat (eb-hab), Navigation (eb-nav)
set -e
# ============================================================
# ENVIRONMENT VARIABLES (Export Section)
# ============================================================
export CUDA_VISIBLE_DEVICES=0
export OPENAI_API_KEY="your-openai-api-key"
export OPENAI_BASE_URL="your-openai-base-url"
# ============================================================
# CONFIGURATION PARAMETERS (Edit here)
# ============================================================
MODEL_NAME="gpt-3.5-turbo" # Choose your model
ENV="eb-hab" # Options: eb-alf, eb-hab, eb-nav
EXP_NAME="test" # Your experiment name
ENABLE_WORLDMIND="True" # True or False
# WorldMind component models (fixed to MODEL_NAME)
export WORLDMIND_DISCRIMINATOR_MODEL="$MODEL_NAME"
export WORLDMIND_SUMMARIZER_MODEL="$MODEL_NAME"
export WORLDMIND_REFLECTOR_MODEL="$MODEL_NAME"
export WORLDMIND_REFINER_MODEL="$MODEL_NAME"
# ============================================================
# VALIDATION
# ============================================================
if [ -z "$OPENAI_API_KEY" ]; then
echo "=========================================="
echo "ERROR: OPENAI_API_KEY not set!"
echo "=========================================="
exit 1
fi
case "$ENV" in
eb-alf|eb-hab|eb-nav)
echo "β Valid environment: $ENV"
;;
*)
echo "=========================================="
echo "ERROR: Invalid environment '$ENV'"
echo "=========================================="
echo "Valid options: eb-alf, eb-hab, eb-nav"
exit 1
;;
esac
# ============================================================
# DISPLAY CONFIGURATION
# ============================================================
echo ""
echo "=========================================="
echo "WorldMind Experiment Configuration"
echo "=========================================="
echo "Environment: $ENV"
echo "Model: $MODEL_NAME"
echo "Experiment: $EXP_NAME"
echo "WorldMind: $ENABLE_WORLDMIND"
echo "----------------------------------------"
echo "GPU Device: $CUDA_VISIBLE_DEVICES"
echo "Display: $DISPLAY"
echo "API Base URL: $OPENAI_BASE_URL"
echo "=========================================="
echo ""
# ============================================================
# RUN EXPERIMENT
# ============================================================
python -m embodiedbench.main \
env="$ENV" \
model_name="$MODEL_NAME" \
exp_name="$EXP_NAME" \
enable_worldmind="$ENABLE_WORLDMIND"Usage:
bash run.shWorldMind uses YAML configuration files for experiment settings. You can find and customize these files in the WorldMind/embodiedbench/configs directory.
π Click to view example configuration (`configs/eb-nav.yaml`)
# configs/eb-nav.yaml
model_name: gpt-4o-mini
model_type: remote
exp_name: navigation_baseline
# WorldMind Settings
enable_worldmind: True
use_vision_discriminator: false
use_experience_trajectory: true
detailed_output: true
# Goal Experience Settings
enable_goal_experience: true
goal_experience_top_k: 2
# Process Experience Settings
enable_process_experience: true
process_experience_top_k: 2
# Experience Refinement
enable_experience_refine: true
use_worldmind_template: true
```

| Parameter | Description | Default |
|---|---|---|
| `enable_worldmind` | Enable WorldMind components | `True` |
| `enable_goal_experience` | Enable goal experience retrieval | `True` |
| `goal_experience_top_k` | Number of goal experiences to retrieve | `2` |
| `enable_process_experience` | Enable process experience retrieval | `True` |
| `process_experience_top_k` | Number of process experiences to retrieve | `2` |
| `enable_experience_refine` | Enable LLM-based experience refinement | `True` |
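Because the configs are plain YAML, you can also inspect or adjust them programmatically. Below is a minimal sketch using PyYAML, assuming it runs from the repository root; the output filename (`eb-nav-topk3.yaml`) is one you choose yourself, not a file shipped with the repo.

```python
# Read an experiment config, bump a retrieval parameter, and save a variant.
# Requires `pip install pyyaml`; paths assume the repository root.
import yaml

with open("embodiedbench/configs/eb-nav.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["goal_experience_top_k"] = 3        # retrieve one extra goal experience
cfg["exp_name"] = "navigation_topk3"    # keep runs distinguishable

with open("embodiedbench/configs/eb-nav-topk3.yaml", "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```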
### EB-ALFRED

A benchmark for grounded language learning in 3D household environments. Tasks require agents to execute multi-step instructions involving object manipulation.

- Evaluation Metrics: Success Rate (SR) and Goal Condition (GC)
- Evaluation Sets: Base, Common, Complex, Visual, Spatial

### EB-Habitat

A simulation platform for embodied AI research focusing on object rearrangement tasks in realistic indoor environments.

- Evaluation Metrics: Success Rate (SR) and Goal Condition (GC)
- Evaluation Sets: Base, Common, Complex, Visual, Spatial

### EB-Navigation

A discrete navigation environment where agents must reach target locations by following natural language instructions.

- Evaluation Metrics: Success Rate (SR)
- Evaluation Sets: Base, Common, Complex, Visual
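For reference, the two metrics can be computed as follows. This sketch assumes the standard definitions from the ALFRED line of work (SR counts episodes in which every goal condition is met; GC averages the per-episode fraction of satisfied goal conditions), which we take to match the benchmark's scoring.

```python
# SR/GC over a list of episodes, where each episode records
# (num_goal_conditions_satisfied, num_goal_conditions_total).
def success_rate(episodes: list[tuple[int, int]]) -> float:
    """Fraction of episodes in which every goal condition is satisfied."""
    return sum(sat == total for sat, total in episodes) / len(episodes)

def goal_condition(episodes: list[tuple[int, int]]) -> float:
    """Mean fraction of goal conditions satisfied per episode."""
    return sum(sat / total for sat, total in episodes) / len(episodes)

# Example: three episodes -> SR = 1/3, GC = (1.0 + 0.5 + 0.75) / 3 = 0.75
print(success_rate([(2, 2), (1, 2), (3, 4)]))   # 0.333...
print(goal_condition([(2, 2), (1, 2), (3, 4)])) # 0.75
```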
To facilitate integration across domains, we provide a universal, standalone plugin with a modular architecture. It lets you add WorldMind's core capabilities, such as experience extraction and memory retrieval, to your own environments or projects with minimal effort.
```python
from worldmind_plugin import (
WorldMindConfig,
ProcessExperienceModule,
GoalExperienceModule,
ExperienceRetrievalModule,
ProcessTrajectoryStep,
GoalTrajectoryStep
)
# Create configuration
config = WorldMindConfig(
api_key="your-api-key",
save_path="./worldmind_output"
)
# Initialize modules independently
process_module = ProcessExperienceModule(config)
goal_module = GoalExperienceModule(config)
retrieval_module = ExperienceRetrievalModule(config)
# Extract goal experience from successful trajectory
trajectory = [
GoalTrajectoryStep(
action="navigate_to(kitchen)",
env_feedback="Arrived at kitchen",
observation="Kitchen counter visible"
),
# ... more steps
]
experience = goal_module.extract_experience(
task_instruction="Go to the kitchen and get an apple",
trajectory=trajectory
)
# Retrieve experiences for a new task
result = retrieval_module.retrieve(
task_instruction="Find the coffee mug",
enable_refine=True
)
# Use in agent prompt
agent_prompt = f"""You are a helpful assistant.
{result['formatted_prompt']}
Task: Find the coffee mug
"""See Plugin/README.md for detailed documentation.
```
WorldMind/
├── embodiedbench/
│   ├── envs/                 # Environment implementations
│   │   ├── eb_alfred/        # ALFRED environment
│   │   ├── eb_habitat/       # Habitat environment
│   │   └── eb_navigation/    # Navigation environment
│   ├── evaluator/            # Evaluation scripts
│   └── worldmind/            # WorldMind core modules
│       ├── alfred/           # ALFRED integration
│       ├── habitat/          # Habitat integration
│       └── navigation/       # Navigation integration
├── Plugin/                   # Standalone WorldMind Plugin
├── assets/                   # Images and resources
└── README.md
```
SR = Success Rate (%); GC = Goal Condition (%).

| Model | SR Avg | SR Base | SR Common | SR Complex | SR Visual | SR Spatial | GC Avg | GC Base | GC Common | GC Complex | GC Visual | GC Spatial |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Open-source and Proprietary Models** | | | | | | | | | | | | |
| GPT-4o | 56.8 | 64.0 | 54.0 | 68.0 | 46.0 | 52.0 | 65.1 | 74.0 | 60.3 | 74.0 | 58.3 | 61.3 |
| GPT-4o-mini | 28.8 | 34.0 | 28.0 | 36.0 | 24.0 | 22.0 | 34.3 | 47.8 | 35.3 | 43.5 | 33.3 | 29.0 |
| Claude-3.7-Sonnet | 67.2 | 68.0 | 68.0 | 70.0 | 68.0 | 62.0 | 65.3 | 72.0 | 66.0 | 76.7 | 63.0 | 59.7 |
| Gemini-1.5-Pro | 63.2 | 70.0 | 64.0 | 72.0 | 58.0 | 52.0 | 67.4 | 74.3 | 66.7 | 76.5 | 62.8 | 59.0 |
| Llama-3.2-90B-Vis | 35.2 | 38.0 | 34.0 | 44.0 | 28.0 | 32.0 | 37.6 | 43.7 | 37.3 | 49.2 | 35.3 | 36.0 |
| InternVL2.5-78B | 37.0 | 41.0 | 40.0 | 39.0 | 16.0 | 49.0 | 41.0 | 42.3 | 35.3 | 43.3 | 35.7 | 40.3 |
| **GPT-3.5-turbo Based Methods** | | | | | | | | | | | | |
| ReAct | 44.4 | 52.0 | 48.0 | 52.0 | 32.0 | 38.0 | 50.4 | 55.3 | 53.5 | 55.3 | 42.7 | 45.0 |
| BoN | 42.8 | 46.0 | 42.0 | 50.0 | 42.0 | 34.0 | 50.4 | 54.2 | 46.5 | 56.5 | 52.0 | 42.8 |
| SimuRA | 45.2 | 50.0 | 42.0 | 54.0 | 38.0 | 42.0 | 53.6 | 57.8 | 47.8 | 59.7 | 48.5 | 54.3 |
| ReasoningBank | 41.6 | 50.0 | 36.0 | 44.0 | 36.0 | 42.0 | 47.6 | 57.5 | 41.5 | 47.0 | 44.2 | 48.0 |
| Synapse | 38.8 | 38.0 | 46.0 | 40.0 | 36.0 | 34.0 | 43.6 | 42.5 | 51.3 | 42.7 | 42.0 | 39.7 |
| AWM | 40.0 | 46.0 | 32.0 | 48.0 | 40.0 | 34.0 | 46.2 | 53.2 | 39.2 | 50.7 | 47.0 | 41.0 |
| WorldMind | 48.0 | 58.0 | 48.0 | 56.0 | 34.0 | 44.0 | 54.1 | 63.0 | 52.7 | 61.0 | 41.7 | 52.0 |
| **GPT-4.1-mini Based Methods** | | | | | | | | | | | | |
| ReAct | 41.2 | 50.0 | 40.0 | 46.0 | 38.0 | 32.0 | 47.5 | 55.3 | 42.8 | 52.2 | 47.2 | 39.8 |
| BoN | 44.4 | 46.0 | 44.0 | 50.0 | 42.0 | 40.0 | 49.5 | 50.8 | 48.3 | 54.7 | 48.7 | 45.0 |
| SimuRA | 45.6 | 52.0 | 44.0 | 54.0 | 38.0 | 40.0 | 52.2 | 61.0 | 50.3 | 58.2 | 45.3 | 46.3 |
| ReasoningBank | 38.0 | 42.0 | 36.0 | 42.0 | 34.0 | 36.0 | 42.6 | 46.7 | 38.8 | 45.8 | 41.5 | 40.3 |
| Synapse | 37.2 | 40.0 | 32.0 | 44.0 | 36.0 | 34.0 | 42.2 | 41.2 | 37.5 | 49.5 | 41.3 | 41.7 |
| AWM | 41.2 | 44.0 | 36.0 | 48.0 | 38.0 | 40.0 | 46.0 | 48.3 | 42.0 | 52.5 | 44.3 | 42.7 |
| WorldMind | 49.2 | 50.0 | 58.0 | 54.0 | 42.0 | 42.0 | 55.7 | 61.0 | 61.0 | 58.8 | 48.0 | 49.7 |
Detailed results and ablation studies are available in our paper.
If you find this work useful, please cite:

```bibtex
@article{ren2026aligning,
title={Aligning Agentic World Models via Knowledgeable Experience Learning},
author={Ren, Baochang and Yao, Yunzhi and Sun, Rui and Qiao, Shuofei and Zhang, Ningyu and Chen, Huajun},
journal={arXiv preprint arXiv:2601.13247},
year={2026}
}
```

We thank the following projects and teams for their open-source contributions:
- EmbodiedBench for the evaluation tasks
- ALFRED and AI2-THOR for the household task benchmark and simulation environment
- Habitat for the rearrangement simulation platform
- vLLM for efficient LLM inference and serving

