This project is the code implementation of the paper *Adaptive Theory of Mind for LLM-based Multi-Agent Coordination*. It implements 0/1/2-order ToM agents and Adaptive ToM agents, and provides batch-experiment scripts and result-analysis scripts. Experiments are implemented in the following three environments:
- Coordination Game (`coordination_game`)
- Grid World Navigation (`grid_world`)
- Overcooked Environment (`overcooked`)
## Requirements

- Operating System: Ubuntu offers the best compatibility. The parallel experiment scripts are not supported on Windows.
- Python Version: the default interpreter is Python 3.7, required to support the TensorFlow 1.x reinforcement-learning policies and the older versions of the Overcooked environment.
- Install the required packages:

```shell
pip install -r requirements.txt
```
## LLM Configuration

Our paper reports results with Llama3.3-70B-Instruct; other high-performing models, such as the GPT series, can also be used. Set the LLM API configuration in the following three files:

- `coordination_game/LLM_agent/api_key.py`
- `grid_world/LLM_agent/api_key.py`
- `overcooked/LLM_agent/api_key.py`
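The exact interface of these `api_key.py` files is project-specific; below is only a hedged sketch of what such a file typically contains, with every variable name (`API_KEY`, `BASE_URL`, `MODEL_NAME`) assumed for illustration rather than taken from the repository:

```python
# Hypothetical api_key.py layout -- all names here are assumptions;
# mirror whatever the existing file in each subproject actually defines.
API_KEY = "sk-..."                             # your provider's API key
BASE_URL = "https://api.your-provider.com/v1"  # OpenAI-compatible endpoint (assumed)
MODEL_NAME = "llama3.3-70b-instruct"           # model id matching the --model_name flag
```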
## Coordination Game

- Single run:

```shell
python coordination_game/main.py --player1_type=adaptive --player2_type=adaptive --model_name=llama --exp_name=demo --horizon=20 --use_non_coordiantion_opening --adaptive_alg=Hedge
```
- Batch run (quickly reproduces the paper's results):

```shell
apt install parallel
cd coordination_game
chmod +x run.sh
./run.sh
```
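The `--adaptive_alg=Hedge` flag selects the Hedge (multiplicative-weights) update for the Adaptive ToM agent. The project's own implementation lives in `LLM_agent/`; the following is only a generic sketch of the Hedge update rule, with the function name and learning rate chosen for illustration:

```python
import math

def hedge_update(weights, losses, eta=0.5):
    """One Hedge step: scale each candidate's weight by exp(-eta * loss), then renormalize."""
    scaled = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(scaled)
    return [w / total for w in scaled]

# Two candidate partner models; the second incurred less prediction loss this round,
# so its weight grows after the update.
weights = hedge_update([0.5, 0.5], [1.0, 0.0])
```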
- Output directory: `results/<exp_name>/<player1>_vs_<player2>_<model_name>[_flags]/<pid>/`
- Output files:
  - `action.csv`: actions of both players for each round
  - `player*_prediction_vs_true_action.csv`: predicted partner actions vs. real partner actions (extra output for 0/1/2-order ToM and Adaptive ToM agents)
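For a quick sanity check of a single run, the per-round coordination rate can be computed from `action.csv`. The column names below are assumptions for illustration; check the actual header of your output file:

```python
import csv
import io

# Hypothetical action.csv contents -- real column names may differ.
sample = "round,player1,player2\n1,A,B\n2,A,A\n3,B,B\n"
rows = list(csv.DictReader(io.StringIO(sample)))
# Fraction of rounds in which the two players chose the same action.
coordination_rate = sum(r["player1"] == r["player2"] for r in rows) / len(rows)
```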
- Result analysis: run

```shell
python coordination_game/analyze.py <horizon> <exp_name>
```

- The average score and standard deviation are saved in `results/<exp_name>/score.txt`.
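The aggregation performed by the analysis script boils down to a mean and standard deviation over per-run scores. A minimal sketch with made-up values (the script itself reads the real scores from the run directories):

```python
import statistics

# Illustrative per-run scores -- values are made up for demonstration.
scores = [14.0, 11.0, 16.0, 13.0]
mean_score = statistics.mean(scores)
std_score = statistics.stdev(scores)  # sample standard deviation
```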
## Grid World Navigation

- Single run:

```shell
python grid_world/main.py --player1_type=adaptive --player2_type=adaptive --game_name=game1 --model_name=llama --exp_name=demo --horizon=20 --adaptive_alg=Hedge
```
- Batch run (quickly reproduces the paper's results):

```shell
apt install parallel
cd grid_world
chmod +x run.sh
./run.sh
```
- Output directory: `results/<exp_name>/<player1>_vs_<player2>_<model_name>/<pid>/`
- Output files:
  - `player*_log.txt` and `public_log.txt`: prompt/response traces and environment renderings
  - `score.txt`: number of steps the agents required (or the total horizon if unfinished)
  - `game_end.txt`: `True` if both players reached their targets, otherwise absent/`False`
  - `player*_loss*.txt`: Adaptive ToM training loss
  - `*_prediction_candidate_history.txt`: prediction history of Adaptive ToM
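Since `game_end.txt` is absent or `False` for failed runs, the aggregated success rate is simply the fraction of run directories whose file reads `True`. A hedged sketch over illustrative flags, with `None` standing in for a missing file:

```python
# Illustrative per-run contents of game_end.txt (None means the file is absent).
run_flags = ["True", None, "True", "False"]
# A run counts as a success only if the file exists and reads "True".
successes = sum(flag == "True" for flag in run_flags)
success_rate = successes / len(run_flags)
```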
- Result analysis: run

```shell
python grid_world/analyze.py <exp_name>
```

- Aggregated success-rate and score CSVs are saved under `results/<exp_name>/`.
## Overcooked Environment

- Single run:

```shell
python overcooked/main.py --player1_type=adaptive_tom --player2_type=adaptive_tom --model_name=llama --exp_name=demo_overcooked --horizon=80 --cook_time=20 --use_counter
```
- Batch run (quickly reproduces the paper's results):

```shell
apt install parallel
cd overcooked
chmod +x run.sh
./run.sh
```
- Output directory: `results/<exp_name>/<player1>_vs_<player2>_<model_name>[_flags]/<pid>/`
- Output files:
  - `env_log.txt`: environment rendering text
  - `score.txt`: total score (fixed horizon) or steps needed to finish (depending on `--use_score_of_fixed_horizon`)
  - `game_end.txt`: `True` if the task was completed within the time limit, otherwise `False`
  - `player*_loss.txt`: Adaptive ToM training loss
  - `*_prediction_candidate_history.txt`: prediction history of Adaptive ToM
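The `player*_loss.txt` files record the Adaptive ToM training loss; assuming one loss value per line (the exact format is an assumption), a minimal sketch of reading such a file from an in-memory sample:

```python
# Hypothetical player*_loss.txt contents: one loss value per training update.
sample = "0.92\n0.61\n0.43\n0.30\n"
losses = [float(line) for line in sample.splitlines()]
final_loss = losses[-1]  # the most recent training loss
```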
## Project Structure

- `coordination_game/`
  - `coordination_game/main.py`: entry point for coordination-game experiments
  - `coordination_game/LLM_agent/`: implementations of 0/1/2-order ToM and Adaptive ToM agents
  - `coordination_game/analyze.py`: result analysis and summary
  - `coordination_game/run.sh`: parallel experiment script
- `staghut_game/`
  - `staghut_game/main.py`: entry point for stag-hunt experiments
  - `staghut_game/LLM_agent/`: implementations of 0/1/2-order ToM and Adaptive ToM agents
  - `staghut_game/analyze.py`: result analysis and summary
  - `staghut_game/run.sh`: batch experiment script
- `overcooked/`
  - `overcooked/main.py`: entry point for Overcooked experiments
  - `overcooked/overcooked_env/`, `overcooked/overcooked_ai_py/`: environment implementation and dependencies
  - `overcooked/LLM_agent/`: ToM agents and prompt templates adapted for Overcooked
  - `overcooked/rl_agent.py`: RL baseline
  - `overcooked/game_prompts/`: prompt templates for different layouts
  - `overcooked/requirements.txt`: dependencies for this subproject
  - `overcooked/run.sh`: batch experiment script