Skip to content

harryden/alpha-toe

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MCTS Game Engine

A configurable Monte Carlo Tree Search (MCTS) engine for two-player zero-sum games, demonstrated on Tic-Tac-Toe. The engine is decoupled from game-specific logic and can be adapted to any game that exposes a standard state interface.

Developed in collaboration with Elvina Fahlgren.

Project Structure

alpha-toe/
├── config/
│   └── example.config.json   # experiment parameters
├── docs/                     # project report
├── src/
│   ├── board.py              # Tic-Tac-Toe game state
│   ├── mcts.py               # MCTS engine
│   ├── rollout_strategies.py # simulation policies
│   └── run.py                # experiment runner
├── requirements.txt
└── README.md

How MCTS Works

Monte Carlo Tree Search repeats four phases until the compute budget is exhausted.

1. Selection

Starting from the root, the tree is traversed by repeatedly selecting the child with the highest UCB1 score:

$$\text{UCB1}(v) = \frac{Q(v)}{N(v)} + C \cdot \sqrt{\frac{\ln N(\text{parent}(v))}{N(v)}}$$

Symbol Meaning
$Q(v)$ Cumulative reward at node $v$
$N(v)$ Visit count at node $v$
$C$ Exploration constant (default $\sqrt{2}$)

The first term exploits high-value nodes; the second explores less-visited ones.

2. Expansion

When a node is not fully expanded (i.e. some legal moves have not yet been tried), one untried move is sampled uniformly at random and added to the tree as a new child node.

3. Simulation (Rollout)

A game is played out from the newly expanded node to a terminal state. Two policies are available:

  • heuristic — prioritises winning moves, then blocking opponent wins, then center; falls back to random.
  • random — selects actions uniformly at random.

4. Backpropagation

The terminal reward propagates back through the tree. Each ancestor's visit count $N$ and cumulative reward $Q$ are updated. The reward sign alternates at each level to reflect the zero-sum nature of the game.

Move Selection

After the budget is exhausted the engine returns the robust child: the root's child with the highest visit count $N$. This is statistically more reliable than selecting by highest mean reward alone.


Game State Interface

To use this engine with a different game, implement a class with the following methods:

class MyGame:
    def available_moves(self) -> list: ...
    def make_move(self, move) -> None:  ...
    def clone(self) -> \'MyGame\':        ...
    def game_over(self) -> bool:        ...
    def get_winner(self):               ...  # returns player id or None

Installation

pip install -r requirements.txt

Running Experiments

Copy the example config and adjust as needed:

cp config/example.config.json config/config.json
# edit config/config.json
python src/run.py

You can also pass a config path explicitly:

python src/run.py config/example.config.json

Configuration Reference

Key Type Description
board_size int Board dimension (3 for 3x3)
mcts_iterations int MCTS simulations per move
player_type string "mcts" or "random" for X
rollout_policy string "heuristic" or "random" for X
selection_policy string "uct", "greedy", or "epsilon_greedy" for X
opponent_type string "mcts" or "random" for O
opponent_rollout_policy string Rollout policy for O
opponent_selection_policy string Selection policy for O
exploration_constant float UCB1 exploration constant $C$
num_games int Number of games to run
verbose bool Print board and move trace

About

A game-agnostic AI engine using Monte Carlo Tree Search for optimal decision-making.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages