ML reinforcement model for the game of Go

Reference: the AlphaGo Zero paper, https://www.nature.com/articles/nature24270 (see also `agz_unformatted_nature.pdf`).
This project implements a simplified version of DeepMind's AlphaZero algorithm to play the game of Go on a 5×5 board. Using reinforcement learning (RL) with no human data, the system learns strong play purely through self-play and Monte Carlo Tree Search (MCTS).
Key features:
- Inspired by Google DeepMind’s AlphaZero, which achieved superhuman strength in board games.
- Pure RL approach: no human games or expert data used.
- Configurable board size (default 5×5, easily adjustable).
- Endgame and win conditions are tweaked for simplicity.
- Pretrained model (`model_19.pt`) included for instant play.
Go is an ancient two-player strategy board game with simple rules but immense complexity:
- Board: Traditional sizes are 19×19, 13×13, or 9×9; here we use 5×5 for faster training and experimentation.
- Stones: Black and White alternate placing stones on empty intersections.
- Groups & Liberties: Connected stones form a group; liberties are adjacent empty points. Groups without liberties are captured and removed.
- Objective: Surround more territory (empty points) and capture opponent stones. In this simplified version, the winner is decided by stone count once no legal moves remain, or when a line of identical stones dominates the board.

Despite its simple rules, Go’s complexity is immense even at small sizes: a 5×5 board already has on the order of 10¹¹ legal positions, and a 19×19 board has roughly 10¹⁷⁰.
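To make the capture rule concrete, here is a minimal, self-contained sketch of counting a group's liberties with a flood fill over a NumPy board. This is an illustration only; the actual logic lives in the `GoGameAlphaZero` cell and may be organized differently.

```python
import numpy as np

def group_and_liberties(board, row, col):
    """Flood-fill from (row, col): return the connected group and its liberties.
    board is an N x N array with 0 = empty, 1 = black, -1 = white."""
    color = board[row, col]
    n = board.shape[0]
    group, liberties = set(), set()
    stack = [(row, col)]
    while stack:
        r, c = stack.pop()
        if (r, c) in group:
            continue
        group.add((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n:
                if board[nr, nc] == 0:
                    liberties.add((nr, nc))      # adjacent empty point
                elif board[nr, nc] == color:
                    stack.append((nr, nc))       # same-colored neighbor joins the group
    return group, liberties

def remove_if_captured(board, row, col):
    """A group with zero liberties is captured: clear its stones from the board."""
    group, liberties = group_and_liberties(board, row, col)
    if not liberties:
        for r, c in group:
            board[r, c] = 0
    return board
```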
The code is organized into Jupyter notebook cells; here’s what each cell does:
- Cell 1 (`GoGameAlphaZero`):
  - Implements the Go game logic on an N×N board.
  - Manages state representation, valid-move generation, capturing rules, and endgame evaluation.
- Cell 2 (`ResNet` & `ResBlock`):
  - Defines the neural network architecture: a convolutional residual network with policy and value heads.
  - Takes a 3-channel encoded board state and outputs move probabilities and a game-value prediction (see the encoding sketch after this list).
- Cell 3 (Demo & Visualization):
  - Shows how to instantiate the game, make moves, encode the state, and run the `ResNet` forward pass.
  - Plots the policy distribution over the 25 actions using Matplotlib.
- Cell 4 (`Node` & `MCTS`):
  - Implements the MCTS logic: selection with an Upper Confidence Bound (UCB), expansion, evaluation via the neural network, and backpropagation of values (see the UCB sketch after this list).
- Cell 5 (`AlphaZero` class):
  - Coordinates the self-play, training, and evaluation loops.
  - Saves the best model based on win-rate improvement and writes a checkpoint after each iteration.
- Cell 6 (Interactive Play Script):
  - Provides a command-line interface to play against the trained model.
  - Loads `model_19.pt` and uses MCTS for AI moves; supports human vs. AI games in the console.
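For intuition, here is a minimal sketch of the 3-channel state encoding that a network like the one in Cell 2 consumes. The function name and exact plane ordering are assumptions for illustration; the notebook's own encoding may differ.

```python
import numpy as np
import torch

def encode_state(board, player):
    """Encode an N x N board (0 = empty, 1/-1 = stones) as 3 planes:
    current player's stones, opponent's stones, and empty points."""
    planes = np.stack([
        (board == player).astype(np.float32),
        (board == -player).astype(np.float32),
        (board == 0).astype(np.float32),
    ])
    return torch.tensor(planes).unsqueeze(0)  # shape: (1, 3, N, N)

# policy_logits, value = model(encode_state(board, player))
# policy_logits has one entry per action (25 on a 5x5 board); value is a scalar estimate.
```

And here is a sketch of the UCB (PUCT-style) selection rule that Cell 4's MCTS relies on. The attribute names (`visit_count`, `value_sum`, `prior`, `children`) are assumptions; match them to the actual `Node` class.

```python
import math

def ucb_score(parent, child, C=2.0):
    """Prior-weighted exploration term plus the child's mean value.
    The child's value is stored from its own perspective, so it is negated for the parent."""
    q = 0.0 if child.visit_count == 0 else -(child.value_sum / child.visit_count)
    u = C * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return q + u

def select_child(node, C=2.0):
    """Selection phase: descend to the child with the highest UCB score."""
    return max(node.children, key=lambda child: ucb_score(node, child, C))
```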
Getting started:

- Clone the repository.
- Install dependencies: `pip install torch numpy matplotlib tqdm`
- Place `model_19.pt` in the project root for immediate play.
- Run `python play.py` (uses `model_19.pt` by default) and follow the on-screen prompts to play as Black against the AI.
Training:

- Adjust parameters in `train.py` (e.g., board size, number of iterations).
- Run `python train.py`.
- A new model and optimizer checkpoint are saved every iteration. To use a newly trained model, place its `.pt` file in the project root and run the play script.
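For reference, loading a saved checkpoint into the network for play typically looks like the sketch below. The constructor arguments (`board_size`, `num_resBlocks`, `num_hidden`) and the assumption that the checkpoint stores a plain state dict are illustrative; match them to how the model was actually trained and saved.

```python
import torch

# Hypothetical settings: use the same board size and network size as during training.
game = GoGameAlphaZero(board_size=5)
model = ResNet(game, num_resBlocks=4, num_hidden=64)
model.load_state_dict(torch.load("model_19.pt", map_location="cpu"))
model.eval()  # switch to inference mode before playing
```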
Configuration:

- Board Size: Change `board_size` in the `GoGameAlphaZero` initializer.
- AlphaZero Arguments (in code):
  - `C`: exploration constant for UCB.
  - `num_searches`: MCTS rollouts per move.
  - `num_iterations`: meta-iterations of self-play + training.
  - `num_selfPlay_iterations`: self-play games per iteration.
  - `num_epochs`: training epochs per iteration.
  - `batch_size`: samples per training batch.
  - `evaluation_games`: games used to evaluate the win rate.
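As a concrete illustration, a configuration dictionary using these arguments might look like the sketch below; the values are placeholders, not the settings used to produce `model_19.pt`.

```python
args = {
    "C": 2,                          # UCB exploration constant
    "num_searches": 100,             # MCTS rollouts per move
    "num_iterations": 20,            # meta-iterations of self-play + training
    "num_selfPlay_iterations": 50,   # self-play games per iteration
    "num_epochs": 4,                 # training epochs per iteration
    "batch_size": 64,                # samples per training batch
    "evaluation_games": 10,          # games used to estimate the win rate
}
```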
Feel free to:
- Scale to larger board sizes.
- Experiment with network depth/width.
- Integrate with GPU training.
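For GPU training, the minimal change is to move the model and each training batch onto a CUDA device; for example:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Inside the training loop, move each batch to the same device
# before the forward/backward pass, e.g.:
# states, target_policies, target_values = (t.to(device) for t in batch)
```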
This project is released under the MIT License. Enjoy exploring AlphaZero on small boards!