To create and train the network, run:
.\mvnw.cmd clean package
java -jar .\target\tic-tac-toe-network-1.0.jar
The program reads Q-Learning tables from the q-table-x.csv and q-table-x.csv files. These files may be generated by the Tic Tac Toe Q-Learning.
The program converts each board space to a number (-1 for an empty space, 0 for an O mark, 1 for an X mark) and
generates a vector of 9 numbers. For example, the Q-Learning table encodes one board state as
'oxx -ox --o'. The program converts this state to the vector [0, 1, 1, -1, 0, 1, -1, -1, 0].
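The conversion described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the class name `BoardEncoder` and the method `encode` are hypothetical, and the sketch assumes the state string always uses spaces as row separators.

```java
import java.util.Arrays;

// Hypothetical helper: names are illustrative and not taken from the project.
final class BoardEncoder {

    // Maps a state such as "oxx -ox --o" to 9 numbers: '-' -> -1, 'o' -> 0, 'x' -> 1.
    static int[] encode(String state) {
        String cells = state.replace(" ", ""); // drop the row separators
        if (cells.length() != 9) {
            throw new IllegalArgumentException("Expected 9 board spaces: " + state);
        }
        int[] vector = new int[9];
        for (int i = 0; i < 9; i++) {
            switch (cells.charAt(i)) {
                case '-': vector[i] = -1; break;
                case 'o': vector[i] = 0; break;
                case 'x': vector[i] = 1; break;
                default:
                    throw new IllegalArgumentException("Unexpected mark: " + cells.charAt(i));
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        // prints [0, 1, 1, -1, 0, 1, -1, -1, 0]
        System.out.println(Arrays.toString(encode("oxx -ox --o")));
    }
}
```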
Next, the program finds the best move(s). The best move is the one with the highest numerical value; if several moves share that value, all of them count as best. For example, if the Q-Learning table contains this row
| State | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9 |
|---|---|---|---|---|---|---|---|---|---|
| o-- --x ox- | - | -9.1 | -9.1 | -7.4 | -9.1 | - | - | - | -9.1 |
the best move is the move to the 4th space (A4, with value -7.4).
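Selecting the best move(s) from one table row could look like the sketch below. The class `BestMoves` and its `find` method are hypothetical names, and the sketch assumes the A1..A9 cells arrive as trimmed strings with "-" marking a space that has no value. It returns zero-based indexes, so index 3 corresponds to the 4th space.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: names are illustrative and not taken from the project.
final class BestMoves {

    // cells holds the A1..A9 columns of one row; "-" marks a space with no value.
    // Returns the zero-based indexes of all spaces that share the highest value.
    static List<Integer> find(String[] cells) {
        double best = Double.NEGATIVE_INFINITY;
        List<Integer> moves = new ArrayList<>();
        for (int i = 0; i < cells.length; i++) {
            if (cells[i].equals("-")) {
                continue; // no value for this space
            }
            double value = Double.parseDouble(cells[i]);
            if (value > best) {
                best = value;
                moves.clear(); // a strictly better move replaces earlier candidates
                moves.add(i);
            } else if (value == best) {
                moves.add(i);  // a tie: keep both moves
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        String[] row = {"-", "-9.1", "-9.1", "-7.4", "-9.1", "-", "-", "-", "-9.1"};
        System.out.println(find(row)); // prints [3], i.e. the 4th space
    }
}
```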
Next, a neural network with 9 input nodes and 9 output nodes is generated. The network is trained with the board state on the input nodes and its best move(s) on the output nodes. The output nodes hold the percentage values for moving to each of the 9 spaces, which total 100%. The network is trained on all board states with the backpropagation algorithm and the cross-entropy loss function. After training, the accuracy is printed. Accuracy is the share of board states in which the neural network makes the best move.
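To make the loss concrete, here is a small sketch of how outputs that total 100% and a cross-entropy loss fit together. It rests on assumptions the text does not confirm: that the output layer uses softmax, and that the target is a probability vector over the 9 spaces. The names `LossSketch`, `softmax`, and `crossEntropy` are hypothetical.

```java
// Hypothetical sketch: softmax on the output layer is an assumption, not a documented fact.
final class LossSketch {

    // Turns raw output values into probabilities that sum to 1 (100%).
    static double[] softmax(double[] logits) {
        double max = Double.NEGATIVE_INFINITY;
        for (double z : logits) {
            max = Math.max(max, z);
        }
        double sum = 0.0;
        double[] p = new double[logits.length];
        for (int i = 0; i < logits.length; i++) {
            p[i] = Math.exp(logits[i] - max); // subtract max for numerical stability
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    // Cross-entropy between a target distribution and the predicted probabilities.
    static double crossEntropy(double[] target, double[] predicted) {
        double loss = 0.0;
        for (int i = 0; i < target.length; i++) {
            if (target[i] > 0) {
                loss -= target[i] * Math.log(predicted[i]);
            }
        }
        return loss;
    }

    public static void main(String[] args) {
        double[] predicted = softmax(new double[9]); // untrained: every space gets 1/9
        double[] target = new double[9];
        target[3] = 1.0;                             // best move is the 4th space
        System.out.println(crossEntropy(target, predicted)); // ln(9), roughly 2.197
    }
}
```

The loss shrinks toward 0 as the network puts more probability on the best move, which is the quantity backpropagation would minimize under these assumptions.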
The most accurate network across the training iterations is written to the network.txt file. To continue training the network, run the program again without deleting the file. To start training from scratch, delete the file.
network.txt contains 9+64+9 nodes in 3 layers (1225 parameters) and plays games with 98.4% accuracy. It is quite difficult to beat; try it.
Other network configurations
| Network | Configuration | Parameters | Accuracy, % |
|---|---|---|---|
| network-2.txt | 9+2+9 | 47 | 46.3 |
| network-3.txt | 9+3+9 | 66 | 54.6 |
| network-9.txt | 9+9+9 | 180 | 75.9 |
| network-18.txt | 9+18+9 | 351 | 84.8 |
| network.txt | 9+64+9 | 1225 | 98.4 |
| network-81.txt | 9+81+9 | 1548 | 98.8 |
| network-90.txt | 9+90+9 | 1719 | 99.2 |
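The parameter counts in the table are consistent with a fully connected network that has one bias per hidden and output node, which gives 9·h + h + h·9 + 9 = 19·h + 9 parameters for h hidden nodes. That layout is inferred from the numbers, not stated by the project, and the class name `ParameterCount` below is hypothetical.

```java
// Hypothetical helper: assumes dense layers with one bias per non-input node.
final class ParameterCount {

    // Weights and biases of an input -> hidden -> output network.
    static int count(int input, int hidden, int output) {
        int hiddenLayer = input * hidden + hidden;  // weights + biases
        int outputLayer = hidden * output + output; // weights + biases
        return hiddenLayer + outputLayer;
    }

    public static void main(String[] args) {
        System.out.println(count(9, 64, 9)); // prints 1225
        System.out.println(count(9, 90, 9)); // prints 1719
    }
}
```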
Example of console output for one training run:
Create network with random weights and 9 nodes in input layer, 64 nodes in hidden layer, 9 nodes in output layer
Train iterations: 1000
10% done
20% done
30% done
40% done
50% done
60% done
70% done
80% done
90% done
100% done
xox o-- -x- : best move(-s) (any of) = [4, 8] , network answer (any of) = [4]
xxo o-x -xo : best move(-s) (any of) = [4] , network answer (any of) = [4]
o-o x-x -xo : best move(-s) (any of) = [4] , network answer (any of) = [1, 4] [MISTAKE]
--x o-- oxx : best move(-s) (any of) = [0] , network answer (any of) = [5] [MISTAKE]
--o xx- -ox : best move(-s) (any of) = [0, 1, 5, 6] , network answer (any of) = [0]
....
Trained for 4520 states
Mistakes: 73
Accuracy: 98,4%
Train time: 0.5 S
Network has been written to: network.txt
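The [MISTAKE] flags and the final accuracy in the output above can be reproduced with the sketch below. The rule is inferred from the sample lines only: an answer appears to count as correct when every move the network proposes is among the best moves, so [1, 4] against [4] is a mistake. The names `AccuracySketch`, `isCorrect`, and `accuracyPercent` are hypothetical.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: the correctness rule is inferred from the sample output.
final class AccuracySketch {

    // Correct only if all network answers are among the best moves.
    static boolean isCorrect(List<Integer> bestMoves, List<Integer> networkAnswers) {
        return !networkAnswers.isEmpty() && bestMoves.containsAll(networkAnswers);
    }

    // Share of states without a mistake, in percent.
    static double accuracyPercent(int states, int mistakes) {
        return 100.0 * (states - mistakes) / states;
    }

    public static void main(String[] args) {
        System.out.println(isCorrect(List.of(4, 8), List.of(4))); // prints true
        System.out.println(isCorrect(List.of(4), List.of(1, 4))); // prints false
        // 4520 states with 73 mistakes, as in the output above; prints 98.4
        System.out.println(String.format(Locale.ROOT, "%.1f", accuracyPercent(4520, 73)));
    }
}
```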