An implementation of several methods for improving the performance of Q-learning, tested on the game Snake.
Developed over a few years, it achieves strong performance on 7x7 maps, averaging around 45 of the 49 possible apples.
Note that Q-learning is suboptimal for most RL scenarios; PPO is often the go-to choice.
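
At its core, tabular Q-learning repeatedly applies the update Q(s, a) ← Q(s, a) + α(r + γ max_a' Q(s', a') − Q(s, a)). The sketch below illustrates that loop; it is not this repository's code, and the environment interface (`reset`/`step`), hyperparameters, and hashable state encoding are all assumptions:

```python
# Minimal tabular Q-learning sketch (illustrative only, not the repo's code).
# Assumes an environment exposing reset() -> state and
# step(action) -> (next_state, reward, done), with hashable states.
import random
from collections import defaultdict

def train(env, n_actions, episodes=10_000,
          alpha=0.1, gamma=0.99, epsilon=0.1):
    """Learn Q(s, a) with an epsilon-greedy behavior policy."""
    Q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q[state][a])
            next_state, reward, done = env.step(action)
            # Q-learning update: bootstrap from the greedy next action.
            target = reward + (0.0 if done else gamma * max(Q[next_state]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q
```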