This project is a "from scratch" PyTorch implementation of the Transformer. It does a great job of demonstrating how the internal modules of a Transformer work, because the multi-head attention (MHA), encoder, decoder and positional encoding modules are all re-implemented by hand.
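To give a feel for what re-implementing MHA involves, here is a minimal PyTorch sketch. It is not the project's own code: the class name `MultiHeadAttention`, the helper `scaled_dot_product_attention` and the boolean-mask convention (`True` means "may attend") are assumptions made for illustration. It computes softmax(Q K^T / sqrt(d_k)) V, as in the paper, separately for each head and then concatenates the heads.

```python
import math

import torch
import torch.nn as nn


def scaled_dot_product_attention(q, k, v, mask=None):
    # Hypothetical helper. q, k, v: (batch, heads, seq_len, d_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        # Assumed convention: positions where mask is False are blocked.
        scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights


class MultiHeadAttention(nn.Module):
    """Hypothetical from-scratch multi-head attention module."""

    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.d_k = d_model // num_heads
        # Projections for queries, keys, values, and the concatenated output.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def _split_heads(self, x):
        # (batch, seq, d_model) -> (batch, heads, seq, d_k)
        b, s, _ = x.shape
        return x.view(b, s, self.num_heads, self.d_k).transpose(1, 2)

    def forward(self, query, key, value, mask=None):
        q = self._split_heads(self.w_q(query))
        k = self._split_heads(self.w_k(key))
        v = self._split_heads(self.w_v(value))
        heads, weights = scaled_dot_product_attention(q, k, v, mask)
        # Concatenate heads: (batch, heads, seq, d_k) -> (batch, seq, d_model)
        b, _, s, _ = heads.shape
        concat = heads.transpose(1, 2).contiguous().view(b, s, self.num_heads * self.d_k)
        return self.w_o(concat), weights


# Usage: causal self-attention over a random batch.
torch.manual_seed(0)
mha = MultiHeadAttention(d_model=16, num_heads=4)
x = torch.randn(2, 5, 16)
causal = torch.tril(torch.ones(5, 5, dtype=torch.bool))
out, attn = mha(x, x, x, mask=causal)
print(out.shape)   # torch.Size([2, 5, 16])
print(attn.shape)  # torch.Size([2, 4, 5, 5])
```

Passing the same tensor as query, key and value gives self-attention (encoder, decoder self-attention); passing the encoder output as key and value gives the decoder's cross-attention.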
I found that using nn.Transformer, or nn.TransformerEncoder and nn.TransformerDecoder, was more practical than this tutorial for getting a basic model working.
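As a rough illustration of the nn.Transformer route, the sketch below wraps the built-in module with token embeddings and an output projection. The wrapper class `Seq2SeqTransformer` and its hyperparameters are hypothetical. Because nn.Transformer does not add any positional information itself, the sketch uses learned positional embeddings instead of the paper's sinusoidal encoding, purely to keep it short.

```python
import torch
import torch.nn as nn


class Seq2SeqTransformer(nn.Module):
    """Hypothetical minimal wrapper: embeddings + nn.Transformer + output projection."""

    def __init__(self, vocab_size, d_model=64, nhead=4, num_layers=2, max_len=128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # Learned positional embeddings (a swap for the paper's sinusoidal encoding).
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model,
            nhead=nhead,
            num_encoder_layers=num_layers,
            num_decoder_layers=num_layers,
            dim_feedforward=4 * d_model,
            batch_first=True,  # tensors are (batch, seq, feature)
        )
        self.out = nn.Linear(d_model, vocab_size)

    def embed(self, tokens):
        positions = torch.arange(tokens.size(1), device=tokens.device)
        return self.tok_emb(tokens) + self.pos_emb(positions)

    def forward(self, src, tgt):
        # Causal mask so each target position only attends to itself and earlier positions.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1)).to(src.device)
        hidden = self.transformer(self.embed(src), self.embed(tgt), tgt_mask=tgt_mask)
        return self.out(hidden)  # (batch, tgt_len, vocab_size) logits


# Usage: a forward pass on random token ids.
torch.manual_seed(0)
model = Seq2SeqTransformer(vocab_size=100)
src = torch.randint(0, 100, (2, 7))
tgt = torch.randint(0, 100, (2, 5))
logits = model(src, tgt)
print(logits.shape)  # torch.Size([2, 5, 100])
```

With this approach the attention, layer norm and feed-forward sublayers are handled by PyTorch, so most of the remaining work is embeddings, masking and the training loop.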
Attention Is All You Need implementation
YouTube video with full step-by-step implementation: https://www.youtube.com/watch?v=ISNdQcPhsts