Dear @RJT1990,
Awesome project there!
I looked at the internals and saw that metrics are implemented manually, without any tests.
This makes me pretty worried in terms of reproducibility and accurate reporting.
I think you should consider using https://github.com/PytorchLightning/metrics as the tool for benchmarking the runs.
It provides extremely well-tested metrics that work automatically in distributed settings and in plain PyTorch.
Best,
T.C