Ideally I want to be able to leave training running overnight.
So the logic should look something like this:
- train for 2 epochs to see if things fail
- if nothing fails, continue with the full training run
- if things fail, raise an unsuccessful status
This means I first check that training is feasible before leaving it to run unattended.
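
A rough sketch of that flow, just to pin the idea down. All names here (`Status`, `train_overnight`, `train_one_epoch`) are made up for illustration, and I'm assuming the per-epoch training step is a callable that takes the epoch index and returns the loss. Treating a NaN/inf loss as a failure is also my own assumption about what "things fail" should cover.

```python
import math
from enum import Enum
from typing import Callable


class Status(Enum):
    SUCCESS = "success"
    UNSUCCESSFUL = "unsuccessful"


def train_overnight(
    train_one_epoch: Callable[[int], float],  # hypothetical: runs one epoch, returns its loss
    total_epochs: int,
    trial_epochs: int = 2,
) -> Status:
    """Run a short trial first; only continue to the full run if it passes."""
    trial_epochs = min(trial_epochs, total_epochs)

    # Phase 1: feasibility check. Any exception or non-finite loss here
    # is reported as an unsuccessful status instead of crashing.
    for epoch in range(trial_epochs):
        try:
            loss = float(train_one_epoch(epoch))
        except Exception as exc:
            print(f"trial epoch {epoch} failed: {exc!r}")
            return Status.UNSUCCESSFUL
        if not math.isfinite(loss):
            print(f"trial epoch {epoch} gave non-finite loss: {loss}")
            return Status.UNSUCCESSFUL

    # Phase 2: the trial passed, so carry on with the remaining epochs.
    # Failures here are not caught (assumption: a late crash should stay loud).
    for epoch in range(trial_epochs, total_epochs):
        train_one_epoch(epoch)

    return Status.SUCCESS
```

Usage would be something like `status = train_overnight(my_epoch_fn, total_epochs=50)`, then only walking away if the first two epochs have gone through without an unsuccessful status being reported.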