Hi,
I have a problem where my estimator starts producing NaN values after some number of training steps. Sometimes this happens as early as the second step, and sometimes training runs normally for much longer (>1000 steps) first, depending on the setup (e.g. linear layer dimension, depth, learning rate). Any idea what could be causing this?
Thank you!