Hi, I saw that you added a dropout layer after the word embedding, which is not mentioned in the RNNSearch paper "Neural Machine Translation by Jointly Learning to Align and Translate". Does this trick improve performance? Is it implemented in the vanilla Theano version of GroundHog?
Thanks!
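For reference, here is a minimal sketch of what I understand "dropout after word embedding" to mean: inverted dropout applied to the looked-up embedding vectors at training time. All names and sizes here are hypothetical, just to illustrate the idea with NumPy rather than the repo's actual Theano code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: vocabulary of 10 words, 4-dim embeddings.
vocab_size, emb_dim = 10, 4
embeddings = rng.standard_normal((vocab_size, emb_dim))

def embed_with_dropout(token_ids, p=0.5, train=True):
    """Look up embeddings, then apply inverted dropout with drop prob p."""
    emb = embeddings[token_ids]
    if not train or p == 0.0:
        return emb  # no dropout at evaluation time
    mask = (rng.random(emb.shape) >= p).astype(emb.dtype)
    # Scale surviving units by 1/(1-p) so the expected activation is unchanged.
    return emb * mask / (1.0 - p)

out = embed_with_dropout(np.array([1, 2, 3]), p=0.5)
```

At evaluation time (`train=False`) the lookup is returned unmodified, which is the standard inverted-dropout convention.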