* Add a note that students should not change the input or output node sizes.
* Swap to PyTorch.
* Visualize each class's parameter distributions instead of just printing them.
* Improve the explanation of the train/test split in the notebook text and, more generally, of the data processing details.