### Issue type

Need help

### Summary
Some functions in `/cellbox/train.py` are ambiguous about what task they perform. Understanding them is crucial for reproducing similar results in the PyTorch version of CellBox, so this issue is for resolving that ambiguity.
### Details
- At lines 76 to 79 of `train.py`, are `loss_valid_i` and `loss_valid_mse_i` evaluated on one random batch fetched from `args.feed_dicts['valid_set']`, or are these losses evaluated on the whole validation set?
- The `eval_model` function returns different values in different calls. At lines 101 to 103, it returns both the total and MSE loss over `args.n_batches_eval` batches of the validation set. At lines 109 to 111, it returns only the MSE loss over `args.n_batches_eval` batches of the test set. And at line 262, it returns the expression predictions `y_hat` for the whole test set. Are all of these statements correct?
- The `record_eval.csv` file generated after training with the default arguments and config file specified in the README (`python scripts/main.py -config=configs/Example.random_partition.json`) has its `test_mse` column set to None. Is this the expected behaviour of the code?
- `random_pos.csv`, generated after training, stores the indices of the perturbation conditions. Does it indicate how the conditions are split into training, validation, and test sets?
- After each substage, say substage 6, the code generates `6_best.y_hat.loss.csv`, containing the expression predictions for the test-set perturbation conditions across all nodes, but the file does not indicate which row corresponds to which perturbation condition. How are this file and `random_pos.csv` related?
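For concreteness, here is a minimal sketch of the row-to-condition mapping I am currently assuming for the last two questions. Everything here is a guess, not confirmed behaviour: that `random_pos.csv` lists the shuffled condition indices in `[train | valid | test]` order, that the split sizes (a hypothetical 70/10/20 here) come from the config, and that row `i` of `6_best.y_hat.loss.csv` corresponds to the `i`-th test index.

```python
import numpy as np

# Hypothetical stand-in for random_pos.csv: a shuffled list of the
# perturbation-condition indices (here 100 conditions).
rng = np.random.default_rng(0)
random_pos = rng.permutation(100)

# Assumption: random_pos is ordered as [train | valid | test], with a
# hypothetical 70/10/20 split (the real ratios would come from the config).
n_train, n_valid = 70, 10
test_idx = random_pos[n_train + n_valid:]

# Assumption: row i of 6_best.y_hat.loss.csv corresponds to test_idx[i],
# i.e. predictions are written in the order of the test partition.
row_to_condition = {row: cond for row, cond in enumerate(test_idx)}
print(len(row_to_condition))  # 20 test rows
```

If this interpretation is wrong, a pointer to where the split and the prediction ordering are actually determined in the code would resolve both questions.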