This project provides a codebase for the image classification task implemented in PyTorch. It does not use any high-level deep learning libraries (such as PyTorch Lightning or MMClassification), so it should be easy to follow and modify.

The code is tested with python==3.9, pyhocon==0.3.57, torch==1.8.0 and torchvision==0.9.0.
First, clone this repo:

```shell
git clone --recursive https://github.com/chenyaofo/image-classification-codebase
```
Then, you can get started by training a ResNet-20 convolutional network on CIFAR-10 with the following command.
Single node, single GPU:

```shell
CUDA_VISIBLE_DEVICES=0 python -m entry.run --conf conf/cifar10.conf -o output/cifar10/resnet20
```

Tips: run

```shell
CUDA_VISIBLE_DEVICES=0 python -m entry.run --conf conf/resnet50-benchmark.conf -o output/benchmark
```

to check throughput performance; more details can be found in doc/benchmark.md.
If you want to use multiple GPUs to accelerate training with distributed data parallel, run the following commands.
Single node, multiple GPUs:

```shell
CUDA_VISIBLE_DEVICES=0,1 python -m entry.run --world-size 2 \
--conf conf/cifar10.conf -o output/cifar10/resnet20
```

Multiple nodes:

Node 0:

```shell
CUDA_VISIBLE_DEVICES=0,1 python -m entry.run --world-size 4 --dist-url \
'tcp://IP_OF_NODE0:FREEPORT' --node-rank 0 --conf conf/cifar10.conf -o output/cifar10/resnet20
```

Node 1:

```shell
CUDA_VISIBLE_DEVICES=0,1 python -m entry.run --world-size 4 --dist-url \
'tcp://IP_OF_NODE1:FREEPORT' --node-rank 1 --conf conf/cifar10.conf -o output/cifar10/resnet20
```

This codebase adopts configuration files (.hocon) to store the hyperparameters (such as the learning rate, the number of training epochs, etc.).
If you want to modify the configuration hyperparameters, you have two options:

- Modify the configuration file to generate a new one.
- Add `-M` to the command line to override hyperparameters temporarily.
For example, if you want to set the total number of training epochs to 100 and the learning rate to 0.05, run the following command:

```shell
CUDA_VISIBLE_DEVICES=0 python -m entry.run --conf conf/cifar10.conf -o output/cifar10/resnet20 -M max_epochs=100 optimizer.lr=0.05
```

If you modify a non-existing hyperparameter, the code will raise an exception.
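Conceptually, such dotted `key=value` overrides map onto the nested configuration tree. The sketch below shows the idea with a plain dict; it is not the codebase's actual implementation (which relies on pyhocon), and the function name `apply_overrides` is hypothetical.

```python
# Sketch: applying "-M key.subkey=value"-style overrides to a nested config.
# Hypothetical helper, not the codebase's real code (which uses pyhocon).

def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Apply dotted key=value overrides in place, raising on unknown keys."""
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:                      # walk down the nested dicts
            if part not in node:
                raise KeyError(f"unknown hyperparameter: {dotted_key}")
            node = node[part]
        if leaf not in node:                      # mimic the "raise on typo" behavior
            raise KeyError(f"unknown hyperparameter: {dotted_key}")
        # Reuse the type of the existing value to parse the string (int, float, ...).
        node[leaf] = type(node[leaf])(raw_value)
    return config

config = {"max_epochs": 200, "optimizer": {"lr": 0.1}}
apply_overrides(config, ["max_epochs=100", "optimizer.lr=0.05"])
print(config)  # {'max_epochs': 100, 'optimizer': {'lr': 0.05}}
```

Rejecting unknown keys, as the codebase does, catches misspelled hyperparameter names early instead of silently creating unused entries.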
To list all valid hyperparameter names, run the following command:

```shell
pyhocon -i conf/cifar10.conf -f properties
```

- We use NVIDIA DALI to accelerate data preprocessing on ImageNet (enable it with the flag `data.use_dali`) and the tfrecord format to store ImageNet (create the tfrecords with `tools/make_tfrecord.py` and enable them with the flag `data.use_tfrecord`).
Finally, enjoy the code.
To cite this codebase:

```bibtex
@misc{chen2020image,
    author = {Yaofo Chen},
    title = {Image Classification Codebase},
    year = {2021},
    howpublished = {\url{https://github.com/chenyaofo/image-classification-codebase}}
}
```