Now let me take you along on the journey of finding the right model for the task. It started with MobileNet vs. EfficientNet, and I initially decided to move forward with EfficientNet-B4. That was a mistake: while the model has only 19.3 million parameters, it still was not computationally efficient enough for the task! Next came EfficientNetV2-S, with nearly 22 million parameters but faster training times and improved accuracy. I then compared this model side by side against MobileViT.
| Model | Final Accuracy | Training Time (s) | Parameters | Throughput (samples/s) |
|---|---|---|---|---|
| EfficientNetV2-S | 91.54% | 2,817.9 | 20,269,720 | 822.2 |
| MobileViT | 89.61% | 2,826.6 | 1,960,568 | 819.6 |
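If you want to run a comparison like this yourself, here is a rough sketch using timm. The model names (`tf_efficientnetv2_s`, `mobilevit_s`), batch size, input resolution, and the use of random data instead of the real dataset are all assumptions, and the exact parameter counts will differ from the table above depending on the classification head.

```python
import time

import timm
import torch


def benchmark(model_name: str, batch_size: int = 32, img_size: int = 224, steps: int = 50) -> None:
    """Report parameter count and rough inference throughput for one model."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = timm.create_model(model_name, pretrained=False).to(device).eval()

    n_params = sum(p.numel() for p in model.parameters())

    # Time repeated forward passes on random input to estimate throughput.
    x = torch.randn(batch_size, 3, img_size, img_size, device=device)
    with torch.no_grad():
        for _ in range(5):  # warm-up iterations
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.time()
        for _ in range(steps):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        elapsed = time.time() - start

    print(f"{model_name}: {n_params:,} params, {batch_size * steps / elapsed:.1f} samples/sec")


for name in ("tf_efficientnetv2_s", "mobilevit_s"):
    benchmark(name)
```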
Even so, that was not enough to convince me to use EfficientNetV2-S. Yes, it is a fantastic SOTA model, but it still required more resources, which is why I looked further into the MobileViT family and found that MobileViTv2 has about 3 million parameters. That leaves more headroom if I'd like to add something like segmentation, which could take up additional resources.
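Since that ~3-million-parameter budget drove the decision, a quick way to see which MobileViTv2 variants fit is to scan timm's registry. This is only a sketch; the variant names come from timm, not from my training setup.

```python
import timm

# List the MobileViTv2 variants registered in timm and report their sizes.
for name in timm.list_models("mobilevitv2_*"):
    model = timm.create_model(name, pretrained=False)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.2f}M parameters")
```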
Open Google Colab for a live demo using MobileViTv2 to classify disease on your own image (I will share the list of plants it can analyze soon!)
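For reference, the demo boils down to a few lines of inference code like the sketch below. The checkpoint filename, the `mobilevitv2_075` variant, and the `NUM_CLASSES` value are placeholders, not the exact setup in the Colab notebook.

```python
from PIL import Image
import timm
import torch

NUM_CLASSES = 10  # placeholder: set to the number of disease classes in your checkpoint

# Assumption: a fine-tuned MobileViTv2 checkpoint saved as a state dict.
model = timm.create_model("mobilevitv2_075", pretrained=False, num_classes=NUM_CLASSES)
model.load_state_dict(torch.load("mobilevitv2_plant_disease.pth", map_location="cpu"))
model.eval()

# Build the preprocessing pipeline that matches the model's default data config.
config = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**config)

img = Image.open("leaf.jpg").convert("RGB")
x = transform(img).unsqueeze(0)  # shape: (1, 3, H, W)

with torch.no_grad():
    probs = model(x).softmax(dim=-1)[0]

top_prob, top_idx = probs.max(dim=0)
print(f"Predicted class index {top_idx.item()} with probability {top_prob.item():.2%}")
```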
Grad-CAM was used to understand what MobileViTv2 was focusing on when making its predictions, along the lines of the sketch below.
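Here is a minimal Grad-CAM sketch using the pytorch-grad-cam package (`pip install grad-cam`). The image path, the pretrained `mobilevitv2_075` variant, and especially the choice of target layer are assumptions; inspect `print(model)` and swap in your own fine-tuned checkpoint.

```python
import numpy as np
import timm
import torch
from PIL import Image
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image

# Assumption: an ImageNet-pretrained MobileViTv2; replace with your fine-tuned model.
model = timm.create_model("mobilevitv2_075", pretrained=True)
model.eval()

# Assumption: the last backbone stage is a reasonable CAM target; confirm with print(model).
target_layers = [model.stages[-1]]

# Keep a [0, 1] copy of the image for the overlay and a normalized copy for the model.
rgb_img = np.array(Image.open("leaf.jpg").convert("RGB").resize((256, 256))) / 255.0
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
input_tensor = torch.from_numpy((rgb_img - mean) / std).permute(2, 0, 1).unsqueeze(0).float()

cam = GradCAM(model=model, target_layers=target_layers)
grayscale_cam = cam(input_tensor=input_tensor, targets=None)  # None = top-predicted class
overlay = show_cam_on_image(rgb_img, grayscale_cam[0], use_rgb=True)
Image.fromarray(overlay).save("gradcam_overlay.png")
```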
Why MobileViTv2 stood out:
- Lightweight
- Easy to train
- Closer to a traditional CNN than MobileViT v1