
Crop Damage Classification

Using machine learning to identify crop damage categories on smallholder farms in Africa

Introduction

This post describes how we used machine learning to classify crops, mainly maize, into categories: Good growth, Nutrient Deficiency, Weed, Drought and Other (pests, diseases and fire damage). The classification makes cases easier for the insurer to retrieve and follow up, and so speeds up payouts to farmers when claims are made. We used this pipeline to enter Zindi’s Crop Damage Classification. We may not have won the contest, but we learnt some great techniques for working with image classification, which are detailed in this post. The preprocessing steps we followed are described in the sections below.

The Python notebook we created can be found in this GitHub repository: https://github.com/MugoDom/crop_damage_classification/blob/main/index.ipynb

The Challenge

Zindi is an African competitive data science platform that focuses on using data science for social benefit. In Zindi’s 2019 Farm Pin Crop Detection Challenge, participants trained machine learning models for image classification in order to classify the crops being grown in fields in Africa.

The data was gathered from pictures sent by insured farmers from their smartphones. The data supplied to contestants consisted of two shape files containing the training set and the test set. The bar graph above shows the distribution of damage categories. The training and test sets consisted of 26068 fields and 8663 fields respectively. Each field in the training set was labelled with an ID, a damage category and a file name. The crop was mainly maize, and each image was classified into one of the following categories:

Good growth (G)
Drought (DR)
Nutrient Deficient (ND)
Weed (WD)
Other (including pest, disease, or wind damage)

Data Preprocessing

Dividing the Dataset into Training and Testing Sets

We chose to partition the provided training data into distinct train and test sets. This guaranteed a dedicated test set for evaluating our optimized model.
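A split like this can be sketched in plain Python. The version below is only an illustration: the 20% test fraction, the per-category (stratified) split and the helper name `stratified_split` are assumptions, since the text does not say how the partition was made.

```python
import random
from collections import defaultdict


def stratified_split(records, test_fraction=0.2, seed=42):
    """Split (filename, label) records into train and test lists.

    The split is done separately for each label so that both sets keep
    roughly the same class proportions (an assumption, not the authors'
    documented method).
    """
    by_label = defaultdict(list)
    for filename, label in records:
        by_label[label].append(filename)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        files = sorted(by_label[label])
        rng.shuffle(files)
        n_test = round(len(files) * test_fraction)
        test += [(f, label) for f in files[:n_test]]
        train += [(f, label) for f in files[n_test:]]
    return train, test


# Toy records standing in for the (file name, damage) labels.
records = [(f"g_{i}.jpg", "G") for i in range(10)]
records += [(f"dr_{i}.jpg", "DR") for i in range(5)]

train, test = stratified_split(records)
print(len(train), "training images,", len(test), "test images")
```

Splitting per category keeps rare categories represented in both sets, which matters when the classes are imbalanced.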

Organizing the images

We first defined the source and destination directories for the images, created the destination directories if they did not exist, and moved the images into their respective directories. We then counted the images in each directory to check the class balance, and applied minority class oversampling to handle the class imbalance.
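The steps above could look roughly like the following sketch. It assumes a mapping from file name to damage label and one folder per label. The function names are hypothetical, and oversampling by copying random files is only one possible approach (the text does not say which method was used).

```python
import random
import shutil
from pathlib import Path


def organize_images(source_dir, dest_dir, labels):
    """Move each image into dest_dir/<label>/, creating folders as needed."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    for filename, label in labels.items():
        class_dir = dest_dir / label
        class_dir.mkdir(parents=True, exist_ok=True)
        src = source_dir / filename
        if src.exists():  # skip labels whose image file is missing
            shutil.move(str(src), str(class_dir / filename))


def count_images(dest_dir):
    """Return {label: number of files} for every class folder."""
    return {
        d.name: sum(1 for f in d.iterdir() if f.is_file())
        for d in Path(dest_dir).iterdir()
        if d.is_dir()
    }


def oversample_minority(dest_dir, seed=42):
    """Copy random images in smaller classes until all classes are equal in size.

    Assumption: duplication on disk; the original notebook may differ.
    """
    rng = random.Random(seed)
    counts = count_images(dest_dir)
    if not counts:
        return
    target = max(counts.values())
    for label, n in counts.items():
        class_dir = Path(dest_dir) / label
        originals = sorted(f for f in class_dir.iterdir() if f.is_file())
        if not originals:
            continue
        for i in range(target - n):
            src = rng.choice(originals)
            # A unique suffix per copy avoids overwriting existing files.
            shutil.copy(src, src.with_name(f"{src.stem}_dup{i}{src.suffix}"))
```

Calling `count_images` before and after `oversample_minority` gives a quick check that every class folder ends up with the same number of files.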

Memory-Efficient Data Loading

Given the substantial size of our dataset, traditional preprocessing techniques such as loading the entire dataset into memory would have posed significant memory challenges. We therefore opted for the Keras ImageDataGenerator, which can stream images from disk in batches instead of holding them all in memory.
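The idea behind batch-wise loading can be illustrated with a plain Python generator. This is not how ImageDataGenerator is implemented. It is a minimal sketch of the principle, with a hypothetical helper `iter_batches` and made-up file names.

```python
from itertools import islice


def iter_batches(paths, batch_size):
    """Yield successive lists of at most batch_size paths.

    Only one batch exists in memory at a time; in a real pipeline each
    path would be decoded into an image array just before use.
    """
    iterator = iter(paths)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Made-up file names; a generator expression keeps even the path list lazy.
paths = (f"img_{i}.jpg" for i in range(10))
batch_sizes = [len(batch) for batch in iter_batches(paths, batch_size=4)]
print(batch_sizes)
```

Memory use then depends on the batch size rather than on the size of the whole dataset.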

Building a prediction model

The organized training images in the base directory were resized to (224, 224) pixels and processed in batches of 256 during training.

Data Augmentation

We applied the following transformations to enhance the model's efficiency:

rescaling
horizontal flips
zoom
shear
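With Keras, the transformations listed above, together with the (224, 224) target size and the batch size of 256, might be configured as in the sketch below. It assumes TensorFlow 2.x. Only the kinds of transformation come from the text: the numeric ranges, the `class_mode` and the names `AUGMENTATION`, `make_train_flow` and `base_dir` are illustrative assumptions.

```python
# Augmentation settings passed to ImageDataGenerator.
# The numeric values are assumptions chosen for illustration.
AUGMENTATION = {
    "rescale": 1.0 / 255,     # map pixel values from [0, 255] to [0, 1]
    "horizontal_flip": True,  # randomly mirror images left to right
    "zoom_range": 0.2,        # assumed: random zoom between 0.8x and 1.2x
    "shear_range": 0.2,       # assumed: shear intensity (angle in degrees)
}
IMAGE_SIZE = (224, 224)  # from the text
BATCH_SIZE = 256         # from the text


def make_train_flow(base_dir):
    """Return a batch iterator over base_dir/<class name>/ image folders."""
    # Imported here so the settings above can be inspected without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(**AUGMENTATION)
    return datagen.flow_from_directory(
        base_dir,
        target_size=IMAGE_SIZE,    # every image is resized on load
        batch_size=BATCH_SIZE,
        class_mode="categorical",  # assumed: one-hot labels for the classes
    )
```

The random transformations are applied on the fly each time a batch is drawn, so the files on disk stay unchanged.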

Training and Validation

The data was split into training and validation sets using validation_split=0.2. We visualized the training and validation performance metrics using two subplots: (a) the loss and (b) the accuracy.

Figure: Confusion matrix for the base model
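For readers who want to reproduce this kind of evaluation, a confusion matrix and the per-class F1 scores derived from it can be computed from true and predicted labels as in the sketch below. The labels are toy values, not the project's results, and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions; rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in zip(y_true, y_pred):
        matrix[index[true]][index[pred]] += 1
    return matrix


def f1_per_class(matrix):
    """Return F1 = 2*TP / (2*TP + FP + FN) per class, or 0.0 when undefined."""
    scores = []
    for i in range(len(matrix)):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                 # rest of the true-class row
        fp = sum(row[i] for row in matrix) - tp  # rest of the predicted column
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Toy labels for three of the categories (made up for illustration).
labels = ["G", "DR", "WD"]
y_true = ["G", "G", "G", "DR", "DR", "WD"]
y_pred = ["G", "G", "DR", "DR", "G", "WD"]

matrix = confusion_matrix(y_true, y_pred, labels)
scores = f1_per_class(matrix)
macro_f1 = sum(scores) / len(scores)
print(matrix, scores, macro_f1)
```

Macro-averaging gives each class the same weight regardless of size, so weak performance on a rare class lowers the score noticeably.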

Optimized Model Evaluation

Results and Areas of Improvement

Class imbalance was one of the biggest problems we encountered, and it clearly had an impact on the model. We attempted to improve the model by oversampling the minority class, but the effect was barely noticeable. This could partly explain the low F1 score. We also tried adding class weights to the model, but this did not improve performance; instead, it increased overfitting. Going forward, we will investigate alternative loss functions that, unlike standard cross-entropy, weigh every label equally and so account for the class imbalance in the dataset.
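As an illustration of class weighting, the sketch below computes inverse-frequency weights with the common "balanced" heuristic, n_samples / (n_classes * class_count). The text does not say how the weights were chosen, so the formula, the helper name and the toy label counts are all assumptions.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: n_samples / (n_classes * count)} for each label.

    Rare classes receive weights above 1 and frequent classes below 1,
    so each class contributes equally to the weighted loss in total.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


# Toy label distribution (made up): 6 Good growth, 3 Drought, 1 Weed.
labels = ["G"] * 6 + ["DR"] * 3 + ["WD"] * 1
weights = balanced_class_weights(labels)
print(weights)
```

In Keras, weights like these can be passed to `model.fit` through its `class_weight` argument, which expects a dictionary keyed by integer class index rather than by label name.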

We also plan to investigate the fastai deep learning library, which offers high-level building blocks that quickly yield "state-of-the-art" outcomes. Another challenge was running out of compute resources, which limited the number of epochs we could run. With more computing resources we could train for more epochs, which may improve performance.
