KarsonE/Home-Credit-Default-Risk

Home-Credit-Default-Risk

MSBA Capstone Project 1

This markdown file presents part of my analysis for the 2023 University of Utah Master of Science in Business Analytics (MSBA) Capstone Project. For this project, we worked in groups to assess a business problem, perform exploratory data analysis (EDA), build several machine learning models, and present our findings. The project used data from a past Kaggle competition sponsored by Home Credit, which sought to use non-traditional analytical features to identify default risk in potential customers.

This notebook follows the same process with a narrower scope. It contains the R code and explanations for summary EDA, and it extracts important variables from a larger data set. The notebook builds three Naive Bayes (NB) models, all of which narrowly outperform the majority-class baseline. I chose to focus on NB models because of their reputation for performing consistently on messy data sets. I was also interested in learning more about tuning NB models and measuring their performance. The third and final NB model performed better on the testing data partition than the prior two. To optimize that model, I used 10-fold cross-validation with a tuning grid over multiple parameters. I also upsampled the minority class (TARGET = 1) to reduce problems caused by class imbalance.
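As a rough illustration of the majority-class comparison and the upsampling step described above, the sketch below uses base R on toy data. It is not the notebook's code: the data frame, the placeholder predictor `feature_x`, the 8% default rate, and all object names are assumptions made for illustration. Only the `TARGET` column name comes from the project.

```r
# Hypothetical toy data standing in for a training partition.
# TARGET = 1 marks a default; the 8% rate is an illustrative assumption.
set.seed(42)
n <- 1000
train <- data.frame(
  TARGET = factor(rep(c(0, 1), times = c(920, 80))),
  feature_x = runif(n)  # placeholder predictor, not a real Home Credit column
)

# Majority-class baseline: accuracy from always predicting the most common class.
class_counts <- table(train$TARGET)
baseline_accuracy <- as.numeric(max(class_counts) / sum(class_counts))

# Upsample: draw minority rows with replacement until both classes are equal in size.
majority_label <- names(class_counts)[which.max(class_counts)]
majority_rows <- which(train$TARGET == majority_label)
minority_rows <- which(train$TARGET != majority_label)
resampled_minority <- sample(minority_rows, length(majority_rows), replace = TRUE)
train_up <- train[c(majority_rows, resampled_minority), ]

print(baseline_accuracy)
print(table(train_up$TARGET))
```

In a sketch like this, only the training partition is upsampled. The testing partition keeps its original class balance, so accuracy on it can be compared directly with the majority-class baseline.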

This project has been an extensive learning experience. The data set was large and messy, and everything from data importation to model training and testing took patience. In preparing the markdown file in this repository, I was able to refine some of the initial steps. Nearly every step of the process had a logistical or theoretical learning curve. Even pulling out and cleaning up the functional code I wrote for other files generated new and exciting errors in this notebook, which taught me about the iterative process inherent in data science. The CRISP-DM life cycle feels particularly true for large, messy data sets like the ones we encountered in the Home Credit default risk analysis. Each step of the process requires a deeper understanding of both the data and the business needs.
