Haelles/DataMiningProject

DataMiningProject

Project for DATA620007, Fudan University

  • Homework 1-6, covering Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, AdaBoost, Random Forest, PCA, K-means, etc.

The main content is as follows.

  • Developed a logistic regression model to predict default probability from user credit data, using forward selection with the BIC criterion for feature selection; achieved an AUC of 0.83 on the test set.
  • Analyzed the popularity of Olympic videos: implemented Decision Trees, AdaBoost, and Random Forest to classify each video's degree of discussion, tuned parameters with five-fold cross-validation on the training set, and achieved an AUC of 0.93 on the test set.
  • Analyzed player market values in the big five European leagues: applied PCA and chose the number of principal components from scree plots, calculated factor loadings to interpret the importance of the original variables, then applied K-means clustering to the principal component scores and chose the number of clusters with the elbow method.
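
As an illustration of the forward selection with BIC mentioned in the first item above, here is a minimal NumPy-only sketch. It is not the code from the homework: the helper names (`fit_logistic`, `bic`, `forward_select_bic`), the Newton solver, and the synthetic data are all assumptions made for this example. BIC is computed as k·ln(n) − 2·ln(L), where k counts the intercept plus the selected features.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, ridge=1e-8):
    """Fit logistic regression by Newton's method; return (beta, log-likelihood)."""
    Xb = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        w = p * (1.0 - p)
        # Tiny diagonal term keeps the linear solve numerically stable.
        hess = Xb.T @ (Xb * w[:, None]) + ridge * np.eye(Xb.shape[1])
        step = np.linalg.solve(hess, Xb.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-Xb @ beta)), 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; lower is better."""
    return n_params * np.log(n_obs) - 2.0 * loglik

def forward_select_bic(X, y):
    """Greedily add the feature that lowers BIC most; stop when none does."""
    n, d = X.shape
    selected = []
    _, ll = fit_logistic(np.empty((n, 0)), y)  # intercept-only baseline
    best = bic(ll, 1, n)
    while len(selected) < d:
        candidates = []
        for j in range(d):
            if j in selected:
                continue
            cols = np.array(selected + [j], dtype=int)
            _, ll = fit_logistic(X[:, cols], y)
            candidates.append((bic(ll, len(cols) + 1, n), j))
        score, j = min(candidates)
        if score >= best:  # no candidate improves BIC
            break
        best = score
        selected.append(j)
    return selected, best

# Synthetic stand-in for the credit data: only features 0 and 2 carry signal.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
logits = 2.0 * X[:, 0] - 1.5 * X[:, 2]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

selected, best_bic = forward_select_bic(X, y)
print(selected, round(best_bic, 1))
```

Because the BIC penalty grows with ln(n), the procedure tends to stop earlier than AIC-based selection would, which usually yields a smaller feature set.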

Final project

  • Conducted data analysis for wine quality classification, including normality testing and kernel density estimation; implemented GBDT and XGBoost+LR models, achieving an AUC of 0.88 on the test set.
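
To make the kernel density estimation step concrete, here is a small NumPy-only sketch. The project description does not say which kernel or bandwidth was used, so the Gaussian kernel and Silverman's rule-of-thumb bandwidth are assumptions, as are the function names and the synthetic sample standing in for a wine feature.

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule of thumb: 0.9 * min(std, IQR / 1.34) * n^(-1/5)."""
    x = np.asarray(x, dtype=float)
    std = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(std, (q75 - q25) / 1.34)
    return 0.9 * spread * len(x) ** (-0.2)

def kde_gaussian(x, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of sample x on grid."""
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    h = silverman_bandwidth(x) if bandwidth is None else bandwidth
    z = (grid[:, None] - x[None, :]) / h            # pairwise scaled distances
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (len(x) * h)        # average of scaled kernels

# Synthetic stand-in for one wine feature (not the project's data).
rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=1.0, size=500)

grid = np.linspace(sample.min() - 3.0, sample.max() + 3.0, 400)
density = kde_gaussian(sample, grid)

# Sanity check: a density should integrate to roughly 1 (Riemann sum).
area = float(np.sum(density) * (grid[1] - grid[0]))
print(round(area, 3))
```

Plotting `density` against `grid` next to a fitted normal curve is one informal way to complement a formal normality test.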
