mst-2005/Supervised-Machine-Learning
Supervised Machine Learning Implementations 🧠

This repository contains a comprehensive suite of supervised machine learning algorithms, providing a structured approach to predictive modeling. The implementation covers the entire machine learning pipeline, including Exploratory Data Analysis (EDA), data preprocessing, model training, and performance evaluation.

📂 Core Implementations

The project documents the progression from fundamental linear models to advanced ensemble techniques:

Regression Analysis

  • Simple Linear Regression (SLR): Modeling the linear relationship between a single independent variable and a dependent variable.
  • Multiple Linear Regression (MLR): Analyzing how multiple independent features (e.g., house size, bedrooms) influence a single target outcome.
  • Polynomial Regression: Utilizing nth-degree polynomials to model non-linear and curvilinear relationships in data.
  • Regularization (Ridge & Lasso): Applying L2 (Ridge) and L1 (Lasso) penalties to manage model complexity, prevent overfitting, and, in the case of Lasso, perform implicit feature selection.
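
The regression techniques above can be sketched with scikit-learn on synthetic data (the feature values and `alpha` settings here are illustrative, not taken from the repository's notebooks):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))                      # single feature
y = 2.0 * X[:, 0] + 0.5 * X[:, 0] ** 2 + rng.normal(0, 1, 100)

# Simple linear regression: one feature, one target.
slr = LinearRegression().fit(X, y)

# Polynomial regression: expand the feature to degree 2, then fit linearly.
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

# Regularized variants: Ridge (L2) shrinks coefficients toward zero,
# Lasso (L1) can drive some coefficients exactly to zero.
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Because the data is genuinely quadratic, the polynomial model fits better.
print(f"SLR R^2: {slr.score(X, y):.3f}, Poly R^2: {poly.score(X, y):.3f}")
```

Multiple linear regression follows the same `LinearRegression` API; it only requires `X` to carry several feature columns.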

Classification Algorithms

  • Logistic Regression: Binary classification using the Sigmoid function to predict class probabilities.
  • K-Nearest Neighbors (KNN): An instance-based learning method that classifies data points based on their proximity in the feature space.
  • Support Vector Machines (SVM): Determining optimal hyperplanes to separate classes, applied to diagnostic tasks such as Breast Cancer classification.
  • Decision Trees: Hierarchical data splitting based on criteria like Gini Impurity and Entropy to create interpretable decision rules.
  • Naïve Bayes: A probabilistic classifier based on Bayes' Theorem, assuming independence between input features.
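
A minimal sketch of this classification workflow, using scikit-learn's built-in breast-cancer benchmark as a stand-in for the notebooks' datasets (the specific hyperparameters are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)

# Scale-sensitive models (logistic, KNN, SVM) are wrapped with a scaler;
# trees and Naive Bayes are scale-invariant and used directly.
models = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "decision_tree": DecisionTreeClassifier(criterion="gini", random_state=42),
    "naive_bayes": GaussianNB(),
}
for name, model in models.items():
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: {acc:.3f}")
```

Each classifier shares the same `fit`/`score` interface, which is what makes the side-by-side comparison in the notebooks straightforward.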

Advanced Ensemble Methods

  • Bagging (Bootstrap Aggregating): Training multiple independent models on data subsets to reduce variance (e.g., Random Forest logic).
  • Boosting: Sequentially training models where each new model attempts to correct the errors of its predecessor to reduce bias.
  • Stacking: Combining diverse base learners via a meta-model to integrate their predictions for enhanced accuracy.
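
The three ensemble strategies can be sketched with scikit-learn's generic meta-estimators on a synthetic dataset (estimator counts and base learners here are illustrative choices, not the repository's exact configuration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: many trees on bootstrap samples, combined by majority vote
# (the core idea behind Random Forests) -- reduces variance.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=0)

# Boosting: each stage fits the errors of the ensemble so far -- reduces bias.
boosting = GradientBoostingClassifier(n_estimators=100, random_state=0)

# Stacking: diverse base learners whose predictions are combined by a
# logistic-regression meta-model.
stacking = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(random_state=0)),
                ("nb", GaussianNB())],
    final_estimator=LogisticRegression(),
)

for model in (bagging, boosting, stacking):
    print(type(model).__name__, model.fit(X_tr, y_tr).score(X_te, y_te))
```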

🛠️ Project Structure

The repository is organized into modular Jupyter Notebooks and supporting datasets:

  • Implementation Notebooks: Core files including SVM_BreastCancer.ipynb, MultipleLinearRegression_on_HousingDataset.ipynb, and SimpleLinearRegression_Demo.ipynb.
  • Notebook Documentation: Each notebook contains executable code, mathematical explanations, and result visualizations.
  • Datasets: Includes standard benchmarks such as Housing, Student Performance, and Medical diagnostic data (e.g., housing.csv, Student_Performance.csv).
  • Reference Files: Includes SML_Lab_Manual.docx, SML_Lab_List of Experiments.docx, and the course syllabus for technical context.

🧪 Technical Stack

  • Language: Python
  • Core Libraries: Scikit-learn, NumPy, Pandas
  • Visualization: Matplotlib, Seaborn
  • Platform: Jupyter Notebook / Google Colab

Developed with a focus on robust software architecture and comprehensive result analysis.

About

Implementation of core supervised machine learning algorithms including Regression (Linear, Multiple, Polynomial, Ridge, Lasso), Classification (KNN, Decision Trees, Logistic Regression, SVM, Naïve Bayes), and Ensemble Methods (Bagging, Boosting, Stacking) using Python.
