This repository contains a comprehensive suite of supervised machine learning algorithms, providing a structured approach to predictive modeling. The implementation covers the entire machine learning pipeline, including Exploratory Data Analysis (EDA), data preprocessing, model training, and performance evaluation.
The project documents the progression from basic linear models to advanced ensemble techniques:
- Simple Linear Regression (SLR): Modeling the linear relationship between a single independent and dependent variable.
- Multiple Linear Regression (MLR): Analyzing how multiple independent features (e.g., house size, bedrooms) influence a single target outcome.
- Polynomial Regression: Utilizing nth-degree polynomials to model non-linear and curvilinear relationships in data.
- Regularization (Ridge & Lasso): Implementing L2 (Ridge) and L1 (Lasso) regularization to manage model complexity and reduce overfitting; Lasso can additionally perform feature selection by shrinking some coefficients to exactly zero.
- Logistic Regression: Binary classification using the Sigmoid function to predict class probabilities.
- K-Nearest Neighbors (KNN): An instance-based learning method that classifies data points based on their proximity in the feature space.
- Support Vector Machines (SVM): Determining optimal hyperplanes to separate classes, applied to diagnostic tasks such as Breast Cancer classification.
- Decision Trees: Hierarchical data splitting based on criteria like Gini Impurity and Entropy to create interpretable decision rules.
- Naïve Bayes: A probabilistic classifier based on Bayes' Theorem, assuming that input features are conditionally independent given the class.
- Bagging (Bootstrap Aggregating): Training multiple independent models on bootstrap samples (random subsets drawn with replacement) and aggregating their predictions to reduce variance (e.g., Random Forest logic).
- Boosting: Sequentially training models where each new model attempts to correct the errors of its predecessor to reduce bias.
- Stacking: Combining diverse base learners via a meta-model to integrate their predictions for enhanced accuracy.
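As a rough illustration of the techniques listed above, the following sketch contrasts Ridge and Lasso coefficients and then compares several of the classifiers with 5-fold cross-validation. It assumes scikit-learn is installed and uses synthetic data from `make_regression` and `make_classification`; the data, the hyperparameters (`alpha`, `n_estimators`, `n_neighbors`), and the choice of base learners are illustrative assumptions, not the settings used in the notebooks.

```python
# Illustrative sketch only: synthetic data and arbitrary hyperparameters,
# not the datasets or settings used in the repository's notebooks.
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.linear_model import Lasso, LogisticRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# --- Regularization: Ridge (L2) shrinks coefficients, Lasso (L1) can zero them.
X_reg, y_reg = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=10.0, random_state=0
)
ridge = Ridge(alpha=1.0).fit(X_reg, y_reg)
lasso = Lasso(alpha=5.0).fit(X_reg, y_reg)
n_zero_ridge = int((ridge.coef_ == 0).sum())
n_zero_lasso = int((lasso.coef_ == 0).sum())
print(f"Coefficients set to zero: Ridge={n_zero_ridge}, Lasso={n_zero_lasso}")

# --- Classification: single models versus bagging, boosting, and stacking.
X_clf, y_clf = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

models = {
    # Distance- and margin-based models are scaled first.
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Decision Tree": DecisionTreeClassifier(criterion="gini", random_state=0),
    "Naive Bayes": GaussianNB(),
    # Bagging: many trees, each fit on a bootstrap sample of the training data.
    "Bagging": BaggingClassifier(
        DecisionTreeClassifier(random_state=0), n_estimators=50, random_state=0
    ),
    # Boosting: trees added sequentially, each fit to the current errors.
    "Boosting": GradientBoostingClassifier(random_state=0),
    # Stacking: a logistic-regression meta-model combines the base learners.
    "Stacking": StackingClassifier(
        estimators=[
            ("knn", KNeighborsClassifier()),
            ("nb", GaussianNB()),
            ("tree", DecisionTreeClassifier(random_state=0)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X_clf, y_clf, cv=5).mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name:20s} {score:.3f}")
```

The exact scores depend on the data and the scikit-learn version, so a ranking from this toy comparison should not be read as a general statement about the methods.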
The repository is organized into modular Jupyter Notebooks and supporting datasets:
- Implementation Notebooks: Core files including `SVM_BreastCancer.ipynb`, `MultipleLinearRegression_on_HousingDataset.ipynb`, and `SimpleLinearRegression_Demo.ipynb`.
- Notebook Documentation: Each notebook contains executable code, mathematical explanations, and result visualizations.
- Datasets: Includes standard benchmarks such as Housing, Student Performance, and Medical diagnostic data (e.g., `housing.csv`, `Student_Performance.csv`).
- Reference Files: Includes `SML_Lab_Manual.docx`, `SML_Lab_List of Experiments.docx`, and the course syllabus for technical context.
- Language: Python
- Core Libraries: Scikit-learn, NumPy, Pandas
- Visualization: Matplotlib, Seaborn
- Platform: Jupyter Notebook / Google Colab
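With the stack listed above, a notebook workflow covering EDA, preprocessing, training, and evaluation might look roughly like the sketch below. It is a minimal example under stated assumptions: the data is generated synthetically, and the column names `area_sqft`, `bedrooms`, and `price` are hypothetical stand-ins, since the actual columns of `housing.csv` may differ.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical housing-style data; the real housing.csv columns may differ.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame(
    {
        "area_sqft": rng.uniform(500, 3500, n),
        "bedrooms": rng.integers(1, 6, n),
    }
)
df["price"] = (
    50_000
    + 120 * df["area_sqft"]
    + 10_000 * df["bedrooms"]
    + rng.normal(0, 20_000, n)
)

# 1. EDA: summary statistics and each feature's correlation with the target.
summary = df.describe()
correlations = df.corr()["price"].drop("price")
print(summary)
print(correlations)

# 2. Preprocessing: hold out a test set; scaling happens inside the pipeline
#    so the scaler is fit on the training data only.
X = df[["area_sqft", "bedrooms"]]
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3. Training: multiple linear regression on standardized features.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

# 4. Evaluation: R^2 and RMSE on the held-out test set.
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print(f"R^2 = {r2:.3f}, RMSE = {rmse:,.0f}")
```

Wrapping the scaler and the model in one pipeline is a design choice that keeps test data out of the preprocessing fit; to run this against a real file, the synthetic `DataFrame` could be replaced with `pd.read_csv(...)` and the column names adjusted to match.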
Developed with a focus on robust software architecture and comprehensive result analysis.