This repository contains the complete end-to-end Data Cleaning and Exploratory Data Analysis (EDA) workflow performed on a real-world Layoff Dataset using MySQL. All SQL scripts in this repository were written and executed by me (Ajay Tiwari).
├── CleaningDataset.sql # Full data cleaning process (duplicates, NULLs, standardization)
├── ExploratoryDataAnlysis.sql # EDA queries to understand trends and insights
├── layoffs.csv # Raw dataset used for the project
├── README.md # Project documentation (this file)
└── LICENSE
The cleaning pipeline is implemented step-by-step in CleaningDataset.sql. Key steps performed:
-
Create a staging copy of the raw table to avoid altering original data:
CREATE TABLE layoff_staging LIKE layoffs; INSERT INTO layoff_staging SELECT * FROM layoffs;
-
Detect & remove duplicates using
ROW_NUMBER()window function and intermediate staging tables (layoff_staging2,layoff_staging3):INSERT INTO layoff_staging2 SELECT *, ROW_NUMBER() OVER ( PARTITION BY company, location, industry, total_laid_off, percentage_laid_off, `date`, stage, country, funds_raised_millions ) AS row_num FROM layoff_staging; DELETE FROM layoff_staging2 WHERE row_num > 1;
-
Standardize text fields:
- Trim whitespace from
company,location, andcountry - Normalize industry values (e.g.,
CryptoCurrency,Crypto Currency→Crypto) - Remove trailing dots from
locationandcountryusingTRIM()
- Trim whitespace from
-
Convert
datecolumn to properDATEtype:UPDATE layoff_staging2 SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y'); ALTER TABLE layoff_staging2 MODIFY COLUMN `date` DATE;
-
Handle missing/NULL values:
- Identify rows where
total_laid_offandpercentage_laid_offare NULL and remove them - Fill missing
industryvalues by joining rows for the same company and copying non-null industry values
- Identify rows where
-
Final cleaned table:
layoff_staging3(1995 records after cleaning in the provided script)
All EDA queries are in ExploratoryDataAnlysis.sql. Major analyses include:
-
Basic stats
SELECT MAX(total_laid_off) FROM layoff_staging3; -- highest single layoff count (12000) SELECT MAX(percentage_laid_off) FROM layoff_staging3; -- highest percentage (100%)
-
Top companies by total layoffs
SELECT company, SUM(total_laid_off) FROM layoff_staging3 GROUP BY company ORDER BY SUM(total_laid_off) DESC;
-
Industry-wise layoffs
SELECT industry, SUM(total_laid_off) FROM layoff_staging3 GROUP BY industry ORDER BY 2 DESC;
Finding: Consumer industry had the highest layoffs (~45,182).
-
Country-wise layoffs
SELECT country, SUM(total_laid_off) FROM layoff_staging3 GROUP BY country ORDER BY 2 DESC;
Finding: United States leads (~256,559), India is the second highest.
-
Year-wise & month-wise trends
SELECT YEAR(`date`) AS year, SUM(total_laid_off) FROM layoff_staging3 GROUP BY YEAR(`date`); SELECT SUBSTRING(`date`, 1, 7) AS month, SUM(total_laid_off) FROM layoff_staging3 GROUP BY month;
Finding: Peak layoffs in 2022 and 2023 (post-pandemic trends).
-
Stage-wise analysis
SELECT stage, SUM(total_laid_off) FROM layoff_staging3 GROUP BY stage;
Finding:
Post-IPOstage had the highest layoffs (~204,132). -
Rolling totals & rankings
- Calculated cumulative monthly layoffs using window functions.
- Used
DENSE_RANK()per year to extract top 5 companies each year.
- Create a MySQL database (example:
layoff_db) and importlayoffs.csvinto a table namedlayoffs. - Open MySQL client or MySQL Workbench and run the SQL script
CleaningDataset.sqlstep-by-step. - After cleaning completes, run
ExploratoryDataAnlysis.sqlto generate analysis outputs. - Export query results or visualize using a BI tool (Tableau, Power BI, or Python/R) as needed.
- The SQL scripts use window functions, CTEs, and standard MySQL functions (
ROW_NUMBER(),STR_TO_DATE(),TRIM(),SUBSTRING(), etc.). - The cleaning process intentionally preserves the raw
layoffstable and operates on staging tables to keep reproducibility and safety. - The README and scripts are authored by Ajay Tiwari. For clarifications, refer to the SQL files or contact the repository owner.
This repository is released under the MIT License. See LICENSE for details.
- Ajay Tiwari
- B.Tech - Computer Science and Engineering (Artificial Intelligence): 2022-26