This repository demonstrates customer segmentation using the RFM (Recency, Frequency, Monetary) model and K-Means clustering. The analysis segments customers of an online retail store by their purchasing behavior, producing insights that can inform customer engagement and marketing strategies.
RFM is a technique used in marketing to evaluate and segment customers based on their purchasing behavior:
- Recency (R): How recently a customer made a purchase.
- Frequency (F): How often a customer makes purchases.
- Monetary (M): How much money a customer spends.
By assigning scores to each customer for R, F, and M metrics, we can create meaningful customer segments and design targeted strategies for each group.
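As a rough illustration of how R, F, and M values and scores could be derived, here is a minimal pandas sketch. The column names (`CustomerID`, `InvoiceNo`, `InvoiceDate`, `Amount`), the helper names, the snapshot date, and the quantile-based scoring are all assumptions made for this example; the notebooks in this repository may do it differently.

```python
import pandas as pd


def compute_rfm(transactions: pd.DataFrame, snapshot_date: pd.Timestamp) -> pd.DataFrame:
    """Return one row per customer with Recency (days), Frequency, and Monetary."""
    return transactions.groupby("CustomerID").agg(
        # Days between the snapshot date and the customer's latest purchase
        Recency=("InvoiceDate", lambda dates: (snapshot_date - dates.max()).days),
        # Number of distinct invoices
        Frequency=("InvoiceNo", "nunique"),
        # Total amount spent
        Monetary=("Amount", "sum"),
    )


def score_rfm(rfm: pd.DataFrame, bins: int = 4) -> pd.DataFrame:
    """Add quantile-based R, F, M scores from 1 (worst) to `bins` (best)."""

    def quantile_score(values: pd.Series, higher_is_better: bool) -> pd.Series:
        labels = list(range(1, bins + 1))
        if not higher_is_better:
            labels = labels[::-1]  # a small recency should get a high score
        # rank(method="first") breaks ties so qcut never sees duplicate bin edges
        ranked = values.rank(method="first")
        return pd.qcut(ranked, q=bins, labels=labels).astype(int)

    scored = rfm.copy()
    scored["R"] = quantile_score(rfm["Recency"], higher_is_better=False)
    scored["F"] = quantile_score(rfm["Frequency"], higher_is_better=True)
    scored["M"] = quantile_score(rfm["Monetary"], higher_is_better=True)
    return scored


# Tiny made-up dataset, for illustration only
transactions = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3, 4],
    "InvoiceNo": ["A1", "A2", "B1", "C1", "C2", "C3", "D1"],
    "InvoiceDate": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2023-11-20",
        "2024-02-10", "2024-02-25", "2024-03-10", "2023-12-15",
    ]),
    "Amount": [50.0, 70.0, 20.0, 200.0, 150.0, 100.0, 35.0],
})

rfm = compute_rfm(transactions, snapshot_date=pd.Timestamp("2024-03-11"))
scored = score_rfm(rfm, bins=2)  # two bins only because the sample is tiny
print(scored)
```

Ranking before binning is a design choice for this sketch: it keeps the bins evenly sized even when many customers share the same frequency, at the cost of splitting tied customers arbitrarily.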
- Data Cleaning:
  - Removed duplicates and irrelevant data.
  - Handled missing values.
  - Filtered valid transactions (excluding cancellations and refunds).
- RFM Feature Engineering:
  - Computed Recency, Frequency, and Monetary values for each customer.
  - Scored customers based on their RFM metrics.
- K-Means Clustering:
  - Scaled data using `StandardScaler` for normalization.
  - Used the Elbow Method to determine the optimal number of clusters.
  - Applied K-Means clustering to segment customers.
- Data Visualization:
  - Visualized RFM metrics and clusters using scatter plots, heatmaps, and bar charts.
- Insights Generation:
  - Identified key customer segments such as "High-Value Customers" and "At-Risk Customers."
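The data cleaning step listed above might look roughly like the sketch below. It assumes hypothetical column names (`CustomerID`, `InvoiceNo`, `InvoiceDate`, `Amount`), assumes that cancellations are marked by an invoice number starting with `C` and refunds by non-positive amounts, and handles missing values by dropping rows. None of these conventions are confirmed by this README, so adjust them to the actual dataset.

```python
import pandas as pd


def clean_transactions(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicates, rows with missing keys, cancellations, and refunds."""
    df = raw.drop_duplicates()
    # Assumption: rows without a customer or a date cannot be used for RFM
    df = df.dropna(subset=["CustomerID", "InvoiceDate"])
    # Assumption: a leading "C" in the invoice number marks a cancellation
    is_cancelled = df["InvoiceNo"].astype(str).str.startswith("C")
    # Assumption: refunds appear as zero or negative amounts
    df = df[~is_cancelled & (df["Amount"] > 0)]
    return df.reset_index(drop=True)


# Made-up rows covering each case the function filters out
raw = pd.DataFrame({
    "CustomerID": [1, 1, None, 2, 3],
    "InvoiceNo": ["A1", "A1", "B1", "C77", "D1"],
    "InvoiceDate": pd.to_datetime([
        "2024-01-05", "2024-01-05", "2024-01-06", "2024-01-07", "2024-01-08",
    ]),
    "Amount": [10.0, 10.0, 5.0, -8.0, 12.0],
})

clean = clean_transactions(raw)
print(clean)
```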
The dataset is from an online retail store, containing transactional data, including:
- Customer ID
- Invoice Date
- Invoice Number
- Transaction Amount
(Note: Actual data files are available in the data/ directory. Ensure the dataset is clean before proceeding.)
- Data Cleaning:
  - Handle missing values and filter relevant data.
- RFM Analysis:
  - Compute Recency, Frequency, and Monetary values.
  - Score customers and create RFM segments.
- Scaling and Clustering:
  - Scale the RFM data using `StandardScaler`.
  - Determine the optimal number of clusters using the Elbow Method.
  - Perform K-Means clustering.
- Visualization and Insights:
  - Visualize clusters and generate actionable insights.
- Follow the Jupyter notebooks in the `notebooks/` directory to reproduce the analysis:
  - `01_data_cleaning.ipynb`
  - `02_rfm_analysis.ipynb`
  - `03_clustering_kmeans.ipynb`
  - `04_visualization.ipynb`
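For readers who want a feel for the scaling and clustering step before opening the notebooks, the following sketch runs `StandardScaler`, an elbow scan, and K-Means with scikit-learn. The RFM table here is randomly generated, and the range of `k` values, `random_state`, and the final `chosen_k = 4` are placeholders chosen for this example rather than values taken from the notebooks.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic RFM table standing in for the real per-customer features
rng = np.random.default_rng(0)
rfm = pd.DataFrame({
    "Recency": rng.integers(1, 365, size=200),
    "Frequency": rng.integers(1, 30, size=200),
    "Monetary": rng.gamma(2.0, 150.0, size=200),
})

# Put all three features on the same scale (zero mean, unit variance)
scaled = StandardScaler().fit_transform(rfm)

# Elbow Method: record the within-cluster sum of squares for each k
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(scaled)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")

# Placeholder choice; in practice pick the k where the curve flattens
chosen_k = 4
final_model = KMeans(n_clusters=chosen_k, n_init=10, random_state=42)
rfm["Cluster"] = final_model.fit_predict(scaled)
```

Plotting `inertias` against `k` (for example with matplotlib) gives the usual elbow curve. Scaling matters here because K-Means relies on Euclidean distance, so an unscaled Monetary column would otherwise dominate the other two features.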
Through the analysis, the following customer segments were identified:
- High-Value Customers: Recently purchased, frequent buyers, and high spenders.
- Loyal Customers: Frequent buyers with moderate spending.
- At-Risk Customers: Customers who haven't purchased recently and have low spending.
- Potential Loyalists: Customers who recently purchased and spent a significant amount but have low frequency.
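One common way to arrive at segment names like these is to compare the average R, F, and M values per cluster. The sketch below shows that idea on a small made-up table; the numbers, the cluster labels, and the column names are illustrative assumptions and are not results from this analysis.

```python
import pandas as pd

# Made-up clustered customers, for illustration only
clustered = pd.DataFrame({
    "Recency": [5, 8, 200, 250, 12, 15],
    "Frequency": [20, 25, 2, 1, 2, 3],
    "Monetary": [3000.0, 3500.0, 80.0, 40.0, 900.0, 1100.0],
    "Cluster": [0, 0, 1, 1, 2, 2],
})

# Average profile and size of each cluster
profile = clustered.groupby("Cluster").agg(
    Recency=("Recency", "mean"),
    Frequency=("Frequency", "mean"),
    Monetary=("Monetary", "mean"),
    Customers=("Recency", "size"),
)
print(profile)
```

In this toy table, cluster 0 (recent, frequent, high spend) would read as high-value, cluster 1 (long inactive, low spend) as at-risk, and cluster 2 (recent, high spend, low frequency) as potential loyalists.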
These insights can be used to:
- Target marketing campaigns.
- Design loyalty programs for high-value customers.
- Re-engage at-risk customers.