Customer Retention Cohort Analysis
Project Overview
This project analyzes customer behavior using cohort analysis, focusing on acquisition patterns, retention trends, and cumulative customer value over time. The goal is to understand how different cohorts perform and derive actionable business insights to improve acquisition quality, retention strategy, and customer lifetime value.
This project answers key business questions such as:
Which period brought the highest number of newly acquired customers?
Are there specific months where customer acquisition significantly declined?
How does customer retention trend over time across cohorts?
Which cohorts contributed the highest customer value?
Are there cohorts that consistently bring lower-value customers?
Does the quality of acquired customers remain consistent across months?
Is there alignment between acquisition volume and customer value/retention?
GitHub Repositories: Link
Dataset: Online Retail II
Reference: wunderdata - cohort analysis
Methodology
Cohort Definition: Customers are grouped based on the month of their first transaction.
Retention Measurement: Retention is calculated as the percentage of customers from each cohort who remain active in subsequent periods.
Activity Definition: A customer is considered active if they record at least one transaction in the given period.
Cumulative Sales Analysis: Average cumulative sales per customer are computed for each cohort to evaluate long-term customer value and spending patterns.
Visualization Techniques: Insights are presented using heatmaps, line charts, and cohort tables covering retention and revenue behavior.
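The steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the code from the notebooks: the function name `build_cohort_tables` and the column names `customer_id`, `invoice_date`, and `amount` are assumptions made for the example, and the Online Retail II columns may be named differently. The toy transactions are made up purely to show the shape of the output.

```python
import pandas as pd


def build_cohort_tables(tx: pd.DataFrame):
    """Return (retention_pct, avg_cum_sales), both indexed by cohort month.

    Assumes hypothetical columns: customer_id, invoice_date (datetime), amount.
    """
    tx = tx.copy()

    # Month of each transaction, stored as the first day of that month.
    tx["order_month"] = tx["invoice_date"].dt.to_period("M").dt.to_timestamp()

    # Cohort = month of the customer's first transaction.
    tx["cohort_month"] = tx.groupby("customer_id")["order_month"].transform("min")

    # Months elapsed since the cohort month (0 = acquisition month).
    tx["period"] = (
        (tx["order_month"].dt.year - tx["cohort_month"].dt.year) * 12
        + (tx["order_month"].dt.month - tx["cohort_month"].dt.month)
    )

    # A customer is active in a period if they have at least one transaction.
    active = (
        tx.groupby(["cohort_month", "period"])["customer_id"]
        .nunique()
        .unstack("period")
    )
    cohort_size = active[0]

    # Retention: share of the cohort that is active in each period, in percent.
    retention_pct = active.div(cohort_size, axis=0) * 100

    # Average cumulative sales per customer: running total of cohort sales
    # divided by the cohort size.
    sales = tx.groupby(["cohort_month", "period"])["amount"].sum().unstack("period")
    avg_cum_sales = sales.fillna(0).cumsum(axis=1).div(cohort_size, axis=0)

    return retention_pct, avg_cum_sales


# Toy data (made up): customers A and B start in Jan 2010, C starts in Feb 2010.
tx = pd.DataFrame(
    {
        "customer_id": ["A", "A", "B", "B", "C", "C"],
        "invoice_date": pd.to_datetime(
            ["2010-01-05", "2010-02-10", "2010-01-20",
             "2010-03-02", "2010-02-15", "2010-03-20"]
        ),
        "amount": [10.0, 20.0, 30.0, 40.0, 50.0, 10.0],
    }
)

retention_pct, avg_cum_sales = build_cohort_tables(tx)
print(retention_pct.round(1))
print(avg_cum_sales.round(2))
```

Both tables have one row per cohort and one column per month since acquisition, so they can be passed directly to a heatmap, for example `seaborn.heatmap(retention_pct, annot=True, fmt=".0f")`. In this sketch, periods a cohort has not yet reached show up as missing values in the retention table, but as a flat carried-forward total in the cumulative sales table. Mask those cells if that is misleading for the newest cohorts.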
Key Insights
- The largest number of customers was acquired in Dec-2009.
- There is a significant drop in customer acquisition from April to September.
- Retention improves noticeably in September–November 2010.
- Retention declines during June–August, which may point to seasonal or engagement issues.
- High-value customers were acquired in Dec-2009 and Sep-2010.
- Customers acquired between April and August generate lower value than other cohorts.
Business Implications
| Insight | Recommended Action |
|---|---|
| December 2009 acquired the highest number of customers | Replicate the acquisition strategies used during this period. |
| Significant drop in customer acquisition Apr–Sep | Investigate marketing activity and strengthen acquisition campaigns. |
| Retention increases in Sep–Nov 2010 | Apply the successful retention tactics more consistently. |
| Retention declines in Jun–Aug | Launch re-engagement initiatives and analyze seasonal/operational drivers. |
| High-value customers acquired in Dec 2009 & Sep 2010 | Scale the channels that brought these valuable segments. |
| Low-value customers acquired Apr–Aug | Improve onboarding, upselling, and cross-selling for these cohorts. |
Tools & Libraries
Python: pandas, numpy, matplotlib, seaborn
Jupyter Notebook / Quarto for analysis and reporting
Project Structure
├── data/
│ └──
├── notebooks/
│ ├── cohort_analysis.ipynb
│ └──
├── reports/
│ └── cohort_analysis.qmd
├── src/
│ ├── data_processing.py
│ └── cohort_functions.py
└── README.md
Full Code & Details
This project consists of two notebooks:
- Data Cleaning & EDA Notebook
  Here I perform some basic EDA and light data cleaning.
- Cohort Analysis Notebook
  Here is where I do the actual cohort analysis of the dataset.