Churn Prediction: 12-Week Observation Window with 1-Week Gap

Machine Learning
Portfolio
Built a customer churn prediction model using RFM-T features extracted from a 12-week observation window followed by a 1-week gap. Achieved a realistic AUC of 0.623 and an F1-score of 75.9% on a time-based test set.
Published

November 26, 2025

Project Overview

This project predicts whether a customer will churn (stop transacting) in the next 6 weeks based on their past behavior. I used the past 12 weeks as the observation window, with a 1-week gap between the observation and prediction windows.

The model is designed to simulate real production conditions: no future information is ever used during training.
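A minimal sketch of the window arithmetic for a single snapshot cutoff, assuming half-open `[start, end)` intervals measured in whole weeks. The constant names, the `Windows` class, and `make_windows` are hypothetical illustrations, not code from the project notebook.

```python
from dataclasses import dataclass

import pandas as pd

OBS_WEEKS = 12   # observation window length
GAP_WEEKS = 1    # excluded gap (the incomplete current week)
PRED_WEEKS = 6   # prediction window length


@dataclass(frozen=True)
class Windows:
    obs_start: pd.Timestamp   # inclusive
    obs_end: pd.Timestamp     # exclusive; equals the cutoff
    pred_start: pd.Timestamp  # inclusive; cutoff + gap
    pred_end: pd.Timestamp    # exclusive


def make_windows(cutoff) -> Windows:
    """Return half-open [start, end) windows around one snapshot cutoff."""
    cutoff = pd.Timestamp(cutoff)
    obs_start = cutoff - pd.Timedelta(weeks=OBS_WEEKS)
    pred_start = cutoff + pd.Timedelta(weeks=GAP_WEEKS)
    pred_end = pred_start + pd.Timedelta(weeks=PRED_WEEKS)
    return Windows(obs_start, cutoff, pred_start, pred_end)


w = make_windows("2011-06-06")
print(w.obs_start.date(), w.obs_end.date(), w.pred_start.date(), w.pred_end.date())
# 2011-03-14 2011-06-06 2011-06-13 2011-07-25
```

Features read only `[obs_start, obs_end)` and labels read only `[pred_start, pred_end)`, so nothing from the gap week or later can leak into the features.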

Dataset: Online Retail II
This project's GitHub: link

Key Design Decisions

| Component | Choice | Reason |
|---|---|---|
| Data generation | Sliding window across the entire timeline | |
| Observation window | 12 weeks | ~2 median purchase cycles |
| Gap | 1 full week (strictly excluded) | The current week is incomplete, so it cannot be used in features |
| Prediction window | Next 6 weeks after the gap | ~1 median purchase cycle |
| Label | 0 transactions in the prediction window → churn = 1 | Business definition |
| Train-test split | Strict time-based (latest period held out as test) | No overlapping periods |
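The sliding-window generation and labeling rules in the table can be sketched as below. This assumes a cleaned transaction table with hypothetical columns `customer_id` and `invoice_date`, weekly cutoffs, and that only customers with at least one transaction in the observation window are scored; the project's actual eligibility rule is not stated here.

```python
import pandas as pd

OBS = pd.Timedelta(weeks=12)
GAP = pd.Timedelta(weeks=1)
PRED = pd.Timedelta(weeks=6)


def build_snapshots(tx: pd.DataFrame, step: str = "7D") -> pd.DataFrame:
    """Slide a weekly cutoff across the timeline and label each active customer.

    tx: one row per transaction line with (assumed) columns
        customer_id and invoice_date (datetime64).
    Returns one row per (cutoff, customer_id); churn = 1 when the customer
    has zero transactions in the 6-week prediction window.
    """
    dates = tx["invoice_date"]
    first_cutoff = dates.min().normalize() + OBS          # full 12 weeks of history
    last_cutoff = dates.max().normalize() - GAP - PRED    # prediction window fits in data
    rows = []
    for cutoff in pd.date_range(first_cutoff, last_cutoff, freq=step):
        in_obs = (dates >= cutoff - OBS) & (dates < cutoff)
        # The gap [cutoff, cutoff + GAP) is in neither mask.
        in_pred = (dates >= cutoff + GAP) & (dates < cutoff + GAP + PRED)
        returned = set(tx.loc[in_pred, "customer_id"])
        # Only customers seen in the observation window are scored (assumption).
        for cid in tx.loc[in_obs, "customer_id"].unique():
            rows.append({"cutoff": cutoff, "customer_id": cid,
                         "churn": int(cid not in returned)})
    return pd.DataFrame(rows, columns=["cutoff", "customer_id", "churn"])
```

Because a customer who only transacts during the gap week is still labeled churn = 1, the gap is excluded from both features and labels, matching the "strictly excluded" choice above.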

Features Used (RFM-based)

  • Recency (weeks since last transaction)
  • Frequency (number of transactions in 12 weeks)
  • Monetary value
  • Average Order Value (AOV)
  • Customer tenure (weeks since first transaction)
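A minimal sketch of the five features for one cutoff, assuming hypothetical columns `customer_id`, `invoice`, `invoice_date`, and `amount` (line revenue, e.g. quantity × price). It also assumes monetary value means total spend in the 12-week window and that tenure is measured from the first transaction anywhere in the history before the cutoff; both are illustrative choices, not confirmed project definitions.

```python
import pandas as pd

WEEK = pd.Timedelta(weeks=1)


def rfmt_features(tx: pd.DataFrame, cutoff, obs_weeks: int = 12) -> pd.DataFrame:
    """RFM-T features per customer for one cutoff.

    tx: assumed columns customer_id, invoice, invoice_date, amount.
    """
    cutoff = pd.Timestamp(cutoff)
    history = tx[tx["invoice_date"] < cutoff]                   # nothing from the gap or later
    obs = history[history["invoice_date"] >= cutoff - obs_weeks * WEEK]
    g = obs.groupby("customer_id")
    feats = pd.DataFrame({
        "recency_weeks": (cutoff - g["invoice_date"].max()) / WEEK,
        "frequency": g["invoice"].nunique(),                    # distinct invoices
        "monetary": g["amount"].sum(),                          # total spend in window
    })
    feats["aov"] = feats["monetary"] / feats["frequency"]
    # Tenure uses the full history before the cutoff, not just the 12 weeks.
    first_seen = history.groupby("customer_id")["invoice_date"].min()
    feats["tenure_weeks"] = (cutoff - first_seen.reindex(feats.index)) / WEEK
    return feats
```

Counting distinct invoices rather than rows keeps a multi-line order from inflating frequency and deflating AOV.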

Models

  • Logistic Regression
  • XGBoost
  • LightGBM
  • CatBoost
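A sketch of the strict time-based split plus a logistic-regression baseline, assuming a snapshot table with a `cutoff` column, the five feature columns (hypothetical names), and a `churn` label. The purge rule (drop training rows whose gap + prediction window would reach past the first test cutoff) is one assumed way to enforce "no overlapping period". `XGBClassifier`, `LGBMClassifier`, and `CatBoostClassifier` expose scikit-learn-style `fit`/`predict_proba`, so they could stand in for the pipeline; they are omitted here to keep the sketch dependency-light.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["recency_weeks", "frequency", "monetary", "aov", "tenure_weeks"]
LABEL_HORIZON = pd.Timedelta(weeks=1 + 6)  # gap + prediction window


def time_split(snap: pd.DataFrame, test_start):
    """Hold out the latest cutoffs; purge training rows whose labels overlap them."""
    test_start = pd.Timestamp(test_start)
    test = snap[snap["cutoff"] >= test_start]
    train = snap[snap["cutoff"] + LABEL_HORIZON <= test_start]
    return train, test


def fit_logreg(train: pd.DataFrame):
    """Scale features, then fit a logistic regression on the churn label."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(train[FEATURES], train["churn"])
    return model
```

Scaling matters for logistic regression because monetary value and frequency live on very different scales; tree-based models would not need it.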

Threshold Selection

To choose the classification threshold, I evaluated several criteria:

  • F1-maximizing threshold: balances precision and recall (most commonly used in retention campaigns)
  • Closest point to (0, 1) on the ROC curve: the threshold geometrically nearest to perfect classification
  • Recall-prioritizing threshold: favors catching more churners (recall) over clean predictions (precision)
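The three criteria can be sketched with scikit-learn's curve functions. The function names are hypothetical, and `min_recall` is an illustrative target, since no specific recall target is given for the project.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, roc_curve


def f1_max_threshold(y_true, y_score):
    """Threshold that maximizes F1 on the precision-recall curve."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    p, r = precision[:-1], recall[:-1]          # last point has no threshold
    denom = p + r
    f1 = np.divide(2 * p * r, denom, out=np.zeros_like(denom), where=denom > 0)
    return thresholds[np.argmax(f1)]


def roc_closest_threshold(y_true, y_score):
    """Threshold whose ROC point is nearest (Euclidean) to (FPR=0, TPR=1)."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    dist = np.hypot(fpr, 1 - tpr)
    i = np.argmin(dist[1:]) + 1                 # skip the artificial (0, 0) start point
    return thresholds[i]


def recall_target_threshold(y_true, y_score, min_recall=0.90):
    """Highest threshold that still reaches the (assumed) recall target."""
    _, recall, thresholds = precision_recall_curve(y_true, y_score)
    ok = recall[:-1] >= min_recall
    return thresholds[ok].max()
```

Picking the highest threshold that meets the recall target keeps precision as high as possible while still catching the required share of churners.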

For most models, the final threshold is the F1-maximizing one. This threshold was then fixed and evaluated on a time-based test set (the most recent period, with no overlap with the training data).
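Once the threshold is fixed, the test-set metrics in the comparison table can be computed along these lines. `evaluate_at_threshold` is a hypothetical helper; AUC uses the raw scores and is threshold-free, while F1, recall, and precision depend on the fixed threshold.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score


def evaluate_at_threshold(y_true, y_score, threshold):
    """Score a fixed threshold on held-out data (score >= threshold -> churn)."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    return {
        "AUC": roc_auc_score(y_true, y_score),
        "F1": f1_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred, zero_division=0),
    }
```

The `>=` comparison matches how scikit-learn's curve functions treat a threshold, so a value picked from those curves means the same thing here.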

Model Comparison (Time-based Test Set)

| Model | AUC | F1 | Recall | Precision |
|---|---|---|---|---|
| Logistic Regression | 0.565 | 0.760 | 0.973 | 0.635 |
| XGBoost | 0.614 | 0.754 | 0.931 | 0.6338 |
| CatBoost | 0.620 | 0.756 | 0.938 | 0.633 |
| LightGBM | 0.623 | 0.759 | 0.940 | 0.636 |

Tech Stack

  • Python
  • Pandas
  • Scikit-learn
  • Matplotlib / Seaborn
  • LightGBM
  • XGBoost
  • CatBoost
  • Quarto (for documentation)

Full Code & Details

Complete notebook below.

Data Cleaning & EDA Notebook

This project reuses the data cleaning and EDA from my cohort analysis project, since both use the same dataset.