Stochastic Gradient Boosting Approach to Daily Attrition Scoring
Based on High-dimensional RFM Features
Dr. Gerald Fahner
Senior Director Analytic Science, FICO
© 2015 Fair Isaac Corporation. Confidential.
This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent.
Agenda
• Ultra-dynamic Attrition Scoring
• Case Study: Credit Card Attrition
• Category Attrition
Ultra-Dynamic (Daily) Attrition Scoring Approach
[Figure: daily timeline (… Tu We Th Fr Sa Su Mo Tu We …) marking the days on which the customer uses the card, with the daily attrition risk score plotted along it]
• Prolonged inactivity signals higher risk and drives up the attrition risk score
→ Re-engage customer when attrition risk exceeds some threshold
Transaction Dynamics Hold Key Information
• Given information at time of scoring, who is more likely to attrite?
─ Which measures are most informative?
• How to combine Recency and Frequency into predicting attrition risk?
[Figure: card-use timelines for two customers, Spence and Attila, over the Observation Period up to the Time of Scoring, annotated with Recency and Frequency; for each customer the question is "Attrite?"]
Recency: Days since last card use
Frequency: Fraction of days card used during the observation period
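The two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `recency_frequency`, the 100-day window, the two made-up usage histories, and the rule for accounts with no card use in the window are all hypothetical and not taken from the deck.

```python
from datetime import date, timedelta

def recency_frequency(use_dates, obs_start, score_date):
    """Return (recency_days, frequency) for one account.

    use_dates  : iterable of datetime.date on which the card was used
    obs_start  : first day of the observation period (inclusive)
    score_date : time of scoring, last day of the observation period (inclusive)
    """
    days_used = {d for d in use_dates if obs_start <= d <= score_date}
    n_days = (score_date - obs_start).days + 1
    # Frequency: fraction of days in the observation period with card use.
    frequency = len(days_used) / n_days
    # Recency: days since last card use.
    # Assumption: with no card use in the window, recency is set to the window length.
    recency = (score_date - max(days_used)).days if days_used else n_days
    return recency, frequency

# Illustrative 100-day window (the case study itself uses a longer observation period).
score_date = date(2015, 3, 31)
obs_start = score_date - timedelta(days=99)

# Two made-up usage histories, both last used 20 days before scoring.
rare_user = [score_date - timedelta(days=k) for k in (20, 40, 60, 80, 99)]
frequent_user = [score_date - timedelta(days=k) for k in range(20, 75)]

print(recency_frequency(rare_user, obs_start, score_date))
print(recency_frequency(frequent_user, obs_start, score_date))
```

Both histories share the same recency but differ widely in frequency, which is exactly the kind of contrast the Spence and Attila timelines pose.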
How Machine Learning Complements Domain Expertise
Domain Expertise
• Good at intuiting key predictors (step 1)
• Doesn't scale to many variables
• Poor at combining multiple predictors
• Poor at quantifying uncertainty
• Need story behind the numbers (step 4)
Machine Learning
• Lacks intuition
• Excels at combining many features into accurate probabilistic predictions (step 2)
• Diagnose and visualize models to gain insight into effects (step 3)
Steps 1 to 4 mark the recommended path
Key Elements of Approach
Featurization of transaction events
• Based on Recencies, Frequencies, Monetary values
• High-dimensional feature space of complex events
Machine learning / classification tools
• Stochastic Gradient Boosting
• Partial dependence visualization
Performance evaluation
• Lift related to portfolio profit gain
• Out-of-sample / Out-of-time evaluation
Stochastic Gradient Boosting[1]
Combines predictions from hundreds or thousands of shallow CARTs
[Figure: Training Data (Predictors, Outcomes) are used to fit CART 1, CART 2, …, CART M; the Prediction Function is a weighted average of the trees and turns the Predictors of a new case into a Score]
The model cannot be explained by direct inspection
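To make the "many shallow trees, combined" idea concrete, here is a minimal pure-Python sketch of stochastic gradient boosting for a binary outcome. It rests on several simplifying assumptions that are not from the deck: trees are depth-1 stumps, the loss is log-loss, each stage takes a plain shrunken gradient step rather than the per-leaf line search of Friedman's algorithm, and the data are synthetic. All function names are hypothetical.

```python
import math
import random

def fit_stump(X, r):
    """Fit a depth-1 regression tree to residuals r by least squares."""
    best, n = None, len(X)
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:                      # no valid split: constant prediction
        m = sum(r) / n
        return (0, float("inf"), m, m)
    return best[1:]

def stump_predict(stump, x):
    j, t, ml, mr = stump
    return ml if x[j] <= t else mr

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_sgb(X, y, n_trees=40, learning_rate=0.1, subsample=0.5, seed=0):
    """Assumes y contains both classes (0 and 1)."""
    rng = random.Random(seed)
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1 - p0))          # initial score: base-rate log-odds
    F = [f0] * len(X)
    stumps = []
    k = max(2, int(subsample * len(X)))
    for _ in range(n_trees):
        idx = rng.sample(range(len(X)), k)           # random subsample: the "stochastic" part
        resid = [y[i] - sigmoid(F[i]) for i in idx]  # negative gradient of log-loss
        stump = fit_stump([X[i] for i in idx], resid)
        stumps.append(stump)
        for i in range(len(X)):
            F[i] += learning_rate * stump_predict(stump, X[i])
    return f0, learning_rate, stumps

def predict_proba(model, x):
    f0, lr, stumps = model
    return sigmoid(f0 + lr * sum(stump_predict(s, x) for s in stumps))

# Synthetic data: (recency in days, frequency); outcome 1 when recency exceeds 30 days.
rng = random.Random(1)
X_demo = [(rng.randint(0, 60), rng.random()) for _ in range(150)]
y_demo = [1 if r > 30 else 0 for r, _ in X_demo]
model = fit_sgb(X_demo, y_demo)
print(predict_proba(model, (55, 0.5)), predict_proba(model, (5, 0.5)))
```

Depth-1 trees can only represent additive effects. Capturing interactions between predictors requires deeper trees, at the cost of a model that is even harder to read directly.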
Agenda
• Ultra-dynamic Attrition Scoring
• Case Study: Credit Card Attrition
─ Machine Learning for Higher Profit
• Category Attrition
Credit Card Case Study
Data and Project Design
• ~5 million accounts, more than 1 billion transactions over 3 years
• Transaction information: Date, Merchant Code, Amount, Authorized Flag
[Figure: project timeline. Model development: 2-year Observation period up to the Time of Scoring, followed by a 6-month Performance period. Out-of-Time validation: its own Observation period and Performance period]
Attrition Performance Definition: Binary indicator of card activity during Performance period
Scoring Exclusions: Inactive
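The performance definition can be sketched as a small labeling function. This is a hedged illustration: the names are hypothetical, the 6-month performance period is approximated as 182 days, and because the deck does not spell out what "Inactive" means for the scoring exclusion, the sketch assumes it means no card use during the observation period.

```python
from datetime import date, timedelta

def attrition_label(use_dates, score_date, performance_days=182):
    """Return 1 (attrited) if there is no card activity in the performance period, else 0."""
    end = score_date + timedelta(days=performance_days)
    active = any(score_date < d <= end for d in use_dates)
    return 0 if active else 1

def is_excluded(use_dates, obs_start, score_date):
    """Assumed exclusion rule: no card use at all during the observation period."""
    return not any(obs_start <= d <= score_date for d in use_dates)

score_date = date(2015, 1, 1)
obs_start = date(2013, 1, 1)
history = [date(2014, 11, 3), date(2014, 12, 20), date(2015, 2, 14)]
print(is_excluded(history, obs_start, score_date), attrition_label(history, score_date))
```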
Statistical Measures of Model Performance
Lift and Precision
[Figure: customers ranked from High Scores to Low Scores; the top α % of scores are targeted with a retention offer and consist of would-be attriters and non-attriters]
Lift at α % operating point:
\[ \lambda = \frac{\text{Fraction of Attriters Among Targeted}}{\text{Base Attrition Rate}} = \frac{\text{Precision}}{\text{Base Attrition Rate}} = \frac{\#\,\text{Attriters Among Targeted} \,/\, \#\,\text{Targeted}}{\#\,\text{Attriters Total} \,/\, \#\,\text{Total}} \]
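The lift and precision at a given operating point can be computed directly from scores and outcomes. The following is a small sketch with a hypothetical function name and made-up data; it assumes higher scores mean higher attrition risk and rounds the targeted group to a whole number of customers.

```python
def lift_at(scores, attrited, alpha):
    """Return (lift, precision) when targeting the top alpha fraction by score.

    scores   : attrition risk scores, higher means riskier
    attrited : 0/1 outcomes aligned with scores
    alpha    : targeting fraction, e.g. 0.05 for the top 5%
    """
    n = len(scores)
    n_target = max(1, round(alpha * n))
    ranked = sorted(zip(scores, attrited), key=lambda pair: pair[0], reverse=True)
    hits = sum(outcome for _, outcome in ranked[:n_target])
    precision = hits / n_target            # fraction of attriters among targeted
    base_rate = sum(attrited) / n          # base attrition rate
    return precision / base_rate, precision

# Made-up example: 10 customers, 2 attriters, target the top 20%.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
attrited = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(lift_at(scores, attrited, 0.2))
```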
Profit from a Retention Campaign
Actual Behavior of Targeted Customer | Profit Contribution per Customer | Fraction of Targeted Customers with this Behavior
Would-be attriter we persuade to stay | CLV Gain - Contact Cost - Incentive Cost | Precision * Persuasion Rate
Unpersuadable attriter | No CLV Gain - Contact Cost | Precision * (1 - Persuasion Rate)
Non-attriter, erroneously targeted | No CLV Gain - Contact Cost - Incentive Cost | 1 - Precision
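Weighting each row's profit contribution by its fraction gives the expected profit per targeted customer. The sketch below is illustrative only: the function names are hypothetical and every number in the example is made up. The closed form it is checked against follows algebraically from the three rows.

```python
def profit_per_targeted(precision, persuasion_rate, clv, contact_cost, incentive_cost):
    """Expected profit per targeted customer, summed over the three behaviors."""
    rows = [
        # (fraction of targeted customers, profit contribution per customer)
        (precision * persuasion_rate, clv - contact_cost - incentive_cost),  # persuaded attriter
        (precision * (1 - persuasion_rate), -contact_cost),                  # unpersuadable attriter
        (1 - precision, -contact_cost - incentive_cost),                     # non-attriter
    ]
    return sum(fraction * profit for fraction, profit in rows)

def profit_closed_form(precision, persuasion_rate, clv, contact_cost, incentive_cost):
    """Same quantity after collecting terms in precision."""
    return (precision * (persuasion_rate * clv + incentive_cost * (1 - persuasion_rate))
            - incentive_cost - contact_cost)

# Made-up inputs for illustration.
args = dict(precision=0.5, persuasion_rate=0.2, clv=1000.0, contact_cost=1.0, incentive_cost=100.0)
print(profit_per_targeted(**args), profit_closed_form(**args))
```

Because the result is linear in precision, and precision equals lift times the base attrition rate, any improvement in lift translates proportionally into profit.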
Profit Gain From Attrition Model Improvement[2]
\[ \text{Gain} = (\lambda_B - \lambda_A)\, N\, \alpha\, \beta_0\, \bigl(\gamma\, \text{CLV} + \delta\,(1 - \gamma)\bigr) \]
is the Portfolio Profit Gain from improving model B over model A, where:
Symbol | Meaning | Value
λ_A | Lift from model A | Will benchmark alternative models
λ_B | Lift from model B | Will benchmark alternative models
α | Targeting Fraction | 5%
β_0 | Base Attrition Rate | 8%
N | Portfolio Size | 5 million
CLV | Customer Lifetime Value | $1,000
δ | Incentive Cost | $100
γ | Persuasion Rate | 20%
The values for α, β_0, N, CLV, δ, and γ are portfolio-specific assumptions
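The gain formula is simple enough to evaluate directly. The sketch below uses a hypothetical function name, takes the portfolio assumptions listed above as defaults, and plugs in the lift values reported for the three benchmarked models elsewhere in the deck.

```python
def profit_gain(lift_a, lift_b, n=5_000_000, alpha=0.05, beta0=0.08,
                gamma=0.20, clv=1000.0, delta=100.0):
    """Portfolio profit gain from moving from model A to model B.

    n      : portfolio size
    alpha  : targeting fraction
    beta0  : base attrition rate
    gamma  : persuasion rate
    clv    : customer lifetime value
    delta  : incentive cost
    """
    return (lift_b - lift_a) * n * alpha * beta0 * (gamma * clv + delta * (1 - gamma))

# Lifts reported in the deck: Model 1 = 6.03, Model 2 = 6.54, Model 3 = 7.52.
print(f"Model 2 over Model 1: ${profit_gain(6.03, 6.54) / 1e6:.2f} MM")
print(f"Model 3 over Model 1: ${profit_gain(6.03, 7.52) / 1e6:.2f} MM")
```

Under these assumptions each additional unit of lift is worth $5.6 MM, so the two calls give roughly $2.86 MM and $8.34 MM, matching the gains quoted for the interaction and featurization experiments.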
Benchmarking Predictive Models of Increasing Complexity
• How much can we gain by making models more complex?
• Are complex models robust over time?
[Figure: the three models ordered by Dimensionality of Feature Space (R: Recency, F: Frequency)]
Model 1: Additive model in R and F of card use
Model 2: Interaction model in R and F of card use
Model 3: Interaction model in R and F of complex events
Complex Event Examples
• Recent restaurant visit and frequent hotels
• More than $1,000 spent on travel last week
• Recent car deal and frequently at the pump
Interaction Detection Experiment
⇒ Should Capture (Recency × Frequency) Interactions
• Predictors: Recency and Frequency of card use
─ Model 1: Additive, nonlinear in R and F
─ Model 2: Captures interaction between R and F
Out-of-sample / Out-of-time validation: λ1 = 6.03, λ2 = 6.54
⇒ Gain = $2.86 MM subject to portfolio assumptions
• Interaction effect in agreement with research by Fader and Hardie[3]
Interaction Visualization Tells Story
Two-dimensional Partial Dependence Function[4]
[Figure: two-dimensional partial dependence surface of the probability to use the card during the next 6 months (= 1 - Pr(Attrition)) over Recency (days since last card use) and Frequency (fraction of days card used), with the points for Spence and Attila marked]
Spence: R=20, F=0.05
Attila: R=20, F=0.55
Attila is at higher risk of attrition because his card use has lapsed for an unusually long time interval
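A two-dimensional partial dependence function averages the model's prediction over the data while two chosen predictors are held at fixed grid values. The sketch below shows that computation in plain Python. The function name is hypothetical, and `toy_predict` is an invented stand-in with a recency-by-frequency interaction, not the fitted model from the case study; the three accounts are made up as well.

```python
import math

def partial_dependence_2d(predict, rows, i, j, grid_i, grid_j):
    """Average prediction over rows with features i and j forced to each grid pair.

    Returns a nested list: surface[a][b] for grid_i[a], grid_j[b].
    """
    surface = []
    for a in grid_i:
        line = []
        for b in grid_j:
            total = 0.0
            for row in rows:
                x = list(row)
                x[i], x[j] = a, b          # override the two predictors of interest
                total += predict(x)        # all other predictors keep their observed values
            line.append(total / len(rows))
        surface.append(line)
    return surface

def toy_predict(x):
    """Hypothetical probability of future card use; NOT the model from the deck."""
    recency, frequency, other = x
    return math.exp(-0.2 * recency * frequency) * (0.9 + 0.1 * other)

# Made-up accounts: (recency, frequency, some other predictor).
rows = [(5, 0.3, 0.0), (40, 0.1, 1.0), (12, 0.6, 0.5)]

# Evaluate at R=20 for the two frequencies shown on the slide.
surface = partial_dependence_2d(toy_predict, rows, 0, 1, [20], [0.05, 0.55])
print(surface)
```

With this toy model the same 20-day lapse yields a much lower probability of future card use at the higher frequency, which mirrors the qualitative story told for Attila versus Spence.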
Featurization Experiment
⇒ Should Capture Complex Events in Your Models
• Define R and F features for complex events
• Model 3: Candidate predictors include:
Card use events
+ Hundreds of merchant category events
+ Monetary events defined by spending bands
+ No-authorization events
Out-of-sample / Out-of-time validation: λ3 = 7.52
Recall: λ1 = 6.03, λ2 = 6.54
⇒ Gain over Model 1 (simple, additive) = $8.34 MM subject to portfolio assumptions
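One way to picture the featurization is to treat each event type as a filter over transactions and compute a recency and a frequency per filter. The sketch below is an assumption-laden illustration: the function name, the dictionary keys, the category label "restaurant", and the $100 spending band are all hypothetical, and the deck does not say how its events were actually encoded.

```python
from datetime import date

def event_features(transactions, events, obs_start, score_date):
    """Compute R_<event> and F_<event> for every event definition.

    transactions : list of dicts with keys 'date', 'mcc', 'amount', 'authorized'
    events       : dict mapping an event name to a predicate over one transaction
    """
    n_days = (score_date - obs_start).days + 1
    feats = {}
    for name, is_event in events.items():
        days = {t["date"] for t in transactions
                if obs_start <= t["date"] <= score_date and is_event(t)}
        # Assumption: an event never observed gets recency equal to the window length.
        feats[f"R_{name}"] = (score_date - max(days)).days if days else n_days
        feats[f"F_{name}"] = len(days) / n_days
    return feats

# Hypothetical event definitions, one per family named on the slide.
events = {
    "card_use": lambda t: True,
    "restaurant": lambda t: t["mcc"] == "restaurant",
    "spend_100_plus": lambda t: t["amount"] >= 100,
    "no_auth": lambda t: not t["authorized"],
}

transactions = [
    {"date": date(2015, 1, 5), "mcc": "restaurant", "amount": 40.0, "authorized": True},
    {"date": date(2015, 1, 8), "mcc": "gas", "amount": 120.0, "authorized": False},
]
feats = event_features(transactions, events, date(2015, 1, 1), date(2015, 1, 10))
print(feats)
```

With hundreds of merchant categories and several spending bands, the same loop produces the high-dimensional feature space that Model 3 draws its candidate predictors from.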
Learning Curves Experiment
⇒ Should Exploit Larger Samples to Develop More Complex Models
[Figure: learning curves of Lift (out-of-sample / out-of-time, axis from 5 to 7) versus # Training Samples (1,000 to 100,000) for Model 2 (card R and F only) and Model 3 (high-dim complex events)]
Agenda
• Ultra-dynamic Attrition Scoring
• Case Study: Credit Card Attrition
• Category Attrition
─ Detecting Subtle Forms of Attrition
Merchant Category (MC) Attrition
• Hundreds of credit card MC's
• Performance definition for a specific MC:
─ Stop buying from this MC while continuing card use for other MC's
• May signal competitive influence or early belt-tightening before total attrition occurs. Quick detection informs rapid intervention
[Figure: a card-level model produces the overall customer status; alongside it, a Grocery model produces Grocery status, a Travel model produces Travel status, and a Gas station model produces Gas station status]
Possible interventions: Offer incentives at service stations, or start customer dialogue
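The MC-level performance definition can be sketched as a three-way label. This is an illustration with a hypothetical function name and made-up category labels. It assumes that a customer must have bought from the category during the observation period to be able to "stop", and that customers with no card use at all in the performance period are left to the card-level model.

```python
def mc_attrition_label(obs_mcs, perf_mcs, mc):
    """Label merchant-category attrition for one customer and one category.

    obs_mcs  : merchant categories used during the observation period
    perf_mcs : merchant categories used during the performance period
    Returns 1 (stopped buying from mc but still uses the card elsewhere),
            0 (still buys from mc), or None (not applicable under the assumptions).
    """
    obs, perf = set(obs_mcs), set(perf_mcs)
    if mc not in obs:
        return None      # assumption: never bought from this MC, so cannot stop
    if mc in perf:
        return 0         # still buying from this MC
    # mc is absent from perf here; any remaining use is in other categories.
    return 1 if perf else None   # no use at all is total attrition, not MC attrition

print(mc_attrition_label(["grocery", "gas"], ["grocery"], "gas"))
print(mc_attrition_label(["grocery", "gas"], ["grocery", "gas"], "gas"))
print(mc_attrition_label(["grocery", "gas"], [], "gas"))
```

One such label per merchant category gives the targets for the separate category models shown next to the card-level model.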
Summary
• Daily attrition scoring quickly detects emergent attrition, signaled by an unusually long time lapse since the last transaction
• With large transaction volumes, more complex models are more profitable
• Machine learning helps with insight, automation, scale
References
[1] Greedy Function Approximation: A Gradient Boosting Machine, by Jerome
Friedman, The Annals of Statistics, 29(5), 2001, 1189-1232.
[2] Defection Detection: Measuring and Understanding the Predictive Accuracy of
Customer Churn Models, by Scott Neslin et al., Journal of Marketing Research,
43(2), 2006, 204-211.
[3] RFM and CLV: Using Iso-Value Curves for Customer Base Analysis, by Peter
Fader, Bruce Hardie, and Ka Lok Lee, Journal of Marketing Research, 42(4), 2005,
415-430.
[4] Predictive learning via rule ensembles, by Jerome Friedman et al., The Annals
of Applied Statistics, 2(3), 2008, 916-954.
Thank You
Dr. Gerald Fahner
+1 512 5323621