How to perform predictive analysis ... web analytics tool data FREE Webinar by June 19

Transcription

A GACP and GTMCP company
How to perform predictive analysis on your
web analytics tool data
June 19th, 2013
6/19/2013
FREE Webinar by
#tatvicwebinar
Before we start...
Q&A
Our speakers
Carolina Araripe
Inbound Marketing Strategist
@Tatvic
http://linkd.in/YazvVn
Amar Gondaliya
Data Model Engineer
@Tatvic
http://linkd.in/16cpDQI
Kushan Shah
Web Analyst
@Tatvic
http://linkd.in/18rfFfV
Talking about Analytics…
Analytics
Descriptive: What has happened?
Predictive: Predicts the outcome or future
Prescriptive: What should happen?
In other words…
Predictive Analytics
“Technology that learns from experience (data) to
predict the future behavior of individuals in order
to drive better decisions.”
Source: Siegel, E. (2013) “Predictive Analytics. The power to predict who will click, buy, lie or die.”
Outline of this webinar
Predictive Analytics
Tool: R
Data: Google Analytics
Model: Logistic Regression
Visualization
Introduction to R
What
• Open source statistical computing language, widely used by organizations to solve business problems.
Applications
• Data Analysis
• Statistical Tests
• Data Visualization
• Predictive Model
• Forecasting
Why
• Easy to integrate
• Data frame
• Pre developed packages
How to get started
• Download and install
• Choose and download a user-friendly GUI (RStudio)
R Packages
Categories of Packages
• Data Extraction
• Data Visualization
• Time Series
• Machine Learning
For this webinar
• RGoogleAnalytics (Data Extraction)
Usage: To extract Google Analytics data into R
Contributors: Michael Pearmain, Nick Mihailovski, Amar Gondaliya and Vignesh Prajapati
• ggplot2 (Data Visualization)
Usage: Build plots and charts
Contributor: Hadley Wickham
Google Analytics data
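Extracting your GA data into R
Participants: User performing data extraction, Google OAuth2 Authorization Server, Google Analytics API
1. Access Token Request (user to Google OAuth2 Authorization Server)
2. Access Token Response (Google OAuth2 Authorization Server to user)
3. Call API for list of profiles (user to Google Analytics API)
4. Call API for query (user to Google Analytics API)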
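The final step of the flow, "Call API for query", comes down to an HTTP GET with a handful of query parameters. The sketch below only builds such a URL and sends nothing. It is written in Python for illustration, whereas the webinar itself relies on the RGoogleAnalytics package in R. It assumes the Core Reporting API v3 endpoint of that period; the function name, profile ID, dates and access token are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Core Reporting API (v3) endpoint, assumed here as it existed around the webinar date.
GA_ENDPOINT = "https://www.googleapis.com/analytics/v3/data/ga"


def build_ga_query_url(profile_id, start_date, end_date, metrics, dimensions, access_token):
    """Build (but do not send) a query URL for one Google Analytics profile."""
    params = {
        "ids": "ga:" + profile_id,           # profile (view) to query
        "start-date": start_date,            # YYYY-MM-DD
        "end-date": end_date,                # YYYY-MM-DD
        "metrics": ",".join(metrics),
        "dimensions": ",".join(dimensions),
        "access_token": access_token,        # obtained from the OAuth2 token response
    }
    return GA_ENDPOINT + "?" + urlencode(params)


# Hypothetical values, for illustration only.
url = build_ga_query_url(
    profile_id="12345678",
    start_date="2013-05-01",
    end_date="2013-05-31",
    metrics=["ga:transactions", "ga:transactionRevenue"],
    dimensions=["ga:date"],
    access_token="HYPOTHETICAL_TOKEN",
)
print(url)
```

In practice the token would come from the Access Token Response step, and the profile ID from the "list of profiles" call.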
Business Problem
Projected Growth of Retail eCommerce in US
US Retail eCommerce Sales 2011-2016 (in billion $)
2011: $194.70
2012: $225.50
2013: $258.90
2014: $296.70
2015: $338.90
2016: $384.90
Source: http://www.emarketer.com/Article/Retail-Ecommerce-Set-Keep-Strong-Pace-Through-2017/1009836
Business Problem
Product return
“Returns are on the rise, up 19% from 2007. For every US$1 spent on merchandise, 9¢ are returned.”
“Average return rate for ecommerce retailers varies from 3-12%.”
Source: Time Magazine, Sept. 04th, 2012
Product Return Impact (per day)
Average Return Rate: 9% | 7%
Average Order Value: $100 | $100
Orders Per Day: 500 | 500
Total Income: $50,000 | $50,000
Loss due to returns: $4,500 | $3,500
Revenue post loss: $45,500 | $46,500
Increase in Revenue/day: ----- | $1,000
Increase in Revenue with recovered returns in long run
Month (x30): $30,000
Year (x365): $365,000
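The figures in the Product Return Impact table follow from simple arithmetic, reproduced here as a small Python sketch. The function name is an illustrative choice; the inputs (500 orders per day, $100 average order value, 9% and 7% return rates) are the ones from the slide.

```python
def daily_revenue_after_returns(orders_per_day, average_order_value, return_rate_percent):
    """Revenue left per day once the value of returned orders is subtracted."""
    total_income = orders_per_day * average_order_value
    loss_due_to_returns = total_income * return_rate_percent // 100
    return total_income - loss_due_to_returns


current = daily_revenue_after_returns(500, 100, 9)   # 9% return rate
improved = daily_revenue_after_returns(500, 100, 7)  # 7% return rate
daily_gain = improved - current

print(current, improved, daily_gain)        # 45500 46500 1000
print(daily_gain * 30, daily_gain * 365)    # 30000 365000
```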
Data Introduction
Transactional Data
Pre Purchase Data: Browsing Behavior up to shopping cart
In Purchase Data: Purchase Behavior from shopping cart to thank you page
Modeling
Loading Input Data
Introducing Model Variables
Model Creation
Model Performance
Applying Model to Test Data
Machine Learning Tech.
Supervised Learning
Generates a function that maps inputs (labeled data) to desired outputs (e.g.: Spam Detection)
Supervised Learning Model
Training Data (Variables + Labels) -> Machine Learning Algorithm -> Predictive Model
Test Data (Variables) -> Predictive Model -> Predicted Outcome labels
Labels are right answers from historical data
e.g.: Spam Detection
Input Data: Contains emails marked Spam/No Spam
Feature engineering
Going beyond algorithms and using domain knowledge to add new variables to the model
• E.g.: Products purchased as gifts are less likely to be returned
• Create a new variable with binary values: 1 – Product purchased as gift, 0 – otherwise
• Products purchased in holiday season are more likely to be returned
• Based on purchase date, create a new variable with binary values: 1 – Product purchased in the months Nov-Dec, 0 – otherwise
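The two engineered variables described above could be derived along these lines. This is a minimal Python sketch rather than the webinar's own R code; the field names (`purchased_as_gift`, `purchase_date`, `is_gift`, `is_holiday_season`) and the sample orders are hypothetical.

```python
from datetime import date


def add_engineered_features(transaction):
    """Return a copy of the transaction with two binary, domain-driven variables."""
    features = dict(transaction)
    # 1 if the product was purchased as a gift, 0 otherwise
    features["is_gift"] = 1 if transaction["purchased_as_gift"] else 0
    # 1 if the purchase date falls in November or December, 0 otherwise
    features["is_holiday_season"] = 1 if transaction["purchase_date"].month in (11, 12) else 0
    return features


# Hypothetical orders, for illustration only.
orders = [
    {"order_id": "A1", "purchased_as_gift": True, "purchase_date": date(2012, 12, 5)},
    {"order_id": "A2", "purchased_as_gift": False, "purchase_date": date(2013, 3, 14)},
]
engineered = [add_engineered_features(order) for order in orders]
for row in engineered:
    print(row["order_id"], row["is_gift"], row["is_holiday_season"])
```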
Predictor/Response Variables
[Chart: Price of House ($), the response variable (0 to 800,000), plotted against Size of House (sq ft), the predictor variable (0 to 5,000)]
Generalized Linear Models
glm(formula, family, data)
Formula
Response ~ Predictor (This argument specifies which variables are the independent (predictor) variables and which variable is the dependent (response) variable.)
Family
Binomial (Since the output variable, product return, is defined as a binary value, 0 or 1, we use the binomial family.)
Data
Train data set – This data set contains the values of all 18 variables (i.e. the values of both the dependent and the independent variables are given). This dataset is also called labeled data.
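To make concrete what a binomial model of this kind estimates, here is a minimal Python sketch of logistic regression fitted by plain gradient descent. It is not R's `glm`, which uses a different fitting procedure, and it is not the webinar's model. The two predictors (gift flag, holiday-season flag), the training rows and all function names are hypothetical and chosen only for illustration.

```python
import math


def predict_probability(weights, x):
    """Probability of the response being 1 (product returned) for predictors x."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, learning_rate=0.5, epochs=5000):
    """Fit intercept and coefficients by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)  # weights[0] is the intercept
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = predict_probability(weights, x) - y
            gradient[0] += error
            for j, value in enumerate(x):
                gradient[j + 1] += error * value
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights


# Hypothetical labeled (training) data: [is_gift, is_holiday_season] -> returned (1) or not (0)
train_rows = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0], [0, 1], [1, 0]]
train_labels = [0, 0, 0, 1, 1, 0, 1, 0]

weights = fit_logistic(train_rows, train_labels)

# Applying the model to an unlabeled (test) row: a non-gift holiday purchase.
test_probability = predict_probability(weights, [0, 1])
print(round(test_probability, 3))
```

The training rows play the role of the labeled data set passed as `data`, and the returned/not-returned column plays the role of the response on the left of `Response ~ Predictor`.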
Summary
[Chart: Number of Transactions by Probability of Product Returns, split into > 60% and ≤ 60%]
Probability of product return > 60%
Probability of product return ≤ 60%
• Call customer before shipping
• Send a discount coupon to encourage the customer to make future purchases
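Splitting transactions at the 60% mark is a one-line comparison once each transaction has a predicted probability. A minimal Python sketch, assuming the probabilities are already available; the transaction IDs, scores and function name are hypothetical.

```python
RETURN_THRESHOLD = 0.60  # the 60% cut-off used on the slide


def segment_transactions(predicted_probabilities, threshold=RETURN_THRESHOLD):
    """Split transaction IDs into those above the threshold and the rest."""
    above = [tid for tid, p in predicted_probabilities.items() if p > threshold]
    at_or_below = [tid for tid, p in predicted_probabilities.items() if p <= threshold]
    return above, at_or_below


# Hypothetical predicted probabilities of product return.
scores = {"T1": 0.82, "T2": 0.35, "T3": 0.60, "T4": 0.61}
high, low = segment_transactions(scores)
print(high)  # ['T1', 'T4']
print(low)   # ['T2', 'T3']
```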
ggplot2
Geometric Shapes
Scales and Coordinate Systems
Plot Annotations
Q&A Round
Thank you!
Carolina Araripe
[email protected]
+91 7600-515-354
+1 276-644-0456