Decision Trees

Introduction to Decision Trees
Decision Tree – Example #1 – 20 Questions
Decision Tree – Example #2 – What to do at Strata?
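[Flowchart: the root question "Go to a tutorial?" branches Yes/No into the follow-up questions "A fun one?" and "Meeting scheduled?", which end in the leaves "Great to see you!", "Sorry! Wrong place", "Get going!" and "Coffee time!"]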
Training Data
RefId  IsBadBuy  PurchDate   Auction  VehYear  VehicleAge  Make     Model               Trim  Color
1      0         12/7/2009   ADESA    2006     3           MAZDA    MAZDA3              i     RED
12     0         12/14/2009  ADESA    2001     8           FORD     F150 PICKUP 2WD V6  XL    WHITE
13     1         12/14/2009  ADESA    2005     4           DODGE    CARAVAN GRAND       SE    RED
14     0         12/14/2009  ADESA    2005     4           NISSAN   ALTIMA              Bas   WHITE
26     1         12/21/2009  ADESA    2004     5           MERCURY  SABLE               LS    WHITE
Decision Tree – Example #3 – Don’t Get Kicked
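[Tree diagram: the root splits on "Car Age?" (<= 4 years / > 4 years). Each branch then splits on "Price?" (thresholds $6200 and $4600), with deeper splits on "Model?" (Mazda3 / not Mazda3) and "Color?" (orange / not orange). Leaves are marked "Buy!" or "Don't!"]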
How do we construct a tree from data?
Is it an animal?
Information Gain
[Figure: two panels, each showing Good Cars and Bad Cars]
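The slide names information gain without defining it. A minimal sketch, assuming the common entropy-based definition (entropy of the parent node minus the size-weighted entropy of the two child nodes). The function names and the toy labels below are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    total = len(parent)
    weighted = (len(left) / total) * entropy(left) \
             + (len(right) / total) * entropy(right)
    return entropy(parent) - weighted

# Toy labels (hypothetical): 0 = good car, 1 = bad car.
parent = [0, 0, 0, 0, 1, 1, 1, 1]

# A split that separates the classes perfectly gains the full 1 bit ...
pure_gain = information_gain(parent, [0, 0, 0, 0], [1, 1, 1, 1])

# ... while a split that leaves both children as mixed as the parent gains nothing.
useless_gain = information_gain(parent, [0, 0, 1, 1], [0, 0, 1, 1])
```

A good question is one whose answer makes each resulting group more homogeneous than the group it came from.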
How do we construct a tree from data?
1. Search for the feature (and threshold) with the highest information gain
– Is the car orange?
– Is the price < $500? $1000? $2000?
– Is the car <= 3 years old? 4 years? 5 years?
2. Choose this feature and split the training set
3. Recurse on each subset until:
– Only one class is left in the data
– We’ve hit a preset max depth
– We’ve hit a preset minimum split size or leaf size
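The three steps above can be sketched as a short recursive function. This is an illustrative sketch, not the presenters' implementation: it assumes numeric features, entropy-based information gain, and splits of the form "feature <= threshold". The names build_tree, best_split, max_depth and min_split, and the (age, price) rows, are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_split(rows, labels):
    """Step 1: try every (feature, threshold) pair, keep the highest gain."""
    base, best = entropy(labels), None  # best = (gain, feature, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            children = len(left) * entropy(left) + len(right) * entropy(right)
            gain = base - children / len(labels)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=3, min_split=2):
    majority = Counter(labels).most_common(1)[0][0]
    # Step 3: stop on a pure node, at max depth, or when too few rows remain.
    if len(set(labels)) == 1 or depth >= max_depth or len(rows) < min_split:
        return majority
    split = best_split(rows, labels)
    if split is None or split[0] <= 0:  # no split improves purity
        return majority
    _, f, t = split
    # Step 2: split the training set on the chosen feature and recurse.
    lo = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in lo], [y for _, y in lo],
                           depth + 1, max_depth, min_split),
        "right": build_tree([r for r, _ in hi], [y for _, y in hi],
                            depth + 1, max_depth, min_split),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(tree, dict):
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree

# Hypothetical (age in years, price in dollars) rows; 1 = bad buy.
rows = [(2, 9000), (3, 7000), (4, 8000), (5, 4000), (8, 3000), (6, 5000)]
labels = [0, 0, 0, 1, 1, 1]
tree = build_tree(rows, labels)
```

Here the learned tree is a single split on age (feature 0) at 4 years, because that split already leaves one class on each side.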
Decision trees are popular, with good reason
• Simple underlying concept
• Training is fast, prediction is even faster
– Train a tree on 50,000 samples ~ 5 seconds
– Predictions on 50,000 new samples ~ 0.1 seconds
• Can do both classification and regression
• Gracefully handles missing values
Introduction to Random Forests™
http://images4.fanpop.com/image/photos/20000000/Green-Forest-Wallpaper-green-20036585-1280-1024.jpg
Random Forests
• Simply a bunch of decision trees
• To train each tree (a typical forest has 100 trees):
– Randomly take 10,000 samples from your training set
– Train a decision tree on this data
– Consider only 30% of the features at each split point
• To make predictions:
– Make a prediction with each tree
– Average predictions across all trees
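The recipe above can be sketched in plain Python. Several details here are assumptions not stated on the slide: rows are sampled with replacement (bootstrap), labels are 0/1, splits minimise Gini impurity, and each leaf stores the fraction of class-1 samples so that averaging gives a score between 0 and 1. All function names and the toy data are hypothetical.

```python
import random

def gini(ys):
    """Gini impurity of a list of 0/1 labels."""
    p = sum(ys) / len(ys)
    return 2 * p * (1 - p)

def train_tree(rows, ys, rng, feature_frac=0.3, depth=0, max_depth=4):
    leaf = sum(ys) / len(ys)  # leaf value: fraction of class-1 samples
    if depth >= max_depth or leaf in (0.0, 1.0):
        return leaf
    n_features = len(rows[0])
    k = max(1, round(feature_frac * n_features))
    best = None  # (weighted impurity, feature index, threshold)
    # Consider only a random subset of the features at this split point.
    for f in rng.sample(range(n_features), k):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            if left and right:
                score = len(left) * gini(left) + len(right) * gini(right)
                if best is None or score < best[0]:
                    best = (score, f, t)
    if best is None:
        return leaf
    _, f, t = best
    lo = [(r, y) for r, y in zip(rows, ys) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, ys) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": train_tree([r for r, _ in lo], [y for _, y in lo],
                           rng, feature_frac, depth + 1, max_depth),
        "right": train_tree([r for r, _ in hi], [y for _, y in hi],
                            rng, feature_frac, depth + 1, max_depth),
    }

def train_forest(rows, ys, n_trees=100, sample_size=None, feature_frac=0.3, seed=0):
    rng = random.Random(seed)
    sample_size = sample_size or len(rows)
    forest = []
    for _ in range(n_trees):
        # Assumption: sample rows with replacement for each tree.
        idx = [rng.randrange(len(rows)) for _ in range(sample_size)]
        forest.append(train_tree([rows[i] for i in idx],
                                 [ys[i] for i in idx], rng, feature_frac))
    return forest

def predict_tree(tree, row):
    while isinstance(tree, dict):
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree

def predict_forest(forest, row):
    """Average the per-tree predictions."""
    return sum(predict_tree(t, row) for t in forest) / len(forest)

# Toy data (hypothetical): four numeric features, 1 = bad buy.
rows = [(1, 2, 1, 2), (2, 1, 2, 1), (1, 1, 2, 2), (2, 2, 1, 1),
        (8, 9, 8, 9), (9, 8, 9, 8), (8, 8, 9, 9), (9, 9, 8, 8)]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
forest = train_forest(rows, ys, n_trees=25)
low_risk = predict_forest(forest, (1, 1, 1, 1))
high_risk = predict_forest(forest, (9, 9, 9, 9))
```

Each tree sees a different sample and a different feature subset, so the trees make different mistakes, and averaging smooths those mistakes out.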
Random Forests
(Verikas et al. 2011)
Feature Extraction
Bag of Words
We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defense, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America. …
Word       Count
against    14
all        41
amendment  35
an         18
and        264
any        79
article    28
as         64
at         23
be         179
been       17
but        33
by         100
case       13
cases      13
choose     11
citizens   18
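A bag-of-words representation like the table above can be produced in a few lines. This is a minimal sketch that assumes lowercasing and a simple letters-and-apostrophes tokenizer; the slide does not say which tokenizer was used, and the names bag_of_words and vocabulary are illustrative.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into words, and count each word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

text = "We the People of the United States, in Order to form a more perfect Union"
counts = bag_of_words(text)

# To feed a model, map the counts onto a fixed, ordered vocabulary.
# Words that do not occur in the text get a count of 0.
vocabulary = ["a", "and", "the", "union"]
vector = [counts[word] for word in vocabulary]
```

Word order is discarded: each document becomes one row of counts, which is the tabular input a decision tree or random forest expects.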
Text Analytics Examples
www.kaggle.com/c/asap-aes
www.kaggle.com/c/asap-sas
Essay Sample
Human Grader 1: 22/30
Human Grader 2: 26/30
Sample Prompt: We all understand the benefits of laughter. For example,
someone once said, “Laughter is the shortest distance between two people.”
Many other people believe that laughter is an important part of any
relationship. Tell a true story in which laughter was one element or part.
Sample Response Excerpt: I met my best friend, Mike, in the third grade.
Best friends are the people who come into your lives and stay with you for the
rest of it, through the easy moments and the hard times. We learn from our
best friends and share experiences with each other. No matter the distance,
their friendship is always there for you, even if you are thousands of miles
apart or just a few houses away. The laughter between two best friends is an
element that bonds them for life, because when two friends truly laugh
together, they will always have a small moment in their lives intertwined for as
long as they live. …
Short Answer Sample
Sample Prompt: Starting with mRNA leaving the nucleus, list and describe
four major steps involved in protein synthesis.
Sample Response:
1) The mRNA will take the genetic code to a ribosome.
2) tRNA comes to the ribosome and matches it`s codon with the respective
codons of the mRNA, dropping one molecule of amino acid as it goes
3) The amino acid collects, forming a long string of amino acid
4) The ribosome takes in the amino acid and creates protein
Human Grader 1: 2/2
Human Grader 2: 2/2
Short Answer Scoring – Machine vs. Human Raters
[Bar chart: Benchmark Performance. Y-axis: correlation with 1st human grader (0 to 1). Bars: Random Grading, Answer Length, Bag of Words + RF, 2nd Human Grader]
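The chart compares each scoring method by its correlation with the first human grader. The slide does not say which correlation measure was used; the sketch below assumes plain Pearson correlation, and every score in it is invented for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical scores on a 0-2 scale for six short answers.
human_1 = [2, 1, 0, 2, 1, 0]
human_2 = [2, 1, 0, 2, 0, 0]
machine = [2, 1, 1, 2, 1, 0]

human_agreement = pearson(human_1, human_2)
machine_agreement = pearson(human_1, machine)
```

Agreement between the two human graders is a natural reference point, because human raters also disagree with each other.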
Before we dive into IPython Notebook …