Decision Trees
Introduction to Decision Trees

Decision Tree – Example #1 – 20 Questions

Decision Tree – Example #2 – What to do at Strata?
[Figure: decision tree with the questions "Go to a tutorial?", "A fun one?", and "Meeting scheduled?", and the outcomes "Sorry! Wrong place", "Great to see you!", "Coffee time!", and "Get going!"]

Training Data

RefId  IsBadBuy  PurchDate   Auction  VehYear  VehicleAge  Make     Model               Trim  Color
1      0         12/7/2009   ADESA    2006     3           MAZDA    MAZDA3              i     RED
12     0         12/14/2009  ADESA    2001     8           FORD     F150 PICKUP 2WD V6  XL    WHITE
13     1         12/14/2009  ADESA    2005     4           DODGE    CARAVAN GRAND       SE    RED
14     0         12/14/2009  ADESA    2005     4           NISSAN   ALTIMA              Bas   WHITE
26     1         12/21/2009  ADESA    2004     5           MERCURY  SABLE               LS    WHITE

Decision Tree – Example #3 – Don't Get Kicked
[Figure: decision tree with leaves marked "Buy!" or "Don't!". Splits shown: Car Age? (<= 4 years / > 4 years); Price? (<= $6200 / > $6200); Price? (<= $4600 / > $4600); Model? (Mazda3 / not Mazda3); Color? (orange / not orange)]

How do we construct a tree from data?
Is it an animal?

Information Gain
[Figure labels: Good Cars, Bad Cars]

How do we construct a tree from data?
1. Search for the feature with the highest information gain
   – Is the car orange?
   – Is the price < $500? $1000? $2000?
   – Is the car <= 3 years old? 4 years? 5 years?
2. Choose this feature and split the training set
3. Recurse until:
   – Only one class is left in the data
   – We've hit a preset max depth
   – We've hit a preset minimum split size or leaf size

Decision trees are popular, with good reason
• Simple underlying concept
• Training is fast, prediction is even faster
   – Train a tree on 50,000 samples ~ 5 seconds
   – Predictions on 50,000 new samples ~ 0.1 seconds
• Can do both classification and regression
• Gracefully handles missing values

Introduction to Random Forests™
Image: http://images4.fanpop.com/image/photos/20000000/Green-Forest-Wallpaper-green-20036585-1280-1024.jpg

Random Forests
• Simply a bunch of decision trees
• To train each tree (a typical forest has 100 trees):
   – Randomly take 10,000 samples from your training set
   – Train a decision tree on this data
   – Consider only 30% of the features at each split point
• To make predictions:
   – Make a prediction for each tree
   – Average predictions across all trees

Random Forests
[Figure: (Verikas et al. 2011)]

Feature Extraction

Bag of Words
We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defense, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America. …

Word       Count
against    14
all        41
amendment  35
an         18
and        264
any        79
article    28
as         64
at         23
be         179
been       17
but        33
by         100
case       13
cases      13
choose     11
citizens   18

Text Analytics Examples
www.kaggle.com/c/asap-aes
www.kaggle.com/c/asap-sas

Essay Sample
Human Grader 1: 22/30
Human Grader 2: 26/30

Sample Prompt: We all understand the benefits of laughter. For example, someone once said, "Laughter is the shortest distance between two people." Many other people believe that laughter is an important part of any relationship. Tell a true story in which laughter was one element or part.

Sample Response Excerpt: I met my best friend, Mike, in the third grade. Best friends are the people who come into your lives and stay with you for the rest of it, through the easy moments and the hard times. We learn from our best friends and share experiences with each other. No matter the distance, their friendship is always there for you, even if you are thousands of miles apart or just a few houses away. The laughter between two best friends is an element that bonds them for life, because when two friends truly laugh together, they will always have a small moment in their lives intertwined for as long as they live. …

Short Answer Sample
Human Grader 1: 2/2
Human Grader 2: 2/2

Sample Prompt: Starting with mRNA leaving the nucleus, list and describe four major steps involved in protein synthesis.

Sample Response:
1) The mRNA will take the genetic code to a ribosome.
2) tRNA comes to the ribosome and matches it's codon with the respective codons of the mRNA, dropping one molecule of amino acid as it goes
3) The amino acid collects, forming a long string of amino acid
4) The ribosome takes in the amino acid and creates protein

Short Answer Scoring – Machine vs. Human Raters
[Figure: bar chart of benchmark performance, measured as correlation with the 1st human grader (axis 0 to 1), for Random Grading, Answer Length, Bag of Words + RF, and 2nd Human Grader]

Before we dive into IPython Notebook …
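
Sketch: scoring candidate splits by information gain

The tree-construction slides say to search for the feature with the highest information gain but do not say how gain is measured. A minimal sketch follows, assuming Shannon entropy as the impurity measure (Gini impurity is another common choice). The function names `entropy` and `information_gain` are hypothetical, and the five rows are the ones shown on the Training Data slide.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(labels, mask):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    left = [y for y, m in zip(labels, mask) if m]
    right = [y for y, m in zip(labels, mask) if not m]
    if not left or not right:
        return 0.0  # a split that sends every row to one side tells us nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# The five rows from the Training Data slide
is_bad_buy = [0, 0, 1, 0, 1]
vehicle_age = [3, 8, 4, 4, 5]
color = ["RED", "WHITE", "RED", "WHITE", "WHITE"]

# Candidate yes/no questions, in the spirit of "Is the car <= 3 years old? 4 years? 5 years?"
gains = {
    f"VehicleAge <= {t}": information_gain(is_bad_buy, [a <= t for a in vehicle_age])
    for t in (3, 4, 5)
}
gains["Color == RED"] = information_gain(is_bad_buy, [c == "RED" for c in color])

for question, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{question}: {gain:.3f} bits")
```

With only five rows the numbers are illustrative at best; the point is that every candidate question is scored the same way and the highest-scoring one becomes the split.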
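
Sketch: the random forest recipe with a simplified base learner

The Random Forests slide describes three ingredients: train each tree on a random sample of rows, consider only a fraction of the features at each split, and average the predictions. The sketch below illustrates that recipe under loud assumptions: each "tree" is a depth-1 stump rather than a full decision tree, rows are sampled with replacement (the slide does not say which), and all function names and the toy data are hypothetical. It is not how any particular library implements the method.

```python
import random
from statistics import mean

def fit_stump(rows, targets, feature_ids):
    """Fit a depth-1 tree: choose the (feature, threshold) with the lowest squared error."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            err = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or err < best[0]:
                best = (err, f, threshold, lm, rm)
    if best is None:  # no usable split: predict the sample mean for every row
        m = mean(targets)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    f, threshold, left_value, right_value = stump
    return left_value if row[f] <= threshold else right_value

def fit_forest(rows, targets, n_trees=100, sample_size=None, feature_fraction=0.3, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, round(feature_fraction * n_features))  # features considered per split
    sample_size = sample_size or len(rows)
    forest = []
    for _ in range(n_trees):
        # Assumption: sample rows with replacement (bootstrap)
        idx = [rng.randrange(len(rows)) for _ in range(sample_size)]
        feats = rng.sample(range(n_features), k)
        forest.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Average the per-tree predictions, as on the slide."""
    return mean(predict_stump(s, row) for s in forest)

# Toy data (made up): only feature 0, the car's age, determines the label
rows = [(age, (age * 7) % 5, (age * 3) % 4) for age in range(1, 11)] * 2
targets = [1 if r[0] > 4 else 0 for r in rows]

forest = fit_forest(rows, targets)
p_new = predict_forest(forest, (2, 1, 1))
p_old = predict_forest(forest, (8, 1, 1))
print(f"new car: {p_new:.2f}, old car: {p_old:.2f}")
```

Because a stump has a single split, "30% of the features at each split point" collapses here to one feature subset per stump; with full trees the subset would be redrawn at every node.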
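
Sketch: bag-of-words feature extraction

The Bag of Words slide turns a text into word counts. A minimal sketch of that idea follows, assuming lowercasing and splitting on anything that is not a letter; the slides do not specify a tokenizer, and the name `bag_of_words` is hypothetical. The input is the preamble quoted on the slide.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, keep runs of letters as words, and count each word."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

preamble = (
    "We the People of the United States, in Order to form a more perfect Union, "
    "establish Justice, insure domestic Tranquility, provide for the common defense, "
    "promote the general Welfare, and secure the Blessings of Liberty to ourselves "
    "and our Posterity, do ordain and establish this Constitution for the "
    "United States of America."
)

counts = bag_of_words(preamble)

# A fixed, sorted vocabulary turns the counts into a numeric feature vector
vocabulary = sorted(counts)
vector = [counts[word] for word in vocabulary]

print(counts.most_common(5))
```

Word order is discarded entirely, which is why the representation is called a "bag". The resulting vector is the kind of input a random forest can be trained on for the essay and short-answer scoring tasks.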