Module 1 – Proposed Rearrangements and Additions
New Order | Topic | Comment | Mathematics Required | Software? | Stage | Corrections, Changes to Existing Material
BN1.1 | Now Showing: Basic Numeracy | Five questions | None | No | Encounter | NA
BN1.2 | Background Bugaboos | Currently 1.1 | Percentages, addition, division | No | Encounter | None
BN1.3 | Times Table Troubles | Currently 1.2 | Multiplication | No | Encounter | Question 1 refers to Table 1.1 when it should refer to Table 1.2
BN1.4 | Now Showing: Computations, Benchmarks | Five questions | Percentages | No | Engage | NA (suggestion to update benchmarks on video)
BN1.5 | Perceptions, Pictures, Pcts | Currently 1.3 | Percentages | No | Engage | None
BN1.6 | Computation and Common Sense | Currently 1.4 | Division | No | Engage | None
BN1.7 | Really Random Reasoning | Currently 1.12 | Counting | No | Reflect | None
BN1.8 | Hardwired to Slippery Thinking | Currently 1.13 | None | No | Reflect | None
BN1.9 | Why Numeracy Matters | New | Division | No | Reflect | NA
BN1.10 | Mean versus Median | New | Addition, division | Yes | Extend | NA
BN1.11 | Variation Matters | New | Addition, division, counting | Optional | Extend | NA
BN1.12 | Computing the Standard Deviation | New | Addition, division, square root | Yes | Extend | NA
BN1.13 | Now Showing: Expers - Introduction | Five questions | None | No | Encounter | NA
BN1.14 | Slippery Evidence and Confounding | Currently 1.6 | None | No | Encounter | None
BN1.15 | Confounding Confusion | New | None | No | Encounter | NA
BN1.16 | Now Showing: Compare and Rand | Five questions | Percentages, counting | No | Engage | NA
BN1.17 | Experimentation Takes Flight | Currently 1.8 | Addition, division, counting | No | Engage | None
BN1.18 | Catching on to Experimentation | New | None | No | Engage | NA
BN1.19 | Now Showing: Stat Sig | Five questions | None | No | Reflect | NA
BN1.20 | Questionable Evidence | New | None | No | Reflect | NA
BN1.21 | Random Reflections | New | None | No | Reflect | NA
BN1.22 | Assessing Statistical Significance | New | Addition, division, square root | Optional | Extend | NA
BN1.23 | Designer Thoughts | New | Addition, division | Yes | Extend | NA
BN1.24 | What to Believe? | New | None | No | Extend | NA
BN1.25 | Now Showing: Scatterplots | Five questions | None | No | Encounter | NA
BN1.26 | Scatterplot – Part I | New | None | No | Encounter | NA
BN1.27 | Scatterplot – Part II | New | None | Yes | Encounter | NA
BN1.28 | Now Showing: Corr Coef | Five questions | Addition, division, square root | No | Engage | NA
BN1.29 | Corr – Part I | New | Addition, division, square root | No | Engage | NA
BN1.30 | Corr – Part II | New | Addition, division, square root | Yes | Engage | NA
BN1.31 | Now Showing: Causation | Five questions | None | No | Reflect | NA
BN1.32 | Association and Causation | Currently 1.9 | None | No | Reflect | Replace the graph in Exhibit 2 with one I generated for BN 1.27
BN1.33 | Association and Causation Revisited | Currently 1.10 | None | No | Reflect | None
BN1.34 | Simpson | Currently 1.14 | Fractions, percentages | No | Extend | None
BN1.35 | Simpson Revisited | Currently 1.15 | Fractions, percentages | No | Extend | None
BN1.36 | Correlation and Outliers | New | Addition, division, square root | Yes | Extend | NA
BC1.1 | A Very Lucky Project | Currently BC 1.1 | None | No | Beyond the Class | None
BC1.2 | I Got Your Simpson Right Here | Currently BC 1.2 | Fractions | Yes | Beyond the Class | Replace first bullet link (no longer exists) with http://blog.revolutionanalytics.com/2013/10/an-interactive-tool-toexplain-simpsons-paradox.html
BC1.3 | Watch My Slippery Evidence | Currently BC 1.3 | None | No | Beyond the Class | None

REMOVE: Now Showing: Number Sense (currently BN1.5). Old single-page video questions; being broken into many different pages now.

BEYOND THE NUMBERS 1.1
LEARNING OUTCOME: ____
Name: ____
To be graded, all assignments must be completed and submitted on the original book page.
Section Number: ____

BEYOND THE NUMBERS 1.4
LEARNING OUTCOME: ____

Exhibit 1 Nursing Knowledge Needed
Questions
2. The nurse had injected the patient four times with a full 0.9 milliliter syringe. What was the nurse's mistake? Defend your answer.

Exhibit 2 Statistical Citizenship
Questions
1. Briefly list three "features of the Constitution that suggest a numerical approach to governance."
2. Initially the government was reluctant to collect more than the most basic census information of race, sex, and age. Why? During which of the three time periods addressed by Cohen did this attitude change?
3. In the colonies, if you had 48 pounds of soap, how many firkins of soap did you have?
4. Cohen writes that "the post-Civil War era finally brought a full melding of statistical data with the functioning of representative government." List three facts supporting this claim.

BEYOND THE NUMBERS 1.10
LEARNING OUTCOME: ____
Exhibit 1 Winging It
Exhibit 2 A BIG Visit
Exhibit 3 Gates Proof Inference

BEYOND THE NUMBERS 1.11 Variation Matters
LEARNING OUTCOME: ____

Exhibit 1 Uncommon Reach
The wingspans of 18 persons are recorded in the table below: nine ordinary folks and nine current or former NBA players.

NBA Persons | Wingspan (in) | Non-NBA Persons | Wingspan (in)
Ike Diogu | 88 | P1 | 70
Anthony Davis | 88 | P2 | 73
Shelden Williams | 88 | P3 | 72
Elton Brand | 90 | P4 | 69
Shawn Bradley | 90 | P5 | 69
Bismack Biyombo | 90 | P6 | 68
Saer Sene | 93 | P7 | 68
Gheorghe Muresan | 94 | P8 | 64
Manute Bol | 102 | P9 | 73

Questions
1. Use a software package (e.g., Excel or Numbers) to compute the mean and the median of the entire 18-person data set ("Data Set I").
Mean ________ Median ________
2. How do the mean and median compare?

Exhibit 2 Middle Muddle
Questions
1. Construct a histogram of all 18 wingspans in the Exhibit 1 table, using the intervals in the Interval (Bin) table, and plot it on the axes provided. Your instructor may require you to use a software package for this exercise, so follow her lead.

Interval (Bin) | Frequency
Wingspan ≤ 60 |
60 < Wingspan ≤ 65 |
65 < Wingspan ≤ 70 |
70 < Wingspan ≤ 75 |
75 < Wingspan ≤ 80 |
80 < Wingspan ≤ 85 |
85 < Wingspan ≤ 90 |
90 < Wingspan ≤ 95 |
95 < Wingspan ≤ 100 |
100 < Wingspan ≤ 105 |

The label "60" on the plot denotes the first bin (Wingspan ≤ 60). The label "75" denotes the fourth bin (70 < Wingspan ≤ 75), and so on.
[Blank axes: "Wingspan Data", Frequency (0 to 7) vs. Wingspan (inches), bins labeled 60 through 105 and "More"]

2. Locate the mean on the plot by drawing a vertical line segment there. How useful is the mean for describing these 18 wingspans? Explain.
3. Let's add a second data set. Suppose you have a data set of 18 persons: 8 with a wingspan of 80.5 inches, 5 with a wingspan of 75.5 inches, and 5 with a wingspan of 85.5 inches. We'll call this "Data Set II." Find the mean and median of these 18 data points.
4. Construct a histogram of all 18 wingspans from Question 3, using the intervals in the Interval (Bin) table, and plot it on the axes provided. Your instructor may require you to use a software package for this exercise, so follow her lead.
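A minimal Python sketch for checking BN1.11 Exhibit 1 and Exhibit 2 (Questions 1 and 3), assuming the wingspans are transcribed correctly from the Uncommon Reach table. The helper `bin_counts` is a hypothetical name: it counts values in right-closed bins (lower < value ≤ upper) and keys each bin by its upper edge, the way the worksheet plot labels them, so the key 75 stands for 70 < Wingspan ≤ 75.

```python
from statistics import mean, median

# Data Set I: wingspans in inches, transcribed from the Uncommon Reach table
nba = [88, 88, 88, 90, 90, 90, 93, 94, 102]      # Ike Diogu through Manute Bol
non_nba = [70, 73, 72, 69, 69, 68, 68, 64, 73]   # P1 through P9
data_set_1 = nba + non_nba

# Data Set II (Question 3): 8 at 80.5 in, 5 at 75.5 in, 5 at 85.5 in
data_set_2 = [80.5] * 8 + [75.5] * 5 + [85.5] * 5

# Upper bin edges from the interval table; the first bin is "Wingspan <= 60"
BIN_EDGES = [60, 65, 70, 75, 80, 85, 90, 95, 100, 105]


def bin_counts(values, edges=BIN_EDGES):
    """Count values in right-closed bins (lower, upper], keyed by upper edge."""
    counts = {}
    lower = float("-inf")  # the first bin has no lower limit
    for upper in edges:
        counts[upper] = sum(1 for v in values if lower < v <= upper)
        lower = upper
    return counts


for label, data in [("Data Set I", data_set_1), ("Data Set II", data_set_2)]:
    print(label, "mean:", mean(data), "median:", median(data))
    print("  frequencies:", bin_counts(data))
```

Both data sets come out with a mean and a median of 80.5, yet their frequency tables look nothing alike. Values above 105 would belong in the "More" bin; none occur here.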
Bins and axes for Question 4:

Interval (Bin) | Frequency
Wingspan ≤ 60 |
60 < Wingspan ≤ 65 |
65 < Wingspan ≤ 70 |
70 < Wingspan ≤ 75 |
75 < Wingspan ≤ 80 |
80 < Wingspan ≤ 85 |
85 < Wingspan ≤ 90 |
90 < Wingspan ≤ 95 |
95 < Wingspan ≤ 100 |
100 < Wingspan ≤ 105 |

The label "60" on the plot denotes the first bin (Wingspan ≤ 60). The label "75" denotes the fourth bin (70 < Wingspan ≤ 75), and so on.
[Blank axes: "Wingspan Data", Frequency (0 to 7) vs. Wingspan (inches), bins labeled 60 through 105 and "More"]

Exhibit 3 The Spice of Life
Questions
1. Compare Data Set I (Exhibit 1) and Data Set II (Exhibit 2). How close are their average values?
2. Compare the histograms of Data Set I and Data Set II (Exhibit 2, Questions 1 and 4). Specify at least two ways the histograms are notably different.
3. Let's play a game. It costs you $1000 to play. Here are the rules. You pick two wingspans at random, eyes closed, from either Data Set I or Data Set II (your choice). Call your choices x1 and x2. You receive a reward of $(80.5 - x1)² + $(80.5 - x2)². Suppose you decided to pick from Data Set I and chose P2 and Saer Sene. How much money did you make?
4. Think back to the game in Question 3. If you truly get to pick which data set to draw your two wingspans from, which data set is always the safest choice (in terms of anticipated profit), and why?

FYI – Some plots and facts. The mean of Data Set I is 80.5, and the mean of Data Set II is 80.5. The histograms are below.
[Figures: "Wingspan Data" histograms for Data Set I (Frequency 0 to 7) and Data Set II (Frequency 0 to 9), each vs. Wingspan (inches), bins labeled 60 through 105 and "More"]

BEYOND THE NUMBERS 1.12
LEARNING OUTCOME: ____
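The BN1.11 Exhibit 3 game and the BN1.12 variance both rest on squared deviations from 80.5, the mean of Data Set I and Data Set II. The minimal Python sketch below reproduces the Question 3 payoff and the variance and standard deviation values listed under Uncommon Reach Revisited, using the n - 1 divisor that those values imply. The function names are hypothetical, and `expected_reward` assumes each of the two picks is equally likely to land on any of the 18 persons.

```python
import math

# Data Set I (Uncommon Reach) and Data Set II (Middle Muddle), wingspans in inches
data_set_1 = [88, 88, 88, 90, 90, 90, 93, 94, 102,
              70, 73, 72, 69, 69, 68, 68, 64, 73]
data_set_2 = [80.5] * 8 + [75.5] * 5 + [85.5] * 5
CENTER = 80.5  # the mean of both data sets
COST = 1000    # price of playing the Exhibit 3 game, in dollars


def reward(x1, x2, center=CENTER):
    """Exhibit 3 payoff in dollars: (80.5 - x1)^2 + (80.5 - x2)^2."""
    return (center - x1) ** 2 + (center - x2) ** 2


def sum_sq_dev(values):
    """Sum of squared deviations from the mean (the numerator of s^2)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def sample_variance(values):
    """Sample variance with the n - 1 divisor."""
    return sum_sq_dev(values) / (len(values) - 1)


def expected_reward(values):
    """Average payoff of two uniformly random picks.

    Each pick averages sum_sq_dev / n, which is valid here because
    the data's own mean equals the game's center of 80.5.
    """
    return 2 * sum_sq_dev(values) / len(values)


# Question 3: P2 (73 in) and Saer Sene (93 in)
print("Net for P2 and Saer Sene:", reward(73, 93) - COST)

for label, data in [("Data Set I", data_set_1), ("Data Set II", data_set_2)]:
    var = sample_variance(data)
    print(label, "variance:", var, "SD:", math.sqrt(var),
          "expected net:", expected_reward(data) - COST)

print("SD ratio:", math.sqrt(sample_variance(data_set_1) / sample_variance(data_set_2)))
```

Question 3 works out to $56.25 + $156.25 = $212.50 in rewards, a net of -$787.50. Under the equal-likelihood assumption, the expected reward is about $264.94 for Data Set I and about $27.78 for Data Set II; both fall well short of the $1000 entry fee.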
Background
For a data set $x_1, \dots, x_n$, the mean, variance, and standard deviation are

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Uncommon Reach Revisited
Data Set I: Variance 140.2647059 | Standard Deviation 11.84334015
Data Set II: Variance 14.70588 | Standard Deviation 3.834825
Ratio of the standard deviations (Data Set I ÷ Data Set II): 3.088365219

BEYOND THE NUMBERS 1.13
LEARNING OUTCOME: ____
Exhibit 1 Brains and Beats
Exhibit 2 Fuzzy Quasi is a Bear

BEYOND THE NUMBERS 1.16
LEARNING OUTCOME: ____

A Measured Response
Distance (cm) | Time (sec) | Distance (cm) | Time (sec)
1 | 0.045 | 16 | 0.181
2 | 0.064 | 17 | 0.186
3 | 0.078 | 18 | 0.192
4 | 0.090 | 19 | 0.197
5 | 0.101 | 20 | 0.202
6 | 0.111 | 21 | 0.207
7 | 0.119 | 22 | 0.212
8 | 0.128 | 23 | 0.217
9 | 0.135 | 24 | 0.221
10 | 0.143 | 25 | 0.226
11 | 0.150 | 26 | 0.230
12 | 0.156 | 27 | 0.235
13 | 0.163 | 28 | 0.239
14 | 0.169 | 29 | 0.243
15 | 0.175 | 30 | 0.247

Group R | Time (sec)
1 |
2 |
3 |
4 |
5 |
6 |

Group L | Time (sec)
1 |
2 |
3 |
4 |
5 |
6 |

4. Find the mean of both groups. Based on those two values, is there evidence of a difference between the reaction times of Group L and Group R? Defend your answer.
5. What role would the variance of the measurements in each group play in making the decision in Question 4 more precise? Explain.

BEYOND THE NUMBERS 1.19
LEARNING OUTCOME: ____
Exhibit 1 Cancer Carafe
Exhibit 2 Of Mice and People

Exhibit 1 What's Random?
Exhibit 2 Random Opposition
B A B A B A A A B A B A A A B B A B A B B A B B

Badge of Big
Table B
Group R | Time (sec)
1 | 0.090
2 | 0.119
3 | 0.143
4 | 0.169
5 | 0.064
6 | 0.150

Group L | Time (sec)
1 | 0.111
2 | 0.181
3 | 0.090
4 | 0.186
5 | 0.045
6 | 0.143

5. Compute $Z = \dfrac{\bar{x}_L - \bar{x}_R}{\sqrt{s_L^2/6 + s_R^2/6}}$.
6. It turns out that one can say the difference between the left-hand and right-hand reaction times fails to be statistically significant if $-2.23 < Z < 2.23$ (that is, if $|Z| < 2.23$), where Z is the value you computed in Question 5. Do the results shown in Table B support a statistically significant difference in reaction times? Why or why not?

Instructor's Note: The notation Z was used instead of "t" to avoid confusion later in the workbook. Also, the degrees of freedom were computed using the simple estimate of 6 + 6 - 2 = 10.

BEYOND THE NUMBERS 1.23
LEARNING OUTCOME: ____

Pairing Profits
Reaction Times (sec)
Subject | Left | Right
1 | 1.00 | 1.05
2 | 0.74 | 0.76
3 | 0.66 | 0.71
4 | 0.78 | 0.79
5 | 0.68 | 0.69
6 | 0.65 | 0.72
7 | 0.75 | 0.75
8 | 0.69 | 0.72
9 | 0.94 | 0.99
10 | 0.79 | 0.80
11 | 0.81 | 0.82
12 | 0.62 | 0.67

(24 - 1) × Variance of All 24 Reaction Times = Variance Attributed to Hand (L/R) + Variance Left Unexplained
(24 - 1) × Variance of All 24 Reaction Times = Variance Attributed to Hand (L/R) + Variance Explained by Pairing + Variance Left Unexplained

Output with Subject and Hand in the model (dependent variable: Time)
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F
Model | 12 | 0.30148333 | 0.02512361 | 95.30 | <.0001
Error | 11 | 0.00290000 | 0.00026364 | |
Corrected Total | 23 | 0.30438333 | | |
R-Square | Coeff Var | Root MSE | Time Mean
0.990473 | 2.097337 | 0.016237 | 0.774167
Source | DF | Type I SS | Mean Square | F Value | Pr > F
Subject | 11 | 0.29608333 | 0.02691667 | 102.10 | <.0001
Hand | 1 | 0.00540000 | 0.00540000 | 20.48 | 0.0009
Source | DF | Type III SS | Mean Square | F Value | Pr > F
Subject | 11 | 0.29608333 | 0.02691667 | 102.10 | <.0001
Hand | 1 | 0.00540000 | 0.00540000 | 20.48 | 0.0009

Output with Hand only in the model (dependent variable: Time)
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F
Model | 1 | 0.00540000 | 0.00540000 | 0.40 | 0.5350
Error | 22 | 0.29898333 | 0.01359015 | |
Corrected Total | 23 | 0.30438333 | | |
R-Square | Coeff Var | Root MSE | Time Mean
0.017741 | 15.05836 | 0.116577 | 0.774167
Source | DF | Type I SS | Mean Square | F Value | Pr > F
Hand | 1 | 0.00540000 | 0.00540000 | 0.40 | 0.5350
Source | DF | Type III SS | Mean Square | F Value | Pr > F
Hand | 1 | 0.00540000 | 0.00540000 | 0.40 | 0.5350

BEYOND THE NUMBERS 1.24
LEARNING OUTCOME: ____
Exhibit 1 Piltdown Meltdown - 1912
Exhibit 2 Marker Mice - 1974
Exhibit 3 Doing the Dishes - 2010

1.
2.
3.
4.
5.

BEYOND THE NUMBERS 1.26
LEARNING OUTCOME: ____
Scatterplots – Part I

Exhibit 1 Anscombe's Activity
Obs | x1 | y1 | x4 | y4
1 | 10 | 8.04 | 8 | 6.58
2 | 8 | 6.95 | 8 | 5.76
3 | 13 | 7.58 | 8 | 7.71
4 | 9 | 8.81 | 8 | 8.84
5 | 11 | 8.33 | 8 | 8.47
6 | 14 | 9.96 | 8 | 7.04
7 | 6 | 7.24 | 8 | 5.25
8 | 4 | 4.26 | 19 | 12.5
9 | 12 | 10.84 | 8 | 5.56
10 | 7 | 4.82 | 8 | 7.91
11 | 5 | 5.68 | 8 | 6.89

These data were created by F.J. Anscombe* in 1973 to remind us of the importance of plotting our data. You will see these data again later in this workbook.

Questions
1. Create a scatterplot of y1 vs. x1. Does the plot show a positive association or a negative association? How do you know?
2. Create a scatterplot of y4 vs. x4. Does the plot show a positive association or a negative association? How do you know?
Make sure you turn in your plots with this assignment.
[Blank axes: Y1 vs. X1 and Y4 vs. X4, horizontal axes 0 to 20, vertical axes 0 to 15]

Edward R. Tufte, The Visual Display of Quantitative Information (Cheshire, Connecticut: Graphics Press, 1983), pp. 14-15.
*F.J. Anscombe, "Graphs in Statistical Analysis," American Statistician, vol. 27 (Feb 1973), pp. 17-21.

Exhibit 2 Vaccines and Risk
There is an ongoing debate over possible links between vaccines containing thimerosal and the onset of autism.
The data set below records the percentages of California children who had received 4 doses of DTP by their 2nd birthday and the number of autism cases in California's Department of Developmental Services' regional service center system*.
[Blank axes: "California 1980-1994", Autism Cases (0 to 1400) vs. DTP Coverage (%) (40 to 80)]

Year | DTP Coverage (%) | Number of Autism Cases
1980 | 50.9 | 176
1981 | 55.4 | 201
1982 | 52.1 | 212
1983 | 47.7 | 229
1984 | 48.9 | 246
1985 | 54.3 | 293
1986 | 54.1 | 357
1987 | 55.3 | 347
1988 | 60.9 | 436
1989 | 62.2 | 522
1990 | 65.9 | 663
1991 | 67.3 | 823
1992 | 69.8 | 1042
1993 | 73.6 | 1090
1994 | 75.7 | 1182

Questions
1. Create a scatterplot of Autism Cases versus DTP Coverage. Does the plot show a positive association or a negative association? How do you know? Make sure you turn in your plot with this assignment.
2. Is the association weak or strong? Defend your reasoning.

*Dr. Loring Dales from the Immunization Branch, California Department of Health Service, made these data publicly available at http://www.putchildrenfirst.org/media/4.6.pdf. See also http://www.ncbi.nlm.nih.gov/pubmed/11231748.

FYI – Plots, so you can see what the students should see.
Part 1 Exhibit 1: [Figures: scatterplots "x1-y1" and "x4-y4", horizontal axes 0 to 20, vertical axes 0 to 14]
Part 1 Exhibit 2: [Figure: scatterplot "California 1980-1994", Autism Cases (0 to 1400) vs. DTP Coverage (%) (40 to 80)]

BEYOND THE NUMBERS 1.27
LEARNING OUTCOME: ____
Scatterplots – Part II

Exhibit 1 Mortality and Global Warming
In this exercise we want you to construct a scatterplot of "Child Mortality" versus "CO2 Emissions" for 192 countries, using 2006 data archived by Dr. Hans Rosling*. These data are available at http://www.heretheyarenow . You must use a computer software package (e.g., Excel or Numbers) or an online applet. Your instructor will tell you which package she requires, if, indeed, a particular one is required.
Make sure you label your axes and produce a professional-looking plot. Answer the questions below. Save your computer work; you may need it for another Beyond the Numbers later on.
Questions
1. What computer software did you use to construct your plot? Make sure you turn in your plot with this assignment.
2. Does the scatterplot show a positive association or a negative association? How do you know?
3. Is the association weak or strong? Defend your reasoning.

*Hans Rosling is Professor of International Health at Karolinska Institute and the co-founder and chairman of the Gapminder Foundation. Dr. Rosling is committed to making important public data available for easy plotting and analysis with his Gapminder software.

Exhibit 2 Mortality and Global Warming Transformed
Save your computer work for this Exhibit. You may need it for another Beyond the Numbers later on.
Questions
1. Redo the scatterplot from Exhibit 1, with the same rules on required use of a computer package and a professional-looking result. This time plot log10(Child Mortality) versus log10(CO2 Emissions). How does this plot compare to the one you made in Exhibit 1? Make sure you turn in your plot with this assignment.
2. Does the scatterplot show a positive association or a negative association? How do you know?
3. Is the association weak or strong? Defend your reasoning.

FYI – Plots, so you can see what the students should see.
Part 2 Exhibit 1 – I'd like this to replace the plot on page 20 of the current workbook. Same data; this is on a linear/linear scale and should not be proprietary. For reference, India (blue), China (red), and the US are highlighted dots.
[Figure: Child mortality (0-5 year olds dying per 1,000 born) vs. CO2 emissions (tonnes per person), linear scales]
Part 2 Exhibit 2 – Nice linear outcome.
Will compare r in Exhibit 1 and Exhibit 2 in the next BN.
[Figure: Child mortality (0-5 year olds dying per 1,000 born) vs. CO2 emissions (tonnes per person), both on log10 scales]

BEYOND THE NUMBERS 1.28
LEARNING OUTCOME: ____
1.
2.
3.
4.
5.

BEYOND THE NUMBERS 1.29
LEARNING OUTCOME: ____
Computing Correlations – Part I

Exhibit 1 Anscombe's Activity Revisited
Recall Anscombe's data from an earlier Beyond the Numbers. In this activity you will compute the correlation coefficient for each pair of variables and compare them.

Obs | x1 | y1 | x1y1 | x1² | y1²
1 | 10 | 8.04 | | |
2 | 8 | 6.95 | | |
3 | 13 | 7.58 | | |
4 | 9 | 8.81 | | |
5 | 11 | 8.33 | | |
6 | 14 | 9.96 | | |
7 | 6 | 7.24 | | |
8 | 4 | 4.26 | | |
9 | 12 | 10.84 | | |
10 | 7 | 4.82 | | |
11 | 5 | 5.68 | | |
Σx = ____ | Σy = ____ | Σxy = ____ | Σx² = ____ | Σy² = ____

Obs | x4 | y4 | x4y4 | x4² | y4²
1 | 8 | 6.58 | | |
2 | 8 | 5.76 | | |
3 | 8 | 7.71 | | |
4 | 8 | 8.84 | | |
5 | 8 | 8.47 | | |
6 | 8 | 7.04 | | |
7 | 8 | 5.25 | | |
8 | 19 | 12.5 | | |
9 | 8 | 5.56 | | |
10 | 8 | 7.91 | | |
11 | 8 | 6.89 | | |
Σx = ____ | Σy = ____ | Σxy = ____ | Σx² = ____ | Σy² = ____

Questions
1. Compute r for the (x1, y1) pairs.
2. Compute r for the (x4, y4) pairs.
3. Compare the two r values you found in light of the scatterplots of these data (which you plotted earlier). What note of inferential caution does this exercise sound?

Exhibit 2 Vaccines and Risk Revisited
Dr. Loring Dales of the Immunization Branch, California Department of Health Service, writes "here are the data we have on (a) percentages of California children who had received 4 doses of DTP by their 2nd

Year | X = DTP Coverage (%) | Y = Number of Autism Cases | xy | x² | y²
1980 | 50.9 | 176 | | |
1981 | 55.4 | 201 | | |
1982 | 52.1 | 212 | | |
1983 | 47.7 | 229 | | |
1984 | 48.9 | 246 | | |
1985 | 54.3 | 293 | | |
1986 | 54.1 | 357 | | |
1987 | 55.3 | 347 | | |
1988 | 60.9 | 436 | | |
1989 | 62.2 | 522 | | |
1990 | 65.9 | 663 | | |
1991 | 67.3 | 823 | | |
1992 | 69.8 | 1042 | | |
1993 | 73.6 | 1090 | | |
1994 | 75.7 | 1182 | | |
Σx = ____ | Σy = ____ | Σxy = ____ | Σx² = ____ | Σy² = ____

Questions
1. Fill out all the entries in the table that are missing.
Your instructor may have you retype the table if you are not required to turn in this actual page.
2. Compute the correlation coefficient between DTP Coverage and the Number of Autism Cases.

FYI – Key results, so you can see what the students should see.
Part 1 Exhibit 1: [Figures: scatterplots "x1-y1" and "x4-y4", horizontal axes 0 to 20, vertical axes 0 to 14] r = 0.82 for both (YIKES reaction).
Part 1 Exhibit 2: r = 0.9616552 ("strong" correlation; will address later in the causation section).
Part 2 Exhibit 1: r = -0.438960093 (seems a bit small because of curvature).
Part 2 Exhibit 2: r = -0.802859716 (much more reasonable after transformation).

BEYOND THE NUMBERS 1.30
LEARNING OUTCOME: ____
Computing Correlations – Part II

Exhibit 1 Mortality and Global Warming Revisited
Refer to BN1.27: Scatterplots – Part II.
Hans Rosling is Professor of International Health at Karolinska Institute and the co-founder and chairman of the Gapminder Foundation. Dr. Rosling is committed to making important public data available for easy plotting and analysis with his Gapminder software.
In this exercise we want you to compute the correlation coefficient between "Child Mortality" and "CO2 Emissions" for 192 countries, using the 2006 data archived by Dr. Rosling. These data are available at http://www.heretheyarenow . You must use a computer software package such as Excel or Numbers, or an online applet. Your instructor will tell you which package she requires, if, indeed, a particular one is required.
Questions
1. What is the value of r?
2. Does the value of r suggest the association between Child Mortality and CO2 Emissions is strong or weak? How do you know?
3. Do you think the computation of r is appropriate for these data? Why or why not? (You constructed a scatterplot of these data in BN1.27.)

Exhibit 2 Transformations Revisited
Refer to BN1.27: Scatterplots – Part II.
In this exercise we want you to compute the correlation coefficient between log10(Child Mortality) and log10(CO2 Emissions) for the 192 countries, using the 2006 data archived by Dr. Rosling. The raw data are available at http://www.heretheyarenow . You must use a computer software package such as Excel or Numbers, or an online applet. Your instructor will tell you which package she requires, if, indeed, a particular one is required.
Questions
1. What is the value of r?
2. How does this value of r compare to the one found in Exhibit 1?
3. Does the value of r suggest the association between log10(Child Mortality) and log10(CO2 Emissions) is strong or weak? How do you know?
4. Do you think the computation of r is appropriate for these data? Why or why not? (You constructed a scatterplot of these data in BN1.27, Exhibit 2. See also page ____.)

FYI – Key results, so you can see what the students should see.
Part 2 Exhibit 1: r = -0.438960093 (seems a bit small because of curvature).
Part 2 Exhibit 2: r = -0.802859716 (much more reasonable after transformation).

1.
2.
3.
4.
5.

BEYOND THE NUMBERS 1.36
LEARNING OUTCOME: ____
Outliers and Leverage Points

Exhibit 1 Heptathletes
Finish data for two events from the 1992 Olympic Heptathlon are shown below, with a scatterplot of the data just to the right of the table. Chouaa is the green data point and Barber is the red one.
Name | Hurdles (seconds) | Javelin (meters)
Joyner-Kersee | 12.85 | 44.98
Nastase | 12.86 | 41.3
Dimitrova | 13.23 | 44.48
Belova | 13.25 | 41.9
Braun | 13.25 | 51.12
Beer | 13.48 | 48.1
Court | 13.48 | 52.12
Kamrowska | 13.48 | 44.12
Wlodarczyk | 13.57 | 43.46
Greiner | 13.59 | 40.78
Kaljurand | 13.64 | 47.42
Zhu | 13.64 | 45.12
Skjaeveland | 13.73 | 35.42
Lesage | 13.75 | 41.28
Nazaroviene | 13.75 | 44.42
Aro | 13.87 | 45.42
Marxer | 13.94 | 41.08
Rattya | 13.96 | 49.02
Carter | 13.97 | 37.58
Atroshchenko | 14.03 | 45.18
Vaidianu | 14.04 | 49
Teppe | 14.06 | 52.58
Clarius | 14.1 | 45.14
Bond-Mills | 14.31 | 43.3
Barber | 14.79 | 0
Chouaa | 16.62 | 44.4

[Figure: "Heptathlon Results from 1992", Javelin Distance (Meters) (0 to 60) vs. Hurdles Time (Seconds) (12 to 17)]

Questions
1. What kind of association do you see in the scatterplot: positive, negative, or neither? Support your answer.
2. Compute the correlation coefficient "r" for the entire data set. You should use a software package or an online applet as required by your instructor. Is this value of "r" consistent with what you answered in Question 1? Why or why not?

Exhibit 2 Language
"Outliers" in a scatterplot are data pairs that are not spatially close to the bulk of the data. Outliers are not necessarily a problem for the inferences people draw from a correlation coefficient. However, if removing a single outlier causes a distinct change in the correlation, then that outlier is called an "influence point," and influence points can disguise the essence of an association.
Questions
1. Looking at the scatterplot above, which athletes are outliers?
2. Compute the correlation coefficient "r" for the data set with Barber removed. Is Barber an influence point? Why?
3. Compare the values of "r" that you computed for the entire data set and for the data set with Barber removed. Which one best reflects the association seen in the scatterplot? Why?
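Exhibit 1 Question 2 and Exhibit 2 Question 2 can be checked with a short script. This is a minimal Python sketch, assuming the heptathlon table is transcribed correctly; it computes Pearson's r from deviations about the means. The helpers `pearson_r` and `r_for` are hypothetical names, not library functions.

```python
import math

# (name, hurdles time in seconds, javelin distance in meters), 1992 Olympic Heptathlon
ATHLETES = [
    ("Joyner-Kersee", 12.85, 44.98), ("Nastase", 12.86, 41.30),
    ("Dimitrova", 13.23, 44.48), ("Belova", 13.25, 41.90),
    ("Braun", 13.25, 51.12), ("Beer", 13.48, 48.10),
    ("Court", 13.48, 52.12), ("Kamrowska", 13.48, 44.12),
    ("Wlodarczyk", 13.57, 43.46), ("Greiner", 13.59, 40.78),
    ("Kaljurand", 13.64, 47.42), ("Zhu", 13.64, 45.12),
    ("Skjaeveland", 13.73, 35.42), ("Lesage", 13.75, 41.28),
    ("Nazaroviene", 13.75, 44.42), ("Aro", 13.87, 45.42),
    ("Marxer", 13.94, 41.08), ("Rattya", 13.96, 49.02),
    ("Carter", 13.97, 37.58), ("Atroshchenko", 14.03, 45.18),
    ("Vaidianu", 14.04, 49.00), ("Teppe", 14.06, 52.58),
    ("Clarius", 14.10, 45.14), ("Bond-Mills", 14.31, 43.30),
    ("Barber", 14.79, 0.00), ("Chouaa", 16.62, 44.40),
]


def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_for(rows):
    """Correlation between hurdles time and javelin distance for the given rows."""
    return pearson_r([h for _, h, _ in rows], [j for _, _, j in rows])


r_all = r_for(ATHLETES)
r_without_barber = r_for([row for row in ATHLETES if row[0] != "Barber"])
print(f"r, all 26 athletes: {r_all:.5f}")
print(f"r, Barber removed:  {r_without_barber:.5f}")
```

With these values the two results should agree with the FYI note (about -0.25213 overall and about 0.00061 without Barber): dropping a single point moves r from weakly negative to essentially zero.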
FYI – Easy computations show the following:
a) r overall is -0.25213: not large, but notably incongruous with the plot.
b) r without Barber is 0.00061, definitely making her an influence point.
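Returning to BN1.29: the Σx, Σy, Σxy, Σx² and Σy² columns in those worksheet tables feed a computational form of the correlation coefficient. The minimal Python sketch below assumes the standard shortcut formula r = (nΣxy - ΣxΣy) / √[(nΣx² - (Σx)²)(nΣy² - (Σy)²)]; the workbook's own formula is not shown in this section, and `r_from_sums` is a hypothetical helper name.

```python
import math


def r_from_sums(x, y):
    """Correlation from the column sums used in the BN1.29 worksheet tables."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Anscombe's data, sets 1 and 4 (Exhibit 1)
x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

# California DTP coverage (%) and autism cases, 1980 to 1994 (Exhibit 2)
dtp = [50.9, 55.4, 52.1, 47.7, 48.9, 54.3, 54.1, 55.3,
       60.9, 62.2, 65.9, 67.3, 69.8, 73.6, 75.7]
autism = [176, 201, 212, 229, 246, 293, 357, 347,
          436, 522, 663, 823, 1042, 1090, 1182]

for label, x, y in [("x1-y1", x1, y1), ("x4-y4", x4, y4), ("DTP-autism", dtp, autism)]:
    print(f"{label}: r = {r_from_sums(x, y):.4f}")
```

Both Anscombe pairs round to r = 0.82 and the vaccine data give about 0.96, in line with the BN1.29 FYI values, even though the x1-y1 and x4-y4 scatterplots look nothing alike.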