Chapter 4: Numerical Methods for Describing Data Review Pack Name _________________________
The following questions are in a True / False format. The answers to these questions will frequently depend on remembering facts, understanding of the concepts, and knowing the statistical vocabulary. Before answering these questions, be sure to read them carefully!

T F 1. The trimmed mean is less sensitive to outliers than is the mean.
T F 2. The mean is the middle value of an ordered data set.
T F 3. One disadvantage of using the mean as a measure of center for a data set is that its value is affected by the presence of even a single outlier in the data set.
T F 4. The variance is the positive square root of the standard deviation.
T F 5. For any given data set, the median must be greater than or equal to the lower quartile, and less than or equal to the upper quartile.
T F 6. For data that is skewed to the right, [expression lost in transcription; the surviving fragments are "x x 0"].
T F 7. By definition, an outlier is "extreme" if it is more than 3.0 iqr away from the closest quartile.
T F 8. According to Chebyshev’s rule, the fraction of observations that are within 3 standard deviations of the mean is at least eight-ninths.
T F 9. When using a 20% trimmed mean, the largest 10% and the smallest 10% of the observations are discarded for calculation purposes.
T F 10. When the histogram of a data set is closely approximated by a normal curve, the standard deviation and the interquartile range are very close to equal on average.
T F 11. The interquartile range is resistant to the effect of outliers.
T F 12. If there are no outliers, a skeletal and modified boxplot can differ in the length of the box, but not in the whisker lengths.

1. Astronomers are interested in the recessional velocity of galaxies, that is, the speed at which they are moving away from the Milky Way.
The accompanying table contains the recessional velocities for a sample of galaxies, measured in km/sec. Negative velocity indicates the galaxy is moving towards us.

Recessional velocities (km/sec)
 170   150   290   500  -130   920
 -70   500  -220   960   200   500
 290   850   200   800   300  1090
 650   (this value appears detached from the table in the source; its position is unclear)

(a) Calculate these numerical summaries:
    The mean                   _______________
    The standard deviation     _______________
    The median                 _______________
    The interquartile range    _______________

(b) Construct a skeletal box plot for these data.

(c) Judging from the data and your responses in parts (a) and (b), would you say this distribution is skewed or approximately symmetric? Justify your response using appropriate statistical terminology.

2. A wide variety of oak trees grow in the United States. In one study a sample of acorns was collected from different locations, and their volumes, in cm³, were recorded. In the table below are summary statistics for these data.

Acorn Statistics
Statistic       Value
N               38
Mean            3.0
Median          1.8
St. Dev.        2.6
Minimum         0.3
Maximum         10.5
1st Quartile    1.1
3rd Quartile    4.3

(a) Describe a procedure that uses some or all of these summary statistics to determine whether outliers are present in the data.

(b) Using your procedure from part (a), determine if there are outliers in these data.

3. An insurance agent is studying fire damage claims in a major city to see if the insurance premiums are matched to the company's risk. She takes a random sample of 20 claims, and finds the amount of each claim, in thousands of dollars. Her results are shown below:

Fire Damage Claims in a major city ($1,000)
52  59  32  54  45  73  39  62  97  65
58  48  62  28  30  69  13  41  75  36

(a) Under what circumstances should one consider using a trimmed mean as a description of the center of a distribution?
(b) Does the fire damage data exhibit the characteristic(s) that suggest a trimmed mean is the appropriate statistic to calculate? Explain.

1. Consider a study in which the heights of a sample of 1000 high school seniors were recorded. The mean height is 70" and the standard deviation of the heights is 3". It is observed that the height distribution is approximately normal.

(a) Approximately what percent of heights in this sample would exceed 79"?

(b) What is the approximate percentile of a senior who is 73" tall?

(c) When the data were summarized the value of the first quartile was written down but then smudged. There is general agreement that the writer meant to indicate either 66" or 68". Which of these values is most likely the correct one? Justify your answer with appropriate statistical reasoning.

2. In recent years there has been considerable discussion about the appropriateness of the body shapes and proportions of Ken and Barbie dolls. These dolls are very popular, and there is some concern that the dolls may be viewed as having the "ideal body shape," potentially leading young children to risk anorexia in pursuit of that ideal. Researchers investigating the dolls' body shapes scaled Ken and Barbie up to a common height of 170.18 cm (5' 7") and compared them to body measurements of active adults. Common measures of body shape are the chest (bust), waist, and hip circumferences.
These measurements for Ken and Barbie and their reference groups are presented in the table below:

Doll and Human Reference Group Measurements (cm)
                  Ken                      Barbie
            Chest   Waist   Hips     Chest   Waist   Hips
Doll        75.0    56.5    72.0     82.3    40.7    72.7
Human x̄     91.2    80.9    93.7     90.3    69.8    97.9
Human s      4.8     9.8     6.8      5.5     4.7     5.4

For the following questions, suppose that the researchers' scaled-up dolls suddenly found themselves in the human world of actual men and women.

(a) Convert Ken's chest, waist, and hips measurements to z-scores. Which of those measures appears to be the most different from Ken's reference group? Justify your response with an appropriate statistical argument.

(b) The z-scores for Barbie's chest, waist, and hips when compared to active female adults are approximately -1.4, -6.2, and -4.7 respectively. Do these z-scores provide evidence to justify the claim that the Barbie doll is a thin representation of adult women? Justify your response with an appropriate statistical argument.

(c) If men's waist measurements are approximately normally distributed, based on the sample above what is the approximate percentile of a 100 cm waist?

3. The Territory of Iowa was initially surveyed in the 1830s. The surveyors were very careful to note the trees and vegetation; it was believed at that time that the richness of the soil could be measured by the density of trees encountered. The sample of Ash tree diameters from the original survey of what is now Linn County, Iowa, is presented in the stem and leaf plot below. The display uses five lines for each stem. Thus, "1t|" is the stem for diameters of 12 and 13, "1f|" for 14 and 15, "1s|" for 16 and 17, and so on. (The "t" then stands for leaves that are twos and threes, the "f" for leaves of fours and fives, etc.)
The mean diameter of ash trees in this sample is 11.500 inches, and the standard deviation is 3.842 inches.

Linn County Trees in 1830
Ash Diameters
1|0 = 10 inches          N = 102

0.|
0t|2
0f|44
0s|666777
0*|8888888888888888888999
1.|000000000000000000
1t|22222222222222222222222
1f|444444444445
1s|666666
1*|8888888888
2.|0
2t|
2f|4
2s|
2*|

(a) What is the approximate diameter of an ash tree at the 20th percentile in this distribution?

(b) The Empirical Rule would suggest that 68% of ash tree diameters are between what two values?

(c) Chebyshev's Rule would suggest that at least 75% of the data are between what two values?
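
The interval arithmetic behind the Empirical Rule and Chebyshev's Rule questions for the ash tree data can be sketched in Python. This is a minimal illustration under stated assumptions, not an answer key: the function names `interval` and `chebyshev_k` are hypothetical, and the Empirical Rule is only appropriate if the distribution is approximately normal, which should be judged from the stem and leaf plot. Chebyshev's Rule guarantees that at least 1 - 1/k² of the observations lie within k standard deviations of the mean, so solving 1 - 1/k² = 0.75 gives k = 2.

```python
import math

MEAN = 11.500  # mean ash diameter in inches, from the problem statement
SD = 3.842     # standard deviation in inches, from the problem statement


def interval(mean, sd, k):
    """Return (low, high) for the interval mean +/- k standard deviations."""
    return (mean - k * sd, mean + k * sd)


def chebyshev_k(fraction):
    """Return k such that Chebyshev's Rule guarantees at least `fraction`.

    Solves 1 - 1/k**2 = fraction for k (requires 0 <= fraction < 1).
    """
    return 1.0 / math.sqrt(1.0 - fraction)


# Empirical Rule: about 68% of observations lie within 1 standard deviation.
low, high = interval(MEAN, SD, 1)
print(f"Empirical Rule, 68%: {low:.3f} to {high:.3f} inches")

# Chebyshev's Rule: at least 75% lie within k = 2 standard deviations.
k = chebyshev_k(0.75)
low, high = interval(MEAN, SD, k)
print(f"Chebyshev's Rule, 75% (k = {k:.0f}): {low:.3f} to {high:.3f} inches")
```

The two rules answer different questions: the Empirical Rule gives an approximate percentage that assumes a normal shape, while Chebyshev's Rule gives a guaranteed lower bound for any distribution, which is why its interval for a similar percentage is wider.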