Graded Homework 5
STAT 225 Fall 2014: Graded Homework 5

DUE in class on Wednesday, December 10, 2014. ONLINE students: deadline 11:59 pm on December 10, 2014.

SHOW WORK! NOTE: for any question asking you to determine a probability, you MUST write out a probability statement using proper notation! You may use Excel or other software to assist with some calculations, but you will need to show the work.

1. A company manufactures concrete blocks that are used for construction purposes. The weights of the individual concrete blocks are normally distributed with an average weight of 11.0 kg and a standard deviation of 0.3 kg.
   a) What is the probability that a randomly chosen concrete block weighs less than 10.5 kg?
   b) If the concrete blocks are too light (under 10 kg), the company cannot sell them. Likewise, if they are too heavy (over 11.4 kg), the company is wasting materials and will not sell them. What percent of the concrete blocks falls in the sellable range?
   c) Given that a concrete block weighs at least 10.6 kg, what is the probability that it weighs more than 11.5 kg?
   d) What weight represents the upper quartile of the weights of the concrete blocks?

2. The weight of a box of strawberries is normally distributed with an average of 1 pound and a standard deviation of 1 ounce. Answer the following questions by using the empirical rule. (Note: there are 16 ounces in 1 pound. Abbreviations: pound = lb; ounce = oz.)
   a) What is the probability of getting a box of strawberries that weighs more than 1 lb 1 oz?
   b) What is the probability of getting a box of strawberries that weighs less than 14 oz?
   c) Given that a particular box of strawberries weighs less than 1 lb 2 oz, what is the probability that it weighs more than 15 oz?
   d) Laura needs to buy 10 boxes of strawberries for her party. What are the expected value and variance of the total weight of Laura's strawberries?
   e) What is the probability that at least two of Laura's strawberry boxes weigh more than 14 oz?

3.
Purdue has an undergraduate admission rate of 60%. (That means that 60% of those who apply to Purdue for undergraduate studies are offered admission.) Suppose Purdue has 30,000 applicants for the 2015-16 year. Please answer the following questions:
   a) Let X be the number of undergraduate students that Purdue offers admission to for 2015-16. What is the distribution and support of X?
   b) What is the probability that between 20,000 and 21,000 (inclusive) high school students get admitted? (Write out the probability statement and summarize the formula, with values, that you would use, but DO NOT SOLVE.)
   c) Is there an approximation method to calculate the probability in part b)? State your reasoning (with values calculated) to support this.
   d) Use that approximation method to find the probability that between 20,000 and 21,000 (inclusive) undergraduate students are admitted next year.
   e) If a student gets admitted, he/she will enroll at Purdue with a probability of 0.41. What is the expected size of the class entering Purdue in Fall 2015?

4. The following table contains the Monster University 2015 Fall graduate school application information.
   a) Complete the table with the row totals, column totals and an overall total.
   b) What percent of applicants are admitted by Monster University? Is this a marginal, conditional, or joint probability?
   c) What is the probability that an applicant applied to Engineering and got rejected? Is this a marginal, conditional, or joint probability?
   d) What is the probability that an applicant applied to the School of Business? Is this a marginal, conditional, or joint probability?
   e) Rank the rejection rates for each College/School from lowest rejection rate to highest rejection rate. (Make sure to show your work.) Are these marginal, conditional, or joint probabilities?
   f) What question does the Chi-Square Test attempt to answer in the context of this problem?
   g) Create a table of expected counts.
   h) Create a table of partial χ² values.
   i) What is the value of the χ² statistic? What are its degrees of freedom?
   j) Using significance level 0.05, state the conclusion of the Chi-Square test in the context of this problem. State your reasoning behind your conclusion.

5. Using the graph below and the labels given, answer the following questions.
   a) What does the point labeled F represent on the boxplot?
   b) What does the point labeled A represent on the boxplot?
   c) How would you calculate the range for this dataset?
   d) Which letter represents the value of the first quartile?
   e) How would you calculate the IQR?
   f) Between what two letters does the 35th percentile fall?
   g) What letter represents the median? What is the approximate value for the median?
   h) Would you expect the mean to be less than the median, greater than the median or about the same value as the median?
   i) What is the approximate value of Q3?
   j) If you were to summarize this data, would it be more appropriate to use the 5-number summary or the mean and standard deviation? Explain your answer.

6. A student recorded the number of people in line at the Starbucks in the Purdue Memorial Union at 10:00 am for 3 weeks. The data is shown below:

   8, 30, 23, 15, 28, 24, 0, 24, 23, 14, 18, 23, 17, 12, 5, 3, 18, 19, 23, 27, 17

   a) Calculate the 5-number summary.
   b) Are there any outliers? Support your claim with calculations.
   c) Produce a modified boxplot of the data. Make sure it is to scale and the axis is clearly marked.
   d) Determine the 65th percentile of the data.
   e) Calculate the mean and variance of the data.
   f) Which measures of center and spread should be used to describe the data? Why?

7. Using Figures 1 and 2 above, answer the following.
   a) State whether each data set is symmetric, skewed right, or skewed left.
   b) For each figure, identify letter a, b, and c as the mean, median, mode, range, or standard deviation. (Choose only one of these per letter. Note: the locations of a, b, and c are approximations.)

8.
Given each scenario, identify an appropriate type of graph that you would use to determine the answer (e.g. pie chart, scatterplot, etc.).
   a) We want to understand the distribution of body masses of 60 male undergraduates.
   b) We want to see if there is a relationship between the weight of a passenger vehicle and its fuel efficiency (measured as miles per gallon).
   c) We want to understand the distribution of the bushels of corn harvested in Indiana between 1975 and 2001.
   d) We want to understand the distribution of household incomes in Tippecanoe County for the whole year.
   e) A poll of 500 undergraduates was taken to determine the respondent's favorite movie of 2013.
   f) What is the percentage of Indiana vehicles that are small passenger cars, large passenger cars, trucks, SUVs, and other?
   g) Is the correlation between daily outside temperature and amount of natural gas used for heating positive or negative?

9. The numbers of hours that a random sample of students spent studying for the STAT 225 final exam are as follows:

   4.0, 4.1, 6.1, 9.2, 0, 2.9, 3.5, 3.9, 4.3, 6.4, 7.1, 8.6, 9.1, 7.7, 9.1, 3.7, 5.4, 6.7, 7.4, 7.9

   a) Create a stem-and-leaf plot of the data.
   b) Compute the five-number summary for this data.
   c) Compute the sample mean and sample standard deviation for this data.
   d) Are there any outliers in the data set? Support your claim mathematically.

10. The auditors in a company want to understand how the weight of a vehicle in its fleet affects its gas mileage. A random sample of 30 vehicles in the large North American fleet of this company produced the following data pairing vehicle weight in hundreds of pounds and miles per gallon (MPG) of the vehicle. The data and scatterplot are below.

   Vehicle #  Weight   MPG      Vehicle #  Weight   MPG      Vehicle #  Weight   MPG
       1        36    13.96        11        31    20.18        21        36    14.07
       2        34    17.16        12        26    23.85        22        35    17.88
       3        30    18.24        13        39    13.61        23        31    18.7
       4        27    20.28        14        33    17.56        24        26    20.78
       5        38    14.58        15        26    22.32        25        35    18.24
       6        33    19.96        16        29    23.43        26        39    11.34
       7        30    20.97        17        35    15.23        27        33    18.91
       8        34    12.49        18        37    13.27        28        39    12.85
       9        35    15.8         19        35    17.62        29        36    12.5
      10        34    17.59        20        29    18.61        30        38    12.07

   The least squares regression line is ŷ = −0.7713x + 42.821, with R² = 0.7692.

   a) What is the slope of the regression line? What does this mean in terms of the story?
   b) What is the correlation between vehicle weight and MPG?
   c) What percent of the variation in MPG is not explained by the regression on weight?
   d) Predict the MPG when the vehicle weight in hundreds of pounds is 27, 32 and 45. Are these predictions valid? Explain your reasoning.
   e) Determine the residual for MPG when weight in hundreds of pounds is 27.

11. Dr. JD Statmann at Central College teaches an undergraduate math class and an undergraduate statistics class. He has several students that are in both his math class and his stat class. He randomly selected 5 students and recorded their scores. The data are shown in the table below.

   Student       1    2    3    4    5
   Math Score   95   85   80   70   60
   Stat Score   85   95   70   65   70

   a) Produce a scatterplot of the data, with the Math Score vs. Stat Score. Draw a rough linear regression line by eyeballing.
   b) How would you characterize the relationship between the Math Score and Stat Score in terms of direction, form and strength?
   c) Compute the mean and standard deviation of the Math Score and the Stat Score.
   d) Calculate the covariance of the Math Score and the Stat Score.
   e) Calculate the correlation between the Math Score and the Stat Score.
   f) What percentage of the variation in the Stat Score is explained by the linear relationship with Math Score?

12. For the following questions a-f, choose the correct graph. Each answer may be used once, more than once or not at all.
   a) Which graph shows you data in which the mean is greater than the median?
   b) Which graph clearly shows you the 25th percentile?
   c) Which graph should only be used for qualitative data?
   d) Which graph shows data that was collected using a survey with a biased question?
   e) Which graph shows you a distribution that is skewed left?
   f) Which graph shows you a distribution that is skewed right?

13. For each of the scatterplots below, identify the most appropriate correlation from the following possible set of correlations: 1, -0.7714, 0.9495, 1.3126, -0.0250, 0, -1, -0.3287, -3.1257, 0.235, and 0.4308.
   a) Correlation of Figure A
   b) Correlation of Figure B
   c) Correlation of Figure C
   d) Correlation of Figure D
   e) Correlation of Figure E
   f) Correlation of Figure F
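
Software check (illustrative, not part of the original assignment). The note at the top allows Excel or other software to assist with calculations. As a minimal sketch of what that could look like, the normal-distribution setup from Problem 1 (mean 11.0 kg, standard deviation 0.3 kg) can be evaluated with Python's standard-library statistics.NormalDist. The choice of Python and all variable names here are assumptions made for illustration; the assignment itself does not prescribe a tool.

```python
from statistics import NormalDist

# Problem 1 setup: block weight X ~ Normal(mean = 11.0 kg, sd = 0.3 kg).
# Variable names below are hypothetical, chosen only for this sketch.
weights = NormalDist(mu=11.0, sigma=0.3)

# a) P(X < 10.5)
p_under_10_5 = weights.cdf(10.5)

# b) Sellable range: P(10 <= X <= 11.4)
p_sellable = weights.cdf(11.4) - weights.cdf(10.0)

# c) Conditional probability:
#    P(X > 11.5 | X >= 10.6) = P(X > 11.5) / P(X >= 10.6),
#    because the event {X > 11.5} is contained in {X >= 10.6}.
p_conditional = (1 - weights.cdf(11.5)) / (1 - weights.cdf(10.6))

# d) Upper quartile: the weight w with P(X <= w) = 0.75
upper_quartile = weights.inv_cdf(0.75)

print(f"P(X < 10.5)              = {p_under_10_5:.4f}")
print(f"P(10 <= X <= 11.4)       = {p_sellable:.4f}")
print(f"P(X > 11.5 | X >= 10.6)  = {p_conditional:.4f}")
print(f"Upper quartile (kg)      = {upper_quartile:.4f}")
```

A check like this only confirms the arithmetic. The probability statement in proper notation and the supporting work still have to be written out as the instructions require.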