COR1-GB.1305 SAMPLE FINAL EXAM
This is the question sheet. There are 10 questions, each worth 10 points. Please write all answers in the answer book, and justify your answers. Good Luck!

1) The following table presents data collected in the 1960s for 21 countries on X=Annual Per Capita Cigarette Consumption (“Cigarette”), and Y=Deaths from Coronary Heart Disease per 100,000 persons of age 35-64 (“Coronary”).

Country          Cigarette   Coronary
United States       3900       259.9
Canada              3350       211.6
Australia           3220       238.1
New Zealand         3220       211.8
United Kingdom      2790       194.1
Switzerland         2780       124.5
Ireland             2770       187.3
Iceland             2290       110.5
Finland             2160       233.1
West Germany        1890       150.3
Netherlands         1810       124.7
Greece              1800        41.2
Austria             1770       182.1
Belgium             1700       118.1
Mexico              1680        31.9
Italy               1510       114.3
Denmark             1500       144.9
France              1410       144.9
Sweden              1270       126.9
Spain               1200        43.9
Norway              1090       136.3

[Figure: Scatterplot of Coronary vs Cigarette (x-axis: Cigarette, y-axis: Coronary)]

[Figure: Versus Fits, response is Coronary (x-axis: Fitted Value, y-axis: Residual)]

Regression Analysis: Coronary versus Cigarette

The regression equation is
Coronary = 29.5 + 0.0557 Cigarette

Predictor       Coef   SE Coef      T       P
Constant       29.45     29.48   1.00   0.330
Cigarette    0.05568   0.01288   4.32   0.000

S = 46.5558   R-Sq = 49.6%   R-Sq(adj) = 46.9%

Analysis of Variance

Source            DF      SS      MS       F       P
Regression         1   40484   40484   18.68   0.000
Residual Error    19   41181    2167
Total             20   81666

A) Based on the scatterplot of Coronary versus Cigarette, does there appear to be a linear relationship between cigarette consumption and heart disease? If so, does the relationship appear to be negative or positive? (1 Point.)
B) What patterns or problems, if any, do you see in the residuals versus fits plot? Would you feel reasonably comfortable in fitting a simple linear regression model to this data set? (1 Point.)
C) Write the equation for the fitted model. (2 Points.)
D) Give an interpretation of the fitted slope, β̂. (3 Points.)
E) How much natural variability is associated with the estimated intercept α̂? (3 Points.)

2) For the situation described in Problem 1, answer these questions.
A) Compute the residual for Greece. (2 Points.)
B) Do you think that natural variability alone could account for such a large value of β̂ as actually found here? Explain. (2 Points.)
C) Using the Minitab output, determine whether sufficient statistical evidence exists to conclude that there is a positive linear relationship between Cigarette and Coronary at the 1% level of significance. (2 Points.)
D) Based on R², assess the strength of the linear relationship between Cigarette and Coronary. (2 Points.)
E) Do the p-value for β̂ and the value of R² provide contradictory evidence on the strength of the linear relationship between smoking and heart disease? Explain. (2 Points.)

Questions 3) and 4) pertain to a data set on quarterly sales (in Millions of Dollars) of Lowe's Companies, a home improvement retailer, for the time period 1997-2002 (n=24). We will use a response variable of Y=log Sales. The explanatory variables are Housing Starts (in millions) and Mortgage Rate. Figure 1 gives a scatterplot for log Sales vs. Housing Starts. This is followed by the corresponding Minitab Simple Linear Regression output.

[Fig 1: Scatterplot of Log Sales vs. Housing Starts, Lowe's Companies Quarterly, 1997-2002 (x-axis: Housing starts, y-axis: Log Sales)]

Regression Analysis: Log Sales versus Housing starts

The regression equation is
Log Sales = 2.95 + 0.166 Housing starts

Predictor           Coef   SE Coef       T       P
Constant          2.9459    0.2067   14.25   0.000
Housing starts   0.16578   0.05147    3.22   0.004

S = 0.129419   R-Sq = 32.0%   R-Sq(adj) = 29.0%

Analysis of Variance

Source            DF        SS        MS       F       P
Regression         1   0.17374   0.17374   10.37   0.004
Residual Error    22   0.36848   0.01675
Total             23   0.54223

3) A) Based on the fitted linear regression model, if Housing Starts increase by 1 million, what happens to log Sales? (2 Points.)
B) Is there evidence of a positive linear relationship between Housing Starts and log Sales at the .003 (3 in 1000) level of significance? (3 Points.)
C) Construct a 95% confidence interval for the true coefficient of Housing Starts. (3 Points.)
D) Interpret the confidence interval you constructed in Part C). (2 Points.)

The table below gives Minitab output for the multiple regression of log Sales vs. Housing Starts and Mortgage Rate.

Regression Analysis: Log Sales versus Housing starts, Mortgage

The regression equation is
Log Sales = 3.43 + 0.152 Housing starts - 0.0580 Mortgage

Predictor            Coef   SE Coef       T       P
Constant           3.4330    0.4617    7.44   0.000
Housing starts    0.15198   0.05235    2.90   0.009
Mortgage         -0.05805   0.04930   -1.18   0.252

S = 0.128297   R-Sq = 36.3%   R-Sq(adj) = 30.2%

Analysis of Variance

Source            DF        SS        MS      F       P
Regression         2   0.19656   0.09828   5.97   0.009
Residual Error    21   0.34566   0.01646
Total             23   0.54223

4) A) Based on the multiple regression output above, is there evidence at the 5% level of significance of a negative relationship between the Mortgage Rate and log Sales? (3 Points.)
B) Based on the simple and multiple regression output, does Mortgage Rate seem to be an important variable for predicting log Sales, above and beyond what can be achieved using Housing Starts alone? (2 Points.)
C) Explain how the F-statistic of 5.97 can be obtained from other numbers given in the Minitab output. (2 Points.)
D) Do the results of the F-test imply that, beyond a reasonable doubt, all of the true slope coefficients in the model are nonzero? Explain. (3 Points.)

5) If the assumptions for the simple linear regression model are all satisfied and the sample size is n=6, then what is the probability that half of the data points (that is, exactly three of the points) will lie above the true regression line?

6) Consider the following general statement: “The larger the p-value, the higher the probability that the alternative hypothesis is true.” If this statement is correct, explain why.
If this statement is not correct, explain why not and then provide a correct statement starting with “The larger the p-value …”.

7) One hundred randomly selected milk cows were observed for one week and then given a genetically engineered drug designed to increase milk production. The increase in milk production (second week minus first week) averaged to 11 gallons with a sample standard deviation of 50 gallons.
A) State the appropriate null and alternative hypotheses for this problem, in terms of μ. (2 Points.)
B) What is the meaning of μ (in terms of cows)? (2 Points.)
C) What do the null and alternative hypotheses imply about the effectiveness of the drug? (2 Points.)
D) Give all values of the significance level α at which the null hypothesis can be rejected. (2 Points.)
E) Suppose the drug had no effect. Then out of 1000 random samples of 100 cows, how many samples would be expected to yield an increase in milk production at least as large as what was found in our sample? (2 Points.)

8) Consider a new rule for testing H0: μ = μ0 based on a large sample size. First, we observe the t-statistic. If the t-statistic is positive, then we perform a hypothesis test of H0: μ = μ0 versus HA: μ > μ0 at significance level .05. If the t-statistic is negative, then we perform a hypothesis test of H0: μ = μ0 versus HA: μ < μ0 at significance level .05. If the null hypothesis is true and we use this method, what is the probability that we will find the results to be statistically significant at level .05?

9) True or False: “The least-squares method allows us to compute the slope and intercept of the true regression line based on sample data.” Explain.

10) For a random sample of 10 observations from a standard normal distribution, what is the probability that the absolute value of the sample mean will exceed 2.262 SE, where SE is the estimated standard error of the mean?
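
Supplementary note: the fitted equation in the Minitab output for Problem 1 can be checked directly against the data table. The following is a minimal sketch, not part of the exam itself, that assumes the 21 (Cigarette, Coronary) pairs are read off the table in the order listed. The variable names are illustrative only, and the fit uses the standard least-squares formulas in plain Python rather than Minitab.

```python
# Data from the table in Problem 1, in the order the countries are listed
# (United States first, Norway last).
cigarette = [3900, 3350, 3220, 3220, 2790, 2780, 2770, 2290, 2160, 1890, 1810,
             1800, 1770, 1700, 1680, 1510, 1500, 1410, 1270, 1200, 1090]
coronary = [259.9, 211.6, 238.1, 211.8, 194.1, 124.5, 187.3, 110.5, 233.1,
            150.3, 124.7, 41.2, 182.1, 118.1, 31.9, 114.3, 144.9, 144.9,
            126.9, 43.9, 136.3]

n = len(cigarette)
x_bar = sum(cigarette) / n
y_bar = sum(coronary) / n

# Sums of squares and cross-products about the means.
s_xx = sum((x - x_bar) ** 2 for x in cigarette)
s_yy = sum((y - y_bar) ** 2 for y in coronary)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(cigarette, coronary))

# Least-squares estimates of the slope and intercept.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Proportion of variation in Coronary explained by the fitted line.
r_squared = s_xy ** 2 / (s_xx * s_yy)

print(f"Coronary = {intercept:.2f} + {slope:.5f} Cigarette")
print(f"R-Sq = {100 * r_squared:.1f}%")
```

The printed intercept and slope should agree, up to rounding, with the Coef column of the Minitab output (29.45 and 0.05568).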