Statistics 512 Divisions 1 and 4, Spring 2014 Sample Midterm I

1. In a simple linear regression problem with n = 28, the following estimates were obtained: b0 = 4.30, b1 = 0.56, s{b0} = 0.012, s{b1} = 0.034, s = 0.1385, X̄ = −0.0773, SSX = 14.598. Values of X were in the range −1.43 to 0.683.
(a) Write the simple linear regression model. Include the distributional assumption.
(b) Write the estimated regression line and the estimated error variance.
(c) If appropriate, estimate the mean value of Y for the given X. If not appropriate, say why.
  i. X = 0.
  ii. X = 2.
(d) Give a 95% confidence interval for the slope of the regression line.
(e) Give a 99% prediction interval for the next value of Y when X = 0.5.

2. Short answer questions. Unless stated otherwise, each part is unrelated.
(a) If the design matrix has dimensions 20 × 4, what is the dimension of the error vector?
(b) A particular linear model has parameter values β0 = 20, β1 = 0.5, and σ² = 4. Assuming all the linear model assumptions are true, what is the probability that an observation Y would be greater than 19.4, given that X = 6?
(c) If a quarter of the data fall outside the prediction region, what is the approximate confidence level for the prediction intervals?
(d) For a model with 5 predictors and 40 observations, if R² is 0.3, what is the test statistic for the ANOVA F test?
(e) The α levels of 2 confidence intervals are 0.05 and 0.01. Find a lower bound on the overall coverage rate for the two confidence intervals. (In other words, the probability that the confidence intervals cover both their values is at least what?)
(f) You run a least squares regression in SAS and get an SSE value of 20 m². Your officemate claims to be able to find a line (using the same data) with an SSE of 15 m². Should you believe your officemate? Why or why not?
(g) For a particular X value, the prediction interval has length 10; the confidence interval has length 5.
If SSX is 3, what is the length of the confidence interval for the slope?
(h) For a given set of data, the optimal Box-Cox transformation (of the response) is at λ = 0.25. You decide to use the "suggested" transformation and take the square root of the response. What is the optimal Box-Cox transformation for the transformed data?
(i) Using the matrices from least squares estimation, what are the entries of the vector X′e?

3. Refer to the SAS output on the last pages (marked OUTPUT FOR PROBLEM 3). The data are from a study of 78 seventh grade students. The goal is to predict GRADE (average school grade on a scale of 0 to 11) from variables which include IQ (score on an I.Q. test) and GENDER (0 = female, 1 = male).
(a) Using the output for the simple linear regression, does there appear to be a linear relationship between GRADE and IQ? Give a test statistic with degrees of freedom and p-value to support your answer (you may use other evidence as well).
(b) Individual 51 has GRADE = 0.53 and IQ = 103. What value of GRADE is predicted for this individual by the estimated simple linear regression model?
(c) The variable IQGEN is the product of IQ and GENDER. Examine the output for the model involving these three variables. Write down the estimated regression equation for this model. Also write down the two separate fitted lines for female and male students.
(d) Examine the results of the t-tests for the three regression coefficients as well as the result of the (general linear) F-test labeled "SAMELINE". The results of this general linear test were produced with the SAS input line "test gender, iqgen;". State the null hypotheses tested by each of these four tests and whether that hypothesis is rejected. What apparent conflict do you see between the results of these tests?
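
Several of the short-answer parts of Problem 2 reduce to direct arithmetic. The following is a minimal Python sketch of parts (b), (d), and (e) only, not an official solution. It assumes the usual normal-error linear model for (b), a model with an intercept for (d) so that the error degrees of freedom are n − 6, and the Bonferroni inequality for (e). All variable names are illustrative.

```python
from statistics import NormalDist

# Problem 2(b): under the model, Y | X = 6 is Normal(beta0 + beta1 * 6, sigma^2).
beta0, beta1, sigma2 = 20.0, 0.5, 4.0
mean_y = beta0 + beta1 * 6           # 23.0
sd_y = sigma2 ** 0.5                 # 2.0
prob_above = 1.0 - NormalDist(mu=mean_y, sigma=sd_y).cdf(19.4)

# Problem 2(d): overall F statistic written in terms of R^2.
# Assumption: the model has an intercept, so df_error = n - predictors - 1.
n_obs, n_pred, r2 = 40, 5, 0.3
df_model = n_pred
df_error = n_obs - n_pred - 1
f_stat = (r2 / df_model) / ((1.0 - r2) / df_error)

# Problem 2(e): Bonferroni lower bound on joint coverage, 1 - sum of alphas.
alphas = [0.05, 0.01]
joint_coverage_lower = 1.0 - sum(alphas)

print(f"P(Y > 19.4 | X = 6) = {prob_above:.4f}")
print(f"F = {f_stat:.3f} on ({df_model}, {df_error}) df")
print(f"joint coverage >= {joint_coverage_lower:.2f}")
```

The remaining parts ask for reasoning rather than computation, so they are left out of the sketch.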

OUTPUT FOR PROBLEM 3

The REG Procedure
Model: MODEL1
Dependent Variable: grade

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1         136.31881      136.31881      51.01    <.0001
Error              76         203.10809        2.67247
Corrected Total    77         339.42689

Root MSE            1.63477    R-Square    0.4016
Dependent Mean      7.44654    Adj R-Sq    0.3937
Coeff Var          21.95343

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    95% Confidence Limits
Intercept     1              -3.55706           1.55176      -2.29      0.0247    -6.64766    -0.46645
iq            1               0.10102           0.01414       7.14      <.0001     0.07285     0.12919


The REG Procedure
Model: MODEL1
Dependent Variable: grade

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3         155.42484       51.80828      20.84    <.0001
Error              74         184.00205        2.48651
Corrected Total    77         339.42689

Root MSE            1.57687    R-Square    0.4579
Dependent Mean      7.44654    Adj R-Sq    0.4359
Coeff Var          21.17586

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1              -2.25235           2.15377      -1.05      0.2991
iq            1               0.09400           0.02017       4.66      <.0001
gender        1              -3.84266           3.03670      -1.27      0.2097
iqgen         1               0.02656           0.02784       0.95      0.3432

Test sameline Results for Dependent Variable grade

Source         DF    Mean Square    F Value    Pr > F
Numerator       2        9.55302       3.84    0.0259
Denominator    74        2.48651
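
The numbers in the "sameline" test can be reproduced from the two ANOVA tables, which may help in reading the output. The sketch below recomputes the general linear F statistic as the drop in SSE from the reduced model (IQ only) to the full model (IQ, GENDER, IQGEN), divided by the full-model MSE. It also shows how the GENDER and IQGEN coefficients shift the intercept and slope for GENDER = 1. All values are copied from the listing above; the variable names are illustrative, and this is a check on the arithmetic rather than an official solution.

```python
# Error sums of squares and degrees of freedom from the two ANOVA tables.
sse_reduced, df_reduced = 203.10809, 76   # model with iq only
sse_full, df_full = 184.00205, 74         # model with iq, gender, iqgen

# General linear test: F = [(SSE_R - SSE_F) / (df_R - df_F)] / [SSE_F / df_F]
ms_numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
ms_denominator = sse_full / df_full
f_sameline = ms_numerator / ms_denominator

# Coefficients of the full model, copied from Parameter Estimates.
b_int, b_iq, b_gender, b_iqgen = -2.25235, 0.09400, -3.84266, 0.02656

# With GENDER coded 0 = female, 1 = male, and IQGEN = IQ * GENDER:
female_line = (b_int, b_iq)                        # (intercept, slope)
male_line = (b_int + b_gender, b_iq + b_iqgen)     # (intercept, slope)

print(f"numerator MS = {ms_numerator:.5f}, denominator MS = {ms_denominator:.5f}")
print(f"F = {f_sameline:.2f} on ({df_reduced - df_full}, {df_full}) df")
print(f"female: grade = {female_line[0]:.5f} + {female_line[1]:.5f} * iq")
print(f"male:   grade = {male_line[0]:.5f} + {male_line[1]:.5f} * iq")
```

The recomputed mean squares and F value agree with the "Test sameline" table to the displayed precision.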