Statistics 512 Divisions 1 and 4, Spring 2014 Sample Midterm I

1. In a simple linear regression problem with $n = 28$, the following estimates were obtained: $b_0 = 4.30$, $b_1 = 0.56$, $s\{b_0\} = 0.012$, $s\{b_1\} = 0.034$, $s = 0.1385$, $\bar{X} = -0.0773$, $SSX = 14.598$. Values of X were in the range −1.43 to 0.683.
(a) Write the simple linear regression model. Include the distributional assumption.
(b) Write the estimated regression line and the estimated error variance.
(c) If appropriate, estimate the mean value of Y for the given X. If not appropriate,
say why.
i. X = 0.
ii. X = 2.
(d) Give a 95% confidence interval for the slope of the regression line.
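A minimal Python sketch of the arithmetic behind parts (c) and (d), using only the summary statistics above. The quantile t(0.975; 26) = 2.056 is typed in from a standard t table rather than computed (an assumption of this sketch); the helper name mean_response is illustrative.

```python
# Summary statistics from Problem 1
n = 28
b0, b1 = 4.30, 0.56
s_b1 = 0.034
x_min, x_max = -1.43, 0.683   # observed range of X

def mean_response(x):
    """Estimated mean of Y at x, or None if x lies outside the observed X range."""
    if not (x_min <= x <= x_max):
        return None  # extrapolation: the data say nothing about the line here
    return b0 + b1 * x

# Part (c): X = 0 is inside the observed range, X = 2 is not
print(mean_response(0))  # 4.3
print(mean_response(2))  # None

# Part (d): 95% CI for the slope is b1 +/- t(0.975; n - 2) * s{b1}
t_975_26 = 2.056  # t-table value for n - 2 = 26 degrees of freedom (assumed input)
half_width = t_975_26 * s_b1
ci_slope = (b1 - half_width, b1 + half_width)
print(ci_slope)
```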
(e) Give a 99% prediction interval for the next value of Y when X = 0.5.
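For part (e), the standard error for a single new observation at $X_h$ is $s\{pred\} = s\sqrt{1 + 1/n + (X_h - \bar{X})^2/SSX}$. A short sketch of the computation; as above, t(0.995; 26) = 2.779 is taken from a t table and is an assumed input.

```python
import math

# Summary statistics from Problem 1
n = 28
b0, b1 = 4.30, 0.56
s = 0.1385        # root MSE
x_bar = -0.0773
ssx = 14.598
x_h = 0.5         # X value for the next observation

# Point prediction and standard error for one new observation
y_hat = b0 + b1 * x_h
s_pred = s * math.sqrt(1 + 1 / n + (x_h - x_bar) ** 2 / ssx)

# 99% prediction interval: y_hat +/- t(0.995; n - 2) * s{pred}
t_995_26 = 2.779  # t-table value for 26 degrees of freedom (assumed input)
pred_interval = (y_hat - t_995_26 * s_pred, y_hat + t_995_26 * s_pred)
print(round(y_hat, 3), round(s_pred, 4), [round(v, 3) for v in pred_interval])
```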
2. Short-answer questions. Unless stated otherwise, the parts are unrelated.
(a) If the design matrix has dimensions 20 × 4, what is the dimension of the error
vector?
(b) A particular linear model has parameter values $\beta_0 = 20$, $\beta_1 = 0.5$, and $\sigma^2 = 4$.
Assuming all the linear model assumptions are true, what is the probability that
an observation Y would be greater than 19.4, given that X = 6?
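Under the normal-error model, Y given X = 6 is normal with mean 20 + 0.5 · 6 = 23 and standard deviation 2, so the answer is a normal tail area. A minimal sketch using the standard-library statistics.NormalDist:

```python
from statistics import NormalDist

beta0, beta1, sigma2 = 20, 0.5, 4
x = 6

# Y | X = x is N(beta0 + beta1 * x, sigma^2); NormalDist takes the standard deviation
y_dist = NormalDist(mu=beta0 + beta1 * x, sigma=sigma2 ** 0.5)
prob = 1 - y_dist.cdf(19.4)  # P(Y > 19.4)
print(round(prob, 4))  # 0.9641
```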
(c) If a quarter of the data fall outside the prediction region, what is the approximate
confidence level of the prediction intervals?
(d) For a model with 5 predictors and 40 observations, if $R^2$ is 0.3, what is the test
statistic for the ANOVA F test?
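Because SSR/SSTO = R² and SSE/SSTO = 1 − R², the overall F statistic can be written as F = (R²/p) / ((1 − R²)/(n − p − 1)). A sketch of that formula, assuming the model includes an intercept in addition to the p predictors:

```python
def anova_f_from_r2(r2, n, p):
    """Overall ANOVA F for p predictors plus an intercept (assumed), n observations."""
    df_model = p
    df_error = n - p - 1
    return (r2 / df_model) / ((1 - r2) / df_error)

f_stat = anova_f_from_r2(r2=0.3, n=40, p=5)
print(round(f_stat, 3))  # compared against F with (5, 34) degrees of freedom
```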
(e) The α levels of two confidence intervals are 0.05 and 0.01. Find a lower bound
on the overall coverage rate for the two confidence intervals. (In other words, the
probability that both confidence intervals cover their true parameter values is at least what?)
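The Bonferroni inequality gives P(both intervals cover) ≥ 1 − (α₁ + α₂), whatever the dependence between the two intervals. As a one-line check:

```python
# Bonferroni: P(all intervals cover) >= 1 - sum of the individual alpha levels
alphas = [0.05, 0.01]
lower_bound = 1 - sum(alphas)
print(round(lower_bound, 2))  # 0.94
```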
(f) You run a least squares regression in SAS and get an SSE value of 20 m². Your
officemate claims to be able to find a line (using the same data) with an SSE of
15 m². Should you believe your officemate? Why or why not?
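The least squares estimates minimize SSE over every possible line fit to the same data. The sketch below illustrates this on a small hypothetical data set (the numbers are invented for illustration, not taken from the exam): moving the intercept or slope away from the least squares values never lowers SSE.

```python
# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def sse(b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Closed-form least squares estimates
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar
best = sse(b0, b1)

# Any other line has SSE at least as large as the least squares line
for d0, d1 in [(0.1, 0.0), (0.0, -0.05), (-0.3, 0.1), (0.5, -0.2)]:
    assert sse(b0 + d0, b1 + d1) >= best
print(round(b0, 3), round(b1, 3), round(best, 4))
```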
(g) For a particular X value, the prediction interval has length 10; the confidence
interval for the mean response has length 5. If SSX is 3, what is the length of the confidence interval
for the slope?
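One way to reason about part (g): at the given X, let h = 1/n + (X − X̄)²/SSX. Assuming all three intervals use the same s and the same t multiplier, the mean-response CI has length 2ts√h, the prediction interval has length 2ts√(1 + h), and the slope CI has length 2ts/√SSX. The squared lengths of the first two differ by exactly (2ts)², which removes the unknown h. A sketch of that arithmetic:

```python
import math

def slope_ci_length(pi_length, ci_length, ssx):
    """Slope CI length implied by a mean-response CI and a prediction interval
    at the same X, assuming all three share s and the t multiplier."""
    # PI^2 - CI^2 = (2ts)^2 (1 + h) - (2ts)^2 h = (2ts)^2
    two_t_s = math.sqrt(pi_length ** 2 - ci_length ** 2)
    # Slope CI length is 2ts / sqrt(SSX)
    return two_t_s / math.sqrt(ssx)

print(slope_ci_length(pi_length=10, ci_length=5, ssx=3))
```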
(h) For a given set of data, the optimal Box-Cox transformation (of the response)
is at λ = 0.25. You decide to use the “suggested” transformation and take the
square root of the response. What is the optimal Box-Cox transformation for the
transformed data?
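Powers compose: if Y′ = Y^0.5, then (Y′)^μ = Y^(0.5μ), so reproducing the original optimum Y^0.25 requires 0.5μ = 0.25. (The Box-Cox profile likelihood with geometric-mean scaling changes only by a constant factor under this reparametrization, so the optimum moves accordingly.) A tiny sketch with illustrative variable names:

```python
lambda_original = 0.25  # optimal Box-Cox power for Y
applied_power = 0.5     # square-root transformation actually applied

# (Y ** applied_power) ** mu == Y ** (applied_power * mu),
# so matching the original optimum needs applied_power * mu == lambda_original
mu = lambda_original / applied_power
print(mu)  # 0.5
```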
(i) Using the matrices from least squares estimation, what are the entries of the
vector $\mathbf{X}'\mathbf{e}$?
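By the normal equations, X′e = X′(Y − Xb) = X′Y − X′Xb = 0. A quick numerical check for simple linear regression, where X has a column of ones and a column of X values; the data are hypothetical, invented for illustration:

```python
# Hypothetical data, for illustration only
xs = [0.5, 1.5, 2.0, 3.5, 4.0, 6.0]
ys = [1.2, 2.9, 3.1, 5.8, 6.0, 9.4]
n = len(xs)

# Least squares fit of Y = b0 + b1 X
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar
e = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# X'e has one entry per column of X: (sum of e_i, sum of X_i * e_i)
xt_e = [sum(e), sum(x * ei for x, ei in zip(xs, e))]
print(xt_e)  # both entries are zero up to floating-point rounding
```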
3. Refer to the SAS output on the last pages (marked OUTPUT FOR PROBLEM 3).
The data are from a study of 78 seventh-grade students. The goal is to predict GRADE
(average school grade on a scale of 0 to 11) from variables which include IQ (score on
an I.Q. test) and GENDER (0 = female, 1 = male).
(a) Using the output for the simple linear regression, does there appear to be a linear
relationship between GRADE and IQ? Give a test statistic with degrees of freedom
and p-value to support your answer (you may use other evidence as well).
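As a consistency check on the simple linear regression output, the sketch below recomputes MSE, the ANOVA F, the slope t statistic, and R² from the printed sums of squares and estimates. In simple linear regression t² equals F, up to rounding in the printed standard error.

```python
# Values read from the simple linear regression output (OUTPUT FOR PROBLEM 3)
ssr, sse, df_error = 136.31881, 203.10809, 76
b1, se_b1 = 0.10102, 0.01414

mse = sse / df_error
f_value = ssr / mse        # ANOVA F with (1, 76) df
t_value = b1 / se_b1       # t for H0: beta1 = 0, with 76 df
r_square = ssr / (ssr + sse)

print(round(mse, 5), round(f_value, 2), round(t_value, 2),
      round(t_value ** 2, 1), round(r_square, 4))
```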
(b) Individual 51 has GRADE = 0.53 and IQ = 103. What value of GRADE is
predicted for this individual by the estimated simple linear regression model?
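Plugging IQ = 103 into the fitted simple linear regression line from the output gives the prediction; the residual is included to show how far the observed GRADE falls from it.

```python
# Estimated simple linear regression from OUTPUT FOR PROBLEM 3
b0, b1 = -3.55706, 0.10102

iq = 103
grade_hat = b0 + b1 * iq
residual = 0.53 - grade_hat  # observed GRADE for individual 51 is 0.53
print(round(grade_hat, 3), round(residual, 3))  # 6.848 -6.318
```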
(c) The variable IQGEN is the product of IQ and GENDER. Examine the output for
the model involving these three variables. Write down the estimated regression
equation for this model. Also write down the two separate fitted lines for female
and male students.
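With IQGEN = IQ × GENDER, setting GENDER = 0 leaves the intercept and IQ slope unchanged, while GENDER = 1 adds the gender coefficient to the intercept and the iqgen coefficient to the slope. A sketch using the printed estimates (the helper name fitted_line is illustrative):

```python
# Estimates from the model with iq, gender, and iqgen = iq * gender
b0, b_iq, b_gender, b_iqgen = -2.25235, 0.09400, -3.84266, 0.02656

def fitted_line(gender):
    """(intercept, slope in IQ) for gender = 0 (female) or 1 (male)."""
    intercept = b0 + b_gender * gender
    slope = b_iq + b_iqgen * gender
    return intercept, slope

female = fitted_line(0)  # (-2.25235, 0.094)
male = fitted_line(1)    # approximately (-6.09501, 0.12056)
print(female, male)
```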
(d) Examine the results of the t-tests for the three regression coefficients as well as
the result of the (general linear) F-test labeled “SAMELINE”. The results of
this general linear test were produced with the SAS input line “test gender, iqgen;”.
State the null hypothesis tested by each of these four tests and whether it
is rejected. What apparent conflict do you see between the results of
these tests?
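The general linear test compares the reduced model (iq only) with the full model (iq, gender, iqgen): F* = [(SSE(R) − SSE(F)) / (df_R − df_F)] / [SSE(F) / df_F]. Recomputing it from the two ANOVA tables in the output should reproduce the numerator and denominator mean squares reported for “sameline”:

```python
# Values from the two ANOVA tables in OUTPUT FOR PROBLEM 3
sse_reduced, df_reduced = 203.10809, 76  # grade ~ iq
sse_full, df_full = 184.00205, 74        # grade ~ iq + gender + iqgen

# General linear test of H0: beta_gender = beta_iqgen = 0
numerator_ms = (sse_reduced - sse_full) / (df_reduced - df_full)
denominator_ms = sse_full / df_full
f_star = numerator_ms / denominator_ms
print(round(numerator_ms, 5), round(denominator_ms, 5), round(f_star, 2))
```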
OUTPUT FOR PROBLEM 3
The REG Procedure
Model: MODEL1
Dependent Variable: grade

Analysis of Variance

                          Sum of         Mean
Source            DF     Squares       Square    F Value    Pr > F
Model              1   136.31881    136.31881      51.01    <.0001
Error             76   203.10809      2.67247
Corrected Total   77   339.42689

Root MSE          1.63477    R-Square    0.4016
Dependent Mean    7.44654    Adj R-Sq    0.3937
Coeff Var        21.95343

Parameter Estimates

                 Parameter   Standard
Variable    DF    Estimate      Error   t Value   Pr > |t|   95% Confidence Limits
Intercept    1    -3.55706    1.55176     -2.29     0.0247   -6.64766   -0.46645
iq           1     0.10102    0.01414      7.14     <.0001    0.07285    0.12919
The REG Procedure
Model: MODEL1
Dependent Variable: grade

Analysis of Variance

                          Sum of         Mean
Source            DF     Squares       Square    F Value    Pr > F
Model              3   155.42484     51.80828      20.84    <.0001
Error             74   184.00205      2.48651
Corrected Total   77   339.42689

Root MSE          1.57687    R-Square    0.4579
Dependent Mean    7.44654    Adj R-Sq    0.4359
Coeff Var        21.17586

Parameter Estimates

                 Parameter   Standard
Variable    DF    Estimate      Error   t Value   Pr > |t|
Intercept    1    -2.25235    2.15377     -1.05     0.2991
iq           1     0.09400    0.02017      4.66     <.0001
gender       1    -3.84266    3.03670     -1.27     0.2097
iqgen        1     0.02656    0.02784      0.95     0.3432
Test sameline Results for Dependent Variable grade

                         Mean
Source           DF    Square    F Value    Pr > F
Numerator         2   9.55302       3.84    0.0259
Denominator      74   2.48651