Assignment 11
Exercise 10.3.4
March 29, 2015

1 Question

Consider the n = 11 data values in the following table.

Observation     1       2       3       4       5       6
X             -5.00   -4.00   -3.00   -2.00   -1.00    0.00
Y            -10.00   -8.83   -9.15   -4.26   -0.30   -0.04

Observation     7       8       9      10      11
X              1.00    2.00    3.00    4.00    5.00
Y              3.52    5.64    7.28    7.62    8.51

Suppose we consider the simple linear regression model to describe the relationship between the response Y and the predictor X.

(a) Plot the data in a scatter plot.
(b) Calculate the least-squares line and plot this on the scatter plot in part (a).
(c) Plot the standardized residuals against X.
(d) Produce a normal probability plot of the standardized residuals.
(e) What are your conclusions based on the plots produced in parts (c) and (d)?
(f) If appropriate, calculate 0.95-confidence intervals for the intercept and slope.
(g) Construct the ANOVA table to test whether or not there is a relationship between the response and the predictors. What is your conclusion?
(h) If the model is correct, what proportion of the observed variation in the response is explained by changes in the predictor?

2 Solution

(a) The simple linear regression model is

    Yi = β0 + β1 Xi + εi,

where εi ∼ N(0, σ²) and εi, εj are independent for i ≠ j.

[Figure 1: Scatter plot of Y against X. A linear relationship between X and Y seems reasonable.]

##########################
## R codes ##
X <- seq(-5, 5, by = 1)
Y <- c(-10.00, -8.83, -9.15, -4.26, -0.30, -0.04,
       3.52, 5.64, 7.28, 7.62, 8.51)
plot(X, Y)
##########################

(b) The estimates of the slope and intercept parameters are b1 = 2.1023636 and b0 = -0.0009091, respectively. Hence, the estimated regression line is

    Yhat_i = -0.0009091 + 2.1023636 Xi.

The estimated regression line is superimposed on the scatter plot in the following figure.

[Figure 2: Estimated linear regression line superimposed on the scatter plot of Y against X.]

##########################
## R codes ##
reg.fit <- lm(Y ~ X)
summary(reg.fit)
plot(X, Y)
abline(reg.fit)
##########################

(c) The standardized residuals are plotted against X in the following figure.

[Figure 3: Standardized residuals against X.]

##########################
## R codes ##
n <- length(Y)
e <- residuals(reg.fit)
SSE <- sum(e^2)
MSE <- SSE/(n - 2)
stdres <- e/sqrt(MSE)
plot(X, stdres, ylim = c(-3, 3), xlab = "X",
     ylab = "Standardized Residuals")
abline(h = 0, lty = 2)
##########################

(d) The normal probability plot is produced in the following figure.

[Figure 4: Normal probability plot of the standardized residuals.]

##########################
## R codes ##
res.std <- rstandard(reg.fit)
qqnorm(res.std, xlim = c(-2, 1.5), ylim = c(-2, 1.5),
       ylab = "Standardized Residuals", xlab = "Normal Scores")
qqline(res.std)
##########################

(e) The standardized residuals show no systematic pattern against X, and the normal probability plot is roughly linear. Both figures indicate that the normal simple linear regression model is reasonable.

(f) The 0.95-confidence intervals for the two regression coefficients are given below:

    P(-1.053171 ≤ β0 ≤ 1.051353) = 0.95

and

    P(1.769609 ≤ β1 ≤ 2.435118) = 0.95.
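These intervals can be obtained with confint, as in the code block that follows. As an optional cross-check, not part of the original solution, they can also be computed by hand from bj ± t0.975(n - 2) se(bj). A minimal R sketch, assuming only the X and Y vectors from part (a); the helper names (Sxx, Sxy, tcrit, etc.) are illustrative:

##########################
## R codes (illustrative cross-check) ##
n <- length(Y)
Sxx <- sum((X - mean(X))^2)                 # sum of squares of X about its mean
Sxy <- sum((X - mean(X)) * (Y - mean(Y)))   # cross-product sum
b1 <- Sxy / Sxx                             # least-squares slope
b0 <- mean(Y) - b1 * mean(X)                # least-squares intercept
e <- Y - (b0 + b1 * X)                      # residuals
MSE <- sum(e^2) / (n - 2)                   # estimate of sigma^2
se.b0 <- sqrt(MSE * (1/n + mean(X)^2 / Sxx))
se.b1 <- sqrt(MSE / Sxx)
tcrit <- qt(0.975, df = n - 2)              # t critical value on 9 df
c(b0 - tcrit * se.b0, b0 + tcrit * se.b0)   # 0.95-CI for the intercept
c(b1 - tcrit * se.b1, b1 + tcrit * se.b1)   # 0.95-CI for the slope
##########################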
##########################
## R codes ##
confint(reg.fit, level = 0.95)
##########################

(g) The analysis of variance (ANOVA) table is given below.

              Df   Sum Sq   Mean Sq   F value   Pr(>F)
Regression     1   486.19    486.19    204.27   0.0000
Residuals      9    21.42      2.38
Total         10   507.61

We want to test the null hypothesis H0: β1 = 0 against the alternative HA: β1 ≠ 0. The test statistic is F = 204.27, which under H0 follows an F distribution with 1 and 9 degrees of freedom. The corresponding P-value is essentially 0, which is smaller than 1 - γ = 0.05. Hence, the null hypothesis is rejected at the 0.05 level of significance, and we conclude that there is a straight-line relationship between X and Y.

##########################
## R codes ##
anova(reg.fit)
##########################

(h) The coefficient of determination is

    R² = 486.19/507.61 = 0.9578.

This tells us that approximately 95.78% of the observed variation in Y is explained by the fitted regression model.

##########################
## R codes ##
summary(reg.fit)
##########################
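As an optional sanity check, not part of the original solution, the ANOVA entries in part (g) and the R² in part (h) can be reproduced directly from the fitted model. A minimal R sketch, assuming the Y vector from part (a) and reg.fit from part (b); the names SST, SSR, Fstat, etc. are illustrative:

##########################
## R codes (illustrative cross-check) ##
n <- length(Y)
SST <- sum((Y - mean(Y))^2)               # total sum of squares
SSE <- sum(residuals(reg.fit)^2)          # residual (error) sum of squares
SSR <- SST - SSE                          # regression sum of squares
Fstat <- (SSR / 1) / (SSE / (n - 2))      # F statistic with 1 and n - 2 df
pval <- pf(Fstat, 1, n - 2, lower.tail = FALSE)
R2 <- SSR / SST                           # coefficient of determination
c(F = Fstat, p.value = pval, R.squared = R2)
##########################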