UNIVERSITY OF TORONTO AT SCARBOROUGH Sample Exam STAC67
Duration - 3 hours
AIDS ALLOWED: THIS EXAM IS OPEN BOOK (NOTES). Calculator (no phone calculators are allowed).

LAST NAME _____________________________________________________
FIRST NAME ____________________________________________________
STUDENT NUMBER ________________________________________________

There are 17 pages including this page. Total marks: 95
PLEASE CHECK AND MAKE SURE THAT THERE ARE NO MISSING PAGES IN THIS BOOKLET.

1) The following SAS output (from PROC UNIVARIATE) was obtained from a study of the relationship between the boiling temperature of water (in degrees Fahrenheit) and the atmospheric pressure (in inches of mercury). In the SAS output below the boiling temperature is denoted by BT and the atmospheric pressure by AP.

The UNIVARIATE Procedure
Variable: BT

N                31            Sum Weights        31
Mean             191.6         Sum Observations   5939.6
Std Deviation    8.37321125    Variance           70.1106667
Skewness         0.58262076    Kurtosis           -0.5660379
Uncorrected SS   1140130.68    Corrected SS       2103.32
Coeff Variation  4.37015201    Std Error Mean     1.50387314

The UNIVARIATE Procedure
Variable: AP

N                31            Sum Weights        31
Mean             20.0276452    Sum Observations   620.857
Std Deviation    3.86371881    Variance           14.928323
Skewness         0.96406479    Kurtosis           0.23090812
Uncorrected SS   12882.1534    Corrected SS       447.849691
Coeff Variation  19.2919276    Std Error Mean     0.69394438

The CORR Procedure
2 Variables: BT AP

Pearson Correlation Coefficients, N = 31
Prob > |r| under H0: Rho=0

        BT        AP
BT      1.00000   0.98455
                  <.0001
AP      0.98455   1.00000
        <.0001

a) [5 points] Assuming that a linear relationship exists between AP and BT and that the data satisfy the necessary assumptions, calculate the least squares regression equation of BT on AP.

Sol: b1 = r·s_BT/s_AP = 0.98455 × 8.37321/3.86372 ≈ 2.134, and b0 = ȳ − b1·x̄ = 191.6 − 2.134 × 20.0276 ≈ 148.87.

b) [2 points] What proportion of the variability in the boiling temperature of water (i.e. BT) is explained by this simple linear regression model?
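As a quick numeric check (not part of the exam), the quantities for question 1 can be computed directly from the PROC UNIVARIATE and PROC CORR summaries above:

```python
# Check of question 1 from the printed summary statistics.
r = 0.98455          # Pearson correlation of BT and AP
s_bt = 8.37321125    # std deviation of BT
s_ap = 3.86371881    # std deviation of AP
mean_bt = 191.6
mean_ap = 20.0276452

b1 = r * s_bt / s_ap          # slope of the regression of BT on AP
b0 = mean_bt - b1 * mean_ap   # intercept
r_sq = r ** 2                 # proportion of variability in BT explained

print(f"BT-hat = {b0:.2f} + {b1:.4f} * AP,  R-sq = {r_sq:.4f}")
```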
Sol: This is R-sq = 0.98455² = 0.9693.

c) [5 points] Calculate a 95% confidence interval for the slope of the regression line.

Sol: Find MSE first using R² = 1 − SSE/SST, and then use the formula for the CI for b1. (Alternatively, use SSR = b1²·Sxx.)

2) A researcher wished to study the relation between patient satisfaction (Y) and patient's age (X1), severity of illness (X2, an index) and anxiety level (X3). Some SAS output for the regression analysis of his data is given below. You may assume that the model is appropriate (i.e. satisfies the assumptions needed) for answering the questions below.

The REG Procedure
Model: MODEL1

Model Crossproducts X'X X'Y Y'Y

Variable    Intercept  x1      x2      x3      y
Intercept   46         1766    2320    105.2   2832
x1          1766       71378   90051   4107.2  103282
x2          2320       90051   117846  5344.7  140814
x3          105.2      4107.2  5344.7  244.62  6327
y           2832       103282  140814  6327    187722

The REG Procedure
Model: MODEL1
Dependent Variable: y

X'X Inverse, Parameter Estimates, and SSE

Variable    Intercept      x1            x2            x3            y
Intercept   3.2477116535   0.0092211391  -0.06793079   -0.067298817  158.49125167
x1          0.0092211391   0.0004560816  -0.000318596  -0.004662271  -1.141611847
x2          -0.06793079    -0.000318596  0.0023924814  -0.017710085  -0.442004262
x3          -0.067298817   -0.004662271  -0.017710085  0.4982577303  -13.47016319
y           158.49125167   -1.141611847  -0.442004262  -13.47016319  4248.8406818

Parameter Estimates

Variable    DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Type I SS
Intercept   1   158.49125           18.12589        8.74     <.0001    174353
x1          1   -1.14161            0.21480         -5.31    <.0001    8275.38885
x2          1   -0.44200            0.49197         -0.90    0.3741    480.91529
x3          1   -13.47016           omitted         omitted  omitted   364.15952

i) [4 points] Test whether there is a regression relation between Y and the explanatory variables X1, X2 and X3. State the null and the alternative hypotheses. Use α = 0.05.
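As a sketch (not part of the exam), the quantities needed for this global F test all follow from the crossproducts and X'X-inverse output above:

```python
# Sketch: global F test for question 2(i) from the SAS output above.
n = 46               # number of observations (the (1,1) entry of X'X)
p = 4                # parameters: intercept + 3 predictors
sum_y = 2832.0       # X'Y entry for the intercept row
sum_y_sq = 187722.0  # Y'Y
sse = 4248.8406818   # SSE from the X'X-inverse output

sst = sum_y_sq - sum_y ** 2 / n   # corrected total sum of squares
ssr = sst - sse
f = (ssr / (p - 1)) / (sse / (n - p))
print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, F = {f:.2f}")
```

The resulting F is compared with F(3, 42, 0.05).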
Sol: SSE = 4248.8406818 and SST = 187722 − (2832)²/46 = 13369.30, so SSR = SST − SSE. Calculate F = [SSR/3]/[SSE/(46 − 4)] and compare with F(3, 42, 0.05).

ii) [4 points] Calculate a 95% confidence interval for β3 (the coefficient of X3 in the above model).

Sol: beta3_hat = -13.47016 and S²(beta3_hat) = MSE × (the diagonal element of the X'X inverse corresponding to X3) = (SSE/(46 − 4)) × 0.4982577303. CI = beta3_hat ± t·S(beta3_hat).

iii) [4 points] Calculate a 95% confidence interval for β2 − β3 (β2 and β3 are the coefficients of X2 and X3 respectively in the above model).

Sol: The estimate of β2 − β3 is -0.44200 − (-13.47016). SE² of this estimate = S²(beta2_hat) + S²(beta3_hat) − 2·cov(beta2_hat, beta3_hat), where cov(beta2_hat, beta3_hat) = MSE × (the element of the X'X inverse in the row for X2 and the column for X3) = MSE × (-0.017710085).

iv) [4 points] Calculate and interpret the value of the coefficient of partial determination between Y and X2, given that X1 is in the model.

Sol: r²(Y, X2 | X1) = SSR(X2|X1)/SSE(X1). We have SST. SSR(X1) = Type I SS for X1 = 8275.38885 and SSE(X1) = SST − SSR(X1). SSR(X2|X1) = Type I SS for X2 = 480.91529.

v) [4 points] Test whether both X2 and X3 can be dropped from the model (i.e. keeping only X1 in the model). Use α = 0.05.

Sol: SSdrop = SSR(X2|X1) + SSR(X3|X1, X2) = 480.91529 + 364.15952 = 845.07; use the drop (partial F) test with F = [SSdrop/2]/MSE.

vi) [4 points] Test whether both X1 and X2 can be dropped from the model (i.e. keeping only X3 in the model). Use α = 0.05.

Sol: To calculate SSR(reduced), calculate b1 for the simple linear regression of Y on X3 and then SSR = b1²·Sxx; then use the drop test.

vii) [4 points] Give the ANOVA table (with all entries calculated) for the regression model for Y with the two independent variables X1 and X2.

3) (Based on q16 p173 Terry; this is q5 of the STAB27 Final, W08.) A company designing and marketing lighting fixtures needed to develop forecasts of sales (SALES = total monthly sales in thousands of dollars).
The company considered the following predictors:

ADEX = advertising expense in thousands of dollars
MTGRATE = mortgage rate for 30-year loans (%)
HSSTARTS = housing starts in thousands of units

The company collected data on these variables and the SAS output below was obtained from this study.

The REG Procedure
Model: MODEL1
Dependent Variable: SALES

Number of Observations Read  46

Analysis of Variance

Source            DF   Sum of Squares  Mean Square  F Value  Pr > F
Model             3    6187071         2062357      84.42    <.0001
Error             42   1026032         24429
Corrected Total   45   7213102

Root MSE         156.29883    R-Square  0.8578
Dependent Mean   1631.32609   Adj R-Sq  0.8476

Parameter Estimates

Variable    DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Variance Inflation
Intercept   1   1612.46147          432.34267       3.73     0.0006    0
ADEX        1   0.32736             0.43919         0.75     0.4602    2.82856
MTGRATE     1   -151.22802          39.74780        -3.80    0.0005    2.99244
HSSTARTS    1   12.86316            1.18625         10.84    <.0001    1.14005

[Plot of Residuals vs Predicted Values (Response: SALES)]

[Plot of Residuals vs Normal Scores (Response: SALES)]

i) [3 points] Calculate the value of R-squared for the regression of ADEX on MTGRATE and HSSTARTS.

Ans: VIF = 1/(1 − R-sq of ADEX on MTGRATE and HSSTARTS) = 2.82856, and so R-sq = 1 − 1/2.82856 = 0.646463.

Here is the complete output:

Regression Analysis: ADEX versus MTGRATE, HSSTARTS

The regression equation is
ADEX = 766 - 71.2 MTGRATE - 0.067 HSSTARTS

Predictor  Coef     SE Coef  T      P      VIF
Constant   765.54   94.38    8.11   0.000
MTGRATE    -71.219  8.516    -8.36  0.000  1.1
HSSTARTS   -0.0670  0.4118   -0.16  0.871  1.1

S = 54.2712   R-Sq = 64.6%   R-Sq(adj) = 63.0%

ii) State whether the following statements are true or false. Circle your answer. [1 point for each part]

a) The residual plots above show that the distribution of residuals is left-skewed. (True / False)
Ans: F

b) The residual plots above show clear evidence of non-constant variance of errors.
(True / False)
Ans: F

c) The small p-value (p = 0.000 from the ANOVA table) for the global F-test for model 1 implies that all three variables should be retained in the model. (True / False)
Ans: F

d) If we add another predictor to the above model with three predictors (so that we have 4 predictors), the SSE for that model (i.e. the model with 4 predictors) will be greater than 1026032. (True / False)
Ans: F; SSE decreases as k increases.

e) If we add another predictor to the above model with three predictors (so that we have 4 predictors), the SSRegression for that model (i.e. the model with 4 predictors) will be less than 6187071. (True / False)
Ans: F; SSReg increases as k increases.

f) If we add another predictor to the above model with three predictors (so that we have 4 predictors), the SSTotal for that model (i.e. the model with 4 predictors) will be less than 7213102. (True / False)
Ans: F; SST does not depend on the X's.

g) The value of the adjusted R-squared for the regression model for SALES on MTGRATE and HSSTARTS (i.e. with only two predictors) will be less than 0.8476. (True / False)
Ans: F

Regression Analysis: SALES versus MTGRATE, HSSTARTS

The regression equation is
SALES = 1863 - 175 MTGRATE + 12.8 HSSTARTS

Predictor  Coef     SE Coef  T      P      VIF
Constant   1863.1   270.4    6.89   0.000
MTGRATE    -174.54  24.40    -7.15  0.000  1.1
HSSTARTS   12.841   1.180    10.88  0.000  1.1

S = 155.489   R-Sq = 85.6%   R-Sq(adj) = 84.9%

4) [5 points] A researcher suspected that the systolic blood pressure of individuals is related to weight. He calculated the least squares regression equation of systolic blood pressure on weight based on a sample of 14 individuals. The estimated slope of this simple linear regression model was 0.13173 with a standard error of 0.04625 (i.e. b1 = 0.13173 and s_b1 = 0.04625). Calculate the correlation between systolic blood pressure and weight for this sample of individuals.
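A numeric sketch (not part of the exam) of the identities used here: for a single predictor, t² = F, R² = F/(F + df_error), and r takes the sign of the slope:

```python
import math

# Sketch: correlation recovered from the slope and its standard error (question 4).
b1 = 0.13173       # estimated slope
se_b1 = 0.04625    # standard error of the slope
n = 14

t = b1 / se_b1            # t statistic for H0: beta1 = 0
f = t ** 2                # for one predictor, F = t^2
r_sq = f / (f + (n - 2))  # since F = [R-sq/1] / [(1 - R-sq)/(n - 2)]
r = math.copysign(math.sqrt(r_sq), b1)
print(f"t = {t:.3f}, R-sq = {r_sq:.4f}, r = {r:.3f}")
```

The result agrees with the Pearson correlation of 0.635 in the MINITAB output below.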
Sol: t = 0.13173/0.04625 = 2.8482. F = R-sq/[(1 − R-sq)/(14 − 2)] = t² = 8.1123, and so R-sq = 8.11/(12 + 8.11) = 0.4033. Since b1 > 0, r = +√0.4033 = 0.635.

This question is based on the data from the summer 06 B22 final (regression question). Here are some useful outputs.

Systolic blood pressure readings of individuals are thought to be related to weight. The following MINITAB output was obtained from a regression analysis of systolic blood pressure on weight (in pounds).

Descriptive Statistics: Systolic, Weight

Variable  N   N*  Mean    SE Mean  StDev  Minimum  Q1      Median  Q3
Systolic  14  0   154.50  1.49     5.57   145.00   150.75  153.50  158.50
Weight    14  0   194.07  7.18     26.86  164.00   173.00  188.00  212.00

Correlations: Systolic, Weight

Pearson correlation of Systolic and Weight = 0.635

The regression equation is
Systolic = 129 + 0.132 Weight

Predictor  Coef     StDev    T      P
Constant   128.935  9.055    14.24  0.000
Weight     0.13173  0.04625  2.85   0.015

R-Sq = (omitted)

Analysis of Variance

Source           DF  SS      MS      F     P
Regression       1   162.75  162.75  8.11  0.015
Residual Error   12  240.75  20.06
Total            13  403.50

5) The data and some useful information on a response variable y and two explanatory variables x1 and x2 are given below:

y   3  5  8  11  7  6  12  3
x1  1  2  2  3   3  2  3   2
x2  1  1  2  2   4  4  2   3

            ( 1.67  -0.57  -0.11 )
(X'X)^-1 =  (-0.57   0.33  -0.08 )
            (-0.11  -0.08   0.12 )

a) [4 points] Estimate the linear regression model for y on the two explanatory variables x1 and x2.

Sol: Use (X'X)^-1 X'Y.

b) [6 points] MSE for the simple linear regression model of y on x1 is 4.5. Test for the lack of fit of this model (i.e. the simple linear regression model of y on x1) using pure error sums of squares.
Regression Analysis: y versus x1, x2

The regression equation is
y = - 0.75 + 4.41 x1 - 0.966 x2

Predictor  Coef     SE Coef  T      P
Constant   -0.746   2.642    -0.28  0.789
x1         4.407    1.181    3.73   0.014
x2         -0.9661  0.7033   -1.37  0.228

S = 2.04193   R-Sq = 73.6%   R-Sq(adj) = 63.0%

Analysis of Variance

Source           DF  SS      MS      F     P
Regression       2   58.028  29.014  6.96  0.036
Residual Error   5   20.847  4.169
Total            7   78.875

Source  DF  Seq SS
x1      1   50.161
x2      1   7.867

MTB > info

Information on the Worksheet

Column  Count  Name
C1      8      y
C2      8      x1
C3      8      x2

Matrix  M3  3 x 3  XPXI3

MTB > print XPXI3

Data Display

Matrix XPXI3

 1.67373   -0.572034  -0.110169
-0.572034   0.334746  -0.076271
-0.110169  -0.076271   0.118644

MTB > Regress 'y' 1 'x1' ;
SUBC>   Constant;

Regression Analysis: y versus x1

The regression equation is
y = - 1.64 + 3.79 x1

Predictor  Coef    SE Coef  T      P
Constant   -1.643  2.742    -0.60  0.571
x1         3.786   1.169    3.24   0.018

S = 2.18763   R-Sq = 63.6%   R-Sq(adj) = 57.5%

Analysis of Variance

Source           DF  SS      MS      F      P
Regression       1   50.161  50.161  10.48  0.018
Residual Error   6   28.714  4.786
Total            7   78.875

MTB > Regress 'y' 1 'x1' ;
SUBC>   Constant;
SUBC>   Pure;
SUBC>   Brief 2.

Regression Analysis: y versus x1

The regression equation is
y = - 1.64 + 3.79 x1

Predictor  Coef    SE Coef  T      P
Constant   -1.643  2.742    -0.60  0.571
x1         3.786   1.169    3.24   0.018

S = 2.18763   R-Sq = 63.6%   R-Sq(adj) = 57.5%

Analysis of Variance

Source           DF  SS      MS      F      P
Regression       1   50.161  50.161  10.48  0.018
Residual Error   6   28.714  4.786
  Lack of Fit    1   1.714   1.714   0.32   0.597
  Pure Error     5   27.000  5.400
Total            7   78.875

1 rows with no replicates

5) [5 points] Consider the simple linear regression model Yi = β0 + β1·Xi + εi with the usual assumptions (i.e. E(εi) = 0 for all i, V(εi) = σ² for all i, and Cov(εi, εj) = 0 whenever i ≠ j; the normality of the εi's is not required for the results below). Let b0 and b1 be the least squares estimators of β0 and β1 respectively, and let ei = Yi − Ŷi.
Prove that Var[ei] = σ²·[1 − 1/n − (Xi − X̄)² / Σⱼ(Xⱼ − X̄)²].

Sol: Var[ei] = Var[Yi − Ŷi] = Var[Yi] + Var[Ŷi] − 2·Cov[Yi, Ŷi].

Write S_XX = Σⱼ(Xⱼ − X̄)², kⱼ = (Xⱼ − X̄)/S_XX and k'ⱼ = 1/n − X̄·kⱼ, so that b1 = Σⱼ kⱼYⱼ and b0 = Σⱼ k'ⱼYⱼ. Then, since the Yⱼ are uncorrelated,

Cov[Yi, Ŷi] = Cov[Yi, b0 + b1·Xi] = Cov[Yi, Σⱼ k'ⱼYⱼ + Xi·Σⱼ kⱼYⱼ]
            = k'i·Var(Yi) + Xi·ki·Var(Yi) = σ²·[k'i + Xi·ki]
            = σ²·[1/n − X̄·ki + Xi·ki] = σ²·[1/n + (Xi − X̄)·ki]
            = σ²·[1/n + (Xi − X̄)²/S_XX].

Also Var[Ŷi] = σ²·[1/n + (Xi − X̄)²/S_XX], and so

Var[ei] = Var[Yi] + Var[Ŷi] − 2·Cov[Yi, Ŷi]
        = σ² + σ²·[1/n + (Xi − X̄)²/S_XX] − 2σ²·[1/n + (Xi − X̄)²/S_XX]
        = σ²·[1 − 1/n − (Xi − X̄)²/S_XX].  ∎

6) A psychologist conducted a study to examine the nature of the relation, if any, between an employee's emotional stability (X) and the employee's ability to perform in a task group (Y). Emotional stability was measured by a written test, for which the higher the score, the greater the emotional stability. Ability to perform in a task group (Y = 1 if able, Y = 0 if unable) was evaluated by the supervisor. The psychologist is considering a logistic regression model for the data. The SAS output below is based on the results for 27 employees.

The SAS System
The LOGISTIC Procedure

Model Information

Data Set                   WORK.A
Response Variable          Y
Number of Response Levels  2
Number of Observations     27
Model                      binary logit
Optimization Technique     Fisher's scoring

Response Profile

Ordered Value  Y  Total Frequency
1              0  13
2              1  14

Probability modeled is Y=1.
Testing Global Null Hypothesis: BETA=0

Test              Chi-Square  DF  Pr > ChiSq
Likelihood Ratio  8.1512      1   0.0043
Score             7.3223      1   0.0068
Wald              5.7692      1   0.0163

Analysis of Maximum Likelihood Estimates

Parameter  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept  1   -10.3089  4.3770          5.5472           0.0185
X          1   0.0189    0.00788         5.7692           0.0163

The LOGISTIC Procedure

Odds Ratio Estimates

Effect  Point Estimate  95% Wald Confidence Limits
X       omitted         omitted  omitted

i) [2 points] Estimate the probability that an employee with an emotional stability score of 500 (i.e. X = 500) will be able to perform the task.

ii) [4 points] Calculate a 90 percent confidence interval for the odds ratio of X.

7) A personnel officer in a company administered four aptitude tests to each of 25 applicants for entry-level clerical positions. For purposes of this study, all 25 applicants were accepted for positions irrespective of their test scores. After a period, each applicant was rated for proficiency (denoted by Y) on the job. The SAS output below is intended to identify the best subset of the four tests (denoted by X1, X2, X3, and X4).

The SAS System

The REG Procedure
Model: MODEL1
Dependent Variable: Y
Adjusted R-Square Selection Method

Number of Observations Read  25
Number of Observations Used  25

Number in  Adjusted
Model      R-Square   R-Square  C(p)      Variables in Model
3          0.9560     0.9615    3.7274    X1 X3 X4
4          0.9555     0.9629    5.0000    X1 X2 X3 X4
2          0.9269     0.9330    17.1130   X1 X3
3          0.9247     0.9341    18.5215   X1 X2 X3
2          0.8661     0.8773    47.1540   X3 X4
3          0.8617     0.8790    48.2310   X2 X3 X4
3          0.8233     0.8454    66.3465   X1 X2 X4
2          0.7985     0.8153    80.5653   X1 X4
1          0.7962     0.8047    84.2465   X3
2          0.7884     0.8061    85.5196   X2 X3
2          0.7636     0.7833    97.7978   X2 X4
1          0.7452     0.7558    110.5974  X4
2          (deleted)  0.4642    269.7800  X1 X2
1          0.2326     0.2646    375.3447  X1
1          0.2143     0.2470    384.8325  X2

Even though this SAS output is for the R-square selection method, it has useful information that can be used in other selection methods.
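As a sketch (not part of the exam), the C(p) and adjusted R-square columns above can be recomputed from the R-square column alone, since here n = 25 and the full model has 4 predictors. Small rounding differences from the printed SAS values are expected:

```python
# Sketch: Mallows C(p) and adjusted R-square from the R-square column above.
n = 25
r2_full = 0.9629                          # R-square of the full 4-predictor model
mse_full_factor = (1 - r2_full) / (n - 5) # (SSE_full/SST) divided by its df

def cp(r2, k):
    """C(p) for a k-predictor model: SSE_p/MSE_full - (n - 2p), where p = k + 1."""
    return (1 - r2) / mse_full_factor - (n - 2 * (k + 1))

def adj_r2(r2, k):
    """Adjusted R-square: 1 - (n-1)/(n-k-1) * (1 - R-square)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

print(cp(0.9615, 3))      # X1 X3 X4: printed C(p) = 3.7274
print(cp(0.9330, 2))      # X1 X3:    printed C(p) = 17.1130
print(adj_r2(0.4642, 2))  # X1 X2: the deleted adjusted R-square
```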
a) [5 points] Identify the variable that will enter the model at the second step of the stepwise regression procedure. Explain clearly how you identified this variable.

Sol: X3 has the largest R-sq among the four single-variable models, and so it enters the model at the first step (assuming F = [R-sq/df_reg]/[(1 − R-sq)/df_error] is significant at the required significance level to enter the model). Now, among the three two-variable models containing X3, the model with X1 has the highest R-sq, and so X1 has the highest t-ratio among the three models containing X3. If a variable is selected at this step, it must be X1 (see the stepwise output below).

b) [2 points] Identify the variables that you will select if you want to use Mallows' C(p) criterion. Explain clearly the reason for your answer.

Sol: Choose the model (other than the model with all variables) whose C(p) is close to the number of parameters (number of variables + 1), e.g. the model with X1, X3, X4, which has C(p) = 3.7274 (close to 4).

c) [3 points] Calculate the value of the adjusted R-square for the model with the predictors X1 and X2 only.
(Note: this is the model for which the adjusted R-square has been deleted in the above SAS output.)

Ans: 1 − (24/22)(1 − 0.4642) = 0.4155.

Here are some useful outputs:

The SAS System

The REG Procedure
Model: MODEL1
Dependent Variable: Y
Adjusted R-Square Selection Method

Number of Observations Read  25
Number of Observations Used  25

Number in  Adjusted
Model      R-Square   R-Square  C(p)      Variables in Model
3          0.9560     0.9615    3.7274    X1 X3 X4
4          0.9555     0.9629    5.0000    X1 X2 X3 X4
2          0.9269     0.9330    17.1130   X1 X3
3          0.9247     0.9341    18.5215   X1 X2 X3
2          0.8661     0.8773    47.1540   X3 X4
3          0.8617     0.8790    48.2310   X2 X3 X4
3          0.8233     0.8454    66.3465   X1 X2 X4
2          0.7985     0.8153    80.5653   X1 X4
1          0.7962     0.8047    84.2465   X3
2          0.7884     0.8061    85.5196   X2 X3
2          0.7636     0.7833    97.7978   X2 X4
1          0.7452     0.7558    110.5974  X4
2          0.4155     0.4642    269.7800  X1 X2
1          0.2326     0.2646    375.3447  X1
1          0.2143     0.2470    384.8325  X2

The SAS System

The REG Procedure
Model: MODEL2
Dependent Variable: Y

Number of Observations Read  25
Number of Observations Used  25

Stepwise Selection: Step 1

Variable X3 Entered: R-Square = 0.8047 and C(p) = 84.2465

Analysis of Variance

Source            DF   Sum of Squares  Mean Square  F Value  Pr > F
Model             1    7285.97715      7285.97715   94.78    <.0001
Error             23   1768.02285      76.87056
Corrected Total   24   9054.00000

Variable    Parameter Estimate  Standard Error  Type II SS  F Value  Pr > F
Intercept   -106.13284          20.44719        2071.05812  26.94    <.0001
X3          1.96759             0.20210         7285.97715  94.78    <.0001

Bounds on condition number: 1, 1
---------------------------------------------------------------------------
Stepwise Selection: Step 2

Variable X1 Entered: R-Square = 0.9330 and C(p) = 17.1130

Analysis of Variance

Source            DF   Sum of Squares  Mean Square  F Value  Pr > F
Model             2    8447.34255      4223.67128   153.17   <.0001
Error             22   606.65745       27.57534
Corrected Total   24   9054.00000

Variable    Parameter Estimate  Standard Error  Type II SS  F Value  Pr > F
Intercept   -127.59569          12.68526        2789.93352  101.17   <.0001
X1          0.34846             0.05369         1161.36540  42.12    <.0001
X3          1.82321             0.12307         6051.48790  219.45   <.0001

Bounds on condition number: 1.0338, 4.1351
---------------------------------------------------------------------------
Stepwise Selection: Step 3

Variable X4 Entered: R-Square = 0.9615 and C(p) = 3.7274

Analysis of Variance

Source            DF   Sum of Squares  Mean Square  F Value  Pr > F
Model             3    8705.80299      2901.93433   175.02   <.0001
Error             21   348.19701       16.58081
Corrected Total   24   9054.00000

Variable    Parameter Estimate  Standard Error  Type II SS  F Value  Pr > F
Intercept   -124.20002          9.87406         2623.35826  158.22   <.0001
X1          0.29633             0.04368         763.11559   46.02    <.0001
X3          1.35697             0.15183         1324.38825  79.87    <.0001
X4          0.51742             0.13105         258.46044   15.59    0.0007

Bounds on condition number: 2.8335, 19.764
---------------------------------------------------------------------------
All variables left in the model are significant at the 0.1500 level.
No other variable met the 0.1500 significance level for entry into the model.

Summary of Stepwise Selection

Step  Variable  Variable  Number   Partial   Model
      Entered   Removed   Vars In  R-Square  R-Square  C(p)     F Value  Pr > F
1     X3                  1        0.8047    0.8047    84.2465  94.78    <.0001
2     X1                  2        0.1283    0.9330    17.1130  42.12    <.0001
3     X4                  3        0.0285    0.9615    3.7274   15.59    0.0007

8) In a study of the larvae growing in a lake, the researchers collected data on the following variables:

Y = the number of larvae of the Chaoborus collected in a sample of the sediment from an area of approximately 225 cm² of the lake bottom
X1 = the dissolved oxygen (mg/l) in the water at the bottom
X2 = the depth (m) of the lake at the sampling point

Some useful SAS output for fitting the regression model Y = β0 + β1·X1 + β2·X2 + β3·X1X2 + ε, using the data from this study, is given below. Assume that the model given below is appropriate (i.e. satisfies all the necessary assumptions) to answer the questions below.
The REG Procedure
Model: MODEL1
Dependent Variable: Y

Number of Observations Read  14
Number of Observations Used  14

Analysis of Variance

Source            DF   Sum of Squares  Mean Square  F Value  Pr > F
Model             3    1311.15692      437.05231    28.23    <.0001
Error             10   154.84308       15.48431
Corrected Total   13   1466.00000

Parameter Estimates

Variable    DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Type I SS
Intercept   1   24.30070            7.91874         3.07     0.0119    5054.00000
X1          1   -2.31549            1.08721         -2.13    0.0590    1197.46122
X2          1   1.22493             0.95978         1.28     0.2307    113.67566
X1X2        1   -0.00563            0.15660         -0.04    0.9720    0.02003

State whether each of the following statements is true or false (based on the information given above). [1 point for each part]

i) The effect of the amount of oxygen dissolved in water (i.e. X1) on the number of larvae depends on the depth (at α = 0.1). (True / False)
Ans: F; the p-value for X1X2 is > 0.1.

ii) The terms X2 and X1X2 have no significant contribution to the model and so both these terms can be dropped from the above model (at α = 0.1). (True / False)
Ans: False. SSdrop = 113.67566 + 0.02003 = 113.70, so F = (113.70/2)/15.48431 = 3.67 with p-value = 1 − 0.936301 = 0.0637 < 0.10 (equivalently, F(2, 10, 0.10) = 2.92 < 3.67), and so we reject H0: β2 = β3 = 0.

iii) The value of the t-statistic for testing the null hypothesis H0: β2 = 1 against H1: β2 > 1 is greater than 1.20. (True / False)
Ans: F; t = (b2 − 1)/SE(b2) = (1.22493 − 1)/0.95978 = 0.234 < 1.20.

iv) The p-value of the t-test for testing the null hypothesis H0: β1 = 0 against H1: β1 > 0 is less than 0.10. (True / False)
Ans: F; since b1 < 0, the one-sided p-value = 1 − 0.0590/2 = 0.9705.

v) The sum of squares of errors (SSE) for the simple linear regression model of Y on X1 is greater than 150.0. (True / False)
Ans: T; it is at least the SSE for the bigger model above, i.e. 154.84.

Multiple-choice questions (Miscellaneous) (2 points for each question)

9) If the slope of a least squares regression line of Y on X is negative, what else must be negative?
A) The correlation of X and Y
B) The slope of a least squares regression line of X on Y
C) The coefficient of determination (R-sq) for the regression of Y on X
D) More than one of the above must be negative
E) None of the above need be negative

Ans: D; the correlation of X and Y and the slope of X on Y must both be negative.

10) If there were no linear relationship between X and Y (i.e. correlation r = 0), what would be the predicted value of Y (predicted using the estimated least squares regression equation) at any given value of X?

A) 0
B) The mean of the Y values (i.e. Ȳ)
C) The mean of the X values (i.e. X̄)
D) The mean of the Y values minus the mean of the X values (i.e. Ȳ − X̄)
E) It depends on the variance of Y

Ans: B

Total: 95 points