Sample midterm2 (Solutions)
Business Statistics 41100
Instructor: Federico M. Bandi

The allotted time is 1 hour and 30 minutes. The exam is divided into three parts. The first and second parts are true-false and multiple choice, respectively. Please answer the true-false and multiple choice questions on the exam by circling the best answer. There will be no partial credit for these questions. The third part of the exam consists of two problems. Please answer these problems in the space provided on the exam (you may use the back of the sheets if necessary). You will get partial credit for these problems provided that your answers are organized and legible so that your train of thought can be easily followed.

Note: You should answer all questions on the exam. The blue books will not be looked at. Please print your name in the space provided below and sign. Panicking is not allowed and will be penalized!

Name:

Please sign the following pledge: “I pledge my honor that I have not violated the Honor Code during this examination.”

Signature:

True/False          8 Points
Multiple Choice    18 Points
Question 1         11 Points
Question 2         26 Points
Total              63 Points

True or False (1 point each)

[1] If we converted the X variable from dollars to 100s of dollars, then the slope estimate $b_1$ in regression analysis would become 100 times larger.
T F

True. Here is one way to see this. We know that

$$b_1 = r_{xy} \times \frac{s_y}{s_x}.$$

Also, we know that the standard deviation of a data set is in the same units as the data set. Hence, if we divide the data set X by 100, the standard deviation $s_x$ will also be divided by 100 and, consequently, the slope will be multiplied by 100 (since $r_{xy}$ and $s_y$ are unchanged).

[2] If the $R^2$ of a simple linear regression is zero, then the Y and the X variable must be uncorrelated.
T F

True. The R-squared of a regression of Y on X is equal to the squared sample correlation between Y and X. Hence, an R-squared equal to zero implies a correlation (and a covariance) equal to zero.
[3] If the t-statistic t is distributed as a t distribution with 2 degrees of freedom, then $P(-1.96 < t < 1.96) > 0.95$.
T F

False. The t distribution has more probability in the tails than the standard normal distribution. Since the standard normal puts probability 0.95 between −1.96 and 1.96, the t distribution puts less than 0.95 there.

[4] If the assumptions of the SLR model hold, then a histogram of the standardized residuals should look normal with mean equal to 0 and standard deviation equal to 1.
T F

True. One of the assumptions of the SLR model is that the true residuals are normally distributed with mean zero and variance $\sigma^2$. Hence, the true standardized residuals should be normally distributed with mean zero and variance 1. The same applies to the estimated standardized residuals (but, of course, normality would be an approximation in this case).

[5] It is possible to reject a certain null hypothesis about a certain parameter of interest (the slope in the SLR model, say) even when the estimated value from the sample is equal to the conjectured value.
T F

False. If the estimated value from the sample is the same as the conjectured value, then the t statistic is equal to zero, the p-value is equal to 1, and we always fail to reject.

[6] The p-value of a test is the probability of rejecting the null hypothesis when the null hypothesis is true.
T F

False. The probability of rejecting the null hypothesis when the null hypothesis is true is the level of the test (5%, 1%, and so on).

[7] It is possible to have a residual plot that shows evidence of both non-linearity and heteroskedasticity.
T F

True. The residuals might have a nonlinear pattern to them as well as increasing dispersion around some underlying curve.

[8] A Durbin-Watson statistic that is equal to 2 provides evidence in favor of correlation in the residuals.
T F

False. When the DW statistic is equal to 2 we have no evidence of autocorrelation (see Chapter 5 in the notes).
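Two of the true-false answers above can be checked numerically. The sketch below is only an illustration: the data values and the helper names `ols_slope` and `t2_central_prob` are made up for this example and are not part of the exam. It checks that dividing X by 100 multiplies the least-squares slope by 100 (question [1]) and that a t distribution with 2 degrees of freedom puts less than 95% probability between −1.96 and 1.96 (question [3]), using the closed-form CDF that exists for 2 degrees of freedom.

```python
import math
from statistics import NormalDist, mean


def ols_slope(x, y):
    """Least-squares slope: sum of cross-deviations over sum of squared x-deviations."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def t2_central_prob(c):
    """P(-c < t < c) for a t distribution with 2 degrees of freedom.

    Uses the closed-form CDF F(t) = 1/2 + t / (2 * sqrt(t^2 + 2)),
    so F(c) - F(-c) = c / sqrt(c^2 + 2).
    """
    return c / math.sqrt(c * c + 2.0)


# Hypothetical data (not from the exam): X measured in dollars.
x_dollars = [1200.0, 2500.0, 3100.0, 4800.0, 5600.0]
y = [3.1, 4.9, 6.2, 8.8, 10.1]
x_hundreds = [xi / 100.0 for xi in x_dollars]  # same X in 100s of dollars

slope_dollars = ols_slope(x_dollars, y)
slope_hundreds = ols_slope(x_hundreds, y)
ratio = slope_hundreds / slope_dollars

p_t2 = t2_central_prob(1.96)
p_normal = NormalDist().cdf(1.96) - NormalDist().cdf(-1.96)

print(f"slope ratio after rescaling X: {ratio:.6f}")
print(f"P(|t_2| < 1.96) = {p_t2:.4f}")
print(f"P(|Z|   < 1.96) = {p_normal:.4f}")
```

The slope ratio does not depend on the particular data chosen, since rescaling X leaves the correlation and $s_y$ unchanged.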
Multiple Choice (3 points each)

[1] Consider the following variance-covariance matrix (i.e., variances are on the diagonal, covariances are off the diagonal) of monthly returns on the S&P 500 index and monthly returns on the Windsor mutual fund.

            S&P 500       Windsor
S&P 500     0.00230401    0.00215591
Windsor     0.00215591    0.00236580

If the beta of Windsor is the slope estimate of a regression of Windsor on the S&P 500, what is the Windsor beta?
(a) 0.9112
(b) 0.95
(c) 0.9357
(d) 1
(e) Cannot be computed based on the information given

The formula for the slope coefficient is

$$b_1 = \frac{\mathrm{Covariance}(Y, X)}{\mathrm{Variance}(X)}$$

(see Chapter 1 in the notes). Hence,

$$b_1 = \frac{0.00215591}{0.00230401} = 0.9357,$$

which is answer (c).

[2] Consider the following pairs of X and Y values: (2, 5), (3, 7), (4, 9), (5, 12), (10, ??). What value should ?? be for a regression of Y on X to deliver an $R^2$ equal to 1?
(a) 20
(b) 21
(c) 22
(d) No value can work
(e) None of the above

No value can work, so the answer is (d). For the $R^2$ to be equal to 1, the five pairs should all lie on a single (here, positively-sloped) line. The first three pairs are on the line Y = 1 + 2X, but the fourth one is not (that line gives Y = 11 at X = 5, not 12). No choice of ?? would make all the pairs lie on the same line.

[3] Consider the following SLR model:

$$Y_i = 1 + 2X_i + \varepsilon_i, \quad \text{where } \varepsilon_i \sim N(0, 4) \text{ i.i.d.}$$

The error term $\varepsilon_i$ is independent of X for every i. If X = 2, then Y is:
(a) equal to 5
(b) N(5, 4)
(c) N(0, 4)
(d) N(5, 2)
(e) None of the above

The answer is (b). If X = 2, then Y = 5 + ε, which implies that E(Y) = 5 and V(Y) = 4 since the error term has mean 0 and variance 4. Finally, note that Y is a linear function of a normal random variable since the error term is normally distributed. Linear functions of normal random variables are normal, thus Y is N(5, 4).

[4] Which of the following statements is FALSE?
(a) If we are simply interested in predicting Y given X, then non-linearity is a more serious violation of the standard assumptions than heteroskedasticity
(b) If the estimated value from the sample is equal to 1.0001 and the conjectured value is equal to 1, then we always fail to reject the hypothesis
(c) The larger the t-statistic, the smaller the p-value, the more we want to reject
(d) The sample means of Y and X lie on the regression line
(e) It is easier to estimate E(Y|X) than Y given X
(f) None of the above

Point (b) is false. If the standard error is very small, then the t statistic might be large enough to reject the hypothesis. The intuition is the following. If the sample allows us to estimate the parameter very well (with a small standard error, that is), then even small deviations from the conjecture might lead to rejections of the hypothesis (since the sample is very informative about the true parameter). See Chapter 4, Statistical vs. Practical Significance.

[5] A class of 100 students has just taken an exam. The exam consisted of 40 true-false questions. A diligent teaching assistant has recorded the number of correct answers (Y) and wrong answers (X) for each student. Subsequently, the diligent (but not so bright) teaching assistant has regressed Y on X. Which of the following statements is WRONG?
(a) The estimated intercept is 40
(b) The estimated slope is −1
(c) The estimated s value is 0
(d) The correlation between Y and X is 1
(e) The $R^2$ is 1
(f) None of the above

The relationship between correct answers and wrong answers is exactly Y = 40 − X. Hence, the wrong statement is (d): the correlation between Y and X is −1, not 1.

[6] A common belief in finance is that days with more trading activity also tend to be days with larger price moves (positive or negative). Intuitively, the greater the disagreement about the value of the asset across investors, the greater the trading activity and the greater the price moves.
A researcher decides to measure trading activity by the number of contracts traded (volume). The researcher regresses the absolute value of returns (absret) on volume and obtains the following regression output.

The regression equation is absret = −0.0367 + 0.00000115 volume

            Estimate      Std error     T-ratio   P-value
Intercept   −0.0367       0.3342        −0.11     0.913
volume      0.00000115    0.00000028    4.10      ??

Taking into account that the sample is very large, which of the following statements is WRONG?
(a) Statistically the intercept does not play much of a role
(b) The missing p-value is smaller than the reported p-value
(c) An approximate 68% confidence interval for the true slope is 0.00000115 ± 1 × 0.00000028
(d) At the 1% level, we fail to reject the null hypothesis that the true slope is equal to 0
(e) None of the above

The wrong statement is (d). With a very large number of observations (as in the assumptions of the problem), the t cut-off value for a 1% test is essentially the standard normal one, 2.576 (conservatively, about 3). Since 4.10 exceeds either value, we reject the null.

Long Problems

[1] (11 points) The market model that we discussed in class implies that the relationship between the return on any stock A ($R_t^A$) and the return on the market ($R_t^M$) can be represented using the following SLR model:

$$R_t^A = \beta_0 + \beta_1 R_t^M + \varepsilon_t, \quad t = 1, 2, 3, \ldots$$

Assume all the standard assumptions of the SLR model are satisfied (for example, the errors are iid and normally distributed with mean 0 and variance $\sigma^2$). Assume you are going to be given the following quantity:

$$b = \frac{R_2^A - R_1^A}{R_2^M - R_1^M}$$

(a) (3 points) Compute the expected value of b, E(b). (Hint: treat the returns on the market as being non-random. Only the returns on the stock are random and, as always, their randomness is induced by the error terms through the model.)

As in Chapter 2 (Section 2.5), plug the model values into b first:

$$b = \frac{R_2^A - R_1^A}{R_2^M - R_1^M} = \frac{\beta_0 + \beta_1 R_2^M + \varepsilon_2 - \beta_0 - \beta_1 R_1^M - \varepsilon_1}{R_2^M - R_1^M} = \beta_1 + \frac{\varepsilon_2 - \varepsilon_1}{R_2^M - R_1^M}.$$

Hence,

$$E(b) = E\left(\beta_1 + \frac{\varepsilon_2 - \varepsilon_1}{R_2^M - R_1^M}\right) = \beta_1 + \frac{E(\varepsilon_2) - E(\varepsilon_1)}{R_2^M - R_1^M} = \beta_1,$$

by applying the standard formula for the expected value of a linear combination of random variables.

(b) (2 points) Interpret your result from part (a). One or two sentences will more than suffice.

b is an unbiased estimator of the beta of the stock. (One can show that it is not as good as the least-squares estimator, but this is another story...)

(c) (3 points) Compute the variance of b, V(b). (Hint: same as for part (a).)

$$V(b) = V\left(\beta_1 + \frac{\varepsilon_2 - \varepsilon_1}{R_2^M - R_1^M}\right) = \frac{Var(\varepsilon_2) + Var(\varepsilon_1)}{\left(R_2^M - R_1^M\right)^2} = \frac{2\sigma^2}{\left(R_2^M - R_1^M\right)^2},$$

by applying the standard formula for the variance of a linear combination of random variables (the covariance term vanishes because the errors are independent).

(d) (3 points) What is the probability that $b > \beta_1$, i.e., $P(b > \beta_1)$?

As always, b is normally distributed since $\varepsilon_2$ and $\varepsilon_1$ are normally distributed. Hence,

$$b \sim N\left(\beta_1, \frac{2\sigma^2}{\left(R_2^M - R_1^M\right)^2}\right)$$

and, since the normal distribution is symmetric about its mean $\beta_1$, $P(b > \beta_1) = 0.5$.

[2] (26 points) We are interested in the relation between someone’s GMAT (0-800) and SAT (0-1600) scores. We have data on 41 GSB students. We use GMAT as the response variable (the regressand) and SAT as the explanatory variable (the regressor) and obtain the following regression:

gmat = 403.62 + 0.214 sat

Table 1
            Estimate   Std error   T-ratio   P-value
Intercept   403.62     51.63       ??        0.000
slope       0.21431    0.03921     5.47      0.000

s = 30.39 and $R^2$ = ??

Table 2: Analysis of Variance
Source           DF   SS
Regression       1    27597
Residual Error   39   ??
Total            40   63620

(a) (2 points) Give an interpretation for the sign and magnitude of the estimated slope coefficient.

A higher SAT score is associated with a higher GMAT score on average. Theoretically (but only theoretically given the way SAT and GMAT scores are given), an increase in the SAT score of one point implies an (average) increase of about a fifth of a point in the GMAT score.

(b) (2 points) What is the value of the missing t-ratio?

$$t = \frac{403.62}{51.63} \approx 7.82$$

(c) (3 points) Test the hypothesis that the slope is equal to 0 at the 1% level.
(You should be very precise here.)

The reported p-value is 0.000 < 0.01 ⇒ reject.

(d) (3 points) Test the hypothesis that the intercept is equal to 500 at the 5% level. (You should be as precise as possible here.)

$$t = \left|\frac{403.62 - 500}{51.63}\right| \approx 1.87 < \text{about } 2 \Rightarrow \text{fail to reject}$$

(e) (2 points) You have a younger cousin whose SAT score is 1400. Use the following output to predict your younger cousin’s GMAT score. Choose only one interval and explain your choice.

Table 3
Fit      St. Error Fit   95% CI              95% PI
703.66   5.89            (691.76, 715.57)    (641.05, 766.28)

You should choose the 95% PI since your younger cousin is going to take the test only once. The 95% CI would be an interval for the expected (average) score over several trials. (See Chapter 4, subsection 4.7, in the notes.)

(f) (3 points) Table 3 above gives you $s_{fit}$. What is the standard error of the predicted value ($s_{pred}$ from class)?

$$s_{pred}^2 = s_{fit}^2 + s^2 = 5.89^2 + 30.39^2 = 34.69 + 923.55 = 958.24 \Rightarrow s_{pred} \approx 30.96.$$

(g) (3 points) Use your result from part (f) to find the t cut-off value $t_{39,0.025}$.

The upper end of the 95% PI is the fit plus the cut-off value times $s_{pred}$:

$$766.28 = 703.66 + t_{39,0.025} \times 30.96 \Rightarrow t_{39,0.025} = \frac{766.28 - 703.66}{30.96} \approx 2.02.$$

(h) (2 points) Use your result from part (g) to re-run the test in part (d). (You should be very precise now.)

$$t = \left|\frac{403.62 - 500}{51.63}\right| \approx 1.87 < 2.02 \Rightarrow \text{fail to reject}$$

(i) (3 points) What value is the missing term in Table 2?

SSE = 63620 − 27597 = 36023.

(j) (3 points) What value is the missing $R^2$?

$$R^2 = \frac{27597}{63620} \approx 0.434 = 43.4\%.$$
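The arithmetic in Problem 2 can be reproduced with a few lines of Python. This is a minimal sketch that only re-computes the quantities from the numbers printed in Tables 1 to 3. The variable names are chosen for this illustration and do not come from the exam or the course notes.

```python
import math

# Values copied from Tables 1-3 of Problem 2.
b0, se_b0 = 403.62, 51.63            # intercept estimate and its standard error
s, s_fit = 30.39, 5.89               # regression standard error, std. error of the fit
fit, pi_upper = 703.66, 766.28       # fitted value and upper end of the 95% PI
ss_reg, ss_total = 27597.0, 63620.0  # regression and total sums of squares

t_intercept = b0 / se_b0                 # part (b): t-ratio for H0: intercept = 0
t_vs_500 = abs(b0 - 500.0) / se_b0       # parts (d), (h): |t| for H0: intercept = 500
s_pred = math.sqrt(s_fit**2 + s**2)      # part (f): s_pred^2 = s_fit^2 + s^2
t_cutoff = (pi_upper - fit) / s_pred     # part (g): back out t_{39,0.025} from the PI
sse = ss_total - ss_reg                  # part (i): residual sum of squares
r_squared = ss_reg / ss_total            # part (j): R^2 = SSR / SST

print(f"(b) t-ratio for the intercept : {t_intercept:.4f}")
print(f"(d) |t| for intercept = 500   : {t_vs_500:.4f}")
print(f"(f) s_pred                    : {s_pred:.4f}")
print(f"(g) implied t cut-off         : {t_cutoff:.4f}")
print(f"(h) fail to reject at 5%?     : {t_vs_500 < t_cutoff}")
print(f"(i) SSE                       : {sse:.0f}")
print(f"(j) R^2                       : {r_squared:.4f}")
```

The printed values agree with the worked answers up to rounding in the last reported digit, and the test conclusions in parts (d) and (h) do not depend on that rounding.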