Multiple regression analysis: Further topics
EC 252: Introduction to Econometric Methods
Abhimanyu Gupta
March 2015
Today's lecture:

- How can we compare different sets of estimates when the measurement units are different?
- How can we accurately calculate the percentage effect in a log model?
- How can we interpret the coefficients when we use quadratic functions?
- The adjusted R-squared measure, and how to select between different models.
- How can we use our estimates for residual analysis?

Outline:

- Effects of Data Scaling on OLS Statistics
- Functional form: more on logarithmic functional forms; percentage effects in log models; models with quadratics; models with interaction terms
- Adjusted R-squared
- Residual Analysis

Reading: Wooldridge, Chapter 6.

Effects of Data Scaling on OLS Statistics

We begin by examining the effect of rescaling the dependent or independent variables. We will see that the OLS estimates change in ways that preserve all measured effects and testing outcomes.

Many variables do not have a natural scale:

- pounds versus thousands of pounds;
- degrees Celsius versus degrees Fahrenheit;
- time in months versus years.

Application: how can we compare different sets of estimates when the variables have been rescaled? Or, to put it another way: does the choice of measurement units matter?

- In Problem Set 5 (Q.3), we used the definition of the OLS estimator to show how estimates change when we rescale variables.
- Here we take a different approach, and rewrite our estimation model directly.

Example: salaries of professional basketball players.

  \widehat{\log(wage)} = \beta_0 + \beta_1 minutes + \beta_2 points + \beta_3 exper + \beta_4 expersq,
  n = 269, R^2 = 0.49,

where minutes is minutes played per season and points is average points per game.

Question: suppose you replaced minutes by hours played per season (hours); how would the coefficient change?

1. The two variables are related by: minutes = 60 x hours.
2. Thus, we can rewrite the model as

     \widehat{\log(wage)} = \beta_0 + \beta_1 (60 \cdot hours) + \beta_2 points + \beta_3 exper + \beta_4 expersq
                          = \beta_0 + (60\beta_1) hours + \beta_2 points + \beta_3 exper + \beta_4 expersq.

3. Conclusion: we expect the coefficient on hours to be 60 times as large as the coefficient on minutes.

Example (regression results for basketball salaries). Dependent variable: log(wage); standard errors in parentheses.

                   (1)                     (2)
  minutes          0.000159 (0.0000808)    .
  hours            .                       0.00954  (0.00485)
  points           0.0612   (0.0120)       0.0612   (0.0120)
  exper            0.156    (0.0384)       0.156    (0.0384)
  expersq          -0.00612 (0.00276)      -0.00612 (0.00276)
  constant         5.493    (0.114)        5.493    (0.114)
  R^2              0.491                   0.491
  Observations     269                     269

Effects of Data Scaling on OLS Statistics (ctd.)

Similarly, suppose we rescale the dependent variable. Given the fitted values

  \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x,

suppose we had multiplied y by a constant c, so that \tilde{y} := c \hat{y}. We can rewrite the model:

  \tilde{y} = c \hat{y} = c (\hat{\beta}_0 + \hat{\beta}_1 x) = (c \hat{\beta}_0) + (c \hat{\beta}_1) x.

Conclusion: we expect all the coefficients to be multiplied by the factor c.

What if some variables enter the regression in logarithms?

- If the dependent variable is in logarithmic form, changing its unit of measurement does not affect the slope coefficients: \log(c_1 y_i) = \log(c_1) + \log(y_i) for any constant c_1 > 0, so only the intercept changes, to \hat{\beta}_0 + \log(c_1).
- Similarly, changing the unit of measurement of any x_j that enters the regression as \log(x_j) only affects the intercept: \beta_1 \log(c x_i) = \beta_1 \log(c) + \beta_1 \log(x_i).

This corresponds to what we know about percentage changes and elasticities: they are invariant to the units of measurement of either y or the x_j. (A simulated sketch of these rescaling facts follows below.)
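As a check on the algebra above, here is a minimal simulated sketch in Python. It is not from the lecture: the data-generating numbers and variable names are made up for illustration, and plain least squares via numpy stands in for whatever software the course uses.

```python
# Rescaling a regressor by 1/60 (minutes -> hours) multiplies its slope by 60
# and leaves the intercept and the fit unchanged. Simulated data only.
import numpy as np

rng = np.random.default_rng(0)
n = 269
minutes = rng.uniform(100, 3000, n)                   # minutes played per season
log_wage = 5.5 + 0.0002 * minutes + rng.normal(0, 0.5, n)

def ols(y, x):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    X = np.column_stack([np.ones(len(y)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_minutes = ols(log_wage, minutes)
b_hours = ols(log_wage, minutes / 60)                 # hours = minutes / 60

print(b_hours[1] / b_minutes[1])                      # ~60: slope scales up by 60
print(np.isclose(b_minutes[0], b_hours[0]))           # True: intercept unchanged
```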
More on Logarithmic Functional Forms

We have already seen that we often write the dependent variable and/or some independent variables in logarithmic form:

  \widehat{\log(y)} = \hat{\beta}_0 + \hat{\beta}_1 \log(x_1) + \hat{\beta}_2 x_2.

Why use logarithms?

1. Coefficients have an appealing interpretation: we can ignore the units of measurement of variables appearing in log form.
2. When y > 0, models using log(y) as the dependent variable often satisfy the CLM assumptions more closely than models using y.
3. Taking logs usually narrows the range of the variable, so the estimates are less sensitive to outliers.

Some rules of thumb:

- When a variable is a (positive) currency amount, the log is often taken (e.g. wages, salaries, firm sales).
- Variables that take large integer values also often appear in logarithmic form (e.g. population, total number of employees, pupils).
- Variables measured in years usually appear in their original form (e.g. education, experience, age, tenure).
- Variables that are proportions or percentages are usually used in level form (e.g. the unemployment rate).

Limitation: the log cannot be used if a variable takes on zero or negative values.

- We often deal with variables that can take the value 0 in principle (wealth, income).
- In cases where a variable y is nonnegative but can take on the value 0, log(1 + y) is sometimes used.

Interpretation of the coefficients in a log model

Consider the general estimated model:

  \widehat{\log(y)} = \hat{\beta}_0 + \hat{\beta}_1 \log(x_1) + \hat{\beta}_2 x_2.

- Take x_1 to be fixed, and change x_2 by \Delta x_2 ...
- ... or imagine that there are two units with the same value of x_1, but x_2 values that differ by \Delta x_2.
- Then the predicted difference in the dependent variable is \Delta \widehat{\log(y)} = \hat{\beta}_2 \Delta x_2.

So far we have used the approximation %Δy ≈ 100 · Δlog(y).

- Example: if log(y) goes up by 0.01, this represents an increase in y of approximately 1%.
- However, this approximation becomes less accurate as the change in log(y) becomes larger.

Fortunately, we can compute the exact percentage change in y as follows:

  \%\Delta \hat{y} = 100\,[\exp(\hat{\beta}_2 \Delta x_2) - 1].

Example (computing the exact percentage change for y). Suppose we obtain the following estimates in a regression:

  \widehat{\log(y)} = 0.988 + 0.503 x.

Question: what is the percentage effect of increasing x from 2.3 to 4.6?

1. What would be the prediction using our previous approximation?
2. What is the exact percentage change?
3. How far are we off by using the approximation? (A sketch working through the arithmetic follows below.)
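The slide leaves its three questions open; since they are pure arithmetic given the formula above, here is a short Python sketch working them through.

```python
# Approximate vs exact percentage change in y when x rises from 2.3 to 4.6
# in the fitted model log(y)-hat = 0.988 + 0.503 x.
import math

beta, dx = 0.503, 4.6 - 2.3                 # slope and change in x

approx = 100 * beta * dx                    # %dy ~ 100 * d(log y)
exact = 100 * (math.exp(beta * dx) - 1)     # %dy = 100 [exp(beta*dx) - 1]

print(f"approximate: {approx:.1f}%")        # ~115.7%
print(f"exact:       {exact:.1f}%")         # ~218.0%
print(f"off by:      {exact - approx:.1f} percentage points")
```

With a change in log(y) this large, the approximation badly understates the true effect, which is exactly the slide's point.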
Models with Quadratic Terms

- Quadratic functions are often used in applied economics to capture decreasing or increasing marginal effects.
- Simplest case: y = \beta_0 + \beta_1 x + \beta_2 x^2 + u.
- The estimated equation is: \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2.
- Take derivatives to get the partial effect: d\hat{y}/dx = \hat{\beta}_1 + 2\hat{\beta}_2 x.
- So we have the approximation \Delta\hat{y} \approx (\hat{\beta}_1 + 2\hat{\beta}_2 x)\,\Delta x, which can readily be seen to depend on x itself.

Example (At which experience is log wage expected to be highest?)

A common empirical case is one of positive, but declining, returns:

  \widehat{\log(wage)} = 5.493 + 0.00016 minutes + 0.061 points + 0.155 exper - 0.006 expersq,
                         (0.114) (0.00008)         (0.012)       (0.038)      (0.00276)

Holding the other covariates constant:

  \widehat{\log(wage)} = constant + 0.155 exper - 0.006 exper^2.

Find the experience level at which the predicted wage is highest by solving the first-order condition 0.155 - 2(0.006) exper = 0, which gives exper* = 0.155/0.012 ≈ 12.92.

[Figure: predicted log-wage profile against years of experience (0 to 20), rising to a peak at exper* ≈ 12.92 and declining thereafter.]

Models with Quadratics (ctd.)

- We can combine the use of quadratics along with logarithms...
- ...but extra care is needed to figure out the correct partial effects (Wooldridge, pp. 192-197).
- Other polynomial terms can also be included in regression models, such as a cubic term, a quartic term, etc. The interpretation proceeds along similar lines as in the quadratic case.
- Polynomials in x are an example of how we can allow for a flexible effect of x on y in a multiple linear regression model.

Models with Interaction Terms

- Sometimes it is natural for the partial effect of an explanatory variable on the dependent variable to depend on the magnitude of another explanatory variable.
- Example: consider the following model of house prices:

    price = \beta_0 + \beta_1 sqrft + \beta_2 bdrms + \beta_3 sqrft \times bdrms + \beta_4 bthrms + u,

  where sqrft refers to size in square feet, bdrms to the number of bedrooms, and bthrms to the number of bathrooms. There is an interaction effect between square footage and the number of bedrooms.
- It is important to be able to understand and use models with interaction effects.
- The partial effect of bdrms on price (holding all other variables fixed) in this model is:

    \Delta price / \Delta bdrms = \beta_2 + \beta_3\,sqrft.

- If \beta_3 > 0, an additional bedroom yields a higher increase in housing price for larger houses: we say that the effect of an additional bedroom increases with the size of the house. (A small numerical illustration follows below.)
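Here is a tiny Python illustration of this partial effect. The coefficient values are hypothetical, chosen only to show the mechanics; they are not estimates from the lecture.

```python
# Partial effect of one extra bedroom in the interacted house-price model:
# d(price)/d(bdrms) = b2 + b3 * sqrft.
b2 = -20000.0   # bdrms coefficient (hypothetical)
b3 = 25.0       # sqrft x bdrms interaction coefficient (hypothetical)

def bedroom_effect(sqrft):
    """Predicted price change from one additional bedroom at a given size."""
    return b2 + b3 * sqrft

for s in (500, 1500, 3000):
    print(f"sqrft = {s}: extra bedroom worth {bedroom_effect(s):+,.0f}")
```

With b3 > 0 the effect rises with square footage; in this made-up example an extra bedroom even lowers the predicted price in very small houses, a pattern an interaction term can capture and a purely additive model cannot.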
Goodness-of-Fit and Selection of Regressors

Recall:

- A small R² implies that the error variance is large relative to the variance of y, so it might be difficult to estimate the βj precisely.
- However, a large error variance can be offset by a large sample size: if we have enough data, we may be able to estimate the βj precisely even though we have not controlled for many unobserved factors.
- Note: poor explanatory power has nothing to do with unbiased estimation of the βj! That is determined by the zero conditional mean assumption (MLR.4).

Recall the definition of our standard R-squared measure, and rewrite it slightly:

  R^2 = 1 - SSR/SST = 1 - (SSR/n)/(SST/n),

where SSR/n is an estimate of σ_u² and SST/n is an estimate of σ_y². What happens as we include more and more variables?

- SSR goes down (better fit).
- SST stays the same (it depends only on the observed data, not on our estimates).

So if we simply try to achieve a high R², we will include more and more variables. It may therefore be useful to adjust for the number of regressors we have included, as a "fair way" of assessing how much of the variation our model explains. To do so, replace the estimators for σ_u² and σ_y² with their unbiased counterparts. Then the 'adjusted' goodness-of-fit measure is as follows.

Definition (adjusted R-squared):

  \bar{R}^2 = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)}.

Intuition:

- Adding variables reduces SSR, which raises R̄²...
- ...but a larger k also shrinks n - k - 1, so SSR/(n - k - 1) rises, which lowers R̄²...
- ...making the overall effect ambiguous.

We now understand that R̄² imposes a penalty for adding additional independent variables to a model.

- If we add a new independent variable to a regression, R̄² increases if, and only if, the t statistic on the new variable is greater than one in absolute value.
- If we add a new group of independent variables to a regression, R̄² increases if, and only if, the F statistic for the joint significance of the new variables is greater than unity.

We have already seen how we can decide whether to exclude a particular set of variables from our model:

- a t-test for an individual variable;
- an F-test for sets of variables.

This is a form of model selection, but it only works for comparing nested models: one model (the restricted model) is a special case of the other (the unrestricted model). We can use R̄² to compare non-nested models.

One use of R̄² is as an aid to selecting the best functional form for your regression when non-nested sets of covariates are under consideration.

- For example, consider the following two models:

    y = \beta_0 + \beta_1 \log(x) + u   and   y = \beta_0 + \beta_1 x + \beta_2 x^2 + u.

- How would you decide which specification to adopt? No t- or F-test helps here...
- ...and the first model contains one fewer parameter, so R² would not allow a fair comparison.
- One option is to adopt R̄² as a decision criterion.

This approach does not work for deciding which functional form is appropriate for the dependent variable. In Lecture 10, we look at a specific test for functional form.

Residual Analysis

Sometimes we are interested not in the systematic relationship (summarized by the coefficient estimates), but in the deviations from the expected value for specific observations. After estimating the coefficients, we can compute the residual for each observational unit. Recall that

  \hat{u}_i = y_i - \hat{y}_i.

- We can then study whether we have outliers in the data (unusually large individual residuals).
- We can look at the histogram of the residuals to get an idea of their distribution.
- We can study whether particular units have positive or negative residuals, i.e. whether they lie above or below (respectively) their predicted value.

Example. Question: is a specific individual over- or underpaid relative to his peers? Suppose you were the manager of a basketball team and wanted to buy a new player on the market. You could use residual analysis to

- study how individual pay relates to performance measures, and
- identify which players might currently be 'underpaid' given their performance

(a minimal sketch follows at the end of these notes).

Next lecture: Binary variables (Reading: Wooldridge, Chapter 7).
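Finally, a closing Python sketch of the residual analysis described above, again on simulated data: the player data here are invented, and in practice you would use the team's actual dataset with the full set of performance controls.

```python
# Flag 'underpaid' players: those whose wages sit furthest below the value
# predicted by their performance, i.e. the most negative residuals.
import numpy as np

rng = np.random.default_rng(1)
n = 269
points = rng.uniform(2, 30, n)                      # performance measure
log_wage = 5.5 + 0.06 * points + rng.normal(0, 0.4, n)

# OLS of log(wage) on a constant and points; residuals u_hat = y - y_hat.
X = np.column_stack([np.ones(n), points])
beta_hat = np.linalg.lstsq(X, log_wage, rcond=None)[0]
resid = log_wage - X @ beta_hat

underpaid = np.argsort(resid)[:5]                   # five most negative residuals
print("candidate underpaid players (row indices):", underpaid)
print("their residuals:", np.round(resid[underpaid], 3))
```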