EEP/IAS 118 - Introductory Applied Econometrics
Spring 2015
Sylvan Herskowitz
Section Handout 6

1 Warm-up 1: OVB Practice

A researcher is interested in studying the relationship between smoking and life expectancy. To do this, she estimates the following regression model:

    $years = \beta_0 + \tilde{\beta}_1 \, packsperweek + \tilde{\beta}_2 \, female + u$

In particular, she is interested in $\tilde{\beta}_1$, the estimated coefficient on the number of packs of cigarettes an individual smoked per week on average. $years$ is the number of years that a person in the data set lived.

1. What sign do you expect $\tilde{\beta}_1$ to have?
2. What omitted variables do you think may be biasing $\tilde{\beta}_1$?
3. What two correlations do we need to know in order to sign this bias?
4. From this omitted variable, do we expect that our estimate for $\tilde{\beta}_1$ is too high or too low?
5. Given our intuition about the direction of the bias on $\tilde{\beta}_1$, if $\tilde{\beta}_1 = -1.7$, which true value for $\beta_1$ do we think would be more likely: -1.5 or -1.9? Why?

2 Final Example: OVB in Action

In this section, I use the wage data (WAGE1.dta) from your textbook to demonstrate the evils of omitted variable bias and show you that the OVB formula works. Let’s pretend that this sample of 500 people is our whole population of interest, so that when we run our regressions, we are actually revealing the true parameters instead of just estimates. We’re interested in the relationship between wages and gender, and our “omitted” variable will be tenure (how long the person has been at his/her job). Suppose our population model is:

    (1) $\log(wage)_i = \beta_0 + \beta_1 female_i + \beta_2 tenure_i + u_i$

First let’s look at the correlations between our variables and see if we can predict how omitting tenure will bias $\hat{\beta}_1$:

    . corr lwage female tenure

                 |    lwage   female   tenure
    -------------+---------------------------
           lwage |   1.0000
          female |  -0.3737   1.0000
          tenure |   0.3255  -0.1979   1.0000

If we ran the regression:

    (2) $\log(wage)_i = \tilde{\beta}_0 + \tilde{\beta}_1 female_i + e_i$

...then the information above tells us that $\tilde{\beta}_1$ ______ $\beta_1$.

Let’s see if we were right. Imagine we ran the regressions in Stata (we did) and we get the results below for our two models:

    (1) $\log(wage)_i = 1.6888 - 0.3421 \, female_i + 0.0192 \, tenure_i + u_i$
    (2) $\log(wage)_i = 1.8136 - 0.3972 \, female_i + e_i$

From these results we now “know” that $\beta_1 =$ ______ and $\tilde{\beta}_1 =$ ______.

This means that our BIAS is equal to: ______

There’s one more parameter missing from our OVB formula. What regression do we have to run to find its value?

The Stata results give us: ______

Now we can plug all of our parameters into the bias formula to check that it in fact gives us the bias from leaving out tenure from our wage regression:

    $\tilde{\beta}_1 = E[\hat{\tilde{\beta}}_1] = \beta_1 + \beta_2 \rho_1 =$ ______ $=$ ______

3 Switching Gears: Big Picture - Hypothesis Testing

In the last few lectures this class has changed gears. We went from talking about assumptions for unbiased estimates, regression results, and $R^2$ to suddenly reverting back to basic statistics and hypothesis testing. Why?

The reason is that, up until now, we haven’t really needed statistics to fit a regression line to our data. But now we want to start making claims such as “with 95% confidence we can reject the null hypothesis that income is not related to education” (and to actually know where this comes from and why it is a valid claim).

In less technical phrasing, we want to be clear about how much we trust our results. Just because OLS gives us the best fit for our model to the data doesn’t mean that we should believe a given slope is necessarily capturing a true relationship.
We need to use statistics to tell us how much confidence we have that our results aren’t just random noise. This is what we are now building towards, but it requires us to step back from interpreting regression results to think about distributions, sampling distributions, and confidence intervals.

The Population:
For now, let’s forget about regression coefficients relating one variable to another and think of a much simpler random variable: the sample mean. First, in lecture we saw that in Nicaragua we may have a true population distribution of incomes such as this:

[Figure: kernel density of the population income distribution]

Because any individual person’s income could be randomly drawn from this distribution, income here can be thought of as a random variable, $X$.

Check:
1. What is the total area under the illustrated kernel density?
2. If we wanted to calculate the likelihood of randomly drawing an income below 10,000 Cordobas, how would we do it?
3. With the full population data, could we calculate the population mean, $\mu$, and variance, $\sigma_x^2$?

The Sample:
Typically, we don’t have the full set of population data to work with. This is why we take (random) samples. We want to learn about the population and take our best guess at its mean, $\mu$. To do this we take a sample and calculate our sample mean, $\bar{x}$. But we know that even this mean of a random sample is itself a random variable. It will almost always be different from $\mu$. So are we any better off than we were before? Yes. We know something about $\bar{x}$ from the Central Limit Theorem:

For a random sample of a variable $\{x_1, ..., x_N\}$, the Central Limit Theorem tells us that for very large samples (large $N$), the sample average is approximately distributed as $\bar{x} \sim N(\mu, \sigma_{\bar{x}}^2)$.

This is saying that our estimator for a sample average ($\bar{x}$), before being drawn and realized, is normally distributed with its mean centered at the true population mean, $\mu$. This is sort of amazing.
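The Central Limit Theorem claim can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the "population" is a hypothetical set of 20,000 lognormal draws (chosen because it is right-skewed, not because it matches the Nicaraguan income data), and the sample size of 200 and the helper `skewness` are illustrative choices that do not come from the handout.

```python
import random
import statistics


def skewness(values):
    """Standardized third moment: close to 0 for a symmetric (e.g. normal) shape."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


rng = random.Random(118)  # fixed seed so the sketch is reproducible

# Hypothetical right-skewed "population" (the lognormal shape is an assumption).
population = [rng.lognormvariate(9.0, 1.0) for _ in range(20_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Draw 500 random samples of size N and record each sample mean.
N = 200
sample_means = [statistics.fmean(rng.sample(population, N)) for _ in range(500)]

print(f"population mean          : {mu:12.1f}")
print(f"mean of sample means     : {statistics.fmean(sample_means):12.1f}")
print(f"sigma / sqrt(N)          : {sigma / N ** 0.5:12.1f}")
print(f"sd of sample means       : {statistics.pstdev(sample_means):12.1f}")
print(f"skewness of population   : {skewness(population):12.2f}")
print(f"skewness of sample means : {skewness(sample_means):12.2f}")
```

If the theorem is doing its job, the sample means should be centered near the population mean, spread out by roughly $\sigma_x / \sqrt{N}$, and far less skewed than the population they were drawn from.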
Even though our population distribution was very skewed, our distribution of sample means has the shape of a normal distribution. In simulated data, randomly drawing about 500 different samples from our full set of “population” data, we saw that the distribution of sample means looks like this:

[Figure: distribution of the simulated sample means]

The picture would improve even further if we continued drawing additional samples and calculating their means.

Normally distributed variables are wonderful because their properties and shape are very well known. However, normal distributions are still tricky to work with, and it’s easier to standardize normally distributed variables so that they have a mean of 0 and a variance of 1. Remember our formula to find the expected value and variance of a transformed variable...

If $v$ is normally distributed with expected value $E[v]$ and $Var(v) = \sigma_v^2$:

    $E\left[\frac{v - E[v]}{\sigma_v}\right] =$ ______        $Var\left(\frac{v - E[v]}{\sigma_v}\right) =$ ______

Since we’re interested in the distribution of $\bar{x}$ (which is normal), we can standardize it just like above so that:

    $\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \sim N(0, 1)$

4 Confidence Intervals

Using a standard normal distribution, we can now (finally) unlock a set of established tools with which to construct confidence intervals (and later conduct hypothesis tests). Confidence intervals use the randomness of our sample estimates to say something useful about where the true population parameter likely is.

As mentioned, we know that the sample average $\bar{x} \sim N(\mu, \sigma_{\bar{x}}^2)$ and, even better, $\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \sim N(0, 1)$.

Now we can use what we know about the distribution of standard normal variables to help us say something meaningful about what the true population mean, $\mu_X$, might be:

• We choose our confidence level and use our table to find the appropriate critical value, $c$. For a 95% confidence level we have a critical value of 1.96.
• We thus know that for any standard normal variable $v$, 95% = Pr( ______ )
• We know that $\frac{\bar{x} - \mu_X}{\sigma_{\bar{x}}}$ is standard normal

But we’re not really interested in the variable $\frac{\bar{x} - \mu_X}{\sigma_{\bar{x}}}$. The whole point of this is to learn more about $\mu_X$! So we need to do some manipulation of this to isolate $\mu_X$:



This is great! We can say that with 95% confidence a randomly chosen interval will contain the true parameter, $\mu_X$. However, we must make two caveats/adjustments before we actually draw a sample, plug in numbers, and construct a confidence interval:

1. We don’t know the true parameter, $\sigma_{\bar{x}}$. To address this we have to estimate it with our sample standard error, $s/\sqrt{n}$.
2. We can’t use the regular standard normal distribution, because we are estimating the standard deviation from a sample rather than using the full population. Instead, we use the t-distribution table, our desired level of confidence, and the appropriate degrees of freedom to obtain our critical value.

With these adjustments in place, we can now use our randomly chosen samples and calculated means to construct our confidence intervals.

    $CI_W = \left[ \bar{x} - c_W \frac{s}{\sqrt{n}}, \; \bar{x} + c_W \frac{s}{\sqrt{n}} \right]$

The most important thing to remember about a confidence interval is that the ______ is what’s random, not the ______.

5 Practice: Constructing and Interpreting a Confidence Interval

Suppose I took a random sample of 121 UCB students’ heights in inches, and found that $\bar{x} = 65$ and $s^2 = 4$. Now, I’d like to construct a 95% confidence interval for the average height of UCB students.

Step 1. Determine the confidence level.

Step 2. Compute your estimates of $\bar{x}$ and $s$.

Step 3. Find $c$ from the t-table. The value of $c$ will depend on both the sample size ($n$) and the confidence level (always use 2-Tailed for confidence intervals):

• If our confidence level is 80% with a sample size of 10: $c_{80} =$ ______
• If our confidence level is 95%, with a sample size of 1000: $c_{95} =$ ______

In this problem with a sample of 121: $c_{95} =$ ______

Step 4. Plug everything into the formula and interpret.
The formula for a W% confidence interval is:

    $CI_W = \left[ \bar{x} - c_W \frac{s}{\sqrt{n}}, \; \bar{x} + c_W \frac{s}{\sqrt{n}} \right]$

where $c_W$ is found by looking at the t-table for $n - 1$ degrees of freedom. So for our problem, we can plug everything in to get:

    $CI_{95} =$ ______

The 95% confidence interval is ______. This interval has a 95% chance of covering the true average height of the UCB student population.

Practice

Use the Stata output below to construct a 90% confidence interval for Michigan State University undergraduate GPA from a random sample of the MSU student body:

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
          colGPA |       101       2.984    .3723103        2.2          4

1. Confidence level:
2. $\bar{x}$ & $s$:
3. Find $c_{90}$:
4. Compute & Interpret interval:

Interpretation:

6 Hypothesis Testing

Hypothesis testing is intimately tied to confidence intervals. When we interpret our confidence interval, we specify that there’s a 95% chance that the interval covers the true value. But then there’s only a 5% chance that it doesn’t, so after calculating a 95% confidence interval, you would be skeptical to hear that $\mu$ is equal to something above or below your interval.

To see this, think about the above practice problem, except suppose (just for the moment) that we know the unknown true $\mu$ and $Var(\bar{x})$; so we know that the true MSU average GPA is 3.0 and $Var(\bar{x}) = 0.0015$. Below is plotted the distribution of $\bar{x}$.

[Figure: distribution of $\bar{x}$ when the true mean is 3.0]

Is there a high probability that a random sample would yield an $\bar{x} = 2.984$? It certainly looks like it, given what we know about the true distribution of $\bar{x}$.

Is there a high probability that a random sample would yield an $\bar{x} = 3.1$?

Hypothesis testing reverses this scenario, but uses the exact same intuition.
Suppose the dean of MSU firmly believes the true average GPA of her students is 3.1, and she’s convinced our random sample isn’t an accurate reflection of the caliber of MSU students. Suppose we decide to entertain her beliefs for the moment that the true mean GPA is 3.1. According to the dean, the distribution of $\bar{x}$ looks like this:

[Figure: distribution of $\bar{x}$ under the dean’s claim that the true mean is 3.1]

Should we be skeptical of the dean’s claim?
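
One way to put a number on that skepticism is to ask how far our observed sample mean sits from the dean's value, measured in standard deviations of $\bar{x}$. The sketch below is a rough illustration rather than the handout's official solution. It assumes that $\bar{x}$ is normally distributed, and that the value $Var(\bar{x}) = 0.0015$ given earlier also applies under the dean's claim. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar = 2.984        # sample mean of colGPA from the Stata output
var_x_bar = 0.0015   # Var(x_bar), treated as known (assumption carried over from the text)
mu_dean = 3.1        # the dean's hypothesized true mean

# Standardize: how many standard deviations of x_bar lie between x_bar and mu_dean?
sd_x_bar = sqrt(var_x_bar)
z = (x_bar - mu_dean) / sd_x_bar

# Probability of a sample mean at least this far from 3.1 (in either direction)
# if the dean were right, using the standard normal distribution.
standard_normal = NormalDist()  # mean 0, standard deviation 1
p_two_sided = 2 * standard_normal.cdf(-abs(z))

print(f"z = {z:.3f}")
print(f"two-sided probability = {p_two_sided:.4f}")
print(f"beyond the 95% critical value of 1.96? {abs(z) > 1.96}")
```

Under these assumptions the observed mean lies roughly three standard deviations below 3.1, which is the kind of evidence the question above asks us to weigh.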