EEP/IAS 118 - Introductory Applied Econometrics
Spring 2015
Sylvan Herskowitz
Section Handout 6
1 Warm-up 1: OVB Practice
A researcher is interested in studying the relationship between smoking and life expectancy. To
do this, she estimates the following regression model:
$years = \beta_0 + \tilde{\beta}_1 packsperweek + \tilde{\beta}_2 female + u$
In particular, she is interested in $\tilde{\beta}_1$, the estimated coefficient on packsperweek, which is the
average number of packs of cigarettes an individual smoked per week. The variable years is the
number of years that a person in the data set lived.
1. What sign do you expect $\tilde{\beta}_1$ to have?
2. What omitted variables do you think may be biasing $\tilde{\beta}_1$?
3. What two correlations do we need to know in order to sign this bias?
4. From this omitted variable, do we expect that our estimate for $\tilde{\beta}_1$ is too high or too low?
5. Given our intuition about the direction of the bias on $\tilde{\beta}_1$, if $\tilde{\beta}_1 = -1.7$, which true value for
$\beta_1$ do we think would be more likely: -1.5 or -1.9? Why?
2 Final Example: OVB in Action
In this section, I use the wage data (WAGE1.dta) from your textbook to demonstrate the evils of
omitted variable bias and show you that the OVB formula works. Let’s pretend that this sample
of 500 people is our whole population of interest, so that when we run our regressions, we are
actually revealing the true parameters instead of just estimates. We’re interested in the relationship
between wages and gender, and our “omitted” variable will be tenure (how long the person has
been at his/her job). Suppose our population model is:
(1) $\log(wage)_i = \beta_0 + \beta_1 female_i + \beta_2 tenure_i + u_i$
First let’s look at the correlations between our variables and see if we can predict how omitting
tenure will bias $\hat{\beta}_1$:
. corr lwage female tenure

             |    lwage   female   tenure
-------------+---------------------------
       lwage |   1.0000
      female |  -0.3737   1.0000
      tenure |   0.3255  -0.1979   1.0000
If we ran the regression:
(2) $\log(wage)_i = \tilde{\beta}_0 + \tilde{\beta}_1 female_i + e_i$
...then the information above tells us that $\tilde{\beta}_1$ ____ $\beta_1$. Let’s see if we were right. Imagine we ran
the regressions in Stata (we did) and we get the below results for our two models:
(1) $\log(wage)_i = 1.6888 - 0.3421\, female_i + 0.0192\, tenure_i + u_i$
(2) $\log(wage)_i = 1.8136 - 0.3972\, female_i + e_i$
From these results we now “know” that $\beta_1 =$ ____ and $\tilde{\beta}_1 =$ ____.
This means that our BIAS is equal to: ____
There’s one more parameter missing from our OVB formula. What regression do we have
to run to find its value?
The Stata results give us:
Now we can plug all of our parameters into the bias formula to check that it in fact gives us
the bias from leaving out tenure from our wage regression:
$\tilde{\beta}_1 = E[\hat{\tilde{\beta}}_1]$
$= \beta_1 + \beta_2 \rho_1$
$=$ ____
$=$ ____
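To see why the bias formula has to hold, here is a minimal Python sketch on simulated data. It does not use WAGE1.dta: the data-generating process, the sample size, and the names `centered_cross` and `ovb_check` are all made up for illustration. It also assumes that $\rho_1$ denotes the slope from regressing the omitted variable (tenure) on the included one (female).

```python
import random


def centered_cross(a, b):
    """Return sum((a_i - mean(a)) * (b_i - mean(b)))."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))


def ovb_check(n=500, seed=0):
    """Simulate data and return (b1, b2, b1_short, rho1)."""
    rng = random.Random(seed)
    # Hypothetical data, loosely shaped like the wage example (not WAGE1.dta).
    female = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
    tenure = [max(0.0, 6.0 - 3.0 * f + rng.gauss(0.0, 4.0)) for f in female]
    lwage = [1.7 - 0.34 * f + 0.02 * t + rng.gauss(0.0, 0.4)
             for f, t in zip(female, tenure)]

    s_ff = centered_cross(female, female)
    s_tt = centered_cross(tenure, tenure)
    s_ft = centered_cross(female, tenure)
    s_fy = centered_cross(female, lwage)
    s_ty = centered_cross(tenure, lwage)

    # Long regression: lwage on female and tenure (two-regressor OLS).
    det = s_ff * s_tt - s_ft ** 2
    b1 = (s_fy * s_tt - s_ty * s_ft) / det
    b2 = (s_ty * s_ff - s_fy * s_ft) / det
    # Short regression: lwage on female only.
    b1_short = s_fy / s_ff
    # Auxiliary regression: tenure on female (assumed meaning of rho_1).
    rho1 = s_ft / s_ff
    return b1, b2, b1_short, rho1


b1, b2, b1_short, rho1 = ovb_check()
print("short-regression slope:", b1_short)
print("b1 + b2 * rho1:        ", b1 + b2 * rho1)
```

The two printed numbers should agree up to floating-point rounding, because within any one sample the relationship is an algebraic identity of OLS and not just an approximation.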
3 Switching Gears: Big Picture - Hypothesis Testing
In the last few lectures this class has changed gears. We went from talking about assumptions for
unbiased estimates, regression results, and $R^2$ to suddenly reverting back to basic statistics and
hypothesis testing. Why?
The reason is that, up until now, we haven’t really needed statistics to fit a regression line
to our data. But now we want to start making claims such as “with 95% confidence we can reject
the null hypothesis that income is not related to education” (and to actually know where this
comes from and why it is a valid claim). In less technical phrasing, we want to be clear about how
much we trust our results. Just because OLS gives us the best fit of our model to the data doesn’t
mean that we should believe a given slope is necessarily capturing a true relationship.
We need to use statistics to tell us how much confidence we have that our results aren’t
just random noise. This is what we are now building towards, but it requires us to step back from
interpreting regression results to think about distributions, sampling distributions, and confidence
intervals.
The Population:
For now, let’s forget about regression coefficients relating one variable to another and think of a
much simpler random variable: the sample mean.
First, in lecture we saw that in Nicaragua we may have a true population distribution of
incomes such as this:
[Figure: kernel density of the population income distribution]
Because any individual person’s income could be randomly drawn from this distribution,
income here can be thought of as a random variable, X.
Check:
1. What is the total area under the illustrated kernel density?
2. If we wanted to calculate the likelihood of randomly drawing an income below 10,000 Cordobas, how would we do it?
3. With the full population data, could we calculate the population mean, µ, and variance, $\sigma_x^2$?
The Sample:
Typically, we don’t have the full set of population data to work with. This is why we take (random)
samples. We want to learn about the population and take our best guess at its mean, µ. To do this
we take a sample and calculate our sample mean, $\bar{x}$. But we know that even this mean of a random
sample is itself a random variable. It will almost always be different from µ.
So are we any better off than we were before? Yes.
We know something about $\bar{x}$ from the Central Limit Theorem: For a random sample of a
variable $\{x_1, ..., x_N\}$, the Central Limit Theorem tells us that for very large samples (large N), the
sample average is approximately distributed $\bar{x} \sim N(\mu, \sigma_{\bar{x}}^2)$, where $\sigma_{\bar{x}}^2 = \sigma_x^2 / N$.
This is saying that our estimator for a sample average ($\bar{x}$), before being drawn and realized,
is approximately normally distributed with its mean centered at the true population mean, µ. This is
sort of amazing. Even though our population distribution was very skewed, our distribution of sample
means has the shape of a normal distribution. In simulated data, randomly drawing about 500
different samples from our full set of “population” data we saw that the distribution of sample
means looks like this:
[Figure: histogram of the simulated sample means]
The picture would improve even further if we continued drawing additional samples and
calculating their means. Normally distributed variables are wonderful because their properties
and shape are very well known.
However, normal distributions are still tricky to work with, and it’s easier to standardize
normally distributed variables so that they have a mean of 0 and a variance of 1. Remember
our formula to find the expected value and variance of a transformed variable... If v is normally
distributed with expected value $E[v]$ and $Var(v) = \sigma_v^2$:
$E\left[\frac{v - E[v]}{\sigma_v}\right] =$ ____        $Var\left[\frac{v - E[v]}{\sigma_v}\right] =$ ____
Since we’re interested in the distribution of $\bar{x}$ (which is normal), we can standardize it just
like above so that: $\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \sim N(0, 1)$.
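The standardization can be checked by simulation. The sketch below is a minimal illustration under stated assumptions: the "population" is an exponential distribution with mean 1 and standard deviation 1 (a stand-in for a skewed income distribution, not the Nicaragua data), and the function name and default sizes are hypothetical.

```python
import random
import statistics


def standardized_sample_means(n=200, draws=2000, seed=1):
    """Draw many samples of size n and return the standardized sample means."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0           # exponential(rate=1): mean 1, sd 1 (assumed population)
    se = sigma / n ** 0.5          # standard deviation of the sample mean
    z_values = []
    for _ in range(draws):
        xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        z_values.append((xbar - mu) / se)
    return z_values


z = standardized_sample_means()
print("mean of standardized means:    ", statistics.fmean(z))
print("variance of standardized means:", statistics.pvariance(z))
print("share with |z| <= 1.96:        ", sum(abs(v) <= 1.96 for v in z) / len(z))
```

Even though each individual draw comes from a very skewed distribution, the standardized sample means should have a mean near 0, a variance near 1, and roughly 95% of their values within 1.96 of zero.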
4 Confidence Intervals
Using a standard normal distribution, we can now (finally) unlock a set of established tools with
which to construct confidence intervals (and later conduct hypothesis tests).
Confidence intervals use the randomness of our sample estimates to say something useful
about where the true population parameter is likely to be.
As mentioned, we know that the sample average $\bar{x} \sim N(\mu, \sigma_{\bar{x}}^2)$ and, even better,
$\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \sim N(0, 1)$.
Now we can use what we know about the distribution of standard normal variables to help
us say something meaningful about what the true population mean, $\mu_X$, might be:
• We choose our confidence level and use our table to find the appropriate critical value, c. For
a 95% confidence level we have a critical value of 1.96.
• We thus know that for any standard normal variable v, 95% = Pr( ____ )
• We know that $\frac{\bar{x} - \mu_X}{\sigma_{\bar{x}}}$ is standard normal
But we’re not really interested in the variable $\frac{\bar{x} - \mu_X}{\sigma_{\bar{x}}}$. The whole point of this is to learn more about
$\mu_X$! So we need to do some manipulation of this to isolate $\mu_X$:
This is great! We can say that with 95% confidence a randomly chosen interval will contain
the true parameter, $\mu_X$. However, we must make two caveats/adjustments before we actually
draw a sample, plug in numbers, and construct a confidence interval:
1. We don’t know the true parameter, $\sigma_{\bar{x}}$. To address this we have to estimate it with our sample
standard error, $s/\sqrt{n}$.
2. We can’t use the regular standard normal distribution because we are estimating the standard
error from a sample rather than using the full population. Instead, we use the t-distribution table,
our desired level of confidence, and the appropriate degrees of freedom to obtain our critical value.
With these adjustments in place, we can now use our randomly chosen samples and calculated means to construct our confidence intervals.
$CI_W = \left[\bar{x} - c_W \frac{s}{\sqrt{n}},\; \bar{x} + c_W \frac{s}{\sqrt{n}}\right]$
The most important thing to remember about a confidence interval is that the ____
is what’s random, not the ____.
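One way to build intuition for what "95% confidence" means is to repeat the sampling many times and count how often the interval contains the true mean. The sketch below is only an illustration under assumptions: the population is normal with a made-up mean of 50 and a known standard deviation of 10, so it uses the critical value 1.96 instead of the t-table, and the function name is hypothetical.

```python
import math
import random
import statistics


def coverage_rate(mu=50.0, sigma=10.0, n=40, reps=2000, seed=2):
    """Share of repeated samples whose 95% interval contains the fixed mean mu."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)   # known-sigma interval (assumption)
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        # mu never changes; the interval endpoints move with each new sample.
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


print("share of intervals covering mu:", coverage_rate())
```

The printed share should be close to 0.95. Each pass through the loop draws a new sample and therefore a new interval, while the population mean stays where it is.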
5 Practice: Constructing and Interpreting a Confidence Interval
Suppose I took a random sample of 121 UCB students’ heights in inches, and found that $\bar{x} = 65$
and $s^2 = 4$. Now, I’d like to construct a 95% confidence interval for the average height of UCB
students.
Step 1. Determine the confidence level.
Step 2. Compute your estimates of x¯ and s.
Step 3. Find c from the t-table.
The value of c will depend on both the sample size (n) and the confidence level (always use
2-Tailed for confidence intervals):
• If our confidence level is 80% with a sample size of 10: c80 =
• If our confidence level is 95%, with a sample size of 1000: c95 =
In this problem with a sample of 121: c95 =
Step 4. Plug everything into the formula and interpret.
The formula for a W% confidence interval is:
$CI_W = \left[\bar{x} - c_W \frac{s}{\sqrt{n}},\; \bar{x} + c_W \frac{s}{\sqrt{n}}\right]$
Where $c_W$ is found by looking at the t-table for n − 1 degrees of freedom. So for our problem,
we can plug everything in to get:
$CI_{95} =$ ____
The 95% confidence interval is ____. This interval has a 95% chance of covering the true average height of the UCB student population.
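The four steps can also be sketched in a few lines of Python, which is handy for checking arithmetic done by hand. This is a minimal sketch, not part of the original exercise: the function name is hypothetical, and the critical value 1.980 is an assumption taken from a standard t-table (two-tailed, 95%, 120 degrees of freedom) that you should confirm against your own table.

```python
import math


def t_confidence_interval(xbar, s, n, c):
    """Return (lower, upper) for xbar +/- c * s / sqrt(n)."""
    standard_error = s / math.sqrt(n)
    return xbar - c * standard_error, xbar + c * standard_error


# Heights example: n = 121, xbar = 65, s^2 = 4 so s = 2.
# c = 1.980 is the assumed t-table value for 120 degrees of freedom.
low, high = t_confidence_interval(xbar=65.0, s=math.sqrt(4.0), n=121, c=1.980)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

The same function works for any confidence level: only the critical value `c` passed in changes.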
Practice
Use the Stata output below to construct a 90% confidence interval for Michigan State University undergraduate GPA from a random sample of the MSU student body:
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      colGPA |       101       2.984    .3723103        2.2          4
1. Confidence level:
2. x¯ & s:
3. Find c90 :
4. Compute & Interpret interval:
Interpretation:
6 Hypothesis Testing
Hypothesis testing is intimately tied to confidence intervals. When we interpret our confidence interval, we specify that there’s a 95% chance that the interval covers the true value. But then there’s
only a 5% chance that it doesn’t, so after calculating a 95% confidence interval, you would be
skeptical to hear that µ is equal to something above or below your interval. To see this, think
about the above practice problem, except suppose (just for the moment) that we know the true µ and $Var(\bar{x})$: the true MSU average GPA is 3.0 and $Var(\bar{x}) = 0.0015$.
Below is plotted the distribution of $\bar{x}$.
[Figure: distribution of $\bar{x}$ when the true mean GPA is 3.0]
Is there a high probability that a random sample would yield an $\bar{x} = 2.984$? It certainly looks like
it, given what we know about the true distribution of $\bar{x}$.
Is there a high probability that a random sample would yield an $\bar{x} = 3.1$?
Hypothesis testing reverses this scenario, but uses the exact same intuition. Suppose the
dean of MSU firmly believes the true average GPA of her students is 3.1, and she’s convinced our
random sample isn’t an accurate reflection of the caliber of MSU students. Suppose we decide
to entertain her beliefs for the moment that the true mean GPA is 3.1. According to the dean, the
distribution of $\bar{x}$ looks like this:
[Figure: distribution of $\bar{x}$ when the true mean GPA is assumed to be 3.1]
Should we be skeptical of the dean’s claim?
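To put rough numbers on this intuition, the sketch below measures how many standard deviations our sample mean of 2.984 sits from each candidate value of µ. It is a sketch under assumptions: it treats $\bar{x}$ as exactly normal, reuses $Var(\bar{x}) = 0.0015$ under the dean's hypothesis as well, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def z_score(xbar, mu0, var_xbar):
    """Standardized distance between the sample mean and a hypothesized mean."""
    return (xbar - mu0) / math.sqrt(var_xbar)


def two_sided_tail(z):
    """Probability of a standard normal draw at least as far from 0 as z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


VAR_XBAR = 0.0015  # from the handout; assumed to hold under both hypotheses

for mu0 in (3.0, 3.1):
    z = z_score(2.984, mu0, VAR_XBAR)
    print(f"mu = {mu0}: z = {z:.3f}, two-sided tail probability = {two_sided_tail(z):.4f}")
```

A sample mean of 2.984 is less than half a standard deviation from 3.0 but about three standard deviations from 3.1, so a sample like ours would be very unusual if the dean were right.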