Supplement 13A: Partial F Test

Purpose of the Partial F Test
For a given regression model, could some of the predictors be eliminated without sacrificing too much in
the way of fit? Conversely, would it be worthwhile to add a certain set of new predictors to a given
regression model? The partial F test is designed to answer questions such as these by comparing two
linear models for the same response variable. The extra sum of squares is used to measure the
marginal increase in the error sum of squares when one or more predictors are deleted from a
model. Conversely, the extra sum of squares measures the marginal reduction in the error sum
of squares when one or more predictors are added to a model.
Eliminating Some Predictors
We will start by showing how to assess the effect of eliminating some predictors from a model
that contains k predictors. The model containing all the predictors is called the full model:
(13A.1)    Y = β0 + β1X1 + β2X2 + … + βkXk
A model with fewer predictors is a reduced model. We estimate the linear regression for each of
the two models, and then look at the error sum of squares (SSE) from the ANOVA table for each
model. We can use the following notation, assuming that m predictors were eliminated in the
reduced model:
Full model SSE:       SSEFull                    dfFull = n–k–1
Reduced model SSE:    SSEReduced                 dfReduced = n–k–1+m
Extra SSE:            SSEReduced – SSEFull       df = (n–k–1+m) – (n–k–1) = m
The partial F test statistic is the ratio of two variances. The numerator is the difference in error
sums of squares (the “extra sum of squares”) between the two models, divided by the number of
predictors eliminated. The denominator is the mean squared error for the full model, that is, SSEFull divided by its degrees of freedom.
(13A.2)    Fcalc = [(SSEReduced – SSEFull) / m] / [SSEFull / (n–k–1)]     if m predictors are eliminated
Degrees of freedom for this test will then be (m, n–k–1). If only one predictor has been
eliminated, then m = 1. We can calculate the p-value for the partial F test using =F.DIST.RT(Fcalc,
m, n–k–1).
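If you prefer to script these calculations rather than use Excel, the same arithmetic is easy to reproduce. The sketch below is in Python using scipy's F distribution; the helper name partial_f is our own invention, not a library routine, and the example call simply plugs in SSE values that appear later in this supplement.

from scipy import stats

def partial_f(sse_reduced, sse_full, m, n, k):
    """Partial F statistic and right-tail p-value for dropping m of the k predictors."""
    df_full = n - k - 1                           # denominator degrees of freedom
    f_calc = ((sse_reduced - sse_full) / m) / (sse_full / df_full)
    p_value = stats.f.sf(f_calc, m, df_full)      # right-tail area, like =F.DIST.RT()
    return f_calc, p_value

# Example (used-car data below): drop ManTran, so m = 1, n = 40, k = 3
print(partial_f(219.3840, 199.1586, 1, 40, 3))    # approximately (3.656, 0.064)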
Illustration: Predicting Used Car Prices
CarPrice
Table 13A.1 shows a data set consisting of 40 observations on prices of used cars of a particular
brand and model (hence controlling for an obviously important factor that would affect prices).
The response variable is Y (SellPrice) = sale price of the vehicle (in thousands of dollars). We
have observations on three potential predictors: X1 (Age) = age of car in years, X2 (Mileage) =
miles on odometer (in thousands of miles), X3 (ManTran) = 1 if manual transmission, 0
otherwise. The three predictors are viewed as non-stochastic, independent variables (we can later
investigate the latter assumption by looking at VIFs, if we wish).
TABLE 13A.1  Selling Price and Characteristics of 40 Used Cars     CarPrice

X1 (Age)    X2 (Mileage)    X3 (ManTran)    Y (SellPrice)
13          148.599         0                0.370
2            17.367         0               29.810
13          174.904         0                0.390
…           …               …                …
10          145.886         0               11.210
8            93.22          0               12.270
5            75.907         0               19.260
Note: Only the first and last three observations are shown here. The units for SellPrice and
Mileage have been adjusted to thousands to improve data conditioning.
Eliminating a Single Predictor
Let us first test whether the single predictor ManTran could be eliminated to achieve a more
parsimonious model than using all three predictors. We are comparing two potential linear
regression models:
Full model:       SellPrice = β0 + β1 Age + β2 Mileage + β3 ManTran
Reduced model:    SellPrice = β0 + β1 Age + β2 Mileage
Here are the ANOVA tables from these two regressions, presented side-by-side:
Full Model ANOVA table
Source          SS            df    MS
Regression      2,334.5984     3    778.1995
Error             199.1586    36      5.5322
Total           2,533.7570    39

Reduced Model ANOVA table
Source          SS            df    MS
Regression      2,314.3730     2    1,157.1865
Error             219.3840    37        5.9293
Total           2,533.7570    39
The elimination of ManTran increases the sum of squared errors, as you would expect (you have already learned that extra predictors can never decrease R², even if they are not significant). Although the predictor ManTran is contributing something to the model’s overall explanatory power (reduced SSE), the question remains whether ManTran is making a statistically significant extra contribution. The calculations are:
Full model:      SSEFull = 199.1586       dfFull = n–k–1 = 40–3–1 = 36
Reduced model:   SSEReduced = 219.3840    dfReduced = n–k–1+m = 40–3–1+1 = 37
Extra SSE:       SSEReduced – SSEFull     df = (n–k–1+m) – (n–k–1) = m = 1
Fcalc = [(SSEReduced – SSEFull) / m] / [SSEFull / (n–k–1)]
      = [(219.3840 – 199.1586) / 1] / [199.1586 / 36]
      = 20.2254 / 5.5322
      = 3.6559
From Excel, we obtain the p-value =F.DIST.RT(3.6559,1,36) = .0639. Therefore, if we are using α = .05, we would say that the extra sum of squares is not significant (i.e., ManTran does not make a significant marginal contribution). Instead of using the p-value, we could compare Fcalc = 3.6559 with the critical value F.05(1,36) = F.INV.RT(0.05,1,36) = 4.114 and reach the same conclusion. In effect, the hypotheses we are testing are:
H0: β3 = 0
H1: β3 ≠ 0
The test statistic is not far enough from zero to reject the hypothesis H0: β3 = 0. You may already
have realized that if we are only considering the effect of one single predictor, we could reach
the same conclusion from its t-statistic in the fitted regression of the full model:
Regression output
Variables     Coefficients    Std. Error    t (df=36)    p-value
Intercept       33.7261         0.9994        33.747     7.60E-29
Age             -1.6630         0.2938        -5.660     1.98E-06
Mileage         -0.0584         0.0224        -2.610     .0131
ManTran?        -1.6538         0.8650        -1.912     .0639
In the single predictor case, the partial F test statistic is equal to the square of the corresponding t
test statistic in the full model. The t-test uses the same degrees of freedom as the denominator of
the partial F test, so the p-values will be the same as long as we use a two-tailed t-test (to
eliminate the sign so that rejection in either tail could occur):
Predictor ManTran:    tcalc² = (–1.912)² = 3.656
Excel’s p-value:      =T.DIST.2T(1.912,36) = .0639
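As a quick numerical check of this connection between F and t, the short Python sketch below (using scipy; the numbers are taken from the regression output above) shows that the two-tailed t p-value and the right-tail F p-value agree:

from scipy import stats

t_calc, df = -1.912, 36
f_from_t = t_calc ** 2                        # 3.656, the partial F statistic

p_t = 2 * stats.t.sf(abs(t_calc), df)         # two-tailed t p-value, like =T.DIST.2T()
p_f = stats.f.sf(f_from_t, 1, df)             # right-tail F p-value, like =F.DIST.RT()
print(round(p_t, 4), round(p_f, 4))           # both approximately 0.0639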
In the case of a single predictor, we could get by without using the partial F test. It is shown here
because it illustrates the test in a simple way, and reveals the connection between F and t
distributions. An advantage of the t-test is that it could also be used to test a one-sided hypothesis
(e.g., H1: β3 < 0), which might be relevant in the case of this example (all our predictors seem to
have an inverse relationship with a car’s selling price).
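If you want to reproduce the SSE values themselves rather than read them from ANOVA tables, both models can be fit by ordinary least squares. Here is a minimal Python sketch assuming the CarPrice columns have been loaded into numpy arrays; the function names and array names (sellprice, age, mileage, mantran) are our own choices, not part of any particular package:

import numpy as np
from scipy import stats

def sse(y, *predictors):
    """Error sum of squares from an OLS fit of y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def partial_f_from_data(y, full_predictors, reduced_predictors):
    """Partial F statistic and p-value for dropping the predictors absent from the reduced list."""
    sse_full = sse(y, *full_predictors)
    sse_reduced = sse(y, *reduced_predictors)
    n, k = len(y), len(full_predictors)
    m = k - len(reduced_predictors)
    f_calc = ((sse_reduced - sse_full) / m) / (sse_full / (n - k - 1))
    return f_calc, stats.f.sf(f_calc, m, n - k - 1)

# With the CarPrice columns loaded as numpy arrays, this call should reproduce
# Fcalc = 3.6559 and the p-value .0639 for dropping ManTran:
#   partial_f_from_data(sellprice, [age, mileage, mantran], [age, mileage])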
Eliminating More Than One Predictor
We now turn to the more general case of using the partial F test to assess the effect of
eliminating m predictors simultaneously (where m > 1). This can be especially useful when we
have a large model with many predictors that we are thinking of eliminating because their effects
seem to be weak in the full model. To test the effects of discarding m predictors at once, the
hypotheses are:
H0: All the βj = 0 for a subset of m predictors in the full model
H1: Not all the βj = 0 (at least some of the m coefficients are non-zero)
For example, suppose we want to know whether we can eliminate both Mileage and ManTran at
once. The hypotheses are:
H0: β2 = 0 and β3 = 0
H1: One or both coefficients are non-zero
The models to be compared are:
Full model:       SellPrice = β0 + β1 Age + β2 Mileage + β3 ManTran
Reduced model:    SellPrice = β0 + β1 Age
Here are the ANOVA tables from these two regressions, presented side-by-side:
Full Model ANOVA table
Source          SS            df    MS
Regression      2,334.5984     3    778.1995
Error             199.1586    36      5.5322
Total           2,533.7570    39

Reduced Model ANOVA table
Source          SS            df    MS
Regression      2,269.8421     1    2,269.8421
Residual          263.9148    38        6.9451
Total           2,533.7570    39
The elimination of both Mileage and ManTran increases the sum of squared errors, as you would expect. The question is whether these two predictors are making a statistically significant extra contribution to reducing the sum of squared errors. The calculations are:
Full model:      SSEFull = 199.1586       dfFull = n–k–1 = 40–3–1 = 36
Reduced model:   SSEReduced = 263.9148    dfReduced = n–k–1+m = 40–3–1+2 = 38
Extra SSE:       SSEReduced – SSEFull     df = (n–k–1+m) – (n–k–1) = m = 2

Fcalc = [(SSEReduced – SSEFull) / m] / [SSEFull / (n–k–1)]
      = [(263.9148 – 199.1586) / 2] / [199.1586 / 36]
      = 32.3781 / 5.5322
      = 5.8527
From Excel, we obtain the p-value =F.DIST.RT(5.8527,2,36) = .0063. If we are using α = .05, we would say that the extra sum of squares is highly significant (i.e., these two predictors do make a significant marginal contribution). Alternatively, we can compare Fcalc = 5.8527 with F.05(2,36) = F.INV.RT(0.05,2,36) = 3.259 to draw the same conclusion.
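The same result can be obtained with the partial_f helper sketched earlier (our own function, not a library routine):

# Drop Mileage and ManTran together: m = 2, n = 40, k = 3
f_calc, p_value = partial_f(263.9148, 199.1586, 2, 40, 3)
print(round(f_calc, 4), round(p_value, 4))    # approximately 5.8527 and 0.0063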
Adding Predictors
We have been discussing eliminating predictors. The calculations for adding predictors to a
linear model are similar if we define the “full” model as the “big” model (more predictors) and
the “reduced” model as the “small” model (fewer predictors). The “extra sum of squares” is still
the difference between the two sums of squares:
(13A.3)    Fcalc = [(SSE for small model – SSE for big model) / (number of extra predictors)] / [SSE for big model / (n–k–1)]

where k is the number of predictors in the big model.
More Complex Models
We can use variations on these partial F tests based on error sums of squares for other purposes.
For example, we can test whether two coefficients in a model are the same (e.g., 2 = 3) or to
calculate the effects of any given predictor given the presence of other sets of predictors in the
model (using coefficient of partial determination). Such tests are ordinarily reserved for more
advanced classes in statistics, and may entail using more specialized software.
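One related quantity that is easy to compute from the two SSE values already in hand is the coefficient of partial determination, the fraction of the reduced model's SSE that is removed by adding the predictor(s) in question. A one-line Python sketch using the figures from the ManTran example above:

# Coefficient of partial determination for ManTran, given Age and Mileage
r2_partial = (219.3840 - 199.1586) / 219.3840   # about 0.092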
Full Results for Car Data
CarPrice
To allow you to explore the car data on your own, full results are shown below for the full model
based on the used car data. SellPrice is negatively affected by Age and Mileage (both highly significant) and marginally so by ManTran (p-value significant at α = .10 but not at α = .05). You can also look at the data file and do your own regressions.
Regression Analysis

R²             0.921         n          40
Adjusted R²    0.915         k           3
R              0.960         Dep. Var.  SellPrice
Std. Error     2.352

ANOVA table
Source          SS            df    MS          F         p-value
Regression      2,334.5984     3    778.1995    140.67    6.17E-20
Residual          199.1586    36      5.5322
Total           2,533.7570    39

Regression output                                                      confidence interval
Variables     Coefficients    Std. Error    t (df=36)    p-value     95% lower    95% upper    VIF
Intercept       33.7261         0.9994        33.747     7.60E-29      31.6993      35.7530
Age             -1.6630         0.2938        -5.660     1.98E-06      -2.2589      -1.0671    6.384
Mileage         -0.0584         0.0224        -2.610     .0131         -0.1038      -0.0130    6.371
ManTran?        -1.6538         0.8650        -1.912     .0639         -3.4081       0.1004    1.014
It appears that as a car ages, it loses about $1,663 in value per year (ceteris paribus). Similarly, for each additional thousand miles on the odometer, a car loses on average about $58. Cars with manual transmission seem to sell for about $1,654 less than those with automatic transmission (remember, the brand and model are controlled already). There is evidence of multicollinearity between Age and Mileage, which would be expected (as cars get older, they accumulate more miles). This would require further consideration by the analyst.
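If you want to verify the VIF column yourself, each VIF equals 1/(1 − R²j), where R²j comes from regressing that predictor on the other predictors. A minimal Python sketch, under the same assumption as before that the CarPrice columns are available as numpy arrays with names of our own choosing:

import numpy as np

def vif(x, *other_predictors):
    """Variance inflation factor: 1 / (1 - R^2) from regressing x on the other predictors."""
    X = np.column_stack([np.ones(len(x))] + list(other_predictors))
    beta, *_ = np.linalg.lstsq(X, x, rcond=None)
    resid = x - X @ beta
    total = (x - x.mean()) @ (x - x.mean())
    return 1 / (resid @ resid / total)            # equals 1 / (1 - R^2)

# With the CarPrice columns loaded as numpy arrays:
#   vif(age, mileage, mantran)     # about 6.38
#   vif(mileage, age, mantran)     # about 6.37
#   vif(mantran, age, mileage)     # about 1.01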
Section Exercises
13A.1 Instructions: Use α = .05 in all tests. (a) Perform a full linear regression to predict ColGrad% using all eight predictors in DATA SET E shown here. State the SSE and df for the full model. (b) Fit a reduced linear regression model by eliminating the predictor Age. State the SSE and df for the reduced model. (c) Calculate the partial F test statistic to see whether the predictor Age is significant. (d) Calculate the p-value for the partial F test. What is your conclusion? (e) Does your conclusion from the partial F test agree with the test using the t-statistic in the full model regression? (f) Fit a reduced regression model by eliminating the two predictors Age and Seast simultaneously. State the SSE and df for the reduced model. (g) Calculate the partial F test statistic to see whether the predictors Age and Seast can both be eliminated. State your conclusion.