Transcription
STAC67H3: Regression Analysis
Fall, 2014
Instructor: Jabed Tomal
Department of Computer and Mathematical Sciences
University of Toronto Scarborough
Toronto, ON
Canada
October 30, 2014
Multiple Regression I
First-Order Model with Two Predictor Variables
A first-order (linear in the predictor variables) regression model with two predictor variables is as follows:

Yi = β0 + β1 Xi1 + β2 Xi2 + εi ; i = 1, 2, · · · , n,

where Yi is the response in the ith trial, and Xi1 and Xi2 are the values of the two predictor variables in the ith trial. The parameters of the regression model are β0, β1, and β2, and the error term is εi.
Assuming E{εi} = 0, the regression function for the above model is

E{Y} = β0 + β1 X1 + β2 X2.
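As a numerical illustration (not part of the original slides), the sketch below simulates data from this two-predictor model and recovers the coefficients by least squares; the true values β = (10, 2, −3), σ = 1, the sample size, and the predictor ranges are all arbitrary choices.

# A minimal sketch: simulate a first-order model with two predictors
# and recover the coefficients by least squares.
import numpy as np

rng = np.random.default_rng(0)
n = 100
X1 = rng.uniform(0, 10, n)
X2 = rng.uniform(0, 10, n)
eps = rng.normal(0, 1, n)                  # error terms, N(0, sigma^2)
Y = 10 + 2 * X1 - 3 * X2 + eps             # Yi = b0 + b1*Xi1 + b2*Xi2 + eps_i

X = np.column_stack([np.ones(n), X1, X2])  # design matrix with intercept column
b, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least squares fit
print(b)                                   # approximately [10, 2, -3]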
Meaning of Regression Coefficients. The parameter β1 indicates the
change in the mean response E{Y } per unit increase in X1 when X2 is
held constant. Likewise, the parameter β2 indicates the change in the
mean response E{Y } per unit increase in X2 when X1 is held constant.
First-Order Model with More than Two Predictor Variables
The regression model with p − 1 predictor variables X1, X2, · · · , Xp−1,

Yi = β0 + β1 Xi1 + β2 Xi2 + · · · + βp−1 Xi,p−1 + εi ; i = 1, 2, · · · , n,

is called a first-order model with p − 1 predictor variables.
This model can also be written as

Yi = β0 + Σ_{k=1}^{p−1} βk Xik + εi ; i = 1, 2, · · · , n,

or, if we let Xi0 ≡ 1, the above model can be written as

Yi = Σ_{k=0}^{p−1} βk Xik + εi ; i = 1, 2, · · · , n.
Assuming that E{εi} = 0, the regression function is written as
E{Y } = β0 + β1 X1 + β2 X2 + · · · + βp−1 Xp−1 .
Meaning of Regression Coefficients. The parameter βk indicates
the change in the mean response E{Y } with a unit increase in the
predictor variable Xk , when all other predictor variables in the
regression model are held constant.
General Linear Regression Model
The general linear regression model, with normal error terms, is defined as

Yi = β0 + β1 Xi1 + β2 Xi2 + · · · + βp−1 Xi,p−1 + εi ; i = 1, 2, · · · , n

where:
β0, β1, · · · , βp−1 are parameters
Xi1, Xi2, · · · , Xi,p−1 are known constants
εi are independent N(0, σ²).
Letting Xi0 ≡ 1, the general linear regression model is written as

Yi = Σ_{k=0}^{p−1} βk Xik + εi.

Since E{εi} = 0, the regression function is written as

E{Y} = β0 + β1 X1 + β2 X2 + · · · + βp−1 Xp−1.
The general linear regression model with normal error terms implies that the observations Yi are independent normal variables, with mean

E{Yi} = β0 + β1 Xi1 + β2 Xi2 + · · · + βp−1 Xi,p−1

and constant variance σ².
The general linear regression model encompasses a vast variety of
situations.
1. p − 1 Predictor Variables. When X1 , X2 , · · · , Xp−1 represent p − 1
different predictor variables, the general linear regression model is a
first-order model in which there are no interaction effects between the
predictor variables.
2. Qualitative Predictor Variables. The general linear regression
model includes not only quantitative predictor variables but also
qualitative predictor variables.
Consider the first-order regression model

Yi = β0 + β1 Xi1 + β2 Xi2 + εi

where:
Xi1 = patient's age
Xi2 = 1 if patient is female; 0 if patient is male

The response function of the regression model is

E{Y} = β0 + β1 X1 + β2 X2.
For male patients, X2 = 0 and the response function becomes:

E{Y} = β0 + β1 X1.

For female patients, X2 = 1 and the response function becomes:

E{Y} = (β0 + β2) + β1 X1.

The two response functions represent parallel straight lines with different intercepts.
In general, we represent a qualitative variable with c classes by means of c − 1 indicator variables.
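A minimal sketch of this setup (simulated, hypothetical age and sex data, not from the slides): the indicator column shifts the intercept, producing the two parallel lines described above.

# Sketch: one quantitative predictor (age) plus a 0/1 indicator,
# giving two parallel fitted lines with different intercepts.
import numpy as np

rng = np.random.default_rng(1)
n = 50
age = rng.uniform(20, 80, n)
female = rng.integers(0, 2, n)             # X2 = 1 if female, 0 if male
Y = 5 + 0.4 * age + 3.0 * female + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), age, female])
b0, b1, b2 = np.linalg.lstsq(X, Y, rcond=None)[0]
print(f"male line:   E{{Y}} = {b0:.2f} + {b1:.2f} X1")
print(f"female line: E{{Y}} = {b0 + b2:.2f} + {b1:.2f} X1")  # shifted intercept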
3. Polynomial Regression.
Polynomial regression models contain squared and higher-order
terms of the predictor variable(s).
An example of a polynomial regression model with one predictor is as follows:

Yi = β0 + β1 Xi + β2 Xi² + εi.
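A short sketch, on simulated data, showing that this quadratic model is still linear in the parameters and fits with the same least squares machinery; the true coefficients (1, 2, −0.5) are arbitrary.

# Sketch: a quadratic in one predictor fit by ordinary least squares.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, 80)
Y = 1 + 2 * x - 0.5 * x**2 + rng.normal(0, 0.3, 80)

X = np.column_stack([np.ones_like(x), x, x**2])   # columns 1, X, X^2
print(np.linalg.lstsq(X, Y, rcond=None)[0])       # ~ [1, 2, -0.5]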
4. Transformed Variables.
Models with transformed variables are special cases of the
general linear regression model.
Consider the following model with a transformed Y variable:

log Yi = β0 + β1 Xi1 + β2 Xi2 + β3 Xi3 + εi.

Letting Yi′ = log Yi, we get the following model

Yi′ = β0 + β1 Xi1 + β2 Xi2 + β3 Xi3 + εi,

which is in the form of the general linear regression model.
5. Interaction Effects.
The general linear regression model includes models with
interaction effects (the effect of one predictor variable depends on
the levels of other predictor variables).
An example with two predictor variables X1 and X2 is

Yi = β0 + β1 Xi1 + β2 Xi2 + β3 Xi1 Xi2 + εi.

Letting Xi3 = Xi1 Xi2, we get the following model

Yi = β0 + β1 Xi1 + β2 Xi2 + β3 Xi3 + εi,

which is in the form of the general linear regression model.
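A brief sketch with simulated data: the cross-product column Xi1 Xi2 simply enters the design matrix as one more predictor; the true coefficients are arbitrary.

# Sketch: an interaction model fit by treating Xi1*Xi2 as a third predictor.
import numpy as np

rng = np.random.default_rng(3)
n = 80
X1, X2 = rng.uniform(0, 5, n), rng.uniform(0, 5, n)
Y = 2 + 1.0 * X1 - 0.5 * X2 + 0.8 * X1 * X2 + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), X1, X2, X1 * X2])  # Xi3 = Xi1 * Xi2
print(np.linalg.lstsq(X, Y, rcond=None)[0])         # ~ [2, 1, -0.5, 0.8]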
6. Combination of Cases.
A general linear regression model may combine several of the
elements we have just noted.
Consider the following regression model

Yi = β0 + β1 Xi1 + β2 Xi1² + β3 Xi2 + β4 Xi2² + β5 Xi1 Xi2 + εi.

Letting Zi1 = Xi1, Zi2 = Xi1², Zi3 = Xi2, Zi4 = Xi2², and Zi5 = Xi1 Xi2, we get the following model

Yi = β0 + β1 Zi1 + β2 Zi2 + β3 Zi3 + β4 Zi4 + β5 Zi5 + εi,

which is in the form of the general linear regression model.
Meaning of Linear in the General Linear Regression Model. The term linear model refers to the fact that the general linear regression model is linear in the parameters; it does not refer to the shape of the response surface.
A regression model is linear in the parameters when it can be written in the form:

Yi = β0 ci0 + β1 ci1 + β2 ci2 + · · · + βp−1 ci,p−1 + εi,

where the terms ci0, ci1, etc., are coefficients involving the predictor variables.
General Linear Regression Model in Matrix Terms:
We write the general linear regression model in matrix terms as

Y = X β + ε,

where Y is n×1, X is n×p, β is p×1, and ε is n×1.
The vector of responses is

Y = (Y1, Y2, · · · , Yn)′.
The matrix of constants is

    [ 1  X11  X12  · · ·  X1,p−1 ]
X = [ 1  X21  X22  · · ·  X2,p−1 ]
    [ ⋮   ⋮    ⋮            ⋮    ]
    [ 1  Xn1  Xn2  · · ·  Xn,p−1 ]

The vector of parameters is

β = (β0, β1, β2, · · · , βp−1)′.
The error vector is

ε = (ε1, ε2, · · · , εn)′,

which contains independent normal random variables with expectation

E{ε} = 0
and variance-covariance matrix:

         [ σ²  0   0   · · ·  0  ]
         [ 0   σ²  0   · · ·  0  ]
Var{ε} = [ 0   0   σ²  · · ·  0  ] = σ² I,
         [ ⋮   ⋮   ⋮          ⋮  ]
         [ 0   0   0   · · ·  σ² ]

where I is the n×n identity matrix.
The random vector Y has expectation

E{Y} = Xβ

and the variance-covariance matrix of Y is the same as that of ε:

Var{Y} = σ² I.
Estimation of Regression Coefficients:
The least squares criterion for the general linear regression model is

Q = (Y − Xβ)′(Y − Xβ).

The least squares estimators are those values of β0, β1, · · · , βp−1 that minimize Q.
The least squares normal equations for the general linear regression model are:

X′X b = X′Y
and the least squares estimators are:

b = (b0, b1, b2, · · · , bp−1)′ = (X′X)⁻¹ X′Y
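A minimal numerical sketch of the normal equations on simulated data (not from the slides). np.linalg.solve solves X′Xb = X′Y directly; in practice a QR/SVD-based routine such as np.linalg.lstsq is numerically safer than forming (X′X)⁻¹.

# Sketch: solve the normal equations X'Xb = X'Y for simulated data.
import numpy as np

rng = np.random.default_rng(4)
n, p = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, n)

b = np.linalg.solve(X.T @ X, X.T @ Y)   # b = (X'X)^{-1} X'Y
print(b)                                # ~ [1, 2, -1]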
The maximum likelihood estimator of β can be obtained by maximizing the following likelihood function with respect to β:

L(β, σ²) = [1 / (2πσ²)^{n/2}] exp[ −(1/(2σ²)) (Y − Xβ)′(Y − Xβ) ]

The maximum likelihood estimator of β is the same as the least squares estimator of β:

β̂ = (X′X)⁻¹ X′Y
Fitted Values and Residuals
Let the vector of the fitted values Ŷi be denoted by Ŷ:

Ŷ = (Ŷ1, Ŷ2, · · · , Ŷn)′

and the vector of the residual terms ei = Yi − Ŷi be denoted by e:

e = (e1, e2, · · · , en)′
In matrix notation, we have:

Ŷ = X b

and

e = Y − Ŷ = Y − X b
The vector of fitted values Ŷ can be expressed in terms of the hat matrix H as follows:

Ŷ = X(X′X)⁻¹X′Y

or, equivalently:

Ŷ = H Y

where:

H = X(X′X)⁻¹X′
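A small sketch (simulated data) forming H explicitly, which is fine for small n since H is n×n, and checking its idempotence; it also yields the residual expression e = (I − H)Y used just below.

# Sketch: the hat matrix H = X(X'X)^{-1}X' maps Y to the fitted values.
import numpy as np

rng = np.random.default_rng(5)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
Y_hat = H @ Y                           # fitted values
e = (np.eye(n) - H) @ Y                 # residuals
print(np.allclose(H @ H, H))            # H is idempotent: True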
The vector of residuals can be expressed as:

e = (I − H) Y
The variance-covariance matrix of the residuals is:

σ²{e} = σ² (I − H)

which is estimated by:

s²{e} = MSE (I − H)
Analysis of Variance
The sums of squares for the analysis of variance, in matrix terms, are

SST = Y′Y − (1/n) Y′JY = Y′[I − (1/n)J]Y

SSE = e′e = (Y − Xb)′(Y − Xb) = Y′Y − b′X′Y = Y′[I − H]Y

SSR = b′X′Y − (1/n) Y′JY = Y′[H − (1/n)J]Y

where J is the n×n matrix of ones:

    [ 1  1  · · ·  1 ]
J = [ 1  1  · · ·  1 ]
    [ ⋮  ⋮         ⋮ ]
    [ 1  1  · · ·  1 ]
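A sketch computing the three sums of squares from these matrix formulas on simulated data, and checking that the decomposition SST = SSR + SSE holds.

# Sketch: ANOVA sums of squares via the all-ones matrix J.
import numpy as np

rng = np.random.default_rng(6)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1.0, n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
J = np.ones((n, n))
SST = Y @ Y - Y @ J @ Y / n
SSE = Y @ Y - b @ X.T @ Y
SSR = b @ X.T @ Y - Y @ J @ Y / n
print(np.isclose(SST, SSE + SSR))       # the decomposition holds: True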
Table: ANOVA Table for General Linear Regression Model.

Source of Variation | SS                        | df    | MS
Regression          | SSR = b′X′Y − (1/n) Y′JY  | p − 1 | MSR = SSR/(p − 1)
Error               | SSE = Y′Y − b′X′Y         | n − p | MSE = SSE/(n − p)
Total               | SST = Y′Y − (1/n) Y′JY    | n − 1 |
F Test for Regression Relation
We set the following hypotheses to test whether there is a regression relation between the response variable Y and the set of X variables X1, X2, · · · , Xp−1:

H0 : β1 = β2 = · · · = βp−1 = 0
versus
HA : not all βk (k = 1, 2, · · · , p − 1) equal zero.

We use the test statistic

F∗ = MSR / MSE.

Decision Rule: Reject H0 at the α level of significance if

F∗ > F(1 − α; p − 1, n − p).
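A sketch of the overall F test on simulated data; scipy.stats supplies the F critical value and p-value.

# Sketch: overall F test for a regression relation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1.0, n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
SSE = Y @ Y - b @ X.T @ Y
SSR = b @ X.T @ Y - n * Y.mean() ** 2            # b'X'Y - (1/n) Y'JY
F_star = (SSR / (p - 1)) / (SSE / (n - p))       # MSR / MSE
print(F_star > stats.f.ppf(0.95, p - 1, n - p))  # reject H0 at alpha = 0.05?
print(stats.f.sf(F_star, p - 1, n - p))          # p-value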
Coefficient of Multiple Determination
The coefficient of multiple determination, denoted by R², is defined as

R² = SSR/SST = 1 − SSE/SST,

which ranges from 0 to 1. R² assumes the value 0 when all bk = 0 (k = 1, 2, · · · , p − 1), and the value 1 when all Y observations fall directly on the fitted regression surface, i.e., when Yi = Ŷi for all i.
The adjusted coefficient of multiple determination, denoted by Ra², adjusts R² by dividing each sum of squares by its associated degrees of freedom:

Ra² = 1 − [(n − 1)/(n − p)] (SSE/SST)
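A short sketch computing R² and Ra² from the sums of squares on simulated data.

# Sketch: R^2 and adjusted R^2 from SSE and SST.
import numpy as np

rng = np.random.default_rng(8)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1.0, n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
SSE = Y @ Y - b @ X.T @ Y
SST = Y @ Y - n * Y.mean() ** 2
R2 = 1 - SSE / SST
R2_adj = 1 - (n - 1) / (n - p) * SSE / SST
print(R2, R2_adj)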
Inferences about Regression Parameters
The least squares and maximum likelihood estimators in b are unbiased:

E{b} = β

The variance-covariance matrix of b (p×p) is

        [ σ²{b0}        σ{b0, b1}     · · ·  σ{b0, bp−1} ]
σ²{b} = [ σ{b1, b0}     σ²{b1}        · · ·  σ{b1, bp−1} ]
        [ ⋮             ⋮                    ⋮           ]
        [ σ{bp−1, b0}   σ{bp−1, b1}   · · ·  σ²{bp−1}    ]
In short:

σ²{b} = σ² (X′X)⁻¹
The estimated variance-covariance matrix of b,

        [ s²{b0}       s{b0, b1}    · · ·  s{b0, bp−1} ]
s²{b} = [ s{b1, b0}    s²{b1}       · · ·  s{b1, bp−1} ]
        [ ⋮            ⋮                   ⋮           ]
        [ s{bp−1, b0}  s{bp−1, b1}  · · ·  s²{bp−1}    ]

is given by:

s²{b} = MSE (X′X)⁻¹
Tests for βk
The null and alternative hypotheses are:

H0 : βk = 0
versus
HA : βk ≠ 0

The test statistic is:

t∗ = bk / s{bk}

Decision Rule: Reject H0 at the α level of significance if

|t∗| > t(1 − α/2; n − p).
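A sketch of the coefficient t tests on simulated data in which β2 is truly zero, so its test should usually fail to reject; the standard errors come from s²{b} = MSE (X′X)⁻¹.

# Sketch: t statistics for each coefficient against H0: beta_k = 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(0, 1.0, n)  # beta2 truly 0

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
MSE = (Y - X @ b) @ (Y - X @ b) / (n - p)
se = np.sqrt(MSE * np.diag(XtX_inv))     # s{bk}
t_star = b / se
reject = np.abs(t_star) > stats.t.ppf(1 - 0.05 / 2, n - p)
print(t_star, reject)                    # beta2's test should not reject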
Interval Estimation of βk
For the normal error general linear regression model,

(bk − βk) / s{bk} ∼ t(n − p) ; k = 0, 1, · · · , p − 1.

Hence, the 100(1 − α)% confidence interval is:

bk ± t(1 − α/2; n − p) s{bk}
Joint Inferences
If g parameters are to be estimated jointly (where g ≤ p), the
confidence limits with family confidence coefficient 1 − α are:
bk ± B s{bk }
where
B = t(1 − α/2g; n − p)
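A sketch contrasting individual 95% intervals with Bonferroni joint intervals for g = 2 coefficients on simulated data; the joint intervals simply use the wider multiplier B.

# Sketch: individual vs Bonferroni joint confidence intervals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
MSE = (Y - X @ b) @ (Y - X @ b) / (n - p)
se = np.sqrt(MSE * np.diag(XtX_inv))

alpha, g = 0.05, 2                       # joint intervals for beta1, beta2
t_single = stats.t.ppf(1 - alpha / 2, n - p)
B = stats.t.ppf(1 - alpha / (2 * g), n - p)
for k in (1, 2):
    print(k, (b[k] - t_single * se[k], b[k] + t_single * se[k]),
             (b[k] - B * se[k], b[k] + B * se[k]))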
Interval Estimation of E{Yh }
To estimate the mean response at Xh1, Xh2, · · · , Xh,p−1, let us define the vector:

Xh = (1, Xh1, Xh2, · · · , Xh,p−1)′

The mean response to be estimated is:

E{Yh} = X′h β

The estimated mean response corresponding to Xh is:

Ŷh = X′h b
The estimator is unbiased:

E{Ŷh} = X′h β = E{Yh}

and its variance is:

σ²{Ŷh} = σ² X′h (X′X)⁻¹ Xh

The variance of Ŷh can be expressed as a function of σ²{b}:

σ²{Ŷh} = X′h σ²{b} Xh
The estimated variance of Ŷh in matrix notation is

s²{Ŷh} = MSE X′h (X′X)⁻¹ Xh = X′h s²{b} Xh

The 1 − α confidence limits for E{Yh} are:

Ŷh ± t(1 − α/2; n − p) s{Ŷh}
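A sketch of the confidence limits for E{Yh} at a chosen (hypothetical) Xh, on simulated data.

# Sketch: point estimate and 95% confidence limits for the mean response.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
MSE = (Y - X @ b) @ (Y - X @ b) / (n - p)

Xh = np.array([1.0, 0.5, -0.5])          # (1, Xh1, Xh2), arbitrary point
Yh_hat = Xh @ b
s_Yh = np.sqrt(MSE * Xh @ XtX_inv @ Xh)  # s{Yh_hat}
t_val = stats.t.ppf(0.975, n - p)
print(Yh_hat - t_val * s_Yh, Yh_hat + t_val * s_Yh)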
Prediction for a New Observation Yh(new)
The 1 − α confidence limits for a new observation Yh(new) corresponding to Xh, the specified values of the X variables, are:

Ŷh ± t(1 − α/2; n − p) s{pred}

where:

s²{pred} = MSE [1 + X′h (X′X)⁻¹ Xh]
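A sketch of the prediction limits on simulated data; they differ from the mean-response limits only through the extra "1 +" term in s²{pred}.

# Sketch: 95% prediction limits for a new observation at Xh.
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
MSE = (Y - X @ b) @ (Y - X @ b) / (n - p)

Xh = np.array([1.0, 0.5, -0.5])                   # arbitrary point
s_pred = np.sqrt(MSE * (1 + Xh @ XtX_inv @ Xh))   # s{pred}
t_val = stats.t.ppf(0.975, n - p)
print(Xh @ b - t_val * s_pred, Xh @ b + t_val * s_pred)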
Prediction for g New Observations Yh(new)
Simultaneous Scheffé prediction limits for g new observations at g different levels Xh with family confidence coefficient 1 − α are given by:

Ŷh ± S s{pred}

where:

S² = g F(1 − α; g, n − p)
Alternatively, Bonferroni simultaneous prediction limits for g new observations at g different levels Xh with family confidence coefficient 1 − α are given by:

Ŷh ± B s{pred}

where:

B = t(1 − α/2g; n − p)
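A tiny sketch comparing the Scheffé and Bonferroni multipliers for g = 3 new observations; one would typically use whichever multiplier is smaller.

# Sketch: Scheffe multiplier S vs Bonferroni multiplier B.
from scipy import stats

alpha, g, n, p = 0.05, 3, 40, 3
S = (g * stats.f.ppf(1 - alpha, g, n - p)) ** 0.5
B = stats.t.ppf(1 - alpha / (2 * g), n - p)
print(S, B)   # multiply s{pred} by min(S, B) for the tighter limits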
Diagnostics and Remedial Measures
Most of the diagnostic procedures for simple linear regression that
we described carry over directly to multiple regression.
Box plots, sequence plots, stem-and-leaf plots, and dot plots for each of the predictor variables and for the response variable can provide helpful preliminary univariate information about these variables.
Scatter Plot Matrix: Scatter plots of the response variable against
each predictor variable can aid in determining the nature and
strength of the bivariate relationships between each of the
predictor variables and the response.
Residual Plots: A plot of the residuals against the fitted values is
useful for assessing the appropriateness of the multiple
regression function and the constancy of the variance of the error
terms, as well as providing information about outliers.
A plot of the residuals against time or against some other
sequence can provide diagnostic information about possible
correlation between the error terms in multiple regression.
Box plots and normal probability plots of the residuals are useful
for examining whether the error terms are reasonably normally
distributed.
Plots of residuals against each of the predictor variables can
provide further information about the adequacy of the regression
function with respect to that predictor variable.
Residuals should also be plotted against important predictor variables that were omitted from the model, to see whether the omitted variables have substantial additional effects on the response variable that have not yet been recognized in the regression model.
Residuals should be plotted against interaction terms to check whether potential interaction effects have been left out of the regression model.
A plot of the absolute residuals or the squared residuals against
the fitted values is useful for examining the constancy of the
variance of the error terms.
If nonconstancy is detected, a plot of the absolute residuals or the
squared residuals against each of the predictor variables may
identify one or several of the predictor variables to which the
magnitude of the error variability is related.
Breusch-Pagan Test for Constancy of Error Variance: The Breusch-Pagan test for constancy of the error variance in multiple regression is carried out in exactly the same way as for simple linear regression when the error variance increases or decreases with one of the predictor variables.
If the error variance is assumed to be related to q ≥ 1 predictor
variables, the chi-squared test statistic involves q degrees of
freedom.
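A sketch of the Breusch-Pagan calculation on simulated data with error variance growing in a single predictor (so q = 1): regress the squared residuals on the predictor and form the chi-squared statistic.

# Sketch: Breusch-Pagan test, X^2 = (SSR*/2) / (SSE/n)^2 with q df.
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)
n = 100
x = rng.uniform(1, 5, n)
Y = 1 + 2 * x + rng.normal(0, x)         # error sd grows with x

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, Y, rcond=None)[0]
e = Y - X @ b
SSE = e @ e

g = np.linalg.lstsq(X, e**2, rcond=None)[0]     # regress e^2 on x
SSR_star = g @ X.T @ (e**2) - n * (e**2).mean() ** 2
chi2 = (SSR_star / 2) / (SSE / n) ** 2
print(chi2, stats.chi2.ppf(0.95, df=1))         # q = 1 here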
Example
Exercise 6.5. Brand Preference. In a small-scale experimental study of the relation between degree of brand liking (Y) and moisture content (X1) and sweetness (X2) of the product, the following results were obtained from an experiment based on a completely randomized design (data are coded):

i:     1   2   3  · · ·  14  15   16
Xi1:   4   4   4  · · ·  10  10   10
Xi2:   2   4   2  · · ·   4   2    4
Yi:   64  73  61  · · ·  95  94  100
1. Obtain the scatter plot matrix and the correlation matrix. What information do these diagnostic aids provide here?
2. Fit regression model

Yi = β0 + β1 Xi1 + β2 Xi2 + εi

to the data. State the estimated regression function. How is b1 interpreted here?
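A sketch for this exercise, assuming the full 16-case data set has been saved to a CSV file; the file name and column names here are hypothetical, since the slide shows only the first and last few cases.

# Sketch: fit the brand preference model and produce the diagnostics.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("brand_preference.csv")        # hypothetical file; columns X1, X2, Y
X = np.column_stack([np.ones(len(df)), df["X1"], df["X2"]])
b = np.linalg.lstsq(X, df["Y"].to_numpy(), rcond=None)[0]
print(f"Y_hat = {b[0]:.3f} + {b[1]:.3f} X1 + {b[2]:.3f} X2")

# Part 1: scatter plot matrix and correlation matrix.
pd.plotting.scatter_matrix(df)                  # pairwise scatter plots
plt.show()
print(df.corr())                                # correlation matrix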