Transcription
STAC67H3: Regression Analysis
Fall, 2014
Instructor: Jabed Tomal
Department of Computer and Mathematical Sciences
University of Toronto Scarborough
Toronto, ON
Canada
October 30, 2014
Multiple Regression I
First-Order Model with Two Predictor Variables
A first-order (linear in the predictor variables) regression model with two predictor variables is as follows:

Yi = β0 + β1 Xi1 + β2 Xi2 + εi ; i = 1, 2, · · · , n,

where Yi is the response in the ith trial, and Xi1 and Xi2 are the values of the two predictor variables in the ith trial. The parameters of the regression model are β0, β1, and β2, and the error term is εi.
Assuming E{εi} = 0, the regression function for the above model is

E{Y} = β0 + β1 X1 + β2 X2.
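As a numerical illustration (not part of the original slides), the sketch below simulates data from this two-predictor model and recovers the coefficients by least squares; the true values β = (10, 2, −3), σ = 1, the sample size, and the predictor ranges are all arbitrary choices.

# A minimal sketch: simulate a first-order model with two predictors
# and recover the coefficients by least squares.
import numpy as np

rng = np.random.default_rng(0)
n = 100
X1 = rng.uniform(0, 10, n)
X2 = rng.uniform(0, 10, n)
eps = rng.normal(0, 1, n)                  # error terms, N(0, sigma^2)
Y = 10 + 2 * X1 - 3 * X2 + eps             # Yi = b0 + b1*Xi1 + b2*Xi2 + eps_i

X = np.column_stack([np.ones(n), X1, X2])  # design matrix with intercept column
b, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least squares fit
print(b)                                   # approximately [10, 2, -3]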
Meaning of Regression Coefficients. The parameter β1 indicates the
change in the mean response E{Y } per unit increase in X1 when X2 is
held constant. Likewise, the parameter β2 indicates the change in the
mean response E{Y } per unit increase in X2 when X1 is held constant.
First-Order Model with More than Two Predictor Variables
The regression model with p − 1 predictor variables X1, X2, · · · , Xp−1,

Yi = β0 + β1 Xi1 + β2 Xi2 + · · · + βp−1 Xi,p−1 + εi ; i = 1, 2, · · · , n,

is called a first-order model with p − 1 predictor variables.
This model can also be written as

Yi = β0 + Σ_{k=1}^{p−1} βk Xik + εi ; i = 1, 2, · · · , n,

or, if we let Xi0 ≡ 1, the above model can be written as

Yi = Σ_{k=0}^{p−1} βk Xik + εi ; i = 1, 2, · · · , n.
Assuming that E{εi} = 0, the regression function is written as
E{Y } = β0 + β1 X1 + β2 X2 + · · · + βp−1 Xp−1 .
Meaning of Regression Coefficients. The parameter βk indicates
the change in the mean response E{Y } with a unit increase in the
predictor variable Xk , when all other predictor variables in the
regression model are held constant.
General Linear Regression Model
The general linear regression model, with normal error terms, is defined as

Yi = β0 + β1 Xi1 + β2 Xi2 + · · · + βp−1 Xi,p−1 + εi ; i = 1, 2, · · · , n

where:
β0, β1, · · · , βp−1 are parameters
Xi1, Xi2, · · · , Xi,p−1 are known constants
εi are independent N(0, σ²).
Letting Xi0 ≡ 1, the general linear regression model is written as

Yi = Σ_{k=0}^{p−1} βk Xik + εi.

Since E{εi} = 0, the regression function is written as

E{Y} = β0 + β1 X1 + β2 X2 + · · · + βp−1 Xp−1.
The general linear regression model with normal error terms implies that the observations Yi are independent normal variables, with mean

E{Yi} = β0 + β1 Xi1 + β2 Xi2 + · · · + βp−1 Xi,p−1

and constant variance σ².
The general linear regression model encompasses a vast variety of
situations.
1. p − 1 Predictor Variables. When X1 , X2 , · · · , Xp−1 represent p − 1
different predictor variables, the general linear regression model is a
first-order model in which there are no interaction effects between the
predictor variables.
2. Qualitative Predictor Variables. The general linear regression
model includes not only quantitative predictor variables but also
qualitative predictor variables.
Consider the first-order regression model

Yi = β0 + β1 Xi1 + β2 Xi2 + εi

where:
Xi1 = patient's age
Xi2 = 1 if patient is female; 0 if patient is male

The response function of the regression model is

E{Y} = β0 + β1 X1 + β2 X2.
For male patients, X2 = 0 and the response function becomes:

E{Y} = β0 + β1 X1.

For female patients, X2 = 1 and the response function becomes:

E{Y} = (β0 + β2) + β1 X1.

The two response functions represent parallel straight lines with different intercepts.
In general, we represent a qualitative variable with c classes by means of c − 1 indicator variables.
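A minimal sketch of this setup (simulated, hypothetical age and sex data, not from the slides): the indicator column shifts the intercept, producing the two parallel lines described above.

# Sketch: one quantitative predictor (age) plus a 0/1 indicator,
# giving two parallel fitted lines with different intercepts.
import numpy as np

rng = np.random.default_rng(1)
n = 50
age = rng.uniform(20, 80, n)
female = rng.integers(0, 2, n)             # X2 = 1 if female, 0 if male
Y = 5 + 0.4 * age + 3.0 * female + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), age, female])
b0, b1, b2 = np.linalg.lstsq(X, Y, rcond=None)[0]
print(f"male line:   E{{Y}} = {b0:.2f} + {b1:.2f} X1")
print(f"female line: E{{Y}} = {b0 + b2:.2f} + {b1:.2f} X1")  # shifted intercept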
3. Polynomial Regression.
Polynomial regression models contain squared and higher-order
terms of the predictor variable(s).
An example of a polynomial regression model with one predictor is as follows:

Yi = β0 + β1 Xi + β2 Xi² + εi.
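A short sketch, on simulated data, showing that this quadratic model is still linear in the parameters and fits with the same least squares machinery; the true coefficients (1, 2, −0.5) are arbitrary.

# Sketch: a quadratic in one predictor fit by ordinary least squares.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, 80)
Y = 1 + 2 * x - 0.5 * x**2 + rng.normal(0, 0.3, 80)

X = np.column_stack([np.ones_like(x), x, x**2])   # columns 1, X, X^2
print(np.linalg.lstsq(X, Y, rcond=None)[0])       # ~ [1, 2, -0.5]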
4. Transformed Variables.
Models with transformed variables are special cases of the
general linear regression model.
Consider the following model with a transformed Y variable:

log Yi = β0 + β1 Xi1 + β2 Xi2 + β3 Xi3 + εi.

Letting Yi′ = log Yi, we get the following model

Yi′ = β0 + β1 Xi1 + β2 Xi2 + β3 Xi3 + εi,

which is in the form of the general linear regression model.
5. Interaction Effects.
The general linear regression model includes models with
interaction effects (the effect of one predictor variable depends on
the levels of other predictor variables).
An example with two predictor variables X1 and X2 is

Yi = β0 + β1 Xi1 + β2 Xi2 + β3 Xi1 Xi2 + εi.

Letting Xi3 = Xi1 Xi2, we get the following model

Yi = β0 + β1 Xi1 + β2 Xi2 + β3 Xi3 + εi,

which is in the form of the general linear regression model.
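A brief sketch with simulated data: the cross-product column Xi1 Xi2 simply enters the design matrix as one more predictor; the true coefficients are arbitrary.

# Sketch: an interaction model fit by treating Xi1*Xi2 as a third predictor.
import numpy as np

rng = np.random.default_rng(3)
n = 80
X1, X2 = rng.uniform(0, 5, n), rng.uniform(0, 5, n)
Y = 2 + 1.0 * X1 - 0.5 * X2 + 0.8 * X1 * X2 + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), X1, X2, X1 * X2])  # Xi3 = Xi1 * Xi2
print(np.linalg.lstsq(X, Y, rcond=None)[0])         # ~ [2, 1, -0.5, 0.8]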
6. Combination of Cases.
A general linear regression model may combine several of the
elements we have just noted.
Consider the following regression model

Yi = β0 + β1 Xi1 + β2 Xi1² + β3 Xi2 + β4 Xi2² + β5 Xi1 Xi2 + εi.

Letting Zi1 = Xi1, Zi2 = Xi1², Zi3 = Xi2, Zi4 = Xi2², and Zi5 = Xi1 Xi2, we get the following model

Yi = β0 + β1 Zi1 + β2 Zi2 + β3 Zi3 + β4 Zi4 + β5 Zi5 + εi,

which is in the form of the general linear regression model.
Meaning of Linear in the General Linear Regression Model. The term linear model refers to the fact that the general linear regression model is linear in the parameters; it does not refer to the shape of the response surface.
A regression model is linear in the parameters when it can be written in the form:

Yi = β0 ci0 + β1 ci1 + β2 ci2 + · · · + βp−1 ci,p−1 + εi,

where the terms ci0, ci1, etc., are coefficients involving the predictor variables.
General Linear Regression Model in Matrix Terms:
We write the general linear regression model in matrix terms as

Y = X β + ε,

where Y is n×1, X is n×p, β is p×1, and ε is n×1.
The vector of responses is

Y = (Y1, Y2, · · · , Yn)′.
The matrix of constants is

    [ 1  X11  X12  · · ·  X1,p−1 ]
X = [ 1  X21  X22  · · ·  X2,p−1 ]
    [ ⋮   ⋮    ⋮            ⋮    ]
    [ 1  Xn1  Xn2  · · ·  Xn,p−1 ]

The vector of parameters is

β = (β0, β1, β2, · · · , βp−1)′.
The error vector is

ε = (ε1, ε2, · · · , εn)′,

which contains independent normal random variables with expectation

E{ε} = 0
and variance-covariance matrix:

         [ σ²  0   0   · · ·  0  ]
         [ 0   σ²  0   · · ·  0  ]
Var{ε} = [ 0   0   σ²  · · ·  0  ] = σ² I,
         [ ⋮   ⋮   ⋮          ⋮  ]
         [ 0   0   0   · · ·  σ² ]

where I is the n×n identity matrix.
The random vector Y has expectation

E{Y} = Xβ

and the variance-covariance matrix of Y is the same as that of ε:

Var{Y} = σ² I.
Estimation of Regression Coefficients:
The least squares criterion for the general linear regression model is

Q = (Y − Xβ)′(Y − Xβ).

The least squares estimators are those values of β0, β1, · · · , βp−1 that minimize Q.
The least squares normal equations for the general linear regression model are:

X′X b = X′Y
and the least squares estimators are:

b = (b0, b1, b2, · · · , bp−1)′ = (X′X)⁻¹ X′Y
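A minimal numerical sketch of the normal equations on simulated data (not from the slides). np.linalg.solve solves X′Xb = X′Y directly; in practice a QR/SVD-based routine such as np.linalg.lstsq is numerically safer than forming (X′X)⁻¹.

# Sketch: solve the normal equations X'Xb = X'Y for simulated data.
import numpy as np

rng = np.random.default_rng(4)
n, p = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, n)

b = np.linalg.solve(X.T @ X, X.T @ Y)   # b = (X'X)^{-1} X'Y
print(b)                                # ~ [1, 2, -1]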
The maximum likelihood estimator of β can be obtained by maximizing the following likelihood function with respect to β:

L(β, σ²) = [1 / (2πσ²)^{n/2}] exp[ −(1/(2σ²)) (Y − Xβ)′(Y − Xβ) ]

The maximum likelihood estimator of β is the same as the least squares estimator of β:

β̂ = (X′X)⁻¹ X′Y
Fitted Values and Residuals
Let the vector of the fitted values Ŷi be denoted by Ŷ:

Ŷ = (Ŷ1, Ŷ2, · · · , Ŷn)′

and the vector of the residual terms ei = Yi − Ŷi be denoted by e:

e = (e1, e2, · · · , en)′
In matrix notation, we have:

Ŷ = X b

and

e = Y − Ŷ = Y − X b
The vector of fitted values Ŷ can be expressed in terms of the hat matrix H as follows:

Ŷ = X(X′X)⁻¹X′Y

or, equivalently:

Ŷ = H Y

where:

H = X(X′X)⁻¹X′
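A small sketch (simulated data) forming H explicitly, which is fine for small n since H is n×n, and checking its idempotence; it also yields the residual expression e = (I − H)Y used just below.

# Sketch: the hat matrix H = X(X'X)^{-1}X' maps Y to the fitted values.
import numpy as np

rng = np.random.default_rng(5)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
Y_hat = H @ Y                           # fitted values
e = (np.eye(n) - H) @ Y                 # residuals
print(np.allclose(H @ H, H))            # H is idempotent: True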
The vector of residuals can be expressed as:

e = (I − H) Y
The variance-covariance matrix of the residuals is:

σ²{e} = σ² (I − H)

which is estimated by:

s²{e} = MSE (I − H)
Analysis of Variance
The sums of squares for the analysis of variance, in matrix terms, are

SST = Y′Y − (1/n) Y′JY = Y′[I − (1/n)J]Y

SSE = e′e = (Y − Xb)′(Y − Xb) = Y′Y − b′X′Y = Y′[I − H]Y

SSR = b′X′Y − (1/n) Y′JY = Y′[H − (1/n)J]Y

where J is the n×n matrix of ones:

    [ 1  1  · · ·  1 ]
J = [ 1  1  · · ·  1 ]
    [ ⋮  ⋮         ⋮ ]
    [ 1  1  · · ·  1 ]
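A sketch computing the three sums of squares from these matrix formulas on simulated data, and checking that the decomposition SST = SSR + SSE holds.

# Sketch: ANOVA sums of squares via the all-ones matrix J.
import numpy as np

rng = np.random.default_rng(6)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1.0, n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
J = np.ones((n, n))
SST = Y @ Y - Y @ J @ Y / n
SSE = Y @ Y - b @ X.T @ Y
SSR = b @ X.T @ Y - Y @ J @ Y / n
print(np.isclose(SST, SSE + SSR))       # the decomposition holds: True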
Table: ANOVA Table for General Linear Regression Model.

Source of Variation | SS                        | df    | MS
Regression          | SSR = b′X′Y − (1/n) Y′JY  | p − 1 | MSR = SSR/(p − 1)
Error               | SSE = Y′Y − b′X′Y         | n − p | MSE = SSE/(n − p)
Total               | SST = Y′Y − (1/n) Y′JY    | n − 1 |
F Test for Regression Relation
We set the following hypotheses to test whether there is a regression relation between the response variable Y and the set of X variables X1, X2, · · · , Xp−1:

H0 : β1 = β2 = · · · = βp−1 = 0
versus
HA : not all βk (k = 1, 2, · · · , p − 1) equal zero.

We use the test statistic

F∗ = MSR / MSE.

Decision Rule: Reject H0 at the α level of significance if

F∗ > F(1 − α; p − 1, n − p).
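A sketch of the overall F test on simulated data; scipy.stats supplies the F critical value and p-value.

# Sketch: overall F test for a regression relation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1.0, n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
SSE = Y @ Y - b @ X.T @ Y
SSR = b @ X.T @ Y - n * Y.mean() ** 2            # b'X'Y - (1/n) Y'JY
F_star = (SSR / (p - 1)) / (SSE / (n - p))       # MSR / MSE
print(F_star > stats.f.ppf(0.95, p - 1, n - p))  # reject H0 at alpha = 0.05?
print(stats.f.sf(F_star, p - 1, n - p))          # p-value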
Coefficient of Multiple Determination
The coefficient of multiple determination, denoted by R², is defined as

R² = SSR/SST = 1 − SSE/SST,

which ranges from 0 to 1. R² assumes the value 0 when all bk = 0 (k = 1, 2, · · · , p − 1), and the value 1 when all Y observations fall directly on the fitted regression surface, i.e., when Yi = Ŷi for all i.
The adjusted coefficient of multiple determination, denoted by Ra², adjusts R² by dividing each sum of squares by its associated degrees of freedom:

Ra² = 1 − [(n − 1)/(n − p)] (SSE/SST)
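A short sketch computing R² and Ra² from the sums of squares on simulated data.

# Sketch: R^2 and adjusted R^2 from SSE and SST.
import numpy as np

rng = np.random.default_rng(8)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1.0, n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
SSE = Y @ Y - b @ X.T @ Y
SST = Y @ Y - n * Y.mean() ** 2
R2 = 1 - SSE / SST
R2_adj = 1 - (n - 1) / (n - p) * SSE / SST
print(R2, R2_adj)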
Inferences about Regression Parameters
The least squares and maximum likelihood estimators in b are unbiased:

E{b} = β

The variance-covariance matrix of b (p×p) is

        [ σ²{b0}        σ{b0, b1}     · · ·  σ{b0, bp−1} ]
σ²{b} = [ σ{b1, b0}     σ²{b1}        · · ·  σ{b1, bp−1} ]
        [ ⋮             ⋮                    ⋮           ]
        [ σ{bp−1, b0}   σ{bp−1, b1}   · · ·  σ²{bp−1}    ]
In short:

σ²{b} = σ² (X′X)⁻¹
The estimated variance-covariance matrix of b,

        [ s²{b0}       s{b0, b1}    · · ·  s{b0, bp−1} ]
s²{b} = [ s{b1, b0}    s²{b1}       · · ·  s{b1, bp−1} ]
        [ ⋮            ⋮                   ⋮           ]
        [ s{bp−1, b0}  s{bp−1, b1}  · · ·  s²{bp−1}    ]

is given by:

s²{b} = MSE (X′X)⁻¹
Tests for βk
The null and alternative hypotheses are:

H0 : βk = 0
versus
HA : βk ≠ 0

The test statistic is:

t∗ = bk / s{bk}

Decision Rule: Reject H0 at the α level of significance if

|t∗| > t(1 − α/2; n − p).
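A sketch of the coefficient t tests on simulated data in which β2 is truly zero, so its test should usually fail to reject; the standard errors come from s²{b} = MSE (X′X)⁻¹.

# Sketch: t statistics for each coefficient against H0: beta_k = 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(0, 1.0, n)  # beta2 truly 0

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
MSE = (Y - X @ b) @ (Y - X @ b) / (n - p)
se = np.sqrt(MSE * np.diag(XtX_inv))     # s{bk}
t_star = b / se
reject = np.abs(t_star) > stats.t.ppf(1 - 0.05 / 2, n - p)
print(t_star, reject)                    # beta2's test should not reject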
Interval Estimation of βk
For the normal error general linear regression model,

(bk − βk) / s{bk} ∼ t(n − p) ; k = 0, 1, · · · , p − 1.

Hence, the 100(1 − α)% confidence interval is:

bk ± t(1 − α/2; n − p) s{bk}
Joint Inferences
If g parameters are to be estimated jointly (where g ≤ p), the
confidence limits with family confidence coefficient 1 − α are:
bk ± B s{bk }
where
B = t(1 − α/2g; n − p)
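A sketch contrasting individual 95% intervals with Bonferroni joint intervals for g = 2 coefficients on simulated data; the joint intervals simply use the wider multiplier B.

# Sketch: individual vs Bonferroni joint confidence intervals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
MSE = (Y - X @ b) @ (Y - X @ b) / (n - p)
se = np.sqrt(MSE * np.diag(XtX_inv))

alpha, g = 0.05, 2                       # joint intervals for beta1, beta2
t_single = stats.t.ppf(1 - alpha / 2, n - p)
B = stats.t.ppf(1 - alpha / (2 * g), n - p)
for k in (1, 2):
    print(k, (b[k] - t_single * se[k], b[k] + t_single * se[k]),
             (b[k] - B * se[k], b[k] + B * se[k]))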
Interval Estimation of E{Yh }
To estimate the mean response at Xh1, Xh2, · · · , Xh,p−1, let us define the vector:

Xh = (1, Xh1, Xh2, · · · , Xh,p−1)′

The mean response to be estimated is:

E{Yh} = X′h β

The estimated mean response corresponding to Xh is:

Ŷh = X′h b
The estimator is unbiased:

E{Ŷh} = X′h β = E{Yh}

and its variance is:

σ²{Ŷh} = σ² X′h (X′X)⁻¹ Xh

The variance of Ŷh can be expressed as a function of σ²{b}:

σ²{Ŷh} = X′h σ²{b} Xh
The estimated variance of Ŷh in matrix notation is

s²{Ŷh} = MSE X′h (X′X)⁻¹ Xh = X′h s²{b} Xh

The 1 − α confidence limits for E{Yh} are:

Ŷh ± t(1 − α/2; n − p) s{Ŷh}
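A sketch of the confidence limits for E{Yh} at a chosen (hypothetical) Xh, on simulated data.

# Sketch: point estimate and 95% confidence limits for the mean response.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
MSE = (Y - X @ b) @ (Y - X @ b) / (n - p)

Xh = np.array([1.0, 0.5, -0.5])          # (1, Xh1, Xh2), arbitrary point
Yh_hat = Xh @ b
s_Yh = np.sqrt(MSE * Xh @ XtX_inv @ Xh)  # s{Yh_hat}
t_val = stats.t.ppf(0.975, n - p)
print(Yh_hat - t_val * s_Yh, Yh_hat + t_val * s_Yh)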
Prediction for a New Observation Yh(new)
The 1 − α confidence limits for a new observation Yh(new) corresponding to Xh, the specified values of the X variables, are:

Ŷh ± t(1 − α/2; n − p) s{pred}

where:

s²{pred} = MSE [1 + X′h (X′X)⁻¹ Xh]
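A sketch of the prediction limits on simulated data; they differ from the mean-response limits only through the extra "1 +" term in s²{pred}.

# Sketch: 95% prediction limits for a new observation at Xh.
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
MSE = (Y - X @ b) @ (Y - X @ b) / (n - p)

Xh = np.array([1.0, 0.5, -0.5])                   # arbitrary point
s_pred = np.sqrt(MSE * (1 + Xh @ XtX_inv @ Xh))   # s{pred}
t_val = stats.t.ppf(0.975, n - p)
print(Xh @ b - t_val * s_pred, Xh @ b + t_val * s_pred)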
Prediction for g New Observations Yh(new)
Simultaneous Scheffé prediction limits for g new observations at g different levels Xh with family confidence coefficient 1 − α are given by:

Ŷh ± S s{pred}

where:

S² = g F(1 − α; g, n − p)
Alternatively, Bonferroni simultaneous prediction limits for g new observations at g different levels Xh with family confidence coefficient 1 − α are given by:

Ŷh ± B s{pred}

where:

B = t(1 − α/2g; n − p)
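A tiny sketch comparing the Scheffé and Bonferroni multipliers for g = 3 new observations; one would typically use whichever multiplier is smaller.

# Sketch: Scheffe multiplier S vs Bonferroni multiplier B.
from scipy import stats

alpha, g, n, p = 0.05, 3, 40, 3
S = (g * stats.f.ppf(1 - alpha, g, n - p)) ** 0.5
B = stats.t.ppf(1 - alpha / (2 * g), n - p)
print(S, B)   # multiply s{pred} by min(S, B) for the tighter limits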
Diagnostics and Remedial Measures
Most of the diagnostic procedures for simple linear regression that
we described carry over directly to multiple regression.
Box plots, sequence plots, stem-and-leaf plots, and dot plots for each of the predictor variables and for the response variable can provide helpful preliminary univariate information about these variables.
Scatter Plot Matrix: Scatter plots of the response variable against
each predictor variable can aid in determining the nature and
strength of the bivariate relationships between each of the
predictor variables and the response.
Residual Plots: A plot of the residuals against the fitted values is
useful for assessing the appropriateness of the multiple
regression function and the constancy of the variance of the error
terms, as well as providing information about outliers.
A plot of the residuals against time or against some other
sequence can provide diagnostic information about possible
correlation between the error terms in multiple regression.
Box plots and normal probability plots of the residuals are useful
for examining whether the error terms are reasonably normally
distributed.
Plots of residuals against each of the predictor variables can
provide further information about the adequacy of the regression
function with respect to that predictor variable.
Residuals should also be plotted against important predictor variables that were omitted from the model, to see whether the omitted variables have substantial additional effects on the response variable that have not yet been recognized in the regression model.
Residuals should be plotted against interaction terms to check whether potential interaction effects have been left out of the regression model.
A plot of the absolute residuals or the squared residuals against
the fitted values is useful for examining the constancy of the
variance of the error terms.
If nonconstancy is detected, a plot of the absolute residuals or the
squared residuals against each of the predictor variables may
identify one or several of the predictor variables to which the
magnitude of the error variability is related.
Breusch-Pagan Test for Constancy of Error Variance: The Breusch-Pagan test for constancy of the error variance in multiple regression is carried out in exactly the same way as for simple linear regression when the error variance increases or decreases with one of the predictor variables.
If the error variance is assumed to be related to q ≥ 1 predictor
variables, the chi-squared test statistic involves q degrees of
freedom.
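A sketch of the Breusch-Pagan calculation on simulated data with error variance growing in a single predictor (so q = 1): regress the squared residuals on the predictor and form the chi-squared statistic.

# Sketch: Breusch-Pagan test, X^2 = (SSR*/2) / (SSE/n)^2 with q df.
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)
n = 100
x = rng.uniform(1, 5, n)
Y = 1 + 2 * x + rng.normal(0, x)         # error sd grows with x

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, Y, rcond=None)[0]
e = Y - X @ b
SSE = e @ e

g = np.linalg.lstsq(X, e**2, rcond=None)[0]     # regress e^2 on x
SSR_star = g @ X.T @ (e**2) - n * (e**2).mean() ** 2
chi2 = (SSR_star / 2) / (SSE / n) ** 2
print(chi2, stats.chi2.ppf(0.95, df=1))         # q = 1 here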
Example
Exercise 6.5. Brand Preference. In a small-scale experimental study of the relation between degree of brand liking (Y) and moisture content (X1) and sweetness (X2) of the product, the following results were obtained from an experiment based on a completely randomized design (data are coded):

i:     1   2   3  · · ·  14  15   16
Xi1:   4   4   4  · · ·  10  10   10
Xi2:   2   4   2  · · ·   4   2    4
Yi:   64  73  61  · · ·  95  94  100
1. Obtain the scatter plot matrix and the correlation matrix. What information do these diagnostic aids provide here?
2. Fit regression model

Yi = β0 + β1 Xi1 + β2 Xi2 + εi

to the data. State the estimated regression function. How is b1 interpreted here?
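A sketch for this exercise, assuming the full 16-case data set has been saved to a CSV file; the file name and column names here are hypothetical, since the slide shows only the first and last few cases.

# Sketch: fit the brand preference model and produce the diagnostics.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("brand_preference.csv")        # hypothetical file; columns X1, X2, Y
X = np.column_stack([np.ones(len(df)), df["X1"], df["X2"]])
b = np.linalg.lstsq(X, df["Y"].to_numpy(), rcond=None)[0]
print(f"Y_hat = {b[0]:.3f} + {b[1]:.3f} X1 + {b[2]:.3f} X2")

# Part 1: scatter plot matrix and correlation matrix.
pd.plotting.scatter_matrix(df)                  # pairwise scatter plots
plt.show()
print(df.corr())                                # correlation matrix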