Partial and Semipartial Correlation Working With Residuals

Questions
• Give a concrete example (names of vbls, context) where it makes sense to compute a partial correlation. Why a partial rather than semipartial?
• Why is the squared semipartial always less than or equal to the squared partial?
• Give a concrete example where it makes sense to compute a semipartial correlation. Why semi rather than partial?
• Why is regression more closely related to semipartials than partials?
• How could you use ordinary regression to compute 3rd order partials?
Partial Correlation
• People differ in many ways. When one difference is correlated with an outcome, we cannot be sure the correlation is not spurious.
• We would like to hold third variables constant, but cannot manipulate them.
• Instead, we can use statistical control.
• Statistical control is based on residuals. If we regress X2 on X1 and take the residuals of X2, this part of X2 will be uncorrelated with X1, so anything the X2 residuals correlate with cannot be explained by X1.
Example of Partials
Use SAT to predict grades (HS & College Fresh)
HS=.8557+.0043*SAT; F=.9563+.0038*SAT.
Person   SAT-V   HSGPA   FGPA   PFGPA   E1 (HS)   E2 (F)
1        500     3.0     2.8    2.86    -0.01     -0.06
2        550     3.2     3.0    3.05    -0.02     -0.05
3        450     2.8     2.8    2.67     0.01      0.13
4        400     2.5     2.2    2.48    -0.08     -0.28
5        600     3.2     3.3    3.24    -0.24      0.06
6        650     3.8     3.3    3.43     0.15     -0.13
7        700     3.9     3.5    3.61     0.03     -0.12
8        550     3.8     3.7    3.05     0.58      0.65
9        650     3.5     3.4    3.43    -0.15     -0.03
10       550     3.1     2.9    3.05    -0.12     -0.15
PFGPA = predicted FGPA; E1 = residual of HSGPA on SAT; E2 = residual of FGPA on SAT.
R2 for HS = .76; R2 for F = .66 (fictional data).
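The two regressions and their residuals can be reproduced from the table. The following is a minimal sketch using NumPy; the function name `simple_regression` and the variable names are illustrative choices, not part of the original slides.

```python
import numpy as np

# Fictional data copied from the table above.
sat = np.array([500, 550, 450, 400, 600, 650, 700, 550, 650, 550], dtype=float)
hs = np.array([3.0, 3.2, 2.8, 2.5, 3.2, 3.8, 3.9, 3.8, 3.5, 3.1])
fg = np.array([2.8, 3.0, 2.8, 2.2, 3.3, 3.3, 3.5, 3.7, 3.4, 2.9])


def simple_regression(x, y):
    """Return (intercept, slope, residuals) for the least-squares line of y on x."""
    slope, intercept = np.polyfit(x, y, 1)  # coefficients come highest power first
    residuals = y - (intercept + slope * x)
    return intercept, slope, residuals


a_hs, b_hs, e1 = simple_regression(sat, hs)  # E1: HSGPA with SAT removed
a_f, b_f, e2 = simple_regression(sat, fg)    # E2: FGPA with SAT removed

print(f"HS = {a_hs:.4f} + {b_hs:.4f}*SAT")
print(f"F  = {a_f:.4f} + {b_f:.4f}*SAT")

# Least-squares residuals are uncorrelated with the predictor (up to float error).
print(np.corrcoef(sat, e1)[0, 1], np.corrcoef(sat, e2)[0, 1])
```

The near-zero correlations in the last line are the point of statistical control: whatever E1 and E2 still share cannot be attributed to SAT.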
Example Partials (2)
There are 2 sets of predicted values, one for each GPA; however, they correlate 1.0 with each other, so only one (P) is presented.

          SAT-V   HSGPA   FGPA    P       E1 (HS)   E2 (F)
SAT-V     1
HSGPA     .87     1
FGPA      .81     .92     1
P         1.00    .87     .81     1
E1 (HS)   .00     .50     .45     .00     1
E2 (F)    .00     .37     .58     .00     .74       1

High correlations: SAT-V, HSGPA, and FGPA all correlate highly with one another.
Note that P and SAT are perfectly correlated. P & SAT do not correlate with E1 or E2 (residuals).
The .74 between E1 and E2 is a partial correlation: the correlation between the residuals of the two GPAs, that is, the correlation between HSGPA and FGPA holding SAT constant.
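The partial correlation can be checked by correlating the two sets of residuals directly. This is a sketch under the assumption that the data are the ten fictional cases from the earlier table; `residualize` is a hypothetical helper name. Because it works from unrounded values, the result should land close to the .74 in the matrix rather than match it digit for digit.

```python
import numpy as np

# Fictional data from the SAT / GPA example.
sat = np.array([500, 550, 450, 400, 600, 650, 700, 550, 650, 550], dtype=float)
hs = np.array([3.0, 3.2, 2.8, 2.5, 3.2, 3.8, 3.9, 3.8, 3.5, 3.1])
fg = np.array([2.8, 3.0, 2.8, 2.2, 3.3, 3.3, 3.5, 3.7, 3.4, 2.9])


def residualize(y, x):
    """Residuals of y after a least-squares regression on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


e1 = residualize(hs, sat)  # HSGPA with SAT partialed out
e2 = residualize(fg, sat)  # FGPA with SAT partialed out

r_zero_order = np.corrcoef(hs, fg)[0, 1]  # raw HSGPA-FGPA correlation
r_partial = np.corrcoef(e1, e2)[0, 1]     # same correlation holding SAT constant

print(round(r_zero_order, 2), round(r_partial, 2))
```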
The Meaning of Partials
• The partial is the result of holding a third variable constant via residuals.
• It estimates what we would get if everyone had the same value of the 3rd variable, e.g., the corr b/t the 2 GPAs if all in the sample had an SAT of 500.
• Some examples of partials? Control for SES, prior experience, what else?
Computing Partials from Correlations
Although you compute partials via residuals, sometimes it is handy to compute them with correlations. Also, looking at the formulas is (could be?) informative.
Notation. The partial correlation is r12.3, where variable 3 is being partialed from the correlation between 1 and 2. In our example,
$$r_{12.3} = r_{(HSGPA)(FGPA).SATV} = r_{(E1)(E2)} = .74$$
$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{.92 - (.87)(.81)}{\sqrt{1 - .87^2}\,\sqrt{1 - .81^2}} = .74$$
The partial correlation can be a little or a lot bigger or smaller than the original.
The Order of a Partial
• If you partial 1 vbl out of a correlation, the resulting
partial is called a first order partial correlation.
• If you partial 2 vbls out of a correlation, the resulting
partial is called a second order partial correlation.
Can have 3rd, 4th, etc., order partials.
• Unpartialed (raw) correlations are called zero order
correlations because nothing is partialed out.
• Can use regression to find residuals and compute
partial correlations from the residuals, e.g. for r12.34,
regress 1 and 2 on both 3 and 4, then compute
correlation between 2 sets of residuals.
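The regression recipe for a second-order partial can be sketched as follows. The data here are simulated purely for illustration (the coefficients and seed are arbitrary assumptions), and `residualize` is a hypothetical helper. The same pattern extends to 3rd and higher orders by adding more control variables to the list.

```python
import numpy as np

# Simulated data: x3 and x4 are the variables to be partialed out.
rng = np.random.default_rng(0)
n = 200
x3 = rng.normal(size=n)
x4 = rng.normal(size=n)
x1 = 0.6 * x3 + 0.3 * x4 + rng.normal(size=n)
x2 = 0.5 * x3 - 0.4 * x4 + 0.5 * x1 + rng.normal(size=n)


def residualize(y, controls):
    """Residuals of y after regressing it on all control variables (with intercept)."""
    X = np.column_stack([np.ones(len(y))] + list(controls))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


# Regress 1 and 2 on both 3 and 4, then correlate the two sets of residuals.
e1 = residualize(x1, [x3, x4])
e2 = residualize(x2, [x3, x4])
r12_34 = np.corrcoef(e1, e2)[0, 1]

# Cross-check with the inverse of the full correlation matrix, which gives
# the partial correlation of each pair controlling for all other variables.
P = np.linalg.inv(np.corrcoef([x1, x2, x3, x4]))
r_check = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

print(round(r12_34, 3), round(r_check, 3))
```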
Partials from Multiple Correlation
We can compute squared partial correlations from various R2 values.
$$r_{12.3}^2 = \frac{R_{1.23}^2 - R_{1.3}^2}{1 - R_{1.3}^2}$$
$R_{1.23}^2$ is the R2 from the regression in which 1 is the DV and 2 and 3 are the IVs; $R_{1.3}^2$ is the R2 with 3 as the only IV.
Alternative (possibly friendlier) notation.
$$r_{Y1.2}^2 = \frac{R_{Y.12}^2 - R_{Y.2}^2}{1 - R_{Y.2}^2}$$
Squared Partials from R2 R R
Venn Diagrams r  1  R
2
Y .12
2
Y 1.2
2
Y .2
2
Y .2
Here we want the partial correlation
Between Y and X1 holding X2
constant.
2.
2
R y .12
Y
1.
Y
U Y:X1
U Y:X2
Sh ar ed X
Sh ar ed Y
X1
X2
X2
X1
2
R y .12 R y .2
2
2
1 - R y .2
3.
4.
Y
Y
X1
X2
X1
X2
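The R2 route to a squared partial can be checked against the GPA data. This is a sketch assuming Y = FGPA, X1 = HSGPA, and X2 = SAT-V; the helper `r_squared` is an illustrative name.

```python
import numpy as np

# Fictional data from the SAT / GPA example.
sat = np.array([500, 550, 450, 400, 600, 650, 700, 550, 650, 550], dtype=float)
hs = np.array([3.0, 3.2, 2.8, 2.5, 3.2, 3.8, 3.9, 3.8, 3.5, 3.1])
fg = np.array([2.8, 3.0, 2.8, 2.2, 3.3, 3.3, 3.5, 3.7, 3.4, 2.9])


def r_squared(y, *predictors):
    """R2 from a least-squares regression of y on the predictors (with intercept)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


r2_full = r_squared(fg, hs, sat)  # R2 for Y on X1 and X2
r2_reduced = r_squared(fg, sat)   # R2 for Y on X2 only

# Share of the variance in Y left over after X2 that X1 accounts for.
sq_partial = (r2_full - r2_reduced) / (1 - r2_reduced)
print(round(sq_partial, 3), round(np.sqrt(sq_partial), 2))
```

The square root should come out close to the partial correlation found from the residuals.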
Exercise – Find a Partial
                 1      2      3
1 ANX            1
2 Fam History    .20    1
3 DOC Visit      .35    .15    1
What is the correlation between trait anxiety and the
number of doctor visits controlling for family medical
history?
Find a partial
                 1      2      3
1 ANX            1
2 Fam History    .20    1
3 DOC Visit      .35    .15    1
$$r_{13.2} = \frac{r_{13} - r_{12} r_{32}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{32}^2}} = \frac{.35 - (.2)(.15)}{\sqrt{1 - .2^2}\,\sqrt{1 - .15^2}} = .33$$
Semipartial Correlation
With partial correlation, we find the correlation between X and Y holding Z constant for both X and Y. Sometimes we want to hold Z constant for just X or just Y. Because Z is held constant for only one of the two variables, the result is a semipartial correlation rather than a partial. With a semipartial, we find the residuals of X on Z or of Y on Z, but the other variable is the original, raw variable. We correlate one raw variable with one residual.
In our example, we found the correlation between E1
(HSGPA) and FGPA to be .45. This is the semipartial
correlation between HSGPA and FGPA holding SAT
constant for HSGPA only.
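A semipartial differs from a partial only in which variables are residualized. The sketch below reuses the fictional GPA data and a hypothetical `residualize` helper, residualizing one variable at a time. Since it starts from unrounded values, the results should land near the table entries rather than reproduce them exactly.

```python
import numpy as np

# Fictional data from the SAT / GPA example.
sat = np.array([500, 550, 450, 400, 600, 650, 700, 550, 650, 550], dtype=float)
hs = np.array([3.0, 3.2, 2.8, 2.5, 3.2, 3.8, 3.9, 3.8, 3.5, 3.1])
fg = np.array([2.8, 3.0, 2.8, 2.2, 3.3, 3.3, 3.5, 3.7, 3.4, 2.9])


def residualize(y, x):
    """Residuals of y after a least-squares regression on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


# SAT removed from HSGPA only; FGPA stays raw.
sr_hs_resid = np.corrcoef(residualize(hs, sat), fg)[0, 1]

# SAT removed from FGPA only; HSGPA stays raw.
sr_fg_resid = np.corrcoef(hs, residualize(fg, sat))[0, 1]

print(round(sr_hs_resid, 2), round(sr_fg_resid, 2))
```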
Semipartials from Correlations
Partial:
$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$$
Semipartial:
$$r_{1(2.3)} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{1 - r_{23}^2}} \quad \text{and} \quad r_{2(1.3)} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{1 - r_{13}^2}}$$
Note that r1(2.3) means the semipartial correlation between variables 1 and 2 where 3 is partialled only from 2. In our example:
$$r_{1(2.3)} = \frac{.92 - (.87)(.81)}{\sqrt{1 - .81^2}} = .37 \qquad r_{2(1.3)} = \frac{.92 - (.87)(.81)}{\sqrt{1 - .87^2}} = .44$$
Agrees with earlier results within rounding error.
Squared Semipartials from Multiple Correlations
Partial:
$$r_{Y1.2}^2 = \frac{R_{Y.12}^2 - R_{Y.2}^2}{1 - R_{Y.2}^2}$$
Semipartial:
$$r_{Y(1.2)}^2 = R_{Y.12}^2 - R_{Y.2}^2$$
Squared semipartial is an increment in R2.
[Figure: two Venn diagrams of Y, X1, and X2 with regions labeled U Y:X1, U Y:X2, Shared X, and Shared Y.]
$$r_{Y(1.2)}^2 = R_{Y.12}^2 - R_{Y.2}^2 = \frac{U_{Y:X1}}{1} = U_{Y:X1}$$
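The increment-in-R2 reading of the squared semipartial can be verified from a correlation matrix alone. The sketch below assumes the standard result that R2 equals r'R⁻¹r (r the predictor-criterion correlations, R the predictor intercorrelations). The helper `r2_from_corr` is a hypothetical name, and the matrix holds the rounded GPA correlations with Y = FGPA, X1 = HSGPA, X2 = SAT-V.

```python
import numpy as np

# Order: 0 = Y (FGPA), 1 = X1 (HSGPA), 2 = X2 (SAT-V)
R = np.array([
    [1.00, 0.92, 0.81],
    [0.92, 1.00, 0.87],
    [0.81, 0.87, 1.00],
])


def r2_from_corr(R, y, xs):
    """Squared multiple correlation of variable y on the variables in xs."""
    r_yx = R[xs, y]             # correlations of each predictor with y
    R_xx = R[np.ix_(xs, xs)]    # intercorrelations among the predictors
    return float(r_yx @ np.linalg.solve(R_xx, r_yx))


r2_full = r2_from_corr(R, 0, [1, 2])  # Y on X1 and X2
r2_reduced = r2_from_corr(R, 0, [2])  # Y on X2 only
sq_semipartial = r2_full - r2_reduced  # increment when X1 is added last

print(round(sq_semipartial, 3), round(np.sqrt(sq_semipartial), 2))
```

The square root should match the semipartial obtained from the correlation formula for the same three variables.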
Partial vs. Semipartial
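[Figure: two Venn diagrams of Y, X1, and X2, one labeled Partial and one labeled Semipartial. The Partial panel carries the ratio below.]
$$\frac{R_{Y.12}^2 - R_{Y.2}^2}{1 - R_{Y.2}^2}$$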
Why is the squared partial larger than the squared
semipartial? Look at the respective areas for Y.
Regression and Semipartial Correlation
• Regression is essentially about semipartials
• Each X is residualized on the other X
variables.
• For each X we add to the equation, we ask,
“What is the unique contribution of this X
above and beyond the others?” Increment in
R2 when added last.
• We do NOT residualize Y, just X.
• Semipartial because X is residualized but Y is
not.
• b is the slope of Y on X, holding the other X
variables constant.
Semipartial and Regression 2
Standardized regression coefficient:
$$\beta_{Y1.2} = \frac{r_{Y1} - r_{Y2} r_{12}}{1 - r_{12}^2}$$
Semipartial correlation:
$$r_{Y(1.2)} = \frac{r_{Y1} - r_{Y2} r_{12}}{\sqrt{1 - r_{12}^2}}$$
The difference is the square root in the denominator.
The regression coefficient can exceed 1.0 in absolute
value; the correlation cannot.
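The shared numerator and the differing denominators are easy to see side by side in code. This is a sketch: the function name is illustrative, and the second set of correlations (.70, .00, .60) is a made-up but admissible example chosen to show a beta weight above 1.0.

```python
import math


def beta_and_semipartial(ry1, ry2, r12):
    """Standardized weight for X1 and semipartial r_Y(1.2), two-predictor case."""
    numerator = ry1 - ry2 * r12
    beta = numerator / (1 - r12**2)             # no square root
    semipartial = numerator / math.sqrt(1 - r12**2)
    return beta, semipartial


# GPA example: Y = FGPA, X1 = HSGPA, X2 = SAT-V.
beta_gpa, sr_gpa = beta_and_semipartial(0.92, 0.81, 0.87)
print(round(beta_gpa, 2), round(sr_gpa, 2))

# Hypothetical correlations where the beta weight exceeds 1.0.
beta_big, sr_big = beta_and_semipartial(0.70, 0.00, 0.60)
print(round(beta_big, 2), round(sr_big, 2))
```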
Uses of Partial and Semipartial
• The partial correlation is most often used when some third variable Z is a plausible explanation of the correlation between X and Y.
– Job characteristics and job sat by NA
– Cog ability and grades by SES
• The semipartial is most often used when we want to show that some variable explains incremental variance in Y above and beyond the other X variables.
– Pilot performance and cog ability, motor skills
– Patient well being and surgery, social support
Review
• Give a concrete example (names of
vbls, context) where it makes sense to
compute a partial correlation. Why a
partial rather than semipartial?
• Give a concrete example where it
makes sense to compute a semipartial
correlation. Why semi rather than
partial?
Suppressor Effects
• Hard to understand, but
– Inspection of r not enough to tell value
– Need to know to avoid looking dumb
– Show problems with Venn diagrams
• Think of observed variable as
composite of different stuff, e.g.,
satisfaction with car (price, prestige,
etc.)
Suppressor Effects (2)
       Y      X1     X2
Y      1
X1     .50    1
X2     .00    .50    1
Note that X2 is correlated with X1 but NOT with Y. Will X2 be useful in a regression equation?
If we solve for beta weights, we find, beta1=.667 and
beta2 = -.333. Notice that the beta weight for the first is
actually larger than r (.50), and the second has become
negative. Can also happen that r is (usually slightly)
positive and beta is negative. This is a suppressor effect.
Always inspect your correlations along with your
regression weights to see if this is happening.
What does it mean that beta2 is negative? Sometimes people forget that
there are other X variables in the equation. “The results mean that we
should feed people more to get them to lose weight.”
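The beta weights quoted above follow from solving the normal equations in correlation form. A minimal sketch using NumPy follows; the variable names are illustrative.

```python
import numpy as np

# Correlations among the predictors X1 and X2.
R_xx = np.array([
    [1.0, 0.5],
    [0.5, 1.0],
])
# Correlations of X1 and X2 with Y.
r_xy = np.array([0.5, 0.0])

# Standardized weights solve R_xx @ betas = r_xy.
betas = np.linalg.solve(R_xx, r_xy)
r_squared = float(betas @ r_xy)

print(np.round(betas, 3), round(r_squared, 3))
```

Even though X2 correlates .00 with Y, adding it raises R2 above the .25 that X1 achieves alone, which is why the zero-order correlations are not enough to judge a variable's value.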
Suppressor Effects (3)
• Can also happen in path analysis, CSM.
• Explanation – X2 is a measure of prediction
error in X1. If we subtract X2, will have a
more useful measure of X1. X2 ‘suppresses’
the correlation of Y and X1.
• Inspection of correlation matrix not sufficient
to see value of variables.
• Looking dumb.
• Venn diagram.
Review
• Why is the squared semipartial always less than or equal to the squared partial?
• Why is regression more closely related to semipartials than partials?
• How could you use ordinary regression to compute 3rd order partials?
Exercise – Find a Semipartial
       Y      X1     X2
Y      1
X1     .20    1
X2     .30    .40    1
$$r_{Y(1.2)} = \;?$$
What is the correlation between Y and X1 holding X2 constant only for X1?
Find a Semipartial
       Y      X1     X2
Y      1
X1     .20    1
X2     .30    .40    1
$$r_{Y(1.2)} = \frac{r_{Y1} - r_{Y2} r_{12}}{\sqrt{1 - r_{12}^2}} = \frac{.20 - (.30)(.40)}{\sqrt{1 - .40^2}} = .087$$
The correlation of X1 with Y after controlling for X2 (from X1 only) is rather small.
Computer Exercise
• Go to labs and download 2IV Example.
• Find the partial correlation between hassles
and well being holding gender and anger
constant (2nd order partial).
• Find the squared semipartial for anger when
well being is the DV and gender and hassles
are the other IVs, that is, find the increment in
R-square when anger is added to the equation
after gender and hassles.