Regression Analysis

Beasley
BST 622 Assignment 2 – 200 points
Spring 2015
1. Based on Graveley and Littlefield (1992), a researcher conducted a study to determine the cost of J = 3
prenatal clinical staffing models: (j = 1) Physician-based; (j = 2) Mixed (M.D., R.N.) staffing; and (j = 3)
Clinical Nurse Specialist with physician available for consultation. The subjects were women, 18 years of age
or older, who obtained prenatal care at one of the three facilities, and who had delivered within 48 hours of the
interview. The cost was defined as the amount of money billed over and above the amount covered by the
patient’s health insurance (AMNT). These values were also converted to ranks (RAMNT). A set of
dummy-coded indicator variables (Xj) represents group membership.
The data are in file BST622-ASSN2-AMNT.xls.
Use SPSS (Analyze-Regression-Linear), SAS (PROC REG), or JMP (Fit Model) to compute the
regression solution (Y = b0 + b1X1 + b2X2) for regressing AMNT and RAMNT (Dependent, Response, Y
variable) onto the X variables (Independent, Factor, X variables).
(A). What is the Model R²? (2 points)
AMNT R² = _______
RAMNT R² = _______
(B). What are the test results of the Regression Models? (4 points)
AMNT F-ratio = _______ p-value = _______
RAMNT F-ratio = _______ p-value = _______
(C). For AMNT, what are the values of
b0? ________ (1 point)
b1? ________ (1 point)
b2? ________ (1 point)
(D). For RAMNT, what are the values of
b0? ________ (1 point)
b1? ________ (1 point)
b2? ________ (1 point)
(E). What is the predicted value for X1 = 1 (X2 = 0)? (2 points)
AMNT _______ RAMNT _______
(F). What is the predicted value for X2 = 1 (X1 = 0)? (2 points)
AMNT _______ RAMNT _______
(G). For AMNT, how would you interpret the regression intercept (b0)? (2 points)
(H). For AMNT, how would you interpret the regression slopes (b1 and b2)? (3 points)
(I). For AMNT, express in symbolic notation the OMNIBUS NULL HYPOTHESIS of the
REGRESSION MODEL in terms of SLOPES. (2 points)
(J). What does the null hypothesis in (1I) above mean in words? (2 points)
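As a check on software output, the sketch below (Python, standard library only) fits the dummy-coded regression by ordinary least squares on made-up data. It assumes Group 3 is the reference category, i.e., X1 and X2 flag Groups 1 and 2; check the coding in the data file. Under that assumption b0 equals the Group 3 mean and each slope equals the difference between that group's mean and the Group 3 mean. The values are hypothetical, not the AMNT data.

```python
from statistics import mean


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical outcome values (NOT the BST622 data); Group 3 is the assumed reference
groups = {1: [10.0, 12.0, 14.0], 2: [20.0, 22.0], 3: [5.0, 7.0, 9.0, 11.0]}

X, y = [], []
for g, values in groups.items():
    for v in values:
        # Columns: intercept, X1 (1 if Group 1), X2 (1 if Group 2)
        X.append([1.0, float(g == 1), float(g == 2)])
        y.append(v)

b0, b1, b2 = ols(X, y)
group_means = {g: mean(v) for g, v in groups.items()}
print("b0 =", round(b0, 4), " Group 3 mean =", group_means[3])
print("b1 =", round(b1, 4), " Group 1 - Group 3 =", group_means[1] - group_means[3])
print("b2 =", round(b2, 4), " Group 2 - Group 3 =", group_means[2] - group_means[3])
```

Predicted values follow directly from this coding: with X1 = 1 and X2 = 0 the prediction is b0 + b1, which is the Group 1 mean.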
2. Perform a standard J=3 one-way Analysis of Variance (ANOVA) with AMNT as the dependent variable.
SPSS: Use Analyze-General Linear Model-Univariate and
Select AMNT as the Dependent Variable and Group as the Fixed Factor.
Select Options: Descriptive statistics, Estimates of Effect Size, Observed Power, Parameter Estimates, and Residual Plot.
Select Post Hoc, move Group to the Post Hoc Tests for box, and select LSD, Tukey, Sidak, and Bonferroni.
JMP: Change the X variable to be Nominal, then use Analyze-Fit Y by X.
Under the Oneway Analysis banner, select the Means/Anova/Pooled t, Means and Std Dev, and
Compare Means – All Pairs, Tukey HSD options.
SAS:
proc glm data=amnt;
  class group;
  model amnt = group;
  means group / lsd;
  means group / tukey cldiff;
  means group / sidak;
  means group / bon;
run;
(A). Complete the ANOVA Source Table with SS, df, MS, F, and p-value for AMNT. (4 points)
Source     SS          df     MS        F       p-value
Between    ________    ___    _____     ____    _____
Within     ________    ___    _____
___________________________________________________________________________
Total      ________    ___
(B). For the analysis reported in 2A, express the OMNIBUS NULL HYPOTHESIS of the
ONE-WAY ANOVA in terms of MEANS. (2 points)
(C). Explain how the H0 in (2B) and the H0 in (1I) are equivalent. (3 points)
(D). What is the Model R² for the one-way ANOVA reported in 2A? R² = _______ (2 points)
(E). Explain how the R² value in 2D relates to the R² value for AMNT in 1A. (3 points)
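One way to see the connection asked about in 2E: in a one-way ANOVA, R² = SS_Between/SS_Total, and the overall F-ratio can be rewritten as F = [R²/(J − 1)] / [(1 − R²)/(N − J)], which is the same formula behind the overall regression F in 1B. A minimal sketch with made-up data (not the AMNT file):

```python
from statistics import mean


def one_way_anova(groups):
    """Source-table quantities for a one-way ANOVA (groups = list of lists)."""
    all_y = [y for g in groups for y in g]
    grand = mean(all_y)
    N, J = len(all_y), len(groups)
    ss_b = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_w = sum((y - mean(g)) ** 2 for g in groups for y in g)
    ss_t = sum((y - grand) ** 2 for y in all_y)
    df_b, df_w = J - 1, N - J
    f = (ss_b / df_b) / (ss_w / df_w)
    return {"SSB": ss_b, "SSW": ss_w, "SST": ss_t,
            "dfB": df_b, "dfW": df_w, "F": f, "R2": ss_b / ss_t}


table = one_way_anova([[10.0, 12.0, 14.0], [20.0, 22.0], [5.0, 7.0, 9.0, 11.0]])
r2, dfb, dfw = table["R2"], table["dfB"], table["dfW"]
# The same F recovered from R^2 alone, as a regression program would report it
f_from_r2 = (r2 / dfb) / ((1 - r2) / dfw)
print(table)
print("F from R^2:", f_from_r2)
```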
(F). Report the Mean Difference and Tukey HSD Simultaneous Confidence Interval for each pairwise
comparison. (5 points)
Comparison   Mean Diff   Lower Bound   Upper Bound
1 vs 2
1 vs 3
2 vs 3
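The Tukey HSD intervals in the output have the form (Ȳi − Ȳj) ± q·sqrt[(MS_Within/2)(1/ni + 1/nj)], where q is the upper-α quantile of the studentized range distribution for J groups and N − J error df (with equal n this reduces to q·sqrt(MS_Within/n)). The sketch below takes q as an input because the Python standard library has no studentized range quantile; in practice it would come from a table or from software such as scipy.stats.studentized_range. The q used here is a placeholder, not a tabled value, and the other inputs are made up.

```python
import math


def tukey_interval(mean_i, mean_j, n_i, n_j, ms_within, q_crit):
    """Tukey(-Kramer) simultaneous CI for mu_i - mu_j, given the critical q."""
    diff = mean_i - mean_j
    half_width = q_crit * math.sqrt(ms_within / 2 * (1 / n_i + 1 / n_j))
    return diff, diff - half_width, diff + half_width


# Placeholder inputs for illustration only (not AMNT results; q_crit is NOT a tabled value)
diff, lower, upper = tukey_interval(12.0, 8.0, 3, 3, ms_within=5.0, q_crit=3.0)
print(f"Mean Diff = {diff:.3f}, CI = ({lower:.3f}, {upper:.3f})")
```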
3. Answer the following Power Analysis Questions. Use SAS PROC POWER code.
For the omnibus F test:
proc power;
  onewayanova test=overall
    alpha = 0.05
    groupmeans = 760.100937 | 832 | 763
    stddev = 209.2012
    ntotal = .
    power = .80;
run;
For contrasts:
proc power;
  onewayanova test=contrast
    alpha = 0.05
    contrast = (1 -1 0)
    groupmeans = 760.100937 | 832 | 763
    stddev = 209.2012
    ntotal = .
    power = .80;
run;
(A). Holding AMNT results (Means and SDs) constant and assuming equal sample sizes, what would a future
Total Sample Size (N) need to be for the omnibus F-test to have 80% Power (1 − β = 0.80)
at α = 0.05?
(3 points)
(B). Holding AMNT results (Means and SDs) constant and assuming equal sample sizes, what would a future
Total Sample Size (N) need to be for the MAXIMUM PAIRWISE DIFFERENCE to have 80% Power
(1 − β = 0.80) at a two-tailed α = 0.05 for Fisher’s LSD (no adjustment for multiple testing)?
(3 points)
(C). Holding AMNT results (Means and SDs) constant and assuming equal sample sizes, what would a future
Total Sample Size (N) need to be for the MAXIMUM PAIRWISE DIFFERENCE to have 80% Power
(1 − β = 0.80) at a two-tailed α = 0.05 after Bonferroni adjustment for multiple testing?
(3 points)
Bonferroni adjusted α = __________
(D). Holding AMNT results (Means and SDs) constant and assuming equal sample sizes, what would a future
Total Sample Size (N) need to be for the MINIMUM PAIRWISE DIFFERENCE to have 80% Power
(1 − β = 0.80) at a two-tailed α = 0.05 after Bonferroni adjustment for multiple testing?
(3 points)
Bonferroni adjusted α = __________
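PROC POWER computes these sample sizes from the exact noncentral F and t distributions. As a rough hand check, a large-sample normal approximation gives the per-group n for a two-sided contrast test with coefficients c_j as n ≈ (z_{1−α/2} + z_{1−β})² σ² Σc_j² / ψ², where ψ = Σ c_j μ_j, and the total is N = J·n. The sketch applies this to the AMNT means and SD from the PROC POWER code above, using α/3 as the Bonferroni level for the three pairwise comparisons. Because it ignores the heavier tails of the t distribution, it should land at or slightly below the PROC POWER answer, so treat it only as a sanity check.

```python
import math
from statistics import NormalDist


def approx_total_n(means, sd, contrast, alpha=0.05, power=0.80):
    """Normal-approximation total N (equal n per group) for a two-sided contrast test."""
    psi = sum(c * m for c, m in zip(contrast, means))      # true contrast value
    z = NormalDist()                                        # standard normal
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    n_per_group = z_sum ** 2 * sd ** 2 * sum(c * c for c in contrast) / psi ** 2
    return len(means) * math.ceil(n_per_group)


means = [760.100937, 832, 763]   # AMNT group means used in the PROC POWER code
sd = 209.2012
pairs = {"1 vs 2": (1, -1, 0), "1 vs 3": (1, 0, -1), "2 vs 3": (0, 1, -1)}
for label, c in pairs.items():
    unadjusted = approx_total_n(means, sd, c)
    bonferroni = approx_total_n(means, sd, c, alpha=0.05 / len(pairs))
    print(label, "N (LSD) ~", unadjusted, " N (Bonferroni) ~", bonferroni)
```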
4. Perform a standard J=3 one-way Analysis of Variance (ANOVA) with RAMNT as the dependent variable.
(A). What is the Model R²? R² = _______ (2 points)
(B). Report the Mean Difference and Tukey HSD Simultaneous Confidence Interval for each pairwise
comparison. (5 points)
Comparison   Mean Diff   Lower Bound   Upper Bound
1 vs 2
1 vs 3
2 vs 3
5. Nonparametric Alternative
JMP: Change the X variable to be Nominal, then use Analyze-Fit Y by X.
Under the Oneway Analysis banner, select the Nonparametric Wilcoxon Test.
SPSS: Use Analyze-Nonparametric Tests-K Independent Samples. Place AMNT and RAMNT in the Test
Variable List and Group as the Grouping Variable, and define the range from 1 to 3. Select the Descriptives option.
SAS:
PROC NPAR1WAY;
  CLASS Group;
  VAR AMNT RAMNT;
RUN;
(A). What are the results of the Kruskal-Wallis Test? (2 points)
AMNT χ² = _______ p-value = _______
RAMNT χ² = _______ p-value = _______
(B). Divide the χ² value by (N − 1): χ²/(N − 1) = ____________ (2 points)
(C). How does the value from 5B relate to the R² values from 4A and 1A for RAMNT? (3 points)
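The relationship asked about in 5C can be checked numerically. With no ties, the Kruskal-Wallis statistic H = [12/(N(N + 1))]·Σ R_j²/n_j − 3(N + 1) is algebraically identical to (N − 1)·SS_Between/SS_Total from a one-way ANOVA on the ranks; with ties, the tie-corrected H that statistical packages typically report keeps the same identity when average ranks are used. The sketch below uses a small made-up data set without ties.

```python
from statistics import mean


def average_ranks(values):
    """Ranks 1..N, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def split(flat, sizes):
    """Cut a flat list back into consecutive groups of the given sizes."""
    out, start = [], 0
    for n in sizes:
        out.append(flat[start:start + n])
        start += n
    return out


# Hypothetical data with no ties (NOT the AMNT file)
groups = [[3.1, 4.7, 2.2], [5.5, 6.8, 7.9, 6.1], [1.0, 2.9]]
N = sum(len(g) for g in groups)
rank_groups = split(average_ranks([y for g in groups for y in g]),
                    [len(g) for g in groups])

# Kruskal-Wallis H (textbook form, valid when there are no ties)
H = 12 / (N * (N + 1)) * sum(sum(r) ** 2 / len(r) for r in rank_groups) - 3 * (N + 1)

# One-way ANOVA R^2 computed on the ranks
grand = mean(r for g in rank_groups for r in g)
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in rank_groups)
ss_total = sum((r - grand) ** 2 for g in rank_groups for r in g)
r2_ranks = ss_between / ss_total

print("H =", round(H, 4), " (N - 1) * R^2 =", round((N - 1) * r2_ranks, 4))
```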
6. Do you think there is a causal relationship between these variables? Explain.
(2 points)
7. Write a brief Results section that explains which Staffing Models are significantly different and whether any
should be preferred over the others. The Results section should usually report inferential tests and effect
magnitudes in the text, while descriptive statistics should be reported in a table.
(7 points)
8. In a replication of Schwartz & Bronikowski (2013), a researcher randomly assigns mice from two strains to two
laboratory conditions. One strain typically lives at lower-elevation lakeshore sites and is denoted as fast-living (L-fast); these mice have earlier maturation at larger body size, a higher reproductive rate, and decreased
longevity relative to the slow-living strain (M-slow), which lives in higher-elevation mountain meadows. On the
day of the experiment, Control animals were moved to a 27°C incubator (CONT27), and treatment animals
were put under a “Heat Stress” condition in an incubator set at 37°C, where they were maintained for
2 hours (HEAT37). Subsequently, the liver gene expression of GPX1 was measured for all mice. GPX1 codes
for glutathione peroxidase, one of the most important antioxidant enzymes in humans.
The data are in BST622-Assn2-Mice2x2.xls.
(A). Perform a 2 x 2 (Heat by Strain) two-factor ANOVA.
JMP: Use Analyze-Fit Model.
Select GPX1 as the Y variable. Select HTCond and Strain as Model Effects by clicking Add.
To create an Interaction term, select HTCond in the Select Columns variable list, then
select Strain in the Model Effects list, then click Cross.
SAS:
proc glm data=mice2x2;
  class Strain HTCond;
  model GPX1 = HTCond Strain HTCond*Strain;
  means HTCond*Strain;
run;
Complete the Source Table. (8 points)
Source          df     SS      F       p-value
Heat            ___    ____    ____    ______
Strain          ___    ____    ____    ______
Heat x Strain   ___    ____    ____    ______
Within          ___    ____
Total           ___    ____
(B). Perform a one-way ANOVA on the J = 4 groups. Report the following Source Table.
Note: The SAS code below can be used to answer Questions 8B, 8C, and 9A-9C.
JMP: Use Analyze-Fit Model.
Select GPX1 as the Y variable. Select Group or Grp as a Model Effect by clicking Add.
After clicking Run Model, under the Group banner, select the LS Means Tukey HSD option.
Select LS Means Contrast and create the 3 contrasts given in the SAS code.
Use the Plus and Minus keys to create each contrast.
SAS:
PROC GLM data=mice2x2;
  class GRP;
  model GPX1 = GRP;
  means GRP / tukey cldiff;
  contrast 'Heat Main'   GRP 1 -1 1 -1;
  contrast 'Strain Main' GRP 1 1 -1 -1;
  contrast 'Interaction' GRP 1 -1 -1 1;
run;
Source     df     SS      F       p-value
Between    ___    ____    ____    ______
Within     ___    ____
Total      ___    ____
(4 points)
(C). Perform all pairwise comparisons using Tukey’s HSD Simultaneous Confidence Intervals. (5 points)
Comparison            Mean Diff   Lower Bound   Upper Bound
Lfast-C vs Lfast-H
Lfast-C vs Mslow-C
Lfast-C vs Mslow-H
Lfast-H vs Mslow-C
Lfast-H vs Mslow-H
Mslow-C vs Mslow-H
(D). How does the interpretation of the results of the one-way ANOVA with pairwise comparisons differ
from that of the two-way ANOVA with analyses of main effects and interaction? (3 points)
9. Using Contrast statements:
(A). What is the Null Hypothesis for the main effect of Heat? What is the result? (3 points)
H0: ___ μLfastC ___ μLfastH ___ μMslowC ___ μMslowH = 0
t or F = ______   p = _____
(B). What is the Null Hypothesis for the main effect of Strain? (3 points)
H0: ___ μLfastC ___ μLfastH ___ μMslowC ___ μMslowH = 0
t or F = ______   p = _____
(C). What is the Null Hypothesis for the Heat x Strain interaction effect? (3 points)
(Hint: Interactions are multiplicative effects of the main effects.)
H0: ___ μLfastC ___ μLfastH ___ μMslowC ___ μMslowH = 0
t or F = ______   p = _____
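For reference, each single-df contrast F-ratio is F = SS_ψ/MS_Within, with ψ̂ = Σ c_j Ȳj and SS_ψ = ψ̂²/Σ(c_j²/n_j); the corresponding t statistic satisfies t² = F. The sketch assumes the cell order LfastC, LfastH, MslowC, MslowH used by the CONTRAST statements in 8B and uses made-up values, not the GPX1 data. With equal cell sizes the three orthogonal contrasts split SS_Between exactly, which is why they reproduce the two-way ANOVA main-effect and interaction sums of squares.

```python
from statistics import mean

# Coefficients as in the CONTRAST statements (cell order LfastC, LfastH, MslowC, MslowH)
CONTRASTS = {
    "Heat Main": (1, -1, 1, -1),
    "Strain Main": (1, 1, -1, -1),
    "Interaction": (1, -1, -1, 1),
}


def contrast_tests(cells):
    """Single-df contrast estimates, sums of squares, and F-ratios."""
    means = [mean(c) for c in cells]
    ns = [len(c) for c in cells]
    df_within = sum(ns) - len(cells)
    ms_within = sum((y - m) ** 2 for c, m in zip(cells, means) for y in c) / df_within
    results = {}
    for name, coef in CONTRASTS.items():
        psi = sum(a * m for a, m in zip(coef, means))
        ss = psi ** 2 / sum(a * a / n for a, n in zip(coef, ns))
        results[name] = {"psi": psi, "SS": ss, "F": ss / ms_within}
    return results


# Hypothetical GPX1-like values, 3 mice per cell (NOT the assignment data)
cells = [[4.0, 5.0, 6.0], [13.0, 15.0, 17.0], [6.0, 7.0, 8.0], [5.0, 6.0, 7.0]]
results = contrast_tests(cells)

grand = mean(y for c in cells for y in c)
ss_between = sum(len(c) * (mean(c) - grand) ** 2 for c in cells)
for name, r in results.items():
    print(name, r)
print("Sum of contrast SS:", sum(r["SS"] for r in results.values()), " SS_Between:", ss_between)
```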
10. Use PROC POWER onewayanova code with contrasts to answer the following questions.
Example SAS code:
proc power;
  onewayanova test=contrast
    alpha = 0.05
    contrast = (1 1 -1 -1)
    groupmeans = 4.7365 | 14.3382 | 7.2235 | 6.0655
    stddev = 6.352795
    ntotal = .
    power = .80;
run;
(A). Holding these results (Means and SDs) constant, what would a future Total Sample Size (N) need to be
for the test of the Heat Main Effect to have 80% Power (1 − β = 0.80) at a two-tailed α = 0.05? (3 points)
(B). Holding these results (Means and SDs) constant, what would a future Total Sample Size (N) need to be
for the test of the Strain Main Effect to have 80% Power (1 − β = 0.80) at a two-tailed α = 0.05? (3 points)
(C). Holding these results (Means and SDs) constant, what would a future Total Sample Size (N) need to be
for the test of the Heat by Strain Interaction Effect to have 80% Power (1 − β = 0.80) at a two-tailed α = 0.05?
(3 points)
11. Write an interpretation of the results of the two-way ANOVA in the form of an “expanded” Results
Section.
(7 points)
12. This analysis can also be completed by regressing GPX1 onto contrast variables that represent the effects
(CH = Heat, CS = Strain, CI = Heat x Strain interaction).
Use SPSS (Analyze-Regression-Linear), SAS (PROC REG), or JMP (Fit Model) to compute the
regression solution (E(Y) = b0 + b1CH + b2CS + b3CI) for regressing GPX1 (Dependent, Response, Y
variable) onto the Contrast Variables (Independent, Factor, X variables).
(A). What are the results of the Regression Model? (3 points)
R² = _______   F = _______   p = _______
(B). What is the value of b0? ________ (1 point)
(C). What are the results for each slope?
b1? ________ (1 point)   t-test _______ (1 point)   p-value _______ (1 point)
b2? ________ (1 point)   t-test _______ (1 point)   p-value _______ (1 point)
b3? ________ (1 point)   t-test _______ (1 point)   p-value _______ (1 point)
(D). How do the results in 12A-C compare to the previous analyses in 9A-C? (3 points)
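A quick way to see the link between 12B-C and 9A-C: if each cell has the same n and CH, CS, and CI carry the same ±1 coefficients as the CONTRAST statements in 8B (an assumption; check the coding in the data file), the least-squares intercept is the mean of the four cell means and each slope is b = ψ̂/Σc² = ψ̂/4. Each slope's t-test then tests the same hypothesis as the corresponding contrast, with t² equal to the contrast F. The cell means below are made up.

```python
from statistics import mean

# Hypothetical cell means in the order LfastC, LfastH, MslowC, MslowH (NOT the GPX1 data)
cell_means = [5.0, 15.0, 7.0, 6.0]
codes = {"CH": (1, -1, 1, -1), "CS": (1, 1, -1, -1), "CI": (1, -1, -1, 1)}

b0 = mean(cell_means)
slopes = {name: sum(c * m for c, m in zip(code, cell_means)) / sum(c * c for c in code)
          for name, code in codes.items()}

# The model with all three contrast variables reproduces every cell mean
fitted = [b0 + sum(slopes[name] * codes[name][j] for name in codes)
          for j in range(len(cell_means))]
print("b0 =", b0, slopes)
print("fitted cell means:", fitted)
```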
13. A researcher bases a pilot study on Caslake et al. (2008, Am. J. of Clin. Nutrition, 88(3): 618-629) to
investigate the effects of genotype on cardiovascular biomarker response to fish oils. Eighty African-American
adults, aged 30–45 years, were prospectively recruited according to age, sex, and APOE genotype. Half of the
participants were randomly assigned to ingest three 700 mg EPA+DHA/d (700FO) capsules per day for an 8-week intervention period. The other subjects consumed control oil capsules on the same regimen. The
hypotheses of main interest were whether changes in HDL levels (HDL_DIFF) were affected by Treatment
(700FO vs Control) and APOE genotype, and whether the Treatment effect differed across APOE genotypes
(Treatment x Genotype interaction). The data are in file: BST622-Assn2-FOHDL2012.xls.
(A). Perform a 2 x 3 (Treatment by Genotype) two-factor ANOVA. (8 points)
Source         SS     df     F       p-value
Treatment      ___    ____   ____    _____
APOE           ___    ____   ____    _____
Treat x APOE   ___    ____   ____    _____
Within         ___    ____
Total          ___    ____
(B). Conduct Follow-up Analyses. (7 points)
i). If only main effects are statistically significant, conduct the appropriate follow-up tests.
ii). If the interaction is statistically significant, perform a Simple Main Effects analysis as a follow-up.
proc glm;
  class treat APOE;
  model HDL_DIFF = treat APOE treat*APOE;
  means treat*APOE;
  lsmeans APOE / adjust=tukey cl pdiff tdiff;
  lsmeans treat*APOE / slice=APOE;
run;
(C). Write a brief interpretation of these results in the form of an “expanded” Results Section
(7 points)
14. Perform a one-way ANOVA on the J = 6 groups, where j = 1 (Control-E2); j = 2 (Control-E3);
j = 3 (Control-E4); j = 4 (Treatment-E2); j = 5 (Treatment-E3); and j = 6 (Treatment-E4).
proc glm;
  class group;
  model HDL_DIFF = group;
  contrast 'contrast 1' group 1 1 1 -1 -1 -1;
  contrast 'contrast 2' group 1 0 -1 1 0 -1,
                        group -1 2 -1 -1 2 -1;
  contrast 'contrast 3' group 1 -1 0 -1 1 0,
                        group 0 -1 1 0 1 -1;
  contrast 'contrast 4' group 1 0 0 -1 0 0;
run;
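Before interpreting 14B-14H it can help to confirm what the CONTRAST rows encode. The sketch below (group order j = 1 to 6 as listed above) checks that every row sums to zero and that rows belonging to different CONTRAST statements are mutually orthogonal. With equal cell sizes, contrasts 1, 2, and 3 therefore partition SS_Between into pieces with 1, 2, and 2 df, the same df as the Treatment, APOE, and Treatment x APOE rows of 13A. Contrast 4 simply compares j = 1 with j = 4 (Control-E2 vs Treatment-E2).

```python
from itertools import combinations

# Rows copied from the CONTRAST statements; group order C-E2, C-E3, C-E4, T-E2, T-E3, T-E4
effects = {
    "contrast 1": [(1, 1, 1, -1, -1, -1)],
    "contrast 2": [(1, 0, -1, 1, 0, -1), (-1, 2, -1, -1, 2, -1)],
    "contrast 3": [(1, -1, 0, -1, 1, 0), (0, -1, 1, 0, 1, -1)],
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Every row is a contrast: coefficients sum to zero
rows_sum_to_zero = all(sum(row) == 0 for rows in effects.values() for row in rows)

# Rows from different effects are orthogonal (their SS do not overlap when cell n's are equal)
cross_orthogonal = all(
    dot(r1, r2) == 0
    for (_, rows1), (_, rows2) in combinations(effects.items(), 2)
    for r1 in rows1
    for r2 in rows2
)
print("rows sum to zero:", rows_sum_to_zero, " cross-effect orthogonal:", cross_orthogonal)
```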
(A). Report the following Source Table for the One-Way ANOVA. (4 points)
Source          df     SS      F       p-value
Between Group   ___    ____    ____    ______
Within Group    ___    ____
Total           ___    ____
(B). What are the results for Contrast 1?
F-ratio ______   p-value ______ (2 points)
(C). Explain how the result in 14B relates to the results in 13A.
(D). What are the results for Contrast 2?
F-ratio ______ (2 points)   p-value ______ (2 points)
(E). Explain how the result in 14D relates to the results in 13A.
(F). What are the results for Contrast 3?
F-ratio ______ (2 points)   p-value ______ (2 points)
(G). Explain how the result in 14F relates to the results in 13A.
(H). What are the results for Contrast 4?
F-ratio ______ (2 points)   p-value ______ (2 points)
(I). Explain how the result in 14H relates to the results in 13B. (2 points)
(J). What is the null hypothesis for the contrast testing the Simple Main Effect of Treatment (CON vs 700FO) at
APOE = E4? (2 points)
H0: ___ μC.E2 ___ μC.E3 ___ μC.E4 ___ μT.E2 ___ μT.E3 ___ μT.E4 = 0
15. In the data set BST622-Assn2-FOHDL2012.xls,
Tx is an effect coding scheme: Control group (CON) is coded -1 and Treatment group (700FO) is coded +1.
AP_add is a linear polynomial code that captures the additive genetic effect of the APOE marker:
E2 is coded -1; E3 is coded 0; E4 is coded +1.
AP_dom is a quadratic polynomial code that captures the non-additive (dominant) genetic effect of the
APOE marker: E2 is coded -0.5; E3 is coded 1; E4 is coded +0.5.
GxTAdd is a cross-product of Tx and AP_add that represents the interaction of Treatment with the additive
genetic effect.
GxTDom is a cross-product of Tx and AP_dom that represents the interaction of Treatment with the non-additive genetic effect.
Run the following SAS PROC REG code and note the similarity to the earlier analyses.
proc reg data=bsthdl2;
  model HDL_DIFF = Tx AP_add AP_dom GxTAdd GxTDom / scorr2 tol;
  TRT: test Tx = 0;
  GENE: test AP_add = AP_dom = 0;
  TxG: test GxTAdd = GxTDom = 0;
run;
(A). Report the following ANOVA Source Table. (4 points)
Source               df     SS      F       p-value
Model (Regression)   ___    ____    ____    ______
Error (Residual)     ___    ____
Total                ___    ____
(B). What are the results for the TRT test statement?
F-ratio ______   p-value ______ (2 points)
(C). Explain how the result in 15B relates to the results in 14B and 13A.
(D). What are the results for the GENE test statement?
F-ratio ______ (2 points)   p-value ______ (2 points)
(E). Explain how the result in 15D relates to the results in 14D and 13A.
(F). What are the results for the TxG test statement?
F-ratio ______ (2 points)   p-value ______ (2 points)
(G). Explain how the result in 15F relates to the results in 14F and 13A.
(2 points)
Extra Credit 1 (10 points).
In the data set BST622-Assn2-FOHDL2012.xls,
Tdum is a dummy code: Control group (CON) is coded 0 and Treatment group (700FO) is coded +1.
GxDAdd is a cross-product of Tdum and AP_add that represents the interaction of Treatment with the
additive genetic effect.
GxDDom is a cross-product of Tdum and AP_dom that represents the interaction of Treatment with the non-additive genetic effect.
proc reg data=bsthdl2;
  model HDL_DIFF = Tdum AP_add AP_dom GxDAdd GxDDom / scorr2 tol;
  TRT: test Tdum = 0;
  GENE: test AP_add = AP_dom = 0;
  TxG: test GxDAdd = GxDDom = 0;
run;
Note any differences between these results and other linear model approaches to analyzing a factorial
ANOVA design, and offer possible explanations for these differences.
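One way to see why the dummy-coded results can differ: the model with all five predictors is saturated for the 2 x 3 design, so its fitted values equal the six cell means. Under Tx (−1/+1) coding, the intercept, AP_add, and AP_dom coefficients describe the average of the Control and 700FO cell means at each genotype; under Tdum (0/1) coding they describe the Control cell means only, so the GENE test becomes a test of genotype differences within the Control group. The sketch solves for those three coefficients from made-up cell means (not the FOHDL data), using the AP_add and AP_dom codes as stated in Question 15.

```python
def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


def solve3(M, v):
    """Solve the 3x3 system M x = v by Cramer's rule."""
    d = det3(M)
    return [det3([row[:k] + [v[i]] + row[k + 1:] for i, row in enumerate(M)]) / d
            for k in range(3)]


# (AP_add, AP_dom) codes for E2, E3, E4 as stated in Question 15
CODES = {"E2": (-1, -0.5), "E3": (0, 1), "E4": (1, 0.5)}
M = [[1, add, dom] for add, dom in CODES.values()]   # columns: intercept, AP_add, AP_dom

# Hypothetical HDL_DIFF cell means (NOT the assignment data), genotype order E2, E3, E4
con = [0.10, 0.05, 0.00]
trt = [0.12, 0.09, 0.20]

# Effect coding (Tx = -1/+1): genotype terms fit the average of the two arms
effect_coded = solve3(M, [(c + t) / 2 for c, t in zip(con, trt)])
# Dummy coding (Tdum = 0/1): genotype terms fit the Control arm alone
dummy_coded = solve3(M, con)
print("intercept, AP_add, AP_dom with Tx:  ", effect_coded)
print("intercept, AP_add, AP_dom with Tdum:", dummy_coded)
```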
Multiple Group Comparisons (ANOVA Models) EXTRA CREDIT (up to 20 points)
EC2. Determine the F-ratio that results from the given one-way ANOVA data.
Source     df     SS      MS      F
Between    4      30.5    _____   ______
Within     ____   _____   _____
________________________________________________
Total      99     165.0
(4 points)
EC3. For the data in question EC2, the estimated percent variance in the dependent variable accounted for by
the independent variable is:
η² = R² = _______________
(2 points)
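The missing cells in EC2 and EC3 follow from the additivity of the source table (SS_Total = SS_Between + SS_Within, df_Total = df_Between + df_Within), MS = SS/df, F = MS_Between/MS_Within, and η² = R² = SS_Between/SS_Total. A small sketch of that bookkeeping, run on the EC2 values:

```python
def complete_table(df_between, ss_between, df_total, ss_total):
    """Fill in a one-way ANOVA source table from the Between and Total rows."""
    df_within = df_total - df_between
    ss_within = ss_total - ss_between
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df_within": df_within,
        "SS_within": ss_within,
        "MS_between": ms_between,
        "MS_within": ms_within,
        "F": ms_between / ms_within,
        "eta_sq": ss_between / ss_total,
    }


table = complete_table(df_between=4, ss_between=30.5, df_total=99, ss_total=165.0)
for key, value in table.items():
    print(f"{key}: {value:.4f}")
```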
In a one-factor ANOVA with J = 4 groups and nj = 5 subjects per group:
EC4. Ȳ1 = 22, Ȳ2 = 24, Ȳ3 = 20, Ȳ4 = 26. What is the value for SS Between? _________________ (4 points)
EC5. s1 = 2.0, s2 = 2.2, s3 = 2.1, s4 = 2.3. What is the value for SS Within? _________________ (4 points)
EC6. Reconstruct the ANOVA Source Table. (4 points)
Source     df     SS      MS       F
Between    ____   _____   ______   _____
Within     ____   _____   ______
________________________________________________
Total      _____
EC7. For the data in questions EC4-EC6, the estimated percent variance in the dependent variable accounted for by
the independent variable is:
η² = R² = _______________
(2 points)
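With equal group sizes the two sums of squares can be built directly from the summary statistics: SS_Between = n·Σ(Ȳj − Ȳ)², where Ȳ is the mean of the group means, and SS_Within = Σ(nj − 1)·sj². The sketch below applies these formulas to the EC4-EC5 values; the remaining cells of EC6 and the η² in EC7 then follow as in any source table.

```python
from statistics import mean

n = 5                                  # subjects per group (nj)
group_means = [22, 24, 20, 26]         # EC4
group_sds = [2.0, 2.2, 2.1, 2.3]       # EC5

J = len(group_means)
grand = mean(group_means)              # equals the grand mean when all nj are equal
ss_between = n * sum((m - grand) ** 2 for m in group_means)
ss_within = sum((n - 1) * s ** 2 for s in group_sds)

df_between, df_within = J - 1, J * n - J
f_ratio = (ss_between / df_between) / (ss_within / df_within)
eta_sq = ss_between / (ss_between + ss_within)
print(f"SSB = {ss_between:.2f}, SSW = {ss_within:.2f}, F = {f_ratio:.3f}, eta^2 = {eta_sq:.4f}")
```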