Stat 491: Biostatistics
Chapter 8: Hypothesis Testing–Two-Sample Inference
Solomon W. Harrar
The University of Montana
Fall 2012

Introduction
Inference about µ1 − µ2 : Paired Samples
Inference about µ1 − µ2 : Independent Samples
Introduction
Two-Sample Inference
In Chapters 6 and 7, we had only one sample.
The underlying µ (or p) of the population from which the sample
was drawn was compared with the known mean (or prevalence rate)
of the general population.
Example: The mean cholesterol of Asian immigrants was compared
with the general US mean cholesterol, known to be 190
mg/dL.
In this chapter, we have two samples, each from a different
population.
Interest lies in comparing the underlying unknown means of
the two populations.
Randomized Clinical Trials (RCT)
Patients are assigned to treatments by some random
mechanism.
If sample sizes are large, we expect the types of patients assigned
to different treatment modalities to be similar.
If sample sizes are small, patient characteristics of the treatment
groups may not be comparable.
A table of characteristics of the treatment groups is
customarily presented to check that the randomization is
working well.
Design features of RCT
Randomization: Complete, Block, Cluster (Group), Stratified
(by age, sex, or overall clinical condition).
Blinding: Single, Double, Triple, and Unblinded
Example: Greek Health Project
Two Types of Samples
Paired Samples: Each data point in one sample is matched
and related to a unique data point in the other sample.
Independent Samples: The data points in one sample are
unrelated to the data points in the other sample.
Example: Suppose we are interested in studying the
association between Oral Contraceptive (OC) use and blood
pressure.
One can start with women of childbearing age (16-49 years)
who are not OC users and follow them for one year.
For those who start using OC within the one-year period,
compare the blood pressure at baseline and follow-up.
Alternatively, one can identify a group of women who use OC and
another group who do not, and compare their blood pressures.
Paired Samples Arise When
Having the same set of experimental units receive both
treatments (Cross-Over Design)
Having measurements taken before and after treatment
(Repeated-Measures Design)
No randomization.
Matching Subjects (Matched-Pair Design)
Using naturally occurring pairs such as twins or husbands and
wives.
Matching with respect to extraneous factors that may mask
differences in the treatments.
Block Randomization
Matched Case-Control Study (Observational study)
Paired or Independent Sample
In repeated measures, each subject serves as their own
control. This design may benefit from having a control group,
as it allows us to rule out other factors that may cause changes
between the two time points.
In matching, extraneous factors are expected to influence both
members of the pair equally.
Hence, a paired design is definitive in that if a difference is
present, it is highly likely that it occurred because of the
difference in treatment.
Differences in independent samples are only suggestive.
The differences between the subjects may mask true treatment or
group differences.
A paired design may sometimes NOT be practical and is usually
expensive.
Paired t Test
Let µd = µ1 − µ2 .
Let n denote the number of pairs of measurements in the
sample.
Let di denote the difference between the first and second
measurement in the ith pair.
Assumption: d1 , d2 , . . . , dn constitute a random sample from
a normally distributed population with mean µd and unknown
variance σd² .
We can look at a Q-Q plot and box plots of the d’s to check
violation of the normality assumption.
Compute
$$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i \quad\text{and}\quad s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n-1}}.$$
The Paired t-test
Hypotheses:
Case 1. H0 : µd = 0 vs Ha : µd > 0
Case 2. H0 : µd = 0 vs Ha : µd < 0
Case 3. H0 : µd = 0 vs Ha : µd ≠ 0
T.S.:
$$t = \frac{\bar{d}}{s_d/\sqrt{n}}$$
R.R.: For a specified value of α,
Case 1. Reject H0 if t ≥ t_{n−1,1−α} .
Case 2. Reject H0 if t ≤ −t_{n−1,1−α} .
Case 3. Reject H0 if |t| ≥ t_{n−1,1−α/2} .
p-Value:
Case 1. P(t > tcomputed )
Case 2. P(t < tcomputed )
Case 3. 2 × P(t > |tcomputed |)
Confidence Interval for µd
A 100(1 − α)% two-sided confidence interval estimate of the
size of the difference (µd ) is
$$\bar{d} \pm t_{n-1,\,1-\alpha/2}\,\frac{s_d}{\sqrt{n}}.$$
A 100(1 − α)% lower-sided confidence limit for the size of the
difference (µd ) is
$$\bar{d} + t_{n-1,\,1-\alpha}\,\frac{s_d}{\sqrt{n}}.$$
A 100(1 − α)% upper-sided confidence limit for the size of the
difference (µd ) is
$$\bar{d} - t_{n-1,\,1-\alpha}\,\frac{s_d}{\sqrt{n}}.$$
If n is large then the z-test is used and normality is not
needed.
Example: Nutrition
An important hypothesis in hypertension research is that sodium
restriction may lower blood pressure. However, it is difficult to
achieve sodium restriction over the long term, and dietary
counseling in a group setting is sometimes used to achieve this
goal. The data on overnight urinary sodium excretion (mEq/8hr)
were obtained on eight individuals enrolled in a sodium-restricted
group. Data was collected at baseline
and after one week of dietary counseling. (d¯ = 1.14 and sd = 12.22)
Person   Baseline   Week 1   di
1        7.85       9.59     -1.74
2        12.03      34.50    -22.47
3        21.84      4.55     17.29
4        13.94      20.78    -6.84
5        16.68      11.69    4.99
6        41.78      32.51    9.27
7        14.97      5.46     9.51
8        12.072     12.95    -0.88
Test the appropriate hypothesis and report p-value. Construct 95%
CI for the true mean change in overnight sodium excretion over a
one-week period. Verify the validity of the normality assumption.
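The analysis asked for here can be sketched in R as follows. This is a minimal sketch that assumes a two-sided alternative (the slide leaves the choice of alternative to the reader) and uses the differences di exactly as listed in the table; the variable names are illustrative.

```r
# Differences d_i = baseline - week 1, copied from the table
d <- c(-1.74, -22.47, 17.29, -6.84, 4.99, 9.27, 9.51, -0.88)
n <- length(d)

d_bar <- mean(d)   # sample mean of the differences
s_d   <- sd(d)     # sample standard deviation of the differences

# Paired t statistic and two-sided p-value (assumed alternative: mu_d != 0)
t_stat <- d_bar / (s_d / sqrt(n))
p_val  <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# 95% two-sided confidence interval for mu_d
ci <- d_bar + c(-1, 1) * qt(0.975, df = n - 1) * s_d / sqrt(n)

# The built-in one-sample t test on the differences gives the same results
res <- t.test(d, mu = 0, alternative = "two.sided", conf.level = 0.95)

# Graphical check of normality (only drawn in an interactive session)
if (interactive()) { qqnorm(d); qqline(d); boxplot(d) }
```

The computed mean and standard deviation of the differences agree with the summary values quoted in the example (about 1.14 and 12.22).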
Power Analysis and Sample-Size Estimation
Note that di = x1i − x2i where x1i and x2i are the
measurements on the ith subject at the baseline and
follow-up, respectively.
Assume d1 , . . . , dn constitute a random sample from
N(µd , σd² ).
If we can get a good working estimate of σd from a previous
or pilot or reproducibility study, we can use the power and
sample-size formulae from the one sample problem here.
More specifically, for the two-sided alternative
$$\mathrm{PWR}(\mu_d) \approx P\left(Z \le -z_{1-\alpha/2} + \frac{|\mu_d|}{\sigma_d/\sqrt{n}}\right)$$
and
$$n = \sigma_d^2\,\frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{\mu_d^2}.$$
For one-sided test, replace α/2 with α, and the power is exact.
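The two formulas above translate directly into R. The sketch below is illustrative only: the function names `paired_power` and `paired_n` and the planning values (µd = 5, σd = 10) are hypothetical and do not come from the slides.

```r
# Approximate power of the two-sided paired test at significance level alpha
paired_power <- function(mu_d, sigma_d, n, alpha = 0.05) {
  pnorm(-qnorm(1 - alpha / 2) + abs(mu_d) / (sigma_d / sqrt(n)))
}

# Number of pairs needed for power 1 - beta (before rounding up)
paired_n <- function(mu_d, sigma_d, alpha = 0.05, beta = 0.20) {
  sigma_d^2 * (qnorm(1 - alpha / 2) + qnorm(1 - beta))^2 / mu_d^2
}

# Hypothetical planning values, not taken from the slides
n_raw <- paired_n(mu_d = 5, sigma_d = 10)
n_req <- ceiling(n_raw)                                  # round up to whole pairs
pwr   <- paired_power(mu_d = 5, sigma_d = 10, n = n_req) # power at the rounded n
```

Rounding the sample size up means the achieved power is at least the target 1 − β.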
Power Analysis and Sample-Size Estimation Cont’d...
However, caution has to be used when using an estimate of σd
from a previous study, in particular in longitudinal studies.
Note that
σd² = σ1² + σ2² − 2ρσ1 σ2
where ρ is the correlation between X1 and X2 .
σd² depends on the correlation ρ.
The correlation typically decreases as the time separation
increases.
To use σd from a previous study, we have to make sure that
the time separation between baseline and follow-up in the
previous study and the planned study are about the same.
Background
Notations: Let us denote the population means and standard
deviations from the two populations as
Population 1: µ1 and σ1
Population 2: µ2 and σ2
Notations: Let us denote the means, standard deviation and
sample sizes of the two independent samples from the two
populations as
Sample 1: x̄1 , s1 and n1
Sample 2: x̄2 , s2 and n2
We are interested in making inference about µ1 − µ2 .
A natural estimator of µ1 − µ2 is X̄1 − X̄2 .
The Sampling Distribution of X̄1 − X̄2
If the two populations are normally distributed, then the
sampling distribution of X̄1 − X̄2 is normal with mean
µ_{X̄1−X̄2} = µ1 − µ2 and variance
$$\sigma^2_{\bar{X}_1-\bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
If either of the two populations is non-normal but n1 and n2
are both large, then the above sampling distribution of
X̄1 − X̄2 holds approximately. This is a consequence of the
CLT.
The three cases
Case 1: Both populations are normally distributed with
(a) σ1 = σ2 = σ (Pooled-variance t-procedures).
(b) σ1 ≠ σ2 (Welch-Satterthwaite t-procedures).
Case 2: Both Sample Sizes n1 and n2 are large (z procedures)
Case 3: Either n1 or n2 is small and the population is non-normal.
(Bootstrap or Nonparametric procedures)
The Equal-Variance Case
The two populations are normally distributed,
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1+n_2-2}$$
where
$$S^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}.$$
Notice the degrees of freedom n1 + n2 − 2 comes from S² .
We will use this quantity to construct tests and confidence
intervals when the two populations are normal and the
standard deviations are equal.
Large-Samples Case
When the sample sizes n1 and n2 are large, we use the
quantity
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \;\overset{\cdot}{\sim}\; N(0, 1)$$
This is true whether or not normality or equality of variance
holds.
This quantity is used for tests and confidence intervals when
n1 and n2 are large.
The Independent-Samples t-test for µ1 − µ2
Hypotheses:
Case 1. H0 : µ1 − µ2 ≤ 0 vs Ha : µ1 − µ2 > 0
Case 2. H0 : µ1 − µ2 ≥ 0 vs Ha : µ1 − µ2 < 0
Case 3. H0 : µ1 − µ2 = 0 vs Ha : µ1 − µ2 ≠ 0
T.S.:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{1/n_1 + 1/n_2}}$$
R.R.: For a specified value of α,
Case 1. Reject H0 if t ≥ t_{n1+n2−2,1−α} .
Case 2. Reject H0 if t ≤ −t_{n1+n2−2,1−α} .
Case 3. Reject H0 if |t| ≥ t_{n1+n2−2,1−α/2} .
p-Value:
Case 1. P(t > tcomputed ).
Case 2. P(t < tcomputed ).
Case 3. 2 × P(t > |tcomputed |).
100(1 − α)% CI for µ1 − µ2 when σ1 = σ2
A 100(1 − α)% confidence interval for µ1 − µ2 is given by
$$(\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2,\,1-\alpha/2}\; s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
Lower-sided confidence interval for µ1 − µ2
$$(\bar{x}_1 - \bar{x}_2) + t_{n_1+n_2-2,\,1-\alpha}\; s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
Upper-sided confidence interval for µ1 − µ2
$$(\bar{x}_1 - \bar{x}_2) - t_{n_1+n_2-2,\,1-\alpha}\; s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
t test in R
Inference for difference in means can be computed in R in one
of the following two ways depending on how your data is
organized.
If the two samples are entered as vectors x and y, then
t.test(x,y,mu=0,paired=FALSE,var.equal=TRUE,
alternative="two.sided")
If all the data from the two samples are in one vector y and
the vector x contains indicators of sample, then we use
# paired is left at its default (FALSE) in the formula interface
t.test(y~x,mu=0,var.equal=TRUE,
alternative="two.sided")
Examples:
# two samples stored as separate vectors
x=c(2.3,3.4,1.2,4.4)
y=c(3.2,1.5,2.6,3.3,4.5)
t.test(x,y,var.equal=TRUE)
# same data stacked in y, with sample labels in x
x=c(1,1,1,1,2,2,2,2,2)
y=c(2.3,3.4,1.2,4.4,3.2,1.5,2.6,3.3,4.5)
t.test(y~x,var.equal=TRUE)
Example: Veterinary Science
An experiment was conducted to evaluate the effectiveness of a
treatment for tapeworm in the stomachs of sheep. A random
sample of 24 worm-infected lambs of approximately the same age
and health was randomly divided into two groups. Twelve of the
lambs were injected with the drug and the remaining twelve were
left untreated. After a 6-month period, the lambs were slaughtered
and the following worm counts were recorded:
Drug Treated: 18, 43, 28, 50, 16, 32, 13, 35, 38, 33, 6, 7
Untreated: 40, 54, 26, 63, 21, 37, 39, 23, 48, 58, 28, 39
(a) Do any of the assumptions of the pooled t-test appear to be an
issue? (b) Test whether the mean number of tapeworms in the
stomachs of the treated lambs is less than the mean for untreated
lambs. Use α = 0.05. (c) What is the level of significance for this
test? (d) Place a 95% CI on µ1 − µ2 to assess the size of the
difference in the two means.
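Parts (b) and (d) can be sketched in R as below, taking the drug-treated lambs as sample 1 and assuming equal variances (the pooled procedure). The variable names are illustrative; the manual calculation is shown next to `t.test()` so the two can be compared.

```r
treated   <- c(18, 43, 28, 50, 16, 32, 13, 35, 38, 33, 6, 7)
untreated <- c(40, 54, 26, 63, 21, 37, 39, 23, 48, 58, 28, 39)
n1 <- length(treated); n2 <- length(untreated)

# Pooled variance and the pooled t statistic for Ha: mu1 - mu2 < 0
s2_pooled <- ((n1 - 1) * var(treated) + (n2 - 1) * var(untreated)) / (n1 + n2 - 2)
t_stat <- (mean(treated) - mean(untreated)) / sqrt(s2_pooled * (1 / n1 + 1 / n2))
p_val  <- pt(t_stat, df = n1 + n2 - 2)   # lower-tail p-value

# Same one-sided test with t.test(), plus a two-sided 95% CI for mu1 - mu2
one_sided <- t.test(treated, untreated, var.equal = TRUE, alternative = "less")
two_sided <- t.test(treated, untreated, var.equal = TRUE, conf.level = 0.95)
ci <- two_sided$conf.int
```

With these data the statistic is about −2.27 on 22 degrees of freedom, and the one-sided p-value falls below α = 0.05.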
Pooled-Variance t-test for µ1 − µ2 : An Example Cont’d...
x̄1 = 26.58, s1 = 14.36, x̄2 = 39.67 and s2 = 13.86
[Figure: Normal Q-Q plots (sample quantiles versus theoretical quantiles) and box plots for the Drug Treated and Untreated groups]
Test for Equality of Variances
The choice between the pooled-variance and
Welch-Satterthwaite procedures depends on whether the
variances of the two populations are equal or not.
In reality, it may not always be clear if equality holds or not.
However, we can conduct a statistical test to assess the
departure from equality using sample data.
Assume the two populations are normally distributed.
We are interested in testing
H0 : σ1² = σ2² vs Ha : σ1² ≠ σ2²
Test for Equality of Variances Cont’d...
The quantity,
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$$
where
$$S_1^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2}{n_1 - 1} \quad\text{and}\quad S_2^2 = \frac{\sum_{i=1}^{n_2} (X_{2i} - \bar{X}_2)^2}{n_2 - 1}.$$
The F_{d1,d2} distribution depends on two degrees of freedom
known as the numerator and denominator degrees of freedom.
The F_{d1,d2} distribution is a right-skewed distribution over the
interval (0, ∞).
We want to reject H0 when the test statistic F = S1²/S2² is
small or large compared to 1.
For a size-α test, we reject H0 if
F ≤ F_{n1−1,n2−1,α/2} or F ≥ F_{n1−1,n2−1,1−α/2}
In R, the quantiles of F_{n1−1,n2−1} can be obtained as
qf(alpha/2, n1-1, n2-1)
qf(1-alpha/2, n1-1, n2-1)
p-value:
$$p\text{-value} = \begin{cases} 2 \times P(F > F_{computed}) & \text{if } F_{computed} \ge 1 \\ 2 \times P(F < F_{computed}) & \text{if } F_{computed} < 1 \end{cases}$$
Area under the curve of F_{n1−1,n2−1} to the left of Fcomputed
can be found in R by
pf(F_computed,n1-1,n2-1)
For the tapeworm data, test the hypothesis of equality of
variance.
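A minimal R sketch of this exercise, using the worm counts from the Veterinary Science example and assuming α = 0.05 (the slide does not state a level for this test). It applies the rejection region and p-value rule given above and compares them with `var.test()`.

```r
treated   <- c(18, 43, 28, 50, 16, 32, 13, 35, 38, 33, 6, 7)
untreated <- c(40, 54, 26, 63, 21, 37, 39, 23, 48, 58, 28, 39)
n1 <- length(treated); n2 <- length(untreated)
alpha <- 0.05   # assumed significance level

F_computed <- var(treated) / var(untreated)

# Rejection region: F at or below the lower quantile, or at or above the upper one
lower  <- qf(alpha / 2, n1 - 1, n2 - 1)
upper  <- qf(1 - alpha / 2, n1 - 1, n2 - 1)
reject <- (F_computed <= lower) || (F_computed >= upper)

# Two-sided p-value, doubling the tail on the side where F_computed falls
p_val <- if (F_computed >= 1) {
  2 * pf(F_computed, n1 - 1, n2 - 1, lower.tail = FALSE)
} else {
  2 * pf(F_computed, n1 - 1, n2 - 1)
}

# var.test() reports the same statistic and p-value for these data
vt <- var.test(treated, untreated, ratio = 1, alternative = "two.sided")
```

Here the variance ratio is about 1.07, well inside the acceptance region, so equality of variances is not rejected at the assumed level.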
In the test statistic F ,
$$F = \frac{S_1^2}{S_2^2} \overset{H_0}{\sim} F_{n_1-1,\,n_2-1},$$
we are using the variance of the sample from population
1 in the numerator and that of population 2 in the denominator.
The labeling of the populations is arbitrary.
We could define the test statistic as
$$F = \frac{S_2^2}{S_1^2} \overset{H_0}{\sim} F_{n_2-1,\,n_1-1}.$$
Do we get the same conclusion? YES.
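We observe, under the null hypothesis H0 : σ1² = σ2² , that
$$P\left(\frac{S_2^2}{S_1^2} < F_{n_2-1,\,n_1-1,\,\alpha/2}\right) = \alpha/2 = P\left(\frac{S_1^2}{S_2^2} > F_{n_1-1,\,n_2-1,\,1-\alpha/2}\right) = P\left(\frac{S_2^2}{S_1^2} < \frac{1}{F_{n_1-1,\,n_2-1,\,1-\alpha/2}}\right).$$
Therefore,
$$F_{n_2-1,\,n_1-1,\,\alpha/2} = \frac{1}{F_{n_1-1,\,n_2-1,\,1-\alpha/2}}.$$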
In R equality of variance can be tested in one of the following
two ways depending on how your data is organized.
var.test(x,y,ratio=1,alternative="two.sided")
var.test(y~x,ratio=1,alternative="two.sided")
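The reciprocal relation between the F quantiles, and the fact that relabeling the samples does not change the conclusion, can be checked numerically. In this sketch the sample sizes and α are hypothetical, and the vectors x and y are the small example vectors used earlier for `t.test()`.

```r
# Hypothetical sample sizes and level, chosen only to illustrate the identity
n1 <- 12; n2 <- 9; alpha <- 0.05

lower_swapped <- qf(alpha / 2, n2 - 1, n1 - 1)           # F_{n2-1, n1-1, alpha/2}
recip_upper   <- 1 / qf(1 - alpha / 2, n1 - 1, n2 - 1)   # 1 / F_{n1-1, n2-1, 1-alpha/2}

# Swapping the sample labels inverts the statistic but leaves the p-value unchanged
x <- c(2.3, 3.4, 1.2, 4.4)
y <- c(3.2, 1.5, 2.6, 3.3, 4.5)
vt_xy <- var.test(x, y, ratio = 1, alternative = "two.sided")
vt_yx <- var.test(y, x, ratio = 1, alternative = "two.sided")
```

Comparing `lower_swapped` with `recip_upper`, and `vt_xy$p.value` with `vt_yx$p.value`, shows agreement up to floating-point error.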
The Behrens-Fisher Problem
Assume two independent samples from normal populations.
We know, by conducting a test or otherwise, that σ1 ≠ σ2 .
Inference about µ1 − µ2 in this situation is known as the
Behrens-Fisher problem.
The test and confidence interval procedure was developed by
Welch (1938) using the Satterthwaite approximation for the
degrees of freedom and, hence, is referred to as the
Welch-Satterthwaite Method.
The Behrens-Fisher Problem Cont’d...
The quantity
$$t' = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \;\overset{\cdot}{\sim}\; t_d$$
where
$$d = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1 - 1) + (s_2^2/n_2)^2/(n_2 - 1)}.$$
This quantity is used for tests and confidence intervals
concerning µ1 − µ2 .
For example, a 100(1 − α)% CI for µ1 − µ2 is given by
$$(\bar{x}_1 - \bar{x}_2) \pm t_{d,\,1-\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
Effect of unequal variance is large for unequal sample sizes.
Strategy for Testing Equality of Means
When it is not clear whether σ1² = σ2² but normality appears
to hold, use the following strategy.
Test H0 : σ1² = σ2² .
Fail to Reject: Use Pooled t Test for H0 : µ1 = µ2 .
Reject: Use Welch's t Test for H0 : µ1 = µ2 .
Test for equality of variance is sensitive to departure from
normality.
Non-parametric methods must be used in these cases.
Behrens-Fisher Problem: Example
A possible important environmental determinant of lung function
in children is amount of cigarette smoking in the home. Suppose
this question is studied by selecting two groups: group 1 consists
of 23 nonsmoking children 5-9 years of age, both of whose parents
smoke, who have a mean forced expiratory volume (FEV) of 2.1 L
and standard deviation of 0.7 L; group 2 consists of 20 nonsmoking
children of comparable age, neither of whose parents smoke, who
have mean FEV of 2.3 L and a standard deviation of 0.4 L. (a)
What are the appropriate null and alternative hypotheses in this
situation? (b) What is the appropriate test procedure for the
hypotheses above? (c) Carry out the test and report p-value. (d)
Provide 95% CI for the true mean difference in FEV between 5- to
9-year-old children whose parents smoke and comparable children
whose parents do not smoke.
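Because only summary statistics are given, `t.test()` cannot be applied directly; the Welch-Satterthwaite formulas can instead be evaluated by hand in R. The sketch below assumes a two-sided alternative for part (a) and takes the children whose parents smoke as group 1; the variable names are illustrative.

```r
# Summary statistics quoted in the example
n1 <- 23; xbar1 <- 2.1; s1 <- 0.7   # children whose parents both smoke
n2 <- 20; xbar2 <- 2.3; s2 <- 0.4   # children whose parents do not smoke

# F test for equal variances (two-sided), to motivate the unequal-variance procedure
F_computed <- s1^2 / s2^2
p_var <- 2 * pf(F_computed, n1 - 1, n2 - 1, lower.tail = FALSE)  # F_computed >= 1 here

# Welch statistic and Satterthwaite degrees of freedom
se     <- sqrt(s1^2 / n1 + s2^2 / n2)
t_stat <- (xbar1 - xbar2) / se
d      <- (s1^2 / n1 + s2^2 / n2)^2 /
          ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))

# Two-sided p-value (assumed alternative: mu1 != mu2) and 95% CI for mu1 - mu2
p_val <- 2 * pt(abs(t_stat), df = d, lower.tail = FALSE)
ci    <- (xbar1 - xbar2) + c(-1, 1) * qt(0.975, df = d) * se
```

The variance ratio is about 3.06 and the equal-variance hypothesis is rejected at the 5% level, which supports using the Welch procedure; the resulting statistic is about −1.17 on roughly 35.8 degrees of freedom.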
Power Analysis
For given sample sizes n1 and n2 and significance level α, the
power the study will have in detecting a difference of
∆ = |µ1 − µ2 | is
$$\mathrm{PWR}(\Delta) = P\left(Z < -z_{1-\alpha/2} + \frac{\Delta}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}\right) = \texttt{pnorm}\left(-z_{1-\alpha/2} + \frac{\Delta}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}},\, 0,\, 1\right)$$
For one-sided alternative, we replace α/2 with α.
Power Analysis : Example
Suppose 100 OC users and 100 non-OC users are available for
study and a true mean difference of µ1 − µ2 = 5 mm Hg is
anticipated, with OC users having the higher mean SBP. How
much power would such a study have if estimates of the
standard deviations for OC users and non-users were obtained
from a pilot study as 15.34 mm Hg and 18.23 mm Hg,
respectively?
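A sketch of this calculation in R, using the power formula from the previous slide. The example does not state a significance level, so a two-sided test with α = 0.05 is assumed here; `two_sample_power` is an illustrative helper name.

```r
# Power of the two-sided z test to detect a mean difference Delta
two_sample_power <- function(Delta, sigma1, sigma2, n1, n2, alpha = 0.05) {
  se <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)
  pnorm(-qnorm(1 - alpha / 2) + Delta / se, 0, 1)
}

# Values from the example; alpha = 0.05 (two-sided) is an assumption
pwr <- two_sample_power(Delta = 5, sigma1 = 15.34, sigma2 = 18.23,
                        n1 = 100, n2 = 100, alpha = 0.05)
```

Under these assumptions the study would have a power of roughly 0.56 to detect the 5 mm Hg difference.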
Sample-Size Estimation
The appropriate sample size to have a probability of 1 − β of
finding a significant difference based on a two-sided test with
significance level α when the absolute difference in means
between the two groups is ∆ = |µ1 − µ2 | is:
a. Equal sample sizes anticipated
$$n_1 = n_2 = (\sigma_1^2 + \sigma_2^2)\,\frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}.$$
b. A known proportion n2 = kn1 anticipated
$$n_1 = (\sigma_1^2 + \sigma_2^2/k)\,\frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}.$$
For one-sided test, we replace α/2 with α.
When σ1 = σ2 , the smallest total sample size for a given α
and β is achieved by the equal sample size allocation.
Sample-Size Estimation: Example
Suppose we anticipate twice as many non-OC users as OC
users entering the study. From a pilot study, estimates of the
standard deviations for OC users and non-users were obtained
as 15.34 and 18.23, respectively. Project the required sample
size to find a significant difference in a two-sided test with 5%
significance level and 80% power when there is a 5 mm Hg
difference in the true SBP means of OC users and non-OC
users.
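This projection can be sketched in R with the unequal-allocation formula (case b), taking OC users as group 1 and non-OC users as group 2 so that k = 2. The helper name `n1_required` is illustrative.

```r
# Sample size for group 1 when n2 = k * n1 (two-sided test)
n1_required <- function(Delta, sigma1, sigma2, k = 1, alpha = 0.05, beta = 0.20) {
  (sigma1^2 + sigma2^2 / k) * (qnorm(1 - alpha / 2) + qnorm(1 - beta))^2 / Delta^2
}

# Group 1 = OC users, group 2 = non-OC users, with twice as many non-users (k = 2)
n1_raw <- n1_required(Delta = 5, sigma1 = 15.34, sigma2 = 18.23, k = 2)
n1 <- ceiling(n1_raw)   # OC users, rounded up
n2 <- 2 * n1            # non-OC users
```

The unrounded value of n1 is just above 126, so a hand calculation with rounded normal quantiles (1.96 and 0.84) can land slightly lower than the value R returns.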
Paired-Samples versus Independent-Samples t Test
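For the one-sided test H0 : µ1 ≤ µ2 versus Ha : µ1 > µ2 , the
power of the Z test is given by
$$\mathrm{PWR}(\Delta) = P\left(Z < -z_{1-\alpha} + \frac{\Delta}{\sigma_{\bar{X}_1-\bar{X}_2}}\right)$$
where ∆ = µ1 − µ2 .
For paired samples,
$$\sigma^2_{\bar{X}_1-\bar{X}_2} = \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{n} - 2\rho\,\frac{\sigma_1\sigma_2}{n}.$$
For independent samples,
$$\sigma^2_{\bar{X}_1-\bar{X}_2} = \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{n}.$$
When ρ > 0, which is typically the case, paired samples will
have higher power than independent samples.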
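The effect of the correlation ρ on power can be illustrated numerically. In the sketch below, `power_one_sided` is an illustrative helper and all planning values (∆ = 5, σ1 = σ2 = 15, n = 30, ρ = 0.6) are hypothetical; setting ρ = 0 reproduces the independent-samples variance.

```r
# One-sided power of the Z test; rho = 0 corresponds to independent samples
power_one_sided <- function(Delta, sigma1, sigma2, n, rho = 0, alpha = 0.05) {
  se <- sqrt(sigma1^2 / n + sigma2^2 / n - 2 * rho * sigma1 * sigma2 / n)
  pnorm(-qnorm(1 - alpha) + Delta / se)
}

# Hypothetical planning values, not taken from the slides
pwr_indep  <- power_one_sided(Delta = 5, sigma1 = 15, sigma2 = 15, n = 30, rho = 0)
pwr_paired <- power_one_sided(Delta = 5, sigma1 = 15, sigma2 = 15, n = 30, rho = 0.6)
```

With a positive correlation the standard error shrinks, so `pwr_paired` exceeds `pwr_indep` for the same n.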
Efficiency of Pairing: An Example
A study was designed to measure the effect of home environment
on academic achievement of 12-year-old students. Because genetic
differences may also contribute to academic achievement, the
researcher wanted to control for this factor. Thirty sets of identical
twins were identified who had been adopted prior to their first
birthday, with one twin placed in a home in which academics were
emphasized (Academic) and the other twin placed in a home in
which academics were not emphasized (Nonacademic). The p
values for comparing the mean scores for the academic and
nonacademic environments were 0.000 and 0.24 for the paired and
independent sample t tests, respectively.
Efficiency of Pairing: An Example Cont’d...
(a) Is there a difference in the mean final grade between the
students in an academically oriented home environment and
those in a nonacademic home environment?
(b) Does it appear that using twins in this study to control for
variation in the final scores was effective as compared to
taking a random sample of 30 students in both types of
environments? Justify your answer. See the scatter plot on the
next page.
Efficiency of Pairing: An Example Cont’d...
[Figure: Scatter Plot of Scores of Academic and Nonacademic Twins; x-axis: Academic Environment, y-axis: Nonacademic Environment, both axes marked from 50 to 90]
