• What is hypothesis testing?

Ch9, p.1
Question 7.1 (What is a hypothesis testing question?)
1. observed data: x1, . . . , xn.
2. statistical modeling: regard x1, . . . , xn as a realization of random variables X1, . . . , Xn, and assign X1, . . . , Xn a joint distribution: a joint cdf F(·|θ), or a joint pdf f(·|θ), or a joint pmf p(·|θ), where θ = (θ1, . . . , θk) ∈ Ω, and the θi's are fixed constants, but their values are unknown.
3. point estimation: What is the value of θ? Find a function of X1, . . . , Xn, say θ̂, to estimate θ or a function of θ.
4. hypothesis testing: Separate Ω into two sets Ω0 and ΩA, where Ω0 ∩ ΩA = ∅ and Ω0 ∪ ΩA = Ω. Use the data X1, . . . , Xn to answer the question: which of the two hypotheses,
   H0: θ ∈ Ω0 versus HA: θ ∈ ΩA,
   is more favorable?
Ch9, p.2
Question 7.2
Can we obtain an estimate θ̂ of θ, and accept H0 if θ̂ ∈ Ω0 and reject H0 if θ̂ ∈ ΩA?
Hint. Consider the case: X ∼ Binomial(n, p),
H0: p = 0.5 v.s. H1: p ≠ 0.5.
What if n = 10^5 and p̂ = 0.50001?
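A minimal numerical sketch of the hint (Python with scipy assumed; the code and variable names are illustrative, not part of the lecture notes). It shows that the naive rule "accept H0 only if θ̂ ∈ Ω0" is useless here: under H0 the estimate p̂ = X/n almost never equals 0.5 exactly, and a deviation like p̂ = 0.50001 is well within ordinary sampling variability.

```python
# Sketch: under H0, p_hat = X/n is almost never exactly 0.5.
from scipy.stats import binom

n = 10**5     # sample size from the hint
p0 = 0.5      # value of p under H0

# Probability (under H0) that p_hat is NOT exactly 0.5:
prob_phat_not_half = 1 - binom.pmf(n // 2, n, p0)
print(f"P(p_hat != 0.5 | H0) = {prob_phat_not_half:.4f}")   # ~0.9975

# p_hat = 0.50001 corresponds to X = 50001 heads; the SD of X under H0 is ~158,
# so a one-head deviation from 50000 is entirely typical when H0 is true.
sd_X = (n * p0 * (1 - p0)) ** 0.5
print(f"SD of X under H0 = {sd_X:.1f}")
```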
Definition 7.1 (null and alternative hypotheses, simple and composite hypotheses, TBp.331,332,334)
• H0 is called the null hypothesis.
• HA (sometimes denoted as H1) is called the alternative hypothesis.
• A hypothesis is said to be a simple hypothesis if it uniquely specifies the distribution.
• Any hypothesis that is not a simple hypothesis is called a composite hypothesis.
Question 7.3 (asymmetry between H0 and HA)
Is there a difference between the roles of H0 and HA? Can we arbitrarily exchange the two hypotheses?
Ch9, p.3
Example 7.1 (some null and alternative hypotheses, TBp.329, 334)
1. Two coins experiment
• Data and problem
— Suppose that I have two coins.
— Coin 0 has probability of heads equal to 0.5, and coin 1 has probability of heads equal to 0.7.
— I choose one of the coins, toss it 10 times, and tell you the number of heads X.
— On the basis of the observed X, your task is to decide which coin it was.
• Statistical modeling
— X ∼ Binomial(10, p)
— Ω = {Binomial(10, 0.5), Binomial(10, 0.7)}
• Problem formulation
— H0: coin 0, i.e., Ω0 = {Binomial(10, 0.5)}
— HA: coin 1, i.e., ΩA = {Binomial(10, 0.7)}
— Both hypotheses are simple.
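A small sketch (Python with scipy assumed; not part of the original slides) comparing the two candidate distributions: for each possible number of heads k, it prints P(X = k) under coin 0 and under coin 1, which makes visible that large values of X are more plausible under coin 1.

```python
# Compare the two candidate models Binomial(10, 0.5) and Binomial(10, 0.7).
from scipy.stats import binom

for k in range(11):
    p_coin0 = binom.pmf(k, 10, 0.5)   # P(X = k) under H0 (coin 0)
    p_coin1 = binom.pmf(k, 10, 0.7)   # P(X = k) under HA (coin 1)
    print(f"k={k:2d}  coin0: {p_coin0:.4f}  coin1: {p_coin1:.4f}")
# Large k is much more likely under coin 1, so a large observed X favors HA.
```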
Ch9, p.4
2. Testing for ESP
• Data and problem
— A subject is asked to identify, without looking, the suits of 20 cards drawn randomly with replacement from a 52-card deck.
— Let T be the number of correct identifications.
— We would like to know whether the person is purely guessing or has extrasensory ability.
• Statistical modeling
— T ∼ Binomial(20, p).
— Note that p = 0.25 means that the subject is merely guessing and has no extrasensory ability.
— Ω = {p : p ∈ [0.25, 1]} or Ω = {p : p ∈ [0, 1]}
• Problem formulation
— Ω = [0.25, 1]:
  H0: p = 0.25 v.s. HA: p > 0.25
  Then, Ω0 = {0.25} and ΩA = (0.25, 1].
  H0 is simple and HA is composite. Furthermore, HA is called a one-sided hypothesis.
Ch9, p.5
— Ω = [0, 1]:
  H0: p = 0.25 v.s. HA: p ≠ 0.25
  Then, Ω0 = {0.25} and ΩA = [0, 0.25) ∪ (0.25, 1].
  H0 is simple and HA is composite. Furthermore, HA is called a two-sided hypothesis.
3. goodness-of-fit test (for Poisson distribution)
• Data and problem
— Observe i.i.d. data X1, . . . , Xn.
— Suppose that we only know that the Xi's are discrete data.
— We would like to know whether the observations came from a Poisson distribution.
• Statistical modeling
— X1, . . . , Xn are i.i.d. from a discrete distribution with cdf F.
— Ω = {F : F is a discrete cdf}
• Problem formulation
— H0: the data came from some Poisson distribution
  Ω0 = {F : F is a Poisson(λ) cdf}
— HA: the data did not come from a Poisson distribution
  ΩA = Ω \ Ω0.
— Both hypotheses are composite.
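An informal sketch (Python with numpy/scipy assumed; the simulated data and λ = 3 are illustrative, and this is not a procedure developed in this section) of how one might begin to judge H0: fit λ by the sample mean and compare empirical frequencies with the fitted Poisson(λ̂) pmf. A formal goodness-of-fit test comes later in the course.

```python
# Informal check of "did the data come from a Poisson distribution?":
# compare empirical frequencies with the pmf of a fitted Poisson(lambda_hat).
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)
x = rng.poisson(lam=3.0, size=200)   # toy data; replace with the observed X1,...,Xn

lam_hat = x.mean()                   # MLE of lambda under the Poisson model
values, counts = np.unique(x, return_counts=True)

print(f"fitted lambda_hat = {lam_hat:.3f}")
for v, c in zip(values, counts):
    emp = c / len(x)                 # empirical relative frequency of value v
    fit = poisson.pmf(v, lam_hat)    # fitted Poisson probability of value v
    print(f"value {v:2d}: empirical {emp:.3f}  fitted {fit:.3f}")
# Large, systematic discrepancies would cast doubt on H0 (a formal test comes later).
```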
Ch9, p.6
• Neyman-Pearson paradigm: concept and procedure of hypothesis testing
Definition 7.2 (test, rejection region, acceptance region, test statistic, TBp. 331)
• A test is a rule defined on the sample space of the data (denoted as S) that specifies:
  1. for which sample values x ∈ S the decision is made to accept H0, and
  2. for which sample values x ∈ S to reject H0 in favor of HA.
• The subset of the sample space for which H0 will be rejected is called the rejection region, denoted as RR.
• The complement of the rejection region is called the acceptance region, denoted as AR = S\RR = RR^c.
[Figure: the sample space S is partitioned into the acceptance region AR and the rejection region RR; the hypotheses are mapped to decisions.]
Note. Two types of errors may be incurred in applying this paradigm:

                       decision: accept H0   decision: reject H0
  truth: H0 is true    correct               Type I error
  truth: HA is true    Type II error         correct
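A simulation sketch of the two error types in the two-coins setting (Python with numpy assumed; the rule "reject H0 if X ≥ 8" is an arbitrary illustrative choice, not one fixed by the slides).

```python
# Monte Carlo illustration of Type I / Type II errors in the two-coins example,
# using the (arbitrary, illustrative) rule: reject H0 if X >= 8.
import numpy as np

rng = np.random.default_rng(1)
n_rep = 100_000

x_under_h0 = rng.binomial(10, 0.5, size=n_rep)   # data generated when H0 (coin 0) is true
x_under_ha = rng.binomial(10, 0.7, size=n_rep)   # data generated when HA (coin 1) is true

type1_rate = np.mean(x_under_h0 >= 8)   # reject H0 although H0 is true
type2_rate = np.mean(x_under_ha < 8)    # accept H0 although HA is true

print(f"estimated P(Type I error)  = {type1_rate:.3f}")   # close to P(X>=8 | p=0.5) ~ 0.055
print(f"estimated P(Type II error) = {type2_rate:.3f}")   # close to P(X<=7 | p=0.7) ~ 0.617
```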
Ch9, p.7
Definition 7.3 (type I error, significance level, type II error, power, TBp. 331)
Let θ0 denote the true value of the parameters.
• Type I error: reject H0 when H0 is true, i.e., x ∈ RR when θ0 ∈ Ω0.
  – α(θ0) = probability of Type I error = P(RR|θ0), where θ0 ∈ Ω0.
  – Significance level α: the maximum (or the least upper bound) of the Type I error probability, i.e., α = max_{θ0 ∈ Ω0} α(θ0) or α = sup_{θ0 ∈ Ω0} α(θ0).
• Type II error: accept H0 when HA is true, i.e., x ∈ AR when θ0 ∈ ΩA.
  – β(θ0) = probability of Type II error = P(AR|θ0), where θ0 ∈ ΩA.
  – For θ0 ∈ ΩA, power(θ0) = 1 − β(θ0) = P(RR|θ0) = probability of rejecting H0 when HA is true.
Question 7.4 (dilemma between type I and type II errors)
An ideal test would have α(θ) = 0 for θ ∈ Ω0 and β(θ) = 0 for θ ∈ ΩA; but this can be achieved only in trivial cases (Example?). In practice, in order to decrease α, β must be increased, and vice versa.
[Figure: two diagrams of the sample space S. A smaller rejection region RR gives smaller α but larger β; a larger RR gives larger α but smaller β.]
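A numerical sketch of this trade-off for the two-coins test (Python with scipy assumed; the family of rejection regions {X ≥ c} is the natural one suggested by the example, and the grid of c values is illustrative).

```python
# alpha-beta trade-off for rejection regions of the form {X >= c} in the two-coins test.
from scipy.stats import binom

for c in range(5, 11):
    alpha = binom.sf(c - 1, 10, 0.5)   # P(X >= c | p = 0.5) = Type I error probability
    beta = binom.cdf(c - 1, 10, 0.7)   # P(X <  c | p = 0.7) = Type II error probability
    print(f"RR = {{X >= {c}}}:  alpha = {alpha:.3f},  beta = {beta:.3f}")
# As c increases, the RR shrinks: alpha decreases but beta increases.
```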
Ch9, p.8
A solution to the dilemma.
The Neyman-Pearson approach imposes an asymmetry between H0 and HA:
1. the significance level α is fixed in advance, usually at a rather small number (e.g., 0.1, 0.05, and 0.01 are commonly used), and
2. then try to construct a test yielding a small value of β(θ) for θ ∈ ΩA.
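A minimal sketch of the two-step procedure for the two-coins test (Python with scipy assumed; α = 0.05 is an illustrative choice): fix α, take the largest rejection region of the form {X ≥ c} whose Type I error probability does not exceed α, and then report the resulting β.

```python
# Neyman-Pearson style procedure for the two-coins test (H0: p=0.5 vs HA: p=0.7):
# step 1: fix alpha and choose the rejection region {X >= c} with level <= alpha;
# step 2: report the resulting Type II error probability beta.
from scipy.stats import binom

alpha_target = 0.05   # illustrative choice of significance level

# smallest c such that P(X >= c | p = 0.5) <= alpha_target
c = next(c for c in range(11) if binom.sf(c - 1, 10, 0.5) <= alpha_target)

level = binom.sf(c - 1, 10, 0.5)   # exact Type I error probability of {X >= c}
beta = binom.cdf(c - 1, 10, 0.7)   # Type II error probability of {X >= c}
print(f"RR = {{X >= {c}}}: exact level = {level:.4f}, beta = {beta:.4f}")
# For alpha = 0.05 this gives c = 9, level ~ 0.0107, beta ~ 0.851.
```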
Example 7.2 (cont. Ex. 7.1 item 1, TBp.330-331)
• Testing the value of the parameter p of a binomial distribution with 10 trials. Let X be the number of successes; then X ∼ B(10, p).
  H0: p = 0.5 v.s. HA: p = 0.7
• Different observations of X give different support to HA.
— In this case, larger observations of X give more support to HA.
— The rejection region should consist of large values of X.
— The observations that give more support to HA are called “more extreme” observations.
• If the rejection region is {7, 8, 9, 10}, then
  α = P(X ∈ {7, 8, 9, 10} | p = 0.5) = 1 − P(X ≤ 6 | p = 0.5) ≈ 0.17.
Ch9, p.9
• Set the significance level as 0.17. The probability of Type II error is
  P(X ∉ {7, 8, 9, 10} | p = 0.7) = P(X ≤ 6 | p = 0.7) ≈ 0.35,
  and the power is
  P(X ∈ {7, 8, 9, 10} | p = 0.7) = 1 − P(X ≤ 6 | p = 0.7) ≈ 0.65.
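The three numbers above can be reproduced directly; a quick check (a sketch in Python with scipy, not part of the original slides):

```python
# Exact alpha, beta, and power for RR = {7, 8, 9, 10} in Example 7.2.
from scipy.stats import binom

alpha = 1 - binom.cdf(6, 10, 0.5)   # P(X >= 7 | p = 0.5) = 0.1719 (~0.17)
beta = binom.cdf(6, 10, 0.7)        # P(X <= 6 | p = 0.7) = 0.3504 (~0.35)
power = 1 - beta                    # P(X >= 7 | p = 0.7) = 0.6496 (~0.65)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, power = {power:.4f}")
```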
• Suppose that we are interested in testing
  H0: p = 0.5 v.s. HA: p > 0.5
  and the RR is still set to be {7, 8, 9, 10}:
— Note that β(p) is now a function of p: β(p) = P(X ∉ {7, 8, 9, 10} | p).
— The power function is 1 − β(p).
— 1 − β(p) → 1 as p → 1, and 1 − β(p) → α as p → 0.5.
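A sketch of the power function 1 − β(p) for this one-sided problem (Python with scipy assumed; the grid of p values is illustrative): it approaches α ≈ 0.17 as p decreases to 0.5 and approaches 1 as p approaches 1.

```python
# Power function 1 - beta(p) = P(X >= 7 | p) for the RR {7, 8, 9, 10}.
import numpy as np
from scipy.stats import binom

for p in np.arange(0.5, 1.0001, 0.1):
    power = binom.sf(6, 10, p)      # P(X >= 7 | p)
    print(f"p = {p:.1f}: power = {power:.4f}")
# The output increases from ~0.17 at p = 0.5 (the significance level alpha)
# toward 1 as p approaches 1.
```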
Question 7.5
1. Which hypothesis gets more protection? What kind of protection?
2. Will the protection make P(Type I error) always smaller than P(Type II error), i.e., α < β(θ) for any θ ∈ ΩA?
Ch9, p.10
Definition 7.4 (test statistic, null distribution, critical value, TBp. 331 & 334)
• The test statistic is a function of the data, denoted as T(X1, . . . , Xn), upon which the statistical decision will be based, i.e., the rejection region and acceptance region are defined based on the values of T.
• The distribution of a test statistic under the null hypothesis is called the null distribution.
• If the rejection region is of the form {T > t0} or {T < t0}, the number t0 is called the critical value, which separates the rejection region and the acceptance region.
Q1: How to find a good test statistic?
Q2: How to find the critical value?

[Figure: the decision is based on a test statistic T = g(X1, . . . , Xn); the critical value T = t0 separates the sample space S into AR and RR.]
Ch9, p.11
Example 7.3 (cont. Ex. 7.2)
In Ex. 7.2, LNp. 8, the original data X1, . . . , X10 are i.i.d. from Bernoulli(p), the test statistic is X = X1 + · · · + X10, and the null distribution is Binomial(10, 0.5). The critical value can be set to be 6.5.
Question 7.6 (difficulty with α-level tests)
Different persons have different criteria for significance levels, and could come up with opposite conclusions even based on the same observed data.
Example. Test statistic: T, and the rejection region is of the type T < t. Smaller values of T give more support to HA, i.e., cast more doubt on H0; on the other hand, larger values of T cast more doubt on HA. Smaller values are therefore more “extreme”. (Note. “Extreme” values of T cast more doubt on H0.)
[Figure: pdf of T under H0 (the null distribution), with the observed value of T marked and the p-value shown as the tail area beyond it.]

p-value = the probability that a value of T as “extreme” as, or more “extreme” than, the observed value of T would occur by chance if the null hypothesis were true.
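A concrete sketch (Python with scipy assumed; the observed value x_obs = 8 is an illustrative choice, not from the slides) for the two-coins setting of Ex. 7.2, where large values of X are the “extreme” observations, so the p-value is P(X ≥ x_obs | H0).

```python
# p-value for the two-coins test (large X is "extreme"), with an illustrative
# observed value x_obs = 8.
from scipy.stats import binom

x_obs = 8
p_value = binom.sf(x_obs - 1, 10, 0.5)   # P(X >= x_obs | H0: p = 0.5)
print(f"p-value = {p_value:.4f}")        # ~0.0547
```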
Ch9, p.12
Note. If the p-value is very small (i.e., Tobs is a very extreme value), then either H0 holds and an extremely rare event has occurred, or H0 is false.
Relationship between the p-value and α = P(Type I error):

[Figure: null distribution of T with two critical values t0.2 and t0.1 for a rejection region of the form T < t; the area beyond t0.2 is α = 0.2 and the area beyond t0.1 is α = 0.1. If Tobs < t0.2, then p-value < 0.2; if Tobs > t0.1, then p-value > 0.1.]

Decision rule: Tobs ∈ rejection region ⟺ p-value < α.
Note 1. The rejection region depends on the value of α, while the p-value depends on the observed value of T.
Note 2. It makes more sense to report a p-value than to merely report whether or not the null hypothesis was rejected.
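A sketch of the correspondence in the decision rule and Note 1 (Python with scipy assumed; x_obs = 8 is the same illustrative observation as above): for each α, the level-α test with rejection region {X ≥ c} rejects exactly when the p-value falls at or below α.

```python
# Rejecting at level alpha agrees with comparing the p-value to alpha.
from scipy.stats import binom

x_obs = 8
p_value = binom.sf(x_obs - 1, 10, 0.5)   # ~0.0547

for alpha in (0.2, 0.1, 0.05, 0.01):
    # largest rejection region {X >= c} whose exact level does not exceed alpha
    c = next(c for c in range(11) if binom.sf(c - 1, 10, 0.5) <= alpha)
    reject = x_obs >= c
    print(f"alpha = {alpha:4.2f}: RR = {{X >= {c}}}, reject = {reject}, "
          f"p-value <= alpha: {p_value <= alpha}")
# In each row the two decisions agree.
```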
Question: Is the p-value a statistic?