• What is hypothesis testing?

Ch9, p.1
Question 7.1 (What is a hypothesis testing question?)
1. observed data. $x_1, \ldots, x_n$
2. statistical modeling. Regard $x_1, \ldots, x_n$ as a realization of random variables $X_1, \ldots, X_n$, and assign $X_1, \ldots, X_n$ a joint distribution:
   a joint cdf $F(\cdot \mid \theta)$, or a joint pdf $f(\cdot \mid \theta)$, or a joint pmf $p(\cdot \mid \theta)$,
   where $\theta = (\theta_1, \ldots, \theta_k) \in \Omega$, and the $\theta_i$'s are fixed constants, but their values are unknown.
3. point estimation. What is the value of $\theta$? Find a function of $X_1, \ldots, X_n$, $\hat{\theta}$, to estimate $\theta$ or a function of $\theta$.
4. hypothesis testing. Separate $\Omega$ into two sets $\Omega_0$ and $\Omega_A$, where
   $\Omega_0 \cap \Omega_A = \emptyset$ and $\Omega_0 \cup \Omega_A = \Omega$.
   Use the data $X_1, \ldots, X_n$ to answer the question: which of the two hypotheses,
   $H_0: \theta \in \Omega_0$ versus $H_A: \theta \in \Omega_A$,
   is more favorable.
Ch9, p.2
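Question 7.2
Can we obtain an estimate $\hat{\theta}$ of $\theta$, and accept $H_0$ if $\hat{\theta} \in \Omega_0$ and reject $H_0$ if $\hat{\theta} \in \Omega_A$?
Hint. Consider the case: $X \sim \mathrm{Binomial}(n, p)$,
   $H_0: p = 0.5$ v.s. $H_1: p \neq 0.5$.
What if $n = 10^5$ and $p = 0.50001$?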
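The hint can be explored numerically. The sketch below is only an illustration, assuming the naive rule "accept $H_0$ only when the estimate $X/n$ equals 0.5 exactly"; the helper name `binom_pmf` and the variable names are hypothetical. It suggests that this rule would almost never accept $H_0$ even when $p = 0.5$ holds, and that a gap of 0.00001 in $p$ is far smaller than the sampling variability of $X/n$ at $n = 10^5$.

```python
from math import exp, lgamma, log, sqrt

def binom_pmf(k, n, p):
    """Binomial(n, p) pmf at k, computed on the log scale to avoid overflow."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)

n = 10**5

# Hypothetical naive rule: accept H0 only if the estimate X/n equals 0.5 exactly.
prob_accept_under_h0 = binom_pmf(n // 2, n, 0.5)

# Standard deviation of the estimate X/n when p = 0.5.
sd_of_estimate = sqrt(0.5 * 0.5 / n)

print(f"P(X/n = 0.5 | p = 0.5) = {prob_accept_under_h0:.5f}")
print(f"sd of X/n under p = 0.5 = {sd_of_estimate:.5f}")
print(f"|0.50001 - 0.5|         = {0.50001 - 0.5:.5f}")
```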
Definition 7.1 (null and alternative hypotheses, simple and composite hypotheses, TBp.331,332,334)
$H_0$ is called the null hypothesis.
$H_A$ (sometimes denoted as $H_1$) is called the alternative hypothesis.
A hypothesis is said to be a simple hypothesis if that hypothesis uniquely specifies the distribution.
Any hypothesis that is not a simple hypothesis is called a composite hypothesis.
Question 7.3 (asymmetry between $H_0$ and $H_A$)
Is there a difference between the roles of $H_0$ and $H_A$? Can we arbitrarily exchange the two hypotheses?
Ch9, p.3
Example 7.1 (some null and alternative hypotheses, TBp.329, 334)
1. Two coins experiment
   Data and problem
      Suppose that I have two coins. Coin 0 has probability of heads equal to 0.5, and coin 1 has probability of heads equal to 0.7.
      I choose one of the coins, toss it 10 times, and tell you the number of heads $X$.
      On the basis of the observed $X$, your task is to decide which coin it was.
   Statistical modeling
      $X \sim \mathrm{Binomial}(10, p)$
      $\Omega = \{\mathrm{Binomial}(10, 0.5), \mathrm{Binomial}(10, 0.7)\}$
   Problem formulation
      $H_0$: coin 0, $\Omega_0 = \{\mathrm{Binomial}(10, 0.5)\}$
      $H_A$: coin 1, $\Omega_A = \{\mathrm{Binomial}(10, 0.7)\}$
      Both hypotheses are simple.
Ch9, p.4
2. Testing for ESP
   Data and problem
      A subject is asked to identify, without looking, the suits of 20 cards drawn randomly with replacement from a 52 card deck.
      Let $T$ be the number of correct identifications.
      We would like to know whether the person is purely guessing or has extrasensory ability.
   Statistical modeling
      $T \sim \mathrm{Binomial}(20, p)$.
      Note that $p = 0.25$ means that the subject is merely guessing and has no extrasensory ability.
      $\Omega = \{p : p \in [0.25, 1]\}$ or $\Omega = \{p : p \in [0, 1]\}$
   Problem formulation
      $\Omega = [0.25, 1]$
      $H_0: p = 0.25$ v.s. $H_A: p > 0.25$
      Then, $\Omega_0 = \{0.25\}$ and $\Omega_A = (0.25, 1]$.
      $H_0$ is simple and $H_A$ is composite. Furthermore, $H_A$ is called a one-sided hypothesis.
Ch9, p.5

      $\Omega = [0, 1]$
      $H_0: p = 0.25$ v.s. $H_A: p \neq 0.25$
      Then, $\Omega_0 = \{0.25\}$ and $\Omega_A = [0, 0.25) \cup (0.25, 1]$.
      $H_0$ is simple and $H_A$ is composite. Furthermore, $H_A$ is called a two-sided hypothesis.
3. Goodness-of-fit test (for Poisson distribution)
   Data and problem
      Observe i.i.d. data $X_1, \ldots, X_n$.
      Suppose that we only know that the $X_i$'s are discrete data.
      We would like to know whether the observations came from a Poisson distribution.
   Statistical modeling
      $X_1, \ldots, X_n$ are i.i.d. from a discrete distribution with cdf $F$.
      $\Omega = \{F : F \text{ is a discrete cdf}\}$
   Problem formulation
      $H_0$: Data came from some Poisson distribution, $\Omega_0 = \{F : F \text{ is a } P(\lambda) \text{ cdf}\}$
      $H_A$: Data not from Poisson, $\Omega_A = \Omega \setminus \Omega_0$.
      Both hypotheses are composite.
Ch9, p.6
• Neyman-Pearson paradigm --- concept and procedure of hypothesis testing
Definition 7.2 (test, rejection region, acceptance region, test statistic, TBp. 331)
A test is a rule defined on the sample space of data (denoted as $S$) that specifies:
1. for which sample values $x \in S$, the decision is made to accept $H_0$
2. for which sample values $x \in S$, reject $H_0$ in favor of $H_A$
The subset of the sample space for which $H_0$ will be rejected is called the rejection region, denoted as RR.
The complement of the rejection region is called the acceptance region, denoted as $AR = S \setminus RR = RR^c$.
[Figure: the sample space $S$ partitioned into AR and RR]
Note. Two types of errors may be incurred in applying this paradigm:

                                    decisions
                            accept H0        reject H0
   hypotheses  H0 is true   no error         Type I error
               HA is true   Type II error    no error
Ch9, p.7
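Definition 7.3 (type I error, significance level, type II error, power, TBp. 331)
Let $\theta_0$ denote the true value of the parameters.
Type I error: reject $H_0$ when $H_0$ is true, i.e., $x \in RR$ when $\theta_0 \in \Omega_0$.
Type II error: accept $H_0$ when $H_A$ is true, i.e., $x \in AR$ when $\theta_0 \in \Omega_A$.
$\alpha_{\theta_0}$ = probability of Type I error = $P(RR \mid \theta_0)$, where $\theta_0 \in \Omega_0$.
Significance level $\alpha$: the maximum (or the least upper bound) of the Type I error probability, i.e.,
   $\alpha = \max_{\theta_0 \in \Omega_0} \alpha_{\theta_0}$ or $\alpha = \sup_{\theta_0 \in \Omega_0} \alpha_{\theta_0}$.
$\beta_{\theta_0}$ = probability of Type II error = $P(AR \mid \theta_0)$, where $\theta_0 \in \Omega_A$.
For $\theta_0 \in \Omega_A$, power$_{\theta_0}$ = $1 - \beta_{\theta_0}$ = $P(RR \mid \theta_0)$ = probability of rejecting $H_0$ when $H_A$ is true.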
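To see why the significance level is defined as a maximum (or supremum) over the null set, consider a composite null. The sketch below is a hypothetical setup that does not appear in the notes: $X \sim \mathrm{Binomial}(10, p)$, $H_0: p \le 0.5$, and $RR = \{7, 8, 9, 10\}$, with $\Omega_0 = [0, 0.5]$ approximated by a grid. The function name `prob_reject` is illustrative.

```python
from math import comb

def prob_reject(p, n=10, rr=(7, 8, 9, 10)):
    """P(X in RR | p) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in rr)

# Hypothetical composite null: Omega_0 = [0, 0.5], approximated by a grid.
null_grid = [i / 1000 for i in range(0, 501)]
type1_probs = [prob_reject(p) for p in null_grid]

alpha = max(type1_probs)                       # significance level on the grid
worst_p = null_grid[type1_probs.index(alpha)]  # where the maximum is attained

print(f"max Type I error probability = {alpha:.4f}, attained at p = {worst_p}")
```

In this setup the Type I error probability increases with $p$, so the maximum sits on the boundary of the null set.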
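Question 7.4 (dilemma between type I and type II errors)
An ideal test would have $\alpha_{\theta_0} = 0$, for $\theta_0 \in \Omega_0$, and $\beta_{\theta_0} = 0$, for $\theta_0 \in \Omega_A$; but this can be achieved only in trivial cases (Example?). In practice, in order to decrease $\alpha$, $\beta$ must be increased, and vice versa.
[Figure: two partitions of $S$ into AR and RR; one with smaller $\alpha$ and larger $\beta$, the other with larger $\alpha$ and smaller $\beta$]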
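The trade-off can be tabulated for the two coins experiment of Example 7.1 ($p = 0.5$ under $H_0$, $p = 0.7$ under $H_A$, 10 tosses). The sketch assumes rejection regions of the form $\{X \ge c\}$, which is a choice made here for illustration; `upper_tail` is a hypothetical helper name.

```python
from math import comb

def upper_tail(c, n, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

n, p_null, p_alt = 10, 0.5, 0.7

table = []
for c in range(n + 2):                    # RR = {c, ..., n}; c = n + 1 gives an empty RR
    alpha = upper_tail(c, n, p_null)      # P(reject H0 | H0 true)
    beta = 1 - upper_tail(c, n, p_alt)    # P(accept H0 | HA true)
    table.append((c, alpha, beta))
    print(f"RR = {{X >= {c:2d}}}: alpha = {alpha:.4f}, beta = {beta:.4f}")
```

As the cutoff `c` grows the rejection region shrinks, so `alpha` falls while `beta` rises.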
Ch9, p.8
A solution to the dilemma.
The Neyman-Pearson approach imposes an asymmetry between $H_0$ and $H_A$:
1. the significance level $\alpha$ is fixed in advance, usually at a rather small number (e.g., 0.1, 0.05, and 0.01 are commonly used), and
2. then try to construct a test yielding a small value of $\beta_{\theta_0}$ for $\theta_0 \in \Omega_A$.
Example 7.2 (cont. Ex. 7.1 item 1, TBp.330-331)
Testing the value of the parameter $p$ of a binomial distribution with 10 trials. Let $X$ be the number of successes, then $X \sim B(10, p)$.
   $H_0: p = 0.5$ v.s. $H_A: p = 0.7$
Different observations of $X$ give different support on $H_A$.
In this case, larger observations of $X$ give more support on $H_A$.
Rejection region should consist of large values of $X$.
The observations that give more support on $H_A$ are called more extreme observations.
If the rejection region is $\{7, 8, 9, 10\}$, then
   $\alpha = P(X \in \{7, 8, 9, 10\} \mid p = 0.5) = 1 - P(X \le 6 \mid p = 0.5) = 0.18$.
Ch9, p.9
Set the significance level as 0.18.
The probability of Type II error is
   $\beta = P(X \notin \{7, 8, 9, 10\} \mid p = 0.7) = P(X \le 6 \mid p = 0.7) = 0.35$,
and the power is
   $P(X \in \{7, 8, 9, 10\} \mid p = 0.7) = 1 - P(X \le 6 \mid p = 0.7) = 0.65$.
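These error probabilities can be recomputed directly from the binomial pmf. A minimal sketch using only the Python standard library follows; the helper names `binom_pmf` and `prob_in_region` are illustrative. Note that the exact value of $\alpha$ is $176/1024 \approx 0.172$, slightly below the 0.18 quoted above, while $\beta$ and the power agree with 0.35 and 0.65 to two decimals.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_in_region(region, n, p):
    """P(X in region) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in region)

rr = {7, 8, 9, 10}                          # rejection region from the example

alpha = prob_in_region(rr, 10, 0.5)         # Type I error probability under H0
power = prob_in_region(rr, 10, 0.7)         # P(reject H0 | p = 0.7)
beta = 1 - power                            # Type II error probability under HA

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, power = {power:.4f}")
```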
Suppose that we are interested in testing
   $H_0: p = 0.5$ v.s. $H_A: p > 0.5$
and the RR is still set to be $\{7, 8, 9, 10\}$:
Note that $\beta_p$ is a function of $p$:
   $\beta_p = P(X \notin \{7, 8, 9, 10\} \mid p)$.
power function $= 1 - \beta_p$:
   $1 - \beta_p \to 1$ as $p \to 1$, and
   $1 - \beta_p \to \alpha$ as $p \to 0.5$.
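The two limits of the power function can be checked by evaluating it on a few values of $p$. This is a small sketch; the grid `p_grid` and the function name `power` are choices made here for illustration.

```python
from math import comb

def power(p, n=10, rr=(7, 8, 9, 10)):
    """Power function 1 - beta_p = P(X in RR | p) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in rr)

p_grid = (0.5, 0.6, 0.7, 0.8, 0.9, 0.99, 1.0)
for p in p_grid:
    print(f"p = {p:.2f}: power = {power(p):.4f}")
```

At $p = 0.5$ the power equals the Type I error probability of the test, and it climbs to 1 as $p$ approaches 1.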
Question 7.5
1. Which hypothesis gets more protection? What kind of protection?
2. Will the protection make P(Type I error) always smaller than P(Type II error), i.e., $\alpha < \beta_{\theta_0}$ for any $\theta_0 \in \Omega_A$?
Ch9, p.10
Definition 7.4 (test statistic, null distribution, critical value, TBp. 331 & 334)
The test statistic is a function of data, denoted as $T(X_1, \ldots, X_n)$, upon which the statistical decision will be based, i.e., the rejection region and acceptance region are defined based on the values of $T$.
The distribution of a test statistic under the null hypothesis is called the null distribution.
If the rejection region is of the form $\{T > t_0\}$ or $\{T < t_0\}$, the number $t_0$ is called the critical value, which separates the rejection region and the acceptance region.
[Figure: the sample space $S$, split into AR and RR, mapped by the test statistic $T = g(X_1, \ldots, X_n)$ to the values of $T$, with $T = t_0$ as the boundary]
Q1: How to find a good test statistic?
Q2: How to find the critical value?
Ch9, p.11
Example 7.3 (cont. Ex. 7.2)
In Ex. 7.2, LNp. 8, the original data $X_1, \ldots, X_{10}$ are i.i.d. from Bernoulli $B(p)$, the test statistic is $X = X_1 + \cdots + X_{10}$, and the null distribution is Binomial(10, 0.5). The critical value can be set to be 6.5.
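One possible way to pick a critical value, given a target level, is to search the null distribution for the smallest cutoff whose upper tail does not exceed the level. The sketch below assumes a rejection region of the form $\{X > c\}$ with integer $c$; the function names are hypothetical. For the level 0.18 used in Ex. 7.2 it returns $c = 6$, so any critical value in $(6, 7)$, such as 6.5, gives the same test.

```python
from math import comb

def upper_tail_gt(c, n=10, p=0.5):
    """P(X > c) under the null distribution Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1, n + 1))

def smallest_cutoff(alpha, n=10, p=0.5):
    """Smallest integer c such that P(X > c | H0) <= alpha."""
    for c in range(n + 1):
        if upper_tail_gt(c, n, p) <= alpha:
            return c
    return n  # not reached for alpha >= 0, since P(X > n) = 0

for level in (0.18, 0.05, 0.01):
    c = smallest_cutoff(level)
    print(f"level {level}: reject when X > {c}, "
          f"actual Type I error = {upper_tail_gt(c):.4f}")
```

Because $X$ is discrete, the attained Type I error probability is usually strictly below the target level.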
Question 7.6 (difficulty with $\alpha$-level tests)
Different persons have different criteria of significance levels, and could come up with opposite conclusions even based on the same observed data.
Example. test statistic: $T$, and rejection region is of the type: $T < t$,
   smaller values of $T$ give more support on $H_A$, i.e., cast more doubt on $H_0$
   on the other hand, larger values of $T$ cast more doubt on $H_A$
   smaller values are more extreme (Note. Extreme values of $T$ cast more doubt on $H_0$.)
[Figure: pdf of $T$ under $H_0$ (null distribution), marking the observed value of $T$ and the p-value]
p-value = the probability that a value of $T$ that is as extreme or more extreme than the observed value of $T$ would occur by chance if the null hypothesis were true.
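For a rejection region of the type $T < t$, "as extreme or more extreme" means "as small or smaller", so the p-value is a lower-tail probability under the null distribution. The sketch below uses a hypothetical null distribution, $T \sim \mathrm{Binomial}(10, 0.5)$, chosen only for illustration; `lower_tail_p_value` is not a name from the notes.

```python
from math import comb

def lower_tail_p_value(t_obs, n=10, p_null=0.5):
    """P(T <= t_obs | H0) for T ~ Binomial(n, p_null): smaller T is more extreme."""
    return sum(comb(n, k) * p_null**k * (1 - p_null)**(n - k)
               for k in range(0, t_obs + 1))

for t_obs in (4, 3, 2, 1):
    print(f"T_obs = {t_obs}: p-value = {lower_tail_p_value(t_obs):.4f}")
```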
Ch9, p.12
Note. If the p-value is very small (i.e., $T_{obs}$ is a very extreme value), then either $H_0$ holds and an extremely rare event has occurred, or $H_0$ is false.
Relationship between the p-value and $\alpha = P(\text{Type I error})$:
[Figure: null pdf of $T$ with the rejection regions for $\alpha = 0.2$ and $\alpha = 0.1$; the area of each rejection region equals $\alpha$]
   $T_{obs} < t_{0.2}$ $\Leftrightarrow$ p-value $< 0.2$
   $T_{obs} > t_{0.1}$ $\Leftrightarrow$ p-value $> 0.1$
Decision rule: $T_{obs} \in$ rejection region $\Leftrightarrow$ p-value $< \alpha$
Note 1. The rejection region depends on the value of $\alpha$, while the p-value depends on the observed value of $T$.
Note 2. It makes more sense to report a p-value than merely report whether or not the null hypothesis was rejected.
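The decision rule and the point of Question 7.6 can be checked on a small discrete case. The sketch assumes a hypothetical lower-tail test with null distribution $T \sim \mathrm{Binomial}(10, 0.5)$, and it compares with $\le$ rather than $<$ because $T$ is discrete; for the levels used here the two comparisons give the same decisions. All function names are illustrative.

```python
from math import comb

N, P_NULL = 10, 0.5

def null_cdf(t):
    """P(T <= t | H0) for T ~ Binomial(N, P_NULL); also the lower-tail p-value at t."""
    return sum(comb(N, k) * P_NULL**k * (1 - P_NULL)**(N - k) for k in range(t + 1))

def critical_cutoff(alpha):
    """Largest c with P(T <= c | H0) <= alpha, or -1 if there is none."""
    c = -1
    while c + 1 <= N and null_cdf(c + 1) <= alpha:
        c += 1
    return c

t_obs = 3
for alpha in (0.2, 0.1):
    c = critical_cutoff(alpha)
    rr = set(range(c + 1))                      # rejection region {0, ..., c}
    # The region-based and p-value-based decisions agree for every possible T.
    agree = all((t in rr) == (null_cdf(t) <= alpha) for t in range(N + 1))
    print(f"alpha = {alpha}: RR = {sorted(rr)}, rules agree = {agree}, "
          f"reject at T_obs = {t_obs}: {t_obs in rr}")

print(f"p-value at T_obs = {t_obs}: {null_cdf(t_obs):.4f}")
```

The same observation is rejected at level 0.2 but not at level 0.1, whereas the reported p-value lets each reader apply their own level.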
Question: Is the p-value a statistic?