• What is hypothesis testing?
Ch9, p.1

Question 7.1 (What is a hypothesis testing question?)

1. observed data. $x_1, \ldots, x_n$
2. statistical modeling. Regard $x_1, \ldots, x_n$ as a realization of random variables $X_1, \ldots, X_n$, and assign $X_1, \ldots, X_n$ a joint distribution: a joint cdf $F(\cdot \mid \theta)$, or a joint pdf $f(\cdot \mid \theta)$, or a joint pmf $p(\cdot \mid \theta)$, where $\theta = (\theta_1, \ldots, \theta_k) \in \Omega$, and the $\theta_i$'s are fixed constants whose values are unknown.
3. point estimation. What is the value of $\theta$? Find a function of $X_1, \ldots, X_n$, denoted $\hat{\theta}$, to estimate $\theta$ or a function of $\theta$.
4. hypothesis testing. Separate $\Omega$ into two sets $\Omega_0$ and $\Omega_A$, where $\Omega_0 \cap \Omega_A = \emptyset$ and $\Omega_0 \cup \Omega_A = \Omega$. Use the data $X_1, \ldots, X_n$ to answer the question: which of the two hypotheses, $H_0: \theta \in \Omega_0$ versus $H_A: \theta \in \Omega_A$, is more favorable?

Ch9, p.2

Question 7.2
Can we obtain an estimate $\hat{\theta}$ of $\theta$, and accept $H_0$ if $\hat{\theta} \in \Omega_0$ and reject $H_0$ if $\hat{\theta} \in \Omega_A$?
Hint. Consider the case: $X \sim \text{Binomial}(n, p)$, $H_0: p = 0.5$ v.s. $H_1: p \neq 0.5$. What if $n = 10^5$ and $p = 0.50001$?

Definition 7.1 (null and alternative hypotheses, simple and composite hypotheses, TBp.331,332,334)
$H_0$ is called the null hypothesis.
$H_A$ (sometimes denoted as $H_1$) is called the alternative hypothesis.
A hypothesis is said to be a simple hypothesis if it uniquely specifies the distribution.
Any hypothesis that is not a simple hypothesis is called a composite hypothesis.

Question 7.3 (asymmetry between $H_0$ and $H_A$)
Is there a difference between the roles of $H_0$ and $H_A$? Can we arbitrarily exchange the two hypotheses?

Ch9, p.3

Example 7.1 (some null and alternative hypotheses, TBp.329, 334)

1. Two coins experiment
Data and problem. Suppose that I have two coins. Coin 0 has probability of heads equal to 0.5, and coin 1 has probability of heads equal to 0.7. I choose one of the coins, toss it 10 times, and tell you the number of heads $X$. On the basis of the observed $X$, your task is to decide which coin it was.
Statistical modeling.
$X \sim \text{Binomial}(10, p)$, $\Omega = \{\text{Binomial}(10, 0.5), \text{Binomial}(10, 0.7)\}$
Problem formulation.
$H_0$: coin 0, $\Omega_0 = \{\text{Binomial}(10, 0.5)\}$
$H_A$: coin 1, $\Omega_A = \{\text{Binomial}(10, 0.7)\}$
Both hypotheses are simple.

Ch9, p.4

2. Testing for ESP
Data and problem. A subject is asked to identify, without looking, the suits of 20 cards drawn randomly with replacement from a 52-card deck. Let $T$ be the number of correct identifications. We would like to know whether the person is purely guessing or has extrasensory ability.
Statistical modeling. $T \sim \text{Binomial}(20, p)$. Note that $p = 0.25$ means that the subject is merely guessing and has no extrasensory ability.
$\Omega = \{p : p \in [0.25, 1]\}$ or $\Omega = \{p : p \in [0, 1]\}$
Problem formulation.
Case $\Omega = [0.25, 1]$: $H_0: p = 0.25$ v.s. $H_A: p > 0.25$. Then $\Omega_0 = \{0.25\}$ and $\Omega_A = (0.25, 1]$. $H_0$ is simple and $H_A$ is composite. Furthermore, $H_A$ is called a one-sided hypothesis.

Ch9, p.5

Case $\Omega = [0, 1]$: $H_0: p = 0.25$ v.s. $H_A: p \neq 0.25$. Then $\Omega_0 = \{0.25\}$ and $\Omega_A = [0, 0.25) \cup (0.25, 1]$. $H_0$ is simple and $H_A$ is composite. Furthermore, $H_A$ is called a two-sided hypothesis.

3. Goodness-of-fit test (for Poisson distribution)
Data and problem. Observe i.i.d. data $X_1, \ldots, X_n$. Suppose that we only know that the $X_i$'s are discrete data. We would like to know whether the observations came from a Poisson distribution.
Statistical modeling. $X_1, \ldots, X_n$ are i.i.d. from a discrete distribution with cdf $F$. $\Omega = \{F : F \text{ is a discrete cdf}\}$
Problem formulation.
$H_0$: data came from some Poisson distribution, $\Omega_0 = \{F : F \text{ is a } P(\lambda) \text{ cdf}\}$
$H_A$: data did not come from a Poisson distribution, $\Omega_A = \Omega \setminus \Omega_0$
Both hypotheses are composite.

Ch9, p.6

• Neyman-Pearson paradigm --- concept and procedure of hypothesis testing

Definition 7.2 (test, rejection region, acceptance region, test statistic, TBp. 331)
A test is a rule defined on the sample space of the data (denoted as $S$) that specifies:
1. for which sample values $x \in S$ the decision is made to accept $H_0$;
2. for which sample values $x \in S$ the decision is made to reject $H_0$ in favor of $H_A$.
The subset of the sample space for which $H_0$ will be rejected is called the rejection region, denoted as $RR$.
The complement of the rejection region is called the acceptance region, denoted as $AR = S \setminus RR = RR^c$.

[Figure: the sample space $S$ partitioned into $AR$ and $RR$, linking hypotheses to decisions.]

Note. Two types of errors may be incurred in applying this paradigm:

                              decisions
hypotheses        accept H0         reject H0
H0 is true        (no error)        Type I error
HA is true        Type II error     (no error)

Ch9, p.7

Definition 7.3 (type I error, significance level, type II error, power, TBp. 331)
Let $\theta$ denote the true value of the parameters.
Type I error: reject $H_0$ when $H_0$ is true, i.e., $x \in RR$ when $\theta \in \Omega_0$.
Type II error: accept $H_0$ when $H_A$ is true, i.e., $x \in AR$ when $\theta \in \Omega_A$.
$\alpha_\theta$ = probability of Type I error = $P(RR \mid \theta)$, where $\theta \in \Omega_0$.
Significance level $\alpha$: the maximum (or the least upper bound) of the Type I error probability, i.e., $\alpha = \max_{\theta \in \Omega_0} \alpha_\theta$ or $\alpha = \sup_{\theta \in \Omega_0} \alpha_\theta$.
$\beta_\theta$ = probability of Type II error = $P(AR \mid \theta)$, where $\theta \in \Omega_A$.
For $\theta \in \Omega_A$, power$_\theta$ = $1 - \beta_\theta$ = $P(RR \mid \theta)$ = probability of rejecting $H_0$ when $H_A$ is true.

Question 7.4 (dilemma between type I and type II errors)
An ideal test would have $\alpha_\theta = 0$ for $\theta \in \Omega_0$ and $\beta_\theta = 0$ for $\theta \in \Omega_A$; but this can be achieved only in trivial cases (Example?).
In practice, in order to decrease $\alpha$, $\beta$ must be increased, and vice versa.

[Figure: two partitions of $S$ into $AR$ and $RR$, labeled "smaller $\alpha$, larger $\beta$" and "larger $\alpha$, smaller $\beta$".]

Ch9, p.8

A solution to the dilemma. The Neyman-Pearson approach imposes an asymmetry between $H_0$ and $H_A$:
1. the significance level $\alpha$ is fixed in advance, usually at a rather small number (e.g. 0.1, 0.05, and 0.01 are commonly used), and
2. then try to construct a test yielding a small value of $\beta_\theta$ for $\theta \in \Omega_A$.

Example 7.2 (cont. Ex. 7.1 item 1, TBp.330-331)
Testing the value of the parameter $p$ of a binomial distribution with 10 trials. Let $X$ be the number of successes, then $X \sim B(10, p)$.
$H_0: p = 0.5$ v.s.
$H_A: p = 0.7$
Different observations of $X$ give different support to $H_A$.
In this case, larger observations of $X$ give more support to $H_A$, so the rejection region should consist of large values of $X$.
The observations that give more support to $H_A$ are called more extreme observations.
If the rejection region is $\{7, 8, 9, 10\}$, then
$\alpha = P(X \in \{7, 8, 9, 10\} \mid p = 0.5) = 1 - P(X \le 6 \mid p = 0.5) \approx 0.17$.

Ch9, p.9

Set the significance level to 0.17. The probability of Type II error is
$\beta = P(X \notin \{7, 8, 9, 10\} \mid p = 0.7) = P(X \le 6 \mid p = 0.7) = 0.35$,
and the power is
$P(X \in \{7, 8, 9, 10\} \mid p = 0.7) = 1 - P(X \le 6 \mid p = 0.7) = 0.65$.
Suppose that we are interested in testing $H_0: p = 0.5$ v.s. $H_A: p > 0.5$ and the $RR$ is still set to be $\{7, 8, 9, 10\}$. Note that $\beta_p$ is a function of $p$:
$\beta_p = P(X \notin \{7, 8, 9, 10\} \mid p)$.
The power function is $1 - \beta_p$; $1 - \beta_p \to 1$ as $p \to 1$, and $1 - \beta_p \to \alpha$ as $p \to 0.5$.

Question 7.5
1. Which hypothesis gets more protection? What kind of protection?
2. Will the protection make P(Type I error) always smaller than P(Type II error), i.e., $\alpha < \beta_\theta$ for any $\theta \in \Omega_A$?

Ch9, p.10

Definition 7.4 (test statistic, null distribution, critical value, TBp. 331 & 334)
The test statistic is a function of the data, denoted as $T(X_1, \ldots, X_n)$, upon which the statistical decision will be based, i.e., the rejection region and acceptance region are defined based on the values of $T$.
The distribution of a test statistic under the null hypothesis is called the null distribution.
If the rejection region is of the form $\{T > t_0\}$ or $\{T < t_0\}$, the number $t_0$ is called the critical value, which separates the rejection region and the acceptance region.

[Figure: the sample space $S$ is split into $AR$ and $RR$ at $T = t_0$, where $T = g(X_1, \ldots, X_n)$ is the test statistic.]

Q1: How to find a good test statistic?
Q2: How to find the critical value?

Ch9, p.11

Example 7.3 (cont. Ex. 7.2)
In Ex. 7.2, LNp. 8, the original data $X_1, \ldots, X_{10}$ are i.i.d. from Bernoulli $B(p)$, the test statistic is $X = X_1 + \cdots + X_{10}$, and the null distribution is Binomial(10, 0.5).
The critical value can be set to be 6.5.

Question 7.6 (difficulty with $\alpha$-level tests)
Different persons have different criteria for the significance level, and could come up with opposite conclusions even based on the same observed data.

Example. Test statistic: $T$, and the rejection region is of the type $T < t$.
Smaller values of $T$ give more support to $H_A$, i.e., cast more doubt on $H_0$; on the other hand, larger values of $T$ cast more doubt on $H_A$.
Smaller values are more extreme. (Note. Extreme values of $T$ cast more doubt on $H_0$.)

[Figure: pdf of $T$ under $H_0$ (null distribution), with the observed value of $T$ marked.]

p-value = the probability that a value of $T$ that is as extreme as or more extreme than the observed value of $T$ would occur by chance if the null hypothesis were true.

Ch9, p.12

Note. If the p-value is very small (i.e., $T_{obs}$ is a very extreme value), then either $H_0$ holds and an extremely rare event has occurred, or $H_0$ is false.

Relationship between p-value and $\alpha = P(\text{Type I error})$:

[Figure: null pdf of $T$ with two shaded tail areas. For $\alpha = 0.2$: $T_{obs} < t_{0.2}$, so p-value < 0.2. For $\alpha = 0.1$: $T_{obs} > t_{0.1}$, so p-value > 0.1.]

Decision rule: $T_{obs} \in$ rejection region $\Leftrightarrow$ p-value $< \alpha$.

Note 1. The rejection region depends on the value of $\alpha$, while the p-value depends on the observed value of $T$.
Note 2. It makes more sense to report a p-value than merely to report whether or not the null hypothesis was rejected.

Question: Is the p-value a statistic?
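
Note (numerical check). The error probabilities in Ex. 7.2 and the p-value definition can be checked with a few lines of code. The following is a minimal sketch, assuming Python with only the standard library; the helper names (`binom_pmf`, `prob_in_region`, `power_function`, `p_value_upper`) are hypothetical and not part of the lecture notes. It uses the test of Ex. 7.2, where larger values of X are more extreme (the opposite direction from the T < t illustration), and for a discrete statistic it rejects when p-value <= alpha rather than p-value < alpha.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_in_region(region, n, p):
    """P(X in region) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in region)

N = 10                 # number of tosses in Ex. 7.2
RR = range(7, 11)      # rejection region {7, 8, 9, 10}

# Type I error probability: reject H0 (p = 0.5) although it is true.
alpha = prob_in_region(RR, N, 0.5)      # 176/1024 = 0.171875

# Power and Type II error probability at the alternative p = 0.7.
power = prob_in_region(RR, N, 0.7)      # about 0.65
beta = 1 - power                        # about 0.35

def power_function(p):
    """1 - beta_p = P(X in RR | p) for the fixed rejection region RR."""
    return prob_in_region(RR, N, p)

def p_value_upper(x_obs, n, p0):
    """P(X >= x_obs | p = p0): larger X counts as more extreme here."""
    return prob_in_region(range(x_obs, n + 1), n, p0)

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, power = {power:.4f}")
for x_obs in range(N + 1):
    pv = p_value_upper(x_obs, N, 0.5)
    # Small tolerance guards against floating-point ties at x_obs = 7.
    decision = "reject H0" if pv <= alpha + 1e-12 else "accept H0"
    print(f"x_obs = {x_obs:2d}  p-value = {pv:.4f}  {decision}")
```

The loop illustrates the decision rule: the observed value falls in RR exactly when its p-value does not exceed alpha, and `power_function(0.5)` returns alpha itself.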