Stat 491: Biostatistics Chapter 7: Testing Hypothesis–One Sample Inference Solomon W. Harrar
The University of Montana, Fall 2012

Outline: A Statistical Test for µ; Test for the Binomial Distribution

Elements of a Test of Hypothesis

There is a preconceived idea (a set of values) about µ. A statistical test is an approach used to see whether a random sample contains compelling evidence in favor of these preconceived values for µ.

The five components of a statistical test are:
a. The research hypothesis Ha, known as the Alternative Hypothesis. Another commonly used notation for the alternative hypothesis is H1.
b. The negation of the research hypothesis (or the status quo) H0, known as the Null Hypothesis.
c. The test statistic (T.S.): a quantity computed from the sample and used to state whether or not the data support Ha.
d. The rejection region (R.R.): values of the test statistic that are very unlikely if H0 is true.
e. Check assumptions and draw conclusions.

Elements of a Test of Hypothesis (Cont'd)

Example: Suppose the average cholesterol level in children is 175 mg/dL. A group of men who have died from heart disease within the past year is identified, and the cholesterol levels of their offspring are measured. Let µ be the mean cholesterol level (mg/dL) of all children whose fathers have died from heart disease within the past year. The main question here is "Is there a familial aggregation of cardiovascular risk factors?" State the null and alternative hypotheses.

The two hypotheses are H0: µ = 175 and Ha: µ > 175.

We have seen how to address this problem using the CI method. Another way is in terms of hypothesis testing. It is reasonable to base our inference on X̄. That is, our test statistic will be based on X̄.

Two Types of Errors

The four possible scenarios in hypothesis testing are given in the following table.

                     State of Nature
Test Decision        H0 is True       H0 is False
Reject H0            Type I error     Correct
Accept H0            Correct          Type II error

Denote by α the probability of a Type I error. That is, α = P(Rejecting H0 | H0 is true).
Denote by β the probability of a Type II error. That is, β = P(Accepting H0 | H0 is false).

Example: Interpret α and β in the context of the familial aggregation of cardiovascular risk factors.

Forming the Rejection Region (R.R.)

Ideally, we would like to construct our decision rule so that both α and β are small. This is impossible for a fixed sample size because the two are inversely related.
We specify a tolerable probability of Type I error, say α = 0.05, and then locate the rejection region (R.R.).
For the familial aggregation of cardiovascular-risk-factor example, if the cholesterol levels of children who lost their father to heart disease within the past year are normally distributed, then the sampling distribution of X̄ is normal. More precisely,

$$\bar{X} \sim N(\mu, \sigma^2/n).$$

Forming the Rejection Region (R.R.) (Cont'd)

Let us now consider the test statistic

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \overset{H_0}{\sim} t_{n-1},$$

where µ0 is the hypothesized value; for this example µ0 = 175. This test statistic is known as the t-test statistic.
It makes sense to reject H0 for large values of t.
We now want to define our rejection region so that the probability of a Type I error is 0.05. We know that $P(t > t_{n-1,0.95} \mid H_0 \text{ is true}) = 0.05$.
We use the rejection region $t > t_{n-1,0.95}$, which guarantees that the probability of falsely rejecting H0 does not exceed 0.05.

[Figure: Graphical Display of the Rejection Region. Density of the $t_{n-1}$ distribution plotted against x, with 0 and $t_{1-\alpha}$ marked on the horizontal axis.]

Summary of a Statistical Test for µ

Assume the population is normally distributed.

Hypotheses:
Case 1. H0: µ ≤ µ0 vs Ha: µ > µ0 (One-Sided Alternative)
Case 2. H0: µ ≥ µ0 vs Ha: µ < µ0 (One-Sided Alternative)
Case 3. H0: µ = µ0 vs Ha: µ ≠ µ0 (Two-Sided Alternative)

T.S.: $t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$

R.R.:
Case 1. Reject H0 if $t \ge t_{n-1,1-\alpha}$.
Case 2. Reject H0 if $t \le -t_{n-1,1-\alpha}$.
Case 3. Reject H0 if $t \le -t_{n-1,1-\alpha/2}$ or $t \ge t_{n-1,1-\alpha/2}$. In short, reject H0 if $|t| \ge t_{n-1,1-\alpha/2}$.

Example: Cardiovascular Disease

Suppose we want to compare fasting serum-cholesterol levels among recent Asian immigrants to the United States with typical levels found in the general US population. The mean cholesterol level in women ages 21-40 in the US is 190 mg/dL. It is unknown whether cholesterol levels among recent Asian immigrants are higher or lower than those in the general US population. Let us assume that levels among recent female Asian immigrants are normally distributed. Blood tests are performed on 100 female Asian immigrants ages 21-40, and the mean and standard deviation for these 100 women are 181.52 mg/dL and 40 mg/dL, respectively. What can we conclude on the basis of this sample evidence?
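The critical-value method for this example can be sketched in R directly from the summary statistics. This is only an illustration: the example does not state a significance level, so α = 0.05 is an assumption, and the variable names are made up for the sketch. A two-sided test (Case 3) is used because the direction of the difference is unknown.

```r
# Summary statistics quoted in the Asian-immigrants example
n    <- 100      # sample size
xbar <- 181.52   # sample mean (mg/dL)
s    <- 40       # sample standard deviation (mg/dL)
mu0  <- 190      # null value: mean level in the general US population (mg/dL)
alpha <- 0.05    # ASSUMPTION: the example does not specify alpha

# Test statistic t = (xbar - mu0) / (s / sqrt(n))
t_obs <- (xbar - mu0) / (s / sqrt(n))

# Two-sided critical value t_{n-1, 1-alpha/2}
t_crit <- qt(1 - alpha / 2, df = n - 1)

# Case 3 rejection rule: reject H0 if |t| >= t_{n-1, 1-alpha/2}
reject <- abs(t_obs) >= t_crit

print(c(t_obs = t_obs, t_crit = t_crit))
print(reject)
```

Under the assumed α, the observed statistic falls in the rejection region, so the sample suggests that the mean level among recent immigrants differs from 190 mg/dL.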
Example: Salmonella Enteritidis

A massive multi-state outbreak of food-borne illness was attributed to Salmonella enteritidis. Epidemiologists determined that the source of the illness was ice cream. They sampled nine production runs from the company that had produced the ice cream to determine the level of Salmonella enteritidis in the ice cream. These levels in MPN/g (most probable number per gram) are as follows:

0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418.

Use these data to determine whether the average level of Salmonella enteritidis in the ice cream is greater than 0.3 MPN/g, a level that is considered to be very dangerous. Set α = 0.01.

R code:

    # Salmonella enteritidis levels (MPN/g) from the nine production runs
    x <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
    # One-sample t-test of H0: mu <= 0.3 against Ha: mu > 0.3
    t.test(x = x, mu = 0.3, alternative = "greater")

Example: Salmonella Enteritidis (Cont'd)

[Figure: Normal Q-Q plot (Sample Quantiles against Theoretical Quantiles) and box plot of the Salmonella enteritidis data.]

Level of Significance (p-value)

Reporting the result of a hypothesis test by specifying the rejection region (as we have been doing so far) is called the critical-value method.
An alternative is the p-value method, which allows you to state the weight of evidence in the sample against H0.
A p-value (level of significance) is the probability of obtaining a value of the test statistic that is as extreme as or more extreme than the value actually obtained, given that the null hypothesis is true.
In other words, it is the probability of observing a sample outcome as contradictory or more contradictory to H0 than the observed sample result, given that the null hypothesis is true.
The smaller the p-value, the stronger the evidence in the sample against H0.

Level of Significance (p-value) (Cont'd)

For a specified probability of Type I error α:
Reject H0 if p-value ≤ α
Fail to reject H0 if p-value > α

Summary of how to calculate the p-value:

           Case 1               Case 2               Case 3
H0         µ ≤ µ0               µ ≥ µ0               µ = µ0
Ha         µ > µ0               µ < µ0               µ ≠ µ0
p-value    P(t ≥ computed t)    P(t ≤ computed t)    2P(t ≥ |computed t|)

In R, the p-value is computed with the following commands, where t_obs holds the computed t and n is the sample size:

    pt(t_obs, n - 1)                  # for lower-sided test
    1 - pt(t_obs, n - 1)              # for upper-sided test
    2 * (1 - pt(abs(t_obs), n - 1))   # for two-sided test

[Figure: Graphical Display of Two-Sided p-value. Density of the $t_{n-1}$ distribution plotted against x, with −t, 0, and t marked on the horizontal axis.]

Level of Significance (p-value) (Cont'd)

The entries of the last row in the above table are the p-values.
Example: For the Salmonella enteritidis example, compute the p-value. For the Asian-immigrants example, compute the p-value.
Remarks: Compared to the critical-value method, the p-value method is more precise in that it reports the weight of the evidence against H0. The decision of whether the evidence is enough is left to the reader.

The Z-test

Assume the standard deviation of the population, σ, is known. Consider the statistic

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \overset{H_0}{\sim} N(0, 1).$$

This statistic is less variable than the t-statistic.
With the Z-statistic, our decision rule for testing H0: µ ≤ µ0 vs Ha: µ > µ0 is to reject H0 if $Z \ge z_{1-\alpha}$.

Summary of the Z-test

Applicable when the population is normally distributed and the standard deviation is known, OR when n is large.

Hypotheses:
Case 1. H0: µ ≤ µ0 vs Ha: µ > µ0
Case 2. H0: µ ≥ µ0 vs Ha: µ < µ0
Case 3. H0: µ = µ0 vs Ha: µ ≠ µ0

T.S.: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$

R.R.:
Case 1. Reject H0 if $z \ge z_{1-\alpha}$.
Case 2. Reject H0 if $z \le -z_{1-\alpha}$.
Case 3. Reject H0 if $|z| \ge z_{1-\alpha/2}$.

p-values:

           Case 1               Case 2               Case 3
H0         µ ≤ µ0               µ ≥ µ0               µ = µ0
Ha         µ > µ0               µ < µ0               µ ≠ µ0
p-value    P(Z ≥ computed z)    P(Z ≤ computed z)    2P(Z ≥ |computed z|)

Standard Deviation and Normality Known?

The Z-test has an advantage over the t-test when the standard deviation is known. A previous census or large-scale studies may be used to determine σ.
For large n, the assumptions of normality and known variance are less important.
Use the t-test when n is small and the normality assumption is tenable.
For smaller sample sizes, if normality appears to be violated (judging from a Q-Q plot, box plot, histogram, or the Empirical Rule), nonparametric methods should be used.

Example: Congestive Heart Failure

A study was conducted of 25 adult male patients following a new treatment for congestive heart failure. One of the variables measured on the patients was the increase in exercise capacity (in minutes) over a 4-week treatment period. The previous treatment regime had produced an average increase of 2 minutes. The researchers wanted to evaluate whether the new treatment has, on average, increased exercise capacity.
The data yielded x̄ = 2.17. From a previous large study it is known that σ = 1.05. Using α = 0.05, what conclusion can we reach about the research hypothesis?

Power Analysis

Recall that β = P(Accept H0 | H0 is false). That is, β is the probability of falsely accepting H0.
The power of a test is defined by PWR = 1 − β, which is the probability of rejecting H0 when it is false.
Tests of hypothesis are evaluated by calculating their power.
More importantly, power calculations are used to plan a study, usually before any data have been obtained, except possibly from a pilot study.

Power Analysis: Notations

Let µ0 denote the null value of the population mean.
Let µa denote the actual value of the mean under Ha.
Let β(µa) denote the probability of a Type II error when the actual value of the mean is µa.
Let PWR(µa) denote the power when the actual value of the mean is µa.

[Figure: Distribution of X̄ under the null and under the alternative, with µ0, $\mu_0 + z_{1-\alpha}\sigma/\sqrt{n}$, and µa marked on the horizontal axis.]

Power Analysis: Upper-Sided Alternative

For a one-tailed (upper-sided) test,

$$\mathrm{PWR}(\mu_a) = P\left(Z \le -z_{1-\alpha} + \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}}\right)$$

and β(µa) = 1 − PWR(µa).
In R this area can be computed by using the pnorm command.
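As a minimal sketch of the pnorm calculation, the function below evaluates the upper-sided power formula. The function name power_upper and its argument names are illustrative, not part of the notes. The inputs µ0 = 2, σ = 1.05, and n = 25 come from the congestive-heart-failure example, and µa = 2.5 is just one illustrative alternative.

```r
# Power of the upper-sided Z-test:
# PWR(mu_a) = P(Z <= -z_{1-alpha} + (mu_a - mu0) / (sigma / sqrt(n)))
# power_upper is a hypothetical helper name used only in this sketch.
power_upper <- function(mu_a, mu0, sigma, n, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha)                        # z_{1-alpha}
  pnorm(-z_crit + (mu_a - mu0) / (sigma / sqrt(n)))
}

# Congestive-heart-failure setting: mu0 = 2, sigma = 1.05, n = 25
pwr  <- power_upper(mu_a = 2.5, mu0 = 2, sigma = 1.05, n = 25)
beta <- 1 - pwr                                     # Type II error probability

print(c(power = pwr, beta = beta))
```

Because pnorm is vectorized, passing a vector of µa values returns the whole power curve, which can then be handed to plot. At µa = µ0 the function returns α, as it should.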
Example: For the congestive-heart-failure example, compute β(µa) and plot the power PWR(µa) when µa = 1.6, 1.8, 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.

Example: Plotting the Power

[Figure: The Power Curve, upper-sided alternative. Power (0 to 1) plotted against µa (1.6 to 3.0).]

The Power Analysis: Lower-Sided and Two-Sided Alternatives

For a one-tailed (lower-sided) test,

$$\mathrm{PWR}(\mu_a) = P\left(Z \le -z_{1-\alpha} + \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}}\right)$$

and β(µa) = 1 − PWR(µa).

For a two-tailed test,

$$\mathrm{PWR}(\mu_a) = P\left(Z \le -z_{1-\alpha/2} + \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}}\right) + P\left(Z \le -z_{1-\alpha/2} + \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}}\right)$$

and β(µa) = 1 − PWR(µa).

The Power Analysis: Examples

Example: Using a 5% level of significance and a sample of size 10, compute the power of the test for the cholesterol-aggregation data, with an alternative mean of 190 mg/dL and a standard deviation (σ) of 50 mg/dL.

Example: A new drug in the class of calcium-channel blockers is to be tested for the treatment of patients with unstable angina, a severe form of angina. The effect this drug will have on heart rate is unknown. Suppose 20 patients are to be studied and the change in heart rate after 48 hours is known to have a standard deviation of 10 beats per minute. What power would such a study have of detecting a significant difference in heart rate over 48 hours if it is hypothesized that the true mean change in heart rate from baseline to 48 hours could be either a mean increase or a mean decrease of 5 beats per minute?
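The two examples can be worked with one helper that covers all three alternatives. This is a sketch under stated assumptions: the name power_z and its arguments are made up for illustration, the angina example is read as a two-sided test of a null mean change of 0 at α = 0.05 (the example states neither value), and the cholesterol example uses µ0 = 175 with the upper-sided alternative from earlier in the chapter.

```r
# Power of the one-sample Z-test for each kind of alternative.
# power_z is a hypothetical helper name used only in this sketch.
power_z <- function(mu_a, mu0, sigma, n, alpha = 0.05,
                    alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  shift <- (mu_a - mu0) / (sigma / sqrt(n))   # standardized distance from mu0
  switch(alternative,
    greater   = pnorm(-qnorm(1 - alpha) + shift),
    less      = pnorm(-qnorm(1 - alpha) - shift),
    two.sided = pnorm(-qnorm(1 - alpha / 2) - shift) +
                pnorm(-qnorm(1 - alpha / 2) + shift))
}

# Cholesterol example: H0: mu = 175 vs Ha: mu > 175, n = 10, sigma = 50
pwr_chol <- power_z(190, mu0 = 175, sigma = 50, n = 10,
                    alternative = "greater")

# Angina example: two-sided, n = 20, sigma = 10, mean change of 5 bpm
# ASSUMPTION: null mean change is 0 and alpha = 0.05
pwr_angina <- power_z(5, mu0 = 0, sigma = 10, n = 20,
                      alternative = "two.sided")

print(c(cholesterol = pwr_chol, angina = pwr_angina))
```

Both studies are underpowered by the usual 80% standard: roughly a one-in-four chance of detection for the cholesterol design and about 60% for the angina design. The two-sided power is the same for an increase or a decrease of 5 beats per minute because the formula is symmetric in µa − µ0.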
Factors Affecting the Power

The actual value of the mean under the alternative. For fixed α and n, the power increases as µa moves away from the values in the null hypothesis.
Sample size. For fixed α and µa, the power increases as the sample size increases.
Standard deviation. For fixed n, α, and µa, the power is higher when the standard deviation is smaller.
Probability of Type I error (α). For fixed n and µa, the power increases as α increases.

Sample Size Determination

Suppose we are interested in H0: µ ≤ µ0 vs Ha: µ > µ0. Specify the value of α. Then the sample size needed to guarantee a probability of Type II error of at most β when the actual mean differs from µ0 by at least ∆ is

$$n = \frac{\sigma^2 (z_{1-\alpha} + z_{1-\beta})^2}{\Delta^2}.$$

The effect size ∆/σ may be obtained by
1. using µa and σ from previous studies, or
2. assessing the smallest clinically important difference expressed in standard deviation units, or
3. conducting a pilot study.

Sample sizes obtained in this way are only ballpark figures because of the inaccuracy in estimating µa and σ. They are useful, though, in checking a proposed sample size.

Sample Size Determination (Cont'd)

The same formula also gives the sample size needed to guarantee a probability of Type II error of at most β when the actual mean differs from µ0 by at least ∆ in the case H0: µ ≥ µ0 vs Ha: µ < µ0.
For the two-sided hypothesis, the approximate sample size needed to guarantee a probability of Type II error of at most β when the actual mean differs from µ0 by at least ∆ is

$$n = \frac{\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}.$$

Example

Consider a study of the effect of a calcium-channel-blocking agent on heart rate for patients with unstable angina. Suppose we want at least 80% power for detecting a significant difference if the effect of the drug is to change mean heart rate by 5 beats per minute over 48 hours in either direction and σ = 10 beats per minute. How many patients should be enrolled in such a study?
What if the direction of the effect of the drug on heart rate were well known?

Relationship between Hypothesis Testing and CIs

Suppose we are testing H0: µ = µ0 versus Ha: µ ≠ µ0.
H0 is rejected with a two-sided level α test if and only if the two-sided 100(1 − α)% confidence interval does not contain µ0.
H0 is accepted with a two-sided level α test if and only if the two-sided 100(1 − α)% confidence interval does contain µ0.
To see this, note that H0 is rejected if and only if $t < -t_{1-\alpha/2}$ or $t > t_{1-\alpha/2}$. This happens if and only if µ0 lies outside

$$\bar{x} \pm t_{1-\alpha/2}\,\frac{s}{\sqrt{n}}.$$

This relationship is the rationale for using CIs to decide on the reasonableness of specific values for the parameter µ.
A similar relationship exists between a one-sided test and a one-sided confidence interval.

p-value versus Confidence Interval

The p-value tells us precisely how statistically significant the results are. However, results that are statistically significant (because of a large sample) are often not clinically important.
A confidence interval gives additional information because it tells you the range of values within which µ is likely to fall.
On the other hand, a confidence interval does not tell us precisely how significant the results are.
Hence, it is good practice to compute both a p-value and a confidence interval.
Example: For the cholesterol levels of Asian immigrants, the 95% CI for µ is (173.58, 189.46) and the p-value is 0.037. These two types of information are complementary.

Sample-Size Estimation Based on CI Width

In some situations, it is known that the treatment has a significant effect. Interest focuses instead on estimating the effect with a given degree of precision.
Suppose it is well known that propranolol lowers heart rate over 48 hours when given to patients with angina at the standard dosage level. A new study is proposed using a higher dose of propranolol than the standard one. Investigators are interested in estimating the drop in heart rate with high precision. How can this be done?
The precision of a confidence interval is judged by its width.

Sample-Size Estimation Based on CI Width (Cont'd)

The length of a 100(1 − α)% two-sided confidence interval is

$$L = 2\, t_{n-1,1-\alpha/2}\, s/\sqrt{n}.$$

This implies $n = 4\, t_{n-1,1-\alpha/2}^2\, s^2 / L^2$.
We usually approximate $t_{n-1,1-\alpha/2}$ by $z_{1-\alpha/2}$ and obtain

$$n = 4\, z_{1-\alpha/2}^2\, s^2 / L^2.$$

$s^2$ can be obtained from a previous study or a pilot study.
Example: For the propranolol example, find the minimum sample size needed to estimate the mean change in heart rate (µ) if we require that the two-sided 95% CI for µ be no wider than 5 beats per minute and the sample standard deviation for the change in heart rate equals 10 beats per minute.
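The propranolol calculation can be sketched with the normal-approximation formula. The helper name n_for_ci_width is illustrative only, and rounding up to the next whole patient is the usual convention rather than something stated in the notes.

```r
# Sample size so that a two-sided 100(1 - alpha)% CI is no wider than L,
# using the normal approximation n = 4 * z_{1-alpha/2}^2 * s^2 / L^2.
# n_for_ci_width is a hypothetical helper name used only in this sketch.
n_for_ci_width <- function(s, L, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  ceiling(4 * z^2 * s^2 / L^2)   # round up to a whole number of subjects
}

# Propranolol example: s = 10 bpm, CI width at most 5 bpm, 95% confidence
n_prop <- n_for_ci_width(s = 10, L = 5)
print(n_prop)
```

The formula gives about 61.5, so 62 patients are needed. Because z replaces the slightly larger t quantile, this is a mild underestimate of the exact requirement.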
Test for Proportion

Suppose p denotes the prevalence of a certain disease in a population.
We are interested in testing H0: p = p0 vs Ha: p ≠ p0 or a one-sided alternative.
The number X of people who have the disease in a random sample of n individuals has a binomial distribution.
Example: We are interested in the effect of having a family history of breast cancer on the incidence of breast cancer. Suppose that 400 of the 10,000 sampled women ages 50-54 whose mothers had breast cancer had breast cancer themselves at some time in their lives. It is known that the prevalence rate of breast cancer for US women in this age group is 2%. Does this result indicate a link between family history and incidence of breast cancer?

Approximate Test: Two-Sided

Let p̂ = X/n be the proportion of individuals in the sample who have the disease.
If np0(1 − p0) > 5 then, using the normal approximation to the binomial, p̂ is approximately distributed as N(p0, p0(1 − p0)/n) when H0 is true.
Define the test statistic Z as follows:

$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \overset{\cdot}{\sim} N(0, 1) \quad \text{if } H_0 \text{ is true.}$$

We reject H0 if $z < -z_{1-\alpha/2}$ or $z > z_{1-\alpha/2}$.
p-value = 2 × P(Z > |computed z|)

Approximate Test: One-Sided Alternative

Let p̂ = X/n be the proportion of individuals in the sample who have the disease.
If np0(1 − p0) > 5 then p̂ is approximately distributed as N(p0, p0(1 − p0)/n) when H0 is true.
Define the test statistic Z as follows:

$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \overset{\cdot}{\sim} N(0, 1) \quad \text{if } H_0 \text{ is true.}$$

We reject H0 if $z < -z_{1-\alpha}$ for the lower-sided alternative and $z > z_{1-\alpha}$ for the upper-sided alternative.
p-value = P(Z < computed z) for the lower-sided alternative and p-value = P(Z > computed z) for the upper-sided alternative.
Assess the statistical significance of the breast cancer data.

Exact Test

Lower-sided alternative: p-value = P(≤ x successes in n trials | H0 is true).
Upper-sided alternative: p-value = P(≥ x successes in n trials | H0 is true).
Two-sided alternative:

$$p\text{-value} = \begin{cases} 2 \times P(\le x \text{ successes in } n \text{ trials} \mid H_0 \text{ is true}) & \text{if } \hat{p} \le p_0, \\ 2 \times P(\ge x \text{ successes in } n \text{ trials} \mid H_0 \text{ is true}) & \text{if } \hat{p} > p_0. \end{cases}$$

Set p-value = 1 if the above formula gives a number greater than 1.
The exact p-value can be computed in R using the binom.test command.

Example: Occupational Health

The safety of people who work at or live close to nuclear power plants has been the subject of widely publicized debate in recent years. One possible health hazard from radiation exposure is an excess of cancer deaths among those exposed. One problem with studying this question is that, because the number of deaths attributed to either cancer in general or specific types of cancer is small, reaching statistically significant conclusions is difficult except after long periods of follow-up. An alternative approach is to perform a proportional mortality study, whereby the proportion of deaths attributed to a specific cause in an exposed group is compared with the corresponding proportion in a large population. Suppose, for example, that 13 deaths have occurred among 55- to 60-year-old male workers in a nuclear power plant and that in 5 of them the cause of death was cancer. Assume, based on vital-statistics reports, that approximately 20% of all deaths can be attributed to some form of cancer. Is this result significant?
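A sketch of the exact calculation for the occupational-health data follows. It first checks the rule of thumb for the normal approximation, then applies the doubling rule with pbinom. One caveat: R's binom.test defines its two-sided p-value differently (it sums the probabilities of all outcomes no more likely than the observed one), so its two-sided result need not match the doubling rule. The sketch therefore calls binom.test only for the one-sided tail, where the two definitions agree. Variable names are illustrative.

```r
x  <- 5      # deaths due to cancer among the workers
n  <- 13     # total deaths observed
p0 <- 0.20   # proportion of all deaths attributed to cancer under H0

# Rule of thumb for the normal approximation: n * p0 * (1 - p0) > 5
approx_ok <- n * p0 * (1 - p0) > 5           # FALSE here, so use the exact test

# Tail probabilities under H0
p_hat   <- x / n
p_upper <- 1 - pbinom(x - 1, n, p0)          # P(>= x successes | H0)
p_lower <- pbinom(x, n, p0)                  # P(<= x successes | H0)

# Two-sided exact p-value by the doubling rule, capped at 1
tail_prob <- if (p_hat > p0) p_upper else p_lower
p_exact   <- min(1, 2 * tail_prob)

# binom.test reproduces the upper-sided tail probability
p_bt_upper <- binom.test(x, n, p = p0, alternative = "greater")$p.value

print(approx_ok)
print(c(p_exact = p_exact, p_upper = p_upper, p_bt_upper = p_bt_upper))
```

The two-sided p-value by the doubling rule is about 0.20, so these data do not show a significant excess of cancer deaths at the usual 5% level.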
Power Analysis

For a two-tailed alternative, the power of the one-sample binomial test is given by

$$\mathrm{PWR}(p_a) = P\left(Z \le \sqrt{\frac{p_0(1 - p_0)}{p_a(1 - p_a)}}\left(-z_{1-\alpha/2} + \frac{|p_0 - p_a|\sqrt{n}}{\sqrt{p_0(1 - p_0)}}\right)\right)$$

provided np0(1 − p0) > 5 and npa(1 − pa) > 5.
For a one-tailed alternative we replace α/2 by α.
Example: Suppose we wish to test the hypothesis that women with a sister history of breast cancer are at higher risk of developing breast cancer themselves. Suppose the prevalence of breast cancer among the general population of 50- to 54-year-old US women is 2%. We propose to interview 500 women 50 to 54 years of age with a sister history of the disease. What is the power of such a study if the prevalence of breast cancer among women 50 to 54 years of age with a sister history of breast cancer is 5%?

Sample-Size Estimation

The sample size needed to conduct a two-sided test with significance level α and power 1 − β versus the specified alternative hypothesis p = pa is

$$n = \frac{p_0(1 - p_0)\left(z_{1-\alpha/2} + z_{1-\beta}\sqrt{\dfrac{p_a(1 - p_a)}{p_0(1 - p_0)}}\right)^2}{(p_a - p_0)^2}.$$

For a one-tailed test, we replace α/2 with α.
Example: In the sister's-history breast cancer example, how many women should we interview to achieve 90% power with α = 0.05?
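Both binomial formulas can be sketched as small R helpers. The names power_binom and n_binom are illustrative, not from the notes. The power example does not state α or whether the test is one- or two-sided, so the calls below assume a two-sided test at α = 0.05. Setting two_sided = FALSE gives the one-tailed version.

```r
# Power of the one-sample binomial test (normal approximation).
# power_binom and n_binom are hypothetical helper names for this sketch.
power_binom <- function(p_a, p0, n, alpha = 0.05, two_sided = TRUE) {
  a <- if (two_sided) alpha / 2 else alpha
  pnorm(sqrt(p0 * (1 - p0) / (p_a * (1 - p_a))) *
          (-qnorm(1 - a) + abs(p0 - p_a) * sqrt(n) / sqrt(p0 * (1 - p0))))
}

# Sample size for a given power against the alternative p = p_a
n_binom <- function(p_a, p0, power, alpha = 0.05, two_sided = TRUE) {
  a     <- if (two_sided) alpha / 2 else alpha
  ratio <- sqrt(p_a * (1 - p_a) / (p0 * (1 - p0)))
  ceiling(p0 * (1 - p0) * (qnorm(1 - a) + qnorm(power) * ratio)^2 /
            (p_a - p0)^2)
}

# Sister-history example: p0 = 0.02, pa = 0.05
# ASSUMPTION: two-sided test at alpha = 0.05
pwr_500 <- power_binom(p_a = 0.05, p0 = 0.02, n = 500)    # power with n = 500
n_90    <- n_binom(p_a = 0.05, p0 = 0.02, power = 0.90)   # n for 90% power

print(c(power_n500 = pwr_500, n_for_90pct = n_90))
```

Under these assumptions, interviewing 500 women gives power of roughly 97%, and about 341 women would be enough for 90% power. Both np0(1 − p0) and npa(1 − pa) exceed 5 at these sample sizes, so the normal approximation condition stated above is met.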