Power and Sample Size
Sample Size Determination

Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. Larger sample sizes generally lead to increased precision when estimating unknown parameters. In some situations, the increase in precision from a larger sample is minimal, or even nonexistent. This can result from systematic errors, strong dependence in the data, or data that follow a heavy-tailed distribution. A study that collects more data than it needs is also wasteful. Therefore, before collecting data, it is essential to determine the sample size requirements of a study.

Factors That Influence Sample Size

The "right" sample size for a particular application depends on many factors, including the following:
· Cost considerations (e.g., maximum budget, desire to minimize cost).
· Administrative concerns (e.g., complexity of the design, research deadlines).
· Minimum acceptable level of precision.
· Confidence level.
· Variability within the population or subpopulation (e.g., stratum, cluster) of interest.
· Sampling method.

Large-Sample Confidence Interval for a Population Mean

The confidence interval is expressed more generally as
$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
For samples of size n > 30, the confidence interval is expressed as
$$\bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}$$
This requires that the sample used be random.

Sample size determination for a 100(1-α)% confidence interval for μ

In order to estimate μ with a sampling error SE (this is equal to half the confidence interval width) and with 100(1-α)% confidence, the sample size is found as follows:
$$z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = SE \quad\Longrightarrow\quad n = \frac{(z_{\alpha/2})^2\,\sigma^2}{(SE)^2}$$
where σ is estimated by s or by R/4 (R is the range of the observations).

Ex. The mean inflation pressure of footballs is 13.5 pounds, but uncontrollable factors cause the pressures of individual footballs to vary from 13.3 to 13.7 pounds.
We wish to estimate the mean inflation pressure to within 0.025 pound of the true value with a 99% confidence interval. What sample size should be used?
$$\alpha = 0.01,\quad z_{\alpha/2} = 2.575,\quad \sigma \approx \frac{\text{Range}}{4} = \frac{13.7 - 13.3}{4} = 0.1,\quad SE = 0.025$$
$$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{(SE)^2} = \frac{(2.575)^2 (0.1)^2}{(0.025)^2} = 106.09 \approx 107$$

Large-Sample Confidence Interval for a Population Proportion

The sample size n is large if $\hat{p} \pm 3\sigma_{\hat{p}}$ falls between 0 and 1. The confidence interval is calculated as
$$\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}} = \hat{p} \pm z_{\alpha/2}\sqrt{\frac{pq}{n}} \approx \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
where $\hat{p} = x/n$ and $\hat{q} = 1 - \hat{p}$.

Sample size determination for a 100(1-α)% confidence interval for p

In order to estimate a binomial probability p with a sampling error SE (this is equal to half the confidence interval width) and with 100(1-α)% confidence, the sample size is found as follows:
$$z_{\alpha/2}\sqrt{\frac{pq}{n}} = SE \quad\Longrightarrow\quad n = \frac{(z_{\alpha/2})^2\,(pq)}{(SE)^2}$$

Ex. The following is a 90% confidence interval for p: (0.26, 0.54). How large was the sample used to construct this interval?
$$\alpha = 0.1,\quad z_{\alpha/2} = 1.645,\quad \hat{p} = \frac{0.54 + 0.26}{2} = 0.4,\quad SE = 0.54 - 0.4 = 0.14$$
$$n = \frac{(z_{\alpha/2})^2\,(\hat{p}\hat{q})}{(SE)^2} = \frac{(1.645)^2 (0.4)(0.6)}{(0.14)^2} = 33.135 \approx 34$$

Comparing Two Population Means: Independent Sampling

Large-Sample Confidence Interval for μ1 - μ2
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,\sigma_{(\bar{x}_1 - \bar{x}_2)} = (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
assuming independent sampling, which provides the following substitution:
$$\sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Sample size determination for a 100(1-α)% confidence interval for μ1 - μ2

In order to estimate μ1 - μ2 with a margin of error ME (this is equal to half the confidence interval width) and with 100(1-α)% confidence, the sample size is found as follows:
$$z_{\alpha/2}\sqrt{\frac{\sigma_1^2 + \sigma_2^2}{n}} = ME \quad\Longrightarrow\quad n_1 = n_2 = n = \frac{(z_{\alpha/2})^2\,(\sigma_1^2 + \sigma_2^2)}{(ME)^2}$$

Ex. Assuming that n1 = n2, find the sample size needed to estimate μ1 - μ2 for a 90% confidence interval of width 1.0. Assume that $\sigma_1^2 = 5.8$ and $\sigma_2^2 = 7.5$.
$$\alpha = 0.1,\quad z_{\alpha/2} = 1.645,\quad ME = 0.5$$
$$n_1 = n_2 = \frac{(z_{\alpha/2})^2\,(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{(1.645)^2 (5.8 + 7.5)}{(0.5)^2} = 143.96 \approx 144$$

Sample size determination for a 100(1-α)% confidence interval for p1 - p2

In order to estimate p1 - p2 with a margin of error ME (this is equal to half the confidence interval width) and with 100(1-α)% confidence, the sample size is found as follows:
$$z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n} + \frac{p_2 q_2}{n}} = ME \quad\Longrightarrow\quad n_1 = n_2 = n = \frac{(z_{\alpha/2})^2\,(p_1 q_1 + p_2 q_2)}{(ME)^2}$$

Ex. Assuming that n1 = n2 = n, find the sample size needed to estimate p1 - p2 for a 90% confidence interval of width 0.05. Assume that there is no prior information available to obtain approximate values of p1 and p2.
$$\alpha = 0.1,\quad z_{\alpha/2} = 1.645,\quad p_1 = p_2 = 0.5,\quad ME = 0.025$$
$$n = \frac{(z_{\alpha/2})^2\,(p_1 q_1 + p_2 q_2)}{(ME)^2} = \frac{(1.645)^2 (0.25 + 0.25)}{(0.025)^2} = 2164.82 \approx 2165$$

Power and Sample Size of Tests

Statistical power is the probability of correctly rejecting a false null hypothesis when a specific alternative hypothesis is true. Power analysis allows us to determine how likely it is that a test of statistical significance, such as a z-test, will support the research claim when that claim is true. It can also determine how many cases we need in our sample to attain a specific level of statistical power, that is, the minimum sample size required. In addition, the concept of power is used to make comparisons between different statistical testing procedures: for example, between a parametric and a nonparametric test of the same hypothesis.

Summary of Possible Results

              accept H0     reject H0
H0 true       1 - α         α
H0 false      β             1 - β

α = type I error rate
β = type II error rate
1 - β = statistical power

Standard Case

[Figure: sampling distribution of the test statistic T if H0 were true (α = 0.05) and sampling distribution if HA were true, separated by the critical value c; power = 1 - β.]

Upper-tailed z-test

We first find the value $z_\alpha$, because it is defined on the null distribution by α:
$$z_\alpha = \frac{c - \mu_0}{\sigma/\sqrt{n}}$$
Rewrite this equation to solve for c:
$$c = \mu_0 + z_\alpha\,\frac{\sigma}{\sqrt{n}}$$
Then we find the value $z_\beta$ as the label for the standardized score corresponding to c on the alternative distribution:
$$z_\beta = \frac{c - \mu_a}{\sigma/\sqrt{n}} = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} + z_\alpha \quad\text{and}\quad n = \left(\frac{z_\beta - z_\alpha}{(\mu_0 - \mu_a)/\sigma}\right)^2$$

Power and Sample Size Calculation for z-test

Use for a hypothesis test of the mean when the population standard deviation (σ) is known.

Upper-tailed test:
$$z_\beta = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} + z_\alpha \quad\text{and}\quad n = \left(\frac{z_\beta - z_\alpha}{(\mu_0 - \mu_a)/\sigma}\right)^2$$
Lower-tailed test:
$$z_\beta = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} - z_\alpha \quad\text{and}\quad n = \left(\frac{z_\beta + z_\alpha}{(\mu_0 - \mu_a)/\sigma}\right)^2$$
Two-tailed test:
$$z_L = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} - z_{\alpha/2},\quad z_U = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} + z_{\alpha/2} \quad\text{and}\quad n = \left(\frac{z_{1-\beta} + z_{\alpha/2}}{(\mu_0 - \mu_a)/\sigma}\right)^2$$
Use the Z table to find the β value; power = 1 - β. For the upper-tailed test, β is the area below $z_\beta$; for the lower-tailed test, it is the area above $z_\beta$; for the two-tailed test, it is the area between $z_L$ and $z_U$. Here $z_{1-\beta}$ denotes the standard normal value whose cumulative probability is 1 - β.

Ex. Suppose we wish to have a sample large enough to have power of 90% to detect μa - μ0 = 4.0, where the standard deviation is 10.0, using a one-tailed z-test with an alpha error rate of 0.01. What sample size is needed? If n = 82, what is the power? This is an upper-tailed z-test.
$$\alpha = 0.01,\quad z_\alpha = 2.326,\quad \beta = 1 - 0.9 = 0.1,\quad z_\beta = -1.282,\quad \sigma = 10$$
$$n = \left(\frac{z_\beta - z_\alpha}{(\mu_0 - \mu_a)/\sigma}\right)^2 = \left(\frac{-1.282 - 2.326}{-4/10}\right)^2 = 81.36 \approx 82$$
If given n = 82,
$$z_\beta = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} + z_\alpha = \frac{-4}{10/\sqrt{82}} + 2.326 = -1.30$$
$$\beta = 0.5 - 0.4032 = 0.0968,\quad \text{power} = 1 - \beta = 1 - 0.0968 = 0.9032$$

Ex. We wish to determine whether a cereal-box filling machine is deviating from μ = 12 ounces: H0: μ = 12 vs Ha: μ ≠ 12. Calculate the power of the test when μa = 11.9. Assume n = 100, α = 0.05, σ = 0.5.
$$\alpha = 0.05,\quad z_{\alpha/2} = 1.96,\quad \mu_0 = 12,\quad \mu_a = 11.9,\quad \sigma = 0.5$$
$$z_L = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} - z_{\alpha/2} = \frac{12 - 11.9}{0.5/\sqrt{100}} - 1.96 = 0.04,\quad \beta_L = 0.5160$$
$$z_U = \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} + z_{\alpha/2} = \frac{12 - 11.9}{0.5/\sqrt{100}} + 1.96 = 3.96,\quad \beta_U = 0.99996$$
$$\beta = \beta_U - \beta_L = 0.99996 - 0.5160 = 0.48396,\quad \text{power} = 1 - \beta = 1 - 0.48396 = 0.51604$$
If given power = 0.6, n = ?
$$n = \frac{(z_{\alpha/2} + z_{1-\beta})^2\,\sigma^2}{(\mu_0 - \mu_a)^2} = \frac{(1.96 + 0.253)^2 (0.5)^2}{(12 - 11.9)^2} = 122.43 \approx 123$$

Power and Sample Size Calculation for t-test

The calculation needs an estimate of the population standard deviation (σ). For power and sample size calculations, the estimate of σ (the population standard deviation or experimental variability) depends on whether you have already collected data.
· Prospective studies are done before collecting data, so σ has to be estimated.
You can use related research, pilot studies, or subject-matter knowledge to estimate σ.
· Retrospective studies are done after data have been collected, so you can use the data to estimate σ.
For sample size calculations, the data have not been collected yet, so σ has to be estimated. You can use related research, pilot studies, or subject-matter knowledge to estimate σ.

Power and Sample Size Calculation for t-test

Use to calculate the power or the sample size for a one-sample t-test or paired t-test.

Upper-tailed test:
$$t_\beta = \frac{\mu_0 - \mu_a}{s/\sqrt{n}} + t_\alpha \quad\text{and}\quad n = \left(\frac{t_\beta - t_\alpha}{(\mu_0 - \mu_a)/s}\right)^2$$
Lower-tailed test:
$$t_\beta = \frac{\mu_0 - \mu_a}{s/\sqrt{n}} - t_\alpha \quad\text{and}\quad n = \left(\frac{t_\beta + t_\alpha}{(\mu_0 - \mu_a)/s}\right)^2$$
Two-tailed test:
$$t_L = \frac{\mu_0 - \mu_a}{s/\sqrt{n}} - t_{\alpha/2},\quad t_U = \frac{\mu_0 - \mu_a}{s/\sqrt{n}} + t_{\alpha/2} \quad\text{and}\quad n = \left(\frac{t_{1-\beta} + t_{\alpha/2}}{(\mu_0 - \mu_a)/s}\right)^2$$
Use the T table to find the β value; power = 1 - β.

Properties of β

· For fixed n, α, and s, β decreases as μa - μ0 increases.
· For fixed n, s, μa, and μ0, β increases as α decreases.
· For fixed α, s, μa, and μ0, β decreases as n increases.
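
The properties of β listed above, and two of the worked examples from earlier in this section, can be checked numerically. The following is a minimal Python sketch, not a definitive implementation. It assumes the normal (z) formulas with a known σ rather than the t distribution, it relies on `statistics.NormalDist` from the standard library (Python 3.8 or later), and all function names are hypothetical ones chosen for illustration. Only the difference μa - μ0 enters the z-test formulas, so it is passed as a single argument `diff`.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def upper_z(tail_area):
    """z value with upper-tail area `tail_area` (the z_alpha of this section)."""
    return STD_NORMAL.inv_cdf(1.0 - tail_area)


def n_for_mean_ci(sigma, se, confidence):
    """n = (z_{alpha/2})^2 sigma^2 / SE^2, rounded up to a whole observation."""
    z = upper_z((1.0 - confidence) / 2.0)
    return ceil((z * sigma / se) ** 2)


def beta_upper_tailed(diff, sigma, n, alpha):
    """Type II error rate of an upper-tailed z-test, with diff = mu_a - mu_0."""
    z_beta = -diff / (sigma / sqrt(n)) + upper_z(alpha)
    return STD_NORMAL.cdf(z_beta)  # area below z_beta


def n_for_power_upper_tailed(diff, sigma, alpha, power):
    """n = ((z_beta - z_alpha) / ((mu_0 - mu_a) / sigma))^2, rounded up."""
    z_beta = STD_NORMAL.inv_cdf(1.0 - power)  # negative when power > 0.5
    return ceil(((z_beta - upper_z(alpha)) / (-diff / sigma)) ** 2)


if __name__ == "__main__":
    # Football example: sigma ~ 0.1, SE = 0.025, 99% confidence.
    print(n_for_mean_ci(sigma=0.1, se=0.025, confidence=0.99))  # 107

    # z-test example: mu_a - mu_0 = 4, sigma = 10, alpha = 0.01, power = 0.90.
    print(n_for_power_upper_tailed(diff=4.0, sigma=10.0, alpha=0.01, power=0.90))  # 82
    print(1.0 - beta_upper_tailed(diff=4.0, sigma=10.0, n=82, alpha=0.01))

    # Properties of beta, each compared against the same baseline.
    base = beta_upper_tailed(diff=4.0, sigma=10.0, n=82, alpha=0.01)
    print(beta_upper_tailed(6.0, 10.0, 82, 0.01) < base)   # larger difference
    print(beta_upper_tailed(4.0, 10.0, 82, 0.001) > base)  # smaller alpha
    print(beta_upper_tailed(4.0, 10.0, 120, 0.01) < base)  # larger n
```

The power printed for n = 82 should come out close to, but not exactly equal to, the 0.9032 obtained by hand, because the table lookup rounds the z value to two decimals while the code keeps full precision.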