COURSE 6: SAMPLE SIZE
Transcription
Contents

  Preparing to Calculate Sample Size
  Sample Size Calculations for Dichotomous Response Variables
  Sample Size Calculations for Continuous Response Variables
  Sample Size for Time-to-failure Data (censored data case)
  Sample Size for Testing Equivalence of Treatments
  Pocock's Table 9.1
  Sample Size Tables
  Power and Sample Size (PS) Software
    How to Download PS
    How to Use PS: Examples
  Appendix: Pagano Table A.3

CALCULATION OF SAMPLE SIZE

Clinical trials should have sufficient statistical power to detect differences between groups considered to be of clinical interest. Therefore, calculation of sample size with provision for adequate levels of significance and power is an essential part of trial planning.

Type I error, type II error, p-value, and power of a test

                      H0 is true           H0 is not true
  Reject H0           α (type I error)     1 − β (power)
  Do not reject H0    1 − α                β (type II error)

Power = the probability of REJECTING the null hypothesis if a specific alternative is true.

Power = fn(α, variation, clinically significant difference, and SAMPLE SIZE)

Sample size = fn
(power, variation, clinically significant difference)

p-value = the probability we would have observed this difference (or a greater difference) if the null hypothesis were true.

Calculation of a proper sample size is necessary to ensure adequate levels of significance and power to detect differences of clinical interest.

Biggest danger: sample size too small → no significant difference found → a treatment that may be useful is discarded.

Sample size calculations are approximate:
  - They are often based on roughly estimated parameter values.
  - They are usually based on mathematical models that only approximate the truth.
  - Changes may occur in the target population, the eligibility criteria, or the expected treatment effect before the study begins.

Be conservative when estimating sample size.

Preparing to Calculate Sample Size

1. What is the main purpose of the trial? (This is the question on which sample size is based.)
2. What is the principal measure of patient outcome (endpoint)? Is this measure continuous or discrete? Is there censoring?
3. What statistical test will be used to assess treatment difference (e.g., t-test, log-rank, chi-square)? At what α-level? One-tailed or two-tailed?
4. What result is anticipated with the standard treatment (e.g., average value or rate)?
5. How small a treatment difference (δ) is it important to detect, and with what degree of certainty (power = 1 − β)?

α = type I error = probability of rejecting H0: δ = 0 when H0: δ = 0 is true.

β = type II error = probability of not rejecting H0: δ = 0 when δ ≠ 0. β changes as a function of δ: β is large for δ near zero, and small for δ far from zero.

1 − β = power (as a function of δ) = probability of rejecting H0: δ = 0 when δ ≠ 0. Power is low for δ near zero, and high for δ far from zero.

Often we set α = 0.05 or 0.01, but want to check various values of n, δ, and β. For fixed n, a plot of power = 1 − β vs. δ is a power curve.
For a two-sided test, it looks like this:

  [Figure: power curves vs. δ for large n and small n; power equals α at δ = 0 (H0 true) and rises toward 1 as |δ| grows (H0 false); a larger n gives a steeper curve]

Alternatively, we could plot any two parameters for fixed values of the others. For example, for fixed α and β, we could plot n vs. |δ|:

  [Figure: sample size vs. |δ| for power 1 − β = 0.9 and 1 − β = 0.8; the required n decreases as |δ| increases]

Because sample size planning often involves a trade-off between desired sample size, cost, and patient resources, such curves are useful.

Alternatively, sample sizes may be based on lengths of confidence intervals instead of power. If this is done, it is best to still check that power is adequate. In either case, confidence intervals are useful for reporting results.

These sample size methods assume a single final analysis at the end of the trial. Interim analyses increase the chance of finding a significant difference → either make adjustments to sample size or use group sequential testing methods.

Sample size methods will next be given for dichotomous, continuous, and continuous-but-censored data.

Sample Size Calculations for Dichotomous Response Variables

Compare drug A (standard) vs. drug B (new).

  pA = proportion of failures expected on drug A
  pB = proportion of failures on drug B that one would want to detect as being different

Note: δ = pA − pB.

We want to test H0: pA = pB vs. Ha: pA ≠ pB (p = true value) with significance level α, and power 1 − β to detect a difference of δ = pA − pB. The required sample size (N in each group, 2N in total) is:

  2N = 2·[Z_α/2·√(2·p̄(1 − p̄)) + Z_β·√(pA(1 − pA) + pB(1 − pB))]² / (pA − pB)²,

where p̄ = (pA + pB)/2, and Z_α/2 and Z_β are critical values of the standard normal distribution; for example, for α = 0.05 (two-sided test), Z_0.05/2 = 1.96. The table below gives Z_α/2 and Z_β for common values of α and β.

  α       Z_α/2       1 − β    Z_β
  0.10    1.645       0.80     0.84
  0.05    1.960       0.85     1.03
  0.025   2.240       0.90     1.282
  0.01    2.576       0.95     1.645

Example

  pA = 0.4
  pB = 0.3

(An "event" is a failure, so we want a reduced proportion on the new therapy.) Let α = 0.05, 1 − β = 0.90, two-sided test.
Note: p̄ = (0.4 + 0.3)/2 = 0.35.

From the table provided, we have Z_α/2 = 1.96 and Z_β = 1.282. Substituting these values into the formula gives:

  2N = 2·[1.96·√(2(0.35)(0.65)) + 1.282·√((0.4)(0.6) + (0.3)(0.7))]² / (0.4 − 0.3)² = 952.3

Rounding up to the nearest 10 yields 2N = 960, or N = 480 in each group.

The tables from Fleiss (handout) give sample sizes for several cases, calculated using an adjusted version of the above formula.

Pocock's sample size formula for dichotomous response variables

The variance of p̂A − p̂B equals var(p̂A) + var(p̂B) if the two samples are independent. The binomial variance [i.e., the variance of p̂ = x/n, where x has the binomial(n, p) distribution] is a function of p(1 − p). The trouble is that we don't know the true values, pA and pB, needed to compute the true variance. (If we did, we wouldn't have to do the experiment in the first place!)

The variance of p̂A − p̂B under H0: pA = pB = p is a function of 2·p̄(1 − p̄), and the variance of p̂A − p̂B under Ha: pA ≠ pB is a function of pA(1 − pA) + pB(1 − pB). (The sample size formula derivation, given later, shows why the former is multiplied by Z_α/2 and the latter by Z_β.) Often these two values will be very similar. Pocock uses pA(1 − pA) + pB(1 − pB) in both places in the sample size formula above, which simplifies the formula considerably:

  2N = 2·[pA(1 − pA) + pB(1 − pB)]·(Z_α/2 + Z_β)² / (pA − pB)²

Pocock's formula uses proportions multiplied by 100% (e.g., 75% instead of 0.75), but this change in scale cancels in the numerator and denominator, and gives the same result as using proportions. Pocock's Table 9.1 gives (Z_α/2 + Z_β)² for several values of α and β.

Table 9.1 (Pocock).
Values of f(α, β) to calculate the required number of patients for a trial:

  α               β (type II error)
  (type I error)  0.05    0.1     0.2     0.5
  0.10            10.8     8.6     6.2    2.7
  0.05            13.0    10.5     7.9    3.8
  0.02            15.8    13.0    10.0    5.4
  0.01            17.8    14.9    11.7    6.6

N adjusted for continuity correction (Fleiss, 1981; Casagrande et al., 1978)

Recall: the underlying distribution is binomial (discrete), which we approximate with a normal distribution (continuous). Using the continuity correction leads to the following adjustment in sample size:

  N_corrected = (N/4)·[1 + √(1 + 4/(N·|pA − pB|))]²

Using the previous example, with pA = 0.4, pB = 0.3, N = 480:

  N_corrected = (480/4)·[1 + √(1 + 4/(480·|0.4 − 0.3|))]² = 499.8 ≈ 500

Using the uncorrected N, the sample size would be too small by 2 × (500 − 480) = 40 patients. The corrected N is recommended, and the continuity-corrected test statistic should also be used. Corrected values are tabulated for extensive combinations of α, β, pA, and pB in the references.

For example, for α = 0.05 (Z_α/2 = 1.96) and power = 0.80 (Z_β = 0.84), the N (per group) for detecting a difference of 0.10:

  pA     pB     N
  0.05   0.15   140
  0.10   0.20   199
  0.20   0.30   293
  0.30   0.40   356
  0.40   0.50   387
  0.45   0.55   391
  0.50   0.60   387
  0.60   0.70   356
  0.70   0.80   293
  0.80   0.90   199
  0.85   0.95   140

References:
  Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York, NY: Wiley; 1981.
  Casagrande JT, Pike MC, Smith PG. An improved approximate formula for calculating sample sizes for comparing two binomial distributions. Biometrics 1978;34(3):483-486.

Effect of binomial variance on sample size

Recall that the variance of p̂ is a function of p(1 − p):

  [Figure: p(1 − p) vs. p, an inverted parabola peaking at 0.25 when p = 1/2]

The variance of p̂ is largest when p = 0.5, and smallest when p is near 0 or 1. Larger sample sizes are required to detect a difference pA − pB when pA and pB are near 0.5; smaller sample sizes suffice when pA and pB are near 0 or 1. If one has no idea about the true value of p, then one can conservatively use p = 0.5 in the variance formula for sample size calculations.
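The uncorrected formula and the continuity correction above can be sketched in a few lines of Python (a minimal illustration with our own function names; the numbers reproduce the worked example):

```python
from math import ceil, sqrt

def n_per_group(p_a, p_b, z_alpha2=1.96, z_beta=1.282):
    """Uncorrected N per group for comparing two proportions."""
    p_bar = (p_a + p_b) / 2
    num = (z_alpha2 * sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * sqrt(p_a * (1 - p_a) + p_b * (1 - p_b))) ** 2
    return num / (p_a - p_b) ** 2

def n_corrected(n, p_a, p_b):
    """Continuity-corrected N per group (Fleiss/Casagrande adjustment)."""
    return (n / 4) * (1 + sqrt(1 + 4 / (n * abs(p_a - p_b)))) ** 2

n = n_per_group(0.4, 0.3)
print(round(2 * n, 1))                    # 952.3, as in the example
print(ceil(n_corrected(480, 0.4, 0.3)))   # 500 per group after correction
```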
In general, dichotomous outcomes require substantial sample sizes to detect moderate differences. Continuous outcomes usually require smaller sample sizes.

Derivation of the (uncorrected) sample size formula

Let p̂A and p̂B be the sample proportions, and let N be the sample size in each group. To test H0: pA = pB vs. Ha: pA > pB (a one-tailed test, used here for simpler calculations), we use the test statistic:

  Z = (p̂A − p̂B) / √(2·p̄·q̄/N),

where p̄ = (p̂A + p̂B)/2 and q̄ = 1 − p̄.

Testing at level α means: P(rejecting H0 | H0 true) = P(Z > Z_α | H0 true) = α.

We can perform an α-level test for any sample size (recall the power curve). To determine N, we need to specify both α and β. For a given β and δ = pA − pB, we have:

  P(rejecting H0 | Ha true) = P(Z > Z_α | pA − pB = δ) = 1 − β

This probability is a function of N (because Z is a function of N), so we can solve the equation for N. However, the Z statistic does not have a standard normal distribution if Ha is true: p̂A − p̂B was standardized assuming H0 true, so we must un-standardize and then re-standardize. Recall that under Ha:

  p̂A − p̂B ~ N( pA − pB, (pA·qA + pB·qB)/N ),

where q = 1 − p. So:

  1 − β = P(Z > Z_α | pA − pB = δ) = P( (p̂A − p̂B)/√(2·p̄·q̄/N) > Z_α | pA − pB = δ )

Un-standardizing:

  = P( p̂A − p̂B > Z_α·√(2·p̄·q̄/N) | pA − pB = δ )

And re-standardizing:

  = P( [(p̂A − p̂B) − (pA − pB)] / √((pA·qA + pB·qB)/N)  >  [Z_α·√(2·p̄·q̄/N) − (pA − pB)] / √((pA·qA + pB·qB)/N) ),

where the left-hand quantity is standard normal. Therefore the right-hand critical value must equal −Z_β:

  −Z_β·√((pA·qA + pB·qB)/N) = Z_α·√(2·p̄·q̄/N) − (pA − pB)

We can now solve this equation for N. Multiplying through by √N and rearranging:

  (pA − pB)·√N = Z_α·√(2·p̄·q̄) + Z_β·√(pA·qA + pB·qB)

  N = [Z_α·√(2·p̄·q̄) + Z_β·√(pA·qA + pB·qB)]² / (pA − pB)²,

which is the sample size formula given earlier. N is the number required in each group; the formula in the notes for 2N multiplies this result by 2. (Note that p̄·q̄ is still an unknown quantity, but we approximate p̄ by (pA + pB)/2.)
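As a numerical check on the derivation, the final equation can be inverted to give the power achieved at a given N (a sketch; the helper functions are ours, using `math.erf` for the normal CDF):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def power(n, p_a, p_b, z_alpha=1.96):
    """Power at N per group, obtained by solving the derivation for Z_beta."""
    delta = abs(p_a - p_b)
    p_bar = (p_a + p_b) / 2
    z_beta = ((delta * sqrt(n) - z_alpha * sqrt(2 * p_bar * (1 - p_bar)))
              / sqrt(p_a * (1 - p_a) + p_b * (1 - p_b)))
    return phi(z_beta)

print(round(power(480, 0.4, 0.3), 2))  # about 0.90, the planned power
```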
Sample size based on width of confidence intervals (McHugh & Le, 1984)

If we want a confidence interval of width 2d (i.e., ±d), then solve for N:

  d = Z_α·√((pA·qA + pB·qB)/N)

  N = (Z_α/d)²·(pA·qA + pB·qB)

For the previous example, pA = 0.4, pB = 0.3, N = 480, and Z_α = 1.96:

  d = 1.96·√([(0.4)(0.6) + (0.3)(0.7)]/480) = 0.06

If we wanted a confidence interval of width 2(0.05) instead of 2(0.06), the required sample size would be:

  N = (1.96/0.05)²·[(0.4)(0.6) + (0.3)(0.7)] = 691.5

Reference: McHugh RB, Le CT. Confidence estimation and the size of a clinical trial. Control Clin Trials 1984;5(2):157-163.

Adjustment for noncompliance (crossovers)

Assume a new treatment is being compared with a standard treatment.

  Dropouts: those who refuse the new treatment some time after randomization and revert to the standard treatment.
  Drop-ins: those who receive the new treatment some time after initial randomization to the standard treatment.

Both generally dilute the treatment effect.

Example: drug A vs. placebo. Suppose the true values are:

  pA = 0.6
  p_placebo = 0.4
  δ = 0.6 − 0.4 = 0.2

Enroll N = 100 patients in each treatment group, and suppose 25% of the drug A group drops out and 10% of the placebo group drops in. Then, instead of observing E(p̂A) = 0.6 and E(p̂B) = 0.4, we observe:

  E(p̂A) = (75/100)(0.6) + (25/100)(0.4) = 0.55
  E(p̂B) = (90/100)(0.4) + (10/100)(0.6) = 0.42
  0.55 − 0.42 = 0.13 (instead of 0.20)

The power of the study will be less than intended, or else the sample size must be increased to compensate for the dilution effect.
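The dilution arithmetic in the example above can be expressed directly (a minimal sketch; the function name is ours):

```python
def diluted_effect(p_a, p_b, dropout, drop_in):
    """Expected observed proportions when crossovers dilute the true effect."""
    obs_a = (1 - dropout) * p_a + dropout * p_b   # dropouts revert to standard
    obs_b = (1 - drop_in) * p_b + drop_in * p_a   # drop-ins get the new treatment
    return obs_a, obs_b

obs_a, obs_b = diluted_effect(0.6, 0.4, dropout=0.25, drop_in=0.10)
print(round(obs_a, 2), round(obs_b, 2), round(obs_a - obs_b, 2))  # 0.55 0.42 0.13
```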
For a dropout or drop-in rate of R (crossovers in one direction only), the adjusted sample size is:

  N_adjusted = N / (1 − R)²

For example, if R = 0.25 in the previous example with N = 480:

  N_adjusted = 480 / (1 − 0.25)² = 480 × 1.78 = 853.3

For a dropout rate of R1 (A → placebo) and a drop-in rate of R2 (placebo → A), the adjusted sample size is:

  N_adjusted = N / (1 − R1 − R2)²

For example, if R1 = 0.25 and R2 = 0.10:

  N_adjusted = 480 / (1 − 0.25 − 0.10)² = 1,136

The large increase in sample size shows the considerable impact of noncompliance on the ability to detect treatment differences. Keep noncompliance to a minimum during trials.

Justification of the sample size adjustment formula for noncompliance

The expected difference between treatments is pA − pB = δ. Let R1 = dropout rate on treatment A. Then:

  pA → E(p̂A) = pA(1 − R1) + pB·R1 = pA − R1(pA − pB)

Recall the (uncorrected) sample size formula:

  N = [Z_α·√(2·p̄·q̄) + Z_β·√(pA·qA + pB·qB)]² / (pA − pB)²

A small change in pA or pB will have little effect on the numerator. The denominator, however, becomes:

  [(pA − R1(pA − pB)) − pB]² = [(pA − pB) − R1(pA − pB)]² = (pA − pB)²·(1 − R1)²

Thus, the adjustment to N for a dropout rate of R1 is a factor of 1/(1 − R1)².

Similarly, if there is also a drop-in rate of R2 (treatment B → A):

  pB → E(p̂B) = pB(1 − R2) + pA·R2 = pB + R2(pA − pB)

The denominator of the sample size formula becomes:

  [(pA − R1(pA − pB)) − (pB + R2(pA − pB))]² = (pA − pB)²·(1 − R1 − R2)²,

so the adjustment to N is a factor of 1/(1 − R1 − R2)².

Sample Size Calculations for Continuous Response Variables

Examples of continuous response variables:

  - Blood pressure
  - Time to tumor clearance
  - Length of hospital stay

Assume all observations are known completely (no censoring). Data are assumed to be approximately normally distributed; a transformation (e.g., log or square root) may be required to normalize skewed data.

To test H0: δ = μA − μB = 0 vs.
Ha: δ = μA − μB ≠ 0, use the test statistic:

  Z = (x̄A − x̄B) / (σ·√(1/NA + 1/NB))

Using the same technique as in the dichotomous response case to derive the sample size formula, we obtain (for given α, β, δ, and σ):

  2N = 4·(Z_α/2 + Z_β)²·σ² / δ²

Note: this formula is based on a normal (not a t) distribution → either σ is known, or N is large enough (N > 30 in both groups) to make this assumption valid. If σ² is not known, compute N for a range of σ² values to determine the effect on sample size. If N < 30 and σ is not known, this formula will underestimate the correct sample size. If the variances in the two groups are not equal, base N on the larger value.

Example

In a study of a new diet to reduce cholesterol, a 10 mg/dl difference would be clinically significant. From other data, σ is estimated to be 50 mg/dl. We want a two-sided test with α = 0.05 and power 1 − β = 0.9 to detect a 10 mg/dl difference; Z_α/2 = 1.96 and Z_β = 1.282. So:

  2N = 4·(1.96 + 1.282)²·(50)² / (10)² = 1,051

How different would the required sample size be if σ were actually 60?

  2N = 4·(1.96 + 1.282)²·(60)² / (10)² = 1,513.5

A big difference in N, considering the relatively small increase in σ. Be conservative in estimates of σ!

Sample size based on width of confidence intervals

  d = Z_α·√((σ²A + σ²B)/N)

Here, the relationship with power is the same as in the dichotomous response case.

Sample size for change-from-baseline response variables

For example, the response is final minus baseline cholesterol level. We test H0: δ = μA − μB = 0 vs. Ha: δ = μA − μB ≠ 0, where the μ's are now mean changes. The variance of the change may be much smaller than the variance of the original values (person-to-person variability is removed), so smaller sample sizes result.

Example

If, in the example above, we used the change in cholesterol level, we might have found σ = 20 (compared with σ = 50 above), so now:

  2N = 4·(1.96 + 1.282)²·(20)² / (10)² ≈ 170

(This is much smaller than 1,051!)

Sample Size for Time-to-failure Data (censored data case)

Generally we want to compare the survival curves s(t) from two groups, where s(t) = P(T > t) = P(surviving beyond time t).
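Before leaving the continuous case: the cholesterol calculations above can be reproduced in one line each (a minimal sketch; the ≈170 in the change-from-baseline case reflects the text's rounding-up convention):

```python
def total_n_continuous(sigma, delta, z_alpha2=1.96, z_beta=1.282):
    """Total sample size 2N = 4*(Z_a/2 + Z_b)^2 * sigma^2 / delta^2."""
    return 4 * (z_alpha2 + z_beta) ** 2 * sigma ** 2 / delta ** 2

print(round(total_n_continuous(50, 10), 1))  # 1051.1
print(round(total_n_continuous(60, 10), 1))  # 1513.5
print(round(total_n_continuous(20, 10), 1))  # 168.2 (~170 after rounding up)
```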
  [Figure: survival curve s(t) decreasing from 1 over time t]

Generally, the log-rank or Wilcoxon (nonparametric) tests are used to test differences between the survival functions of two groups. However, sample size calculations are often based on assuming that time to failure has an exponential distribution (a parametric assumption):

  s(t) = e^(−λt),

where λ is the hazard rate (force of mortality): λ = 1 / (mean survival time).

If T is the length of the study, and λA and λB are the hazard rates for patients under treatments A and B, respectively:

  2N = 2·(Z_α/2 + Z_β)²·[φ(λA) + φ(λB)] / (λA − λB)²,

where:

  φ(λ) = λ² / (1 − e^(−λT))

This assumes all patients enter at the beginning of the study.

Example

We plan a 5-year study (T = 5) with λA = 0.20 and λB = 0.30, α = 0.05, 1 − β = 0.90, so Z_α/2 = 1.96 and Z_β = 1.282. Assume all patients will enter at the beginning of the first year. Then:

  φ(λA) = (0.2)² / (1 − e^(−0.2(5))) = 0.0633
  φ(λB) = (0.3)² / (1 − e^(−0.3(5))) = 0.1158

and

  2N = 2·(1.96 + 1.282)²·(0.0633 + 0.1158) / (0.2 − 0.3)² = 376.5

For patients recruited continually during the study period, use instead:

  φ(λ) = λ³T / (λT − 1 + e^(−λT))

For the same parameters used above, this gives:

  φ(λA) = (0.2)³(5) / (0.2(5) − 1 + e^(−0.2(5))) = 0.1087
  φ(λB) = (0.3)³(5) / (0.3(5) − 1 + e^(−0.3(5))) = 0.1867
  2N = 620.9

Accrual throughout the period requires more patients than if all start at the beginning of the study.

For the situation where accrual occurs over a fixed time period T0, followed by a fixed interval of follow-up T, use:

  φ(λ) = λ² / [1 − (e^(−λT) − e^(−λ(T + T0))) / (λT0)]

Early accrual builds information faster, and can lead to reduced sample sizes.

See also: Freedman LS. Tables of the number of patients required in clinical trials using the logrank test. Stat Med 1982;1:121-129.

Sample Size for Testing Equivalence of Treatments

We may be testing a less expensive, less toxic, or less invasive procedure, and want to make sure that it is "as good" as the standard treatment in terms of efficacy. If we do not reject H0: μA = μB, that does not mean that we may conclude the treatments are equivalent.
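Returning briefly to the time-to-failure formulas above, both accrual patterns can be checked numerically (a sketch; the function names are ours, and the text's 620.9 differs from the value below only through intermediate rounding):

```python
from math import exp

def phi_start(lam, t):
    """Variance factor when all patients enter at the start of a T-year study."""
    return lam ** 2 / (1 - exp(-lam * t))

def phi_uniform(lam, t):
    """Variance factor for continual (uniform) accrual over the study period."""
    return lam ** 3 * t / (lam * t - 1 + exp(-lam * t))

def total_n_exp(phi, lam_a, lam_b, t, z_alpha2=1.96, z_beta=1.282):
    """Total sample size 2N for comparing two exponential hazard rates."""
    return (2 * (z_alpha2 + z_beta) ** 2
            * (phi(lam_a, t) + phi(lam_b, t)) / (lam_a - lam_b) ** 2)

print(round(total_n_exp(phi_start, 0.2, 0.3, 5), 1))    # 376.5
print(round(total_n_exp(phi_uniform, 0.2, 0.3, 5), 1))  # 621.0 (text: 620.9)
```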
We want high power to detect differences of clinical importance, and low power to detect differences that are clinically unimportant. Often this will mean switching the emphasis of α and β (e.g., using α = 0.10 and 1 − β = 0.90).

  [Figure: power curve, power = 1 − β vs. δ, shown for δ < 0, δ ≈ 0, δ > 0]

References:

  Based on a confidence-interval approach:
    Makuch R, Simon R. Sample size requirements for evaluating a conservative therapy. Cancer Treat Rep 1978;62(7):1037-1040.
  Based on hypothesis testing, but switching H0 and Ha:
    Blackwelder WC. 'Proving the null hypothesis' in clinical trials. Control Clin Trials 1982;3(4):345-353.
    Blackwelder WC. Sample size graphs for 'proving the null hypothesis.' Control Clin Trials 1984;5(2):97-105.

Sample size for testing equality of several normal means (i.e., continuous response variables)

The procedure is straightforward, but requires tables.

References:
  Mace AE. Sample Size Determination. Malabar, FL: Krieger; 1974.
  Neter J, Kutner MH, Wasserman W, Nachtsheim CJ. Applied Linear Statistical Models. 4th ed. New York, NY: McGraw-Hill/Irwin; 1996.
  Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum; 1988.

Sample size for testing equality of several proportions

N is based on the chi-square test for homogeneity.

Reference: Lachin JM. Sample size determinations for r x c comparative trials. Biometrics 1977;33(2):315-324.

Pocock's Table 9.1

Pocock's Table 9.1 gives values of f(α, β) to calculate the required number of patients for a trial.

Table 9.1 (Pocock).

  α               β (type II error)
  (type I error)  0.05    0.1     0.2     0.5
  0.10            10.8     8.6     6.2    2.7
  0.05            13.0    10.5     7.9    3.8
  0.02            15.8    13.0    10.0    5.4
  0.01            17.8    14.9    11.7    6.6

α = the level of the significance test used for detecting a treatment difference (often set α = 0.05).

1 − β = the degree of certainty that the difference p1 − p2, if present, would be detected (often set 1 − β = 0.90).
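The entries of Table 9.1 are just (Z_α/2 + Z_β)², so they can be reproduced from the normal quantile function (a sketch using Python's `statistics.NormalDist`; minor differences from the table are rounding):

```python
from statistics import NormalDist

def f(alpha, beta):
    """Pocock's f(alpha, beta) = (Z_{alpha/2} + Z_beta)^2."""
    nd = NormalDist()
    return (nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(1 - beta)) ** 2

print(round(f(0.05, 0.1), 1))   # 10.5
print(round(f(0.01, 0.05), 1))  # 17.8
print(round(f(0.10, 0.5), 1))   # 2.7
```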
α (commonly called the type I error) is the probability of detecting a significant difference when the treatments are really equally effective (i.e., it represents the risk of a false-positive result).

β (commonly called the type II error) is the probability of not detecting a significant difference when there really is a difference of magnitude p1 − p2 (i.e., it represents the risk of a false-negative result).

1 − β (commonly called the power) is the probability of detecting a difference of magnitude p1 − p2.

Here, p1 and p2 are the hypothetical percentage successes on the two treatments that might be achieved if each were given to a large population of patients. They merely reflect the realistic expectations or goals that one aims for when planning the trial and do not relate directly to the eventual results.

Example

In a trial of anturan, the investigators chose:

  p1 = 90% on placebo expected to survive one year
  p2 = 95%
  β = 0.1

The required number of patients on each treatment is:

  n = [p1(100 − p1) + p2(100 − p2)] · f(α, β) / (p2 − p1)²,

where f(α, β) is a function of α and β, the values of which are given in Pocock's Table 9.1 (reproduced above). In fact:

  f(α, β) = [Φ⁻¹(1 − α/2) + Φ⁻¹(1 − β)]²,

where Φ is the cumulative distribution function of a standardized normal deviate. Numerical values of Φ⁻¹ may be obtained from statistical tables such as Geigy (1970, p. 28). Hence, for the anturan trial:

  n = [90·10 + 95·5] · 10.5 / (95 − 90)² = 578

Thus, 578 patients are required on each treatment.

SAMPLE SIZE TABLES

  [Sample size tables handout, not reproduced here]

POWER AND SAMPLE SIZE (PS) SOFTWARE

PS is a free resource available for download on the Department of Biostatistics web site.

How to Download PS

1. In your favorite browser, type the following URL: http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize

2. When the page appears, click on the "Get PS" link.

3. A screen similar to the following one should appear. Click OK.

4.
The "save as" dialog box will appear, and you can choose the location to save your file. You may want to save to C:\Temp, so that you can easily remove the setup files after you have installed the software. When you have chosen your location, click Save.

5. Go to your C:\Temp folder and double click on the PS icon.

6. The PS software will be automatically installed on your machine.

How to Use PS: Examples

  [Worked examples using the PS software, not reproduced here]

APPENDIX: PAGANO TABLE A.3