Sample Size Determination for Analysis of Covariance
Proceedings of the Annual Meeting of the American Statistical Association, August 5-9, 2001

Negasi Beyene and Kung-Jong Lui, San Diego State University, San Diego, CA 92182
Negasi Beyene, CDC/NCHS/Research Data Center, 6525 Belcrest Rd., Hyattsville, MD 20782

Key Words: Confounder; Sample Size; Covariate

1. Introduction

Sample size determination is one of the important problems in designing an experiment or survey, and general results on how to solve it can be found in various textbooks and journal articles. In clinical trials, as in other biomedical experiments, sample size determination is usually posed in relation to testing a hypothesis. Data collection in these experiments, however, is subject to special problems: patients may drop out of the study, animals may die of unknown causes, and these and other factors can leave the observations on some experimental units incomplete. Moreover, clinicians in most cases ignore the question of how the sample size requirement is affected by the presence of a covariate. If covariates are excluded from the analysis of the response variable, estimates of the treatment effect are biased, and so is the sample size estimate. Note that after the data are collected, a conditional analysis is usually carried out by treating the values of potential confounders as fixed constants in an application of multiple regression models. By contrast, at the planning stage of a trial it is not uncommon that the values of potential confounders cannot be controlled by the experimenters and may even be unknown in advance.

1.1 Purpose of the Study

When data are on a continuous scale, the multiple linear regression model is one of the statistical methods most commonly used to control for confounders in clinical trials. If we calculate the required sample size without incorporating the confounding effects into the sample size determination, the result may be quite misleading.
In this paper, readers are provided with a systematic discussion of the calculation of the sample size required for detecting a specified treatment effect in the presence of a single potential confounder. On the basis of covariance-adjusted mean estimates, asymptotic sample size formulae in closed form are derived for both randomized and nonrandomized trials. This study thereby provides insight into the influence on the sample size calculation of an unbalanced confounding covariate (in nonrandomized trials), of the correlation between the confounding covariate and the subject response, of the distance between the mean responses in the two treatment groups in units of standard deviation, and of the magnitude of the specified treatment effect that we are interested in detecting. This study, "Sample Size Determination for Analysis of Covariance," should therefore be useful to clinicians when designing their trials.

1.3 Analysis of Covariance

In order to discuss how to determine the sample size for an analysis of covariance, we first need to familiarize ourselves with analysis of covariance itself. Covariance analysis is a combination of analysis-of-variance and regression techniques used to study the relationship of Y (the dependent variable) to one or more independent variables X1, ..., Xk. Because analysis of covariance techniques are complicated, we confine ourselves mainly to the simplest type of covariance problem, namely, a combination of one-way analysis of variance with linear regression on a single X variable. The purpose of an analysis of covariance is to remove the effect of one or more unwanted and uncontrollable factors from the analysis. When experimental units are randomly assigned to treatments, analysis of covariance (ANCOVA) provides a method for reducing the size of the residual error term to be used in making inferences.
Thus, covariance analysis can yield shorter confidence intervals and more powerful tests, and so can sometimes result in a substantial reduction in the sample size needed to establish differences among the population means.

Example 1. In studying three diets, rats might be randomly assigned to the diets. If the dependent variable is weight after two months on the diet, a one-way analysis of variance can be performed on the weights. It is reasonable to suppose, however, that weight after two months on a diet is highly correlated with the initial weight measured on each rat. Eliminating the linear effect of initial weight on final weight should result in a smaller mean square error. Initial weight is then called a covariate (or concomitant variable); its linear effect is removed by the use of regression techniques.

Example 2. Let Y be the language score for students taught by three different methods, and suppose measurements on IQ (X), taken before the language instruction, are also available. The students are assigned randomly to the three teaching methods, so that the mean IQ values for the three populations being considered are all equal. The sample mean IQs probably differ, but large differences are unlikely because of the random assignment. Either a one-way analysis of variance or a covariance analysis can be performed on these data. The advantage of the analysis of covariance is twofold:
1. It may reduce the size of the mean square error to be used in testing for treatment differences and in constructing confidence intervals.
2. It adjusts for the possibility that observed differences among the three sample mean language scores may be partly due to differences among the three mean IQs. The group with the highest mean IQ may also have the highest mean language score, not because of the teaching method, but simply because students with higher IQs learn a new language more quickly.
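As a rough illustration of Example 2 (our own sketch, not part of the original paper), the following standard-library Python code simulates language scores that depend linearly on IQ and compares the error mean square from a one-way ANOVA with the error mean square after covariance adjustment. All numerical settings (method effects, slope 0.8, residual SD 4) are made up purely for illustration.

```python
import random
import statistics

random.seed(1)

# Hypothetical setup: three teaching methods, language score Y related to
# IQ X by a common within-group slope of 0.8 with residual SD 4.
effects = {0: 0.0, 1: 5.0, 2: 2.0}      # made-up method effects
slope, resid_sd, n = 0.8, 4.0, 30
data = {g: [] for g in effects}
for g, eff in effects.items():
    for _ in range(n):
        x = random.gauss(100, 15)        # IQ, measured before instruction
        y = 20 + eff + slope * (x - 100) + random.gauss(0, resid_sd)
        data[g].append((x, y))

# Pooled within-group sums of squares and cross-products.
sxx = sxy = syy = 0.0
for pts in data.values():
    xbar = statistics.fmean(x for x, _ in pts)
    ybar = statistics.fmean(y for _, y in pts)
    for x, y in pts:
        sxx += (x - xbar) ** 2
        syy += (y - ybar) ** 2
        sxy += (x - xbar) * (y - ybar)

df_error = sum(len(p) - 1 for p in data.values())
mse_anova = syy / df_error               # one-way ANOVA error mean square
b = sxy / sxx                            # pooled within-group slope
mse_ancova = (syy - b * sxy) / (df_error - 1)  # after removing X's linear effect
print(f"ANOVA error mean square:  {mse_anova:.1f}")
print(f"ANCOVA error mean square: {mse_ancova:.1f}")
```

With a strong X-Y correlation, as here, the covariance-adjusted error mean square comes out far smaller than the unadjusted one, which is exactly the variance reduction the text describes.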
One-way analysis of variance takes no account of differences among the mean IQ scores; covariance analysis does, by reporting adjusted treatment means. The researcher calculates the heights of the sample regression lines at the overall mean X̄.. ; these are called adjusted means, and their purpose is simply to provide means that can be compared. Calling the adjusted mean for the i-th population adj Ȳᵢ., we use the foregoing definition to obtain

adj Ȳᵢ. = Ȳᵢ. + b(X̄.. − X̄ᵢ.).

Since the adjusted mean is obtained by setting X = X̄.. in the fitted regression line, its variance is

Var(adj Ȳᵢ.) = σ_e² [1/nᵢ + (X̄.. − X̄ᵢ.)² / Σᵢ Σⱼ (Xᵢⱼ − X̄ᵢ.)²].

1.4 Importance of Sample Size

Sample size is a primary criterion for the validity of experiments. Let α be the risk of making an error of the first kind (that is, rejecting H₀ when H₀ is true), β the risk of making an error of the second kind (that is, failing to reject H₀ when H₀ is false), and δ the difference between the two means. For testing H₀: μ₁ = μ₂ versus Hₐ: μ₁ ≠ μ₂,

N = 2(Z_α + Z_β)² σ² / δ².

If, in a given experiment, the values for α, β, and δ are carefully chosen, and if σ (the population standard deviation) is accurately known, then the computed value of N is the desired sample size. An experiment would be compromised if an experimenter properly calculated an N of, for example, 20 but then decided that this sample size was excessive for the time or money available and proceeded to test a sample of 10 instead of the required 20. Nevertheless, it is always necessary to be realistic, and sometimes N must be chosen on the basis of the time, money, or kind of sample (for instance, patients) available, rather than on a calculation from the proper formula. In this type of situation, it is recommended that the adequacy of the experiment be evaluated before it is started, if enough information is available. If the α-error is critical, the way to determine the power of a proposed experiment is to compute the value of the β-error that would be obtained with the pre-specified α, δ, and proposed sample size; that is, the experimenter would predetermine the risk of missing an improvement of size δ if a substandard sample size were used. On the other hand, if the β-error is critical, the efficiency of a proposed experiment is determined by computing the value of α from the proposed sample size, β, and δ.

The size of N is a function of Z_α, Z_β, δ, and σ². Since Z_α and Z_β are in the numerator of the formula, the size of N must increase as the desired risks of error decrease. Likewise, any decrease in the variance of the experiment causes a decrease in the required sample size. Finally, the smaller the size of the effect to be detected, the greater the sample size required. The specified sample size assures the experimenter that the risks of error will be equal to or less than α and β when the experiment is complete.

2. Methods

Let (Yᵢⱼ, Xᵢⱼ)′ denote the observation for subject j (j = 1, 2, ..., nᵢ) in the i-th (i = 1, 2) treatment group, where Yᵢⱼ represents the subject response and Xᵢⱼ the potential confounding covariate. We assume that (Yᵢⱼ, Xᵢⱼ)′ independently ~ N(μ⁽ⁱ⁾, Σ), where μ⁽ⁱ⁾′ = (μ_y⁽ⁱ⁾, μ_x⁽ⁱ⁾) and

Σ = [ σ_y²  σ_yx ; σ_yx  σ_x² ].

For given Xᵢⱼ = xᵢⱼ, the conditional distribution of Yᵢⱼ can be expressed as

Yᵢⱼ = β₀ + β₁ Iᵢ + β₂ (Xᵢⱼ − X̄..) + εᵢⱼ,   (2.1)

where the indicator Iᵢ = 0 if i = 1 and 1 otherwise; β₀ = μ_y⁽¹⁾ + β₂ (X̄.. − μ_x⁽¹⁾); β₂ = σ_yx/σ_x² = ρ σ_y/σ_x; β₁ = (μ_y⁽²⁾ − μ_y⁽¹⁾) − β₂ (μ_x⁽²⁾ − μ_x⁽¹⁾); X̄.. = Σᵢ Σⱼ Xᵢⱼ / Σᵢ nᵢ; and the εᵢⱼ independently ~ N(0, σ_y²(1 − ρ²)). Note that β₁ is, in fact, the difference of the covariance-adjusted mean responses [5, 25, 4] between the two treatments and is generally different from μ_y⁽²⁾ − μ_y⁽¹⁾, the difference of the two simple means. To avoid the possible confounding effect of the covariate X in comparing the two treatment effects, we may therefore wish to test the hypothesis H₀: β₁ = 0 versus Hₐ: β₁ ≠ 0 rather than H₀: μ_y⁽²⁾ − μ_y⁽¹⁾ = 0 versus H₁: μ_y⁽²⁾ − μ_y⁽¹⁾ ≠ 0.

Under model assumption (2.1), we can show that for a fixed column vector X with transpose X′ = (X₁₁, ..., X₁n₁, X₂₁, ..., X₂n₂), the conditional variance Var(β̂₁ | X) equals

σ_y² (1 − ρ²) [1/n₁ + 1/n₂ + (X̄₂. − X̄₁.)² / Σᵢ Σⱼ (Xᵢⱼ − X̄ᵢ.)²],

where β̂₁ = Ȳ₂. − Ȳ₁. − β̂₂ (X̄₂. − X̄₁.), β̂₂ = S_xy/S_x², and X̄ᵢ. = Σⱼ Xᵢⱼ / nᵢ for i = 1, 2. Here β̂₁ is the uniformly minimum variance unbiased estimator (UMVUE) of β₁ [4]. Note that the conditional variance Var(β̂₁ | X) depends on the values Xᵢⱼ of the confounder X, which may be unknown in advance. Because the UMVUE β̂₁ is always an unbiased estimator of β₁ regardless of the value of X, the unconditional variance is Var(β̂₁) = E[Var(β̂₁ | X)]. Furthermore, because (X̄₂. − X̄₁.)² and Σᵢ Σⱼ (Xᵢⱼ − X̄ᵢ.)² are independent, we obtain, for nᵢ > 2,

Var(β̂₁) = σ_y² (1 − ρ²) [1/n₁ + 1/n₂ + (1/(n₁ + n₂ − 4)) (1/n₁ + 1/n₂ + d_x²)],   (2.2)

where d_x = (μ_x⁽²⁾ − μ_x⁽¹⁾)/σ_x, the distance between μ_x⁽²⁾ and μ_x⁽¹⁾ in units of the standard deviation σ_x.

Note that for fixed total sample size n₁ + n₂, the variance Var(β̂₁) in formula (2.2) reaches its minimum, and thereby the power for detecting β₁ ≠ 0 is maximized, when n₁ = n₂. Therefore, we assume equal sample allocation n₁ = n₂ = n in the following discussion. In this case, if the correlation ρ were further equal to 0, the variance Var(β̂₁) would reduce to

σ_y² [2/n + (1/(2(n − 2))) (2/n + d_x²)],

which is always greater than Var(Ȳ₂. − Ȳ₁.) = 2σ_y²/n, where Ȳᵢ. = Σⱼ Yᵢⱼ / nᵢ for i = 1, 2.
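As a quick sketch in standard-library Python (ours, not the paper's), the classical two-sample formula N = 2(Z_α + Z_β)²σ²/δ² from Section 1.4 and the unconditional variance (2.2) can be computed directly:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf   # standard-normal quantile function

def n_two_sample(alpha, beta, sigma, delta):
    """Classical per-group size N = 2(Z_{alpha/2} + Z_beta)^2 sigma^2 / delta^2
    for the two-sided test of H0: mu1 = mu2 (Section 1.4)."""
    za, zb = z(1 - alpha / 2), z(1 - beta)
    return 2 * (za + zb) ** 2 * sigma ** 2 / delta ** 2

def var_beta1(sigma_y, rho, n1, n2, d_x):
    """Unconditional Var(beta1-hat) from formula (2.2); requires n_i > 2."""
    inv = 1 / n1 + 1 / n2
    return sigma_y ** 2 * (1 - rho ** 2) * (inv + (inv + d_x ** 2) / (n1 + n2 - 4))

# Unadjusted per-group size for delta = 0.5*sigma at alpha = 0.05, power 0.80:
print(round(n_two_sample(0.05, 0.20, 1.0, 0.5)))        # -> 63
# With rho = 0 the adjusted estimator is always noisier than Ybar2 - Ybar1:
print(var_beta1(1.0, 0.0, 50, 50, 0.0) > 2 * 1.0 / 50)  # -> True
```

The second print illustrates the closing claim above: at ρ = 0, formula (2.2) exceeds the unadjusted variance 2σ_y²/n even with no covariate imbalance at all.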
Furthermore, when ρ = 0, testing the hypothesis H₀: β₁ = 0 is equivalent to testing μ_y⁽²⁾ − μ_y⁽¹⁾ = 0. These results suggest that in the case of ρ = 0, use of the test statistic β̂₁, which has unnecessarily adjusted for the confounding effect due to X, always causes a loss of efficiency. On the other hand, if the correlation ρ ≠ 0, then, recalling the definition of β₁, we can easily show that

d_x = (d_y − β₁/σ_y) / ρ,

where d_y = (μ_y⁽²⁾ − μ_y⁽¹⁾)/σ_y, the distance between μ_y⁽²⁾ and μ_y⁽¹⁾ in units of the standard deviation σ_y. Substituting this result for d_x in formula (2.2), the variance of β̂₁ in the case n₁ = n₂ = n becomes

Var(β̂₁) = σ_y² (1 − ρ²) [2/n + (1/(2(n − 2))) (2/n + (d_y − β₁/σ_y)²/ρ²)],   (2.3)

which is a function of σ_y, ρ, β₁, d_y, and the sample size n. Note that for fixed n and ρ, the larger the value of |d_y − β₁/σ_y| (or, equivalently, the larger the absolute value of (μ_x⁽²⁾ − μ_x⁽¹⁾)/σ_x), the larger is the variance Var(β̂₁). Therefore, with all other parameters fixed, in nonrandomized trials where μ_x⁽²⁾ ≠ μ_x⁽¹⁾, the variance of β̂₁ is always larger than in a randomized trial where μ_x⁽²⁾ = μ_x⁽¹⁾.

On the basis of formula (2.3), the required sample size N from each of the two treatment groups in testing the hypothesis H₀: β₁ = 0 for a power 1 − β to detect the alternative Hₐ: β₁ ≠ 0 (say, β₁ > 0) at α-level (two-sided test) is the smallest integer N such that

[ Z_{α/2} √{(1 − ρ²)[2/N + (2/N + d_y²/ρ²)/(2(N − 2))]} − β₁/σ_y ] / √{(1 − ρ²)[2/N + (2/N + (d_y − β₁/σ_y)²/ρ²)/(2(N − 2))]} ≤ −Z_β,   (2.4)

where Z_{α/2} and Z_β are the upper 100(α/2)-th and 100β-th percentiles of the standard normal distribution, respectively. Note that criterion (2.4) depends on σ_y only through the parameter d_y and the ratio β₁/σ_y. For given values of d_y, β₁/σ_y, and the correlation ρ, we can easily apply a trial-and-error procedure to solve for N in (2.4). Furthermore, if N were large, then a good approximation for N in sample size calculation procedure (2.4) would equal [N_x] + 1, where [N_x] denotes the largest integer less than or equal to N_x, and

N_x = { Z_{α/2} √{(1 − ρ²)[2 + d_y²/(2ρ²)]} + Z_β √{(1 − ρ²)[2 + (d_y − β₁/σ_y)²/(2ρ²)]} }² / (β₁/σ_y)².   (2.5)

Note also that in a randomized trial, where μ_x⁽²⁾ = μ_x⁽¹⁾, β₁ = 0 implies that d_y = 0. Thus, in this situation, sample size calculation procedure (2.4) reduces to finding the smallest integer N such that

Z_{α/2} − (β₁/σ_y) / √{(1 − ρ²)[2/N + 1/(N(N − 2))]} ≤ −Z_β.   (2.6)

Similarly, if μ_x⁽²⁾ = μ_x⁽¹⁾, the approximate sample size for N in (2.6) would become [N_x] + 1, where

N_x = 2 (Z_{α/2} + Z_β)² (1 − ρ²) / d_y².   (2.7)

Note that sample size (2.7) is, in fact, the product of the corresponding required sample size 2(Z_{α/2} + Z_β)²/d_y², for power 1 − β at α-level on the basis of the test statistic Ȳ₂. − Ȳ₁., and the multiplying factor (1 − ρ²). Note also that sample size calculation procedures (2.4) and (2.6), as well as the approximate sample size formulae (2.5) and (2.7), are all symmetric with respect to the correlation ρ. Therefore, under the same conditions, the required sample size N for ρ = ρ₀ equals that for ρ = −ρ₀.

3. Results

To study the influence on sample size determination of the correlation ρ, of the distance d_y in units of standard deviation between the two treatment mean responses, and of the absolute difference |d_y − β₁/σ_y| = |ρ (μ_x⁽²⁾ − μ_x⁽¹⁾)/σ_x|, Table 3 summarizes the required sample size from each of the two treatment groups, for power of 0.80 and 0.90 at the 0.05 (two-sided) level, based on the test statistic β̂₁ in a variety of situations. For example, in testing the hypothesis H₀: β₁ = 0 for power of 0.80 at the 0.05 level (two-sided), when d_y = 0.5, β₁/σ_y = 0.25, and the correlation ρ = 0.3, the required sample size N from each of the two treatment groups is 351; according to the approximate sample size N_x in formula (2.5), we would need to take 350 subjects. To check this approximation, we fixed the sample size and estimated the power: with 350 subjects per group, d_y = 0.5, β₁/σ_y = 0.25, and ρ = 0.3, the estimated power equals 0.806, which is approximately the nominal 0.80, so the approximation performs well. In general, with all other parameters fixed, the higher the absolute value of the correlation ρ between the subject response Y and the confounder X, the smaller is the required sample size. Note that the approximate sample sizes N_x calculated from equations (2.5) and (2.7) agree quite well with the values of N calculated from equations (2.4) and (2.6), respectively, even when N is small, although the former always underestimate the latter. For comparison, we also calculated the unadjusted sample size N₁ (see Table 3), using the formula

N₁ = 2 (Z_{α/2} + Z_β)² / d_y².

4. Discussion

In a randomized clinical trial, where μ_x⁽²⁾ = μ_x⁽¹⁾, testing the hypothesis H₀: μ_y⁽²⁾ = μ_y⁽¹⁾ is equivalent to testing H₀: β₁ = 0. Comparing the variance Var(Ȳ₂. − Ȳ₁.) = 2σ_y²/n with Var(β̂₁) in formula (2.3), after a few algebraic manipulations we can show that Var(β̂₁) is smaller than Var(Ȳ₂. − Ȳ₁.) if and only if the absolute value |ρ| > 1/√(2n − 3). Therefore, even in randomized trials where μ_x⁽²⁾ = μ_x⁽¹⁾, use of the test statistic β̂₁ instead of Ȳ₂. − Ȳ₁. for hypothesis testing may still gain efficiency when there is a high correlation ρ between X and Y. In fact, this gain of efficiency can be substantial. For example, from Table 3, when d_y = β₁/σ_y = 0.5 and ρ = 0.7, the required sample size N for a power of 0.90 is 44 at the 0.05 level (two-sided). In contrast, the required sample size on the basis of the test statistic Ȳ₂. − Ȳ₁. is 85, which exceeds the 44 required in use of the test statistic β̂₁ by over 90%. On the other hand, if ρ = 0, then, as noted before, use of the test statistic β̂₁, which adjusts for the non-confounder X, always causes a loss of efficiency.

In nonrandomized trials, especially when the absolute value |d_x| for the confounder X is large, if we do not incorporate the confounding effect into the sample size calculation, the final power could be quite different from the desired power whenever we are required to control this confounding effect to avoid possibly misleading inference [22] in comparing the two treatment effects. For example, say we would like a power of 0.90 to detect d_y = 0.5 at α-level 0.05. The required sample size is 85 by use of the formula 2(Z_{α/2} + Z_β)²/d_y². If ρ = 0.1 and β₁/σ_y = 0.25 (these conditions, together with d_y = 0.5, lead d_x to equal 2.5), it is easy to show that on the basis of the test statistic β̂₁ with such a sample size of 85 we have a power of only approximately 0.01, much less than the desired power of 0.90. This example demonstrates that when a confounder has not been incorporated into the sample size calculation and is seriously unbalanced between the two treatment groups, adjustment for this confounder can cause a tremendous loss of power.

Note that in practice the parameters σ_y, ρ, β₁, and d_y are usually unknown, and we therefore need to substitute the corresponding sample estimates for these parameters in the test statistic β̂₁. Because all these sample estimates converge in probability to their corresponding parameters, we can apply Slutsky's theorem and other large-sample properties [12, 21] to justify that the sample size formulae derived here remain asymptotically valid. Note also that all the traditional assumptions of classical linear regression analysis [4, 5, 25], such as a joint normal distribution for the covariates and a constant covariance matrix across treatments, are needed to derive our sample size calculation procedures.
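The trial-and-error solution of criterion (2.4) and the large-N approximation (2.5) can be sketched in standard-library Python. This is our reconstruction of the nonrandomized-trial procedure, not the authors' code (the randomized case uses (2.6) and (2.7) instead); the two squared distances pass d_y²/ρ² and (d_y − β₁/σ_y)²/ρ² into the variance factor of formula (2.3).

```python
from math import floor, sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf   # standard-normal quantile function

def var_factor(n, rho, dx2):
    """Var(beta1-hat)/sigma_y^2 at equal allocation n per group, formula (2.3),
    with dx2 the squared standardized covariate distance."""
    return (1 - rho ** 2) * (2 / n + (2 / n + dx2) / (2 * (n - 2)))

def n_exact(dy, r, rho, alpha=0.05, power=0.80):
    """Smallest N per group satisfying criterion (2.4); r = beta1/sigma_y."""
    za, zb = z(1 - alpha / 2), z(power)
    dx2_null = (dy / rho) ** 2            # distance used for the null variance
    dx2_alt = ((dy - r) / rho) ** 2       # distance under the alternative
    n = 3
    while (za * sqrt(var_factor(n, rho, dx2_null))
           + zb * sqrt(var_factor(n, rho, dx2_alt))) > r:
        n += 1
    return n

def n_approx(dy, r, rho, alpha=0.05, power=0.80):
    """Large-N approximation (2.5): returns [N_x] + 1."""
    za, zb = z(1 - alpha / 2), z(power)
    a = sqrt((1 - rho ** 2) * (2 + (dy / rho) ** 2 / 2))
    b = sqrt((1 - rho ** 2) * (2 + ((dy - r) / rho) ** 2 / 2))
    return floor(((za * a + zb * b) / r) ** 2) + 1

# Worked example from Section 3: dy = 0.5, beta1/sigma_y = 0.25, rho = 0.3.
print(n_exact(0.5, 0.25, 0.3))    # -> 351
print(n_approx(0.5, 0.25, 0.3))   # -> 350
```

The two printed values match the Table 3 entries quoted in Section 3 (N = 351, N_x = 350), which is how we checked that this reconstruction is consistent with the paper's numbers.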
Although transformations may often be applied to normalize variables with extremely skewed distributions or to stabilize the variances of covariates, the sample size formulae presented here should be used cautiously in such situations.

Table 3. Required sample sizes N and N_x from each of the two treatment groups on the basis of covariance-adjusted estimates, for power of 0.80 and 0.90 at α-level 0.05 (two-sided); for the distance in units of standard deviation between the mean responses in the two treatment groups, d_y, ranging from 0.25 to 1.0; for the ratio of the treatment effect to the standard deviation of Y, β₁/σ_y, ranging from 0.25 to 1.0; and for the correlation between the response Y and the covariate X, ρ, ranging from 0.1 to 0.9. The unadjusted sample size N₁ is shown for comparison.
Power 0.80 (correlation ρ)

d_y   β₁/σ_y        0.1    0.3    0.5    0.7    0.9
0.25  0.25   N      250    229    189    129     49
             N_x    249    229    189    129     48
             N₁     252    252    252    252    252
      0.50   N      161     68     51     34     13
             N_x    160     68     51     34     13
             N₁     252    252    252    252    252
      1.0    N       83     23     15     10      5
             N_x     82     22     15      9      4
             N₁     252    252    252    252    252
0.50  0.25   N     1392    351    226    142     52
             N_x   1390    350    225    141     51
             N₁      63     63     63     63     63
      0.50   N       63     58     48     33     13
             N_x     63     58     48     33     12
             N₁      63     63     63     63     63
      1.0    N      115     26     16     10      5
             N_x    113     25     15     10      4
             N₁      63     63     63     63     63
1.0   0.25   N     5571    776    352    186     62
             N_x   5569    775    351    185     61
             N₁      16     16     16     16     16
      0.50   N     1193    177     84     46     16
             N_x   1191    176     83     45     15
             N₁      16     16     16     16     16
      1.0    N       17     15     13      9      4
             N_x     16     15     12      9      3
             N₁      16     16     16     16     16

Power 0.90 (correlation ρ)

d_y   β₁/σ_y        0.1    0.3    0.5    0.7    0.9
0.25  0.25   N      334    307    253    173     65
             N_x    333    307    253    172     64
             N₁     337    337    337    337    337
      0.50   N      215     91     68     45     17
             N_x    214     90     68     45     17
             N₁     337    337    337    337    337
      1.0    N      133     33     21     13      6
             N_x    131     32     20     13      5
             N₁     337    337    337    337    337
0.50  0.25   N     1704    454    297    188     68
             N_x   1702    453    297    187     68
             N₁      85     85     85     85     85
      0.50   N       84     78     64     44     17
             N_x     84     77     64     43     16
             N₁      85     85     85     85     85
      1.0    N      153     34     21     13      6
             N_x    151     33     20     13      5
             N₁      85     85     85     85     85
1.0   0.25   N     7102   1003    461    245     81
             N_x   7100   1001    460    244     81
             N₁      22     22     22     22     22
      0.50   N     1434    220    108     59     21
             N_x   1432    219    107     58     20
             N₁      22     22     22     22     22
      1.0    N       22     20     17     12      5
             N_x     21     20     16     11      4
             N₁      22     22     22     22     22

Note: Owing to the limitation on the number of pages to be published, parts of this paper (the application, the simulation, and the references) are omitted.
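Since the paper's simulation section is omitted, the following independent standard-library sketch (ours, not the authors') spot-checks a randomized-trial entry of Table 3: with d_y = β₁/σ_y = 0.5, ρ = 0.7, and the tabulated N = 44 per group, the empirical power of the two-sided 0.05-level test based on β̂₁ should come out near the nominal 0.90.

```python
import random
from math import sqrt

random.seed(2)

def ancova_reject(n, dy, rho, zcrit=1.96):
    """One simulated randomized trial (mu_x equal in both groups), analyzed
    with the covariance-adjusted estimator beta1-hat; returns True if
    H0: beta1 = 0 is rejected at the two-sided 0.05 level."""
    means_x, means_y = [], []
    sxx = sxy = syy = 0.0
    for my in (0.0, dy):                   # group means of Y; mu_x = 0 in both
        xs = [random.gauss(0.0, 1.0) for _ in range(n)]
        ys = [my + rho * x + random.gauss(0.0, sqrt(1 - rho ** 2)) for x in xs]
        mx_, my_ = sum(xs) / n, sum(ys) / n
        sxx += sum((x - mx_) ** 2 for x in xs)
        sxy += sum((x - mx_) * (y - my_) for x, y in zip(xs, ys))
        syy += sum((y - my_) ** 2 for y in ys)
        means_x.append(mx_)
        means_y.append(my_)
    b2 = sxy / sxx                         # pooled within-group slope
    b1 = (means_y[1] - means_y[0]) - b2 * (means_x[1] - means_x[0])
    s2 = (syy - b2 * sxy) / (2 * n - 3)    # residual mean square
    se = sqrt(s2 * (2 / n + (means_x[1] - means_x[0]) ** 2 / sxx))
    return abs(b1 / se) > zcrit

# Table 3 randomized entry: dy = beta1/sigma_y = 0.5, rho = 0.7, power 0.90, N = 44.
power = sum(ancova_reject(44, 0.5, 0.7) for _ in range(2000)) / 2000
print(round(power, 2))   # close to the nominal 0.90
```

The model generation here (unit variances, slope ρ, residual SD √(1 − ρ²)) simply realizes the bivariate normal assumption of Section 2; the standard error used is the usual conditional one from the ANCOVA fit.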