Computational Statistics and Data Analysis
Computational Statistics and Data Analysis 52 (2008) 4903–4908
Journal homepage: www.elsevier.com/locate/csda

A simple method of computing the sample size for Chi-square test for the equality of multinomial distributions

Jeffrey A. Nisen (a), Neil C. Schwertman (b,∗)

(a) Department of Statistics, 44 Kidder Hall, Oregon State University, Corvallis, OR 97331, United States
(b) Department of Mathematics & Statistics, California State University Chico, Chico, CA 95929-0525, United States

Article history: Received 29 November 2007. Received in revised form 8 April 2008. Accepted 8 April 2008. Available online 15 April 2008.

Abstract

Computing the appropriate sample size is one of the most important aspects of designing an experiment. For the Chi-square test of the equality of multinomial populations a very simple method is proposed for calculating the sample size to satisfy specified significance level and power. This method is especially understandable and does not require software nor the non-central Chi-square tables. Simulations are used to verify the accuracy of the new method. © 2008 Elsevier B.V. All rights reserved.

1. Introduction

One of the most important elements for designing an experiment or survey is determining the minimum sample size needed to detect meaningful differences in populations. Consequently, the topic has received broad coverage in both the textbooks and the literature. The sample size usually has a direct impact on the cost of the study. A sample size that is too small, however, can substantially reduce the likelihood of finding important differences. The appropriate sample size is a function of the variability, desired significance level and the power of the test to detect a meaningful difference as well as the test statistic that is used.
∗ Corresponding author: N.C. Schwertman. Tel.: +1 530 898 6329. doi:10.1016/j.csda.2008.04.007

Since most statistical testing involves detecting differences in the means, most of the literature is focused on determining the sample sizes for t-tests, z-tests or F-tests. Calculating the sample size for the analysis of variance F-test usually uses the non-central F distribution. The non-centrality parameter for the F distribution is difficult to understand and has little intuitive appeal for many researchers. Schwertman (1987) proposed a very simple alternative method for calculating the analysis of variance sample size based on the much more understandable maximum difference in the means between any of the k treatment groups. This method was extended by Schwertman and Schenk (1991) for calculating the appropriate sample size to detect specified trends or interactions in an analysis of variance.

While most of the sample size literature has focused on detecting differences in means or proportions, far fewer investigations have addressed the appropriate sample size for Chi-square tests such as those used in testing categorical data. Agresti (1990), however, suggests for a 2 × 2 table calculating the sample size based on the sample size computation for the difference in the parameters of two binomial populations. For the Chi-square test in general, Agresti (1990) finds the sample size by a somewhat tedious computation of the non-centrality parameter for the Chi-square and using tables such as Haynam et al. (1970). Much of the earlier research focused on the 2 × 2 contingency table. Haber (1983) and Greenland (1983), however, investigated sample size and power for the 2 × 2 × 2 contingency table. Greenland (1985) suggested using the non-central $\chi^2$ distribution to unify some of the earlier approaches to sample size determination. Meng and Chapman (1966), Guenther (1977) and Lachin (1977) investigated the limiting power function of the asymptotic Chi-square statistic, which could be used to find the appropriate sample size. Fleiss (1981), in computing the sample size for rates and proportions, considered only the binomial case but provided extensive tables for determining the minimum sample size for specified significance level, power and parameters $P_1$ and $P_2$ for the two binomial populations. Rochon (1989) suggested a rather complex procedure using the Grizzle et al. (1969) method to generate the minimum sample size needed to find a preset difference with a specified probability. Chow et al. (2003), in calculating the sample size for general goodness of fit and contingency tables, also use the laborious computation of the non-centrality parameter and the non-central Chi-square tables. For the Cochran–Mantel–Haenszel test and the McNemar test for category shift, they do provide simpler sample size formulas that do not use the non-central Chi-square.

Whitehead (1993) investigated sample size for ordered categorical data, e.g. categories such as excellent, good, fair and poor. It is pointed out that under proportional odds the data can be analyzed by logistic regression. Whitehead (1993) concludes that “When investigators have control over the number and definition of categories, then sample sizes can be reduced by increasing the number above two, but there is little gain from using more than five.” Furthermore, when categories are equally likely there is greater efficiency (smaller sample sizes can be used). The computation of sample size by this method is implemented in the PEST3 program (see Brunier and Whitehead (1993)).

Lindley (1997) used a Bayesian approach and compared it to other Bayesian methods suggested by Adcock (1995), Joseph et al. (1995a,b) and Pham-Gia (1995).
These Bayesian procedures for computing sample size require some a priori distribution and, for multinomial data, are primarily focused on estimation rather than hypothesis testing.

Clearly the determination of the necessary sample size for Chi-square tests has either been for binomial populations or been based on the computationally intense non-central Chi-square method. In this paper, our purpose is to provide a simple alternative procedure based on an adaptation of the Schwertman (1987) sample size computation for analysis of variance. The methods of Schwertman (1987) and Schwertman and Owens (1989) for determining sample size are extended to multinomial experiments of the type used, for example, by Gilliland et al. (1999) in the study of diabetes in Native Americans, by Goldberg (1972) to study mental health, and by Wong et al. (2005) to study the relationship between bone mineral density and depression.

The new method does not change the usual Chi-square test statistic. The proposed methodology is for computing the sample size only. The distribution of the usual test statistic for testing the equality of multinomial populations is asymptotically central Chi-square under the null hypothesis. The non-null distribution is asymptotically non-central Chi-square with non-centrality parameter as given in the literature (see, for example, Guenther (1977)). The proposed sample size calculation uses a shift of the central Chi-square critical value instead of the usual non-centrality parameter for computing sample size for a specified power. The simulations in the paper show that the usual sample size calculations based on the non-central Chi-square have power quite close to the power obtained with the sample sizes from the new procedure.
The proposed method does not require special tables such as the non-central Chi-square or a computer software package such as PEST3 or SAS, but rather uses the readily available central Chi-square tables and a very quick computation on the simplest of hand-held calculators. Furthermore, besides its simplicity, the proposed method, instead of using the non-centrality parameter, is based on the much more understandable and intuitive maximum difference between corresponding cells in any two populations. Simulations in Section 4 are used to verify the effectiveness of this method.

2. The method

Suppose $U(df)$ is distributed Chi-square with $df$ degrees of freedom and is used for testing the equality of two or more multinomial distributions. That is, if there are $a$ multinomial populations and $k$ different categories, the null hypothesis is

$$H_0: P_{ji} = P_{j'i} \quad \text{for } i = 1, 2, \ldots, k \text{ and for all } j, j' = 1, 2, \ldots, a \text{ with } j \neq j',$$

where $P_{ji}$ is the proportion from population $j$ in category $i$. The degrees of freedom for the Chi-square statistic are $(a-1)(k-1)$. Further suppose that some cell or category, say $i$, has the largest difference in proportions between any two of the corresponding cells in any two multinomial populations, say populations 1 and 2. Without loss of generality, let the population proportions for this cell be $P_{1i}$ and $P_{2i}$, with $P_{1i} - P_{2i} = \Delta \ge 0$. Under the null hypothesis $H_0: \Delta = 0$, and assuming equal sample sizes and sufficiently large samples,

$$Z_i = \frac{\hat{P}_{1i} - \hat{P}_{2i}}{\sqrt{\dfrac{P_{1i}(1-P_{1i}) + P_{2i}(1-P_{2i})}{n}}} \tag{1}$$

is distributed approximately as a standard normal variable and $U_1(1) = Z_i^2$ has an approximate Chi-square distribution with one degree of freedom. Then by Cochran's Theorem, $U(df)$ can be partitioned such that $U(df) = U_1(1) + U_2(df-1)$, where $U_1$ and $U_2$ are independent $\chi^2$ variables.
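As a rough numerical illustration of Eq. (1), the sketch below computes $Z_i$ and $U_1(1) = Z_i^2$ for a single cell from two equal-sized samples. The function name `cell_z`, the counts, and the use of sample proportions in place of the unknown $P_{1i}$ and $P_{2i}$ are assumptions made for illustration only; they do not come from the paper.

```python
import math


def cell_z(count1, count2, n):
    """Z_i of Eq. (1) for one cell of two samples of size n each.

    Assumption of this sketch: the sample proportions are plugged in
    for the unknown population proportions P_1i and P_2i.
    """
    p1_hat = count1 / n
    p2_hat = count2 / n
    std_err = math.sqrt((p1_hat * (1 - p1_hat) + p2_hat * (1 - p2_hat)) / n)
    return (p1_hat - p2_hat) / std_err


# Hypothetical counts: 250 of 500 in population 1, 200 of 500 in population 2.
z = cell_z(count1=250, count2=200, n=500)
u1 = z ** 2  # the one-degree-of-freedom component U_1(1)
print(f"Z_i = {z:.3f}, U_1(1) = {u1:.3f}")
```

Because $U(df) \ge U_1(1)$, a value of $U_1(1)$ that already exceeds the critical value $\chi^2_{\alpha}(df)$ is enough to reject, which is the idea the method builds on.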
The null hypothesis $H_0: \Delta = 0$ is rejected when $U(df) > \chi^2_{\alpha}(df)$, where $\chi^2_{\alpha}(df)$ is such that $P\left[U(df) > \chi^2_{\alpha}(df)\right] = \alpha$, or equivalently when $U_1(1) + U_2(df-1) > \chi^2_{\alpha}(df)$. To satisfy the power requirement for some specific value of $\Delta$,

$$P\left[U(df) > \chi^2_{\alpha}(df) \mid \Delta > 0\right] = \pi \quad \text{or} \quad P\left[U_1(1) + U_2(df-1) > \chi^2_{\alpha}(df) \mid \Delta > 0\right] = \pi. \tag{2}$$

Since $U_2(df-1) \ge 0$, $U(df) = U_1(1) + U_2(df-1) \ge U_1(1)$. If the sample size is chosen such that $U_1(1)$ by itself causes rejection of $H_0: \Delta = 0$, i.e.

$$P\left[U_1(1) > \chi^2_{\alpha}(df) \mid \Delta > 0\right] = \pi, \tag{3}$$

then this sample size guarantees at least the minimum power $\pi$. Furthermore, since $U_1(1)$ is the two-sample binomial test, finding the sample size such that Eq. (3) is satisfied requires only a slight modification of the method for finding the sample size in the two-sample binomial test, which is readily available in the literature. (See, for example, Fleiss (1981) Eq. (3.14), replacing $C_{\alpha/2}$ with $\sqrt{\chi^2_{\alpha}(df)}$, or Chow et al. (2003) page 205, replacing $Z_{\alpha/2}$ with $\sqrt{\chi^2_{\alpha}(df)}$.) The derivation of these sample size formulas is a modification of that in Schwertman (1987). We include the derivation for completeness in the Appendix. Using the sample size formula in Chow et al. (2003) page 205 and replacing $Z_{\alpha/2}$ with $\sqrt{\chi^2_{\alpha}(df)}$, the sample size formula is

$$n = \frac{\left(\sqrt{\chi^2_{\alpha}(df)} + Z_{\beta}\right)^2 \left[P_{1i}(1-P_{1i}) + P_{2i}(1-P_{2i})\right]}{\Delta^2}, \tag{4}$$

which is identical to the computational formula derived in the Appendix. This estimate of $n$ is conservative in that only $U_1$ was used to cause rejection. A more practical estimate of $n$ is obtained by replacing $U_2(df-1)$ by its expected value $(df-1)$ instead of assuming $U_2(df-1) = 0$ (as was done in going from Eq. (2) to Eq. (3)). Then Eq. (3) becomes

$$P\left[U_1(1) > \chi^2_{\alpha}(df) - (df-1) \mid \Delta > 0\right] = \pi$$

and the adjusted Eq. (4) sample size formula is

$$n = \frac{\left(\sqrt{\chi^2_{\alpha}(df) - (df-1)} + Z_{\beta}\right)^2 \left[P_{1i}(1-P_{1i}) + P_{2i}(1-P_{2i})\right]}{\Delta^2}. \tag{5}$$

Schwertman (1987) used this adjustment very effectively for determining sample size in analysis of variance and it was equally effective here, as shown in the simulations in Section 4. Clearly the sample size formulas (4) and (5) require values for $P_{1i}$ and $P_{2i}$. These can be estimated by the experimenters or from a preliminary study. However, a conservative estimate of sample size is achieved when $P_{1i}$ and $P_{2i}$ are both taken as approximately .5, because $P(1-P)$ is at most .25. Then

$$n = \frac{\left(\sqrt{\chi^2_{\alpha}(df) - (df-1)} + Z_{\beta}\right)^2 \left[P_{1i}(1-P_{1i}) + P_{2i}(1-P_{2i})\right]}{\Delta^2} \le \frac{\left(\sqrt{\chi^2_{\alpha}(df) - (df-1)} + Z_{\beta}\right)^2 (.5)}{\Delta^2}. \tag{6}$$

3. Examples

1. Consider that there are two groups, $a = 2$, $k = 4$, such as in Zar (1999), hair color by gender, example 23.1, page 487, with four possible responses. The resulting Chi-square statistic therefore has 3 degrees of freedom. Suppose that the meaningful difference, $\Delta$, is .10 for the corresponding cells of the two groups and $\alpha = .05$ and $\pi = .80$. Using Eq. (6),

$$n = \frac{\left(\sqrt{\chi^2_{.05}(3) - 2} + Z_{.8}\right)^2 (.5)}{(.1)^2} = \left(\sqrt{7.81473 - 2} + .842\right)^2 (.5/.01) = 529.$$

2. Consider the Gilliland et al. (1999) study of diabetes in Native Americans and the relationship to education level. There are four education levels and two categories, whether there is diabetes or not. The contingency table is 2 × 4 and the Chi-square statistic has 3 degrees of freedom. Suppose that the meaningful difference is 12%, the significance level is set at .10, the power is .8 and the expected proportion nearest to .5 is .35. Then

$$n = \frac{\left(\sqrt{\chi^2_{\alpha}(df) - (df-1)} + Z_{\beta}\right)^2 \left[P_{1i}(1-P_{1i}) + P_{2i}(1-P_{2i})\right]}{\Delta^2} = \frac{\left(\sqrt{6.251 - 2} + .842\right)^2 \left[.35(.65) + .35(.65)\right]}{.12^2} = 267.$$

3. Consider a hypothetical study of three groups: Republicans, Democrats and Independents/others. The researcher is interested in how these groups compare by education level (no degree, high school degree, college degree, professional/graduate degree) and by whether they voted in the last Presidential election.
The appropriate table is 3 × 4 × 2, the Chi-square test statistic has six degrees of freedom and the researchers would like to determine if between the three groups there is a “meaningful” difference of .15 between any of the eight cells resulting from the pairs of education level and voting participation. Thus the 3 × 4 × 2 table is conceptualized as three multinomial populations (party affiliation) with eight categories (pairings of education level and voting participation). Furthermore, the researcher is satisfied with $\alpha = .10$ and power of .9 and expects the population proportions in any cell to be no more than .3. Using Eq. (5),

$$n = \frac{\left(\sqrt{\chi^2_{.1}(6) - 5} + 1.282\right)^2 (.3)(.7)(2)}{(.15)^2} = \frac{\left(\sqrt{10.6446 - 5} + 1.282\right)^2 (.42)}{(.15)^2} = 250.$$

Therefore the researchers need a sample of 250 each of Republicans, Democrats and Independents.

Table 1
Sample size and simulated rejection proportions

| k | Nominal α | π | n1 | n2 | α1 | α2 | π1 | π2 |
|---|---|---|---|---|---|---|---|---|
| 3 | 0.1 | 0.8 | 447 | 376 | 0.0967 | 0.0999 | 0.8610 | 0.7947 |
| 3 | 0.1 | 0.9 | 558 | 506 | 0.1020 | 0.1000 | 0.9336 | 0.8967 |
| 3 | 0.1 | 0.95 | 719 | 628 | 0.1031 | 0.1023 | 0.9712 | 0.9491 |
| 11 | 0.1 | 0.8 | 1171 | 607 | 0.0986 | 0.0993 | 0.9716 | 0.7685 |
| 11 | 0.1 | 0.9 | 1395 | 771 | 0.0994 | 0.1028 | 0.9907 | 0.8609 |
| 11 | 0.1 | 0.95 | 1593 | 920 | 0.1023 | 0.1028 | 0.9956 | 0.9208 |
| 3 | 0.05 | 0.8 | 541 | 473 | 0.0525 | 0.0478 | 0.8533 | 0.7936 |
| 3 | 0.05 | 0.9 | 696 | 619 | 0.0515 | 0.0529 | 0.9294 | 0.8917 |
| 3 | 0.05 | 0.95 | 838 | 753 | 0.0548 | 0.0518 | 0.9683 | 0.9459 |
| 11 | 0.05 | 0.8 | 1312 | 758 | 0.0484 | 0.0441 | 0.9666 | 0.7732 |
| 11 | 0.05 | 0.9 | 1547 | 939 | 0.0449 | 0.0479 | 0.9884 | 0.8708 |
| 11 | 0.05 | 0.95 | 1755 | 1103 | 0.0504 | 0.0492 | 0.9959 | 0.9268 |
| 3 | 0.01 | 0.8 | 752 | 688 | 0.0102 | 0.0102 | 0.8482 | 0.8053 |
| 3 | 0.01 | 0.9 | 932 | 860 | 0.0105 | 0.0107 | 0.9239 | 0.9002 |
| 3 | 0.01 | 0.95 | 1095 | 1018 | 0.0084 | 0.0105 | 0.9681 | 0.9483 |
| 11 | 0.01 | 0.8 | 1602 | 1064 | 0.0111 | 0.0090 | 0.9600 | 0.7832 |
| 11 | 0.01 | 0.9 | 1861 | 1276 | 0.0120 | 0.0089 | 0.9844 | 0.8771 |
| 11 | 0.01 | 0.95 | 2089 | 1466 | 0.0088 | 0.0109 | 0.9943 | 0.9357 |

n1: Sample size using Eq. (4) with P's = .5.
n2: Sample size using Eq. (6).
α1: Simulated significance level using n1.
α2: Simulated significance level using n2.
π1: Simulated power using n1.
π2: Simulated power using n2.

4. Simulation study

To establish the efficacy of the sample size procedure, two data sets from multinomial distributions were simulated ten thousand times using SAS. The multinomial distribution had either three or eleven cells and the simulation used both sample sizes, assuming $P_1 = P_2 = .5$, computed using Eq. (4) and the more liberal Eq. (5) for $\alpha = .10, .05, .01$ and $\pi = .8, .9, .95$. For the three cells, the proportions were .5, .25, .25 in both populations to verify the proper significance level and were changed to .4, .3, .3 in the second population to investigate power. For eleven cells, when verifying the significance level both populations had the proportions .5 in the first cell and .05 in the other ten. When investigating power the second population's proportions were changed to .4 in the first cell and .06 in the others. These patterns were selected since, other than the first cell, the other cells are as close as possible, creating the worst-case scenario that minimizes the overall Chi-square statistic for the specified proportions in the first cell. Many and perhaps a majority of the most frequently used contingency table Chi-square tests comparing two or more groups or populations can be formulated as cell-wise comparisons of multinomial populations. Consequently only the basic test for the comparison of two multinomial populations was used in the Monte Carlo study.

Table 1 tabulates the simulation rejection proportions. The simulated significance levels were consistently quite close to the nominal values for both the three and eleven category simulations. For simulations of three cells, the smaller sample size from the adjusted formula, Eq. (5), had simulated power that was very close to the nominal value. The largest simulated deviation from the nominal power was .0083.
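The sample size formulas and the Monte Carlo check described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SAS program used in the study: the function names, the default of 2000 replications, and the random seed are hypothetical, and the central Chi-square critical value is passed in by hand (read from an ordinary table, as the method intends) because the Python standard library has no Chi-square quantile function.

```python
import math
import random
from statistics import NormalDist


def sample_size(chi2_crit, df, power, p1, p2, delta, adjusted=True):
    """Per-group n from Eq. (5) (adjusted=True) or Eq. (4) (adjusted=False).

    chi2_crit is the upper-alpha critical value of the central chi-square
    with df degrees of freedom, taken from a table. The result is not
    rounded; in practice it would be rounded up.
    """
    z_beta = NormalDist().inv_cdf(power)      # Z_beta, where power = 1 - beta
    shift = (df - 1) if adjusted else 0       # E[U_2(df - 1)] in Eq. (5)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (math.sqrt(chi2_crit - shift) + z_beta) ** 2 * variance / delta ** 2


def chi2_two_samples(counts1, counts2):
    """Pearson chi-square for two multinomial samples of equal size.

    With equal sample sizes the statistic reduces to
    sum over cells of (a - b)^2 / (a + b); empty cells are skipped.
    """
    return sum((a - b) ** 2 / (a + b)
               for a, b in zip(counts1, counts2) if a + b > 0)


def draw_counts(rng, probs, n):
    """Draw one multinomial sample of size n and return the cell counts."""
    counts = [0] * len(probs)
    for cell in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[cell] += 1
    return counts


def rejection_rate(probs1, probs2, n, chi2_crit, reps=2000, seed=1):
    """Proportion of simulated tests whose statistic exceeds chi2_crit."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        stat = chi2_two_samples(draw_counts(rng, probs1, n),
                                draw_counts(rng, probs2, n))
        rejections += stat > chi2_crit
    return rejections / reps


if __name__ == "__main__":
    crit = 5.99146  # tabled upper .05 point of chi-square with 2 df
    n = math.ceil(sample_size(crit, df=2, power=0.80,
                              p1=0.5, p2=0.5, delta=0.10))
    null = (0.5, 0.25, 0.25)
    alt = (0.4, 0.3, 0.3)
    print("n per group:", n)
    print("simulated level:", rejection_rate(null, null, n, crit))
    print("simulated power:", rejection_rate(null, alt, n, crit))
```

Passing `adjusted=False` gives the conservative Eq. (4) size for the same inputs. Because the sketch uses an unrounded $Z_{\beta}$ and its own rounding rule, its sample sizes may differ by a unit or so from the tabulated values, and with far fewer replications than the study the simulated rates should only be expected to land near those in Table 1.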
The simulated power using the conservative (larger) sample size from Eq. (4) always exceeded the nominal power. For the eleven cells, using Eq. (5) the simulated power was very close to the nominal power, but averaged 2.6% below the specified power. The largest deviation for eleven cell power simulations was 3.91% below nominal power. As with the three cell simulations, Eq. (4) provided conservative sample sizes much larger than those needed to achieve the designated power.

The power criterion for Eq. (5) is for detecting if the difference between any two corresponding multinomial cells is at least as large as some specified value. This criterion is more easily understood than the non-centrality parameter by most researchers. Using this criterion to calculate the non-centrality parameter (see Guenther (1977)), Table 2 provides a comparison of the sample sizes calculated using Eq. (5) to those obtained using the usual non-central Chi-square. Observe that the sample sizes computed using Eq. (5) are usually smaller than those using the non-central Chi-square. The difference increases with the number of multinomial categories or cells. From the simulation study, however, the smaller sample sizes from Eq. (5) only slightly reduce the power from the nominal value, an average of .0046 for three cells and an average of .0259 for eleven cells. Even though the power is slightly below the nominal value, considering the more understandable power criterion and much easier computation, Eq. (5) is an attractive alternative to other methods for determining sample size for testing multinomial populations.

Table 2
Sample size comparison

| k | Method | α = .05, π = .80 | α = .05, π = .90 | α = .05, π = .95 | α = .01, π = .80 | α = .01, π = .90 | α = .01, π = .95 |
|---|---|---|---|---|---|---|---|
| 3 | Eq. (5) | 473 | 619 | 753 | 688 | 860 | 1018 |
| 3 | N.C. χ² | 482 | 632 | 772 | 694 | 872 | 1033 |
| 11 | Eq. (5) | 758 | 939 | 1103 | 1064 | 1276 | 1466 |
| 11 | N.C. χ² | 812 | 1026 | 1219 | 1109 | 1349 | 1563 |

5. Concluding comments

The usual methods for determining the appropriate sample size to meet a specified significance level and power for a Chi-square test are based on the non-central Chi-square distribution or a computer program. The concept of a non-centrality parameter for analysis of variance can be quite confusing to most researchers. The corresponding concept for categorical data is even more vague, and it is difficult for the researcher to conceptualize and specify what is a meaningful difference between populations. Queries about the magnitude of the non-centrality parameter are followed by confusion since the concept is not intuitive to most researchers. The criterion suggested in this paper, the largest deviation between corresponding population cells, i.e. $\Delta$, replaces the non-centrality parameter in computing the sample size and is intuitive and far easier for the experimenter to understand and specify. The computation of the sample size from Eq. (5) is extremely straightforward and does not require complicated mathematics, extensive tables or a computer program. The Monte Carlo simulations demonstrate that this technique provides sample sizes that come very close to satisfying the specified power. While the procedure was developed for multinomial populations, many multi-dimensional contingency table analyses can be reformulated for analysis as multinomial data, as was done in Example 3. The simple, accurate technique in this paper is understandable to most researchers and should provide the statistician or consultant a handy and useful tool for determining sample sizes for a variety of Chi-square tests.

Appendix

Under the null hypothesis $H_0: \Delta = 0$, for large samples, the test statistic $Z_i$ in Eq. (1) becomes

$$Z_i = \frac{\hat{P}_{1i} - \hat{P}_{2i}}{\sqrt{\dfrac{P_{1i}(1-P_{1i}) + P_{2i}(1-P_{2i})}{n}}}$$

and is approximately normally distributed.
Then $U(1) = Z_i^2$ has an approximate Chi-squared distribution and $P\left(|Z_i| > \sqrt{\chi^2_{\alpha}(df)}\right) = \alpha$. Under the alternative hypothesis, $H_A: P_1 - P_2 = \Delta > 0$, Eq. (3), $P\left(U_1(1) > \chi^2_{\alpha}(df) \mid \Delta > 0\right) = \pi$, becomes

$$P\left(|Z_i| > \sqrt{\chi^2_{\alpha}(df)} \;\middle|\; \Delta > 0\right) = \pi$$

or

$$P\left(\frac{\hat{P}_{1i} - \hat{P}_{2i}}{\sqrt{\dfrac{P_{1i}(1-P_{1i}) + P_{2i}(1-P_{2i})}{n}}} > \sqrt{\chi^2_{\alpha}(df)} \;\middle|\; \Delta > 0\right) + P\left(\frac{\hat{P}_{1i} - \hat{P}_{2i}}{\sqrt{\dfrac{P_{1i}(1-P_{1i}) + P_{2i}(1-P_{2i})}{n}}} < -\sqrt{\chi^2_{\alpha}(df)} \;\middle|\; \Delta > 0\right) = \pi.$$

It is assumed that $\Delta = P_1 - P_2 > 0$ is sufficiently large that the second probability term is much smaller than the first and is considered to be approximately zero. The test statistic is calculated under the null hypothesis and is

$$Z_i = \frac{\hat{P}_{1i} - \hat{P}_{2i}}{\sqrt{\dfrac{P_{1i}(1-P_{1i}) + P_{2i}(1-P_{2i})}{n}}}.$$

Then

$$P\left(\frac{\hat{P}_{1i} - \hat{P}_{2i}}{\sqrt{\dfrac{P_{1i}(1-P_{1i}) + P_{2i}(1-P_{2i})}{n}}} > \sqrt{\chi^2_{\alpha}(df)} \;\middle|\; \Delta > 0\right) = P\left(\frac{\hat{P}_{1i} - \hat{P}_{2i} - \Delta}{\sqrt{\dfrac{P_{1i}(1-P_{1i}) + P_{2i}(1-P_{2i})}{n}}} > \sqrt{\chi^2_{\alpha}(df)} - \frac{\Delta}{\sqrt{\dfrac{P_{1i}(1-P_{1i}) + P_{2i}(1-P_{2i})}{n}}} \;\middle|\; \Delta > 0\right) = \pi.$$

The left-hand side of the inequality is approximately distributed as a standard normal variable $Z$, i.e.

$$P\left(Z > \sqrt{\chi^2_{\alpha}(df)} - \frac{\Delta}{\sqrt{\dfrac{P_{1i}(1-P_{1i}) + P_{2i}(1-P_{2i})}{n}}}\right) = \pi.$$

But $P\left(Z > -Z_{\beta}\right) = \pi$, where $P\left(Z > Z_{\beta}\right) = \beta$ and $\pi = 1 - \beta$. Then

$$\sqrt{\chi^2_{\alpha}(df)} - \frac{\Delta}{\sqrt{\dfrac{P_{1i}(1-P_{1i}) + P_{2i}(1-P_{2i})}{n}}} = -Z_{\beta}.$$

Solving for $n$,

$$n = \frac{\left(\sqrt{\chi^2_{\alpha}(df)} + Z_{\beta}\right)^2 \left(P_{1i}(1-P_{1i}) + P_{2i}(1-P_{2i})\right)}{\Delta^2}.$$

References

Adcock, C.J., 1995. The Bayesian approach to determination of sample sizes – some comments on the paper by Joseph, Wolfson and du Berger. The Statistician 44, 155–161.
Agresti, A., 1990. Categorical Data Analysis. Wiley, New York.
Brunier, H., Whitehead, J., 1993. PEST3: Operating Manual. University of Reading.
Chow, S., Shao, J., Wang, H., 2003. Sample Size Calculations for Clinical Research. Marcel Dekker, New York.
Fleiss, J.L., 1981. Statistical Methods for Rates and Proportions, 2nd ed. Wiley, New York.
Gilliland, F.D., Mahier, R., Hunt, W.C., Davis, S.M., 1999. Preventive health care among rural American Indians in New Mexico. Preventive Medicine 28 (2), 294–202(9).
Goldberg, D.P., 1972. The detection of psychiatric illness by questionnaire: A technique for the identification and assessment of non-psychotic psychiatric illness. In: Maudsley Monograph, vol. 21. Oxford University Press, Oxford, p. 126.
Greenland, S., 1983. Tests of interaction in epidemiological studies: A review and study of power. Statistics in Medicine 2, 243–251.
Greenland, S., 1985. Power, sample size and smallest detectable effect determination for multivariate studies. Statistics in Medicine 4, 117–127.
Grizzle, J.E., Starmer, C.F., Koch, G.G., 1969. Analysis of categorical data by linear models. Biometrics 25, 489–504.
Guenther, W.C., 1977. Power and sample size for approximately chi-square tests. American Statistician 31, 83–85.
Haber, M., 1983. Sample size for the exact test of 'no interaction' in a 2 × 2 table. Biometrics 39, 493–498.
Haynam, G.E., Govindarajulu, Z., Leone, F.C., 1970. Tables of the cumulative non-central chi-square distribution. In: Harter, H.L., Owens, D.B. (Eds.), Selected Tables in Mathematical Statistics. Markham, Chicago.
Joseph, L., Wolfson, D.B., du Berger, R., 1995a. Sample size calculations for binomial proportions via highest posterior density intervals. The Statistician 44, 143–154.
Joseph, L., Wolfson, D.B., du Berger, R., 1995b. Some comments on Bayesian sample size determination. The Statistician 44, 167–171.
Lachin, 1977. Sample size determination for r × c comparative trials. Biometrics 33, 315–324.
Lindley, D.V., 1997. The choice of sample size. The Statistician 46 (2), 129–138.
Meng, R.C., Chapman, D.G., 1966. The power of Chi-square tests for contingency tables. Journal of the American Statistical Association 61, 965–975.
Pham-Gia, T., 1995. Sample size determination in Bayesian statistics – a commentary. The Statistician 44, 163–166.
Rochon, J., 1989. The application of the GSK method to the determination of minimum sample size. Biometrics 45, 193–205.
Schwertman, N.C., 1987. An alternative procedure for determining analysis of variance sample size. Communications in Statistics. Simulation and Computation 16 (4), 957–967.
Schwertman, N.C., Owens, M.A., 1989. Simple approximation of sample size for the bivariate normal. Computational Statistics & Data Analysis 8, 201–207.
Schwertman, N.C., Schenk, K.L., 1991. Determining analysis of variance sample size for detecting specified trends and interactions. Communications in Statistics. Theory and Methods 20 (2), 527–538.
Whitehead, J., 1993. Sample size calculations for ordered categorical data. Statistics in Medicine 12, 2257–2271.
Wong, S.Y.S., Lau, E.M.C., Lynn, H., Leung, P.C., Woo, J., Cummings, S.R., Orwoll, E., 2005. Depression and bone mineral density: Is there a relationship in elderly Asian men. Osteoporosis International 16 (6), 610–615.
Zar, J.H., 1999. Biostatistical Analysis, 4th ed. Prentice-Hall, Upper Saddle River, NJ 07458.