Reviews on the Determining Sample Size using Statistical Method
1 Bog-Ja Jo, 2 Hee-Hwa Oh, *3 Kyoungho Choi
1 Department of Business Administration, California International Business University, [email protected]
2 School of Management Administration, Chonbuk National University, [email protected]
*3 Corresponding Author, Department of Basic Medical Science, Jeonju University, [email protected]

Abstract
This paper explains why sample size determination matters and introduces several software packages and web programs for it. It also examines problems with power analysis in previously published papers. The results are as follows. Most studies did not perform a power analysis before conducting clinical trials. As a result, their analyses were less reliable because of low test power. This is a matter of serious concern. The domestic (Korean) and overseas software and web programs introduced in this study should be useful for researchers who need to determine sample sizes.

Keywords: Sample size determination, power analysis, type I error

1. Introduction
Statistics is now used in almost every academic area, including the social sciences, the natural sciences, and the health and medical sciences. However, when sample sizes are determined for research or experiments, often only the type I error (α) is considered, and in some cases sample sizes are chosen without even that consideration. This is especially noticeable in the social sciences, where a convenient sample size is often chosen first and the hypothesis is then tested at a conventional significance level (usually 5%). In such cases the analysis becomes less reliable because the test power is unknown. The health and medical fields, on the other hand, use methods for choosing reliable sample sizes.
As the need for clinical trials grows in the health field, both α and test power (1 − β) are often considered when calculating sample sizes for clinical trials [15]. [3] provides tables of the sample sizes needed for statistical analysis. However, tables can cover only a limited range of sample sizes and analyses, and according to [15], calculating sample sizes directly is troublesome because it involves complicated calculations with α and test power (1 − β). To simplify this, software that determines the sample sizes needed for clinical trials was developed. Furthermore, as Internet technology improved and CGI and PHP became accessible, web program services also became available [10]. Various fee-charging and free software and web program services are described in detail in [5], together with the range of sample size calculations they cover. For domestic (Korean) research, [12] provides a web program and web site that calculate the sample size needed for mean and proportion tests. This work is very useful, given that few studies address the topic. Its weakness is that it provides less information than overseas programs do.

The goal of this paper is to introduce the practical use of software that helps calculate sample sizes accurately under given conditions (form of hypothesis, significance level, effect size, etc.) when conducting clinical trials. [12] and GPower3.1 [6] are used as the domestic and overseas software, respectively. Both programs are also used to identify problems in previously published domestic research in the health and medical fields and to offer suggestions about them. The second chapter reviews the process of calculating sample sizes; although many methods exist, they follow a similar logic.
The practical use of GPower3.1 is introduced in the third chapter. Problems with the calculation of sample sizes and power in previously published papers are discussed in the fourth chapter. The fifth chapter, drawing on the preceding chapters, recommends determining sample sizes by statistical methods.

Journal of Convergence Information Technology (JCIT), Volume 8, Number 12, July 2013, doi:10.4156/jcit.vol8.issue12.32

2. Sample Sizes Determination Methods
The process of deriving a sample size from statistical theory is as follows. First, define the null and alternative hypotheses and decide on the test type (one-sided or two-sided). Second, choose the appropriate statistical test based on how the independent and dependent variables in the hypothesis are measured. Third, set the expected size of the difference and, if necessary, consider the reliability of the measurement. Fourth, decide on α and β (or power). Fifth, compute the necessary number of samples from the given equations. The equations for calculating sample sizes vary from study to study; the following are those given by [12].

If X1, X2, ⋯, Xn is a random sample from N(μ, σ²) and σ² is known, then for the mean test with null hypothesis H0: μ = μ0 and alternative hypothesis H1: μ ≠ μ0, the sample size n that satisfies significance level α and attains test power of at least 1 − β is given by equation (1), where μ1 denotes the population mean under the alternative hypothesis. Under the alternative hypothesis, the test power is given by equation (2).

\[ n = \frac{(z_{\alpha/2} + z_{\beta})^{2}\,\sigma^{2}}{(\mu_0 - \mu_1)^{2}} \tag{1} \]

\[ 1 - \beta = 1 - \Pr\!\left(Z > -z_{\alpha/2} + \frac{\sqrt{n}\,|\mu_0 - \mu_1|}{\sigma}\right) \tag{2} \]

The sample size required when an effect size, δ = |μ0 − μ1|/σ, is used is given by equation (3).
\[ n = \frac{(z_{\alpha/2} + z_{\beta})^{2}}{\delta^{2}} \tag{3} \]

Next, assume that X1, X2, ⋯, Xn1 is a random sample from a normal distribution N(μ1, σ1²), that Y1, Y2, ⋯, Yn2 is a random sample from a normal distribution N(μ2, σ2²), and that Xi and Yi are independent of each other. For convenience, assume n1 = n2 = n and σ1 = σ2 = σ. The sample size n per group that attains power 1 − β at significance level α, with null hypothesis H0: μ1 − μ2 = 0 and alternative hypothesis H1: μ1 − μ2 ≠ 0, is given by equation (4), and the test power is given by equation (5), where μ1 and μ2 are the population means under the alternative hypothesis.

\[ n = \frac{2\,(z_{\alpha/2} + z_{\beta})^{2}\,\sigma^{2}}{(\mu_1 - \mu_2)^{2}} \tag{4} \]

\[ 1 - \beta = 1 - \Pr\!\left(Z > -z_{\alpha/2} + \frac{|\mu_1 - \mu_2|}{\sigma\sqrt{2/n}}\right) \tag{5} \]

In addition, [14] summarizes a test for superiority in clinical trials using equations (1) and (2), and [1] estimates sample size using the relative risk and odds ratio when comparing proportions in two populations. They are expressed a little differently, but the method of producing sample sizes is very similar. Using such formulas has one drawback, however: it requires a fairly complicated calculation process.

3. Software Introduction for Sample Size Determination
Determining the sample size for a clinical trial with due regard to the form of hypothesis, significance level, effect size, power, etc. is very important. However, researchers who are not familiar with statistics find it difficult and inconvenient to compute sample sizes from equation (1). For this reason, software (or a web program) that calculates the sample size accurately and quickly is very useful. This chapter therefore introduces the characteristics of free and fee-charging software and web program services and their methods for calculating sample sizes, based on [5]. The practical use of the free program [12] and of GPower3.1 is also introduced.
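As a rough illustration of what such software computes, equations (2) to (5) can be sketched in a few lines of Python using only the standard library. This is a minimal sketch under the normal approximation with known σ, not the implementation used by [12] or GPower3.1; the function names (z_upper, n_one_sample, n_two_sample, power_one_sample, power_two_sample) are hypothetical and chosen only for this example. Programs that work with the t distribution can be expected to return slightly larger sample sizes.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def z_upper(p: float) -> float:
    """Upper-tail quantile z_p, defined by Pr(Z > z_p) = p."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def n_one_sample(delta: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Equation (3): n = (z_{alpha/2} + z_beta)^2 / delta^2, rounded up."""
    beta = 1.0 - power
    return math.ceil((z_upper(alpha / 2) + z_upper(beta)) ** 2 / delta ** 2)


def n_two_sample(delta: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Equation (4), per group, written with delta = |mu1 - mu2| / sigma."""
    beta = 1.0 - power
    return math.ceil(2.0 * (z_upper(alpha / 2) + z_upper(beta)) ** 2 / delta ** 2)


def power_one_sample(n: int, delta: float, alpha: float = 0.05) -> float:
    """Equation (2): 1 - Pr(Z > -z_{alpha/2} + sqrt(n) * delta)."""
    return _STD_NORMAL.cdf(-z_upper(alpha / 2) + math.sqrt(n) * delta)


def power_two_sample(n: int, delta: float, alpha: float = 0.05) -> float:
    """Equation (5): 1 - Pr(Z > -z_{alpha/2} + delta / sqrt(2 / n))."""
    return _STD_NORMAL.cdf(-z_upper(alpha / 2) + delta * math.sqrt(n / 2.0))


if __name__ == "__main__":
    # Illustrative inputs only: medium effect size, alpha = 0.05, power = 0.8.
    n1 = n_one_sample(0.5)
    n2 = n_two_sample(0.5)
    print("one-sample n:", n1, "attained power:", round(power_one_sample(n1, 0.5), 3))
    print("two-sample n per group:", n2, "attained power:", round(power_two_sample(n2, 0.5), 3))
```

Rounding n up to the next integer means the attained power is at least the requested power, which can be confirmed by feeding the returned n back into the corresponding power function.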
3.1 Free and Fee-Charging Software and Web Program Services
(1) http://www.tulane.edu/%7Edunlap/psylib.html
• powmr.exe: computes power for multiple regressions
• power.exe: computes power for one-way ANOVA
• powr.exe: computes power for simple correlation
(2) http://www.psycho.uni-duessldorf.de/aap/projects/gpower/
• performs power analyses for some common statistical tests (t-test, F-test, chi-square)
(3) http://pages.infinit.net/rlevesqu/Syntax/SampleSize/SampleSizeForProportions.txt
• samples for proportions (SPSS syntax file)
(4) http://support.sas.com/faq/042/FAQ04291.html
• macro-power and sample size (SAS module for comparing two proportions)
(5) nQuery Advisor: http://www.statsol.ie/
(6) Power and Precision (logistic regression): http://power-analysis.com/
(7) Statistical Power Analysis: http://www.statsoft.com/products/power_an.html
(8) http://www.danielsoper.com/statcalc/calc05.aspx
• effect size calculator for multiple regression
• type II error calculator
(9) http://home.ubalt.edu/ntsbarsh/Business-stat/otherapplets/SampleSize.htm#rproptyp
• sample size for the test of one and two proportions
(10) http://www.math.yorku.ca/SCS/Online/power/
• power analysis for ANOVA designs

3.2 Introduction to Domestic and Foreign Software
Domestic programs for calculating sample sizes and power are available through web services [10], which made [12] accessible online. The services are offered free of charge at http://pluto.hallym.ac.kr/zsize. They calculate sample sizes and power for the one-sample mean test (when the effect size is specified, or when the hypothesis is specified), the independent two-sample test (when the effect size is specified, or when the hypothesis is specified), and the proportion test (one-sample proportion test and independent two-sample proportion test). Example 5 on page 254 of [16] is used to illustrate how to use program [12].
The problem is to calculate the minimum sample size that satisfies significance level 0.2 and power 0.8 when the hypothesis is specified by an effect size of 0.2. This can be solved by hand using equation (4), but as Figure 1 shows, [12] gives the answer more quickly and accurately.

Figure 1. Calculation sample size using web programming by [12]
Figure 2. The distribution-based approach of test specification in GPower 3.
Figure 3. The power plot window of GPower 3.1

As the only such service available domestically, [12] is very convenient, but it has the weakness of covering only two topics, the mean test and the proportion test. GPower3.1, introduced next, is free software that can calculate sample sizes and power for a wide variety of tests. Example 6 on page 255 of [16] is used for this introduction. The problem is to find the minimum sample size that satisfies significance level 0.05 and power 0.9. The example comes from a study of the correlation between cotinine and bone density according to the amount smokers smoke; a negative correlation between blood CO concentration and bone mineral density must be presumed in this study. As Figure 2 shows, entering the given information under Input Parameters in GPower3.1 yields the Output Parameters. GPower3.1 also provides a good deal of further information: the tab "Protocol of Power Analyses" presents the results of the analyses neatly, and the tab "X-Y Plot for a Range of Variables" plots how the sample size changes with various parameters.

4. Empirical Analysis
In statistical significance tests of mean differences, the type I error (α) is widely recognized and receives most of the attention. The type II error (β), by contrast, receives little attention, and this occasionally leads to larger errors in the results.
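How β responds to the sample size, the effect size and α can be checked numerically. The sketch below assumes the two-sided, equal-variance two-sample setting of equation (5) under the normal approximation; the function name beta_two_sample and the grid of values are illustrative assumptions, not results from the studies reviewed here.

```python
import math
from statistics import NormalDist


def beta_two_sample(n: int, delta: float, alpha: float) -> float:
    """Type II error beta = 1 - power, with power taken from equation (5).

    n     : sample size per group (n1 = n2 = n)
    delta : effect size |mu1 - mu2| / sigma
    alpha : two-sided significance level
    """
    std_normal = NormalDist()
    z_half_alpha = std_normal.inv_cdf(1.0 - alpha / 2)
    power = std_normal.cdf(-z_half_alpha + delta * math.sqrt(n / 2.0))
    return 1.0 - power


# Illustrative grid: beta for a medium effect size (delta = 0.5).
for alpha in (0.01, 0.05, 0.10):
    betas = [round(beta_two_sample(n, 0.5, alpha), 3) for n in (20, 40, 80)]
    print(f"alpha={alpha}: beta at n=20, 40, 80 per group -> {betas}")
```

Within each printed row β falls as n grows, and across rows β falls as α is relaxed, which is the trade-off between the two error types that a power analysis makes explicit.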
In general, the error β depends on the effect size (the difference between the two group means in units of σ), the sample size, the error α, etc., as can be seen in equation (5). To reduce β, one can increase the effect size, increase the sample size, or increase (relax) α. Calculating the sample size for a significance test of the difference between group means therefore involves both α and β. [3] and [11] provide a basis for determining the sample size, in which α is usually fixed at 0.05 or 0.01 and β at 0.2. This is a convention rather than a standard [2]. Nonetheless, among previously published studies in the health and medical fields, none calculates sample sizes with the above considerations in mind. It is thus easy to find studies that were not prepared thoroughly enough when determining sample sizes. For example, [9] recruited 14 people in their twenties, randomly divided them into groups for an experiment, and analyzed the results at the 5% significance level. However, if the effect size is presumed to be δ = 0.5, GPower3.1 readily shows that the minimum sample size for α = 0.05 and β = 0.2 is n = 34. The study was performed with only 14 subjects, fewer than half of what is required, so the analysis attained a power of only 0.57. The reliability of its results is therefore not high. Table 1 shows a few other studies that appear to lack preparation. In the table, the required minimum sample sizes are computed for α = 0.05 and β = 0.2, following [2].

Table 1. Underprepared example articles related to sample size determination

Author             Test parameter                 Sample size   Power   Required minimum sample size
Gong et al. [7]    r (correlation coefficient)    40            0.610   82
Jeon et al. [8]    μ1 − μ2 (independent)          153           0.802   128
Choi et al. [4]    μ1 − μ2 (paired)               17            0.706   27
Lee [13]           μ1 − μ2 (independent)          61            0.450   204

[7] performed a study to validate the correlation between static muscle endurance time and the joint working range. If a medium effect size of 0.3 is presumed, GPower3.1 gives n = 82 as the minimum sample size that attains power 0.8. In reality, a sample of n = 42 was used, which resulted in a power of 0.610. [8], on the other hand, used n = 153 to test whether satisfaction with physical therapy among parents of children with disabilities differs by gender. In this case the study used more samples than necessary by the common standard of power 0.8, because a sample size of n = 128 is enough to attain it. [4] examined whether paired samples differ. This study, too, had a small sample size, which produced an actual power of 0.706. What these studies have in common is that power analysis was omitted from the research design.

Unlike these, [13] presents a power analysis in its methods section: the minimum sample size was set at 30 based on the power analysis of Cohen [3]. However, this is the sample size for a paired test. For an independent test, the sample size has to be n = 204 to satisfy the presumptions of [13], effect size δ = 0.36 and power 0.7. Using the much smaller sample of n = 61 therefore resulted in a power of 0.450 and failed to meet the goal.

5. Conclusion and Suggestion
Nowadays, almost all studies in the health and medical fields use statistical methods to reach general conclusions. However, researchers often lack basic statistical knowledge and neglect sample size determination (power analysis) when designing their research. In this situation, statistical software (usually SPSS, SAS, etc.)
is typically used to obtain results, but researchers pay little attention to whether the required power is met. This is a matter of concern. Statistical software only reports the results of the commands it is given; it does not indicate whether the sample size satisfies the power requirement. According to statistical theory, the chance of retaining the null hypothesis increases when the sample size is small. Researchers therefore tend to increase the sample size despite the risk associated with α, which reduces the error β. When the sample size is too small, β increases and can produce a false result. Power analysis must therefore be used to determine the sample size.

This study introduced the practical use of domestic and overseas software that takes the hypothesis type, significance level, effect size, etc. into account to calculate quickly and easily the sample size required for statistical analysis in clinical trials. It also analyzed, through empirical analysis, problems with the calculation of sample size and power in previously published research. The findings were as follows: first, most studies omit power analysis prior to clinical trials, and instead of computing the minimum sample size that satisfies the power requirement, they choose sample sizes by convenience; second, most analyses had low power and therefore low reliability.

Statistical analysis is an inference process that estimates parameters from samples. For the results to be reliable, enough samples must be gathered, as theory requires. In this process, the domestic and overseas software and programs introduced in this paper will be very useful. We expect GPower3.1 and the other software introduced here to be used widely for determining sample size.

6. References
[1] Cho, S. K., Kang, W. and Che, S. S., “Sample size calculation based on the inference of relative risk and odds ratio”, Journal of The Korean Society of Health Information and Statistics, vol. 36, no. 1, pp. 109-121, 2011.
[2] Cho, N., Statistical errors and traps, C. A. J. Press, Seoul, Korea, 2001.
[3] Cohen, J., Statistical power analysis for the behavioral sciences, 2/e, Lawrence Erlbaum Associates, Publishers, Hillsdale, NJ, 1988.
[4] Choi, Y., Koh, Y. and Kang, Y., “Effects of dance-movement therapy program on stress, anxiety and depression reduction of middle aged women”, Korean Public Health Research, vol. 36, no. 1, pp. 9-16, 2010.
[5] Dattalo, P., Determining sample size, Oxford University Press, Oxford, NY, 2008.
[6] Faul, F., Erdfelder, E., Lang, A. G. and Buchner, A., “G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences”, Behavior Research Methods, vol. 39, no. 2, pp. 175-191, 2007.
[7] Gong, W., Lee, S. and Lee, Y., “The effect of cervical ROM muscle endurance on cervical joint mobilization of normal adults”, Journal of the Korean Society of Physical Medicine, vol. 5, no. 1, pp. 713, 2010.
[8] Jeon, J. K., Kim, B. H., “A survey of satisfaction of parents with handicapped children at physical therapy services-on the basis of Jeon-nam areas-1”, The Korean Academy of Physical Therapy Science, vol. 18, no. 3, pp. 41-51, 2011.
[9] Jung, S., “The effect comparison of Mckenzie exercise and conservative physical therapy on acute neck pain”, Journal of the Korean Society of Sports Physical Therapy, vol. 7, no. 1, pp. 9-16, 2011.
[10] Kang, H., Sim, S., “Implementation of estimation and inference on the web”, Communications of the Korean Statistical Society, vol. 7, no. 3, pp. 913-926, 2000.
[11] Kirk, R. E., Statistics: An introduction, 5/e, Belmont, CA: Thomson Wadsworth, 2008.
[12] Lee, C., Kang, H. and Sim, S., “An implementation of the sample size and the power for testing mean and proportion”, Journal of the Korean Data and Information Science Society, vol. 23, no. 1, pp. 53-61, 2010.
[13] Lee, J., “The effect of exercises program on sprit and sleep of old aged women”, CAU Nursing Journal, vol. 14, pp. 21-27, 2010.
[14] Lim, C. Y., Kwak, M., “The satisfaction considerations for superiority trial”, Journal of The Korean Society of Health Information and Statistics, vol. 36, no. 2, pp. 200-204, 2011.
[15] Rosner, B., Fundamentals of biostatistics, 7/e, Brooks/Cole Cengage Learning, Boston, MA, 2010.
[16] Shin, Y., Ahn, Y., Medical research methodology, Seoul National University Press, Seoul, Korea, 2008.