Running head: TUTORIAL IN STATISTICS: SAMPLE SIZE DETERMINATION 1
Tutorial in Statistics: Sample Size Determination for ANOVA and MANOVA

Amy Williams
University of Calgary

Introduction

ANOVA and MANOVA are two forms of statistical analysis that are widely used in statistical research today. ANOVA, which stands for analysis of variance, is an approach to analyzing data for two or more groups that involves breaking down the variance of the dependent variable into between-group and within-group components (Kerlinger & Lee, 2000). MANOVA, or multivariate analysis of variance, is similar to ANOVA; this approach, however, involves the analysis of two or more groups on two or more dependent variables, as well as the examination of the correlations that exist between these variables (Stevens, 2009).

In order for researchers to carry out either ANOVA or MANOVA effectively, certain factors must be taken into consideration. One of these factors is the determination of an appropriate sample size, or the number of participants needed to take part in a particular study. This tutorial sheds light on the importance of sample size in research, highlights important theoretical information about this topic, and gives an overview of how appropriate sample size determination is carried out for both ANOVA and MANOVA.

The Importance of Sample Size Determination

Determining sample size is an important aspect of research for several reasons. A sample that is too small wastes resources on a study that cannot answer its question, and a sample that is too large uses more resources than necessary (Lenth, 2001). Conducting a study can be expensive, so knowing ahead of time how many participants are needed allows researchers to anticipate potential costs (Kerlinger & Lee, 2000). There are also ethical concerns related to sample size.
According to Lenth (2001),

    An undersized experiment exposes the subjects to potentially harmful treatments without advancing knowledge. In an oversized experiment, an unnecessary number of subjects are exposed to a potentially harmful treatment, or are denied a potentially beneficial one (p. 187).

For these reasons, researchers and statisticians are advised to invest time in planning statistical studies and to ensure that the number of subjects involved will provide them with statistically significant results. Utts and Heckard (2006) encourage researchers to ask two important questions to help guide research design planning: How precise will my results be if my study contains a particular number of participants? How large does my sample need to be in order to obtain statistically significant results?

Important Theoretical Information

Despite the importance of sample size determination, the literature related to this topic is not extensive (Lenth, 2001). Introductory statistics textbooks generally provide overviews of this topic; the authors of these books tend to emphasize the use of larger sample sizes over smaller sample sizes. Kerlinger and Lee (2000) give an important reason for this preference: “the smaller the sample the larger the error, and the larger the sample the smaller the error” (p. 175). Utts and Heckard (2006) shed light on the greater accuracy and reduced uncertainty that larger sample sizes bring to research design and statistical studies, but they also caution students and researchers that, with a large sample, even a very small effect can produce a statistically significant result. Sample size determination is therefore not simply a matter of choosing a large sample because more benefits are associated with this choice; careful consideration of the overall statistical study is imperative.
How to Determine Sample Size

There are several steps involved in determining the sample size for a statistical study. Lenth (2001) sheds light on the power approach, which involves five important steps: first, the researcher must identify both the null and the alternative hypothesis; second, he or she must decide on a significance level; third, the researcher must decide on an effect size; fourth, he or she must gather missing values from related studies or published literature; and fifth, he or she must decide on the power value for the study. Kerlinger and Lee (2000) focus on three similar aspects of sample size determination: calculating the actual or estimated population standard deviation (this information can come from previous studies), identifying the amount of error that will be tolerated, and estimating the probability of making a Type I error. In addition, Kerlinger and Lee (2000) offer a formula for sample size determination, which is given in Table 1.

Lenth (2001) stresses the importance of power analysis in estimating an appropriate sample size, stating that this method is “one of the most popular approaches to sample size determination” (p. 187). Power, after all, is an integral component of statistical analysis because it describes what the analysis is meant to achieve: the ability to detect statistical differences when they in fact exist. Stevens (2009) refers to power as “the probability of making a correct decision” (p. 162).

Table 1
Sample Size Estimation Equation

$$ n = \frac{Z^2 \sigma^2}{d^2} $$

Z = standard score corresponding to the specified probability of risk
σ = the standard deviation of the population
d = specified deviation

Note. From Foundations of Behavioral Research (4th ed.) by F.N. Kerlinger and H.B. Lee (2000). Belmont, CA: Cengage Learning, p. 297.
Copyright 1992 by Fred N. Kerlinger.

Sample Size Determination Example: ANOVA

Description of the Study

A group of educational researchers in Kuwait wishes to conduct an experiment involving fifth-grade students who attend English-speaking private schools. This study would require students, both boys and girls, to take part in a two-week (one 90-minute period a day) Social Studies instructional session in which they would be randomly placed in one of three groups:

Group 1: No Study Plan (Control Group)
Group 2: Implementation of Social Studies “Study Plan”
Group 3: Memorization

The purpose of this study is to determine whether having students take part in a Social Studies “Study Plan” prior to writing a curriculum-based assessment would enhance their performance on this assessment compared to students who do not take part in a study plan or who are simply encouraged to memorize the notes they have taken during the two-week period. A study plan involves a contract that students sign in class and then take home to have their parents sign; in it, students commit to reviewing their notes using different strategies several times a week before a scheduled test. Strategies include, but are not limited to, comparing and contrasting ideas that students have learned using a Venn diagram, summarizing main ideas, or illustrating an important concept. The study plan is meant to elicit higher-level thinking and make students aware of their own learning styles in an attempt to discourage them from simply memorizing their notes.

How many students should be placed in each group in order for the group of researchers to detect a statistically significant result?
To answer this question, the group of researchers will follow the aforementioned steps outlined in Lenth’s (2001) power approach and calculate the sample size using Kerlinger and Lee’s (2000) sample size estimation formula (the researchers could also use one of the online sample size calculators listed in the Online Resources section of this tutorial).

Power Analysis

First, the group of researchers will propose a hypothesis. For this particular study, they predict that the mean scores for both Group 1 and Group 3 will be lower than the mean score for Group 2. The null hypothesis, of course, states that there will be no difference between the groups. Next, the researchers will choose a significance level. In the social sciences, a significance level of .05 is usually chosen for statistical studies, which means that the researcher accepts at most a 5% chance of making a Type I error, that is, rejecting the null hypothesis when it is in fact true (Stevens, 2009). The researchers will also choose an effect size. According to Stevens (2009), the effect size refers to “how much of a difference the treatments make, or the extent to which the groups differ in the population on the dependent variable(s)” (p. 163). Generally, small or medium effect sizes are chosen in research pertaining to the social sciences (Stevens, 2009). A medium effect size is 0.5. Next, the population standard deviation is either estimated or derived from past research; for this particular study, the researchers estimate a standard deviation of 0.75. Finally, they decide on the power value. Lenth (2001) considers a power of .80 common in statistical studies, and the researchers of this particular study have chosen it as their target.

$$ n = \frac{Z^2 \sigma^2}{d^2} $$

In this formula, Z represents the standard score associated with the level of significance, or risk (Kerlinger & Lee, 2000).
According to the t-distribution probability chart in the appendix of Kerlinger and Lee’s (2000) Foundations of Behavioral Research, Z = 1.96 for a significance level of .05. The aforementioned standard deviation of the population (σ) is 0.75, and the specified deviation (d), which identifies the precision the researcher hopes to obtain, is set at 0.3.

$$ n = \frac{Z^2 \sigma^2}{d^2} = \frac{(1.96)^2 (0.75)^2}{(0.3)^2} = \frac{(3.8416)(0.5625)}{0.09} \approx \frac{2.16}{0.09} \approx 24 $$

The number of participants needed for each group for this particular ANOVA is 24 (or 72 participants in total). This formula is an effective method of obtaining the required sample size for a study involving random sampling; a variation of this formula exists for studies in which the size of the population from which the sample will be taken is known (Kerlinger & Lee, 2000).

Sample Size Determination Example: MANOVA

Stevens (2009) presents a condensed table that indicates the number of subjects needed per group in a MANOVA depending on the desired effect size (see Table 2). More comprehensive tables for three-, four-, five-, and six-group MANOVA, adapted from those Lauter (1978) created, are included at the back of Stevens’ (2009) Applied Multivariate Statistics for the Social Sciences; these tables are invaluable to both students and researchers because they minimize the estimation and calculation that formulas such as the one above require. In order to use these statistical tables effectively, a researcher must make three decisions: the number of groups required for the study, the effect size, and the significance level (either 0.05 or 0.01). Once these decisions have been made, the sample size for MANOVA can be determined.
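As a hedged illustration of the ANOVA calculation worked through earlier, the short Python sketch below evaluates Kerlinger and Lee’s (2000) formula with the values used in that example (Z = 1.96, σ = 0.75, d = 0.3). The function names `sample_size_per_group` and `z_for_alpha` are hypothetical and do not come from any of the cited sources, and the two-tailed lookup of Z from the standard normal distribution is an assumption about how the value of 1.96 was obtained.

```python
from statistics import NormalDist


def z_for_alpha(alpha: float) -> float:
    """Two-tailed standard score for a significance level (assumed normal)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    return NormalDist().inv_cdf(1 - alpha / 2)


def sample_size_per_group(z: float, sigma: float, d: float) -> float:
    """Evaluate n = (Z^2 * sigma^2) / d^2 without rounding."""
    if d <= 0:
        raise ValueError("the specified deviation d must be positive")
    return (z ** 2 * sigma ** 2) / d ** 2


# Values from the ANOVA example in this tutorial.
n = sample_size_per_group(z=1.96, sigma=0.75, d=0.3)
print(round(n, 2))   # 24.01
print(round(n) * 3)  # 72 participants across the three groups
```

The tutorial rounds 24.01 to 24 participants per group; a researcher who prefers to round up to the next whole participant would plan for 25 per group instead.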
Table 2
Sample Size Estimation for MANOVA (Subjects Needed per Group)

                      Number of Groups
Effect Size       3          4          5          6
Very Large      12-16      14-18      15-19      16-21
Large           25-32      28-36      31-40      33-44
Medium          42-54      48-62      54-70      58-76
Small           92-120     105-140    120-155    130-170

Note. From Applied multivariate statistics for the social sciences (5th ed.) by J.P. Stevens. New York, NY: Taylor & Francis Group. Copyright 2009.

Description of the Experiment

A researcher wants to determine the impact that both marital status and living in housing provided by the school (independent variables) have on overseas teachers’ ratings of job satisfaction and length of employment in Kuwait (dependent variables). The researcher randomly obtains participants for this study from the various international schools in Kuwait and has them fill out a survey that includes questions about job satisfaction (for example, “overall, how would you rate your experience at your particular school”) and length of employment. Upon receiving completed surveys, he plans on dividing the participants into four groups based on the independent variables:

Group 1: Married and living in school accommodations
Group 2: Single and living in school accommodations
Group 3: Married and living in own accommodations
Group 4: Single and living in own accommodations

He has chosen a power of .80 and has decided upon a medium effect size (basing both decisions on what is common in most social science studies; he could also have researched past studies on a similar topic and chosen an effect size this way). According to the statistical table in the appendix of Stevens’ (2009) book, this researcher would require 50 participants per group (200 participants in total) in order for his study to produce statistically significant results.
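The table lookup described above can also be sketched in code. The Python fragment below simply stores the per-group ranges printed in Table 2 and retrieves the range for a chosen effect size and number of groups; the names `TABLE_2` and `per_group_range` are hypothetical, and the sketch covers only the condensed table, not the more detailed appendix tables from which the value of 50 was taken.

```python
# Per-group sample size ranges as printed in Table 2 (Stevens, 2009).
TABLE_2 = {
    "very large": {3: (12, 16), 4: (14, 18), 5: (15, 19), 6: (16, 21)},
    "large": {3: (25, 32), 4: (28, 36), 5: (31, 40), 6: (33, 44)},
    "medium": {3: (42, 54), 4: (48, 62), 5: (54, 70), 6: (58, 76)},
    "small": {3: (92, 120), 4: (105, 140), 5: (120, 155), 6: (130, 170)},
}


def per_group_range(effect_size: str, groups: int) -> tuple[int, int]:
    """Return the (low, high) subjects-per-group range from Table 2."""
    key = effect_size.strip().lower()
    if key not in TABLE_2:
        raise ValueError(f"unknown effect size: {effect_size!r}")
    if groups not in TABLE_2[key]:
        raise ValueError("Table 2 covers only 3 to 6 groups")
    return TABLE_2[key][groups]


# The four-group, medium-effect study described in this example.
low, high = per_group_range("medium", 4)
print(low, high)          # 48 62
print(low <= 50 <= high)  # True: 50 per group falls inside the printed range
```

For this design, the value of 50 participants per group taken from the appendix table lies within the 48-62 range that the condensed table gives for four groups and a medium effect size.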
If the researcher decides to change the study so that there are now four dependent variables (‘job satisfaction,’ ‘length of employment,’ ‘overall school rating,’ and ‘work environment rating’), and he now anticipates a large effect size, the number of participants required per group would decrease to 37. Sample size determination for MANOVA is therefore dependent upon the number of dependent variables as well as the effect size and the chosen power value. This method of sample size determination is referred to as a priori estimation, which is a method that relies extensively on power values (Stevens, 2009).

Summary and Conclusions

Determining the sample size for a statistical study is an important aspect of a quality research design; it is also a difficult process (Lenth, 2001). Several methods for determining sample sizes for ANOVA, MANOVA, and other analyses exist, including the power approach, the random sampling formula, and a priori estimation. Each of these methods requires the researcher to know the effect size and power, and each is used before data for a particular statistical study are collected. When used with careful consideration and planning, these methods are effective tools.

The post hoc estimation of power is also an option for determining sample size; this method, however, is used after a study has actually been carried out and involves the researcher interpreting the results and identifying the effect that sample size and effect size have on the power (Stevens, 2009). Not all researchers and statisticians favour this method. Lenth (2001) cautions against retrospective planning such as post hoc power estimation, stating that the goal of this method involves “collect[ing] enough additional data to obtain statistical significance, while ignoring scientific meaning” (p. 191).
Regardless of which sample size estimation method a researcher chooses, he or she must acknowledge that sample size, effect size, and power are dependent on one another and that in order to estimate or determine one, information about the other two is needed. A wealth of information, statistical software, and web resources is available to help researchers determine the appropriate sample size needed for their statistical studies so that this important task, as daunting as it may seem, is not overlooked or neglected altogether.

Online Resources

Java Applets for Power and Sample Size: http://www.stat.uiowa.edu/~rlenth/Power/
IBM SPSS Sample Power Program: http://www-01.ibm.com/software/analytics/spss/products/statistics/samplepower/
National Statistical Service Sample Size Calculator: http://www.nss.gov.au/nss/home.nsf/pages/Sample+Size+Calculator+Description?OpenDocument

References

Kerlinger, F. N., & Lee, H. B. (2000). Foundations of behavioral research (4th ed.). Belmont, CA: Cengage Learning.

Lauter, J. (1978). Sample size requirements for the T² test of MANOVA (tables for one-way classification). Biometrical Journal, 20, 389-406.

Lenth, R. V. (2001, August). Some practical guidelines for effective sample size determination. The American Statistician, 55, 187-193.

Stevens, J. P. (2009). Applied multivariate statistics for the social sciences (5th ed.). New York, NY: Taylor & Francis Group.

Utts, J. M., & Heckard, R. F. (2006). Statistical ideas and methods. United States: Thomson Brooks/Cole.