George Menexes , Iannis Papadimitriou
POWER ANALYSIS AND SAMPLE SIZE DETERMINATION IN THE CONTEXT OF CORRESPONDENCE ANALYSIS

George Menexes, Iannis Papadimitriou
University of Macedonia, Thessaloniki, Greece

Correspondence Analysis is considered mainly a descriptive technique for analyzing simple two-way and multi-way tables that contain some kind of correspondence or association between rows and columns. In the case of two variables whose data have been collected by simple random sampling, the statistical significance of the total inertia of the contingency table can be tested by means of the χ² distribution. If the total inertia is regarded as an index of global effect size, the observed power of the χ² test of independence can be estimated. Within the framework of Statistical Power Analysis proposed by Cohen, the minimum required sample size can be estimated a priori, so that a predetermined effect size is detected as statistically significant by the χ² test at significance level α with power 1−β. In this study we introduce the concept of dynamic inertia and propose a methodology for a priori and post hoc Power Analysis of the χ² test, using the square root of the total inertia and the dynamic inertia as measures of effect size. With the proposed methodology, the minimum required sample size can be estimated for a survey or an experimental study to which Correspondence Analysis will be applied. Finally, a generalization of the method to the multivariate case is attempted.

Keywords: Chi-square test, effect size, total inertia, dynamic inertia

0 INTRODUCTION

A conceptual presentation of the methodology used in Correspondence Analysis (CA) can be given in a great variety of ways (see Benzécri 1992, Greenacre 1993, Gifi 1996, Nishisato 1980, Israëls 1987). That is probably the main reason why the method "surfaced" on several occasions during the 20th century (Michailidis & De Leeuw, 1998; Clausen, 1998).
CA is mainly regarded as a descriptive method for examining the relationship between two or more categorical variables. Its results provide information similar in nature to that produced by factor analysis techniques and allow us to study the structure of the variables included in the analysis. One characteristic of the method is that the data from one sample are treated as if they represented the whole population under study. Nevertheless, in the case of two variables whose data have been collected by simple random sampling, the statistical significance of the contingency table's total inertia can be tested via the χ² distribution (Lebart, et al., 1977; Lebart, et al., 2000). This is why many authors (Weller & Romney, 1990; Greenacre, 1993; Van de Geer, 1993; Blasius, 1994; Micheloud, 1997; Clausen, 1998) combine the application of CA with the χ² test of independence.

By tradition, statistical hypothesis testing in scientific research has shown an obvious preference for statistical significance as the criterion for rejecting or not rejecting the null hypothesis H0 (Huck, 2000), with the result that more emphasis has been placed on controlling Error Type I. In recent years, however, and particularly following Cohen's studies (1962, 1965, 1988) on Statistical Power Analysis in the Behavioral Sciences, researchers have begun to turn their attention to Error Type II as well, and to the need to analyze the power of statistical tests (Cohen, 1988; Murphy & Myors, 1998).
This need has extended beyond the Behavioral Sciences into other scientific fields as well (see Muller, et al., 1992; Hubbard & Armstrong, 1992; Verma & Goodale, 1995; Buhl-Mortensen, 1996; Thomas & Juanes, 1996; Heidelbaugh & Nelson, 1996; Meyer & Mark, 1996; Miller, et al., 1997; Sheppard, 1999; Evans & Viengkham, 2001; Foster, 2001; Di Stefano, 2001; Nutahara, et al., 2001). Hallahan & Rosenthal (1996) and Sheppard (1999) give an informative introduction to the theory of Power Analysis of statistical tests, while Thomas & Krebs (1997) compare the software available for Power Analysis. Gatti & Harwell (1998) recommend using Power Analysis software instead of traditional power charts, which make numerical values hard to read and can lead to false estimates because linear interpolation is frequently required.

Within the framework of Statistical Power Analysis recommended by Cohen (1988), it is also possible to estimate a priori the minimum sample size required for the χ² test of independence of two categorical variables, at significance level α and with power 1−β, to detect a predefined Effect Size (ES) as statistically significant. Power Analysis of the χ² test has also been studied by other researchers (Meng & Chapman, 1966; Nathan, 1972; Guenther, 1977; Lachin, 1977). The methodologies they propose appear to solve the problem locally (e.g. for specific sampling or experimental designs and specific alternative hypotheses) rather than within a general methodological framework such as Cohen's, which, as we shall see later, also allows us to connect Power Analysis to the total inertia of the contingency table of two categorical variables.
In the present paper, we consider total inertia as an index of effect size, on the basis of which the observed power of the χ² test of independence can be estimated. We introduce the concept of dynamic inertia of a contingency table of two categorical variables and propose a methodology for a priori and post hoc Power Analysis of the χ² test, using the square root of the total inertia and the dynamic inertia as indexes of effect size. With the proposed methodology, it is possible to estimate the minimum required sample size for a sampling or experimental study whose data will be analyzed by Correspondence Analysis. In Chapters 1-4 we present the basic notation and the concepts related to statistical hypothesis testing (Error Type I, Error Type II, Error Type II½, Power, Observed Significance Level). Chapter 5 summarizes the main arguments of the criticism against statistical hypothesis testing. Chapter 6 briefly presents a priori and post hoc Power Analysis within the framework recommended by Cohen. Chapter 7 gives an overview of the Power Analysis methodology for the χ² test of independence of two categorical variables, and of the relation between the total inertia of the corresponding contingency table and the effect size defined by Cohen for this test. In Chapter 8 we suggest ways of predetermining the effect size and highlight the relationships between total inertia and other contingency coefficients. In Chapter 9 we introduce the concept of dynamic inertia of a contingency table of two categorical variables. Chapter 10 attempts to generalize the method for calculating sample size to the case of multiple variables.
In Chapter 11 we present the Power Analysis methodology for the χ² goodness of fit test, in order to determine the number of significant factorial axes to retain after CA is applied to a contingency table of two variables. Finally, Chapter 12 contains numerical examples (the calculations were made with MS-Excel and the add-in Π-face).

1 THE STATISTICAL SIGNIFICANCE OF TOTAL INERTIA

Let X and Y be two categorical variables with k and l categories respectively. We denote by:
F: the k×l contingency table of absolute frequencies with elements f_ij (i=1,…,k and j=1,…,l), which expresses the joint distribution of the variables X and Y
f_i+: the marginal absolute frequency of row i, i=1,…,k, in table F
f_+j: the marginal absolute frequency of column j, j=1,…,l, in table F
N: the grand total of table F, with N = Σ_i f_i+ = Σ_j f_+j
P: the k×l correspondence table whose elements are those of table F divided by N, i.e. p_ij = f_ij / N, i=1,…,k and j=1,…,l
r_i: the mass (or weight) of row i in table P, where r_i = Σ_j p_ij = f_i+ / N, i=1,…,k
c_j: the mass (or weight) of column j in table P, where c_j = Σ_i p_ij = f_+j / N, j=1,…,l

It is well known (see Greenacre, 1993) that the total inertia I_F of table F expresses a generalized variance and, more specifically, the weighted average of the squared χ² distances of the row profiles (or, equivalently, of the column profiles) from their center of gravity.
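As a quick check of this definition, the following Python sketch (an illustration, not the authors' software) computes I_F as the mass-weighted average of the squared χ² distances between the row profiles and their center of gravity (the column masses), and compares it with Q/N obtained from the Pearson statistic. The function names are our own, and the code assumes that every row and column total is positive.

```python
from typing import Sequence


def total_inertia(table: Sequence[Sequence[float]]) -> float:
    """Total inertia I_F: mass-weighted average of the squared chi-square
    distances between the row profiles and their centroid (the column masses)."""
    n_total = sum(sum(row) for row in table)
    col_masses = [sum(col) / n_total for col in zip(*table)]  # c_j = f_+j / N
    inertia = 0.0
    for row in table:
        row_total = sum(row)
        mass = row_total / n_total                            # r_i = f_i+ / N
        profile = [f / row_total for f in row]                # p_ij / r_i
        dist2 = sum((a - c) ** 2 / c for a, c in zip(profile, col_masses))
        inertia += mass * dist2
    return inertia


def pearson_q(table: Sequence[Sequence[float]]) -> float:
    """Pearson statistic Q with expected frequencies f_i+ * f_+j / N."""
    n_total = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    q = 0.0
    for row in table:
        row_total = sum(row)
        for f, col_total in zip(row, col_totals):
            expected = row_total * col_total / n_total
            q += (f - expected) ** 2 / expected
    return q


table = [[10, 20], [30, 40]]
n_total = sum(map(sum, table))
print(total_inertia(table), pearson_q(table) / n_total)  # the two values agree
```

The distance-based route and the Q/N route give the same number, which is the content of the relation between total inertia and the χ² statistic stated next.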
To calculate I_F, we can also use the following formulae (Blasius & Greenacre, 1994):

$$I_F = \sum_{i=1}^{k}\sum_{j=1}^{l} \frac{(p_{ij} - r_i c_j)^2}{r_i c_j} \qquad [1.1]$$

$$I_F = \frac{Q}{N} \qquad [1.2]$$

From [1.2]:

$$Q = N I_F \qquad [1.3]$$

In [1.2], Q is the χ² statistic that corresponds to table F, calculated as follows:

$$Q = \sum_{i=1}^{k}\sum_{j=1}^{l} \frac{\left(f_{ij} - \frac{f_{i+} f_{+j}}{N}\right)^2}{\frac{f_{i+} f_{+j}}{N}} = N \sum_{i=1}^{k}\sum_{j=1}^{l} \frac{(p_{ij} - r_i c_j)^2}{r_i c_j} \qquad [1.4]$$

When the data have been collected through simple random sampling, then, on the basis of [1.3] and provided that the preconditions for applying the χ² test of independence are satisfied (see Lancaster, 1969), the statistical significance of the quantity N·I_F can be tested through the χ² distribution with (k−1)(l−1) degrees of freedom (Lebart, et al., 1977; Lebart, et al., 2000).

2 ERROR TYPE I AND ERROR TYPE II

In any statistical test, the decision about rejecting H0 can be correct or wrong. An erroneous decision is reached when:

a) H0 is rejected although in reality it is true. We then say that an Error Type I, or error of the first kind, has been committed. The probability of committing an Error Type I is denoted by α and is the conditional probability:

$$\alpha = P(\text{rejection of } H_0 \mid H_0 \text{ true}) \qquad [2.1]$$

b) H0 is not rejected although in reality it is false. We then say that an Error Type II, or error of the second kind, has been committed. The probability of committing an Error Type II is denoted by β and is the conditional probability:

$$\beta = P(\text{non-rejection of } H_0 \mid H_0 \text{ false}) \qquad [2.2]$$

When H0 is tested, the value chosen for α expresses the maximum acceptable probability of committing an Error Type I. This probability is known as the significance level and must be fixed by the researcher before sampling or running an experiment, so that the results of the statistical analyses do not influence its value (Hinkle, et al., 1988; Kachigan, 1991; Cohen, 1988).
Thus, the value of α should not be chosen after preliminary data analyses, nor should it be modified in order to bring about the rejection or non-rejection of specific null hypotheses. Furthermore, the significance level expresses the probability of committing an Error Type I only when: a) the measurements are valid and reliable, and b) the preconditions for applying the corresponding statistical test are satisfied. In practice, the conventional (arbitrary) values α=0.10, α=0.05 or α=0.01 are traditionally used (Hinkle, et al., 1988; Kirk, 1995; Hopkins, 1997; Hair, et al., 1995; Huck, 2000). For example, if a test is carried out at significance level α=0.05 (5%) and H0 is in fact true, then in 100 similar cases, or 100 repetitions of the experiment, about 5 erroneous decisions (rejections of a correct H0) are theoretically expected. The significance level therefore expresses an error rate that relates mainly to the statistical procedure and not to the value of the test statistic (e.g. t, F or χ²) (Lohninger, 1999).

The probability of not rejecting a true H0 is determined by the significance level α:

$$1 - \alpha = P(\text{non-rejection of } H_0 \mid H_0 \text{ true}) \qquad [2.3]$$

The probability of rejecting a false H0 is determined by β and is called the power of the statistical test:

$$1 - \beta = P(\text{rejection of } H_0 \mid H_0 \text{ false}) \qquad [2.4]$$

Relations [2.3] and [2.4] express the probability that a correct decision has been taken in a statistical test. Therefore, in order to reach relatively safe and reliable conclusions from the available data, the statistical test should minimize both α and β. However, for a fixed sample size, any attempt to reduce one risk causes the other to increase. On a practical level, we attempt to reduce whichever risk is considered more important.
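The frequency interpretation of α and 1−β in [2.1]-[2.4] can be illustrated with a small Monte Carlo sketch in Python. It is only an illustration under assumptions of our own: 2×2 tables of n=100 observations, cell probabilities chosen arbitrarily, a fixed random seed, and the tabulated critical value 3.8415 of the χ² distribution with 1 d.f. at α=0.05. The share of rejections when the cell probabilities satisfy independence estimates α; the share when they do not estimates the power.

```python
import random

CRITICAL_1DF_05 = 3.841458820694124  # upper 5% point of chi-square with 1 d.f.


def pearson_q_2x2(counts):
    """Pearson statistic for a 2x2 table given as [f11, f12, f21, f22]."""
    f11, f12, f21, f22 = counts
    n = f11 + f12 + f21 + f22
    rows = (f11 + f12, f21 + f22)
    cols = (f11 + f21, f12 + f22)
    if 0 in rows or 0 in cols:
        return 0.0  # degenerate sample: treated as no evidence against H0
    cells = ((0, 0), (0, 1), (1, 0), (1, 1))
    q = 0.0
    for f, (i, j) in zip(counts, cells):
        expected = rows[i] * cols[j] / n
        q += (f - expected) ** 2 / expected
    return q


def rejection_rate(cell_probs, n, reps, rng):
    """Share of simulated samples in which independence is rejected at alpha = 0.05."""
    rejections = 0
    for _ in range(reps):
        draws = rng.choices(range(4), weights=cell_probs, k=n)
        counts = [draws.count(cell) for cell in range(4)]
        if pearson_q_2x2(counts) >= CRITICAL_1DF_05:
            rejections += 1
    return rejections / reps


rng = random.Random(12345)
alpha_hat = rejection_rate([0.25, 0.25, 0.25, 0.25], n=100, reps=2000, rng=rng)  # H0 true
power_hat = rejection_rate([0.35, 0.15, 0.15, 0.35], n=100, reps=2000, rng=rng)  # H0 false
print(f"estimated alpha: {alpha_hat:.3f}, estimated power: {power_hat:.3f}")
```

With independence the rejection share stays near the nominal 0.05, while under the chosen alternative it is close to 1, which is the power of the test for that effect and sample size.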
One way of reducing both risks simultaneously is to increase the sample size (Zar, 1996); however, this is not always feasible because of natural, technical, financial and ethical restrictions. Still, which of the two errors is more important? The answer is relative and depends on many factors, such as the general purpose and specific goals of the research, its theoretical framework, the researcher's knowledge or other justifications. In any case, any decision to reject or not reject a hypothesis must take both α and β into account. Statistical practice includes several conventions regarding the predefinition of α and β. For example, many scientists who use statistical testing in their research set α=0.05 and β=0.20. This means that they consider the risk of committing an Error Type I more serious than the risk of an Error Type II. If we calculate the ratio P(committing an Error Type II) / P(committing an Error Type I) for α=0.05 and β=0.20, we obtain 0.20 / 0.05 = 4. In this case, the Error Type I is considered 4 times more serious, or critical, than the Error Type II. If β=0.20, then the power is 1−β=0.80. Other scientists set 0.80 as the minimum acceptable power of a statistical test and, if a planned test has lower power, they either do not carry out the research or redesign it (Kirk, 1995).

3 ERROR TYPE II½

In order to avoid erroneous conclusions, particular attention should be paid to the fact that the test's failure to reveal a statistically significant result (e.g. difference, effect, correlation) does not mean that the difference, effect or correlation does not exist in the corresponding populations. This erroneous conclusion is often referred to as Error Type II½ (Kritzer, 1996). It is a logical error, or fallacy, that arises when, within a hypothetical syllogism, the antecedent is inferred from the truth of the consequent (i.e.
when affirming the consequent) (Dometrius, 1992; Kargopoulos & Raftopoulos, 1998). For example: "if there is no effect, the test will not be significant; the test is not significant; therefore there is no effect."

4 OBSERVED SIGNIFICANCE LEVEL (p-VALUE)

The observed significance level of a statistical test is the probability of observing a value of the test statistic greater than or equal to the value obtained from the sample, provided that H0 is true, i.e.:

$$p = P(Z \geq |z| \mid H_0 \text{ true}) \qquad [4.1]$$

where Z is the random variable that corresponds to the test statistic and z is the value of the statistic for the specific sample (e.g. t, F or χ²). The observed significance level, which is based on the data, supports the decision whether or not to reject H0. If a test's observed significance level is smaller than or equal to the predefined significance level α, then H0 is rejected at significance level α (Dometrius, 1992; Kirk, 1995; Kinnear & Gray, 1999). If the observed significance level is greater than α, then H0 is not rejected. The observed significance level expresses the probability that a statistical result greater than or equal to the observed one occurs "by chance" if H0 is true (Bryman & Cramer, 1999). Its value is the lowest significance level at which H0 can be rejected. It should be noted, however, that in all cases the actual truth of H0 remains unknown.

5 CRITICISM CONCERNING THE H0 SIGNIFICANCE TESTS

The Null-Hypothesis Significance-Test Procedure (NHSTP) became the subject of criticism as early as the 1950s, and since then various writers have addressed the issue periodically (Yates, 1951; Kish, 1959; Rozeboom, 1960; Bakan, 1966; Morrison & Henkel, 1970; Pratt, 1976; Cox, 1977; Carver, 1978; Parkhurst, 1985; Guttman, 1985; Oakes, 1986; Chatfield, 1991; Loftus, 1991; Yoccuz, 1991; Schmidt, 1996).
The main arguments against the NHSTP can be summarized as follows:
a) The statistical significance of a result can be produced by an appropriate choice of the sample size and the significance level α.
b) H0 can never be true.
c) Statistical significance does not allow conclusions about the inverse probability of the hypothesis, i.e. the probability that H0 is true given the available data.
d) Statistical significance provides no information about the values of population parameters.
e) Testing for Error Type II is unjustifiably overlooked.
f) Statistical significance cannot be used to draw conclusions about the practical or clinical significance of a result.
g) The binary logic of the NHSTP (H0 is either rejected or not) does not conform to the fact that knowledge is acquired one step at a time.
h) The procedure carries the risk of stochastic and logical errors, as well as misconceptions (Menexes & Oikonomoy, 2002).

However, in most cases the criticism is not directed at the statistical procedure itself, but rather at the erroneous perceptions of researchers and their statistical illiteracy, which lead to the wrongful use and interpretation of the results of H0 significance tests.

6 POWER ANALYSIS

Power Analysis is commonly carried out at the planning stage of a study or experiment, i.e. before data collection (a priori), and is used to estimate the probability that a false H0 will be rejected. In other words, through Power Analysis we attempt to assess how much confidence can be placed in the test's ability to actually produce statistically significant results. The power of a statistical test depends mainly on three factors (Cohen, 1988; Murphy & Myors, 1998):
a) the significance level α,
b) the sample size n, and
c) the Effect Size (ES).
Effect Size can be generally defined as the extent or magnitude of the phenomenon under study (Cohen & Cohen, 1983). It measures the degree to which a phenomenon is present (Cohen, 1965). From a different viewpoint, ES can also be regarded as the degree to which the observed result deviates from H0 (Kramer & Rosental, 1999). Each statistical test has its own ES, which can be measured in two ways (Cohen, 1988; Kramer & Rosental, 1999; Murphy & Myors, 1998): a) as a difference, standardized or not (e.g. Cohen's d, Hedges' g, Glass' delta), or b) as a correlation or contingency (e.g. r, r²). Together with power, the three factors above form a closed system, in the sense that if three of the system's four elements are known and fixed, the fourth is fully determined (Cohen & Cohen, 1983). More specifically, for a given (fixed) n and ES the power of the test increases with α; for given α and ES the power increases with n; and for given α and n the power increases with ES. The aim of Power Analysis is to balance the system's four parameters appropriately, taking into account both the theoretical and practical objectives of the research together with the resources (e.g. financial, technological) available to the researcher. This balance should not contradict the ethical restrictions of the research. On a practical level, Power Analysis can answer the following two basic questions:
a) At significance level α and power 1−β, what is the minimum sample size n required for the statistical test to detect an ES d as statistically significant? In this case, d (e.g. 0.20) is an estimate of the minimum ES that is of practical or clinical significance to the researcher and is worth detecting as statistically significant.
b) Given the sample size n, the significance level α and the observed ES, what is the power of the statistical test?
The answer to question a) is the a priori approach to Power Analysis, while the answer to question b) is the post hoc approach.

7 POWER ANALYSIS ACCORDING TO COHEN FOR A CONTINGENCY TABLE WITH TWO VARIABLES

Let F be a contingency table of absolute frequencies of two categorical variables X and Y with k and l categories respectively. The general element of table F is f_ij, with i=1,…,k and j=1,…,l (for the notation of the present Chapter, see also Chapter 1). Let us also assume that the preconditions for applying the χ² test of independence are satisfied. The test carried out is then:

H0: X and Y are independent vs Ha: not H0 (at significance level α)

The null hypothesis H0 is rejected if

$$Q \geq \chi^2_{(k-1)(l-1);\,\alpha}$$

where Q is the χ² statistic that corresponds to table F and χ²_{(k−1)(l−1);α} is the critical value of the χ² distribution at significance level α with (k−1)(l−1) degrees of freedom (d.f.).

Remark 1. If H0 is true, then Q asymptotically follows the χ² distribution with (k−1)(l−1) d.f. If, however, Ha is true, then Q has the limiting noncentral χ² distribution with noncentrality parameter λ and (k−1)(l−1) d.f. (Cochran, 1952; Chapman & Nam, 1968; Lachin, 1977; Guenther, 1977). More information on the noncentral χ² distribution can be found in Patnaik (1949), Sankaran (1963), Guenther (1964) and Han (1975). In general, the noncentrality parameter satisfies (Lachin, 1977):

$$\lambda = n\, f(\theta_0, \theta_a) \qquad [7.1]$$

where n is the sample size and f is a function of the parameter vectors θ0 and θa involved in the χ² test under H0 and Ha respectively. From a different perspective, f can be regarded as the degree to which the observed result deviates from the condition stated by H0, and it is therefore a function of the test's corresponding ES.
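Remark 1 can be made concrete with a dependency-free Python sketch of the noncentral χ² distribution function, written as a Poisson(λ/2) mixture of central χ² distribution functions, which is a standard representation. The helper names are our own; the regularized incomplete gamma function is evaluated by its power series, which is adequate for the moderate arguments met in this paper but is not a production-grade implementation; and the critical value 9.4877 for 4 d.f. at α=0.05 is taken as given.

```python
import math


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(a, x), by its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    k = 0
    while term > total * 1e-15:
        k += 1
        term *= x / (a + k)
        total += term
    return min(1.0, total * math.exp(a * math.log(x) - x - math.lgamma(a)))


def chi2_cdf(x: float, df: float) -> float:
    """Central chi-square distribution function."""
    return reg_lower_gamma(df / 2.0, x / 2.0)


def ncx2_cdf(x: float, df: float, lam: float) -> float:
    """Noncentral chi-square distribution function: a Poisson(lam / 2) mixture
    of central chi-square distribution functions with df + 2j degrees of freedom."""
    half = lam / 2.0
    weight = math.exp(-half)  # Poisson probability of j = 0 (fine for moderate lam)
    total, j = 0.0, 0
    while True:
        total += weight * chi2_cdf(x, df + 2 * j)
        j += 1
        weight *= half / j
        if j > half and weight < 1e-16:  # remaining Poisson mass is negligible
            return total


def observed_power(critical: float, df: float, lam: float) -> float:
    """P(noncentral chi-square with parameter lam >= critical value)."""
    return 1.0 - ncx2_cdf(critical, df, lam)


critical_4df = 9.487729036781154  # upper 5% point of chi-square with 4 d.f.
print(round(observed_power(critical_4df, 4, 14.32), 3))
```

With λ=0 the "power" collapses to α, as it should under H0; with λ=14.32 and 4 d.f. the result is close to the power of 0.875 reported in Example 1 of Chapter 12.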
From [7.1], we find that:

$$n = \frac{\lambda}{f(\theta_0, \theta_a)} \qquad [7.2]$$

Therefore, if the parameter λ and its corresponding ES are estimated, [7.2] can be used to calculate the minimum sample size required for the χ² test, at significance level α and power 1−β, to detect the corresponding ES as statistically significant.

7.1 Post-hoc Power Analysis

Taking into account [2.2] and Remark 1, the observed Error Type II probability β_obs is estimated as:

$$\beta_{obs} = P\left(Q < \chi^2_{(k-1)(l-1);\,\alpha} \mid H_a \text{ true}\right) = P\left(\chi^2_{nc,(k-1)(l-1)}(\lambda) < \chi^2_{(k-1)(l-1);\,\alpha}\right) \qquad [7.3]$$

where χ²_nc,(k−1)(l−1)(λ) denotes a random variable with the noncentral χ² distribution with parameter λ and (k−1)(l−1) d.f. Because of [2.4], the observed power of the χ² test is given by:

$$1 - \beta_{obs} = P\left(\chi^2_{nc,(k-1)(l-1)}(\lambda) \geq \chi^2_{(k-1)(l-1);\,\alpha}\right) \qquad [7.4]$$

To calculate β_obs and 1−β_obs, an estimate of the parameter λ is needed. According to Cohen (1988):

$$\lambda = n w^2 \qquad [7.5]$$

where n is the sample size and w is an estimate of the ES defined by Cohen for the χ² test of independence of two variables. The ES w is generally estimated with the formula:

$$w = \sqrt{\sum_{i=1}^{kl} \frac{(p_{1i} - p_{0i})^2}{p_{0i}}} \qquad [7.6]$$

where p_1i and p_0i are the relative frequencies of cell i of the contingency table under Ha and H0 respectively. For table F in particular, taking [1.1] into account, formula [7.6] can also be written as:

$$w = \sqrt{\sum_{i=1}^{k}\sum_{j=1}^{l} \frac{(p_{ij} - r_i c_j)^2}{r_i c_j}} = \sqrt{I_F} \qquad [7.7]$$

The w index ranges from 0, which signifies independence of the two variables, to the maximum value √s, where s = min(k−1, l−1) (Cohen, 1988). The limiting maximum value of w signifies a perfect association between the two variables.
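Formulae [7.6] and [7.7] can be checked numerically. The helpers below (names are our own) compute Cohen's w from cell proportions under Ha and H0, and from an observed table by taking the observed p_ij as the Ha proportions and the products r_i c_j as the H0 proportions. For a table with a perfect association the value reaches the maximum √s.

```python
import math


def cohen_w(p1, p0):
    """Cohen's w, formula [7.6]: p1 = cell proportions under Ha, p0 = under H0."""
    return math.sqrt(sum((a - b) ** 2 / b for a, b in zip(p1, p0)))


def w_from_table(table):
    """w of an observed k x l frequency table, formula [7.7]: the Ha proportions
    are the observed p_ij and the H0 proportions are the products r_i * c_j."""
    n_total = sum(sum(row) for row in table)
    r = [sum(row) / n_total for row in table]
    c = [sum(col) / n_total for col in zip(*table)]
    p1 = [f / n_total for row in table for f in row]  # row-major order
    p0 = [ri * cj for ri in r for cj in c]            # same row-major order
    return cohen_w(p1, p0)


# A 2 x 3 table with perfect association: s = min(1, 2) = 1, so w reaches sqrt(s).
print(w_from_table([[5, 0, 0], [0, 3, 2]]))
```

Squaring the result of `w_from_table` gives the total inertia of the same table, which is relation [7.8].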
From [7.7]:

$$w^2 = I_F \qquad [7.8]$$

In addition, from [1.2] it is obvious that:

$$w^2 = \frac{Q}{n} \qquad [7.9]$$

Because of [1.3] and [7.8], formula [7.5] becomes:

$$\lambda = n w^2 = n I_F = Q \qquad [7.10]$$

Therefore, the observed power of the χ² test of independence is given by:

$$1 - \beta_{obs} = P\left(\chi^2_{nc,(k-1)(l-1)}(n I_F) \geq \chi^2_{(k-1)(l-1);\,\alpha}\right) = P\left(\chi^2_{nc,(k-1)(l-1)}(Q) \geq \chi^2_{(k-1)(l-1);\,\alpha}\right) \qquad [7.11]$$

In practice, [7.11] is evaluated numerically with tables of the noncentral χ² distribution (Haynam, et al., 1970) or with software such as SAS and SPSS, which include functions for all the relevant calculations.

7.2 A Priori Power Analysis

Using [7.2] and [7.10], the minimum required sample size can be calculated a priori, given estimates of the parameter λ and of the ES w. In this case, the sample size is deduced from the formula:

$$n = \frac{\lambda}{w^2} = \frac{\lambda}{I_F} \qquad [7.12]$$

For the χ² distribution, the values of the parameter λ(α, β, u) that correspond to power 1−β and significance level α with u degrees of freedom can be found in tables (Haynam, et al., 1970; Pearson & Hartley, 1972) or calculated with suitable software. The problem lies in providing a predefined estimate of w or I_F that has clinical or practical significance within the framework of the planned research.

8 PREDEFINING THE EFFECT SIZE

A predefinition of w or I_F can be obtained either from pilot studies or from meta-analyses of previous comparable studies on the same research subject. In addition, Cohen's conventions can be used regarding what counts as a "small", "medium" or "large" effect size for the χ² test of independence (see Table 1).

Table 1: Cohen's Conventions And Correspondence Between w And I_F

        Small ES    Medium ES    Large ES
w       0.10        0.30         0.50
I_F     0.01        0.09         0.25

What is of even greater interest is the relation between w or I_F and the contingency coefficients based on the statistic Q.
There are two basic reasons for this proposal: a) these indexes can be calculated relatively easily from the published results of other comparable studies, and b) they express the magnitude or degree of association between the variables on a scale from 0 to a maximum value of 1. Two of the most commonly used association indexes for contingency tables of two variables are the Contingency Coefficient C and Cramer's V.

8.1 The Relation Of w And I_F To The Contingency Coefficient C

It is known (Reynolds, 1984) that the coefficient C is given by the formula:

$$C = \sqrt{\frac{Q}{Q + n}} \qquad [8.1]$$

Because of [7.9] and [7.8], formula [8.1] can also be written as:

$$C = \sqrt{\frac{Q/n}{Q/n + 1}} = \sqrt{\frac{w^2}{w^2 + 1}} = \sqrt{\frac{I_F}{I_F + 1}} \qquad [8.2]$$

From [8.2]:

$$w = \sqrt{\frac{C^2}{1 - C^2}} = \sqrt{I_F} \qquad [8.3]$$

Therefore:

$$I_F = \frac{C^2}{1 - C^2} \qquad [8.4]$$

Remark 2. The maximum value of the coefficient C depends on the dimensions of the contingency table, which means that C indexes from tables of different dimensions cannot be compared directly (Cohen, 1988).

8.2 The Relation Of w And I_F To Cramer's V Coefficient

Cramer's V index is given by the formula (Reynolds, 1984):

$$V = \sqrt{\frac{Q}{n s}} \qquad [8.5]$$

where s = min(k−1, l−1). Because of [7.9] and [7.8], formula [8.5] can also be written as:

$$V = \frac{w}{\sqrt{s}} = \sqrt{\frac{I_F}{s}} \qquad [8.6]$$

From [8.6]:

$$w = V\sqrt{s} \qquad [8.7]$$

Hence:

$$I_F = V^2 s \qquad [8.8]$$

9 DYNAMIC INERTIA OF A CONTINGENCY TABLE WITH TWO VARIABLES

The concept of inertia plays a central role in Data Analysis. More specifically, the total inertia of a contingency table of two variables expresses a generalized variance, but it can also be viewed as an index of the information contained in the table. In addition, as shown above, total inertia, through its association with w, can also be regarded as an index of effect size, expressing the degree or magnitude of association between two categorical variables. The maximum value of total inertia in the case of two variables is equal to s, where s = min(k−1, l−1).
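The conversions of Chapter 8, together with the bound I_F ≤ s, can be collected into a few one-line helpers. This is a sketch; the function names are our own and each comment gives the formula it implements.

```python
import math


def max_inertia(n_rows: int, n_cols: int) -> int:
    """Maximum total inertia of a k x l table: s = min(k - 1, l - 1)."""
    return min(n_rows - 1, n_cols - 1)


def w_from_inertia(i_f: float) -> float:          # [7.8]: w = sqrt(I_F)
    return math.sqrt(i_f)


def c_from_inertia(i_f: float) -> float:          # [8.2]: C = sqrt(I_F / (I_F + 1))
    return math.sqrt(i_f / (i_f + 1.0))


def inertia_from_c(c: float) -> float:            # [8.4]: I_F = C^2 / (1 - C^2)
    return c ** 2 / (1.0 - c ** 2)


def v_from_inertia(i_f: float, s: int) -> float:  # [8.6]: V = sqrt(I_F / s)
    return math.sqrt(i_f / s)


def inertia_from_v(v: float, s: int) -> float:    # [8.8]: I_F = V^2 s
    return v ** 2 * s


s = max_inertia(3, 4)
i_f = 0.04
print(w_from_inertia(i_f), c_from_inertia(i_f), v_from_inertia(i_f, s))
```

For I_F = 0.04 in a 3×4 table (s = 2) these give w = 0.2, C ≈ 0.196 and V ≈ 0.141.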
Definition: If I_F is the observed inertia of the contingency table F and I_max = s is the maximum possible inertia of F, given only the number of rows and columns of the table and a sample size n, then we define the dynamic inertia I_D of table F as the ratio:

$$I_D = \frac{I_F}{I_{max}} \qquad [9.1]$$

Dynamic inertia expresses the observed inertia as a proportion of the maximum possible inertia. Through formulae [1.2] and [8.5], formula [9.1] can also be written as:

$$I_D = \frac{Q}{n s} = V^2 \qquad [9.2]$$

Therefore, the dynamic inertia of a contingency table is equal to the square of Cramer's V coefficient. The quantity I_D × 100 expresses I_F as a percentage (%) of I_max. For scientists who use Data Analysis methods in their research and are familiar with the concept of inertia, dynamic inertia can offer an alternative approach to predefining effect size. The formulae relating I_D to w and I_F are given below.

From formulae [8.7] and [9.2]:

$$w = \sqrt{s I_D} \qquad [9.3]$$

From formula [9.2]:

$$I_F = s I_D \qquad [9.4]$$

Remark 3. In practical applications, for an a priori determination of sample size, it is useful to carry out a sensitivity analysis, setting limits for the ES, and a cost-benefit analysis, in order to balance the available resources against the research objectives.

10 CALCULATING SAMPLE SIZE IN THE CASE OF MULTIPLE VARIABLES

In the case of q categorical variables, CA is usually applied to the generalized contingency table B (Burt's table). According to Menexes & Papadimitriou (2004), the part of the total inertia of table B that expresses the pair-wise associations (contingencies) between variables, and is worth investigating, is the interesting inertia I, defined by the following formula:

$$I = \frac{\sum_{c=1}^{g} I_c}{q(q-1)/2}, \qquad c = 1, \ldots, g, \qquad g = \frac{q(q-1)}{2} \qquad [10.1]$$

where Σ_c I_c is the sum of the inertias of the q(q−1)/2 different simple contingency tables of the q variables taken in pairs.
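Formula [10.1] can be sketched as follows, assuming that the raw data are available as one tuple of q category labels per respondent (a hypothetical layout chosen for the illustration). The function names are our own; each pair of variables is cross-tabulated using only the categories actually observed, and the q(q−1)/2 pairwise inertias are averaged.

```python
from collections import Counter
from itertools import combinations


def pair_inertia(xs, ys):
    """Total inertia of the cross-tabulation of two equally long label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals, col_totals = Counter(xs), Counter(ys)
    inertia = 0.0
    for a, fa in row_totals.items():
        for b, fb in col_totals.items():
            p = joint[(a, b)] / n  # p_ij
            e = fa * fb / n ** 2   # r_i * c_j
            inertia += (p - e) ** 2 / e
    return inertia


def interesting_inertia(records):
    """Formula [10.1]: average of the inertias of the q(q - 1)/2 two-way tables.

    records: one tuple of q category labels per respondent (hypothetical layout).
    """
    columns = list(zip(*records))
    pairs = list(combinations(range(len(columns)), 2))
    return sum(pair_inertia(columns[a], columns[b]) for a, b in pairs) / len(pairs)


records = [("a", "x", "u"), ("a", "x", "v"), ("b", "y", "u"), ("b", "y", "v")]
print(interesting_inertia(records))  # one perfectly associated pair, two independent pairs
```

In this toy data set the first two variables are perfectly associated (pairwise inertia 1) and the third is independent of both (pairwise inertia 0), so the average of the three pairwise inertias is 1/3.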
In view of this observation, we could claim that CA in fact analyzes a "package" of bivariate associations or, in other words, the pair-wise interactions of all variables. And if we consider that total inertia expresses a measure of the information contained in the relevant table, or an overall effect size, then we could claim that in each case we are analyzing an average effect size index, which results from bivariate and not multivariate relations. Consequently, pair-wise pre-testing of the associations between the variables to be analyzed by CA is a prerequisite. When the sample is derived from simple random sampling, the χ² test of independence can be applied to the q(q−1)/2 pair-wise associations of the variables, with an adjustment of the significance level α, for example according to Bonferroni (i.e. α divided by q(q−1)/2) (Girden, 1992; Brown & Melamed, 1990); otherwise, the goodness of fit of a suitable loglinear model can be tested. Therefore, in the case of multiple variables, the minimum required sample size can be calculated for each pair of variables separately, and the highest of the q(q−1)/2 resulting values can then be selected, so that the sample size requirements of all tests are fulfilled. Naturally, such a process calls for a predefined ES for each bivariate association.

11 POWER ANALYSIS OF THE GOODNESS OF FIT TEST

When CA is applied to the k×l contingency table F of two variables, the absolute frequencies of the table are reconstructed through the transition formula (Benzécri, 1992):

$$f_{ij} = \frac{f_{i+} f_{+j}}{n}\left(1 + \sum_{p=1}^{r} \sqrt{\lambda_p}\, u_{ip} v_{jp}\right), \qquad i = 1, \ldots, k, \quad j = 1, \ldots, l \qquad [11.1]$$

where λ_p is the eigenvalue (inertia) of factorial axis p, and u_ip and v_jp are the standard coordinates of the rows and columns of table F (Greenacre, 1993), respectively, on factorial axis p, with p = 1, …, r and r = min(k−1, l−1).
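For a 2×2 table the transition formula [11.1] can be checked without an eigen-decomposition, because there is a single nontrivial axis with eigenvalue λ_1 = φ² and standard coordinates with a closed form. The sketch below relies on that special case (an assumption of the illustration; larger tables need a singular value decomposition) and rebuilds the table from no axes, which gives the independence frequencies, or from the one axis, which gives back the observed frequencies. The function name is our own.

```python
import math


def reconstitute_2x2(table, n_axes):
    """Rebuild a 2 x 2 frequency table with the transition formula [11.1],
    keeping the first n_axes (0 or 1) nontrivial axes."""
    (f11, f12), (f21, f22) = table
    n = f11 + f12 + f21 + f22
    r = [(f11 + f12) / n, (f21 + f22) / n]  # row masses r_i
    c = [(f11 + f21) / n, (f12 + f22) / n]  # column masses c_j
    # Signed phi coefficient; the single eigenvalue is lambda_1 = phi^2.
    phi = (f11 * f22 - f12 * f21) / n ** 2 / math.sqrt(r[0] * r[1] * c[0] * c[1])
    # Standard coordinates: mass-weighted mean 0 and mass-weighted variance 1.
    u = [math.sqrt(r[1] / r[0]), -math.sqrt(r[0] / r[1])]
    v = [math.sqrt(c[1] / c[0]), -math.sqrt(c[0] / c[1])]
    if phi < 0:
        v = [-x for x in v]  # the orientation of an axis is arbitrary
    sqrt_lam = abs(phi)
    rebuilt = []
    for i in range(2):
        row = []
        for j in range(2):
            correction = sqrt_lam * u[i] * v[j] if n_axes >= 1 else 0.0
            row.append(n * r[i] * c[j] * (1.0 + correction))  # (f_i+ f_+j / n)(...)
        rebuilt.append(row)
    return rebuilt


table = [[10, 20], [30, 40]]
print(reconstitute_2x2(table, 0))  # independence frequencies f_i+ f_+j / n
print(reconstitute_2x2(table, 1))  # the observed table, up to rounding
```

Keeping no axes corresponds to the hypothesis H_0 of independence discussed next, and keeping all r axes reproduces the table exactly.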
The χ² goodness of fit test (Cohen, 1988) can be used to compare the observed frequencies f_ij of table F, from a random sample of size n, with the expected frequencies f̂_ij specified by the null hypothesis H_p, which states that only the first p eigenvalues differ significantly from zero (Saporta & Tambrea, 1993). The weighted least squares estimates of the expected frequencies f̂_ij are the values obtained from the transition formula when only the first p terms are used in the sum (Andersen, 1991). The goodness of fit test can be carried out with the statistic:

$$Q_p = \sum_{i=1}^{k}\sum_{j=1}^{l} \frac{(f_{ij} - \hat f_{ij})^2}{\hat f_{ij}} \qquad [11.2]$$

When p=0, i.e. when the two variables are independent, and provided the preconditions for applying the test are satisfied, the statistic Q_0 can be compared with the critical value of the χ² distribution with (k−1)(l−1) degrees of freedom at significance level α. If p=1, then the statistic Q_1 asymptotically follows the χ² distribution with (k−2)(l−2) d.f. In general, under H_p, the statistic Q_p can be compared with the critical value of the χ² distribution with (k−p−1)(l−p−1) degrees of freedom (Andersen, 1991; Saporta & Tambrea, 1993; Rao, 1995). The χ² test can therefore be used to test whether the first p factorial axes are sufficient to reconstruct table F. In practice, we carry out a series of tests, starting with p=0, until the null hypothesis H_p is "accepted" at significance level α. At that point our interest focuses on calculating β_obs, so as to avoid committing Errors Type II and Type II½. The observed power 1−β_obs of the χ² goodness of fit test can be calculated in exactly the same manner as for the test of independence (Cohen, 1988).

Remark 4. The statistic Q_p has the disadvantage that, when the observed frequencies f_ij are small, the estimated frequencies f̂_ij can even take negative values, in which case the test cannot be implemented.
For such cases, Malinvaud (1987) recommended using the quantity $f_{i\cdot}f_{\cdot j}/n$ as the denominator of the fraction in formula [11.2]. This leads to a modified statistic $Q'_p$, given by:

$$Q'_p = \sum_{i}\sum_{j}\frac{\left(f_{ij}-\hat{f}_{ij}\right)^2}{f_{i\cdot}f_{\cdot j}/n} = n\left(\lambda_{p+1}+\lambda_{p+2}+\dots+\lambda_r\right) \qquad [11.2']$$

The statistic $Q'_p$ is therefore a function of $n$ and of the remaining $r-p$ eigenvalues, and it also asymptotically follows the $\chi^2$ distribution with $(k-p-1)(l-p-1)$ degrees of freedom.

12 NUMERICAL EXAMPLES

In the following examples, the required numerical calculations were carried out with the EXCEL add-in π-face, which is available at: ftp://ftp.stat.uiowa.edu/pub/rlenth/PiFace/. EXCEL was chosen as the calculation platform for two main reasons: a) it is widely used, and b) it provides scenarios and What-If Analysis. Amongst others, π-face includes the functions 1) Chi2PowerG and 2) Chi2PowerNC. The Chi2Power function accepts as arguments the desired significance level of the $\chi^2$ test, the non-centrality parameter and the corresponding degrees of freedom. It returns the observed power of the test and can therefore be used for a post hoc Power Analysis. The Chi2PowerNC function accepts as arguments the desired significance level of the $\chi^2$ test, the desired power of the test and the corresponding degrees of freedom. It returns the non-centrality parameter and can therefore be used for an a priori Power Analysis.

Example 1: Post hoc Power Analysis
Let us suppose we have two categorical variables X and Y with three categories each. The sample size is n=80. The $\chi^2$ test has given Q=14.32, which for 4 d.f. is statistically significant at $\alpha=0.05$ (p<0.05). The problem is how to calculate the observed power $\pi_{obs}$ of the $\chi^2$ test that corresponds to $\alpha=0.05$. The observed inertia I of the corresponding two-variable contingency table can be calculated using formula [1.2] and is I=0.179.
Formula [7.7] shows that the inertia I corresponds to an effect size w=0.423 (tending towards a large ES according to Cohen's conventions). Formula [7.10] can be used to estimate the non-centrality parameter $\delta$, which is $\delta=14.32$. From [7.11], and with the help of the Chi2Power function, the observed power of the test is estimated at $\pi_{obs}=0.875$. Consequently, the probability that the $\chi^2$ test identifies an ES equal to the one observed as statistically significant at $\alpha=0.05$ is approximately 87.5%.

Example 2: A priori Power Analysis
Let us suppose that, during the planning stage of a study, attention is focused on testing the association between two categorical variables X and Y with three and four categories respectively. Past experience has shown that an ES corresponding to a minimum inertia of 0.04 is considered clinically or practically significant, according to the objectives and theoretical framework of the study. The problem in this case is how to estimate the minimum sample size n required so that the $\chi^2$ test (for 6 d.f.), at $\alpha=0.10$ with power $\pi=0.99$, can identify the predefined ES as statistically significant. From formulae [7.7], [8.6] for s=2, and [9.2], inertia I=0.04 corresponds to w=0.20, to Cramér's index V=0.14 and to a dynamic inertia $I_D=0.019$. With the help of the Chi2PowerNC function, the non-centrality parameter $\delta=24.65$ is calculated. Then, using [7.12], the required sample size is estimated at n=617 sampling units. If the desired power of the test is reduced to 0.95, the estimated sample size becomes n=447, and if we set $\pi=0.90$, the corresponding sample size is n=367. In this example, the clinically significant ES could equally have been predefined in terms of the V index or of the dynamic inertia. The size of the sample that we will eventually attempt to collect will also depend on the resources available.
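The calculations of Examples 1 and 2 can also be reproduced outside EXCEL. The Python sketch below is not the π-face implementation: the helpers `chi2_sf`, `chi2_critical`, `chi2_power` and `noncentrality_for_power` are hypothetical stand-ins for Chi2Power and Chi2PowerNC. Since formulae [7.7] to [7.12] are not reproduced in this section, it assumes, consistently with the figures reported above, that $w=\sqrt{I}$, that the non-centrality parameter is $\delta = n w^2 = nI$, and that the required sample size is $n=\lceil \delta/w^2 \rceil$. The central $\chi^2$ tail is evaluated in closed form for integer degrees of freedom, and the non-central tail through its Poisson-mixture representation.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X > x) of a central chi-square with integer df (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: equals P(Poisson(x/2) <= df/2 - 1).
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite series in sqrt(x).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * i + 1)
    return total


def chi2_critical(alpha, df):
    """Critical value c with P(X > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def chi2_power(alpha, delta, df, tol=1e-10, max_terms=10_000):
    """Power P(X' > c) for X' ~ non-central chi-square(df, delta), via a Poisson mixture."""
    c = chi2_critical(alpha, df)
    weight = math.exp(-delta / 2)  # Poisson(delta/2) probability of j = 0
    power = cum = 0.0
    j = 0
    while cum < 1 - tol and j < max_terms:
        power += weight * chi2_sf(c, df + 2 * j)
        cum += weight
        j += 1
        weight *= (delta / 2) / j
    return power


def noncentrality_for_power(alpha, target_power, df):
    """Smallest delta whose power reaches target_power (bisection; power grows with delta)."""
    lo, hi = 0.0, 1.0
    while chi2_power(alpha, hi, df) < target_power:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if chi2_power(alpha, mid, df) < target_power:
            lo = mid
        else:
            hi = mid
    return hi


# Example 1 (post hoc): 3x3 table, n = 80, observed inertia I = 0.179.
I_obs, n_obs, df1 = 0.179, 80, (3 - 1) * (3 - 1)
w = math.sqrt(I_obs)                          # assumption: w = sqrt(I); text reports 0.423
delta_obs = n_obs * I_obs                     # assumption: delta = n * w**2 = n * I; text reports 14.32
power_obs = chi2_power(0.05, delta_obs, df1)  # text reports 0.875

# Example 2 (a priori): 3x4 table, minimum inertia 0.04, alpha = 0.10, power 0.99.
I_min, df2 = 0.04, (3 - 1) * (4 - 1)
delta_req = noncentrality_for_power(0.10, 0.99, df2)  # text reports 24.65
n_req = math.ceil(delta_req / I_min)                  # assumption: n = ceil(delta / w**2); text reports 617
```

Changing the target power in the call to `noncentrality_for_power` gives the other scenarios of Example 2; the same functions apply to the goodness of fit test once the degrees of freedom are set to $(k-p-1)(l-p-1)$.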
Constraints that are not related to the subject of the research, such as the available resources, should therefore be taken into account at the planning stage.

Example 3: Calculation of sample size in the case of three variables
Let us suppose that, during the planning stage of a study, attention is focused on testing the pair-wise associations between three categorical variables X, Y and Z, with three, four and five categories respectively. Past experience has shown that for the pair of variables (X, Y), an ES corresponding to a minimum dynamic inertia of 0.005 is considered clinically or practically significant, while the corresponding ES for the pairs (X, Z) and (Y, Z) is 0.045 and 0.085 respectively. The problem in this case is how to estimate the minimum required sample size for each of the three $\chi^2$ tests, at $\alpha=0.05$ with power $\pi=0.80$. The relevant data and results are presented in Table 2. The a priori Power Analysis methodology can be used to calculate the sample size for each pair of variables. The last column of Table 2 shows that with a sample size n=1503 the requirements of all three $\chi^2$ tests are fulfilled. Since three $\chi^2$ tests have to be carried out, the significance level can be adjusted according to Bonferroni, so that the cumulative Type I error rate (see Huck, 2000) does not exceed 0.05. In this case, the significance level for each test could be predefined at $\alpha = 0.05/3 \approx 0.0167$, and the relevant calculations can then follow.

Table 2: Sample Size For Each Pair Of Association

Pair     s   Clinically significant ES (dynamic inertia $I_D$)   Sample size n for $\alpha=0.05$, $\pi=0.80$
(X, Y)   2   0.005                                               1503
(X, Z)   2   0.045                                               152
(Y, Z)   3   0.085                                               68

13 CONCLUSIONS

In the case of two categorical variables, where the data have been collected using simple random sampling, it is possible to combine Correspondence Analysis with the $\chi^2$ test of independence.
In the present paper, we have introduced the concept of dynamic inertia and proposed a methodology that can be used to estimate both the observed power of the $\chi^2$ test (post hoc) and the minimum required sample size (a priori) in a sample survey or experimental study whose data will be analyzed with CA. In developing the proposed methodology, we considered the total and dynamic inertia of a two-variable contingency table as alternative effect size indices, and we used as a basis the Power Analysis framework for $\chi^2$ tests proposed by Cohen. Finally, the following risk must also be pointed out: an experienced statistical analyst who intentionally misuses Statistics can design a study so as to balance Type I and Type II errors conveniently and, through an appropriate selection of $\alpha$, $\pi$, effect size and sample size, steer the conclusions towards certain "desirable" results.

14 REFERENCES

Andersen, E. (1991). The Statistical Analysis of Categorical Data. Berlin-Heidelberg: Springer-Verlag.
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423-437.
Benzécri, J.-P. (1992). Correspondence Analysis Handbook. New York: Marcel Dekker, Inc.
Blasius, J. & Greenacre, M. (1994). Computation of Correspondence Analysis. In: M. Greenacre and J. Blasius (eds), Correspondence Analysis in the Social Sciences: Recent Developments and Applications. London: Academic Press.
Blasius, J. (1994). Correspondence Analysis in Social Science Research. In: M. Greenacre and J. Blasius (eds), Correspondence Analysis in the Social Sciences: Recent Developments and Applications. London: Academic Press.
Brown, S. & Melamed, L. (1990). Experimental Design and Analysis. Newbury Park, CA: Sage.
Bryman, A. & Cramer, D. (1999). Quantitative Data Analysis with SPSS Release 8 for Windows: A Guide for Social Scientists. London and New York: Routledge.
Buhl-Mortensen, L. (1996). Type-II Statistical Errors in Environmental Science and the Precautionary Principle. Marine Pollution Bulletin, Vol. 32, No. 7, 528-531.
Carver, P. (1978). The case against statistical testing. Harvard Educational Review, 48, 378-399.
Chapman, D. & Nam, J. (1968). Asymptotic Power of Chi-Square Tests for Linear Trends in Proportions. Biometrics, Vol. 24, No. 2, 315-327.
Chatfield, C. (1991). Avoiding statistical pitfalls. Statistical Science, 6, 240-268.
Clausen, S.-E. (1998). Applied Correspondence Analysis: An Introduction. Thousand Oaks, CA: Sage.
Cochran, W. (1952). The χ² Test of Goodness of Fit. The Annals of Mathematical Statistics, Vol. 23, No. 3, 315-345.
Cohen, J. & Cohen, P. (1983). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. New Jersey: Lawrence Erlbaum Associates, Inc.
Cohen, J. (1962). The Statistical Power of Abnormal-Social Psychological Research: A Review. Journal of Abnormal and Social Psychology, 65, 145-153.
Cohen, J. (1965). Some Statistical Issues in Psychological Research. In: B. Wolman (ed.), Handbook of Clinical Psychology. New York: McGraw-Hill.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New Jersey: Lawrence Erlbaum Associates, Inc.
Cox, R. (1977). The role of significance tests. Scandinavian Journal of Statistics, 4, 49-70.
Di Stefano, J. (2001). Power analysis and sustainable forest management. Forest Ecology and Management, 154, 141-153.
Dometrius, N. (1992). Social Statistics Using SPSS. New York: HarperCollins Publishers, Inc.
Evans, T. & Viengkham, O. (2001). Inventory time-cost and statistical power: a case study of a Lao rattan. Forest Ecology and Management, 150, 313-322.
Foster, J. (2001). Statistical power in forest monitoring. Forest Ecology and Management, 151, 211-222.
Gatti, G. & Harwell, M. (1998). Advantages of Computer Programs Over Power Charts for the Estimation of Power. Journal of Statistics Education, Vol. 6, No. 3. Available at: http://www.amstat.org/publications/jse/v6n3/gatti.html
Gifi, A. (1996). Nonlinear Multivariate Analysis. Chichester: John Wiley & Sons Ltd.
Girden, E. (1992). ANOVA: Repeated Measures. Newbury Park, CA: Sage.
Greenacre, M. (1993). Correspondence Analysis in Practice. London: Academic Press.
Guenther, W. (1964). Another Derivation of the Non-Central Chi-Square Distribution. Journal of the American Statistical Association, Vol. 59, No. 307, 957-960.
Guenther, W. (1977). Power and Sample Size for Approximate Chi-Square Tests. The American Statistician, Vol. 31, No. 2, 83-85.
Guttman, L. (1985). The illogic of statistical inference for cumulative science. Applied Stochastic Models and Data Analysis, 1, 3-10.
Hair, J., Anderson, R., Tatham, R. & Black, W. (1995). Multivariate Data Analysis With Readings. New Jersey: Prentice-Hall International, Inc.
Hallahan, M. & Rosenthal, R. (1996). Statistical Power: Concepts, Procedures, and Applications. Behav. Res. Ther., Vol. 34, No. 5/6, 489-499.
Han, C.-P. (1975). Some Relationships Between Noncentral Chi-Squared and Normal Distributions. Biometrika, Vol. 62, No. 1, 213-214.
Haynam, G.E., Govindarajulu, Z. & Leone, F.C. (1970). Tables of the Cumulative Non-central Chi-Square Distribution. In: H. L. Harter and D. B. Owen (eds), Selected Tables in Mathematical Statistics, Vol. I. Chicago: Markham Publishing Co.
Heidelbaugh, S. & Nelson, W. (1996). A power analysis of methods for assessment of change in seagrass cover. Aquatic Botany, 53, 227-233.
Hinkle, D., Wiersma, W. & Jurs, S. (1988). Applied Statistics for the Behavioral Sciences. Boston: Houghton Mifflin Company.
Hopkins, W. (1997). A New View of Statistics. Available at: http://www.sportsci.org/resource/stats/index.html
Hubbard, R. & Armstrong, S. (1992). Are Null Results Becoming an Endangered Species in Marketing? Marketing Letters, 3(2), 127-136.
Huck, S. (2000). Reading Statistics and Research. New York: Addison Wesley Longman, Inc.
Israëls, A. (1987). Eigenvalue Techniques for Qualitative Data. Leiden: DSWO Press.
Kachigan, S. (1991). Multivariate Statistical Analysis: A Conceptual Introduction. New York: Radius Press.
Kargopoulos, P. & Raftopoulos, T. (1998). The Science of Logic & The Art of Thinking. Thessaloniki: Vanias Publishing House.
Kinnear, P. & Gray, C. (1999). SPSS for Windows Made Simple. East Sussex: Psychology Press Ltd.
Kirk, R. (1995). Experimental Design: Procedures for the Behavioral Sciences. Pacific Grove, CA: Brooks/Cole Publishing Company, ITP.
Kish, L. (1959). Some statistical problems in research design. American Sociological Review, 24, 328-338.
Kramer, S. & Rosenthal, R. (1999). Effect Sizes and Significance Levels in Small-Sample Research. In: R. Hoyle (ed.), Statistical Strategies for Small Sample Research. Thousand Oaks: Sage Publications, Inc.
Kritzer, B. (1996). Surviving Statistical Spitting Matches. A Professional Development Seminar presentation for Senior Staff of the National Conference of State Legislatures, Madison, Wisconsin, October 10, 1996. Available at: http://www.polisci.wisc.edu/~kritzer/misc/legstaff/legstaff.htm
Lachin, J. (1977). Sample Size Determinations for r×c Comparative Trials. Biometrics, Vol. 33, No. 2, 315-324.
Lancaster, H. O. (1969). The Chi-Squared Distribution. John Wiley & Sons, Inc.
Lebart, L., Morineau, A. & Piron, M. (2000). Statistique Exploratoire Multidimensionnelle. Paris: Dunod.
Lebart, L., Morineau, A. & Tabard, N. (1977). Techniques de la Description Statistique: méthodes et logiciels pour l'analyse des grands tableaux. Paris: Dunod.
Loftus, R. (1991). On the tyranny of hypothesis testing in the social sciences. Contemporary Psychology, 36, 102-105.
Lohninger, H. (1999). Teach Me Data Analysis: Single User Edition [Computer program manual]. New York: Springer.
Malinvaud, E. (1987). Data Analysis in applied socio-economic statistics with special consideration of correspondence analysis. Marketing Science Conference Proceedings. HEC-ISA, Jouy-en-Josas.
Menexes, G. & Oikonomoy, A. (2002). Errors and Misconception in Statistical Hypothesis Testing. Workbooks of Data Analysis, 2, 52-64. (in Greek)
Menexes, G. & Papadimitriou, I. (2004). Relations of inertia in simple contingency, generalized contingency (Burt) and indicator matrices for two or more variables. Workbooks of Data Analysis, 4, 42-69. (in Greek)
Meng, R. & Chapman, D. (1966). The Power of Chi Square Tests for Contingency Tables. Journal of the American Statistical Association, Vol. 61, No. 316, 965-975.
Meyer, T. & Mark, M. (1996). Statistical Power and Implications of Meta-Analysis for Clinical Research in Psychosocial Oncology. Journal of Psychosomatic Research, Vol. 41, No. 5, 409-413.
Michailidis, G. & De Leeuw, J. (1998). The Gifi System of Descriptive Multivariate Analysis. Statistical Science, Vol. 13, No. 4, 307-336.
Micheloud, F.-X. (1997). Jean Paul Benzécri's Correspondence Analysis. Available at: http://www.micheloud.com/FXM/COR/E/index.htm
Miller, J., Daly, J., Wood, M., Roper, M. & Brooks, A. (1997). Statistical power and its subcomponents: missing and misunderstood concepts in empirical software engineering research. Information and Software Technology, 39, 285-295.
Morrison, E. & Henkel, E. (1970). Significance tests in behavioral research: Skeptical conclusions and beyond. In: D. E. Morrison and R. E. Henkel (eds), The Significance Test Controversy: A Reader. Chicago: Aldine.
Muller, K., LaVange, L., Landesman-Ramey, S. & Ramey, C. (1992). Power Calculations for General Linear Multivariate Models Including Repeated Measures Applications. Journal of the American Statistical Association, Vol. 87, No. 420, 1209-1226.
Murphy, K. & Myors, B. (1998). Statistical Power Analysis: A Simple and General Model for Traditional and Modern Hypothesis Tests. New Jersey: Lawrence Erlbaum Associates, Inc.
Nathan, G. (1972). On the Asymptotic Power of Tests for Independence in Contingency Tables from Stratified Samples. Journal of the American Statistical Association, Vol. 67, No. 340, 917-920.
Nishisato, S. (1980). Analysis of Categorical Data: Dual Scaling and its Applications. Toronto: University of Toronto Press.
Nutahara, H. et al. (2001). A simple computerized program for the calculation of the required sample size necessary to ensure statistical accuracy in medical experiments. Computer Methods and Programs in Biomedicine, 65, 133-139.
Oakes, M. (1986). Statistical Inference: A Commentary for the Social and Behavioral Sciences. Chichester: John Wiley & Sons, Inc.
Pagano, M. & Gauvreau, K. (2000).
Parkhurst, F. (1985). Interpreting failure to reject a null hypothesis. Bulletin of the Ecological Society of America, 66, 301-302.
Patnaik, P. (1949). The Non-Central χ² and F Distribution and their Applications. Biometrika, Vol. 36, No. 1/2, 202-232.
Pearson, E.S. & Hartley, H.O. (eds) (1972). Biometrika Tables for Statisticians, Vol. 2. London: Cambridge University Press.
Pratt, W. (1976). A discussion of the question: for what use are tests of hypotheses and tests of significance. Communications in Statistics, Series A5, 779-787.
Rao, R. (1995). The Use of Hellinger Distance in Graphical Displays. In: E.-M. Tiit, T. Kollo and H. Niemi (eds), New Trends in Probability and Statistics, Vol. 3: Multivariate Statistics and Matrices in Statistics. Zeist: VSP BV and Vilnius: TEV Ltd. (VSP/TEV).
Reynolds, H.T. (1984). Analysis of Nominal Data. Thousand Oaks, CA: Sage Publications, Inc.
Rozeboom, W. (1960). The fallacy of the null-hypothesis significance test. Psychological Bulletin, 57, 416-428.
Sankaran, M. (1963). Approximations to the Non-Central Chi-Square Distribution. Biometrika, Vol. 50, No. 1/2, 199-204.
Saporta, G. & Tambrea, N. (1993). About the Selection of the Number of Components in Correspondence Analysis. In: J. Janssen and C. Skiadas (eds), Applied Stochastic Models and Data Analysis. Singapore: World Scientific.
Schmidt, L. (1996). Statistical significance testing and cumulative knowledge in psychology: implications for training of researchers. Psychological Methods, 1(2), 115-129.
Sheppard, C. (1999). How Large should my Sample be? Some Quick Guides to Sample Size and the Power of Tests. Marine Pollution Bulletin, Vol. 38, No. 6, 439-447.
Thomas, L. & Juanes, F. (1996). The importance of statistical power analysis: an example from Animal Behaviour. Anim. Behav., 52, 856-859.
Thomas, L. & Krebs, C. (1997). A Review of Statistical Power Analysis Software. Bulletin of the Ecological Society of America, Vol. 78(2). Available at: http://sustain.forestry.ubc.ca/cacb/power/review/powrev.html
Van de Geer, J. (1993). Multivariate Analysis of Categorical Data: Applications. Thousand Oaks, CA: Sage Publications, Inc.
Verma, R. & Goodale, J. (1995). Statistical power in operations management research. Journal of Operations Management, 13, 139-152.
Weller, S. & Romney, A.K. (1990). Metric Scaling: Correspondence Analysis. Newbury Park, CA: Sage.
Yates, F. (1951). The influence of Statistical Methods for Research Workers on the development of the science of statistics. Journal of the American Statistical Association, 46, 19-34.
Yoccuz, G. (1991). Use, overuse, and misuse of significance tests in evolutionary biology and ecology. Bulletin of the Ecological Society of America, 72, 106-111.
Zar, J. (1996). Biostatistical Analysis. New Jersey: Prentice-Hall International, Inc.