Completely Randomized Experimental Design
• Simplified as Completely Randomized Design (CRD): replicated treatments are assigned randomly to experimental units (EUs: pots, plots, individuals, containers, chambers, refrigerators, etc.)
• It is the simplest valid design. The resulting data can be analyzed, in order of feasibility, as:
  - two-sample t-test, when having
    1. one continuous response variable
    2. one categorical explanatory variable with only 2 levels
  - one-way ANOVA, when having
    1. one continuous response variable
    2. one categorical explanatory variable with 2 or more levels
  - simple linear regression, when having
    1. one continuous response variable
    2. one continuous explanatory variable
  - multi-way ANOVA, when having
    1. one continuous response variable
    2. two or more categorical explanatory variables, each with 2 or more levels
  - multiple linear regression, when having
    1. one continuous response variable
    2. two or more continuous explanatory variables

• CRD is used when EUs are assumed to be homogeneous with respect to all factors that can affect the response but are not the focus of the study
  - most practical for lab work, small plots, chambers, etc. Even in such cases the assumption of homogeneous EUs may not always hold
    -- e.g., studying the effect of light on bacterial growth using refrigerators as EUs
• CRD is simple in design and data analysis
  - a simple design is not a bad design
• In CRD, the loss of information due to missing values (replicates) is smaller than in other designs (why?)
  - appropriate for long-term studies, human studies, and studies with severe treatments
• CRD provides the maximum df for the experimental error, improving the detection of a treatment effect
  - particularly useful when replication is low
• CRD is inefficient if the variability among experimental units is known to be large and can be blocked in certain ways

Procedure
• Treatments are assigned randomly to EUs (pots, plots, individuals, containers, chambers, refrigerators, etc.)
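Random assignment of treatments to EUs can be sketched in a few lines. The example below is only an illustration under stated assumptions: three hypothetical treatment labels (A, B, C), three replicates each, and EUs numbered 1 to 9. The function name `randomize_crd` and the fixed seed are choices made here for reproducibility, not part of the original procedure.

```python
import random


def randomize_crd(treatments, reps, seed=None):
    """Assign every treatment x replicate combination to one EU at random."""
    rng = random.Random(seed)
    # List the t x r combinations in a fixed order.
    combos = [(trt, rep) for trt in treatments for rep in range(1, reps + 1)]
    # Attach one random number to each combination, then rank by that number.
    keyed = [(rng.random(), combo) for combo in combos]
    keyed.sort(key=lambda pair: pair[0])
    # EU 1 receives the combination with the smallest random number, and so on.
    return {eu: combo for eu, (_, combo) in enumerate(keyed, start=1)}


# Hypothetical labels and seed, used only for illustration.
layout = randomize_crd(["A", "B", "C"], reps=3, seed=1)
for eu, (trt, rep) in layout.items():
    print(f"EU {eu}: treatment {trt}, replicate {rep}")
```

Changing or omitting the seed gives a different layout; recording the seed makes the assignment auditable later.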
• Throughout the study, the EUs should be processed in random order at any stage where the order of processing may affect the response (e.g.?)
• Both equal and unequal replication can be employed
  - equal replication (a balanced design) is often preferred

A simple example
• Objective: to test the effect of 3 fertilization levels on height (Ht.) growth of houseplants grown in 9 independent pots
• Research hypothesis: the 3 different levels affect plant height differently (can causality be inferred?)
• Treatment design, fertilization at:
  1. 0 level (control)
  2. 5 units
  3. 10 units
Experimental unit: EU = ?
Total # of EUs: n = ?
Treatment levels: t = ?
Replicates: r = ?

Randomization:
• If pots are fixed (i.e., immobile; e.g., plots of land, chambers, etc.):
  1. Number them arbitrarily (e.g., 1 to 9, having 3 replicates)
  2. Order the treatment levels and their replicates
  3. Generate 9 random numbers (stat. book, calculator, etc.) and assign each random number, in sequence, to one of the t×r combinations
  4. Order the 9 (t×r) combinations based on their random numbers and assign them to the 9 EUs in order
• If the EUs are not fixed (i.e., EUs are mobile):
  - place them randomly, and then do as above

Do EUs seem homogeneous?

  T. level   0   0   0   5   5   5   10  10  10
  Rep        1   2   3   1   2   3   1   2   3
  Random #
  Rank

  6. After a number of days, measure plant height in random order
  7. The experimental design part is over, but before analyzing the data make sure to recognize or set the following:
     - the response variable?
     - the explanatory variable?
     - statistical hypotheses: H0: ...  HA: ...
     - α level?
     - statistical test?

• May we use a series of t-tests on pairwise treatment means?
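One way to explore this question numerically: t treatments give t(t-1)/2 pairs of means, and if each pair is tested at level α, the chance of at least one false rejection across k tests is at most 1 - (1 - α)^k when the tests are treated as independent. That independence is an assumption of the sketch (pairwise tests that share a mean are not strictly independent), and the function name is illustrative.

```python
from math import comb


def experimentwise_error(n_treatments, alpha):
    """Return (number of pairs, upper bound on the experimentwise Type I error).

    Assumes the pairwise tests are independent, which gives the maximum rate.
    """
    n_pairs = comb(n_treatments, 2)
    return n_pairs, 1 - (1 - alpha) ** n_pairs


for t in (2, 3, 4, 5, 6, 10):
    pairs, rate = experimentwise_error(t, alpha=0.05)
    print(f"{t} treatments, {pairs} pairs: experimentwise error <= {rate:.2f}")
```

With 3 treatments and α = 0.05 the bound is already close to three times the preset level, and it grows quickly with the number of treatments.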
If A, B, C represent the 3 treatments (levels 0, 5, 10), there are 3 possible pairs of means to compare:

  Pair      H0                 tested at level
  A vs B    $\mu_A = \mu_B$    $\alpha$
  A vs C    $\mu_A = \mu_C$    $\alpha$
  B vs C    $\mu_B = \mu_C$    $\alpha$

• There are 3 separate tests
• The overall (experimentwise) $\alpha$ will exceed the preset value
• For $\alpha = 0.05$, it will increase to a maximum of $1 - (1 - \alpha)^{3} \approx 0.14$ (when all comparisons are independent), where the exponent 3 is the number of pairs being compared

• As the number of treatments increases, the overall $\alpha$ (the probability of incorrectly rejecting H0 for at least one of the pairs) increases drastically:

  # of treatments     2      3      4      5      6      10
  # of pairs          1      3      6      10     15     45
  pairwise α level         experimentwise error
  0.10                0.10   0.27   0.47   0.65   0.79   0.99
  0.05                0.05   0.14   0.26   0.40   0.54   0.90
  0.01                0.01   0.03   0.06   0.10   0.14   0.36

• Thus, it is improper to use a series of t-tests for unplanned comparisons of more than 2 means, because the preset α would underestimate the overall (experimentwise) Type I error
• There is a need for another kind of test: an F-test

1. Assume that the plant heights at the end of the study are:
   5.9, 5.9, 5.9, 5.9, 5.9, 5.9, 5.9, 5.9, 5.9 (cm)
   - any variation in the data?
   - reason for such a result?
2. Now, consider the actual data as below:
   6.8, 7.1, 6.6, 7.2, 7.2, 7.1, 7.1, 6.9, 7.0
   - any variation in the data?
   - how much is it?
   - possible cause(s) for such variation? In other words, what could be the source(s) of such variation?
   - average magnitude of such variations?
   - relative importance of average variations?
3. Now, consider the actual data arranged based on the treatment design:

  Treatment    Ht. (cm), replicate    Treatment   Treatment
  level        1      2      3        total       mean
  0            6.8    6.7    6.6      20.1        6.70
  5            7.1    6.9    7.0      21.0        7.00
  10           7.2    7.2    7.1      21.5        7.17

  treatment levels: t = 3
  replicates: r = 3
  total pots: n = tr = 9
  grand mean: $\bar{y}_{..}$ = 6.96

Summation Notation for a One-Way Classification

  Treat.    Response, replicate
  level     1        2        3        .    r        Total     Mean
  1         y_11     y_12     y_13     .    y_1r     y_1.      $\bar{y}_{1.}$
  2         y_21     y_22     y_23     .    y_2r     y_2.      $\bar{y}_{2.}$
  3         y_31     y_32     y_33     .    y_3r     y_3.      $\bar{y}_{3.}$
  .         .        .        .        .    .        .         .
  t         y_t1     y_t2     y_t3     .    y_tr     y_t.      $\bar{y}_{t.}$
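The dot notation can be made concrete with the plant data: $y_{i.}$ is the total of row i, $\bar{y}_{i.}$ is the mean of row i, and $\bar{y}_{..}$ is the mean of all observations. The sketch below is a minimal illustration; the dictionary layout and variable names are choices made here, not part of the slides.

```python
# Plant heights (cm) by fertilization level, three replicates per level.
heights = {
    0: [6.8, 6.7, 6.6],
    5: [7.1, 6.9, 7.0],
    10: [7.2, 7.2, 7.1],
}

# y_i. : treatment-level totals (sum over the replicate index j).
totals = {level: sum(obs) for level, obs in heights.items()}
# ybar_i. : treatment-level means.
means = {level: totals[level] / len(obs) for level, obs in heights.items()}

# n = t * r for a balanced design; ybar_.. is the grand mean.
n = sum(len(obs) for obs in heights.values())
grand_total = sum(totals.values())
grand_mean = grand_total / n

for level in heights:
    print(f"level {level}: total = {totals[level]:.1f}, mean = {means[level]:.2f}")
print(f"n = {n}, grand mean = {grand_mean:.2f}")
```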
In this notation: treatment levels: t; replications: r; total # of EUs: n = tr; grand mean: $\bar{y}_{..}$

4. Actual data arranged based on the treatment design and the classification notation:

  t \ r     1      2      3      $y_{i.}$    $\bar{y}_{i.}$
  0         6.8    6.7    6.6    20.1        6.70
  5         7.1    6.9    7.0    21.0        7.00
  10        7.2    7.2    7.1    21.5        7.17

  treat. levels: t = 3
  replicates: r = 3
  total pots: n = tr = 9
  grand mean: $\bar{y}_{..}$ = 6.96

- quantifying the magnitude and relative importance of the sources of variation constitutes ANOVA
- in most cases (fixed models), ANOVA is used to test hypotheses regarding means
- in some cases (random models), ANOVA is used to test hypotheses regarding variances

Hypothesis
H0: $\mu_1 = \mu_2 = \mu_3 = \mu$
  - stat. model: $y_{ij} = \mu + \varepsilon_{ij}$ (reduced model)
  - graphical presentation [figure not reproduced: $\bar{y}_1$, $\bar{y}_2$, $\bar{y}_3$]
HA: not all µs are equal
  - stat. model: $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$ (full model)
  - graphical presentation [figure not reproduced: $\mu_1$, $\mu_2$, $\mu_3$]

$y_{ij}$: jth observation (rep.) from the ith treatment level (j = 1, 2, …, r; i = 1, 2, …, t)
$\mu$: estimated by the grand mean
$\tau_i$: effect of the ith treat. level (deviation of the mean of the ith level from $\mu$)
$\varepsilon_{ij}$: random deviation of $y_{ij}$ from its expected value (called error or residual)

• Under the H0 model ($y_{ij} = \mu + \varepsilon_{ij}$):
  - assuming no systematic error at any stage, any value (cell) deviates from $\bar{y}_{..}$ due to a random error: $y_{ij} - \bar{y}_{..} = e_{ij}$
    -- the source of variation is unknown (chance?)
    -- the sum of all individual squared deviations, $\sum (y_{ij} - \bar{y}_{..})^2 = \sum e_{ij}^2$, constitutes the total sum of squares (SSTotal), which can be averaged [SSTotal / (n-1)] as the mean squared deviation (MSerror), also known as?
• Under the HA model ($y_{ij} = \mu + \tau_i + \varepsilon_{ij}$):
  - assuming no systematic error, any cell deviates from $\bar{y}_{..}$ due to a combination of a treatment effect and a random error: $y_{ij} - \bar{y}_{..} = e_{ij} + \tau_i$
    -- random variation is not the only source of variation
    -- the sum of all individual squared deviations, $\sum (y_{ij} - \bar{y}_{..})^2$ (SSTotal), is no longer equal to $\sum e_{ij}^2$, and averaging it does not provide MSerror

So, the magnitude of variation due to treatment and random noise should be quantified and compared (how?)
to test the significance of the treatment effect

SSTotal:
  - the total sum of squares of the data
    -- definition formula: $\sum_i \sum_j (y_{ij} - \bar{y}_{..})^2$
    -- computation formula: $\sum_i \sum_j y_{ij}^2 - \left[(\sum_i \sum_j y_{ij})^2 / n\right]$
SStreatment:
  - the SS due to treatment differences and random error, also called SSbetween, SSamong, …
    -- definition formula: $\sum_i \left[r(\bar{y}_{i.} - \bar{y}_{..})^2\right]$
    -- computation formula: $\sum_i (y_{i.}^2 / r) - \left[(\sum_i \sum_j y_{ij})^2 / n\right]$
       where: r = # of reps, $\bar{y}_{i.}$ = treatment-level mean, $y_{i.}$ = treatment-level total
  - SStreatment accounts for a portion of SSTotal
  - SStreatment / (t-1): variance due to treatment and random error, denoted by MStreatment
SSerror:
  - the SS due to random variation, also called SSwithin or SSresidual
    -- definition formula: $\sum_i \sum_j (y_{ij} - \bar{y}_{i.})^2$
    -- computation formulas:
       1. $\sum_i \sum_j y_{ij}^2 - \sum_i (y_{i.}^2 / r)$
       2. $(S_1^2 + S_2^2 + \dots + S_t^2)(r - 1)$
       3. SSTotal - SStreatment

The ANOVA table can be completed in several ways:
  A. calculate SSTotal and SSerror (from the pooled variance), and get SStreatment by subtraction
  B. calculate SSTotal and SStreatment, and get SSerror by subtraction

• Ideal Conditions (Assumptions):
  1. Normality
     ANOVA is robust with regard to departures from normality, especially when replication is large
  2. Equality (homogeneity) of variances
     ANOVA is robust with regard to departures from equality of variance, especially when replication is large
  3. Independence
     A. Should be assured by the experimenter, considering the scope of inference
     B. Can be improved by
        - random sampling in observational studies
        - random assignment of treatments to EUs in experimental studies
        - setting up EUs as far apart as possible to decrease:
          a. pre-existing dependence
          b. treatment spill-over

For the plant example, calculate the following:
  1. $(\sum \sum y_{ij})^2 / n$ (called the correction factor, C)
     (6.8+6.7+6.6+7.1+6.9+7+7.2+7.2+7.1)^2 / 9 = 435.4178
  2. SSTotal = $\sum \sum y_{ij}^2 - C$
     = (6.8^2+6.7^2+6.6^2+7.1^2+6.9^2+7^2+7.2^2+7.2^2+7.1^2) - C
     = 435.8 - 435.4178 = 0.3822
  3. SStreatment = $\sum_i (y_{i.}^2 / r) - C$
     = [(20.1^2/3)+(21^2/3)+(21.5^2/3)] - C
     = 435.7533 - 435.4178 = 0.3356
  4. SSerror = SSTotal - SStreatment = 0.3822 - 0.3356 = 0.0466 (0.0467 when unrounded values are carried through, as in the SAS output)
  5. Obtain the mean squares for SStreatment and SSerror by dividing each by its df
     - complete the ANOVA table
     - reject H0 if calculated F > tabular F

ANOVA table

  Source       df     SS            MS                                    F           P
  treatment    t-1    SStreatment   MStreatment = SStreatment / (t-1)     MSt / MSe
  error        n-t    SSerror       MSerror = SSerror / (n-t)
  Total        n-1    SSTotal

Plant fertilization analysis using the GLM and MIXED Procedures of the SAS System

    options nocenter nodate ps=74 ls=74;
    data a;
      input t r ht @@;
      lines;
    0 1 6.8   0 2 6.7   0 3 6.6
    5 1 7.1   5 2 6.9   5 3 7
    10 1 7.2  10 2 7.2  10 3 7.1
    ;
    proc glm;
      class t r;
      model ht = t / ss3;
    run;

The GLM Procedure

  Class   Levels   Values
  t       3        0 5 10
  r       3        1 2 3

Dependent Variable: ht

  Source            DF   Sum of Squares   Mean Square   F      Pr > F
  Model             2    .3356            .1678         21.6   .0018
  Error             6    .0467            .0078
  Corrected Total   8    .3822

  R-Square   Coeff Var   Root MSE   ht Mean
  0.878      1.268       0.088      6.956

  Source   DF   Type III SS   MS     F      Pr > F
  t        2    .336          .168   21.6   .0018

    proc mixed data = a;
      class t r;
      model ht = t;
    run;

The Mixed Procedure
  ...
  Covariance Parameter Estimates
  Cov Parm   Estimate
  Residual   .00778

  Type 3 Tests of Fixed Effects
  Effect   Num DF   Den DF   F      Pr > F
  t        2        6        21.6   .0018

Notes:
  1. The F value is a ratio of variances, and thus it:
     • is always positive (one-sided)
     • can also be used to test equality of two variances, H0: $\sigma_1^2 = \sigma_2^2$
  2. A two-sided, two-sample t-test and a one-way ANOVA performed on the same data (when doable):
     a) yield the same P
     b) MSe from the ANOVA is the same as $S_p^2$ from the t-test
     c) $F = t^2$
  3. A significant F from a one-way ANOVA does not indicate which mean differs from which; it only indicates that at least one pair of means differs significantly
     - to find out which mean differs from which, we must perform a multiple mean comparison test (MMCT)

Types of One-way ANOVA:
• Fixed-effect Model (Model I): applicable when the levels of a factor are
  - specifically selected
  - the focus of interest
  - repeatable
  e.g., testing specific levels of a drug, fertilization, diet, temperature, or light factor
  $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, where:
    $y_{ij}$ is the jth observation from the ith treatment level
    $\mu$ is the grand mean
    $\tau_i$ is the effect of the ith treatment level (a fixed deviation of the mean of the ith level from $\mu$)
    $\varepsilon_{ij}$ is the random deviation of $y_{ij}$ from its expected value (called residual)
      $\varepsilon_{ij} = y_{ij} - \mu$ (if there is no treatment effect)
      $\varepsilon_{ij} = y_{ij} - (\mu + \tau_i)$ (if a treatment effect exists)
• Random-effect Model (Model II): applicable when performing a test to
  - make a general statement regarding the effect of a factor, with the levels of that factor chosen randomly among many possible levels
  e.g., testing different watershed types, species, locations, etc., in general, without interest in the differences among any specific levels
  $y_{ij} = \mu + A_i + \varepsilon_{ij}$, where:
    $A_i$ includes both a treatment effect and a random effect due to the randomness of the treatment levels
  - in one-way ANOVA, calculations are the same for both models (I and II), but H0 is stated differently
  - when two or more factors are involved, calculations are not the same for the two models
• Mixed-effect models (Model III): a mixture of both fixed- and random-effect factors

ANOVA table

  Source   df     SS         MS                  Expected MS (Fixed)                       Expected MS (Random)
  Treat.   t-1    SStreat.   SStreat. / (t-1)    $\sigma_e^2 + r \sum \tau_i^2 / (t-1)$    $\sigma_e^2 + r \sigma_t^2$
  Error    n-t    SSerror    SSerror / (n-t)     $\sigma_e^2$                              $\sigma_e^2$
  Total    n-1    SSTotal

Test of Assumptions
• Usually performed on residuals (why?)
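For the plant example, a residual is an observation minus the mean of its treatment level. The sketch below (variable names are illustrative) computes the residuals and checks two things: they sum to zero within each level, and their squared sum recovers the SSerror reported in the ANOVA output.

```python
# Plant heights (cm) by fertilization level, three replicates per level.
heights = {
    0: [6.8, 6.7, 6.6],
    5: [7.1, 6.9, 7.0],
    10: [7.2, 7.2, 7.1],
}

residuals = {}
for level, obs in heights.items():
    level_mean = sum(obs) / len(obs)
    # Residual = observation minus the mean of its own treatment level.
    residuals[level] = [y - level_mean for y in obs]

# The sum of squared residuals is the error (within-treatment) sum of squares.
ss_error = sum(e ** 2 for res in residuals.values() for e in res)

for level, res in residuals.items():
    rounded = [round(e, 3) for e in res]
    print(f"level {level}: residuals = {rounded}, sum = {sum(res):.6f}")
print(f"SSerror = {ss_error:.4f}")
```

Dividing `ss_error` by n - t = 6 gives the residual variance estimate that both SAS procedures report.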
• In an ANOVA setting, residuals are the original data transformed to have zero mean per treatment level, as well as across all treatment levels
• In a regression setting, residuals are the original data transformed to have zero mean across all treatment levels (and not per treatment)
• If the assumptions are met for the residuals, they are also met for the collected data

I. Independence: within and among treatments
  1. Its violation affects both the scope of inference and the significance of the test
  2. Can be improved greatly through randomization
     - by spending more resources (thought, effort, money)
  3. Should be dealt with during experimental design (if violated, there is no remedy during data analysis)
     - it is the responsibility of the researcher, not the statistician
  4. Is required for both parametric and non-parametric procedures
  5. Tests to check it are not generally available, particularly when replication is low
     - plotting the observations against a sequence, if such a sequence can be identified, may help to detect serial correlation

  A. Dependence within treatments (among replicates), which may be positive or negative
     e.g., testing the effects of several diets on the weight gain of a certain animal
     1. Positive correlation: eating in the presence of others increases feeding time (a social species?)
        a. weight gain increases by a factor other than diet
        b. within-treatment variance is underestimated
        c. the F value is overestimated
        d. spurious differences are detected
        e. Type I error is increased
     2. Negative correlation: eating in the presence of others decreases feeding time (an aggressive species?)
        a. within-treatment variance is overestimated
        b. the F value is underestimated
        c. real differences are not detected
        d. Type II error is increased
  B. Dependence among treatments, which may be positive or negative
     e.g., temporal or spatial dependency, repeated measures, etc.
     1. Compound symmetry vs circularity (will be discussed later)
  C. To ensure independence:
     1. Understand the system (biology, ecology, etc.) being investigated
     2. Prevent dependence
        a. always assume dependence can occur and find ways to avoid it
           - keep EUs / OUs separate
           - do not use the same EU / OU for more than one treatment
           - if not avoidable in some cases,
             -- make sure to use a correct model for data analysis
             -- define the population being investigated and do not extrapolate the results beyond that population (the scope of inference is decreased)

II. Normality
  1. ANOVA is robust with regard to normality due to the CLT
  2. Is usually tested on residuals across treatment levels when variances are equal
     - graphically
       -- stem-and-leaf diagram, box plots, frequency distribution, normal probability plot, residual plot
     - statistically
       -- D statistic, W statistic; these are not always reliable. Note that:
          a. for the following normally distributed values
             1.8048, -0.0799, 0.3966, -1.0833, 2.2383, -0.6242, 0.51376, -0.0866, -.5942, 0.0319, -0.7378, -0.2501, 0.6850, -0.8042, -0.74428
             the SAS Proc Univariate test of normality returns P < 0.05, rejecting normality
          b. for the numbers from 1 to 8, from a uniform distribution, the same routine returns P > 0.9, not rejecting normality
          c. the sensitivity (power to reject) of such tests increases with increasing sample size, but increased sample size also improves the normality of the means (CLT)
  3. The assumption is, in fact, about the sampling distribution of the mean within treatment levels, and this can never be tested with only one sample at each treatment level; this is where the CLT helps

III. Equality (homogeneity, homoscedasticity) of error variances calculated within treatment levels
     (i.e., $\sigma^2_{1,error} = \sigma^2_{2,error} = \dots = \sigma^2_{t,error} = \sigma^2_{error}$)
  1. ANOVA is robust with regard to departures from this assumption, particularly when the study is large (more than 5 treatments and 6 replicates) and balanced (equal sample sizes)
  2. When the study is small and unbalanced:
     A. if the larger variance is associated with the smaller sample:
        1. the F value is overestimated
        2. Type I error is underestimated (the actual rate exceeds the nominal α)
     B. if the larger variance is associated with the larger sample:
        1. the F value is underestimated
        2. Type I error is overestimated
        3. Type II error is increased (decreased power)
  3. Statistical tests of equality of variances using SAS
     A. PROC TTEST (for one explanatory variable with two levels)
        1. Folded-F method (performed by default)
     B. PROC MIXED: Null Model Likelihood Ratio Test
        with the use of the 'REPEATED' statement, and for one-way ANOVA only
  3.1. Pattern in the variances, as seen in plots of residuals versus treatment means ('Predicted' in SAS jargon)
       Note: variances might increase or decrease with the treatment means without differing significantly. Such a pattern is still detrimental to ANOVA

  [Figure, three residual plots, not reproduced:
   - Residuals vs Predicted: satisfactory
   - Residuals vs Predicted: needs log, weighted least squares, or square-root transformation
   - Residuals vs Time order: time should be included in the model]

Data Transformation
• May solve lack of normality and/or heterogeneity of variances
• Should be monotonic (retain the order of the means)
• Should be done only to solve a problem (routine transformation is not OK, unless suggested by the nature of the system being investigated: distance moved, growth)
  1. Test the original data, transform if needed, and test the transformed data. If the problem:
     A. is solved, proceed with data analysis
     B. persists, stop and think
  2. Most common transformations:
     A. square-root transformation (counts, Poisson)
     B. log transformation
     C. arc-sine transformation of percentages and proportions (binomial data)
  3. The kind of transformation should be reported, so others can compare your results with theirs
  4. The significance level based on the transformed data does not hold for the original data
  5. Means (but not SD or S^2) should be transformed back to the original scale for presentation
  6. If transformation does not help, note that in large (> 5 treatments, > 6 replicates) balanced studies, ANOVA is robust with regard to departures from these ideal conditions
     - perform the analysis; the conclusion is valid, particularly if no treatment effect is detected (why?)
  7. If the sample size is small, variances are unequal, transformation does not help, the analysis is performed, and a significant treatment effect is detected
     - use the study as a pilot study to plan a larger experiment
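Point 5 can be illustrated with a log transformation. In the sketch below the raw values are hypothetical numbers invented for illustration. The mean of the log-transformed data, transformed back with the antilog, is the geometric mean on the original scale, which generally differs from the arithmetic mean of the raw data. The antilog of the SD of the logs is a multiplicative factor rather than a standard deviation in the original units, which is one reason the SD is not back-transformed in the same way.

```python
import math
import statistics

# Hypothetical right-skewed measurements (illustration only, not from the study).
raw = [2.0, 8.0, 32.0]

# Analyze on the log scale.
logs = [math.log(value) for value in raw]
mean_log = statistics.mean(logs)
sd_log = statistics.stdev(logs)

# Back-transform the mean to the original scale for presentation.
back_transformed_mean = math.exp(mean_log)
arithmetic_mean = statistics.mean(raw)

print(f"arithmetic mean of raw data: {arithmetic_mean:.2f}")
print(f"back-transformed mean:       {back_transformed_mean:.2f}")
# exp(sd_log) is a ratio (a 'times or divided by' factor), not an SD in raw units.
print(f"exp(SD of logs):             {math.exp(sd_log):.2f}")
```

When reporting, it helps to state that the presented mean is back-transformed from the log scale, so readers do not mistake it for the arithmetic mean.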