Stats 95 t-Tests • Single Sample • Paired Samples
Stats 95 t-Tests • Single Sample • Paired Samples • Independent Samples

t Distributions
• t distributions are used when we know the mean of the population but not the standard deviation (SD) of the population from which our sample is drawn.
• t distributions are useful when we have small samples.
• Compared with the normal (z) distribution, a t distribution is flatter and has fatter tails.
• As the sample size approaches 30, the t distribution looks like the z (normal) distribution.
• Same three assumptions:
– The dependent variable is scale.
– Random selection.
– Normal distribution.

[Figure: Fat Tails Lose Weight With Larger Sample Size]

The Robust Nature of the t Statistic
• Unfortunately, we very seldom know whether the population is normal, because usually all the information we have about a population is in our study, a sample of 10-20.
• Fortunately:
– 1) distributions in the social sciences often approximate a normal curve,
– 2) according to the Central Limit Theorem, the sample mean you have gathered is part of an approximately normal distribution of sample means, and
– 3) in practice, statisticians have found that the t test is accurate even with populations far from normal.
• The only situation in which using a t test is likely to give a seriously distorted result is when you are using a one-tailed test and the population is highly skewed.

z Statistic Versus t Statistic
• z statistic: when you know the mean and standard deviation of a population.
– E.g., a farmer picks 200,000 apples; the mean weight is 112 grams and the SD is 12 grams.
– Calculate the standard error of the sample mean.
• t statistic: when you do not know the mean and standard deviation of the population.
– E.g., a farmer picks 30 out of his 200,000 apples and finds the sample has a mean of 112 grams.
– Calculate the estimate of the standard error of the sample mean.

Scenarios When You Would Use a Single-Sample t Test
• A newspaper article reported that the typical American family spent an average of $81 for Halloween candy and costumes last year.
A sample of N = 16 families this year reported spending a mean of M = $85, with s = $20. What statistical test would we use to determine whether these data indicate a significant change in holiday spending?
• Many companies that manufacture lightbulbs advertise their 60-watt bulbs as having an average life of 1000 hours. A cynical consumer bought 30 bulbs and burned them until they failed. He found that they burned for an average of M = 1233 hours, with a standard deviation of s = 232.06. What statistical test would this consumer use to determine whether the average burn time of lightbulbs differs significantly from that advertised?

Difference Between Calculating z Statistic and t Statistic
• z statistic (population standard deviation known):
\[ \sigma_M = \frac{\sigma}{\sqrt{N}}, \qquad z = \frac{M - \mu_M}{\sigma_M} \]
• t statistic (population standard deviation estimated from the sample):
\[ s = \sqrt{\frac{\sum (X - M)^2}{N - 1}}, \qquad s_M = \frac{s}{\sqrt{N}}, \qquad t = \frac{M - \mu_M}{s_M} \]
– \( s \): standard deviation of a sample; estimates the population standard deviation.
– \( s_M \): standard error of the sample mean; estimates the standard deviation of the distribution of sample means.
– \( t \): t statistic for the single-sample t test.

Estimating Population from a Sample
• Main difference between t tests and z scores: use the standard deviation of the sample to estimate the standard deviation of the population.
• How? Subtract 1 from the sample size! (N - 1 is called the degrees of freedom.)
\[ SD = \sqrt{\frac{\sum (X - M)^2}{N}} \qquad \text{versus} \qquad s = \sqrt{\frac{\sum (X - M)^2}{N - 1}} \]
• Use the degrees of freedom (df) to find the cutoff in the t distribution table.

[Table: t Distribution Table]

Example of Single Sample t Test
• The mean emission of all engines of a new design needs to be below 20 ppm if the design is to meet new emission requirements. Ten engines are manufactured for testing purposes, and the emission level of each is determined.
• Data: 15.6, 16.2, 22.5, 20.5, 16.4, 19.4, 16.6, 17.9, 12.7, 13.9
• Do the data supply sufficient evidence to conclude that this type of engine meets the new standard, assuming we are willing to risk a Type I error (a false alarm: rejecting the null hypothesis when it is true) with a probability of 0.01?
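The single-sample formulas above can be sketched in Python for the ten emission readings. This is only an illustration using the standard library: the variable names are illustrative rather than taken from the slides, and 20 ppm is used as the comparison mean from the problem statement.

```python
import math

# Emission readings (ppm) for the ten test engines, from the example.
emissions = [15.6, 16.2, 22.5, 20.5, 16.4, 19.4, 16.6, 17.9, 12.7, 13.9]
mu = 20.0  # comparison mean under the null hypothesis (ppm)

n = len(emissions)
df = n - 1                                    # degrees of freedom
mean = sum(emissions) / n                     # M
ss = sum((x - mean) ** 2 for x in emissions)  # sum of squared deviations
s = math.sqrt(ss / df)                        # sample SD (divides by N - 1)
s_m = s / math.sqrt(n)                        # estimated standard error of the mean
t = (mean - mu) / s_m                         # single-sample t statistic

print(f"M = {mean:.2f}, s = {s:.2f}, s_M = {s_m:.3f}, df = {df}, t = {t:.2f}")
```

Computed without intermediate rounding, the standard error comes out near 0.943; dividing the rounded s = 2.98 by the square root of 10 gives 0.942. Either way t rounds to -3.00.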
• Step 1: Assumptions: the dependent variable is scale, randomization, normal distribution.
• Step 2: State H0 and H1:
– H0: Mean emissions are equal to (or greater than) 20 ppm.
– H1: Mean emissions are less than 20 ppm (one-tailed test).
• Step 3: Determine characteristics of the sample:
– Mean: M = 17.17
– Standard deviation of sample: \( s = \sqrt{\sum (X - M)^2 / (N - 1)} = 2.98 \)
– Standard error of sample: \( s_M = s / \sqrt{N} = 0.942 \)
• Step 4: Determine cutoff:
– df = N - 1 = 10 - 1 = 9
– t statistic cutoff = -2.822
• Step 5: Calculate t statistic:
\[ t = \frac{M - \mu_M}{s_M} \]
• Sample summary: M = 17.17, s = 2.98, \( s_M \) = 0.942; t statistic cutoff = -2.822.
• Step 5: Calculate t statistic:
\[ t = \frac{M - \mu_M}{s_M} = \frac{17.17 - 20}{0.942} = -3.00 \]
• Step 6: Decide (draw it): t = -3.00 falls beyond the cutoff of -2.822, so we decide to reject the null hypothesis.

Paired Sample t Test
• The paired-samples test is used in a kind of research design called repeated measures (aka within-subjects design), commonly used in before-after designs.
• Comparing a mean of difference scores to a distribution of means of difference scores:
– Population of measures at Time 1 and Time 2
– Population of differences between measures at Time 1 and Time 2
– Population of mean differences between measures at Time 1 and Time 2
– (Whew!)

Paired Sample t Test
• Single-sample:
– Single observation from each participant.
– Each observation is independent of those of the other participants.
– Comparing a mean score to a distribution of mean scores.
• Paired-sample:
– Two observations from each participant.
– The second observation is dependent upon the first, since they come from the same person.
– Comparing a mean of difference scores to a distribution of means of difference scores.
– (I don’t make this stuff up)

Paired Sample t Test
• A distribution of scores.
• A distribution of differences between scores.
• Central Limit Theorem revisited: if you plot the means of randomly sampled observations, the plot will approach a normal distribution. This is true for scores and for differences between scores.
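The Central Limit Theorem claim for difference scores can be checked with a small simulation. The sketch below is an illustration only: the skewed (exponential) population of difference scores, its mean of 10, the sample size of 30, and the seed are all assumptions chosen for the demo, not values from the slides.

```python
import math
import random
import statistics

random.seed(95)  # arbitrary seed so the sketch is repeatable

POP_MEAN = 10.0  # hypothetical mean difference; an exponential's SD equals its mean
N = 30           # observations per sample
REPS = 4000      # number of samples drawn


def difference_score():
    """One hypothetical before/after difference from a highly skewed population."""
    return random.expovariate(1 / POP_MEAN)


# Draw many samples of difference scores and keep each sample's mean.
sample_means = [
    statistics.fmean([difference_score() for _ in range(N)])
    for _ in range(REPS)
]

grand_mean = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_se = POP_MEAN / math.sqrt(N)  # sigma / sqrt(N)

print(f"mean of sample means = {grand_mean:.2f} (population mean {POP_MEAN})")
print(f"SD of sample means   = {sd_of_means:.2f} (sigma/sqrt(N) = {expected_se:.2f})")
```

Even though individual difference scores are strongly skewed here, the sample means cluster around the population mean with a spread close to sigma divided by the square root of N. A histogram of `sample_means` would look roughly bell-shaped.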
Difference Between Calculating Single-Sample t and Paired-Sample t Statistic
• Single-sample t statistic:
– Standard deviation of a sample: \( s = \sqrt{\sum (X - M)^2 / (N - 1)} = \sqrt{SS / (N - 1)} \)
– Standard error of a sample: \( s_M = s / \sqrt{N} \)
– t statistic for single-sample t test: \( t = (M - \mu_M) / s_M \)
• Paired-sample t statistic:
– Standard deviation of sample differences: \( s = \sqrt{\sum (D_{X-Y} - M_{X-Y})^2 / (N - 1)} = \sqrt{SS / (N - 1)} \), where \( D_{X-Y} \) is each difference score and \( M_{X-Y} \) is their mean
– Standard error of sample differences: \( s_M = s / \sqrt{N} \)
– t statistic for paired-sample t test: \( t = (M_{X-Y} - \mu_{X-Y}) / s_M \)

Paired Sample t Test Example
• We need to know if there is a difference in the salary for the same job in Boise, ID, and LA, CA. The salaries of 6 employees at the 25th percentile in the two cities are given.

Profession          Boise     Los Angeles
Executive Chef      53,047    62,490
Genetic Counselor   49,958    58,850
Grants Writer       41,974    49,445
Librarian           44,366    52,263
School Teacher      40,470    47,674
Social Worker       36,963    43,542

• Six steps of hypothesis testing for a paired-samples test
• Step 1: Define populations, comparison distribution, and assumptions:
– Pop. 1: Jobs in Boise
– Pop. 2: Jobs in LA
– The comparison distribution will be a distribution of mean differences. It will be a paired-samples test because every job sampled contributes two scores, one in each condition.
– Assumptions: the dependent variable is scale; we do not know if the distribution is normal, so we must proceed with caution; the jobs are not randomly selected, so we must proceed with caution.
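The paired-sample computation for the six salaries can be sketched in Python as follows. This is an illustration, not the slides' own procedure: the variable names are invented for the sketch, each difference is taken as Boise minus Los Angeles (an assumed sign convention), and the null hypothesis is assumed to be a mean difference of 0.

```python
import math

# 25th-percentile salaries for the same six jobs in each city.
boise = [53047, 49958, 41974, 44366, 40470, 36963]
los_angeles = [62490, 58850, 49445, 52263, 47674, 43542]

# One difference score per job (assumed convention: Boise minus Los Angeles).
diffs = [x - y for x, y in zip(boise, los_angeles)]

n = len(diffs)
df = n - 1
m_diff = sum(diffs) / n                      # mean of the difference scores
ss = sum((d - m_diff) ** 2 for d in diffs)   # sum of squared deviations
s = math.sqrt(ss / df)                       # SD of the difference scores
s_m = s / math.sqrt(n)                       # standard error of the mean difference
t = (m_diff - 0) / s_m                       # null hypothesis: mean difference is 0

print(f"M = {m_diff:.2f}, SS = {ss:.2f}, s = {s:.2f}, s_M = {s_m:.2f}, t = {t:.2f}")
```

With this sign convention the mean difference and t are negative. Reversing the subtraction (Los Angeles minus Boise) flips both signs but does not change the two-tailed decision.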
• Step 3: Determine the characteristics of the comparison distribution (mean, standard deviation, standard error):

Profession          Boise (X)   Los Angeles (Y)   X-Y       (X-Y) - M    Squared deviation
Executive Chef      53,047      62,490            -9,443    -1,528.67    2,336,821.78
Genetic Counselor   49,958      58,850            -8,892    -977.67      955,832.11
Grants Writer       41,974      49,445            -7,471    443.33       196,544.44
Librarian           44,366      52,263            -7,897    17.33        300.44
School Teacher      40,470      47,674            -7,204    710.33       504,573.44
Social Worker       36,963      43,542            -6,579    1,335.33     1,783,115.11

– Mean of the X-Y differences: M = -7,914.33
– Sum of squares (sum of the squared deviations): SS = 5,777,187.33
– \( s = \sqrt{SS / (N - 1)} = \sqrt{5{,}777{,}187.33 / 5} = 1074.91 \)
– \( s_M = s / \sqrt{N} = 1074.91 / \sqrt{6} = 438.83 \)

Paired Sample t Test Example
• We need to know if there is a difference in the salary for the same job in Boise, ID, and LA, CA.
• Step 4: Determine critical cutoff:
– df = N - 1 = 6 - 1 = 5
– The t cutoffs for 5 df, p < .05, two-tailed, are -2.571 and 2.571.
• Step 5: Calculate t statistic:
\[ t = \frac{M_{X-Y} - \mu_{X-Y}}{s_M} = \frac{-7914.33 - 0}{438.83} = -18.04 \]
• Step 6: Decide: t = -18.04 is beyond the cutoff of -2.571, so we reject the null hypothesis.

Independent t Test
• Compares the difference between the means of two independent groups.
• The comparison is of a difference between means to a distribution of differences between means:
– Population of measures for Group 1 and Group 2
– Sample means from Group 1 and Group 2
– Population of differences between sample means of Group 1 and Group 2

Independent t Test
• Paired-sample:
– Two observations from each participant.
– The second observation is dependent upon the first, since they come from the same person.
– Comparing a mean difference to a distribution of mean difference scores.
• Independent t test:
– Single observation from each participant, from two independent groups.
– The observation from the second group is independent of the first, since they come from different subjects.
– Comparing the difference between two means to a distribution of differences between mean scores.
Independent t Test: Steps
• Step 1: \( s_X^2 = \frac{\sum (X - M_X)^2}{N_X - 1} \)
• Step 2: \( s_Y^2 = \frac{\sum (Y - M_Y)^2}{N_Y - 1} \)
• Step 3: \( df_{total} = df_X + df_Y \)
• Step 4: \( s_{pooled}^2 = \frac{df_X}{df_{total}} s_X^2 + \frac{df_Y}{df_{total}} s_Y^2 \)
• Step 5: \( s_{M_X}^2 = \frac{s_{pooled}^2}{N_X}, \qquad s_{M_Y}^2 = \frac{s_{pooled}^2}{N_Y} \)
• Step 6: \( s_{Difference}^2 = s_{M_X}^2 + s_{M_Y}^2, \qquad s_{Difference} = \sqrt{s_{Difference}^2} \)
• Step 7: t statistic for an independent-samples t test: \( t = \frac{M_X - M_Y}{s_{Difference}} \)
• Confidence interval:
\[ (M_X - M_Y)_{lower} = -t(s_{Difference}) + (M_X - M_Y)_{sample}, \qquad (M_X - M_Y)_{upper} = t(s_{Difference}) + (M_X - M_Y)_{sample} \]
• Effect size: \( d = \frac{M_X - M_Y}{s_{pooled}} \)

Independent t Test
• Similar to previous steps, except it takes more time to calculate the estimate of the standard error, called the pooled estimate of the standard error.
• Must calculate the pooled variance, a weighted average of the estimates of the variance from both samples:
\[ s_{M_X}^2 = \frac{s_{pooled}^2}{N_X}, \qquad s_{M_Y}^2 = \frac{s_{pooled}^2}{N_Y} \]

Summary of Formulas
• Standard deviation of a sample (estimates the population standard deviation): \( s = \sqrt{\frac{\sum (X - M)^2}{N - 1}} \)
• Standard error of a sample (estimates the standard deviation of the distribution of sample means): \( s_M = \frac{s}{\sqrt{N}} \)
• Degrees of freedom for the single-sample t test and the paired-sample t test: \( df = N - 1 \)
• t statistic for single-sample t test: \( t = \frac{M - \mu_M}{s_M} \)
• t statistic for paired-sample t test: \( t = \frac{M_{X-Y} - \mu_{X-Y}}{s_M} \)
• t statistic for independent-samples t test: \( t = \frac{M_X - M_Y}{s_{Difference}} \)
• Variance for a sample: \( s^2 = \frac{\sum (X - M)^2}{N - 1} \)
• Degrees of freedom for independent-samples t test: \( df_{total} = df_X + df_Y \)
• Pooled variance (a weighted average of the variance from Variable X and Variable Y): \( s_{pooled}^2 = \frac{df_X}{df_{total}} s_X^2 + \frac{df_Y}{df_{total}} s_Y^2 \)
• Variance for a distribution of means for independent-samples t test: \( s_{M_X}^2 = \frac{s_{pooled}^2}{N_X}, \quad s_{M_Y}^2 = \frac{s_{pooled}^2}{N_Y} \)
• Variance for a distribution of differences between means: \( s_{Difference}^2 = s_{M_X}^2 + s_{M_Y}^2 \)
• SD of the distribution of differences between means: \( s_{Difference} = \sqrt{s_{Difference}^2} \)

[Figure: Difference, X1, X2]
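The pooled-variance steps for the independent-samples t test can be sketched in Python as below. The two groups are made-up numbers for illustration only (the slides give no data for this test), and the function name `independent_t` and its returned keys are hypothetical.

```python
import math
import statistics


def independent_t(x, y):
    """Independent-samples t test via the pooled-variance steps (a sketch)."""
    var_x = statistics.variance(x)            # s_X^2, divides by N_X - 1
    var_y = statistics.variance(y)            # s_Y^2, divides by N_Y - 1
    df_x, df_y = len(x) - 1, len(y) - 1
    df_total = df_x + df_y
    # Pooled variance: each sample variance weighted by its share of df.
    pooled = (df_x / df_total) * var_x + (df_y / df_total) * var_y
    var_mx = pooled / len(x)                  # variance of the distribution of means, X
    var_my = pooled / len(y)                  # variance of the distribution of means, Y
    s_diff = math.sqrt(var_mx + var_my)       # SD of differences between means
    mean_diff = statistics.fmean(x) - statistics.fmean(y)
    return {
        "df_total": df_total,
        "pooled_variance": pooled,
        "s_difference": s_diff,
        "t": mean_diff / s_diff,
        "d": mean_diff / math.sqrt(pooled),   # effect size
    }


# Hypothetical scores for two independent groups (not from the slides).
group_x = [4, 6, 5, 7, 8]
group_y = [7, 9, 8, 10, 11, 9]
result = independent_t(group_x, group_y)
print(result)
```

The resulting t would then be compared with the cutoff for `df_total` degrees of freedom from a t distribution table, just as in the single-sample and paired-sample examples.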