Week 6 - School of Mathematical Sciences
STATS 1000 / STATS 1004 / STATS 1504 Statistical Practice 1
Lecture notes, Week 6
Jonathan Tuke
School of Mathematical Sciences, University of Adelaide
Semester 1, 2015

[W6-1]

Sampling distributions

[W6-2]

Parameters and Statistics

A parameter is a number that is calculated from the population. In statistical practice, the value of a parameter is not known because we cannot examine the entire population.

A statistic is a number that is calculated from a sample. The value of a statistic can be computed directly from the sample data. We often use a statistic to estimate an unknown parameter.

[W6-3]

Population and Sample

[Figure: a population, described by the parameters µ, σ, p, and a sample drawn from it, described by the statistics $\bar{x}$, s, $\hat{p}$.]

[W6-4]

Statistical Estimation

The process of statistical inference involves using information from a sample to draw conclusions about a wider population.

Different random samples yield different statistics. We need to be able to describe the sampling distribution of the possible values of a statistic in order to perform statistical inference.

[W6-5]

Sampling Variability

Different random samples yield different statistics. This basic fact is called sampling variability: the value of a statistic varies in repeated random sampling.

To make sense of sampling variability, we ask, "What would happen if we took many samples?"

[W6-6]

Sampling Distributions

If we took every one of the possible samples of a certain size, calculated the sample mean for each, and graphed all of those values, we'd have a sampling distribution.

The population distribution of a variable is the distribution of values of the variable among all individuals in the population.

The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
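The definition of a sampling distribution can be made concrete on a population small enough to enumerate. The sketch below is only an illustration: the five-value population and the sample size of 2 are made-up assumptions, not data from these notes. It lists every possible sample, computes the mean of each, and tabulates the resulting values.

```python
from itertools import combinations
from statistics import mean

# Hypothetical tiny population (illustrative values, not from the notes).
population = [2, 4, 6, 8, 10]
sample_size = 2

# Every possible sample of the given size, drawn without replacement.
all_samples = list(combinations(population, sample_size))

# The statistic of interest: the mean of each sample.
sample_means = [mean(sample) for sample in all_samples]

# The sampling distribution is the distribution of these values.
for value in sorted(set(sample_means)):
    print(value, "occurs", sample_means.count(value), "time(s)")

print("number of possible samples:", len(all_samples))
print("population mean:", mean(population))
print("mean of all sample means:", mean(sample_means))
```

For a realistic population the number of possible samples is far too large to enumerate, which is why simulation with many random samples is normally used instead.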
[W6-7]

Example: Weights of Cats

3.72 4.16 4.06 4.68 3.36 4.16 4.13 4.20 4.23 5.08
4.54 4.02 4.19 3.93 3.64 4.46 3.92 4.92 4.07 4.33
4.05 4.54 4.72 3.85 4.04 5.24 5.04 4.17 3.37 4.67
4.40 4.11 4.25 3.28 3.93 4.09 4.43 4.25 3.86 3.45
3.47 4.49 3.96 3.85 3.63 4.11 3.10 4.02 3.96 3.57
4.23 4.17 3.57 4.59 3.90 4.41 4.39 4.02 4.05 4.06
4.14 3.56 3.30 4.08 4.11 5.27 3.51 3.55 4.76 3.36
3.16 4.07 4.04 3.05 4.08 5.32 4.07 4.77 4.33 4.16
2.84 4.52 3.63 2.85 3.65 4.40 4.26 4.44 3.88 2.37
3.64 4.35 4.47 4.53 4.39 3.49 4.27 5.11 4.61 3.78

Population mean: µ = 4 kg

[W6-8]

Example: Weights of Cats

[Figure: density plot of the cat weights; x-axis: weight (2 to 6), y-axis: density.]

[W6-9]

Example: Sample 10 cats

Weights: 3.26, 3.32, 3.51, 4.56, 3.86, 3.47, 4.05, 3.24, 3.59, 3.84

Sample mean: $\bar{x} = 3.67$ kg

[W6-10]

Example: 1000 samples each of 10 cats

[Figure: histogram of the 1000 sample means; x-axis: means (3.3 to 4.5), y-axis: count.]

[W6-11]

Example: 1000 samples each of 10 cats

[Figure: histogram of the 1000 sample means drawn on the wider scale 2 to 6; x-axis: means, y-axis: count.]

[W6-12]

Law of Large Numbers

Draw independent observations at random from any population with finite mean µ. Decide how accurately you would like to estimate µ. As the number of observations drawn increases, the mean $\bar{x}$ of the observed values eventually approaches the mean µ of the population as closely as you specified and then stays that close.

[W6-13]

[Figure: a population, described by the parameters µ, σ, p, and a sample drawn from it, described by the statistics $\bar{x}$, s, $\hat{p}$.]

[W6-14]

Law of Large Numbers

Cat weight example

[W6-15]

Mean and Standard Deviation of a Sample Mean

Mean of a sampling distribution of a sample mean

There is no tendency for a sample mean to fall systematically above or below µ, even if the distribution of the raw data is skewed. Thus, the mean of the sampling distribution equals the population mean µ, so the sample mean $\bar{x}$ is an unbiased estimate of µ.
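The repeated-sampling idea and the law of large numbers in the cat example can be explored by simulation. This is a rough sketch under assumptions of my own, not the code behind the slides: weights are simulated from a Normal population with mean 4 kg and standard deviation 0.5 kg, and the helper name `draw_sample` is hypothetical.

```python
import random
from statistics import fmean

random.seed(1)  # fixed seed so the run is repeatable

# Assumed population model (illustrative): Normal, mean 4 kg, sd 0.5 kg.
MU, SIGMA = 4.0, 0.5

def draw_sample(n):
    """Return n simulated cat weights from the assumed population."""
    return [random.gauss(MU, SIGMA) for _ in range(n)]

# Sampling variability: 1000 samples of 10 cats, one mean per sample.
sample_means = [fmean(draw_sample(10)) for _ in range(1000)]
mean_of_means = fmean(sample_means)
print("smallest sample mean:", round(min(sample_means), 2))
print("largest sample mean: ", round(max(sample_means), 2))
print("mean of sample means:", round(mean_of_means, 3))

# Law of large numbers: the mean of the first n observations
# settles near MU as n grows.
big_sample = draw_sample(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, "observations, running mean:", round(fmean(big_sample[:n]), 3))
```

Individual sample means differ from one sample to the next, but their average sits close to the assumed µ, which is what unbiasedness describes.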
[W6-16]

[Figure: histogram of a variable x; x-axis: x (0 to 15), y-axis: count.]

[W6-17]

[Figure: sample means plotted against sample number for 100 samples; x-axis: sample, y-axis: means.]

[W6-18]

Mean and Standard Deviation of a Sample Mean

Standard deviation of a sampling distribution of a sample mean

The standard deviation of the sampling distribution measures how much the sample statistic varies from sample to sample. It is smaller than the standard deviation of the population by a factor of $\sqrt{n}$, that is, it equals $\sigma/\sqrt{n}$.

Averages are less variable than individual observations.

[W6-19]

[Figure: density curve of the sample mean, centred at µ with standard deviation $\sigma/\sqrt{n}$; x-axis: $\bar{x}$, y-axis: f(x).]

[W6-20]

Example: Cats' weights

It is known that the population standard deviation of cats' weights is σ = 0.5 kg.

If we take a sample of 10 cats, what is the standard deviation of the sample mean?

If we take a sample of 30 cats, what is the standard deviation of the sample mean?

[W6-21]

Distribution of the Sample Mean

The average of independent Normal random variables is also Normally distributed.

[W6-22]

Example: Cats' weights

It is known that the population standard deviation of cats' weights is σ = 0.5 kg and the population mean is µ = 4 kg.

If we take a sample of 10 cats, what is the distribution of the sample mean?

If we take a sample of 30 cats, what is the distribution of the sample mean?

[W6-23]

Central Limit Theorem

Most population distributions are not Normal. What is the shape of the sampling distribution of sample means when the population distribution isn't Normal?

It is a remarkable fact that, as the sample size increases, the distribution of sample means begins to look more and more like a Normal distribution!

When the sample is large enough, the distribution of sample means is very close to Normal, no matter what shape the population distribution has, as long as the population has a finite standard deviation.
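The $\sigma/\sqrt{n}$ rule and the central limit theorem can both be checked numerically. The sketch below first applies the rule to the cats' weights (σ = 0.5 kg, n = 10 and n = 30), then simulates sample means from a skewed population. The exponential population with mean 1, the 20000 repetitions, and the helper names `sd_of_sample_mean` and `skewness` are illustrative assumptions, not taken from the notes.

```python
import math
import random
from statistics import fmean, pstdev

random.seed(2)  # fixed seed so the run is repeatable

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the mean of n independent observations."""
    return sigma / math.sqrt(n)

# Cats' weights: sigma = 0.5 kg, samples of 10 and of 30 cats.
cat_sd_10 = sd_of_sample_mean(0.5, 10)  # about 0.158 kg
cat_sd_30 = sd_of_sample_mean(0.5, 30)  # about 0.091 kg
print("n = 10:", round(cat_sd_10, 3), "kg;  n = 30:", round(cat_sd_30, 3), "kg")

def skewness(values):
    """Average cubed z-score; near 0 for a symmetric distribution."""
    m, s = fmean(values), pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

# Assumed skewed population: exponential with mean 1 and sd 1.
POP_SD = 1.0
results = {}
for n in (5, 10, 30):
    means = [fmean([random.expovariate(1.0) for _ in range(n)])
             for _ in range(20_000)]
    results[n] = {"sd": pstdev(means), "skew": skewness(means)}
    print("n =", n,
          "| simulated sd:", round(results[n]["sd"], 3),
          "| sigma/sqrt(n):", round(sd_of_sample_mean(POP_SD, n), 3),
          "| skewness:", round(results[n]["skew"], 2))
```

If the sample means of 10 or 30 cats are Normally distributed, as the notes state for Normal data, they would have mean 4 kg and the standard deviations computed above. In the simulation, the skewness of the sample means shrinks towards 0 as n grows, which is the central limit theorem at work on a non-Normal population.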
[W6-24]

Central Limit Theorem

Draw an SRS of size n from a population with mean µ and finite standard deviation σ. The central limit theorem (CLT) says that when n is large, the sampling distribution of the sample mean $\bar{x}$ is approximately Normal:

$$\bar{x} \mathrel{\dot\sim} N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

[W6-25]

CLT

[Figure: histogram of a variable x; x-axis: x (0 to 15), y-axis: count.]

[W6-26]

CLT: Sample size n = 5

[Figure: histogram of sample means; x-axis: means, y-axis: count.]

[W6-27]

CLT: Sample size n = 10

[Figure: histogram of sample means; x-axis: means, y-axis: count.]

[W6-28]

CLT: Sample size n = 30

[Figure: histogram of sample means; x-axis: means, y-axis: count.]

[W6-29]

Confidence Intervals

[W6-30]

The problem

Consider that we have a variable with a Normal distribution. Assume that we know the population standard deviation σ, but we do not know the population mean µ. How can we estimate the value of µ?

[W6-31]

Point estimate

If we want a single point estimate of the population mean µ, we can take a simple random sample (SRS) from the population and then use the sample mean $\bar{x}$ to estimate the population mean.

[W6-32]

Example

Consider the case of estimating the population mean amount of active ingredient in manufactured tablets. You know that the population standard deviation is 0.5 mg. You have taken a random sample of 10 tablets and measured the following amounts of active ingredient (mg):

29.57, 29.82, 30.45, 30.87, 30.46, 29.41, 29.03, 31.05, 30.11, 30.59

What is your estimate of the population mean active ingredient?

[W6-33]

Confidence intervals

What if you would like a range for the population mean rather than a point estimate? Use a confidence interval.

[W6-34]

Confidence intervals

A confidence interval gives a range of values for the population mean that we are confident about to a level C%, usually 95%.
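Returning to the tablet example, the point estimate is simply the sample mean of the ten measurements. A minimal sketch follows; the variable names are my own, and the standard deviation of the sample mean is included on the assumption that σ = 0.5 mg is known, as the example states.

```python
import math
from statistics import mean

# Amounts of active ingredient (mg) from the tablet example.
tablets = [29.57, 29.82, 30.45, 30.87, 30.46,
           29.41, 29.03, 31.05, 30.11, 30.59]
sigma = 0.5  # known population standard deviation (mg)

# Point estimate of the population mean: the sample mean.
point_estimate = mean(tablets)

# How much the sample mean varies from sample to sample: sigma / sqrt(n).
sd_of_mean = sigma / math.sqrt(len(tablets))

print("point estimate:", round(point_estimate, 3), "mg")
print("standard deviation of the sample mean:", round(sd_of_mean, 3), "mg")
```

The estimate comes out at about 30.14 mg, with the sample mean varying by about 0.16 mg from sample to sample.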
[W6-35]

What do we mean by 95% confident?

[Figure: plot for 100 repeated samples; x-axis: means (29.6 to 30.4).]

[W6-36]

Confidence interval

All confidence intervals we construct will have a similar form:

estimate ± critical value × standard error

[W6-37]

Confidence interval for population mean with known population standard deviation

In the case of a population mean for a Normal distribution with a known population standard deviation, we have

• estimate: the sample mean $\bar{x}$
• critical value: we will get this from a Normal distribution and denote it as $z^*$
• standard error: this is the standard deviation of the sample mean, $\sigma/\sqrt{n}$

[W6-38]

How to calculate $z^*$

Consider confidence level 95%.

[Figure: standard Normal density curve with central area 0.95 and area 0.025 in each tail; x-axis: x (−4 to 4), y-axis: y.]

[W6-39]

How to calculate $z^*$

• Calculate $a = \frac{1 - C/100}{2}$
• Enter NORMINV(a,0,1) into Excel
• Remove the minus sign and you have $z^*$

For example, for a 95% confidence interval, $z^* = 1.96$.

[W6-40]

Example

Calculate the 95% confidence interval for the tablet example.

[W6-41]

Interpretation of confidence interval

We are <CI level> confident that the true <parameter> of <population> lies between <lower> and <upper> <units>.

What is the interpretation of the 95% CI in this case?

[W6-42]

Summary

The formula for calculating the C% confidence interval for the population mean of a Normal distribution with known population standard deviation is

$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$$

[W6-43]
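The summary formula can be applied to the tablet example in a few lines. This is a sketch rather than the method used in the lecture: it replaces the Excel call NORMINV(a,0,1) with `NormalDist.inv_cdf` from Python's standard library, and the function names `z_critical` and `confidence_interval` are hypothetical.

```python
import math
from statistics import NormalDist, mean

def z_critical(confidence_pct):
    """Critical value z* for a C% interval, following the steps in the notes."""
    a = (1 - confidence_pct / 100) / 2   # area in one tail
    return -NormalDist(0, 1).inv_cdf(a)  # inverse Normal CDF, minus sign removed

def confidence_interval(data, sigma, confidence_pct=95):
    """Interval for the mean: estimate ± critical value × standard error."""
    estimate = mean(data)
    standard_error = sigma / math.sqrt(len(data))
    margin = z_critical(confidence_pct) * standard_error
    return estimate - margin, estimate + margin

# Tablet example: amounts of active ingredient (mg), known sigma = 0.5 mg.
tablets = [29.57, 29.82, 30.45, 30.87, 30.46,
           29.41, 29.03, 31.05, 30.11, 30.59]

z_star = z_critical(95)
lower, upper = confidence_interval(tablets, sigma=0.5, confidence_pct=95)
print("z* =", round(z_star, 2))
print("95% CI:", round(lower, 2), "to", round(upper, 2), "mg")
```

With these data the interval runs from about 29.83 to 30.45 mg. Following the interpretation template, one reading would be: we are 95% confident that the true mean amount of active ingredient in the manufactured tablets lies between 29.83 and 30.45 mg.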