Chapter 7. Estimates and Sample Size
Transcription
Chapter 7. Estimates and Sample Size
Chapter 7. Estimates and Sample Size Chapter Problem: How do we interpret a poll about global warming? Pew Research Center Poll: “From what you’ve read and heard, is there a solid evidence that the average temperature on earth has been increasing over the past a few decades, or not?” • 1501 randomly selected US adults responded • 70% yes Important issues related to this poll: • How can the poll be used to estimate the percentage of US adults who believe that the earth is getting warmer? • How accurate is the result of 70% likely to be? • Is the sample size too small? (1501/225,139,000 = 0.0007% of the population) • Does the method of selecting the people to be polled have much of an effect on the results? 1 7.1 Review and Preview Review • Descriptive statistics • Two major applications of inferential statistics o Estimate the value of a population parameter o Test some claim (or hypothesis) about a population This chapter focus on • Estimating important population parameters: o Proportion o Mean o Variance • Methods for determining the sample sizes necessary to estimate the above parameters 2 7.2 Estimating a Population Proportion 1. 2. 3. 4. 5. 6. 7. Introduction Why Do We Need Confidence Interval? Interpreting a Confidence Interval Critical Values Margin of Error Determining Sample Size Better-Performing Confidence Intervals 3 7.2 Estimating a Population Proportion 1. Introduction Definition – a point estimate is a single value (or point) used to approximate a population parameter. We will see, the sample proportion pˆ is the best point estimate of the population proportion p. Proportion, probability, and percent. We focus on proportion. We can also work with probabilities and percentages. 4 7.2 Estimating a Population Proportion 1. Introduction e.g.1 Proportion of Adults Believing in Global warming. In the chapter problem we noted that in a Pew Research Center poll, 70% of 1501 randomly selected adults in the U.S. believe in global warming, so that sample proportion is pˆ = 0.70. Find the best point estimate of the proportion of all adults in the U.S. who believe in global warming. Solution. The best point estimate of p is 0.70 5 7.2 Estimating a Population Proportion 2. Why Do We Need Confidence Interval? Definition – a confidence interval (or interval estimate) is a range (or an interval) of values used to estimate the true value of a population parameter. A confidence interval is sometimes abbreviated as CI. Confidence interval • Associated with a confidence level • Confidence level is the probability 1 (or area 1 – ) • is the complement of confidence level • E.g. for 95% confidence level, = 0.05 6 7.2 Estimating a Population Proportion 2. Why Do We Need Confidence Interval Definition – The confidence level is the probability 1 – that is the proportion of times that the confidence interval actually does contains the population parameter, assuming that the estimation process is repeated a large number of times. Example Here is an example of CI found later (in e.g.3), which is based on the sample data of 1501 adults polled, with 70% of them saying that they believe in global warming: The 95% confidence interval estimate of the population proportion p is 0.677 < p < 0.723 7 7.2 Estimating a Population Proportion 3. Interpreting a Confidence Interval There are correct interpretation and incorrect creative interpretations Correct: “we are 95% confident that the interval from 0.677 to 0.723 actually does contain the true value of p.” This means that if we were to select many different samples of size 1501 and construct the corresponding CI, 95% of them would actually contain the value of population proportion p. (success rate) Wrong: “There is a 95% chance that the true value of p will fall between 0.677 and 0.723.” 8 7.2 Estimating a Population Proportion 3. Interpreting a Confidence Interval – – At any specific point of time, the population has a fixed and constant value p, and a CI constructed from a sample either contain p or does not. A confidence level of 95% tells us that the process that we are using will, in the long run, result in confidence interval limits that contain the true population proportion 95% of time. 0.85 0.80 p = 0.75 0.70 0.65 This confidence interval Does not contain p = 0.75. Figure 7-1 CI’s from 20 Different Samples 9 7.2 Estimating a Population Proportion 4. Critical Values Notation for Critical Value z 2 and z 2 2 2 z/2 z/2 Figure 7-2 Critical Value z in 2 the Standard Normal Distribution Definition – a critical value is the number on the borderline separating sample statistics that are likely to occur from those that are unlikely to occur. The number z 2 is a critical value that is a z score with the property that it separates an area of /2 in the right tail of the standard normal distribution. 10 7.2 Estimating a Population Proportion 4. Critical Values e.g.2 Find the critical value z 2 corresponding to a 95% confidence level. Confidence level 95% 2 0.025 z/2 2 0.025 z/2 The total area to the left of this boundary is 0.975 Figure 7-3 Finding z 2 for a 95% CL By Table A-2, we find that z 2 1.96 11 7.2 Estimating a Population Proportion 4. Critical Values We have the following brief table: Confidence Level 90% 95% 99% 0.10 0.05 0.01 Critical Value, z 2 1.646 1.96 2.575 12 7.2 Estimating a Population Proportion 5. Margin of Error Definition – The margin of error E, is the maximum likely (with probability 1 – ) difference between the observed sample proportion pˆ and the true value of the population p. Formula 7 –1 E z / 2 pˆ qˆ n Margin of error for proportion 13 7.2 Estimating a Population Proportion 5. Margin of Error Requirements for this section 1) 2) 3) Simple random sample Conditions for binomial distribution are satisfied. At least 5 success and at least 5 failures. (With p and q unknown, we estimate their values using sample proportion) Notation for Proportions p = population proportion x pˆ = sample proportion of x successes in a sample of size n n qˆ 1 pˆ n = number of sample values 14 7.2 Estimating a Population Proportion 5. Margin of Error Confidence Interval (or Interval Estimate) for the Population Proportion p ˆ qˆ p ˆ E p p ˆ E Where E z / 2 p n The CI is often expressed in the following equivalent formats: pˆ E or pˆ E, pˆ E Round-off Rule for CI estimate of p Round the CI limits for p to three significant digits. 15 7.2 Estimating a Population Proportion 5. Margin of Error Procedure for Constructing a Confidence Interval for p. 1) Verify that the requirements are satisfied. 2) Find the critical value z / 2 that corresponds to the desired CL. pˆ qˆ n 3) Evaluate the margin or error: E z / 2 4) Obtain the CI: 5) Round the resulting CI limits to three significant digits. ˆ E p p ˆ E p 16 7.2 Estimating a Population Proportion 5. Margin of Error e.g.3 Constructing a Confidence Interval: Poll Results. In the chapter problem we noted that a Pew Research poll of 1501 randomly selected U.S. adults showed that 70% of the respondents believe in global warming. n = 1501 and pˆ = 0.70. a. Find the margin of error corresponding to 95% CL b. Find the 95% CI estimate of the population proportion p. c. Based on the results, can we conclude that the majority of adults believe in global warming? d. Assuming that you are a newspaper … Sol. Requirements met. 17 7.2 Estimating a Population Proportion 5. Margin of Error e.g.3 (Chapter Problem, TI-84 demo). a. z / 2 1.96 , pˆ = 0.70, qˆ= 0.30, and n = 1501, so E z / 2 pˆ qˆ (0.70)(0.30) 1.96 0.023183 n 1501 b. 0.7 – 0.023183 = 0.676817, 0.7 + 0.023183 = 0.723183, so CI: 0.677 < p < 0.723 or in interval notation CI = (0.677, 0.723) c. To interpret the results Based on the CI obtained in part b, it does appears that the proportion of adults who believe in global warming is greater than 50%, so we can safely conclude that the majority of adults believe in global warming. Because the limits of 0.677 and 0.723 are likely to contain the true population, it appears that the proportion is a value greater than 0.5. 18 7.2 Estimating a Population Proportion 5. Margin of Error Analyzing Polls e.g.3 Continue. When analyzing results from polls, we should consider the following: 1) The sample should be simple random sample, not an inappropriate sample (such as a voluntary response sample) 2) The confidence level should be provided. (It is often 95%, but media reports often neglect to identify it) 3) The sample size should be provided. (It is usually provided by the media, but not always) 4) Except for relatively rare cases, the quality of the poll results depends on the sampling method and the size of the sample, but the size of the population is usually not a factor. 19 7.2 Estimating a Population Proportion 6. Determining Sample Size Sample Size for Estimating Proportion p Requirement: simple random sample. When an estimate pˆ is known: Formula 7-2 When no estimate pˆ is known: Formula 7-3 2 z / 2 pˆ qˆ n E2 n z / 2 2 0.25 E2 Round-off rule for Determining Sample Size If the computed sample size n is not a whole number, round it up to the whole number. 20 7.2 Estimating a Population Proportion 6. Determining Sample Size e.g.4 How many adults use the Internet? Assume that a manager for E-Bay wants to determine the current percentage of U.S. adults who now use the Internet. How many adults must be surveyed in order to be 95% confident that the sampling percentage is in error by no more than 3 percentage points? a. Use the result from a Pew Research Center poll: in 2006, 73% of U.S. adults used the Internet b. Assume that we have no prior information. a. ˆ 0.73, qˆ 0.27, E = 0.03 = 0.05, z / 2 1.96, p n = 842 samples b. n = 1068 samples 21 7.2 Estimating a Population Proportion 6. Determining Sample Size Interpretation. To be 95% confident that our sample percentage is within 3 percentage points of the true percentage for all adults, we should obtain a simple random sample of 1068 adults. By comparing this result to the sample size of 842 found in part (a), we can see that if we have no knowledge of a prior study, a larger sample is required to achieve the same result as when the value of pˆ can be estimated. 22 7.2 Estimating a Population Proportion 6. Determining Sample Size Finding the Point Estimate and E from a Confidence Interval. Point estimate of p: (upper confidence limit) (lower confidence limit) pˆ 2 Margin of Error: (upper confidence limit) (lower confidence limit) E 2 e.g.5 Given that a confidence interval = (0.58, 0.81), Find pˆ and E. 0.58 0.81 pˆ 0.695 2 0.81 0.58 E 0.115 2 23 7.2 Estimating a Population Proportion 7. Better-Performing Confidence Intervals • Skip Using Technology for CI’s • Statdisk • TI-84 (1-PropZInt, STAT/TESTS/1-PropZInt) 24 7.3 Estimating a Population Mean: Known 1. 2. 3. 4. Introduction Confidence Interval Determining Sample Size Required to Estimate Using Technology 25 7.3 Estimating a Population Mean: Known 1. Introduction • Goal: present methods for using sample data to find a point estimate and CI estimate of population mean. Requirements: • – – – • Simple random sample The population SD is known Normal distribution or n > 30. Known population standard deviation 26 7.3 Estimating a Population Mean: Known 1. Introduction The sample mean x is the best point estimate of the population mean. • Unbiased estimator of population mean • For many populations, the distribution of sample mean x tends to be more consistent (with less variation) than the distributions of other sample statistics. 27 7.3 Estimating a Population Mean: Known 2. Confidence Interval Margin of error for estimating mean ( known) is: E z / 2 n Confidence Interval Estimate for the Population Mean (with known) is: xE xE Where E z / 2 n or xE or ( x E, x E ) 28 7.3 Estimating a Population Mean: Known 2. Confidence Interval Procedure for Constructing a Confidence Interval for p. 1) Verify that the requirements are satisfied. 2) Find the critical value z / 2 that corresponds to the desired CL. 3) Evaluate the margin or error: E z / 2 4) Obtain the CI: 5) Rounding: next page. n xˆ E xˆ E 29 7.3 Estimating a Population Mean: Known 2. Confidence Interval Round-Off Rule for CI Used to Estimate 1. When use the original set of data to construct a confidence interval, round the CI limits to one more decimal place than is used for the original set of data. 2. When use the summary statistics (n, x , s) to construct CI, round the CI limits to the same number of decimal places used for the sample mean. 30 7.3 Estimating a Population Mean: Known 2. Confidence Interval Interpreting a Confidence Interval – similar to 7.2 for proportion be careful to interpret CI correctly. After obtaining a CI estimate of the population mean , such as 95% CI of 164.49 < < 180.61. Correct: “we are 95% confident that the interval from 164.49 to 180.61 actually does contain the true value of .” This means that if we were to select many different samples of the same size and construct the corresponding CI, 95% of them would actually contain the value of population mean . Wrong: “There is a 95% chance that the true value of will fall between 164.49 and 180.61.” 31 7.3 Estimating a Population Mean: Known 2. Confidence Interval e.g.1 Weights of Men. n = 40, x = 172.55 lb, and = 26 lb. Using a 95% CL, find the following (TI-84): a. The margin of error E, and CI for . b. What do the results suggest about the mean weight of 166.3 lb that was used to determine the safe passenger capacity in 1996? 26 Sol. a. E z / 2 1.96 8.0574835 n 40 CI is x E x E: 172.55 – 8.0574835 < < 172.55 + 8.0574835 164.49 < < 180.61 (round to two decimals as in x ) 32 7.3 Estimating a Population Mean: Known 2. Confidence Interval Interpretation. The confidence interval from part (a) could also be expressed in 172.55 ± 8.06 or as (164.49, 180.61). Based on the sample with n = 40 and x = 173.55 and assumed to be 26, the confidence interval for the population means is 164.49 lb < < 180.61 lb and this interval has a 0.95 confidence level. This means that if we were to select many different simple random samples of 40 men and construct the confidence intervals as we did here, 95% of them would contain the value of the population mean 33 7.3 Estimating a Population Mean: Known 2. Confidence Interval Rationale for CI. By Central Limit Theorem, sample means (of sample size n) are normally distributed with mean and variance / n . In the equation z = ( x x ) / x , replace x with / n , replace x with , then solve for to get xz n 34 7.3 Estimating a Population Mean: Known Determining Sample Size Required to Estimate 3. Sample Size for Estimating mean Formula 7-4 z / 2 n E 2 where z / 2= critical z score based on the desired CL E = desired margin of error = population standard deviation Comments – – It does not depends on population size N. Round up. 35 7.3 Estimating a Population Mean: Known Determining Sample Size Required to Estimate 3. Dealing with Unknown When Finding Sample Size – – – – – Use range rule of thumb: range / 4 Start the sample collection process without knowing and, using the first several values, calculate the sample standard deviation s and use it in places of . The estimated value of can then be improved as more sample data are obtained, and the sample size can be refined accordingly. Estimating the value of by using the results of some other study that was done earlier. Use other know results. E.g. IQ tests are typically designed so that the mean is 100 and SD is 15. For a population of statistics professors, IQ is more than 100 and SD is less than 15. But we can assume SD is 15 to play it safe. My comments: see next section!! 36 7.3 Estimating a Population Mean: Known 3. Determining Sample Size Required to Estimate e.g.2 IQ Scores of Statistics students. Want to estimate the mean IQ score for the population of statistics students. Q: how many statistics students must be randomly selected for IQ tests if we want 95% confidence that the sample mean is within 3 IQ points of the population mean? Sol. z / 2 = 1.96 (since = 0.05) E=3 = 15 (see previous page) Now z / 2 1.96(15) n 96.04 97 samples 3 E 2 2 37 7.3 Estimating a Population Mean: Known 3. Determining Sample Size Required to Estimate Interpretation. Among the thousands of statistics students, we need to obtain a simple random sample of at least 97 of them. Then we get their IQ scores. With a simple random sample of only 97 statistics students, we will be 95% confident that the sample mean x is within 3 IQ points of the true population mean . 38 7.3 Estimating a Population Mean: Known Using Technology for CI’s • See e.g.2, TI-84 Demo • Statdisk (analysis) 39 7.4 Estimating a Population Mean: Not Known 1. 2. 3. 4. 5. Introduction Confidence Interval Choosing the Appropriate Distribution Finding Point Estimate and E from a CI Using Technology 40 7.4 Estimating a Population Mean: Not Known 1. Introduction is not known Use Student t distribution (instead of normal distribution) Method in this section is realistic, practical, and often used Requirement • • • • – – Simple random sample From normally distributed population or n > 30 41 7.4 Estimating a Population Mean: Not Known 1. Introduction Just like in section 7.3, we have: The sample mean x is the best point estimate of the population mean. 42 7.4 Estimating a Population Mean: Not Known 2. Confidence Interval Before finding the CI, first introduce Student t distribution. Student t Distribution If a population has normal distribution, then the distribution of t x s n is a Student t Distribution for all sample of size n. A student t distribution often referred to as a t distribution, is used to find critical values denoted by t / 2 . 43 7.4 Estimating a Population Mean: Not Known 2. Confidence Interval Definition The number of degrees of freedom for a collection of sample data is the number of sample values that can vary after certain restrictions have been imposed on all data values. For application in this section, degree of freedom = n – 1 Example 1. Finding a Critical Value. n = 7 samples is selected from a normally distributed population. Find t / 2 with 95% CL Sol. df = 7 – 1 = 6 Table A-3 6th row for 95% CL, = 0.05, find the column listing values for an area of 0.05 in two tails t / 2 = 2.447 (can do TI-84 demo). 44 7.4 Estimating a Population Mean: Not Known 2. Confidence Interval Margin of Error E for the Estimate of (with not known) E t / 2 s n Confidence Interval for the Estimate (with not known) xE xE 45 7.4 Estimating a Population Mean: Not Known 2. Confidence Interval Procedure for Constructing a CI for (with not known) Step 1. Verify the requirements are met. (simple random sample, either data is from a normal distribution or n > 30) Step 2. Given CL, let df = n –1, use A-3 to find the critical value t / 2 that corresponds to the desired CL s Step 3. Evaluate the margin of error E t / 2 n Step 4. Find x . Then the CI is xE xE Step 5. Original data: add one decimal place; Summary: same. 46 7.4 Estimating a Population Mean: Not Known 2. Confidence Interval e.g.2 Constructing a Confidence Interval: Garlic for Reducing Cholesterol. Use summary data, n = 49, x 0.4 , s = 21.0 and t / 2 2.009. The margin of error (95% CL) is: E 2.009 The CI is 0.4 – 6.027 < < 0.4 + 6.027, or 21.0 6.027. 49 –5.6 < < 6.4 Interpretation. Because the confidence interval contains the value of 0, it is possible that the mean of the changes in LDL cholesterol is equal to 0, suggesting that the garlic treatment did not affect the LDL cholesterol levels. It does not appear that the garlic 47 treatment is effective in lowering LDL cholesterol. 7.4 Estimating a Population Mean: Not Known 2. Confidence Interval Important Properties of the Student t Distribution 1. 2. 3. 4. 5. Different for different sample size (unlike standard normal distribution) Same bell shape as the standard normal distribution. But reflect greater variability (with wider distributions) that is expected with small samples. (n = 3, wider, n = 12 narrower) Mean = 0 (just like standard normal distribution) SD varies with sample size. It is always greater than 1 (SD > 1). Unlike standard normal distribution where SD = 1. As sample size gets larger, Student t distribution gets closer to standard normal distribution. 48 7.4 Estimating a Population Mean: Not Known 3. Choosing the Appropriate Distribution Figure 7 – 6 Choosing Between z and t start Is known yes no ? yes Is the Population normally Distributed ? yes yes no Is n > 30 Is the Population normally Distributed ? no yes Use the normal distribution Is n > 30 no ? ? Z no Use nonparametric or bootstrapping methods t Use the t distribution Use nonparametric or bootstrapping methods 49 7.4 Estimating a Population Mean: Not Known 3. Choosing the Appropriate Distribution Method Use normal (z) distribution Use t distribution Use a nonparametric method or bootstrapping Conditions Known and normally distributed population Or Known and n > 30 not known and normally distributed population Or not known and n > 30 Population is not normally distribution and n 30 Note: 1. Criteria for deciding whether the population is normally distributed: Population need not be exactly norm, but is should appear to be somewhat symmetric with one mode no outliers 2. Sample size > 30: This is a commonly used guideline, but sample sizes of 15 to 30 are adequate if the population appears to have a distribution that is not far from being normal and there is no outliers. For some population distributions that are extremely far from normal, the 50 sample size might need be much larger than 30. 7.4 Estimating a Population Mean: Not Known 3. Choosing the Appropriate Distribution Example 3. Choosing Distributions. To construct CI for . Use the given data to determine whether the margin or error E should be calculated using z / 2(from normal distribution), t / 2 (from Student t distribution) or neither? a. n = 9, x 75, s = 15, and population has a normal distribution t / 2 b. n = 5, x 20, s = 2, and population has a very skewed distribution Neither c. n = 12, x 98.6, = 0.6, and population has a normal distribution z / 2 d. n = 75, x 98.6, = 0.6, and distribution is skewed e. n = 75, x 98.6, s = 0.6, and distribution is extremely skewed z / 2 t / 2 51 7.4 Estimating a Population Mean: Not Known 3. Choosing the Appropriate Distribution e.g.4 CI for Alcohol in Video Games. Twelve different video games showing substance use were observed. The duration times (in second) of alcohol use were recorded, with the times listed below. The design of the study justifies the assumption that the sample can be treated as a simple random sample. 84, 14, 583, 50, 0, 57, 207, 43, 178, 0, 2, 57 Use this data to construct 95% CI estimate of , the mean duration time that the video showed the use of alcohol. Sol. Next page. 52 7.4 Estimating a Population Mean: Not Known 3. Choosing the Appropriate Distribution (TI-84 demo, Tinterval) to get CI = (1.8, 210.7) Frequency e.g.4 Continue. Check requirements: not normal, n > 30 not satisfied. 5 Requirement are not satisfied. 4 3 2 1 600 Interpretation. Because the requirements 0 100 200 not satisfied, we don’t have the 95% confidence that (1.8 sec, 210.7 sec) interval contains the true population mean. We should use some other method. 53 7.4 Estimating a Population Mean: Not Known 4. Finding Point Estimate and E from a CI Given the upper and lower limits of a CI, find the mean and margin of error: (upper confidence limit) (lower confidence limit) Point estimate of : x 2 E Margin of error: (upper confidence limit) (lower confidence limit) 2 e.g.5 Weights of Garbage. Data Set 22 in Appendix B. Given 95% CI (24.28, 30.607) Find the mean and margin of error. Sol. x 30.607 24.28 27.444 lb 2 E 30.607 24.28 3.164 lb 2 54 7.4 Estimating a Population Mean: Not Known 5. Using Technology • TI-84 demo – – summary data original data • Statdisk 55 7.5 Estimating a Population Variance 1. Chi-Square Distribution 2. Estimator of 2 56 7.5 Estimating a Population Variance 1. Chi-Square Distribution • Given a normally distribution population with variance 2, randomly select a sample of size n • Compute the sample variance s2, • The sample statistic 2 = (n – 1)s2/2 has sampling distribution called chi-square distribution. Chi-Square Distribution (n 1) s 2 2 Formula 7 – 5 2 Where n = sample size s2 = sample variance 2 = population variance 57 7.5 Estimating a Population Variance 1. Chi-Square Distribution • • • • • Denote the chi-square by 2 Pronounced “kigh square” To find the critical value, refer to A-4 Chi-square distribution depends on degree of freedom Degree of freedom df = n – 1 58 7.5 Estimating a Population Variance 1. Chi-Square Distribution Properties of the Distribution of the 2 Statistic • Not symmetric, as df increases, it becomes more symmetric • Value of 2 can be positive or zero, but never negative • The 2 distribution is different for different df. As df increases, the 2 distribution approaches to normal distribution (just the t distribution) 59 7.5 Estimating a Population Variance 1. Chi-Square Distribution e.g.1 Finding Critical Values of 2. A Simple random sample of ten voltage levels is obtained. Construction of a confidence level for the population standard deviation requires that the left and right critical values of 2 corresponding to the confidence level of 95% and a sample size of n = 10. Find the critical value of 2 separating an area of 0.025 in the left tail, and find the critical value of 2 separating an area of 0.025 in the right tail. Sol. Df = 10 – 1 = 9. (next page) 60 7.5 Estimating a Population Variance e.g.1 continue (can also use Statdisk, Excel, and Minitab) Sol. 61 7.5 Estimating a Population Variance 2. Estimator of 2 Point estimator • The sample variance s2 is the best point estimate of the population variance 2 • The sample standard deviation s is commonly used as a point estimate of (even though it is a biased estimate) 62 7.5 Estimating a Population Variance 2. Estimator of 2 Interval estimator Requirements: 1) 2) The sample is a simple random sample The population must have normally distributed values CI for the population variance 2: (n 1) s 2 R2 2 (n 1) s 2 L2 CI for the population standard deviation: (n 1) s 2 2 R (n 1) s 2 L2 63 7.5 Estimating a Population Variance 2. Estimator of 2 Procedure for constructing a confidence interval for 2 or . Step 1. verify that the requirements are satisfied Step 2. Using n – 1 degree of freedom, Table A-4 to find critical values L2 and R2 corresponding to the desired confidence level. Step 3. Find the CI: (n 1) s 2 R2 2 (n 1) s 2 L2 Step 4. Find CI for . Take the square root on all three places. Step 5. Rounding. Data: add one decimal place; summary: same. 64 7.5 Estimating a Population Variance 2. Estimator of 2 e.g.2 Confidence Interval for Home Voltage. Sample of 10: 123.3, 123.5, 123.7, 123.4, 123.6, 123.5, 123.5, 123.4, 123.6, 123.8 Step 1. requirements: Histogram is normal. Simple random sample. 65 7.5 Estimating a Population Variance 2. Estimator of 2 e.g.2 Confidence Interval for Home Voltage. Sample of 10: 123.3, 123.5, 123.7, 123.4, 123.6, 123.5, 123.5, 123.4, 123.6, 123.8 2 Step 2. For 95% CL, double sided. Found that L = 2.700, and R2 = 19.023. Step 3. 2 (10 1)(0.15) 2 ( 10 1 )( 0 . 15 ) 2 19.023 2.700 or, 0.010645 < 2 < 0.075000 Step 4. Take square root: 0.10 volt < < 0.27 volt Interpretation: Based on this result, the confidence interval is (0.10, 0.27). The limits of the CI is 0.10 and 0.27. But the format s E cannot be used because the CI does not have s at its center. 66 7.5 Estimating a Population Variance 2. Estimator of 2 Rationale for the CI. If we obtain simple random samples of size n from a population with variance 2, there is a probability of 1 – that the statistic (n – 1)s2/2 will fall between the critical values of L2 and R2 , i.e. there is a probability that the following is true: (n 1) s 2 2 R2 and (n 1) s 2 2 L2 Combine both in equality into one inequality, we have: (n 1) s 2 R2 2 (n 1) s 2 L2 67 7.5 Estimating a Population Variance 2. Estimator of 2 Determine the sample size. • Harder than in the case of mean and proportion. – Use Table 7-2 – Statdisk/Analysis/Sample size determination/Estimate St Dev • Excel, Minitab, TI-84 does not provide sample size. 68 7.5 Estimating a Population Variance 2. Estimator of 2 69 7.5 Estimating a Population Variance 2. Estimator of 2 e.g.3. Finding the sample size for estimating . We want to estimate the standard deviation for all voltage levels in a home. We want to be 95% confident that our estimate is within 20% of the true value of . How large should the sample size be? Assume that the population is normally distributed. Ans. at least 48 samples 70