Chapter 7. Estimates and Sample Size

Transcription

Chapter 7. Estimates and Sample Size
Chapter 7. Estimates and Sample Size
Chapter Problem: How do we interpret a poll about global warming?
Pew Research Center Poll: “From what you’ve read and heard, is there a solid
evidence that the average temperature on earth has been increasing over the past a
few decades, or not?”
• 1501 randomly selected US adults responded
• 70% yes
Important issues related to this poll:
• How can the poll be used to estimate the percentage of US adults who believe
that the earth is getting warmer?
• How accurate is the result of 70% likely to be?
• Is the sample size too small? (1501/225,139,000 = 0.0007% of the population)
• Does the method of selecting the people to be polled have much of an effect on
the results?
1
7.1 Review and Preview
Review
• Descriptive statistics
• Two major applications of inferential statistics
o Estimate the value of a population parameter
o Test some claim (or hypothesis) about a population
This chapter focus on
• Estimating important population parameters:
o Proportion
o Mean
o Variance
• Methods for determining the sample sizes necessary to estimate
the above parameters
2
7.2 Estimating a Population Proportion
1.
2.
3.
4.
5.
6.
7.
Introduction
Why Do We Need Confidence Interval?
Interpreting a Confidence Interval
Critical Values
Margin of Error
Determining Sample Size
Better-Performing Confidence Intervals
3
7.2 Estimating a Population Proportion
1. Introduction
Definition – a point estimate is a single value (or point) used to
approximate a population parameter.
We will see, the sample proportion pˆ is the best point estimate of
the population proportion p.
Proportion, probability, and percent. We focus on proportion. We
can also work with probabilities and percentages.
4
7.2 Estimating a Population Proportion
1. Introduction
e.g.1 Proportion of Adults Believing in Global warming. In the
chapter problem we noted that in a Pew Research Center poll,
70% of 1501 randomly selected adults in the U.S. believe in
global warming, so that sample proportion is pˆ = 0.70. Find the
best point estimate of the proportion of all adults in the U.S.
who believe in global warming.
Solution.
The best point estimate of p is 0.70
5
7.2 Estimating a Population Proportion
2. Why Do We Need Confidence Interval?
Definition – a confidence interval (or interval estimate) is a range
(or an interval) of values used to estimate the true value of a
population parameter. A confidence interval is sometimes
abbreviated as CI.
Confidence interval
• Associated with a confidence level
• Confidence level is the probability 1   (or area 1 – )
•  is the complement of confidence level
• E.g. for 95% confidence level,  = 0.05
6
7.2 Estimating a Population Proportion
2. Why Do We Need Confidence Interval
Definition – The confidence level is the probability 1 –  that is the
proportion of times that the confidence interval actually does
contains the population parameter, assuming that the estimation
process is repeated a large number of times.
Example Here is an example of CI found later (in e.g.3), which is
based on the sample data of 1501 adults polled, with 70% of them
saying that they believe in global warming:
The 95% confidence interval estimate of the population
proportion p is 0.677 < p < 0.723
7
7.2 Estimating a Population Proportion
3. Interpreting a Confidence Interval
There are correct interpretation and incorrect creative interpretations
Correct: “we are 95% confident that the interval from 0.677 to 0.723
actually does contain the true value of p.” This means that if we
were to select many different samples of size 1501 and
construct the corresponding CI, 95% of them would actually
contain the value of population proportion p. (success rate)
Wrong: “There is a 95% chance that the true value of p will fall
between 0.677 and 0.723.”
8
7.2 Estimating a Population Proportion
3. Interpreting a Confidence Interval
–
–
At any specific point of time, the population has a fixed and
constant value p, and a CI constructed from a sample either
contain p or does not.
A confidence level of 95% tells us that the process that we
are using will, in the long run, result in confidence interval
limits that contain the true population proportion 95% of
time.
0.85
0.80
p = 0.75
0.70
0.65
This confidence interval
Does not contain p = 0.75.
Figure 7-1 CI’s from 20
Different Samples
9
7.2 Estimating a Population Proportion
4.
Critical Values
Notation for Critical Value z 2 and  z 
2


2
2
z/2
z/2
Figure 7-2 Critical Value z  in
2
the Standard Normal Distribution
Definition – a critical value is the number on the borderline
separating sample statistics that are likely to occur from those
that are unlikely to occur. The number z 2 is a critical value that
is a z score with the property that it separates an area of /2 in
the right tail of the standard normal distribution.
10
7.2 Estimating a Population Proportion
4. Critical Values
e.g.2 Find the critical value z 2 corresponding to a 95% confidence
level.
Confidence level 95%

2
 0.025
z/2

2
 0.025
z/2
The total area to the left
of this boundary is 0.975
Figure 7-3 Finding z 2 for a 95% CL
By Table A-2, we find that z 2  1.96
11
7.2 Estimating a Population Proportion
4. Critical Values
We have the following brief table:
Confidence Level

90%
95%
99%
0.10
0.05
0.01
Critical Value, z 2
1.646
1.96
2.575
12
7.2 Estimating a Population Proportion
5. Margin of Error
Definition – The margin of error E, is the maximum likely (with
probability 1 – ) difference between the observed sample
proportion pˆ and the true value of the population p.
Formula 7 –1
E  z / 2
pˆ qˆ
n
Margin of error for proportion
13
7.2 Estimating a Population Proportion
5. Margin of Error
Requirements for this section
1)
2)
3)
Simple random sample
Conditions for binomial distribution are satisfied.
At least 5 success and at least 5 failures. (With p and q unknown, we
estimate their values using sample proportion)
Notation for Proportions
p = population proportion
x
pˆ  = sample proportion of x successes in a sample of size n
n
qˆ  1  pˆ
n = number of sample values
14
7.2 Estimating a Population Proportion
5. Margin of Error
Confidence Interval (or Interval Estimate) for the Population
Proportion p
ˆ qˆ
p
ˆ E p p
ˆ  E Where E  z / 2
p
n
The CI is often expressed in the following equivalent formats:
pˆ  E
or
 pˆ  E, pˆ  E 
Round-off Rule for CI estimate of p
Round the CI limits for p to three significant digits.
15
7.2 Estimating a Population Proportion
5. Margin of Error
Procedure for Constructing a Confidence Interval for p.
1) Verify that the requirements are satisfied.
2)
Find the critical value z / 2 that corresponds to the desired CL.
pˆ qˆ
n
3)
Evaluate the margin or error: E  z / 2
4)
Obtain the CI:
5)
Round the resulting CI limits to three significant digits.
ˆ E  p p
ˆ E
p
16
7.2 Estimating a Population Proportion
5. Margin of Error
e.g.3 Constructing a Confidence Interval: Poll Results. In the chapter
problem we noted that a Pew Research poll of 1501 randomly
selected U.S. adults showed that 70% of the respondents
believe in global warming. n = 1501 and pˆ = 0.70.
a. Find the margin of error corresponding to 95% CL
b. Find the 95% CI estimate of the population proportion p.
c. Based on the results, can we conclude that the majority of adults
believe in global warming?
d. Assuming that you are a newspaper …
Sol. Requirements met.
17
7.2 Estimating a Population Proportion
5. Margin of Error
e.g.3 (Chapter Problem, TI-84 demo).
a. z / 2  1.96 , pˆ = 0.70, qˆ= 0.30, and n = 1501, so
E  z / 2
pˆ qˆ
(0.70)(0.30)
 1.96
 0.023183
n
1501
b. 0.7 – 0.023183 = 0.676817, 0.7 + 0.023183 = 0.723183, so
CI: 0.677 < p < 0.723 or in interval notation CI = (0.677, 0.723)
c. To interpret the results
Based on the CI obtained in part b, it does appears that the proportion of
adults who believe in global warming is greater than 50%, so we can safely
conclude that the majority of adults believe in global warming. Because the
limits of 0.677 and 0.723 are likely to contain the true population, it appears
that the proportion is a value greater than 0.5.
18
7.2 Estimating a Population Proportion
5. Margin of Error
Analyzing Polls e.g.3 Continue. When analyzing results from polls,
we should consider the following:
1) The sample should be simple random sample, not an
inappropriate sample (such as a voluntary response sample)
2) The confidence level should be provided. (It is often 95%, but
media reports often neglect to identify it)
3) The sample size should be provided. (It is usually provided by the
media, but not always)
4) Except for relatively rare cases, the quality of the poll results
depends on the sampling method and the size of the sample, but
the size of the population is usually not a factor.
19
7.2 Estimating a Population Proportion
6. Determining Sample Size
Sample Size for Estimating Proportion p
Requirement: simple random sample.
When an estimate pˆ is known: Formula 7-2
When no estimate pˆ is known: Formula 7-3
2

z / 2  pˆ qˆ
n
E2
n
z / 2 
2
 0.25
E2
Round-off rule for Determining Sample Size
If the computed sample size n is not a whole number, round it
up to the whole number.
20
7.2 Estimating a Population Proportion
6.
Determining Sample Size
e.g.4 How many adults use the Internet? Assume that a manager for
E-Bay wants to determine the current percentage of U.S. adults
who now use the Internet. How many adults must be surveyed
in order to be 95% confident that the sampling percentage is in
error by no more than 3 percentage points?
a. Use the result from a Pew Research Center poll: in 2006, 73%
of U.S. adults used the Internet
b. Assume that we have no prior information.
a.
ˆ  0.73, qˆ  0.27, E = 0.03
 = 0.05, z / 2  1.96, p
n = 842 samples
b.
n = 1068 samples
21
7.2 Estimating a Population Proportion
6.
Determining Sample Size
Interpretation. To be 95% confident that our sample percentage is
within 3 percentage points of the true percentage for all adults, we
should obtain a simple random sample of 1068 adults. By comparing
this result to the sample size of 842 found in part (a), we can see that
if we have no knowledge of a prior study, a larger sample is required
to achieve the same result as when the value of pˆ can be estimated.
22
7.2 Estimating a Population Proportion
6. Determining Sample Size
Finding the Point Estimate and E from a Confidence Interval.
Point estimate of p:
(upper confidence limit)  (lower confidence limit)
pˆ 
2
Margin of Error:
(upper confidence limit)  (lower confidence limit)
E
2
e.g.5 Given that a confidence interval = (0.58, 0.81), Find pˆ and E.
0.58  0.81
pˆ 
 0.695
2
0.81  0.58
E
 0.115
2
23
7.2 Estimating a Population Proportion
7. Better-Performing Confidence Intervals
• Skip
Using Technology for CI’s
• Statdisk
• TI-84 (1-PropZInt, STAT/TESTS/1-PropZInt)
24
7.3 Estimating a Population Mean:  Known
1.
2.
3.
4.
Introduction
Confidence Interval
Determining Sample Size Required to Estimate 
Using Technology
25
7.3 Estimating a Population Mean:  Known
1. Introduction
•
Goal: present methods for using sample data to find a point
estimate and CI estimate of population mean.
Requirements:
•
–
–
–
•
Simple random sample
The population SD  is known
Normal distribution or n > 30.
Known population standard deviation 
26
7.3 Estimating a Population Mean:  Known
1. Introduction
The sample mean x is the best point estimate of the population mean.
• Unbiased estimator of population mean
• For many populations, the distribution of sample mean x tends
to be more consistent (with less variation) than the distributions
of other sample statistics.
27
7.3 Estimating a Population Mean:  Known
2. Confidence Interval
Margin of error for estimating mean ( known) is: E  z / 2 

n
Confidence Interval Estimate for the Population Mean  (with
 known) is:

xE   xE
Where E  z / 2 
n
or
xE
or
( x  E, x  E )
28
7.3 Estimating a Population Mean:  Known
2. Confidence Interval
Procedure for Constructing a Confidence Interval for p.
1) Verify that the requirements are satisfied.
2)
Find the critical value z / 2 that corresponds to the desired CL.
3)
Evaluate the margin or error: E  z / 2 
4)
Obtain the CI:
5)
Rounding: next page.

n
xˆ  E    xˆ  E
29
7.3 Estimating a Population Mean:  Known
2.
Confidence Interval
Round-Off Rule for CI Used to Estimate 
1. When use the original set of data to construct a confidence
interval, round the CI limits to one more decimal place than is
used for the original set of data.
2. When use the summary statistics (n, x , s) to construct CI,
round the CI limits to the same number of decimal places used
for the sample mean.
30
7.3 Estimating a Population Mean:  Known
2.
Confidence Interval
Interpreting a Confidence Interval – similar to 7.2 for proportion be
careful to interpret CI correctly. After obtaining a CI estimate of the
population mean , such as 95% CI of 164.49 <  < 180.61.
Correct: “we are 95% confident that the interval from 164.49 to
180.61 actually does contain the true value of .” This means
that if we were to select many different samples of the same
size and construct the corresponding CI, 95% of them would
actually contain the value of population mean .
Wrong: “There is a 95% chance that the true value of  will fall
between 164.49 and 180.61.”
31
7.3 Estimating a Population Mean:  Known
2. Confidence Interval
e.g.1 Weights of Men. n = 40, x = 172.55 lb, and  = 26 lb.
Using a 95% CL, find the following (TI-84):
a. The margin of error E, and CI for .
b. What do the results suggest about the mean weight of 166.3 lb
that was used to determine the safe passenger capacity in 1996?

26
Sol. a.
E  z / 2 
 1.96 
 8.0574835
n
40
CI is x  E    x  E:
172.55 – 8.0574835 <  < 172.55 + 8.0574835
164.49 <  < 180.61 (round to two decimals as in x )
32
7.3 Estimating a Population Mean:  Known
2. Confidence Interval
Interpretation. The confidence interval from part (a) could also be
expressed in 172.55 ± 8.06 or as (164.49, 180.61). Based on the
sample with n = 40 and x = 173.55 and  assumed to be 26, the
confidence interval for the population means is 164.49 lb <  <
180.61 lb and this interval has a 0.95 confidence level. This
means that if we were to select many different simple random
samples of 40 men and construct the confidence intervals as we
did here, 95% of them would contain the value of the
population mean 
33
7.3 Estimating a Population Mean:  Known
2. Confidence Interval
Rationale for CI.
By Central Limit Theorem, sample means (of sample size n) are
normally distributed with mean  and variance  / n . In the
equation z = ( x   x ) /  x , replace  x with  / n , replace  x with ,
then solve for  to get

  xz
n
34
7.3 Estimating a Population Mean:  Known
Determining Sample Size Required to Estimate 
3.
Sample Size for Estimating mean 
Formula 7-4
 z / 2 
n

 E 
2
where z / 2= critical z score based on the desired CL
E = desired margin of error
 = population standard deviation
Comments
–
–
It does not depends on population size N.
Round up.
35
7.3 Estimating a Population Mean:  Known
Determining Sample Size Required to Estimate 
3.
Dealing with Unknown  When Finding Sample Size
–
–
–
–
–
Use range rule of thumb:   range / 4
Start the sample collection process without knowing  and, using the
first several values, calculate the sample standard deviation s and use it
in places of . The estimated value of  can then be improved as more
sample data are obtained, and the sample size can be refined
accordingly.
Estimating the value of  by using the results of some other study that
was done earlier.
Use other know results. E.g. IQ tests are typically designed so that the
mean is 100 and SD is 15. For a population of statistics professors, IQ is
more than 100 and SD is less than 15. But we can assume SD is 15 to
play it safe.
My comments: see next section!!
36
7.3 Estimating a Population Mean:  Known
3. Determining Sample Size Required to Estimate 
e.g.2 IQ Scores of Statistics students. Want to estimate the mean IQ
score for the population of statistics students.
Q: how many statistics students must be randomly selected for IQ
tests if we want 95% confidence that the sample mean is within
3 IQ points of the population mean?
Sol. z / 2 = 1.96 (since  = 0.05)
E=3
 = 15 (see previous page)
Now
 z / 2 
1.96(15) 
n

 96.04  97 samples


3


 E 
2
2
37
7.3 Estimating a Population Mean:  Known
3.
Determining Sample Size Required to Estimate 
Interpretation. Among the thousands of statistics students, we need
to obtain a simple random sample of at least 97 of them. Then we
get their IQ scores. With a simple random sample of only 97
statistics students, we will be 95% confident that the sample mean
x is within 3 IQ points of the true population mean  .
38
7.3 Estimating a Population Mean:  Known
Using Technology for CI’s
•
See e.g.2, TI-84 Demo
•
Statdisk (analysis)
39
7.4 Estimating a Population Mean:  Not Known
1.
2.
3.
4.
5.
Introduction
Confidence Interval
Choosing the Appropriate Distribution
Finding Point Estimate and E from a CI
Using Technology
40
7.4 Estimating a Population Mean:  Not Known
1. Introduction
 is not known
Use Student t distribution (instead of normal distribution)
Method in this section is realistic, practical, and often used
Requirement
•
•
•
•
–
–
Simple random sample
From normally distributed population or n > 30
41
7.4 Estimating a Population Mean:  Not Known
1. Introduction
Just like in section 7.3, we have:
The sample mean x is the best point estimate of the
population mean.
42
7.4 Estimating a Population Mean:  Not Known
2. Confidence Interval
Before finding the CI, first introduce Student t distribution.
Student t Distribution
If a population has normal distribution, then the distribution of
t
x
s
n
is a Student t Distribution for all sample of size n. A student t distribution
often referred to as a t distribution, is used to find critical values denoted
by t / 2 .
43
7.4 Estimating a Population Mean:  Not Known
2. Confidence Interval
Definition
The number of degrees of freedom for a collection of sample
data is the number of sample values that can vary after certain
restrictions have been imposed on all data values.
For application in this section, degree of freedom = n – 1
Example 1. Finding a Critical Value. n = 7 samples is selected from
a normally distributed population. Find t / 2 with 95% CL
Sol. df = 7 – 1 = 6  Table A-3  6th row  for 95% CL,  =
0.05, find the column listing values for an area of 0.05 in two
tails  t / 2 = 2.447 (can do TI-84 demo).
44
7.4 Estimating a Population Mean:  Not Known
2. Confidence Interval
Margin of Error E for the Estimate of  (with  not known)
E  t / 2 
s
n
Confidence Interval for the Estimate  (with  not known)
xE   xE
45
7.4 Estimating a Population Mean:  Not Known
2. Confidence Interval
Procedure for Constructing a CI for  (with  not known)
Step 1. Verify the requirements are met. (simple random sample,
either data is from a normal distribution or n > 30)
Step 2. Given CL, let df = n –1, use A-3 to find the critical value
t / 2 that corresponds to the desired CL
s
Step 3. Evaluate the margin of error E  t / 2 
n
Step 4. Find x . Then the CI is
xE   xE
Step 5. Original data: add one decimal place; Summary: same.
46
7.4 Estimating a Population Mean:  Not Known
2. Confidence Interval
e.g.2 Constructing a Confidence Interval: Garlic for Reducing
Cholesterol. Use summary data, n = 49, x  0.4 , s = 21.0 and
t / 2  2.009.
The margin of error (95% CL) is: E  2.009 
The CI is 0.4 – 6.027 <  < 0.4 + 6.027, or
21.0
 6.027.
49
–5.6 <  < 6.4
Interpretation. Because the confidence interval contains the value of
0, it is possible that the mean of the changes in LDL cholesterol
is equal to 0, suggesting that the garlic treatment did not affect
the LDL cholesterol levels. It does not appear that the garlic
47
treatment is effective in lowering LDL cholesterol.
7.4 Estimating a Population Mean:  Not Known
2. Confidence Interval
Important Properties of the Student t Distribution
1.
2.
3.
4.
5.
Different for different sample size (unlike standard normal distribution)
Same bell shape as the standard normal distribution. But reflect greater
variability (with wider distributions) that is expected with small samples.
(n = 3, wider, n = 12 narrower)
Mean = 0 (just like standard normal distribution)
SD varies with sample size. It is always greater than 1 (SD > 1). Unlike
standard normal distribution where SD = 1.
As sample size gets larger, Student t distribution gets closer to standard
normal distribution.
48
7.4 Estimating a Population Mean:  Not Known
3.
Choosing the Appropriate Distribution
Figure 7 – 6
Choosing Between z and t
start
Is
 known
yes
no
?
yes
Is the
Population normally
Distributed ?
yes
yes
no
Is
n > 30
Is the
Population normally
Distributed ?
no
yes
Use the normal
distribution
Is
n > 30
no
?
?
Z
no
Use nonparametric
or bootstrapping
methods
t
Use the t
distribution
Use nonparametric
or bootstrapping
methods
49
7.4 Estimating a Population Mean:  Not Known
3.
Choosing the Appropriate Distribution
Method
Use normal (z) distribution
Use t distribution
Use a nonparametric method or bootstrapping
Conditions
 Known and normally distributed population
Or
 Known and n > 30
 not known and normally distributed
population
Or
 not known and n > 30
Population is not normally distribution and n 
30
Note:
1.
Criteria for deciding whether the population is normally distributed: Population need not be
exactly norm, but is should appear to be somewhat symmetric with one mode no outliers
2.
Sample size > 30: This is a commonly used guideline, but sample sizes of 15 to 30 are
adequate if the population appears to have a distribution that is not far from being normal and
there is no outliers. For some population distributions that are extremely far from normal, the
50
sample size might need be much larger than 30.
7.4 Estimating a Population Mean:  Not Known
3. Choosing the Appropriate Distribution
Example 3. Choosing Distributions. To construct CI for . Use the
given data to determine whether the margin or error E should
be calculated using z / 2(from normal distribution), t / 2 (from
Student t distribution) or neither?
a.
n = 9, x  75, s = 15, and population has a normal distribution t / 2
b.
n = 5, x  20, s = 2, and population has a very skewed distribution Neither
c.
n = 12, x  98.6,  = 0.6, and population has a normal distribution z / 2
d.
n = 75, x  98.6,  = 0.6, and distribution is skewed
e.
n = 75, x  98.6, s = 0.6, and distribution is extremely skewed
z / 2
t / 2
51
7.4 Estimating a Population Mean:  Not Known
3.
Choosing the Appropriate Distribution
e.g.4 CI for Alcohol in Video Games. Twelve different video games
showing substance use were observed. The duration times (in
second) of alcohol use were recorded, with the times listed
below. The design of the study justifies the assumption that the
sample can be treated as a simple random sample.
84, 14, 583, 50, 0, 57, 207, 43, 178, 0, 2, 57
Use this data to construct 95% CI estimate of , the mean
duration time that the video showed the use of alcohol.
Sol. Next page.
52
7.4 Estimating a Population Mean:  Not Known
3. Choosing the Appropriate Distribution
(TI-84 demo, Tinterval) to get
CI = (1.8, 210.7)
Frequency
e.g.4 Continue.
Check requirements: not normal, n > 30 not satisfied.
5
Requirement are not satisfied.
4
3
2
1
600
Interpretation. Because the requirements 0 100 200
not satisfied, we don’t have the 95% confidence that (1.8 sec, 210.7
sec) interval contains the true population mean. We should use some
other method.
53
7.4 Estimating a Population Mean:  Not Known
4. Finding Point Estimate and E from a CI
Given the upper and lower limits of a CI, find the mean and margin
of error:
(upper confidence limit)  (lower confidence limit)
Point estimate of : x 
2
E
Margin of error:
(upper confidence limit)  (lower confidence limit)
2
e.g.5 Weights of Garbage. Data Set 22 in Appendix B. Given 95%
CI (24.28, 30.607) Find the mean and margin of error.
Sol.
x
30.607  24.28
 27.444 lb
2
E
30.607  24.28
 3.164 lb
2
54
7.4 Estimating a Population Mean:  Not Known
5. Using Technology
• TI-84 demo
–
–
summary data
original data
• Statdisk
55
7.5 Estimating a Population Variance
1. Chi-Square Distribution
2. Estimator of 2
56
7.5 Estimating a Population Variance
1. Chi-Square Distribution
• Given a normally distribution population with variance 2,
randomly select a sample of size n
• Compute the sample variance s2,
• The sample statistic 2 = (n – 1)s2/2 has sampling distribution
called chi-square distribution.
Chi-Square Distribution
(n  1) s 2
2
Formula 7 – 5  
2
Where n = sample size
s2 = sample variance
2 = population variance
57
7.5 Estimating a Population Variance
1. Chi-Square Distribution
•
•
•
•
•
Denote the chi-square by 2
Pronounced “kigh square”
To find the critical value, refer to A-4
Chi-square distribution depends on degree of freedom
Degree of freedom df = n – 1
58
7.5 Estimating a Population Variance
1. Chi-Square Distribution
Properties of the Distribution of the 2 Statistic
• Not symmetric, as df increases, it becomes more symmetric
• Value of 2 can be positive or zero, but never negative
• The 2 distribution is different for different df. As df increases,
the 2 distribution approaches to normal distribution (just the t
distribution)
59
7.5 Estimating a Population Variance
1. Chi-Square Distribution
e.g.1 Finding Critical Values of 2. A Simple random sample of ten
voltage levels is obtained. Construction of a confidence level
for the population standard deviation  requires that the left and
right critical values of 2 corresponding to the confidence level
of 95% and a sample size of n = 10. Find the critical value of 2
separating an area of 0.025 in the left tail, and find the critical
value of 2 separating an area of 0.025 in the right tail.
Sol. Df = 10 – 1 = 9. (next page)
60
7.5 Estimating a Population Variance
e.g.1 continue (can also use Statdisk, Excel, and Minitab)
Sol.
61
7.5 Estimating a Population Variance
2. Estimator of 2
Point estimator
•
The sample variance s2 is the best point estimate of the
population variance  2
•
The sample standard deviation s is commonly used as a point
estimate of  (even though it is a biased estimate)
62
7.5 Estimating a Population Variance
2. Estimator of 2
Interval estimator
Requirements:
1)
2)
The sample is a simple random sample
The population must have normally distributed values
CI for the population variance 2:
(n  1) s 2
 R2
 2 
(n  1) s 2
 L2
CI for the population standard deviation:
(n  1) s 2

2
R
 
(n  1) s 2
 L2
63
7.5 Estimating a Population Variance
2. Estimator of 2
Procedure for constructing a confidence interval for 2 or .
Step 1. verify that the requirements are satisfied
Step 2. Using n – 1 degree of freedom, Table A-4 to find critical
values  L2 and  R2 corresponding to the desired confidence
level.
Step 3. Find the CI:
(n  1) s 2
 R2
 2 
(n  1) s 2
 L2
Step 4. Find CI for . Take the square root on all three places.
Step 5. Rounding. Data: add one decimal place; summary: same.
64
7.5 Estimating a Population Variance
2. Estimator of 2
e.g.2 Confidence Interval for Home Voltage. Sample of 10:
123.3, 123.5, 123.7, 123.4, 123.6, 123.5, 123.5, 123.4, 123.6, 123.8
Step 1. requirements: Histogram is normal. Simple random sample.
65
7.5 Estimating a Population Variance
2. Estimator of 2
e.g.2 Confidence Interval for Home Voltage. Sample of 10:
123.3, 123.5, 123.7, 123.4, 123.6, 123.5, 123.5, 123.4, 123.6, 123.8
2

Step 2. For 95% CL, double sided. Found that L = 2.700, and
 R2 = 19.023.
Step 3.
2
(10  1)(0.15) 2
(
10

1
)(
0
.
15
)
 2 
19.023
2.700
or, 0.010645 < 2 < 0.075000
Step 4. Take square root: 0.10 volt <  < 0.27 volt
Interpretation: Based on this result, the confidence interval is (0.10, 0.27). The
limits of the CI is 0.10 and 0.27. But the format s  E cannot be used because the
CI does not have s at its center.
66
7.5 Estimating a Population Variance
2. Estimator of 2
Rationale for the CI. If we obtain simple random samples of size n
from a population with variance 2, there is a probability of 1 – 
that the statistic (n – 1)s2/2 will fall between the critical values of
 L2 and  R2 , i.e. there is a probability that the following is true:
(n  1) s 2
2
  R2 and
(n  1) s 2
2
  L2
Combine both in equality into one inequality, we have:
(n  1) s 2
 R2
 2 
(n  1) s 2
 L2
67
7.5 Estimating a Population Variance
2. Estimator of 2
Determine the sample size.
• Harder than in the case of mean and proportion.
– Use Table 7-2
– Statdisk/Analysis/Sample size determination/Estimate St Dev
• Excel, Minitab, TI-84 does not provide sample size.
68
7.5 Estimating a Population Variance
2. Estimator of 2
69
7.5 Estimating a Population Variance
2. Estimator of 2
e.g.3. Finding the sample size for estimating . We want to estimate
the standard deviation  for all voltage levels in a home. We
want to be 95% confident that our estimate is within 20% of the
true value of . How large should the sample size be? Assume
that the population is normally distributed.
Ans. at least 48 samples
70