Document 6540260

IT223 - Qualls - Week 6
IT223 Data Analysis
Week 6
Confidence Intervals and Sample Sizes
Part 1: Means
DePaul University
Bill Qualls

Objectives
At the end of this section you should be able to answer questions concerning sampling distributions of sample means.
Specifically, you should understand:
• what is a sampling distribution (SD)
• the mean and standard deviation of a SD
• the Central Limit Theorem

Sampling Distributions
• Recall µ = Σxp(x) and σ² = Σ(x-µ)²p(x).
• Given p(x=1) = .25, p(x=2) = .50, and p(x=3) = .25, find the mean and variance.

| x   | p(x)   | xp(x)    | µ | (x-µ) | (x-µ)² | (x-µ)²p(x) |
|-----|--------|----------|---|-------|--------|------------|
| 1.0 | .2500  | .2500    | 2 | -1.0  | 1.00   | .25000     |
| 2.0 | .5000  | 1.0000   | 2 | 0.0   | 0.00   | .00000     |
| 3.0 | .2500  | .7500    | 2 | 1.0   | 1.00   | .25000     |
|     | 1.0000 | 2.0000=µ |   |       |        | σ²=.50     |
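
The two recalled formulas can be checked with a few lines of Python. This is a minimal sketch; the names `population` and `mean_and_variance` are illustrative and do not come from the slides.

```python
# Discrete population from the slide: value -> probability.
population = {1: 0.25, 2: 0.50, 3: 0.25}

def mean_and_variance(dist):
    """Return (mu, sigma squared) using mu = sum x*p(x) and var = sum (x-mu)^2*p(x)."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

mu, var = mean_and_variance(population)
print(f"mu = {mu}, variance = {var}")
```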
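
Sampling Distributions
• An experiment consists of drawing a sample of size 2, with replacement, and finding the sample mean.
• Find the mean and variance.

| x1 | x2 | x-bar | p(x1) | p(x2) | p(x-bar) | x-bar·p(x-bar) | µ | (x-bar - µ) | (x-bar - µ)² | (x-bar - µ)²p(x-bar) |
|----|----|-------|-------|-------|----------|----------------|---|-------------|--------------|----------------------|
| 1  | 1  | 1.0   | .25   | .25   | .0625    | .0625          | 2 | -1.0        | 1.00         | .06250               |
| 1  | 2  | 1.5   | .25   | .50   | .1250    | .1875          | 2 | -0.5        | 0.25         | .03125               |
| 1  | 3  | 2.0   | .25   | .25   | .0625    | .1250          | 2 | 0.0         | 0.00         | .00000               |
| 2  | 1  | 1.5   | .50   | .25   | .1250    | .1875          | 2 | -0.5        | 0.25         | .03125               |
| 2  | 2  | 2.0   | .50   | .50   | .2500    | .5000          | 2 | 0.0         | 0.00         | .00000               |
| 2  | 3  | 2.5   | .50   | .25   | .1250    | .3125          | 2 | 0.5         | 0.25         | .03125               |
| 3  | 1  | 2.0   | .25   | .25   | .0625    | .1250          | 2 | 0.0         | 0.00         | .00000               |
| 3  | 2  | 2.5   | .25   | .50   | .1250    | .3125          | 2 | 0.5         | 0.25         | .03125               |
| 3  | 3  | 3.0   | .25   | .25   | .0625    | .1875          | 2 | 1.0         | 1.00         | .06250               |
|    |    |       |       |       | 1.0000   | 2.0000=µ       |   |             |              | σ²=.25, σ=.50        |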
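
The nine-row table for samples of size 2 can be rebuilt by enumeration. The sketch below assumes the same three-value population as before and collapses the ordered samples into a distribution of x-bar; the variable names are illustrative only.

```python
from collections import defaultdict
from itertools import product

population = {1: 0.25, 2: 0.50, 3: 0.25}

# Add up the probability of each distinct sample mean over the 9 ordered samples.
xbar_dist = defaultdict(float)
for (x1, p1), (x2, p2) in product(population.items(), repeat=2):
    xbar_dist[(x1 + x2) / 2] += p1 * p2  # independent draws, so probabilities multiply

mu_xbar = sum(xbar * p for xbar, p in xbar_dist.items())
var_xbar = sum((xbar - mu_xbar) ** 2 * p for xbar, p in xbar_dist.items())

for xbar in sorted(xbar_dist):
    print(f"x-bar = {xbar}: p = {xbar_dist[xbar]}")
print(f"mean = {mu_xbar}, variance = {var_xbar}, sd = {var_xbar ** 0.5}")
```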
Updated 5/12/2013

Sampling Distributions
• Sampling with n=2: DistributionOfSampleMeans2.xls
• Sampling with n=3: DistributionOfSampleMeans3.xls
• Sampling with n=4: DistributionOfSampleMeans4.xls

Sampling Distributions
Summary
• the mean of the distribution of sample means is equal to the population mean
• the standard deviation of the distribution of the sample means is equal to the population standard deviation divided by the square root of the sample size
$\mu_{\bar{x}} = \mu$
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
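
The summary can be checked for n = 2, 3 and 4 without the spreadsheets. This is a sketch only, assuming the three-value population used earlier in the deck; `sampling_mean_and_sd` is a hypothetical helper that enumerates every ordered sample.

```python
from itertools import product
from math import prod, sqrt

values = [1, 2, 3]
probs = [0.25, 0.50, 0.25]

mu = sum(x * p for x, p in zip(values, probs))
sigma = sqrt(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))

def sampling_mean_and_sd(n):
    """Mean and standard deviation of x-bar over all ordered samples of size n."""
    first_moment = 0.0
    second_moment = 0.0
    for sample in product(list(zip(values, probs)), repeat=n):
        xbar = sum(x for x, _ in sample) / n
        p = prod(p for _, p in sample)  # draws are independent (with replacement)
        first_moment += xbar * p
        second_moment += xbar ** 2 * p
    return first_moment, sqrt(second_moment - first_moment ** 2)

for n in (2, 3, 4):
    m, sd = sampling_mean_and_sd(n)
    print(f"n={n}: mean={m:.4f}  sd={sd:.4f}  sigma/sqrt(n)={sigma / sqrt(n):.4f}")
```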

Together
The numbers of sales per day made by a telemarketer in four days: 1, 11, 9, 3. Assume that samples of size 2 are randomly selected with replacement from this population of four values.
a. List the 16 different possible samples and find the mean of each of them.
b. Identify the probability of each sample, then describe the sampling distribution of sample means.
c. Find the mean of the sampling distribution.
d. Is the mean of the sampling distribution from part c equal to the mean of the population of the four listed values? Are those means always equal?

Central Limit Theorem
• The Central Limit Theorem tells us that regardless of the shape of the distribution of the population, given n sufficiently large, the distribution of the sample means is approximately normally distributed.
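
A small simulation can make the Central Limit Theorem concrete. The sketch below is not from the slides: it assumes a strongly skewed population (exponential with mean 1 and standard deviation 1), a sample size of 30, and 5000 repeated samples, all chosen only for illustration.

```python
import random
from statistics import mean, pstdev

random.seed(0)   # fixed seed so the run is repeatable

n = 30           # sample size, assumed "sufficiently large" for this sketch
trials = 5000    # number of repeated samples

# Each entry is the mean of one sample of size n from the skewed population.
sample_means = [mean(random.expovariate(1.0) for _ in range(n))
                for _ in range(trials)]

center = mean(sample_means)
spread = pstdev(sample_means)
# For a normal distribution roughly 68% of values lie within one sd of the mean.
within_one_sd = sum(abs(m - center) <= spread for m in sample_means) / trials

print(f"mean of sample means: {center:.3f} (population mean is 1)")
print(f"sd of sample means:   {spread:.3f} (sigma/sqrt(n) is {1 / n ** 0.5:.3f})")
print(f"share within one sd:  {within_one_sd:.3f}")
```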

Central Limit Theorem
• When working with an individual value from a normally distributed population, use
$z = \frac{x - \mu}{\sigma}$
• When working with a mean of a sample drawn from a population which is normally distributed, be sure to use the value of $\sigma / \sqrt{n}$ for the standard deviation of sample means, and use:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Together
Assume adult males' weights are normally distributed with a mean of 180 pounds and a standard deviation of 30 pounds.
• Find the probability that an adult male selected at random weighs over 200 pounds.
• Find the probability that the mean weight of 9 adult males selected at random is over 200 pounds.
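
The difference between the two z formulas shows up clearly in the weights exercise. The sketch below uses `statistics.NormalDist` from the Python standard library in place of a z table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 180, 30   # population mean and standard deviation, in pounds

# One individual: standard deviation is sigma.
p_individual = 1 - NormalDist(mu, sigma).cdf(200)

# Mean of n = 9 individuals: standard deviation is sigma / sqrt(n).
n = 9
p_mean = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(200)

print(f"P(x > 200)     = {p_individual:.4f}")
print(f"P(x-bar > 200) = {p_mean:.4f}")
```

The sample mean is far less likely to exceed 200 pounds because its standard deviation is 10 pounds rather than 30.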

Together
Assume adult males' weights are normally distributed with a mean of 180 pounds and a standard deviation of 30 pounds.
• Find the probability that an adult male selected at random weighs between 175 pounds and 190 pounds.
• Find the probability that the mean weight of 16 adult males selected at random is between 175 pounds and 190 pounds.

Together
Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15.
• If an individual is selected at random, what is the probability that their IQ score is less than 95?
• If 16 individuals are selected at random, what is the probability that the mean of their IQ scores is less than 95?
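
Both "between" and "less than" questions reduce to differences of cumulative probabilities. A minimal sketch, assuming a hypothetical helper `mean_dist` that returns the distribution of the sample mean (with n=1 giving the distribution of an individual):

```python
from math import sqrt
from statistics import NormalDist

def mean_dist(mu, sigma, n=1):
    """Distribution of the mean of n draws from Normal(mu, sigma)."""
    return NormalDist(mu, sigma / sqrt(n))

# Weights: between 175 and 190 pounds, for one man and for the mean of 16.
w1 = mean_dist(180, 30)
w16 = mean_dist(180, 30, n=16)
p_weight_one = w1.cdf(190) - w1.cdf(175)
p_weight_mean = w16.cdf(190) - w16.cdf(175)

# IQ: less than 95, for one person and for the mean of 16.
p_iq_one = mean_dist(100, 15).cdf(95)
p_iq_mean = mean_dist(100, 15, n=16).cdf(95)

print(f"weights: {p_weight_one:.4f} (individual), {p_weight_mean:.4f} (mean of 16)")
print(f"IQ:      {p_iq_one:.4f} (individual), {p_iq_mean:.4f} (mean of 16)")
```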

Together
Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15.
• If an individual is selected at random, what is the probability that their IQ score is between 105 and 110?
• If 25 individuals are selected at random, what is the probability that the mean of their IQ scores is between 105 and 110?

Objectives
At the end of this section you should be able to answer questions concerning confidence intervals for a population mean.
Specifically, you should understand:
• how to calculate a confidence interval for a population mean
• how and when to use z vs. t distributions
• how to determine the margin of error
• how to determine the requisite sample size given a desired confidence level and margin of error.

Confidence Intervals about a Population Mean

Point Estimate of a Population Mean
We saw the following example in week 1…
• The following are the invoice amounts for 25 invoices drawn at random from last quarter's sales data:
82   105  126  76   86
77   112  71   67   94
97   68   97   109  77
100  93   84   83   121
99   72   98   100  115

Point Estimate of a Population Mean
[Figure: relative frequency histogram of Sales; vertical axis Relative Frequency, 0 to 28%; horizontal axis class boundaries 59.5, 69.5, 79.5, 89.5, 99.5, 109.5, 119.5, 129.5]

Point Estimate of a Population Mean
• A point estimate is a single value used to approximate a population parameter.
• The best point estimate of the population mean (µ) is the sample mean (x-bar).
• The best point estimate of the population standard deviation (σ) is the sample standard deviation (s).
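
The two point estimates can be computed directly from the 25 invoice amounts listed earlier. A minimal sketch using the standard library; the list name `invoices` is illustrative.

```python
from statistics import mean, stdev

# The 25 invoice amounts from the sample.
invoices = [82, 105, 126, 76, 86,
            77, 112, 71, 67, 94,
            97, 68, 97, 109, 77,
            100, 93, 84, 83, 121,
            99, 72, 98, 100, 115]

n = len(invoices)
x_bar = mean(invoices)   # point estimate of mu
s = stdev(invoices)      # sample standard deviation (divides by n - 1)

print(f"n = {n}, x-bar = {x_bar:.1f}, s = {s:.1f}")
```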

Point Estimate of a Population Mean
• For the given data, n=25, x-bar = 92.4 and s=16.7.
• What if we sampled n=250 invoices and found the same sample mean? We would intuitively have more confidence in the second statistic than in the first.
• But the problem with a point estimate is that we cannot assign a statistical level of confidence to it.

Interval Estimates
• We can, however, assign a level of confidence to an interval estimate.
• If you were asked to come up with a 95% confidence interval for the first case (x-bar = 92.4, n = 25), you might say you were 95% confident that the true mean is in the interval 92.4 ± 10.
• But in the second case (x-bar = 92.4, n=250), you might say you were 95% confident that the true mean is in the interval 92.4 ± 4.
(Numbers used above are "guesses" only, for illustrative purposes.)

CI for Population Mean (σ known)
• The formula for the confidence interval (CI) for a population mean is usually shown as:
$\mu = \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
• Or sometimes $\mu = \bar{x} \pm E$ where E is the margin of error and is calculated as:
$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
• We use σ if known (only in stats textbooks!), otherwise we use s.

90% Confidence Interval

95% Confidence Interval

99% Confidence Interval
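
The critical value z(α/2) for the 90%, 95% and 99% levels can be looked up in a z table or computed. The sketch below uses the inverse normal CDF from the standard library; the function name `z_critical` is an illustrative choice.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed critical value: leaves alpha/2 in each tail of the standard normal."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```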

Together
• Find the 95% confidence interval for the mean invoice amount using the sample data: n=250, x-bar = 92.4. Assume σ is known to be 16.7.
• Solution:
$\mu = \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 92.4 \pm 1.96 \cdot \frac{16.7}{\sqrt{250}} = 92.4 \pm 2.1 = (90.3, 94.5)$

Calculating Confidence Intervals
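
The invoice calculation with σ known can be reproduced in a few lines. This is a sketch under the same assumptions as the worked solution (n=250, x-bar = 92.4, σ = 16.7); `z_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when sigma is known: x_bar +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin, margin

low, high, margin = z_interval(92.4, 16.7, 250)
print(f"E = {margin:.1f}, interval = ({low:.1f}, {high:.1f})")
```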

Interpretation
So what does it mean?
Wrong: We are 95% confident that the population mean is between 90.3 and 94.5.
Correct: If the sampling process were repeated many times, and the interval calculated each time, 95% of those intervals would capture the true mean.
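
The "repeated many times" reading can be simulated. The sketch below assumes, purely for illustration, a normal population whose true mean is 92.4 and whose σ is 16.7, draws 2000 samples of size 25, and counts how many 95% intervals capture the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(1)                      # fixed seed so the run is repeatable

mu, sigma, n = 92.4, 16.7, 25       # assumed "true" population values for this sketch
trials = 2000

z = NormalDist().inv_cdf(0.975)     # 95% two-tailed critical value
margin = z * sigma / sqrt(n)

hits = 0
for _ in range(trials):
    x_bar = mean(random.gauss(mu, sigma) for _ in range(n))
    if x_bar - margin <= mu <= x_bar + margin:
        hits += 1                   # this interval captured the true mean

coverage = hits / trials
print(f"{coverage:.1%} of the {trials} intervals captured the true mean")
```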

Together
Find the 99% confidence interval for the population mean µ of the gambling losses suffered by Packers fans following the infamous substitute referee debacle of September 24, 2012 given n = 40 and x-bar = $189. Assume σ is known to be $87.
Aside from mentioning the Packers, what's wrong with this question?

Margin of Error
Given a confidence interval of [10.2, 16.4]:
• What is the mean? (Answer: 13.3)
• What is the margin of error? (Answer: 3.1)
[Figure: interval from 10.2 to 16.4 with the margin of error E marked on each side of the mean]
• What is the margin of error for the previous problem?
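
Because the interval is x-bar ± E, its midpoint is the sample mean and half its width is the margin of error. A minimal sketch with an illustrative helper name:

```python
def mean_and_margin(low, high):
    """Recover the sample mean and margin of error from the interval endpoints."""
    center = (low + high) / 2   # midpoint of the interval
    margin = (high - low) / 2   # half the width
    return center, margin

center, margin = mean_and_margin(10.2, 16.4)
print(f"mean = {center:.1f}, margin of error = {margin:.1f}")
```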

When to use z vs. t

Confidence Intervals about a Population Mean
t-Distribution

t Distribution
• Sometimes we need to use the t distribution instead of the z distribution (what to use when is discussed shortly)
• The t distribution has the following properties:
– it is a family of distributions (infinitely many)
– it has mean = 0
– it has standard deviation > 1
– it is flatter, more spread out, than z
– it approaches z as n gets larger

t is a Family of Distributions
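
One way to see both "standard deviation > 1" and "approaches z" is the standard result that Student's t with df degrees of freedom has variance df / (df - 2) when df > 2. That formula is not stated on the slides; the sketch below simply evaluates it for a few members of the family.

```python
from math import sqrt

def t_standard_deviation(df):
    """Standard deviation of Student's t; finite only for df > 2."""
    if df <= 2:
        raise ValueError("the variance of t is finite only for df > 2")
    return sqrt(df / (df - 2))

sds = {df: t_standard_deviation(df) for df in (3, 5, 10, 30, 100, 1000)}
for df, sd in sds.items():
    print(f"df = {df:>4}: sd = {sd:.4f}")   # shrinks toward 1, the sd of z
```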

CI for Population Mean (n<30)
• The formula for the confidence interval (CI) for a population mean is usually shown as:
$\mu = \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
• Or sometimes $\mu = \bar{x} \pm E$ where E is the margin of error and is calculated as:
$E = t_{\alpha/2} \frac{s}{\sqrt{n}}$
• Use n-1 degrees of freedom (df).

Together
• Find the 95% confidence interval for the mean invoice amount using the sample data: n=25, x-bar = 92.4 and s=16.7. Assume the population is normal.
• Solution (24 df, two tails, α = .05, so t = 2.064):
$\mu = \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} = 92.4 \pm 2.064 \cdot \frac{16.7}{\sqrt{25}} = 92.4 \pm 6.9 = (85.5, 99.3)$
The margin of error is 6.9.
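
The t-based interval follows the same pattern as the z-based one, with the critical value taken from a t table. In the sketch below the dictionary `T_CRIT_95` holds the two-tailed .05 values for 21 to 25 df copied from this deck's t table extract; the function name is illustrative, and any other df would need a fuller table.

```python
from math import sqrt

# Two-tailed alpha = .05 critical values, keyed by degrees of freedom (from the t table extract).
T_CRIT_95 = {21: 2.080, 22: 2.074, 23: 2.069, 24: 2.064, 25: 2.060}

def t_interval_95(x_bar, s, n):
    """95% confidence interval for a mean using s and n - 1 degrees of freedom."""
    t = T_CRIT_95[n - 1]            # KeyError if df is outside the extract
    margin = t * s / sqrt(n)
    return x_bar - margin, x_bar + margin, margin

low, high, margin = t_interval_95(92.4, 16.7, 25)
print(f"E = {margin:.1f}, interval = ({low:.1f}, {high:.1f})")
```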

t table (extract)

|    | α                                                                         |
|    | .005 (1 tail) | .01 (1 tail)  | .025 (1 tail) | .05 (1 tail)  | .10 (1 tail)  |
| df | .01 (2 tails) | .02 (2 tails) | .05 (2 tails) | .10 (2 tails) | .20 (2 tails) |
|----|---------------|---------------|---------------|---------------|---------------|
| 21 | 2.831         | 2.518         | 2.080         | 1.721         | 1.323         |
| 22 | 2.819         | 2.508         | 2.074         | 1.717         | 1.321         |
| 23 | 2.807         | 2.500         | 2.069         | 1.714         | 1.320         |
| 24 | 2.797         | 2.492         | 2.064         | 1.711         | 1.318         |
| 25 | 2.787         | 2.485         | 2.060         | 1.708         | 1.316         |

Comparing Confidence Intervals

Together
• Use the given confidence level and sample statistics to find (a) the margin of error, and (b) the 90% confidence interval for the population mean µ lifespan of a home furnace: n = 25, x-bar = 8.5 years, s = 3.1 years. Assume the population is normally distributed. Find the margin of error.

Together
• Given the following sample, find the 90% confidence interval for the mean lifespan of a home furnace in years. Assume the population is normally distributed.
9.4   12.3  6.3   8.7   9.5   6.1
10.6  7.3   8.1   8.4   14.7  9.2
• Find the margin of error.

Determining the Proper Sample Size

Sample Size
• How large does the sample need to be to get an estimate of µ with an acceptable margin of error?
$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ → solve for n → $n = \left( \frac{z_{\alpha/2} \, \sigma}{E} \right)^2$
• In the above formula, E might be, for example, 400 as in ±400 dollars.
• If the population standard deviation (σ) is unknown, then use the sample standard deviation (s).
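
The sample-size formula is easy to wrap in a function. The sketch below rounds up to the next whole observation, which is the usual convention and an assumption here since the slide does not state it; `required_sample_size` is an illustrative name, and the example values (95% confidence, s = 16.7, E = 4) are taken from the invoice data in this deck.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(confidence, sigma, margin):
    """Smallest n with z * sigma / sqrt(n) <= margin, i.e. n = (z * sigma / E)^2 rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)

n_invoices = required_sample_size(0.95, 16.7, 4)
print(f"invoices needed: {n_invoices}")
```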

Together
• How many invoices do I need to sample to get a 95% confidence interval of the mean invoice amount with a $4 margin of error? Previous sampling has yielded a sample standard deviation of s=16.7.

Comparing Confidence Intervals

Together
You want to estimate the mean weight loss of people one year after using the Atkins diet. How many dieters must be surveyed if we want to be 95% confident that the sample mean weight loss is within 0.25 lb of the true population mean? Assume that the population standard deviation is known to be 10.6 lb (based on data from "Comparison of the Atkins, Ornish, Weight Watchers, and Zone Diets for Weight Loss and Heart Disease Risk Reduction", by Dansinger et al., Journal of the American Medical Association, Vol. 293, No. 1).
Source: Triola, Page 348, Section 7-3, #35