Chapter 18. Inference about a Population Mean. Theory & Problems.

Chapter 18. Inference about a Population Mean
STAT 145
Conditions needed for realistic inference about a population mean:
• We can regard our data as a simple random sample (SRS) from the population. This condition is very important.
• Observations from the population have a Normal distribution with mean μ and standard deviation σ. In practice, it is enough that the distribution be symmetric and single-peaked unless the sample is very small. Both μ and σ are unknown parameters.
• In this book: the population must be much larger than the sample, say at least 20 times as large.
Standard error
When the standard deviation of a statistic is estimated from data, the result is called the standard error of the
statistic. The standard error of the sample mean $\bar{x}$ is
$$\frac{s}{\sqrt{n}}.$$
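As a quick numerical illustration of the standard error formula, the following minimal Python sketch evaluates s/√n for one sample standard deviation and several sample sizes. The value s = 12.0 and the sample sizes are hypothetical, chosen only to keep the arithmetic easy to follow, and `standard_error` is a helper name introduced here.

```python
import math

def standard_error(s: float, n: int) -> float:
    """Standard error of the sample mean: s / sqrt(n)."""
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    return s / math.sqrt(n)

# Hypothetical sample standard deviation, not taken from the notes.
s = 12.0
for n in (16, 64, 256):
    print(f"n = {n:3d}: standard error = {standard_error(s, n):.2f}")
```

Because n enters through a square root, quadrupling the sample size halves the standard error.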
The one-sample t statistic and the t distributions
Draw an SRS of size n from a large population that has the Normal distribution with mean μ and standard
deviation σ. The one-sample t statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
has the t distribution with n−1 degrees of freedom.
Recall:
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}.$$
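To make the two formulas concrete, here is a small Python sketch that computes s directly from its definition and then the one-sample t statistic. The data values and the hypothesized mean μ = 6 are hypothetical, and the function names `sample_sd` and `one_sample_t` are introduced here for illustration.

```python
import math

def sample_sd(data):
    """Sample standard deviation s, computed directly from its definition."""
    n = len(data)
    x_bar = sum(data) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

def one_sample_t(data, mu):
    """One-sample t statistic (x_bar - mu) / (s / sqrt(n)) and its degrees of freedom."""
    n = len(data)
    x_bar = sum(data) / n
    t = (x_bar - mu) / (sample_sd(data) / math.sqrt(n))
    return t, n - 1

# Hypothetical data and hypothesized mean, for illustration only.
data = [4.0, 6.0, 8.0, 10.0, 12.0]
t, df = one_sample_t(data, mu=6.0)
print(f"s = {sample_sd(data):.4f}, t = {t:.4f}, df = {df}")
```

For these values the sample mean is 8, s = √10 ≈ 3.162, the standard error is √2, and t = 2/√2 ≈ 1.414 with 4 degrees of freedom.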
The t distributions
• The density curves of the t distributions are similar in shape to the Standard Normal curve. They are symmetric about 0, single-peaked, and bell-shaped.
• The spread of the t distributions is a bit greater than that of the Standard Normal distribution. The t distributions have more probability in the tails and less in the center than does the Standard Normal. This is true because substituting the estimate s for the fixed parameter σ introduces more variation into the statistic.
• As the degrees of freedom increase, the t density curve approaches the N(0, 1) curve ever more closely. This happens because s estimates σ more accurately as the sample size increases. So using s in place of σ causes little extra variation when the sample is large.
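The heavier tails can be checked numerically. The sketch below, using only the Python standard library, integrates the t density with Simpson's rule and compares the area to the right of 2 for several degrees of freedom with the Standard Normal value. The helpers `t_pdf`, `t_cdf`, and `upper_tail` are written here for illustration, and the cutoff 2 and the degrees of freedom shown are arbitrary choices; a statistics package would normally supply these probabilities.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x: float, df: int) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    steps = 2 * max(500, math.ceil(a * 50))  # even number of subintervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3  # area between 0 and |x|
    return 0.5 + half if x >= 0 else 0.5 - half

def upper_tail(x: float, df: int) -> float:
    """P(T > x) for the t(df) distribution."""
    return 1.0 - t_cdf(x, df)

normal_tail = 1.0 - NormalDist().cdf(2.0)
for df in (2, 5, 30, 100):
    print(f"t({df}): P(T > 2) = {upper_tail(2.0, df):.4f}")
print(f"N(0, 1): P(Z > 2) = {normal_tail:.4f}")
```

With 2 degrees of freedom the area beyond 2 is about 0.092, roughly four times the Normal value of about 0.023, and the gap narrows as the degrees of freedom grow.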
The one-sample t confidence interval
Draw an SRS of size n from a large population having unknown mean μ. A level C confidence interval for μ is
$$\bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}}$$
where $t^*$ is the critical value for the t(n−1) density curve with area C between $-t^*$ and $t^*$. This
interval is exact when the population distribution is Normal and is approximately correct for large n in other
cases.
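The interval can be computed without Table C by finding t* numerically. The following standard-library Python sketch is one way to do it, assuming a numerically integrated t density and a bisection search for t*. The helper names and the five data values are hypothetical and serve only as illustration.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    steps = 2 * max(500, math.ceil(a * 50))  # even number of subintervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3
    return 0.5 + half if x >= 0 else 0.5 - half

def t_critical(conf_level, df):
    """t* such that the area between -t* and t* under t(df) is conf_level."""
    if not 0.0 < conf_level < 1.0:
        raise ValueError("conf_level must be strictly between 0 and 1")
    target = (1.0 + conf_level) / 2.0  # cumulative probability at t*
    hi = 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t*
        hi *= 2.0
    lo = 0.0
    for _ in range(60):  # bisection
        mid = (lo + hi) / 2.0
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def t_confidence_interval(data, conf_level):
    """One-sample t interval: x_bar +/- t* s / sqrt(n)."""
    n = len(data)
    x_bar = statistics.mean(data)
    margin = t_critical(conf_level, n - 1) * statistics.stdev(data) / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical data, for illustration only.
low, high = t_confidence_interval([4.0, 6.0, 8.0, 10.0, 12.0], 0.95)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

For a 95% interval with 4 degrees of freedom, the computed critical value agrees with the Table C entry 2.776 to three decimal places.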
The one-sample t test
Draw an SRS of size n from a large population having unknown mean μ. To test the hypothesis
$H_0: \mu = \mu_0$, compute the one-sample t statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
In terms of a variable T having the t(n−1) distribution, the P-value for a test of $H_0$ against
$H_a: \mu > \mu_0$ is $P(T \ge t)$,
$H_a: \mu < \mu_0$ is $P(T \le t)$,
$H_a: \mu \ne \mu_0$ is $2P(T \le -|t|)$ or, equivalently, $2P(T \ge |t|)$, whichever is more convenient to compute.
These P-values are exact if the population distribution is Normal and are approximately correct for large n in
other cases.
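The three P-value rules translate directly into code. The sketch below uses only the Python standard library and integrates the t density numerically. The function names, the `alternative` labels, and the summary statistics (n = 10, x̄ = 52.0, s = 3.0, μ0 = 50.0) are hypothetical choices made for illustration.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    steps = 2 * max(500, math.ceil(a * 50))  # even number of subintervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3
    return 0.5 + half if x >= 0 else 0.5 - half

def t_test_p_value(t, df, alternative):
    """P-value of the one-sample t test for the given alternative."""
    if alternative == "greater":    # H_a: mu > mu_0, P(T >= t)
        return 1.0 - t_cdf(t, df)
    if alternative == "less":       # H_a: mu < mu_0, P(T <= t)
        return t_cdf(t, df)
    if alternative == "two-sided":  # H_a: mu != mu_0, 2 P(T >= |t|)
        return 2.0 * (1.0 - t_cdf(abs(t), df))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# Hypothetical summary statistics, for illustration only.
n, x_bar, s, mu_0 = 10, 52.0, 3.0, 50.0
t = (x_bar - mu_0) / (s / math.sqrt(n))
for alt in ("greater", "less", "two-sided"):
    print(f"{alt:>9}: P = {t_test_p_value(t, n - 1, alt):.4f}")
```

Here t is about 2.11 with 9 degrees of freedom, which lies between the Table C critical values 1.833 and 2.262, so the one-sided P-value for the "greater" alternative falls between 0.025 and 0.05.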
Matched pairs t procedures
To compare the responses to the two treatments in a matched pairs design, find the difference between the
responses within each pair. Then apply the one-sample t procedures to these differences.
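A minimal Python sketch of this reduction: take the within-pair differences, then compute the ordinary one-sample t statistic on them. The before and after values are hypothetical, and `matched_pairs_t` is a name introduced here for illustration.

```python
import math
import statistics

def matched_pairs_t(before, after, mu_0=0.0):
    """One-sample t statistic applied to the differences (after - before)."""
    if len(before) != len(after):
        raise ValueError("each subject needs both a before and an after value")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)
    t = (d_bar - mu_0) / (s_d / math.sqrt(n))
    return t, n - 1

# Hypothetical paired measurements, for illustration only.
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [12.0, 15.0, 10.0, 14.0, 14.0]
t, df = matched_pairs_t(before, after)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The differences here are 2, 3, 1, 3, 1, with mean 2 and standard deviation 1, so t = 2/(1/√5) ≈ 4.472 on 4 degrees of freedom. The degrees of freedom come from the number of pairs, not the total number of measurements.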
Robust Procedures
A confidence interval or significance test is called robust if the confidence level or P-value does not change very
much when the conditions for use of the procedure are violated.
t procedures are quite robust against non-Normality of the population except when outliers or strong skewness are
present.
Using t procedures
• Except in the case of small samples, the condition that the data are an SRS from the population of interest is more important than the condition that the population distribution is Normal.
• Sample size less than 15: Use t procedures if the data appear close to Normal (roughly symmetric, single peak, no outliers). If the data are clearly skewed or if outliers are present, do not use t.
• Sample size at least 15: The t procedures can be used except in the presence of outliers or strong skewness.
• Large samples: The t procedures can be used even for clearly skewed distributions when the sample is large, roughly n ≥ 40.
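These guidelines can be written as a small decision function. The sketch below is only one reading of the rules: the skewness labels "symmetric", "moderate", and "strong" are categories invented here to stand for the analyst's judgment from a plot, and treating outliers as a concern even when n ≥ 40 is an assumption, since the large-sample guideline mentions only skewness.

```python
def t_procedures_ok(n: int, skew: str, outliers: bool) -> bool:
    """Rule-of-thumb check on whether t procedures are reasonable.

    skew is the analyst's visual judgment: "symmetric", "moderate", or "strong".
    The data are assumed to be an SRS; no rule of thumb can rescue a bad sample.
    """
    if skew not in ("symmetric", "moderate", "strong"):
        raise ValueError("skew must be 'symmetric', 'moderate', or 'strong'")
    if n < 15:
        # Small samples: data must look close to Normal.
        return skew == "symmetric" and not outliers
    if n < 40:
        # At least 15: fine unless outliers or strong skewness are present.
        return skew != "strong" and not outliers
    # Large samples: skewness is tolerated; outliers still flagged (assumption).
    return not outliers

print(t_procedures_ok(20, "strong", True))     # strongly skewed, outliers, n = 20
print(t_procedures_ok(23, "moderate", False))  # mildly skewed, no outliers, n = 23
```

The two calls mirror the kind of judgment asked for in the examples that follow: a strongly skewed sample of 20 with outliers fails the check, while a mildly skewed sample of 23 without outliers passes.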
Can we use t procedures for these data?
Figure 18.8 shows plots of several data sets. For which of these can we safely use the t procedures?
• Figure 18.8(a) is a histogram of the percent of each state’s adult residents who are college graduates. We
have data on the entire population of 50 states, so inference is not needed. We can calculate the exact
mean for the population. There is no uncertainty due to having only a sample from the population, and
no need for a confidence interval or test. If these data were an SRS from a larger population, t inference
would be safe despite the mild skewness because n = 50.
Answer: Percent of adult college graduates in the 50 states. No, this is an entire population, not a sample.
• Figure 18.8(b) is a stemplot of the force required to pull apart 20 pieces of Douglas fir. The data are
strongly skewed to the left with possible low outliers, so we cannot trust the t procedures for n = 20.
Answer: Force required to pull apart 20 pieces of Douglas fir. No, there are just 20 observations and
strong skewness.
• Figure 18.8(c) is a stemplot of the lengths of 23 specimens of the red variety of the tropical flower
Heliconia. The data are mildly skewed to the right and there are no outliers. We can use the t
distributions for such data.
Answer: Lengths of 23 tropical flowers of the same variety. Yes, the sample is large enough to overcome
the mild skewness.
• Figure 18.8(d) is a histogram of the heights of the students in a college class. This distribution is quite
symmetric and appears close to Normal. We can use the t procedures for any sample size.
Answer: Heights of college students. Yes, for any size sample, because the distribution is close to
Normal.
You may be surprised at how variable samples of 10 observations from a Normal(0, 1) distribution can look. Here
are 25 such samples.
And here are samples of size n = 30.
By viewing many displays like this for varying sample sizes, you will develop your intuition about what a Normal
sample looks like.
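One way to generate such samples yourself is a short simulation. The Python sketch below uses only the standard library; the seed 145 and the printed summary (the range of sample means and sample standard deviations) are choices made here for illustration rather than a reproduction of the displays in these notes.

```python
import random
import statistics

def simulate_normal_samples(n, num_samples=25, seed=145):
    """Draw num_samples independent samples of size n from Normal(0, 1)."""
    rng = random.Random(seed)  # arbitrary seed; it only makes the run repeatable
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(num_samples)]

for n in (10, 30):
    samples = simulate_normal_samples(n)
    means = [statistics.mean(s) for s in samples]
    sds = [statistics.stdev(s) for s in samples]
    print(f"n = {n}: sample means range from {min(means):.2f} to {max(means):.2f}; "
          f"sample SDs range from {min(sds):.2f} to {max(sds):.2f}")
```

Changing the seed or the sample size and rerunning gives a feel for how much samples from the same Normal population can differ; the sample means and standard deviations typically vary less at n = 30 than at n = 10.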
Problems 1 – 3 are for practice with Table C.
Problem 1.
A study of commuting times reports the travel times to work of a random sample of 1000 employed adults. The
mean is $\bar{x} = 49.2$ minutes and the standard deviation is s = 63.9 minutes. What is the standard error of the
mean?
Problem 2.
Use Table C to find
a) the critical value for a one-sided test with level α = 0.05 based on the t(4) distribution.
b) the critical value for a 98% confidence interval based on the t(26) distribution.
Problem 3.
You have an SRS of size 30 and calculate the one-sample t-statistic. What is the critical value $t^*$ such that
a) T has probability 0.025 to the right of $t^*$?
b) T has probability 0.75 to the left of $t^*$?
Problem 4.
What critical value $t^*$ from Table C would you use for a confidence interval for the mean of the population
in each of the following situations?
a) A 95% confidence interval based on n = 12 observations.
b) A 99% confidence interval from an SRS of 18 observations.
c) A 90% confidence interval from a sample of size 6.
Problem 5. (Exercise 18.7)
The composition of the earth's atmosphere may have changed over time. To try to discover the nature of the
atmosphere long ago, we can examine the gas in bubbles inside ancient amber. Amber is tree resin that has
hardened and been trapped in rocks. The gas in bubbles within amber should be a sample of the atmosphere at
the time the amber was formed. Measurements on specimens of amber from the late Cretaceous era (75 to 95
million years ago) give these percents of nitrogen:
63.4   65   64.4   63.3   54.8   64.5   60.8   49.1   51.0
Assume that these observations are an SRS from the late Cretaceous atmosphere. Use a 90% confidence
interval to estimate the mean percent of nitrogen in ancient air. (Our present-day atmosphere is about 78.1%
nitrogen.)
Problem 6.
The one-sample t statistic for testing $H_0: \mu = 0$ vs. $H_a: \mu > 0$ from a sample of n = 20 observations has the
value t = 1.84.
a) What are the degrees of freedom for this statistic?
b) Give the two critical values $t^*$ from Table C that bracket t. What are the one-sided P-values for these two entries?
c) Is the value t = 1.84 significant at the 5% level? Is it significant at the 1% level?
Problem 7.
The one-sample t statistic from a sample of n = 15 observations for the two-sided test of
$H_0: \mu = 64$ vs. $H_a: \mu \ne 64$ has the value t = 2.12.
a) What are the degrees of freedom for t?
b) Locate the two critical values $t^*$ from Table C that bracket t. What are the two-sided P-values for these
two entries?
c) Is the value t = 2.12 statistically significant at the 10% level? At the 5% level?
Problem 8.
Here's a new idea for treating advanced melanoma, the most serious kind of skin cancer. Genetically engineer
white blood cells to better recognize and destroy cancer cells, then infuse these cells into patients. The subjects
in a small initial study were 11 patients whose melanoma had not responded to existing treatments.
One question was how rapidly the new cells would multiply after infusion, as measured by the doubling time in
days. Here are the doubling times:
1.4   1.0   1.3   1.0   1.3   2.0   0.6   0.8   0.7   0.9   1.9
a) Examine the data. Is it reasonable to use the t procedures?
b) Give a 90% confidence interval for the mean doubling time. Are you willing to use this interval to make an
inference about the mean doubling time in a population of similar patients?
Problem 9.
Another outcome in the cancer experiment described above is measured by a test for the presence of cells that
trigger an immune response in the body and so may help fight cancer. Here are data for the 11 subjects: counts
of active cells per 100,000 cells before and after infusion of the modified cells. The difference (after minus
before) is the response variable.
a) Examine the data. Is it reasonable to use the t procedures?
b) If your conclusion in part a) is "Yes", do the data give convincing evidence that the count of active cells is
higher after treatment?
Problem 10.
A researcher claims that the yearly consumption of soft drinks per person is 51 gallons. In city “A”, the
mayor thinks her residents are consuming more than the national average. In a sample of 20 randomly selected
city residents, the mean was 53.6 gallons and the standard deviation was 5.4 gallons. With α = 0.05 (5% level of
significance), is the researcher's claim valid?
Problem 11. (statement is different from Problem 10)
A researcher claims that the yearly consumption of soft drinks per person is 52 gallons. In a sample of 50
randomly selected people, the mean was 56.3 gallons and the standard deviation was 3.5 gallons. With α = 0.05
(5% level of significance), is the researcher's claim valid?