Chapter 16
Generalizing a Sample’s Findings to Its Population and Testing Hypotheses About Percents and Means
Learning Objectives:
To distinguish statistics from parameters
To understand the concept of statistical inference
To learn how to estimate a population mean or percentage
To test a hypothesis about a population mean or percentage
To learn how to perform and interpret statistical inference with SPSS
Practitioner Viewpoint
Research is typically conducted to improve the chances of marketing success and reduce the risks of career-ending failure. Companies can’t afford
to interview everyone when they need to make a marketing decision, of
course, so the gods have given us a miracle called “sampling” that allows
us to talk to a few to learn what the many are thinking. Sometimes we use
nonprobability samples because, in the right circumstances, it is sufficient
for us to make a decision based on what x number of people have told us.
However, in other cases, when we want to make estimates of the population
parameters with a known level of precision, we must use probability sample
plans. Thus, if you have used a probability sampling plan, you can then use
SPSS to assess how precise your sample statistic is to the true population
parameter you are trying to estimate. (Of course, you learned in Chapter 13
that the size of the sample determines how accurate your sample estimate
will be.) In this chapter, you will learn how to estimate the accuracy of statistics based on a sample, and you will learn how to test hypotheses, using
the concepts of statistical inference.
Jerry W. Thomas
President/CEO
Decision Analyst, Inc.
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright © 2003 by Pearson Prentice Hall.
Kit Kat’s Annual Mindshare Survey
Nestlé’s Kit Kat competes with a vast variety and huge number of candy bars. To
name just a few, its competitors include Almond Joy, Chunky, Heath Bars, Hershey
Bars, M&M’s, Milky Way, Rocky Road, Snickers, Tootsie Roll, and Zagnut. The marketing managers of Kit Kat know that although there are candy bar consumers who
tend to be very brand loyal, such as Milky Way fanatics, a great many candy bar buyers constantly jump from brand to brand. Furthermore, Kit Kat has found from its
research that many of these volatile candy buyers often make up their minds as to
what bar to buy as soon as the craving hits them. So, although Kit Kat knows its Kit
Kat fans will always buy a Kit Kat, it also knows that it must be prominent in the
minds of the other candy bar buyers who are not focused on their favorite brand.
That is, a candy bar buyer prospect feels the craving for chocolate or something
sweet, and he or she then quickly conjures up a small number of competing candy
bar brands in his or her head. The decision as to which one to buy is then made mentally
while the person is headed to the store or the closest vending machine. Kit Kat’s
challenge then is to have “mindshare.” Mindshare is similar to market share, except
it exists in the consumer’s awareness set and, hopefully, in the consideration set. The
awareness set is all the brands of which the consumer is aware in a product category, and the consideration set is those brands in the awareness set that the consumer would consider buying when the purchase urge hits.
To track consumers and, further, to assess the effectiveness of its promotional
campaigns aimed at increasing Kit Kat’s mindshare, Kit Kat commissioned an annual
survey. Here is the conversation between Stan, the Kit Kat marketing manager, and
Barbara, the research project director, which took place recently.
Where We Are:
1. Establish the need for marketing research
2. Define the problem
3. Establish research objectives
4. Determine research design
5. Identify information types and sources
6. Determine methods of accessing data
7. Design data collection forms
8. Determine sample plan and size
9. Collect data
10. Analyze data
11. Prepare and present the final research report
Statistical inference can help Kit Kat study
its mindshare changes.
Stan: “I see that you have the results of our annual Kit Kat Mindshare survey.”
Barbara: “I certainly do. We don’t have the report finished yet, but I have some notes, and
I can summarize our major findings.”
Stan: “Great. I certainly hope this is good news and not bad news.”
Barbara: “I bring good news. Should I deliver it?”
Stan: “I can’t wait to hear it.”
Barbara: “All right, here goes. You probably recall that we found Kit Kat to have 50 percent mindshare last year, so we calculated confidence intervals and told you that Kit
Kat had between 42 percent and 58 percent mindshare.”
Stan: “Yes, I do recall, and it was the basis for our promotional campaign target this year.
We are striving for 70 percent mindshare.”
Barbara: “Goals are always good. The movement is in the right direction, so you are doing
something right. This year we found 65 percent mindshare for Kit Kat. That computes
to a 95 percent confidence interval of 62 to 68 percent.”
Stan: “So we almost made our 70 percent goal. Wait a minute. Last year, we had 50 percent plus or minus 8 percent, and this year we have 65 percent plus or minus 3 percent. If we had the 8 percent from last year, we could say we reached our 70 percent
goal. What gives?”
Barbara: “Two things are different. The 65 percent means more people are in agreement
that Kit Kat will be considered when they get the craving, and we increased the sample
by one-quarter. We have less variability and more sample accuracy, so the plus or
minus number is smaller.”
Stan: “Oh. Well, you’re the expert.”1
As you learned in Chapter 15, descriptive measures of central tendency and measures of variability adequately summarize the findings of a survey. However, whenever a probability sample is drawn from a population, it is not enough to simply
report the sample’s descriptive statistics, for these measures contain a certain degree
of error due to the sampling process. Every sample provides some information about
its population, but there is always some sample error that must be taken into
account. That is what Barbara is communicating to Stan in our opening case. By
reading and comprehending the material in this chapter, you should be able to
understand Barbara’s comments on variability and sample accuracy.
We begin the chapter by noting that the term “statistic” applies to a sample,
whereas the term “parameter” pertains to the related population value. Next, we
describe the concept of logical inference and show how it relates to statistical inference. There are two basic types of statistical inference, and we discuss both cases.
First, there is parameter estimation in which a value, such as the population mean,
is estimated based on a sample’s mean and its size. Second, there is hypothesis testing where an assessment is made as to how much of a sample’s findings support a
manager’s or researcher’s a priori belief regarding the size of a population value. We
provide formulas and numerical examples and also show you examples of SPSS procedures and output using The Hobbit’s Choice Restaurant survey data.
STATISTICS VERSUS PARAMETERS
Statistics are sample values, whereas parameters are corresponding population values.
We begin the chapter by defining the concepts of statistics and parameters. There is
a fundamental distinction you should keep in mind. Values that are computed from
information provided by a sample are referred to as the sample’s statistics, whereas
values that are computed from a complete census, which are considered to be precise and valid measures of the population, are referred to as parameters. Statisticians
use Greek letters when referring to population parameters and Roman letters when
referring to statistics. As you can see in Table 16.1, the notation used for a percentage is p for the statistic and π for the parameter, the notations for standard deviation
are s (statistic) and σ (parameter), and the notations for the mean are x̄ (statistic)
and µ (parameter). Because a census is impractical, the sample statistic is used to
estimate the population parameter. This chapter describes the procedures used when
estimating various population parameters.
Table 16.1 Population Parameters and Their Companion Sample Statistics
| STATISTICAL CONCEPT | POPULATION PARAMETER (GREEK LETTERS) | SAMPLE STATISTIC (ROMAN LETTERS) |
| Average | µ (mu) | x̄ |
| Standard deviation | σ (sigma) | s |
| Percentage | π (pi) | p |
| Slope | β (beta) | b |
THE CONCEPTS OF INFERENCE AND STATISTICAL INFERENCE
We begin by defining inference because an understanding of this concept will help
you understand what statistical inference is all about. Inference is a form of logic in
which you make a generalization about an entire class based on what you have
observed about a small set of members of that class. When you infer, you draw a
conclusion from a small amount of evidence. For example, if two of your friends
each bought a new Chevrolet and they both complained about their cars’ performances, you might infer that all Chevrolets perform poorly. On the other hand, if
one of your friends complained about his Chevy, whereas the other one did not, you
might infer that your friend with the problem Chevy happened to buy a lemon.
Inferences are greatly influenced by the amount of evidence in support of the
generalization. So, if 20 of your friends bought new Chevrolets, and they all complained about poor performance, your inference would naturally be stronger or
more certain than it would be in the case of only two friends’ complaining.
Statistical inference is a set of procedures in which the sample size and sample
statistics are used to make estimates of population parameters. For now, let us concentrate on the percentage, p, as the sample statistic we are using to estimate the
population percentage, π, and see how sample size enters into statistical inference.
Suppose that Chevrolet suspected that there were some dissatisfied customers, and it
commissioned two independent marketing research surveys to determine the
amount of dissatisfaction that existed in its customer group. (Of course, our
Chevrolet example is entirely fictitious. We don’t mean to imply that Chevrolets perform in an unsatisfactory way.)
In the first survey, 100 customers who had purchased a Chevy in the last six
months were called on the telephone and asked, “In general, would you say that you
are satisfied or dissatisfied with the performance of your Chevrolet since you bought
it?” The survey found that 30 respondents (30 percent) are dissatisfied. This finding
could be generalized to the total population of Chevy owners who had bought one in
the last six months, and we would say that there is 30 percent dissatisfaction.
However, we know that our sample, which, by the way, was a probability sample,
must contain some sample error, and in order to reflect this you would have to say
that there was about 30 percent dissatisfaction in the population. In other words, it
might actually be more or less than 30 percent if we did a census because the sample
provided us with only an estimate.
Inference is drawing a conclusion based on some evidence.
Statistical inference takes into account that large random samples are more accurate than are small ones.
Statistical inference is based on sample size and variability, which then determine the amount of sampling error.
The three types of statistical inference are parameter estimation, hypothesis tests, and tests of significant differences.
In the second survey, 1,000 respondents—that’s 10 times more than in the first
survey—were called on the telephone and asked the same question. This survey
found that 35 percent of the respondents are “dissatisfied.” Again, we know that
the 35 percent is an estimate containing sampling error, so now we would also say
that the population dissatisfaction percentage was about 35 percent. This means
that we have two estimates of the degree of dissatisfaction with Chevrolets. One is
about 30 percent, whereas the other is about 35 percent.
How do we translate our answers (remember they include the word “about”)
into more accurate numerical representations? Let us say you could translate them
into ballpark ranges. That is, you could translate them so we could say “30 percent
plus or minus x percent” for the sample of 100 and “35 percent plus or minus y percent” for the sample of 1,000. How would x and y compare? To answer this question, think back on how your logical inference was stronger with 20 friends than it
was with 2 friends with Chevrolets. To state this in a different way, with a larger
sample (or more evidence), we have agreed that you would be more certain that the
sample statistic was accurate with respect to estimating the true population value. In
other words, with a larger sample size you should expect the range used to estimate
the true population value to be smaller. Intuitively, you should expect the range for
y to be smaller than the range for x because you have a larger sample and less sampling error.
As these examples reveal, when the statistician makes estimates of population
parameters such as the percentage or mean, the sample statistic is used as the beginning point, and then a range is computed in which the population parameter is estimated to fall. The size of the sample, or n, plays a crucial role in this computation,
as you will see in all of the statistical inference formulas we present in this chapter.
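To put rough numbers on x and y, the sketch below applies the standard error of a percentage and the 95 percent level of confidence (1.96 standard errors), both of which are presented later in this chapter. The function name `margin_of_error` is our own illustrative choice, not something taken from SPSS or from the Chevrolet example.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Plus-or-minus amount, in percentage points, around a sample percentage p.

    Assumes a probability sample of size n and the 95 percent level of
    confidence (z = 1.96) unless another z value is supplied.
    """
    q = 100 - p
    standard_error = math.sqrt(p * q / n)
    return z * standard_error

x = margin_of_error(30, 100)    # first survey: 30 percent dissatisfied, n = 100
y = margin_of_error(35, 1000)   # second survey: 35 percent dissatisfied, n = 1,000

print(f"30 percent plus or minus {x:.1f} percent")
print(f"35 percent plus or minus {y:.1f} percent")
```

Under these assumptions x comes out near 9 percentage points and y near 3, which agrees with the intuition that the larger sample yields the narrower range.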
Three types of statistical inferences are often used by marketing researchers. We
will introduce them here and describe them more completely later in the chapter.
They are: parameter estimation, hypothesis testing, and tests of significant differences. Parameter estimation is used to estimate the population value (parameter)
through the use of confidence intervals. Hypothesis testing is used to compare the
sample statistic with what is believed (hypothesized) to be the population value
prior to undertaking the study. Tests of significant differences are used to compare
the sample statistics of two (or more) subgroups in the sample to see whether or not
there are statistically significant differences between their corresponding population
values. (Although we distinguished between differences analysis and inference
analysis in the previous chapter, strictly speaking, differences tests are a form of statistical inference.) We describe the first two types of statistical inference in this chapter and differences tests in Chapter 17. Also, for quick reference, we have listed and
described the three types of statistical inference in Table 16.2 with examples of findings of an online survey conducted for RealPlayer™.
PARAMETER ESTIMATION
Estimation of population parameters is a common type of statistical inference used
in marketing research survey analysis. As indicated earlier, inference is largely a
reflection of the amount of sampling error believed to exist in the sample statistic.
When the New York Times conducts a survey and finds that readers spend an average of 45 minutes daily reading the Times, or when McDonald’s determines through
a nationwide sample that 78 percent of all Egg McMuffin Breakfast buyers buy a
cup of coffee, both companies may want to determine more accurately how close
these estimates are to the actual population parameters.
To estimate a population parameter you need a sample statistic (mean or percentage), the standard error of the statistic, and the desired level of confidence (95% or 99%).
Table 16.2 The Three Types of Statistical Inference: Results of Online Music Listeners Survey Conducted for RealPlayer™
| TYPE | DESCRIPTION | EXAMPLE |
| Parameter estimation | Estimate the population value (parameter) through the use of confidence intervals. | The percent of PC users who listen to music online is 30% ± 10%, or from 20% to 40%. |
| Hypothesis testing | Compare the sample statistic with what is believed (hypothesized) to be the population value prior to undertaking the study. | Online music listeners listen an average of 45 ± 15 minutes per day, not 90 minutes as believed by RealPlayer managers. |
| Tests of significant differences | Compare the sample statistics of two (or more) subgroups in the sample to see whether or not there are statistically significant differences between their corresponding population values. | Online music listeners who are under 20 years old listen 60 minutes, whereas those over 20 listen 30 minutes, on average, and the difference is statistically significant. |
Note: The examples are fictitious.
Parameter estimation is the process of using sample information to compute an
interval that describes the range of a parameter such as the population mean (µ) or
the population percentage (π). It involves the use of three values: the sample statistic
(such as the mean or the percentage), the standard error of the statistic, and the
desired level of confidence (usually 95% or 99%). A discussion of how each value is
determined follows.
In parameter estimation, the sample statistic is usually a mean or a
percentage.
Sample Statistic
You should recall from the formula provided in Chapter 15 that the mean is the
average of a set of interval- or ratio-scaled numbers. For example, you might be
working with a sample of golfers and researching the average number of golf balls
they buy per month. Or you might be investigating how much high school students
spend, on average, on fast foods between meals. For a percentage, you could be
examining what percentage of golfers buy only Maxfli golf balls, or you might be
looking at what percentage of high school students buy from Taco Bell between
meals. In either case, the mean or percentage is derived from a sample, so it is the
sample statistic.
McDonald’s can use statistical inference to estimate the percentage of Egg McMuffin buyers who order a cup of coffee.
Standard Error
The standard error is a measure
of the variability in a sampling
distribution.
The formula for mean standard
error differs from a percentage
standard error.
There usually is some degree of variability in the sample. That is, our golfers do not all
buy the same number of golf balls per month and they do not all buy Maxfli. Not all
of our high school students eat fast food between meals and not all of the ones who do
go to Taco Bell. In Chapter 15, we introduced you to variability with a mean by
describing the standard deviation, and we used the percentage distribution as a way of
describing variability when percentages are being used. Also, in Chapter 13, we
described how, if you theoretically took many, many samples and plotted the mean or
percentage as a frequency distribution, it would approximate a bell-shaped curve
called the sampling distribution. The standard error is a measure of the variability in
the sampling distribution based on what is theoretically believed to occur were we to
take a multitude of independent samples from the same population. We described the
standard error formulas in Chapter 13, but we repeat them here because they are vital
to statistical inference in that they tie together the sample size and its variability.
The formula for the standard error of the mean is as follows:
Formula for standard error of the mean
$$ s_{\bar{x}} = \frac{s}{\sqrt{n}} $$
where
$s_{\bar{x}}$ = standard error of the mean
$s$ = standard deviation
$n$ = sample size
The formula for the standard error of the percentage is as follows:
Formula for standard error of the percentage
$$ s_p = \sqrt{\frac{p \times q}{n}} $$
where
$s_p$ = standard error of the percentage
$p$ = the sample percentage
$q$ = $(100 - p)$
$n$ = sample size
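The two standard error formulas can be written as small functions. This is a minimal sketch in Python, not an SPSS procedure; the function names and the input values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math

def standard_error_mean(s, n):
    """Standard error of the mean: standard deviation s over the square root of n."""
    return s / math.sqrt(n)

def standard_error_percentage(p, n):
    """Standard error of the percentage, with p and q = 100 - p stated in percent."""
    q = 100 - p
    return math.sqrt(p * q / n)

# Hypothetical inputs, not taken from any survey in this chapter.
print(standard_error_mean(15, 225))        # 15 / 15
print(standard_error_mean(15, 900))        # four times the sample size, half the error
print(standard_error_percentage(20, 400))  # square root of (20 x 80) / 400
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.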
The standard error takes into
account sample size and the
variability in the sample.
In both equations, the sample size n is found in the denominator. This means
that the standard error will be smaller with larger sample sizes and larger with
smaller sample sizes. At the same time, both of these formulas for the standard error
reveal the impact of the variation found in the sample. Variation is represented by
the standard deviation s for a mean and by (p × q) for a percentage. In either equation, the variation is in the numerator, so the greater the variability, the greater the
standard error. Thus, the standard error simultaneously takes into account both the
sample size and the amount of variation found in the sample. The following examples illustrate this fact.
Suppose that the New York Times survey on the amount of daily time spent
reading the Times had determined a standard deviation of 20 minutes and had used
a sample size of 100. The resulting standard error of the mean would be as follows:
Calculation of standard error of the mean with standard deviation = 20
$$ s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{20}{\sqrt{100}} = \frac{20}{10} = 2 \text{ minutes} $$
On the other hand, if the survey had determined a standard deviation of 40 minutes, the standard error would be as follows:
Calculation of standard error of the mean with standard deviation = 40
$$ s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{40}{\sqrt{100}} = \frac{40}{10} = 4 \text{ minutes} $$
Notice how sample variability affects the standard error in these two examples.
As you can see, the standard error of the mean from a sample with little variability (20 minutes) is smaller than the standard error of the mean from a sample
with much variability (40 minutes), as long as both samples have the same size. In
fact, you should have noticed that when the variability was doubled from 20 to 40
minutes, the standard error also doubled, given identical sample sizes. Refer to
Figure 16.1.
The standard error of a percentage mirrors this logic, although the formula
looks a bit different. In this case, as we indicated earlier, the degree of variability
is inherent in the (p × q) aspect of the equation. Very little variability is indicated
if p and q are very different in size. For example, if a survey of 100 McDonald’s
breakfast buyers determined that 90 percent of the respondents ordered coffee
with their Egg McMuffin and 10 percent of the respondents did not, there would
be very little variability because almost everybody orders coffee with breakfast.
On the other hand, if the sample determined that there was a 50–50 split between
those who had and those who had not ordered coffee, there would be a great deal
more variability because any two customers would probably differ in their drink
orders.
We can apply these two results to the standard error of percentage for a comparison. Using a 90–10 percent split, the standard error of percentage is as follows:
Calculation of standard error of the percent with p = 90 and q = 10
$$ s_p = \sqrt{\frac{p \times q}{n}} = \sqrt{\frac{(90)(10)}{100}} = \sqrt{\frac{900}{100}} = \sqrt{9} = 3\% $$
With a 90–10 percent split there is little variability.
[Figure 16.1 The Variability Found in the Sample Directly Affects the Standard Error (same sample size). The figure compares two sampling distributions: with standard deviation = 40 the standard error = 4 (more variability means a larger sampling distribution), and with standard deviation = 20 the standard error = 2 (less variability means a smaller sampling distribution).]
Using a 50–50 percent split, the standard error of the percentage is as follows:
Calculation of standard error of the percent with p = 50 and q = 50
$$ s_p = \sqrt{\frac{p \times q}{n}} = \sqrt{\frac{(50)(50)}{100}} = \sqrt{\frac{2{,}500}{100}} = \sqrt{25} = 5\% $$
A 50–50 percent split has a larger standard error than a 90–10 one when sample size is the same.
Again, these examples show that greater variability in responses results in a
larger standard error of the percentage at a given sample size.
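To see how the (p × q) term behaves between the two splits worked out above, the following sketch tabulates the standard error of the percentage for several splits at the same sample size of 100. The helper name `standard_error_by_split` and the intermediate splits (30–70 and 70–30) are illustrative assumptions.

```python
import math

def standard_error_by_split(n, percentages):
    """Return {p: standard error of the percentage} for each sample percentage p."""
    return {p: math.sqrt(p * (100 - p) / n) for p in percentages}

errors = standard_error_by_split(100, [10, 30, 50, 70, 90])
for p, se in errors.items():
    print(f"{p}-{100 - p} split: p x q = {p * (100 - p)}, standard error = {se:.2f}%")
```

The product p × q, and therefore the standard error, is largest at the 50–50 split and shrinks symmetrically as the split moves toward 90–10 or 10–90.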
Confidence Intervals
Population parameters are estimated with the use of confidence
intervals.
Confidence intervals are ranges whose width reflects the degree of accuracy desired by the researcher, which is stipulated as a level of confidence in the form of a percentage. We also introduced confidence intervals in Chapter 13, and we briefly review them here. Because there is
always some sampling error when a sample is taken, it is necessary to estimate the
population parameter with a range. We did this in the Chevrolet owners’ example
earlier. One factor affecting the size of the range is how confident the researcher
wants to be that the range includes the true population percentage. Normally, the
researcher first decides on how confident he or she wants to be. The sample statistic
is the beginning of the estimate, but because there is sample error present, a “plus”
amount is added to, and an identical “minus” amount is subtracted from, the sample
statistic to determine the maximum and minimum, respectively, of the range.
Typically, marketing researchers rely only on the 90 percent, 95 percent, or 99
percent levels of confidence, which correspond to ±1.64, ±1.96, and ±2.58 standard
errors, respectively. They are designated zα, so z0.99 is ±2.58 standard errors. By far,
the most commonly used level of confidence in marketing research is the 95 percent
level, corresponding to 1.96 standard errors. In fact, the 95 percent level of confidence is usually the default level found in statistical analysis programs such as SPSS.
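The z values quoted above come from the standard normal distribution. As a hedged illustration, the sketch below recovers them with `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later); the helper name `z_for_confidence` is our own.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Number of standard errors on each side of the statistic for a two-sided level.

    level is a proportion such as 0.95; half of the remaining area
    (1 - level) is left in each tail of the standard normal curve.
    """
    return NormalDist().inv_cdf((1 + level) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} level of confidence: z = {z_for_confidence(level):.2f}")
```

Rounded to two decimals, the results match the 1.64, 1.96, and 2.58 standard errors used in this chapter.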
Now that the relationship between the standard error and the measure of sample
variability, be it the standard deviation or the percentage, is apparent, it is a simple matter to determine the range in which the population parameter will be estimated. Simply use the appropriate formula for population parameter estimation for
a mean or a percentage that appears below this paragraph. We use the sample statistic, x̄ or p, compute the standard error, and then apply our desired level of confidence. In notation form these are as follows:
Formula for population parameter estimation (mean)
$$ \bar{x} \pm z_{\alpha} s_{\bar{x}} $$
where
$\bar{x}$ = sample mean
$z_{\alpha}$ = z value for 95 percent or 99 percent level of confidence
$s_{\bar{x}}$ = standard error of the mean
Formula for population parameter estimation (percentage)
$$ p \pm z_{\alpha} s_p $$
where
$p$ = sample percentage
$z_{\alpha}$ = z value for 95 percent or 99 percent level of confidence
$s_p$ = standard error of the percentage
The range of your estimate of the population mean or percentage depends largely on the sample size and the variability found in the sample.
Confidence intervals are estimated using these formulas.
If you wanted to be 99 percent confident that your range included the true population percentage, for instance, you would multiply the standard error of the percentage sp by 2.58 and add that value to the percentage p to obtain the upper limit,
and you would subtract it from the percentage to find the lower limit. Notice that
you have now taken into consideration the sample statistic p, the variability that is
in the formula for sp, the sample size n, which is also in the formula for sp, and the
degree of confidence in your estimate.
How do these formulas relate to inference? Recall that we are estimating a population parameter. That is, we are indicating a range into which it is believed that
the true population parameter falls. The size of the range is determined by those
three bits of information we have about the population on hand as a result of our
sample. The final ingredient is our level of confidence or the degree to which we
want to be correct in our estimate of the population parameter. If we are conservative and wish to assume the 99 percent level of confidence, then the range would be
more encompassing than if we are less conservative and assume only the 95 percent
level of confidence because 99 percent is associated with ±2.58 standard errors and
95 percent is associated with ±1.96 standard errors.
Marketing researchers typically
use only a 95 or a 99 percent
confidence interval.
Using these formulas for the sample of 100 New York Times readers with a mean
reading time of 45 minutes and a standard deviation of 20 minutes, the 95 percent
and the 99 percent confidence interval estimates would be calculated as follows.
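Calculation of a 95 percent confidence interval for a mean
$$
\begin{aligned}
&\bar{x} \pm 1.96 \times s_{\bar{x}} \\
&45 \pm 1.96 \times \frac{20}{\sqrt{100}} \\
&45 \pm 1.96 \times 2 \\
&45 \pm 3.9 \\
&41.1 \text{ to } 48.9 \text{ minutes}
\end{aligned}
$$
Calculation of a 99 percent confidence interval for a mean
$$
\begin{aligned}
&\bar{x} \pm 2.58 \times s_{\bar{x}} \\
&45 \pm 2.58 \times \frac{20}{\sqrt{100}} \\
&45 \pm 2.58 \times 2 \\
&45 \pm 5.2 \\
&39.8 \text{ to } 50.2 \text{ minutes}
\end{aligned}
$$
Here are two examples of confidence interval computations with a mean.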
If 50 percent of the 100 Egg McMuffin eaters order coffee, the 95 percent and
99 percent confidence intervals would be computed using the percentage formula.
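Calculation of a 95 percent confidence interval for a percentage
$$
\begin{aligned}
&p \pm 1.96 \times s_p \\
&p \pm 1.96 \times \sqrt{\frac{p \times q}{n}} \\
&50 \pm 1.96 \times \sqrt{\frac{50 \times 50}{100}} \\
&50 \pm 1.96 \times 5 \\
&50 \pm 9.8 \\
&40.2\% \text{ to } 59.8\%
\end{aligned}
$$
Calculation of a 99 percent confidence interval for a percentage
$$
\begin{aligned}
&p \pm 2.58 \times s_p \\
&p \pm 2.58 \times \sqrt{\frac{p \times q}{n}} \\
&50 \pm 2.58 \times \sqrt{\frac{50 \times 50}{100}} \\
&50 \pm 2.58 \times 5 \\
&50 \pm 12.9 \\
&37.1\% \text{ to } 62.9\%
\end{aligned}
$$
Here are two examples of confidence interval computations with a percentage.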
Notice that the only thing that differs when you compare the 95 percent confidence interval computations to the 99 percent confidence interval computations in
each case is zα. It is 1.96 for the 95 percent and 2.58 for the 99 percent level of confidence. The
confidence interval is always wider for 99 percent than it is for 95 percent when the
sample size is the same and variability is equal. Table 16.3 lists the steps used to
compute confidence intervals.
A 99 percent confidence interval
is always wider than a 95 percent
confidence interval if all other
factors are equal.
Table 16.3 Steps for How to Compute Confidence Intervals for a Mean
or a Percentage
Step 1. Find the sample statistic, either the mean, $\bar{x}$, or the percentage, $p$.
Step 2. Determine the amount of variability found in the sample in the form of the standard error of the mean, $s_{\bar{x}}$, or the standard error of the percentage, $s_p$.
Step 3. Identify the sample size, $n$.
Step 4. Decide on the desired level of confidence to determine the value for $z$: $z_{.95}$ (1.96) or $z_{.99}$ (2.58).
Step 5. Compute your (95%) confidence interval as: $\bar{x} \pm 1.96 s_{\bar{x}}$ or $p \pm 1.96 s_p$.
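The steps in Table 16.3 can also be sketched in a few lines of Python. This is only an illustration, not part of SPSS; the function names `ci_mean` and `ci_percentage` and the `Z_VALUES` lookup are hypothetical names chosen for this sketch, and the z values are the two used in this chapter.

```python
import math

# z values for the two confidence levels used in this chapter
Z_VALUES = {95: 1.96, 99: 2.58}

def ci_mean(mean, std_dev, n, confidence=95):
    """Confidence interval for a mean: mean ± z × (s / sqrt(n))."""
    std_error = std_dev / math.sqrt(n)          # Step 2: standard error of the mean
    margin = Z_VALUES[confidence] * std_error   # Steps 4 and 5
    return mean - margin, mean + margin

def ci_percentage(p, n, confidence=95):
    """Confidence interval for a percentage p (0 to 100): p ± z × sqrt(p × q / n)."""
    q = 100 - p
    std_error = math.sqrt(p * q / n)            # Step 2: standard error of the percentage
    margin = Z_VALUES[confidence] * std_error   # Steps 4 and 5
    return p - margin, p + margin

# New York Times example: mean of 45 minutes, standard deviation of 20, n = 100
low, high = ci_mean(45, 20, 100, confidence=95)
print(f"{low:.1f} to {high:.1f} minutes")
```

Calling either function with `confidence=99` instead of 95 widens the interval, because only the z value changes.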
How to Interpret an Estimated Population Mean or Percentage Range
How are these ranges interpreted? The interpretation is quite simple when you
remember that the sampling distribution notion is the underlying concept. If we were
using a 99 percent level of confidence, and if we repeated the sampling process and
computed the sample statistic many times, 99 percent of these repeated samples
would produce a range that includes the population parameter. The bell-shaped distribution assumption assures us that the sampling distribution is symmetric.
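The repeated-sampling idea can be illustrated with a small simulation. The sketch below is hypothetical: it assumes a normally distributed population whose mean (45) and standard deviation (20) are known only because we invented them, draws many samples of 100, and counts how often the 99 percent interval contains the true mean. The function name `coverage_rate` and its parameters are assumptions made for this example.

```python
import math
import random
import statistics

def coverage_rate(pop_mean, pop_sd, n, z, replications, seed=0):
    """Share of repeated samples whose interval x̄ ± z × s_x̄ contains pop_mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(replications):
        # Draw one random sample from the assumed (invented) population
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        x_bar = statistics.mean(sample)
        std_error = statistics.stdev(sample) / math.sqrt(n)
        # Does this sample's confidence interval include the population mean?
        if x_bar - z * std_error <= pop_mean <= x_bar + z * std_error:
            hits += 1
    return hits / replications

rate = coverage_rate(pop_mean=45, pop_sd=20, n=100, z=2.58, replications=2000)
print(f"Share of intervals containing the population mean: {rate:.3f}")
```

The printed share should land close to 0.99, although it will not be exactly 0.99 in any one run. In a real project the population mean is unknown and only one sample is taken, which is why the interval is reported with a level of confidence rather than with certainty.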
Obviously, a marketing researcher would take only one sample for a particular
marketing research project, and this restriction explains why estimates must be
used. Furthermore, it is the conscientious application of probability sampling techniques that allows us to make use of the sampling distribution concept. Thus, statistical inference procedures are the direct linkages between probability sample design
and data analysis. Do you remember that you had to grapple with confidence levels
when we determined sample size? Now we are on the other side of the table, so to
speak, and we must use the sample size for our inference procedures. Confidence
intervals must be used when estimating population parameters, and the size of the
random sample used is always reflected in these confidence intervals.
What does this mean? The researcher can say the following about New York
Times reading time: “My best estimate is that readers spend 45 minutes reading the Times.
In addition, I am 95 percent confident that the true population value falls between
41.1 and 48.9 minutes.” With the tools of statistical inference, a researcher has a
good idea of how closely the sample statistic represents the population parameter.
There are five steps to computing
a confidence interval.
With confidence intervals,
Jeff Dean will know what
percentage of his target
market listens to “rock”
radio programs.
As a final note, we want to remind you that the logic of statistical inference is
identical to the reasoning process you go through when you weigh evidence to
make a generalization or conclusion of some sort. The more evidence you have, the
more precise you will be in your generalization. The only difference is that with statistical inference we must follow certain rules that require the application of formulas so our inferences will be consistent with the assumptions of statistical theory.
When you make a nonstatistical inference, your judgment can be swayed by subjective factors, so you may not be consistent with others who are making an inference with the same evidence. But in statistical inference, the formulas are completely objective and perfectly consistent. Plus, they are based on accepted
statistical concepts.
RETURN TO Your Integrated Case
The Hobbit’s Choice Restaurant Survey: How to Obtain a Confidence Interval with SPSS
We will now show you how to use SPSS to obtain confidence intervals.
How to Obtain a Confidence Interval for a Percent with SPSS
Your SPSS program will not calculate the confidence intervals for a percentage. This is
because with a categorical variable such as the radio programming format of The Hobbit’s
Choice Restaurant survey, there may be many different categories. But you know now that
the computation is fairly easy, and all you need to know is the value of p and the sample
size, both of which you can obtain using the “Frequencies” procedure in SPSS.
Here is an example using The Hobbit’s Choice data set. We found in our descriptive
analysis that “rock” was the most preferred radio show format, and 41.3 percent of the
respondents listened to it. We can calculate the confidence intervals at 95 percent level of
confidence easily as we know that the sample size was 400. Here are the calculations.
Calculation of a 95 percent confidence interval for the “rock” radio format percentage in The Hobbit’s Choice population
$p \pm 1.96 \times s_p$
$p \pm 1.96 \times \sqrt{\dfrac{p \times q}{n}}$
$41.3 \pm 1.96 \times \sqrt{\dfrac{41.3 \times 58.7}{400}}$
$41.3 \pm 1.96 \times 2.46$
$41.3 \pm 4.8$
36.5%–46.1%
Now we can say, “Our best estimate of the percentage of the population that prefers
‘rock’ radio is 41.3 percent, and we are 95 percent confident that the true population value
is between 36.5 percent and 46.1 percent.”
How to Obtain a Confidence Interval for a Mean with SPSS
Fortunately, because the calculations are a bit more complicated and tedious, your SPSS
program will calculate the confidence interval for a mean. To illustrate this feature, we will
revisit some very critical comments that Cory Rogers, research project manager, made to
Jeff Dean. In fact, Cory Rogers had already made some estimates of demand using the
forecasting model. He told Jeff that if only 4 percent of heads of households in the 12 zip
code areas claimed they were “very likely” to patronize the restaurant and if these same
people spent an average of $200 per month in restaurants and were willing to pay an average of $18 for an à la carte entrée, then the model predicted a very successful restaurant
operation.
You should have already found via your descriptive analysis (Case 15.3, page 455), that
18 percent of the respondents indicated “very likely.” If you calculate the 95 percent confidence intervals for the population estimate, you will find the range to be 14.2 percent to
21.8 percent, so the first condition is satisfied. The next requirement is an average of $200
per month. We can test this with a 95 percent confidence interval for the mean dollars
spent on restaurants for the “very likely” respondents who represent the “very likely” population.
The requirement says that the “very likely” individuals must spend an average of $200
per month, so we must select only those respondents for this analysis. You learned earlier
how to select respondents, and you should recall that it is accomplished with the DATA-SELECT CASES menu sequence. Then all we need to specify is the selection condition of
“is likely to patronize Hobbit’s Choice Restaurant” = 5. With this operation, SPSS will analyze only these respondents.
Figure 16.2 shows the clickstream sequence to accomplish a 95 percent confidence
interval estimate using SPSS. As you can see, the correct SPSS procedure is a “One
Sample t-test,” and you use the ANALYZE-COMPARE MEANS-ONE SAMPLE T TEST menu
clickstream sequence to open up the proper window. Refer to Figure 16.2 to see that all
you need to do is to select the “Dollars spent in restaurants per month” variable into the
Test Variables area, and then click “OK.”
Figure 16.3 shows the results of ANALYZE-COMPARE MEANS-ONE SAMPLE T TEST
for dollars spent per month in a restaurant. As you can see, the average monthly restaurant
expenditures for our “very likely” folks are about $282, and the 95 percent confidence
interval is $266.99 to $296.92. To repeat our interpretation of this finding: If we conducted
a great many replications of this survey using the same sample size, we would find that 95
percent of the sample average monthly restaurant expenditures for the “very likely” respondents would fall between about $267 and $297.
SPSS Student Assistant Online: Establishing Confidence Intervals for Means
Figure 16.2 The SPSS Clickstream to Obtain a 95% Confidence Interval for a Mean
Figure 16.3 The SPSS Output for a 95% Confidence Interval for a Mean
This is very good news for our budding restaurateur, Jeff Dean, as we have satisfied
two of the three conditions specified by researcher Cory Rogers: the percentage of individuals in the population who are very likely to visit the proposed restaurant exceeds 4
percent, and these people are spending more than an average of $200 per month on
restaurants.
In fact, with our findings, we can estimate the total market potential for all upscale
restaurants in the metropolitan area. See Marketing Research Insight 16.1.
Estimating market potential is only one of many useful applications of statistical inference using survey and other data. A company that has built a global reputation in marketing research and decision-making statistical techniques is Decision
Analyst, Inc. of Arlington, Texas. You can meet Jerry Thomas, CEO of Decision
Analyst, Inc. in our “Meet a Marketing Researcher” item on page 471.
HYPOTHESIS TESTING
A hypothesis is what the manager
or researcher expects the population mean (or percentage) to be.
Sometimes someone, such as the marketing researcher or marketing manager, makes
a statement about the population parameter based on prior knowledge, assumptions, or intuition. This statement, called a hypothesis, most commonly takes the
form of an exact specification as to what the population parameter value is.
Hypothesis testing is a statistical procedure used to “accept” or “reject” the
hypothesis based on sample information. With all hypothesis tests, you should keep
in mind that the sample is the only source of current information about the population. Because our sample is a probability sample and therefore representative of the
population, the sample results are used to determine whether or not the hypothesis
about the population parameter has been supported.
MARKETING RESEARCH INSIGHT 16.1: Additional Insights
How to Estimate Market Potential Using a Survey’s Findings
A common way to estimate total market potential is to rely on the definition of a market. A market is people with the willingness and ability to pay for a product or a service. This definition can be expressed somewhat like a formula, in the following way.
Market potential = Population base times percent likely to buy times amount they are willing to pay
In The Hobbit’s Choice Restaurant case, we know that the metropolitan population is about 500,000, which translates to about 167,000 households. We also know that not every household dines out regularly. A recent article in the City Magazine reported that one out of 10 households eats evening meals at the major restaurants in the city. So there are about 16,700 households comprising the “sit down” restaurant market population base.
We found the 95 percent confidence intervals for the individuals (who represent households) who are “very likely” to patronize an upscale restaurant to be 14.2 percent to 21.8 percent, and we found that they spend about $282, on average, each month at restaurants. With these facts, findings, and confidence intervals, we can make three estimates of the market potential for an upscale restaurant.
PESSIMISTIC ESTIMATE: 16,700 times 14.2% times $282 = $668,735
BEST ESTIMATE: 16,700 times 18.0% times $282 = $847,692
OPTIMISTIC ESTIMATE: 16,700 times 21.8% times $282 = $1,026,649
Using the 95 percent confidence intervals and the sample percentage, the total market potential is found to be between about $0.7 million and $1.0 million per month, which amounts to about $8.4 million to $12.0 million per year. The best monthly estimate is about $850,000, or about $10.2 million per year. It is “best” because it is based on the sample percentage, which is the best estimate of the true population percentage of “very likely” households. The 95 percent confidence interval estimates are possible, but if many, many replications of the survey were to take place, most of the percentages would fall near 18 percent.
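The market potential arithmetic is easy to reproduce. The short Python sketch below applies the formula (population base times percent likely to buy times amount spent) to the three percentages from the confidence interval; the function name `market_potential` and the variable names are hypothetical, chosen only for this illustration.

```python
def market_potential(households, pct_likely, monthly_spend):
    """Monthly market potential = population base × percent likely × amount spent."""
    return households * (pct_likely / 100) * monthly_spend

households = 16_700      # "sit down" restaurant market population base
monthly_spend = 282      # average dollars spent per month by "very likely" respondents

# Lower bound, sample percentage, and upper bound of the 95 percent interval
scenarios = {"pessimistic": 14.2, "best": 18.0, "optimistic": 21.8}

for label, pct in scenarios.items():
    monthly = market_potential(households, pct, monthly_spend)
    print(f"{label:>11}: ${monthly:,.0f} per month, ${monthly * 12:,.0f} per year")
```

Multiplying each monthly figure by 12 gives the annual estimate, which makes it easy to keep the monthly and annual numbers from being confused.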
MARKETING RESEARCH INSIGHT 16.2: Meet a Marketing Researcher
Meet Jerry W. Thomas, President/CEO of Decision Analyst, Inc.
Jerry W. Thomas, president and chief executive officer, is the
founder of Decision Analyst, chairman of the board of directors, and a member of the executive committee. His principal
responsibilities include strategic planning and general management of the firm. He is deeply involved in internal research and
development activities and in the development of research
techniques and study designs used throughout the company.
He is a widely published author of articles on marketing
research and marketing strategy and a frequent public speaker
on these topics. He currently serves as chairman of the advisory board for the Master of Science in Marketing Research
(MSMR) program at the University of Texas at Arlington.
His strengths are creative study design and problem solving, business and marketing strategy, and the translation of
research data into actionable marketing recommendations.
His expertise spans both qualitative and quantitative techniques. His experience has been concentrated in the packaged goods, food service, and computer and related high-technology industries.
Before founding Decision Analyst, he was a senior vice president at M/A/R/C in Dallas, Texas. Before that, he worked in product management at Anderson Clayton Foods and in marketing planning at Hallmark Cards, Inc. in Kansas City. He holds an M.B.A. degree from the University of Texas, with additional postgraduate studies in economics at Southern Methodist University.
People test and revise intuitive hypotheses often without thinking about it.
People engage in intuitive hypothesis testing constantly.
There are five steps in hypothesis testing.
A hypothesis test gives you the probability of support for your hypothesis based on your sample evidence and sample size.
All of this might sound frightfully technical, but it is a form of inference that
you do every day. You just do not use the words “hypothesis” or “parameter” when
you do it. Here is an example to show how hypothesis testing occurs naturally. Your
friend, Bill, does not wear his seat belt because he thinks only a few drivers actually
wear them. But Bill’s car breaks down, and he has to ride with his coworkers to and
from work while it is being repaired. Over the course of a week, Bill rides with five
different coworkers, and he notices that four out of the five buckle up. When Bill
begins driving his car the next week, he begins fastening his seat belt.
This is intuitive hypothesis testing in action; Bill’s initial belief that few people wear
seat belts was his hypothesis. Intuitive hypothesis testing (as opposed to statistical
hypothesis testing) is when someone uses something he or she has observed to see if it
agrees with or refutes his or her belief about that topic. Everyone uses intuitive hypothesis testing; in fact, we rely on it constantly. We just do not call it hypothesis testing, but
we are constantly gathering evidence that supports or refutes our beliefs, and we reaffirm or change our beliefs based on our findings. Read Marketing Research Insight
16.3 and see that you perform intuitive hypothesis testing a great deal.
Obviously, if you had asked Bill before his car went into the repair shop, he
might have said that only a small percentage, perhaps as low as 10 percent, of drivers wear seat belts. His week of car rides is analogous to a sample of five observations, and he observes that 80 percent of his coworkers buckle up. Now his initial
hypothesis is not supported by the evidence. So Bill realizes that his hypothesis is in
error, and it must be revised. If you asked Bill what percentage of drivers wear seat
belts after his week of observations, he undoubtedly would have a much higher percentage in mind than his original estimate. The fact that Bill began to fasten his seat
belt suggests he perceives his behavior to be out of the norm, so he has adjusted his
belief and his behavior as well. In other words, his hypothesis was not supported, so
Bill revised it to be consistent with what is actually the case.
The logic of statistical hypothesis testing is very similar to this process that Bill
has just undergone.
There are five basic steps involved in hypothesis testing, and we have listed them
in Table 16.4. We have also described with each step how Bill’s hypothesis that only
10 percent of drivers buckle their seat belt is tested via intuition.
Due to the variation that we know will be caused by sampling, it is impossible
to be absolutely certain that our assessment of the acceptance or rejection of the
hypothesis will be correct if we simply compare our hypothesis arithmetically to the
sample finding. Therefore, you must fall back on the sample size concepts discussed
in Chapter 13 and rely on the use of probabilities. The statistical concept underlying
hypothesis testing permits us to say that if many, many samples were drawn, and a
comparison made for each one, a true hypothesis would be accepted, for example,
99 percent of these times.
Statistical hypothesis testing involves the use of four ingredients: the sample statistic, the standard error of the statistic, the desired level of confidence, and the
hypothesized population parameter value. The first three values were discussed in
the section on parameter estimation. The final value is simply what the researcher
believes the population parameter (π or µ) to be before the research is undertaken.
MARKETING RESEARCH INSIGHT 16.3: Additional Insights
Intuitive Hypothesis Testing: We Do It All the Time!
People do intuitive hypothesis testing all the time to reaffirm their beliefs or to reform them to be consistent with reality. The following diagram illustrates how people perform intuitive hypothesis testing.
[Diagram of intuitive hypothesis testing, with boxes labeled: “I believe something.” “I look for something that agrees with my belief.” “I find something that agrees with my belief.” “I go on with my life.” “I find something that disagrees with my belief.” “I now believe something different.” “I look for something that agrees with my new belief.”]
Here is an everyday example. As a student studying marketing research, you believe that you will “ace” the first exam if you study hard the night before the exam. You take the exam, and you score a 70 percent. Ouch. You now realize that your belief was wrong, and you need to study more for the next exam. So your hypothesis was refuted, and you now have to come up with a new one.
You ask the student beside you, who did ace the exam, how much study time he put in. He says he studied for the three nights before the exam. Notice that he has found evidence (his A grade) that supports his hypothesis, so he will not change his study habits belief. You, on the other hand, must change your hypothesis or suffer the consequences.
Table 16.4 The Five Basic Steps Involved in Hypothesis Testing (Using Bill’s Seat Belt Hypothesis)
Each step is followed by Bill’s intuitive hypothesis test.
Step 1. Begin with a statement about what you believe exists in the population, that is, the population mean or percentage.
Bill: In our example, Bill believed only 10% of drivers buckle their seat belts.
Step 2. Draw a probability sample and determine the sample statistic.
Bill: Bill found that 80% of his friends buckled up.
Step 3. Compare the statistic to the hypothesized parameter.
Bill: Bill noticed that 80% is different from 10%.
Step 4. Decide whether the sample supports the original hypothesis.
Bill: The observed 80% of drivers does not support the hypothesis that 10% buckle up.
Step 5. If the sample does not support the hypothesis, revise the hypothesis to be consistent with the sample’s statistic.
Bill: The actual incidence of drivers who buckle their seat belts is about 80%. (Bill, your hypothesis of 10% is not supported; you need to buckle up like just about everyone else.)
MARKETING RESEARCH INSIGHT 16.4: Additional Insights
What Is an Alternative Hypothesis?
Whenever you test a stated hypothesis, you always automatically test its alternative. The alternative hypothesis takes in all possible cases that are not treated by the stated hypothesis. For example, if you hypothesize that 50 percent of all drivers fasten their seat belts, you are saying that the population percentage is equal to 50 percent (stated hypothesis), and the alternative hypothesis is that the population percentage is not equal to 50 percent. To say this differently, the alternative hypothesis is that the population percentage can be any percentage other than 50 percent, the stated hypothesis. The alternative hypothesis is always implicit, but sometimes statisticians will state it along with the stated hypothesis.
To avoid confusion, we do not formally provide the alternative hypotheses in this textbook. But here are some stated hypotheses and their alternatives. You may want to refer back to this exhibit if the alternative hypothesis is important to your understanding of the concepts being described.
The importance of knowing what the alternative hypothesis is stems from the fact that it is a certainty that the sample results must support either the stated hypothesis or the alternative hypothesis. There is no other outcome possible. If the findings do not support the stated hypothesis, then they must support the alternative hypothesis because it covers all possible cases not specified in the stated hypothesis. Of course, if the stated hypothesis is supported by the findings, the alternative hypothesis is not supported.
There is always an alternative hypothesis.
Population Parameter Hypothesis
Stated hypothesis: The population mean is equal to $50. Alternative hypothesis: The population mean is not equal to $50.
Stated hypothesis: The population percentage is equal to 60 percent. Alternative hypothesis: The population percentage is not equal to 60 percent.
Directional Hypothesis
Stated hypothesis: The population mean is greater than 100. Alternative hypothesis: The population mean is less than or equal to 100.
Stated hypothesis: The population percentage is less than 70 percent. Alternative hypothesis: The population percentage is greater than or equal to 70 percent.
Statisticians often refer to the alternative hypothesis when performing statistical
tests. This concept is important for you to know about. We have included
Marketing Research Insight 16.4 as a way to introduce you to the idea of an alternative hypothesis and to understand how it is used in statistical hypothesis tests.
Test of the Hypothesized Population Parameter Value
The hypothesized population parameter can be either a percentage or a mean. The equation used to test the hypothesis of a population percentage is as follows:
Formula for test of a hypothesis about a percent
$z = \dfrac{p - \pi_H}{s_p}$
where
$p$ = the sample percentage
$\pi_H$ = the hypothesized percentage
$s_p$ = the standard error of the percentage
The equation used to test the hypothesis of a mean is identical in logic, except it
uses the mean and standard error of the mean.
Formula for test of a hypothesis about a mean
$z = \dfrac{\bar{x} - \mu_H}{s_{\bar{x}}}$
where
$\bar{x}$ = the sample mean
$\mu_H$ = the hypothesized mean
$s_{\bar{x}}$ = the standard error of the mean
Here are formulas used to test a hypothesized population parameter.
Tracking the logic of the equation for a mean, one can see that the sample mean
($\bar{x}$) is compared to the hypothesized population mean (µH). Similarly, the sample percentage (p) is compared to the hypothesized percentage (πH). In this case, “compared”
means “take the difference.” This difference is divided by the standard error to determine how many standard errors away from the hypothesized parameter the sample statistic falls. The standard error, you should remember, takes into account the variability
found in the sample as well as the sample size. A small sample with much variability
yields a large standard error, so our sample statistic could be quite far away from the
hypothesized value arithmetically but still less than one standard error away in certain circumstances.
All the relevant information about the population as found by our sample is included in
these computations. Knowledge of areas under the normal curve then comes into play
to translate this distance into a probability of support for the hypothesis.
Here is a simple illustration using Bill’s seat belt hypothesis. Let us assume that
instead of observing his friends buckling up, Bill reads that a Harris Poll finds that
80 percent of respondents in a national sample of 1,000 wear their seat belts. The
hypothesis test would be computed as follows (notice we substituted the formula for
sp in the second step):
Calculation of a test of Bill’s hypothesis that only 10% of drivers “buckle up.”
$z = \dfrac{p - \pi_H}{s_p}$
$= \dfrac{p - \pi_H}{\sqrt{\dfrac{p \times q}{n}}}$
$= \dfrac{80 - 10}{\sqrt{\dfrac{80 \times 20}{1{,}000}}}$
$= \dfrac{70}{\sqrt{\dfrac{1{,}600}{1{,}000}}}$
$= \dfrac{70}{\sqrt{1.6}}$
$= 55.3$
Sorry, Bill. No support for you.
An example of no support for Bill’s seat belt hypothesis.
To a statistician, “Compare means” amounts to “Take the difference.”
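For readers who want to check this arithmetic, here is a minimal Python sketch of the percentage test. The function name `z_for_percentage` is a hypothetical name used only for this illustration; the inputs are the Harris Poll figures from Bill’s example.

```python
import math

def z_for_percentage(p, hypothesized, n):
    """z = (p − π_H) / s_p, where s_p = sqrt(p × q / n) and percentages run 0 to 100."""
    q = 100 - p
    std_error = math.sqrt(p * q / n)   # standard error of the percentage
    return (p - hypothesized) / std_error

# Sample finding of 80 percent (n = 1,000) against Bill's hypothesized 10 percent
z = z_for_percentage(p=80, hypothesized=10, n=1000)
print(f"z = {z:.1f}")  # prints "z = 55.3"
```

A z value this large means the sample percentage is more than 55 standard errors away from the hypothesized percentage, far beyond the 1.96 or 2.58 limits.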
The crux of statistical hypothesis testing is the sampling distribution concept.
Our actual sample is one of the many, many theoretical samples comprising the
assumed bell-shaped curve of possible sample results using the hypothesized value as
the center of the bell-shaped distribution. There is a greater probability of finding a
sample result close to the hypothesized mean, for example, than of finding one that
is far away. But there is a critical assumption working here. We have conditionally
accepted from the outset that the person who stated the hypothesis is correct. So, if
our sample mean turns out to be within ±2.58 standard errors of the hypothesized
mean, it supports the hypothesis maker at the 99 percent level of confidence because
it falls within 99 percent of the area under the curve.
The sampling distribution concept
says that our sample is one of
many, many theoretical samples
that comprise a bell-shaped curve
with the hypothesized value as the
mean.
But what if the sample result is found to be outside this range? Which is correct: the hypothesis or the researcher’s sample results? The answer to this question is always the same: Sample information is invariably more accurate than a hypothesis. Of course, the sampling procedure must adhere strictly to probability sampling requirements and assure representativeness. As you can see, Bill was greatly mistaken because his hypothesis of 10 percent of drivers wearing seat belts was 55.3 standard errors away from the 80 percent finding of the national poll.
You always assume the sample information to be more accurate than any hypothesis.
The following example serves to describe the hypothesis testing process with a mean. Northwestern Mutual Life Insurance Company has a college student internship program. The program allows college students to participate in an intensive training program and to become field agents in one academic term. Arrangements are made with various universities in the United States whereby students will receive college credit if they qualify for and successfully complete this program. Rex Reigen, district agent for Idaho, believed, based on his knowledge of other programs in the country, that the typical college agent will be able to earn about $2,750 in his or her first semester of participation in the program. He hypothesized that the population parameter, that is, the mean, will be $2,750. To check Rex’s hypothesis, a survey was taken of current college agents, and 100 of these individuals were contacted through telephone calls. One of the questions asked for an estimate of the amount of money made in their first semester of work in the program. The sample mean is determined to be $2,800, and the standard deviation is $350.
Does the sample support Rex’s hypothesis that student interns make $2,750 in the first semester?
In essence, the amount of $2,750 is the hypothesized mean of the sampling distribution of all possible samples of the same size that can be taken of the college agents in the country. The unknown factor, of course, is the size of the standard error in dollars. Consequently, although it is assumed that the sampling distribution will be a normal curve with the mean of the entire distribution at $2,750, we need a way to determine how many dollars are within ±1 standard error of the mean, or any other number of standard errors of the mean for that matter. The only information available that would help to determine the size of the standard error is the standard deviation obtained from the sample. This standard deviation can be used to determine a standard error with the application of the standard error formula.
How many standard errors is $2,800 away from $2,750?
The amount of $2,800 found by the sample differs from the hypothesized amount of $2,750 by $50. Is this amount a sufficient difference to cast doubt on Rex’s estimate? Or, in other words, is it far enough from the hypothesized mean to reject the hypothesis? To answer these questions, we compute as follows (note that we have substituted the formula for the standard error of the mean in the second step):
Calculation of a test of Rex’s hypothesis that Northwestern Mutual interns make an average of $2,750 in their first semester of work.
$z = \dfrac{\bar{x} - \mu_H}{s_{\bar{x}}}$
$= \dfrac{\bar{x} - \mu_H}{s / \sqrt{n}}$
$= \dfrac{2{,}800 - 2{,}750}{350 / \sqrt{100}}$
$= \dfrac{50}{35}$
$= 1.43$
Rex is right!
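Figure 16.4 The Sample Findings Support the Hypothesis in This Example
[Figure: bell-shaped curve centered on the hypothesized mean, µH = $2,750, with the acceptance region between z = –1.96 and z = +1.96 and a rejection region in each tail. The sample mean, x̄ = $2,800, lies $50 above the hypothesized mean, which computes to z = +1.43.]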
The sample variability and the sample size have been used to determine the size
of the standard error of the assumed sampling distribution. In this case, one standard error of the mean is equal to $35. When the difference of $50 is divided by $35
to determine the number of standard errors the sample statistic lies away from the hypothesized
mean, the result is 1.43 standard errors. As is illustrated in
Figure 16.4, 1.43 standard errors is within ±1.96 standard errors of Rex’s hypothesized mean. The figure also shows that the hypothesis is supported because the sample mean falls in the
acceptance region.
Although the exact probability of support for the hypothesized parameter can be
determined from the use of a table, it is often handy just to recall the two numbers,
1.96 and 2.58; as we have said, these two are directly associated with the intervals of 95
percent and 99 percent, respectively. Anytime the computed z value falls outside
±2.58, the resulting probability of support for the hypothesis is 0.01 or less. Of course,
computer statistical programs such as SPSS will provide the exact probability
because they are programmed to look up the probability in the z table just as you
would have to do if you did the test by hand calculations and you wanted the exact
probability.
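As a minimal sketch of the lookup a statistical program performs, the following Python code recomputes Rex's test from the figures above and finds the two-tailed probability from the normal curve. The function name is hypothetical, and the code illustrates the arithmetic only; it is not how SPSS is implemented.

```python
import math

def z_test_mean(sample_mean, hypothesized_mean, std_dev, n):
    """Return (z, two-tailed probability) for a test of a hypothesized mean."""
    std_error = std_dev / math.sqrt(n)  # standard error of the mean
    z = (sample_mean - hypothesized_mean) / std_error
    # Area in both tails of the standard normal curve beyond |z|
    probability = math.erfc(abs(z) / math.sqrt(2))
    return z, probability

z, probability = z_test_mean(2800, 2750, 350, 100)
print(f"z = {z:.2f}, two-tailed probability = {probability:.3f}")
print("supported at 95%" if abs(z) <= 1.96 else "not supported at 95%")
```

Because the probability is well above 0.05, the computed z falls inside ±1.96, which matches the conclusion drawn from Figure 16.4.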
The z is calculated to be 1.43
standard errors. What does this
mean?
A computed z of 1.43 is less
than 1.96, so the hypothesis is
supported.
Directional Hypotheses
It is sometimes appropriate to indicate a directional hypothesis. A directional
hypothesis is one that indicates the direction in which you believe the population
parameter falls relative to some target mean or percentage. That is, the owner of a
toy store might not be able to state the exact number of dollars parents spend each
time they buy a toy in that store, but the owner might say, “They spend under
$100.” A directional hypothesis is usually made with a “more than” or “less than”
statement. For example, Rex Reigen might have hypothesized that the average college agent working for Northwestern Mutual Life earns more than $2,750. In both
the “more than” case and the “less than” case, identical concepts are brought into
play, but one must take into account that only one side (one tail) of the sampling distribution is being used.
A directional hypothesis is one
in which you specify the hypothesized mean (or percentage) to
be less than or greater than
some amount.
Table 16.5 With a Directional Hypothesis Test, the Critical Points for z Must Be Adjusted, and the Sign (+ or –) Is Important
LEVEL OF CONFIDENCE    DIRECTION OF HYPOTHESIS(a)    z VALUE
95%                    Greater than                  +1.64
95%                    Less than                     −1.64
99%                    Greater than                  +2.33
99%                    Less than                     −2.33
(a) Subtract the hypothesized parameter (µ or π) from the sample statistic (mean or percentage).
When testing a directional
hypothesis, you must look at the
sign as well as the size of the
computed z value.
There are only two differences to keep in mind for directional hypothesis tests.
First, you must be concerned with the sign determined for the z value as well as its
size. When you subtract the hypothesized mean from the sample mean, the sign will
be positive with “greater than” hypotheses, whereas the sign will be negative for
“less than” hypotheses if the hypothesis is true. To use Rex’s example again, the
hypothesized target of $2,750 would be subtracted from the sample mean of $2,800
yielding a +$50, so the positive sign does support the “greater than” hypothesis. But
is the difference statistically significant?
To answer this question requires our second step: to divide the difference by
the standard error of the mean to compute the z value. We did this earlier and
determined the z value to be 1.43. Because we are working with only one side of
the bell-shaped distribution, we need to adjust the critical z value to reflect this
fact. As Table 16.5 shows, a z value of 1.64 standard errors (with the appropriate sign) marks the point beyond which 5 percent of the normal curve lies in one tail, so it is the critical point for the 95 percent level of confidence; a z value of 2.33 standard errors
cuts off 1 percent in one tail and is the critical point for the 99 percent level. Now the “greater than”
directional hypothesis is supported at that level of confidence if the computed z
value is larger than the critical cut point, and, of course, its sign is consistent with
the direction of the hypothesis. Otherwise, the directional hypothesis is not supported at our chosen level of confidence. Although the computed z value is close
(1.43), it is not equal to or greater than 1.64, so Rex’s directional hypothesis is
not supported.
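The two steps just described can be sketched in Python as follows. The function name and its arguments are hypothetical, and the decision rule follows this section's wording (supported when the computed z reaches the critical point in the hypothesized direction). The one-tailed critical points are taken from the standard normal curve, which gives approximately the 1.64 and 2.33 listed in Table 16.5.

```python
import math
from statistics import NormalDist

def directional_test(sample_mean, hypothesized_mean, std_dev, n,
                     direction, confidence=0.95):
    """Test a 'greater' or 'less' hypothesis; return (z, critical z, supported)."""
    z = (sample_mean - hypothesized_mean) / (std_dev / math.sqrt(n))
    # One-tailed critical point: about 1.64 at 95%, about 2.33 at 99%
    critical = NormalDist().inv_cdf(confidence)
    if direction == "greater":
        return z, critical, z >= critical
    if direction == "less":
        return z, -critical, z <= -critical
    raise ValueError("direction must be 'greater' or 'less'")

z, critical, supported = directional_test(2800, 2750, 350, 100, "greater")
print(f"z = {z:.2f}, critical z = {critical:+.2f}, supported: {supported}")
```

With Rex's figures, the sign of z agrees with the "greater than" direction, but its size falls short of the critical point, so the function reports that the hypothesis is not supported.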
How to Interpret Hypothesis Testing
If a hypothesis is not supported by
a random sample finding, use the
sample statistic and estimate the
population parameter.
How do you interpret hypothesis tests? The interpretation of a hypothesis test is
again directly linked to the sampling distribution concept. If the hypothesis
about the population parameter is correct or true, then a high percentage of
sample means must fall close to this value. In fact, if the hypothesis is true, then
99 percent of the sample results will fall between ±2.58 standard errors of the
hypothesized mean. On the other hand, if the hypothesis is incorrect, there is a
strong likelihood that the computed z value will fall outside ±2.58 standard
errors. For directional hypothesis tests, however, you must adjust the “standard” number of standard
errors (1.96 or 2.58). We have done this for you
in Table 16.5.
With the directional hypothesis test z values, the interpretation remains the
same. The further away the hypothesized value is from the actual case, the more
likely the computed z value will not fall in the critical range. Failure to support the
hypothesis essentially tells the hypothesizer that his or her assumptions about the
population are in error, and that they must be revised in light of the evidence from
the sample. This revision is achieved through estimates of the population parameter
just discussed in the previous section. These estimates can be used to provide the
manager or researcher with a new mental picture of the population through confidence interval estimates of the true population value.
Like all businesspersons, restaurant owners make marketing strategy decisions based on their hypotheses about what their target market wants.
HOW TO USE SPSS TO TEST A HYPOTHESIS
FOR A PERCENTAGE
As you found out earlier, SPSS does not perform statistical tests on percentages, so if
you have a hypothesis about a percentage, you are required to do the calculations
with your handy calculator. You should use the SPSS “Frequencies” procedure to
have it calculate the sample percentage (p) and to determine the sample size if you do not
know it precisely. Then apply the percentage hypothesis test formula to calculate the
z value. If the value is inside the range of ±1.96, the hypothesized percentage is supported at the 95 percent level of confidence, and if it is inside ±2.58, it is supported
at the 99 percent level. You will have an opportunity to use SPSS and do these calculations in Case 16.2 at the end of this chapter.
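The hand calculation can be sketched in Python as follows. This is an illustration under stated assumptions: it assumes the percentage hypothesis test divides the difference between the sample percentage and the hypothesized percentage by the standard error of the percentage, computed from the sample p and q = 100 − p. The function name and the sample figures are hypothetical.

```python
import math

def z_test_percentage(sample_pct, hypothesized_pct, n):
    """Return z for a hypothesized percentage (percentages on a 0-100 scale)."""
    q = 100 - sample_pct
    std_error = math.sqrt(sample_pct * q / n)  # standard error of the percentage
    return (sample_pct - hypothesized_pct) / std_error

# Hypothetical figures: 45 percent found in a sample of 400, hypothesis of 50 percent
z = z_test_percentage(45, 50, 400)
for level, critical in (("95%", 1.96), ("99%", 2.58)):
    verdict = "supported" if abs(z) <= critical else "not supported"
    print(f"z = {z:.2f}: {verdict} at the {level} level")
```

In practice, the sample percentage and sample size passed to the function would come from the SPSS Frequencies output.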
SPSS does not perform percentage hypothesis tests, but you can use it to obtain the necessary information to do one by hand calculation.
RETURN TO Your Integrated Case
The Hobbit’s Choice Restaurant Survey: How to Use SPSS to Test a Hypothesis for a Mean
We can take Cory Rogers’s third condition—that the “very likely” customers must be “willing to pay an average of $18 for an à la carte entrée”—as a hypothesis, and we can test it
with our Hobbit’s Choice Restaurant sample findings. Your SPSS software can be easily
directed to make a mean estimation or to test a hypothesis for a mean.
Figure 16.5 The SPSS Clickstream to Test a Hypothesis About a Mean
To test a hypothesis about a mean with SPSS, use the ANALYZE-COMPARE MEANS-ONE SAMPLE T TEST command sequence.
SPSS Student Assistant Online: Testing a Hypothesis for a Mean
To perform a mean hypothesis test, SPSS provides a Test Value box in which the
hypothesized mean can be entered. As you can see in Figure 16.5, you get to this box by
using the ANALYZE-COMPARE MEANS-ONE SAMPLE T TEST command sequence. You
then select the variable, “average price for evening entrée.” Next, enter in an “18” as Test
Value and click on the OK button. (Remember that we selected only the “very likely”
respondent cases earlier, and that selection will remain in place until we direct SPSS to
“select all” cases or make a different case selection.)
The resulting output is contained in Figure 16.6. When you look at it, you will notice that
the information layout for the output is identical to the previous output table. It indicates
that 72 respondents were selected, and the mean of their answers was calculated to be
$34.06 (rounded up). The output indicates our test value equal to 18, and the bottom half contains 95 percent confidence intervals for the estimated population parameter (the population parameter is the difference between the hypothesized mean and the sample mean,
expected to be 0). There is a mean difference of $16.0556, which was calculated by subtracting the hypothesized mean value (18) from the sample mean (34.0556), and the standard error is provided in the upper half ($.97662). A t value of 16.440 is determined by
dividing 16.0556 by .97662. It is associated with a two-tailed significance level of 0.000.
(For now, assume the t value is the z value we have used in our formulas and explanations.
We describe use of the t value in Chapter 17.)
In other words, our Hobbit’s Choice Restaurant sample finding of a willingness to pay
an average of about $34 does not support the hypothesis of $18. The true mean is, in fact,
quite a bit greater than $18. (Do not use the information reported under “95% Confidence
Intervals” when testing a hypothesis with SPSS.)
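The arithmetic behind this output can be checked with a short Python sketch. It uses only the figures reported in the text (sample mean, test value, and standard error). Following the chapter's suggestion to treat the t value as a z value for now, the significance is approximated from the normal curve, so it is not necessarily the exact figure SPSS derives from the t distribution.

```python
import math

# Figures read from the SPSS output described above
sample_mean = 34.0556
test_value = 18
std_error = 0.97662

mean_difference = sample_mean - test_value
t = mean_difference / std_error
# Normal-curve approximation of the two-tailed significance (treating t as z)
significance = math.erfc(abs(t) / math.sqrt(2))

print(f"mean difference = {mean_difference:.4f}")
print(f"t = {t:.2f}")
print(f"two-tailed significance = {significance:.3f}")
```

A value this far beyond ±2.58 leaves essentially no probability of support for the hypothesized $18.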
Figure 16.6 The SPSS Output for the Test of a Hypothesis About a Mean
If you were Jeff Dean, and you just learned that the hypothesis of $18 per entrée was
not supported, and the true population mean was almost twice as large, how would you
feel? Surely, you would feel great because this finding satisfies the third and last condition specified by the researcher’s model. Jeff should think seriously about buying a bottle of
champagne.
SUMMARY
This chapter began by distinguishing a sample statistic from its associated population parameter. We then introduced you to the concept of statistical inference, which
is a set of procedures for generalizing the findings from a sample to the population.
A key factor in inference is the sample size, n. It appears in statistical inference formulas because it expresses the amount of sampling error: Large samples have less
sampling error than do small samples given the same variability. We illustrated the
two inference types commonly used by marketing researchers. First, we described
how a population parameter, such as a mean, can be estimated by using confidence
intervals computed by application of the standard error formula. Second, we related
how a researcher can use the sample findings to test a hypothesis about a mean or a
percentage.
We used SPSS and The Hobbit’s Choice Restaurant data to illustrate how you
can direct SPSS to calculate 95 percent confidence intervals for the estimation of a
mean as well as how to test a hypothesis about a mean. Both are accomplished with
the SPSS One-Sample T Test procedure. For parameter estimation or
test of a hypothesis with a percentage, you can use SPSS to determine the percentage, but you must use the formulas in this chapter to calculate the confidence interval or perform the significance test.
SPSS Student Assistant Online: SPSS Results Coach and Case Studies
KEY TERMS
Statistics (p. 458)
Parameters (p. 459)
Inference (p. 459)
Statistical inference (p. 459)
Tests of significant differences (p. 460)
Parameter estimation (p. 461)
Standard error (p. 462)
Standard error of the mean (p. 462)
Standard error of the percentage (p. 462)
Confidence intervals (p. 464)
Most commonly used level of confidence (p. 465)
Formula for population parameter estimation (p. 465)
Hypothesis (p. 470)
Hypothesis testing (p. 470)
Intuitive hypothesis testing (p. 472)
Alternative hypothesis (p. 474)
Hypothesized population parameter (p. 474)
Sampling distribution concept (p. 475)
Directional hypothesis (p. 477)
REVIEW QUESTIONS/APPLICATIONS
1. What essential factors are taken into consideration when statistical
inference takes place?
2. What is meant by “parameter estimation,” and what function does it
perform for a researcher?
3. How does parameter estimation for a mean differ from that for a
percentage?
4. List the steps in statistical hypothesis testing. List the steps in intuitive
hypothesis testing. How are they similar? How are they different?
5. When a researcher’s sample evidence disagrees with a manager’s hypothesis,
which is right?
6. What does it mean when a researcher says that a hypothesis has been supported at the 95 percent confidence level?
7. Distinguish a directional from a nondirectional hypothesis, and provide an
example of each one.
8. Here are several computation practice exercises to help you identify which
formulas pertain and learn how to perform the necessary calculations. In
each case, perform the necessary calculations and write your answers in the
column identified by a “question mark.”
a. Determine confidence intervals for each of the following.
SAMPLE STATISTIC            SAMPLE SIZE    CONFIDENCE LEVEL    YOUR CONFIDENCE INTERVALS?
Mean: 150, Std. Dev: 30     200            95%
Percent: 67%                300            99%
Mean: 5.4, Std. Dev: 0.5    250            99%
Percent: 25.8%              500            99%
b. Test the following hypotheses and interpret your findings.
HYPOTHESIS       SAMPLE FINDINGS                     CONFIDENCE LEVEL    YOUR TEST RESULTS
Mean = 7.5       Mean: 8.5, Std dev: 1.2, n = 670    95%
Percent = 86%    p = 95, n = 1000                    99%
Mean > 125       Mean: 135, Std dev: 15, n = 500     95%
Percent < 33%    p = 31, n = 120                     99%
9. The manager of the aluminum recycling division of Environmental Services
wants a survey that will tell him how many households in the city of Seattle,
Washington, will voluntarily wash out, store, and then transport all of their
aluminum cans to a central recycling center located in the downtown area
and open only on Sunday mornings. A random survey of 500 households
determines that 20 percent of households would do so, and that each participating household expects to recycle about 100 cans monthly with a standard deviation of 30 cans. What is the value of parameter estimation in this
instance?
10. It is reported in the newspaper that a survey sponsored by Forbes magazine
with Fortune 500 company executives has found that 75 percent believe that
the United States trails Japan and Germany in automobile engineering. The
article notes that executives were interviewed at a recent “Bring the U.S. Back
to Competitiveness” symposium held on the campus of the University of
Southern California. Why would it be incorrect for the article to report confidence intervals?
11. Alamo Rent-A-Car executives believe that Alamo accounts for about 50 percent of all Cadillacs that are rented. To test this belief, a researcher randomly
identifies 20 major airports with on-site rental car lots. Observers are sent to
each location and instructed to record the number of rental company Cadillacs
observed in a four-hour period. About 500 are observed, and 30 percent are
observed being returned to Alamo Rent-A-Car. What are the implications of
this finding for the Alamo executives’ belief?
INTERACTIVE LEARNING
Visit the Web site at www.prenhall.com/burnsbush. For this chapter, work
through the Self-Study Quizzes, and get instant feedback on whether you need
additional studying. On the Web site, you can review the chapter outlines and
case information for Chapter 16.
CASE 16.1
Auto Online Survey (Part II)
You will find Case 15.2 (Part I) of the Auto Online survey on pages 453–455. You will need
to refer to the questionnaire on these pages in order to perform the proper analysis with
SPSS. You may assume that the respondents to this survey are representative of the population of automobile buyers who visited the Auto Online Web site during their vehicle purchase process.
1. In order to describe this population, estimate the population parameters for the
following.
a. How often they make purchases online.
b. Number of visits they made to Auto Online.
c. The percentage who actually bought their vehicle from Auto Online.
d. The percentage of those who felt it was a better experience than buying at a traditional dealership.
e. How do people feel about the Auto Online Web site (question 6 on the
questionnaire)?
2. Auto Online principals have the following beliefs. Test these hypotheses.
a. People will “strongly agree” to all eight statements concerning use of the Internet
and purchase (question 3 on the questionnaire).
b. Prior to buying a vehicle, people will visit the Auto Online Web site approximately
five times.
c. Just about everyone will say that buying a vehicle online is “a great deal better”
than buying it at a traditional dealership.
d. Those who buy their vehicles from Auto Online will be adults in their mid-thirties.
e. Those who buy from Auto Online will pay an average of $3,500 below the sticker
price, whereas those who buy elsewhere will pay only an average of $2,000 less
than the vehicle sticker price.
CASE 16.2
Your Integrated Case
This is your integrated case, described on pages 42–43.
The Hobbit’s Choice Restaurant Survey Inferential Analysis
Cory Rogers was pleased with Celeste Brown’s descriptive analysis. Celeste had done all of the
proper descriptive analyses, and she had copied the relevant tables and findings into a Word
document with notations that Cory could refer to quickly.
Cory says, “Celeste, this is great work. I am going to Jeff Dean’s in an hour to show
him what we have found. In the meantime, I want you to look a bit deeper into the data. I
have jotted down some items that I want you to analyze. This is the next step in understanding how the sample findings generalize to the population of the greater metropolitan
area.”
Your task in Case 16.2 is to again take the role of Celeste Brown, marketing intern.
Using The Hobbit’s Choice Restaurant survey SPSS data set, perform the proper analysis, and interpret the findings for each of the following questions specified by Cory
Rogers.
1. What are the population estimates for each of the following?
a. Preference for “easy listening” radio programming
b. Viewing of 10 P.M. local news on TV
c. Subscribe to City Magazine
d. Average age of heads of households
e. Average price paid for an evening meal entrée
2. Because Jeff Dean’s restaurant will be upscale, it will appeal to high-income consumers. Jeff hopes that at least 25 percent of the households have an income level of
$100,000 or higher. Test this hypothesis.
3. With respect to those who are “very likely” to patronize The Hobbit’s Choice
Restaurant, Jeff believes that they will either “very strongly” or “somewhat” prefer each
of the following: (a) waitstaff with tuxedos, (b) unusual desserts, (c) large variety of
entrées, (d) unusual entrées, (e) elegant décor, and (f) jazz combo music. Does the
survey support or refute Jeff’s hypotheses? Interpret your findings.