p - HKUST Business School

Karl Schmedders
MANAGERIAL ECONOMICS & DECISION SCIENCES
Visiting Professor of Managerial Economics & Decision Sciences
PhD, 1996, Operations Research, Stanford University
MS, 1992, Operations Research, Stanford University
Vordiplom, 1990, Business Engineering, Universität Karlsruhe, Highest Honors, Ranked first in a class of 350
EMAIL: [email protected]
OFFICE: Jacobs Center Room 528
Karl Schmedders is Associate Professor in the Department of Managerial Economics and
Decision Sciences. He holds a PhD in Operations Research from Stanford University.
Professor Schmedders’ research interests include computational economics, general equilibrium
theory, asset pricing and portfolio selection. His work has been published in Econometrica, The
Review of Economic Studies, The Journal of Finance, and many other academic journals. He
teaches courses in decision science both in the MBA and the EMBA program at Kellogg.
Professor Schmedders has been named to the Faculty Honor Roll in every quarter he has taught
at Kellogg. He has received numerous teaching awards, including the 2002 Lawrence G.
Lavengood Outstanding Professor of the Year. Professor Schmedders is the only Kellogg
faculty member to receive the ‘Ehrenmedaille’ (Honorary Medal) of Kellogg’s partner school
WHU.
Research Interests
Mathematical economics, in particular general equilibrium models involving time and uncertainty
Asset pricing
Mathematical programming
KH19, Course Description
Managerial Statistics
Course Description
In this course we will cover the following topics:
• Confidence Intervals
• Hypothesis Tests
• Regression Analysis
Our objective is to cover the first two topics quickly. While they are important in their own right, many people describe them as rather “dry” course material. However, they will be of great help to us when we cover the main subject of the course, regression analysis. Regressions are extremely useful and can deliver eye-opening insights in many managerial situations. You will solve some entertaining case studies that show the power of regression analysis.
We will cover the material in this case packet as well as the following chapters of the textbook:
• Sections 13.1 and 13.2 of Chapter 13;
• Section 14.1 of Chapter 14;
• Chapter 15;
• Chapter 16;
• Chapter 19;
• Chapter 21;
• Chapter 23.
Time permitting, we will also cover parts of chapter 25.
There will be several team assignments. After the conclusion of the course, there will be
an in-class final exam on the first day of the following module, that is, on April 1, 2016.
The final grades in this course will be determined as follows.
Team assignments: 40%
Class participation: 10%
Final Exam: 50%
In case you would like to prepare for our course, you should start reading the relevant
sections of Chapters 13 and 14 in our textbook. Before you do that, please also consider
the following suggestions.
1) Review the material on the normal distribution from your probability course. In
particular, you should review the use of the functions NORMDIST, NORMSDIST,
NORMINV and NORMSINV in Excel.
2) We will use the software KStat that was developed at Kellogg. Ideally you should
install KStat on your laptop before our first class.
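If you prefer to check such calculations outside Excel, Python's standard library offers rough counterparts of these four functions through `statistics.NormalDist` (Python 3.8 or later). The sketch below is an illustration only, not part of KStat or the course software; the wrapper names are hypothetical and simply mirror the Excel functions, and only the cumulative form of `NORMDIST` (fourth argument TRUE) is covered.

```python
from statistics import NormalDist

# Hypothetical wrappers that mirror the Excel functions (cumulative cases only).
def normdist(x, mean, sd):
    """Like Excel NORMDIST(x, mean, sd, TRUE): P(X <= x) for X ~ N(mean, sd^2)."""
    return NormalDist(mean, sd).cdf(x)

def normsdist(z):
    """Like Excel NORMSDIST(z): P(Z <= z) for a standard normal Z."""
    return NormalDist().cdf(z)

def norminv(p, mean, sd):
    """Like Excel NORMINV(p, mean, sd): the x with P(X <= x) = p."""
    return NormalDist(mean, sd).inv_cdf(p)

def normsinv(p):
    """Like Excel NORMSINV(p): the z with P(Z <= z) = p."""
    return NormalDist().inv_cdf(p)

print(round(normsdist(1.96), 4))  # 0.975
print(round(normsinv(0.975), 2))  # 1.96
```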
I realize that all of you are very busy and you may not have the time to prepare at length
for our course. Please note, however, that the better you prepare the faster we can cover
the early parts of the course material and the more time we have for the fun part, the
coverage of regression analysis.
Of course, I am happy to help you with your preparation. Please do not hesitate to contact
me with any questions or concerns. My email address is
[email protected].
When Scientific Predictions Are So Good They're Bad - The New York Times
September 29, 1998
When Scientific Predictions Are So Good They're Bad
By WILLIAM K. STEVENS
NOAH had it easy. He got his prediction straight from the horse's mouth and was left in no doubt about what to do.
But when the Red River of the North was rising to record levels in the spring of 1997, the citizens and officials of Grand
Forks, N.D., were not so privileged. They had to rely on scientists' predictions about how high the water would rise. And in
this case, Federal experts say, the flood forecast may have been issued and used in a way that made things worse.
The problem, the experts said, was that more precision was assigned to the forecast than was warranted. Officials and citizens
tended to take as gospel an oft-repeated National Weather Service prediction that the river would crest at a record 49 feet.
Actually, there was a wider range of probabilities; the river ultimately crested at 54 feet, forcing 50,000 people to abandon
their homes fast. The 49-foot forecast had lulled the town into a false sense of security, said Dr. Roger A. Pielke Jr. of the
National Center for Atmospheric Research in Boulder, Colo., a consultant on a subsequent inquiry by the weather service.
In fixating on the single number of 49 feet, the people involved in the Grand Forks disaster made a common error in the use
of predictions and forecasts, experts who have studied the case say. It was, they say, a case of what Alfred North Whitehead,
the mathematician and philosopher, once termed ''misplaced concreteness.'' And whether the problem is climate change,
earthquakes, droughts or floods, they say the tendency to overlook uncertainties, margins of error and ranges of probability
can lead to damaging misjudgments.
The problem was the topic of a workshop this month at Estes Park, Colo. In part, participants said, the problem arises because
decision makers sometimes want to avoid making hard choices in uncertain situations. They would rather place responsibility
on the predictors.
Scientifically based predictions, typically using computerized mathematical models, have become pervasive in modern
society. But only recently has much attention been paid to the proper use -- and misuse -- of predictions. The Estes Park
workshop, of which Dr. Pielke was an organizer, was an attempt to come to grips with the question. The workshop was
sponsored by the Geological Society of America and the National Center for Atmospheric Research.
People have predicted and prophesied for millenniums, of course, through means ranging from the visions of shamans and the
warnings of biblical prophets to the examination of animal entrails. With the arrival of modern science, people teased out
fundamental laws of physical and chemical behavior and used them to make better and better predictions.
But once science moves beyond the relatively deterministic processes of physics and chemistry, prediction gets more
complicated and chancier. The earth's atmosphere, for instance, often frustrates efforts to predict the weather and long-term
climatic changes because scientists have not nailed down all of its physical workings and because a substantial measure of
chaotic unpredictability is inherent in the climate system. The result is a considerable range of uncertainty, much more so than
is popularly associated with science. So while computer modeling has often made reasonable predictions possible, they are
always uncertain; results are by definition a model of reality, not reality itself.
The accuracy of predictions varies widely. Some, like earthquake forecasts, have proved so disappointing that experts have
turned instead to forecasting longer-term earthquake potential in a general sense and issuing last-second warnings to distant
communities once a quake has begun.
In some cases, the success of a prediction is near impossible to judge. For instance, it will take thousands of years to know
whether the environmental effects of buried radioactive waste will be as predicted.
On the other hand, daily weather forecasts are checked almost instantly and are used to improve the next day's forecast. But
weather forecasting is also a success, the assembled experts agreed, because people know its shortcomings and take them into
consideration. Weather forecasts ''are wrong a lot of the time, but people expect that and they use them accordingly,'' said
Robert Ravenscroft, a Nebraska rancher who attended the workshop as a ''user'' of predictions.
A prediction is to be distrusted, workshop participants said, when it is made by the group that will use it as a basis for policy
making -- especially when the prediction is made after the policy decision has been taken. In one example offered at the
workshop, modeling studies purported to show no harmful environmental effects from a gold mine that a company had
decided to dig.
Another type of prediction miscue emerged last March in connection with asteroids, the workshop participants were told by
Dr. Clark R. Chapman, a planetary scientist at the Southwest Research Institute in Boulder. An astronomer erroneously
calculated that there was a chance of one-tenth of 1 percent that a mile-wide asteroid would strike Earth in 30 years. The
prediction created an international stir but was withdrawn a day later after further evidence turned up.
This ''uncharacteristically bad'' prediction, said Dr. Chapman, would not have been issued had it been subjected to normal
review by the forecaster's scientific peers. But, he said, there was no peer-review apparatus set up to make sure that ''off-the-wall predictions don't get out.'' (Such a committee has since been established by NASA.)
Most sins committed in the name of prediction, however, appear to stem from the uncertainty inherent in almost all forecasts.
''People don't understand error bars,'' said one scientist, referring to margins of error. Global climate change and the Red River
flood offer two cases in point.
Computer models of the climate system are the major instruments used by scientists to project changes in climate that might
result from increasing atmospheric concentrations of heat-trapping gases, like carbon dioxide, emitted by the burning of fossil
fuels.
Basing its forecast on the models, a panel of scientists set up by the United Nations has projected that the average surface
temperature of the globe will rise by 2 to 6 degrees Fahrenheit, with a best estimate of 3.5 degrees, in the next century, and
more after that. This compares with a rise of 5 to 9 degrees since the depths of the last ice age. The temperature has increased
by about 1 degree over the last century.
But the magnitude and nature of any climate changes produced by any given amount of carbon dioxide are uncertain.
Moreover, it is unclear how much of the gas will be emitted over the next few years, said Dr. Jerry D. Mahlman, a workshop
participant who directs the National Oceanic and Atmospheric Administration's Geophysical Fluid Dynamics Laboratory at
Princeton, N.J. The laboratory is one of the world's major climate modeling centers, and the oldest.
This uncertainty opens the way for two equal and opposite sins of misinterpretation. ''The uncertainty is used as a reason for
governments not to act,'' in the words of Dr. Ronald D. Brunner, a political scientist at the University of Colorado at Boulder.
On the other hand, people often put too much reliance on the precise numbers.
In the debate over climate change, the tendency is to state all the uncertainties and caveats associated with the climate model
projections -- and then forget about them, said Dr. Steve Rayner, a specialist in global climate change in the District of
Columbia office of the Pacific Northwest National Laboratory. This creates a ''fallacy of misplaced confidence,'' he said,
explaining that the specific numbers in the model forecasts ''take on a validity not allowed by the caveats.'' This tendency to
focus unwisely on specific numbers was termed ''fallacious quantification'' by Dr. Naomi Oreskes, a historian at the
University of California at San Diego.
Where uncertainty rules, many at the workshop said, it might be better to stay away from specific numbers altogether and
issue a more generalized forecast. In climate change, this might mean using the models as a general indication of the direction
in which the climate is going (whether it is warming, for instance) and of the approximate magnitude of the change, while
taking the numbers with a grain of salt.
None of which means that the models are not a helpful guide to public policy, said Dr. Mahlman and other experts. For
example, the models say that a warming atmosphere, like today's, will produce heavier rains and snows, and some evidence
suggests that this is already happening in the United States, possibly contributing to damaging floods. Local planners might be
well advised to consider this, Dr. Mahlman said.
One problem in Grand Forks was that lack of experience with such a damaging flood aggravated the uncertainty of the flood
forecast. Because the river had never before been observed at the 54-foot level, the models on which the prediction was based
were ''flying blind,'' said Dr. Pielke; there was no historical basis on which to produce a reliable forecast.
But this was apparently lost on local officials and the public, who focused on the specific forecast of a 49-foot crest. This
number was repeated so often, according to the report of an inquiry by the National Weather Service, that it ''contributed to an
impression of certainty.'' Actually, the report said, the 49-foot figure ''created a sense of complacency,'' because it was only a
fraction of a foot higher than the record flood of 1979, which the city had survived.
''They came down with this number and people fixated on it,'' Tom Mulhern, the Grand Forks communications officer, said in
an interview. The dikes protecting the city had been built up with sandbags to contain a 52-foot crest, and everyone figured
the town was safe, he said.
It is difficult to know what might have happened had the uncertainty of the forecast been better communicated. But it is
possible, said Mr. Mulhern, that the dikes might have been sufficiently enlarged and people might have taken more steps to
preserve their possessions. As it was, he said, ''some people didn't leave till the water was coming down the street.''
Photo: Petty Officer Tim Harris patrolled an area of Grand Forks, N.D., in April 1997, where the Red River flooded the houses
up to the second story. Residents, relying on the precision of forecasts, were forced to flee quickly. (Reuters)(pg. F6)
Copyright 2009 The New York Times Company
http://www.nytimes.com/1998/09/29/science/when-scientific-predictions-are-so-good-they... 7/15/2009
Managerial Statistics
KH 19
1 – Sampling
Course material adapted from Chapters 13.1, 13.2, and 14.1 of our textbook
Statistics for Business, 2e © 2013 Pearson Education, Inc.
Learning Objectives
• Describe why sampling is important
• Understand the implications of sampling variation
• Explain the flaw of averages
• Define the concept of a sampling distribution
• Determine the mean and standard deviation for the sampling distribution of the sample mean
• Describe the Central Limit Theorem and its importance
• Determine the mean and standard deviation for the sampling distribution of the sample proportion
Tools of Business Statistics
• Descriptive statistics
  – Collecting, presenting, and describing data
• Inferential statistics
  – Drawing conclusions and/or making decisions concerning a population based only on sample data
Populations and Samples
• A Population is the set of all items or individuals of interest
  Examples:
  – All likely voters in the next election
  – All parts produced today
  – All sales receipts for March
• A Sample is a subset of the population
  Examples:
  – 1000 voters selected at random for interview
  – A few parts selected for destructive testing
  – Random receipts selected for audit
Properties of Samples
• A representative sample is a sample that reflects the composition of the entire population.
• A sample is biased if a systematic error occurs in the selection of the sample. For example, the sample may systematically omit a portion of the population.
Population vs. Sample
[Figure: a population of items labeled a through z and a smaller sample drawn from it (items b, c, g, i, n, o, r, u, y).]
Why Sample?
• Less time consuming than a census
• Less costly to administer than a census
• It is possible to obtain statistical results of a sufficiently high precision based on samples
Two Surprising Properties
• Surprise 1: The best way to obtain a representative sample is to pick members of the population at random.
• Surprise 2: Larger populations do not require larger samples.
Randomization
• A randomly selected sample is representative of the whole population (avoids bias).
• Randomization ensures that on average a sample mimics the population.
• Randomization enables us to infer characteristics of the population from a sample.
Comparison of Two Random Samples
[Figure: two large samples (each with 8,000 data points) drawn at random from a population of 3.5 million customers of a bank.]
(In)Famous Biased Sample
The Literary Digest predicted a landslide defeat for Franklin D. Roosevelt in the 1936 presidential election. They selected their sample from, among others, a list of telephone numbers. The size of their sample was about 2.4 million!
Telephones were a luxury during and soon after the Great Depression. Roosevelt’s supporters tended to be poor and were grossly underrepresented in the sample.
Simple Random Sample (SRS)
• A Simple Random Sample (SRS) is a sample of n data points chosen by a method that has an equal chance of picking any sample of size n from the population.
• An SRS is the standard to which all other sampling methods are compared.
• An SRS is the foundation for virtually all of the theory of statistics.
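In software, an SRS is usually drawn with a routine that samples without replacement, so that every subset of size n is equally likely. A minimal sketch using Python's `random.sample`, assuming the population is available as a sequence; the customer IDs are hypothetical and the 3.5 million figure only echoes the bank example above.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw an SRS of size n without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: customer IDs 1..3,500,000 (a range, so no big list is built).
customers = range(1, 3_500_001)
sample = simple_random_sample(customers, 8000, seed=1)
print(len(sample), len(set(sample)))  # 8000 8000 (no customer is picked twice)
```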
Inferential Statistics
• Making statements about a population by examining sample results
[Figure: inference runs from the sample statistics (known), computed from the sample, to the population parameters (unknown, but can be estimated from sample evidence).]
Tools of Inferential Statistics
Drawing conclusions and/or making decisions concerning a population based on sample results.
• Estimation
  – Example: Estimate the population mean age using the sample mean age.
• Hypothesis Testing
  – Example: Use sample evidence to test the claim that the population mean age is 40.5 years.
Estimating Parameters
• Parameter: a characteristic of the population (e.g., mean µ)
• Statistic: an observed characteristic of a sample (e.g., sample average $\bar{y}$, $\bar{x}$)
• Estimate: using a statistic to approximate a parameter
Notation for Statistics and Parameters
Sampling Variation
• Sampling variation is the variability in the value of a statistic from sample to sample.
• Two samples from the same population will rarely (if ever) yield the same estimate.
• Sampling variation is the price we pay for working with a sample rather than the population.
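Sampling variation is easy to see in a simulation. The sketch below assumes a hypothetical population of 100,000 ages spread uniformly between 20 and 70 (made-up values, not course data), draws two simple random samples of 50 from it, and prints the two sample means next to the population mean.

```python
import random
from statistics import mean

rng = random.Random(42)
# Hypothetical population of 100,000 ages, uniform between 20 and 70 (assumption for illustration).
population = [rng.uniform(20, 70) for _ in range(100_000)]

sample_a = rng.sample(population, 50)
sample_b = rng.sample(population, 50)

print(round(mean(population), 2))                          # the parameter (normally unknown)
print(round(mean(sample_a), 2), round(mean(sample_b), 2))  # two different estimates of it
```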
The Flaw of Averages
“Our culture encodes a strong bias either to neglect or ignore variation. We tend to focus instead on measures of central tendency, and as a result we make some terrible mistakes, often with considerable practical import.”
Stephen Jay Gould, 1941 – 2002, evolutionary biologist, historian of science
Point Estimates
• A sample statistic is a point estimate. It provides a single number (e.g., the sample mean) for an unknown population parameter (e.g., the population mean).
• A point estimate delivers no information on the possible sampling variation.
• A key step in any careful statistical analysis is to quantify the effect of sampling variation.
Definitions
• An estimator of a population parameter is
  – a random variable that depends on sample information . . .
  – whose value provides an approximation to this unknown parameter.
• A specific value of that random variable is called an estimate.
Sampling Distributions
• The sampling distribution is the probability distribution that describes how a statistic, such as the mean, varies from sample to sample.
Testing of GPS Chips
A manufacturer of GPS chips selects samples for highly accelerated life testing (HALT).
HALT scores range from 1 (failure on first test) to 16 (chip endured all 15 tests without failure).
Even when the production process is functioning normally, there is variation among HALT scores.
Testing 400 Chips
[Figure: distribution of individual HALT scores]
Distribution of Daily Average Scores
[Figure: distribution of average HALT scores (54 samples, each with sample size n = 20)]
Benefits of Averaging
• Averaging reduces variation: The sample-to-sample variance among average HALT scores is smaller than the variance among individual HALT scores.
• The distribution of average HALT scores appears more “bell shaped” than the distribution of individual HALT scores.
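The variance reduction described above can be reproduced with a small simulation. It assumes a hypothetical stand-in for the HALT score distribution (scores equally likely from 1 to 16), which is not the chip data shown in the slides; it simulates 2,000 days with n = 20 chips each and compares the variance of individual scores with the variance of the daily averages.

```python
import random
from statistics import mean, pvariance

rng = random.Random(7)

def halt_score():
    # Hypothetical stand-in: scores equally likely on 1..16 (NOT the real chip distribution).
    return rng.randint(1, 16)

n, days = 20, 2000
daily_samples = [[halt_score() for _ in range(n)] for _ in range(days)]
individual = [score for day in daily_samples for score in day]
averages = [mean(day) for day in daily_samples]

print(round(pvariance(individual), 2))  # close to (16**2 - 1) / 12 = 21.25
print(round(pvariance(averages), 2))    # close to 21.25 / 20, i.e. about 1.06
```

The variance of the averages comes out roughly n = 20 times smaller, which anticipates the standard error formula for the sample mean.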
Sampling Distributions
• Sampling distribution of the sample mean
• Sampling distribution of the sample proportion
Expected Value of Sample Mean
• Let $x_1, x_2, \ldots, x_n$ represent a random sample from a population.
• The sample mean value of these observations is defined as
  $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• The random variable “sample mean” is denoted by $\bar{X}$ and its specific value in the sample by $\bar{x}$.
Standard Error of the Mean
• Different samples from the same population will yield different sample means.
• A measure of the variability in the mean from sample to sample is given by the Standard Error of the Mean:
  $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$
• The standard error of the mean decreases as the sample size increases.
Standard Error of the Mean (continued)
• The standard error is proportional to σ. As population data become more variable, sample averages become more variable.
• The standard error is inversely proportional to the square root of the sample size n. The larger the sample size, the smaller the sampling variation of the averages.
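Both proportionality statements can be checked numerically with SE = σ/√n. A minimal sketch; the function name and the value σ = 10 are hypothetical.

```python
from math import sqrt

def standard_error_of_mean(sigma, n):
    """SE(X-bar) = sigma / sqrt(n)."""
    return sigma / sqrt(n)

# Quadrupling the sample size halves the standard error.
for n in (25, 100, 400):
    print(n, standard_error_of_mean(10, n))  # 2.0, then 1.0, then 0.5
```

Doubling σ to 20 would double each of these values, in line with the first bullet.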
If the Population is Normal
• If a population is normally distributed with mean μ and standard deviation σ, then the sampling distribution of the sample mean $\bar{X}$ is also normally distributed with
  $E(\bar{X}) = \mu$ and $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$
Sampling Distribution Properties
$E(\bar{X}) = \mu$ ($\bar{X}$ is unbiased)
[Figure: a normal population distribution centered at μ and the normal sampling distribution of $\bar{x}$, which has the same mean.]
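Sampling Distribution Properties (continued)
As n increases, $SE(\bar{X})$ decreases.
[Figure: sampling distributions of $\bar{x}$ centered at μ for a smaller and a larger sample size; the larger sample size gives the narrower distribution.]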
If the Population is not Normal
• We can apply the Central Limit Theorem:
  – Even if the population is not normal,
  – … sample means from the population will be approximately normal as long as the sample size is large enough.
• Properties of the sampling distribution:
  $E(\bar{X}) = \mu$ and $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$
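A quick simulation can illustrate the Central Limit Theorem. The sketch assumes a hypothetical, strongly right-skewed population (exponential with mean 1, so μ = σ = 1), draws 5,000 samples of size n = 40, and checks that the sample means center on μ, have a spread close to σ/√n, and fall within 1.96 standard errors of μ about 95% of the time.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(2016)

# Hypothetical skewed population: exponential with mean 1, so mu = sigma = 1.
mu = sigma = 1.0
n, reps = 40, 5000

means = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(round(fmean(means), 3))   # close to mu = 1
print(round(pstdev(means), 3))  # close to sigma / sqrt(n), about 0.158

# Share of sample means within 1.96 standard errors of mu (about 0.95 if the normal model fits).
se = sigma / n ** 0.5
coverage = sum(abs(m - mu) <= 1.96 * se for m in means) / reps
print(coverage)
```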
Central Limit Theorem
As the sample size gets large enough (n↑), the sampling distribution of $\bar{x}$ becomes almost normal regardless of the shape of the population.
If the Population is not Normal (continued)
Sampling distribution properties:
• Central tendency: $E(\bar{X}) = \mu$
• Variation: $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$
[Figure: a non-normal population distribution centered at μ and the sampling distributions of $\bar{x}$ for a smaller and a larger sample size; the sampling distribution becomes normal as n increases.]
How Large is Large Enough?
• For most distributions, a sample size of n > 30 will give a sampling distribution that is nearly normal.
• For normal population distributions, the sampling distribution of the mean is always normally distributed, regardless of the sample size.
More Formal Condition
Sample Size Condition for an application of the central limit theorem:
A normal model provides an accurate approximation to the sampling distribution of $\bar{X}$ if the sample size n is larger than 10 times the squared skewness and larger than 10 times the absolute value of the kurtosis,
$n > 10 K_3^2$ and $n > 10 |K_4|$.
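The condition is simple to automate once skewness and kurtosis have been estimated. In the sketch below the helper names are hypothetical, and the moment definitions used for K3 (average cubed z-score) and K4 (average fourth power of the z-scores minus 3, i.e. excess kurtosis) are an assumption; the textbook's formulas may differ slightly in their denominators.

```python
from statistics import fmean, stdev

def skewness_kurtosis(data):
    """Return (K3, K4): sample skewness and excess kurtosis.

    Assumed definitions (may differ slightly from the textbook's):
    z_i = (x_i - x_bar) / s, K3 = average of z_i**3, K4 = average of z_i**4 - 3.
    """
    x_bar, s = fmean(data), stdev(data)
    z = [(x - x_bar) / s for x in data]
    k3 = fmean(v ** 3 for v in z)
    k4 = fmean(v ** 4 for v in z) - 3
    return k3, k4

def clt_sample_size_ok(n, k3, k4):
    """Sample size condition from the slide: n > 10 * K3**2 and n > 10 * |K4|."""
    return n > 10 * k3 ** 2 and n > 10 * abs(k4)

# Hypothetical values: a fairly skewed population with K3 = 2 and K4 = 5
print(clt_sample_size_ok(100, 2.0, 5.0))  # True  (100 > 40 and 100 > 50)
print(clt_sample_size_ok(30, 2.0, 5.0))   # False (30 is not larger than 40)
```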
Average HALT Scores
Design of the chip-making process indicates that the HALT score of a chip has a mean µ = 7 with a standard deviation σ = 4.
Sampling distribution of average HALT scores (n = 20):
$\bar{X} \sim N\left(\mu = 7,\ \frac{\sigma^2}{n} = \frac{4^2}{20} \approx 0.89^2\right)$
Average HALT Scores
(continued)
The sampling distribution of average HALT scores
is (approximately) a normal distribution with
mean 7 and standard deviation 0.89.
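The standard deviation 0.89 follows from σ/√n = 4/√20. The sketch below reproduces it and then, as an illustration of how the normal model is used (a question added here, not one posed in the course), computes the probability that a day's average HALT score falls between 5 and 9.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 7, 4, 20   # values from the HALT slide
se = sigma / sqrt(n)      # 4 / sqrt(20)
daily_average = NormalDist(mu, se)

print(round(se, 2))  # 0.89

# Illustrative question: chance that a day's average lies between 5 and 9
p_between = daily_average.cdf(9) - daily_average.cdf(5)
print(round(p_between, 3))  # roughly 0.97
```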
Sampling Distributions of Sample Proportions
• Sampling distribution of the sample mean
• Sampling distribution of the sample proportion
Population Proportions p
• p = the proportion of the population having some characteristic
• The sample proportion ($\hat{p}$) provides an estimate of p:
  $\hat{p} = \frac{\text{number of items in the sample with the characteristic of interest}}{\text{sample size}}$
• $0 \le \hat{p} \le 1$
• The count $n\hat{p}$ has a binomial distribution, but the distribution of $\hat{p}$ can be approximated by a normal distribution when n is large enough.
Sampling Distribution
• Normal approximation: [Figure: sampling distribution of $\hat{p}$, plotted for values of $\hat{p}$ between 0 and 1.]
• Properties: $E(\hat{p}) = p$ and $\sigma_{\hat{p}}^2 = \frac{p(1-p)}{n}$ (where p = population proportion)
Sample Size Condition
Sample size condition for proportions:
$n\hat{p} > 10$ and $n(1-\hat{p}) > 10$.
If this condition holds, then the distribution of the sample proportion $\hat{p}$ is approximately a normal distribution.
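The definitions on the last few slides fit in a few small functions. A minimal sketch; the function names and the sample counts are hypothetical.

```python
from math import sqrt

def sample_proportion(successes, n):
    """p-hat = (number of items in the sample with the characteristic) / (sample size)."""
    return successes / n

def proportion_normal_ok(p_hat, n):
    """Sample size condition: n * p_hat > 10 and n * (1 - p_hat) > 10."""
    return n * p_hat > 10 and n * (1 - p_hat) > 10

def sd_of_p_hat(p, n):
    """Standard deviation of p-hat: the square root of p(1 - p) / n."""
    return sqrt(p * (1 - p) / n)

# Hypothetical sample: 30 of 200 customers have the characteristic of interest
p_hat = sample_proportion(30, 200)
print(p_hat, proportion_normal_ok(p_hat, 200))  # 0.15 True
print(proportion_normal_ok(0.02, 200))          # False: only about 4 expected "successes"
```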
Take Aways
• Understand the notion of sampling variation.
• Appreciate the dangers of the flaw of averages.
• Grasp the concept of a sampling distribution.
• Have an idea of the central limit theorem.
• Know the sampling distributions of a sample mean and of a sample proportion.
Pitfalls
• Do not confuse a sample statistic with the population parameter.
• Do not fall for the flaw of averages.
Managerial Statistics
KH 19
2 – Confidence Intervals
Course material adapted from Chapter 15 of our textbook
Statistics for Business, 2e © 2013 Pearson Education, Inc.
Learning Objectives
• Distinguish between a point estimate and a confidence interval estimate
• Construct and interpret a confidence interval of a population proportion
• Construct and interpret a confidence interval of a population mean
Point and Interval Estimates
• A point estimate is a single number.
• A Confidence Interval provides additional information about variability.
[Figure: an interval running from the lower confidence limit to the upper confidence limit, with the point estimate inside; its length is the width of the confidence interval.]
Point Estimates
We can estimate a population parameter … with a sample statistic (a point estimate):
• mean: μ is estimated by $\bar{x}$
• proportion: p is estimated by $\hat{p}$
Confidence Interval Estimate
• An interval gives a range of values:
  – Takes into consideration variation in sample statistics from sample to sample
  – Based on observations from a single sample
  – Provides more information about a population characteristic than does a point estimate
  – Relies on the sampling distribution of the statistic
• Stated in terms of level of confidence
  – Can never be 100% confident
Estimation Process
[Figure: a random sample is drawn from a population whose mean μ is unknown; the sample mean is $\bar{x}$ = 50, leading to the statement “I am 95% confident that μ is between 40 and 60.”]
General Formula
• The general formula for all confidence intervals is:
  Point Estimate ± (Reliability Factor)(Standard Error)
• The value of the reliability factor depends on the desired level of confidence.
Confidence Intervals
• Population mean
• Population proportion
Confidence Interval for the Proportion
Recall that the Central Limit Theorem implies a normal model for the sampling distribution of $\hat{p}$:
$E(\hat{p}) = p$ and $SE(\hat{p}) = \sqrt{p(1-p)/n}$
$SE(\hat{p})$ is called the Standard Error of the Proportion.
Interpretation
The sample statistic in 95% of samples lies within 1.96 standard errors of the population parameter.
Interpretation (continued)
• The probability that the sample proportion $\hat{p}$ deviates by less than 1.96 standard errors of the proportion from the true (but unknown) population proportion p is 95%:
  $P\left(-1.96\, SE(\hat{p}) \le p - \hat{p} \le +1.96\, SE(\hat{p})\right) = 0.95$.
95% Confidence Interval for p
• For 95% of samples, the interval formed by reaching 1.96 standard errors to the left and right of $\hat{p}$ will contain p.
• Problem: We do not know the value of the standard error of the proportion, $SE(\hat{p})$, since it depends on the true (but unknown) parameter p.
• We estimate this standard error using $\hat{p}$ in place of p:
  $se(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Confidence Interval for p
The 100(1 – α)% confidence interval for p is
$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
where
• $z_{\alpha/2}$ is the standard normal value for the level of confidence desired (“reliability factor”)
• $\hat{p}$ is the sample proportion
• n is the sample size
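The interval translates directly into a small function. This sketch obtains the reliability factor z_{α/2} from `statistics.NormalDist` instead of a table; the function name `proportion_ci` and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """100(1 - alpha)% confidence interval for a population proportion p.

    Uses the estimated standard error se(p_hat) = sqrt(p_hat (1 - p_hat) / n)
    and the reliability factor z_{alpha/2} from the standard normal distribution.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Hypothetical example: p_hat = 0.5 from n = 400 respondents
low, high = proportion_ci(0.5, 400)
print(round(low, 3), round(high, 3))  # 0.451 0.549
```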
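Finding the Reliability Factor, $z_{\alpha/2}$
• Consider a 95% confidence interval: $1 - \alpha = 0.95$, so $\alpha/2 = 0.025$.
[Figure: standard normal curve with area 0.025 in each tail; in z units, the lower confidence limit is at z = −1.96, the point estimate at 0, and the upper confidence limit at z = 1.96.]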
Common Levels of Confidence
• The most commonly used confidence level is 95%.
Confidence Level   Confidence Coefficient, $1 - \alpha$   $z_{\alpha/2}$ value
80%                .80                                     1.28
90%                .90                                     1.645
95%                .95                                     1.96
98%                .98                                     2.33
99%                .99                                     2.58
99.8%              .998                                    3.09
99.9%              .999                                    3.29
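Rather than reading a table, the reliability factor can be computed for any confidence level from the inverse standard normal CDF, which is what Excel's `NORMSINV(1 - α/2)` returns. A minimal sketch, assuming Python's `statistics.NormalDist`; the helper name `reliability_factor` is hypothetical.

```python
from statistics import NormalDist

def reliability_factor(confidence):
    """z_{alpha/2}: the standard normal value that leaves alpha/2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  z = {reliability_factor(level):.3f}")
```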
Affinity Credit Card
Before deciding to offer an affinity credit card to
alumni of a university, the credit card company
wants to know how many customers will accept
the offer.
Population: Alumni of the university
Parameter of interest: Proportion p of alumni who
will return the application for the credit card
SRS of Alumni
Question: What should we conclude about the proportion p in the population of 100,000 alumni who will accept the offer if the card is launched on a wider scale?
Method: Construct a confidence interval based on the results of a simple random sample.
SRS of Alumni (continued)
The credit card issuer sent preapproved applications to a sample of 1000 alumni. Of these, 140 accepted the offer and received the card.
Summary Statistics:
Checklist for Application of Normal
• SRS condition. The sample is a simple random sample from the relevant population.
• Sample size condition (for proportion). Both $n\hat{p}$ and $n(1-\hat{p})$ are larger than 10.
Credit Card: Confidence Interval
• The estimated standard error is
  $se(\hat{p}) = \sqrt{\frac{0.14\,(1 - 0.14)}{1000}} \approx 0.01097$
• The 95% confidence interval is
  0.14 ± 1.96 × 0.01097 ≈ [0.1185, 0.1615]
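The arithmetic on this slide can be checked in a few lines. The inputs (n = 1000, 140 acceptances) come from the sample above, and 1.96 is the 95% reliability factor; the variable names are our own.

```python
from math import sqrt

n, accepted = 1000, 140
p_hat = accepted / n                # 0.14
se = sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error, about 0.01097

# Checklist: sample size condition (both counts larger than 10)
assert n * p_hat > 10 and n * (1 - p_hat) > 10

low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(round(se, 5), round(low, 4), round(high, 4))  # 0.01097 0.1185 0.1615
```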
Credit Card: Conclusion
• With 95% confidence, the population proportion that will accept the offer is between 11.85% and 16.15%.
• If the bank decides to launch the credit card, might 20% of the alumni accept the offer? It’s not impossible but rather unlikely given the information in our sample; 20% is outside the 95% confidence interval for the unknown proportion p.
Margin of Error
• The confidence interval,
  $\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$,
  can also be written as $\hat{p} \pm ME$,
  where ME is called the Margin of Error,
  $ME = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Reducing the Margin of Error
The width of the confidence interval is equal to twice the margin of error,
$ME = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
The margin of error can be reduced if
• the sample size is increased (n↑), or
• the confidence level is decreased ($(1 - \alpha)\downarrow$).
Margin of Error in the News
• You often read in the news statements like the following:
  The CNN/USA Today/Gallup poll taken March 7-10 showed that 52% of Americans say… . The poll had a margin of error of plus or minus four percentage points.
• No confidence level is given!
  – The assumed confidence level is typically 95%.
  – In addition, the 1.96 is rounded up to 2.
Margin of Error in the News (continued)
• For an interpretation of this statement we use the confidence interval formula $\hat{p} \pm ME$, where
  $ME = 0.04 \ge 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
• We can have (slightly more than) 95% confidence that the true proportion of Americans saying … is between 48% and 56%.
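A short check of the numbers in this interpretation, using the poll figures quoted above (52%, plus or minus 4 points). It also shows why the confidence is “slightly more than” 95%: two standard errors cover about 95.45% of a normal distribution. The last step, which solves the inequality for n, is an illustration added here: for ME = 0.04 to be at least two estimated standard errors, the poll needs roughly 624 or more respondents.

```python
from statistics import NormalDist

p_hat, me = 0.52, 0.04  # from the poll statement
print(round(p_hat - me, 2), round(p_hat + me, 2))  # 0.48 0.56

# Using 2 instead of 1.96 standard errors gives a slightly higher confidence level
z = NormalDist()
confidence_with_2 = z.cdf(2) - z.cdf(-2)
print(round(confidence_with_2, 4))  # 0.9545

# Solving ME >= 2 * sqrt(p_hat (1 - p_hat) / n) for n gives a lower bound on the poll's size
n_min = 4 * p_hat * (1 - p_hat) / me ** 2
print(round(n_min))  # 624
```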
Confidence Intervals
• Population mean
• Population proportion
Sampling Distribution of the Mean
Recall that the Central Limit Theorem implies a normal model for the sampling distribution of $\bar{X}$:
$E(\bar{X}) = \mu$ and $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$
$SE(\bar{X})$ is called the Standard Error of the Mean.
Interpretation

The probability that the sample mean X̄ deviates by
less than 1.96 standard errors of the mean from
the true (but unknown) population mean μ is 95%:
$$P\left(-1.96\,SE(\bar{X}) \le \mu - \bar{X} \le +1.96\,SE(\bar{X})\right) = 0.95.$$
Once again, the sample statistic lies within
about two standard errors of the corresponding
population parameter in 95% of samples.
28
Confidence Interval for μ



Since the population standard deviation σ is
unknown, we estimate it using the sample
standard deviation, s:
$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$$
This step introduces extra uncertainty, since s
varies from sample to sample.
As an adjustment, we use the t-distribution
instead of the normal distribution.
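The formula for s translates directly into code. The sketch below uses a small hypothetical data set and checks the hand-written version against `statistics.stdev`, which also divides by n – 1.

```python
from math import sqrt
from statistics import mean, stdev


def sample_sd(xs):
    """Sample standard deviation with the n - 1 divisor."""
    xbar = mean(xs)
    return sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, for illustration only
print(sample_sd(data), stdev(data))  # both use n - 1 in the denominator
```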
29
Student’s t-Distribution

Consider an SRS of n observations
with mean x̄ and standard deviation s
from a normally distributed population with mean μ.
Then the variable
$$T_{n-1} = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
follows the Student’s t-distribution with (n – 1) degrees
of freedom.
30
Student’s t-Distribution


The t-distribution is a family of distributions.
The t-value depends on the degrees of
freedom (df), the number of observations that are free to vary after
the sample mean has been calculated:
df = n – 1
31
Student’s t-Distribution
(continued)
[Figure: densities of the standard normal (t with df = ∞), t (df = 13) and t (df = 5), centered at 0.]
t-distributions are bell-shaped and symmetric, but
have ‘fatter’ tails than the normal.
Note: t → Z as n increases.
32
t distribution values
With comparison to the Z value
Confidence   t           t           t
Level        (df = 10)   (df = 20)   (df = 30)   Z
.80          1.372       1.325       1.310       1.282
.90          1.812       1.725       1.697       1.645
.95          2.228       2.086       2.042       1.960
.99          3.169       2.845       2.750       2.576
Note: t → Z as n increases
33
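Python's standard library has no t-distribution. The sketch below is our own, not the course's method. It integrates the t density numerically with Simpson's rule and inverts it by bisection to regenerate the t-values in the table above. In practice Excel's T.INV.2T or `scipy.stats.t.ppf` does this directly. The last column comes from `statistics.NormalDist`, so the t-values can be seen approaching Z as df grows.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=1000):
    """P(T <= x) for Student's t: Simpson's rule on [0, |x|] plus symmetry about 0."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_crit(conf, df):
    """Two-sided critical value t_{alpha/2, df}, i.e. Excel's T.INV.2T(1 - conf, df)."""
    alpha = 1 - conf
    lo, hi = 0.0, 50.0
    for _ in range(50):  # bisection on the two-tailed area 2 * P(T > t)
        mid = (lo + hi) / 2
        if 2 * (1 - t_cdf(mid, df)) > alpha:
            lo = mid  # tails still too heavy: the critical value is further out
        else:
            hi = mid
    return (lo + hi) / 2


for conf in (0.80, 0.90, 0.95, 0.99):
    ts = [t_crit(conf, df) for df in (10, 20, 30)]
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    print(f"{conf:.2f}", *(f"{t:.3f}" for t in ts), f"{z:.3f}")
```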
Confidence Interval for μ

Assumptions




Population is normally distributed.
If population is not normal, use a “large” sample.
Use Student’s t-Distribution.
100(1-α)% Confidence Interval for μ:
$$\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}$$
where t α/2,n-1 is the reliability factor from the t-distribution with
n-1 degrees of freedom and an area of α/2 in each tail.
34
Affinity Credit Card
Before deciding to offer an affinity credit card to
alumni of a university, the credit card company
wants to know how large a balance the alumni who
accept the offer will carry.
Population: (Future) credit card balances of (future)
customers among the alumni of the university
Parameter of interest: Mean μ of (future) balances
carried by alumni on their affinity credit card
35
SRS of Alumni
The 140 alumni who accepted the offer and received
the affinity credit card have been carrying an average monthly balance of x̄ = $1,990.50 with a
standard deviation of s = $2,833.33.
36
SRS of Alumni
(continued)
Question: What should we conclude about the
average future credit card balance μ on the new
affinity credit card for this particular university?
Method: Construct confidence interval.
37
Checklist for Application of Normal


SRS condition. The sample is a simple random
sample from the relevant population.
Sample size condition (for mean). The sample
size is larger than 10 times the squared skewness and 10 times the absolute value of the
kurtosis.
38
Credit Card: Confidence Interval
The estimated standard error is
se(X̄) = 2,833.33 / √140 = 239.46.
The t-value for a 95% confidence interval with 139
degrees of freedom is
T.INV.2T(0.05,139) = 1.97718.
The 95% confidence interval is
1,990.50 ± 1.97718 × 239.46
= [1517.04, 2463.96].
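The same interval can be computed from the summary statistics in a few lines. The sketch below is hypothetical: the helper name `mean_ci` is ours, and the caller supplies the t critical value, here the slide's T.INV.2T(0.05,139) = 1.97718.

```python
from math import sqrt


def mean_ci(xbar, s, n, t_value):
    """xbar +/- t_{alpha/2, n-1} * s / sqrt(n), with the t critical value supplied by the caller."""
    se = s / sqrt(n)  # estimated standard error of the mean
    return xbar - t_value * se, xbar + t_value * se


# Credit card example; 1.97718 is T.INV.2T(0.05, 139) from the slide
lo, hi = mean_ci(1990.50, 2833.33, 140, 1.97718)
print(f"[{lo:.2f}, {hi:.2f}]")  # prints [1517.04, 2463.96]
```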
39
Credit Card: Conclusion


We are 95% confident that the true but unknown
µ lies between $1,517.04 and $2,463.96.
If the bank decides to launch the credit card,
might the average balance be $1,250? It’s not
impossible but based on the sample results it’s
rather unlikely.
40
Confidence Interval and
Confidence Level


If P(a ≤ p ≤ b) = 1 – α, then the interval from a to b
is called a 100(1 – α)% confidence interval of p.
The quantity (1 – α) is called the confidence level
of the interval (α between 0 and 1).
In repeated samples of the population, the true value
of the parameter p would be contained in 100(1 – α)%
of intervals calculated this way.
41
Intervals and Level of Confidence
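[Figure: sampling distribution of the proportion, centered at E(p̂) = p, with area 1 – α in the middle and α/2 in each tail, and intervals drawn around several sample proportions p̂.]
Intervals extend from
$$\hat{p} - z_{\alpha/2}\,se(\hat{p}) \quad\text{to}\quad \hat{p} + z_{\alpha/2}\,se(\hat{p}).$$
100(1 – α)% of intervals constructed contain p; 100α% do not.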
42
Confidence Level, (1 – α)
Suppose confidence level = 95%.
Also written (1 – α) = 0.95.
A relative frequency interpretation:
From repeated samples, 95% of all the
confidence intervals that can be constructed will
contain the unknown true parameter.
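The relative frequency interpretation can be checked by simulation. The sketch is purely illustrative. It assumes a hypothetical true proportion p = 0.14 (in practice p is unknown), draws many simple random samples of n = 1,000, builds a 95% interval from each and counts how often the interval contains p. The fixed seed only makes the run reproducible.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(p_true=0.14, n=1000, conf=0.95, reps=2000, seed=1):
    """Fraction of simulated SRSs whose normal-approximation CI contains p_true."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p_true for _ in range(n))  # one simulated sample
        p_hat = successes / n
        me = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - me <= p_true <= p_hat + me:
            hits += 1
    return hits / reps


print(coverage())  # should be close to 0.95
```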
43
Common Confusions:
Wrong Interpretations

95% of all customers keep a balance of $1,517
to $2,464.


The CI gives a range for the population mean µ, not
the balance of individual customers.
The mean balance of 95% of samples of 140
accounts will fall between $1,517 and $2,464.

The CI provides a range for µ, not the means of other
samples.
44
Common Confusions:
Wrong Interpretations
(continued)

The mean balance is between $1,517 and
$2,464.

The average balance in the population may not fall
within the CI. The confidence level of the interval is
95%. It may not contain µ.
45
Correct Interpretation


We are 95% confident that the mean monthly
credit card balance for the population of
customers who accept an application lies
between $1,517 and $2,464.
The phrase “95% confident” is our way of
saying that we are using a procedure that
produces an interval containing the unknown
mean in 95% of samples.
46
Transforming Confidence Intervals
Obtaining Ranges for Related Quantities
If [L,U] is a 100(1 – α)% confidence interval for
µ, then [c×L,c×U] is a 100 (1 – α)% confidence
interval for c×µ and [c+L,c+U] is a 100(1 – α)%
confidence interval for c+µ.
47
Application: Property Taxes
Motivation
A mayor is considering a tax on business that is
proportional to the amount spent to lease
property in her city. How much revenue would
a 1% tax generate?
48
Property Taxes
Method
Need a confidence interval for µ (average cost of
a lease) to obtain a confidence interval for the
amount raised by the tax. Check conditions (SRS
and sample size) before proceeding.
49
Property Taxes
(continued)
Mechanics
Univariate statistics          Total Lease Cost
mean                           478,603.48
standard deviation             535,342.56
standard error of the mean     35,849.19
minimum                        20,409.00
median                         290,559.00
maximum                        2,820,213.00
range                          2,799,804.00
skewness                       1.953
kurtosis                       4.138
number of observations         223
t-statistic for computing
95%-confidence intervals       1.9707
50
Property Taxes
(continued)
Mechanics
95% confidence interval for average lease cost
478603 ± 1.9707 × 35849
= [407955, 549252]
95% confidence interval for average tax revenue
per business
0.01 × [407955, 549252]
= [4079.55, 5492.52]
51
Conclusion
Message
We are 95% confident that the average cost of
a lease is between $407,955 and $549,252.
The 95% confidence interval for tax raised per
business is therefore [$4,079, $5,493]. Since
4,500 businesses lease property in the city, we are
95% confident that the amount
raised will be between $18,358,000 and
$24,716,000.
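The transformation rule [c×L, c×U] can be traced through the property tax numbers in a short sketch. It uses the summary statistics, the 1% rate and the 4,500 businesses from the example. The function name `tax_revenue_ci` is hypothetical. Small differences in the last digit relative to the slides come from rounding of the t-value.

```python
def tax_revenue_ci(mean, se, t_value, rate=0.01, n_businesses=4500):
    """Transform a CI [L, U] for mu into CIs for rate*mu and for n_businesses*rate*mu."""
    lo, hi = mean - t_value * se, mean + t_value * se      # CI for mu (average lease cost)
    per_business = (rate * lo, rate * hi)                  # [c*L, c*U] with c = rate
    total = (n_businesses * per_business[0], n_businesses * per_business[1])
    return (lo, hi), per_business, total


mu_ci, tax_ci, total_ci = tax_revenue_ci(478603.48, 35849.19, 1.9707)
print([round(v) for v in mu_ci], [round(v, 2) for v in tax_ci], [round(v) for v in total_ci])
```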
52
Best Practices




Be sure that the data are an SRS from the
population.
Stick to 95% confidence intervals.
Round the endpoints of intervals when
presenting the results.
Use full precision for intermediate calculations.
53
Pitfalls



Do not claim that a 95% confidence interval
holds µ.
Do not use a confidence interval to describe
other samples.
Do not manipulate the sampling to obtain a
particular confidence interval.
54
Managerial Statistics
KH 19
3 – Hypothesis Tests
Course material adapted from Chapter 16 of our textbook
Statistics for Business, 2e © 2013 Pearson Education, Inc.
Learning Objectives

Formulate null and alternative hypotheses for
applications involving

a single population proportion

a single population mean

Execute the four steps of a hypothesis test

Know how to use and interpret p-values

Know what Type I and Type II errors are
2
Motivating Example
An office manager is evaluating software to filter
SPAM e-mails (cost $15,000). To make it
profitable, the software must reduce SPAM to
less than 20%. Should the manager buy the
software?
The manager wants to test the software.
3
Motivating Example
(continued)
To demonstrate how well the software works, the
software vendor applied its filtering system to
email arriving at the office. After passing
through the filter, a sample of 100 messages
contained only 11% spam (and no valid
messages were removed).
4
Motivating Example
(continued)
Question: Okay, 11% is better than 20%. But
does that mean the manager should buy this
software?
Method: Use a Hypothesis Test to answer this
question.
Idea: Use the sample result, p̂ = 0.11, to decide
whether the software will be profitable, p < 0.2.
5
What is a Hypothesis?

A hypothesis is a claim about the value of
an unknown parameter:

population proportion
Example: The proportion of spam will be below
20%, that is, p < 0.2.

population mean
Example: The average monthly rent for all rental properties exceeds $500, that is, μ > 500.
6
The Null Hypothesis, H0

The Null Hypothesis, H0, states the claim
to be tested; specifies a default course of
action; preserves the status quo.


Example: The proportion of spam that slips past the
filter is at least 20% (H0: p ≥ 0.2).
H0 is always about a population parameter,
not about a sample statistic.
Correct: H0 : p ≥ 0.20
Incorrect: H0 : p̂ ≥ 0.20
7
The Null Hypothesis, H0
(continued)

We begin with the assumption that the null
hypothesis is true.
Similar idea to the notion of innocent until
proven guilty
Always contains the “=”, “≤”, or “≥” sign
May or may not be rejected
8
The Alternative Hypothesis, Ha

The Alternative Hypothesis, Ha (H1),
is the opposite of the null hypothesis.




Example: The proportion of spam that slips past
the filter is less than 20% (Ha: p < 0.2).
Ha never contains the “=”, “≤”, or “≥” sign.
Ha may or may not be supported.
Ha is generally the hypothesis that the
decision maker is trying to support.
9
Spam Filter: Hypotheses
Step 1 of a hypothesis test:
Define the hypotheses H0 and Ha.
H0: p ≥ p0 = 0.20
Ha: p < p0 = 0.20
10
Two Possible Options



We may decide to reject H0 (accept Ha).
Alternatively, we may decide not to reject
H0 (we do not accept Ha).
There is no third option.
11
Reason for Rejecting H0
[Figure: sampling distribution of p̂ if H0 is true, centered at p = 0.2, with the sample proportion 0.11 marked in the lower tail.]
If it is unlikely that we would get a sample proportion of this value (0.11)
if in fact p = 0.2 were the population proportion,
then we reject the null hypothesis that p ≥ 0.2.
12
Errors in Decision-Making

Type I Error
Reject a true null hypothesis
Example: Buy software that will not reduce spam to
below 20% of incoming emails.
Considered a serious type of error
Threshold probability of Type I Error is α
Called level of significance or simply level of the test
Set in advance by decision maker
13
Errors in Making Decisions
(continued)

Type II Error
Fail to reject a false null hypothesis
Example: Do not buy software that would have reduced
spam to below 20% of incoming emails.
The probability of Type II Error is β.
1 – β is also called the power of a test.
14
Outcomes and Probabilities
Possible Hypothesis Test Outcomes
                       Actual Situation
Decision               H0 True              H0 False
Do Not Reject H0       No error (1 – α)     Type II Error (β)
Reject H0              Type I Error (α)     No Error (1 – β)
Key: Outcome (Probability)
15
Type I & II Errors
Type I and Type II errors cannot happen at
the same time.
Type I error can only occur if H0 is true.
Type II error can only occur if H0 is false.
16
Evaluation of Hypotheses
Sample proportion p̂ = 0.11 < 0.2. Is this
relationship sufficient to reject the null
hypothesis?
No! The claim is about the population
proportion p. Maybe we just have a lucky
(unlucky?) sample. That is, the test result
may be due to sampling error.
17
Evaluation of Hypotheses
(continued)


Hypothesis tests rely on the sampling distribution
of the statistic that estimates the parameter
specified in the null and the alternative.
Key question: What is the chance of getting a
sample that differs from H0 by as much as (or
even more than) this one if H0 is true?
18
Spam Filter



A sample of size n = 100 delivered a sample
proportion of p̂ = 0.11.
Question: Assuming H0: p ≥ 0.20 is true, how
likely is this deviation of 0.09 (or more)?
Assuming H0 is true, the sampling distribution
of p̂ is approximately normal with mean p =
0.20 and SE( p̂ ) = 0.04 (note that the hypothesized “boundary” value p0 = 0.20 is used to
calculate SE).
19
Spam Filter
(continued)

What is the chance of finding a sample
proportion of p̂ = 0.11 or even smaller?
20
Test Statistic
Step 2 of a hypothesis test:
Calculate the test statistic.
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.11 - 0.20}{\sqrt{0.20\,(1-0.20)/100}} = -2.25$$
21
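Step 2 is a one-line computation. In the sketch below, the helper name `z_statistic` is ours. As on the slide, the standard error uses the boundary value p0, not p̂.

```python
from math import sqrt


def z_statistic(p_hat, p0, n):
    """z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n); the SE uses the boundary value p0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


z = z_statistic(0.11, 0.20, 100)
print(round(z, 2))  # prints -2.25
```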
Meaning of Test Statistic



The test statistic measures the difference
between the sample outcome and the boundary
value of the null hypothesis in multiples of the
standard error.
Spam filter example: The sample proportion lies
2.25 standard errors of the proportion below the
boundary value in the null hypothesis.
Since the sampling distribution is assumed to be
normal, the test statistic for proportions is also
called a z-statistic.
22
From Test Statistic to Probability


Since the sampling distribution of the sample
proportion is (approximately) normal, we can
calculate the probability of a sample outcome of
at least 2.25 standard errors below the mean.
This probability is the famous p-value.
23
p-value
Step 3 of a hypothesis test:
Calculate the p-value.
p = NORM.S.DIST(-2.25,1) ≈ 0.012
p = NORM.DIST(0.11,0.2,0.04,1) ≈ 0.012
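The two Excel formulas above have direct counterparts in Python's `statistics.NormalDist`. The sketch shows that standardizing first and evaluating the standard normal CDF gives the same lower-tail probability as evaluating the CDF of p̂ under H0 (mean 0.2, SE 0.04).

```python
from statistics import NormalDist

# Excel NORM.S.DIST(-2.25, 1): standard normal CDF evaluated at the z-statistic
p_from_z = NormalDist().cdf(-2.25)
# Excel NORM.DIST(0.11, 0.2, 0.04, 1): CDF of p_hat under H0 (mean 0.2, SE 0.04)
p_from_phat = NormalDist(mu=0.2, sigma=0.04).cdf(0.11)
print(round(p_from_z, 4), round(p_from_phat, 4))  # prints 0.0122 0.0122
```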
24
Calculating the p-value
[Figure: standard normal curve with the area to the left of –2.25 shaded.]
p-value = NORMSDIST(-2.25) = 0.012, where
$$z = \frac{\hat{p} - p_0}{SE(\hat{p})}$$
Under the null hypothesis (H0: p ≥ 0.2), our sample
proportion is at least 2.25 standard errors below
the population proportion. The probability of such a
sample outcome is 1.2% (p-value).
25
Type I Error and p-value


Question: Suppose we decide to reject H0.
What is the probability of a Type I error?
Answer: The p-value is the (maximal) chance of
a Type I error if H0 is rejected based on the
observed test statistic.
26
Level of Significance



Common practice is to reject H0 only if the p-value is less than a preset threshold.
This threshold that sets the maximum tolerance
for a Type I error is called level of significance
or α-level.
Statistically significant difference from the null
hypothesis: Data contradicts H0 and leads us to
reject H0 since p-value < α.
27
Decision
Step 4 of a hypothesis test:
Compare p-value to α and make a decision.
p-value = 0.012 < 0.05 = α
We reject H0 and accept the alternative
hypothesis Ha. The spam software reduces the
proportion of spam e-mails to less than 20%.
The office manager should buy the software.
28
Summary
29
Take Aways I
The Four Steps of a Hypothesis Test:
1. Define H0 and Ha.
2. Calculate the test statistic.
3. Calculate the p-value.
4. Compare the p-value to the significance
level α. Make a decision. Accept Ha if p-value < α.
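The four steps can be packaged as a small function for a one-tailed test of a proportion. The sketch is our own illustration, and the function name and `alternative` parameter are hypothetical. Step 1 (choosing p0 and the direction of Ha) stays with the caller, and the function returns the test statistic, the p-value and the decision.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(p_hat, p0, n, alternative, alpha=0.05):
    """Steps 2-4 of a one-tailed test of a proportion.

    Step 1 is the caller's choice of p0 and of the direction of Ha:
    alternative="less" for Ha: p < p0, alternative="greater" for Ha: p > p0.
    """
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # Step 2: test statistic
    if alternative == "less":
        p_value = NormalDist().cdf(z)           # Step 3: lower-tail p-value
    elif alternative == "greater":
        p_value = 1 - NormalDist().cdf(z)       # Step 3: upper-tail p-value
    else:
        raise ValueError("alternative must be 'less' or 'greater'")
    return z, p_value, p_value < alpha          # Step 4: accept Ha if p-value < alpha


# Spam filter: H0: p >= 0.20 versus Ha: p < 0.20
print(z_test_proportion(0.11, 0.20, 100, alternative="less"))
```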
30
Take Aways II
Hypothesis Testing: The Idea


We always try to prove the alternative hypothesis, Ha.
We then assume that its opposite (the null
hypothesis) is true.


H0 and Ha must be totally exhaustive & mutually
exclusive.
We can never possibly prove H0!
31
Take Aways III

We ask the question: how likely is it to obtain our
evidence, given that the null hypothesis is
(supposedly) true?
This probability is called the p-value.
Not likely (small p) ⇒ we have statistically
“proven” the alternative hypothesis, so we
reject the null.
Likely (not small p) ⇒ we cannot reject the null.
32
Application: Burger King Ads
Motivation
The Burger King ad featuring Coq Roq won critical
acclaim (and resulted in much controversy as well as
several lawsuits). In a sample of 2,500 homes,
MediaCheck found that only 6% saw the ad. An ad must
be viewed by 5% or more of households to be effective.
Based on these sample results, should the local
sponsor run this ad?
33
Burger King Ads
Method
Perform a hypothesis test.
Set up the null and alternative hypotheses.
H0: p ≤ 0.05
Ha: p > 0.05
Use α = 0.05. Note that p is the population
proportion who watches this ad. (Both SRS and
sample size conditions are met.)
34
Burger King Ads
(continued)
Mechanics
Perform the necessary calculations for an
evaluation of the null hypothesis.
$$z = \frac{0.06 - 0.05}{\sqrt{0.05\,(1 - 0.05)/2{,}500}} = 2.294$$
NORM.S.DIST(2.294,1) = 0.9891
p-value = 1 – 0.9891 = 0.0109 < 0.05 = α
Reject H0.
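Because Ha: p > 0.05 points upward, the p-value is the upper tail, 1 minus the normal CDF. The sketch below checks the Burger King mechanics with `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

p_hat, p0, n, alpha = 0.06, 0.05, 2500, 0.05
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # about 2.294
p_value = 1 - NormalDist().cdf(z)           # upper tail, because Ha: p > 0.05
print(round(z, 3), round(p_value, 4), p_value < alpha)
```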
35
Conclusion
Message
The hypothesis test shows a statistically
significant result. We can conclude that more
than 5% of households watch this ad. The
Burger King Coq Roq ad is cost effective and
should be run.
36
Hypothesis Test of a Mean




Hypothesis tests of the mean are similar to tests
of proportions.
H0 and Ha are claims about the unknown
population mean μ. For example,
H0: µ ≤ µ0 and Ha: µ > µ0 .
The test statistic uses the random variable X̄,
the sample mean.
Unlike in the test of proportions, the standard
error is not specified since σ is unknown.
37
Hypothesis Test of a Mean
(continued)

Just as in the calculation of a CI, we estimate
the unknown population standard deviation σ
with the known sample standard deviation s:
$$SE(\bar{X}) = \frac{\sigma}{\sqrt{n}} \quad\longrightarrow\quad se(\bar{X}) = \frac{s}{\sqrt{n}}$$
The resulting test statistic is
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
38
Hypothesis Test of a Mean
(continued)



In a hypothesis test of a mean the test statistic
is called a t-statistic since the appropriate
sampling distribution is the t-distribution.
Specifically, the distribution of the t-statistic in a
hypothesis test of a mean is the t-distribution
with n-1 degrees of freedom.
We use this distribution to calculate the p-value.
39
Denver Rental Properties
A firm is considering expanding into the Denver
area. In order to cover costs, the firm needs
rents in this area to average more than $500
per month. Are Denver rents high enough to
justify the expansion?
40
Univariate Statistics
The firm obtained rents for a sample of size n = 45;
the average rent was $647.33 with a sample
std. dev. s = $298.77.
Univariate statistics          Rent ($/Month)
mean                           647.3333333
standard deviation             298.7656424
standard error of the mean     44.53735239
minimum                        140
median                         610
maximum                        1600
range                          1460
skewness                       0.617
kurtosis                       0.992
number of observations         45
t-statistic for computing
95%-confidence intervals       2.0154
41
Hypotheses H0 and Ha


Let µ = mean monthly rent for all rental
properties in the Denver area.
Step 1: Set up the hypotheses.
H0: µ ≤ µ0 = 500
Ha: µ > µ0 = 500
42
Test Statistic

Step 2: Compute the test statistic.
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{647.33 - 500}{44.5374} = 3.308$$
The average rent in the sample is 3.308
standard errors of the mean above the
boundary value in the null hypothesis.
43
p-value
Step 3: Calculate the p-value.
T.DIST.RT(3.308,44) = 0.0009394
The p-value is 0.09394% and thus below 0.1%.
44
Make a Decision
Step 4: Compare the p-value to α and make a
decision.
p-value = 0.0009394 < 0.05 = α
We reject H0 and accept Ha. We conclude that
the average rent in the Denver area exceeds
the break-even value.
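Steps 2 to 4 of the Denver test can be reproduced from the summary statistics. Since the standard library has no t-distribution, the sketch computes the right-tail probability (Excel's T.DIST.RT) by Simpson's-rule integration of the t density. The helper `t_upper_tail` is our own approximation, not a library function.

```python
import math


def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0 (Excel: T.DIST.RT), via Simpson's rule on [0, x]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 - total * h / 3  # 0.5 minus P(0 <= T <= x)


xbar, s, n, mu0, alpha = 647.3333333, 298.7656424, 45, 500, 0.05
t = (xbar - mu0) / (s / math.sqrt(n))  # Step 2: about 3.308
p_value = t_upper_tail(t, n - 1)       # Step 3: right tail with n - 1 = 44 df
print(round(t, 3), round(p_value, 7), p_value < alpha)  # Step 4
```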
45
Summary: Tests of a Mean
46
Checklist


SRS condition: the sample is a simple random
sample from the relevant population.
Sample size condition. Unless the population is
normally distributed, a normal model can be
used to approximate the sampling distribution of X̄
if the sample size n is larger than 10 times both
the squared skewness and the absolute value
of the kurtosis.
47
Application: Returns on IBM Stock
Motivation
Does stock in IBM return more, on average,
than T-Bills? From 1980 through 2005, T-Bills
returned 0.5% each month.
48
Returns on IBM Stock
Method
Let µ = mean of all future monthly returns for IBM
stock. Set up the hypotheses as follows (Step 1):
H0: µ ≤ 0.005
Ha: µ > 0.005
The sample consists of monthly returns on IBM for
312 months (January 1980 – December 2005).
49
Returns on IBM Stock
(continued)
The sample yields x̄ = 0.01063 and s = 0.08053.
Univariate statistics          IBM Return
mean                           0.01063365
standard deviation             0.08053206
standard error of the mean     0.00455923
minimum                        -0.2619
median                         0.0065
maximum                        0.3538
range                          0.6157
skewness                       0.303
kurtosis                       1.624
number of observations         312
t-statistic for computing
95%-confidence intervals       1.9676
50
Returns on IBM Stock
(continued)
Mechanics
Step 2: Calculation of test statistic.
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{0.0106 - 0.005}{0.004559} = 1.236$$
Step 3: Calculation of p-value.
T.DIST.RT(1.236,311) ≈ 0.1088
Step 4: Compare p-value to α = 0.05.
p-value = 0.1088 > 0.05 = α. Do NOT reject H0.
51
Conclusion
Message
According to monthly IBM returns from 1980
through 2005, the IBM stock does not generate
statistically significantly higher returns than
comparable investments in US Treasury Bills.
52
Failure to Reject H0



Our failure to reject H0 and to prove Ha does
not mean the null is true. We did not prove the
null hypothesis.
Our sample evidence is just too weak to prove
Ha at a 5% or even 10% significance level. If we
had rejected H0, then the chance of making a
Type I error (p-value of about 11%) would have
been too high for the given level of significance.
If the α-level had been 15% then we could have
proven Ha.
53
Significance vs. Importance


Statistical significance does not mean that you
have made a practically important or meaningful discovery.
The size of the sample affects the p-value of a
test. With enough data, a trivial difference from
H0 leads to a statistically significant outcome.
Such a trivial difference may be practically unimportant.
54
Confidence Interval vs. Test

Confidence intervals make positive statements
about the population.


A confidence interval provides a range of parameter
values that are compatible with the observed data.
Hypothesis tests provide negative statements.


A test provides a precise analysis of specific
hypothesized values for a parameter.
A test attempts to reject a specific hypothesis for a
parameter.
55
Two-tailed Hypothesis Test


Hypotheses in a Two-tailed Hypothesis Test
are of the following form:
mean: H0: µ = 0.005, Ha: µ ≠ 0.005
proportion: H0: p = 0.2, Ha: p ≠ 0.2
The calculation of the test statistic is identical to
the calculation in a One-tailed Hypothesis
Test.
56
Two-Tailed Hypothesis Test
(continued)



By convention, the p-value in a two-tailed test is
defined as two times the p-value of the
corresponding one-tailed test.
As a consequence, the two-tailed p-value does
not have the intuitive interpretation along the lines
“The probability of the sample result
assuming the null is true”.
This convention leads to a paradox.
57
One-tailed Test on IBM Returns
Step 1: H0: µ ≤ 0.005
Ha: µ > 0.005
Step 2: Calculation of test statistic.
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{0.0106 - 0.005}{0.004559} = 1.236$$
Step 3: Calculation of p-value.
T.DIST.RT(1.236,311) ≈ 0.1088
Step 4: Compare p-value to α = 0.15.
p-value = 0.1088 < 0.15 = α.
Reject H0.
58
Two-tailed Test on IBM Returns
Step 1: H0: µ = 0.005
Ha: µ ≠ 0.005
Step 2: Calculation of test statistic.
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{0.0106 - 0.005}{0.004559} = 1.236$$
Step 3: Calculation of p-value.
T.DIST.2T(1.236,311) ≈ 0.2175
Step 4: Compare p-value to α = 0.15.
p-value = 0.2175 > 0.15 = α.
Do NOT reject H0.
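The paradox comes down to the factor of 2 in the two-tailed convention. The sketch below computes both p-values for the IBM data at α = 0.15. It reuses a Simpson's-rule approximation of the t right tail (`t_upper_tail`, our own helper) because the standard library has no t-distribution.

```python
import math


def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0 (Excel: T.DIST.RT), via Simpson's rule on [0, x]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 - total * h / 3


xbar, se, df, mu0, alpha = 0.01063365, 0.00455923, 311, 0.005, 0.15
t = (xbar - mu0) / se          # about 1.236
p_one = t_upper_tail(t, df)    # T.DIST.RT(t, 311): Ha: mu > 0.005
p_two = 2 * p_one              # T.DIST.2T(t, 311): Ha: mu != 0.005
print(round(t, 3), round(p_one, 4), round(p_two, 4))
print("one-tailed rejects:", p_one < alpha, "| two-tailed rejects:", p_two < alpha)
```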
59
Paradox



According to the one-tailed hypothesis test we
can prove that µ > 0.005. But according to the
two-tailed test we cannot prove that µ ≠ 0.005.
That’s the paradox!
The reason for the convention leading to the
paradox is to obtain a sensible relation between
two-tailed hypothesis tests and confidence
intervals.
60
Two-tailed Tests and Confidence Interval
The hypothesis Ha: µ ≠ 0.005 can be proved at the
significance level α if and only if the (1- α)*100%
confidence interval does not include 0.005.
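The equivalence can be checked numerically for the IBM data. The sketch is illustrative. It builds the 100(1 – α)% interval from a bisection-based t critical value (our own stand-in for Excel's T.INV.2T) and compares two decisions: whether the interval excludes 0.005, and whether the two-tailed test rejects. It uses the slide's α = 0.15 and, as an extra level not on the slides, α = 0.25.

```python
import math


def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0 (Excel: T.DIST.RT), via Simpson's rule on [0, x]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 - total * h / 3


def t_crit_two_sided(alpha, df):
    """t_{alpha/2, df} (Excel: T.INV.2T(alpha, df)) by bisection on 2 * P(T > t) = alpha."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 2 * t_upper_tail(mid, df) > alpha:
            lo = mid  # tail area still too large: move outward
        else:
            hi = mid
    return (lo + hi) / 2


# IBM summary statistics from the slides
XBAR, SE, DF, MU0 = 0.01063365, 0.00455923, 311, 0.005


def duality_check(alpha):
    """Return (CI excludes MU0, two-tailed test rejects H0: mu = MU0) at level alpha."""
    tc = t_crit_two_sided(alpha, DF)
    lo, hi = XBAR - tc * SE, XBAR + tc * SE  # 100(1 - alpha)% confidence interval
    p_two = 2 * t_upper_tail(abs(XBAR - MU0) / SE, DF)
    return not (lo <= MU0 <= hi), p_two < alpha


for alpha in (0.15, 0.25):
    print(alpha, duality_check(alpha))
```

In both cases the two decisions agree, which is exactly the relation the two-tailed convention is designed to preserve.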
61
Summary


Discussed hypothesis testing methodology
Introduced four-step process of hypothesis
testing

Defined p-value

Performed z-test for the proportion

Performed t-test for the mean

Discussed two-tailed hypothesis test
62
Best Practices





Be sure that the data are an SRS from the
population.
Pick the hypotheses before looking at the data.
Pick the α-level before you compute the test
statistic and the p-value.
Think about whether α = 0.05 is appropriate for
each test.
Report a p-value to summarize the outcome of
a test.
63
Pitfalls



Do not confuse statistical significance with
substantive importance.
Do not think that the p-value is the
probability that the null hypothesis is true.
Avoid cluttering a test summary with jargon.
64
Managerial Statistics
KH 19
4 – Simple Linear Regression
Course material adapted from Chapter 19 of our textbook
Statistics for Business, 2e © 2013 Pearson Education, Inc.
Learning Objectives




Calculate and interpret the simple linear
regression equation for a set of data
Describe the meaning of the coefficients of the
regression equation in the context of business
applications
Examine and interpret the scatterplot and the
residual plot as they relate to a regression
Understand the meaning (and limitation) of the
R-squared statistic
2
Diamond Prices
Motivation: What is the relationship between the
price and weight of diamonds?
Method: Using a sample of 320 emerald-cut
diamonds of various weights, regression analysis
produces an equation that relates price to weight.
Mechanics: Let y denote the response (“dependent”)
variable (price) and let x denote the explanatory
(“independent”) variable (weight).
3
Scatterplot of Price vs. Weight
[Figure: scatterplot of Price ($0 to $2,000) against Weight (carats, 0.3 to 0.55).]
4
Linear Equation



There appears to be a linear trend.
We identify the trend line (“best-fit line” or “fitted
line”) by an intercept b0 and a slope b1.
The equation of the fitted line is
Estimated Price = b0 + b1 × Weight .

In generic terms, ŷ = b0 + b1 x .
5
Residuals


Not all data points will lie on the best-fit line.
The Residuals are the vertical deviations from
the data points to the line (e=y- ŷ).
6
Method of Least Squares


The Method of Least Squares determines the
best-fit line by minimizing the sum of squared
residuals.
The method uses differential calculus to obtain
the values of the coefficients b0 and b1 that
minimize the sum of squared residuals, also
called the sum of squared errors, SSE.
7
Minimizing SSE

Let the index i indicate the ith data point, (xi, yi).
$$\min SSE = \min \sum_i e_i^2 = \min \sum_i (y_i - \hat{y}_i)^2 = \min \sum_i \left[y_i - (b_0 + b_1 x_i)\right]^2$$
8
Least Square Regression

The method of least squares generates the
following coefficient values:
$$b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = r\,\frac{s_Y}{s_X}$$
$$b_0 = \bar{y} - b_1\,\bar{x}$$
9
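The coefficient formulas can be applied directly. The sketch below uses a small hypothetical set of weights and prices, not the course's 320-diamond sample. It also checks that the slope equals r·sY/sX.

```python
from math import sqrt
from statistics import mean, stdev


def least_squares(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


# Hypothetical weights (carats) and prices ($); NOT the course's 320-diamond sample
weights = [0.31, 0.35, 0.40, 0.42, 0.45, 0.50, 0.52]
prices = [820, 950, 1080, 1150, 1240, 1330, 1420]
b0, b1 = least_squares(weights, prices)

# Same slope via b1 = r * sY / sX, with r the sample correlation
xbar, ybar = mean(weights), mean(prices)
sxx = sum((x - xbar) ** 2 for x in weights)
syy = sum((y - ybar) ** 2 for y in prices)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(weights, prices))
r = sxy / sqrt(sxx * syy)
print(b0, b1, r * stdev(prices) / stdev(weights))
```

Least squares residuals always sum to zero, which is a quick sanity check on any hand-rolled fit.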
Diamonds: Fitted Line
The least squares regression equation relating diamond
prices to weight is
Estimated Price = 43.5 + 2670 Weight
Regression: Price ($)
                   coefficient    std error of coef   t-ratio   p-value    beta-weight
constant           43.48910163    71.90155144         0.6048    54.5715%
Weight (carats)    2669.745803    172.4731816         15.4792   0.0000%    0.6555
standard error of regression   170.2149256
R-squared                      42.97%
adjusted R-squared             42.79%
number of observations         320
residual degrees of freedom    318
t-statistic for computing
95%-confidence intervals       1.9675
10
Using the Fitted Line


The average price of a diamond that weighs 0.4
carat is
Estimated Price = 43.49 + 2669.75 × 0.4
≈ 1111.39,
that is, the estimated price is (about) $1,111.
A diamond that weighs 0.5 carat costs (about)
$267 more, on average.
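Using the fitted line is plain arithmetic. In the sketch below, the function name `estimated_price` is ours and the coefficients are the rounded values from the output above. The 0.1-carat difference multiplies the slope, giving about $267.

```python
B0, B1 = 43.49, 2669.75  # rounded coefficients from the regression output


def estimated_price(weight_carats):
    """Fitted line: Estimated Price = b0 + b1 * Weight."""
    return B0 + B1 * weight_carats


p_04 = estimated_price(0.4)
p_05 = estimated_price(0.5)
print(round(p_04, 2), round(p_05 - p_04, 2))
```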
11
Illustration
12
Interpreting the Slope


The slope coefficient b1 describes how
differences in the explanatory variable x
associate with differences in the response y.
In the diamond example, we can interpret the
slope b1 as the marginal cost of an additional
carat. (i.e., marginal cost is $2,670 per carat).
13
Interpreting the Intercept



The intercept b0 estimates the average response
when x = 0 (where the line crosses the y axis).
The intercept is the portion of y that is present
for all values of x.
In the diamond example we can interpret b0 as
fixed cost, $43.49, per diamond.
14
Interpreting the Intercept
(continued)


In many applications, the intercept coefficient
does not have a useful interpretation.
Unless the range of x values includes zero, the
value for b0 is the result of an extrapolation.
15
Residual Plot


A Residual Plot shows the variation that
remains in the data after accounting for the
linear relationship defined by the fitted line. Put
differently, the plot shows the variation of the
data points around the fitted line.
The residuals should be plotted against the
predicted values of y (or against x) to check for
patterns.
16
Residual Plot
(continued)


If the least squares line captures the
association between x and y, then a plot of
residuals should stretch out horizontally with
consistent vertical scatter. No particular pattern
should be visible.
Our task is to visually check for the absence of
a pattern.
17
Residuals vs. Predicted Values
[Figure: residual plot of residuals (about –600 to 600) against predicted values of Price ($, 800 to 1,500).]
18
Variation of Residuals


The standard deviation of the residuals
measures how much the residuals vary around
the fitted line.
This standard deviation is called the Standard
Error of Regression or the Root Mean
Squared Error (RMSE).
$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{e_1^2 + e_2^2 + \cdots + e_n^2}{n-2}}$$
19
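The formula for sₑ can be illustrated on a hypothetical four-point data set, not the diamond data. For this toy data the fitted line is ŷ = 1.9x, with residuals 0.1, 0.2, –0.7 and 0.4. The divisor is n – 2 because two coefficients were estimated.

```python
from math import sqrt
from statistics import mean


def standard_error_of_regression(xs, ys):
    """s_e = sqrt(SSE / (n - 2)) for the least squares line through (xs, ys)."""
    xbar, ybar = mean(xs), mean(ys)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
    b0 = ybar - b1 * xbar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # e_i = y_i - yhat_i
    sse = sum(e * e for e in residuals)
    return sqrt(sse / (len(xs) - 2))  # divide by n - 2: two coefficients were estimated


# Hypothetical toy data: SSE = 0.70, so s_e = sqrt(0.70 / 2)
print(standard_error_of_regression([1, 2, 3, 4], [2, 4, 5, 8]))
```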
Diamonds
For the diamond example,
se = 170.21.
The standard error of
regression is $170.21.
20
Measures of Variation
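[Figure: for a data point (xi, yi), the deviations of yi from the fitted value ŷi and from the mean ȳ.]
$$SSE = \sum (y_i - \hat{y}_i)^2, \qquad SST = \sum (y_i - \bar{y})^2, \qquad SSR = \sum (\hat{y}_i - \bar{y})^2$$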
21
Measures of Variation
(continued)

SST = total sum of squares
Variation of the yi values around their mean, ȳ
SSR = regression sum of squares
Explained variation attributable to the linear
relationship between x and y
SSE = error sum of squares (sum of squared errors)
Variation attributable to factors other than the linear
relationship between x and y
22
Measures of Variation
(continued)

Total variation is made up of two parts:
$$SST = SSR + SSE$$
Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares
$$SST = \sum (y_i - \bar{y})^2, \qquad SSR = \sum (\hat{y}_i - \bar{y})^2, \qquad SSE = \sum (y_i - \hat{y}_i)^2$$
where:
ȳ = Average value of the dependent variable
yi = Observed values of the dependent variable
ŷi = Predicted value of y for the given xi value
23
Coefficient of Determination, R2


The Coefficient of Determination is the
portion of the total variation in the dependent
variable that is explained by variation in the
independent variable.
The coefficient of determination is also called
R-squared and is denoted by r2 or R2.
$$R^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}, \qquad 0 \le R^2 \le 1$$
24
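The decomposition SST = SSR + SSE and the ratio R² = SSR/SST can be verified on the same hypothetical four-point data set, not the diamond data. In the sketch, `variation_decomposition` is an illustrative name of our own.

```python
from statistics import mean


def variation_decomposition(xs, ys):
    """Return SST, SSR, SSE and R^2 = SSR / SST for the least squares line."""
    xbar, ybar = mean(xs), mean(ys)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
    b0 = ybar - b1 * xbar
    yhat = [b0 + b1 * x for x in xs]
    sst = sum((y - ybar) ** 2 for y in ys)               # total variation
    ssr = sum((yh - ybar) ** 2 for yh in yhat)           # explained by the line
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))  # left in the residuals
    return sst, ssr, sse, ssr / sst


sst, ssr, sse, r2 = variation_decomposition([1, 2, 3, 4], [2, 4, 5, 8])  # hypothetical data
print(sst, ssr + sse, r2)
```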
Examples of R-squared Values
[Figure: scatterplots in which all points lie exactly on a straight line; r² = 1.]
Perfect linear relationship between X and Y: r² = 1.
100% of the variation in Y is explained by variation in X.
25
Examples of R-squared Values
(continued)
[Figure: scatterplots with points scattered around a straight line; 0 < r² < 1.]
Weaker linear relationships between X and Y:
Some but not all of the variation in Y is explained by variation in X.
26
Examples of R-squared Values
(continued)
[Figure: scatterplot with no linear pattern; r² = 0.]
No linear relationship between X and Y: r² = 0.
The value of Y does not depend linearly on X. (None of the
variation in Y is explained by variation in X.)
27
Diamonds
For the diamond example,
r² = 0.4297.
The R-squared is 43%. That is,
the regression explains 43%
of the variation in price.
28
Checklist for Simple Regression



Linear: Examine the scatterplot to see whether the pattern resembles a straight line.
Random residual variation: Examine the residual plot to make sure no pattern exists.
(No obvious lurking variable: Think about whether other explanatory variables may better explain the linear association between x and y.)
Application: Lease Costs
Motivation
How can a dealer anticipate the effect of age on
the value of a used car? The dealer estimates
that $4,000 is enough to cover the depreciation
per year.
Lease Costs
Method
Use regression analysis to find the equation that
relates y (resale value in dollars) to x (age of the
car in years). The car dealer has data on the prices and ages of 218 used BMWs in the Philadelphia area.
Lease Costs
(continued)
Mechanics
(Think about lurking variables)
Check scatterplot
Run regression
Check residual plot
Lease Costs: Scatterplot
[Scatterplot of Price ($) versus Age (years)]
Regression Equation: Price = 39851.7199 - 2905.5284 Age
Lease Costs: Regression
Mechanics
Regression: Price
                        constant       Age
coefficient             39851.7199     -2905.5284
std error of coef       758.460867     219.3264
t-ratio                 52.5429        -13.2475
p-value                 0.0000%        0.0000%
beta-weight                            -0.6695
standard error of regression: 3366.63713
R-squared: 44.83%
adjusted R-squared: 44.57%
number of observations: 218
residual degrees of freedom: 216
t-statistic for computing 95%-confidence intervals: 1.9710
Lease Costs: Residual Plot
[Residual plot: residuals versus predicted values of Price]
Lease Costs: Regression
Mechanics
The linear regression equation is
Estimated Price = 39,851.72 – 2,905.53 Age
The R-squared is 0.4483, and the standard error of regression is se = $3,366.64.
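The fitted equation can be turned into a small prediction helper that refuses to extrapolate. This is a sketch: the coefficients are copied from the output above, while the age range [0, 5] is an assumption based on the remark that leases longer than 5 years would require extrapolation; the actual range of ages in the dealer's data is not shown here. `estimated_price` is a hypothetical helper name.

```python
B0 = 39851.7199               # estimated intercept ($), from the regression output
B1 = -2905.5284               # estimated slope ($ per year of age)
LEASE_DEPRECIATION = 4000.0   # dealer's per-year depreciation allowance ($)
AGE_MIN, AGE_MAX = 0.0, 5.0   # ASSUMPTION: approximate range of ages in the data


def estimated_price(age_years: float) -> float:
    """Estimated resale price ($) of a used BMW of the given age (years)."""
    if not AGE_MIN <= age_years <= AGE_MAX:
        # Outside the observed ages, the fitted line would be an extrapolation.
        raise ValueError(f"age {age_years} is outside the assumed data range "
                         f"[{AGE_MIN}, {AGE_MAX}]")
    return B0 + B1 * age_years


# Estimated drop in resale value per additional year of age (minus the slope, in $)
annual_decline = -B1
print(f"Estimated price of a 3-year-old car: ${estimated_price(3):,.2f}")
print(f"Estimated decline per year: ${annual_decline:,.2f} "
      f"(allowance: ${LEASE_DEPRECIATION:,.2f})")
```

Raising an error instead of returning a number makes the "limit predictions to the range of observed conditions" rule explicit in the code.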
Conclusion
Message
The results indicate that used BMWs decline in resale value by about $2,900 per year, on average. The current lease price of $4,000 per year therefore appears profitable. However, the fitted line leaves more than half of the variation in price unexplained (R-squared is about 45%).
Leases longer than 5 years would require extrapolation.
Best Practices

Always look at the scatterplot.

Know the substantive context of the model.


Describe the intercept and slope using units of
the data.
Limit predictions to the range of observed
conditions.
Pitfalls

Do not assume that changing x causes changes in y.

Do not forget lurking variables.


Do not trust summaries like R-squared without looking
at plots.
Do not call a regression with a high R-squared “good”
or a regression with a low R-squared “bad”.
Managerial Statistics
KH 19
5 – Simple Regression Model
Course material adapted from Chapter 21 of our textbook
Statistics for Business, 2e © 2013 Pearson Education, Inc.
Learning Objectives




Understand the framework of the simple linear
regression model
Calculate and interpret confidence intervals for
the regression coefficients
Perform hypothesis tests on the regression
coefficients
Understand the difference between confidence
and prediction intervals for the predicted value
Berkshire Hathaway
Motivation: How can we test the CAPM (Capital
Asset Pricing Model) for Berkshire Hathaway
stock?
Method: Formulate the simple regression with the percentage excess return on Berkshire Hathaway stock as y and the percentage excess return on the value of the whole stock market (the “value-weighted stock market index”) as x.
From Description to Inference


We want not only to describe the historical relationship between x and y that is evident in the data, but also to make inferences about the underlying population.
We have to think of our data as a sample from a population.
From Description to Inference
(continued)


Naturally, the question arises: what conclusions can we draw from the sample about the population?
The central idea is to use inference related to regression: standard errors, confidence intervals and hypothesis tests.
Model of the Population



The Simple Linear Regression Model (SRM)
is a model for the association in the population
between an explanatory variable x and a
response variable y.
The SRM equation describes how the
(conditional) mean of y depends on x.
The SRM assumes that these means lie on a straight line with intercept β0 and slope β1:
$$\mu_{y|x} = E(Y \mid X = x) = \beta_0 + \beta_1 x$$
Model of the Population
(continued)

The response variable y is a random variable.
The actual values vary around the mean. The
deviations of responses around their
(conditional) mean are called errors:
$$y = \mu_{y|x} + \varepsilon$$
Errors ε can be positive or negative. They have zero mean; that is, the average deviation from the line is zero.
Simple Linear Regression Model
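The population regression model:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
Here y is the dependent variable, x is the independent variable, β0 is the population y-intercept, β1 is the population slope coefficient, and ε is the random error term. β0 + β1x is the linear component and ε is the random error component.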
Simple Linear Regression Model
(continued)
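$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
[Figure: the population line with intercept β0 and slope β1 in the (X, Y) plane. At a given xi, the line gives the average value of y for xi; the observed value of y for xi differs from it by the random error εi for this xi value.]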
Data Generating Process





The “true regression line” is a characteristic of
the population, not the observed data.
The true line’s parameters β0 and β1 are (and
will remain) unknown!
The SRM is a model and offers a simplified
view of the population.
The observed data points are a simple random
sample from the population.
The fitted line provides an estimate of the
population regression line.
Simple Linear Regression
Equation
The simple linear regression equation provides an estimate of the population regression line:
$$\hat{y}_i = b_0 + b_1 x_i$$
where ŷi is the estimated (or predicted) y value for observation i, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and xi is the value of x for observation i.
The individual residuals ei are
$$e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_i)$$
where yi is the value of y for observation i.
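To see the difference between the population line and the fitted line, one can simulate data from an SRM with known parameters and then estimate them. The sketch below draws normal errors, uses the usual least-squares formulas b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1x̄, and computes the residuals ei. The parameter values and helper names are hypothetical.

```python
import random


def simulate_srm(beta0, beta1, sigma, xs, seed=0):
    """Draw y_i = beta0 + beta1 * x_i + eps_i with independent eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


def least_squares(xs, ys):
    """Least-squares estimates (b0, b1) of the intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical "true" population parameters; with real data these remain unknown.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0
xs = [i / 10 for i in range(1, 201)]          # x = 0.1, 0.2, ..., 20.0
ys = simulate_srm(BETA0, BETA1, SIGMA, xs, seed=42)

b0, b1 = least_squares(xs, ys)                             # estimates of beta0, beta1
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]    # e_i = y_i - yhat_i
print(f"b0 = {b0:.3f} (beta0 = {BETA0}), b1 = {b1:.3f} (beta1 = {BETA1})")
```

Rerunning with a different seed gives different b0 and b1 but the same β0 and β1; this sample-to-sample variability is what standard errors describe. For a least-squares fit with an intercept, the residuals always sum to zero.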
Estimates vs. Parameters
From Description to Inference



We want to use the estimated regression line to make
inferences about the true relationship between the
explanatory and the response variable.
The central idea is to use the standard statistical tools:
standard errors, confidence intervals and hypothesis
tests.
The application of these tools requires us to make some
assumptions.
SRM: Classical Assumptions
(1) The regression model is linear.
(2) The error term ε has zero mean, E(ε) = 0.
(3) The explanatory variable x and the error term ε
are uncorrelated.
(4) The error terms are uncorrelated with each
other.
SRM: Classical Assumptions
(continued)
(5) The error term has a constant variance, Var(ε) = σe², for any value of x (homoskedasticity).
(6) The error terms are normally distributed. (This assumption is optional but usually invoked.)
Inference

If assumptions (1) – (6) hold, then we can easily
compute confidence intervals for the unknown
parameters β0 and β1. Similarly, we can perform
hypothesis tests for these parameters.
Modeling Process: Practical Checklist

Before looking at plots or running a regression,
ask the following questions:




Does a linear relationship make sense to us?
What type of relationship (sign of coefficients) do we
expect?
Could there be lurking variables?
Then begin working with data.
Modeling Process: Practical Checklist
(continued)



Plot y versus x and verify a linear association in
the scatterplot.
Compute the fitted line.
Plot the residuals versus the predicted values
(or x) and inspect the residual plot. Do the …




… residuals appear to be independent?
… residuals appear to have similar variances?
(… residuals appear to be nearly normal?)
(Time series require additional checks.)
CAPM: Berkshire Hathaway

Check scatterplot: relationship appears linear
[Scatterplot of % Change Berk-Hath versus % Change Market]
CAPM: Berkshire Hathaway
(continued)

Run simple linear regression
Regression: % Change Berk-Hath
                        constant       % Change Market
coefficient             1.39620459     0.72234946
std error of coef       0.33968223     0.07776332
t-ratio                 4.1103         9.2891
p-value                 0.0049%        0.0000%
beta-weight                            0.4334
standard error of regression: 6.51740865
R-squared: 18.79%
adjusted R-squared: 18.57%
number of observations: 375
residual degrees of freedom: 373
t-statistic for computing 95%-confidence intervals: 1.9663
CAPM: Berkshire Hathaway
(continued)

Check residual plot: no pattern visible
[Residual plot: residuals versus predicted values of % Change Berk-Hath]
Standard Errors of the Coefficients


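The Standard Errors of the Coefficients describe the sample-to-sample variability of the coefficients b0 and b1.
The estimated standard error of b1, se(b1), is
$$se(b_1) = \frac{s_e}{\sqrt{n-1}} \cdot \frac{1}{s_x}$$
where se is the standard error of regression (the standard deviation of the residuals), n is the sample size, and sx is the standard deviation of x.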
Estimated Standard Error of b1

The estimated standard error of b1 depends on
three factors:



Standard deviation of the residuals se. As se
increases, the standard error se(b1) increases.
Sample size n. As n increases, the standard error
se(b1) decreases.
Standard deviation sx of x. As sx increases, the
standard error se(b1) decreases.
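The formula for se(b1) can be evaluated directly from data. The sketch below (hypothetical data; `se_b1` is an illustrative helper name) computes se(b1) = se / (√(n − 1) · sx) and illustrates the third factor: doubling the spread of x, with the same y values, leaves the residuals unchanged but doubles sx, so se(b1) is halved.

```python
import math
import statistics


def se_b1(xs, ys):
    """Estimated standard error of the least-squares slope b1."""
    n = len(xs)
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))      # standard error of regression
    s_x = statistics.stdev(xs)          # sample standard deviation of x (n - 1 divisor)
    return s_e / (math.sqrt(n - 1) * s_x)


# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.1, 2.3, 2.8, 4.2, 4.9, 6.3, 6.8, 8.1]

base = se_b1(xs, ys)
wider = se_b1([2 * x for x in xs], ys)   # same y values, x spread doubled
print(f"se(b1) = {base:.4f}; with the spread of x doubled: {wider:.4f}")
```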
Confidence Intervals

Confidence intervals for the coefficients:
The 95% confidence interval for β1 is
$$b_1 \pm t_{0.025,\,n-2} \times se(b_1)$$
The 95% confidence interval for β0 is
$$b_0 \pm t_{0.025,\,n-2} \times se(b_0)$$
Confidence Intervals: CAPM
The 95% confidence interval for β1 is
0.72234 ± 1.9663×0.077763 = [0.5694, 0.8753].
The 95% confidence interval for β0 is
1.3962 ± 1.9663×0.33968 = [0.7283, 2.064].
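The interval arithmetic above is easy to script. The sketch below reproduces both CAPM intervals from the coefficients, standard errors and critical value in the regression output; `confidence_interval` is an illustrative name. If SciPy is available, `scipy.stats.t.ppf(0.975, 373)` returns the critical value instead of copying it from the output.

```python
def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) = estimate -/+ t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


T_CRIT = 1.9663   # t-statistic for 95% intervals with 373 residual df (from the output)

slope_ci = confidence_interval(0.72234946, 0.07776332, T_CRIT)       # beta1
intercept_ci = confidence_interval(1.39620459, 0.33968223, T_CRIT)   # beta0
print("beta1: [{:.4f}, {:.4f}]".format(*slope_ci))
print("beta0: [{:.4f}, {:.4f}]".format(*intercept_ci))
```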
Hypothesis Tests

Hypothesis tests on the coefficients:
Test statistic for H0: β1 = 0:
$$t = \frac{b_1}{se(b_1)}$$
Test statistic for H0: β0 = 0:
$$t = \frac{b_0}{se(b_0)}$$
Hypothesis Tests: CAPM


Hypothesis test of statistical significance for β1:
The t-statistic of 9.2891 with a p-value of less
than 0.0001% indicates that the slope is
significantly different from zero.
Hypothesis test of statistical significance for β0:
The t-statistic of 4.1103 with a p-value of
0.0049% indicates that the intercept is
significantly different from zero.
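The t-statistics in the output are simply each coefficient divided by its standard error. The sketch below recomputes them for the CAPM regression and compares |t| with the two-sided 5% critical value, which leads to the same decision as comparing the p-value with 5%. Computing the p-values themselves needs the t-distribution, for example `2 * scipy.stats.t.sf(abs(t), 373)` if SciPy is installed; `t_ratio` is an illustrative name.

```python
def t_ratio(estimate, std_error, null_value=0.0):
    """Test statistic for H0: beta = null_value."""
    return (estimate - null_value) / std_error


T_CRIT = 1.9663   # two-sided 5% critical value, 373 residual df (from the output)

t_intercept = t_ratio(1.39620459, 0.33968223)
t_slope = t_ratio(0.72234946, 0.07776332)
for name, t in [("beta0", t_intercept), ("beta1", t_slope)]:
    verdict = "reject" if abs(t) > T_CRIT else "do not reject"
    print(f"H0: {name} = 0 -> t = {t:.4f}, {verdict} at the 5% level")
```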
Application: Locating a Gas Station
Motivation
Does traffic volume affect gasoline sales? How
much more gasoline can be expected to be sold
at a gas station with an average of 40,000
drive-bys a day compared to one with an
average of 32,000 drive-bys?
Gas Station
Method
Use sales data from a recent month obtained
from 80 gas stations (from the same franchise).
Run a regression of sales against traffic volume.
The 95% confidence interval for 8,000 times the slope β1 indicates how much more gas is expected to sell at the busier location.
Gas Station
(continued)
Mechanics
(Think about lurking variables)
Check scatterplot
Run regression
Check residual plot
Gas Station: Scatterplot
Mechanics
Check scatterplot: relationship appears linear
[Scatterplot of Sales (000 gal.) versus Traffic Volume (000)]
Gas Station: Regression
Mechanics
Run a regression
Regression: Sales (000 gal.)
                        constant       Traffic Volume (000)
coefficient             -1.3380974     0.23672864
std error of coef       0.94584359     0.02431421
t-ratio                 -1.4147        9.7362
p-value                 16.1132%       0.0000%
beta-weight                            0.7407
standard error of regression: 1.5054068
R-squared: 54.86%
adjusted R-squared: 54.28%
number of observations: 80
residual degrees of freedom: 78
t-statistic for computing 95%-confidence intervals: 1.9908
Gas Station: Residual Plot
Mechanics
Check the residual plot: no pattern
[Residual plot: residuals versus predicted values of Sales (000 gal.)]
Gas Station: Regression
Mechanics
The linear regression equation is
Estimated Sales = -1.338 + 0.23673 Traffic Vol.
The 95% confidence interval for β1 is
0.23673 ± 1.9908×0.024314 = [0.1883, 0.2851].
The 95% confidence interval for 8000×β1 is
8000×[0.1883, 0.2851] ≈ [1507, 2281].
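Because the endpoints scale linearly, multiplying both ends of the interval for β1 by 8,000 gives the interval for 8,000 × β1. The sketch below reproduces the numbers from the output; the variable names are illustrative. The comment on units follows from the variable definitions: Sales is in thousands of gallons and Traffic Volume in thousands of drive-bys, so b1 is measured in gallons per drive-by.

```python
B1, SE_B1, T_CRIT = 0.23672864, 0.02431421, 1.9908   # from the gas station output
EXTRA_DRIVE_BYS = 8000                                 # 40,000 - 32,000 drive-bys

lower_b1 = B1 - T_CRIT * SE_B1
upper_b1 = B1 + T_CRIT * SE_B1

# b1 is in (000 gal.) per (000 drive-bys), i.e. gallons per drive-by,
# so 8,000 extra drive-bys correspond to 8000 * b1 extra gallons.
extra_low = EXTRA_DRIVE_BYS * lower_b1
extra_high = EXTRA_DRIVE_BYS * upper_b1
print(f"95% CI for extra sales: [{extra_low:,.0f}, {extra_high:,.0f}] gallons")
```

Using the unrounded endpoints explains why the lower limit is 1,507 rather than 8,000 × 0.1883 ≈ 1,506.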
Conclusion
Message
Based on a sample of 80 gas stations, we
expect that a station located at a site with
40,000 drive-bys will sell, on average, from
1,507 to 2,281 more gallons of gas daily than a
location with 32,000 drive-bys.
Standard Errors of the Fitted Value

The fitted value ŷ for a given value of x is an estimator of two different unknown values:



It is a point estimate for the average value of y for all
data points with the particular x value.
It is a point estimate for the y value of a single
observation with this particular x value.
It is much more difficult to make a prediction
about a single observation than to make a
prediction about an average value.
SE Estimated Mean
[Figure: fitted line ŷ = b0 + b1*x for y = Sales against x = Traffic Volume, with intercept b0 and fitted value ŷ = 8.13 at x = 40, showing the confidence interval for average Sales at Traffic Volume = 40.]
Std error of ŷ for estimating μy|x: SE of estimated mean.
SE Prediction
[Figure: fitted line ŷ = b0 + b1*x for y = Sales against x = Traffic Volume, with intercept b0 and fitted value ŷ = 8.13 at x = 40, showing the prediction interval for Sales at Traffic Volume = 40.]
Std error of ŷ for estimating avg y at x: SE of estimated mean.
Std error of ŷ for estimating individual y: SE of prediction.
(SE of prediction)² = (SE of est. mean)² + (SE of regression)²
Standard Errors of the Fitted Value


The Standard Error of the Estimated Mean
captures the variability of the estimated mean of
y around μy|x, the (true but unknown) population
average y at the given x.
The fitted ŷ = b0 + b1*x is our estimator for the average y at x. The SE of Estimated Mean is a measure of its sample-to-sample variation.
Standard Errors of the Fitted Value
(continued)


The Standard Error of Regression, se, measures the variability of the individual y around the fitted line.
By SRM assumption (5) (homoskedasticity), the std. deviation of y around the average μy|x does not vary with x; this std. deviation is estimated by the SE of Regression. (Note: it is not the std. error of any estimator.)
Standard Errors of the Fitted Value
(continued)

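The Standard Error of Prediction captures the variability of an individual observation y around the fitted value ŷ at a given x. It combines the uncertainty of ŷ as an estimate of μy|x, the (true but unknown) population average y at that x, with the variability of individual y around μy|x:
(SE of Prediction)² = (SE of Est. Mean)² + (SE of Regression)²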
Two Different Intervals



Confidence Interval: An interval designed to
hold an unknown population parameter with
some level (often 95%) of confidence.
Prediction Interval: An interval designed to
hold a fraction of the values of the variable y
(for a given value of x).
A prediction interval differs from a confidence
interval because it makes a statement about the
location of a new observation rather than a
parameter of a population.
CI vs. PI



(1 − α) confidence interval for a mean:
Predicted Value ± TINV(α, df) × SE Est. Mean
(1 − α) prediction interval for a single observation:
Predicted Value ± TINV(α, df) × SE Prediction
(TINV is Excel's two-tailed inverse of the t-distribution, so TINV(0.05, df) = t0.025,df.)
Prediction intervals are sensitive to SRM assumptions (5), constant variance, and (6), normal errors.
Gas Station: CI and PI
Prediction, using most-recent regression
                                      constant    Traffic Volume (000)
coefficients                          -1.3381     0.236729
values for prediction                             40
predicted value of Sales (000 gal.): 8.131048
standard error of prediction: 1.515364
standard error of regression: 1.505407
standard error of estimated mean: 0.173427
confidence level: 95.00%
t-statistic: 1.9908
residual degr. freedom: 78
confidence limits for prediction: lower 5.114191, upper 11.14791
confidence limits for estimated mean: lower 7.785781, upper 8.476316
95% CI: [7.786, 8.476]
95% PI: [5.114, 11.148]
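The two intervals differ only in the standard error used. The sketch below rebuilds both from the quantities in the output; the SE of the estimated mean is copied from the output because computing it requires the raw data. `mean_and_prediction_intervals` is an illustrative name.

```python
import math


def mean_and_prediction_intervals(y_hat, se_mean, se_reg, t_crit):
    """Return (se_pred, confidence interval for the mean, prediction interval)."""
    se_pred = math.sqrt(se_mean ** 2 + se_reg ** 2)   # (SE pred)^2 = (SE mean)^2 + (SE reg)^2
    ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return se_pred, ci, pi


y_hat = -1.3380974 + 0.23672864 * 40     # predicted Sales (000 gal.) at Traffic Volume 40
se_pred, ci, pi = mean_and_prediction_intervals(
    y_hat, se_mean=0.173427, se_reg=1.505407, t_crit=1.9908)
print(f"y_hat = {y_hat:.4f}, SE of prediction = {se_pred:.4f}")
print("95% CI: [{:.3f}, {:.3f}]".format(*ci))
print("95% PI: [{:.3f}, {:.3f}]".format(*pi))
```

Small differences in the last digits relative to the output come from using the rounded t-statistic 1.9908.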
Interpretation of Intervals


We are 95% confident that average sales at gas
stations with 40,000 drive-bys per day are
between 7,786 gallons and 8,476 gallons.
We are 95% confident that sales at an
individual gas station with 40,000 drive-bys per
day are between 5,114 gallons and 11,148
gallons.
Best Practices






Verify that your model makes sense, both
visually and substantively.
Consider other possible explanatory variables.
Check the conditions, in the listed order.
Use confidence intervals to express what you
know about the slope and intercept.
Check the assumptions of the SRM carefully
before using prediction intervals.
Be careful when extrapolating.
Pitfalls




Don’t overreact to residual plots.
Do not mistake varying amounts of data for
unequal variances.
Do not confuse confidence intervals with
prediction intervals.
Do not expect that r2 and se must improve with
a larger sample.
Managerial Statistics
KH 19
6 – Multiple Regression
Course material adapted from Chapter 23 of our textbook
Statistics for Business, 2e © 2013 Pearson Education, Inc.
Learning Objectives

Apply multiple regression analysis to decision-making situations in business

Analyze and interpret multiple regression
models

Understand the difference between partial and
marginal slopes

Decide when to exclude variables from a
regression model
Chain of Women’s Apparel Stores
Motivation: How are sales at a chain of women’s
apparel stores (annually in dollars per square foot
of retail space) affected by competition (number of
competing apparel stores in the same shopping
mall)?
First approach: Formulate a simple regression with
sales at stores of this chain as the response
variable y and the number of competing stores as
the explanatory variable x.
Scatterplot of Sales vs. Competitors
[Scatterplot of Sales ($/sq ft) versus Competitors]
Simple Linear Regression
Regression: Sales ($/sq ft)
                        constant       Competitors
coefficient             502.201557     4.63517778
std error of coef       25.4436616     8.74691578
t-ratio                 19.7378        0.5299
p-value                 0.0000%        59.8029%
beta-weight                            0.0666
standard error of regression: 105.778443
R-squared: 0.44%
adjusted R-squared: -1.14%
number of observations: 65
residual degrees of freedom: 63
t-statistic for computing 95%-confidence intervals: 1.9983
Positive relationship: more competitors, higher sales! Does this make sense?
Interpretation


A large number of competitors is indicative of a
shopping mall in a location with a high median
household income. Put differently, the number
of competitors and the median household
income are positively correlated.
The simple regression of Sales on Competitors
mixes the decrease in sales associated with
increased competition with the increase in sales
associated with higher income levels (that
accompany a larger number of competitors).
Apparel Sales: Multiple Regression

Multiple regression with 2 explanatory variables



Median household income in the area (in thousands
of dollars)
Number of competing apparel stores in the same
mall
Response variable as before

Sales at stores of the chain (annually in dollars per
square foot of retail space)
Apparel Sales: Multiple Regression
Regression: Sales ($/sq ft)
                        constant       Income ($000)    Competitors
coefficient             60.3586702     7.965979876      -24.16503223
std error of coef       49.290165      0.838249629      6.38991396
t-ratio                 1.2246         9.5031           -3.7817
p-value                 22.5374%       0.0000%          0.0353%
beta-weight                            0.8727           -0.3473
standard error of regression: 68.03062709
R-squared: 59.47%
adjusted R-squared: 58.17%
number of observations: 65
residual degrees of freedom: 62
t-statistic for computing 95%-confidence intervals: 1.9990
Estimated Sales = 60.359 + 7.966 Income – 24.165 Competitors
Sales: Residual Plot
Check the residual plot: no pattern
[Residual plot: residuals versus predicted values of Sales ($/sq ft)]
Interpreting the Equation

The slope 7.966 for Income implies that a store in a location with a $10,000 higher median household income sells, on average, $79.66 more per square foot (annually) than a store in a less affluent location with the same number of competitors.

The slope -24.165 for Competitors implies that, among stores in equally affluent locations, each additional competitor is associated with average annual sales that are $24.165 lower per square foot.
Multiple Regression



The Multiple Regression Model (MRM) is a model
for the association in the population between
multiple explanatory variables x1, x2, …,xk and a
response y.
While the SRM bundles all but one explanatory
variable into the error term, multiple regression
allows for the inclusion of several variables in the
model.
Multiple regression separates the effects of each
explanatory variable on the response and reveals
which really matter.
Multiple Regression Model
Idea: Examine the linear relationship between a
response (y) & 2 or more explanatory variables (xi)
Multiple regression model with k independent variables:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
where β0 is the y-intercept, β1, …, βk are the population slopes, and ε is the random error.
Multiple Regression Equation
The coefficients of the multiple regression model
are estimated using sample data
Estimated multiple regression equation:
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$
where b0 is the estimated intercept and b1, …, bk are the estimated slope coefficients.
Graph for Two-Variable Model
[Figure: the regression plane ŷ = b0 + b1x1 + b2x2 plotted over the (x1, x2) plane, with y on the vertical axis]
Residuals in a Two-Variable Model
[Figure: a sample observation yi at (x1i, x2i) and the regression plane ŷ = b0 + b1x1 + b2x2; the residual is the vertical distance ei = yi − ŷi]
MRM: Classical Assumptions
(1) The regression model is linear.
(2) The error term ε has zero mean, E(ε) = 0.
(3) All explanatory variables x1, x2, …,xk are
uncorrelated with the error term ε.
(4) Observations of the error term are uncorrelated with each other.
MRM: Classical Assumptions
(continued)
(5) The error term has a constant variance, Var(ε) = σe², for any value of x (homoskedasticity).
(6) No explanatory variable is a perfect linear function of any other explanatory variables.
(7) The error terms are normally distributed. (This assumption is optional but usually invoked.)
Multiple vs. Simple Regressions

Partial slope: slope of an explanatory variable
in a multiple regression that statistically excludes the effects of other explanatory variables.

Marginal slope: slope of the explanatory
variable in a simple regression.

Partial and Marginal slopes only agree when
the explanatory variables are uncorrelated.
Partial Slopes: Women’s Apparel
[Diagram: Competitors → Sales (–), Income → Sales (+), and a (+) link between Competitors and Income]
Competitors has a direct negative effect on Sales.
Income has a positive effect on Sales.
Competitors and Income are positively correlated.
Marginal Slope: Women’s Apparel
[Diagram: Competitors affects Sales directly (–) and indirectly through Income (+ × +)]
The direct effect of Competitors on Sales is negative (–). The indirect effect (via Income) is positive (+ × +).
The marginal slope of Competitors in the simple regression is the sum of these two effects.
Partial vs. Marginal Slopes


The MRM separates the individual effects of all
explanatory variables (into the partial slopes).
Indirect effects (resulting from correlation
among explanatory variables) are not present.
The SRM does not separate individual effects
and so indirect effects are present. The
marginal slope of the (single) explanatory
variable reflects both the direct effect of this
variable as well as the indirect effect(s) due to
missing explanatory variable(s).
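The split of a marginal slope into a direct and an indirect effect can be checked by simulation. The sketch below (assumes NumPy; all numbers are hypothetical, only loosely patterned on the apparel example) creates a population in which Competitors lowers Sales but is positively correlated with Income. For least-squares fits with an intercept, the marginal slope of Competitors equals its partial slope plus the partial slope of Income times the slope from regressing Income on Competitors, and this identity holds exactly in every sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
# Hypothetical data-generating process
income = rng.normal(50.0, 10.0, n)                      # median income ($000)
competitors = 0.2 * income + rng.normal(0.0, 1.0, n)    # positively correlated with income
sales = 60.0 + 8.0 * income - 24.0 * competitors + rng.normal(0.0, 20.0, n)


def ols(y, *columns):
    """Least-squares coefficients [b0, b1, ...] of y on an intercept and the columns."""
    X = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


_, b_income, b_comp = ols(sales, income, competitors)   # partial slopes
marginal_comp = ols(sales, competitors)[1]              # marginal slope of Competitors
d = ols(income, competitors)[1]                         # slope of Income on Competitors

print(f"partial slope of Competitors:  {b_comp:.2f}")
print(f"marginal slope of Competitors: {marginal_comp:.2f}")
print(f"direct + indirect effect:      {b_comp + b_income * d:.2f}")
```

In this simulated population the marginal slope comes out positive even though the direct effect is negative, mirroring the sign reversal in the apparel data.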
Inference in Multiple Regression


Hypothesis test of statistical significance for β1:
The t-ratio of 9.5031 with a p-value of less than
0.0001% indicates that the partial slope of
Income is significantly different from zero.
Hypothesis test of statistical significance for β2:
The t-statistic of -3.7817 with a p-value of
0.0353% indicates that the partial slope of
Competitors is significantly different from zero.
Inference in Multiple Regression
(continued)


Both explanatory variables, Income and
Competitors, have a statistically significant
effect on the response, Sales.
Hypothesis test of statistical significance for β0:
The t-statistic of 1.2246 with a p-value of
22.5374% indicates that the constant coefficient
is not significantly different from zero.
Prediction with a Multiple Regression
Prediction, using most-recent regression
                          constant    Income ($000)   Competitors
coefficients              60.35867    7.965979876     -24.16503223
values for prediction                 50              3
predicted value of Sales ($/sq ft): 386.1626
standard error of prediction: 69.9607
standard error of regression: 68.0306
standard error of estimated mean: 16.3198
confidence level: 95.00%
t-statistic: 1.9990
residual degr. freedom: 62
confidence limits for prediction: lower 246.3131, upper 526.0120
confidence limits for estimated mean: lower 353.5398, upper 418.7853
Prediction with a Multiple Regression
(continued)

The 95% prediction interval for annual sales per
square foot at a location with median household
income of $50,000 and 3 competitors is
[$246.31, $526.01].

The 95% confidence interval for average annual
sales per square foot at locations with median
household income of $50,000 and 3 competitors is [$353.54, $418.79].
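The prediction for the multiple regression works the same way as in the simple case: plug the x-values into the estimated equation, then widen by the appropriate standard error. The sketch below reproduces the output above from the reported coefficients and standard errors; the SE of the estimated mean is copied from the output because it cannot be computed without the data.

```python
import math

# Coefficients and standard errors copied from the output above
B0, B_INCOME, B_COMP = 60.3586702, 7.965979876, -24.16503223
SE_MEAN, SE_REG, T_CRIT = 16.3198, 68.0306, 1.9990

income, competitors = 50, 3          # $50,000 median income, 3 competitors
y_hat = B0 + B_INCOME * income + B_COMP * competitors
se_pred = math.sqrt(SE_MEAN ** 2 + SE_REG ** 2)

pi = (y_hat - T_CRIT * se_pred, y_hat + T_CRIT * se_pred)   # a single location
ci = (y_hat - T_CRIT * SE_MEAN, y_hat + T_CRIT * SE_MEAN)   # average over such locations
print(f"predicted sales: ${y_hat:.2f} per sq ft")
print("95% PI: [${:.2f}, ${:.2f}]".format(*pi))
print("95% CI: [${:.2f}, ${:.2f}]".format(*ci))
```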
Application: Subprime Mortgages
Motivation
A banking regulator would like to verify how
lenders use credit scores to determine the
interest rate paid by subprime borrowers. The
regulator would like to separate its effect from
other variables such as loan-to-value (LTV)
ratio, income of the borrower and value of the
home.
Subprime Mortgages
Method
Use multiple regression on data obtained for 372
mortgages from a credit bureau. The explanatory
variables are the LTV, credit score (FICO),
income of the borrower, and home value. The
response is the annual percentage rate of interest
on the loan (APR).
Subprime Mortgages
(continued)
Mechanics
Run regression
Check residual plot
Subprime Mortgages: Regression
Regression: APR
                     constant      LTV           FICO          Stated Income ($000)   Home Value ($000)
coefficient          23.7253652    -1.588843     -0.0184318    0.000403212            -0.000752082
std error of coef    0.6859028     0.51971233    0.00135016    0.003326563            0.000818648
t-ratio              34.5900       -3.0572       -13.6515      0.1212                 -0.9187
p-value              0.0000%       0.2398%       0.0000%       90.3591%               35.8862%
beta-weight                        -0.1339       -0.6008       0.0047                 -0.0362
standard error of regression: 1.24383566
R-squared: 46.31%
adjusted R-squared: 45.73%
number of observations: 372
residual degrees of freedom: 367
t-statistic for computing 95%-confidence intervals: 1.9664
Subprime Mortgages: Residual Plot
Mechanics
Check the residual plot: no pattern
[Residual plot: residuals versus predicted values of APR]
Subprime Mortgages: Regression
Mechanics
The linear regression equation is
Estimated APR = 23.725 – 1.5888 LTV – 0.01843 FICO
+ 0.0004032 Stated Income – 0.000752 Home Value
The first two variables, LTV and credit score (FICO), have low p-values. The remaining two variables, Stated Income and Home Value, have high p-values.
Conclusion
Message
Regression analysis shows that the credit score
(FICO) of the borrower and the loan LTV affect
interest rates in the market. Neither income of
the borrower nor the home value improves a
model with these two variables.
Dropping Variables

Since the variables Stated Income and Home
Value have no statistically significant effect on
the response variable APR, we may decide to
drop them from the regression.

We run a new regression with only two
explanatory variables, LTV and Credit Score
(FICO).
New Regression
Regression: APR
                     constant      LTV           FICO
coefficient          23.6913824    -1.5773413    -0.0185656
std error of coef    0.64984629    0.51842379    0.00134003
t-ratio              36.4569       -3.0426       -13.8546
p-value              0.0000%       0.2514%       0.0000%
beta-weight                        -0.1329       -0.6051
standard error of regression: 1.24189462
R-squared: 46.19%
adjusted R-squared: 45.90%
number of observations: 372
residual degrees of freedom: 369
t-statistic for computing 95%-confidence intervals: 1.9664
Estimated APR = 23.691 – 1.5773 LTV – 0.018566 FICO
Removing Variables



Multiple regressions may often indicate that
some of the explanatory variables are not
statistically significant.
Depending on the context of the analysis, we
may decide to remove insignificant variables
from the regression.
If we remove such variables then we should do
so one at a time to make sure that we don’t omit
a useful variable.
Best Practices

Know the business context of your model.

Distinguish marginal from partial slopes.

Check the assumptions of the model before
interpreting the output.
Pitfalls





Don’t confuse a multiple regression with several
simple regressions.
Don’t believe that you have all of the important
variables. Do not think that you have found causal
effects.
Do not interpret an insignificant t-ratio to mean that
an explanatory variable has no effect.
Don’t think that the order of the explanatory
variables in a regression matters.
Don’t remove several explanatory variables from
your model at once.
Managerial Statistics
KH 19
7 – Dummy Variables
Course material adapted from Chapter 25 of our textbook
Statistics for Business, 2e © 2013 Pearson Education, Inc.
Learning Objectives

Incorporate qualitative variables into regression
models by using dummy variables

Interpret the effect of a dummy variable on the
regression equation

Analyze interaction effects by introducing slope
dummy variables

Apply and interpret regression models with
slope dummy variables
Dummy Variable

A Dummy Variable is a variable that only takes
values 0 or 1. It usually expresses a qualitative
difference; e.g., whether the observation is for a
man or a woman, or from customer A or B, etc.

For example, we can define a dummy variable
Group as follows:
Group = 0, if the data point is for a woman
Group = 1, if the data point is for a man
Gender and Salaries
Motivation: How can we examine the impact of the
variables ‘years of experience’ and ‘gender
(male/female)’ on average salaries of managers?
Method: Represent the categorical variable gender
by a dummy variable. Then run a regression with
the response variable Salary and two explanatory
variables, years of experience and the new
dummy variable.
Regression with a Dummy
Regression: Salary ($000)
                     constant      Years of Experience   Group
coefficient          133.467579    0.853708343           1.024190096
std error of coef    2.13151142    0.192481379           2.057626623
t-ratio              62.6164       4.4353                0.4978
p-value              0.0000%       0.0016%               61.9298%
beta-weight                        0.3449                0.0387
standard error of regression: 11.77881458
R-squared: 13.11%
adjusted R-squared: 12.09%
number of observations: 174
residual degrees of freedom: 171
t-statistic for computing 95%-confidence intervals: 1.9739
Estimated Salary = 133.47 + 0.8537 Years + 1.024 Group
Substituting Values for the Dummy
Estimated Salary = 133.47 + 0.8537 Years + 1.024 Group
Equation for women (Group = 0):
Estimated Salary = 133.47 + 0.8537 Years
Equation for men (Group = 1):
Estimated Salary = 134.49 + 0.8537 Years
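Substituting 0 and 1 for the dummy is mechanical, and a few lines of code make the point that the gap between the two lines is the same at every level of experience. The coefficients are copied from the output (Salary in $000); `estimated_salary` is an illustrative name.

```python
B0, B_YEARS, B_GROUP = 133.467579, 0.853708343, 1.024190096   # from the output


def estimated_salary(years: float, group: int) -> float:
    """Estimated salary ($000); group = 0 for women, 1 for men."""
    if group not in (0, 1):
        raise ValueError("group must be 0 (woman) or 1 (man)")
    return B0 + B_YEARS * years + B_GROUP * group


for group, label in [(0, "women"), (1, "men")]:
    intercept = estimated_salary(0, group)   # value of the line at Years = 0
    print(f"{label}: Estimated Salary = {intercept:.2f} + {B_YEARS:.4f} Years")
```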
Effect of the Dummy Coefficient





After substituting the two values 0 and 1 for the dummy
variable, we obtain two regression equations.
The equation for Group = 0 yields a relationship
between Salary and Years for women.
The equation for Group = 1 yields a relationship
between Salary and Years for men.
The two lines have different intercepts but identical
slopes.
The coefficient of the dummy variable, bGroup = 1.024, determines the difference between the intercepts of the two regression lines.
In General Terms
• Regression with two variables, x1 and dum:
  $\hat{y} = b_0 + b_1 x_1 + b_2\,\mathrm{dum}$
• Substituting values for the dummy:
  dum = 0: $\hat{y} = b_0 + b_1 x_1 + b_2(0) = b_0 + b_1 x_1$
  dum = 1: $\hat{y} = b_0 + b_1 x_1 + b_2(1) = (b_0 + b_2) + b_1 x_1$
• The two lines have different intercepts ($b_0$ versus $b_0 + b_2$) but the same slope $b_1$.
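The substitution step works the same way for any coefficients, so it can be written as a small function. The sketch below is hypothetical: `group_lines` is not part of any course software. It returns the intercept and slope of the fitted line for dum = 0 and dum = 1. An optional `b3` covers the interaction term introduced later, and it defaults to 0 for the model without interaction used here.

```python
def group_lines(b0, b1, b2, b3=0.0):
    """Intercept and slope of the fitted line for each dummy value.

    Model: y_hat = b0 + b1*x1 + b2*dum + b3*(x1*dum); b3 = 0 means no interaction term.
    """
    return {
        0: (b0, b1),            # dum = 0: the b2 and b3 terms drop out
        1: (b0 + b2, b1 + b3),  # dum = 1: intercept shifts by b2, slope by b3
    }

# Salary example from the output above (Group = 0 women, Group = 1 men)
lines = group_lines(133.467579, 0.853708343, 1.024190096)
for dum, (intercept, slope) in lines.items():
    print(f"Group = {dum}: Estimated Salary = {intercept:.2f} + {slope:.4f} Years")
```

The output reproduces the two salary equations: 133.47 + 0.8537 Years for women and 134.49 + 0.8537 Years for men.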
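Illustration
[Figure: two parallel lines in the (x1, y) plane with common slope b1; the line for dum = 0 has intercept b0, the line for dum = 1 has intercept b0 + b2.]
If H0: β2 = 0 is rejected, then the dummy variable dum has a significant effect on the response y.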
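Dummy: Gender and Salaries
• The coefficient of the dummy variable Group, b_Group, can be interpreted as the difference in starting salaries (Years = 0) between men and women. Because both lines have the same slope, it is also the estimated difference at every level of experience.
• The coefficient is b_Group = 1.024, that is, about $1,024, since Salary is measured in $000. So, in this sample, men’s estimated starting salaries are slightly higher than women’s.
• The p-value of this coefficient is 61.9298%. Therefore, the difference in starting salaries appears to be statistically insignificant.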
Possible Interaction Effect
• There is no significant difference between the starting salaries of men and women. But perhaps a significant difference arises during the time of employment. Put differently, one group of employees may see larger pay increases than the other.
• Such an effect is called an Interaction Effect. The variables Group and Years interact in their respective effects on the response variable Salary.
Slope Dummy Variable
• How can we detect the presence of such an interaction effect?
• We need to include an Interaction (Variable), also called a Slope Dummy Variable.
• This new variable is the product of an explanatory variable and a dummy variable.
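In General Terms
• Regression with the variables x1, dum, and x1×dum:
  $\hat{y} = b_0 + b_1 x_1 + b_2\,\mathrm{dum} + b_3\,(x_1 \times \mathrm{dum})$
• Substituting values for the dummy:
  dum = 0: $\hat{y} = b_0 + b_1 x_1 + b_2(0) + b_3(x_1 \times 0) = b_0 + b_1 x_1$
  dum = 1: $\hat{y} = b_0 + b_1 x_1 + b_2(1) + b_3(x_1 \times 1) = (b_0 + b_2) + (b_1 + b_3)\,x_1$
• The two lines have different intercepts ($b_0$ versus $b_0 + b_2$) and different slopes ($b_1$ versus $b_1 + b_3$).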
Illustration
[Figure: two non-parallel lines in the (x1, y) plane; the line for dum = 0 has intercept b0 and slope b1, the line for dum = 1 has intercept b0 + b2 and slope b1 + b3.]
If H0: β2 = 0 is rejected, then the dummy variable dum has a significant effect on the response y.
If H0: β3 = 0 is rejected, then the slope dummy variable x1×dum has a significant effect on the response y.
Dummy and Slope Dummy
Regression: Salary ($000)

                    | constant   | Years of Experience | Group       | Group x Years
coefficient         | 130.988793 | 1.175983272         | 4.61128123  | -0.41492239
std error of coef   | 3.49019381 | 0.407570912         | 4.497011759 | 0.462459128
t-ratio             | 37.5305    | 2.8853              | 1.0254      | -0.8972
p-value             | 0.0000%    | 0.4417%             | 30.6627%    | 37.0876%
beta-weight         |            | 0.4751              | 0.1743      | -0.2314

standard error of regression: 11.78553688
R-squared: 13.52%
adjusted R-squared: 11.99%
number of observations: 174
residual degrees of freedom: 170
t-statistic for computing 95%-confidence intervals: 1.9740
Substituting Values for the Dummy
Estimated Salary = 130.99 + 1.176 Years + 4.611
Group – 0.4149 Group×Years
Equation for women (Group = 0)
Estimated Salary = 130.99 + 1.176 Years
Equation for men (Group = 1)
Estimated Salary = 135.60 + 0.7611 Years
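Writing out the two fits also shows where they meet. Setting the women's line b0 + b1·Years equal to the men's line (b0 + b2) + (b1 + b3)·Years gives Years = -b2/b3. The Python sketch below applies this to the coefficients from the output above. The helper `fitted` is a name chosen for this sketch.

```python
b0, b1, b2, b3 = 130.988793, 1.175983272, 4.61128123, -0.41492239

women = (b0, b1)             # Group = 0: intercept b0, slope b1
men = (b0 + b2, b1 + b3)     # Group = 1: intercept b0 + b2, slope b1 + b3

def fitted(line, years):
    """Estimated salary ($000) on a given group line."""
    intercept, slope = line
    return intercept + slope * years

# Setting the two fits equal: b2 + b3 * years = 0
crossover_years = -b2 / b3
print(f"women: {women[0]:.2f} + {women[1]:.4f} Years")
print(f"men:   {men[0]:.2f} + {men[1]:.4f} Years")
print(f"the fitted lines cross at about {crossover_years:.1f} years")
```

The fitted lines cross at about 11.1 years of experience. Neither Group nor Group×Years is statistically significant, so this crossover only describes the fitted lines. It is not evidence that women's salaries overtake men's.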
Significance
• Question: Is there a statistically significant difference between salaries paid to women and salaries paid to men?
• Answer: The differences in salaries are statistically insignificant. The p-values of the dummy variable Group (30.66%) and of the slope dummy variable Group×Years (37.09%) both exceed 30%.
Principle of Marginality
• Principle of Marginality: if the slope dummy is statistically significant, retain it as well as both of its components, regardless of their level of significance.
• If the interaction is not statistically significant, remove it from the regression and re-estimate the equation. A model without an interaction term is simpler to interpret since the lines fit to the groups are parallel.
Prediction with Slope Dummy
Predictions, using most-recent regression

Predict               | coefficients | values for prediction (1) | values for prediction (2)
constant              | 130.98879    |                           |
Years of Experience   | 1.1759833    | 10                        | 10
Group                 | 4.6112812    | 0                         | 1
Group x Years         | -0.4149224   | 0                         | 10

                                     | (1) Group = 0 | (2) Group = 1
predicted value of Salary ($000)     | 142.7486      | 143.2107
standard error of prediction         | 11.92218      | 11.84443
standard error of regression         | 11.78554      | 11.78554
standard error of estimated mean     | 1.799847      | 1.179728
confidence limits for prediction     |               |
  lower                              | 119.214       | 119.8296
  upper                              | 166.2832      | 166.5918
confidence limits for estimated mean |               |
  lower                              | 139.1957      | 140.8819
  upper                              | 146.3016      | 145.5395

confidence level: 95.00%
t-statistic: 1.9740
residual degr. freedom: 170
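The prediction output follows from a few formulas. The predicted value is the constant plus each coefficient times its value for prediction. The standard error of prediction is sqrt(s_e² + se_mean²), where s_e is the standard error of regression and se_mean is the standard error of the estimated mean. Each interval is the predicted value plus or minus the t-statistic times the relevant standard error. The Python sketch below reproduces the table. The se_mean values are read off the output, because computing them would require the original data. Function names are illustrative.

```python
import math

COEFFICIENTS = [130.98879, 1.1759833, 4.6112812, -0.4149224]  # constant, Years, Group, Group x Years
S_E = 11.78554    # standard error of regression
T_CRIT = 1.9740   # 95% level, 170 residual degrees of freedom

def predict(values):
    """Fitted value: constant plus the coefficient-weighted prediction values."""
    return sum(c * v for c, v in zip(COEFFICIENTS, [1] + list(values)))

def intervals(y_hat, se_mean, s_e=S_E, t_crit=T_CRIT):
    """Return (se_pred, interval for the mean, interval for an individual prediction)."""
    se_pred = math.sqrt(s_e ** 2 + se_mean ** 2)  # individual scatter + estimation error
    mean_ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pred_pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return se_pred, mean_ci, pred_pi

# Values for prediction (Years, Group, Group x Years) and se_mean from the output above
cases = {"woman, 10 years": ((10, 0, 0), 1.799847), "man, 10 years": ((10, 1, 10), 1.179728)}
for label, (values, se_mean) in cases.items():
    y_hat = predict(values)
    se_pred, mean_ci, pred_pi = intervals(y_hat, se_mean)
    print(f"{label}: {y_hat:.4f}, se_pred {se_pred:.5f}, "
          f"mean CI ({mean_ci[0]:.2f}, {mean_ci[1]:.2f}), PI ({pred_pi[0]:.2f}, {pred_pi[1]:.2f})")
```

The prediction interval is much wider than the interval for the mean. s_e (about 11.79) dominates the standard error of prediction, while the uncertainty in the estimated mean adds comparatively little.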
Best Practices
• Be thorough in your search for confounding variables.
• Consider interactions.
• Choose an appropriate baseline group.
• Write out the fits for separate groups.
• Be careful interpreting the coefficient of the dummy variable.
• (Check for comparable variances in the groups.)
• (Use color-coding or different plot symbols to identify subsets of observations in plots.)
Pitfalls
• Don’t think that you have adjusted for all of the confounding factors.
• Don’t confuse the different types of slopes.
• Don’t forget to check the conditions of the multiple regression model (MRM).
REVISED MARCH 19, 2014
KARL SCHMEDDERS
KEL754
Germany’s Bundesliga: Does Money Score Goals?
Some people believe football is a matter of life and death; I am very disappointed with
that attitude. I can assure you it is much, much more important than that.
—William “Bill” Shankly (1913–1981),
Scottish footballer and legendary Liverpool manager
“Tor! [Goal!]” yelled the jubilant announcer as 22-year-old midfielder Toni Kroos of FC
Bayern München fired a blistering shot past Borussia Dortmund’s goalkeeper. After sixty-six
minutes of scoreless football (“soccer” in the United States) on December 1, 2012, Bayern had
pulled ahead of the reigning German champion and Cup winner.
A sigh escaped Franz Dully, a financial analyst who covered football clubs belonging to the
Union of European Football Associations (UEFA). He was disappointed for two reasons: Not
only had a bout with the flu kept him home, but as a staunch Dortmund fan he had a decidedly
nonprofessional interest in the outcome. The day’s showdown between Germany’s top
professional teams and archrivals would possibly be the deciding match for the remainder of the
season; with only three more matches before the mid-season break, FC Bayern had already
obtained the coveted title of Herbstmeister (winter champion).
History had shown that the league leader at the break often went on to win the German
Bundesliga Championship title. It was no guarantee, however, as Dortmund had demonstrated
last season, when the club overcame Bayern’s mid-season lead to take the title in May. This year
Bayern, the league’s traditional frontrunner, was determined to reclaim its glory (and trophy).
As the station cut to the delighted Bayern fans in the stands, the phone rang. Dully knew
exactly who would be on the other end of the line.
“Tough break, comrade! Wish you were here!” yelled his friend Max Vogel. Dully could
barely hear him over the Bayern fans celebrating at Allianz Arena.
“Let’s skip the schadenfreude, shall we? It’s most unbecoming.”
©2014 by the Kellogg School of Management at Northwestern University. This case was developed with support from the December
2009 graduates of the Executive MBA Program (EMP-76). This case was prepared by Professor Karl Schmedders with the assistance
of Charlotte Snyder and Sophie Tinz. Cases are developed solely as the basis for class discussion. Cases are not intended to serve as
endorsements, sources of primary data, or illustrations of effective or ineffective management. To order copies or request permission
to reproduce materials, call 800-545-7685 (or 617-783-7600 outside the United States or Canada) or e-mail
[email protected]. No part of this publication may be reproduced, stored in a retrieval system, used in a spreadsheet, or
transmitted in any form or by any means—electronic, mechanical, photocopying, recording, or otherwise—without the permission of
Kellogg Case Publishing.
“Who, me?” Vogel asked. “Surely you jest. I would never take pleasure in my childhood
friend’s suffering. But disappointment is inevitable when you root for the underdog.”
“That underdog, as you call it, has taken the title for the last two years and we’re going for
three in a row.”
Vogel was undeterred. “Fortunately, I had the foresight to move to Munich, city of
champions. Remember the old saying: Money scores goals. And Bayern has the most.”
“Money is no guarantee of success,” Dully countered.
“Really?” his friend shot back. “Haven’t billionaires from Russia, America, and Abu Dhabi
bought the last three English Premier League titles for Chelsea, Manchester United, and
Manchester City?”
“Well, money certainly helps,” Dully conceded. “But you’re using British examples, and
German football is altogether different. To quote our mutual patron saint Sepp Herberger: ‘The
ball is round’ and football is anything but predictable. This match isn’t over until the whistle
blows, and that’s true for the season, too.”
“Well, you’re the numbers wizard. If anyone can calculate whether money offers an
advantage, it’s you. Your readers might find it interesting if you managed to prove what football
fans think they already know.”
“I’ll see,” said Dully, without enthusiasm.
“I’ll drink a beer for you in the meantime! Feel better! Tschüss!”
Dully grunted and put the phone down, but his friend’s offhand remark stuck with him. With
one eye on the game, he leaned over the side of his chair and felt around for his laptop. He
dreaded Vogel’s gloating if Bayern held onto its lead to win the match; perhaps he could quiet
him down if he met his friend’s challenge to show that money correlated with winning football
matches as surely as a talented striker.
The Bundesliga
Football was widely recognized as one of Germany’s top pastimes. Since the German
Football Association (DFB) was founded in 1900, it had grown to encompass nearly 27,000 clubs
and 6.3 million people around the country.[1] Initially the game was played only at an amateur
level, although semi-professional teams emerged after World War II.
Professional football in Germany appeared later than in many of its international
counterparts. The country’s top professional league, known as the Bundesliga, was formed on
July 28, 1962, after Yugoslavia stunned the German national team with a quarter-final World Cup
defeat. Sixteen clubs initially were granted admission to the new league based on athletic
performance, economic, and infrastructural criteria. Enthusiasm developed quickly, and 327,000
people watched Germany’s first professional football matches on August 24, 1963.[2]
[1] Deutscher Fussball-Bund, “History,” http://www.dfb.de/index.php?id=311002 (accessed January 4, 2013).
The Bundesliga was organized in two divisions, the 1. Bundesliga and the 2. Bundesliga, with
the former drawing far more fan attention than the latter. In 2001 the German Football League was
formed to oversee all regular-season and playoff matches, licensing, and operations for both
divisions. As of 2012, eighteen teams competed in each division.
The season ran from August to May, with most games played on weekends. Each team
played every other team twice, once at home and once away. The winner of each match earned
three points, the loser received no points, and a draw earned one point for each team. At the end
of the season, the top team of the 1. Bundesliga was awarded the “Deutsche Meisterschaft”
(German Championship, the Bundesliga title). (The fans jokingly referred to the trophy given to
the champion as the “Salad Bowl.”) In 2012 the top three teams of the 1. Bundesliga qualified for
the prestigious European club championship known as the Champions League, and the fourth-place
team was given the opportunity to compete in a playoff round for a Champions League spot.
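The scoring rule and the home-and-away schedule described above reduce to simple arithmetic. The minimal Python sketch below uses illustrative function names. With eighteen teams, each club plays 2 × 17 = 34 matches, 17 in each half of the season, and the division plays 18 × 17 = 306 matches in total.

```python
def season_points(wins, draws, losses):
    """Points under the Bundesliga rule: 3 per win, 1 per draw, 0 per loss."""
    if min(wins, draws, losses) < 0:
        raise ValueError("match counts must be non-negative")
    return 3 * wins + draws

def matches_per_season(n_teams):
    """Double round robin: every team meets every other team home and away."""
    per_team = 2 * (n_teams - 1)
    total = n_teams * (n_teams - 1)  # each of the C(n, 2) pairs meets twice
    return per_team, total

print(matches_per_season(18))   # (34, 306)
print(season_points(20, 8, 6))  # 68 for a hypothetical 20-8-6 record
```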
Within the league, the bottom two teams of the 1. Bundesliga were relegated to the 2. Bundesliga,
and the top two teams of the 2. Bundesliga were promoted. The team that finished third from the
bottom of the 1. Bundesliga played the third-place team of the 2. Bundesliga for the final spot in
the top league for the following season.
Measured by the number of spectators per game, the Bundesliga was the most popular sports
league in the world after the U.S. National Football League; its attendance per game was higher
than that of Major League Baseball, the National Basketball Association, and the National Hockey
League in the United States. More people attended football games in Germany than in any other
country (see Exhibit 1). From a performance perspective, UEFA ranked the Bundesliga as the third
best league in Europe, after those of Spain and England.[3] Germany had also distinguished itself
as one of the two most successful participants in World Cup history.[4]
***
Dully roared with glee a few minutes later as Dortmund midfielder Mario Götze evened the
score with a shot that sliced through a pack of players before finding the bottom corner of the
Bayern goal.
This is the magic of German football, he reflected. The neck-and-neck races between the top
few teams, the surprises, the upsets, the legends like Franz Beckenbauer and Lothar Matthäus.
And of course, there were the magical moments, perhaps none more so than that rainy 1954 day
when Germany’s David defeated the Hungarian Goliath and stunned the world by winning the
World Cup in what came to be called the Miracle of Berne.
“Call me mad, call me crazy!”[5] the announcer had shrieked over the airwaves when Helmut
Rahn nudged the ball past Hungarian goalkeeper Gyula Grosics and gave Germany the lead over
the Hungarians, a team that had gone unbeaten for thirty-one straight games in the preceding four
years and was considered the undisputed superpower of world football.[6] Minutes later, the
Germans raised the Jules Rimet World Cup trophy high for the first time.
[2] Silvio Vella, “The Birth of Professional Football in Germany,” Malta Independent, July 28, 2012.
[3] UEFA Rankings, http://www.uefa.com/memberassociations/uefarankings/country/index.html (accessed January 4, 2013).
[4] FIFA, “All-Time FIFA World Cup Ranking 1930–2010,” http://www.fifa.com/aboutfifa/officialdocuments/doclists/matches.html (accessed January 4, 2013).
[5] Ulrich Hesse-Lichtenberger, Tor!: The Story of German Football (London: WSC Ltd, 2003), 126.
Bundesliga Finances: The Envy of International Football
Most European football clubs wrestled with finances: In the 2010–2011 season, the twenty
clubs in the English Premier League showed £2.4 billion in debt,[7] a figure surpassed by the
twenty Spanish La Liga clubs, which hit €3.53 billion (£2.9 billion).[8] In contrast, the thirty-six
Bundesliga clubs showed a net profit of €52.5 million in 2010–2011. The Bundesliga had the
distinction of being the most profitable football league in the world.
In 2010–2011 the Bundesliga had revenues of €2.29 billion, more than half of which came
from advertising and media management (see Exhibit 2).[9] Television was one of the largest
sources of income. This money was split between the football clubs according to their
performance during the season.
Secrets of the Bundesliga’s success included club ownership policies, strict licensing rules,
and low ticket costs. With a few notable exceptions, German football clubs were large
membership associations with the same majority owner: their members. League regulations
dictated a 50+1 rule, which meant that club members had to maintain control of 51 percent of
shares. This left room for private investment without risking instability as a result of individual
entrepreneurs with deep pockets taking over teams and jeopardizing long-term financial stability
for short-term success on the field.
Bundesliga licensing procedures mandated that clubs had to open their books to league
accountants and not spend more than they made in order to avoid fines and be granted a license to
play the following year. Among a host of other stipulations, precise rules established liquidity and
debt requirements; Teutonic efficiency had little patience for inflated transfer fees and spiraling
wages that could send clubs into financial ruin.
Football player salaries were the highest of any sport in the world. A 2012 ESPN survey
revealed that seven of the top ten highest-paying sports teams were football clubs, with U.S.
major league baseball and basketball clubs rounding out the set. FC Barcelona’s players led the
world’s professional athletes with an average salary of $8.68 million, or $166,934 per week. Real
Madrid players followed close behind with an average salary of $7.80 million per year.[10]
While the salaries were impressive, the cost of transferring players between countries and
leagues could be even more so. A transfer fee was paid to a club for relinquishing a player (either
still under contract or with an expired contract) to an international counterpart, and such transfers
were regulated by football’s world governing body, the Fédération Internationale de Football
Association (FIFA). Historically, transfers were permitted twice a year: for a longer period
during the summer between seasons, and for a shorter period during the winter partway through
the season. FIFA reported that $3 billion was spent transferring players between teams in 2011
and that a transfer was conducted every 45 minutes.[11] Although the average transfer fee was $1.5
million in 2011, clubs often paid top dollar to secure star power. In 2011 thirty-five players
transferred at fees exceeding €15 million,[12] including Javier Pastore, who transferred from
Palermo to Paris Saint-Germain for €42 million.[13] The highest transfer fee ever paid was €94
million by Real Madrid to Manchester United for Cristiano Ronaldo in 2009.
[6] FIFA, “1954 World Cup Switzerland,” http://www.fifa.com/worldcup/archive/edition=9/overview.html.
[7] Deloitte Annual Review of Football Finance, May 31, 2012.
[8] “La Liga Debt Crisis Casts a Shadow Over On-Pitch Domination,” Daily Mail, April 19, 2012.
[9] Bundesliga Annual Report 2012, p. 50.
[10] Jeff Gold, “Highest-Paying Teams in the World,” ESPN, May 2, 2012.
After financial crises in the business world demonstrated that no company was “too big to
fail” and evidence to this effect began mounting in the football world, UEFA approved financial
fair play legislation in 2010 requiring teams to live within their means or face elimination from
competition. The policies were designed to prevent football teams from crumpling under
oppressive debt and to ensure a more stable economic future for the game.[14] The legislation was
to be phased in over several years, with some key components taking effect in the 2011–2012
season.
Because the Bundesliga already operated under a system that linked expenditure with
revenue, wealth was relatively evenly distributed among the clubs, and teams could not vastly
outspend one another as was frequently the case in the Spanish La Liga and the British Premier
League. As a result, a greater degree of competitive parity made for exciting matches and
competition for the Deutsche Meisterschaft.
The league’s reasonable ticket prices made Germany arguably one of the greatest places in
the world to be a football fan. A BBC survey revealed that the average price of the cheapest
match ticket in the Premier League was £28.30 ($46), but season tickets to Dortmund matches,
for example, cost only €225 ($14 per game including three Champions League games) and
included free rail travel. In comparison, season tickets to Arsenal matches (the most expensive in
the Premier League) cost £1,955 ($3,154) for 2012–2013.[15]
Germany had some of the biggest and most modern stadiums in the world as the result of
€1.4 billion spent by the government expanding and refurbishing them in preparation for hosting
the 2006 World Cup.[16] According to the London Times, two German stadiums made the list of the
world’s ten best football venues: the Signal Iduna Park (formerly known as Westfalenstadion) in
Dortmund (ranked number one) and the Allianz Arena in Munich (number five).
During the 2010–2011 season, more than 17 million people watched Bundesliga football
matches live in stadiums, and 1. Bundesliga attendance averaged a record-breaking 42,101 per
game.[17] The average attendance at Dortmund’s Signal Iduna Park in the first half of the
2012–2013 Bundesliga season was 80,577.[18] In addition, around 18 million people (nearly a
quarter of the country) tuned in to the Bundesliga matches on television each weekend.[19] No
other leisure-time activity consistently generated that level of interest in Germany.
[11] Tom McGowan, “A FIFA First: Football’s Transfer Figures Released,” CNN, March 6, 2012.
[12] Mark Chaplin, “Financial Fair Play’s Positive Effects,” UEFA News, August 31, 2012.
[13] “PSG Complete Record-Breaking Pastore Transfer,” UEFA News, August 6, 2011.
[14] “Financial Fair Play Regulations Are Approved,” UEFA News, May 27, 2010.
[15] “Ticket Prices: Arsenal Costliest,” ESPN News, October 18, 2012.
[16] “German Football Success: A League Apart,” The Economist, May 16, 2012.
[17] Bundesliga Annual Report 2012, p. 56.
FC Bayern München
In the Bundesliga’s fifty-year history, FC Bayern München had been a perennial powerhouse;
the club boasted twenty-one title victories and an aggregate advantage of nearly 500 points in the
“eternal league table.”
Conventional wisdom held that clubs with a higher market value were more likely to win
championships because they could afford to pay the highest wages and transfer fees to attract the
best talent. FC Bayern was the eighth highest-paying sports team in the world, with an average
salary of $5.9 million per player according to ESPN in 2012.[20] The highest transfer fee ever paid
in the Bundesliga occurred in the summer of 2012 when Bayern bought midfielder Javi Martinez
from the Spanish team Athletic Bilbao for €40 million.[21] Bayern’s appearance in the Champions
League in eleven of the previous twelve years (including one first-place and two second-place
finishes) raised the team to new heights on the international stage and increased its brand value;
in 2012 it was the second most valuable football club brand in the world according to Brand
Finance, a leading independent brand valuation consultancy (see Table 1).
Table 1: Bundesliga Club Brand Value and Average Player Salary

Club               | Number of Titles | 2012 Rank | 2012 Market Value ($ in millions) | Average Annual Salary per Player for the 2011–2012 Season ($)
FC Bayern München  | 21               | 2         | 786                               | 5,907,652
FC Schalke 04      | 0                | 10        | 266                               | 4,187,722
Borussia Dortmund  | 5                | 11        | 227                               | 3,122,824
Hamburger SV       | 3                | 17        | 153                               | 2,579,904
VfB Stuttgart      | 3                | 28        | 71                                | 2,721,154
SV Werder Bremen   | 4                | 30        | 68                                | 2,734,924

Source: Brand Finance Football Brands 2012 and Jeff Gold, “Highest-Paying Teams in the World,” ESPN, May 2, 2012.
Bayern was also the only Bundesliga club to appear on the Forbes magazine list of the fifty
most valuable sports franchises worldwide. It was one of five football teams that consistently
appeared alongside the National Football League teams that dominated the list; from 2010 to
2012, the club’s ranking climbed from 27 to 14. In 2012 the magazine estimated that Bayern had
the fourth highest revenue of any football team in the world and valued the club at $1.23 billion.[22]
[18] “Europe’s Getting to Know Dortmund,” Bundesliga News, December 26, 2012.
[19] “Sky Strikes Bundesliga Deal with Deutsche Telekom,” Reuters, January 4, 2013.
[20] Gold, “Highest-Paying Teams in the World.”
[21] “Javi Martinez Joins Bayern Munich,” ESPN News, August 29, 2012.
[22] Kurt Badenhausen, “Manchester United Tops the World’s 50 Most Valuable Sports Teams,” Forbes, July 16, 2012.
Despite Bayern’s privileged position, competition in the league remained strong. All eighteen
of the 1. Bundesliga teams ranked among the top 200 highest-paying sports teams in the world,
with average salaries above $1.3 million per year for the 2011–2012 season.[23] The Bundesliga’s
depth kept seasons interesting: since 2000, five different teams had won the title and two other
clubs had been Herbstmeister (see Exhibit 3).
Seeking Correlation
Dully flipped off the television and went to the kitchen to get some food. The match had
ended in a 1–1 draw, leaving the country in suspense over whether Bayern would run away from
the pack in the league table or if Dortmund could catch up. The phone rang again.
“Have you proven me right yet?” Vogel asked above the din.
“No,” said Dully. “I’m averse to promoting ‘financial doping.’”
“You always were an idealist,” Vogel observed. “Or a purist or something.”
“I’m the complement to your cynicism.”
“Ah yes, that must be why we get along so well. I’d like to see your analysis, though, when
you actually come up with some.”
“Funny you should ask for that,” Dully said. “I’ll get back to you. Maybe.”
After a few more minutes of banter followed by well-intentioned plans for catching up
someday soon, the friends hung up. Dully returned to the living room and flopped on the couch.
The analyst wondered about the future of a Bundesliga with one team that was much
wealthier than the rest: would it remain competitive and exciting or, as Vogel said, would
money score goals and give those rich Bayern the German Cup year after year?
Dully returned to the spreadsheet he had started during the match, looking for a statistical
correlation between money and Bundesliga success.
[23] Gold, “Highest-Paying Teams in the World.”
Exhibit 1: Comparison of Sporting League Attendance Worldwide, 2010–2011 Season

League                        | Average Attendance per Game
U.S. National Football League | 66,960
German Bundesliga             | 42,690
Australian A-League           | 38,243
British Premier League        | 35,283
U.S. Major League Baseball    | 30,066
Spanish La Liga               | 29,128
Mexican Liga MX               | 27,178
Italian Serie A               | 24,031
French Ligue 1                | 19,912
Dutch Eredivisie              | 19,116

Source: ESPN Soccer Zone, WorldFootball.net, and Bundesliga Annual Report 2012, p. 56.
Exhibit 2: Bundesliga Revenue

1. BUNDESLIGA REVENUE
Sector           | Revenue (€ in thousands) | % Revenue
Match earnings   | 411,164                  | 21.17
Advertisement    | 522,699                  | 26.92
Media management | 519,629                  | 26.76
Transfers        | 195,498                  | 10.07
Merchandising    | 79,326                   | 4.08
Other            | 213,665                  | 11.00
Total            | 1,941,980                | 100
Source: “Bundesliga Report 2012: The Economic State of German Professional Football,” January 23, 2012.

TOTAL REVENUE FOR 1. AND 2. BUNDESLIGA
Sector           | Revenue (€ in thousands) | % Revenue
Match earnings   | 469,510                  | 20.41
Advertisement    | 634,010                  | 27.57
Media management | 629,079                  | 27.35
Transfers        | 215,110                  | 9.35
Merchandising    | 89,493                   | 3.89
Other            | 262,779                  | 11.43
Total            | 2,299,980                | 100
Source: “Bundesliga Report 2012: The Economic State of German Professional Football,” January 23, 2012.
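Recomputing the shares in the combined table is a quick check on the case's statement that more than half of the €2.29 billion came from advertising and media management. The Python sketch below copies the figures from the exhibit. The components add to 2,299,981 thousand euros, one more than the printed total, presumably because of rounding.

```python
# Revenue in thousands of euros, copied from Exhibit 2 (1. and 2. Bundesliga combined)
revenue = {
    "Match earnings": 469_510,
    "Advertisement": 634_010,
    "Media management": 629_079,
    "Transfers": 215_110,
    "Merchandising": 89_493,
    "Other": 262_779,
}
total = sum(revenue.values())
shares = {sector: 100 * amount / total for sector, amount in revenue.items()}
for sector, pct in shares.items():
    print(f"{sector:17s} {pct:6.2f}%")

ad_and_media = shares["Advertisement"] + shares["Media management"]
print(f"Total: {total:,} thousand EUR; advertising + media management: {ad_and_media:.2f}%")
```

Together, advertising and media management account for roughly 55% of combined revenue.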
Exhibit 3: Bundesliga Mid-Season Leaders and Champions

Season    | Mid-Season Leader        | Champion
2012–2013 | FC Bayern München        |
2011–2012 | FC Bayern München        | Borussia Dortmund
2010–2011 | Borussia Dortmund        | Borussia Dortmund
2009–2010 | Bayer 04 Leverkusen      | FC Bayern München
2008–2009 | 1899 Hoffenheim          | VfL Wolfsburg
2007–2008 | FC Bayern München        | FC Bayern München
2006–2007 | SV Werder Bremen         | VfB Stuttgart
2005–2006 | FC Bayern München        | FC Bayern München
2004–2005 | FC Bayern München        | FC Bayern München
2003–2004 | SV Werder Bremen         | SV Werder Bremen
2002–2003 | FC Bayern München        | FC Bayern München
2001–2002 | Bayer 04 Leverkusen      | Borussia Dortmund
2000–2001 | FC Schalke 04            | FC Bayern München
1999–2000 | FC Bayern München        | FC Bayern München
1998–1999 | FC Bayern München        | FC Bayern München
1997–1998 | 1.FC Kaiserslautern      | 1.FC Kaiserslautern
1996–1997 | FC Bayern München        | FC Bayern München
1995–1996 | Borussia Dortmund        | Borussia Dortmund
1994–1995 | Borussia Dortmund        | Borussia Dortmund
1993–1994 | Eintracht Frankfurt      | FC Bayern München
1992–1993 | FC Bayern München        | SV Werder Bremen
1991–1992 | Eintracht Frankfurt      | VfB Stuttgart
1990–1991 | SV Werder Bremen         | 1.FC Kaiserslautern
1989–1990 | FC Bayern München        | FC Bayern München
1988–1989 | FC Bayern München        | FC Bayern München
1987–1988 | SV Werder Bremen         | SV Werder Bremen
1986–1987 | Hamburger SV             | FC Bayern München
1985–1986 | SV Werder Bremen         | FC Bayern München
1984–1985 | FC Bayern München        | FC Bayern München
1983–1984 | VfB Stuttgart            | VfB Stuttgart
1982–1983 | Hamburger SV             | Hamburger SV
1981–1982 | 1.FC Köln                | Hamburger SV
1980–1981 | Hamburger SV             | FC Bayern München
1979–1980 | FC Bayern München        | FC Bayern München
1978–1979 | 1.FC Kaiserslautern      | Hamburger SV
1977–1978 | 1.FC Köln                | 1.FC Köln
1976–1977 | Borussia Mönchengladbach | Borussia Mönchengladbach
1975–1976 | Borussia Mönchengladbach | Borussia Mönchengladbach
1974–1975 | Borussia Mönchengladbach | Borussia Mönchengladbach
1973–1974 | FC Bayern München        | FC Bayern München
1972–1973 | FC Bayern München        | FC Bayern München
1971–1972 | FC Schalke 04            | FC Bayern München
1970–1971 | FC Bayern München        | Borussia Mönchengladbach
1969–1970 | Borussia Mönchengladbach | Borussia Mönchengladbach
1968–1969 | FC Bayern München        | FC Bayern München
1967–1968 | 1.FC Nürnberg            | 1.FC Nürnberg
1966–1967 | Eintracht Braunschweig   | Eintracht Braunschweig
1965–1966 | TSV 1860 München         | TSV 1860 München
1964–1965 | SV Werder Bremen         | SV Werder Bremen
1963–1964 | 1.FC Köln                | 1.FC Köln

Source: Bundesliga, “History Stats,” http://www.bundesliga.com/en/stats/history (accessed January 4, 2013).
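The case text and Table 1 make several counting claims that can be checked against Exhibit 3. The Python sketch below types the exhibit into a list, keying each season by its first calendar year (a convention of this sketch), and tallies titles and mid-season leaders. The text says "since 2000" without specifying the first season. Reading it as seasons beginning in 2001 or later is an assumption, but that reading reproduces the text's five champions and two further Herbstmeister. The per-club title counts also match Table 1.

```python
from collections import Counter

# Club names as printed in Exhibit 3
BAY, BVB, SVW = "FC Bayern München", "Borussia Dortmund", "SV Werder Bremen"
B04, S04, HSV = "Bayer 04 Leverkusen", "FC Schalke 04", "Hamburger SV"
VFB, FCK, KOE = "VfB Stuttgart", "1.FC Kaiserslautern", "1.FC Köln"
BMG, SGE, TSG = "Borussia Mönchengladbach", "Eintracht Frankfurt", "1899 Hoffenheim"
WOB, FCN = "VfL Wolfsburg", "1.FC Nürnberg"
EBS, M60 = "Eintracht Braunschweig", "TSV 1860 München"

# (first calendar year of the season, mid-season leader, champion); None = season unfinished
SEASONS = [
    (2012, BAY, None), (2011, BAY, BVB), (2010, BVB, BVB), (2009, B04, BAY),
    (2008, TSG, WOB), (2007, BAY, BAY), (2006, SVW, VFB), (2005, BAY, BAY),
    (2004, BAY, BAY), (2003, SVW, SVW), (2002, BAY, BAY), (2001, B04, BVB),
    (2000, S04, BAY), (1999, BAY, BAY), (1998, BAY, BAY), (1997, FCK, FCK),
    (1996, BAY, BAY), (1995, BVB, BVB), (1994, BVB, BVB), (1993, SGE, BAY),
    (1992, BAY, SVW), (1991, SGE, VFB), (1990, SVW, FCK), (1989, BAY, BAY),
    (1988, BAY, BAY), (1987, SVW, SVW), (1986, HSV, BAY), (1985, SVW, BAY),
    (1984, BAY, BAY), (1983, VFB, VFB), (1982, HSV, HSV), (1981, KOE, HSV),
    (1980, HSV, BAY), (1979, BAY, BAY), (1978, FCK, HSV), (1977, KOE, KOE),
    (1976, BMG, BMG), (1975, BMG, BMG), (1974, BMG, BMG), (1973, BAY, BAY),
    (1972, BAY, BAY), (1971, S04, BAY), (1970, BAY, BMG), (1969, BMG, BMG),
    (1968, BAY, BAY), (1967, FCN, FCN), (1966, EBS, EBS), (1965, M60, M60),
    (1964, SVW, SVW), (1963, KOE, KOE),
]

completed = [s for s in SEASONS if s[2] is not None]
titles = Counter(champion for _, _, champion in completed)
leader_won = sum(1 for _, leader, champion in completed if leader == champion)

# Assumption: "since 2000" read as seasons starting in 2001 or later
recent = [s for s in SEASONS if s[0] >= 2001]
recent_champions = {champion for _, _, champion in recent if champion is not None}
other_leaders = {leader for _, leader, _ in recent} - recent_champions

print(f"Bayern titles: {titles[BAY]}")
print(f"Leader at the break won the title in {leader_won} of {len(completed)} seasons")
print(f"Champions since 2001-02: {len(recent_champions)}; other Herbstmeister: {sorted(other_leaders)}")
```

With the exhibit as transcribed, the mid-season leader went on to win the title in 32 of the 49 completed seasons.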
Questions
PART I
1. What were the smallest, average, and largest market values of football teams in the
Bundesliga in the 2011–2012 season?
2. Develop a regression model that predicts the number of points a team earns in a season based
on its market value. Write down the estimated regression equation.
3. Are the regression coefficients statistically significant? Explain.
4. Carefully interpret the slope coefficient in your regression in the context of the case.
5. Conventional wisdom among football traditionalists states that the aggregate number of
points at the end of a Bundesliga season closely correlates with the market value of a club.
Simply put, “money scores goals,” which in turn lead to wins and points. Comment on this
wisdom in light of your regression equation.
6. Some of the (estimated) market values at the beginning of the 2012–2013 season were as
follows:
SC Freiburg: €46,650,000
1.FSV Mainz 05: €46,000,000
Eintracht Frankfurt: €49,400,000
Provide a point estimate for the difference between the number of points Eintracht Frankfurt
and 1.FSV Mainz 05 will earn in the 2012–2013 season.
7. Provide a point estimate and a 95% interval for the number of points SC Freiburg will earn in
the 2012–2013 season.
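Questions 2 and 6 call for a simple least-squares line and the difference between two predictions. The case data are not reproduced in this section, so the Python sketch below runs on a hypothetical toy data set. The function names are illustrative, not from any course software. The intercept cancels in the difference, so the point estimate for question 6 is b1 × (49.40 - 46.00) = 3.40·b1 when market values are measured in € millions.

```python
from statistics import mean

def fit_simple_regression(x, y):
    """Least-squares estimates for y_hat = b0 + b1 * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2 points)")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx             # slope: points per EUR 1 million of market value
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1

def point_difference(b1, value_a_mio, value_b_mio):
    """Estimated points(A) - points(B); the intercept b0 cancels."""
    return b1 * (value_a_mio - value_b_mio)

# Hypothetical toy data (NOT the case data): market value in EUR millions, season points
market_value = [50, 100, 150, 200]
points = [35, 45, 55, 65]
b0, b1 = fit_simple_regression(market_value, points)
print(b0, b1)  # 25.0 0.2

# Question 6: Eintracht Frankfurt (49.40) vs 1.FSV Mainz 05 (46.00), in EUR millions
print(round(point_difference(b1, 49.40, 46.00), 2))  # 0.68 with the toy slope
```

With the real data, replace the toy lists with the 2011–2012 market values and point totals. Only the estimated slope then matters for question 6.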
PART II
The first half of a Bundesliga season ends in mid-December. After a break for the holiday season
and potentially bad winter weather (which could lead to the cancellation of games), the league
resumes play in late January.
8. Develop a regression model that predicts the number of points a team earns at the end of a
season based on its market value and the number of points it earned during the first half of the
season. Write down the estimated regression equation.
9. Carefully interpret the two slope coefficients in your regression in the context of the case.
10. Compare your regression equation to the simple linear regression you obtained in Part I. How
did the coefficient of the variable Marketvalue_2011_Mio (€ in millions) change? Provide an
explanation for the difference.
11. Drop all insignificant variables (use α = 0.05). Write down the final regression equation.
12. At the beginning of the 2012–2013 season, the market value of Borussia Mönchengladbach
was estimated to be €88,350,000; the market value of 1.FC Nürnberg was estimated at
€41,500,000. During the first half of the 2012–2013 season, Borussia Mönchengladbach
earned 25 points and 1.FC Nürnberg, 20 points.
Provide a point estimate and an 80% interval for the number of points Borussia
Mönchengladbach will earn in the 2012–2013 season.
13. Provide a point estimate for the difference between the number of points Borussia
Mönchengladbach and 1.FC Nürnberg will earn in the 2012–2013 season.
14. An intuitive claim may be that, on average, a team earns twice as many points in an entire
season as it earns in the first half of the season. Put differently, on average, the total number
of a team’s points should just be two times the number of points at mid-season. Can you
reject this claim based on your regression model (at a significance level of α = 0.05)?
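For question 13, the intercept cancels again. The difference is b_mv × (88.35 − 41.50) + b_half × (25 − 20), with market values in € millions. Question 14 tests H0: coefficient of first-half points = 2, not 0. The t-ratio in standard regression output (coefficient divided by standard error) tests against zero, so it cannot be read off directly. The minimal sketch below uses hypothetical coefficient values, not results from the case data. The critical value must come from a t table with n − 3 degrees of freedom.

```python
def predicted_difference(b_mv: float, b_half: float,
                         mv_a: float, half_a: float,
                         mv_b: float, half_b: float) -> float:
    """Difference in predicted season points between clubs A and B; the intercept cancels."""
    return b_mv * (mv_a - mv_b) + b_half * (half_a - half_b)


def t_stat(estimate: float, std_error: float, null_value: float) -> float:
    """t-statistic for H0: coefficient == null_value."""
    return (estimate - null_value) / std_error


# Question 12/13 inputs: (market value in EUR millions, first-half points).
GLADBACH = (88.35, 25)
NUERNBERG = (41.50, 20)

# Hypothetical coefficients and standard error, for illustration only.
b_mv, b_half, se_half = 0.05, 1.8, 0.12

diff = predicted_difference(b_mv, b_half, GLADBACH[0], GLADBACH[1], NUERNBERG[0], NUERNBERG[1])
# Two-sided test: reject H0 if |t| exceeds the t critical value with n - 3 degrees of freedom.
t = t_stat(b_half, se_half, null_value=2.0)
print(round(diff, 4), round(t, 4))
```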
KARL SCHMEDDERS AND MARKUS SCHULZE
5-215-250
Solid as Steel: Production Planning at ThyssenKrupp
On Monday, March 31, 2014, production manager Markus Schulze received a call from
Reinhardt Täger, senior vice president of ThyssenKrupp Steel Europe’s production operations in
Bochum, Germany. Täger was preparing to meet with the company’s chief operating officer and
was eager to learn the reasons why the current figures of one of Bochum’s main production lines
were far behind schedule. Schulze explained that the line had had three major breakdowns in
early March and therefore would miss the planned utilization rate for that month. Consequently,
the scheduled production volume could not be carried out. Schulze knew that a lack of production
capacity utilization would lead to unfulfilled orders at the end of the planning period. In a rough
steel market with fierce competition, however, delivery performance was an important
differentiation factor for ThyssenKrupp.
Täger wanted a chance to review the historic data, so he and Schulze agreed to meet later that
week to continue their discussion.
After looking over the production figures from the past ten years, Täger was shocked. When
he met with Schulze later that week, he expressed his frustration. “Look at the historic data!”
Täger said. “All but one of the annual deviations from planned production are negative. We never
achieved the production volumes we promised in the planning meetings. We need to change
that!”
“I agree,” Schulze replied. “Our capacity planning is based on forecast figures that are not
met in reality, which means we can’t fulfill all customers’ orders in time. And the product cost
calculations are affected, too.”
“You’re right,” Täger said. “We need appropriate planning figures to meet the agreed
delivery time in the contracts with our customers. What do you think would be necessary for
that?”
“Hm, I guess we need a broad analysis of data to identify the root causes,” Schulze answered.
“It’ll take some time to build queries for the databases and aggregate data. And—”
“Stop!” Täger interrupted him. “We need data for the next planning period. The planning
meeting for May is in two weeks.”
©2015 by the Kellogg School of Management at Northwestern University. This case was prepared by Markus Schulze (Kellogg-WHU
’16) under the supervision of Professor Karl Schmedders. It is based on Markus Schulze’s EMBA master’s thesis. Cases are
developed solely as the basis for class discussion. Cases are not intended to serve as endorsements, sources of primary data, or
illustrations of effective or ineffective management. To order copies or request permission to reproduce materials, call 847.491.5400
or e-mail [email protected]. No part of this publication may be reproduced, stored in a retrieval system, used in a
spreadsheet, or transmitted in any form or by any means—electronic, mechanical, photocopying, recording, or otherwise—without the
permission of Kellogg Case Publishing.
ThyssenKrupp Steel Europe
ThyssenKrupp Steel Europe, a major European steel company, was formed in a 1999 merger
between historic German steel makers Thyssen and Krupp, both of which had been founded in the
nineteenth century. ThyssenKrupp Steel Europe annually produced up to 12 million metric tons
of steel with its 26,000 employees. In fiscal year 2013–2014, the company accounted for €9
billion of sales, roughly a quarter of the group sales of its parent company, ThyssenKrupp AG,
which traded on the DAX 30 (an index of the top thirty blue-chip German companies). Its main
drivers of success were customer orientation and reliability in terms of product quality and
delivery time.
Bochum Production Lines
The production lines at ThyssenKrupp Steel’s Bochum site were
supplied with interim products delivered from the steel mills in
Duisburg, 40 kilometers west of Bochum. Usually, slabs¹ were
brought to Bochum by train and then processed in the hot rolling mill
(see Figure 1). The outcome of this production step was coiled hot
strip² (see Figure 2) with mill scale³ on its surface. Whether the steel
would undergo further processing in the cold rolling mill or would be
sold directly as “pickled hot strip,” the mill scale needed to be
removed from the surface.
The production line in which Täger and Schulze were interested,
a so-called push pickling line (PPL), was designed to remove the mill
scale left by the upstream hot rolling process. To remove the scale, the
hot strip was uncoiled in the line and the head of the strip was pushed
through the line. The processing part of the line held pickling
containers filled with hot hydrochloric acid, which removed the scale
from the surface. Following this pickling, the strip was pushed
through a rinsing section to remove any residual acid from the
surface. After oiling for corrosion protection, the strip was coiled
again. The product of this step, pickled hot strip, could be sold to
B2B customers, mainly in the automotive industry.
Other types of pickling lines were operated as continuous lines,
in which the head of a new strip was welded to the tail of the one that
preceded it. The differentiating factor of a PPL was its batching
process, which involved pushing in each strip individually.
Production downtimes due to push-in problems did not occur at
continuous lines, but with PPLs this remained a concern.
Figure 1. Source: ThyssenKrupp AG, http://www.thyssenkrupp.com/en/presse/bilder.html&photo_id=898.
Figure 2. Source: ThyssenKrupp AG, http://www.thyssenkrupp.com/en/presse/bilder.html&photo_id=891.
¹ Slabs are solid blocks of steel formed in a continuous casting process and then cut into lengths of about 20 meters.
² A coiled hot strip is an intermediate product in steel production. Slabs are rolled at temperatures above 1,000°C. As they thin out they become longer; the result is a flat strip that needs to be coiled.
³ Mill scale is an iron oxide layer on the hot strip’s surface that is created just after hot rolling, when the steel is exposed to air (which contains oxygen). Mill scale protects the steel to a certain extent, but it is unwanted in further processes such as stamping or cold rolling.
Nevertheless, ThyssenKrupp chose to build a PPL in 2000 because increasing demand for high-strength steel made it profitable to invest in such a production line. At that time, high-strength
steel grades could not be welded to one another with existing machines, and strips thicker than
7.0 millimeters could not be processed on continuous lines.
The material produced on the PPL was not simply a commodity called steel. Rather, it was a
portfolio of different steel grades—that is, different metallurgical compositions with specific
mechanical properties. (For purposes of this case, the top five steel grades in terms of annual
production volume have been randomly assigned numbers from 1 to 5.) Within these top five
grades were two high-strength steel grades. These high-strength grades were rapidly cooled after
the hot rolling process—from around 1,000°C down to below 100°C. Removing the mill scale
generated during this rapid cooling process required a different process speed in the pickling line.
Only one of the five grades could be processed without limitations in speed and without expected
downtimes.
Performance Indicators
At ThyssenKrupp, managers responsible for production lines needed to report regularly on
the performance of the lines and the fulfillment of individual objectives. The output, or
throughput, of the production lines had always been an important metric. Even today, with the industry
coping with overcapacity and with customers’ increasing demands on product quality, line
throughput remained one of the key performance indicators. These indicators were used for
internal benchmarking against comparable production lines at other sites. The line-specific variable production cost was calculated as cost over throughput and was expressed in
euros per metric ton. Capacity planning was based on these figures, which ultimately determined
delivery time performance. In the steel industry, production reports contained performance
indicators at different levels of aggregation. A very important metric was throughput (tons⁴
produced) per time unit⁵; the performance indicator run time ratio⁶ (RTR) was the portion of time
used for production (run time) compared to the operating time of a production line.
Operating time = Calendar time − (legal holidays + shortages⁷ + all scheduled maintenance)
Run time = Operating time − (breakdowns + maintenance downtime exceeding the schedule + set-up time)
Both figures were reported not only on a daily basis (i.e., a 24-hour production period) but
also monthly and per fiscal year. Deviations from planned figures were typically noted in
automated reports containing database queries. Thus, every plant manager received an overview
of past periods. Comparable production lines of different sites were benchmarked internally.
⁴ Throughout this case, the term “ton” refers to a metric ton.
⁵ Tons produced are usually reported by shift (eight hours), by month, and eventually by fiscal year.
⁶ The metric run time ratio is calculated as run time over operating time (e.g., 8 hours of operating time, or 480 minutes, with 48 minutes of downtime yields an RTR of 90%).
⁷ Shortages can refer to material shortages, lack of orders, labor disputes, or energy/fuel shortages (external).
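The two time definitions and footnote 6 can be checked with a few lines of Python. This is a minimal sketch with illustrative parameter names. It books footnote 6's 48 minutes of downtime as a breakdown, which is an assumption; any of the run-time deductions would give the same RTR.

```python
def operating_time(calendar: float, holidays: float, shortages: float,
                   scheduled_maintenance: float) -> float:
    """Operating time = calendar time minus holidays, shortages, and scheduled maintenance."""
    return calendar - (holidays + shortages + scheduled_maintenance)


def run_time(operating: float, breakdowns: float, excess_maintenance: float,
             setup: float) -> float:
    """Run time = operating time minus breakdowns, excess maintenance downtime, and set-up time."""
    return operating - (breakdowns + excess_maintenance + setup)


def rtr(run: float, operating: float) -> float:
    """Run time ratio in percent."""
    return 100.0 * run / operating


# Footnote 6 example: one 8-hour shift (480 minutes) with 48 minutes of downtime,
# booked here as a breakdown (assumption).
op = operating_time(calendar=480, holidays=0, shortages=0, scheduled_maintenance=0)
run = run_time(op, breakdowns=48, excess_maintenance=0, setup=0)
print(rtr(run, op))  # 90.0
```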
Deviation from Planned Throughput
Steel production lines had typical characteristics and an average performance calculated
based on an average production portfolio, mostly determined empirically using historic figures.
For planning purposes, a fixed number was usually used to place order volumes on the production
lines and in this way “fill capacities.” Each month, actual orders were then placed up to an
amount capped by the line capacity. Each month’s production figures had
three possible outcomes.
The first possibility was that the planned throughput would be reached and at the end of the
month there would be extra capacity. In this case, the extra capacity would be filled with orders
from the next month if the intermediate product was already available for processing. Otherwise,
the line would stand still without fulfilling orders. This mode was very expensive because idle
capacity would be wasted, and fixed costs were incurred anyway.
The second possibility was that the planned throughput would not be reached. This would
mean that at the end of the month, orders would be left that could not be fulfilled. This mode was
also very expensive because the planned capacity could not be used, and actual production costs
were higher than calculated in advance. Product cost calculations would therefore have produced prices that were too low, so
contribution margins would be much lower than expected, or even negative.
In the third scenario, the exact planned throughput would be met (a deviation within ±100 tons per month, or
±1,200 tons per year, counted as accurate). This was the ideal case, but it had occurred only
once in the first ten years of line history (see the annual figures in Table 1).
Table 1: Annual Deviation from Planned Production in the First Ten Years of Line Operation
Year of Operation        Annual Deviation from Planned Production (tons)
1                        -23,254
2                        -22,691
3                        +1,115
4                        -22,774
5                        -2,807
6                        -20,363
7 (financial crisis)     -66,810
8                        -21,081
9                        -4,972
10                       -9,486
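Applying the ±1,200-ton annual tolerance to Table 1 confirms two statements in the text: only one year was on target, and every other year fell short of plan. The short Python check below works directly from the Table 1 figures.

```python
# Annual deviations from Table 1 (tons), years 1 to 10.
deviations = [-23_254, -22_691, 1_115, -22_774, -2_807,
              -20_363, -66_810, -21_081, -4_972, -9_486]

TOLERANCE = 1_200  # +/- tons per year counted as "accurate"

on_target = [year for year, d in enumerate(deviations, start=1) if abs(d) <= TOLERANCE]
negative = sum(1 for d in deviations if d < 0)
mean_dev = sum(deviations) / len(deviations)

print(on_target)           # [3]
print(negative)            # 9
print(round(mean_dev, 1))  # -19312.3
```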
Each month, production management had to explain the deviation from planned figures.
Many reasonable explanations had been given in the past. Major breakdowns were a common
explanation because downtimes directly influenced the RTR. The RTR theory—the lower the run
time ratio, the higher the negative deviation from the plan—was often mentioned as the
dominating force behind the PPL not achieving the planned throughput.
The production engineers’ gut feeling was that a straightforward reason, namely the material
structure, would explain patterns that showed peaks “against the RTR theory”: the resulting
throughput could be explained by whether the material structure was favorable or
unfavorable. A specific metric of the structure was the ratio meters per ton (MPT), a dimension
indicator. The MPT theory reflected the fact that material with a low thickness and/or a low width
carried a lower weight per meter. In other words, it took longer to put one ton of material through
the production line if the process speed remained constant. According to the MPT theory,
negative deviations in months with average or above-average RTR could be explained by this
metric.
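The MPT theory rests on simple geometry. A strip's weight per meter is thickness × width × density, and MPT is the reciprocal of that weight. The minimal sketch below assumes a typical carbon-steel density of about 7.85 t/m³, which the case does not state, and a constant line speed supplied by the user. It compares the two extremes of the PPL's technical range.

```python
STEEL_DENSITY_T_PER_M3 = 7.85  # assumed typical density of carbon steel (not from the case)


def meters_per_ton(thickness_mm: float, width_mm: float) -> float:
    """Length of strip (m) that weighs one metric ton, for a flat strip of the given cross-section."""
    cross_section_m2 = (thickness_mm / 1000) * (width_mm / 1000)
    tons_per_meter = cross_section_m2 * STEEL_DENSITY_T_PER_M3
    return 1 / tons_per_meter


def minutes_per_ton(thickness_mm: float, width_mm: float, speed_m_per_min: float) -> float:
    """Processing time for one ton at a constant line speed (speed is an input assumption)."""
    return meters_per_ton(thickness_mm, width_mm) / speed_m_per_min


# Extremes of the PPL's technical range (thickness 1.5-12.5 mm, width 800-1,650 mm).
thin_narrow = meters_per_ton(1.5, 800)
thick_wide = meters_per_ton(12.5, 1650)
print(round(thin_narrow, 1), round(thick_wide, 1), round(thin_narrow / thick_wide, 1))
```

At the same speed, the thin, narrow strip needs many times longer per ton. This matches the theory's claim that an unfavorable material structure lowers throughput even when the RTR is good.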
Data
Schulze realized he had to compile data carefully in order to have any hope of finding
possible explanations for the deviations from planned throughput. He decided to define aggregate
clusters for material dimensions such as the width and the thickness of the strips.
The technical data of the Bochum PPL relevant to the data collection were:
Width:                 800 to 1,650 mm
Thickness:             1.5 to 12.5 mm
Maximum throughput:    80,000 tons per month
Then Schulze reviewed available past production data, beginning with the night shift on
October 1, 2013, up until the early shift on April 4, 2014. Unfortunately, he had to omit a few
shifts during this six-month period because of missing or obviously erroneous data. Schulze’s
data set accompanies this case in a spreadsheet.
The explanation of the variables in the data set is as follows:
Shift: The day and time at the beginning of a shift.
Shift type: The production line operated 24/7 with three eight-hour shifts; the early shift (“E”) started at 6 a.m., the late (or midday) shift (“M”) started at 2 p.m., and the night shift (“N”) started at 10 p.m.
Shift number: ThyssenKrupp Steel used a continuous rolling shift system with five different shift groups (shift group 1, shift group 2, etc.). The binary variables indicate whether shift group i worked a particular shift.
Weekday: The line operated Monday through Sunday, but engineers usually worked Monday to Friday on a day-shift basis (usually starting at 7 a.m.).
Throughput: The throughput (in tons) during a shift.
Delta throughput: The deviation (in tons) of actual throughput from planned throughput.
MPT: A dimension indicator (meters per ton).
Thickness clusters: Each cluster represented a certain interval of material thickness in millimeters within the technically feasible range of the production line. Strips fell into one of three clusters. The variables “thickness 1,” “thickness 2,” and “thickness 3” denote the number of strips from the first, second, and third thickness clusters, respectively, that were processed during a shift.
Width clusters: Each cluster represented a certain interval of material width in millimeters within the technically feasible range of the production line. Strips fell into one of three width clusters. The variables “width 1,” “width 2,” and “width 3” denote the number of strips from the first, second, and third width clusters, respectively, that were processed during a shift.
Steel grades: Strips of many different steel grades were processed on the line. The steel grades 1 to 5 are the grades with the largest portion by volume. The variables “grade 1,” “grade 2,” “grade 3,” “grade 4,” and “grade 5” denote the proportion (in %) of steel of that grade that was processed during a given shift. The remaining strips were of other steel grades; their proportion is given by “grade rest.”
RTR: The run time ratio (in %), which is calculated as run time divided by operating time.
Schulze quickly realized he had data on more variables than he could employ for his analysis.
Obviously, the total number of strips in the three width clusters had to be the same as the total
number of strips in the three thickness clusters. Similarly, the proportions of the six different steel
grades always added up to 100%. Schulze also decided to omit the dimension indicator (MPT) for
his own analysis, as he now had much more detailed and reliable information about the size of the
strips.
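Schulze's observation is a case of perfect multicollinearity. The six cluster counts always satisfy thickness 1 + thickness 2 + thickness 3 = width 1 + width 2 + width 3. The six grade shares always sum to 100, which is 100 times the regression's constant column. A regression with a constant therefore cannot estimate separate coefficients for all of them. The minimal sketch below uses made-up shift records, not Schulze's data, to show both exact dependencies. Dropping width 3 and “grade rest” is an arbitrary choice for illustration.

```python
# Hypothetical shift records for illustration (not taken from Schulze's data set).
# "grades" holds the shares of grade 1..5 followed by "grade rest", in percent.
shifts = [
    {"thickness": (10, 20, 5), "width": (12, 15, 8), "grades": (5, 20, 10, 5, 0, 60)},
    {"thickness": (8, 25, 2), "width": (14, 13, 8), "grades": (0, 30, 5, 10, 5, 50)},
    {"thickness": (12, 18, 6), "width": (10, 16, 10), "grades": (10, 15, 5, 5, 5, 60)},
]

for s in shifts:
    # Weights (+1, +1, +1, -1, -1, -1) on the six cluster columns give exactly zero in
    # every row, so the six cluster columns are linearly dependent.
    cluster_combo = sum(s["thickness"]) - sum(s["width"])
    # The six grade shares always add up to 100, i.e. 100 times the constant column.
    grade_total = sum(s["grades"])
    print(cluster_combo, grade_total)  # 0 100 for every shift


def regressor_row(s: dict) -> list[float]:
    """One usable regressor row: drop width 3 and 'grade rest' to break both dependencies."""
    return [*s["thickness"], *s["width"][:2], *s["grades"][:5]]
```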
After the analysis of the aggregated and clustered data, Schulze looked at his prediction
model for delta throughput. From his experience, he knew he had found the key drivers for
deviations from the planned production volume. “Look at this equation,” he said to the production
engineer in charge of the PPL. “The model coefficients determine the outcome, which is the
deviation from planning. If we had the forecast figures for May, I could predict the deviation
based on this model. Please get the numbers of coils from the different clusters and the
proportions of the different steel grades. For the RTR, I’m guessing 86% is an appropriate
figure.”
Assignment Questions
PART A: INITIAL ANALYSIS
First, obtain an initial overview of the data. Next, plan to examine the two theories proposed
by the production engineers.
Questions:
1. Perform a univariate analysis and answer the following questions:
a. What is the average number of strips per shift?
b. Strips of which thickness cluster are the most common, and strips of which thickness
cluster are the least common?
c. What are the minimum, average, and maximum values of delta throughput and RTR?
d. Are there shifts during which the PPL processes strips of only steel grade 1, or of only
steel grade 2, etc.?
2. Can the RTR theory adequately explain the deviations from the planned production figures?
Explain why or why not.
3. Is the MPT theory sufficient to explain the deviations? Explain why or why not.
PART B: SCHULZE’S MODEL
Now interpret Schulze’s model.
Questions:
4. Develop a sound regression model that can be used to predict delta throughput based on the
characteristics of the strips scheduled for production. Include only explanatory variables whose
coefficients are significant at the 10% level.
5. Interpret the coefficient of RTR for the PPL and provide a 90% confidence interval for the
value of the coefficient (in the population).
6. A strip of thickness 1 and width 1 is replaced by a strip of thickness 3 and width 3. This
change does not affect any other aspect of the production. Provide an estimate for the change
in delta throughput.
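For question 5, the interval is coefficient ± critical value × standard error. For question 6, the predicted change is the sum of each affected coefficient times the change in its variable. Variables absent from the final model contribute nothing. The minimal sketch below uses hypothetical coefficient values, not Schulze's estimates. The normal critical value stands in for the t value, which is nearly identical with the several hundred shifts in the data set.

```python
from statistics import NormalDist


def coefficient_ci(estimate: float, std_error: float, level: float = 0.90) -> tuple[float, float]:
    """Confidence interval for a coefficient using a normal critical value.

    With several hundred observations the t critical value is very close to z;
    for small samples, substitute the t value with n - k - 1 degrees of freedom.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


def change_in_prediction(coefs: dict[str, float], changes: dict[str, float]) -> float:
    """Change in predicted delta throughput for given changes in the regressors.

    Variables not in the final model have no coefficient and contribute nothing.
    """
    return sum(coefs.get(name, 0.0) * dx for name, dx in changes.items())


# Hypothetical coefficients, for illustration only (NOT Schulze's estimates).
coefs = {"thickness 1": -4.0, "thickness 3": 6.0, "width 1": -2.5, "RTR": 9.0}

# Question 6: one strip moves from thickness 1 / width 1 to thickness 3 / width 3.
swap = {"thickness 1": -1, "thickness 3": +1, "width 1": -1, "width 3": +1}
change = change_in_prediction(coefs, swap)
print(change)  # 12.5
print(coefficient_ci(9.0, 1.5))
```

Which width or thickness cluster ends up omitted changes how the individual coefficients read. This is why the change is computed from the full list of affected variables rather than from a single coefficient.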
PART C: PREDICTION OF MAY THROUGHPUT
Two weeks after the first phone call about the deviations of production figures from planned
volumes, Schulze was happy to have a sound prediction model on hand. Now he was looking
forward to applying the model for future planning periods. The planning meeting for May was
scheduled for the next day, and the production engineers had provided the requested material-structure data that would serve as input for the model.
“Let’s see what the prediction tells us,” Schulze said to Täger. As usual, the initial plan
included an average capacity of 750 tons per shift. “I’m pretty sure the initial estimate will yield a
useful first benchmark, but we also need to look at the uncertainty in the forecast,” Schulze
continued, and he entered the data.
“All right,” Täger replied. “I can see the predicted deviation from planned production for the
next month in the model. We should show this in the planning meeting tomorrow and adjust the
line capacity for May.”
The next day, the predicted outcome was included in the monthly planning for the very first
time. A new era of production planning at ThyssenKrupp Steel Europe had begun.
Next, determine Schulze’s forecast.
Questions:
7. The table below shows the data provided by the production engineers. Because of major
upcoming maintenance on the PPL, only 84 shifts were planned for the month of May.
Provide an estimate for the average delta throughput per shift in May based on these
estimated figures. (The actual figures are, of course, still unknown.)
Table 2: Planned Production in May (units of all forecasts: numbers of strips)
Characteristic     Forecast
Thickness 1        996
Thickness 2        1,884
Thickness 3        434
Width 1            1,242
Width 2            1,191
Grade 1            109
Grade 2            709
Grade 3            167
Grade 4            243
Grade 5            121
8. Provide a 90% confidence interval for the average delta throughput per shift in May.
9. An RTR of 86% for a production facility such as the Bochum PPL is considered a good
value. A value of 90% would be considered world class. The effort to increase production
performance measured in RTR by just one percentage point, from 86% to 87%, is assumed to
be very costly. In light of your model, would you expect such a performance improvement to
pay for itself?
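Because the model is linear, the average of the 84 per-shift predictions equals the prediction at the average per-shift inputs. The May forecasts therefore only need to be divided by 84. This minimal sketch rests on three assumptions:
- Width 3 is not listed in Table 2 but follows from the requirement that width and thickness totals match.
- The grade variables are approximated by each grade's share of the strip count. This is an assumption, since the data set records grade proportions per shift.
- predict() is a hypothetical helper that expects whatever final coefficients your model produces.

```python
# Forecast strip counts for May from Table 2.
forecast = {
    "thickness 1": 996, "thickness 2": 1884, "thickness 3": 434,
    "width 1": 1242, "width 2": 1191,
    "grade 1": 109, "grade 2": 709, "grade 3": 167, "grade 4": 243, "grade 5": 121,
}
SHIFTS = 84

total_strips = forecast["thickness 1"] + forecast["thickness 2"] + forecast["thickness 3"]
# Width totals must equal thickness totals, so width 3 follows from the other two.
width_3 = total_strips - forecast["width 1"] - forecast["width 2"]

# Average strip counts per shift for the cluster variables.
per_shift = {k: v / SHIFTS for k, v in forecast.items() if not k.startswith("grade")}
per_shift["width 3"] = width_3 / SHIFTS

# Grade variables are percentages; approximated here by share of strip count (assumption).
for g in range(1, 6):
    per_shift[f"grade {g}"] = 100 * forecast[f"grade {g}"] / total_strips
per_shift["RTR"] = 86.0  # Schulze's assumed RTR for May


def predict(intercept: float, coefs: dict[str, float], x: dict[str, float]) -> float:
    """Linear-model prediction; inputs in x without a coefficient are ignored."""
    return intercept + sum(b * x[name] for name, b in coefs.items())
```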
PART D: ADDITIONAL ANALYSIS
Schulze’s prediction model led to an intensive discussion in the production-planning meeting
that provided him with much food for thought. As a result, he decided to analyze whether the
inclusion of some human or timing factors potentially could enhance his prediction model.
In the final part of the analysis, consider some enhancements to your model.
Questions:
10. Determine whether, for given production quantities, the performance of the PPL depends on
the group working each shift. Can you detect any significantly over- or under-performing
shift groups?
11. Tests and rework are regularly scheduled on early shifts during the week (but not on
weekends). Both involve interruptions and slow process speed, which are not indicated as
downtimes and are not included in the RTR. As a result, all else being equal, early shifts
during the week should process less steel than the other shifts. Can you show the presence of
this effect?
12. Provide a final critical evaluation of your prediction model. What are the key insights with
respect to production planning at the Bochum PPL? What are the weaknesses of your model?
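Questions 10 and 11 both call for indicator (dummy) variables. This minimal sketch assumes that each shift record carries its start time as a Python datetime and that exactly one of the five shift groups works each shift. The function names are hypothetical helpers, not part of the case materials. One group is left out as the baseline, so the remaining four coefficients measure performance relative to that group.

```python
from datetime import datetime


def early_weekday(shift_start: datetime) -> int:
    """1 if the shift is an early shift (starts at 6 a.m.) on Monday to Friday, else 0."""
    return int(shift_start.hour == 6 and shift_start.weekday() < 5)


def group_dummies(group: int, n_groups: int = 5, baseline: int = 1) -> dict[str, int]:
    """Binary indicators for shift groups, omitting the baseline group to avoid collinearity."""
    return {f"group {g}": int(group == g) for g in range(1, n_groups + 1) if g != baseline}


# October 1, 2013 (the first day in Schulze's data) was a Tuesday.
print(early_weekday(datetime(2013, 10, 1, 6, 0)))   # 1
print(early_weekday(datetime(2013, 10, 5, 6, 0)))   # 0 (Saturday)
print(early_weekday(datetime(2013, 10, 1, 22, 0)))  # 0 (night shift)
print(group_dummies(3))
```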
KH19, Exercises
Exercises
QUESTION 1
Unoccupied seats on flights cause airlines to lose revenues. A large airline wants to estimate its
average number of unoccupied seats per flight over the past year. To accomplish this, the records
of 225 flights are randomly selected, and the number of unoccupied seats is noted for each of the
flights in the sample. The sample mean is 14.5 seats and the sample standard deviation is s = 8.2
seats.
a) Provide a 95% confidence interval for the mean number of unoccupied seats per flight during
the past year.
b) Provide an 80% confidence interval for the mean number of unoccupied seats per flight
during the past year.
c) Can you prove, at a 2% level of significance, that the average number of unoccupied seats per
flight during the last year was smaller than 15.5?
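The sketch below covers parts a to c using only the standard library (statistics.NormalDist). It assumes the large-sample normal approximation, using z instead of t with 224 degrees of freedom. For n = 225 the t critical values are only slightly larger, so the intervals would be marginally wider. Part c is a lower-tailed test of H0: μ ≥ 15.5 against Ha: μ < 15.5.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s = 225, 14.5, 8.2
se = s / sqrt(n)  # standard error of the mean: 8.2 / 15


def mean_ci(level: float) -> tuple[float, float]:
    """Large-sample confidence interval for the mean (normal critical value)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return xbar - z * se, xbar + z * se


ci95 = mean_ci(0.95)
ci80 = mean_ci(0.80)

# Part c: lower-tailed test at alpha = 0.02; "prove" means rejecting H0.
z_stat = (xbar - 15.5) / se
p_value = NormalDist().cdf(z_stat)
print([round(v, 2) for v in ci95], [round(v, 2) for v in ci80],
      round(z_stat, 3), round(p_value, 4))
```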
QUESTION 2
During the National Football League (NFL) season, Las Vegas odds-makers establish a point
spread on each game for betting purposes. The final scores of NFL games were compared against
the final spreads established by the odds-makers ahead of the game. The difference between the
game outcome and point spread is called the point-spread error. For example, before the 2003
Super Bowl the Oakland Raiders were established as 3-point favorites over the Tampa Bay
Buccaneers. Tampa Bay won the game by 27 points and so the point-spread error was –30. (Had
the Oakland Raiders won the game by 10 points then the point-spread error would have been +7.)
In a sample of 240 NFL games the average point-spread error was –1.6. The sample standard
deviation was s = 13.3.
Can you reject the hypothesis that the true mean point-spread error for all NFL games is zero
(significance level α = 0.05)?
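The examples in the question fix the sign convention: the error is the favorite's actual winning margin minus the spread. The minimal sketch below implements that definition and a two-sided large-sample z test of H0: mean = 0. The normal critical value approximates t with 239 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist


def point_spread_error(favorite_margin: float, spread: float) -> float:
    """Favorite's actual winning margin minus the point spread (margin < 0 if the favorite lost)."""
    return favorite_margin - spread


# The 2003 Super Bowl example: Oakland favored by 3, lost by 27.
example = point_spread_error(-27, 3)      # -30
hypothetical = point_spread_error(10, 3)  # +7

n, xbar, s = 240, -1.6, 13.3
se = s / sqrt(n)
z_stat = (xbar - 0) / se  # H0: mean point-spread error = 0
p_two_sided = 2 * NormalDist().cdf(-abs(z_stat))
print(example, hypothetical, round(z_stat, 3), round(p_two_sided, 4))
```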
QUESTION 3
In a random sample of 95 manufacturing firms, 67 respondents have indicated that their company
attained ISO certification within the last two years. Find a 99% confidence interval for the
population proportion of companies that have been certified within the last two years.
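This is a minimal sketch of the large-sample interval p̂ ± z·√(p̂(1 − p̂)/n). The 99% critical value comes from the standard library's NormalDist. The normal approximation is reasonable here because both the 67 certified and the 28 non-certified firms are well above the usual rule-of-thumb counts.

```python
from math import sqrt
from statistics import NormalDist

n, x = 95, 67
p_hat = x / n
z = NormalDist().inv_cdf(0.995)  # 99% two-sided critical value
margin = z * sqrt(p_hat * (1 - p_hat) / n)
ci99 = (p_hat - margin, p_hat + margin)
print(round(p_hat, 4), [round(v, 4) for v in ci99])
```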
QUESTION 4
Of a random sample of 361 owners of small businesses that had gone into bankruptcy, 105
reported conducting no marketing studies prior to opening the business. Can you reject the null
hypothesis that at most 25% of all members of this population conducted no marketing studies
before opening the business (significance level α = 0.05)?
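The null hypothesis p ≤ 0.25 makes this an upper-tailed test. The minimal sketch below uses the standard error under the null, √(p0(1 − p0)/n), together with a normal critical value.

```python
from math import sqrt
from statistics import NormalDist

n, x, p0, alpha = 361, 105, 0.25, 0.05
p_hat = x / n
se0 = sqrt(p0 * (1 - p0) / n)  # standard error under H0
z_stat = (p_hat - p0) / se0     # Ha: p > 0.25
p_value = 1 - NormalDist().cdf(z_stat)
reject = p_value < alpha
print(round(p_hat, 4), round(z_stat, 3), round(p_value, 4), reject)
```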
QUESTION 5
Hertz contracts with Uniroyal to provide tires for Hertz’s rental car fleet. A clause in the contract
states that the tires must have a life expectancy of at least 28,000 miles. Of the 10,000 cars in
Hertz’s fleet, 400 are based in Chicago. The Chicago garage tested the tires on 60 of their cars.
The life spans of the 60 tire sets are listed in the file tires.xls. If Hertz wants to use a 1% level of
significance, should Hertz seek relief from (i.e., sue) Uniroyal? That is, can Hertz prove that the
tires did not meet the contractually agreed (average) life expectancy?
QUESTION 6
Tyler Realty would like to be able to predict the selling price of new homes. They have collected
data on size (“sqfoot” in square feet) and selling price (“price” in thousands of dollars) which are
stored in the file tyler.xls.
Download this file from the course homepage and answer the
following questions.
a) Develop a scatter diagram for these data with size on the horizontal axis using KStat. Display
the best fit line in the scatter diagram.
b) Develop an estimated regression equation. Report the KStat regression output.
c) Predict the selling price for a home that is 2,000 square feet.
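KStat produces the regression output directly. For readers who want to see what it computes, the minimal pure-Python sketch below implements the least-squares formulas b1 = Sxy/Sxx and b0 = ȳ − b1·x̄. The tyler.xls values are not reproduced in the case, so the usage lines are left as comments. They assume sqfoot and price are lists loaded from that file.

```python
def least_squares(x: list[float], y: list[float]) -> tuple[float, float]:
    """Intercept and slope of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0: float, b1: float, x_new: float) -> float:
    """Point prediction from the fitted line."""
    return b0 + b1 * x_new


# Usage with the columns from tyler.xls (not reproduced here):
# b0, b1 = least_squares(sqfoot, price)
# predict(b0, b1, 2000)  # selling price in thousands of dollars
```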
QUESTION 7
The time between eruptions of the Old Faithful geyser in Yellowstone National Park is
random but is related to the duration of the previous eruption. In order to investigate this
relationship you collect data on 21 eruptions. For each observed eruption, you write
down its duration (call it DUR) and the waiting time to the next eruption (call it TIME).
That is, your variables are:
DUR     Duration of the previous eruption (in minutes)
TIME    Time until the next eruption (in minutes)
You obtain the following regression output from KStat.
Regression: TIME
            Coefficient   std error of coef   t-ratio   p-value
Constant    31.01311      4.41658492          7.0220    0.0001%
DUR         9.79006898    1.29990618          7.5314    0.0000%
a) Write down the estimated regression equation, and verbally interpret the intercept and the
slope coefficients (in terms of geysers and eruption times).
b) The most recent eruption lasted 3 minutes. What is your best estimate for the time till
the next eruption?
c) Based on your regression, what is the difference between the average time until the next
eruption after a 3.2-minute eruption and the average time until the next eruption after
a 3-minute eruption?
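The t-ratios in the output equal coefficient ÷ standard error, which confirms how the columns line up. The minimal sketch below works through parts b and c with the reported coefficients. Part c depends only on the slope, because the intercept cancels.

```python
# Coefficients and standard errors from the KStat output for TIME.
B0, B1 = 31.01311, 9.79006898
SE0, SE1 = 4.41658492, 1.29990618

# Consistency check: t-ratio = coefficient / standard error.
print(round(B0 / SE0, 4), round(B1 / SE1, 4))


def predicted_time(duration_min: float) -> float:
    """Estimated minutes until the next eruption, given the previous eruption's duration."""
    return B0 + B1 * duration_min


print(round(predicted_time(3.0), 2))                        # part b
print(round(predicted_time(3.2) - predicted_time(3.0), 2))  # part c: 0.2 * B1
```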