Chapter 5. Regression
STAT 145
Problem 1.
Body weight and pack weight for a group of hikers.
Body weight (lb)       120  187  109  103  131  165  158  116
Backpack weight (lb)    26   30   26   24   29   35   31   28
In this case, body weight is x (the explanatory variable) and backpack weight is y (the response variable).
We'll use the CrunchIt tool in StatsPortal to obtain graphs and quick calculations of the slope,
intercept, r, and other statistics: click the “CrunchIt” icon to the right of Problems in
the e-Book.
You can use “CrunchIt” for home practice and for StatsPortal LearningCurve tasks and quizzes, but on the exam
you'll need to do the calculations with an ordinary calculator and show the steps of your calculations.
You can also use “CrunchIt” with external data, not only with data given in the book.
StatsPortal has CrunchIt! Help Videos. You can find them via the search window at the top right.
From “CrunchIt”:
Fitted Equation: Backpack = 16.26 + 0.09080 * Body
              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   16.26      3.937        4.131     0.006137
Body          0.09080    0.02831      3.207     0.01844
estimated sigma: 2.270
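The slope and intercept reported by CrunchIt can also be checked directly. The following is a minimal Python sketch, not CrunchIt's own procedure; the function name `least_squares` is an illustrative assumption. It applies the usual least-squares formulas, b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄, to the hiker data.

```python
def least_squares(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    return a, b


body = [120, 187, 109, 103, 131, 165, 158, 116]   # x, body weight (lb)
backpack = [26, 30, 26, 24, 29, 35, 31, 28]       # y, backpack weight (lb)

a, b = least_squares(body, backpack)
print(f"Backpack = {a:.2f} + {b:.5f} * Body")
```

The printed coefficients should agree with the fitted equation above up to rounding.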
[Figure: scatterplot of backpack weight (lb) against body weight (lb) with fitted regression lines]
Dashed red regression line: after removing (187, 30).
Dashed blue regression line: after removing (165, 35).
Dashed gray regression line: after removing both points.
Both outliers are influential for the correlation and for the least-squares regression line, because removing
either of them moves the regression line considerably (red and blue lines).
An outlier is influential when it does not lie close to the line calculated from the other observations.
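One way to see this influence numerically is to refit the line with each point left out and compare the slopes. The sketch below is illustrative only; the helper name `fit_line` and the variable names are assumptions, and the data are the hiker measurements from Problem 1.

```python
def fit_line(points):
    """Return (intercept, slope) of the least-squares line through (x, y) points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


hikers = [(120, 26), (187, 30), (109, 26), (103, 24),
          (131, 29), (165, 35), (158, 31), (116, 28)]

fits = {
    "all points": fit_line(hikers),
    "without (187, 30)": fit_line([p for p in hikers if p != (187, 30)]),
    "without (165, 35)": fit_line([p for p in hikers if p != (165, 35)]),
    "without both": fit_line([p for p in hikers
                              if p not in [(187, 30), (165, 35)]]),
}

for label, (intercept, slope) in fits.items():
    print(f"{label:18s} intercept = {intercept:6.2f}, slope = {slope:.4f}")
```

Dropping (187, 30) makes the fitted line steeper, while dropping (165, 35) makes it flatter, which is the behavior the red and blue lines illustrate.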
Problem 2.
In the early part of the 20th century it was noticed that, when viewed over time, the number of crimes increased
with the number of deaths from cancer. Suggest a lurking variable and explain why it is the most likely
explanation for this.
Problem 3.
A researcher wants to determine whether the rate of water flow (in liters per second) over an experimental soil
bed can be used to predict the amount of soil washed away (in kilograms). The researcher measures the amount
of soil washed away for various flow rates, and from these data calculates the least-squares regression line to be
amount of eroded soil = 0.4 + 1.3 × (flow rate).
What can you say about the correlation?
Problem 4.
The equation of the least-squares regression line is ŷ = 201.2 + 24.02x, and r² = 0.7682.
What is the correlation coefficient?
(a) 0.8765
(b) −0.8765
(c) 24.02
(d) 1 / 24.02
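The link between r and r² can be sketched in a few lines. Because the slope is b = r × (s_y / s_x) and both standard deviations are positive, r always has the same sign as the slope, and its magnitude is the square root of r². The variable names below are illustrative assumptions.

```python
import math

slope = 24.02        # slope of the least-squares line given in the problem
r_squared = 0.7682   # square of the correlation given in the problem

# r takes its magnitude from sqrt(r^2) and its sign from the slope
r = math.copysign(math.sqrt(r_squared), slope)
print(round(r, 4))
```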
Problem 5. (from HW)
5.4 Do heavier people burn more energy? We have data on the lean body mass and resting
metabolic rate for 12 women who are subjects in a study of dieting. Lean body mass, given in kilograms,
is a person’s weight leaving out all fat. Metabolic rate, in calories burned per 24 hours, is the rate at which
the body consumes energy.
(a) Make a scatterplot that shows how metabolic rate depends on body mass. There is a quite strong
linear relationship, with correlation r = 0.876.
(b) Find the least-squares regression line for predicting metabolic rate from body mass. Add this line to
your scatterplot.
(c) Explain in words what the slope of the regression line tells us.
(d) Another woman has a lean body mass of 45 kilograms. What is her predicted metabolic rate?
(a) [Figure: scatterplot of metabolic rate against lean body mass]
(b)
Fitted Equation: Rate = 201.2 + 24.03 * Mass
              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   201.2      181.7        1.107     0.2942
Mass          24.03      4.174        5.756     0.0001836
estimated sigma: 95.08
ŷ = 201.2 + 24.02x
Or we can find the coefficients a and b from the correlation
cor(Mass, Rate): 0.8765
and the sample means and standard deviations:
        Sample Mean   Standard Deviation
Mass    43.03         6.868
Rate    1235          188.3
b = r × (s_y / s_x) = 0.8765 × (188.3 / 6.868) = 24.03
a = ȳ − b × x̄ = 1235 − 24.03 × 43.03 = 200.99
then
ŷ = 200.99 + 24.03x
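The same calculation from summary statistics can be written as a short Python sketch. The variable names are illustrative assumptions, and the slope is rounded to two decimals before computing the intercept so that the steps mirror the hand calculation above.

```python
r = 0.8765                           # correlation between Mass and Rate
mean_mass, sd_mass = 43.03, 6.868    # x: lean body mass (kg)
mean_rate, sd_rate = 1235, 188.3     # y: metabolic rate (calories per day)

# Slope: b = r * s_y / s_x, rounded to two decimals as in the hand calculation
b = round(r * sd_rate / sd_mass, 2)

# Intercept: a = y_bar - b * x_bar
a = mean_rate - b * mean_mass

print(f"b = {b:.2f}, a = {a:.2f}")
```

The small difference between this intercept and the software value of 201.2 comes from working with rounded summary statistics.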
(c) The slope tells us that, on average, metabolic rate increases by about 24 calories per day for each
additional kilogram of lean body mass.
(d) For x = 45 kg:
ŷ = 201.2 + 24.02 × 45 = 1282.1 calories per day.
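Prediction is just a matter of plugging the new x into the fitted line. A minimal sketch, assuming the coefficients 201.2 and 24.02 used in the calculation above; the function name `predict_rate` is hypothetical.

```python
def predict_rate(mass_kg, intercept=201.2, slope=24.02):
    """Predicted metabolic rate (calories per day) for a lean body mass in kg."""
    return intercept + slope * mass_kg


prediction = predict_rate(45)
print(f"{prediction:.1f} calories per day")
```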
Problem 6.
Data show that married men, and also divorced or widowed men, earn quite a bit more than men who
have never been married. Does this mean that getting married can increase your salary?
What lurking variables might explain the association?
Problem 7.
Dan has been saving money each week in a box under his bed. The equation that predicts how much money he
has is ŷ = 20 + 4x, where x is the number of weeks he has been adding money to the box. This equation tells us
that he started with $____.
Problem 8.
Based on the scatterplot:
What can be negative in a least-squares regression: the slope or the intercept?