Chapter 5. Regression
STAT 145

Problem 1. Body weight and pack weight for a group of hikers.

Body weight (lb)       120  187  109  103  131  165  158  116
Backpack weight (lb)    26   30   26   24   29   35   31   28

In this case, body weight is x (the explanatory variable) and backpack weight is y (the response variable).

We'll use the CrunchIt tool in StatsPortal to obtain graphs and quick calculations of the slope, intercept, r, and other statistics: click the "CrunchIt" icon to the right of Problems in the e-Book. You can use CrunchIt for home practice and for StatsPortal LearningCurve tasks and quizzes, but on the exam you will need to do the calculations with an ordinary calculator and show the steps of your calculations. You can also use CrunchIt with external data (not only data given in the book). StatsPortal has CrunchIt Help Videos; you can find them via the search window at the top right.

From CrunchIt:

Fitted Equation: Backpack = 16.26 + 0.09080 * Body

              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   16.26      3.937        4.131     0.006137
Body          0.09080    0.02831      3.207     0.01844

estimated sigma: 2.270

Dashed red regression line: after removing (187, 30).
Dashed blue regression line: after removing (165, 35).
Dashed gray regression line: after removing both points.

The outliers are influential for the correlation and for the least-squares regression, because removing either of them moves the regression line considerably (red and blue lines). If an outlier does not lie close to the line calculated from the other observations, it will be influential.

Problem 2. In the early part of the 20th century it was noticed that, when viewed over time, the number of crimes increased with the number of deaths from cancer.
Suggest a lurking variable and explain why it is the most likely explanation for this association.

Problem 3. A researcher wants to determine whether the rate of water flow (in liters per second) over an experimental soil bed can be used to predict the amount of soil washed away (in kilograms). The researcher measures the amount of soil washed away for various flow rates, and from these data calculates the least-squares regression line to be

amount of eroded soil = 0.4 + 1.3 × (flow rate).

What can you say about the correlation?

Problem 4. The equation of the least-squares regression line is ŷ = 201.2 + 24.02 * x, and r² = 0.7682. What is the correlation coefficient?
(a) 0.8765
(b) −0.8765
(c) 24.02
(d) 1 / 24.02

Problem 5. (from HW) 5.4 Do heavier people burn more energy? We have data on the lean body mass and resting metabolic rate for 12 women who are subjects in a study of dieting. Lean body mass, given in kilograms, is a person's weight leaving out all fat. Metabolic rate, in calories burned per 24 hours, is the rate at which the body consumes energy.
(a) Make a scatterplot that shows how metabolic rate depends on body mass. There is a quite strong linear relationship, with correlation r = 0.876.
(b) Find the least-squares regression line for predicting metabolic rate from body mass. Add this line to your scatterplot.
(c) Explain in words what the slope of the regression line tells us.
(d) Another woman has a lean body mass of 45 kilograms. What is her predicted metabolic rate?

(a), (b)

Fitted Equation: Rate = 201.2 + 24.03 * Mass

              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   201.2      181.7        1.107     0.2942
Mass          24.03      4.174        5.756     0.0001836

estimated sigma: 95.08

ŷ = 201.2 + 24.02 * x

OR we can find the coefficients a and b from cor(Mass, Rate) = 0.8765 and the summary statistics:

        Sample Mean   Standard Deviation
Mass    43.03         6.868
Rate    1235          188.3
b = r * s_y / s_x = 0.8765 * 188.3 / 6.868 = 24.03

a = ȳ − b * x̄ = 1235 − 24.03 * 43.03 = 200.99

then ŷ = 200.99 + 24.03 * x

(c) The slope tells us that, on average, metabolic rate increases by about 24 calories per day for each additional kilogram of lean body mass.

(d) For x = 45 kg: ŷ = 201.2 + 24.02 * 45 = 1282.1 calories per day.

Problem 6. Data show that married men, and also divorced or widowed men, earn quite a bit more than men who have never been married. Does this mean that getting married can increase your salary? What lurking variables might explain the association?

Problem 7. Dan has been saving money each week in a box under his bed. The equation that predicts how much money he has is ŷ = 20 + 4x, where x is the number of weeks he has been adding money to his box. This equation tells us that he started with $____.

Problem 8. Based on the scatterplot: which can be negative in a least-squares regression, the slope or the intercept?
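
Checking the calculations in Problems 1 and 5. The handout relies on CrunchIt and a hand calculator; the short Python sketch below is only an illustrative cross-check, not the CrunchIt procedure. The function names least_squares and line_from_summary are made up for this example. The first function fits the line from the raw hiker data of Problem 1; the second applies b = r * s_y / s_x and a = ȳ − b * x̄ to the summary statistics of Problem 5.

```python
from math import sqrt


def least_squares(x, y):
    """Return (intercept a, slope b, correlation r) for the line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squared deviations and cross-products about the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / sqrt(sxx * syy)
    return a, b, r


def line_from_summary(r, x_bar, s_x, y_bar, s_y):
    """Return (intercept a, slope b) using b = r*s_y/s_x and a = y_bar - b*x_bar."""
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


# Problem 1: body weight (x) and backpack weight (y), in pounds.
body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]
a1, b1, r1 = least_squares(body, pack)
print(f"Problem 1: Backpack = {a1:.2f} + {b1:.5f} * Body, r = {r1:.3f}")

# Problem 5: summary statistics for lean body mass (x) and metabolic rate (y).
a5, b5 = line_from_summary(r=0.8765, x_bar=43.03, s_x=6.868, y_bar=1235, s_y=188.3)
print(f"Problem 5: Rate = {a5:.2f} + {b5:.2f} * Mass")
```

Because the sketch does not round the slope before computing the intercept, the Problem 5 intercept it prints can differ in the second decimal from the hand value 200.99, which was obtained with b already rounded to 24.03.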