AP Stats – Chap 27 Inferences for Regression

Finally, we’re interested in examining how slopes of regression lines vary from sample to
sample. Each sample will have its own slope, b1. These are all estimates of the “true”
slope, β1. When each sample slope is standardized as (b1 – β1) / SE(b1), the results
follow a t-model with (n – 2) degrees of freedom.
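The idea that slopes vary from sample to sample can be sketched with a small simulation. This is only an illustration: the population line (y = 2 + 0.5x), the error spread, the sample size, and the seed are all made-up assumptions, not values from these notes.

```python
import random
import statistics

def sample_slope(n, beta0, beta1, sigma, rng):
    """Draw one random sample of size n and return its least-squares slope b1."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical population: true slope beta1 = 0.5, errors with sd 1.
rng = random.Random(27)
slopes = [sample_slope(20, 2.0, 0.5, 1.0, rng) for _ in range(2000)]

mean_slope = statistics.fmean(slopes)  # should land close to the true slope
sd_slope = statistics.stdev(slopes)    # how much b1 varies from sample to sample
print(f"mean of sample slopes: {mean_slope:.3f}")
print(f"sd of sample slopes:   {sd_slope:.3f}")
```

Each run of sample_slope plays the role of one sample with its own b1; the collection of 2000 slopes centers near the assumed β1.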
Things you will need to recall about regression from Chapters 7-9:
how do we make a scatterplot?
what do we look for in a scatterplot?
how do we find the correlation?
what does it mean?
what does R² mean?
how do we create a linear model (regression equation)?
which variable goes where in the equation, and which gets the hat on it?
how do we know if a linear model is appropriate?
how do we create a residuals plot?
what does the slope mean…in context?
what does the y-intercept mean…in context?
Regression Slope t-Test
HYPOTHESIS
null: ...within the population, there is no association between the variables (that
we see in the example).
…that the ideal regression line is plain, boring, has a β1 = 0 (horizontal line).
alternative: …there is some relationship between the variables.
…the β1 ≠ 0
MODEL
conditions must be checked in order
Straight Enough Condition – is the scatterplot of the original data
straight enough? check the residuals plot! you may need to re-express.
Independence Condition – this is nearly impossible to check, so check
for Randomization. often, the fact that the individuals are a representative
sample of the population is the best that can be done.
Does the Plot Thicken? Condition – the spread of the data around
the regression “line” should be nearly constant. no fan shape! no growing
or shrinking tendencies. again…look at residuals plot!
Nearly Normal Condition – make a histogram of the residuals. it
needs to be symmetric and unimodal enough.
If all four conditions are true, the ideal regression line would look like:
“With the conditions having been met, we can use a regression model for the
distribution and a linear regression t-test.”
MECHANICS
if you have the individual data:
• enter data into L1 and L2
• STAT
• TESTS
• LinRegTTest
o Xlist: L1
o Ylist: L2
o Freq:1
o choose the two-tailed (≠) option
o RegEQ:
o CALCULATE
if you have a computer regression analysis:
• if the t-value is not given in the analysis, you’ll need to calculate t = b1 / SE ( b1 )
draw a t-curve and shade it
list the p-value
list the regression equation (found in Y1)
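For checking calculator output by hand, the mechanics can be sketched in Python. This is a rough sketch, not the calculator's actual routine: the function names are made up, the P-value is approximated by numerically integrating the t-model curve, and the five data points are invented for illustration only.

```python
import math

def t_pdf(t, df):
    """Height of the t-model curve with df degrees of freedom at t."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Area in both tails beyond |t|, using Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    middle = 2 * total * h / 3  # area between -|t| and +|t|
    return max(0.0, 1 - middle)

def slope_t_test(xs, ys):
    """Return b0, b1, SE(b1), t, df, and the two-tailed P-value for H0: beta1 = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2
    s_e = math.sqrt(sse / df)      # standard deviation of the residuals
    se_b1 = s_e / math.sqrt(sxx)   # standard error of the slope
    t = b1 / se_b1                 # t = (b1 - 0) / SE(b1); fails if the fit is perfect
    return b0, b1, se_b1, t, df, two_tailed_p(t, df)

# Made-up data for illustration only (not from any example in these notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1, se_b1, t, df, p = slope_t_test(xs, ys)
print(f"y-hat = {b0:.3f} + {b1:.3f}x")
print(f"t = {t:.3f}, df = {df}, P-value = {p:.3f}")
```

The same t = b1 / SE(b1) step applies when only a computer regression table is available: plug the reported slope and its standard error into two_tailed_p with df = n – 2.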
CONCLUSION
reject / fail to reject
“There is evidence that…” (provide context!)
confidence interval:
• if you have a TI-84/84+:
o STAT
o TESTS
o LinRegTInt
• if you don’t have a TI-84/84+:
o find the t* value as before. (here, the df = n – 2)
o interval is b1 ± ( t* ) SE ( b1 )
“We are ___% confident that the average __(dependent variable)__
increases/decreases/rises/falls/faster/slower/etc. between _(low)_ and _(high)_
__(units)__ for each additional __(one unit of the independent variable)__.”
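Without a calculator interval function, the interval b1 ± ( t* ) SE ( b1 ) can be sketched in Python as follows. This is a rough sketch under stated assumptions: the helper names are made up, t* is found by bisection on a numerically integrated t-model (searching only up to t = 100) rather than read from a table, and the slope and standard error in the example are invented values, not results from these notes.

```python
import math

def t_pdf(t, df):
    """Height of the t-model curve with df degrees of freedom at t."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + t * t / df) ** (-(df + 1) / 2)

def central_area(t, df, steps=2000):
    """Area under the t-model between -t and +t (Simpson's rule)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def t_star(confidence, df):
    """Critical value t* whose central area equals the confidence level."""
    lo, hi = 0.0, 100.0  # assumes t* is below 100, true for typical table values
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_area(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def slope_interval(b1, se_b1, n, confidence=0.95):
    """Confidence interval for the slope: b1 ± t* · SE(b1), with df = n - 2."""
    df = n - 2
    margin = t_star(confidence, df) * se_b1
    return b1 - margin, b1 + margin

# Hypothetical regression output: b1 = -3.5, SE(b1) = 1.2, n = 24 (so df = 22).
low, high = slope_interval(b1=-3.5, se_b1=1.2, n=24)
print(f"95% CI for the slope: ({low:.3f}, {high:.3f})")
```

The two endpoints are what go into the _(low)_ and _(high)_ blanks of the conclusion sentence, with the context and units supplied by the problem.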
Example #1
High Stakes Test
New state requirements force students to take a “high stakes” math
test in order to graduate from high school. Faced with such a
pressure-laden situation, many students become very nervous, which
may interfere with their ability to perform well. Concerned about “test
anxiety,” a researcher enlists 24 student volunteers for a study.
A psychologist interviews them before the math test, assessing their
anxiety levels on a scale from 1 to 10. The table shows the anxiety
levels and exam scores.
1. Sketch a scatterplot.
2. Does there appear to be an association between anxiety level and
test score? Describe what you see in the scatterplot.
3. Find the correlation. What does it indicate?
4. Interpret the R² in context.
5. Create the linear model.
6. Is this linear model appropriate? Sketch and discuss the residuals plot.
7. Interpret the slope of this line in context.
8. Interpret the y-intercept of this line in context.
9. Is there evidence of an association between anxiety levels and student performance?
(Perform a test.)
10. Provide a 95% confidence interval.
Example #2
Electricity Usage
Investigate the association between average monthly temperature ( °F ) and electrical usage
(kilowatt hours) for a home.
Original data – avg temp (x) v. kwh (y)
Residual plot – avg temp (x) v. residuals (y)
Histogram of residuals
Is there evidence of an association between average monthly temperature and electrical usage?
Explain the association using a 95% confidence interval.
Example #3
GPAs
Ten students in a graduate program were randomly selected. Their grade point averages
(GPAs) when they entered the program were between 3.5 and 4.0. The students’ GPAs on
entering the program and their current GPAs were recorded. Use the regression analysis below
to answer the questions.
1. Create the linear model.
2. Interpret the p-value.
3. Find a 95% confidence interval for the slope of the regression line.
Example #4
Heights and Weights
Is the height of a man related to his weight?
The regression analysis from a sample of 26
men is shown.
1. How many degrees of freedom are there?
2. What is the t-value?
3. Find a 98% confidence interval for the slope of the regression line.