Review for the Final Exam STAT E-150 Statistical Methods
The final exam will be on Wednesday, May 15, during our regular class time. You will have two hours to complete the exam. Please arrive on time and take alternating seats so that you have room for your materials.

The exam is open book: you may use your own notes, handouts, homework, textbook, etc. Basically, you may use any of your own materials from this course, from this semester.

Topics will include: simple linear regression, multiple regression, one-way ANOVA, two-way ANOVA, repeated measures, logistic regression, multiple logistic regression, experimental design, and nonparametric tests. No specific topics from the first half of the course will be tested, except as they relate to the topics listed here.

What to expect
• Multiple choice, matching, and short answer questions, as there were on the midterm
• Using SPSS output instead of calculations
• Interpreting graphs
You’ll be tested on what you’ve seen (assuming you’ve done your homework …)

If you would like to have your exam returned to you, please bring a large self-addressed envelope to the exam, with sufficient postage attached (not affixed). If you forget to bring an envelope to the exam, you can send one to me at Pine Manor College, 400 Heath St, Chestnut Hill, MA 02467.

WELL BEFORE THE EXAM
Organize the course handouts, your homework, homework solutions, and any section materials from the class website in a 3-ring binder, tabbed for quick reference. Materials from before the Midterm could be helpful as well.
Make a few pages of your own key notes, such as particular formulas, definitions or concepts. Consider making your own guides to hypothesis tests, such as the example in this document.
Review the class notes and homework solutions, concentrating on the concepts you are least sure of. Be sure that you have taken the Practice Final using the materials (calculator, notes, section handouts...)
that you will use for the exam.
Attend some or all of these sections:
Fri. 5/10   53 Church St., Rm 201   5:30 - 6:30   Kela
Mon. 5/13   53 Church St., Rm 202   5:30 - 6:30   Kela
Wed. 5/15   Harvard Hall, Rm 201    4:30 - 5:20   Stephanie

BEFORE THE EXAM
Organize all materials, notebooks, textbook, section handouts, etc. Only your own materials from this course, this semester, are permitted.
Be sure you have your calculator, and possibly an extra one. You may not use a graphing calculator for statistics functions, and you may not use a cell phone or PDA calculator.
When you are ready to begin, RELAX! Continue to think positive thoughts about the outcome of the exam; research has shown that this technique contributes to better scores (Meichenbaum, 1996).

DURING THE EXAM
Read questions carefully and thoroughly.
Pace yourself, and keep track of your progress and the clock. Don’t get bogged down. Consider noting any difficult questions and coming back later.
Think carefully about the appropriate analysis:
• Hypothesis test or confidence interval?
• Is the response variable quantitative or categorical?
• How many treatments?
• Which is the coefficient of interest?
• Are you concerned with means or relationships between variables?
When you are finished, go back to check that you haven’t skipped any questions.

AFTER THE EXAM
CONGRATULATIONS!! IT’S TIME TO CELEBRATE!!!

SAMPLE HYPOTHESIS TEST GUIDE - create your own for each type of test
Multiple Regression
Test 1: Overall significance of multiple regression model; use an F-test for the model (ANOVA).
H0: β1 = β2 = … = βk = 0
Ha: The slopes are not all zero
Test 2: Specific significance of single coefficient; use a t-test for each coefficient.
H0: βj = 0
Ha: βj ≠ 0

Review Questions
For each question choose the best method of analysis and write the appropriate hypotheses.
Choose from the statistical methods we have discussed this semester:
• Simple linear regression
• Multiple regression
• Logistic regression
• Multiple logistic regression
• One-way ANOVA
• Two-way ANOVA
• Repeated measures ANOVA

1. A survey asked subjects to report their political ideology, measured with seven categories in which 1 = extremely liberal, 4 = moderate, and 7 = extremely conservative. The subjects also reported their gender (male, female) and their level of education (no college, some college, college graduate). The data was used to investigate any differences in the political ideologies of these groups.
What is the best method of analysis?
Two-way ANOVA
What would be your hypotheses?
H0: μmn = μfn = μms = μfs = μmg = μfg
Ha: the means are not all equal

2. A survey asked subjects to report their political ideology, measured with seven categories in which 1 = extremely liberal, 4 = moderate, and 7 = extremely conservative.
The subjects also reported their gender (male, female) and their level of education (no college, some college, college graduate). The data was used to see if there were differences in the political ideologies of people with different levels of education.
What is the best method of analysis?
One-way ANOVA
What would be your hypotheses?
H0: μnone = μsome = μgrad
Ha: the means are not all equal

3. In a survey related to Jeb Bush’s possible candidacy for President, subjects were asked for their annual income and whether they would vote for Bush, to see if there is any relationship between income and interest in voting for Bush.
What is the best method of analysis?
Logistic regression
What would be your hypotheses?
H0: β1 = 0
Ha: β1 ≠ 0

4. Many women give birth to more than one child. In research on the birthweights of children, data was gathered on the birthweights of children born to six different mothers. The data looked like this:

Birthweights
Mother   Child 1   Child 2   Child 3   Child 4
1        6.4       6.9       6.7       7.1
2        8.5       7.8       7.8       8.3
:        :         :         :         :
6        7.0       7.8       8.6       6.6

What is the best method of analysis?
Repeated measures ANOVA
What would be your hypotheses?
H0: μ1 = μ2 = μ3 = μ4
Ha: the means are not all equal

5. An investigator for the state police wants to determine the effectiveness of three different defensive driving programs and to see whether there are gender differences. Five subjects of each gender who recently received speeding tickets are assigned to each program.
At the end of the program each is given a written test on his or her knowledge of defensive driving. The scores (out of 100) are given here:

Scores
Gender   One 8-hour session   Two 4-hour sessions   Two 2-hour sessions
Female   88 92 98 99 91       87 92 91 94 93        80 82 79 86 88
Male     89 96 95 90 96       95 87 90 91 92        77 78 83 78 78

Use the SPSS results shown to answer these questions:

1. Is there interaction between program and gender?
a. No, because the p-value is close to zero.
b. Yes, because .196 is greater than .05
c. No, because .196 is greater than .05
d. Yes, because .374 is greater than .05
e. No, because .374 is greater than .05

Tests of Between-Subjects Effects
Dependent Variable: score
Source              Type III Sum of Squares   df   Mean Square   F           Sig.
Corrected Model     935.500a                  5    187.100       15.923      .000
Intercept           234967.500                1    234967.500    19997.234   .000
gender              20.833                    1    20.833        1.773       .196
sessions            890.600                   2    445.300       37.898      .000
gender * sessions   24.067                    2    12.033        1.024       .374
Error               282.000                   24   11.750
Total               236185.000                30
Corrected Total     1217.500                  29
a. R Squared = .768 (Adjusted R Squared = .720)

Answer: e. The gender * sessions row has Sig. = .374, which is greater than .05, so there is no significant interaction.

2. How does the interaction plot support your results in the previous question?
a. There is evidence of interaction because for part of the plot the lines appear to be parallel.
b.
There is no evidence of interaction because for part of the plot the lines are not parallel.
c. There is evidence of interaction because the lines do not intersect.
d. There is no evidence of interaction because the lines do not intersect.
e. None of the above.

3. Is there a significant difference in the mean scores for the three types of sessions?
a. Yes, because F = 1.773 and p is large.
b. No, because F = 1.773 and p is large.
c. Yes, because F = 37.898 and p is close to 0.
d. No, because F = 37.898 and p is close to 0.
e. None of the above.

Tests of Between-Subjects Effects
Dependent Variable: score
Source              Type III Sum of Squares   df   Mean Square   F           Sig.
Corrected Model     935.500a                  5    187.100       15.923      .000
Intercept           234967.500                1    234967.500    19997.234   .000
gender              20.833                    1    20.833        1.773       .196
sessions            890.600                   2    445.300       37.898      .000
gender * sessions   24.067                    2    12.033        1.024       .374
Error               282.000                   24   11.750
Total               236185.000                30
Corrected Total     1217.500                  29
a. R Squared = .768 (Adjusted R Squared = .720)

Answer: c. The sessions row has F = 37.898 and Sig. = .000.

4. Is there a significant difference in the scores by gender?
Yes/No, because F = 1.773 and the p-value is large/small.

Answer: No, because F = 1.773 and the p-value is large (Sig. = .196).
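
The sums of squares, mean squares, and F ratios in the SPSS table can be checked by hand from the raw scores. The sketch below is one way to do that in plain Python, not the SPSS procedure itself. It assumes a balanced design (the same number of scores in every cell), in which case the simple decomposition used here agrees with the Type III sums of squares. The function name `two_way_anova` is made up for this illustration, and the assignment of scores to cells follows the reading of the score table in which the first three groups of five are Female and the last three are Male. The Sig. column is not computed, because p-values need the F distribution, which the Python standard library does not provide.

```python
# Scores from the defensive driving example: (gender, program) -> five scores.
# The cell assignment is an assumption based on the layout of the score table.
scores = {
    ("female", "one 8-hour"): [88, 92, 98, 99, 91],
    ("female", "two 4-hour"): [87, 92, 91, 94, 93],
    ("female", "two 2-hour"): [80, 82, 79, 86, 88],
    ("male", "one 8-hour"): [89, 96, 95, 90, 96],
    ("male", "two 4-hour"): [95, 87, 90, 91, 92],
    ("male", "two 2-hour"): [77, 78, 83, 78, 78],
}


def two_way_anova(cells):
    """Two-way ANOVA with interaction for a balanced design.

    Returns {source: (sum_of_squares, df, mean_square, F)}; F is None for error.
    """
    values = [y for ys in cells.values() for y in ys]
    n_total = len(values)
    correction = sum(values) ** 2 / n_total  # SPSS calls this the Intercept SS

    def ss_between(groups):
        # Sum of (group total)^2 / (group size), minus the correction term.
        return sum(sum(g) ** 2 / len(g) for g in groups) - correction

    def pooled(position):
        # Pool the cells that share a level of the factor at this key position.
        groups = {}
        for key, ys in cells.items():
            groups.setdefault(key[position], []).extend(ys)
        return list(groups.values())

    by_gender, by_sessions = pooled(0), pooled(1)
    ss_gender = ss_between(by_gender)
    ss_sessions = ss_between(by_sessions)
    ss_model = ss_between(list(cells.values()))  # Corrected Model
    ss_interaction = ss_model - ss_gender - ss_sessions
    ss_total = sum(y * y for y in values) - correction  # Corrected Total
    ss_error = ss_total - ss_model

    df_gender = len(by_gender) - 1
    df_sessions = len(by_sessions) - 1
    df_interaction = df_gender * df_sessions
    df_error = n_total - len(cells)
    ms_error = ss_error / df_error

    def row(ss, df):
        # Mean square is SS / df; F is the mean square over the error mean square.
        return (ss, df, ss / df, (ss / df) / ms_error)

    return {
        "gender": row(ss_gender, df_gender),
        "sessions": row(ss_sessions, df_sessions),
        "gender * sessions": row(ss_interaction, df_interaction),
        "error": (ss_error, df_error, ms_error, None),
    }


table = two_way_anova(scores)
for source, (ss, df, ms, f) in table.items():
    f_text = "" if f is None else f"{f:10.3f}"
    print(f"{source:<18}{ss:12.3f}{df:4d}{ms:10.3f}{f_text}")
```

Comparing the printed rows with the gender, sessions, gender * sessions, and Error rows of the SPSS output is a quick way to practice reading the table: each F is the mean square for that source divided by the error mean square.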