Lab 8: Inference on two samples (19.5 pts. + 2...
STAT 350
Class: April 2, 2014
Due: April 7, 2014

Lab 8: Inference on two samples (19.5 pts. + 2 pts. BONUS)

Purposes:
1) Inference for 2-sample independent
2) Inference for 2-sample paired

Remember:
a) Please put your name, my name, section number (class time) and lab # on the front of the lab.
b) Label each part and put them in logical order.
c) ALWAYS include your MiniTab procedure for each problem unless stated otherwise.
d) Only include the relevant MiniTab output (DO NOT SPAM ME WITH OUTPUT!)

1) Inference for 2-sample independent

You can enter the data either in one column, as shown below, or in two columns by using the appropriate option.

Example 7.14 (Data file: 7.wheat):

Price    Month
6.8250   July
7.3025   July
7.0275   July
7.0825   July
7.3000   July
7.3325   September
7.5575   September
7.3125   September
7.3600   September
7.5550   September

Stat → Basic Statistics → 2-Sample t → Samples in one column (Samples: Price (the values), Subscripts: Month (differentiates the cases)) → Options (Put in the appropriate Confidence level, Test difference (), Alternative; the defaults are appropriate for this example)

For pooled variance, check ‘Assume equal variances’.

Output from Session Window: see the Two-Sample T-Test and CI output below, after the plotting procedure.

For this situation, each of the samples needs to be normal.
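
As a cross-check on the 2-Sample t procedure above, the unpooled test statistic can be recomputed by hand from the Example 7.14 prices. The following is a minimal Python sketch and is not part of the required MiniTab work. The function name `welch_t` is hypothetical, the value 2.447 is the standard t-table critical value for 6 degrees of freedom at 95% confidence, and rounding the fractional degrees of freedom down to 6 is an assumption chosen to match the DF reported by MiniTab.

```python
import math
from statistics import mean, variance

# Prices from Example 7.14 (Data file: 7.wheat), split by Month.
july = [6.8250, 7.3025, 7.0275, 7.0825, 7.3000]
september = [7.3325, 7.5575, 7.3125, 7.3600, 7.5550]

def welch_t(x, y):
    """Return (difference, SE, t statistic, df) for mean(x) - mean(y), unpooled."""
    vx = variance(x) / len(x)  # sample variance of x divided by n_x
    vy = variance(y) / len(y)  # sample variance of y divided by n_y
    diff = mean(x) - mean(y)
    se = math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return diff, se, diff / se, df

diff, se, t_stat, df = welch_t(july, september)

# Assumption: t-table critical value for df = 6, 95% two-sided confidence.
T_CRIT_DF6 = 2.447
ci = (diff - T_CRIT_DF6 * se, diff + T_CRIT_DF6 * se)

print(f"Estimate for difference: {diff:.3f}")
print(f"T-Value = {t_stat:.2f}, DF = {math.floor(df)}")
print(f"95% CI for difference: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The difference is taken as July minus September, the same direction MiniTab uses when July is the first group, so the estimate, T-Value, and interval can be compared directly with the Session Window output.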
To check normality, the following procedure is used to generate the histogram and probability plots for each month (I did not include the output):

Histogram: Graph → Histogram → With Fit → (Graph variables: Price) → Multiple Graphs → By variables with groups on separate graphs: Month → Data View → Smoother → Check Lowess

Normal probability plot: Graph → Probability Plot → Single → Graph variables: Price → Distribution → Data Display (uncheck Show confidence interval) → Scale → Y-Scale Type → Score → Multiple Graphs → By variables with groups on separate graphs: Month

Stat 350 Lab 8 MiniTab

Two-Sample T-Test and CI: Price, Month

Two-sample T for Price

Month       N   Mean   StDev  SE Mean
July        5   7.107  0.201    0.090
September   5   7.423  0.122    0.055

Difference = mu (July) - mu (September)
Estimate for difference:  -0.316
95% CI for difference:  (-0.574, -0.058)
T-Test of difference = 0 (vs not =): T-Value = -3.00  P-Value = 0.024  DF = 6

Note: the signs are opposite those in the book's example (the difference here is July - September) because I was matching the SAS output.

2) Inference for 2-sample paired

The example output is taken from Example 7.7 (Data file: 7.french). In addition, this example uses a one-tailed alternative hypothesis. When you perform a directional hypothesis test, be sure you know which variable is which so that the direction is appropriate.

Stat → Basic Statistics → Paired t → Samples in columns (the difference will be first sample - second sample) (First sample: Posttest, Second sample: Pretest) → Options (Put in the appropriate Confidence level, Test difference (), Alternative; for this example, Confidence level: 90, Alternative: greater than)

The Paired t procedure will not create a QQ plot, only a histogram, so you need to create the difference by hand and then create the plots from the new column.

Calc → Calculator → Store result in variable: C5, Expression: 'Posttest' - 'Pretest', check Assign as a formula.

Then you can create the histogram and probability plot from this new column (not provided).
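
Because the paired analysis is a one-sample t procedure on the difference column, its results can be roughly checked from summary statistics alone. The sketch below is a Python illustration only, not part of the MiniTab procedure. It uses the values MiniTab reports for the Posttest - Pretest differences in this example (N = 20, mean 2.500, StDev 2.893). The function name `paired_summary` is hypothetical, and 1.328 is the standard t-table critical value for 19 degrees of freedom with 0.10 in the upper tail.

```python
import math

def paired_summary(n, mean_diff, sd_diff, t_crit):
    """One-sample t summary for paired differences (first sample - second sample)."""
    se = sd_diff / math.sqrt(n)       # standard error of the mean difference
    t_stat = mean_diff / se           # test statistic for H0: mean difference = 0
    lower = mean_diff - t_crit * se   # one-sided lower confidence bound
    return se, t_stat, lower

# Summary statistics for Difference = Posttest - Pretest (Example 7.7).
# Assumption: t_crit = 1.328 is the table value for df = 19, upper-tail area 0.10.
se, t_stat, lower = paired_summary(n=20, mean_diff=2.500, sd_diff=2.893, t_crit=1.328)

print(f"SE Mean = {se:.3f}")
print(f"T-Value = {t_stat:.2f}")
print(f"90% lower bound for mean difference: {lower:.3f}")
```

A lower bound (rather than a two-sided interval) is used because the alternative is "greater than"; reversing the order of the two samples would flip the sign of the mean difference and require the "less than" alternative instead.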
Output from Session Window

Paired T-Test and CI: Posttest, Pretest

Paired T for Posttest - Pretest

             N   Mean   StDev  SE Mean
Posttest    20  28.30    5.95     1.33
Pretest     20  25.80    6.30     1.41
Difference  20  2.500   2.893    0.647

90% lower bound for mean difference: 1.641
T-Test of mean difference = 0 (vs > 0): T-Value = 3.86  P-Value = 0.001

Problems

All of these problems are for 2 samples; however, you will need to decide whether the samples are independent or paired. There should only be one code for each problem; that is, there should not be separate code for each of the parts. The correct answer to part a) is NOT the format of the data file or which section the problem is from.

Problem 1 (6.5 pts.) (7.44 Potential insurance fraud? Data Set: 7.FRAUD)

Insurance adjusters are concerned about the high estimates they are receiving from Jocko’s Garage. To see if the estimates are unreasonably high, each of 10 damaged cars was taken to Jocko’s and to a “trusted” garage and the estimates recorded. Here are the results:

a) Which procedure should you use (independent or paired)? Please explain your answer.
b) Examine each sample graphically, with special attention to outliers and skewness (histogram and normal quantile plot). Is use of a t procedure acceptable for these data?
c) Perform a hypothesis test to determine if there is a difference between the two garages at a significance level of 0.01. Be sure to perform the 7 steps.
d) Calculate and interpret the appropriate confidence interval.
e) Based on the answers to c) and d), is there a difference between the two garages? Why or why not?

Your submission should consist of one code for all of the parts, the plots in b), the appropriate output in parts c) and d), and the answers to all of the questions. In part d), you may either rewrite the confidence interval or just indicate where it is in the output.

Problem 2 (6.5 pts.) (7.93 Study habits.
Data Set: 7.Studyhabits)

The Survey of Study Habits and Attitudes (SSHA) is a psychological test designed to measure the motivation, study habits, and attitudes toward learning of college students. These factors, along with ability, are important in explaining success in school. Scores on the SSHA range from 0 to 200. A selective private college gives the SSHA to an SRS of both male and female first-year students.

The data for the women are as follows:

Here are the scores of the men:

a) Which procedure should you use (independent or paired)? Please explain your answer.
b) Examine each sample graphically, with special attention to outliers and skewness. Is use of a t procedure acceptable for these data?
c) Most studies have found that the mean SSHA score for men is lower than the mean score in a comparable group of women. Perform the appropriate hypothesis test (7 steps) at a significance level of 0.1. (Hint: Please look at the answer key for Lab 6.)
d) Calculate and interpret the appropriate confidence bound for the mean difference between the SSHA scores of male and female first-year students at this college.
e) Based on the answers to c) and d), is the mean score for men lower than the mean score for women? Why or why not?

Your submission should consist of one code for all of the parts, the plots in b), the appropriate output in parts c) and d), and the answers to all of the questions. In part d), you may either rewrite the confidence interval or just indicate where it is in the output.

Problem 3 (6.5 pts.) (7.72 Sadness and spending. Data Set: 7.Sadness)

The “misery is not miserly” phenomenon refers to a sad person’s spending judgment going haywire. In a recent study, 31 young adults were given $10 and randomly assigned to either a sad or a neutral group. The participants in the sad group watched a video about the death of a boy’s mentor (from The Champ), and those in the neutral group watched a video on the Great Barrier Reef.
After the video, each participant was offered the chance to trade $0.50 increments of the $10 for an insulated water bottle. Here are the data:

a) Which procedure should you use (independent or paired)? Please explain your answer.
b) Examine each group’s prices graphically. Is use of the t procedures appropriate for these data? Carefully explain your answer.
c) Perform the significance test at a significance level of 0.05 to determine if spending depends on whether the person is sad or not.
d) Calculate and interpret the appropriate confidence interval for the mean difference in purchase price between the two groups.
e) Based on the answers to c) and d), does spending depend on whether a person is sad or not? Why or why not?

Your submission should consist of one code for all of the parts, the plots in b), the appropriate output in parts c) and d), and the answers to all of the questions. In part d), you may either rewrite the confidence interval or just indicate where it is in the output.

Problem 4 BONUS (2 pts.)

Generate the procedure to calculate the power curve for a t distribution as described in Section 7.3 in the text. Generate a power curve for Example 7.14 in the book (Part 1 above).