Author: Min Ren Analysis for Z tests
Transcription
Author: Min Ren Analysis for Z tests
Author: Min Ren STAT 350 (Fall 2014) Lab 5: R Solution 1 Lab 5 - Interpretation of Confidence Intervals and Power Analysis for Z tests Objectives: A Better Understanding of Confidence Intervals and Power Curves. A. (55 points) Interpretation of a Confidence Interval. Use software to generate 40 observations from a normal distribution with µ = 10 and σ = 2. Repeat this 50 times. 1. (30 points) From each set of observations, compute a 90% confidence interval. No data is required, however, you need to include all 50 confidence intervals. Solution: Sample Code 1: (You should run the following code 50 times. Each time, you should copy the table, record the confidence interval and determine by hand if this confidence interval contains the population mean: 10.) n <- 40 RandomData <- rnorm(n,mean=10,sd=2) ci <- t.test(RandomData,conf.level=0.90) answer <- data.frame(i,ci$conf.int[1], ci$conf.int[2]) print(answer)} Sample Code 2: (You may consider the following code, which is more difficult to write, but generate the 50 tables simultaneously.) > > + + + + + > colnames(res)<-c("90% lower limit","90% upper limit","contain mu?") for (i in 1:50){ RandomSample<-rnorm(40,10,2) t<-t.test(RandomSample,conf.level=0.9) res[i,1:2]<-t$conf.int[1:2] res[i,3]<-(10>t$conf.int[1]) & (10<t$conf.int[2]) } res [1,] [2,] [3,] [4,] [5,] [6,] [7,] [8,] [9,] [10,] [11,] [12,] [13,] [14,] [15,] [16,] [17,] 90% lower limit 9.711632 9.603163 9.442489 10.232928 9.548948 9.655207 9.984213 9.394327 9.534189 8.710827 8.948052 9.291709 9.371126 10.192441 9.932500 9.833176 9.449219 90% upper limit 10.656472 10.787195 10.330170 11.131520 10.722365 10.843896 11.099040 10.542664 10.548846 9.658468 9.773528 10.241216 10.356686 11.147714 10.834223 10.806848 10.533126 STAT 350: Introduction to Statistics Purdue University contain mu? 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 1 1 Fall 2014 Author: Min Ren STAT 350 (Fall 2014) [18,] [19,] [20,] [21,] [22,] [23,] [24,] [25,] [26,] [27,] [28,] [29,] [30,] [31,] [32,] [33,] [34,] [35,] [36,] [37,] [38,] [39,] [40,] [41,] [42,] [43,] [44,] [45,] [46,] [47,] [48,] [49,] [50,] 9.226593 9.597059 9.062439 9.644769 9.434676 9.689123 8.963024 8.980457 9.534915 9.231642 9.235816 9.377265 8.978152 9.234448 10.062863 9.796705 9.478981 10.306448 9.637880 9.334300 9.557508 9.062608 9.266484 9.707183 9.269635 9.581570 9.348208 9.567426 9.560901 9.073926 9.582457 9.403577 9.448157 Lab 5: R Solution 10.431731 10.724411 10.218339 10.790207 10.444451 10.600831 10.063669 10.051737 10.704241 10.425103 10.315803 10.525817 10.185177 10.222421 10.864413 10.706976 10.446474 11.409389 10.698254 10.364254 10.555677 10.117098 10.371214 10.860903 10.572222 10.653135 10.503030 10.748857 10.679847 10.070101 10.771986 10.664905 10.417373 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2. (10 points) Determine how many of these intervals contain the population mean, µ = 10. Please indicate for each confidence interval if it contains the value or not. Is this number that you would expect? Why or why not? Solution: We can see that, in this special case, there are 6 confidence intervals that do not contain the true population mean, 10. This is roughly what we would expect. The confidence level is 90%, so we would expect (0.9)(50) = 45 of the confidence intervals to include the population mean, 10. Now we have 44, which is not far from what we expect. 3. (15 points) GROUP PART: Combine your data with 3 or 4 other students (in any of my sections) and answer the following questions: 1) Is the number of intervals that contain the mean what you would expect for the combined data? 2) How are the results from part 2 and part 3 different? (This is due on the following Monday and must be submitted online also.) Solution: You will need to show your work to the number of intervals that contain the mean (just add up the numbers from each student) and then calculate the percentage by dividing by the total number. I would expect that this number would be more accurate because this is for a larger sample size. This percentage is only valid at large numbers. STAT 350: Introduction to Statistics Purdue University Fall 2014 Author: Min Ren STAT 350 (Fall 2014) Lab 5: R Solution 3 B. (45 points) Water quality testing. The Deely Laboratory is a drinking-water testing and analysis service. One of the common contaminants it tests for is lead. Lead enters drinking water through corrosion of plumbing materials, such as lead pipes, fixtures, and solder. The service knows that their analysis procedure is unbiased but not perfectly precise, so the laboratory analyzes each water sample three times and reports the mean result. The repeated measurements follow a Normal distribution quite closely. The standard deviation of this distribution is a property of the analytic procedure and is known to be σ = 0.25 parts per billion (ppb). The Deely Laboratory has been asked by the university to evaluate a claim that the drinking water in the Student Union has a lead concentration of 6 ppb, well below the Environmental Protection Agency’s action level of 15 ppb. Since the true concentration of the sample is the mean μ of the population of repeated analyses, the hypotheses are The lab chooses the 1% level of significance, = 0.01. They plan to perform three analyses of one specimen (n=3). 1. (30 points, 6 points each part) Using computer software, calculate the following powers: a. At the 1% level of significance, what is the power of this test against the specific alternative μ = 6.5? Solution: Code: > > > > > > > > > > > > > > > n <- 3 alpha <- 0.01 mu0 <- 6 sigma <- 0.25 sigman <- sigma/sqrt(n) #standard error #z is from alpha/2 for a 2-tailed test z <- -qnorm(alpha/2) muprime=6.5 phi1 <- z + (mu0 - muprime)/sigman phi2 <- -z + (mu0 - muprime)/sigman pphi1 <- pnorm(phi1) pphi2 <- pnorm(phi2) beta <- pnorm(phi1)- pnorm(phi2) power <- 1 - beta; power [1] 0.8128029 The power is 0.8128029 STAT 350: Introduction to Statistics Purdue University Fall 2014 Author: Min Ren STAT 350 (Fall 2014) Lab 5: R Solution 4 b. At the 5% level of significance, what is the power of this test against the specific alternative μ = 6.5? Solution: Code: > n <- 3 > alpha <- 0.05 > mu0 <- 6 > sigma <- 0.25 > sigman <- sigma/sqrt(n) #standard error > #z is from alpha/2 for a 2-tailed test > z <- -qnorm(alpha/2) > muprime=6.5 > phi1 <- z + (mu0 - muprime)/sigman > phi2 <- -z + (mu0 - muprime)/sigman > pphi1 <- pnorm(phi1) > pphi2 <- pnorm(phi2) > beta <- pnorm(phi1)- pnorm(phi2) > power <- 1 - beta; > power [1] 0.9337271 The power is 0.9337271 c. At the 1% level of significance, what is the power of this test against the specific alternative μ = 6.75? Solution: Code: > n <- 3 > alpha <- 0.01 > mu0 <- 6 > sigma <- 0.25 > sigman <- sigma/sqrt(n) #standard error > #z is from alpha/2 for a 2-tailed test > z <- -qnorm(alpha/2) > muprime=6.75 > phi1 <- z + (mu0 - muprime)/sigman > phi2 <- -z + (mu0 - muprime)/sigman > pphi1 <- pnorm(phi1) > pphi2 <- pnorm(phi2) > beta <- pnorm(phi1)- pnorm(phi2) > power <- 1 - beta; > power [1] 0.9956077 The power is 0.9956077 STAT 350: Introduction to Statistics Purdue University Fall 2014 Author: Min Ren STAT 350 (Fall 2014) Lab 5: R Solution 5 d. If the lab performs five analyses of one specimen (n=5), what is the power of this test against the specific alternative μ = 6.5? Solution: Code: > n <- 5 > alpha <- 0.01 > mu0 <- 6 > sigma <- 0.25 > sigman <- sigma/sqrt(n) #standard error > #z is from alpha/2 for a 2-tailed test > z <- -qnorm(alpha/2) > muprime=6.5 > phi1 <- z + (mu0 - muprime)/sigman > phi2 <- -z + (mu0 - muprime)/sigman > pphi1 <- pnorm(phi1) > pphi2 <- pnorm(phi2) > beta <- pnorm(phi1)- pnorm(phi2) > power <- 1 - beta; > power [1] 0.9710402 The power is 0.9710402 e. Write a short paragraph explaining the consequences of changing the significance level, alternative μ and sample size on the power. Solution: We have the following conclusion: The larger the value of (or equivalently the higher the significance level), the greater the power is. The larger the distance between the ’ and 0 is, the greater the power. The larger the sample size, the greater the power. 2. (10 points) Generate a power curve when n =3 at a 1% significance level. Please use an interval length of 4. Solution: Code: > > > > > > > > > > > > > > > n <- 3 alpha <- 0.01 mu0 <- 6 sigma <- 0.25 sigman <- sigma/sqrt(n) #standard error #z is from alpha/2 for a 2-tailed test z <- -qnorm(alpha/2) muprime <- seq (from=4,to=8, by=0.05) phi1 <- z + (mu0 - muprime)/sigman phi2 <- -z + (mu0 - muprime)/sigman pphi1 <- pnorm(phi1) pphi2 <- pnorm(phi2) beta <- pnorm(phi1)- pnorm(phi2) power <- 1 - beta; plot(muprime,power,main="Power for the Hypothesis Test",type="l") STAT 350: Introduction to Statistics Purdue University Fall 2014 Author: Min Ren STAT 350 (Fall 2014) Lab 5: R Solution 6 3. (5 points) What sample size would be required for the power to be at least 0.90 at the 1% level of significance against the specific alternative μ = 6.5? Solution: Code: > > > > > > > > > > > > > > > 1 2 3 4 5 6 7 n <- seq (1,15,1) muprime <- 6.5 alpha <- .01 mu0 <- 6 sigma <- 0.25 sigman <- sigma/sqrt(n) z <- -qnorm(alpha/2) phi1 <- z + (mu0 - muprime)/sigman phi2 <- -z + (mu0 - muprime)/sigman pphi1 <- pnorm(phi1) pphi2 <- pnorm(phi2) beta <- pnorm(phi1)- pnorm(phi2) power <- 1 - beta; answer <- data.frame(n,power) answer n 1 2 3 4 5 6 7 power 0.2823677 0.5997105 0.8128029 0.9228015 0.9710402 0.9899145 0.9966929 STAT 350: Introduction to Statistics Purdue University Fall 2014 Author: Min Ren STAT 350 (Fall 2014) 8 9 10 11 12 13 14 15 8 9 10 11 12 13 14 15 Lab 5: R Solution 7 0.9989686 0.9996917 0.9999111 0.9999752 0.9999933 0.9999982 0.9999995 0.9999999 As seen above, we need the sample size to be at least 4. STAT 350: Introduction to Statistics Purdue University Fall 2014