Lab 5 - Interpretation of Confidence Intervals and Power
STAT 350 (Fall 2014) Lab 5: SAS Solution

Lab 5 - Interpretation of Confidence Intervals and Power Analysis for Z tests

Objectives: A Better Understanding of Confidence Intervals and Power Curves.

A. (55 points) Interpretation of a Confidence Interval. Use software to generate 40 observations from a normal distribution with µ = 10 and σ = 2. Repeat this 50 times.

1. (30 points) From each set of observations, compute a 90% confidence interval. No data is required; however, you need to include all 50 confidence intervals.

Solution:

Sample Code 1: (Run the following code 50 times. Each time, copy the table, record the confidence interval, and determine by hand whether it contains the population mean, 10.)

    data CI;
      /* one sample of 40 observations, mean 10 and standard deviation 2 */
      do i = 1 to 40 by 1;
        randnorm = rand('normal',10,2);
        output;
      end;
      drop i;
    run;

    /* ALPHA=0.10 requests the 90% limits asked for; the PROC TTEST default is 95% */
    proc ttest data = CI alpha = 0.10;
      var randnorm;
    run;

Sample Code 2: (This version is more difficult to write, but it generates all 50 tables at once.)

    data CI;
      array x{50};                    /* creates x1-x50, one variable per sample */
      do i = 1 to 40 by 1;            /* 40 observations (rows) */
        do j = 1 to 50 by 1;          /* 50 samples (columns) */
          x{j} = rand('normal',10,2);
        end;
        output;
      end;
      drop i j;
    run;

    proc ttest data = CI alpha = 0.10;
      var x1-x50;                     /* one analysis per sample */
      ods select ConfLimits;          /* print only the confidence-limit tables */
      ods show;
    run;

You should get 50 tables, each of which contains a confidence interval. The following is the first table I generated.

[Table: PROC TTEST confidence limits for the first sample]

Your report should contain all 50 of these tables, with the tables whose confidence intervals do not include (or do include) the population mean, 10, highlighted.
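The by-hand check of each interval can also be automated. The following is a minimal sketch, not part of the original solution: it stores the 50 samples in long format, asks PROC MEANS for 90% limits by sample, and flags whether each interval contains 10. The data set and variable names (CI_long, limits, lower, upper, contains10) and the seed passed to STREAMINIT are illustrative assumptions.

```sas
/* Simulate 50 samples of size 40 in long format: one row per observation. */
data CI_long;
  call streaminit(350);          /* arbitrary seed (assumption), for reproducibility */
  do sample = 1 to 50;
    do i = 1 to 40;
      x = rand('normal', 10, 2); /* mean 10, standard deviation 2 */
      output;
    end;
  end;
  drop i;
run;

/* One 90% t interval per sample: ALPHA=0.10 sets the level of LCLM and UCLM. */
proc means data=CI_long noprint alpha=0.10;
  by sample;                     /* data are already in sample order */
  var x;
  output out=limits lclm=lower uclm=upper;
run;

/* Flag each interval: 1 if it contains mu = 10, 0 otherwise. */
data limits;
  set limits;
  contains10 = (lower <= 10 <= upper);
run;

/* List all 50 intervals with their flags. */
proc print data=limits noobs;
  var sample lower upper contains10;
run;

/* Count how many of the 50 intervals contain the mean. */
proc freq data=limits;
  tables contains10;
run;
```

Because the samples are random, the count in the PROC FREQ table changes with the seed; with 90% intervals it should usually land near 45 out of 50.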
To keep the solution concise, I list all the confidence intervals I obtained in the table below and highlight those that do not include the population mean, 10.

Table: 50 confidence intervals. Columns: CI (95% CL Mean, lower and upper limits) and In? (1 = the interval contains 10, 0 = it does not).

CI In? 95% CL Mean 9.4097 10.8642 1 95% CL Mean 9.5091 10.6161 11.2007 10.8748 1 10.3901 1 10.6851 1 10.9508 1 10.5657 10.8263 1 10.8777 1 10.4367 1 11.2368 1 10.4578 0 11.1275 1 10.9419 1 11.6291 1 10.3659 1 9.6974 11.1017 9.7448 10.8431 9.3926 10.3865 9.1212 10.3648 9.2466 10.5082 9.1406 10.3695 9.1195 10.4024 9.4113 10.6268 9.335 10.6218 9.4849 10.5298 0 9.6107 11.0554 1 9.0389 10.2855 10.7677 1 9.1935 10.566 1 9.4117 10.529 1 9.4129 10.8171 1 10.1422 11.3419 0 95% CL Mean 1 9.3031 10.6495 1 95% CL Mean 1 9.2112 10.5111 1 95% CL Mean 1 8.9837 10.2604 1 95% CL Mean 1 9.6252 10.9945 1 95% CL Mean 1 9.2355 10.4169 1 95% CL Mean 1 9.1342 10.4469 1 95% CL Mean 1 9.5955 10.7123 1 95% CL Mean 1 9.215 10.5701 1 95% CL Mean 1 8.7384 10.1005 1 95% CL Mean 1 95% CL Mean 1 9.5464 95% CL Mean 95% CL Mean 95% CL Mean 9.418 10.632 95% CL Mean 95% CL Mean 10.2866 9.3576 1 95% CL Mean 95% CL Mean 95% CL Mean 9.574 1 95% CL Mean 95% CL Mean 9.6714 11.2792 10.7016 95% CL Mean 95% CL Mean 95% CL Mean 8.9885 1 95% CL Mean 95% CL Mean 10.026 9.8931 9.2468 95% CL Mean 95% CL Mean 95% CL Mean 9.2941 10.7623 95% CL Mean 95% CL Mean 9.5573 9.6341 In? 95% CL Mean 1 95% CL Mean 95% CL Mean 9.6208 10.7949 95% CL Mean 1 95% CL Mean 9.2483 9.4787 95% CL Mean 95% CL Mean 9.4689 1 95% CL Mean 95% CL Mean 9.5067 10.7147 95% CL Mean 95% CL Mean 9.0976 9.5377 CI 95% CL Mean 95% CL Mean 95% CL Mean 9.5308 In? 95% CL Mean 1 95% CL Mean 9.9826 CI 95% CL Mean 1 9.6202 11.0511 1

2. (10 points) Determine how many of these intervals contain the population mean, µ = 10. Please indicate for each confidence interval whether it contains the value or not. Is this the number that you would expect? Why or why not?
Solution: In this particular run, 3 of the confidence intervals do not contain the true population mean, 10, so 47 of them do. This is roughly what we would expect: at a 90% confidence level, we expect (0.9)(50) = 45 of the 50 intervals to include the population mean, and 47 is not far from that. (The tabulated intervals are labeled "95% CL Mean"; at the 95% level the expected count would be (0.95)(50) = 47.5.)

3. (15 points) GROUP PART: Combine your data with 3 or 4 other students (in any of my sections) and answer the following questions: 1) Is the number of intervals that contain the mean what you would expect for the combined data? 2) How are the results from part 2 and part 3 different? (This is due on the following Monday and must be submitted online also.)

Solution: You will need to show your work for the number of intervals that contain the mean (just add up the counts from each student) and then calculate the percentage by dividing by the total number of intervals. I would expect this percentage to be closer to the confidence level than the individual result, because it is based on a larger number of intervals. The confidence level describes a long-run proportion, so the observed percentage settles near it only when the number of intervals is large.

B. (45 points) Water quality testing. The Deely Laboratory is a drinking-water testing and analysis service. One of the common contaminants it tests for is lead. Lead enters drinking water through corrosion of plumbing materials, such as lead pipes, fixtures, and solder. The service knows that its analysis procedure is unbiased but not perfectly precise, so the laboratory analyzes each water sample three times and reports the mean result. The repeated measurements follow a Normal distribution quite closely. The standard deviation of this distribution is a property of the analytic procedure and is known to be σ = 0.25 parts per billion (ppb). The Deely Laboratory has been asked by the university to evaluate a claim that the drinking water in the Student Union has a lead concentration of 6 ppb, well below the Environmental Protection Agency’s action level of 15 ppb.
Since the true concentration of the sample is the mean μ of the population of repeated analyses, the hypotheses are

    H0: μ = 6 versus Ha: μ ≠ 6

The lab chooses the 1% level of significance, α = 0.01. They plan to perform three analyses of one specimen (n = 3).

1. (30 points, 6 points each part) Using computer software, calculate the following powers:
a. At the 1% level of significance, what is the power of this test against the specific alternative μ = 6.5?
b. At the 5% level of significance, what is the power of this test against the specific alternative μ = 6.5?
c. At the 1% level of significance, what is the power of this test against the specific alternative μ = 6.75?
d. If the lab performs five analyses of one specimen (n = 5), what is the power of this test against the specific alternative μ = 6.5?
e. Write a short paragraph explaining the consequences of changing the significance level, the alternative μ, and the sample size on the power.

Solution:

Sample Code: (It is acceptable to separate the different parts.)
    data powercalculation;
      /* This data step calculates everything for part 1, one output row per part (a-d). */

      /* (a) n = 3, alpha = 0.01, alternative mu = 6.5 */
      n=3; alpha=.01; mu0 = 6; muprime = 6.5; sigma = 0.25;
      sigman = sigma/sqrt(n);                          /* standard deviation of the sample mean */
      z = -PROBIT(alpha/2);                            /* two-sided critical value z(alpha/2) */
      phi1 = z + (mu0 - muprime)/sigman;
      phi2 = -z + (mu0 - muprime)/sigman;
      beta = CDF('normal',phi1) - CDF('normal',phi2);  /* P(fail to reject H0 | mu = muprime) */
      power = 1 - beta;
      output;

      /* (b) n = 3, alpha = 0.05, alternative mu = 6.5 */
      n=3; alpha=.05; mu0 = 6; muprime = 6.5; sigma = 0.25;
      sigman = sigma/sqrt(n);
      z = -PROBIT(alpha/2);
      phi1 = z + (mu0 - muprime)/sigman;
      phi2 = -z + (mu0 - muprime)/sigman;
      beta = CDF('normal',phi1) - CDF('normal',phi2);
      power = 1 - beta;
      output;

      /* (c) n = 3, alpha = 0.01, alternative mu = 6.75 */
      n=3; alpha=.01; mu0 = 6; muprime = 6.75; sigma = 0.25;
      sigman = sigma/sqrt(n);
      z = -PROBIT(alpha/2);
      phi1 = z + (mu0 - muprime)/sigman;
      phi2 = -z + (mu0 - muprime)/sigman;
      beta = CDF('normal',phi1) - CDF('normal',phi2);
      power = 1 - beta;
      output;

      /* (d) n = 5, alpha = 0.01, alternative mu = 6.5 */
      n=5; alpha=.01; mu0 = 6; muprime = 6.5; sigma = 0.25;
      sigman = sigma/sqrt(n);
      z = -PROBIT(alpha/2);
      phi1 = z + (mu0 - muprime)/sigman;
      phi2 = -z + (mu0 - muprime)/sigman;
      beta = CDF('normal',phi1) - CDF('normal',phi2);
      power = 1 - beta;
      output;
    run;

    proc print data = powercalculation;
    run;

The results are summarized in the table printed by PROC PRINT, one row per part.

We reach the following conclusions (part e): The larger the value of α (or equivalently, the higher the significance level), the greater the power. The larger the distance between μ′ and μ0, the greater the power. The larger the sample size, the greater the power.

2. (10 points) Generate a power curve when n = 3 at a 1% significance level. Please use an interval length of 4.
Solution:

    data power;
      n=3; alpha=.01; mu0 = 6; sigma = 0.25;
      sigman = sigma/sqrt(n);
      z = -PROBIT(alpha/2);
      /* alternatives from 4 to 8: an interval of length 4 centered at mu0 = 6 */
      do muprime = 4 to 8 by .005;
        phi1 = z + (mu0 - muprime)/sigman;
        phi2 = -z + (mu0 - muprime)/sigman;
        beta = CDF('normal',phi1) - CDF('normal',phi2);
        power = 1 - beta;
        output;
      end;
    run;

    title1 h=3 'Power for the Hypothesis Test';
    axis1 label=(h=2);
    axis2 label=(h=2 angle=90);
    symbol1 v=none i=join c=blue;
    proc gplot data=power;
      plot power*muprime / haxis=axis1 vaxis=axis2;
    run;
    quit;

3. (5 points) What sample size would be required for the power to be at least 0.90 at the 1% level of significance against the specific alternative μ = 6.5?

Solution:

    data size;
      do n = 1 to 20 by 1;   /* these values may be changed as appropriate */
        muprime = 6.5; alpha=.01; mu0 = 6; sigma = 0.25;
        sigman = sigma/sqrt(n);
        z = -PROBIT(alpha/2);   /* z alpha/2 */
        phi1 = z + (mu0 - muprime)/sigman;
        phi2 = -z + (mu0 - muprime)/sigman;
        pphi1 = CDF('normal', phi1);   /* CDFs of normal */
        pphi2 = CDF('normal', phi2);
        beta = pphi1 - pphi2;
        power = 1 - beta;
        output;
      end;
    run;

    proc print data=size;
    run;

In the printed output, the power is below 0.90 for n ≤ 3 and at least 0.90 for n ≥ 4, so the sample size must be at least 4.
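The power calculations in Part B all evaluate the same expression for the two-sided z test. As a rough hand check of the sample-size answer, the worked values below use standard normal table lookups rounded to three decimals, so the last digit may differ slightly from the SAS output. The symbols match the code: phi1 and phi2 are the two arguments of Φ.

```latex
\[
\beta(\mu') \;=\;
\Phi\!\left(z_{\alpha/2} + \frac{\mu_0-\mu'}{\sigma/\sqrt{n}}\right)
\;-\;
\Phi\!\left(-z_{\alpha/2} + \frac{\mu_0-\mu'}{\sigma/\sqrt{n}}\right),
\qquad
\text{power} \;=\; 1-\beta(\mu').
\]

With $\mu_0 = 6$, $\mu' = 6.5$, $\sigma = 0.25$, $\alpha = 0.01$ ($z_{\alpha/2} \approx 2.576$):

\[
n=3:\quad \frac{\mu_0-\mu'}{\sigma/\sqrt{3}} \approx \frac{-0.5}{0.1443} \approx -3.464,
\qquad
\beta \approx \Phi(-0.888) - \Phi(-6.040) \approx 0.187,
\qquad
\text{power} \approx 0.813 .
\]

\[
n=4:\quad \frac{\mu_0-\mu'}{\sigma/\sqrt{4}} = \frac{-0.5}{0.125} = -4,
\qquad
\beta \approx \Phi(-1.424) - \Phi(-6.576) \approx 0.077,
\qquad
\text{power} \approx 0.923 .
\]
```

The power crosses 0.90 between n = 3 and n = 4, which agrees with the conclusion that at least 4 analyses are needed.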