Transcription
Lec.1 STAT2912

Lecture 1
• Introduction
• Methodology of statistical tests
  Assumptions: sample X1, X2, ..., Xn.
  Hypotheses: H vs HA.
  Test statistic: T = T(X1, X2, ..., Xn).
  p-value: p = PH(T ≥ t).
  Conclusion: by "interpreting" the p-value.

Introduction

You often come across data in various sizes and formats. Sometimes you have to answer a "yes" or "no" question about certain features of a data set. Data are the raw material for statisticians.

Example 1.1. (Beer contents) A brand of beer claims its content is 375 millilitres on the label. A sample of 40 bottles of the beer gave a sample average of 373.9 and a standard deviation of 2.5. As a consumer one may like to ask: is there evidence that the mean content of the beer is less than the 375 mL claimed on the label?

Example 1.2. (Height comparison) A fourth grade class has 10 girls and 13 boys. The children's heights are recorded on their 10th birthday as follows:
Boys: 135.3 137.0 136.0 139.7 136.5 139.2 138.8 139.6 140.0 142.7 135.5 134.9 139.5;
Girls: 140.3 134.8 138.6 135.1 140.0 136.2 138.7 135.5 134.9 140.0
An interesting question is: is there evidence that girls are taller than boys on their 10th birthday?

Example 1.3. (Body temperature) The dataset beav1$temp contains measurements of body temperature (in degrees Celsius) over a certain time period.
> beav1$temp
  [1] 36.33 36.34 36.35 36.42 36.55 36.69 36.71 36.75 36.81 36.88 36.89 36.91
 [13] 36.85 36.89 36.89 36.67 36.50 36.74 36.77 36.76 36.78 36.82 36.89 36.99
 [25] 36.92 36.99 36.89 36.94 36.92 36.97 36.91 36.79 36.77 36.69 36.62 36.54
 [37] 36.55 36.67 36.69 36.62 36.64 36.59 36.65 36.75 36.80 36.81 36.87 36.87
 [49] 36.89 36.94 36.98 36.95 37.00 37.07 37.05 37.00 36.95 37.00 36.94 36.88
 [61] 36.93 36.98 36.97 36.85 36.92 36.99 37.01 37.10 37.09 37.02 36.96 36.84
 [73] 36.87 36.85 36.85 36.87 36.89 36.86 36.91 37.53 37.23 37.20 37.25 37.20
 [85] 37.21 37.24 37.10 37.20 37.18 36.93 36.83 36.93 36.83 36.80 36.75 36.71
 [97] 36.73 36.75 36.72 36.76 36.70 36.82 36.88 36.94 36.79 36.78 36.80 36.82
[109] 36.84 36.86 36.88 36.93 36.97 37.15

We may ask: is there evidence that the observation is not from a normal population?

The aim of this course

This course will attempt to answer these typical yes-or-no questions. Explicitly, we will introduce the methods of applied statistics, both parametric and non-parametric, for the analysis of data, and the techniques of statistical inference for one, two and several samples from populations, for simple regression and for categorical data. Some of this has been touched on before in the first year Statistics course. Here we will give a more systematic treatment. Computer intensive methods and simple data analysis are also discussed in some cases.

Some basic concepts

A population is a collection of all possible results that are the target of our interest, described by a random variable, say X. A sample is a subset selected from a population, say X1, X2, ..., Xn. We require that each element in a dataset is a "representative" of the population. Theoretically, this is equivalent to saying that X1, X2, ..., Xn, X are required to be independent and identically distributed (iid) random variables.
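The population/sample distinction can be illustrated by simulation. The following Python sketch (an illustration only, not part of the course's R material) draws an iid sample from a hypothetical N(375, 2.5²) population, echoing the numbers in Example 1.1; the sample mean and standard deviation then serve as estimates of the population parameters.

```python
import random
from statistics import mean, stdev

# Illustration only: simulate drawing an iid sample from a hypothetical
# normal population.  The values mu = 375 and sigma = 2.5 echo Example 1.1;
# nothing here is real beer data.
random.seed(1)
mu, sigma, n = 375.0, 2.5, 40
sample = [random.gauss(mu, sigma) for _ in range(n)]  # iid draws X1, ..., Xn

xbar = mean(sample)   # sample mean, an estimate of the population mean mu
s = stdev(sample)     # sample standard deviation, an estimate of sigma
print(xbar, s)
```

Re-running without the seed gives a different sample each time, but the sample mean stays close to µ, as the iid assumption intends.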
A dataset x1, x2, ..., xn is an observed value of the sample X1, X2, ..., Xn. Sometimes we also call x1, x2, ..., xn a sample or sample value when there is no confusion.

Methodology of statistical tests

Five steps of a general test

Step 1: Model and assumptions. The model represents our belief about the probability distribution F describing the population. In general, we write X ∼ Fθ(x), where Fθ(x) is a specific distribution function depending only on the unknown parameter θ. For example, we can write X ∼ N(µ, σ²) or X ∼ B(n, p), meaning that we assume a normal or binomial model, respectively. In this case, µ, σ², p, θ, etc. are called parameters. The corresponding statistical analysis is therefore called parametric. A statistical analysis is called nonparametric if our assumptions on the population distribution F do not specify a particular class of probability distributions.

Assumptions on the model

Parametric hypothesis testing: X1, X2, ..., Xn are independent random variables with Xi ∼ Fθ(x), i = 1, ..., n.
Nonparametric hypothesis testing: X1, X2, ..., Xn are iid random variables with distribution function F.

Note: In general, we also require that F is not heavy (long)-tailed; that is, we require certain moment conditions. When F is heavy-tailed, the boxplot (or stem-and-leaf plot) of the dataset will typically appear skewed. For skewed data, we can sometimes transform them to be symmetric. This will be discussed later.

Step 2: Hypotheses on (the parameter in) the population. The statement against which we are searching for evidence is called the null hypothesis, and is denoted by H0 (or simply H). The null hypothesis is often a "no difference" statement. The statement we will consider if H0 is false is called the alternative hypothesis, and is denoted by H1 (or HA).
For example, if X ∼ Fθ(x), we may have
H0 : θ = θ0 against H1 : θ > θ0 (right-sided alternative), or θ < θ0 (left-sided alternative), or θ ≠ θ0 (two-sided alternative),
depending on the context, or more generally
H0 : θ ∈ Θ0 vs H1 : θ ∈ Θ − Θ0,
where Θ is a parameter space.

Examples (cont.)

In Example 1.1, if we assume that the beer content is X ∼ N(µ, σ²), then the null and alternative hypotheses are as follows: H0 : µ = 375, H1 : µ < 375.
In Example 1.2, if we assume that the height of girls is X ∼ N(µ1, σ1²) and the height of boys is Y ∼ N(µ2, σ2²), then the null and alternative hypotheses are: H0 : µ1 − µ2 = 0, H1 : µ1 − µ2 > 0.
In Example 1.3, the null hypothesis is H0: the measurement X has a normal distribution.

Step 3: Find a test statistic, T

A function T := T(X1, ..., Xn) of the sample X1, X2, ..., Xn is called a test statistic. To be a useful statistic for testing the null hypothesis H0, T is chosen in such a way that:
A. the distribution of T is completely determined when H0 is true, i.e., when θ ∈ Θ0;
B. particular observed values of T, forming the rejection region, can be taken as evidence of poor agreement with the assumption that H0 is true.

Step 4: Calculate the p-value (observed significance level). For given data x1, x2, ..., xn, we can calculate tobs = T(x1, ..., xn), the observed value of the test statistic T. The p-value, or observed significance level, is defined formally as the smallest α level such that the corresponding rejection region R contains tobs.

Example: Assume large values of T support the alternative. That is, R has the form [r, ∞), where r can be calculated for a specific α. Then the p-value is p = P(T ≥ tobs | H0), where the probability is calculated under the assumption that H0 is true.
Depending on the context, the p-value can be calculated as p = P(T ≤ tobs | H0) (for left-sided tests), or as p = P(|T| ≥ |tobs| | H0) (for two-sided tests).

Step 5: Conclusion

The p-value is a probability, so it is a number between 0 and 1. What does it mean? We can assess the significance of the evidence provided by the data against the null hypothesis by "interpreting" the p-value. Although such rules are arbitrary, in this course we shall use the following convention:

p-value > 0.10            Data are consistent with H0
p-value ∈ [0.05, 0.10)    Borderline evidence against H0
p-value ∈ [0.025, 0.05)   Reasonably strong evidence against H0
p-value ∈ [0.01, 0.025)   Strong evidence against H0
p-value < 0.01            Very strong evidence against H0

Remarks

Recall that the p-value is equal to the probability (under the null hypothesis) that the test statistic T is as extreme as, or more extreme than, tobs. It quantifies the amount of evidence against H0 in the following sense:
• If the p-value is small, then either H0 is true and the poor agreement is due to an unlikely event, or H0 is false. Therefore, the smaller the p-value, the stronger the evidence against H0 in favour of H1.
• A large p-value does not mean that there is evidence that H0 is true, but only that the test detects no inconsistency between the predictions of the null hypothesis and the results of the experiment.

Review of some important results

Denote the sample mean by X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ and the sample variance by S² = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)²; S = √S² is called the sample standard deviation. The following results will be used in this course without proofs. If X1, . . .
, Xn are iid N(µ, σ²), then:
• √n (X̄ − µ)/σ ∼ N(0, 1);
• Σᵢ₌₁ⁿ (Xᵢ − µ)²/σ² ∼ χ²_n;
• (n − 1)S²/σ² ∼ χ²_{n−1};
• √n (X̄ − µ)/S ∼ t_{n−1};
• X̄ and S² are independent random variables.

Some useful R commands

pnorm(x, m, s) = P(X ≤ x), where X ∼ N(m, s²).
pt(x, d) = P(X ≤ x), where X ∼ t_d.
pchisq(x, d) = P(X ≤ x), where X ∼ χ²_d.
qnorm(p, m, s) is the number x such that P(X ≤ x) = p, where X ∼ N(m, s²).

Examples: pnorm(1.96) gives Φ(1.96) (= 0.9750021). qnorm(0.95) gives Φ⁻¹(0.95) (= 1.644854). Similarly for the t and χ² variables: pt(2,3) gives P(t₃ ≤ 2) = 0.930337, pchisq(2,3) gives P(χ²₃ ≤ 2) = 0.4275933, and qchisq(0.1,5) gives 1.610308, that is, P(χ²₅ ≤ 1.610308) = 0.1.

Lecture 2

Tests for the population mean
• Normal population with unknown variance
• One sample and paired sample
• One sample t-test and paired sample t-test

Tests for mean: normal population

One sample: Suppose we have a sample (X1, X2, ..., Xn) of size n drawn from a normal population with unknown variance. We want to make some statement about the population mean µ.

The t-test for one sample
• Assumptions: the Xj are iid random variables with Xj ∼ N(µ, σ²), where σ² is unknown.
• Hypothesis: H : µ = µ0 vs HA : µ > µ0, or µ < µ0, or µ ≠ µ0.
• Test statistic (Student's t-statistic): Tn = √n (X̄ − µ0)/S, where X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ and S² = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)². Note that Tn ∼ t_{n−1} under H : µ = µ0.
• p-value: Let (x1, x2, ..., xn) be a dataset, and t be the observed value of Tn, i.e. t = √n (x̄ − µ0)/s, where x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ and s² = (1/(n−1)) Σᵢ₌₁ⁿ (xᵢ − x̄)².
  p = P(t_{n−1} ≥ t) for HA : µ > µ0;
  p = P(t_{n−1} ≤ t) for HA : µ < µ0;
  p = 2 P(t_{n−1} ≥ |t|) for HA : µ ≠ µ0.
• Decision: by "interpreting" the p-value.

Remarks and the idea behind the t-test:
• We have that Tn ∼ t_{n−1} under H0 : µ = µ0.
• X̄ is an ideal estimator of µ, which means that X̄ is near µ0 if H : µ = µ0 is true. Therefore it is reasonable to believe that we should reject H in favour of HA if t = √n (x̄ − µ0)/s (the observed value of Tn) is large (or large and negative, or both, depending on HA). On the other hand, if t is large, then the p-value is small. This coincides with the "interpretation" of the p-value.
• Note that P(t_{n−1} ≥ t) = P(t_{n−1} ≤ −t) for t ≥ 0, and P(|t_{n−1}| ≥ |t|) = 2 P(t_{n−1} ≥ |t|). In R, pt(t, n-1) gives P(t_{n−1} ≤ t).

Example 2.1. Beer contents in a six pack of VB (in millilitres) are: 374.8, 375.0, 375.3, 374.8, 374.4, 374.9.
Question: Is the mean content of the beer less than the 375 mL claimed on the label?
Solution: We consider the sample X1, X2, ..., X6 to be from a normal population, that is, Xj ∼ N(µ, σ²), where σ² is unknown.
Hypothesis: H0 : µ = 375 vs H1 : µ < 375.
For the dataset 374.8, 375.0, 375.3, 374.8, 374.4, 374.9, we have x̄ = 374.87, s² = (1/(n−1)) [Σⱼ₌₁ⁿ xⱼ² − n x̄²] = 0.087, and the observed value of the test statistic Tn (= √n (X̄ − 375)/S with n = 6) is
t = √6 (x̄ − 375)/s = −1.1094.
p-value: p = P(t₅ ≤ t) = 0.1589 (using the table or pt(t,5) in R).
Conclusion: The data are consistent with the claim on the label.

In R

t.test(x, alternative="???", mu = µ0) is used to construct the one-sample t-test, where x is a dataset and "???" can be "less", "greater" or "two.sided" depending on the context.

Example 2.1. (cont.)
For the dataset 374.8, 375.0, 375.3, 374.8, 374.4, 374.9, and the hypothesis H0 : µ = 375 vs H1 : µ < 375, we get the result in R as follows:

> x <- c(374.8, 375.0, 375.3, 374.8, 374.4, 374.9)
> t.test(x, alternative="less", mu=375)

        One-sample t-Test

data: x
t = -1.1094, df = 5, p-value = 0.1589
alternative hypothesis: true mean is less than 375
95 percent confidence interval:
       NA 375.1088
sample estimates:
mean of x
 374.8667

Remarks:
• For small sample sizes n (n ≤ 20), the t-test is sensitive to the assumption that the sample is from a normal population. It is recommended that all data be checked for shape by looking at a boxplot and a normal qq-plot of the data (using the R commands boxplot and qqnorm).
• For small sample sizes n (n ≤ 20), it is required that the boxplot of the data be as symmetric as possible and that the points in the qq-plot lie nearly along a straight line. Note that the boxplot and the qq-plot may be misleading if n is too small (n ≤ 10).
• If the sample size is large (n ≥ 20), then as long as the data do not come from a long-tailed distribution, the t-test should be fine to use. This will be discussed in the next lecture.

Test for mean: normal population

The paired sample: Suppose we have a sample of paired observations (i.e., before/after data or twin data). We want to make some statement about the mean difference. A typical example is as follows.

Example 2.2. (From Chapter 11.3 of Rice) Blood samples from 11 individuals before and after they smoked a cigarette are used to measure aggregation of blood platelets.
Before: 25 25 27 44 30 67 53 53 52 60 28;
After: 27 29 37 36 46 82 57 80 61 59 43.
Question: is the aggregation affected by smoking?

Analysis: Let Xj and Yj be the aggregation of blood platelets before and after smoking for the same individual. In Example 2.2, we get a sample of paired observations, (x1, y1), ..., (xn, yn).
Note that Xj and Yj are dependent. However, the question is only concerned with the differences of the paired sample, dj = Xj − Yj, j = 1, 2, ..., n. Because different pairs (Xj, Yj) come from different individuals, we may assume that the dj are independent random variables with dj ∼ N(µ, σ²), where σ² is unknown, and answer the question by testing the following hypothesis:
H : µ = 0 vs HA : µ ≠ 0.
This reduces the paired-sample problem to a single-sample (of differences) problem testing for µ = 0.

In general, let (x1, y1), ..., (xn, yn) be a sample of paired observations. We wish to test
H : µx = µy vs HA : µx > µy, or µx < µy, or µx ≠ µy.
This is the same as observing the differences dj = Xj − Yj, j = 1, ..., n, and testing (writing µd = µx − µy)
H : µd = 0 vs HA : µd > 0, or µd < 0, or µd ≠ 0.

The paired sample t-test
• Assumptions: The dj (= Xj − Yj) are a random sample from a normal population with unknown variance, that is, dj ∼ N(µ, σ²), where σ² is unknown.
• Test statistic (Student's t-statistic): Tn = √n d̄/s_d ∼ t_{n−1} under H, where d̄ = (1/n) Σᵢ₌₁ⁿ dᵢ and s_d² = (1/(n−1)) Σᵢ₌₁ⁿ (dᵢ − d̄)².
• p-value:
  p = P(t_{n−1} ≥ t0) for HA : µd > 0;
  p = P(t_{n−1} ≤ t0) for HA : µd < 0;
  p = 2 P(t_{n−1} ≥ |t0|) for HA : µd ≠ 0,
  where t0 is the observed value of Tn = √n d̄/s_d.

In R

The R function t.test(x, y, alternative="???", mu=0, paired=T) is used for the paired t-test, where x and y are the before and after observations of the dataset (xi, yi) respectively, and "???" can be "less", "greater" or "two.sided" depending on the context. Or, you may use t.test(x-y, alternative="???", mu=0).

Example 2.2 (cont.) Blood samples from 11 individuals before and after they smoked a cigarette are used to measure aggregation of blood platelets.
Before: 25 25 27 44 30 67 53 53 52 60 28;
After: 27 29 37 36 46 82 57 80 61 59 43.
Question: is the aggregation affected by smoking?
Solution: We consider the aggregation differences dj ∼ N(µ, σ²), where σ² is unknown, and test the hypothesis:
H : µ = 0 vs HA : µ ≠ 0.

In R,

> x <- c(25, 25, 27, 44, 30, 67, 53, 53, 52, 60, 28)
> y <- c(27, 29, 37, 36, 46, 82, 57, 80, 61, 59, 43)
> t.test(x, y, alternative="two.sided", mu=0, paired=T)

        Paired t-Test

data: x and y
t = -2.9065, df = 10, p-value = 0.0157
alternative hypothesis: true mean of differences is not equal to 0
95 percent confidence interval:
 -14.93577  -1.97332
sample estimates:
mean of x - y
    -8.454545

The p-value = 0.0157. Hence there is strong evidence against H0; that is, there is strong evidence that the aggregation is affected by smoking.

Remarks:
• It is recommended that the data differences be checked for shape by looking at a boxplot and a normal qq-plot (using the R commands boxplot and qqnorm).
• For small sample sizes n (n ≤ 20), it is required that the boxplot of the data differences be as symmetric as possible and that the points in the qq-plot lie nearly along a straight line. We have to assume the data differences are from a normal distribution if the sample size n is too small (n ≤ 10).
• If the sample size is large (n ≥ 20), then as long as the data differences do not come from a long-tailed distribution, the t-test should be fine to use.

Lecture 3

Tests for population mean
• Large sample tests

If the sample size n (n ≥ 20 in general) is large enough, it is not necessary to assume that the sample is from a normal population.

Tests for mean

Suppose we want to test a population mean µ based on a random sample X1, X2, ..., Xn, where the sample size n is considered to be large enough (n ≥ 20 in general). Let (x1, x2, ..., xn) be an observed value.
If the sample does not come from a heavy-tailed distribution (too many outliers etc. in a boxplot), we may use the following procedures to test hypotheses about the population mean µ.

Common large sample tests
• Hypothesis: H : µ = µ0 vs HA : µ > µ0, or µ < µ0, or µ ≠ µ0.
• Test statistic
  Student's t-statistic: Tn = √n (X̄ − µ0)/S, where X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ and S² = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)²;
  or the z-statistic (given population variance σ²): Zn = √n (X̄ − µ0)/σ, where X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ.
• p-value: Let t be the observed value of Tn or Zn, that is, t = √n (x̄ − µ0)/s, or t = √n (x̄ − µ0)/σ with given σ. If the sample size n is large enough (n ≥ 20 in general), the p-value is approximately given by
  p ≈ 1 − Φ(t) for HA : µ > µ0;
  p ≈ Φ(t) for HA : µ < µ0;
  p ≈ 2 (1 − Φ(|t|)) for HA : µ ≠ µ0.

Remarks
• The paired sample may be discussed in a similar manner.
• The choice of the test statistic is based on the fact that X̄ is a point estimate of µ. It means that X̄ is close to µ0 if H : µ = µ0 is true. If, in reality, µ ≠ µ0, then |X̄ − µ0| is more likely to be large.
• If the sample is from a normal population with given variance σ², the test procedure for µ (with statistic Zn) is still applicable for small sample sizes (n ≤ 20). Note that, under H : µ = µ0, Zn = √n (X̄ − µ0)/σ ∼ N(0, 1) for each n ≥ 1 in this case.
• Calculation of the p-value is based on the central limit theorem.
• The large sample test for µ is equivalent to the t-test if the sample size is large enough (n ≥ 20 in general).

Example 3.1. (From MSWA, Chapter 10) A vice president in charge of sales for a large corporation claims that salespeople are averaging no more than 15 sales contacts each week.
As a check on his claim, 24 salespeople are selected at random, and the number of contacts made by each is recorded for a single randomly selected week as follows:
18, 17, 18, 13, 15, 16, 21, 14, 24, 12, 19, 18, 17, 16, 15, 14, 17, 18, 19, 20, 13, 14, 12, 15
Does the evidence contradict the vice president's claim?

Solution: We are interested in the hypothesis that the vice president's claim is incorrect. Thus, we are interested in testing
H : µ = 15 vs HA : µ > 15,
where µ is the mean number of sales contacts each week.

The data are chosen at random, the sample size is large (n ≥ 20), and the boxplot below shows the data are not too skewed. These facts imply that we can use the test statistic Tn = √n (X̄ − µ0)/S and obtain the p-value by p = 1 − Φ(t), where t = √24 (x̄ − 15)/s.

In R,

> x <- c(18,17,18,13,15,16,21,14,24,12,19,18,
         17,16,15,14,17,18,19,20,13,14,12,15)
> barx <- mean(x)
> s <- sqrt(var(x))
> t <- sqrt(24)*(barx - 15)/s
> p <- 1 - pnorm(t)
> p
[1] 0.007954637

The p-value is less than 0.01. Hence there is very strong evidence to indicate that the vice president's claim is incorrect; that is, the average number of sales contacts per week exceeds 15.

[Figure: boxplot and normal qq-plot of the data, with the quantiles of the standard normal on the horizontal axis.]

One-sample t-Test in R,

> x <- c(18,17,18,13,15,16,21,14,24,12,19,18,
         17,16,15,14,17,18,19,20,13,14,12,15)
> t.test(x, alternative="greater", mu=15)

        One-sample t-Test

data: x
t = 2.411, df = 23, p-value = 0.0121
alternative hypothesis: true mean is greater than 15
95 percent confidence interval:
 15.42167       NA
sample estimates:
mean of x
 16.45833

The p-value is 0.0121. This gives nearly the same conclusion as the large sample test.
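As a cross-check on the arithmetic in Example 3.1, the large-sample p-value can be reproduced outside R. The following Python sketch (standard library only; an illustration, not part of the course material) recomputes t and p = 1 − Φ(t) for the same data.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Sales-contact data from Example 3.1 (n = 24).
x = [18, 17, 18, 13, 15, 16, 21, 14, 24, 12, 19, 18,
     17, 16, 15, 14, 17, 18, 19, 20, 13, 14, 12, 15]

n, mu0 = len(x), 15
t = sqrt(n) * (mean(x) - mu0) / stdev(x)  # observed value of Tn
p = 1 - NormalDist().cdf(t)               # large-sample p-value for HA: mu > 15
print(round(t, 4), round(p, 6))           # t about 2.411, p about 0.00795
```

The values agree with the R session above to rounding.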
Theoretical explanation

The central limit theorem: If X1, X2, ..., Xn are independent and identically distributed random variables with mean µ and finite variance σ², then, for n large enough (n ≥ 20, in general),
√n (X̄ − µ)/σ ∼ N(0, 1) (approximately);
√n (X̄ − µ)/S ∼ N(0, 1) or t_{n−1} (approximately).

Notes:
• Let (x1, x2, ..., xn) be an observed value. If the observations come from a random design, the central limit theorem holds true in general.
• If the boxplot of the data is too skewed, transformations (discussed later) should be done prior to the use of the central limit theorem.
• If n is large enough (n ≥ 20), P(t_n ≤ x) ≈ Φ(x) for each x. See the density plots below. This explains why the large sample test for µ is equivalent to the t-test for µ if the sample size is large enough.

[Figure: density plots of dnorm(x) together with dt(x, df=5) and dt(x, df=20), for x from −4 to 4.]

Some corollaries of the central limit theorem
• If Y ∼ B(n, θ), 0 < θ < 1, then for large n (n ≥ 20),
  (Y − nθ)/√(nθ(1 − θ)) ∼ N(0, 1) (approximately).
• If X1, X2, ..., Xn are iid random variables from a Poisson population X, i.e., for some λ > 0,
  P(X = k) = λᵏ e^(−λ)/k!, k = 0, 1, 2, ...,
  then for large n (n ≥ 20),
  √n (X̄ − λ)/√λ ∼ N(0, 1) (approximately).
  Note that EX = λ and var(X) = λ.
• If X1, X2, ..., Xn are iid random variables from an exponential population X, i.e., for some β > 0, X has density function
  f(x) = (1/β) e^(−x/β) if x ≥ 0, and 0 otherwise,
  then for large n (n ≥ 20),
  √n (X̄ − β)/β ∼ N(0, 1) (approximately).
  Note that EX = β and var(X) = β².

Example 3.2. (From MSWA, Chapter 10) A manufacturer of automatic washers offers a particular model in one of three colors: A, B or C. Of the first 100 washers sold, 40 were of color A.
Would you conclude that customers have a preference for color A?

Solution: Let θ denote the probability that a customer prefers color A, and Y denote the number of customers who prefer color A among the first 100 customers. Then Y ∼ B(100, θ). If customers have no preference for color A, then θ = 1/3. Otherwise θ > 1/3. Hence we have to test the following hypothesis:
H : θ = 1/3 vs HA : θ > 1/3.

By the central limit theorem, we can use the test statistic:
Tn = (Y − 100/3)/√(100 (1/3)(1 − 1/3)) = 3(Y − 100/3)/√200.
The observed value of this test statistic is given by
t = 3(40 − 100/3)/√200 = 20/√200 = 1.414.
The p-value is p ≈ 1 − Φ(t) = 0.078, approximately.
Conclusion: There is borderline evidence against the null hypothesis H; that is, there is borderline evidence that customers prefer color A.

Lecture 4

Fixed level tests for mean
• Significance level. Decision rule.
• Critical values and fixed-level tests.
• Rejection region

Test for mean (Summary)

Suppose we want to test the mean µ of a population with data x = (x1, x2, ..., xn).
• Hypothesis: H : µ = µ0 vs HA : µ > µ0, or ...
• Test statistic: T = T(X1, X2, ..., Xn) (t-test and z-test).
• p-value: p = PH(T ≥ t) or ..., where t = T(x1, x2, ..., xn).
• Decision: "interpreting" the p-value:

p-value > 0.10            Data consistent with H
p-value ∈ (0.05, 0.10)    Borderline evidence against H
p-value ∈ (0.025, 0.05)   Reasonably strong evidence against H
p-value ∈ (0.01, 0.025)   Strong evidence against H
p-value < 0.01            Very strong evidence against H

Significance level

In this course, to test a hypothesis, our decision rule is to compare the p-value to fixed values 0.05, 0.10, etc. In practice, based on their requirements, one may make one's own decision rule.
One may give a preassigned level, say α, and then decide simply to accept H or reject H (accept HA) according to whether p > α or p ≤ α. The value α is usually called the significance level of the test; it is the boundary between accepting and rejecting the null hypothesis.

Critical value and rejection region

For the hypothesis H : µ = µ0 vs HA : µ > µ0, or ..., given a significance level α, we may find a cα such that
PH(T ≥ cα) = α, or ...
This cα is called the critical value at the significance level α. The critical value depends on both the distribution of T under H and on α.

The DECISION RULE at level α can be defined in terms of the observed value t of the test statistic T, with the following convention:
if t > cα, we reject H, or ...;
if t < cα, we accept H, or ...

Summary (fixed level α tests)
• Hypothesis: H : µ = µ0 vs HA : µ > µ0, or ...
• Test statistic: T = T(X1, X2, ..., Xn).
• Critical value: cα, which is given by PH(T ≥ cα) = α, or ...
• Decision: reject H if t > cα, or ...; accept H if t < cα, or ..., where t is an observed value of the test statistic T.

Test for mean (fixed level α)

Suppose we have a sample of size n drawn from some population. Given a significance level α, we want to make some statement about the mean.

Assumption 1: (X1, X2, ..., Xn) is a random sample from a normal population with unknown variance σ², i.e., Xj ∼ N(µ, σ²), where σ² is unknown.
• Hypothesis: H : µ = µ0 vs HA : µ > µ0, or µ < µ0, or µ ≠ µ0.
• Test statistic: Tn = √n (X̄ − µ0)/S, where X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ and S² = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)².
• Decision: Let t be an observed value of Tn.
We reject H and accept HA
if t > tα (for HA : µ > µ0);
if t < −tα (for HA : µ < µ0);
if |t| ≥ tα/2 (for HA : µ ≠ µ0),
where tβ (β = α or α/2) is the value given by P(t_{n−1} ≥ tβ) = β.

Table 3 is used to find tα. With α = 0.1, tα(3) = 1.638, tα(5) = 1.476, ...; with α = 0.05, tα(3) = 2.353, tα(5) = 2.015, ...
In R, qt(1 − β, n − 1) gives tβ(n − 1).

Computing Critical Values in R

Student's t-distribution with n degrees of freedom:
> qt(p, n) # = v, P(t_n < v) = p.
Standard normal distribution:
> qnorm(p) # = v, P(Z < v) = p.
χ²-distribution with n degrees of freedom:
> qchisq(p, n) # = v, P(χ²_n < v) = p.
F-distribution with n1 and n2 degrees of freedom:
> qf(p, n1, n2) # = v, P(F_{n1,n2} < v) = p.

Example

Let's re-visit Example 2.2. (From Chapter 11.3 of Rice) Blood samples from 11 individuals before and after they smoked a cigarette are used to measure aggregation of blood platelets.
Before: 25 25 27 44 30 67 53 53 52 60 28;
After: 27 29 37 36 46 82 57 80 61 59 43.
Question: is the aggregation affected by smoking at the 0.05 significance level?

Solution: This is a paired sample. We may reduce the problem to one sample by considering the aggregation differences dj (= yj − xj):
2, 4, 10, -8, 16, 15, 4, 27, 9, -1, 15

The sample size is not large enough (n ≤ 20), so we need to assume that the dj are from a normal population, i.e., dj ∼ N(µ, σ²). We are interested in testing:
H : µ = 0 vs HA : µ ≠ 0.
Since the population variance is unknown, the test statistic is chosen to be Tn = √n (d̄ − 0)/s_d, with s_d² = (1/(n−1)) Σⱼ₌₁ⁿ (dⱼ − d̄)².
Note that d̄ = 8.45, s_d² = 93.07 and n = 11. The observed value of Tn is t = √11 (d̄ − 0)/s_d = 2.90.
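The quantities d̄, s_d² and t just computed can be checked numerically. A quick Python check (an illustration only; the course itself works in R):

```python
from math import sqrt
from statistics import mean, stdev

# Aggregation differences d_j = y_j - x_j from Example 2.2.
d = [2, 4, 10, -8, 16, 15, 4, 27, 9, -1, 15]

n = len(d)               # n = 11
dbar = mean(d)           # about 8.45
sd = stdev(d)            # sample sd of the differences; sd**2 about 93.07
t = sqrt(n) * dbar / sd  # observed value of Tn, about 2.9065
print(dbar, sd**2, t)
```

The value t ≈ 2.9065 matches the R output in the earlier treatment of this example; the slides round it to 2.90.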
With the given significance level α = 0.05, the critical value for HA : µ ≠ 0 is t0.025(10) = 2.228. Because t = 2.90 > t0.025(10), we reject the null hypothesis and accept the alternative hypothesis; that is, the aggregation is affected by smoking at the 0.05 significance level.

Test for mean (fixed level α)

Suppose we have a sample of size n drawn from some population. Given a significance level α, we want to make some statement about the mean.

Assumption 2: The sample (X1, X2, ..., Xn) is not from a population having a long tail, and the sample size n is large (n ≥ 20).
• Hypothesis: H : µ = µ0 vs HA : µ > µ0, or µ < µ0, or µ ≠ µ0.
• Test statistic
  Student's t-statistic: Tn = √n (X̄ − µ0)/S, where X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ and S² = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)²;
  or the z-statistic (given population variance σ²): Zn = √n (X̄ − µ0)/σ.
• Decision: Let t be an observed value of Tn or Zn, that is, t = √n (x̄ − µ0)/s, or t = √n (x̄ − µ0)/σ with given σ.
We reject H and accept HA
if t > zα (for HA : µ > µ0);
if t < −zα (for HA : µ < µ0);
if |t| ≥ zα/2 (for HA : µ ≠ µ0),
where zβ (β = α or α/2) is the value approximately given by Φ(zβ) = 1 − β.
Table 2 can be used to find zα. For instance, z0.025 = 1.96, z0.05 = 1.645, z0.10 = 1.28.

Remarks
• The paired sample may be discussed in a similar manner.
• If the sample is from a normal population with given variance σ², the testing procedure (with statistic Zn) is still applicable for small sample sizes (n ≤ 20).
• The set {x : t ≥ tα}, ..., or {x : t ≥ zα}, ..., is generally called the rejection region. If an observation x is in the rejection region, then we say that we reject the null hypothesis at the level α, or that the test is significant at the level α.
• If the sample size is large enough (n ≥ 20), the rejection regions of the large sample test and the t-test are nearly the same.

Example 4.1. Let X1, X2, ..., X25 be a sample from a normal distribution with mean µ and variance 100. Find the rejection region for a test at level α = 0.10 of the hypothesis:
H : µ = 0 vs HA : µ > 0,
based on an observed value of the sample mean X̄.
Solution: The sample is from a normal population with given variance σ² = 100. It is known that we should reject the null hypothesis at level α = 0.10 if
x̄ ≥ z0.10 σ/√n = 1.28 × 10/5 = 2.56,
so the rejection region is {x̄ ≥ 2.56}.

Lecture 5

Performances of statistical tests

Some tests may perform better than others.
• Type I and Type II errors.
• Power of a fixed-level test.
• A method to improve power: increasing the sample size.

Significance tests with fixed level α give a rule for making a decision: if the value of the test statistic falls in the rejection region, we reject the null hypothesis H and accept the alternative hypothesis HA. Note that for any fixed rejection region, two types of errors can be made in reaching a decision.
A type I error: we reject H, but in fact H is true.
A type II error: we accept H, but in fact H is false.

Revisit the test for mean µ (fixed level α)
• Hypothesis: H : µ = µ0 vs HA : µ > µ0, or ...
• Test statistic: T = T(X1, X2, ..., Xn).
• Decision: We reject H and accept HA if t ≥ cα (or ...), where t is an observed value of T.
A type I error: with some observed value t ≥ cα, or ..., we reject H : µ = µ0, but in fact H : µ = µ0 is true.
A type II error: with some observed value t < cα, or ..., we accept H : µ = µ0, but in fact H : µ = µ0 is false (hence the true value of µ is some µ1 > µ0).
Note: The discussion is only for HA : µ > µ0, but the idea still holds for other cases.
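The fact that the type I error rate of a fixed-level test equals α can also be seen by simulation. The following Python sketch (a Monte Carlo illustration, not from the slides) repeatedly draws samples under H : µ = 0 with σ = 1 and n = 9, applies the level-0.05 z-test, and records how often H is rejected; the rejection rate should be close to α = 0.05.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(42)
n, alpha, trials = 9, 0.05, 20000
c = NormalDist().inv_cdf(1 - alpha)  # critical value z_0.05, about 1.645

rejections = 0
for _ in range(trials):
    x = [random.gauss(0, 1) for _ in range(n)]  # sample drawn under H: mu = 0
    z = sqrt(n) * sum(x) / n                    # z-statistic with mu0 = 0, sigma = 1
    if z >= c:                                  # type I error: reject a true H
        rejections += 1

print(rejections / trials)  # close to alpha = 0.05
```

With 20000 trials the estimated rejection rate is within Monte Carlo error of 0.05.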
We have that
the probability of a Type I error = PH(T ≥ cα) = α,
i.e. the significance level of the test.

With the true value µ = µ1, the probability of a Type II error is given by
βα(µ1) = Pµ1(T < cα), µ1 > µ0,
where Pµ1(·) denotes that this probability is calculated on the assumption that µ = µ1.

1 − βα(µ1) is called the power of the test at the significance level α for µ1 ∈ HA. Note that the power
Power(µ1) := 1 − βα(µ1) = Pµ1(T ≥ cα)
is a function of the parameter µ1 ∈ HA. It is the probability of rejecting H when the true value is µ = µ1.

Remarks

Recall that we may have different tests (test statistics) for the same hypothesis H vs HA.
• With a given significance level α, ideally we would like a test whose power Power(µ1) is 1 for all µ1 ∈ HA. However, this is NOT possible.
• A more practical approach is to choose the test so that the power is as large as possible for all, or at least for some, µ1 ∈ HA. This is sometimes possible, and the theory of optimal tests addresses this question. This course will not consider the formal theory, but will simply choose tests which seem to produce high power.

Power of the z-test

Example 5.1. Suppose that X1, X2, ..., Xn are independent N(µ, 1), and we wish to test
H : µ = 0 vs HA : µ > 0
at the level α = 0.05. Find the power of the z-test.

With given variance σ² = 1, the test statistic is given by
Zn = √n (X̄ − 0)/1 = √n X̄.

The power of the z-test for µ ∈ HA is calculated by
Power(µ) = Pµ(Zn ≥ z0.05) = Pµ(√n (X̄ − µ) ≥ z0.05 − √n µ) = 1 − Φ(1.645 − √n µ).

With n = 9: Power(0.2) = 0.148, Power(0.5) = 0.442, Power(1) = 0.912, ...
With n = 16: Power(0.2) = 0.199, Power(0.5) = 0.639, Power(1) = 0.99, ...
With n = 25: Power(0.2) = 0.259, Power(0.5) = 0.803, Power(1) = 0.999, ...
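The power values tabulated above can be reproduced directly from the formula Power(µ) = 1 − Φ(z0.05 − √n µ). A minimal Python sketch (the notes use R; this is an equivalent standard-library version):

```python
from statistics import NormalDist

def power(mu, n, alpha=0.05):
    """Power of the one-sided z-test of H: mu = 0 vs HA: mu > 0
    at level alpha, for a sample of size n from N(mu, 1):
        Power(mu) = 1 - Phi(z_alpha - sqrt(n) * mu)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_alpha - n**0.5 * mu)

for n in (9, 16, 25):
    print(n, [round(power(mu, n), 3) for mu in (0.2, 0.5, 1.0)])
# For n = 9 this gives [0.148, 0.442, 0.912], matching the table in the notes.
```

Note the power increases both in µ (alternatives farther from 0 are easier to detect) and in n, which is the theme of the remarks that follow.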
[Figure (Power curve): two panels — the power as a function of µ for n = 9, and the power as a function of n for µ = 0.3; vertical axis: power, 0.0 to 1.0.]

Remarks

There are four ways to increase the power:
• Increase α. A 5% test of significance will have a greater chance of rejecting the null hypothesis than a 1% test, because less evidence is required for rejection.
• Consider a particular alternative that is farther away from µ0. Values of µ that are in HA but lie close to the hypothesized value µ0 are harder to detect (lower power) than values of µ that are far from µ0.
• Increase the sample size. More data provide more information about µ through x̄, so we have a better chance of distinguishing values of µ.
• Decrease σ. This has the same effect as increasing the sample size: more information about µ.

Choice of sample size

Let us consider the following hypothesis:
H : µ = µ0 vs HA : µ > µ0.
We may be interested in: what is the smallest sample size such that the test at the level α has power at least 100(1 − β)% when µ = µ1?

Example 5.2. Suppose X1, ..., Xn are independent N(µ, 1), and we wish to test
H : µ = 0 vs HA : µ > 0.
How large must n be to be at least 95% sure of finding evidence at the 5% level when µ = 1?

Solution: From Example 5.1, Power(1) = 1 − Φ(1.645 − √n). We need
1 − Φ(1.645 − √n) ≥ 0.95, i.e. Φ(1.645 − √n) ≤ 0.05,
which holds when 1.645 − √n ≤ −1.645, i.e. √n ≥ 3.29, giving n ≥ 10.83. So n = 11 suffices.

Example 5.3. We plan to observe X from B(n, θ). Consider the hypothesis:
H : θ = 1/2 vs HA : θ = 0.6.
How large must n be to make it 90% certain that evidence against H will be obtained at the 5% level?
Solution: We will use a normal approximation. We have a result significant at the 0.05 level if
(X − n/2)/√(n/4) ≥ 1.645,
i.e.
X ≥ 1.645 √(n/4) + n/2.
The power for the test at the 5% level is
Power(0.6) = P0.6(X ≥ 1.645 √(n/4) + n/2)
≈ 1 − Φ((1.645 √(n/4) + n/2 − 0.6n)/√(0.6 × 0.4 × n)).

So to get Power(0.6) = 0.9 we need to equate this probability with 0.9. Note that Φ(−1.28) = 0.10. We need to solve
(1.645 √(n/4) − n/10)/√(0.24 n) = −1.28,
giving
n = ((1.28 √0.24 + 0.8225)/0.1)² = 210.1.
So the sample size must be more than 210.

More exact results can now be obtained using the R functions qbinom and pbinom, with n = 210 as a starting point:

> qbinom(.95, 210, .5)    # gives 117
> 1 - pbinom(117, 210, .6)
[1] 0.8840604             ## power only 88%
> qbinom(.95, 211, .5)    # gives 117
> 1 - pbinom(117, 211, .6)
[1] 0.8990565             ## almost there!
> qbinom(.95, 212, .5)    # gives 118
> 1 - pbinom(118, 212, .6)
[1] 0.8883305             ## worse!
> qbinom(.95, 213, .5)    # gives 118
> 1 - pbinom(118, 213, .6)
[1] 0.9028448             ## OK

So we need n = 213 and critical region x ≥ 119.

Lect.6 STAT2912

Lecture 6 Confidence intervals and tests
• Definition of confidence interval
• Confidence interval for the mean
• Relationships between testing procedures and confidence intervals

Confidence intervals and tests

Definition of confidence interval

A point estimate of a parameter θ is a "guess" of the unknown θ without any discussion of accuracy. A method of giving accurate information about the parameter θ is to find an interval estimate, or confidence interval, for it.

Definition: Let θ̂L and θ̂R be two statistics. If
P(θ̂L ≤ θ ≤ θ̂R) = 1 − α,
then the random interval [θ̂L, θ̂R] is called a 100(1 − α)% confidence interval for θ, and 100(1 − α)% is called the confidence level of the interval. Typically, α is chosen to be 0.01, 0.05, 0.10, etc., and then we get a 99%, 95%, 90% confidence interval accordingly.
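The exact search carried out above with qbinom and pbinom can be replicated without R. A minimal Python sketch, assuming only the standard library (`math.comb` gives the exact binomial CDF; the helper names are illustrative, not from the notes):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def exact_power(n, alpha=0.05, p0=0.5, p1=0.6):
    """Exact power of the one-sided test of H: theta = p0 vs HA: theta = p1.

    The critical region is X >= c, where c is the smallest value with
    P_{p0}(X >= c) <= alpha (i.e. c = qbinom(1 - alpha, n, p0) + 1 in R).
    Returns (c, power at p1)."""
    c = next(k for k in range(n + 1) if 1 - binom_cdf(k - 1, n, p0) <= alpha)
    return c, 1 - binom_cdf(c - 1, n, p1)

for n in (210, 211, 212, 213):
    c, pw = exact_power(n)
    print(n, c, round(pw, 7))
# n = 213 gives critical region x >= 119 with power about 0.9028,
# matching the R session in the notes.
```

Note the power is not monotone in n here (it dips at n = 212) because the discrete critical value jumps from 118 to 119, which is why the search must step through each n.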
Remarks:
• There may be many confidence intervals for θ. The ideal one should be as narrow as possible.
• θ̂L can be chosen to be −∞, and θ̂R can be chosen to be ∞. Then we get the so-called one-sided confidence intervals (−∞, θ̂R] and [θ̂L, ∞) accordingly. In these cases, θ̂R and θ̂L are also called a 100(1 − α)% upper bound for θ and a 100(1 − α)% lower bound for θ, respectively.

Confidence intervals for the mean µ

Assumption 1: (X1, X2, ..., Xn) is a random sample from a normal population, that is, Xj ∼ N(µ, σ²), where σ² is unknown.

A 100(1 − α)% confidence interval for µ is
[X̄ − t_{α/2,n−1} S/√n, X̄ + t_{α/2,n−1} S/√n],
where X̄ and S are defined as before, and t_{α/2,n−1} (denoted by t_{α/2} below) is given by
P(t_{n−1} ≥ t_{α/2}) = α/2.
100(1 − α)% one-sided confidence intervals for µ are given by
[X̄ − tα S/√n, ∞); (−∞, X̄ + tα S/√n].

In fact, by using the fact that for any µ,
√n (X̄ − µ)/S ∼ t_{n−1},
P[X̄ − t_{α/2} S/√n ≤ µ ≤ X̄ + t_{α/2} S/√n]
= P[−t_{α/2} ≤ √n (X̄ − µ)/S ≤ t_{α/2}]
= 1 − α.
This gives the result on the two-sided confidence interval for µ by definition. By analogous arguments, the one-sided confidence intervals for µ are derived.

Relationships between testing procedures and confidence intervals

Consider testing for the mean µ:
H : µ = µ0 vs HA : µ ≠ µ0
at the level α. Under Assumption 1, the result in Lecture 4 indicates that we would use a t-test based on the test statistic
Tn = √n (X̄ − µ0)/S,
and would reject H if the observed value t of Tn fell in the rejection region {|t| > t_{α/2}}. The complement of the rejection region is sometimes called the "acceptance region" for the test.
Hence, for any of our two-sided tests at level α, the "acceptance region" is given by
{|t| < t_{α/2}}.
That is, we do not reject H : µ = µ0 in favour of HA : µ ≠ µ0 if
−t_{α/2} < √n (x̄ − µ0)/s < t_{α/2}.
Restated, the null hypothesis is not rejected (is accepted) at level α if
x̄ − t_{α/2} s/√n < µ0 < x̄ + t_{α/2} s/√n.
Notice that x̄ − t_{α/2} s/√n and x̄ + t_{α/2} s/√n are just the lower and upper endpoints, respectively, of a 100(1 − α)% two-sided confidence interval for µ.

Thus, a duality exists between confidence intervals and tests. To test the hypothesis (at level α)
H : µ = µ0 vs HA : µ ≠ µ0:
• accept H : µ = µ0 (at level α) if µ0 lies inside the 100(1 − α)% confidence interval for µ
[x̄ − t_{α/2} s/√n, x̄ + t_{α/2} s/√n];
• reject H : µ = µ0 if µ0 lies outside the interval.
Equivalently, a 100(1 − α)% confidence interval for µ can be interpreted as the set of all values of µ0 for which H : µ = µ0 is "acceptable" at level α.

Confidence intervals for the mean µ

Assumption 2: The sample (X1, X2, ..., Xn) is from a population without long tails (the boxplot of the data is not too skew), and the sample size n is large (n ≥ 20).

A 100(1 − α)% confidence interval for µ (approximately):
Unknown population variance σ²:
[X̄ − z_{α/2} S/√n, X̄ + z_{α/2} S/√n],
where X̄ and S are defined as before, and z_{α/2} is given by Φ(z_{α/2}) = 1 − α/2.
Given population variance σ²: as above, except that S is replaced by σ.

Remarks
• The large-sample confidence intervals are obtained by using the central limit theorem (see Lecture 3).
• By analogous arguments as in Assumption 1:
(a). Assume that Assumption 2 holds true and σ² is unknown.
To test the hypothesis (at level α)
H : µ = µ0 vs HA : µ ≠ µ0:
∗ accept H : µ = µ0 if µ0 lies inside
[x̄ − z_{α/2} s/√n, x̄ + z_{α/2} s/√n];
∗ reject H : µ = µ0 if µ0 lies outside the interval.

(b). Assume that Assumption 2 holds true and σ² is given. To test the hypothesis (at level α)
H : µ = µ0 vs HA : µ ≠ µ0:
∗ accept H : µ = µ0 if µ0 lies inside
[x̄ − z_{α/2} σ/√n, x̄ + z_{α/2} σ/√n];
∗ reject H : µ = µ0 if µ0 lies outside the interval.

One-sided confidence intervals and duality

Under Assumption 1, to test the hypothesis (at level α)
H : µ = µ0 vs HA : µ > µ0 / µ < µ0:
• accept H : µ = µ0 if µ0 lies inside
[x̄ − tα s/√n, ∞) / (−∞, x̄ + tα s/√n];
• reject H : µ = µ0 if µ0 lies in
(−∞, x̄ − tα s/√n] / [x̄ + tα s/√n, ∞).

Under Assumption 2, to test the hypothesis (at level α)
H : µ = µ0 vs HA : µ > µ0 / µ < µ0:
• (unknown σ²) accept H : µ = µ0 if µ0 lies inside
[x̄ − zα s/√n, ∞) / (−∞, x̄ + zα s/√n];
reject H : µ = µ0 if µ0 lies in
(−∞, x̄ − zα s/√n] / [x̄ + zα s/√n, ∞).
• (given σ²) accept H : µ = µ0 if µ0 lies inside
[x̄ − zα σ/√n, ∞) / (−∞, x̄ + zα σ/√n];
reject H : µ = µ0 if µ0 lies in
(−∞, x̄ − zα σ/√n] / [x̄ + zα σ/√n, ∞).

Example 6.1. A brand of beer claims its beer content is 375 millilitres on the label. A sample of 40 bottles of the beer gave a sample average of 373.9 and a standard deviation of 2.5. Questions:
(a). Is there evidence that the mean content of the beer is less than the 375 mL claimed on the label at level α = 0.05?
(b). Construct a 95% one-sided confidence interval, related to the hypothesis in (a), for the true average beer content.
(c). Based on this interval, should the claim on the label be rejected?
(d).
To accept the claim on the label at level α = 0.05, how many millilitres do we need for the sample average, with a standard deviation of 2.5?

Solution: (a). Since we need to test whether the mean content µ is less than the claimed 375, the hypothesis to be tested is
H : µ = 375 vs HA : µ < 375.
The population variance is unknown, so we use the Student t-statistic Tn = √n (X̄ − 375)/S. The sample size is large (n = 40), so the critical value for the test at the level α = 0.05 is given by −z0.05 = −1.645. The observed value of Tn is
t = √40 (373.9 − 375)/2.5 = −2.7828 < −z0.05.
So we should reject the null hypothesis H at the level α = 0.05; that is, there is evidence that the mean content of the beer is less than the 375 mL at the level α = 0.05.

(b). With unknown population variance, the 95% one-sided confidence interval related to the hypothesis in (a) for the true average beer content is
(−∞, 373.9 + z0.05 s/√40] = (−∞, 374.55].

(c). Based on this interval, we should reject the claim on the label because 375 lies outside the interval. This is the same conclusion as in question (a).

(d). To make the claim on the label acceptable at level α = 0.05, the sample average should satisfy
375 ≤ x̄ + z0.05 s/√40.
With s = 2.5, x̄ should be at least
375 − 1.645 × 2.5/√40 = 374.3498.
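The arithmetic in Example 6.1 is easy to check in code. This is not part of the original notes (the course uses R); it is an equivalent sketch using only Python's standard library:

```python
from math import sqrt
from statistics import NormalDist

# Example 6.1: n = 40 bottles, xbar = 373.9, s = 2.5, claimed mu = 375;
# large-sample one-sided test of H: mu = 375 vs HA: mu < 375 at alpha = 0.05.
n, xbar, s, mu0, alpha = 40, 373.9, 2.5, 375.0, 0.05
z = NormalDist().inv_cdf(1 - alpha)   # z_{0.05}, approx 1.645

# (a) observed test statistic
t = sqrt(n) * (xbar - mu0) / s
print(round(t, 4))                    # -2.7828, below -1.645: reject H

# (b) 95% upper confidence bound: (-inf, xbar + z_{0.05} s / sqrt(n)]
upper = xbar + z * s / sqrt(n)
print(round(upper, 2))                # 374.55, so 375 lies outside

# (d) smallest xbar for which the claim is acceptable at level 0.05,
# from 375 <= xbar + z_{0.05} s / sqrt(n):
xbar_min = mu0 - z * s / sqrt(n)
print(round(xbar_min, 4))             # 374.3498
```

This also illustrates the duality from earlier in the lecture: rejecting H in (a) and 375 falling outside the one-sided interval in (c) are the same decision.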