A Solution Manual For the Book:
Probability and Statistics:
For Engineering and the Sciences (7th Edition)
by Jay L. Devore
John L. Weatherwax∗
October 18, 2005
Introduction
This solution manual was prepared for the seventh edition of Devore’s textbook. I would
suspect that other editions would be very similar.
Some Useful Formulas in Linear Regression
SST = Syy = Σ (yi − ȳ)^2 = Σ yi^2 − (1/n)(Σ yi )^2    total sum of squares (1)
SSE = Σ (yi − ŷi )^2    error sum of squares (2)
SSR = SST − SSE    regression sum of squares (3)
R^2 = SSR/SST = 1 − SSE/SST . (4)
Note that R2 is the percent of variance explained and can be calculated both in and out of
sample (with coefficients estimated using the in sample data).
∗ [email protected]
Probability
Problem Solutions
Note all R scripts for this chapter (if they exist) are denoted as ex2 NN.R where NN is the
exercise number.
Exercise 2.1
Part (a): The sample space S is the set of all possible outcomes. Thus using the integer
shorthand suggested in the problem for all of the possible outcomes we have
S = {1324, 1342, 3124, 3142, 1423, 1432, 4123, 4132, 2314, 2341, 3214, 3241, 2413, 2431, 4213, 4231} .
From which we see that there are 16 elements.
Part (b): This would be all number strings that begin with a 1 or
A = {1324, 1342, 1423, 1432} .
Part (c): This would be all number strings from S with a two in the first or second position
or
B = {2314, 2341, 3214, 3241, 2413, 2431, 4213, 4231} .
Part (d): This is the union of the two sets A and B or
A ∪ B = {1324, 1342, 1423, 1432, 2314, 2341, 3214, 3241, 2413, 2431, 4213, 4231} .
Now A∩B = ∅ and A′ = {3124, 3142, 4123, 4132, 2314, 2341, 3214, 3241, 2413, 2431, 4213, 4231}.
Exercise 2.2
Part (a): These would be
A = {LLL, RRR, SSS} .
Part (b): This would be
B = {LRS, LSR, RLS, RSL, SLR, SRL} .
Part (c): This would be
C = {RRL, RRS, RLR, RSR, LRR, SRR} .
Part (d): This would be
D = {RRL, RRS, RLR, RSR, LRR, SRR, LLR, LLS, LRL, LSL, RLL, SLL, SSR, SSL, SLS, SRS, LSS, RSS} .
Part (e): We have
D ′ = {LLL, RRR, SSS, LRS, RSL, SLR, LSR, RLS, SRL} ,
that is, the event that all cars go in the same direction or no two cars go in the same direction. We
have that C ∪ D = D and C ∩ D = C as C is a subset of D.
Exercise 2.3
The total sample space for this problem would be
S = {(S, S, S), (S, S, F ), (S, F, S), (F, S, S), (F, F, S), (F, S, F ), (S, F, F ), (F, F, F )} .
Part (a): This would be
A = {(S, S, F ), (S, F, S), (F, S, S)} .
Part (b): This would be
B = {(S, S, S), (S, S, F ), (S, F, S), (F, S, S)} .
Part (c): This would happen if 1 functions and 2 or 3 (or both) function. These events are
C = {(S, S, S), (S, S, F ), (S, F, S)} .
Part (d): We have
C′
A∪C
A∩C
B∪C
B∩C
= {(F, S, S), (F, F, S), (F, S, F ), (S, F, F ), (F, F, F )}
= {(S, S, S), (S, S, F ), (S, F, S), (F, S, S)}
= {(S, S, F ), (S, F, S)}
= B since C is a subset of B
= C since C is a subset of B .
Exercise 2.4
Part (a): We have
S = {F F F F,
F F F V, F F V F, F V F F, V F F F,
F F V V, F V F V, V F F V, F V V F, V F V F, V V F F,
V V V F, V V F V, V F V V, F V V V,
V V V V } .
Part (b): These would be
{F F F V, F F V F, F V F F, V F F F } .
Part (c): These would be
{F F F F, V V V V } .
Part (d): These would be
{F F F F, F F F V, F F V F, F V F F, V F F F } .
Part (e): The union is
{F F F F, F F F V, F F V F, F V F F, V F F F, V V V V } ,
while the intersection is
{F F F F } .
Part (f): The union is
{F F F F, F F F V, F F V F, F V F F, V F F F, V V V V } ,
while the intersection is the empty set ∅.
Exercise 2.5
Part (a): We could have 1, 2, or 3 for the first person's station assignment, 1, 2, or 3 for the
second person's station assignment, and 1, 2, or 3 for the third person's assignment. Thus the
sample space would be tuples of the form (i, j, k) where i, j, and k are taken from {1, 2, 3}.
Part (b): This would be the outcomes
{(1, 1, 1) , (2, 2, 2) , (3, 3, 3)} .
Part (c): This would be the outcomes
{(1, 2, 3) , (1, 3, 2) , (2, 1, 3) , (2, 3, 1) , (3, 1, 2) , (3, 2, 1)} .
Part (d): This could be obtained by enumerating all elements from S (as in Part (a)) and
removing any elementary events that have a two in them.
Exercise 2.6
Part (a): Our sample space is
S = {3, 4, 5, 13, 14, 15, 23, 24, 25, 123, 124, 125, 213, 214, 215} .
Part (b): This would be
A = {3, 4, 5} .
Part (c): This would be
B = {5, 15, 25, 125, 215} .
Part (d): This would be
C = {3, 4, 5, 23, 24, 25} .
Exercise 2.8 (the language of sets)
Part (a): A1 ∪ A2 ∪ A3 .
Part (b): A1 ∩ A2 ∩ A3 .
Part (c): A1 ∩ (A2 ∪ A3 )′ .
Part (d): (A1 ∩ A′2 ∩ A′3 ) ∪ (A′1 ∩ A2 ∩ A′3 ) ∪ (A′1 ∩ A′2 ∩ A3 ).
Part (e): A1 ∪ (A2 ∩ A3 ).
Exercise 2.10
Part (a): Three events that are mutually exclusive are the events that the type of car bought
comes from one of the sets
A = {Chevrolet, Pontiac, Buick}
B = {Ford, Mercury}
C = {Plymouth, Chrysler} .
Then A, B, and C are mutually exclusive.
Part (b): No. Consider the sets A, B = A and C defined as in Part (a). That is take B to
be the same set as A. Then A ∩ B ∩ C = ∅ but A and B are equal and cannot be mutually
exclusive.
Exercise 2.11
Part (a): 0.07
Part (b): 0.15 + 0.1 + 0.05 = 0.3.
Part (c): 1 − 0.18 − 0.25 = 1 − 0.43 = 0.57.
Exercise 2.12
Part (a): This is
P (A ∪ B) = P (A) + P (B) − P (A ∩ B) = 0.5 + 0.4 − 0.25 = 0.65 . (5)
Part (b): This would be
P ((A ∪ B)′ ) = 1 − P (A ∪ B) = 1 − 0.65 = 0.35 .
Part (c): This would be the event A ∩ B ′ . To compute its probability we recall that
P (A) = P (A ∩ B) + P (A ∩ B ′ ) or 0.5 = 0.25 + P (A ∩ B ′ ) ,
so P (A ∩ B ′ ) = 0.25.
Exercise 2.13
Part (a): A1 ∪ A2 is the event that we are awarded project 1 or project 2. Its probability
can be calculated as
P (A1 ∪ A2 ) = P (A1 ) + P (A2 ) − P (A1 ∩ A2 ) = 0.22 + 0.25 − 0.11 = 0.36 .
Part (b): Since A′1 ∩ A′2 = (A1 ∪ A2 )′ this event is the outcome that we don’t get project 1
or project 2. This probability is then given by
1 − P (A1 ∪ A2 ) = 0.64 .
Part (c): The event A1 ∪ A2 ∪ A3 is the outcome that we get at least one of the projects 1,
2, or 3. Its probability is given by
P (A1 ∪ A2 ∪ A3 ) = P (A1 ) + P (A2 ) + P (A3 ) − P (A1 ∩ A2 ) − P (A1 ∩ A3 ) − P (A2 ∩ A3 ) + P (A1 ∩ A2 ∩ A3 )
= 0.22 + 0.25 + 0.28 − 0.11 − 0.05 − 0.07 + 0.01 = 0.53 .
Part (d): This event is we don’t get any of the three projects. Using the identity
A′1 ∩ A′2 ∩ A′3 = (A1 ∪ A2 ∪ A3 )′ ,
its probability is given by
P (A′1 ∩ A′2 ∩ A′3 ) = 1 − P (A1 ∪ A2 ∪ A3 ) = 1 − 0.53 = 0.47 .
Part (e): This is the event that we don’t get project 1 and 2 but do get project 3. Its
probability is given by using the fact that
P ((A′1 ∩ A′2 ) ∩ A3 ) + P ((A′1 ∩ A′2 ) ∩ A′3 ) = P (A′1 ∩ A′2 ) ,
or with what we know
P (A′1 ∩ A′2 ∩ A3 ) + 0.47 = 0.64 so P (A′1 ∩ A′2 ∩ A3 ) = 0.17 .
Part (f): This is the event that we get neither project 1 nor 2, or we do get project three. To
find its probability we first notice that
(A′1 ∩ A′2 ) ∪ A3 = (A1 ∪ A2 )′ ∪ A3 = ((A1 ∪ A2 ) ∩ A′3 )′ .
Thus if we can compute the probability of (A1 ∪ A2 ) ∩ A′3 we can compute the desired
probability. To compute this probability note that
[(A1 ∪ A2 ) ∩ A′3 ] ∪ [(A1 ∪ A2 ) ∩ A3 ] = A1 ∪ A2 ,
and the two sets on the left-hand-side are disjoint so we have
P ((A1 ∪ A2 ) ∩ A′3 ) + P ((A1 ∪ A2 ) ∩ A3 ) = P (A1 ∪ A2 ) . (6)
From Part (a) we know the value of the right-hand-side is 0.36. To compute P ((A1 ∪A2 )∩A3 )
we note that by distributing the intersection over the unions we have
(A1 ∪ A2 ) ∩ A3 = (A1 ∩ A3 ) ∪ (A2 ∩ A3 ) .
We can now use Equation 5 to write the probability of the above event as
P ((A1 ∪ A2 ) ∩ A3 ) = P (A1 ∩ A3 ) + P (A2 ∩ A3 ) − P ([A1 ∩ A3 ] ∩ [A2 ∩ A3 ])
= 0.05 + 0.07 − P (A1 ∩ A2 ∩ A3 ) = 0.05 + 0.07 − 0.01 = 0.11 .
Using this in Equation 6 we have
P ((A1 ∪ A2 ) ∩ A′3 ) + 0.11 = 0.36 so P ((A1 ∪ A2 ) ∩ A′3 ) = 0.25 .
Finally with this we have the desired probability of
P ((A′1 ∩ A′2 ) ∪ A3 ) = 1 − P ((A1 ∪ A2 ) ∩ A′3 ) = 1 − 0.25 = 0.75 .
If anyone knows of a more direct method of obtaining this result please contact me.
Exercise 2.14
Part (a): Using
P (A ∪ B) = P (A) + P (B) − P (A ∩ B) we have 0.9 = 0.8 + 0.7 − P (A ∩ B) ,
or P (A ∩ B) = 0.6.
Part (b): This would be the event (A ∩ B ′ ) ∪ (A′ ∩ B). Since these two events are disjoint,
the probability of it is given by
P (A ∩ B ′ ) + P (A′ ∩ B) .
Let's compute each one. Using A = (A ∩ B ′ ) ∪ (A ∩ B) we get
P (A) = P (A ∩ B ′ ) + P (A ∩ B) so with what we know 0.8 = P (A ∩ B ′ ) + 0.6 .
Thus we get that P (A∩B ′ ) = 0.2. Using the same method we compute that P (A′ ∩B) = 0.1.
Thus the probability we want is given by 0.2 + 0.1 = 0.3.
Exercise 2.15
Let G stand for a gas dryer and E stand for an electric dryer.
Part (a): We are told that
P ({GGGGG, EGGGG, GEGGG, GGEGG, GGGEG, GGGGE}) = 0.428 ,
and the event we want is the complement of the above event thus has a probability given by
1 − 0.428 = 0.572.
Part (b): This would be
1 − P ({GGGGG}) − P ({EEEEE}) = 1 − 0.116 − 0.005 = 0.879 .
Exercise 2.16
Part (a): The set would be
{CDP, CP D, DCP, DP C, P CD, P DC} ,
and each would get a probability of 1/6.
Part (b): This happens in two of the six samples so our probability is 2/6 = 1/3.
Part (c): This happens in only one sample so our probability is 1/6.
Exercise 2.17
Part (a): There could be other statistical software besides SPSS and SAS.
Part (b): P (A′) = 1 − P (A) = 0.7.
Part (c): We have
P (A ∪ B) = P (A) + P (B) − P (A ∩ B) = 0.3 + 0.5 − 0 = 0.8 ,
since P (A ∩ B) = 0 as there are no events in the set A ∩ B.
Part (d): We have
P (A′ ∩ B ′ ) = P ((A ∪ B)′ ) = 1 − P (A ∪ B) = 1 − 0.8 = 0.2 .
Exercise 2.18
This event will happen if we don’t select a bulb rated 75 Watts on the first draw. That we
6
= 23 . The
select one rated 75 Watt on the first draw will happen with probability 4+5+6
probability we require at least two draws in then 1 − 23 = 13 .
Exercise 2.19
Let A be the event that A found a defect and similarly for B. Then in the problem statement
we are told that
P (A) = 724/10000 = 0.0724
P (B) = 0.0751
P (A ∪ B) = 0.1159 .
Part (a): P ((A ∪ B)′ ) = 1 − P (A ∪ B) = 0.8841.
Part (b): We need to compute P (B ∩ A′ ). To do this note that
B = (B ∩ A′ ) ∪ (B ∩ A) and thus P (B) = P (B ∩ A′ ) + P (B ∩ A) .
To use this we need to compute P (B ∩ A). We can get that since
−P (A ∩ B) = P (A ∪ B) − P (A) − P (B) = 0.1159 − 0.0724 − 0.0751 = −0.0316 .
Thus P (A ∩ B) = 0.0316. Using this we have
0.0751 = P (B ∩ A′ ) + 0.0316 so P (B ∩ A′ ) = 0.0435 .
Exercise 2.20
Part (a): The simple events for this problem are tuples containing the shift and whether
or not the conditions of the accident were “unsafe” or “unrelated”. Thus we would have
S = {(Day, Unsafe), (Swing, Unsafe), (Night, Unsafe), (Day, Unrelated), (Swing, Unrelated), · · · } .
Part (b): This would be
0.1 + 0.08 + 0.05 = 0.23 .
Part (c): We have
P (Day) = 0.1 + 0.35 = 0.45 so P (Day′ ) = 1 − P (Day) = 0.55 .
Exercise 2.21
Part (a): This would be 0.1.
Part (b): This would be
P (Low Auto) = 0.04 + 0.06 + 0.05 + 0.03 = 0.18
P (Low Homeowners) = 0.06 + 0.10 + 0.03 = 0.19 .
Part (c): This would be
P ((Low Auto, Low Home)) + P ((Medium Auto, Medium Home)) + P ((High Auto, High Home))
= 0.06 + 0.2 + 0.15 = 0.41 .
Part (d): This is 1 − 0.41 = 0.59.
Part (e): This is
0.04 + 0.06 + 0.05 + 0.03 + 0.1 + 0.03 = 0.31 .
Part (f): This is 1 − 0.31 = 0.69.
Exercise 2.22 (stopping at traffic lights)
Part (a): This is P (A ∩ B) which we can evaluate using
P (A ∩ B) = −(P (A ∪ B) − P (A) − P (B)) = P (A) + P (B) − P (A ∪ B) = 0.4 + 0.5 − 0.6 = 0.3 .
Part (b): This is P (A ∩ B ′ ). To compute this recall that
P (A) = P (A ∩ B ′ ) + P (A ∩ B) so 0.4 = P (A ∩ B ′ ) + 0.3 .
Thus P (A ∩ B ′ ) = 0.1.
Part (c): This is the probability of the event (A ∩ B ′ ) ∪ (A′ ∩ B) or
P (A ∩ B ′ ) + P (A′ ∩ B) .
We know P (A ∩ B ′ ) from Part (b). To compute P (A′ ∩ B) recall that
P (B) = P (B ∩ A′ ) + P (B ∩ A) or 0.5 = P (B ∩ A′ ) + 0.3 ,
so P (B ∩ A′ ) = 0.2. Thus
P (A ∩ B ′ ) + P (A′ ∩ B) = 0.1 + 0.2 = 0.3 .
Exercise 2.23
Part (a): This would happen in one way from 15 or a probability of 1/15.
Part (b): This would happen with probability
C(4, 2)/15 = 6/15 = 2/5 .
Part (c): This is the complement of the event that both selected computers are laptops so
this gives a probability of 1 − 1/15 = 14/15.
Part (d): This is
1 − 1/15 − 6/15 = 8/15 .
Exercise 2.24
Since B = A ∪ (B ∩ A′ ) and the events A and B ∩ A′ are disjoint we have
P (B) = P (A) + P (B ∩ A′ ) .
As P (B ∩ A′ ) ≥ 0 we have that P (A) ≤ P (B). Since for general events A and B we have
(A ∩ B) ⊂ A ⊂ (A ∪ B) ,
applying the above result twice we have
P (A ∩ B) ≤ P (A) ≤ P (A ∪ B) .
Exercise 2.25
From the problem statement we are told that
P (A) = 0.7
P (B) = 0.8
P (C) = 0.75
P (A ∪ B) = 0.85
P (A ∪ C) = 0.9
P (B ∪ C) = 0.95
P (A ∪ B ∪ C) = 0.98 .
Part (a): This is P (A ∪ B ∪ C) = 0.98.
Part (b): This is 1 − P (A ∪ B ∪ C) = 0.02.
Part (c): We want to evaluate P (A ∩ B ′ ∩ C ′ ). Drawing a Venn diagram with three sets
we get the following mutually exclusive sets
A ∩ B ∩ C ′ , A ∩ B ′ ∩ C , A′ ∩ B ∩ C , and A ∩ B ∩ C .
To evaluate all of these we first compute P (A ∩ B), P (A ∩ C) and P (B ∩ C) using
P (A ∩ B) = P (A) + P (B) − P (A ∪ B) = 0.7 + 0.8 − 0.85 = 0.65 .
In the same way we find
P (A ∩ C) = 0.7 + 0.75 − 0.9 = 0.55
P (B ∩ C) = 0.8 + 0.75 − 0.95 = 0.6 .
Now using what we know in
P (A ∪ B ∪ C) = P (A) + P (B) + P (C) − P (A ∩ B) − P (A ∩ C) − P (B ∩ C) + P (A ∩ B ∩ C) ,
we have
0.98 = 0.7 + 0.8 + 0.75 − 0.65 − 0.55 − 0.6 + P (A ∩ B ∩ C) ,
and find P (A ∩ B ∩ C) = 0.53. Let's now compute P (A ∩ B ∩ C ′ ) using the Venn diagram.
We have
P (A ∩ B ∩ C ′ ) = P (A ∩ B) − P (A ∩ B ∩ C) = 0.65 − 0.53 = 0.12 .
In the same way we have
P (A ∩ B ′ ∩ C) = P (A ∩ C) − P (A ∩ B ∩ C) = 0.55 − 0.53 = 0.02
P (A′ ∩ B ∩ C) = P (B ∩ C) − P (A ∩ B ∩ C) = 0.6 − 0.53 = 0.07 .
Using these computed probabilities we get
P (A ∩ B ′ ∩ C ′ ) = P (A) − P (A ∩ B ′ ∩ C) − P (A ∩ B ∩ C ′ ) − P (A ∩ B ∩ C)
= 0.7 − 0.02 − 0.12 − 0.53 = 0.03 .
Part (d): This would be the probability of the event
(A ∩ B ′ ∩ C ′ ) ∪ (A′ ∩ B ∩ C ′ ) ∪ (A′ ∩ B ′ ∩ C) .
Notice that this is the union of disjoint sets and we have computed P (A ∩ B ′ ∩ C ′ ) in Part (c).
Following the steps of Part (c) (for the other two sets) we find
P (A′ ∩ B ∩ C ′ ) = P (B) − P (A ∩ B ∩ C ′ ) − P (A′ ∩ B ∩ C) − P (A ∩ B ∩ C)
= 0.8 − 0.12 − 0.07 − 0.53 = 0.08
P (A′ ∩ B ′ ∩ C) = P (C) − P (A ∩ B ′ ∩ C) − P (A′ ∩ B ∩ C) − P (A ∩ B ∩ C)
= 0.75 − 0.02 − 0.07 − 0.53 = 0.13 .
Given these numbers we have that
P (A ∩ B ′ ∩ C ′ ) + P (A′ ∩ B ∩ C ′ ) + P (A′ ∩ B ′ ∩ C) = 0.03 + 0.08 + 0.13 = 0.24 .
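These pieces are easy to verify numerically. The following small Python fragment (my addition, in the spirit of the chapter's scripts) recomputes them from the given probabilities:

# Given probabilities from the problem statement.
pA, pB, pC = 0.7, 0.8, 0.75
pAuB, pAuC, pBuC, pAuBuC = 0.85, 0.9, 0.95, 0.98

# Pairwise intersections from P(X and Y) = P(X) + P(Y) - P(X or Y).
pAB = pA + pB - pAuB  # 0.65
pAC = pA + pC - pAuC  # 0.55
pBC = pB + pC - pBuC  # 0.60

# Triple intersection from three-set inclusion-exclusion.
pABC = pAuBuC - (pA + pB + pC) + (pAB + pAC + pBC)  # 0.53

# The "exactly one" pieces read off the Venn diagram.
only_A = pA - pAB - pAC + pABC  # 0.03
only_B = pB - pAB - pBC + pABC  # 0.08
only_C = pC - pAC - pBC + pABC  # 0.13
print(only_A + only_B + only_C)  # 0.24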
Exercise 2.26
Part (a): P (A′1 ) = 1 − P (A1 ) = 0.78.
Part (b):
P (A1 ∩ A2 ) = P (A1 ) + P (A2 ) − P (A1 ∪ A2 ) = 0.12 + 0.07 − 0.13 = 0.06 .
Part (c): Using the identity
P (A1 ∩ A2 ) = P (A1 ∩ A2 ∩ A3 ) + P (A1 ∩ A2 ∩ A′3 ) ,
we get
0.06 = 0.01 + P (A1 ∩ A2 ∩ A′3 ) so P (A1 ∩ A2 ∩ A′3 ) = 0.05 .
Part (d): This is
1 − P (have three defects) = 1 − P (A1 ∩ A2 ∩ A3 ) = 1 − 0.01 = 0.99 .
Exercise 2.27
Part (a): This is 1/C(5, 2) = 1/10.
Part (b): This would be
(C(2, 1) C(3, 1) + C(2, 2))/C(5, 2) = (6 + 1)/10 = 7/10 .
We can also evaluate this probability as
1 − P (no members have a name starting with C) = 1 − C(3, 2) C(2, 0)/C(5, 2) = 1 − 3/10 = 7/10 .
Part (c): For this event we can select from the pairs
{(1, 5), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)} .
Thus we will have a probability for this event of 6/10 = 3/5.
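A brute-force enumeration agrees with Part (b). This is a sketch I added; the member names are hypothetical placeholders, two of which start with C:

from itertools import combinations

members = ["Chen", "Chung", "Adams", "Brown", "Evans"]
pairs = list(combinations(members, 2))  # the C(5, 2) = 10 equally likely pairs
with_c = [p for p in pairs if any(m.startswith("C") for m in p)]
print(len(with_c) / len(pairs))  # 7/10 = 0.7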
Exercise 2.28
Part (a): This will happen if we have one of the elementary events (1, 1, 1), (2, 2, 2), and
(3, 3, 3) and thus this event will happen with a probability of 3/27 = 1/9.
Part (b): This is the complement of the event that all family members are assigned the
same section. We calculated this probability in Part (a) above. Thus the probability we
want is then 1 − 1/9 = 8/9.
Part (c): This event is represented by the following set of elementary events
{(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)} .
Thus this event will happen with a probability of 6/27 = 2/9.
Exercise 2.29
Part (a): We have 26 choices for the first letter A-Z and then 26 choices for the second
letter making 26^2 = 676 total choices. If we allow digits we must add another 10 characters
(the digits 0-9) giving 36 total choices for the characters in the first and second location and
giving 36^2 = 1296 total choices.
Part (b): These would be 26^3 = 17576 and 36^3 = 46656.
Part (c): These would be 26^4 = 456976 and 36^4 = 1679616.
Part (d): This would be 1 − 97786/1679616 .
Exercise 2.30
Part (a): This would be 8(7)(6).
Part (b): This would be C(30, 6) = 593775.
Part (c): This would be C(8, 2) C(10, 2) C(12, 2).
Part (d): This would be the number from Part (c) over the number from Part (b).
Part (e): This would be
(C(8, 6) + C(10, 6) + C(12, 6))/593775 .
Exercise 2.31
Part (a): This would be 9(27) = 243.
Part (b): This would be 9(27)(15) = 3645. If we divide by 365 we get the number of years
which is 9.986 or about 10 years.
Exercise 2.32
Part (a): This would be 5(4)(3)(4) = 240.
Part (b): This would be 1(1)(3)(4) = 12.
Part (c): This would be 4(3)(3)(3) = 108.
Part (d): The number of systems with at least one Sony component is equal to the total
number of systems minus the number of ways to select components without a Sony component or 240 − 108 = 132.
Part (e): The probability that we have at least one Sony component is 132/240 = 0.55. The
probability that we have exactly one Sony component is given by
(1(3)(3)(3) + 4(1)(3)(3) + 4(3)(3)(1))/240 = 99/240 = 0.4125 .
Exercise 2.33
Warning: The solutions to Part (a) and (b) do not match the ones given in the back of the
book. If anyone sees what I did that is incorrect please contact me.
Part (a): This would be C(15, 9) 9! = 1816214400, since we can pick the 9 players to be on
the field in C(15, 9) ways and then order them (select the pitcher, the first, second, and third
baseman, etc.) in 9! ways.
Part (b): This would be the number in Part (a) above multiplied by 9! or the number of
ways we can specify the batting order. This would give 6.590679 × 10^14.
Part (c): This would be C(5, 3) C(10, 6) = 2100.
Exercise 2.34
Part (a): This would be C(25, 5) = 53130.
Part (b): This would be C(5, 4) = 5.
Part (c): This would be 5/53130 = 9.411 × 10^−5.
Part (d): This would be
(C(5, 4) + C(5, 5))/53130 = 6/53130 = 0.0001129 .
Exercise 2.35
Part (a): We have C(20, 6) = 38760 selections of six workers coming from the day shift. The
probability that all 6 selected workers will be from the day shift in our sample is
C(20, 6)/C(45, 6) = 0.004758 .
Part (b): This would be
(C(20, 6) + C(15, 6) + C(10, 6))/C(45, 6) = 0.005398 .
Part (c): This is 1 − P (all workers come from the same shift) = 1 − 0.005398 = 0.9946.
Part (d): We want to compute the probability that at least one of the shifts will be
unrepresented in the sample of workers. This is the union of the events that there are
unrepresented shifts (day, swing, or graveyard) in the sample. We compute this using
one unrepresented shift = {the day shift is unrepresented}
∪ {the swing shift is unrepresented}
∪ {the graveyard shift is unrepresented} .
Note that these three events are not mutually exclusive. We can count the number of samples
in the first event on the right-hand-side of the above as
C(15, 0) C(10, 6) + C(15, 1) C(10, 5) + · · · + C(15, 6) C(10, 0) = Σ_{k=0}^{6} C(15, k) C(10, 6 − k) .
Note that the first expression above i.e. C(15, 0) C(10, 6) has two shifts unrepresented. The number
of samples in the second event can be computed as
C(20, 1) C(10, 5) + C(20, 2) C(10, 4) + · · · + C(20, 6) C(10, 0) = Σ_{k=1}^{6} C(20, k) C(10, 6 − k) .
Finally, the number of samples in the third event can be computed as
C(20, 1) C(15, 5) + C(20, 2) C(15, 4) + · · · + C(20, 5) C(15, 1) = Σ_{k=1}^{5} C(20, k) C(15, 6 − k) .
Note that the above summation does not have the term corresponding to k = 6. We have
been very careful in the above to avoid double counting the number of events. When we
add these all up we get 2350060. This is to be divided by the number of ways to select six
members from the 45 total, which is given by C(45, 6). This gives a probability of 0.2885258.
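The count above can be double checked with a short Python fragment (my addition) that sums the same three terms using math.comb:

from math import comb

# Day shift unrepresented: all six workers from swing (15) and graveyard (10).
day = sum(comb(15, k) * comb(10, 6 - k) for k in range(0, 7))
# Swing shift unrepresented: k >= 1 skips the all-graveyard samples counted above.
swing = sum(comb(20, k) * comb(10, 6 - k) for k in range(1, 7))
# Graveyard shift unrepresented: k = 1..5 skips the all-swing and all-day samples.
grave = sum(comb(20, k) * comb(15, 6 - k) for k in range(1, 6))

total = day + swing + grave
print(total, total / comb(45, 6))  # 2350060 0.2885258...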
Exercise 2.36
See the python code ex2 36.py. When we run that code we get the output
All possible orderings of the votes=
[’AAABB’, ’AABAB’, ’AABBA’, ’ABAAB’, ’ABABA’, ’ABBAA’, ’BAAAB’, ’BAABA’, ’BABAA’, ’BBAAA’]
Orderings where A leads (or equals) B= [’AAABB’, ’AABAB’, ’AABBA’, ’ABAAB’, ’ABABA’]
Probability that A always leads (or equals) B= 0.5
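The file ex2 36.py itself is not reproduced in this transcription; the following is a minimal sketch (my reconstruction from the output above, not necessarily the author's actual script):

from itertools import permutations

# Enumerate the distinct orderings of three A votes and two B votes.
orders = ["".join(o) for o in sorted(set(permutations("AAABB")))]

def a_always_leads_or_ties(order):
    # A's running tally must never fall behind B's at any point in the count.
    a = b = 0
    for vote in order:
        if vote == "A":
            a += 1
        else:
            b += 1
        if a < b:
            return False
    return True

good = [o for o in orders if a_always_leads_or_ties(o)]
print("All possible orderings of the votes=", orders)
print("Orderings where A leads (or equals) B=", good)
print("Probability that A always leads (or equals) B=", len(good) / len(orders))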
Exercise 2.37
Part (a): There are 3(4)(5) = 60 possible experiments.
Part (b): There are 1(2)(5) = 10 possible experiments.
Part (c): Note that we have 60 total experiments and thus 60! ways of ordering these
experiments.
We need to count the number of ways we can have the first five experiments have one of
each of the five catalysts. Imagine the experiment that uses the first type of catalyst. Once
that is fixed we have 3(4) = 12 possible orderings for the temperature and pressure values
that would go with this catalyst. Thus for each of the five catalysts we have 12 choices for
the other two variables. In total we have 12^5 choices for the two other variables over all five
experiments. We have 5! ways of ordering the five different catalysts giving a total of 5! 12^5
ways to order the first five experiments. Following these five experiments we have (60 − 5)!
ways to order the remaining experiments. This gives a probability of
5! 12^5 (60 − 5)!/60! = 5! 12^5 /(60 · 59 · 58 · 57 · 56) = 0.04556 .
Exercise 2.38
Part (a): This is
C(6, 2)(C(4, 1) + C(5, 1))/C(15, 3) = 0.2967 .
Part (b): This is
(C(4, 3) + C(5, 3) + C(6, 3))/C(15, 3) = 0.074725 .
Part (c): This is
C(4, 1) C(5, 1) C(6, 1)/C(15, 3) = 0.2637 .
Part (d): From the problem statement there are six 75 Watt bulbs and nine bulbs of a
different wattage. Let Si be the event we select the first 75 Watt bulb in the ith draw. We
have
P (S1 ) = 6/15
P (S2 |S1′ ) = 6/14
P (S3 |S1′ S2′ ) = 6/13
P (S4 |S1′ S2′ S3′ ) = 6/12
P (S5 |S1′ S2′ S3′ S4′ ) = 6/11 .
Then the probability, P, it is necessary to examine at least 6 bulbs is given by
P = 1 − P (S1 ) − P (S1′ S2 ) − P (S1′ S2′ S3 ) − P (S1′ S2′ S3′ S4 ) − P (S1′ S2′ S3′ S4′ S5 )
= 1 − P (S1 ) − P (S2 |S1′ )P (S1′ ) − P (S3 |S1′ S2′ )P (S2′ |S1′ )P (S1′ )
− P (S4 |S1′ S2′ S3′ )P (S3′ |S1′ S2′ )P (S2′ |S1′ )P (S1′ )
− P (S5 |S1′ S2′ S3′ S4′ )P (S4′ |S1′ S2′ S3′ )P (S3′ |S1′ S2′ )P (S2′ |S1′ )P (S1′ ) = 0.04195804 ,
when we use the numbers above.
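Note that the Part (d) event is the same as drawing five non-75 Watt bulbs in a row, so a one-line computation (my addition) confirms the value:

from math import comb

# P(the first five bulbs drawn all come from the nine non-75 Watt bulbs).
print(comb(9, 5) / comb(15, 5))  # 0.04195804...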
Exercise 2.39
Part (a): First pick from the first ten spots the five spots where we will put the cordless
phones. These five spots can be picked in C(10, 5) ways. We can order these five cordless phones
in 5! ways. The other 10 phones can be placed in 10! ways. In total we have
C(10, 5) 5!(10!) ,
ways in which the five cordless phones can be placed in the first ten spots. There are 15!
ways to order all the phones. Thus the probability is given by
C(10, 5) 5!(10!)/15! = 0.0839 .
Part (b): For this part of the exercise we want the probability that after we service ten
phones we will have serviced all phones of one type. This means that in the first ten phones
there must be all five phones of one given type. To compute this probability note that we
have three choices of the type of phone which will have all of its repairs done in the first
ten. Once we specify that phone type we have C(10, 5) locations in which we can place these
five phones and 5! ways in which to order them. This gives 3 C(10, 5) 5! ways to place the type
of five phones that will get fully serviced. We now have to place the remaining phones. We
have 10! ways to place these phones but some of these ways will give only a single phone type
for the last five spots. Thus removing the 5! permutations from each of the two phone types
finally gives us the probability
3 C(10, 5) 5!(10! − 2(5!))/15! = 0.2517316 .
Warning: This is not the same answer as in the back of the book. If anyone sees an error
in what I have done please contact me.
Part (c): To have two phones of each type in the first six serviced means that we must
have two cordless, two corded, and two cellular phones. Now C(5, 2)^3 = 10^3 is the number of
ways to pick the two phones from each phone type that will go in the first six spots. We can
place the two cordless phones in C(6, 2) 2! ways, then once these are placed we can place the two
corded phones in C(4, 2) 2! ways and finally the two cellular phones can be placed in 2! ways.
With the phones in the first six locations determined we have to place the other 15 − 6 = 9
phones which can be done in 9! ways. This gives the probability
10^3 C(6, 2) 2! C(4, 2) 2! (2!) 9!/15! = 0.19980 .
Exercise 2.40
Part (a): Since 3 + 3 + 3 + 3 = 12 we have
12!/(3!)^4 ,
chain molecules. First we assume that each of the molecules is distinguishable to get
12! and then divide by the number of orderings (3!) of the A, B, C, and D type molecules.
Part (b): We would have
(4 · 3 · 2 · 1)/(12!/(3!)^4 ) = 4!(3!)^4 /12! .
The denominator above is the number of chain molecules and the numerator is the number
of ways of picking the ordering of the four molecule types. We have four choices for the first
molecule, three for the second molecule etc.
Exercise 2.41
Part (a): This is
1 − P (no female assistant is selected) = 1 − C(4, 3)/C(8, 3) = 1 − 4/56 = 0.92857 .
Part (b): This probability is
C(4, 4) C(4, 1)/C(8, 5) = 4/56 = 0.0714 .
Part (c): This would be
1 − P (orderings are the same between semesters) = 1 − 1/8! = 0.9999752 .
Exercise 2.42
The probability that Jim and Paula sit at the two seats on the far left is
2(4!)/6! = 1/15 = 0.06666 ,
since there are two permutations of Jim and Paula (where they are sitting together in the
two seats on the far left) and then 4! orderings of the other people.
For Jim and Paula to sit next to each other then as a couple they can be at the positions
(1, 2), (2, 3), (3, 4), (4, 5), and (5, 6) each of which has the same probability (as calculated
above) to get for the probability that Jim and Paula sit next to each other of 5(1/15) = 1/3.
Another method to get the same probability is to consider Jim and Paula (together) as one
unit. Then the number of orderings of this group is 2(5!) since we have five items (Jim and
Paula together and the other four people) and two orderings of Jim and Paula where they
are sitting together. This gives a probability of 2(5!)/6! = 1/3 the same as before.
We now want the probability that at least one wife ends up sitting next to her husband.
Let Ci be the event that couple i (for i = 1, 2, 3) sits next to each other. Then we want to
evaluate
P (C1 ∪ C2 ∪ C3 ) ,
or
P (C1 ) + P (C2 ) + P (C3 ) − P (C1 ∩ C2 ) − P (C1 ∩ C3 ) − P (C2 ∩ C3 ) + P (C1 ∩ C2 ∩ C3 ) .
By symmetry this is equal to
3P (C1 ) − 3P (C1 ∩ C2 ) + P (C1 ∩ C2 ∩ C3 ) .
We computed P (C1 ) above. Now P (C1 ∩ C2 ) can be computed by considering two “fused”
couple items and the two other people. There are 4! ways to order these groups and two
orderings of each couple in their fused group. This gives
P (C1 ∩ C2 ) = 4! 2^2 /6! = 2/15 .
In the same way we can evaluate P (C1 ∩ C2 ∩ C3 ) since it is three fused couples and we get
P (C1 ∩ C2 ∩ C3 ) = 3! 2^3 /6! = 1/15 .
Thus we compute
P (C1 ∪ C2 ∪ C3 ) = 1 − 3(2/15) + 1/15 = 2/3 .
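A brute-force enumeration over all 6! seatings (a sketch I added; the labels are hypothetical) reproduces the 2/3:

from itertools import permutations

couples = [("J", "P"), ("h2", "w2"), ("h3", "w3")]
people = [x for c in couples for x in c]

def some_couple_adjacent(seating):
    # True when at least one couple occupies adjacent seats.
    return any(abs(seating.index(a) - seating.index(b)) == 1 for a, b in couples)

seatings = list(permutations(people))
count = sum(some_couple_adjacent(s) for s in seatings)
print(count / len(seatings))  # 480/720 = 0.6666...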
Exercise 2.43
We have four ways to pick the 10 to use in the hand, four ways to pick the nine to use, etc.
We have C(52, 5) ways to draw five cards. Thus the probability of a straight with ten the high
card is 4^5 /C(52, 5) = 0.000394.
To be a straight we can have five, six, seven, eight, nine, ten, jack, queen, king, or ace be the
high card. This is ten possible cards. The probability of a straight with any one of these as
the high card is the same as we just calculated. Thus the probability of a straight is given
by 10(0.000394) = 0.00394.
The probability that we have a straight flush where all cards are of the same suit and 10 is
the high card is 4/C(52, 5) = 1.539 × 10^−6. To have any possible high card we multiply this by 10 as
before to get 1.539 × 10^−5.
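These counts are easy to reproduce with math.comb (a fragment I added):

from math import comb

hands = comb(52, 5)        # 2598960 five card hands
print(4**5 / hands)        # straight with ten the high card: 0.000394...
print(10 * 4**5 / hands)   # a straight with any of the ten high cards
print(10 * 4 / hands)      # a straight flush with any high card: 1.539e-05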
Exercise 2.44
Recall that C(n, k) is the number of ways to draw a set of size k from n items. Once this set
is drawn what remains is a set of size n − k. Thus for every set of size k we have a set of
size n − k and this fact gives the equivalence C(n, k) = C(n, n − k).
Notes on Example 2.26
The book provides probabilities for the various intersections P (A ∩ B), P (A ∩ C), P (B ∩ C),
and P (A ∩ B ∩ C) in the table given for this example, but does not explain how it calculated
the probabilities of the three way intersections i.e. P (A ∩ B ∩ C ′ ), P (A ∩ B ′ ∩ C), and
P (A′ ∩ B ∩ C). We can do this by setting up equations for the three way intersections in
terms of the two way intersections (by reading from the Venn diagram) as follows
P (A ∩ B ∩ C ′ ) + P (A ∩ B ∩ C) = P (A ∩ B)
P (A ∩ B ′ ∩ C) + P (A ∩ B ∩ C) = P (A ∩ C)
P (A′ ∩ B ∩ C) + P (A ∩ B ∩ C) = P (B ∩ C) .
This allows us to solve for the three way intersections.
Exercise 2.45
Part (a): From the given table we have
P (A) = 0.106 + 0.141 + 0.2 = 0.447
P (C) = 0.215 + 0.2 + 0.065 + 0.02 = 0.5
P (A ∩ C) = 0.2 .
Part (b): We find
P (A|C) = P (A ∩ C)/P (C) = 0.2/0.5 = 2/5 ,
which is the probability of the blood type A given that we are from the third ethnic group and
P (C|A) = P (A ∩ C)/P (A) = 0.2/0.447 = 0.4474273 ,
which is the probability we are from the third ethnic group given we have the blood type A.
Part (c): Let G1 be the event that the individual is from ethnic group one. Then we want
to evaluate
P (G1 |B ′ ) = P (G1 ∩ B ′ )/P (B ′ ) .
We need to compute the various parts of the above expression. First we have
P (B) = 0.008 + 0.018 + 0.065 = 0.091 so P (B ′ ) = 1 − P (B) = 0.909 .
Next as the blood types are mutually exclusive we have
G1 ∩ B ′ = G1 ∩ (O ∪ A ∪ AB) = (G1 ∩ O) ∪ (G1 ∩ A) ∪ (G1 ∩ AB) ,
so
P (G1 ∩ B ′ ) = P (G1 ∩ O) + P (G1 ∩ A) + P (G1 ∩ AB) = 0.082 + 0.106 + 0.04 = 0.228 .
Thus we get
P (G1 |B ′ ) = 0.228/0.909 = 0.2508251 .
Warning: The answer to this problem does not match that in the back of the book. If
anyone sees anything wrong with what I have done please contact me.
Exercise 2.46
From the description of the events given in the book P (A|B) is the probability a person is
over six feet tall given they are a professional basketball player and P (B|A) is the probability
a person is a professional basketball player given they are over six feet tall. We would expect
P (A|B) > P (B|A).
Exercise 2.47
From Exercise 12 we were told that
P (A) = 0.5
P (B) = 0.4
P (A ∩ B) = 0.25 .
Part (a): P (B|A) is the probability we have a MasterCard given we have a Visa card and
is given by
P (B|A) = P (A ∩ B)/P (A) = 0.25/0.5 = 1/2 .
Part (b): P (B ′ |A) is the probability we don't have a MasterCard given we have a Visa card
and is given by P (A ∩ B ′ )/P (A). To compute P (A ∩ B ′ ) write A as
A = (A ∩ B) ∪ (A ∩ B ′ ) so P (A) = P (A ∩ B) + P (A ∩ B ′ ) .
Using this we have
P (A ∩ B ′ ) = P (A) − P (A ∩ B) = 0.5 − 0.25 = 0.25 .
Thus P (B ′ |A) = 0.25/0.5 = 1/2. Note that this is also equal to 1 − P (B|A) as it should be.
Part (c): P (A|B) is the probability we have a Visa card given we have a MasterCard and
is given by
P (A|B) = P (A ∩ B)/P (B) = 0.25/0.4 = 0.625 . (7)
Part (d): P (A′ |B) is the probability we don't have a Visa given we have a MasterCard and
is given by
1 − P (A|B) = 1 − 0.625 = 0.375 .
Part (e): For this part we want to evaluate
P (A|A ∪ B) = P (A ∩ (A ∪ B))/P (A ∪ B) = P ((A ∩ A) ∪ (A ∩ B))/P (A ∪ B)
= P (A ∪ (A ∩ B))/P (A ∪ B) = P (A)/P (A ∪ B) = 0.5/(0.5 + 0.4 − 0.25) = 0.7692308 .
Exercise 2.48
Part (a): This would be
P (A2 |A1 ) = P (A1 ∩ A2 )/P (A1 ) = (P (A1 ) + P (A2 ) − P (A1 ∪ A2 ))/P (A1 )
= (0.12 + 0.07 − 0.13)/0.12 = 0.5 .
Part (b): This would be
P (A1 ∩ A2 ∩ A3 |A1 ) = P (A1 ∩ A2 ∩ A3 )/P (A1 ) = 0.01/0.12 = 0.0833 .
Part (c): Denote the probability we want to calculate by P. Then P is given by
P = P {(A1 ∩ A′2 ∩ A′3 ) ∪ (A′1 ∩ A2 ∩ A′3 ) ∪ (A′1 ∩ A′2 ∩ A3 )|A1 ∪ A2 ∪ A3 }
= P {[(A1 ∩ A′2 ∩ A′3 ) ∪ (A′1 ∩ A2 ∩ A′3 ) ∪ (A′1 ∩ A′2 ∩ A3 )] ∩ [A1 ∪ A2 ∪ A3 ]}/P (A1 ∪ A2 ∪ A3 ) .
The numerator of the above fraction is given by
P ((A1 ∩ A′2 ∩ A′3 ) ∩ (A1 ∪ A2 ∪ A3 )) + P ((A′1 ∩ A2 ∩ A′3 ) ∩ (A1 ∪ A2 ∪ A3 )) + P ((A′1 ∩ A′2 ∩ A3 ) ∩ (A1 ∪ A2 ∪ A3 )) ,
or
P (A1 ∩ A′2 ∩ A′3 ) + P (A′1 ∩ A2 ∩ A′3 ) + P (A′1 ∩ A′2 ∩ A3 ) .
We now need to compute each of the above probabilities. Given the assumptions of the
problem we can derive “all” of the intersections we might need
P (A1 ∩ A2 ) = P (A1 ) + P (A2 ) − P (A1 ∪ A2 ) = 0.12 + 0.07 − 0.13 = 0.06
P (A1 ∩ A3 ) = 0.12 + 0.05 − 0.14 = 0.03
P (A2 ∩ A3 ) = 0.07 + 0.05 − 0.1 = 0.02
P (A1 ∩ A′2 ) = P (A1 ) − P (A1 ∩ A2 ) = 0.12 − 0.06 = 0.06
P (A1 ∩ A′3 ) = 0.12 − 0.03 = 0.09
P (A2 ∩ A′3 ) = 0.07 − 0.02 = 0.05
P (A′1 ∩ A2 ) = P (A2 ) − P (A1 ∩ A2 ) = 0.07 − 0.06 = 0.01
P (A′1 ∩ A3 ) = P (A3 ) − P (A3 ∩ A1 ) = 0.05 − 0.03 = 0.02
P (A′2 ∩ A3 ) = P (A3 ) − P (A2 ∩ A3 ) = 0.05 − 0.02 = 0.03 .
Now with these and using
P (A1 ∩ A2 ) = P (A1 ∩ A2 ∩ A3 ) + P (A1 ∩ A2 ∩ A′3 ) we get
0.06 = 0.01 + P (A1 ∩ A2 ∩ A′3 ) so P (A1 ∩ A2 ∩ A′3 ) = 0.05 .
In the same way we get
P (A1 ∩ A′2 ∩ A3 ) = P (A1 ∩ A3 ) − P (A1 ∩ A3 ∩ A2 )
= 0.03 − 0.01 = 0.02 ,
and
P (A′1 ∩ A2 ∩ A3 ) = P (A2 ∩ A3 ) − P (A1 ∩ A2 ∩ A3 ) = 0.02 − 0.01 = 0.01
P (A1 ∩ A′2 ∩ A′3 ) = P (A1 ∩ A′2 ) − P (A1 ∩ A′2 ∩ A3 ) = 0.06 − 0.02 = 0.04
P (A′1 ∩ A2 ∩ A′3 ) = P (A′1 ∩ A2 ) − P (A′1 ∩ A2 ∩ A3 ) = 0.01 − 0.01 = 0.0
P (A′1 ∩ A′2 ∩ A3 ) = P (A′2 ∩ A3 ) − P (A1 ∩ A′2 ∩ A3 ) = 0.03 − 0.02 = 0.01 .
Using everything we have thus far we can compute the probability we need. The numerator is
P (A1 ∩ A′2 ∩ A′3 ) + P (A′1 ∩ A2 ∩ A′3 ) + P (A′1 ∩ A′2 ∩ A3 ) = 0.04 + 0.0 + 0.01 = 0.05 ,
and the denominator is
P (A1 ∪ A2 ∪ A3 ) = 0.12 + 0.07 + 0.05 − 0.06 − 0.03 − 0.02 + 0.01 = 0.14 ,
so that P = 0.05/0.14 = 0.3571.
Part (d): We have
P (A′3 |A1 ∩ A2 ) = 1 − P (A3 |A1 ∩ A2 ) = 1 − P (A1 ∩ A2 ∩ A3 )/P (A1 ∩ A2 )
= 1 − P (A1 ∩ A2 ∩ A3 )/(P (A1 ) + P (A2 ) − P (A1 ∪ A2 )) = 1 − 0.01/(0.12 + 0.07 − 0.13) = 0.8333333 .
Exercise 2.49
Let A be the event that at least one of the two bulbs selected is found to be 75 Watts and
B be the event that both bulbs are 75 Watts. Then we want to evaluate P (B|A) = P (A ∩ B)/P (A).
Now the denominator of P (B|A) can be evaluated as
P (A) = (C(6, 1) C(9, 1) + C(6, 2))/C(15, 2) = (54 + 15)/105 = 0.657143 .
For the numerator note that A ∩ B = B, since if B is true then A is true. Thus we have
P (B|A) = (C(6, 2)/C(15, 2))/0.657143 = 0.21739 .
Let C be the event that at least one of the two bulbs is not 75 Watts and D be the event
that both bulbs are the same rating. Then
D = {both bulbs 40W} ∪ {both bulbs 60W} ∪ {both bulbs 75W} .
For this part of the problem we want to evaluate
P (D|C) = P (C ∩ D)/P (C) .
Now
P (C) = (C(9, 2) + C(9, 1) C(6, 1))/C(15, 2) = 0.8571429 .
Now D ∩ C is the event that both bulbs are 40 Watt or both are 60 Watt as C contradicts the event
that both bulbs are 75 Watts. Thus we have
P (C ∩ D) = P (both bulbs 40W) + P (both bulbs 60W)
= C(4, 2)/C(15, 2) + C(5, 2)/C(15, 2) = 0.152381 .
Thus we have P (D|C) = 0.152381/0.8571429 = 0.1777778 .
Exercise 2.50
Let LS represent long-sleeved shirts and SS represent short-sleeved shirts.
Part (a): From the given table we have P (M, LS, P r) = 0.05.
Part (b): P (M, P r) = 0.07 + 0.05 = 0.12.
Part (c): For P (SS) we would add all the numbers in the short-sleeved table. For P (LS)
we would add all of the numbers in the long-sleeved table.
Part (d): We have
P (M) = 0.08 + 0.07 + 0.12 + 0.1 + 0.05 + 0.07 = 0.49
P (P r) = 0.02 + 0.07 + 0.07 + 0.02 + 0.05 + 0.02 = 0.25 .
Part (e): We want to evaluate
P (M|SS, P l) = P (M, SS, P l)/P (SS, P l) = 0.08/(0.04 + 0.08 + 0.03) = 0.5333333 .
Part (f): We have
P (SS|M, P l) = P (M, SS, P l)/P (M, P l) = 0.08/(0.08 + 0.1) = 0.4444444 ,
and
P (LS|M, P l) = P (M, LS, P l)/P (M, P l) = 0.10/(0.08 + 0.1) = 0.5555556 .
Exercise 2.51
Part (a): Let R1 be the event that we draw a red ball on the first draw and R2 that we
draw a red ball on the second draw. Then we have
P (R1 , R2 ) = P (R2 |R1 )P (R1 ) = (8/11)(6/10) = 24/55 = 0.4363 .
Part (b): We want the probability that we have the same number of red and green balls
after the two draws as before. This is given by
P (R1 , R2 ) + P (G1 , G2 ) = P (R2 |R1 )P (R1 ) + P (G2 |G1 )P (G1 )
= 24/55 + (4/11)(2/5) = 32/55 = 0.5818 .
Exercise 2.52
Let F1 be the event that the first pump fails and F2 the event that the second pump fails.
Then we are told that
P (F1 ∪ F2 ) = 0.07
P (F1 ∩ F2 ) = 0.01 .
Now assuming that P (F1 ) = P (F2 ) we get
P (F1 ∪ F2 ) = P (F1 ) + P (F2 ) − P (F1 ∩ F2 ) = 2P (F1 ) − P (F1 ∩ F2 ) so 0.07 = 2P (F1 ) − 0.01 ,
and we get that P (F1 ) = P (F2 ) = 0.04. We can check that with this numerical value we
have P (F2 |F1 ) > P (F2 ) as we should. Note that
P (F2 |F1 ) = P (F1 ∩ F2 )/P (F1 ) = 0.01/0.04 = 0.25 > P (F2 ) = 0.04 .
Exercise 2.53
Here B ⊂ A so that A ∩ B = B and thus
P (B|A) = P (A ∩ B)/P (A) = P (B)/P (A) = 0.05/0.6 = 0.083 .
Exercise 2.54
Part (a): The expression P (A2 |A1 ) is the probability we are awarded project 2 given that
we are awarded project 1. We can compute it from
P (A2 |A1 ) = P (A1 ∩ A2 )/P (A1 ) = 0.11/0.22 = 1/2 .
Part (b): The expression P (A2 ∩ A3 |A1 ) is the probability we are awarded projects two and
three given that we were awarded project one. We can compute it as
P (A1 ∩ A2 ∩ A3 )/P (A1 ) = 0.01/0.22 = 0.04545 .
Part (c): The expression P (A2 ∪ A3 |A1 ) is the probability we are awarded projects two or
three given that we were awarded project one. We can compute it as
P (A1 ∩ (A2 ∪ A3 ))/P (A1 ) .
To compute P (A1 ∩ (A2 ∪ A3 )) we note that
A1 ∩ (A2 ∪ A3 ) = (A1 ∩ A2 ) ∪ (A1 ∩ A3 ) ,
so
P (A1 ∩ (A2 ∪ A3 )) = P ((A1 ∩ A2 ) ∪ (A1 ∩ A3 ))
= P (A1 ∩ A2 ) + P (A1 ∩ A3 ) − P ((A1 ∩ A2 ) ∩ (A1 ∩ A3 ))
= 0.11 + 0.05 − P (A1 ∩ A2 ∩ A3 ) = 0.16 − 0.01 = 0.15 .
Thus we get
P (A2 ∪ A3 |A1 ) = 0.15/0.22 = 0.6818 .
Part (d): The expression P (A1 ∩ A2 ∩ A3 |A1 ∪ A2 ∪ A3 ) is the probability we are awarded
projects one, two, and three given that we were awarded at least one of the three projects.
We can compute it as
P ((A1 ∩ A2 ∩ A3 ) ∩ (A1 ∪ A2 ∪ A3 ))/P (A1 ∪ A2 ∪ A3 ) = P (A1 ∩ A2 ∩ A3 )/P (A1 ∪ A2 ∪ A3 )
= 0.01/0.53 = 0.01886792 ,
where we computed P (A1 ∪ A2 ∪ A3 ) in Exercise 2.13 above.
Exercise 2.55
Let L be the event that a tick carries Lyme disease and H the event that a tick carries HGE.
Then from the problem we are told that
P (L) = 0.16
P (H) = 0.1
P (H ∩ L|H ∪ L) = 0.1 .
We want to compute P (L|H). Now from the definition of conditional probability we have
P (H ∩ L|H ∪ L) = P ((H ∩ L) ∩ (H ∪ L))/P (H ∪ L) = P (H ∩ L)/P (H ∪ L) = 0.1 ,
so
P (H ∩ L) = 0.1P (H ∪ L) . (8)
Next using P (H ∩ L) = P (H) + P (L) − P (H ∪ L) with the above we get
0.1P (H ∪ L) = 0.16 + 0.1 − P (H ∪ L) so P (H ∪ L) = 0.23636 .
Using Equation 8 we get P (H ∩ L) = 0.023636. For the probability we want to evaluate we
find
P (L|H) = P (L ∩ H)/P (H) = 0.0236/0.1 = 0.236 .
Exercise 2.56
Using the definition of conditional probability we have
P (A|B) + P (A′ |B) = (P (A ∩ B) + P (A′ ∩ B))/P (B) = P (B)/P (B) = 1 ,
as we were to show.
Exercise 2.57
If P (B|A) > P (B) then adding P (B ′ |A) to both sides of this expression gives
1 > P (B) + P (B ′ |A) ,
or 1 − P (B) > P (B ′ |A), that is P (B ′ ) > P (B ′ |A) as we were to show.
Exercise 2.58
We have
P (A ∪ B|C) = P ((A ∪ B) ∩ C)/P (C) = P ((A ∩ C) ∪ (B ∩ C))/P (C)
= (P (A ∩ C) + P (B ∩ C) − P (A ∩ B ∩ C))/P (C)
= P (A|C) + P (B|C) − P (A ∩ B|C) ,
as we were to show.
Exercise 2.59
Part (a): We have
P (A2 ∩ B) = P (A2 )P (B|A2 ) = 0.35(0.6) = 0.21 .
Part (b): We have
P (B) = P (B|A1 )P (A1 ) + P (B|A2 )P (A2 ) + P (B|A3 )P (A3 ) = 0.3(0.4) + 0.21 + 0.5(0.25) = 0.455 .
Part (c): We have
P (A1 |B) = P (A1 ∩ B)/P (B) = 0.12/0.455 = 0.2637
P (A2 |B) = P (A2 ∩ B)/P (B) = 0.21/0.455 = 0.4615
P (A3 |B) = P (A3 ∩ B)/P (B) = 0.125/0.455 = 0.274725 .
Exercise 2.60
Let D be the event that an aircraft is discovered and L be the event that the aircraft
discovered has an emergency locator. Then we are told that P (D) = 0.7, P (L|D) = 0.6,
and P (L|D ′) = 0.9. From these we can conclude that P (L ∩ D) = 0.7(0.6) = 0.42 and
P (L ∩ D ′ ) = 0.3(0.9) = 0.27.
Part (a): We want to evaluate P (D ′ |L) = P (L ∩ D ′ )/P (L). Now
P (L) = P (L|D)P (D) + P (L|D ′ )P (D ′ ) = 0.6(0.7) + 0.9(0.3) = 0.69 ,
so P (D ′ |L) = 0.27/0.69 = 0.3913.
Part (b): We want P (D|L′ ) = P (D ∩ L′ )/P (L′ ). Now
P (D ∩ L′ ) = P (D) − P (D ∩ L) = 0.7 − 0.42 = 0.28 ,
so P (D|L′ ) = 0.28/0.31 = 0.9032.
Exercise 2.61
Let D0 , D1 , and D2 be the events that there are no defective, one defective, and two defective
items in the batch of 10 items. Then we are told that
P (D0 ) = 0.5
P (D1 ) = 0.3
P (D2 ) = 0.2 .
Part (a): Let N be the event that neither tested component is defective. We want to
evaluate P (Di |N) for i = 0, 1, 2. We have
P (D0 |N) = P (D0 ∩ N)/P (N) = P (N|D0 )P (D0 )/(P (N|D0 )P (D0 ) + P (N|D1 )P (D1 ) + P (N|D2 )P (D2 )) ,
and the same type of expression for P (D1 |N) and P (D2 |N). To use the above note that
P (N|D0 ) = 1
P (N|D1 ) = C(9, 2)/C(10, 2) = 36/45
P (N|D2 ) = C(8, 2)/C(10, 2) = 28/45 .
Thus using these we have
P (N) = 1(0.5) + (36/45)(0.3) + (28/45)(0.2) = 0.8644 ,
and then
P (D0 |N) = 0.5/0.8644 = 0.5784359
P (D1 |N) = 0.3(36/45)/0.8644 = 0.277649
P (D2 |N) = 0.2(28/45)/0.8644 = 0.14396 .
Part (b): Let O (an upper case letter “o” and not a zero) be the event that one of the two
tested items is defective. Then
P (O) = P (O|D0 )P (D0 ) + P (O|D1 )P (D1 ) + P (O|D2 )P (D2 )
= 0 + (C(1, 1) C(9, 1)/C(10, 2))(0.3) + (C(2, 1) C(8, 1)/C(10, 2))(0.2) = 0.06 + 0.0711 = 0.1311 .
Using this we have
P (D0 |O) = 0
P (D1 |O) = 0.06/0.1311 = 0.4576
P (D2 |O) = 0.07111/0.1311 = 0.5424 .
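The Part (a) posteriors can be checked numerically with Bayes' rule and the hypergeometric conditionals above (a fragment I added):

from math import comb

priors = {0: 0.5, 1: 0.3, 2: 0.2}

def p_neither_defective(d):
    # Probability both tested items are good when d of the 10 are defective.
    return comb(10 - d, 2) / comb(10, 2)

pN = sum(p_neither_defective(d) * p for d, p in priors.items())
for d, p in priors.items():
    print(d, p_neither_defective(d) * p / pN)
# 0 0.5784..., 1 0.2776..., 2 0.1439...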
Exercise 2.62
Let B be the event that the camera is a basic model, and W the event that a warranty was
purchased. Then from the problem statement we have P (B) = 0.4 (so P (B ′ ) = 0.6) and
P (W |B) = 0.3 and P (W |B ′) = 0.5. Then we want to evaluate
P (B|W ) = P (B ∩ W )/P (W ) = P (W |B)P (B)/(P (W |B)P (B) + P (W |B ′ )P (B ′ ))
= 0.3(0.4)/(0.3(0.4) + 0.5(0.6)) = 0.2857 .
Exercise 2.63
Part (a): In words, we would draw a diagram with A going up (with a 0.75) and A′ going
down (with a 0.25). Then from the A branch we would draw B going up (with a 0.9) and B ′
going down (with a 0.1). From the A′ branch we would draw B going up (with a 0.8) and
B ′ going down (with a 0.2). From the AB branch we would draw C going up (with a 0.8)
and C ′ going down (with a 0.2). From the AB ′ branch we would draw C going up (with a
0.6) and C ′ going down (with a 0.4). From the A′ B branch we would draw C going up (with
a 0.7) and C ′ going down (with a 0.3). From the A′ B ′ branch we would draw C going up
(with a 0.3) and C ′ going down (with a 0.7).
Part (b): We could compute this as
P (A ∩ B ∩ C) = P (A ∩ B ∩ C|A)P (A) = P (B ∩ C|A)P (A)
= P (B ∩ C|A ∩ B)P (B|A)P (A)
= P (C|A ∩ B)P (B|A)P (A) = 0.8(0.9)(.75) = 0.54 .
Part (c): Using the tree diagram we would compute
P (B ∩ C) = 0.75(0.9)(0.8) + 0.25(0.8)(0.7) = 0.68 .
Or algebraically we could use
P (B ∩ C) = P (B ∩ C|A)P (A) + P (B ∩ C|A′ )P (A′ )
= P (C|A ∩ B)P (B|A)P (A) + P (C|A′ ∩ B)P (B|A′)P (A′ ) .
Part (d): Algebraically we have
P (C) = P (C|A ∩ B)P (A ∩ B) + P (C|A ∩ B ′ )P (A ∩ B ′ )
+ P (C|A′ ∩ B)P (A′ ∩ B) + P (C|A′ ∩ B ′ )P (A′ ∩ B ′ )
= 0.8P (A ∩ B) + 0.6P (A ∩ B ′ ) + 0.7P (A′ ∩ B) + 0.3P (A′ ∩ B ′ )
= 0.8P (B|A)P (A) + 0.6P (B ′|A)P (A) + 0.7P (B|A′)P (A′ ) + 0.3P (B ′|A′ )P (A′ )
= 0.8(0.9)(0.75) + 0.6(0.1)(0.75) + 0.7(0.8)(0.25) + 0.3(0.2)(0.25) = 0.74 .
Part (e): This would be
P (A|B ∩ C) = P (A ∩ B ∩ C)/P (B ∩ C) = 0.54/0.68 = 0.7941 .
Exercise 2.64
Let A1 be the event that we have the disease and B the event that our test gives a positive
result. Then P (A1 ) = 1/25, P (B|A1 ) = 0.99 and P (B|A′1 ) = 0.02. Now
P (B) = P (B|A1 )P (A1 ) + P (B|A′1 )P (A′1 ) = 0.99(1/25) + 0.02(24/25) = 0.0588 .
With this we have for the two probabilities requested
P (A1 |B) = P (B|A1 )P (A1 )/P (B) = 0.99(1/25)/0.0588 = 0.6734
P (A′1 |B ′ ) = P (B ′ |A′1 )P (A′1 )/P (B ′ ) = 0.98(24/25)/(1 − 0.0588) = 0.9996 .
Exercise 2.65
From the given problem statement we get
P (mean) = 500/(500 + 300 + 200) = 1/2
P (median) = 3/10
P (mode) = 1/5
P (S|mean) = 200/500 = 2/5
P (S|median) = 150/300 = 1/2
P (S|mode) = 160/200 = 4/5 .
Here S is the event that a given student was satisfied with the book. We then compute
P (mean|S) = P (S|mean)P (mean)/(P (S|mean)P (mean) + P (S|median)P (median) + P (S|mode)P (mode))
= (2/5)(1/2)/((2/5)(1/2) + (1/2)(3/10) + (4/5)(1/5)) = 0.3921569 .
In the same way we get
P (median|S) = 0.2941176
P (mode|S) = 0.3137255 .
Exercise 2.66
There are various ways to stay connected to one's work while on vacation. While on vacation,
let E be the event that a person checks their email to stay connected, let C be the event
that a person connects to work with their cell phone, and let L be the event that a person
uses their laptop to stay connected. Then from the problem statement we are told that
P (E) = 0.4
P (C) = 0.3
P (L) = 0.25
P (E ∩ C) = 0.23
P ((E ∪ C ∪ L)′ ) = 0.51 so P (E ∪ C ∪ L) = 0.49 and
P (E|L) = 0.88
P (L|C) = 0.7 .
Using the above we can derive the probability of some intersections
P (E ∩ L) = P (E|L)P (L) = 0.88(0.25) = 0.22
P (L ∩ C) = P (L|C)P (C) = 0.7(0.3) = 0.21 .
Part (a): This would be
P (C|E) = P (E ∩ C)/P (E) = 0.23/0.4 = 0.575 .
Part (b): This would be
P (C|L) = P (C ∩ L)/P (L) = 0.21/0.25 = 0.84 .
Part (c): This would be
P (C|E ∩ L) = P (E ∩ L ∩ C)/P (E ∩ L) .
The numerator in the above fraction can be computed with
P (E ∪ L ∪ C) = P (E) + P (L) + P (C) − P (E ∩ L) − P (E ∩ C) − P (L ∩ C) + P (E ∩ L ∩ C) so
0.49 = 0.4 + 0.25 + 0.3 − 0.22 − 0.23 − 0.21 + P (E ∩ L ∩ C) ,
on solving we find P (E ∩ L ∩ C) = 0.2. Using this we find
P (C|E ∩ L) = 0.2/0.22 = 0.90909 .
Exercise 2.67
Let T be the event that a person is a terrorist (so that T ′ is the event that the person is not
a terrorist). Then from the problem statement we have that
P (T ) = 100/(300 × 10^6 ) = 3.33 × 10^−7
P (T ′ ) = 1 − P (T ) = 1 − 3.33 × 10^−7 .
Let D (for detect) be the event that our system identifies a person as a terrorist, then D ′ is
the event the system does not identify the person as a terrorist. Then also from the problem
statement we have
P (D|T ) = 0.99
P (D ′|T ′ ) = 0.999 .
Then we want to evaluate P (T |D) i.e. the probability the person is actually a terrorist given that our
system identifies them as one. We have
P (T |D) = P (D|T )P (T )/(P (D|T )P (T ) + P (D|T ′ )P (T ′ ))
= 0.99(3.33 × 10^−7 )/(0.99(3.33 × 10^−7 ) + (1 − 0.999)(1 − 3.33 × 10^−7 )) = 0.0003298912 .
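The striking size of this base-rate effect is easy to reproduce (a fragment I added):

# Prior probability that a randomly chosen person is a terrorist.
p_t = 100 / 300e6

# Sensitivity 0.99, false positive rate 1 - 0.999 = 0.001.
posterior = 0.99 * p_t / (0.99 * p_t + 0.001 * (1 - p_t))
print(posterior)  # 0.0003298912...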
Exercise 2.68
From the problem statement we have P (A1 ) = 0.5, P (A2 ) = 0.3, and P (A3 ) = 0.2. Let Ld
be the event the flight is late into D.C. and La be the event the flight is late into L.A. Then
we have
P (Ld |A1 ) = 0.3 and P (La |A1 ) = 0.1
P (Ld |A2 ) = 0.25 and P (La |A2 ) = 0.2
P (Ld |A3 ) = 0.4 and P (La |A3 ) = 0.25 .
We want to evaluate P {Ai |(Ld ∩ L′a ) ∪ (L′d ∩ La )} for i = 1, 2, 3. By Bayes' Rule
P {A1 |(Ld ∩ L′a ) ∪ (L′d ∩ La )} = P {(Ld ∩ L′a ) ∪ (L′d ∩ La )|A1 }P {A1 }/P {(Ld ∩ L′a ) ∪ (L′d ∩ La )}
= (P {Ld ∩ L′a |A1 } + P {L′d ∩ La |A1 })P {A1 }/P {(Ld ∩ L′a ) ∪ (L′d ∩ La )} by disjoint sets
= (P {Ld |A1 }P {L′a |A1 } + P {L′d |A1 }P {La |A1 })P {A1 }/P {(Ld ∩ L′a ) ∪ (L′d ∩ La )}
= (0.3(0.9) + 0.7(0.1))(0.5)/P {(Ld ∩ L′a ) ∪ (L′d ∩ La )} .
By conditioning on the airline taken Ai the denominator of the above can be computed as
P {(Ld ∩ L′a ) ∪ (L′d ∩ La )} = (0.3(0.9) + 0.7(0.1))(0.5) + (0.25(0.8) + 0.75(0.2))(0.3) + (0.4(0.75) + 0.6(0.25))(0.2)
= 0.17 + 0.105 + 0.09 = 0.365 .
With this we then get for the posterior probabilities
P {A1 |(Ld ∩ L′a ) ∪ (L′d ∩ La )} = 0.17/0.365 = 0.466
P {A2 |(Ld ∩ L′a ) ∪ (L′d ∩ La )} = 0.105/0.365 = 0.288
P {A3 |(Ld ∩ L′a ) ∪ (L′d ∩ La )} = 0.09/0.365 = 0.247 .
Exercise 2.69
From the definitions of A1 , A2 , A3 , and B in the previous exercise we have
P (A1 ) = 0.4
P (A2 ) = 0.35
P (A3 ) = 0.25
P (B|A1 ) = 0.3
P (B|A2 ) = 0.6
P (B|A3 ) = 0.5 .
Let C be the event the customer uses a credit card, then from this exercise we have
P (C|A1 ∩ B) = 0.7
P (C|A1 ∩ B ′ ) = 0.5
P (C|A2 ∩ B) = 0.6
P (C|A2 ∩ B ′ ) = 0.5
P (C|A3 ∩ B) = 0.5
P (C|A3 ∩ B ′ ) = 0.4 .
Part (a): We want to compute
P (A2 ∩ B ∩ C) = P (C|A2 ∩ B)P (A2 ∩ B)
= P (C|A2 ∩ B)P (B|A2 )P (A2) = 0.6(0.6)(0.35) = 0.126 .
Part (b): We want to compute
P (A3 ∩ B ′ ∩ C) = P (C|A3 ∩ B ′ )P (A3 ∩ B ′ )
= P (C|A3 ∩ B ′ )P (B ′ |A3 )P (A3 ) = 0.4(0.5)(0.25) = 0.05 .
Part (c): We want to compute
P (A3 ∩ C) = P (A3 ∩ C ∩ B) + P (A3 ∩ C ∩ B ′ )
= P (C|A3 ∩ B)P (A3 ∩ B) + 0.05 = 0.5P (B|A3)P (A3 ) + 0.05
= 0.5(0.5)(0.25) + 0.05 = 0.1125 .
Part (d): We want to compute
P (B ∩ C) = P (B ∩ C|A1 )P (A1 ) + P (B ∩ C|A2 )P (A2 ) + P (B ∩ C|A3 )P (A3 )
= P (C|B ∩ A1 )P (B|A1 )P (A1 ) + P (C|B ∩ A2 )P (B|A2 )P (A2 ) + P (C|B ∩ A3 )P (B|A3 )P (A3 )
= 0.7(0.3)(0.4) + 0.6(0.6)(0.35) + 0.5(0.5)(0.25) = 0.2725 .
Part (e): We want to compute
P (C) = P (C ∩ B) + P (C ∩ B ′ )
= 0.2725 + 0.5(0.7)(0.4) + 0.5(0.4)(0.35) + 0.4(0.5)(0.25) = 0.5325 .
Part (f): We want to compute
P (A3 |C) = P (A3 ∩ C)/P (C) = 0.1125/0.5325 = 0.2112676 .
Exercise 2.70
From the definition of independence we have that events A and B are dependent if P (A|B) ≠
P (A) or P (A ∩ B) ≠ P (A)P (B). In Exercise 2.47 we computed P (A|B) and found it to be
0.625 (see Equation 7) which is not equal to P (A) = 0.5. Note also that using the other
expression we have P (A ∩ B) = 0.25 ≠ P (A)P (B) = 0.5(0.4) = 0.2.
Exercise 2.71
Part (a): Since the events are independent what happens with the Asia project does not
affect the European project and thus P (B ′ ) = 1 − P (B) = 0.3.
Part (b): We have (using independence) that
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
= P (A) + P (B) − P (A)P (B) = 0.4 + 0.7 − 0.28 = 0.82 .
Part (c): We have
P (A ∩ B ′ |A ∪ B) = P ((A ∩ B ′ ) ∩ (A ∪ B))/P (A ∪ B) = P ((A ∩ B ′ ∩ A) ∪ (A ∩ B ′ ∩ B))/P (A ∪ B)
= P (A ∩ B ′ )/P (A ∪ B) = P (A)P (B ′ )/P (A ∪ B) = (0.4)(0.3)/0.82 = 0.14634 .
Exercise 2.72
From Exercise 2.13 we have
P (A1 ∩ A2 ) = 0.11 vs. P (A1)P (A2 ) = 0.22(0.25) = 0.055
P (A1 ∩ A3 ) = 0.05 vs. P (A1)P (A3 ) = 0.22(0.28) = 0.0618
P (A2 ∩ A3 ) = 0.07 vs. P (A2)P (A3 ) = 0.25(0.28) = 0.07 .
Thus A2 and A3 are independent while the others are not.
Exercise 2.73
We have
P (A′ ∩ B) = P (B) − P (A ∩ B) = P (B) − P (A)P (B) = P (B)(1 − P (A)) = P (B)P (A′ ) ,
showing that the events A′ and B are independent.
Exercise 2.74
The probability that both phenotypes are O is given by
P (O1 ∩ O2 ) = P (O1 )P (O2 ) = 0.44^2 = 0.1936 .
The probability that the two phenotypes match is given by
P (A1 ∩ A2 ) + P (B1 ∩ B2 ) + P (AB1 ∩ AB2 ) + P (O1 ∩ O2 )
= P (A1 )P (A2 ) + P (B1 )P (B2 ) + P (AB1 )P (AB2 ) + P (O1 )P (O2 )
= 0.42^2 + 0.1^2 + 0.04^2 + 0.44^2 = 0.3816 .
Exercise 2.75
From the problem statement, the probability that a point does not signal a problem (when
the process is running correctly) is 0.95. The probability that in ten points none indicate a
problem is 0.95^10 = 0.5987369. Thus the probability that at least one point signals a problem
is 1 − 0.95^10 = 0.4012631. For 25 points the probability that at least one point signals a
problem is 1 − 0.95^25 = 0.7226.
Exercise 2.76
From the problem statement the probability that a grader will not make an error on one
question is 0.9. The probability they make no errors in ten questions is then 0.9^10 = 0.34867.
The probability of at least one error in ten questions is 1 − 0.9^10 = 0.6513. In general if the
probability that the grader makes an error on one question is p the probability of no error is
1 − p. The probability of no errors in n questions is then (1 − p)^n . The probability of at least
one error is 1 − (1 − p)^n .
Exercise 2.77
Part (a): Let p be the probability that an individual rivet is defective, then 1 − p is the
probability a single rivet is not defective and (1 − p)^25 is the probability that all 25 rivets are
not defective. Then 1 − (1 − p)^25 is the probability that at least one rivet is defective and
the entire seam will need reworking. We are told that
1 − (1 − p)^25 = 0.2 so solving for p gives p = 0.008886 .
Note that this is a different number than the one given in the back of the book. If anyone
sees anything wrong with what I have done please contact me.
Part (b): In this case we want 1 − (1 − p)^25 = 0.1 so p = 0.004205.
Exercise 2.78
From the problem statement we have
P (at least one valve opens) = 1 − P (no valves open) = 1 − 0.05^5 = 0.9999997 ,
and
P (at least one valve fails to open) = 1 − P (all valves open) = 1 − 0.95^5 = 0.226 .
Exercise 2.79
Let Fo be the event that the older pump fails and Fn the event that the newer pump fails.
We are told that these events are independent and that P (Fo) = 0.1 and P (Fn ) = 0.05 thus
P (Fo ∩ Fn ) = P (Fo)P (Fn ) = 0.1(0.05) = 0.005 .
Note that this is a different number than the one given in the back of the book. If anyone
sees anything wrong with what I have done please contact me.
Exercise 2.80
Let Ci be the event that the component i works and let p = 0.9 be the probability that it
works. Then from the diagram given we have that
P (system works) = P (C1 ∪ C2 ∪ (C3 ∩ C4 ))
= P (C1 ) + P (C2 ) + P (C3 ∩ C4 )
− P (C1 ∩ C2 ) − P (C1 ∩ (C3 ∩ C4 )) − P (C2 ∩ (C3 ∩ C4 ))
+ P (C1 ∩ C2 ∩ (C3 ∩ C4 ))
= 2p + p^2 − p^2 − 2p^3 + p^4 = 2p − 2p^3 + p^4 = 0.9981 .
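Summing the probabilities of all 2^4 component states gives the same number; a small enumeration (my addition) makes the check explicit:

from itertools import product

p = 0.9
total = 0.0
for c1, c2, c3, c4 in product([0, 1], repeat=4):
    if c1 or c2 or (c3 and c4):  # the system works for this state
        k = c1 + c2 + c3 + c4    # number of working components
        total += p**k * (1 - p)**(4 - k)
print(total)  # 0.9981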
Exercise 2.81
Based on the figure in Example 2.36 we have (and assuming independence) that
P (system works) = P {(components 1 and 2 work) or (components 3 and 4 work)}
= P {(A1 ∩ A2 ) ∪ (A3 ∩ A4 )}
= P (A1 ∩ A2 ) + P (A3 ∩ A4 ) − P (A1 ∩ A2 ∩ A3 ∩ A4 )
= p^2 + p^2 − p^4 = 2p^2 − p^4 .
We want this to equal 0.99 which gives the equation
p^4 − 2p^2 + 0.99 = 0 .
This has roots p^2 = 0.9 and p^2 = 1.1. Taking the square root of the only valid value gives
p = 0.94868.
Exercise 2.82
These events are not pairwise independent since the events A and C are not independent.
To be mutually independent all events must be pairwise independent which they are not.
Exercise 2.83
Let D be the event a defect is present and Ii the event that inspector i detects a defect. Then
from the problem statement we are told
P (I1 |D) = P (I2 |D) = 0.9 ,
and
P ((I1′ ∩ I2 ) ∪ (I1 ∩ I2′ ) ∪ (I1′ ∩ I2′ )|D) = 0.2 .
Using a Venn diagram, the set on the left-hand-side of the above expression is the complement of the set
I1 ∩ I2 thus the above is equivalent to
1 − P (I1 ∩ I2 |D) = 0.2 or P (I1 ∩ I2 |D) = 0.8 .
Part (a): To have only the first inspector detect the defect we want to evaluate
P (I1 ∩ I2′ |D) = P (I1 |D) − P (I1 ∩ I2 |D) = 0.9 − 0.8 = 0.1 .
To have only one of the two inspectors detect the defect we need to compute
P ((I1 ∩ I2′ ) ∪ (I1′ ∩ I2 )|D) = P (I1 ∩ I2′ |D) + P (I1′ ∩ I2 |D) − P ((I1 ∩ I2′ ) ∩ (I1′ ∩ I2 )|D)
= 0.1 + 0.1 = 0.2 .
Part (b): The probability that both inspectors do not find the defect in one defective
component is given by
P (I1′ ∩ I2′ |D) = P ((I1 ∪ I2 )′ |D) = 1 − P (I1 ∪ I2 |D)
= 1 − (P (I1 |D) + P (I2 |D) − P (I1 ∩ I2 |D)) = 1 − (0.9 + 0.9 − 0.8) = 0 .
Thus the probability that all three defective components are missed is then P (I1′ ∩ I2′ |D)^3 = 0.
Exercise 2.84
Let A be the event a vehicle passes inspection and then P (A) = 0.7.
Part (a): This would be P (A)3 = 0.73 = 0.343.
Part (b): This would be 1 − P (all pass inspection) = 1 − 0.343 = 0.657.
Part (c): This would be C(3, 1) 0.7^1 0.3^2 = 0.189.
Part (d): This would be
P (zero pass inspection) + P (one pass inspection) = 0.3^3 + 0.189 = 0.216 .
Part (e): We have
P (A1 ∩ A2 ∩ A3 |A1 ∪ A2 ∪ A3 ) = P (A1 ∩ A2 ∩ A3 ∩ (A1 ∪ A2 ∪ A3 ))/P (A1 ∪ A2 ∪ A3 )
= P (A1 ∩ A2 ∩ A3 )/P (A1 ∪ A2 ∪ A3 )
= 0.343/(P (A1 ) + P (A2 ) + P (A3 ) − P (A1 ∩ A2 ) − P (A1 ∩ A3 ) − P (A2 ∩ A3 ) + P (A1 ∩ A2 ∩ A3 ))
= 0.343/(3(0.7) − 3(0.7^2 ) + 0.7^3 ) = 0.352518 .
Exercise 2.85
Part (a): This would be p + (1 − p)p = 2p − p^2 = p(2 − p).
Part (b): One way to derive this would be to evaluate
p + (1 − p)p + (1 − p)^2 p + · · · + (1 − p)^(n−1) p = p(1 + (1 − p) + (1 − p)^2 + · · · + (1 − p)^(n−1) )
= p (1 − (1 − p)^n )/(1 − (1 − p)) = 1 − (1 − p)^n .
Note that we get the same result by evaluating
1 − P (flaw not detected in n fixations) = 1 − (1 − p)^n .
Part (c): This would be
1 − P (flaw is detected in three fixations) = 1 − (1 − (1 − p)^3 ) = (1 − p)^3 .
Part (d): This would be
P (pass inspection) = P (pass inspection|flawed)P (flawed) + P (pass inspection|flawed′ )P (flawed′ )
= (1 − p)^3 (0.1) + 1(0.9) .
Part (e): This would be
P (flawed|pass inspection) = P (pass inspection|flawed)P (flawed)/P (pass inspection)
= (1 − p)^3 (0.1)/((1 − p)^3 (0.1) + 0.9) .
Exercise 2.86
Part (a): From the problem statement we have that
2000
P (A) =
= 0.2
10000
1999
2000
′
′
P (B) = P (B|A)P (A) + P (B|A )P (A ) =
(0.2) +
(0.8) = 0.2
9999
9999
1999
(0.2) = 0.39984 .
P (A ∩ B) = P (B|A)P (A) =
9999
To see if A and B are independent we next compute P (A)P (B) = 0.2(0.2) = 0.04. Since
the two expressions P (A ∩ B) and P (A)P (B) are not equal the events A and B are not
independent.
Part (b): If P (A) = P (B) = 0.2 and A and B are independent than P (A ∩ B) =
P (A)P (B) = 0.04. The difference between this and the value computed in Part (a) /is
0.04 − 0.39984 = 1.6 10−5. Since this difference is so small we might conclude that A and B
are independent.
Part (c): If we now have only two green boards then we have

P (A) = 2/10 = 0.2
P (B) = P (B|A)P (A) + P (B|A′ )P (A′ ) = (1/9)(0.2) + (2/9)(0.8) = 0.2
P (B|A) = 1/9 = 0.1111 so
P (A ∩ B) = P (B|A)P (A) = 0.0222 .
If we assume that A and B are independent (as we did in Part (b)) we would again compute
P (A)P (B) = 0.2^2 = 0.04. Note that this is not very close in value to P (A ∩ B). Removing one
green board in the 10 board case changes the distribution of the number of green boards still
remaining quite significantly, while when there are 2000 green boards initially removing one
does not change the distribution of green boards remaining.
Exercise 2.87
As earlier let Ci be the event that component i works. We want to compute P (system work).
Using the expression P (Ci ) = p we find
P (system works) = P (C1 ∪ C2 )P {(C3 ∩ C4 ) ∪ (C5 ∩ C6 )}P (C7 ) .
Let's compute the different parts in turn. First we have
P (C1 ∪ C2 ) = P (C1 ) + P (C2 ) − P (C1 ∩ C2 ) = 2p − p^2 .
Next we have
P {(C3 ∩ C4 ) ∪ (C5 ∩ C6 )} = P (C3 ∩ C4 ) + P (C5 ∩ C6 ) − P ((C3 ∩ C4 ) ∩ (C5 ∩ C6 ))
= p^2 + p^2 − p^4 = 2p^2 − p^4 .
Thus we get
P (system works) = (2p − p^2 )(2p^2 − p^4 )p = p^4 (2 − p)(2 − p^2 ) .

When p = 0.9 the above becomes 0.85883. If this system were connected in parallel with the
system in Figure 2.14, then defining the events S1 and S2 as
S1 ≡ System In Figure 2.14 (a) Works
S2 ≡ System In Problem 87 Works ,
we would have
P (S1 ∪ S2 ) = P (S1 ) + P (S2 ) − P (S1 ∩ S2 ) = P (S1 ) + P (S2 ) − P (S1 )P (S2)
= 0.927 + 0.85883 − 0.927(0.85883) = 0.9896946 .
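A few lines of R reproduce both numbers (a sketch, not one of the original scripts; the value 0.927 is taken from the earlier exercise):

p = 0.9
p_sys = (2*p - p^2) * (2*p^2 - p^4) * p    # this problem's system
p_s1 = 0.927                               # system of Figure 2.14 (a)
c( p_sys, p_s1 + p_sys - p_s1 * p_sys )    # gives approximately 0.85883 and 0.98969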
Exercise 2.88
Route 1 has four railway crossings and route 2 has only two railway crossings but is longer.
Let Ti be the event that we are slowed by a train and we will take P (Ti ) = 0.1.
Part (a): The probability we are late given we take each route can be computed by
P (late|route 1) = choose(4, 2)(0.1)^2 (0.9)^2 + choose(4, 3)(0.1)^3 (0.9)^1 + choose(4, 4)(0.1)^4 (0.9)^0 = 0.0523
P (late|route 2) = choose(2, 1)(0.1)^1 (0.9)^1 + choose(2, 2)(0.1)^2 (0.9)^0 = 0.19 .
Since the probability we are late under route 1 is smaller we should take route 1.
Part (b): If we toss a coin to decide which route to take, then the probability we took route
one, given that we are late, is given by

P (route 1|late) = P (late|route 1)P (route 1) / (P (late|route 1)P (route 1) + P (late|route 2)P (route 2))
= 0.0523(0.5) / (0.0523(0.5) + 0.19(0.5)) = 0.215848 .
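Since the number of train delays on each route is binomial, these numbers can also be obtained in R (a sketch):

p_late_1 = 1 - pbinom(1, 4, 0.1)   # two or more of the four crossings blocked
p_late_2 = 1 - pbinom(0, 2, 0.1)   # one or more of the two crossings blocked
c( p_late_1, p_late_2 )            # gives 0.0523 and 0.19
p_late_1 * 0.5 / ( p_late_1 * 0.5 + p_late_2 * 0.5 )   # gives 0.215848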
Exercise 2.89
We want the probability that exactly one tag was lost given that at most one was lost, or

P ((C1 ∩ C2′ ) ∪ (C1′ ∩ C2 )|(C1′ ∩ C2′ ) ∪ (C1′ ∩ C2 ) ∪ (C1 ∩ C2′ ))
= P {[(C1 ∩ C2′ ) ∪ (C1′ ∩ C2 )] ∩ [(C1′ ∩ C2′ ) ∪ (C1′ ∩ C2 ) ∪ (C1 ∩ C2′ )]} / P {(C1′ ∩ C2′ ) ∪ (C1′ ∩ C2 ) ∪ (C1 ∩ C2′ )}
= P {(C1 ∩ C2′ ) ∪ (C1′ ∩ C2 )} / P {(C1′ ∩ C2′ ) ∪ (C1′ ∩ C2 ) ∪ (C1 ∩ C2′ )} .

The numerator N in the above can be expanded using

N = P (C1 ∩ C2′ ) + P (C1′ ∩ C2 ) − P ((C1 ∩ C2′ ) ∩ (C1′ ∩ C2 ))
= P (C1 )P (C2′ ) + P (C1′ )P (C2 ) − 0 = 2π(1 − π) ,

since the last term is zero. The denominator D can be expanded as

D = P (C1′ ∩ C2′ ) + P (C1′ ∩ C2 ) + P (C1 ∩ C2′ )
− P ((C1′ ∩ C2′ ) ∩ (C1′ ∩ C2 )) − P ((C1′ ∩ C2′ ) ∩ (C1 ∩ C2′ )) − P ((C1′ ∩ C2 ) ∩ (C1 ∩ C2′ ))
+ P ((C1′ ∩ C2′ ) ∩ (C1′ ∩ C2 ) ∩ (C1 ∩ C2′ ))
= P (C1′ )P (C2′ ) + P (C1′ )P (C2 ) + P (C1 )P (C2′ ) − 0 − 0 − 0 + 0
= (1 − π)^2 + 2π(1 − π) = (1 − π)(1 + π) .

Dividing these two expressions we get

P ((C1 ∩ C2′ ) ∪ (C1′ ∩ C2 )|(C1′ ∩ C2′ ) ∪ (C1′ ∩ C2 ) ∪ (C1 ∩ C2′ )) = 2π(1 − π) / ((1 − π)(1 + π)) = 2π / (1 + π) .
Exercise 2.90
Part (a): This would be choose(20, 3) = 1140.

Part (b): This would be choose(19, 3) = 969.

Part (c): This would be choose(20, 3) − choose(10, 3) = 1140 − 120 = 1020, or the total number of
shifts subtracting the number of shifts that don't have one of the 10 best machinists.

Part (d): This would be choose(19, 3)/choose(20, 3) = 969/1140 = 0.85 .
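These counts can be verified with choose in R (a sketch):

c( choose(20,3), choose(19,3), choose(20,3) - choose(10,3), choose(19,3) / choose(20,3) )
# gives 1140 969 1020 0.85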
Exercise 2.91
Let Li be the event that a can comes from line i. Note that the numbers given in the table
are P (defect type|Li ).
Part (a): This would be

P (L1 ) = 500/1500 = 1/3 .

The probability that the reason for nonconformance was a crack is given by

P (crack) = P (crack|L1 )P (L1 ) + P (crack|L2 )P (L2 ) + P (crack|L3 )P (L3 )
= 0.5 (5/15) + 0.44 (4/15) + 0.4 (6/15) = 0.444 .
Part (b): If the can came from line one the probability the defect was a blemish can be
read from the table. We have P (blemish|L1 ) = 0.15.
Part (c):

P (L1 |surface defect) = P (L1 ∩ surface defect) / P (surface defect)
= 0.10 (5/15) / (0.10 (5/15) + 0.08 (4/15) + 0.15 (6/15)) = 0.2906977 .
Exercise 2.92
Part (a): We have choose(10, 6) = 210 ways of choosing the six forms from the ten we have to hand
off. If we want to have the remaining four forms be all of the same type we have to choose
all six of the withdrawal petitions or all four of the substitution requests (and then any two
withdrawal petitions). We can do this in

choose(6, 6) choose(4, 0) + choose(6, 2) choose(4, 4) = 1 + 15 = 16

ways. The probability this happens is then 16/210 = 0.07619048.
Part (b): We have 10! ways of arranging all ten forms. The event that the first four forms
alternate in type can start in two ways: with a withdrawal
petition or with a substitution request. Thus the number of ways that we can have the first four
forms alternating is given by
6(4)(5)(3) + 4(6)(3)(5) .
The first product 6(4)(5)(3) counts by first selecting one of the six withdrawal petition, then
second selecting one of the substitution requests, then selecting another withdrawal petition,
and finally another substitution request. The second product is derived in a similar way but
starting with a substitution request. Thus the probability we are looking for is given by
[6(4)(5)(3) + 4(6)(3)(5)] 6! / 10! = 0.1428571 .
Exercise 2.93
We know that when A and B are independent we have
P (A ∪ B) = P (A) + P (B) − P (A ∩ B) = P (A) + P (B) − P (A)P (B) .
With P (A)P (B) = 0.144 we have P (B) = 0.144/P (A), and since we are told that P (A ∪ B) = 0.626,
using the above we get

0.626 = P (A) + 0.144/P (A) − 0.144 .
If we write this as a quadratic equation we get
P (A)2 − 0.770P (A) + 0.144 = 0 .
If we solve for P (A) we get P (A) = 0.45 and P (B) = 0.32 (enforcing P (A) > P (B)).
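As a check (a small R sketch), the two roots can be found with polyroot, which takes the polynomial coefficients in increasing order of degree:

Re( polyroot( c(0.144, -0.770, 1) ) )   # gives 0.32 and 0.45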
Exercise 2.94
Let Ci be the event that the ith relay works correctly for i = 1, 2, 3 . Then from the problem
statement we know that P (Ci ) = 0.8 (so P (Ci′ ) = 0.2).
Part (a): This would be ∏_{i=1}^{3} P (Ci ) = 0.8^3 = 0.512.
Part (b): To have the correct output we can have no errors (all relays work correctly) or
two errors (two relays work incorrectly) to give a probability of
choose(3, 0) (0.2)^0 (0.8)^3 + choose(3, 2) (0.2)^2 (0.8)^1 = 0.608 .
Part (c): Let T be the event that we transmit a one and R the event that we receive a one.
We are told that P (T ) = 0.7 and we want to evaluate
P (T |R) = P (R|T )P (T ) / (P (R|T )P (T ) + P (R|T ′ )P (T ′ )) .

In Part (b) we computed P (R|T ) = 0.608. In the same way we can compute P (R|T ′ ) = 0.392,
thus we get

P (T |R) = 0.608(0.7) / (0.608(0.7) + 0.392(0.3)) = 0.7835052 .
Exercise 2.95
Part (a): This would be 1/5! = 0.008333333.

Part (b): This would be 1(4!)/5! = 1/5 = 0.2.

Part (c): This would be (4!)1/5! = 1/5 = 0.2, since we specify that F is the last person to hear
the rumor and then have 4! ways of arranging the other four people.
Exercise 2.96
At each stage the person F has a probability of 1/5 of getting the rumor and a probability
of 4/5 of not getting the rumor. The probability that F has not heard the rumor after ten
tellings is then (4/5)^{10} = 0.1073742.
Exercise 2.97
Let E be the event that we have the trace impurity in our sample and D the event that we
detect this. Then from the problem statement we are told that P (D|E) = 0.8, P (D ′|E ′ ) =
0.9 and P (E) = 0.4. In the experiment performed we found two detections from three trials;
let V be this event. Then we have

P (E|V ) = P (V |E)P (E) / (P (V |E)P (E) + P (V |E ′ )P (E ′ )) .
We have that

P (V |E) = choose(3, 2) P (D|E)^2 P (D ′ |E) = 3 (0.8^2 )(0.2) = 0.384
P (V |E ′ ) = choose(3, 2) P (D|E ′ )^2 P (D ′ |E ′ ) = 3 (0.1^2 )(0.9) = 0.027 .
Thus we compute

P (E|V ) = 0.384(0.4) / (0.384(0.4) + 0.027(0.6)) = 0.9045936 .
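Since two detections in three trials is a binomial event, this can be written compactly in R (a sketch):

p_V_E  = dbinom(2, 3, 0.8)   # P(V|E)  = 0.384
p_V_Ep = dbinom(2, 3, 0.1)   # P(V|E') = 0.027
p_V_E * 0.4 / ( p_V_E * 0.4 + p_V_Ep * 0.6 )   # gives 0.9045936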
Exercise 2.98
This would be

3(1)(5^2 ) / 6^3 = 0.3472222 .

For the denominator note that there are 6^3 ways for the three contestants to select their
categories, since each contestant has six choices. For the numerator we have three choices for
which contestant will select category one and then the other two contestants each have five
choices for their categories.
Exercise 2.99
Part (a): We break down the ways a fastener can pass inspection as

P (pass inspection) = P (pass initially)
+ P (pass after recrimping|pass initially′ , recrimped)P (pass initially′ , recrimped)
= 0.95 + 0.05(0.8)(0.6) = 0.974 .
Part (b): This would be

P (passed initially|passed inspection) = 0.95 / (0.95 + 0.05(0.8)(0.6)) = 0.95/0.974 = 0.9753593 .
Note this result is different than that in the back of the book. If anyone sees anything that
I did wrong please contact me.
Exercise 2.100
Let D (for disease) be the event that we are a carrier of the disease, and T (for test) be the
event that the test comes back positive. Then we are told that
P (T |D) = 0.9 and P (T |D ′) = 0.05 .
Part (a): This would be
P (T1 , T2 ) + P (T1′ , T2′ ) = P (T1 )P (T2 ) + P (T1′ )P (T2′ ) .
Now
P (T1 ) = P (T |D)P (D) + P (T |D ′ )P (D ′ ) = 0.9(0.01) + 0.05(0.99) = 0.0585 .

Thus with this P (T1′ ) = 1 − 0.0585 = 0.9415. Thus the probability we want is given by

0.0585^2 + 0.9415^2 = 0.88984 .
Part (b): This would be

P (D|T1 , T2 ) = P (T1 , T2 |D)P (D) / (P (T1 , T2 |D)P (D) + P (T1 , T2 |D ′ )P (D ′ ))
= P (T |D)^2 P (D) / (P (T |D)^2 P (D) + P (T |D ′ )^2 P (D ′ ))
= 0.9^2 (0.01) / (0.9^2 (0.01) + 0.05^2 (0.99)) = 0.7659 .
Exercise 2.101
Let C1 and C2 be the events that components one and two function. Then from the problem
statement we have that the probability that the second component functions is given by
P (C2 ) = 0.9, the probability that both components function is given by P (C1 ∩ C2 ) = 0.75
and the probability that at least one component functions is given by
P ((C1 ∩ C2 ) ∪ (C1′ ∩ C2 ) ∪ (C1 ∩ C2′ )) = 0.96 .
We want to evaluate

P (C2 |C1 ) = P (C1 ∩ C2 ) / P (C1 ) .
To do this recognize that C1 ∩ C2 , C1′ ∩ C2 , and C1 ∩ C2′ are mutually exclusive events and
we can write the probability that at least one component functions as
P (C1 ∩ C2 ) + P (C1′ ∩ C2 ) + P (C1 ∩ C2′ ) = 0.96 ,
or since we know P (C1 ∩ C2 ) this becomes
P (C1′ ∩ C2 ) + P (C1 ∩ C2′ ) = 0.96 − 0.75 = 0.21 .    (9)
Now we can write the event C1 as the union of two mutually exclusive events so that we get
P (C2 ) = 0.9 = P (C2 ∩ C1 ) + P (C2 ∩ C1′ ) = 0.75 + P (C2 ∩ C1′ ) ,
which gives P (C2 ∩C1′ ) = 0.15. Using this and Equation 9 we have P (C1 ∩C2′ ) = 0.21−0.15 =
0.06. Putting everything together we have
P (C1) = P (C1 ∩ C2 ) + P (C1 ∩ C2′ ) = 0.75 + 0.06 = 0.81 .
Thus the conditional probability we want is then given by P (C2 |C1 ) = 0.75/0.81 = 0.92592.
Exercise 2.102
If we draw a diagram to represent the information given then from the diagram we have
P (E1 ∩ L) = P (L|E1 )P (E1 ) = 0.02(0.4) = 0.008 .
Exercise 2.103
Part (a): We would have (recall L is the event that a parcel is late)
P (L) = P (L|E1 )P (E1 ) + P (L|E2 )P (E2 ) + P (L|E3 )P (E3 )
= 0.02(0.4) + 0.01(0.5) + 0.05(0.1) = 0.018 .
Part (b): We can compute the requested probability as

P (E1′ |L′ ) = P (L′ ∩ E1′ ) / P (L′ ) = P (L′ ∩ (E2 ∪ E3 )) / (1 − P (L)) = P ((L′ ∩ E2 ) ∪ (L′ ∩ E3 )) / (1 − P (L))
= (P (L′ ∩ E2 ) + P (L′ ∩ E3 )) / (1 − P (L)) = (P (L′ |E2 )P (E2 ) + P (L′ |E3 )P (E3 )) / (1 − P (L))
= (0.99(0.5) + 0.95(0.1)) / (1 − 0.018) = 0.6008 .
Exercise 2.104
This is an application of Bayes' rule where we want to evaluate

P (Ai |R) = P (R|Ai )P (Ai ) / Σ_{j=1}^{3} P (R|Aj )P (Aj ) .
Using the numbers given in this problem we find these values given by
[1] 0.3623188 0.3478261 0.2898551
Exercise 2.105
Part (a): We would find

P (All have different birthdays) = choose(365, 10) 10! / 365^{10} = 0.88305

and

P (At least two people have the same birthday) = 1 − P (all people have different birthdays)
= 1 − 0.88305 = 0.1169 .
Part (b): For a general number of people k we would have

P (all k have different birthdays) = choose(365, k) k! / 365^k

P (At least two people have the same birthday) = 1 − choose(365, k) k! / 365^k .
We can consider different values of k and find the smallest value where the above probability
is larger than one-half using the following R code
ks = 1:40
prob = 1 - ( choose(365,ks) * factorial(ks) ) / ( 365^ks )
plot(ks,prob)
grid()
min( which( prob > 0.5 ) )   # the smallest k with probability > 1/2
Running the above we find that k = 23.
Part (c): Let E be the event that at least two people have the same birthday or at least
two people have the same last three digits of their SSN. Then we have

P (E) = 1 − P (all have different birthdays)P (all have different SSNs)
= 1 − (choose(365, 10) 10! / 365^{10}) (choose(1000, 10) 10! / 1000^{10})
= 1 − (0.88305)(0.9558606) = 0.155927 .
Exercise 2.106
In the tables for this problem we are given the values of P (Oi |G) and P (Oi |B) for i = 1, 2, 3.
To save typing I’ll denote each of the possible observation ranges as Oi for i = 1, 2, 3. For
example, O1 means the event that the observations were such that R1 < R2 < R3 .
Part (a): In the notation of this problem we want to show that P (G|O1) > P (B|O1). Using
Bayes' rule we see that this inequality is equivalent to

P (O1 |G)P (G) / P (O1 ) > P (O1 |B)P (B) / P (O1 ) .

We will evaluate the left-hand-side and the right-hand-side of the above and show that it is
true. Note that we have

P (O1 ) = P (O1 |G)P (G) + P (O1 |B)P (B) = 0.6(0.25) + 0.1(0.75) = 0.225 .

With this the left-hand-side is given by

0.6(0.25)/0.225 = 0.666 ,

while the right-hand-side is given by

0.1(0.75)/0.225 = 0.333 .
And we see that the requested inequality is true as we were to show. If we receive the
measurement O1 , we would classify the sample as Granite since (as we just showed) its
posterior probability is larger.
Part (b): The first question corresponds to the observation we are calling O2 . In this case
we have

P (O2 ) = 0.25(0.25) + 0.2(0.75) = 0.2125 ,

so that

P (G|O2 ) = 0.25(0.25)/0.2125 = 0.294 and P (B|O2 ) = 0.2(0.75)/0.2125 = 0.705 .

In this case we classify as Basalt. The second question corresponds to the observation we
are calling O3 . We have

P (O3 ) = 0.15(0.25) + 0.7(0.75) = 0.5625 ,

so that

P (G|O3 ) = 0.15(0.25)/0.5625 = 0.066 and P (B|O3 ) = 0.7(0.75)/0.5625 = 0.933 .

In this case we also classify as Basalt.
Part (c): Since if I receive the observation O1 , we would classify the rock as Granite an
error will happen if in fact the rock is Basalt. Generalizing this for the other three types of
observations the probability of error is given by
P (O1 ∩ B) + P (O2 ∩ G) + P (O3 ∩ G) = P (O1 |B)P (B) + P (O2|G)P (G) + P (O3 |G)P (G)
= 0.1(0.75) + 0.25(0.25) + 0.15(0.25) = 0.175 .
Part (d): As in Part (a) we classify as Granite if P (G|O1 ) > P (B|O1 ) or

P (O1 |G)P (G) / P (O1 ) > P (O1 |B)P (B) / P (O1 ) ,

or, writing P (G) = p and P (B) = 1 − p,

0.6p / (0.6p + 0.1(1 − p)) > 0.1(1 − p) / (0.6p + 0.1(1 − p)) ,

which holds whenever 0.6p > 0.1(1 − p), i.e. when p > 1/7. Next we would need to set up the same
inequalities for observations O2 and O3 and see what restrictions they impose on the value
of p. A value of p large enough should make us always classify every rock as Granite.
Exercise 2.107
The probability we want is given by the expression
P (detected) = P (G1 ∪ G2 ∪ · · · ∪ Gn )
= 1 − P (G′1 ∩ G′2 ∩ · · · ∩ G′n )
= 1 − P (G′1 )P (G′2 ) · · · P (G′n )
= 1 − (1 − p1 )(1 − p2 ) · · · (1 − pn ) .
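In R this is a one-line function (a sketch; the detection probabilities used below are made-up example values):

p_detect = function(p) { 1 - prod(1 - p) }
p_detect( rep(0.2, 4) )   # e.g. four detectors each with p_i = 0.2 gives 0.5904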
Exercise 2.108
Part (a): We would need to get four balls in a row which happens with probability 0.54 .
Part (b): This would be

choose(5, 2) (0.5)^2 (0.5)^3 (0.5) = 0.15625 .
Part (c): This would be

choose(3, 0) (0.5)^0 (0.5)^3 (0.5) + choose(4, 1) (0.5)^1 (0.5)^3 (0.5) + choose(5, 2) (0.5)^2 (0.5)^3 (0.5) = 0.34375 .

In the above the first term corresponds to four total pitches (all balls), the second
term corresponds to five total pitches one of which is a strike and the last of which is a
ball, and the third term corresponds to six total pitches two of which are strikes and the
last of which is a ball. Given this value, the probability of a strike out is then given by
1 − 0.34375 = 0.65625.
Part (d): To have the first batter score a run with each batter not swinging, the pitcher
must walk four batters before he gets three outs. Let pw = 0.34375 and ps = 0.65625 be
the probabilities that the pitcher walks or strikes out a given batter (which we computed in
Part (c)). Then the probability we want is given by

choose(3, 0) ps^0 pw^3 pw + choose(4, 1) ps^1 pw^3 pw + choose(5, 2) ps^2 pw^3 pw = 0.1107 .
Exercise 2.109
Part (a): This is 1/4! = 0.0416.

Part (b): It is not possible to have three engineers (but not four) in correct rooms. The
probability we have only two engineers in correct rooms is choose(4, 2)/4! = 6/24 = 0.25. The
probability we have only one engineer in a correct room is given by

choose(4, 1) (3! − 1 − 0 − choose(3, 1)) / 4! = 4(2)/24 = 0.333 .

In the numerator of the above fraction, from 3! we subtract 1 (the number of ways to have
the remaining three engineers in correct rooms), then 0 (the number of ways to have only
two of the remaining engineers in correct rooms) and finally choose(3, 1) (the number of ways
to have one of the remaining engineers in a correct room). This then gives

1 − 1/4! − 0.25 − 0.333 = 0.375 ,

for the probability we seek.
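These probabilities can be confirmed by simulation in R (a sketch; sample(4) draws a random assignment of engineers to rooms):

set.seed(123)
n_correct = replicate( 100000, sum( sample(4) == 1:4 ) )
table( n_correct ) / 100000   # approximately 0.375, 0.333, 0.25, 0.0417 (three matches never occur)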
Exercise 2.110
Part (a): By independence we have P (A∩B ∩C) = P (A)P (B)P (C) = 0.6(0.5)(0.4) = 0.12,
so 1 − P (A ∩ B ∩ C) = 1 − 0.12 = 0.88.
Part (b): We have
P (A ∩ B ′ ∩ C ′ ) = P (A)P (B ′)P (C ′ ) = 0.6(1 − 0.5)(1 − 0.4) = 0.18 .
What we need to evaluate is
P (A ∩ B ′ ∩ C ′ ) + P (A′ ∩ B ∩ C ′ ) + P (A′ ∩ B ′ ∩ C) ,
or
(0.6)(1 − 0.5)(1 − 0.4) + (1 − 0.6)(0.5)(1 − 0.4) + (1 − 0.6)(1 − 0.5)(0.4) = 0.38 .
Exercise 2.111
See the python code ex2 111.py where the given strategy is implemented for any value of
n the number of people we interview. For example, when we run that code with n = 10 we
get the following output
s= 0; prob_best_hire= 362880/3628800  = 0.100000
s= 1; prob_best_hire= 1026576/3628800 = 0.282897
s= 2; prob_best_hire= 1327392/3628800 = 0.365794
s= 3; prob_best_hire= 1446768/3628800 = 0.398690
s= 4; prob_best_hire= 1445184/3628800 = 0.398254
s= 5; prob_best_hire= 1352880/3628800 = 0.372817
s= 6; prob_best_hire= 1188000/3628800 = 0.327381
s= 7; prob_best_hire= 962640/3628800  = 0.265278
s= 8; prob_best_hire= 685440/3628800  = 0.188889
s= 9; prob_best_hire= 362880/3628800  = 0.100000
Where we see that in this case if we take s = 3 we maximize the probability that we hire the
best person. If n = 4 then s = 1 maximizes the probability we hire the best worker.
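Since ex2 111.py is not reproduced here, the table above can also be generated from the standard optimal-stopping formula (a small R sketch: skip the first s candidates, then hire the next one better than all seen so far):

prob_best_hire = function(s, n) {
  if( s == 0 ) return( 1/n )          # hire the first candidate
  (s/n) * sum( 1 / (s:(n-1)) )
}
sapply( 0:9, prob_best_hire, n=10 )   # matches the table above; maximum at s = 3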
Exercise 2.112
The probability of at least one event is given by
P = 1 − P (no events happen) = 1 − P (A′1 ∩ A′2 ∩ A′3 ∩ A′4 )
= 1 − P (A′1)P (A′2 )P (A′3 )P (A′4 ) = 1 − (1 − p1 )(1 − p2 )(1 − p3 )(1 − p4 ) .
To get the probability that at least two events happen, we first compute the probability that exactly one event happens
P (only one event happens) = P (A1 ∩ A′2 ∩ A′3 ∩ A′4 ) + P (A′1 ∩ A2 ∩ A′3 ∩ A′4 )
+ P (A′1 ∩ A′2 ∩ A3 ∩ A′4 ) + P (A′1 ∩ A′2 ∩ A′3 ∩ A4 )
= p1 (1 − p2 )(1 − p3 )(1 − p4 ) + (1 − p1 )p2 (1 − p3 )(1 − p4 )
+ (1 − p1 )(1 − p2 )p3 (1 − p4 ) + (1 − p1 )(1 − p2 )(1 − p3 )p4 .
Using these we have that
P (at least two events happen) = 1 − P (no events happen) − P (only one event happens) .
Exercise 2.113
P (A1 ∩ A2 ) = P (win prize 1 and win prize 2) = 1/4 ,

since in this case we have to draw the fourth slip of paper for this event to happen. Now
P (A1 ) = 2/4 = 1/2 and P (A2 ) = 2/4 = 1/2 and so we have P (A1 ∩ A2 ) = P (A1 )P (A2 ). In the same
way we have

P (A1 ∩ A3 ) = P (win prize 1 and win prize 3) = 1/4 ,

since in this case we have to draw the fourth slip of paper for this event to happen. Now
P (A3 ) = 2/4 = 1/2 and so we have P (A1 ∩ A3 ) = P (A1 )P (A3 ). Now for A2 ∩ A3 we have the
same conclusion. For P (A1 ∩ A2 ∩ A3 ) we have

P (A1 ∩ A2 ∩ A3 ) = P (win prize 1, 2 and 3) = 1/4 ≠ P (A1 )P (A2 )P (A3 ) = 1/8 ,

as we were to show.
Exercise 2.114
Using the definition of conditional probability we have
P (A1 |A2 ∩ A3 ) =
P (A1 ∩ A2 ∩ A3 )
P (A1 )P (A2 )P (A3 )
=
= P (A1 ) .
P (A2 ∩ A3 )
P (A2 )P (A3 )
Discrete Random Variables
and Probability Distributions
Problem Solutions
Exercise 3.1
We would have the following outcomes
SSS, SSF, SFS, FSS, FFS, FSF, SFF, FFF
with the following values for the random variable X
3, 2, 2, 2, 1, 1, 1, 0
Exercise 3.2
The event having a child of a specified sex (like a boy or a girl) can be viewed as a “success”
and having a child of the opposite sex viewed as a failure.
Catching a train (or not) can be viewed as an experiment where the outcome is either a
“success” or a “failure”.
Making a passing grade in a statistics class can be viewed as an experiment where the
outcome is either a “success” or a “failure”.
Exercise 3.3
The minimum number of cars at the two pumps or the product of the number of cars at the
two pumps.
Exercise 3.4
A zip code is a five digit number. Assuming that all possible five digit numbers are possible
zip codes, the number of nonzero digits in a zip code could be 0, 1, 2, 3, 4, 5. Having zero
nonzero digits means that we have the zip code 00000, which is probably not realistic, thus
X = 0 is not an allowable outcome. There may be other restrictions on the numerical form
that a valid zip code can take.
Exercise 3.5
No. The mapping takes more than one sample in the sample space to the same numerical
value.
Exercise 3.6
X would be 1, 2, 3, . . . , ∞. A few examples of experimental outcomes might be
L, RL, RSSL
and their X values are
1, 2, 4
Exercise 3.7
Part (a): The variable is discrete and ranges from 0 ≤ X ≤ 12.
Part (b): The variable is discrete and ranges from zero to the number of students in the
class.
Part (c): The variable is discrete and ranges from one to +∞ (where the golfer never hits
the ball).
Part (d): The variable is real and ranges from zero to the largest known length of a
rattlesnake.
Part (e): The variable is discrete and ranges from zero (for no books being sold) to the
largest known amount for sales (which is 10000c where c is the royalty per book), in increments of c.
Part (f): The variable is real and ranges over the range of the pH scale.
Part (g): The variable is real and ranges over the smallest to largest possible tension.
Part (h): The variable is discrete and ranges over the values three to ∞ (if a match is
never obtained).
Exercise 3.8
We can get Y = 3 for the outcome
SSS .
We can get Y = 4 for the outcome
FSSS .
We can get Y = 5 for the outcomes
FFSSS , SFSSS .
We can get Y = 6 for the outcomes
SSFSSS , FSFSSS , SFFSSS , FFFSSS .
We can get Y = 7 for the outcomes
FFFFSSS , FFSFSSS , FSFFSSS , SFFFSSS , SSFFSSS , SFSFSSS , FSSFSSS .
Exercise 3.9
Part (a): X would take the values 2, 4, 6, . . . .
Part (b): X would take the values 2, 3, 4, 5, 6, . . . .
Exercise 3.10
Part (a): T would have a range from 0, 1, . . . , 9, 10.
Part (b): X would have a range from −3, −2, . . . , 4, 5.
Part (c): U would have a range from 0, 1, 2, . . . , 5, 6.
Part (d): Z would have a range from 0, 1, 2.
Exercise 3.11
Part (a): This would be P (X = 4) = 0.45, P (X = 6) = 0.4 and P (X = 8) = 0.15.
Part (c): This would be P (X ≥ 6) = 0.4 + 0.15 = 0.55 and P (X > 6) = P (X = 8) = 0.15.
Exercise 3.12
Part (a): This would be P (Y ≤ 50) = sum(0.05, 0.1, 0.12, 0.14, 0.25, 0.17) = 0.83.
Part (b): This would be P (Y > 50) = 1 − P (Y ≤ 50) = 0.17.
Part (c): If we are the first on the standby list then there must be at least one seat available
or P (Y ≤ 49) = 0.66. If we are the third person on the standby list then there must be at
least three seats available or P (Y ≤ 50 − 3) = P (Y ≤ 47) = 0.27.
Exercise 3.13 (phone lines in use)
Part (a): This would be P (X ≤ 3) = 0.1 + 0.15 + 0.2 + 0.25 = 0.7.
Part (b): This would be P (X < 3) = 0.1 + 0.15 + 0.2 = 0.45.
Part (c): This would be P (X ≥ 3) = 0.25 + 0.2 + 0.06 + 0.04 = 0.55.
Part (d): This would be P (2 ≤ X ≤ 5) = 0.2 + 0.25 + 0.2 + 0.06 = 0.71.
Part (e): This would be 1 − P (2 ≤ X ≤ 4) = 1 − (0.2 + 0.25 + 0.2) = 0.35.
Part (f): This would be P (X = 0) + P (X = 1) + P (X = 2) = 0.1 + 0.15 + 0.2 = 0.45.
Exercise 3.14
Part (a): We must have k such that Σ_{y=1}^{5} ky = 1. Solving for k we get k = 1/15.

Part (b): This would be P (Y ≤ 3) = k(1 + 2 + 3) = 6/15 = 2/5.

Part (c): This would be P (2 ≤ Y ≤ 4) = (1/15)(2 + 3 + 4) = 3/5.

Part (d): We would need to check that Σ_{y=1}^{5} p(y) = 1. If we put in the suggested form for
p(y) we find Σ_{y=1}^{5} y^2/50 = 1.1 ≠ 1 so no.
Exercise 3.15
Part (a): These would be the selections
(1, 2) , (1, 3) , (1, 4) , (1, 5) , (2, 3) , (2, 4) , (2, 5) , (3, 4) , (3, 5) , (4, 5) .
Part (b): From the above sample set we have that

P (X = 2) = 1/10
P (X = 1) = 6/10 = 3/5
P (X = 0) = 3/10 .

Part (c): We would have

F (0) = 3/10
F (1) = 9/10
F (2) = 1 .
Exercise 3.16
Part (a): We could tabulate the possible sequences of S and F and count, or recognize that
X is a binomial random variable and thus

P (X = x) = choose(4, x) 0.3^x 0.7^{4−x} for 0 ≤ x ≤ 4 .
Part (c): Evaluating the above and looking for the largest value of P (X = x) we see that
it is when x = 1.
Part (d): This would be P (X ≥ 2) = 0.3483.
Exercise 3.17
Part (a): This would be p(2) = 0.9^2 = 0.81.

Part (b): This would be p(3) = 2(0.1)(0.9)^2 = 0.162.

Part (c): If Y = 5 we must have the fifth battery be acceptable. Thus in the previous four
we need to have one other acceptable battery. These would be the events

UUUAA , UUAUA , UAUUA , AUUUA .

Thus p(5) = 4(0.1)^3 (0.9)^2 = 0.00324.

Part (d): p(y) = (y − 1)(0.1)^{y−2} (0.9)^2 .
Exercise 3.18
Part (a): To solve this part we can make a matrix where the outcome of the first die
corresponding to a row and the value of the second die corresponds to a column and the
matrix hold the value of the maximum of these two elements. We can then count the number
of times each possible maximum value occurs. When we do that we get
p(1) = 1/36
p(2) = 3/36
p(3) = 5/36
p(4) = 7/36
p(5) = 9/36
p(6) = 11/36 .

Part (b): This would be

F (1) = 1/36
F (2) = 4/36
F (3) = 9/36
F (4) = 16/36
F (5) = 25/36
F (6) = 1 .
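The counting described above is easily done in R (a sketch):

m = outer( 1:6, 1:6, pmax )   # matrix of the maximum of the two dice
table( m ) / 36               # the pmf values 1/36, 3/36, ..., 11/36
cumsum( table( m ) / 36 )     # the cdf values above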
Exercise 3.19
We can construct a matrix where the rows represent the day (Wednesday, Thursday, Friday,
or Saturday) when the first magazine arrives and the columns represent the day when the
second magazine arrives. Then from the given probabilities on each day and assuming a
product model for the joint events we have
P (Y = 0) = 0.09
P (Y = 1) = 0.12 + 0.16 + 0.12 = 0.4
P (Y = 2) = 0.06 + 0.08 + 0.04 + 0.08 + 0.06 = 0.32
P (Y = 3) = 0.03 + 0.04 + 0.02 + 0.03 + 0.04 + 0.02 + 0.01 = 0.19 .
Note that Σ_{y=0}^{3} P (Y = y) = 1 as it must.
Exercise 3.20
Part (a): Following the hint, we will label the couples as #1, #2, and #3 and the two
individuals as #4 and #5.
• To have no one arrive late, X = 0, will happen with probability 0.6^5 .

• To have only one person arrive late, X = 1, will happen if either #4 or #5 arrives late
and thus with a probability 2(0.4)0.6^4 .

• To have two people arrive late, X = 2, will happen if one of #1, #2, or #3 arrives
late or both #4 and #5 arrive late. This combined event happens with a probability
3(0.4)0.6^4 + 0.4^2 0.6^3 .

• To have three people arrive late, X = 3, will happen if one of #1, #2, or #3 arrives late
and one of #4 or #5 arrives late. This will happen with a probability of 6(0.4)^2 (0.6)^3 .

• To have four people arrive late, X = 4, will happen if two of #1, #2, or #3 are late
with #4 and #5 on time, or one of #1, #2, or #3 arrives late with both of #4 and #5
late. This happens with a probability of

choose(3, 2) 0.4^2 0.6^3 + choose(3, 1) 0.4^3 0.6^2 .

• To have five people arrive late, X = 5, will happen if two of #1, #2, or #3 are late
with one of #4 and #5 also late. This will happen with a probability 2 choose(3, 2) 0.4^3 0.6^2 .

• To have six people arrive late will happen if all of #1, #2, and #3 are late with #4
and #5 on time, or two of #1, #2, and #3 are late with both of #4 and #5 late. This
will happen with probability

0.4^3 0.6^2 + choose(3, 2) 0.4^2 (0.6)(0.4)^2 .

• To have seven people arrive late will happen if all of #1, #2, and #3 are late and one
of #4 and #5 is late. This will happen with probability

2 (0.4)^3 (0.6)(0.4) .

• To have eight people arrive late will happen with probability of 0.4^5 .

One can check that Σ_{x=0}^{8} P (X = x) = 1 as it should. As an R array these probabilities are
[1] 0.07776 0.10368 0.19008 0.20736 0.17280 0.13824 0.06912 0.03072 0.01024
Exercise 3.21
Part (a): Using R we could compute p(x) as
xs = 1:9
ps = log( 1 + 1/xs, base=10 )
cdf = cumsum(ps)
which gives
[1] 0.30103000 0.17609126 0.12493874 0.09691001 0.07918125 0.06694679 0.05799195
[8] 0.05115252 0.04575749
These numbers are to be compared with 1/9 = 0.1111111. Notice that starting with a one
has a much higher probability than 1/9.
Part (b): This is given by
[1] 0.3010300 0.4771213 0.6020600 0.6989700 0.7781513 0.8450980 0.9030900
[8] 0.9542425 1.0000000
Part (c): This would be F (3) = 0.6020600 and P (X ≥ 5) = 1 − P (X ≤ 4) = 1 − F (4) =
1 − 0.6989700 = 0.30103.
Exercise 3.23
Part (a): p(2) = F (2) − F (1) = 0.39 − 0.19 = 0.2.
Part (b): P (X ≥ 3) = 1 − P (X ≤ 2) = 1 − 0.39 = 0.61.
Part (c): P (2 ≤ X ≤ 5) = F (5) − F (1) = 0.97 − 0.19 = 0.78.
Part (d): P (2 < X < 5) = P (3 ≤ X ≤ 4) = F (4) − F (2) = 0.92 − 0.39 = 0.53.
Exercise 3.24
Part (a): This would be

p(x) = 0.3 for x = 1
       0.1 for x = 3
       0.05 for x = 4
       0.15 for x = 6
       0.4 for x = 12 .

Part (b): These would be

P (3 ≤ X ≤ 6) = F (6) − F (1) = 0.6 − 0.3 = 0.3
P (4 ≤ X) = 1 − P (X ≤ 3) = 1 − 0.4 = 0.6 .
Exercise 3.25
We would have

P (Y = 0) = p
P (Y = 1) = (1 − p)p
P (Y = 2) = (1 − p)^2 p
...
P (Y = y) = (1 − p)^y p .
Exercise 3.26
Part (a): Alvie will visit at least one friend since he moves from the center to either A, B, C,
or D on the first step. There he might go back to the center (only visiting one friend) or he
might go to another friend. Logic like this gives rise to the following probability distribution
P (X = 0) = 0
P (X = 1) = 1/3
P (X = 2) = (2/3)(1/3)
P (X = 3) = (2/3)^2 (1/3)
P (X = 4) = (2/3)^3 (1/3)
...
P (X = x) = (2/3)^{x−1} (1/3) .
Part (b): Alvie will have to cross at least two segments (visiting a friend and then coming
back home). He can cross three total segments if after visiting the first friend he visits one
more and then goes home. Logic like this gives rise to the following probability distribution
P (Y = 1) = 0
P (Y = 2) = 1/3
P (Y = 3) = (2/3)(1/3)
P (Y = 4) = (2/3)^2 (1/3)
P (Y = 5) = (2/3)^3 (1/3)
...
P (Y = y) = (2/3)^{y−2} (1/3) .
Exercise 3.27 (the matching distribution)
Part (a-b): See the python code ex3 27.py where we implement this. When we run that
code we get the following output
counts of number of different matches= Counter({0: 9, 1: 8, 2: 6, 4: 1})
probability of different n_matches values=
{0: 0.375, 1: 0.3333333333333333, 2: 0.25, 4: 0.041666666666666664}
This indicates that there is a probability of 0.375 that there will be no matches (there were
9 permutations of the numbers 1-4 with no matches), a probability 0.33333 that there will
be one match (there were 8 permutations of the numbers 1-4 with one match), etc.
Exercise 3.28
From the definition of the cumulative distribution function we have

F (x2 ) = Σ_{y: y≤x2} p(y) = Σ_{y: y≤x1} p(y) + Σ_{y: x1<y≤x2} p(y) .

As p(y) ≥ 0 for all y, the second sum in the above expression is nonnegative and the first
sum is the definition of F (x1 ). Thus we have shown that F (x2 ) ≥ F (x1 ). We will have
F (x1 ) = F (x2 ) if this second sum is zero, i.e. Σ_{y: x1<y≤x2} p(y) = 0.
Exercise 3.29
Part (a): From the numbers given we get E(X) = 2.06.
Part (b): From the numbers given we get Var (X) = 0.9364.
Part (c): This would be the square root of the above or 0.9676776.
Part (d): This gives the same answer as in Part (b) as it must.
We computed these using the R code
xs = 0:4
ps = c( 0.08, 0.15, 0.45, 0.27, 0.05 )
ex = sum( ps * xs )
v_x = sum( ps * ( xs - ex )^2 )
ex2 = sum( ps * xs^2 )
v_x_2 = ex2 - ex^2
c( ex, v_x, sqrt(v_x), v_x_2 )
Exercise 3.30
Part (a): From the numbers given we get E(Y ) = 0.6.
Part (b): From the numbers given we get E(100Y^2 ) = 110.
Exercise 3.31
From exercise 12 on Page 60 we compute Var (Y ) = 4.4944 and σY = 2.12. Then the
probability that Y is within one standard deviation of the mean is 0.65. These were calculated
with the following R code
ys = 45:55
ps = c( 0.05, 0.1, 0.12, 0.14, 0.25, 0.17, 0.06, 0.05, 0.03, 0.02, 0.01 )
ey = sum( ys * ps )
ey2 = sum( ys^2 * ps )
var_y = ey2 - ey^2
sqrt( var_y )
inds = abs( ys - ey ) < sqrt( var_y )
sum( ps[inds] )
Note that this answer is different than the one in the back of the book. If anyone sees
anything wrong with what I have done please contact me.
Exercise 3.32
Part (a): For the given numbers we have E(X) = 16.3800, E(X^2 ) = 272.2980 and
Var (X) = 3.9936.
Part (b): This would be 25E(X) − 8.5 = 401.
Part (c): This would be 25^2 Var (X) = 2496.
Part (d): This would be E(X) − 0.01E(X^2 ) = 13.65702.
Exercise 3.33
Part (a):

E(X^2 ) = 0^2 (1 − p) + 1^2 p = p .

Part (b):

Var (X) = E(X^2 ) − E(X)^2 = p − p^2 = p(1 − p) .

Part (c):

E(X^{79} ) = 0^{79} (1 − p) + 1^{79} p = p .
Exercise 3.34
To have E(X) finite we would need to be able to evaluate the following sum

E(X) = Σ_{x=1}^{∞} x (c/x^3 ) = c Σ_{x=1}^{∞} 1/x^2 .

This sum exists and thus E(X) is finite.
Exercise 3.35
Let R3 be the revenue if we order three copies. Then we have
R3 = −3 + 2 min(X, 3) .
Taking the expectation of this we get E[R3 ] = 2.466667. The same type of calculation if
we order four gives E[R4 ] = 2.666667. Thus we should order four if we want the largest
expected revenue.
Exercise 3.36
The policy profit in terms of the claims amount X is given by
policy profit = policy cost − max(X − 500, 0) .
If the company wants to have an expected profit of 100 then taking the expectation of the
above gives
100 = policy cost − E[max(X − 500, 0)] .
From the numbers given using the R code
xs = c(0, 1000, 5000, 10000)
ps = c( 0.8, 0.1, 0.08, 0.02 )
expectation = sum( ps * pmax( xs - 500, 0 ) )
we find the expectation to be 600. Thus solving for the policy cost we get that it should be
700.
Exercise 3.37
Using the summation formulas given we have

E(X) = (1/n) Σ_{x=1}^{n} x = (1/n) (n(n + 1)/2) = (n + 1)/2

E(X^2 ) = (1/n) Σ_{x=1}^{n} x^2 = (1/n) (n(n + 1)(2n + 1)/6) = (n + 1)(2n + 1)/6

Var (X) = E(X^2 ) − E(X)^2 = (n^2 − 1)/12 ,

when we use the two expressions above and simplify.
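These formulas are easy to check numerically in R for a particular n (a sketch):

n = 10
xs = 1:n
c( mean(xs), (n+1)/2 )                     # E(X) both ways: 5.5
c( mean(xs^2) - mean(xs)^2, (n^2-1)/12 )   # Var(X) both ways: 8.25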
Exercise 3.38
We want to know if

E(1/X) > 1/3.5 .

If it is then we should gamble and if not we should take the fixed amount. We find this
expectation given by

E(1/X) = (1/6) Σ_{x=1}^{6} 1/x = 0.4083333 > 1/3.5 = 0.2857143 ,

thus one should gamble.
Exercise 3.39
Using the numbers given in the book we compute E(X) = 2.3 and Var (X) = 0.81. The
number of pounds left after selling X lots is 100 − 5X. Then the expected number of pounds
left is 88.5 with a variance of 20.25.
Exercise 3.40
Part (a): A plot of the pmf p(−X) would be the same as a plot of p(X) but reflected about
the X = 0 axis. Since the spread of these two distributions is the same we would conclude
that Var (X) = Var (−X).
Part (b): Let a = −1 and b = 0 to get

Var (−X) = (−1)^2 Var (X) = Var (X) .
Exercise 3.41
Expression 3.13 from the book is

Var (h(X)) = Σ_{x∈D} {h(x) − E[h(X)]}^2 p(x) .    (10)

Following the hint let h(X) = aX + b; then E[h(X)] = aE[X] + b = aµ + b and

Var (h(X)) = Σ_D (a(x − µ))^2 p(x) = a^2 Σ_D (x − µ)^2 p(x) = a^2 σ_X^2 .
Exercise 3.42
Part (a): Following the hint we have E(X(X − 1)) = E(X^2 ) − E(X) = E(X^2 ) − 5 so E(X^2 ) = 27.5 + 5 = 32.5.
Part (b): Var (X) = 32.5 − 25 = 7.5.
Part (c): We have E(X^2 ) = E(X(X − 1)) + E(X) so

Var (X) = E(X^2 ) − E(X)^2 = E(X(X − 1)) + E(X) − E(X)^2 .
Exercise 3.43
We have

E(X − c) = E(X) − c = µ − c .

If c = µ then E(X − c) = 0.
Exercise 3.44
Part (a): For these values of k we find upper bounds given by
[1] 0.2500000 0.1111111 0.0625000 0.0400000 0.0100000
Part (b): Using Exercise 13 on Page 60 we compute µ = 2.64 and σ = 1.53961 and then
P (|X − µ| ≥ kσ) for the values of k suggested to get

[1] "k= 2, P(|X-mu|>=k sigma)= 0.040000"
[1] "k= 3, P(|X-mu|>=k sigma)= 0.000000"
[1] "k= 4, P(|X-mu|>=k sigma)= 0.000000"
[1] "k= 5, P(|X-mu|>=k sigma)= 0.000000"
[1] "k= 10, P(|X-mu|>=k sigma)= 0.000000"

These suggest that the upper bound of 1/k^2 is relatively loose.
Part (c): For this given distribution we find

µ = (1/18)(−1) + (8/9)(0) + (1/18)(+1) = 0
E(X^2 ) = (1/18)(+1) + (8/9)(0) + (1/18)(+1) = 1/9
Var (X) = 1/9 so σ = 1/3 .

Then we find

P (|X − µ| ≥ 3σ) = P (|X| ≥ 1) = 2/18 = 1/9 ≤ 1/9 .

This shows that the upper bound in Chebyshev's inequality can sometimes be achieved.
Part (d): To do this we let p(x) be given by

p(x) = (1/2)(1/25) for x = −1
       24/25 for x = 0
       (1/2)(1/25) for x = +1 .

Then with this density we have E(X) = 0, E(X^2 ) = (1/50)(2) = 1/25 thus σ = 1/5. With these we
have

P (|X − µ| ≥ 5σ) = P (|X| ≥ 1) = 1/25 = 0.04 ,

as we were to show.
Exercise 3.45
We have

E(X) = Σ_{x∈D} x p(x) ≤ Σ_{x∈D} b p(x) = b ,

since x ≤ b for all x ∈ D. In the same way since x ≥ a for all x in D we have

a = Σ_{x∈D} a p(x) ≤ Σ_{x∈D} x p(x) so a ≤ E(X) .
Exercise 3.46
We will use R notation to evaluate these. We would compute
c( dbinom(3,8,0.35), dbinom(5,8,0.6), sum(dbinom(3:5,7,0.6)), sum(dbinom(0:1,9,0.1)) )
[1] 0.2785858 0.2786918 0.7451136 0.7748410
Exercise 3.47
We will use R notation to evaluate these. For Part (a)-(d) we would compute
c( pbinom(4,15,0.3), dbinom(4,15,0.3), dbinom(6,15,0.7), pbinom(4,15,0.3) - pbinom(1,15,0.3) )
[1] 0.51549106 0.21862313 0.01159000 0.48022346
For Part (e)-(g) we would compute
c( 1-pbinom(1,15,0.3), pbinom(1,15,0.7), pbinom(5,15,0.3) - pbinom(2,15,0.3) )
[1] 9.647324e-01 5.165607e-07 5.947937e-01
Exercise 3.48
Part (a): P (X ≤ 2) is given by pbinom(2,25,0.05) = 0.8728935.
Part (b): P (X ≥ 5) = 1−P (X ≤ 4) is given by 1 - pbinom(4,25,0.05) = 0.007164948.
Part (c): We would compute this using

P (1 ≤ X ≤ 4) = P (X ≤ 4) − P (X ≤ 0) .

This is given in R by pbinom(4,25,0.05) - pbinom(0,25,0.05) = 0.7154455.
Part (d): This would be given by pbinom(0,25,0.05) = 0.2773896.
Part (e): We have E(X) = np = 25(0.05) = 1.25 and
Var (X) = np(1 − p) = 1.1875 so SD (X) = 1.089725 .
Exercise 3.49
Using R to evaluate these probabilities we have
Part (a): This is given by
dbinom(1,6,0.1)
[1] 0.354294
Part (b): We want to evaluate P (X ≥ 2) = 1 − P (X ≤ 1). In R this is
1 - pbinom(1,6,0.1)
[1] 0.114265
Part (c): Let X be the number of goblets examined to find four that are in fact “good”
(i.e. not defective). Then we want to compute P (X ≤ 5). We can have the event X ≤ 5
happen if the first four goblets are “good” or there is one defective goblet in the first four
but the fifth is "good". These two mutually exclusive events have probabilities

P (X = 4) = 0.9^4 = 0.6561
P (X = 5) = P (one defective goblet in first four examined)P (fifth goblet is good)
= choose(4, 1) (0.1)(0.9^3 ) (0.9) = 0.26244 .
Taking the sum of these two numbers we have P (X ≤ 5) = 0.91854.
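This sum can also be computed with the negative binomial distribution in R, since X ≤ 5 means at most one unacceptable battery before the fourth acceptable one (a sketch):

0.9^4 + 4 * 0.1 * 0.9^4   # the direct sum above, gives 0.91854
pnbinom( 1, 4, 0.9 )      # at most one failure before the fourth success, also 0.91854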
Exercise 3.50
Let X be the number of fax messages received. Then using R to evaluate these probabilities
we have the following
Part (a): P (X ≤ 6) we use pbinom(6,25,0.25)=0.5610981.
Part (b): P (X = 6) we use dbinom(6,25,0.25)=0.1828195.
Part (c): P (X ≥ 6) = 1 − P (X ≤ 5) we use 1 - pbinom(5,25,0.25)=0.6217215.
Part (d): P (X > 6) = P (X ≥ 7) = 1−P (X ≤ 6) we use 1 - pbinom(6,25,0.25)=0.4389019.
Exercise 3.51
Again let X be the number of fax messages received. Then we have
Part (a): E(X) = np = 25(0.25) = 6.25.
Part (b): Var (X) = np(1 − p) = 25(0.25)(0.75) = 4.6875, thus SD (X) = √Var (X) = 2.165064.
Part (c): For this part we want to evaluate P (X ≥ 6.25 + 2(2.165064)) = P (X ≥ 10.58013). We
can evaluate this using
sum( dbinom( 11:25, 25, 0.25 ) )
[1] 0.02966991
Exercise 3.52
Let X be the number of students that buy a new textbook.
Part (a): E(X) = np = 25(0.3) = 7.5 and SD (X) = √(np(1 − p)) = √(25(0.3)(0.7)) = 2.291288.
Part (b): Using the above we have E(X) + 2SD (X) = 12.08258 so that the probability we
want is given by

P (X > 12.08258) = Σ_{x=13}^{25} b(x; 25, 0.3) = 0.01746974 .
Part (c): We are told that there are 15 new and 15 old textbooks in stock. Let X be
the number of students (from the 25) that want a new textbook. Then Y = 25 − X is the
number of students that want a used textbook. We want to know for what values of X we
have
0 ≤ X ≤ 15 and 0 ≤ Y ≤ 15 .
Using the known expression for Y in terms of X we have that the second inequality above
is equivalent to
10 ≤ X ≤ 25 .
Intersecting this condition with that from earlier (i.e. 0 ≤ X ≤ 15) we see that the condition
we want to determine the probability for is given by
10 ≤ X ≤ 15 .
We can check the end points of this range to make sure that the constraints on the quantities
of books holds. For example, if X = 10 then Y = 15 and we buy 10 new textbooks and
15 used textbooks. If X = 15 then Y = 10 and we buy 15 new textbooks and 10 used
textbooks. Each of these statements is possible. The probability we want to evaluate is then
given by

Σ_{k=10}^{15} b(k; 25, 0.3) = 0.1889825 .
Part (d): The expression for the revenue when we sell X new textbooks and Y old textbooks
is
h(X) = 100X + 70Y = 100X + 70(25 − X) = 70(25) + 30X = 1750 + 30X .
Thus the expectation of revenue is then given by

E[h(X)] = Σ_{x=0}^{25} h(x) b(x; 25, 0.3) = 1750 + 30E(X) = 1750 + 30(25)(0.3) = 1975 .
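Parts (c) and (d) can be verified directly in R (a sketch):

sum( dbinom( 10:15, 25, 0.3 ) )                 # Part (c): gives 0.1889825
xs = 0:25
sum( (1750 + 30*xs) * dbinom( xs, 25, 0.3 ) )   # Part (d): gives 1975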
Exercise 3.53
Part (a): Let X be the number of individuals with no traffic citations (in three years). We
would like to compute P (X ≥ 10). From Exercise 30 Section 3.3 the probability a given
individual has no citations (in the last three years) is 0.6. The probability we then want is
given by (in R notation)
P (X ≥ 10) = 1 − pbinom(9, 15, 0.6) = 0.4032156 .
Part (b): Let Y = 15 − X then Y is the random variable representing the number with at
least one citation. We want to evaluate
P (Y ≤ 7) = pbinom(7, 15, 0.4) = 0.7868968 .
Part (c): For this we want
P (5 ≤ Y ≤ 10) = pbinom(10, 15, 0.4) − pbinom(4, 15, 0.4) = 0.7733746 .
Exercise 3.54
Let X be the random variable representing the number of customers who want/prefer the
oversized version of the racket.
Part (a): For this part we want P (X ≥ 6) = 1 − P (X ≤ 5) = 0.6331033.
Part (b): We have

µX = np = 10(0.6) = 6
σX = √(npq) = √(10(0.6)(0.4)) = 1.549193 .

Using these numbers we have that µX − σX = 4.450807 and µX + σX = 7.549193 so that we
next want to evaluate P (5 ≤ X ≤ 7). Using R we find this to be equal to 0.6664716.
Part (c): Let Y be the number of customers that get the midsized version of the racket. In
terms of X we know that Y = 10 − X. We then want to know for which values of X do we
have
0 ≤ X ≤ 7 and 0 ≤ Y ≤ 7 .
Using the fact that Y = 10 − X we get that the second expression is equivalent to
3 ≤ X ≤ 10 .
Intersecting this with the requirement 0 ≤ X ≤ 7 we want to compute
P (3 ≤ X ≤ 7) = pbinom(7, 10, 0.6) − pbinom(2, 10, 0.6) = 0.8204157 .
Exercise 3.55
Our phone must first be submitted for service, which happens with a probability of 20%,
and then once submitted there is a 40% chance that it will be replaced. Thus the initial
probability that a phone is replaced is then 0.2(0.4) = 0.08. The probability that two from
ten will be replaced is given by (in R) dbinom(2,10,0.08) = 0.147807.
Exercise 3.56
Let X be the random variable representing the number of students (from 25) that received
special accommodation.
Part (a): This would be P (X = 1) = dbinom(1, 25, 0.02) = 0.3078902.
Part (b): This would be P (X ≥ 1) = 1 − P (X = 0) = 1 − dbinom(0, 25, 0.02) = 0.3965353.
Part (c): This would be P (X ≥ 2) = 1 −P (X = 0) −P (X = 1) = 1 −pbinom(1, 25, 0.02) =
0.0886451.
Part (d): We have µX = np = 25(0.02) = 0.5 and σX = √(npq) = 0.7. With these we have
that µX − 2σX = −0.9 and µX + 2σX = 1.9. Thus the probability we want to evaluate is
given by P (0 ≤ X ≤ 1) = 0.9113549.
Part (e): As before X is the number of students that are allowed special accommodations
and let Y be the number of students not allowed special accommodations then Y = 25 − X.
The total exam time T is given by
T = 3Y + 4.5X = 3(25 − X) + 4.5X = 75 + 1.5X .
Thus the expectation of T is given by
E(T ) = 75 + 1.5E(X) = 75 + 1.5(25)(0.02) = 75.75 ,
hours.
Exercise 3.57
Both batteries will work with probability 0.9^2 = 0.81. The probability a flashlight works is
then 0.81. If we let X be the random variable specifying the number of flashlights that work
from n = 10 then X is a binomial random variable with n = 10 and p = 0.81. Thus we
conclude that
P (X ≥ 9) = 1 − P (X ≤ 8) = 1 − pbinom(8, 10, 0.81) = 0.4067565 .
Exercise 3.58
Let X denote the number of defective components in our batch of size n = 10.
Part (a): If the actual proportion of defectives is p = 0.01 then the probability we accept a
given batch is given by
P (X ≤ 2) = pbinom(2, 10, 0.01) = 0.9998862 .
[Figure 1 plots the acceptance probability as a function of p (the proportion of defectives) for the four plans P (X ≤ 2; n = 10), P (X ≤ 1; n = 10), P (X ≤ 2; n = 15), and P (X ≤ 1; n = 15).]

Figure 1: Operating characteristic curves for Exercise 58.
For the other values of p the values of the above expression are given by
[1] 0.9884964 0.9298092 0.6777995 0.5255928
Part (b-d): For this part we plot P (batch is accepted) = P (X ≤ x) = pbinom(x, n, p) for
different values of x and n as a function of p in Figure 1.
Part (e): Of the choices it looks like sampling with n = 15 and X ≤ 1 is the “best” in that
it has a curve with the lowest acceptance probability when p ≥ 0.1, which is the range of p
under which there are too many defective components.
Exercise 3.59
Part (a): P (reject claim) = P (X ≤ 15) = pbinom(15, 25, 0.8) = 0.01733187.
Part (b):
P (not reject claim) = 1 − P (reject claim) = 1 − P (X ≤ 15)
= 1 − pbinom(15, 25, 0.7) = 0.810564 .
If p = 0.6 then the above becomes 1 − pbinom(15, 25, 0.6) = 0.424617.
Exercise 3.60
Let X be the number of passenger cars then Y = 25 − X is the number of other vehicles.
Our revenue h(X) is given by
h(X) = X + 2.5(25 − X) = 62.5 − 1.5X .
Then the expectation of the above is given by
E(h(X)) = 62.5 − 1.5E(X) = 62.5 − 1.5(25)(0.6) = 40 .
Exercise 3.61
We compute the probability of a good paper depending on if our student picks topic A or
topic B
P (good paper) = P (A)P (X ≥ 1|A) + P (B)P (X ≥ 2|B) .
Here X are the number of books that arrive from inter-library loan. We can compute
P (X ≥ 1|A) = 1 − pbinom(0, 2, 0.9) = 0.99
P (X ≥ 2|B) = 1 − pbinom(1, 4, 0.9) = 0.9963 .
If we assume that we can pick P (A) or P (B) to be zero or one to maximize P (good paper)
the student should choose the larger of P (X ≥ 1|A) or P (X ≥ 2|B) and thus should choose
topic B. If p = 0.5 then the above values become
P (X ≥ 1|A) = 1 − pbinom(0, 2, 0.5) = 0.75
P (X ≥ 2|B) = 1 − pbinom(1, 4, 0.5) = 0.6875 .
Thus in this case the student should choose topic A.
Exercise 3.62
Part (a): Since Var (X) = np(1 − p) we will have Var (X) = 0 if p = 0 or p = 1 which
means that the result of the experiment is deterministic and not really random.
Part (b): We compute

dVar (X)/dp = n(1 − p − p) = n(1 − 2p) = 0 so p = 1/2 ,

when Var (X) is maximized.
Exercise 3.63
We first recall that

b(x; n, p) = choose(n, x) p^x (1 − p)^{n−x} .

Part (a): Consider

b(x; n, 1 − p) = choose(n, x) (1 − p)^x p^{n−x} = choose(n, n − x) p^{n−x} (1 − p)^x = b(n − x; n, p) .

Part (b): For this part consider

B(x; n, 1 − p) = Σ_{k=0}^{x} b(k; n, 1 − p) = Σ_{k=0}^{n} b(k; n, 1 − p) − Σ_{k=x+1}^{n} b(k; n, 1 − p)
= 1 − Σ_{k=x+1}^{n} b(k; n, 1 − p) .

Now by Part (a) b(k; n, 1 − p) = b(n − k; n, p) so we have the above equal to

1 − Σ_{k=x+1}^{n} b(n − k; n, p) .

Now let v = n − k; then the limits of the above summation become

k = x + 1 ⇒ v = n − x − 1
k = n ⇒ v = 0 ,

to give

1 − Σ_{v=0}^{n−x−1} b(v; n, p) = 1 − B(n − x − 1; n, p) .

Part (c): We don't need p > 0.5 since if p > 0.5 we can transform the expression we need
to evaluate into one with p < 0.5.
Exercise 3.64
We have

E(X) = Σ_{x=0}^{n} x b(x; n, p) = Σ_{x=0}^{n} x (n! / ((n − x)! x!)) p^x (1 − p)^{n−x}
= Σ_{x=1}^{n} (n! / ((n − x)! (x − 1)!)) p^x (1 − p)^{n−x}
= np Σ_{x=1}^{n} ((n − 1)! / ((n − x)! (x − 1)!)) p^{x−1} (1 − p)^{n−x}
= np Σ_{x=1}^{n} ((n − 1)! / ((n − 1 − (x − 1))! (x − 1)!)) p^{x−1} (1 − p)^{n−1−(x−1)} .

Let y = x − 1 and the above becomes

E(X) = np Σ_{y=0}^{n−1} ((n − 1)! / ((n − 1 − y)! y!)) p^y (1 − p)^{n−1−y}
= np Σ_{y=0}^{n−1} b(y; n − 1, p) = np ,

as we were to show.
Exercise 3.65
Part (a): People can pay with a debit card (with a probability 0.2) or with something else
(with a probability of 0.8). Let X represent the number of people who pay with a debit card.
Then
E(X) = np = 100(0.2) = 20
Var (X) = npq = 100(0.2)(0.8) = 16 .
Part (b): The probability a person does not pay with cash is given by 1 − 0.3 = 0.7. If we
let Y be the number of people who don’t pay with cash we have
E(Y ) = 100(0.7) = 70
Var (Y ) = 100(0.7)(0.3) = 21 .
Exercise 3.66
Part (a): Let X be the number of people that actually show up for the trip (and have
reservations). Then X is a binomial random variable with n = 6 and p = 0.8. If X > 4 i.e.
X ≥ 5 then at least one person cannot be accommodated since there are only four seats.
The probability this happens is given by
P (X ≥ 5) = 1 − P (X ≤ 4) = 1 − pbinom(4, 6, 0.8) = 0.65536 .
Part (b): We have E(X) = 6(0.8) = 4.8 so the expected number of available places is
4 − 4.8 = −0.8.
Part (c): Let Y be the number of passengers that show up to take the trip. Then to
compute P (Y = y) we have to condition on the number of reservations R. For example for
Y = 0 we have
P (Y = 0) = P (Y = 0|R = 3)P (R = 3) + P (Y = 0|R = 4)P (R = 4)
+ P (Y = 0|R = 5)P (R = 5) + P (Y = 0|R = 6)P (R = 6)
= dbinom(0, 3, 0.8)(0.1) + dbinom(0, 4, 0.8)(0.2)
+ dbinom(0, 5, 0.8)(0.3) + dbinom(0, 6, 0.8)(0.4) = 0.0012416 .
Doing the same for the other possible values for y ∈ {0, 1, 2, 3, 4, 5, 6} we get the probabilities
[1] "P(y=0)= 0.001242"
[1] "P(y=1)= 0.017254"
[1] "P(y=2)= 0.090624"
[1] "P(y=3)= 0.227328"
[1] "P(y=4)= 0.303104"
[1] "P(y=5)= 0.255590"
[1] "P(y=6)= 0.104858"
Now since we can only take four people if more than four show up we can only accommodate
four. Thus if we let Z be a random variable representing the number of people who actually
take the trip (extra people are sent away) then we have
[1] "P(z=0)= 0.001242"
[1] "P(z=1)= 0.017254"
[1] "P(z=2)= 0.090624"
[1] "P(z=3)= 0.227328"
[1] "P(z=4)= 0.303104 + 0.255590 + 0.104858 = 0.663552"
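The conditioning computation above can be vectorized in R (a sketch; dbinom recycles over the vector of reservation counts):

p_R = c( 0.1, 0.2, 0.3, 0.4 )   # P(R = 3), P(R = 4), P(R = 5), P(R = 6)
p_y = sapply( 0:6, function(y) sum( dbinom( y, 3:6, 0.8 ) * p_R ) )
round( p_y, 6 )                  # matches the P(y=...) values above
c( p_y[1:4], sum( p_y[5:7] ) )   # the P(z=...) values, capping at four seats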
Exercise 3.67
Recall that Chebyshev's inequality is

P (|X − µ| ≥ kσ) ≤ 1/k^2 .
If X ∼ Bin(20, 0.5) then we have µ = 10 and σ = 2.236068 so that when k = 2 the
left-hand-side of the above becomes
P (|X − 10| ≥ 4.472136) = P (|X − 10| ≥ 5) = P (X − 10 ≥ 5 or 10 − X ≥ 5)
= P (X ≥ 15 or X ≤ 5) = P (X ≤ 5) + (1 − P (X ≤ 14)) = 0.04138947 .
If X ∼ Bin(20, 0.75) then the above becomes
P (|X − 15| ≥ 3.872983) = P (|X − 15| ≥ 4) = P (X ≥ 19 or X ≤ 11) = 0.06523779 .
This is to be compared with 1/k^2 = 1/4 = 0.25. The calculations for k = 3 are done in the same
way.
Exercise 3.68
Part (a): X is a hypergeometric random variable with N = 15, M = 6, and n = 5. The
probability density function for this random variable is given by
h(x; n, M, N ) = h(x; 5, 6, 15) = choose(6, x) choose(9, 5 − x) / choose(15, 5) for 0 ≤ x ≤ 5 .
Part (b): We can evaluate these using the R expressions
P (X = 2) = dhyper(2, 6, 15 − 6, 5) = 0.4195804
P (X ≤ 2) = phyper(2, 6, 15 − 6, 5) = 0.7132867
P (X ≥ 2) = 1 − phyper(1, 6, 15 − 6, 5) = 0.7062937 .
Part (c): Using the formulas in the book we compute µ = 2 and σ = 0.9258201.
Exercise 3.69
Part (a): We have this given by dhyper( 5, 7, 12-7, 6 ) = 0.1136364.
Part (b): We have this given by phyper( 4, 7, 12-7, 6 ) = 0.8787879.
Part (c): From the formulas in the book for a hypergeometric random variable the mean
and standard deviation of this distribution are given by µ = 3.5 and σ = 0.891883 which
gives µ + σ = 4.391883. Thus we want to compute P (X ≥ 5) = 1 − P (X ≤ 4) = 0.1212121.
Part (d): In this case we have n/N = 15/400 = 0.0375 < 0.05 and M/N = 0.1 which is not too close
to either 0 or 1, so we can use the binomial approximation to the hypergeometric distribution.
Using that approximation we compute

P (X ≤ 5) ≈ pbinom(5, 15, 0.1) = 0.9977503 .
Exercise 3.70
Part (a): If X is the number of second section papers then X is a hypergeometric random
variable with M = 30, N = 50, and n = 15. Thus using R we compute
M = 30 # second session numbers
N = 20 + M # total number of students
n = 15
x = 10
dhyper( x, M, N-M, n ) # gives 0.2069539
Part (b): This would be P (X ≥ 10) = 1 − P (X ≤ 9) and is given by
1 - phyper( x-1, M, N-M, n ) # gives 0.3798188
Part (c): In this case we could have at least 10 from the first or 10 from the second session.
This will happen with a probability of
( 1 - phyper( x-1, 30, 20, n ) ) + ( 1 - phyper( x-1, 20, 30, n ) )
which gives 0.3938039.
Part (d): In this case, using the formulas from the book we compute
m = n * ( M / N )
s = sqrt( n * ( M / N ) * ( 1 - M / N ) * ( ( N - n ) / ( N - 1 ) ) )

which give µ = 9 and σ = 1.603567.
Part (e): When we draw our fifteen papers this leaves 50 − 15 = 35 remaining papers to
grade. The number of second section papers in this group is another hypergeometric random
variable with M and N the same as before but now with n = 35. This distribution would
have mean and standard deviation given by
n = 35
m = n * ( M / N )
s = sqrt( n * ( M / N ) * ( 1 - M / N ) * ( ( N - n ) / ( N - 1 ) ) )

which give µ = 21 and σ = 1.603567 (the same standard deviation as in Part (d), since the
15 papers graded and the 35 left ungraded are complementary samples).
Exercise 3.71
Part (a): The pmf of the number of granite specimens X is that of a hypergeometric random
variable with M = 10, N = 20, and n = 15 thus
P (X = x) = choose(M, x) choose(N − M, n − x) / choose(N, n) = choose(10, x) choose(10, 15 − x) / choose(20, 15) for 5 ≤ x ≤ 10 .
Part (b): This could happen if we get all granite or all basaltic rock i.e. x = 10. Since the
distribution of basaltic rock is the same as that of the granite rock the probability requested
would be given by
2 * dhyper( 10, 10, 10, 15 ) # gives 0.03250774
Part (c): For the number of granite rocks we find µ = 7.5 and σ = 0.993399, so µ − σ = 6.506601
and µ + σ = 8.493399; thus the probability we want to calculate is P (7 ≤ X ≤ 8). Using R
we find this to be 0.6965944.
Exercise 3.72
Part (a): Here X (the number of the top four candidates interviewed on the first day) is a
hypergeometric random variable with N = 11 (the total number interviewed), M = 4 (the
number of special candidates), and n = 6 (the number selected for first day interviews) and
thus the pmf for X has a hypergeometric form.
Part (b): We want the expected number of the top four candidates interviewed on the first
day. This is given by
M = 4
N = 11
n = 6
n * ( M / N )
[1] 2.181818
Exercise 3.73
Part (a): Note that X, the number of the top ten pairs that are found playing east-west,
is a hypergeometric random variable with M = 10, N = 20, and n = 10.
Part (b): The number of top five pairs that end up playing east-west is another hypergeometric random variable, this time with M = 5, N = 20, and n = 10, so the probability that
all five end up playing east-west is
dhyper( 5, 5, 15, 10 ) # gives 0.01625387
We could also have all five top pairs play north-south with the same probability. This gives
a probability that the top five pairs end up playing the same direction is then given by
2(0.01625387) = 0.03250774.
Part (c): Assume we have 2ñ pairs (we use the notation ñ to not be confused with the
parameter n found in the hypergeometric pmf). In this more general case we have another
hypergeometric random variable with M = ñ, N = 2ñ, and n = ñ. Thus this distribution
has an expectation and variance given by

µ = ñ (ñ/(2ñ)) = ñ/2
Var (X) = ñ (ñ/(2ñ)) (1 − ñ/(2ñ)) ((2ñ − ñ)/(2ñ − 1)) = ñ^2 / (4(2ñ − 1)) .
Exercise 3.74
Part (a): This is a hypergeometric random variable with N = 500, M = 150, and n = 10.

Part (b): We can approximate a hypergeometric random variable with a binomial random
variable with p = M/N = 150/500 = 0.3 if this p is not too close to 0 or 1 (which it is not) and if
n/N = 10/500 = 0.02 < 0.05 (which it is). We would have n = 10 in this binomial approximation.

Part (c): We would have for the exact and approximate pmf

µ = n (M/N ) = 10 (150/500) = 3 .

For the exact pmf we have

σ^2 = n (M/N ) (1 − M/N ) ((N − n)/(N − 1)) = 10(0.3)(0.7)(490/499) = 2.062124 ,

while for the approximate binomial pmf we have

σ^2 = np(1 − p) = 10(0.3)(0.7) = 2.1 .
Exercise 3.75
Part (a): For this problem we can model a success as having a girl child; then X, the number
of boys in the family, is the number of failures before we have r = 2 successes. Thus the pmf
for X is a negative binomial and we have

P (X = x) = choose(x + 2 − 1, 2 − 1) p^2 (1 − p)^x = (x + 1) p^2 (1 − p)^x ,

for x = 0, 1, 2, . . . .
Part (b): With four children we must have two boys (with the two required girls) so X = 2
and we want to evaluate the negative binomial density with x = 2. In R this is the
stats::NegBinomial family of functions. Since the stats package is loaded by default you can
access the probability mass functions in the normal way, i.e. dnbinom for the negative
binomial density. Thus we get

library(stats)
dnbinom(2,2,0.5)
[1] 0.1875
Part (c): The statement “at most four children” means we have at most two boys and this
probability is given by
pnbinom(2,2,0.5)
[1] 0.6875
Part (d): The expected number of failures is

E(X) = r(1 − p)/p = 2(1/2)/(1/2) = 2 .

Thus the expected number of children is given by 2 + 2 = 4.
Exercise 3.76
Let Y be the number of boys had before the third girl is had. Then Y is a negative binomial
random variable with r = 3 and p = 1/2. Let X be the total number of children then
X = Y + r and so the pmf of X is that of a shifted negative binomial random variable: the
values taken by X are those of Y increased by r = 3.
Exercise 3.77
Here X = X1 + X2 + X3 which shows X is a sum of three negative binomial random variables
each with r = 2 and p = 1/2. From this decomposition we have

E(X) = 3 r(1 − p)/p = 3(2)(1/2)/(1/2) = 6 .

The expected number of male children born to each couple is r(1 − p)/p = 2, or 1/3 the
average born to all couples.
Exercise 3.78
Each “double” will happen with a probability of 1/36. Let Y be the number of failures before we roll five “doubles”. Then Y is a negative binomial random variable with r = 5 and p = 1/36. Let X = Y + 5 be the total number of die rolls. We have that
E(X) = E(Y) + 5 = r(1 − p)/p + 5 = 5(35/36)/(1/36) + 5 = 180 ,

and

Var(X) = Var(Y) = r(1 − p)/p² = 5(35/36)/(1/36)² = 6300 .
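A quick simulation sketch in R supports these values:

set.seed(1234)
y = rnbinom( 1e5, size = 5, prob = 1/36 ) # failures before the fifth "double"
c( mean( y + 5 ), var( y ) ) # close to 180 and 6300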
Exercise 3.79
Part (a): P (X ≤ 8) = ppois(8, 5) = 0.9319064.
Part (b): P (X = 8) = dpois(8, 5) = 0.06527804.
Part (c): P (9 ≤ X) = 1 − P (X ≤ 8) = 1 − ppois(8, 5) = 0.06809363.
Part (d): P (5 ≤ X ≤ 8) = ppois(8, 5) − ppois(4, 5) = 0.4914131.
Part (e): P(5 < X < 8) = P(6 ≤ X ≤ 7) = ppois(7, 5) − ppois(5, 5) = 0.2506677.
Exercise 3.80
Part (a): P (X ≤ 5) = ppois(5, 8) = 0.1912361.
Part (b): P (6 ≤ X ≤ 9) = ppois(9, 8) − ppois(5, 8) = 0.5253882.
Part (c): P(10 ≤ X) = 1 − P(X ≤ 9) = 1 − ppois(9, 8) = 0.2833757.

Part (d): Here µ + σ = 8 + √8 = 10.82843, so this would be P(X > 10.82843) = P(X ≥ 11) = 1 − P(X ≤ 10) = 0.1841142.
Exercise 3.81
Part (a): P (X ≤ 10) = ppois(10, 20) = 0.01081172.
Part (b): P (X > 20) = 1 − P (X ≤ 19) = 1 − ppois(19, 20) = 0.5297427. The answer in
the back of the book corresponds to P (X ≥ 20).
Part (c): We have P (10 ≤ X ≤ 20) = P (X ≤ 20) − P (X ≤ 9) = ppois(20, 20) −
ppois(9, 20) = 0.5540972, and P (10 < X < 20) = P (X ≤ 19)−P (X ≤ 10) = ppois(19, 20)−
ppois(10, 20) = 0.4594455.
Part (d): We have µ_X = 20 and σ_X = √20 = 4.472136 so that µ_X − 2σ_X = 11.05573 and µ_X + 2σ_X = 28.94427. Thus we want to evaluate P(12 ≤ X ≤ 28) = 0.9442797.
Exercise 3.82
Part (a): P (X = 1) = dpois(1, 0.2) = 0.1637462.
Part (b): P (X ≥ 2) = 1 − P (X ≤ 1) = 1 − ppois(1, 0.2) = 0.0175231.
Part (c): P (X1 = 0 and X2 = 0) = P (X1 = 0)P (X2 = 0) = dpois(0, 0.2)2 = 0.67032.
Exercise 3.83
From the given description X is a binomial random variable with p = 1/200 and n = 1000, thus λ = np = 5. In using the Poisson approximation to the binomial distribution the book states that we should have n > 50 and np < 5. The first condition is true here while the second condition is not strictly true.

Part (a): P(5 ≤ X ≤ 8) = ppois(8, 5) − ppois(4, 5) = 0.4914131.

Part (b): P(X ≥ 8) = 1 − P(X ≤ 7) = 1 − ppois(7, 5) = 0.1333717.
Exercise 3.84
We have p = 0.1 × 10⁻² = 0.001 and so λ = np = 10⁴(10⁻³) = 10.

Part (a): E(X) = np = 10 and Var(X) = npq = 10(1 − 0.001) = 9.99 so that SD(X) = 3.160696.
Part (b): X is approximately Poisson with λ = np = 10. Then P (X > 10) = 1 − P (X ≤
9) = 1 − ppois(9, 10) = 0.5420703.
Part (c): This is P(X = 0) = dpois(0, 10) = 4.539993 × 10⁻⁵.
Exercise 3.85
Part (a): Since λ = 8 these would be
P (X = 6) = dpois(6, 8) = 0.1221382
P (X ≥ 6) = 1 − P (X ≤ 5) = 1 − ppois(5, 8) = 0.8087639
P (X ≥ 10) = 1 − ppois(9, 8) = 0.2833757 .
Part (b): Now λ = 8(1.5) = 12, so E(X) = λ = 12 and SD(X) = √λ = 3.464102.
Part (c): Now λ = 8(2.5) = 20 so we get
P (X ≥ 20) = 1 − P (X ≤ 19) = 1 − ppois(19, λ) = 0.5297427
P (X ≤ 10) = ppois(10, λ) = 0.01081172 .
Exercise 3.86
Part (a): Since λ = 5 this is P (X = 4) = dpois(4, 5) = 0.1754674.
Part (b): This is P (X ≥ 4) = 1 − P (X ≤ 3) = 1 − ppois(3, 5) = 0.7349741.
Part (c): This is λ = 5(3/4) = 3.75.
Exercise 3.87
Part (a): The number of calls received, X is a Poisson random variable with λ = 4(2) = 8.
Thus we want P (X = 10) = dpois(10, 8) = 0.09926153.
Part (b): The number of calls received, X during the break is a Poisson random variable
with λ = 4(0.5) = 2. Thus we want P (X = 0) = dpois(0, 2) = 0.1353353.
Part (c): This would be the expectation of the random variable X which is λ = 2.
Exercise 3.88
Part (a): If X is the number of diodes that will fail then X is a binomial random variable with n = 200 and p = 0.01. Thus E(X) = np = 2 and SD(X) = √(npq) = 1.407125.
Part (b): We could approximate the true pdf of X as a Poisson random variable with
λ = np = 2. Thus we want to compute P (X ≥ 4) = 1 − P (X ≤ 3) = 0.1428765.
Part (c): The probability that all diodes work (using the Poisson approximation) is P (X =
0) = 0.1353353. The number of boards that work (from five) is a binomial random variable
with n = 5 and a probability of “success” given by P (X = 0) just calculated. Thus the
probability we seek (if N is a random variable denoting the number of working boards) is
P (N ≥ 4) = 1 − P (N ≤ 3) = 1 − pbinom(3, 5, 0.1353353) = 0.001495714 .
Exercise 3.89
Part (a): This would be 2/0.5 = 4.
Part (b): This would be P (X > 5) = 1 − P (X ≤ 5) = 1 − ppois(5, 4) = 0.2148696.
Part (c): For this we want

P(X = 0) ≤ 0.1 or dpois(0, t/0.5) ≤ 0.1 .

From the known functional form for the Poisson pmf we have

e^{−t/0.5} ≤ 0.1 or t ≥ −0.5 ln(0.1) = 1.151293 years.
Exercise 3.90 (deriving properties of a Poisson random variable)
If X is a Poisson random variable then from the definition of expectation we have that

E[Xⁿ] = \sum_{i=0}^{∞} iⁿ e^{−λ} λ^i / i! = \sum_{i=1}^{∞} iⁿ e^{−λ} λ^i / i! ,

since (assuming n ≠ 0) when i = 0 the first term vanishes. Continuing our calculation we can cancel a factor of i and find that

E[Xⁿ] = e^{−λ} \sum_{i=1}^{∞} i^{n−1} λ^i / (i − 1)! = e^{−λ} \sum_{i=0}^{∞} (i + 1)^{n−1} λ^{i+1} / i! = λ \sum_{i=0}^{∞} (i + 1)^{n−1} e^{−λ} λ^i / i! .

Now this sum can be recognized as the expectation of the variable (X + 1)^{n−1} so we see that

E[Xⁿ] = λ E[(X + 1)^{n−1}] .   (11)

From this result we have

E[X] = λE[1] = λ and E[X²] = λE[X + 1] = λ(λ + 1) .

Thus the variance of X is given by

Var[X] = E[X²] − E[X]² = λ .
We find the characteristic function for a Poisson random variable given by

ζ(t) = E[e^{itX}] = \sum_{x=0}^{∞} e^{itx} e^{−λ} λ^x / x! = e^{−λ} \sum_{x=0}^{∞} (e^{it} λ)^x / x! = e^{−λ} e^{λ e^{it}} = exp{λ(e^{it} − 1)} .   (12)

Above we explicitly calculated E(X) and Var(X) but we can also use the above characteristic function to derive them. For example, we find

E(X) = (1/i) ∂ζ(t)/∂t |_{t=0} = (1/i) exp{λ(e^{it} − 1)} λ i e^{it} |_{t=0} = λ e^{it} exp{λ(e^{it} − 1)} |_{t=0} = λ ,

for E(X) and

E(X²) = (1/i²) ∂²ζ(t)/∂t² |_{t=0} = (1/i) ∂/∂t [λ e^{it} exp{λ(e^{it} − 1)}] |_{t=0}
      = (1/i) [iλ e^{it} exp{λ(e^{it} − 1)} + λ e^{it} (λ i e^{it}) exp{λ(e^{it} − 1)}] |_{t=0}
      = [λ e^{it} exp{λ(e^{it} − 1)} + λ² e^{2it} exp{λ(e^{it} − 1)}] |_{t=0}
      = λ + λ² ,

for E(X²), the same two results as before.
Exercise 3.91
Part (a): This number X will be distributed as a Poisson random variable with λ =
80(0.25) = 20. Thus we want to evaluate P (X ≤ 16) = ppois(16, 20) = 0.2210742.
Part (b): This number X will be distributed as a Poisson random variable with λ =
80(85000) = 6800000 and this is also the expectation of X.
Part (c): This circle would have an area (in square miles) of π(0.1)² = 0.03141593 which is
20.10619 acres. Thus the number of trees is a Poisson random variable with λ = 20.10619.
Exercise 3.92
Part (a): We need ten vehicles to arrive and then, once these ten are inspected, to have no violations found. Let X be the number of vehicles that arrive and N the number of cars without violations. Then, given X = 10, N is a binomial random variable with n = 10 and p = 0.5. Using this we have

P(X = 10 ∩ N = 10) = P(N = 10|X = 10) P(X = 10) = dbinom(10, 10, 0.5) dpois(10, 10) = 0.0001221778 .
Part (b): Based on the arguments above this would be

P(X = y) P(N = 10|X = y) = (e^{−10} 10^y / y!) \binom{y}{10} 0.5^{10} 0.5^{y−10} = dpois(y, 10) dbinom(10, y, 0.5) .
Part (c): Summing the above for y = 10 to y = 30 (a small approximation to ∞) we get
the value 0.01813279.
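In R this truncated sum is a one-liner (a sketch):

ys = 10:30
sum( dpois( ys, 10 ) * dbinom( 10, ys, 0.5 ) ) # gives 0.01813279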
Exercise 3.93
Part (a): For there to be no events in the interval (0, t + ∆t) there must be no events in the interval (0, t) and none in the interval (t, t + ∆t). Using this and the independence from the Poisson process we have
P0 (t + ∆t) = P0 (t)P0 (∆t) .
Part (b): Following the book's suggestions we have

(P₀(t + ∆t) − P₀(t))/∆t = −P₀(t) (1 − P₀(∆t))/∆t .

By property 2 from the Poisson process we have

1 − P₀(∆t) = 1 − (1 − α∆t + o(∆t)) = α∆t + o(∆t) ,

and the above becomes

(P₀(t + ∆t) − P₀(t))/∆t = −P₀(t) (α + o(∆t)/∆t) .

Taking the limit as ∆t → 0 we get

dP₀(t)/dt = −αP₀(t) .
Part (c): The expression e^{−αt} satisfies the above equation.

Part (d): For P_k(t) = e^{−αt} (αt)^k / k! note that

d/dt P_k(t) = −α e^{−αt} (αt)^k / k! + e^{−αt} k α (αt)^{k−1} / k!
            = −α P_k(t) + α e^{−αt} (αt)^{k−1} / (k − 1)!
            = −α P_k(t) + α P_{k−1}(t) ,

the desired expression.
Exercise 3.94
Recall that the number of elements (unique unordered tuples of size three from seven elements) is given by

\binom{7}{3} = 35 ,

which is the total number of outcomes. Each tuple thus has a probability of 1/35. In the python code ex3_94.py we enumerate each possible tuple of three numbers and compute its sum. When we run the above code we get
(1, 2, 3) 6
(1, 2, 4) 7
(1, 2, 5) 8
(1, 2, 6) 9
(1, 2, 7) 10
(1, 3, 4) 8
... output omitted ...
(3, 5, 7) 15
(3, 6, 7) 16
(4, 5, 6) 15
(4, 5, 7) 16
(4, 6, 7) 17
(5, 6, 7) 18
If we next count up the number of times each potential sum occurs and compute the probability of getting this sum we compute

sum   numb   prob
 6     1     0.028571
 7     1     0.028571
 8     2     0.057143
 9     3     0.085714
10     4     0.114286
11     4     0.114286
12     5     0.142857
13     4     0.114286
14     4     0.114286
15     3     0.085714
16     2     0.057143
17     1     0.028571
18     1     0.028571
In the same code we compute µ and σ² and find

mu = 12.0
sigma2 = 8.0
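The same enumeration is easy to reproduce in R (a short sketch equivalent to the Python script):

sums = apply( combn( 7, 3 ), 2, sum ) # the sums of all 35 unordered triples
table( sums ) / choose( 7, 3 ) # the pmf of the sum
c( mean( sums ), mean( sums^2 ) - mean( sums )^2 ) # gives mu = 12 and sigma2 = 8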
Exercise 3.95
Part (a): Following the hint we have

P(1) = P(all five cards are spades) = \binom{4}{1} \binom{13}{5} / \binom{52}{5} = 0.001980792 .

The factor \binom{4}{1} is the number of ways to select the suit of cards for the hand (here we selected spades) from the four possible suits. Next we have

P(2) = P(only spades and hearts with at least one of each suit) = \binom{4}{2} \sum_{k=1}^{4} \binom{13}{k} \binom{13}{5−k} / \binom{52}{5} = 0.1459184 .

The factor \binom{4}{2} is the number of ways to select the two suits of cards in the hand (here we selected spades and hearts). In the numerator the factor \binom{13}{k} is the number of ways to select the spades in the hand and then \binom{13}{5−k} is the number of ways to select the hearts in the hand. Next we have

P(4) = P(two spades and one card from each of the other three suits) = \binom{4}{1} \binom{13}{2} \binom{13}{1} \binom{13}{1} \binom{13}{1} / \binom{52}{5} = 0.2637455 .

The factor \binom{4}{1} is the number of ways to select the suit of cards for the hand that will be duplicated (here we selected spades) from the four possible suits. Then P(3) = 1 − P(1) − P(2) − P(4) = 0.5883553.
Part (b): We find µ = 3.113866 and σ 2 = 0.4046217.
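These values are easy to verify in R (a short sketch):

p1 = 4 * choose( 13, 5 ) / choose( 52, 5 )
p2 = 6 * sum( choose( 13, 1:4 ) * choose( 13, 4:1 ) ) / choose( 52, 5 )
p4 = 4 * choose( 13, 2 ) * 13^3 / choose( 52, 5 )
p3 = 1 - p1 - p2 - p4
probs = c( p1, p2, p3, p4 )
mu = sum( ( 1:4 ) * probs ) # gives 3.113866
sum( ( 1:4 )^2 * probs ) - mu^2 # gives 0.4046217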
Exercise 3.96
We must have r successes and we stop when we get them. Thus the last trial will be the rth success. We have

P(Y = r) = p^r
P(Y = r + 1) = \binom{r}{1} (1 − p) p^r
P(Y = r + 2) = \binom{r+1}{2} (1 − p)² p^r
...
P(Y = r + k) = \binom{r+k−1}{k} (1 − p)^k p^r for k ≥ 0 .

If we want this written in terms of just y, the total number of trials, then y = r + k and we have

P(Y = y) = \binom{y−1}{y−r} (1 − p)^{y−r} p^r for y ≥ r .
Exercise 3.97
Part (a): This is a binomial random variable with n = 15 and p = 0.75.
Part (b): We have 1 − P (X ≤ 9) = 1 − pbinom(9, 15, 0.75) = 0.8516319.
Part (c): We have P (6 ≤ X ≤ 10) = P (X ≤ 10) − P (X ≤ 5) = 0.3127191.
Part (d): We have

µ = np = 15(0.75) = 11.25
σ² = np(1 − p) = 2.8125 .
Part (e): We have 10 chain driven models and 8 shaft driven models in the existing stock. Note that Y = 15 − X is the number of shaft driven models bought by the next fifteen customers. Thus to have enough product on hand we must have
0 ≤ X ≤ 10 and 0 ≤ Y ≤ 8 .
Since Y = 15 − X this last inequality is equivalent to
7 ≤ X ≤ 15 .
Thus combining this with the condition 0 ≤ X ≤ 10 we must have
7 ≤ X ≤ 10 .
Thus the probability we want to compute is given by
P (7 ≤ X ≤ 10) = P (X ≤ 10)−P (X ≤ 6) = pbinom(10, 15, 0.75)−pbinom(6, 15, 0.75) = 0.309321 .
Exercise 3.98
The probability that a six volt flash light works is one minus the probability that both six
volt batteries fail or
1 − (1 − p)2 .
The probability that the two D-cell flashlight works is the probability we have at least two working batteries from the four given. If X is the number of working batteries then X is a binomial random variable with parameters n = 4 and p. Thus the probability we want is P(X ≥ 2) or

P(X ≥ 2) = 1 − P(X ≤ 1) = 1 − \sum_{x=0}^{1} \binom{4}{x} p^x (1 − p)^{4−x} .
Using the simple R code
ps = seq( 0, 1, length.out=100 )
P_six_volt = 1 - (1-ps)^2
P_D_cell = 1 - pbinom( 1, 4, ps )
plot( ps, P_six_volt, type=’l’, col=’blue’ )
lines( ps, P_D_cell, type=’l’, col=’red’ )
legend( 'topleft',
        c('probability six volt works','probability D cell works'),
        lty=c(1,1), col=c('blue','red')
)
grid()
we can plot each of these expressions as a function of p. When we do that we get the plot given in Figure 2. There we see that for low values of p, i.e. less than about 0.65, the six volt flashlight has a larger probability of working. If p is greater than about 0.65 then the D cell flashlight has a larger probability.
Exercise 3.99
We want P (X ≥ 3) where X is binomial with n = 5 and p = 0.9 so
P (X ≥ 3) = 1 − P (X ≤ 2) = 1 − pbinom(2, 5, 0.9) = 0.99144 .
Exercise 3.100
From the problem statement a lot will be rejected if X ≥ 5.
Part (a): We have
P (X ≥ 5) = 1 − P (X ≤ 4) = 1 − pbinom(4, 25, 0.05) = 0.007164948 .
Figure 2: The two flashlight working probabilities from Exercise 3.98, plotted as functions of p.
Part (b): This is
P (X ≥ 5) = 1 − pbinom(4, 25, 0.1) = 0.09799362 .
Part (c): This is
P (X ≥ 5) = 1 − pbinom(4, 25, 0.2) = 0.5793257 .
Part (d): If we change the four in the above expressions to a five, all of the probabilities would decrease since we now require more defective items before rejecting the lot.
Exercise 3.101
Part (a): X is a binomial random variable with n = 500 and p = 0.005 which we can
approximate using a Poisson random variable with λ = np = 2.5 since we have n > 50 and
np = 2.5 < 5.
Part (b): P (X = 5) = dpois(5, 2.5) = 0.06680094.
Part (c): This is

P(5 ≤ X) = P(X ≥ 5) = 1 − P(X ≤ 4) = 1 − ppois(4, 2.5) = 0.108822 .
Exercise 3.102
Note that X is a binomial random variable with n = 25 and p = 0.5.
Part (a): This is 0.9853667.
Part (b): This is 0.2199647.
Part (c): If p = 0.5 then by chance we would have
P (X ≤ 7 or X ≥ 18) = P (X ≤ 7) + P (X ≥ 18)
= pbinom(7, 25, 0.5) + (1 − P (X ≤ 17))
= pbinom(7, 25, 0.5) + (1 − pbinom(17, 25, 0.5)) = 0.04328525 .
Part (d): We reject the claim if the inequalities in the previous part are true. This can
happen with a probability (when p = 0.6) given by
pbinom(7,25,0.6) + ( 1 - pbinom(17,25,0.6) )
[1] 0.1547572
If p = 0.8 this becomes 0.8908772.
Part (e): We would want to construct a test to pick an integer value of c such that when
p = 0.5 we have
P (µ − c ≤ X ≤ µ + c) ≈ 1 − 0.01 = 0.99 .
This means that we expect to fall in the region µ − c ≤ X ≤ µ + c with probability 0.99
and outside of this region with probability of 0.01. Then if our sample had x ≥ µ + c or
x ≤ µ − c we would reject H0 : p = 0.5.
Exercise 3.103
Let T be the random variable specifying the number of tests that will be run. Then we have
E(T ) = 1P (none of the n members has the disease)
+ (n + 1)P (at least one of the n members has the disease)
= (1 − p)n + (n + 1)(1 − (1 − p)n )
= n + 1 − n(1 − p)n .
If n = 3 and p = 0.1 the above is E(T ) = 1.813. If n = 5 and p = 0.1 the above is
E(T ) = 3.04755.
Exercise 3.104
We receive a correct symbol with a probability 1 − p1 . We receive an incorrect symbol with
probability p1 but this incorrect symbol can be corrected with a probability p2 . In total
then, we receive a correct symbol with probability p given by
p = 1 − p1 + p1 p2 .
Then the number of correct symbols X is a binomial random variable with parameters n
and p (given by the above expression).
Exercise 3.105
In our sequence of trials the last two must be successes and thus the probability we perform
just two trials to accept will have a probability of p2 . We will perform three trials to accept if
the first trial is a failure followed by two successes which happens with probability (1 − p)p2 .
We will perform four trials to accept if the first two trials are F, F or S, F followed by two
successes. Thus this event has the probability
(1 − p)2 p2 + (1 − p)pp2 = (1 − p)p2 .
Thus so far we have
P (2) = p2
P (3) = (1 − p)p2
P (4) = (1 − p)p2 .
For P (x) = P {X = x} when x ≥ 5 we must have the last two trials a success and the trial
before the last two trials must be a failure. If it was a success we would have the sequence
S, S, S and would stop before the final trial. In the first x − 3 trials we cannot have a
sequential run of two successes. Thus we get for the probability of the event X = x
P (x) = p2 (1 − p) [1 − P (2) − P (3) − · · · − P (x − 4) − P (x − 3)] ,
for x ≥ 5. For p = 0.9 we can evaluate these expressions to compute P (X ≤ 8) with the
following R code
p = 0.9
# entries of p_x correspond to x = 2, 3, 4, ...
p_x = c( p^2, (1-p)*p^2, (1-p)*p^2 )
for( x in 5:8 ){
x_to_index = x-1 # location of x in p_x vector
previous_x_indices = 1:(x_to_index-3)
prob_no_acceptance = 1 - sum( p_x[ previous_x_indices ] )
p_x = c( p_x, prob_no_acceptance * (1-p) * p^2 )
}
print( sum(p_x) ) # gives 0.9995084
Exercise 3.106
Part (a): The number of customers that qualify for membership X is a binomial random
variable with n = 25 and p = 0.1. Thus we want to compute P(2 ≤ X ≤ 6) = 0.7193177.
Part (b): In this case n = 100 and the same considerations as above gives µ = np = 10
and σ 2 = np(1 − p) = 9.
Part (c): We want to compute P (X ≥ 7) when X ∼ Bin(25, 0.1). We find this to be
0.009476361.
Part (d): In this case using R we compute P (X ≤ 6) to be
pbinom(6,25,0.2)
[1] 0.7800353
Exercise 3.107
Let S correspond to the event that a seed of maize has a single spikelet and P correspond
to the event that a seed of maize has a paired spikelet. Then we are told that
P (S) = 0.4
P (P ) = 0.6 .
We are also told that after the seed has grown it will produce an ear of corn with single or
paired spikelets with the following probabilities
P (S|S) = 0.29 so P (P |S) = 0.71
P (S|P ) = 0.26 so P (P |P ) = 0.74 .
We next select n = 10 seeds.
Part (a): For each seed the probability that it is of type S and produces kernels of type S is given by the probability p computed as

p = P(S|S)P(S) = 0.29(0.4) = 0.116 .
Then the probability that exactly X of these seeds from the 10 do this is a binomial random
variable with n = 10 and p given above. Thus we compute
dbinom(5,10,0.116)
[1] 0.002857273
Part (b): Next we want the probability that we produce kernels of type S. This can be
computed
p = P (S|S)P (S) + P (S|P )P (P ) = 0.29(0.4) + 0.26(0.6) = 0.272 .
The two desired probabilities are then given by
c( dbinom(5,10,0.272), pbinom(5,10,0.272) )
[1] 0.07671883 0.97023725
Exercise 3.108
X is a hypergeometric random variable with M = 4, N = 8 + 4 = 12, and n = 4. We are asked about the mean number of jurors favoring acquittal who will be interviewed. This is

µ = n(M/N) = 4(4/12) = 1.333333 .
Exercise 3.109
Part (a): The number of calls (say X) any one of the operators receives during one minute
will be a Poisson random variable with λ = 2(1) = 2. Thus the probability that the first (or
any operator) receives no calls is P (X = 0) = e−λ = 0.1353353.
Part (b): If we consider receiving no calls a “success” then the number of operators (from
five) that receive no calls will be a binomial random variable with n = 5 and p = 0.1353353
(the probability from Part (a) above). Thus the probability we seek is given by 0.001450313.
Part (c): Let E be the event that all operators receive the same number of calls. Let c be the number of calls each operator receives in the first minute. Then we have (using R notation)

P(E) = \sum_{c=0}^{∞} dbinom(5, 5, dpois(c, 2)) = 0.00314835 .
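Since the summand decays quickly, truncating the infinite sum gives the quoted value (a sketch; note dbinom(5, 5, q) = q⁵):

sum( dpois( 0:100, 2 )^5 ) # gives 0.00314835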
Exercise 3.110
For a radius of size R the number of grasshoppers found will be a Poisson random variable with parameter λ = απR² = 2πR². Since for a Poisson random variable the probability of getting at least one count is given by P(X ≥ 1) = 1 − P(X = 0) = 1 − e^{−λ}, in this case we need to find R such that

P(X ≥ 1) = 1 − e^{−2πR²} = 0.99 .

Solving for R in the above expression we find R = 0.8561166 yards.
Exercise 3.111
The expected number of copies sold is given by

E(number sold) = \sum_{k=0}^{5} k P(X = k) + 5 \sum_{k=6}^{∞} P(X = k) = 2.515348 + 1.074348 = 3.589696 .
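In R this is a one-liner; the quoted values are consistent with a Poisson demand with λ = 4 (an assumption inferred from the numbers, since the rate is not restated here):

lam = 4 # assumed rate, consistent with the quoted values
sum( ( 0:5 ) * dpois( 0:5, lam ) ) + 5 * ( 1 - ppois( 5, lam ) ) # gives 3.589696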
Exercise 3.112
Part (a): For x = 10 we can have A win the first ten games or B win the first ten games and thus

P(X = 10) = p^{10} + (1 − p)^{10} .

For x = 11 the opponent of the player that ultimately wins must win one game in the first ten games, thus

P(X = 11) = \binom{10}{1} (1 − p) p^{10} + \binom{10}{1} p (1 − p)^{10} .

In the same way, if x = 12 the opponent of the player that ultimately wins must win two games in the first eleven games, thus

P(X = 12) = \binom{11}{2} (1 − p)² p^{10} + \binom{11}{2} p² (1 − p)^{10} .

The pattern is now clear. We have

P(X = x) = \binom{x−1}{x−10} (1 − p)^{x−10} p^{10} + \binom{x−1}{x−10} p^{x−10} (1 − p)^{10} ,

for 10 ≤ x ≤ 19. Let's check empirically that what we have constructed is a probability mass function
p = 0.9
xs = 10:19
pt_1 = choose( xs-1, xs-10 ) * (1-p)^(xs-10) * p^10
pt_2 = choose( xs-1, xs-10 ) * (1-p)^10 * p^(xs-10)
sum( pt_1 + pt_2 ) # gives 1
Part (b): In this case X ∈ {10, 11, 12, 13, . . . } since we can imagine as many draws as
needed to make X as large as desired.
Exercise 3.113
Let T be the event that the result of the test is positive and let D be the event that the
person has the disease. Then we are told that
P(T|Dᶜ) = 0.2 ⇒ P(Tᶜ|Dᶜ) = 0.8 ,

and

P(Tᶜ|D) = 0.1 ⇒ P(T|D) = 0.9 .

Part (a): No, since we have a different probability of “success” on each trial; this probability depends on whether or not the person has the disease.
Part (b): This probability p would be given by

p = \sum_{k=0}^{3} P(X = k | selected from the diseased group of 5) P(X = 3 − k | selected from the non-diseased group of 5)
  = \sum_{k=0}^{3} \binom{5}{k} 0.9^k 0.1^{5−k} \binom{5}{3−k} 0.2^{3−k} 0.8^{2+k} = 0.0272983 .
We computed this in R using
ks = 0:3
sum( dbinom( ks, 5, 0.9 ) * dbinom( 3-ks, 5, 0.2 ) )
Exercise 3.114
In R the dnbinom function does not need to have its size parameter (here the value of r)
an integer. Thus we compute P (X = 4) = dnbinom(4, 2.5, 0.3) = 0.106799 and P (X ≥ 1) =
1 − P (X ≤ 0) = 1 − pnbinom(0, 2.5, 0.3) = 0.950705.
Exercise 3.115
Part (a): We have p(x) ≥ 0 and \sum_x p(x) = 1.

Part (b): This would be

p(x; λ, µ) = 0.6 e^{−λ} λ^x / x! + 0.4 e^{−µ} µ^x / x! .
Part (c): Using the expectation of a Poisson random variable we have

E(X) = (1/2) E(Part 1) + (1/2) E(Part 2) = (1/2) λ + (1/2) µ = (λ + µ)/2 .
Part (d): To compute the variance we need the expectation of the random variable squared. To compute this recall that if Y is a Poisson random variable with parameter κ then E(Y²) = Var(Y) + E(Y)² = κ + κ². Thus for our variable X in this problem we have

E(X²) = (1/2)(λ + λ²) + (1/2)(µ + µ²) = (1/2)(λ + µ + λ² + µ²) .
Given this we have

Var(X) = E(X²) − E(X)²
       = (1/2)(λ + µ + λ² + µ²) − (1/4)(λ² + µ² + 2λµ)
       = (1/2)(λ + µ) + (1/4)(λ − µ)² ,

when we simplify.
Exercise 3.116
Part (a): To prove this let's first consider the ratio b(x + 1; n, p)/b(x; n, p) which is given by

b(x + 1; n, p)/b(x; n, p) = [\binom{n}{x+1} p^{x+1} (1 − p)^{n−x−1}] / [\binom{n}{x} p^x (1 − p)^{n−x}] = ((n − x)/(x + 1)) (p/(1 − p)) .

Now b(x + 1; n, p) will be larger than b(x; n, p) if this ratio is larger than one or

((n − x)/(x + 1)) (p/(1 − p)) > 1 .

This is equivalent to x < np − (1 − p). Thus the mode x* is the integer that is larger than or equal to np − (1 − p) but less than (or equal to) this number plus one. That is, it must satisfy

np − (1 − p) ≤ x* ≤ np − (1 − p) + 1 = p(n + 1) .

Part (b): For P(X = x) = e^{−λ} λ^x / x! the mode is the value x* that gives the largest P(X = x*) value. Considering for what values of x the ratio of P(X = x + 1) to P(X = x) is larger than one we have

P(X = x + 1)/P(X = x) = λ/(x + 1) > 1 .

This means that λ > x + 1 so x < λ − 1. Thus the mode is the integer x* such that

λ − 1 ≤ x* ≤ λ − 1 + 1 = λ .

If λ is an integer then λ − 1 is also an integer, so the bounds above put x* between two integers and thus either one can be the mode.
Exercise 3.117
Recall that X is the number of tracks the arm will pass during a new disk track “seek”.
Then we can compute P (X = j) by conditioning on the track that the disk head is currently
on as
P(X = j) = \sum_{i=1}^{10} P(arm is now on track i and X = j) = \sum_{i=1}^{10} P(X = j | arm is now on track i) p_i .
Next we need to evaluate P (X = j|arm is now on track i). To do that we consider several
cases. Starting on track i then for the head:
• to move over j = 0 tracks means the head does not actually move and it stays on track
i. This will happen with probability pi since we have to have a seek request to the
same track index i as we are currently on.
• to move over j = 1 tracks means that we have to have requests to the tracks i + 1 or
i − 1. This will happen with probability pi+1 + pi−1 .
• to move over j = 2 tracks means that we have to have requests to the tracks i + 2 or
i − 2. This will happen with probability pi+2 + pi−2 .
The pattern above continues for other values of j. In general, to move over j tracks means
that we receive a seek request to go to tracks i + j or i − j. Note that if one of these values
is less than 1 or larger than 10 it would indicate a track that does not exist and we must
take the values of pi+j or pi−j as zero. Thus we have
P(X = 0) = \sum_{i=1}^{10} p_i²

P(X = j) = \sum_{i=1}^{10} (p_{i+j} + p_{i−j}) p_i for j ≥ 1 ,

with p_k = 0 if k ≤ 0 or k ≥ 11. Note that this result is slightly different than the one in the back of the book. If anyone sees anything wrong with what I have done please let me know.
Exercise 3.118
Since X is a hypergeometric random variable we have

E(X) = \sum_x x p(x) = \sum_x x \binom{M}{x} \binom{N−M}{n−x} / \binom{N}{n} .

Now the limits of X are max(0, n − N + M) ≤ x ≤ min(n, M). Since we are told that n < M we know that the upper limit of our summation is min(n, M) = n. Thus we have

E(X) = \sum_{x=max(0, n−N+M)}^{n} x \binom{M}{x} \binom{N−M}{n−x} / \binom{N}{n} .

Expanding the binomial coefficients this becomes

E(X) = \sum_{x=max(0, n−N+M)}^{n} x [M!/(x!(M−x)!)] [(N−M)!/((n−x)!(N−M−n+x)!)] / [N!/(n!(N−n)!)]
     = \sum_{x=max(1, n−N+M)}^{n} [M (M−1)!/((x−1)!(M−1−(x−1))!)] [(N−1−(M−1))!/((n−1−(x−1))!(N−1−(M−1)−(n−1)+(x−1))!)] / [(N/n) (N−1)!/((n−1)!(N−1−(n−1))!)]
     = (nM/N) \sum_{x=max(1, n−N+M)}^{n} \binom{M−1}{x−1} \binom{N−1−(M−1)}{n−1−(x−1)} / \binom{N−1}{n−1} .

Let y = x − 1, then the above is

E(X) = (nM/N) \sum_{y=max(0, n−1−(N−1)+(M−1))}^{n−1} \binom{M−1}{y} \binom{N−1−(M−1)}{n−1−y} / \binom{N−1}{n−1} = nM/N ,

since the sum above is the sum of the hypergeometric density h(y; n − 1, M − 1, N − 1) over all possible y values (and hence sums to one).
Exercise 3.119
From the given expression

\sum_{all x} (x − µ)² p(x) ≥ \sum_{x : |x−µ| ≥ kσ} (x − µ)² p(x) ,

since the left-hand-side is the definition of σ² we have

σ² ≥ \sum_{x : |x−µ| ≥ kσ} (x − µ)² p(x) ≥ \sum_{x : |x−µ| ≥ kσ} (k²σ²) p(x) .

The right-hand-side of the above is

k²σ² \sum_{x : |x−µ| ≥ kσ} p(x) = k²σ² P(|X − µ| ≥ kσ) .

If we divide both sides by k²σ² we get

P(|X − µ| ≥ kσ) ≤ 1/k² ,

or Chebyshev's inequality.
Exercise 3.120
Part (a): For the given functional expression for α(t) we get

λ = \int_{t_1}^{t_2} e^{a+bt} dt = (e^a / b)(e^{b t_2} − e^{b t_1}) .

Which, due to the properties of the Poisson distribution, is also the expectation of the number of events between the two times [t₁, t₂]. With the values of a and b given and for t₁ = 0 and t₂ = 4 we get λ = 123.4364. If t₁ = 2 and t₂ = 6 we get λ = 409.8231.

Part (b): In the interval [0, 0.9907] the number of events X is a Poisson random variable with (calculated as above) λ = 9.999606. Thus we want to evaluate P(X ≤ 15) = ppois(15, 9.999606) = 0.9512733.
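These intensities can be checked numerically in R; here a = 2 and b = 0.6, values consistent with the quoted λ's (an inference, since the problem data is not restated here):

alpha = function(t){ exp( 2 + 0.6 * t ) } # assumed a = 2, b = 0.6
integrate( alpha, 0, 4 )$value # gives 123.4364
integrate( alpha, 2, 6 )$value # gives 409.8231
integrate( alpha, 0, 0.9907 )$value # gives 9.999606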
Exercise 3.121
Part (a): The expectation is given by

E(call time) = 0.75 E(call time | call is voice) + 0.25 E(call time | call is data) = 0.75(3) + 0.25(1) = 2.5 minutes.
Part (b): Let C be the random variable representing the number of chocolate chips found
and the three cookie types denoted by C1 , C2 , and C3 . Then we get
E(C) = E(C|C1 )P (C1 )+E(C|C2 )P (C2 )+E(C|C3 )P (C3 ) = 0.2(1+1)+0.5(2+1)+0.3(3+1) = 3.1 .
Exercise 3.122
We compute

P(X = 1) = p
P(X = 2) = (1 − p) p
P(X = 3) = (1 − p)² p
...
P(X = k) = (1 − p)^{k−1} p for k = 1, . . . , 9 ,

and

P(X = 10) = 1 − \sum_{k=1}^{9} P(X = k) = 1 − \sum_{k=1}^{9} (1 − p)^{k−1} p = 1 − p \sum_{k=0}^{8} (1 − p)^k = 1 − p (1 − (1 − p)⁹)/(1 − (1 − p)) = (1 − p)⁹ .

The average will then be given by

µ = \sum_{k=1}^{10} k P(X = k) = p \sum_{k=1}^{9} k (1 − p)^{k−1} + 10 (1 − p)⁹ .

To evaluate the first summation we use the identity

\sum_{k=0}^{n} k r^k = r [(1 − r^n)/(1 − r)² − n r^n/(1 − r)] .   (13)

We then get

µ = (1 − (1 − p)^{10})/p = 1/p − (1/p − 1)(1 − p)⁹ ,

when we simplify.
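A quick numerical check of this closed form in R (any 0 < p < 1 will do):

p = 0.3
probs = c( ( 1 - p )^( 0:8 ) * p, ( 1 - p )^9 ) # P(X = 1), ..., P(X = 10)
c( sum( ( 1:10 ) * probs ), ( 1 - ( 1 - p )^10 ) / p ) # the two values agree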
Continuous Random Variables and Probability Distributions
Problem Solutions
Exercise 4.5
Part (a): We must have a value of k such that

\int_0^2 k x² dx = 1 .

Integrating the left-hand-side we get

k x³/3 |_0^2 = 8k/3 = 1 so k = 3/8 .

Part (b): We find

P(X < 1) = \int_0^1 (3/8) x² dx = (3/8)(x³/3) |_0^1 = 1/8 .

Part (c): We find

P(1 < X < 1.5) = \int_1^{1.5} (3/8) x² dx = (3/8)(x³/3) |_1^{1.5} = (1/8)(1.5³ − 1) = 0.296875 .

Part (d): For this we find

P(X > 1.5) = \int_{1.5}^2 (3/8) x² dx = (1/8) x³ |_{1.5}^2 = (1/8)(8 − 1.5³) = 0.578125 .
Exercise 4.11
Part (a): For this we have P(X ≤ 1) = F(1) = 1/4.

Part (b):

P(0.5 ≤ X ≤ 1) = F(1) − F(0.5) = 1/4 − (1/4)(1/2)² = 3/16 .

Part (c):

P(X > 0.5) = 1 − P(X ≤ 0.5) = 1 − F(0.5) = 1 − 1/16 = 15/16 .

Part (d): For this part we want to solve 0.5 = F(µ̃) for µ̃ or

µ̃²/4 = 1/2 ,

which means that µ̃ = √2.

Part (e): We have

f(x) = F′(x) = x/2 for 0 ≤ x < 2 and f(x) = 0 otherwise.

Part (f):

E(X) = \int x f(x) dx = \int_0^2 x (x/2) dx = (1/2) \int_0^2 x² dx = x³/6 |_0^2 = 4/3 .

Part (g): We first compute

E(X²) = \int x² f(x) dx = \int_0^2 x² (x/2) dx = x⁴/8 |_0^2 = 2 ,

so that Var(X) = E(X²) − E(X)² = 2 − 16/9 = 2/9 = 0.2222222 and σ_X = 0.4714045.

Part (h): This would be the same as E(X²) which we computed above as 2.
Exercise 4.12
Part (a): This is P(X < 0) = F(0) = 1/2.

Part (b): This is given by

P(−1 < X < +1) = F(1) − F(−1) = [1/2 + (3/32)(4 − 1/3)] − [1/2 + (3/32)(−4 + 1/3)] = 11/16 ,

when we simplify.

Part (c): This is

P(0.5 < X) = 1 − P(X < 0.5) = 1 − F(0.5) = 1 − [1/2 + (3/32)(4(1/2) − (1/3)(1/8))] = 81/256 ,

when we simplify.

Part (d): This is

f(x) = F′(x) = (3/32)(4 − x²) .

Part (e): Now µ̃ is the solution to F(µ̃) = 1/2. For this problem this equation is

(3/32)(4µ̃ − µ̃³/3) = 0 .

This has solutions µ̃ = 0 or µ̃ = ±√12 = ±3.464102. Note that these last two solutions don't satisfy |µ̃| < 2 and are not valid. Thus µ̃ = 0 is the only solution.
Exercise 4.13
Part (a): We must have k such that

\int_1^∞ k x^{−4} dx = 1 .

Evaluating the left-hand-side of this gives

\int_1^∞ k x^{−4} dx = k x^{−3}/(−3) |_1^∞ = (k/(−3))(0 − 1) = k/3 .

To make this equal to one means that k = 3.

Part (b): Our cumulative distribution for this density is given by

F(x) = \int_1^x 3 ξ^{−4} dξ = 3 ξ^{−3}/(−3) |_1^x = −(1/x³ − 1) = 1 − 1/x³ ,

for x > 1.

Part (c): These are given by

P(X > 2) = 1 − P(X < 2) = 1 − F(2) = 1 − (1 − 1/2³) = 1/8 ,

and

P(2 < X < 3) = F(3) − F(2) = (1 − 1/27) − (1 − 1/8) = 1/8 − 1/27 = 19/216 = 0.08796296 .

Part (d): These can be computed using

E(X) = \int_1^∞ x (3 x^{−4}) dx = 3 \int_1^∞ x^{−3} dx = 3 x^{−2}/(−2) |_1^∞ = −(3/2)(0 − 1) = 3/2

E(X²) = 3 \int_1^∞ x² x^{−4} dx = 3 \int_1^∞ x^{−2} dx = 3 x^{−1}/(−1) |_1^∞ = −3(0 − 1) = 3 .

Thus

Var(X) = E(X²) − E(X)² = 3 − (3/2)² = 3/4 ,

and σ_X = √3/2 = 0.8660254.

Part (e): The domain we are interested in is |X − µ| < σ or µ − σ < X < µ + σ. Note that

µ − σ = 3/2 − √3/2 = 0.6339746 < 1 ,

and is outside of the feasible domain. Thus we compute this using

P(µ − σ < X < µ + σ) = F(µ + σ) − F(1) = F(3/2 + √3/2) − F(1) = [1 − 1/(3/2 + √3/2)³] − 0 = 0.9245009 .
Exercise 4.14
Part (a): From properties of a uniform distribution we have

E(X) = (a + b)/2 = (7.5 + 20)/2 = 13.75
Var(X) = (b − a)²/12 = (20 − 7.5)²/12 = 13.02083 .

Part (b): We find

F_X(x) = \int_{7.5}^x dξ/(20 − 7.5) = (x − 7.5)/12.5 for 7.5 < x < 20 ,

together with F = 0 if x < 7.5 and F = 1 if x > 20.
Part (c): We have

P(X < 10) = \int_{7.5}^{10} dξ/12.5 = (10 − 7.5)/12.5 = 0.2
P(10 < X < 15) = \int_{10}^{15} dξ/12.5 = 5/12.5 = 0.4 .
Figure 3: A plot of the density f(x) given in Exercise 4.15.
Part (d): We have

P(|X − µ| < nσ) = P(µ − nσ < X < µ + nσ) = \int_{max(µ−nσ, 7.5)}^{min(µ+nσ, 20)} dξ/12.5 .

When n = 1 this is 0.5773503 and when n = 2 this is 1.0. These were computed with the following R code
f_fn = function(n){
m = ( 7.5 + 20 )/2 # the mean
s = sqrt( ( 20 - 7.5 )^2 / 12 ) # the standard deviation
ll = max( m - n * s, 7.5 )
ul = min( m + n * s, 20 )
result = ( ul - ll )/12.5
}
lapply( c(1,2), f_fn )
Exercise 4.15
Part (a): We first plot 90x⁸(1 − x) for 0 < x < 1. See Figure 3 where we do this. Next we compute the cdf of X as

F_X(x) = \int_0^x 90 ξ⁸ (1 − ξ) dξ = 90 (ξ⁹/9 − ξ¹⁰/10) |_0^x = 10x⁹ − 9x¹⁰ ,

with the conditions that F_X(x) = 0 for x < 0 and F_X(x) = 1 for x > 1.

Part (b): This is

P(X ≤ 0.5) = F(0.5) = 10(0.5)⁹ − 9(0.5)¹⁰ = 0.01074219 .

Part (c): This is

P(0.25 < X ≤ 0.5) = F(0.5) − F(0.25) = 0.01071262 ,

when we use the above expression for F(x). The second requested expression has the same numerical value as the probability just computed above since our density is continuous.

Part (d): We want to find x_{0.75}, which is the point x that satisfies F(x) = 0.75. This means that we need to solve

10x⁹ − 9x¹⁰ = 0.75 ,

for x. We do that with the following R code
coefs = rep( 0, 11 )
coefs[1] = -0.75 # constant term
coefs[10] = 10 # coefficient of x^9
coefs[11] = -9 # coefficient of x^10
polyroot( coefs )
The only real root that satisfies 0 < x < 1 is x = 0.9035961.

Part (e): We can compute these using

E(X) = 90 \int_0^1 x⁹ (1 − x) dx = 90 (x¹⁰/10 − x¹¹/11) |_0^1 = 90 (1/10 − 1/11) = 0.8181818 ,

and

E(X²) = 90 \int_0^1 x¹⁰ (1 − x) dx = 90 (x¹¹/11 − x¹²/12) |_0^1 = 90 (1/11 − 1/12) = 0.6818182 .

Thus

Var(X) = E(X²) − E(X)² = 0.6818182 − 0.8181818² = 0.01239674
SD(X) = √0.01239674 = 0.1113406 .

Part (f): For this we have

1 − P(|X − µ| < σ) = 1 − P(µ − σ < X < µ + σ)
                   = 1 − \int_{max(µ−σ, 0)}^{min(µ+σ, 1)} 90 x⁸ (1 − x) dx
                   = 1 − (F_X(min(µ + σ, 1)) − F_X(max(µ − σ, 0))) = 0.3136706 .
Exercise 4.16
In Exercise 4.5 above we had the pdf given by f(x) = (3/8)x² for 0 < x < 2.

Part (a): Our cdf for this pdf is computed as

F_X(x) = \int_0^x (3/8) ξ² dξ = (1/8) ξ³ |_0^x = x³/8 .

Part (b): For this we have

P(X ≤ 0.5) = F_X(0.5) = 0.015625 .

Part (c): For this we have

P(0.25 ≤ X ≤ 0.5) = F_X(0.5) − F_X(0.25) = 0.01367188 .

Part (d): We want to find the value of x_{0.75} which is the solution of F_X(x) = 0.75. Using the above form for F_X(x) we find that the value of x that solves this equation is x = 1.817121.

Part (e): To compute these we need

E(X) = \int_0^2 x (3/8) x² dx = (3/8) \int_0^2 x³ dx = (3/8)(x⁴/4) |_0^2 = (3/32)(16) = 3/2 ,

and

E(X²) = \int_0^2 x² (3/8) x² dx = (3/8) \int_0^2 x⁴ dx = (3/8)(x⁵/5) |_0^2 = (3/40)(32) = 12/5 .

Thus from these we compute

Var(X) = 12/5 − 9/4 = 3/20 = 0.15 ,

so σ_X = √0.15 = 0.3872983.

Part (f): Calculated the same way as in Exercise 4.15, for this probability we get 0.3319104.
Exercise 4.17
For the uniform distribution we have a cdf given by

F_X(x) = (x − A)/(B − A) for A < x < B .

The value of x_p is given by solving F_X(x_p) = (x_p − A)/(B − A) = p. This has the solution

x_p = (B − A) p + A .

Part (b): For this we need to compute

E(X) = \int_A^B x dx/(B − A) = x²/(2(B − A)) |_A^B = (A + B)/2 ,

and

E(X²) = \int_A^B x² dx/(B − A) = (B³ − A³)/(3(B − A)) = (B² + AB + A²)/3 .

Using these we have that

Var(X) = E(X²) − E(X)² = (B² + AB + A²)/3 − (A + B)²/4 = (B − A)²/12 ,

when we simplify. Using the expression for the variance we have that

σ_X = (B − A)/√12 .

Part (c): For this we compute

E(Xⁿ) = \int_A^B xⁿ dx/(B − A) = (1/(B − A)) xⁿ⁺¹/(n + 1) |_A^B = (Bⁿ⁺¹ − Aⁿ⁺¹)/((B − A)(n + 1)) .
Exercise 4.19
Part (a): Using the given cdf for this we have

P(X ≤ 1) = F(1) = (1/4)(1 + log(4)) = 0.5965736 .

Part (b): This is

P(1 ≤ X ≤ 3) = F(3) − F(1) = (3/4)(1 + log(4/3)) − (1/4)(1 + log(4)) = 0.369188 .

Part (c): This is given by

f(x) = F′(x) = (1/4)(1 + log(4/x)) + (x/4)(−1/x) = (1/4) log(4/x) .
Exercise 4.20
Part (a): The cdf for this pdf is different in different regions of y. First we have F_Y(y) = 0 if y < 0. Next we have

F_Y(y) = \int_0^y (ξ/25) dξ = (1/25)(ξ²/2) |_0^y = y²/50 for 0 < y < 5 .

Next if 5 < y < 10 we have

F_Y(y) = 1/2 + \int_5^y (2/5 − ξ/25) dξ = 1/2 + (2/5)(y − 5) − (1/50)(y² − 25) = (2/5) y − y²/50 − 1 .

As a sanity check on our work we find F_Y(10) = 4 − 2 − 1 = 1 as it should.

Part (b): If 0 < p < 0.5 then we have to solve

y²/50 = p ,

for y. Solving this we get y = 5√(2p). If 0.5 < p < 1.0 then again we need to solve F_Y(y) = p which in this case is given by

(2/5) y − y²/50 − 1 = p .

Solving this with the quadratic equation (and taking the root with y ≤ 10) we get

y = 10 − 5√(2(1 − p)) .
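A quick check of the two quantile branches against the cdf (an R sketch):

F = function(y){ ifelse( y < 5, y^2 / 50, ( 2 / 5 ) * y - y^2 / 50 - 1 ) }
F( 5 * sqrt( 2 * 0.3 ) ) # gives 0.3
F( 10 - 5 * sqrt( 2 * ( 1 - 0.7 ) ) ) # gives 0.7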
Exercise 4.21
Since the area of a circle is πr², the expected area is given by

E(πr²) = π E(r²) = π \int_9^{11} r² (3/4)(1 − (10 − r)²) dr = (3π/4)(668/5) = 501π/5 = 314.7876 ,

when we perform the needed integration.
Exercise 4.22
Part (a): For this we have

F(x) = \int_1^x 2(1 − 1/ξ²) dξ = 2(x − 1) − 2 \int_1^x ξ^{−2} dξ = 2(x − 1) − 2(1 − 1/x) = 2x + 2/x − 4 .

Let's check this expression for F(x) at a few special points. We have

F(1) = 2 + 2 − 4 = 0
F(2) = 4 + 2/2 − 4 = 1 ,

both of which must be true.
Part (b): For this part of the problem we want to find x_p such that F(x_p) = p. From the above expression for F(x) this means that

2x_p + 2/x_p − 4 = p ,

or

x_p² − (2 + p/2) x_p + 1 = 0 .

When we solve this with the quadratic formula we get

x_p = 1 + p/4 ± √(p/2 + p²/16) .

We need to determine which of the two signs in the above formula to use. Let's check a few “easy cases” and see if they will tell us this information. If p = 0 then from the above we have x₀ = 1, which does not tell us the information we wanted. If p = 1 then the above formula gives

x₁ = 1 + 1/4 ± √(1/2 + 1/16) = 5/4 ± 3/4 .

We must take the plus sign so that x₁ = 2; otherwise x₁ = 1/2 < 1, which is not possible. To find the median µ̃ we take p = 1/2 in the above to compute

µ̃ = 1 + 1/8 + √(1/4 + 1/64) = 9/8 + √17/8 = 1.640388 .
Part (c): We compute these expectations as

E(X) = \int_1^2 x · 2(1 − 1/x²) dx = 2 \int_1^2 x dx − 2 \int_1^2 dx/x = x² |_1^2 − 2 ln(x) |_1^2 = (4 − 1) − 2 ln(2) = 3 − 2 ln(2) = 1.613706 .

Next, to compute Var(X) we need E(X²), which we compute as

E(X²) = \int_1^2 x² · 2(1 − 1/x²) dx = 2 \int_1^2 x² dx − 2 \int_1^2 dx = (2/3) x³ |_1^2 − 2x |_1^2 = (2/3)(8 − 1) − 2(2 − 1) = 8/3 .
Thus using these two we have

Var(X) = E(X²) − E(X)² = 8/3 − (3 − 2 ln(2))² = 0.06262078 .

Part (d): Here h(X) = max(1.5 − X, 0) is the amount left in stock at the end of the week, so the expected amount left is

E(h(X)) = \int_1^{1.5} (1.5 − x) · 2(1 − 1/x²) dx = 2 (1.5x − x²/2 + 1.5/x + ln(x)) |_1^{1.5} = 0.0609 ,

so on average very little (about 0.061) is left at the end of the week.
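This expectation can be confirmed numerically in R (a sketch):

h_f = function(x){ pmax( 1.5 - x, 0 ) * 2 * ( 1 - 1 / x^2 ) }
integrate( h_f, 1, 2 )$value # gives 0.06093023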
Exercise 4.23
When we have F = 1.8C + 32 then
E(F ) = 1.8E(C) + 32 = 1.8(120) + 32 = 248.0 .
Var(F) = 1.8² Var(C) = 1.8²(2²) = 12.96 .
Thus

SD(F) = √12.96 = 3.6 .
Exercise 4.24
Part (a): The expectation of X can be computed as

E(X) = \int_θ^∞ x (k θ^k / x^{k+1}) dx = k θ^k \int_θ^∞ x^{−k} dx = k θ^k x^{−k+1}/(−k + 1) |_θ^∞ = −(k θ^k/(k − 1))(0 − 1/θ^{k−1}) = k θ/(k − 1) ,

as long as −k + 1 < 0, or k > 1, so that we can evaluate the integral in the limit as x → ∞.

Part (b): If k = 1 this expectation is undefined.

Part (c): To compute the variance we need to compute E(X²). We can compute this using

E(X²) = \int_θ^∞ x² (k θ^k / x^{k+1}) dx = k θ^k \int_θ^∞ x^{1−k} dx = k θ^k x^{2−k}/(2 − k) |_θ^∞ = k θ²/(k − 2) .

Using this we then have

Var(X) = k θ²/(k − 2) − k² θ²/(k − 1)² = k θ² (1/(k − 2) − k/(k − 1)²) = k θ² / ((k − 2)(k − 1)²) ,

the expression we were trying to show.

Part (d): If k = 2 then the variance does not exist.

Part (e): If we try to compute E(Xⁿ) then we need to evaluate

E(Xⁿ) = \int_θ^∞ xⁿ (k θ^k / x^{k+1}) dx = k θ^k \int_θ^∞ x^{n−k−1} dx .

This integral will only converge if the power on x is sufficiently negative; specifically, we need to have n − k − 1 < −1 for convergence. This is equivalent to k > n. This condition is consistent with what we found in the parts above.
Exercise 4.25
Part (a): The distribution function for Y is given by

F_Y(y) = P{Y ≤ y} = P{1.8X + 32 ≤ y} = P{X ≤ (y − 32)/1.8} = F_X((y − 32)/1.8) .

Now µ̃_Y is the number such that F_Y(µ̃_Y) = 0.5. This means that

F_X((µ̃_Y − 32)/1.8) = 0.5 .

From the distribution of X this means that

(µ̃_Y − 32)/1.8 = µ̃_X ,

so solving for µ̃_Y we get

µ̃_Y = 32 + 1.8 µ̃_X .

Part (b): To find y_p we need to solve

F_Y(y_p) = F_X((y_p − 32)/1.8) = p .

Using the distribution of X this means that

(y_p − 32)/1.8 = x_p ,

or

y_p = 1.8 x_p + 32 .

Part (c): If x_p is the (100p)th percentile of the X distribution, then y_p = a x_p + b is the (100p)th percentile of the distribution of Y = aX + b.
Exercise 4.28
Using R notation to evaluate all of these we have
Part (a):
P (0 ≤ Z ≤ 2.17) = pnorm(2.17) − pnorm(0) = 0.4849966 .
Part (b):
P (0 ≤ Z ≤ 1) = pnorm(1) − pnorm(0) = 0.3413447 .
Part (c):
pnorm(0) − pnorm(−2.5) = 0.4937903 .
Part (d):
pnorm(2.5) − pnorm(−2.5) = 0.9875807 .
Part (e):
pnorm(1.37) = 0.9146565 .
Part (f):
P (−1.75 ≤ Z) = 1 − P (Z < −1.75) = 1 − pnorm(−1.75) = 0.9599408 .
Part (g):
pnorm(2.0) − pnorm(−1.5) = 0.9104427 .
Part (h):
pnorm(2.5) − pnorm(1.37) = 0.07913379 .
Part (i):
P (1.5 ≤ Z) = 1 − P (Z < 1.5) = 1 − pnorm(1.5) = 0.0668072 .
Part (j):
P (|Z| ≤ 2.5) = P (−2.5 < Z < +2.5) = pnorm(2.5) − pnorm(−2.5) = 0.9875807 .
Exercise 4.29
Using R notation/functions we can answer these questions
Part (a):
qnorm(0.9838) = 2.139441 .
Part (b):
pnorm(c) − pnorm(0) = 0.291 ,
so
pnorm(c) = 0.291 + pnorm(0) = 0.791 so c = qnorm(0.791) = 0.8098959 .
Part (c):
P (c ≤ Z) = 0.121 or 1 − P (Z ≤ c) = 0.121 or P (Z ≤ c) = 0.879 .
Thus c = qnorm(0.879) = 1.170002.
Part (d):
P (−c ≤ Z ≤ c) = 1 − 2Φ(−c) = 0.668 so Φ(−c) = 0.166 ,
so c = −qnorm(0.166) = 0.9700933.
Part (e):
P (c ≤ |Z|) = 0.016 or 1 − P (|Z| ≤ c) = 0.016 or P (|Z| ≤ c) = 0.984 .
Then following the same steps as in Part (d) we find c = 2.408916.
Exercise 4.30
Using R notation for these we have
qnorm( c(0.91, 0.09, 0.75, 0.25, 0.06) )
[1] 1.3407550 -1.3407550 0.6744898 -0.6744898 -1.5547736
Exercise 4.31
Using R notation we have
qnorm( c( 1-0.0055, 1-0.09, 1-0.663 ) )
[1] 2.5426988 1.3407550 -0.4206646
Exercise 4.32
In R notation these are given by
• pnorm((100 − 80)/10) = 0.9772499.
• pnorm((80 − 80)/10) = 0.5.
• pnorm((100 − 80)/10) − pnorm((65 − 80)/10) = 0.9104427.
• 1 − pnorm((70 − 80)/10) = 0.8413447.
• pnorm((95 − 80)/10) − pnorm((85 − 80)/10) = 0.2417303.
• pnorm((90 − 80)/10) − pnorm((70 − 80)/10) = 0.6826895.
Exercise 4.33
In R notation these are given by

Part (a): P(X < 18) = pnorm((18 − 15)/1.25) = 0.9918025.

Part (b): pnorm((12 − 15)/1.25) − pnorm((10 − 15)/1.25) = 0.008165865.
Part (c): This would be
P (|X − 15| ≤ 1.5(1.25)) = P (15 − 1.875 < X < 15 + 1.875) = P (13.125 < X < 16.875)
= pnorm((16.875 − 15)/1.25) − pnorm((13.125 − 15)/1.25)
= 0.8663856 .
Exercise 4.34
In R notation these are given by
Part (a): P (X > 0.25) = 1 − P (X < 0.25) = 1 − pnorm((0.25 − 0.3)/0.06) = 0.7976716.
Part (b): P (X < 0.1) = pnorm((0.1 − 0.3)/0.06) = 0.0004290603.
Part (c): We would want a value of t such that P (Z > t) = 0.05. This is equivalent to
P (Z < t) = 0.95. This has a solution t = qnorm(0.95) = 1.644854. In terms of X this means
that the largest 5% of concentration values will satisfy
(X − 0.3)/0.06 > 1.644854, i.e. X > 0.3986912 .
Exercise 4.35
Part (a): These two wordings are the same and are computed as P (X > 10) = 1 − P (X <
10) = 1 − pnorm((10 − 8.8)/2.8) = 0.3341176.
Part (b): P(X > 20) = 1 − P(X < 20) = 1 − pnorm((20 − 8.8)/2.8) = 3.167124 × 10⁻⁵.
Part (c): This would be P (5 < X < 10) = pnorm((10 − 8.8)/2.8) − pnorm((5 − 8.8)/2.8) =
0.5785145.
Part (d): We want the value of c such that

P(8.8 − c < X < 8.8 + c) = 0.98 .

Converting to a standard normal variable Z this is equivalent to

P((8.8 − c − 8.8)/2.8 < Z < (8.8 + c − 8.8)/2.8) = 0.98 or
P(−c/2.8 < Z < +c/2.8) = 0.98 or
1 − 2Φ(−c/2.8) = 0.98 or
Φ(−c/2.8) = 0.01 .

This last equation has the solution

−c/2.8 = qnorm(0.01) = −2.326348 so c = 6.513774 .
Part (e): From Part (a) we have P (X > 10) = 0.3341176. The event that at least one tree
has a diameter exceeding 10 is the complement of the event that none of the selected trees
has a diameter that large. Thus the probability we seek is
1 − (1 − P (X > 10))4 = 1 − (1 − 0.3341176)4 = 0.803397 .
Exercise 4.36
Part (a): For these we would have

P(X < 1500) = Φ((1500 − 1050)/150) = 0.9986501 and
P(X > 1000) = 1 − P(X < 1000) = 1 − Φ((1000 − 1050)/150) = 0.6305587 .

Part (b): This is given by

P(1000 < X < 1500) = Φ((1500 − 1050)/150) − Φ((1000 − 1050)/150) = 0.6292088 .

Part (c): For this part we want to find a t such that P(X < t) = 0.02 or

P((X − 1050)/150 < (t − 1050)/150) = 0.02 .

In R notation this becomes

(t − 1050)/150 = qnorm(0.02) .

Solving for t gives t = 741.9377.

Part (d): Now let p be defined as

p = P(X > 1500) = 1 − P(X < 1500) = 0.001349898 ,

using the result from Part (a). Then if N is the number of droplets (from five) that have a size greater than 1500 µm we can compute P(N ≥ 1) as

P(N ≥ 1) = 1 − P(N = 0) = 1 − \binom{5}{0} p⁰ (1 − p)⁵ = 0.006731292 .
Exercise 4.37
Part (a): Now we have P(X = 105) = 0 since X is a continuous random variable. Next we compute

P(X < 105) = Φ((105 − 104)/5) = 0.5792597 .

The statement “X is at most 105” is the same condition as X ≤ 105 and the probability we seek is the same as the one above.

Part (b): We can compute this using

P(|X − µ| > σ) = P(|X − µ|/σ > 1) = 1 − P(|X − µ|/σ < 1) = 1 − (Φ(1) − Φ(−1)) = 0.3173105 .

Part (c): We would want to find the value of t such that

P(|X − µ|/σ > t) = 0.001 .

This means that t is the solution to t = −qnorm(0.001/2) = 3.290527. Once we have this value of t the extreme values of X are given by X < µ − σt = 87.54737 and X > µ + σt = 120.45263.
Exercise 4.46
Part (a): This would be
P (67 < X < 75) = pnorm((75 − 70)/3) − pnorm((67 − 70)/3) = 0.7935544 .
Part (b): We want a value of c such that

P(70 − c < X < 70 + c) = Φ(c/3) − Φ(−c/3) = Φ(c/3) − (1 − Φ(c/3)) = 2Φ(c/3) − 1 ,

equals 0.95. Solving for c in the above we get c = 5.879892.

Part (c): The number of acceptable specimens is a binomial random variable which has an expectation of np = 10(0.95) = 9.5.

Part (d): Following the hint we have p = P(X < 73.84) = pnorm((73.84 − 70)/3) = 0.8997274 and we want to evaluate

P(Y ≤ 8) = \sum_{y=0}^{8} dbinom(y, 10, 0.8997274) = 0.2649573 .
Exercise 4.47
We want the value of c such that P(X + 1 < c) = 0.99 or P(X < c − 1) = 0.99. This means that

P(Z < (c − 1 − 12)/3.5) = 0.99 ,

so

(c − 13)/3.5 = qnorm(0.99) = 2.326348 ,

and c = 21.14222.
Exercise 4.48
To solve these we will use

Φ(−c) = 1 − Φ(c) .   (14)
Part (a): We have
P (−1.72 < Z < −0.55) = Φ(−0.55)−Φ(−1.72) = (1−Φ(0.55))−(1−Φ(1.72)) = 0.2484435 .
Part (b): We have
P (−1.75 < Z < 0.55) = Φ(0.55) − Φ(−1.75) = Φ(0.55) − (1 − Φ(1.75)) = 0.6687812 .
Exercise 4.98
Part (a): P(10 ≤ X ≤ 20) = \int_{10}^{20} (1/25) dξ = (20 − 10)/25 = 2/5.

Part (b): P(X ≥ 10) = \int_{10}^{25} (1/25) dξ = (25 − 10)/25 = 3/5.

Part (c): This would be

F(x) = \int_0^x (1/25) dξ = x/25 for 0 ≤ x ≤ 25 ,

with F(x) = 0 for x < 0 and F(x) = 1 for x > 25.

Part (d): Using the formulas for the mean and variance of a uniform distribution we have

E(X) = (25 + 0)/2 = 12.5
Var(X) = (25 − 0)²/12 = 52.08333 ,

so σ_X = 7.216878.
        Y = 0            Y = 1            Y = 2             Y = 3             Y = 4
X = 0   0.6(0.5) = 0.3   0.1(0.5) = 0.05  0.1(0.5) = 0.05   0.1(0.5) = 0.05   0.1(0.5) = 0.05
X = 1   0.6(0.3) = 0.18  0.1(0.3) = 0.03  0.05(0.3) = 0.015 0.05(0.3) = 0.015 0.2(0.3) = 0.06
X = 2   0.6(0.2) = 0.12  0.1(0.2) = 0.02  0.05(0.2) = 0.01  0.05(0.2) = 0.01  0.2(0.2) = 0.04

Table 1: The joint probability distribution requested in Exercise 5.2 Part (b).
Joint Probability Distributions and Random Sampling
Problem Solutions
Exercise 5.1
Part (a): 0.02
Part (b): This would be 0.1 + 0.04 + 0.08 + 0.2 = 0.42.
Part (c): The combined condition {X ≠ 0 and Y ≠ 0} is the event that there is a person at each pump. We can then compute

P{X ≠ 0 and Y ≠ 0} = 0.2 + 0.06 + 0.14 + 0.3 = 0.7 .
Part (d): For pX (x) we would compute (for the values of x ∈ {0, 1, 2})
0.16 , 0.34 , 0.5 .
For pY (y) we would compute (for the values of y ∈ {0, 1, 2})
0.24 , 0.38 , 0.38 .
Using these we can compute
P (X ≤ 1) = pX (0) + pX (1) = 0.16 + 0.34 = 0.5 .
Part (e): To be independent we would need to check if p(x, y) = pX (x)pY (y) for all x and
y. Consider x = 0 and y = 0 then from the table we have p(0, 0) = 0.1. Does this equal
pX (0)pY (0) = 0.16(0.24) = 0.0384 .
As these two numbers are not equal we have that X and Y are not independent.
Exercise 5.2
Part (a): See Table 1 for the requested table.
x₁         0     1     2     3     4
pX₁(x₁)  0.19   0.3  0.25  0.14  0.12

Table 2: The marginal probability distribution pX₁(x₁) requested in Problem 4 Part (a).
Part (b): From the table above we have
P (X ≤ 1 and Y ≤ 1) = 0.3 + 0.05 + 0.18 + 0.03 = 0.56 ,
while at the same time we have
P (X ≤ 1)P (Y ≤ 1) = (0.5 + 0.3)(0.6 + 0.1) = 0.56 ,
which are the same as they should be.
Part (c): P (X + Y = 0) = P (X = 0 and Y = 0) = 0.3.
Part (d): We have
P (X + Y ≤ 1) = P (X = 0 and Y = 0) + P (X = 0 and Y = 1) + P (X = 1 and Y = 0)
= 0.3 + 0.05 + 0.18 = 0.53 .
Exercise 5.3
Part (a): 0.15.
Part (b): P (X1 = X2 ) = 0.08 + 0.15 + 0.1 + 0.07 = 0.4.
Part (c): This would be
P (A) = P {X1 − X2 ≥ 2 or X2 − X1 ≥ 2} = P {|X1 − X2 | ≥ 2}
= P (0, 2) + P (0, 3) + P (1, 3) + P (2, 0) + P (3, 1) + P (3, 0) + P (4, 2) + P (4, 1) + P (4, 0)
= 0.04 + 0.00 + 0.04 + 0.05 + 0.03 + 0.00 + 0.05 + 0.01 + 0.00 = 0.22 .
Part (d): The first part would be
P {X1 + X2 = 4} = P (1, 3) + P (2, 2) + P (3, 1) + P (4, 0) = 0.04 + 0.1 + 0.03 + 0.00 = 0.17 .
Exercise 5.4
Part (a): See Table 2 for a tabular representation of pX1 (x1 ).
Using that table we can compute
E[X1 ] = 0(0.19) + 1(0.3) + 2(0.25) + 3(0.14) + 4(0.12) = 1.7 .
x₂         0     1     2     3
pX₂(x₂)  0.19   0.3  0.28  0.23

Table 3: The marginal probability distribution pX₂(x₂) requested in Problem 4 Part (b).
x        0    1    2    3     4
pX(x)  0.1  0.2  0.3  0.25  0.15

Table 4: The marginal probability distribution pX(x) requested in Problem 5 Part (a).
Part (b): See Table 3 for a tabular representation of pX2 (x2 ).
Part (c): We have P(X₁ = 4, X₂ = 0) = 0 while the product P(X₁ = 4)P(X₂ = 0) = 0.12(0.19) = 0.0228 ≠ 0, and thus the random variables X₁ and X₂ are not independent.
Exercise 5.5
Recall that X is the number of customers waiting in line and each customer can have 1,2, or
3 packages each with probabilities 0.6, 0.3, 0.1. The random variable Y is the total number
of packages. To help solving this problem see Table 4 for a tabular representation of pX (x),
and Table 5 for a tabular representation of pY (y).
Part (a):
P (X = 3, Y = 3) = P (Y = 3|X = 3)P (X = 3)
= P (probability each customer has only one package to be wrapped)P (X = 3)
= 0.63 (0.25) = 0.054 .
Part (b): We want to compute P (X = 4, Y = 11) which equals the probability all but one
customer in line has three packages to be wrapped and the one customer with less than three
packages to be wrapped has two packages to be wrapped. This probability is given by
P(Y = 11|X = 4)P(X = 4) = \binom{4}{1} (0.1)³ (0.3)(0.15) = 0.00018 .
y        1    2    3
pY(y)  0.6  0.3  0.1

Table 5: The marginal probability distribution pY(y) requested in Problem 5 Part (a).
Exercise 5.6
Part (a): Using the hint we have

P(X = 4, Y = 2) = P(Y = 2|X = 4)P(X = 4) = \binom{4}{2} (0.6)² (0.4)² (0.15) = 0.05184 .

Part (b): We have

P(X = Y) = \sum_{k=0}^{4} P(Y = k|X = k)P(X = k)
         = 0.1 + 0.2(0.6) + 0.3(0.6)² + 0.25(0.6)³ + 0.15(0.6)⁴ = 0.40144 .

Part (c): We have

P(X = n, Y = m) = P(Y = m|X = n)P(X = n) = \binom{n}{m} 0.6^m 0.4^{n−m} P(X = n) .

To compute this as a table we would let n ∈ {0, 1, 2, 3, 4} and 0 ≤ m ≤ n and evaluate the above expression. The marginal probability mass function for Y can be computed once we have evaluated P(X = n, Y = m) above. For pY(m) we would compute

pY(m) = \sum_{n=m}^{4} P(X = n, Y = m) for m = 0, 1, 2, 3, 4 .
Exercise 5.7
Recall that X is the number of cars and Y is the number of buses at the proposed left-turn
lane.
Part (a): From the given table we have P (X = 1, Y = 1) = 0.03.
Part (b): We have P (X ≤ 1, Y ≤ 1) = 0.025 + 0.015 + 0.05 + 0.03 = 0.12.
Part (c): We have for one car
P (X = 1) =
2
X
P (X = 1, Y = y) = 0.05 + 0.03 + 0.02 = 0.1 ,
y=0
and for one bus
P (Y = 1) =
5
X
P (X = x, Y = 1) = 0.015 + 0.03 + 0.075 + 0.09 + 0.06 + 0.03 = 0.3 .
x=0
Note this is the sum of the numbers in the column with Y = 1.
Part (d): Summing over the cases where would be overflow we would have
P (capacity is exceeded) =
5
X
P (X = x, Y = 1) +
x=3
5
X
P (X = x, Y = 2)
x=0
= 0.09 + 0.06 + 0.03 + 0.01 + 0.02 + 0.05 + 0.06 + 0.04 + 0.02 = 0.38 .
Part (e): From the given table we have PX (x) the sum of the columns and PY (y) the sum
of the rows. To be independent we need to check if PX,Y (x, y) = PX (x)PY (y) for all x and
y. Considering one specific case we have
0.03 = P (X = 1, Y = 2) ,
while
PX (X = 1)PY (Y = 2) = (0.05 + 0.03 + 0.02)(0.015 + 0.03 + 0.075 + 0.09 + 0.06 + 0.03) = 0.03 .
Note that these two results are equal. To fully show independence one would need to verify
this calculation for all x and y.
Exercise 5.8
Part (a): We have

p(3, 2) = \binom{8}{3} \binom{10}{2} \binom{12}{1} / \binom{30}{6} .

Part (b): For general x and y we would have

p(x, y) = \binom{8}{x} \binom{10}{y} \binom{12}{6−x−y} / \binom{30}{6} ,

for 0 ≤ x ≤ 6 and 0 ≤ y ≤ 6. Note that we must have x + y + z = 6 where z is the number of components selected from the third supplier.
Exercise 5.9
Part (a): We must select K to ensure that

\int_{20}^{30} \int_{20}^{30} f(x, y) dx dy = 1 .
The left-hand-side of this can be evaluated as follows

\int_{20}^{30} \int_{20}^{30} f(x, y) dx dy = K \int_{20}^{30} (x³/3 + x y²) |_{x=20}^{30} dy
                                            = K \int_{20}^{30} ((1/3)(30³ − 20³) + 10 y²) dy
                                            = K (19000(10)/3 + 10(19000)/3) = (20K/3)(19000) = 1 .

So solving for K we get K = 3/380000 = 7.894737 × 10⁻⁶.

Part (b): Using the above value of K we find

P(20 ≤ X ≤ 26, 20 ≤ Y ≤ 26) = K \int_{20}^{26} \int_{20}^{26} (x² + y²) dx dy = 38304 K = 0.3024 .
Part (c): This would be

P(|Y − X| ≤ 2) = 1 − P(|Y − X| ≥ 2)
              = 1 − P(Y − X ≥ 2 or Y − X ≤ −2) = 1 − P(Y ≥ X + 2 or Y ≤ X − 2)
              = 1 − P(Y ≥ X + 2) − P(Y ≤ X − 2)
              = 1 − \int_{x=20}^{28} \int_{y=x+2}^{30} K(x² + y²) dy dx − \int_{x=22}^{30} \int_{y=20}^{x−2} K(x² + y²) dy dx
              = 1 − 40576K − 40576K = 1 − 81152K = 1 − 0.6406737 = 0.3593263 .

One would need to evaluate the given integrals.
Part (d): We would need to compute

pX(x) = \int_{20}^{30} K(x² + y²) dy = K (x² y + y³/3) |_{y=20}^{30} = K (10x² + (30³ − 20³)/3) = K (10x² + 19000/3) ,   (15)

for 20 ≤ x ≤ 30. Note that as X and Y are symmetric the above functional form is also the functional form for the marginal distribution of the left tire, i.e. pY(y).

Part (e): As the functional expression for the joint density f(x, y) = K(x² + y²) does not factor into a function of x alone multiplied by a function of y alone, we conclude that the random variables X and Y are not independent.
Exercise 5.10
Part (a): We would have

PX,Y(x, y) = 1 when 5 ≤ x ≤ 6 and 5 ≤ y ≤ 6 ,

and zero otherwise.

Part (b): This would be calculated as

P(5.25 ≤ X ≤ 5.75 and 5.25 ≤ Y ≤ 5.75) = 0.5² = 0.25 .
Part (c): Following the hint we would need to evaluate

P(|X − Y| ≤ 1/6) = 1 − 2P(5 + 1/6 ≤ X ≤ 6 and 5 ≤ Y ≤ X − 1/6)
                = 1 − 2 (area of the lower right triangle in X-Y space)
                = 1 − 2 (1/2)(1 − 1/6)² = 1 − (5/6)² = 1 − 25/36 = 11/36 .

Note that since our probability density is constant, in evaluating the above probability we can use the geometric view of the integration region (i.e. that it is a square region with two triangles removed).
Exercise 5.11
Part (a): By independence we would have

PX,Y(x, y) = (λ^x e^{−λ}/x!)(θ^y e^{−θ}/y!) ,

for x ≥ 0 and y ≥ 0.

Part (b): This would be

P(at most one error) = P(X = 0, Y = 0) + P(X = 0, Y = 1) + P(X = 1, Y = 0) = e^{−λ−θ} + e^{−λ−θ}(θ + λ) .

Part (c): For the event A defined in the problem we have

P(A) = P{(X, Y) : X + Y = m} = \sum_{x=0}^{m} P(X = x, Y = m − x) = e^{−λ−θ} \sum_{x=0}^{m} λ^x θ^{m−x} / (x!(m − x)!) .

Note that we can write

1/(x!(m − x)!) = (1/m!) \binom{m}{x} ,

so that the above expression for P(A) becomes

(e^{−λ−θ}/m!) \sum_{x=0}^{m} \binom{m}{x} λ^x θ^{m−x} = (e^{−λ−θ}/m!) (λ + θ)^m ,

using the binomial theorem. Notice that this is the pmf of a Poisson random variable with parameter λ + θ. Let's check that this result gives the same as we obtained in Part (b) of this problem. In that part the probability we want to compute is

P{m = 0} + P{m = 1} = e^{−λ−θ} + e^{−λ−θ}(λ + θ) ,

which is the same expression that we got earlier.
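The Poisson(λ + θ) conclusion is easy to spot-check numerically (an R sketch with arbitrary λ = 2, θ = 3):

lam = 2; th = 3; m = 4
sum( dpois( 0:m, lam ) * dpois( m - ( 0:m ), th ) ) # the convolution at m
dpois( m, lam + th ) # the same value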
Exercise 5.12
Part (a): To solve this we will first compute the pdf of X. To do this we have

fX(x) = \int_0^∞ f(x, y) dy = \int_0^∞ x e^{−x(1+y)} dy = x e^{−x} (e^{−xy}/(−x)) |_{y=0}^∞ = −e^{−x}(0 − 1) = e^{−x} ,

for x ≥ 0. From this we want to compute

P(X ≥ 3) = 1 − P(X < 3) = 1 − \int_0^3 e^{−x} dx = 1 + e^{−x} |_0^3 = 1 + (e^{−3} − 1) = e^{−3} .
Part (b): Note that we computed fX(x) above. Now to compute fY(y) we have to evaluate

fY(y) = \int_0^∞ x e^{−x(1+y)} dx = x e^{−x(1+y)}/(−(1 + y)) |_{x=0}^∞ + (1/(1 + y)) \int_0^∞ e^{−x(1+y)} dx
      = −(1/(1 + y))(0 − 0) + (1/(1 + y)) e^{−x(1+y)}/(−(1 + y)) |_0^∞ = −(1/(1 + y)²)(0 − 1) = 1/(1 + y)² .

Now for X and Y to be independent we would need to have fX,Y(x, y) = fX(x)fY(y), which is not true in this case. Thus X and Y are not independent.
Part (c): For this part we want to evaluate

P(X ≥ 3 or Y ≥ 3) = 1 − P(X ≤ 3 and Y ≤ 3)
                  = 1 − \int_{x=0}^{3} \int_{y=0}^{3} x e^{−x(1+y)} dy dx = 1 − \int_{x=0}^{3} x e^{−x} \int_{y=0}^{3} e^{−xy} dy dx
                  = 1 − \int_{x=0}^{3} x e^{−x} (e^{−xy}/(−x)) |_{y=0}^{3} dx = 1 + \int_{x=0}^{3} e^{−x} (e^{−3x} − 1) dx
                  = 1 + \int_{x=0}^{3} (e^{−4x} − e^{−x}) dx = 1 + (e^{−4x}/(−4) − e^{−x}/(−1)) |_0^3
                  = 1 − (1/4)(e^{−12} − 1) + (e^{−3} − 1) = 1/4 + e^{−3} − (1/4) e^{−12} .
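Numerically (an R sketch):

1 / 4 + exp( -3 ) - exp( -12 ) / 4 # gives 0.2997855
# cross-check with a crude Riemann sum over [0,3] x [0,3]
f = function(x, y){ x * exp( -x * ( 1 + y ) ) }
g = seq( 0, 3, length.out = 601 )
1 - mean( outer( g, g, f ) ) * 9 # close to the value above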
Exercise 5.13
Part (a): Since X and Y are independent random variables we have

f(x, y) = λe^{−λx} λe^{−λy} = e^{−x−y} ,

when we take λ = 1.
Part (b): For this we want to calculate

P(X ≤ 1 and Y ≤ 1) = \int_0^1 \int_0^1 λ² e^{−λx} e^{−λy} dx dy = λ² (e^{−λx}/(−λ)) |_0^1 (e^{−λy}/(−λ)) |_0^1
                   = (1 − e^{−λ})(1 − e^{−λ}) = (1 − e^{−1})(1 − e^{−1}) = 0.3995764 .
Part (c): Following the hint, if we draw the region A then the probability P that the total lifetime of the two bulbs is at most two is given by the following integral

P = \int_{x=0}^{2} \int_{y=0}^{2−x} λ² e^{−λ(x+y)} dy dx ,

which we can evaluate as follows

P = λ² \int_{x=0}^{2} e^{−λx} (e^{−λy}/(−λ)) |_0^{2−x} dx = −λ \int_{x=0}^{2} e^{−λx} (e^{−λ(2−x)} − 1) dx
  = −λ \int_{x=0}^{2} (e^{−2λ} − e^{−λx}) dx = −λ (2 e^{−2λ} − (e^{−λx}/(−λ)) |_0^2)
  = −λ (2 e^{−2λ} + (1/λ)(e^{−2λ} − 1)) = 1 − e^{−2λ} − 2λ e^{−2λ} = 0.5939942 ,

when we take λ = 1.
Part (d): If we draw this region in the X-Y plane then this probability can be expressed
as an integral of the joint probability density function. As such we would need to evaluate
the following
Z 1 Z 2−x
Z 2 Z 2−x
2 −λx −λy
P {1 ≤ X + Y ≤ 2} =
λ e e dydx +
λ2 e−λx e−λy dydx .
x=0
y=1−x
x=1
We do that with the following calculations
Z 1 Z 2−x
Z
2 −λx −λy
P {1 ≤ X + Y ≤ 2} =
λ e e dydx +
x=0
y=1−x
2
x=1
Z
y=0
2−x
λ2 e−λx e−λy dydx
y=0
2−x
2−x
Z 2
1 −λy 1 −λy 2
−λx
2
−λx
dx + λ
dx
e
=λ
− e e
e λ
−λ
x=1
x=0
1−x
0
Z 1
Z 2
−λx −λ(2−x)
−λ(1−x)
= −λ
e (e
−e
)dx − λ
e−λx (e−λ(2−x) − 1)dx
x=0
x=1
Z 1
Z 2
=λ
(e−λ − e−2λ )dx + λ
(e−λx − e−2λ )dx
x=0
x=1
!
−λx 2
e
− e−2λ
= λ(e−λ − e−2λ ) + λ
−λ 1
Z
1
= λ(e−λ − e−2λ ) − λe−2λ + (e−λ − e−2λ )
= λe−λ − 2λe−2λ + e−λ − e−2λ
= (λ + 1)e−λ − (2λ + 1)e−2λ = 0.329753 ,
137
when we take λ = 1.
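The answers for Parts (c) and (d) can be checked by numerical double integration. A small R sketch (with λ = 1 as in the text):

lambda = 1
psum = function(t) {   # P(X + Y <= t) for the joint density above
  integrate( function(x) sapply(x, function(xx)
    integrate( function(y) lambda^2 * exp(-lambda*(xx + y)), 0, t - xx )$value ),
    0, t )$value
}
psum(2)             # Part (c): 0.5939942
psum(2) - psum(1)   # Part (d): 0.3297531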
Exercise 5.14
Part (a): Following Example 5.11 in this section we have

f(x_1, x_2, . . . , x_10) = λ^{10} e^{−λ Σ_{i=1}^{10} x_i} ,

for x_i ≥ 0. Now we want to evaluate

P(X_1 ≤ t, X_2 ≤ t, . . . , X_10 ≤ t) = ∫_0^t ∫_0^t · · · ∫_0^t λ^{10} e^{−λ Σ_{i=1}^{10} x_i} dx_1 dx_2 · · · dx_10
  = λ^{10} Π_{i=1}^{10} [ e^{−λx_i}/(−λ) ]_0^t = Π_{i=1}^{10} (1 − e^{−λt}) = (1 − e^{−λt})^{10} .

Part (b): For this part we want exactly k relationships of the form X_i ≤ t and then 10 − k relationships of the form X_i ≥ t. Then the probability is given by

C(10, k) (1 − e^{−λt})^k (e^{−λt})^{10−k} ,

where the factor C(10, k) is needed since that is the number of ways we can select the k light bulbs that will fail.

Part (c): As in the previous part we still need to have five relationships of the form X_i ≤ t and five others of the form X_i ≥ t. To evaluate the total probability we can condition on whether the single bulb with parameter θ is in the set that fails before t or in the set that fails after t. The probability that it is in either set is 1/2. Thus we get

(1/2) C(9, 5) (1 − e^{−λt})^5 (e^{−λt})^4 e^{−θt} + (1/2) C(9, 4) (1 − e^{−λt})^4 (e^{−λt})^5 (1 − e^{−θt}) .
Exercise 5.15
Part (a): From the drawing (and the hint) we can write

F(y) = P(Y ≤ y) = P({X_1 ≤ y} ∪ ({X_2 ≤ y} ∩ {X_3 ≤ y}))
     = P(X_1 ≤ y) + P({X_2 ≤ y} ∩ {X_3 ≤ y}) − P({X_1 ≤ y} ∩ {X_2 ≤ y} ∩ {X_3 ≤ y})
     = (1 − e^{−λy}) + (1 − e^{−λy})^2 − (1 − e^{−λy})^3
     = 1 − 2e^{−2λy} + e^{−3λy} ,

when we expand and simplify. Let's check some properties of F(y) to verify that the above gives reasonable results. Note that F(0) = 1 − 2 + 1 = 0 and that lim_{y→∞} F(y) = 1, as a cumulative distribution function should.
Part (b): Using the expression for F(y) we find

f(y) = F′(y) = 4λe^{−2λy} − 3λe^{−3λy} .

For the expectation of Y we then find

E[Y] = ∫_0^∞ y (4λe^{−2λy} − 3λe^{−3λy}) dy
     = 4λ ( [ y e^{−2λy}/(−2λ) ]_0^∞ + (1/(2λ)) ∫_0^∞ e^{−2λy} dy ) − 3λ ( [ y e^{−3λy}/(−3λ) ]_0^∞ + (1/(3λ)) ∫_0^∞ e^{−3λy} dy )
     = 4λ ( 0 + (1/(2λ)) [ e^{−2λy}/(−2λ) ]_0^∞ ) − 3λ ( 0 + (1/(3λ)) [ e^{−3λy}/(−3λ) ]_0^∞ )
     = −(1/λ)(0 − 1) + (1/(3λ))(0 − 1) = 1/λ − 1/(3λ) = 2/(3λ) .
Exercise 5.16
Part (a): In Example 5.10 we are given f (x1 , x2 , x3 ) and in this part we want to compute
f (x1 , x3 ) by “integrating out” x2 . We can do that as follows
f(x_1, x_3) = ∫_{x_2=0}^{1−x_1−x_3} f(x_1, x_2, x_3) dx_2 = ∫_{x_2=0}^{1−x_1−x_3} k x_1 x_2 (1 − x_3) dx_2
            = k x_1 (1 − x_3) ∫_{x_2=0}^{1−x_1−x_3} x_2 dx_2 = k x_1 (1 − x_3) [ x_2^2/2 ]_0^{1−x_1−x_3}
            = k x_1 (1 − x_3) (1 − x_1 − x_3)^2 / 2 ,

for x_1 ≥ 0, x_3 ≥ 0, and x_1 + x_3 ≤ 1. Here as derived in Example 5.10 we have k = 144.
Part (b): We want P(X_1 + X_3 < 0.5), which requires x_3 < 0.5 − x_1. We calculate this as

P(X_1 + X_3 < 0.5) = ∫_{x_1=0}^{0.5} ∫_{x_3=0}^{0.5−x_1} (k x_1/2) (1 − x_3)(1 − x_1 − x_3)^2 dx_3 dx_1
                   = 72 ∫_{x_1=0}^{0.5} ∫_{x_3=0}^{0.5−x_1} x_1 (1 − x_3)((1 − x_1)^2 − 2(1 − x_1)x_3 + x_3^2) dx_3 dx_1 ,

which would need to be integrated.
Part (c): For this part we have

f_{X_1}(x_1) = ∫_{x_3=0}^{1−x_1} f(x_1, x_3) dx_3 = 72 ∫_{x_3=0}^{1−x_1} x_1 (1 − x_3)(1 − x_1 − x_3)^2 dx_3 = 6 x_1 (1 − x_1)^3 (x_1 + 3) ,

when we integrate.
Exercise 5.17
Part (a): Using the “area” representation of probability we have

π(R/2)^2 / (πR^2) = 1/4 = 0.25 .

Part (b): This would be

P(−R/2 ≤ X ≤ R/2 ∩ −R/2 ≤ Y ≤ R/2) = R^2/(πR^2) = 1/π = 0.3183099 > 0.25 .

Part (c): The inscribed square has diagonal 2R and therefore side 2R/√2, so the probability is then

(2R/√2)^2 / (πR^2) = 2R^2/(πR^2) = 2/π = 0.6366198 .
Part (d): The marginal pdfs of X and Y are given by

fX(x) = ∫_{y=−√(R^2−x^2)}^{+√(R^2−x^2)} (1/(πR^2)) dy = (2/(πR^2)) √(R^2 − x^2)

fY(y) = ∫_{x=−√(R^2−y^2)}^{+√(R^2−y^2)} (1/(πR^2)) dx = (2/(πR^2)) √(R^2 − y^2) .
To have X and Y independent would mean that fX,Y (x, y) = fX (x)fY (y). From the expressions for these pdf’s above we see that this is not true.
Exercise 5.18
Part (a): We have

pY|X(0|1) = pX,Y(1, 0)/pX(1) = 0.08/(0.08 + 0.2 + 0.06) = 0.08/0.34 = 0.2352941
pY|X(1|1) = pX,Y(1, 1)/pX(1) = 0.2/0.34 = 0.5882353
pY|X(2|1) = pX,Y(1, 2)/pX(1) = 0.06/0.34 = 0.1764706 .
Part (b): We are told that X = 2 and we want to evaluate pY|X(y|2). We have

pY|X(y|2) = pX,Y(2, y)/pX(2) = pX,Y(2, y)/(0.06 + 0.14 + 0.3) = pX,Y(2, y)/0.5 .

When we evaluate the above for y ∈ {0, 1, 2} we get the values

0.12 , 0.28 , 0.60 .
Part (c): This would be

P(Y ≤ 1|X = 2) = Σ_{y=0}^{1} pY|X(y|2) = 0.12 + 0.28 = 0.4 .
Part (d): We are told that Y = 2 and we want to evaluate pX|Y(x|2). We have

pX|Y(x|2) = pX,Y(x, 2)/pY(2) = pX,Y(x, 2)/(0.02 + 0.06 + 0.3) = pX,Y(x, 2)/0.38 .

When we evaluate the above for x ∈ {0, 1, 2} we get the values

0.05263158 , 0.15789474 , 0.78947368 .
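Since a conditional pmf is just a renormalized row or column of the joint table, these numbers are easy to reproduce in R. A sketch using only the entries quoted above:

p_x2_row = c( 0.06, 0.14, 0.30 )   # p(2, y) for y = 0, 1, 2
p_x2_row / sum( p_x2_row )         # Part (b): 0.12 0.28 0.60
p_y2_col = c( 0.02, 0.06, 0.30 )   # p(x, 2) for x = 0, 1, 2
p_y2_col / sum( p_y2_col )         # Part (d): 0.0526 0.1579 0.7895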
Exercise 5.19
Part (a): Using fX(x) and fY(y) from Exercise 5.9 on Page 133 we have

f(Y = y|X = x) = f(X = x, Y = y)/f(X = x) = K(x^2 + y^2) / ( K(10x^2 + 19000/3) ) = (x^2 + y^2)/(10x^2 + 19000/3) ,

and

f(X = x|Y = y) = f(X = x, Y = y)/f(Y = y) = K(x^2 + y^2) / ( K(10y^2 + 19000/3) ) = (x^2 + y^2)/(10y^2 + 19000/3) .
Part (b): We first evaluate

f(Y = y|X = 22) = (22^2 + y^2)/(10(22^2) + 19000/3) = (484 + y^2)/11173.33 .

Then we want to evaluate

P(Y ≥ 25|X = 22) = ∫_{25}^{30} f(Y = y|X = 22) dy = 0.555937 .

This is to be compared with P(Y ≥ 25), where we have no information on the value of X. We can evaluate this latter probability as

P(Y ≥ 25) = ∫_{y=25}^{30} ∫_{x=20}^{30} K(x^2 + y^2) dx dy = 0.5493418 ,

which is smaller than the value of P(Y ≥ 25|X = 22).
Part (c): In this case we want to compute

E[Y|X = 22] = ∫_{y=20}^{30} y f(Y = y|X = 22) dy = ∫_{y=20}^{30} y (484 + y^2)/11173.33 dy = 25.3729 .

To compute the standard deviation of the pressure in the left tire we first compute

E[Y^2|X = 22] = ∫_{y=20}^{30} y^2 (484 + y^2)/11173.33 dy = 652.029 ,

then using this we have

Var(Y|X = 22) = E[Y^2|X = 22] − E[Y|X = 22]^2 = 652.029 − 25.3729^2 = 8.244946 .

Thus the standard deviation is then the square root of that or 2.871401.
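These conditional moments can be checked by numerical integration. A small R sketch:

fc = function(y) (484 + y^2) / 11173.33              # f(y|X = 22) on 20 <= y <= 30
integrate( fc, 25, 30 )$value                        # P(Y >= 25|X = 22) = 0.5559
m1 = integrate( function(y) y   * fc(y), 20, 30 )$value
m2 = integrate( function(y) y^2 * fc(y), 20, 30 )$value
c( m1, sqrt(m2 - m1^2) )                             # 25.3729 and 2.8714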
Exercise 5.20
Part (a): When we evaluate the pmf of a multinomial distribution we have

(n!/(x_1! x_2! x_3! x_4! x_5! x_6!)) p_1^{x_1} p_2^{x_2} p_3^{x_3} p_4^{x_4} p_5^{x_5} p_6^{x_6}
  = (12!/(2!)^6) (0.24^2)(0.13^2)(0.16^2)(0.2^2)(0.13^2)(0.14^2) = 0.002471206 .
Part (b): In this case a success is getting an orange candy and a failure is drawing any other color. Thus we then would have

P(at most five orange candies) = Σ_{k=0}^{5} C(20, k) 0.2^k 0.8^{20−k} ,

since we get an orange candy with a probability 0.2 and then must get any other candy with a probability of 1 − 0.2 = 0.8.
Part (c): For this part we want to compute P(X_1 + X_3 + X_4 ≥ 10). Now the probability that we get a blue, a green, or an orange candy is given by the sum of their individual probabilities or 0.24 + 0.16 + 0.13 = 0.53, so the probability we don't get one of these colored candies is 1 − 0.53 = 0.47. We then can compute that

P(X_1 + X_3 + X_4 ≥ 10) = Σ_{k=10}^{20} C(20, k) 0.53^k 0.47^{20−k} .
Exercise 5.21
Part (a): This would be

p(X_3 = x_3 | X_1 = x_1, X_2 = x_2) = p(X_1 = x_1, X_2 = x_2, X_3 = x_3) / p(X_1 = x_1, X_2 = x_2) .

Part (b): This would be

p(X_2 = x_2, X_3 = x_3 | X_1 = x_1) = p(X_1 = x_1, X_2 = x_2, X_3 = x_3) / p(X_1 = x_1) .
Exercise 5.22
Part (a): We need to evaluate

E[X + Y] = Σ_{x∈{0,5,10}} Σ_{y∈{0,5,10,15}} (x + y) f(x, y) = 14.1 .

Part (b): We need to evaluate

E[max(X, Y)] = Σ_{x∈{0,5,10}} Σ_{y∈{0,5,10,15}} max(x, y) f(x, y) = 9.6 .
We have evaluated each of these using the R code:
P = matrix( data=c( 0.02, 0.06, 0.02, 0.1, 0.04, 0.15, 0.20, 0.1, 0.01, 0.15, 0.14, 0.01 ), nrow=3, ncol=4, byrow=T )
X = matrix( data=c( rep(0,4), rep(5,4), rep(10,4) ), nrow=3, ncol=4, byrow=T )
Y = matrix( data=c( rep(0,3), rep(5,3), rep(10,3), rep(15,3) ), nrow=3, ncol=4, byrow=F )
print( sum( ( X + Y ) * P ) )
M_X_Y = matrix( data=c( 0, 5, 10, 15, 5, 5, 10, 15, 10, 10, 10, 15 ), nrow=3, ncol=4, byrow=T )
print( sum( M_X_Y * P ) )
Exercise 5.23
We have

E[X_1 − X_2] = Σ_{x_1} Σ_{x_2} (x_1 − x_2) f(x_1, x_2) = 0.15 .
We have evaluated this using the R code:

P = matrix( data=c( 0.08, 0.07, 0.04, 0.00,
                    0.06, 0.15, 0.05, 0.04,
                    0.05, 0.04, 0.10, 0.06,
                    0.00, 0.03, 0.04, 0.07,
                    0.00, 0.01, 0.05, 0.06 ), nrow=5, ncol=4, byrow=T )
X_1 = matrix( data=c( rep(0,4), rep(1,4), rep(2,4), rep(3,4), rep(4,4) ), nrow=5, ncol=4, byrow=T )
X_2 = matrix( data=c( rep(0,5), rep(1,5), rep(2,5), rep(3,5) ), nrow=5, ncol=4 )
sum( ( X_1 - X_2 ) * P )
Exercise 5.24
Let D be the distance in seats between A and B. The number of seats separating the two individuals when sitting at locations X and Y is given in Table 6.

      y=1  y=2  y=3  y=4  y=5  y=6
x=1    -    0    1    2    1    0
x=2    0    -    0    1    2    1
x=3    1    0    -    0    1    2
x=4    2    1    0    -    0    1
x=5    1    2    1    0    -    0
x=6    0    1    2    1    0    -

Table 6: The number of seats between A and B when A sits at location X and B sits at location Y.

To count the number of people who handle the message between A and B we would need to add two to the numbers given in Table 6. The probability that A sits at x and that B sits at y is given by

p(x, y) = 1/(6 · 5) = 1/30 .

We want to evaluate E[h(X, Y)], where we find the value 1.6.
Exercise 5.25
The area of the rectangle is given by the product XY . The expected area is then given by
E[Area] = E[XY] = ∫_{x=L−A}^{L+A} ∫_{y=L−A}^{L+A} x y (1/(2A))^2 dy dx
        = (1/(4A^2)) [ x^2/2 ]_{L−A}^{L+A} [ y^2/2 ]_{L−A}^{L+A} = (1/(4A^2)) ( ((L+A)^2 − (L−A)^2)/2 )^2
        = (1/(16A^2)) ((L+A) − (L−A))^2 ((L+A) + (L−A))^2 = (1/(16A^2)) (2A)^2 (2L)^2 = (16A^2L^2)/(16A^2) = L^2 .
Exercise 5.26
Revenue for the ferry is given by the expression 3X + 10Y so that its expected value would
be given by
E(Revenue) = Σ_x Σ_y (3x + 10y) p(x, y) = 15.4 .
We can evaluate this using the following R code
P = matrix( data=c( 0.025, 0.015, 0.010,
                    0.050, 0.030, 0.020,
                    0.125, 0.075, 0.050,
                    0.150, 0.090, 0.060,
                    0.100, 0.060, 0.040,
                    0.050, 0.030, 0.020 ), nrow=6, ncol=3, byrow=T )
X = matrix( data=c( rep(0,3), rep(1,3), rep(2,3), rep(3,3), rep(4,3), rep(5,3) ),
            nrow=6, ncol=3, byrow=T )
Y = matrix( data=c( rep(0,6), rep(1,6), rep(2,6) ), nrow=6, ncol=3 )
sum( ( 3 * X + 10 * Y ) * P )
Exercise 5.27
We want to evaluate

E(h(X, Y)) = ∫∫ h(x, y) f(x, y) dx dy = ∫_{x=0}^1 ∫_{y=0}^1 |x − y| fX(x) fY(y) dy dx
           = ∫_{x=0}^1 ∫_{y=0}^1 |x − y| (3x^2)(2y) dy dx
           = 6 ∫_{x=0}^1 ∫_{y=0}^x (x − y) x^2 y dy dx + 6 ∫_{x=0}^1 ∫_{y=x}^1 (y − x) x^2 y dy dx
           = 6 ∫_{x=0}^1 [ x^3 y^2/2 − x^2 y^3/3 ]_{y=0}^{x} dx + 6 ∫_{x=0}^1 [ x^2 y^3/3 − x^3 y^2/2 ]_{y=x}^{1} dx
           = 1/4 ,

when we further integrate and simplify.
Exercise 5.28
Using independence we have

E(XY) = ∫∫ x y f(x, y) dx dy = ∫∫ x y fX(x) fY(y) dx dy = ( ∫ x fX(x) dx ) ( ∫ y fY(y) dy ) = E(X)E(Y) .

In Exercise 25 with independence we have

E(Area) = E(X)E(Y) = L^2 ,

the same result as we found there.
Exercise 5.29
For this exercise we want to evaluate ρ_{XY} = Cov(X, Y)/(σ_X σ_Y) for Example 5.16. In that example we calculated µ_X = µ_Y = 2/5 and Cov(X, Y) = −2/75. Thus to compute ρ_{XY} we need to evaluate σ_X and σ_Y. To do that we recall that

σ_X^2 = E(X^2) − µ_X^2 .

Thus we need

E(X^2) = ∫_0^1 x^2 (12x(1 − x)^2) dx = 12 ∫_0^1 x^3 (1 − 2x + x^2) dx = 12 ∫_0^1 (x^3 − 2x^4 + x^5) dx
       = 12 [ x^4/4 − 2x^5/5 + x^6/6 ]_0^1 = 12 ( 1/4 − 2/5 + 1/6 ) = 1/5 .

Since the densities for X and Y are the same the above is also equal to E(Y^2). Using this we have σ_X^2 = σ_Y^2 = 1/5 − 4/25 = 1/25 so σ_X = 1/5. We can now evaluate ρ_{XY} and find

ρ_{XY} = (−2/75)/(1/25) = −2/3 .
Exercise 5.30
For these two parts recall that

Cov(X, Y) = E(XY) − E(X)E(Y)   and   ρ = Cov(X, Y)/(σ_X σ_Y) .
We can compute all that is needed with the following R code

P = matrix( data=c( 0.02, 0.06, 0.02, 0.1, 0.04, 0.15, 0.20, 0.1, 0.01, 0.15, 0.14, 0.01 ), nrow=3, ncol=4, byrow=T )
X = matrix( data=c( rep(0,4), rep(5,4), rep(10,4) ), nrow=3, ncol=4, byrow=T )
Y = matrix( data=c( rep(0,3), rep(5,3), rep(10,3), rep(15,3) ), nrow=3, ncol=4, byrow=F )
E_X = sum( X * P )
E_Y = sum( Y * P )
E_XY = sum( X * Y * P )
print( E_XY - E_X * E_Y )
E_X2 = sum( X^2 * P )
E_Y2 = sum( Y^2 * P )
rho = ( E_XY - E_X * E_Y ) / sqrt( ( E_X2 - E_X^2 ) * ( E_Y2 - E_Y^2 ) )
print(rho)

Part (a-b): We find E(X) = 5.55, E(Y) = 8.55 (consistent with E[X + Y] = 14.1 in Exercise 5.22), E(XY) = 44.25, so Cov(X, Y) = 44.25 − 5.55(8.55) = −3.2025 and ρ = −0.2074.
Exercise 5.31
Part (a-b): Using the results from Exercise 9 on Page 133 we find

E(X) = E(Y) = 1925/76 = 25.32
E(X^2) = E(Y^2) = 37040/57 = 649.825
Var(X) = Var(Y) = 649.825 − 25.32^2 = 8.7226
E(XY) = 641.447
Cov(X, Y) = 641.447 − 25.32^2 = 0.3446
Corr(X, Y) = 0.3446/8.7226 = 0.03950657 .
Note that these numbers do not agree with the ones given in the back of the book. If anyone
sees a mistake in what I have done here please contact me.
Exercise 5.32
Using the results of Exercise 12 on Page 136 we can compute

E(XY) = ∫_{x=0}^∞ ∫_{y=0}^∞ x y (x e^{−x(1+y)}) dy dx = ∫_{x=0}^∞ x^2 e^{−x} ∫_{y=0}^∞ y e^{−xy} dy dx
      = ∫_{x=0}^∞ x^2 e^{−x} (1/x^2) dx = ∫_{x=0}^∞ e^{−x} dx = 1 ,
when we simplify some. From the densities of fX (x) and fY (y) we can compute E(X) = 1
and E(Y ) = 1 thus
Cov(X, Y ) = E(XY ) − E(X)E(Y ) = 1 − 1 = 0 .
We would then also have ρ = 0.
Exercise 5.33
We have
Cov(X, Y ) = E(XY ) − E(X)E(Y ) = E(X)E(Y ) − E(X)E(Y ) = 0 ,
when we use independence.
Exercise 5.34
Part (a): We would have

σ^2 = ∫∫ (h(x, y) − E[h(X, Y)])^2 f(x, y) dx dy = ∫∫ h(x, y)^2 f(x, y) dx dy − E[h(X, Y)]^2 .
Part (b): We compute (using the joint pmf from Exercise 5.22, where p(10, 15) = 0.01)

E[h] = 0(0.02) + 5(0.06) + 10(0.02) + 15(0.1)
     + 5(0.04) + 5(0.15) + 10(0.2) + 15(0.1)
     + 10(0.01) + 10(0.15) + 10(0.14) + 15(0.01) = 9.6 ,

and

E[h^2] = 0^2(0.02) + 5^2(0.06) + 10^2(0.02) + 15^2(0.1)
       + 5^2(0.04) + 5^2(0.15) + 10^2(0.2) + 15^2(0.1)
       + 10^2(0.01) + 10^2(0.15) + 10^2(0.14) + 15^2(0.01) = 105.5 .

Thus σ^2 = 105.5 − 9.6^2 = 13.34.
Exercise 5.35
Part (a): We can show the desired result with the following manipulations
Cov(aX + b, cY + d) = E((aX + b)(cY + d)) − E(aX + b)E(cY + d)
= E(acXY + adX + bcY + bd) − (aE(X) + b)(cE(Y ) + d)
= acE(XY ) + adE(X) + bcE(Y ) + bd
− (acE(X)E(Y ) + adE(X) + bcE(Y ) + bd)
= ac(E(XY ) − E(X)E(Y )) = acCov(X, Y ) .
Part (b): To begin recall that

Var(aX + b) = a^2 Var(X)   and   Var(cY + d) = c^2 Var(Y) .

Thus using this and the result from Part (a) we have

Corr(aX + b, cY + d) = Cov(aX + b, cY + d) / √( Var(aX + b) Var(cY + d) )
                     = ac Cov(X, Y) / √( a^2 Var(X) c^2 Var(Y) )
                     = ac Cov(X, Y) / ( |a||c| σ_X σ_Y ) = sign(a) sign(c) Corr(X, Y) .

Here the function sign(x) is one if x > 0, is zero if x = 0, and is minus one if x < 0. Thus if a and c have the same sign we have sign(a)sign(c) = 1 and we have the requested result.
Part (c): If a and c have opposite signs then sign(a)sign(c) = −1 and the correlation of the
linear combination is the negative of the correlation of the random variables X and Y .
Exercise 5.36
If Y = aX + b then from Exercise 35 above we have

Corr(X, Y) = Corr(X, aX + b) = sign(a) Corr(X, X) .

Notice that Corr(X, X) = 1 and thus Corr(X, Y) = ±1 depending on the sign of a. Specifically, if a > 0 the correlation of X and Y will be +1 while if a < 0 it will be −1.
Notes on Example 5.21
Since the two events {X̄ ≤ x̄} and {T_0 ≤ 2x̄} are equivalent, the cumulative distribution function for X̄ can be derived from that of T_0, namely F_X̄(x̄) = F_{T_0}(2x̄). This means we evaluate the function F_{T_0}(·) computed in Example 5.21 at the value 2x̄. We get

F_X̄(x̄) = F_{T_0}(2x̄) = 1 − e^{−2λx̄} − 2λx̄ e^{−2λx̄} .

With this we can get the density function for X̄ as

f_X̄(x̄) = dF_X̄(x̄)/dx̄ = 2λe^{−2λx̄} − 2λe^{−2λx̄} + 4λ^2 x̄ e^{−2λx̄} = 4λ^2 x̄ e^{−2λx̄} ,

the same as the book's equation 5.6.
Exercise 5.37
Part (a-b): See Table 7 where we present all possible two element samples we could draw from this population. With each sample of two we compute the statistics x̄ = (x_1 + x_2)/2 and

s^2 = (x_1 − x̄)^2 + (x_2 − x̄)^2 .

Note that since n = 2 the normalization factor of 1/(n−1) in the unbiased variance estimate becomes 1. Once we have this data we display the sampling distribution for X̄ in Table 8. Using that table we can compute

E(X̄) = 0.04(25) + 0.2(32.5) + 0.25(40) + 0.12(45) + 0.3(52.5) + 0.09(65) = 44.5 .

Notice that this equals the population mean µ given by

µ = 0.2(25) + 0.5(40) + 0.3(65) = 44.5 .
x_1   x_2   p(x_1, x_2)       x̄     s^2
25    25    0.2(0.2) = 0.04   25     0
25    40    0.2(0.5) = 0.10   32.5   112.5
25    65    0.2(0.3) = 0.06   45     800
40    25    0.5(0.2) = 0.10   32.5   112.5
40    40    0.5(0.5) = 0.25   40     0
40    65    0.5(0.3) = 0.15   52.5   312.5
65    25    0.3(0.2) = 0.06   45     800
65    40    0.3(0.5) = 0.15   52.5   312.5
65    65    0.3(0.3) = 0.09   65     0

Table 7: The possible two element samples we could draw from Exercise 37.
x̄          25     32.5   40     45     52.5   65
p_X̄(x̄)    0.04   0.2    0.25   0.12   0.30   0.09

Table 8: The sampling distribution of X̄ for Exercise 37.
The sampling distribution of S^2 is given in Table 9. From the sampling distribution of S^2 we find

E(S^2) = 0(0.38) + 112.5(0.2) + 312.5(0.3) + 800(0.12) = 212.25 .

Notice that this equals the population variance σ^2, which is given by

σ^2 = 0.2(25 − 44.5)^2 + 0.5(40 − 44.5)^2 + 0.3(65 − 44.5)^2 = 212.25 .
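The same sampling distribution can be enumerated directly in R. A sketch:

vals = c( 25, 40, 65 ); probs = c( 0.2, 0.5, 0.3 )
g = expand.grid( x1=vals, x2=vals )
p = probs[ match(g$x1, vals) ] * probs[ match(g$x2, vals) ]
xbar = ( g$x1 + g$x2 ) / 2
s2 = ( g$x1 - xbar )^2 + ( g$x2 - xbar )^2
c( sum( xbar * p ), sum( s2 * p ) )   # 44.5 and 212.25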
Exercise 5.38
Part (a): See Table 10 where all possible two element samples we could obtain from the given distribution are presented. Based on this result the probability distribution for T_0 is given in Table 11.
Part (b): We compute
µT0 = 0(0.04) + 1(0.2) + 2(0.37) + 3(0.3) + 4(0.09) = 2.2 .
Note that we are told the population mean µ is µ = 1.1. Note that µT0 = 2µ.
s^2            0      112.5   312.5   800
p_{S^2}(s^2)   0.38   0.2     0.30    0.12

Table 9: The sampling distribution of S^2 for Exercise 37.
x_1   x_2   p(x_1, x_2)       t_0
0     0     0.2(0.2) = 0.04   0
0     1     0.2(0.5) = 0.10   1
0     2     0.2(0.3) = 0.06   2
1     0     0.5(0.2) = 0.10   1
1     1     0.5(0.5) = 0.25   2
1     2     0.5(0.3) = 0.15   3
2     0     0.3(0.2) = 0.06   2
2     1     0.3(0.5) = 0.15   3
2     2     0.3(0.3) = 0.09   4

Table 10: The possible two element samples we could draw from Exercise 38.
t_0            0      1                 2                           3                   4
p_{T_0}(t_0)   0.04   0.1 + 0.1 = 0.2   0.06 + 0.25 + 0.06 = 0.37   0.15 + 0.15 = 0.3   0.09

Table 11: The sampling distribution of T_0 for Exercise 38.
Part (c): We compute

σ_{T_0}^2 = E(T_0^2) − E(T_0)^2 = 0^2(0.04) + 1^2(0.2) + 2^2(0.37) + 3^2(0.3) + 4^2(0.09) − 2.2^2 = 5.82 − 4.84 = 0.98 .

Note that σ_{T_0}^2 = 2(0.49) = 2σ^2.
Exercise 5.39
X is a binomial random variable with p = 0.8 and n = 10 representing the number of successes (a drive that works in a satisfactory manner). Now V ≡ X/n is a scaled binomial random variable so the probability of getting a certain value of V is equal to the corresponding binomial probability. We tabulate these values in Table 12. We generated the probabilities of each sample using the R code dbinom(0:10,10,0.8).
v = x/n   p_V(v)
0         C(10,0) 0.8^0 0.2^10 = 0.0000001024
0.1       C(10,1) 0.8^1 0.2^9  = 0.0000040960
0.2       C(10,2) 0.8^2 0.2^8  = 0.0000737280
0.3       0.0007864320
0.4       0.0055050240
0.5       0.0264241152
0.6       0.0880803840
0.7       0.2013265920
0.8       0.3019898880
0.9       0.2684354560
1.0       0.1073741824

Table 12: The sampling distribution of X/n for Exercise 39.

Exercise 5.40

Let the type of envelope opened be denoted by 0, 5, or 10 representing the dollar amount. We generate the samples for this problem using the python code ex5 40.py. When we run that code we get the following (partial) output.

v1=  0, v2=  0, v3=  0, prob= 0.125, max(v1,v2,v3)=  0
v1=  0, v2=  0, v3=  5, prob= 0.075, max(v1,v2,v3)=  5
v1=  0, v2=  0, v3= 10, prob= 0.050, max(v1,v2,v3)= 10
v1=  0, v2=  5, v3=  0, prob= 0.075, max(v1,v2,v3)=  5
v1=  0, v2=  5, v3=  5, prob= 0.045, max(v1,v2,v3)=  5
v1=  0, v2=  5, v3= 10, prob= 0.030, max(v1,v2,v3)= 10
Part (a): If we accumulate the probabilities of the various possible maximums we get the following

{0: 0.125, 10: 0.4880000000000001, 5: 0.387}

Thus the probability of getting a maximum of 5 is 0.387.

Part (b): The above python code could be modified to compare the values of M for different sample sizes. We would expect that as we draw more samples it is more likely that we get larger values for M and thus there should be more probability weight placed on the higher values of M.
Exercise 5.41
Part (a): We generate the samples for this problem using the python code ex5 41.py. When we run that code we get the following sampling distribution for X̄ (as a python defaultdict).

defaultdict(<type ’float’>,
{1.5: 0.24, 1.0: 0.16000000000000003,
2.0: 0.25, 3.0: 0.1, 4.0: 0.010000000000000002,
2.5: 0.2, 3.5: 0.04000000000000001})
Part (b): We compute 0.85.
Part (c): In the same python code we compute
defaultdict(<type ’float’>,
{0: 0.30000000000000004, 1: 0.4, 2: 0.22000000000000003, 3: 0.08000000000000002})
Part (d): The only samples where X̄ ≤ 1.5 would be “samples of the form”

(1, 1, 1, 1)   1 way
(2, 1, 1, 1)   4 ways
(3, 1, 1, 1)   4 ways
(2, 2, 1, 1)   C(4, 2) = 6 ways .

I've listed the number of samples that are like the expression on the left in each case. Now we get all ones with a probability of 0.4^4, etc. Thus the desired probability is given by

1(0.4^4) + 4(0.4^3)(0.3) + 4(0.4^3)(0.2) + 6(0.4^2)(0.3^2) = 0.24 .
Exercise 5.42
Part (a): We generate the samples for this problem using the python code ex5 42.py. When we run that code we get the following sampling distribution for X̄ given by
[(27.75, 0.06666666666666667),
(28.0, 0.03333333333333333),
(29.7, 0.03333333333333333),
(29.700000000000003, 0.06666666666666667),
(29.95, 0.06666666666666667),
(31.65, 0.13333333333333333),
(31.9, 0.06666666666666667),
(33.6, 0.03333333333333333)]
We find that E(X̄) = 15.21.
Part (b): For this part we select an office first and then average the two salaries in that office. There are only three choices we can make (which office to select) and thus the sampling distribution of X̄ under this sampling method puts mass on the three values

27.75 , 31.65 , 31.9 ,

each with a probability of 1/3. We find that in this case E(X̄) = 30.43.
Figure 4: The sampling distribution of X̄ for Exercise 44. The four panels show histograms (density scale) of colMeans(WD) for sample sizes n = 5, 10, 20, and 30.
Part (c): Notice that the population mean µ = 30.43.
Exercise 5.43
For b = 1, 2, . . . , B − 1, B we would draw n samples from our dispensing machine. For each of these samples we would compute the fourth spread. This fourth spread would be the statistic of interest and we would have B values of this statistic. These B values could be used to compute/display the sampling distribution of the fourth spread of a uniform random variable.
Exercise 5.44
See the R code ex5 44.R where we perform the requested simulation. When that code is run
we get the result given in Figure 4. The histogram with n = 30 looks the most approximately
normal.
Figure 5: The sampling distribution of X̄ for Exercise 45. The four panels show histograms (density scale) of colMeans(WD) for sample sizes n = 10, 20, 30, and 50.
Exercise 5.45
See the R code ex5 45.R where we perform the requested simulation. When that code is run
we get the result given in Figure 5. The histogram with n = 50 looks the most approximately
normal.
Exercise 5.46
Part (a): X̄ is centered on µ_X̄ = 12 cm and σ_X̄ = σ/√n = 0.04/4 = 0.01 cm.

Part (b): When n = 64 we have that X̄ is centered on µ_X̄ = 12 cm and σ_X̄ = σ/√n = 0.04/8 = 0.005 cm.

Part (c): When n = 64 the sample value of X̄ will tend to be closer to µ = 12 as σ_X̄ in that case is smaller.
Exercise 5.47
Part (a): We have

P(11.99 ≤ X̄ ≤ 12.01) = Φ((12.01 − 12)/0.01) − Φ((11.99 − 12)/0.01) = 0.6826895 .

Part (b): We have

P(X̄ ≥ 12.01) = 1 − P(X̄ ≤ 12.01) = 1 − Φ((12.01 − 12)/(0.04/√25)) = 0.1056498 .
Exercise 5.48
Part (a): Using the Central Limit Theorem (CLT) we have µ_X̄ = 50 and σ_X̄ = 1/√n = 1/10. Then we have

P(49.9 ≤ X̄ ≤ 50.1) = Φ((50.1 − 50)/(1/10)) − Φ((49.9 − 50)/(1/10)) = 0.6826895 .

Part (b): This would be

P(49.9 ≤ X̄ ≤ 50.1) = Φ((50.1 − 49.8)/(1/10)) − Φ((49.9 − 49.8)/(1/10)) = 0.1573054 .
Exercise 5.49
Part (a): Here n = 40 and the time to grade all the papers T_0 is the sum of n random variables, T_0 = X_1 + X_2 + · · · + X_{n−1} + X_n, so µ_{T_0} = nµ = 40(6) = 240 minutes and σ_{T_0} = √n σ = √40 (6) = 37.947 minutes. To finish grading by 11:00 P.M. we have to finish grading in 10 minutes plus four hours or 250 minutes. Thus we want to calculate

P(T_0 ≤ 250) = Φ((250 − 240)/(6√40)) = 0.6039263 .

Part (b): In this case we want to calculate

P(T_0 ≥ 260) = 1 − P(T_0 < 260) = 1 − Φ((260 − 240)/(6√40)) = 0.2990807 .
Exercise 5.50
Part (a): We have

P(9900 ≤ X̄ ≤ 10200) = P( (9900 − 10000)/(500/√40) ≤ Z ≤ (10200 − 10000)/(500/√40) )
                     = Φ((10200 − 10000)/(500/√40)) − Φ((9900 − 10000)/(500/√40)) = 0.8913424 .

Part (b): This would be given by changing n to 15 and is given by

Φ((10200 − 10000)/(500/√15)) − Φ((9900 − 10000)/(500/√15)) = 0.7200434 .
Exercise 5.51
On the first day we want to calculate P(X̄ ≤ 11) which can be done using

Φ((11 − 10)/(2/√5)) ,

with a similar expression for the second day. If we want the sample average to be at most 11 minutes on both days then we would multiply these two results. When we do that we get the value of 0.7724277.
Exercise 5.52
We want to find L such that P(T_0 ≥ L) = 0.05 or

1 − P(T_0 < L) = 0.05 ,

or

P(T_0 < L) = 0.95 ,

or with n = 4

P( (T_0 − nµ)/(√n σ) < (L − nµ)/(√n σ) ) = 0.95 ,

or

(L − nµ)/(√n σ) = 1.644854 ,

using R's qnorm function. We can solve the above for L when we take n = 4, µ = 10, and σ = 1 where we find L = 43.28971.
Exercise 5.53
Part (a): We would compute

P(X̄ ≥ 51) = 1 − P(X̄ < 51) = 1 − Φ((51 − 50)/(1.2/√9)) = 0.006209665 .

Part (b): We could use the same formula as above but with n = 40 to get 6.8 × 10^{−8}.
Exercise 5.54
Part (a): We could compute these as

P(X̄ ≤ 3.0) = Φ((3.0 − 2.65)/(0.85/√25)) = 0.9802444
P(2.65 ≤ X̄ ≤ 3.0) = Φ((3.0 − 2.65)/(0.85/√25)) − Φ((2.65 − 2.65)/(0.85/√25)) = 0.4802444 .

Part (b): We would want to pick the value of n such that

P(X̄ ≤ 3.0) = Φ((3.0 − 2.65)/(0.85/√n)) = 0.99 ,

or when we use qnorm we get

3.0 − 2.65 = (0.85/√n)(2.326348) .

We can solve for n to find n = 31.91913. Since n must be an integer we take n ≥ 32.
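The value of n can also be obtained in one line with qnorm. A sketch:

n = ( qnorm(0.99) * 0.85 / 0.35 )^2
c( n, ceiling(n) )   # 31.91913 and 32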
Exercise 5.55
Part (a): Using the normal approximation we can compute

P(35 ≤ N ≤ 70) = Φ((70 − 50)/√50) − Φ((35 − 50)/√50) = 0.9807137 .

Here since the number of parking tickets is Poisson we have a mean of λ = 50 and a variance of λ = 50 (the standard deviation is then √50 = 7.071068).

Part (b): We have

P(225 ≤ T_0 ≤ 275) = Φ((275 − 5(50))/(√5 √50)) − Φ((225 − 5(50))/(√5 √50)) = 0.8861537 .
Exercise 5.56
Part (a): Let T_0 be the total number of errors. Then we have

µ_{T_0} = nµ = 1000(1/10) = 100
σ_{T_0} = ( 1000 (1/10)(9/10) )^{1/2} = √90 = 9.486833
P(T_0 ≤ 125) = Φ((125 − 100)/9.486833) = 0.995796 .

Part (b): Let T_1 and T_2 be the number of errors in the first and second message respectively. Now define X to be X ≡ T_1 − T_2. Notice that X has an expectation of zero and a variance (due to the independence of T_1 and T_2) given by

Var(X) = Var(T_1) + Var(T_2) = 2(9.486833^2) = 180 .

Then the probability we want is given by

P(|X| ≤ 50) = P(−50 ≤ X ≤ 50) = Φ((50 − 0)/√180) − Φ((−50 − 0)/√180) = 0.9998061 .
Exercise 5.57
From properties of the gamma distribution

µ_X = αβ = 100
σ_X^2 = αβ^2 = 200 .

We want to evaluate

P(X ≤ 125) ≈ Φ((125 − µ)/σ) = 0.9614501 .
Exercise 5.58
Part (a): We have

E(volume) = 27µ_1 + 125µ_2 + 512µ_3 = 27(200) + 125(250) + 512(100) = 87850 ,

and

Var(volume) = 27^2 σ_1^2 + 125^2 σ_2^2 + 512^2 σ_3^2 = 27^2(10^2) + 125^2(12^2) + 512^2(8^2) = 19100116 .

Part (b): No, we would need to know the covariances between two different variables X_i and X_j for i ≠ j.
Exercise 5.59
Part (a): We have

P(X_1 + X_2 + X_3 ≤ 200) = Φ((200 − 3(60))/√(3(15))) = 0.9985654 ,

and

P(150 ≤ X_1 + X_2 + X_3 ≤ 200) = Φ((200 − 3(60))/√(3(15))) − Φ((150 − 3(60))/√(3(15))) = 0.9985616 .

Part (b): Since X̄ = (1/3)(X_1 + X_2 + X_3) we have µ_X̄ = (1/3)(3(60)) = 60 and

σ_X̄^2 = (1/9)(3(15)) = 5 .

Then using these we have

P(55 ≤ X̄) = 1 − P(X̄ < 55) = 1 − Φ((55 − 60)/√5) = 0.9873263 ,

and

P(58 ≤ X̄ ≤ 62) = Φ((62 − 60)/√5) − Φ((58 − 60)/√5) = 0.6289066 .

Note that these numbers do not agree with the ones given in the back of the book. If anyone sees anything incorrect in what I have done please let me know.

Part (c): If we define the random variable V as

V ≡ X_1 − 0.5X_2 − 0.5X_3 ,

then we have E(V) = 60 − 0.5(60) − 0.5(60) = 0 and

Var(V) = σ_1^2 + (1/4)σ_2^2 + (1/4)σ_3^2 = 15(1 + 1/4 + 1/4) = 45/2 .

The probability we want to calculate is then given by

Φ((5 − 0)/√(45/2)) − Φ((−10 − 0)/√(45/2)) = 0.8365722 .

Part (d): In this case let V ≡ X_1 + X_2 + X_3 so that E(V) = 40 + 50 + 60 = 150 and Var(V) = σ_1^2 + σ_2^2 + σ_3^2 = 10 + 12 + 14 = 36. Thus

P(X_1 + X_2 + X_3 ≤ 160) = Φ((160 − 150)/6) = 0.9522096 .

Next let V ≡ X_1 + X_2 − 2X_3 so that E(V) = 40 + 50 − 2(60) = −30 and Var(V) = σ_1^2 + σ_2^2 + 4σ_3^2 = 10 + 12 + 4(14) = 78. Using these we have

P(X_1 + X_2 ≥ 2X_3) = P(X_1 + X_2 − 2X_3 ≥ 0) = 1 − P(X_1 + X_2 − 2X_3 < 0)
                    = 1 − Φ((0 − (−30))/√78) = 0.0003408551 .
Exercise 5.60
With the given definition of Y we have

E(Y) = (1/2)(µ_1 + µ_2) − (1/3)(µ_3 + µ_4 + µ_5) = (1/2)(2(20)) − (1/3)(3(21)) = 20 − 21 = −1 ,

and

Var(Y) = (1/4)σ_1^2 + (1/4)σ_2^2 + (1/9)σ_3^2 + (1/9)σ_4^2 + (1/9)σ_5^2 = (1/4)(2(4)) + (1/9)(3(3.5)) = 3.166 .

Then we have

P(0 ≤ Y) = 1 − P(Y < 0) = 1 − Φ((0 − (−1))/√3.166) = 0.2870544 ,

and

P(−1 ≤ Y ≤ +1) = Φ((1 − (−1))/√3.166) − Φ((−1 − (−1))/√3.166) = 0.369498 .
Exercise 5.61
Part (a): The total number of vehicles is given by X + Y . We can compute the requested
information using the following R code
P = matrix( data=c( 0.025, 0.015, 0.010,
0.050, 0.030, 0.020,
0.125, 0.075, 0.050,
0.150, 0.090, 0.060,
0.100, 0.060, 0.040,
0.050, 0.030, 0.020 ), nrow=6, ncol=3, byrow=T )
X = matrix( data=c( rep(0,3), rep(1,3), rep(2,3), rep(3,3), rep(4,3), rep(5,3) ),
nrow=6, ncol=3, byrow=T )
Y = matrix( data=c( rep(0,6), rep(1,6), rep(2,6) ), nrow=6, ncol=3 )
E_T = sum( ( X + Y ) * P )
E_T2 = sum( ( X + Y )^2 * P )
Var_T = E_T2 - E_T^2
Std_T = sqrt( Var_T )
Numerically we get
> c( E_T, E_T2, Var_T, Std_T )
[1]  3.500000 14.520000  2.270000  1.506652
Part (b): The revenue is given by 3X + 10Y . We can compute the requested information
using the following R code
E_R = sum( ( 3 * X + 10 * Y ) * P )
E_R2 = sum( ( 3 * X + 10 * Y )^2 * P )
Var_R = E_R2 - E_R^2
Std_R = sqrt( Var_R )
Numerically we get
> c( E_R, E_R2, Var_R, Std_R )
[1]  15.400000 313.100000  75.940000   8.714356
Exercise 5.62
We compute

P(X_1 + X_2 + X_3 ≤ 60) = Φ((60 − (15 + 30 + 20))/√(1^2 + 2^2 + 1.5^2)) = 0.9683411 .
Exercise 5.63
Part (a): To compute this we will use
Cov (X1 , X2 ) = E(X1 X2 ) − E(X1 )E(X2 ) .
To evaluate each of the expressions on the right-hand-side of the above we have used the
following R code:
P = matrix( data=c( 0.08, 0.07, 0.04, 0.00,
0.06, 0.15, 0.05, 0.04,
0.05, 0.04, 0.10, 0.06,
0.0, 0.03, 0.04, 0.07,
0.0, 0.01, 0.05, 0.06 ), nrow=5, ncol=4, byrow=T )
X_1 = matrix( data=c( rep(0,4), rep(1,4), rep(2,4), rep(3,4), rep(4,4) ), nrow=5, ncol=4, byrow=T )
X_2 = matrix( data=c( rep(0,5), rep(1,5), rep(2,5), rep(3,5) ), nrow=5, ncol=4 )
E_X1 = sum( X_1 * P )
E_X2 = sum( X_2 * P )
E_X1_X2 = sum( X_1 * X_2 * P )
Cov_X1_X2 = E_X1_X2 - E_X1 * E_X2
Numerically we get
> c( E_X1, E_X2, E_X1_X2, Cov_X1_X2 )
[1] 1.700 1.550 3.330 0.695
Part (b): To compute Var (X1 + X2 ) we will use
Var (X1 + X2 ) = Var (X1 ) + Var (X2 ) + 2Cov (X1 , X2 ) .
Continuing with the calculations started above we have
E_X1_Sq = sum( X_1^2 * P )
E_X2_Sq = sum( X_2^2 * P )
Var_X1 = E_X1_Sq - E_X1^2
Var_X2 = E_X2_Sq - E_X2^2
Numerically these give
> c( Var_X1, Var_X2, Var_X1 + Var_X2, Var_X1 + Var_X2 + 2 * Cov_X1_X2 )
[1] 1.5900 1.0875 2.6775 4.0675
Exercise 5.64
Part (a): Let X_i be the waiting times for the morning bus and Y_i the waiting times for the evening bus for i = 1, 2, 3, 4, 5 (Monday through Friday). Let the total waiting time be denoted W so that W ≡ Σ_{i=1}^5 X_i + Σ_{i=1}^5 Y_i. Then

E(W) = 5E(X_i) + 5E(Y_i) = 5(4) + 5(5) = 45 ,

minutes.

Part (b): We use the formula for the variance of a uniform distribution and independence to get

Var(W) = Σ_{i=1}^5 Var(X_i) + Σ_{i=1}^5 Var(Y_i) = 5(8^2/12) + 5(10^2/12) = 820/12 = 68.33 .

Part (c): On a given day i the difference between the morning and evening waiting times would be V_i = X_i − Y_i. Thus E(V_i) = E(X_i) − E(Y_i) = 4 − 5 = −1 and Var(V_i) = Var(X_i) + Var(Y_i) = 8^2/12 + 10^2/12 = 13.6667.

Part (d): This would be V = Σ_{i=1}^5 X_i − Σ_{i=1}^5 Y_i. Thus E(V) = 5E(X_i) − 5E(Y_i) = 5(4) − 5(5) = −5 and Var(V) = 5(8^2/12) + 5(10^2/12) = 820/12 = 68.33.
Exercise 5.65
Part (a): Note that

µ_{X̄−Ȳ} = E(X̄) − E(Ȳ) = 5 − 5 = 0 ,

and

Var(X̄ − Ȳ) = Var(X̄) + Var(Ȳ) = 2(0.2^2/25) = 0.0032 .

Using these we have

P(−0.1 ≤ X̄ − Ȳ ≤ +0.1) = Φ((0.1 − 0)/√0.0032) − Φ((−0.1 − 0)/√0.0032) = 0.9229001 .

Part (b): In this case n = 36 and the variance of the difference changes to give

Var(X̄ − Ȳ) = 2(0.2^2/36) = 0.00222 ,

then we have

P(−0.1 ≤ X̄ − Ȳ ≤ +0.1) = Φ((0.1 − 0)/√0.00222) − Φ((−0.1 − 0)/√0.00222) = 0.9661943 .
Exercise 5.66
Part (a): From the problem statement we have

E(Bending Moment) = a_1 E(X_1) + a_2 E(X_2) = 5(2) + 10(4) = 50
Var(Bending Moment) = a_1^2 Var(X_1) + a_2^2 Var(X_2) = 5^2(0.5^2) + 10^2(1^2) = 106.25
std(Bending Moment) = √106.25 = 10.307 .

Part (b): This is given by

P(Bending Moment > 75) = 1 − P(Bending Moment < 75) = 1 − Φ((75 − 50)/10.307) = 0.007646686 .

Part (c): This would be

E(Bending Moment) = E(A_1)E(X_1) + E(A_2)E(X_2) = 5(2) + 10(4) = 50 .

Part (d): To compute this we will use the formula

Var(Bending Moment) = E(Bending Moment^2) − E(Bending Moment)^2 .

First we need to compute

E(Bending Moment^2) = E((A_1 X_1 + A_2 X_2)^2) = E(A_1^2 X_1^2 + 2A_1 A_2 X_1 X_2 + A_2^2 X_2^2)
                    = E(A_1^2)E(X_1^2) + 2E(A_1)E(A_2)E(X_1)E(X_2) + E(A_2^2)E(X_2^2) .

Now to use the above we compute

E(A_1^2) = Var(A_1) + E(A_1)^2 = 0.5^2 + 5^2 = 25.25
E(A_2^2) = 0.5^2 + E(A_2)^2 = 0.5^2 + 10^2 = 100.25
E(X_1^2) = 0.5^2 + 2^2 = 4.25
E(X_2^2) = 1^2 + 4^2 = 17 .

Thus we can use the above to compute

E(Bending Moment^2) = 25.25(4.25) + 2(5)(10)(2)(4) + (100.25)(17) = 2611.562 .

Thus

Var(Bending Moment) = E(Bending Moment^2) − E(Bending Moment)^2 = 2611.562 − 50^2 = 111.5625 .

Part (e): Now if Corr(X_1, X_2) = 0.5 then

Cov(X_1, X_2) = σ_1 σ_2 Corr(X_1, X_2) = 0.5(1)(0.5) = 0.25 .

Using this we compute

Var(Bending Moment) = Var(a_1 X_1 + a_2 X_2) = a_1^2 Var(X_1) + a_2^2 Var(X_2) + 2a_1 a_2 Cov(X_1, X_2)
                    = 5^2(0.5^2) + 10^2(1^2) + 2(5)(10)(0.25) = 131.25 .
Exercise 5.67
I think this problem means that we will connect a length “20” pipe to a length “15” pipe in such a way that they overlap by “1” inch. Let the first pipe's length be denoted X_1, the second pipe's length be denoted X_2, and the connector's length be denoted O (for overlap). Then the total length when all three are connected is given by

L = X_1 + X_2 − O .

Thus E(L) = 20 + 15 − 1 = 34 and

Var(L) = Var(X_1) + Var(X_2) + Var(O) = 0.5^2 + 0.4^2 + 0.1^2 = 0.42 .

We want to compute

P(34.5 ≤ L ≤ 35) = Φ((35 − 34)/√0.42) − Φ((34.5 − 34)/√0.42) = 0.158789 .
Exercise 5.68
If the velocities of the first and second plane are given by the random variables V1 and V2
respectively then the distance between the two planes after a time t is
D = (10 + V1 t) − V2 t = 10 + (V1 − V2 )t .
Now D is normally distributed with a mean
E(D) = 10 + (E(V1 ) − E(V2 ))t = 10 + (520 − 500)t = 10 + 20t ,
and a variance given by

Var(D) = t^2 (Var(V_1) + Var(V_2)) = t^2(10^2 + 10^2) = 200t^2 .

Part (a): We want to compute the probability that D ≥ 0 when t = 2. We find

P(D ≥ 0) = 1 − P(D < 0) = 1 − Φ((0 − (10 + 20(2)))/√(200(2^2))) = 0.9614501 .

Part (b): We want to compute the probability that D ≤ 10 when t = 2. We find

P(D ≤ 10) = Φ((10 − (10 + 20(2)))/√(200(2^2))) = 0.0786496 .
Exercise 5.69
Part (a): The expected total number of cars entering the freeway is given by
E(T ) = E(X1 ) + E(X2 ) + E(X3 ) = 800 + 1000 + 600 = 2400 .
Part (b): Assuming independence we can compute
Var (T ) = Var (X1 ) + Var (X2 ) + Var (X3 ) = 162 + 252 + 182 = 1205 .
Part (c): The value of E(T) does not change from the value computed above if the number of cars on each road is correlated. The variance of T is now given by

Var(T) = 16^2 + 25^2 + 18^2 + 2Cov(X_1, X_2) + 2Cov(X_1, X_3) + 2Cov(X_2, X_3)
       = 1205 + 2(80) + 2(90) + 2(100) = 1745 .

So the standard deviation is √1745 = 41.7732.
Exercise 5.70
Part (a): From the definition of W given by W = Σ_{i=1}^n i Y_i we have

E(W) = Σ_{i=1}^n i E(Y_i) = Σ_{i=1}^n i (0.5) = (1/2) Σ_{i=1}^n i = n(n + 1)/4 .

Part (b): Since Y_i is a Bernoulli (binomial with one trial) random variable we have Var(Y_i) = pq = p(1 − p). Thus for the variance of W we have

Var(W) = Σ_{i=1}^n i^2 Var(Y_i) = Σ_{i=1}^n i^2 p(1 − p) = p(1 − p) Σ_{i=1}^n i^2
       = p(1 − p) n(n + 1)(2n + 1)/6 = n(n + 1)(2n + 1)/24 ,

when p = 1/2.
Exercise 5.71
Part (a): The bending moment would be given by

Bending Moment = a_1 X_1 + a_2 X_2 + W ∫_0^{12} x dx = a_1 X_1 + a_2 X_2 + W [ x^2/2 ]_0^{12}
               = a_1 X_1 + a_2 X_2 + (144/2) W = 5X_1 + 10X_2 + 72W .

With this expression we have that

E(Bending Moment) = 5E(X_1) + 10E(X_2) + 72E(W) = 5(2) + 10(4) + 72(1.5) = 158 ,

and

Var(Bending Moment) = 5^2 Var(X_1) + 10^2 Var(X_2) + 72^2 Var(W)
                    = 25(0.5^2) + 100(1^2) + 72^2(0.25^2) = 430.25 .

Part (b): Using the above we have that

P(Bending Moment ≤ 200) = Φ((200 − 158)/√430.25) = 0.9785577 .
Exercise 5.72
Let T be the total time taken to run all errands and return to the office; then T = X_1 + X_2 + X_3 + X_4 with T measured in minutes. We want to compute the value of t such that

P(T ≥ t) = 0.01 ,

or

P(T < t) = 0.99 ,

or

Φ((t − (15 + 5 + 8 + 12))/√(4^2 + 1^2 + 2^2 + 3^2)) = 0.99 .

We can solve the above for t to find t = 52.74193 minutes. Thus the sign should say “I will return by 10:53 A.M.”.
Exercise 5.73
Part (a): X̄ is approximately normal with a mean of 105 and a variance of 6^2/35. Ȳ is approximately normal with a mean of 100 and a variance of 8^2/40.

Part (b): X̄ − Ȳ is approximately normal with a mean of 105 − 100 = 5 and a variance of

Var(X̄ − Ȳ) = 8^2/40 + 6^2/35 = 2.628 .

Part (c): Using the above results we compute

P(−1 ≤ X̄ − Ȳ ≤ +1) = Φ((1 − 5)/√2.628) − Φ((−1 − 5)/√2.628) = 0.006701698 .

Part (d): We calculate

P(X̄ − Ȳ ≥ +10) = 1 − P(X̄ − Ȳ ≤ +10) = 1 − Φ((10 − 5)/√2.628) = 0.001021292 .

Since this is so small we would doubt the hypothesis that µ_1 − µ_2 = 5.
Exercise 5.74
If X and Y are binomial random variables and we let Z = X − Y then we have

E(Z) = n(0.7) − n(0.6) = 50(0.1) = 5
Var(Z) = n(0.7)(0.3) + n(0.6)(0.4) = 22.5 ,

where we have used the result that the variance of a binomial random variable is given by npq. Using these results we can approximate

P(−5 ≤ X − Y ≤ +5) = Φ((5 − 5)/√22.5) − Φ((−5 − 5)/√22.5) = 0.4824925 .
Exercise 5.75
Part (a): We compute the marginal pmf for X in Table 13 and for Y in Table 14.
Part (b): This would be
P (X ≤ 15 ∩ Y ≤ 15) = 0.05 + 0.05 + 0.05 + 0.1 = 0.25 .
Part (c): We need to check if fX,Y (x, y) = fX (x)fY (y) for all x and y.
x    f_X(x)
12   0.05 + 0.05 + 0.1 = 0.2
15   0.05 + 0.1 + 0.35 = 0.5
20   0 + 0.2 + 0.1 = 0.3

Table 13: The expression for f_X(x).

y    f_Y(y)
12   0.05 + 0.05 + 0 = 0.1
15   0.05 + 0.1 + 0.2 = 0.35
20   0.1 + 0.35 + 0.1 = 0.55

Table 14: The expression for f_Y(y).
Part (d): We have
E(X + Y ) = 24(0.05) + 27(0.05) + 32(0.1)
+ 27(0.05) + 30(0.1) + 35(0.35)
+ 32(0) + 35(0.2) + 40(0.1) = 33.35 .
Part (e): We have
E(|X − Y |) = 0(0.05) + 3(0.05) + 8(0.1)
+ 3(0.05) + 0(0.1) + 5(0.35)
+ 8(0) + 5(0.2) + 0(0.1) = 3.85 .
Exercise 5.76
Let X_1 and X_2 be independent normal random variables with a mean of zero and a variance of one. Then X_1 + X_2 is a normal random variable with a mean of zero and a variance of 2. The 75% percentile of either X_1 or X_2 is given in R as qnorm(0.75,0,1). Evaluating this gives 0.6744898; two of these sum to 1.34898. The 75% percentile of X_1 + X_2 is given by qnorm(0.75,0,sqrt(2)), since qnorm takes the standard deviation (here √2) rather than the variance. When we evaluate this we get 0.9538726. Thus the 75% percentiles do not simply add when we add independent random variables: the percentile of the sum grows like √2, not 2.
Figure 6: The region of nonzero probability for Exercise 77 (the band 20 ≤ x + y ≤ 30 in the first quadrant of the x-y plane).
Exercise 5.77
Part (a): See Figure 6 for the region of positive density. From that region the value of k is given by evaluating the following

1 = ∫_{x=0}^{20} ∫_{y=20−x}^{30−x} k x y dy dx + ∫_{x=20}^{30} ∫_{y=0}^{30−x} k x y dy dx
  = k ∫_{x=0}^{20} (x/2)((30 − x)^2 − (20 − x)^2) dx + k ∫_{x=20}^{30} (x/2)(30 − x)^2 dx
  = k (70000/3) + 3750k = (81250/3) k .

Thus k = 3/81250 = 3.692308 × 10^{−5}.
Part (b): The marginal pdf of X is given by

fX(x) = ∫_{y=20−x}^{30−x} k x y dy = (kx/2)((30 − x)^2 − (20 − x)^2)   for 0 < x < 20
fX(x) = ∫_{y=0}^{30−x} k x y dy = (kx/2)(30 − x)^2                    for 20 < x < 30 .

In the same way we have the marginal pdf of Y given by

fY(y) = (ky/2)((30 − y)^2 − (20 − y)^2)   for 0 < y < 20
fY(y) = (ky/2)(30 − y)^2                  for 20 < y < 30 .

Note that f(x, y) ≠ fX(x) fY(y) and X and Y are not independent.
Part (c): We need to evaluate

P(X + Y ≤ 25) = ∫_{x=0}^{25} ∫_{y=max(0, 20−x)}^{25−x} k x y dy dx .

Part (d): We need to evaluate

E(X + Y) = ∫∫ (x + y) k x y dy dx ,

with the double integral taken over the region of positive density shown in Figure 6.
Part (e): We need to compute E(X), E(Y), E(XY), E(X^2), E(Y^2), Var(X) and Var(Y) to evaluate these.

Part (f): We first need to evaluate

E((X + Y)^2) = ∫∫ (x + y)^2 k x y dy dx ,

over the same region, and then use the formula for the variance expressed as the difference of expectations to evaluate Var(X + Y).
Exercise 5.78
By the argument given in the problem statement we would have

FY(y) = P{Y ≤ y} = Π_{i=1}^n P{X_i ≤ y} .

Since each X_i is a uniform random variable we have

P{X_i ≤ y} = 0                  for y < 100
P{X_i ≤ y} = (y − 100)/100      for 100 < y < 200
P{X_i ≤ y} = 1                  for y > 200 .

Using this we have

FY(y) = ((y − 100)/100)^n   for 100 ≤ y ≤ 200 .

Our pdf for Y is given by fY(y) = dFY/dy or

fY(y) = (n/100^n)(y − 100)^{n−1} .

We then get the expectation of Y to be given by

E(Y) = ∫_{100}^{200} y fY(y) dy = (n/100^n) ∫_{100}^{200} y (y − 100)^{n−1} dy
     = (n/100^n) ( ∫_{100}^{200} (y − 100)^n dy + 100 ∫_{100}^{200} (y − 100)^{n−1} dy )
     = (n/100^n) ( 100^{n+1}/(n + 1) + 100^{n+1}/n ) = 100 ( n/(n + 1) + 1 ) = 100 (2n + 1)/(n + 1) .
Exercise 5.79
Let the random variable representing the average calorie intake be given by

V = (1/365) Σ_{i=1}^{365} (X_i + Y_i + Z_i) ,

where X_i, Y_i, and Z_i are defined in the problem. Then we have

E(V) = (1/365) Σ_{i=1}^{365} (E(X_i) + E(Y_i) + E(Z_i)) = (1/365)(365(500) + 365(900) + 365(2000)) = 3400 ,

and

Var(V) = (1/365^2) Σ_{i=1}^{365} (σ_X^2 + σ_Y^2 + σ_Z^2) = (1/365)(50^2 + 100^2 + 180^2) = 123.0137 .

We want to calculate

P(V < 3500) = Φ((3500 − 3400)/√123.0137) ≈ 1 .
Exercise 5.80
Part (a): Let T_0 be equal to the total luggage weight or T_0 = Σ_{i=1}^{12} X_i + Σ_{i=1}^{50} Y_i where X_i is the weight of the ith business class customer's luggage and Y_i is the weight of the ith tourist class customer's luggage. Now with this definition we have that

E(T_0) = 12E(X_i) + 50E(Y_i) = 12(40) + 50(30) = 1980
Var(T_0) = 12σ_{X_i}^2 + 50σ_{Y_i}^2 = 12(10^2) + 50(6^2) = 3000 .

Part (b): For this part we want to compute

P(T_0 ≤ 2500) = Φ((2500 − 1980)/√3000) ≈ 1 .
Exercise 5.81
Part (a): We can use the expression
E(X1 + X2 + · · · + XN ) = E(N)µ ,
to compute the desired expected total repair time. Let Xi be the length of time taken to
repair the ith component. We want to compute
E(X1 + X2 + · · · + XN ) = E(N)µ = 10(40) = 400 ,
minutes.
Part (b): Let Xi be the number of defects found in the ith component. Then the total
number of defects in four hours is T = X1 + · · · + XN where N is the number of components
that come in during the four hour period. We don’t know the value of N since it is a random
variable. We know however that
E(N) = 4E(N1 ) = 4(5) = 20 ,
where N1 is the number of components submitted in one hour. Using this we have
E(T ) = E(N)µ = 20E(X1 ) = 20(3.5) = 70 .
Exercise 5.82
Let the total number of voters that favor this candidate be denoted T; then T = T_r + T_u where T_r is the number of voters from the rural area and T_u is the number of voters from the urban area. From the description in the problem T_r is a binomial random variable with n = 200 and p = 0.45 and T_u is a binomial random variable with n = 300 and p = 0.6. We want to compute

P(T_r + T_u ≥ 250) = 1 − P(T_r + T_u < 250) .

To use the central limit theorem (CLT) we need to know

E(T_r + T_u) = 0.45(200) + 0.6(300) = 270
Var(T_r + T_u) = 200(0.45)(0.55) + 300(0.6)(0.4) = 121.5 .

With these (and the CLT) the probability we want can be approximated by

1 − Φ((250 − 270)/√121.5) = 0.9651948 .
Exercise 5.83
We want to find a value of n such that P(|X̄ − µ| < 0.02) = 0.95. Now X̄ has a mean of µ and a standard deviation given by 0.1/√n so that the random variable X̄ − µ has a mean of zero (and the same standard deviation). Thus we can divide by the standard deviation to write the probability above as

P( |X̄ − µ|/(0.1/√n) < 0.02/(0.1/√n) ) = 0.95 .

Since the random variable |X̄ − µ|/(0.1/√n) is the absolute value of a standard normal the above can be written

P( |Z| < 0.02/(0.1/√n) ) = 0.95 .

Based on properties of the standard normal we can show that

P(|Z| < c) = 1 − 2P(Z < −c) = 1 − 2Φ(−c) .

Thus we can write the above as

1 − 2Φ( −0.02/(0.1/√n) ) = 0.95 .

Simplifying some we get

Φ( −0.02/(0.1/√n) ) = 0.025 .

In the above we can solve for n. When we do this we get n = 96.03647. Thus as n must be an integer we would take n = 97.
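The value of n can be obtained directly with qnorm. A sketch:

n = ( qnorm(0.975) * 0.1 / 0.02 )^2
c( n, ceiling(n) )   # 96.03647 and 97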
Exercise 5.84
The amount of soft drink consumed (in ounces) in two weeks (14 days) is given by T_0 = Σ_{i=1}^{14} X_i where X_i is a normal random variable with a mean of 13 oz and a standard deviation of 2 oz. Thus E(T_0) = 14(13) = 182 and Var(T_0) = 14(2^2) = 56. The total amount of soft drink we have in the two six-packs is 2(6)(16) = 192 oz. For the problem we want to compute

P(T_0 < 192) = Φ((192 − 182)/√56) = 0.9092754 .
Exercise 5.85
Exercise 58 is worked on Page 159. The total volume is given by V = 27X_1 + 125X_2 + 512X_3 and we want to compute

P(V ≤ 100000) = Φ((100000 − 87850)/√19100116) = 0.9972828 .
Exercise 5.86
To make it to class we must have X_2 − X_1, the amount of time between the end of the first class and the start of the second class, larger than the time it takes to get to the second class from the first class. From the problem statement X_2 − X_1 is a normal random variable with a mean given by the difference between the two means or

9:10 − 9:02 = 8 ,

minutes. The variable X_2 − X_1 has a variance of σ_{X_2−X_1}^2 = 1^2 + 1.5^2 = 3.25. To compute the probability of interest we want to compute P(X_2 − X_1 > X_3). To find this probability consider the random variable X_2 − X_1 − X_3. This is a normal random variable with a mean of 8 − 6 = 2 minutes and a variance (by independence) of 1^2 + 1.5^2 + 1^2 = 4.25. Thus we have

P(X_2 − X_1 − X_3 > 0) = 1 − P(X_2 − X_1 − X_3 < 0) = 1 − Φ((0 − 2)/√4.25) = 0.8340123 .
Exercise 5.87
Part (a): Note that we can write

Var(aX + Y) = a^2 Var(X) + Var(Y) + 2a Cov(X, Y) .

In the above let a = σ_Y/σ_X and the above becomes

Var(aX + Y) = (σ_Y^2/σ_X^2) σ_X^2 + σ_Y^2 + 2(σ_Y/σ_X) Cov(X, Y) = 2σ_Y^2 + 2(σ_Y/σ_X) Cov(X, Y) .

Next we recall that Cov(X, Y) = σ_X σ_Y ρ and that the variance is always nonnegative so that the above becomes

2σ_Y^2 + 2(σ_Y/σ_X) ρ σ_X σ_Y ≥ 0 ,

or when we cancel positive factors we get

1 + ρ ≥ 0 so ρ ≥ −1 .

Part (b): Using the fact that Var(aX − Y) ≥ 0 and the same type of expansion as above we have

a^2 σ_X^2 + σ_Y^2 − 2a Cov(X, Y) ≥ 0 .

As before let a = σ_Y/σ_X to get

σ_Y^2 + σ_Y^2 − 2(σ_Y/σ_X) σ_X σ_Y ρ ≥ 0 ,

which simplifies to

1 − ρ ≥ 0 so ρ ≤ 1 .
Exercise 5.88
To minimize E[(X + Y − t)^2] with respect to t we will take the derivative with respect to t, set the result equal to zero, and then solve for t. Taking the derivative and setting the result equal to zero gives

−2E(X + Y − t) = 0 .

The left-hand-side of this (after we divide by minus two) is given by

∫∫ (x + y − t) (2/5)(2x + 3y) dy dx .

To integrate this more easily with respect to y we will write the integrand as

(y + x − t)(3y + 2x) = 3y^2 + 2xy + 3y(x − t) + 2x(x − t) .

Integrating with respect to y over 0 < y < 1 gives

[ y^3 + xy^2 + (3/2)y^2(x − t) + 2x(x − t)y ]_{y=0}^{1} = 1 + x + (3/2)(x − t) + 2x(x − t)
  = (1 − (3/2)t) + (5/2 − 2t)x + 2x^2 .

We now integrate this with respect to x over 0 < x < 1 to get

[ (1 − (3/2)t)x + (5/2 − 2t)(x^2/2) + (2/3)x^3 ]_{x=0}^{1} = 35/12 − (5/2)t ,

when we simplify. Setting this equal to zero and solving for t gives t = 7/6 for the value that minimizes the error of prediction.
Exercise 5.89
Part (a): To do this part one needs to derive the cumulative distribution function for X_1 + X_2 by computing P(X_1 + X_2 ≤ t) and the fact that f(x_1, x_2) is the product of two chi-squared densities with parameters ν_1 and ν_2.

Part (b): From Part (a) of this problem we know that Z_1^2 + Z_2^2 + · · · + Z_n^2 will be a chi-squared random variable with parameter ν = n.

Part (c): As (X_i − µ)/σ is a standard normal random variable, when we square it we will get a chi-squared random variable with ν = 1. Recall that the distribution of the sum of chi-squared random variables is another chi-squared random variable with its degree equal to the sum of the degrees of the chi-squared random variables in the sum. Because of this the sum of n variables of the form ((X_i − µ)/σ)^2 is another chi-squared random variable with parameter ν = n.
Exercise 5.90
Part (a): We have
Cov(X, Y + Z) = E(X(Y + Z)) − E(X)E(Y + Z)
= E(XY ) + E(XZ) − E(X)E(Y ) − E(X)E(Z)
= E(XY ) − E(X)E(Y ) + E(XZ) − E(X)E(Z)
= Cov (X, Y ) + Cov (X, Z) .
Part (b): Using the given covariance values we have
Cov (X1 + X2 , Y1 + Y2 ) = Cov (X1 , Y1 ) + Cov (X1 , Y2 ) + Cov (X2 , Y1 ) + Cov (X2 , Y2 )
= 5 + 1 + 2 + 8 = 16 .
Exercise 5.91
Part (a): As a first step we use the definition of the correlation coefficient ρ as

ρ = Cov(X_1, X_2)/(σ_{X_1} σ_{X_2}) = Cov(W + E_1, W + E_2)/(σ_{X_1} σ_{X_2})
  = (Cov(W, W) + Cov(W, E_2) + Cov(E_1, W) + Cov(E_1, E_2))/(σ_{X_1} σ_{X_2})
  = Cov(W, W)/(σ_{X_1} σ_{X_2}) = σ_W^2/(σ_{X_1} σ_{X_2}) ,

since E_1 and E_2 are independent of one another and of W. Now for i = 1, 2 note that

σ_{X_i}^2 = Var(W + E_i) = Var(W) + Var(E_i) = σ_W^2 + σ_E^2 .

Thus we have that ρ is given by

ρ = σ_W^2/(σ_W^2 + σ_E^2) .

Part (b): Using the above formula we have ρ = 1/(1 + 0.01^2) = 0.9999.
Exercise 5.93
Following the formulas given in the book when Y = X_4 (1/X_1 + 1/X_2 + 1/X_3) we have

E(Y) = h(µ_1, µ_2, µ_3, µ_4) = 120 (1/10 + 1/15 + 1/20) = 26 .

Next to compute the variance we need

∂h/∂X_1 = −X_4/X_1^2   so   ∂h/∂X_1 (µ_1, µ_2, µ_3, µ_4) = −120/10^2 = −1.2
∂h/∂X_2 = −X_4/X_2^2   so   ∂h/∂X_2 (µ_1, µ_2, µ_3, µ_4) = −120/15^2 = −0.5333333
∂h/∂X_3 = −X_4/X_3^2   so   ∂h/∂X_3 (µ_1, µ_2, µ_3, µ_4) = −120/20^2 = −0.3
∂h/∂X_4 = 1/X_1 + 1/X_2 + 1/X_3   so   ∂h/∂X_4 (µ_1, µ_2, µ_3, µ_4) = 1/10 + 1/15 + 1/20 = 0.2166667 .

Using these we get

V(Y) = (∂h/∂x_1)^2 σ_1^2 + (∂h/∂x_2)^2 σ_2^2 + (∂h/∂x_3)^2 σ_3^2 + (∂h/∂x_4)^2 σ_4^2
     = (−1.2)^2 1^2 + (−0.5333333)^2 1^2 + (−0.3)^2 1.5^2 + 0.2166667^2 4^2 = 2.678056 .
Exercise 5.94
For this problem we will use the more accurate expression for the expectation of a function of several random variables given by

E[h(X_1, . . . , X_n)] ≈ h(µ_1, . . . , µ_n) + (1/2) σ_1^2 ∂^2h/∂x_1^2 + · · · + (1/2) σ_n^2 ∂^2h/∂x_n^2 .

In Exercise 93 above we computed h(µ_1, . . . , µ_n) and all of the first derivatives of h. To use the above we need to compute the second derivatives of h. We find

∂^2h/∂X_1^2 = 2X_4/X_1^3   so   ∂^2h/∂X_1^2 (µ_1, µ_2, µ_3, µ_4) = 2(120)/10^3 = 0.24
∂^2h/∂X_2^2 = 2X_4/X_2^3   so   ∂^2h/∂X_2^2 (µ_1, µ_2, µ_3, µ_4) = 2(120)/15^3 = 0.07111111
∂^2h/∂X_3^2 = 2X_4/X_3^3   so   ∂^2h/∂X_3^2 (µ_1, µ_2, µ_3, µ_4) = 2(120)/20^3 = 0.03
∂^2h/∂X_4^2 = 0 .

Using these in the above formula we have

E(Y) ≈ 26 + (1/2)(1^2)(0.24) + (1/2)(1^2)(0.07111111) + (1/2)(1.5^2)(0.03) = 26 + 0.1893056 = 26.18931 .
Exercise 5.95
Part (a-b): To start with let U = αX + βY where X and Y are independent standard normal random variables. Then we have

Cov(U, X) = α Cov(X, X) + β Cov(X, Y) = α σ_X^2 = α ,

and

Var(U) = Cov(U, U) = α^2 σ_X^2 + β^2 σ_Y^2 = α^2 + β^2 .

To have Corr(X, U) = ρ we pick α and β such that

ρ = α/√(α^2 + β^2) .                                    (16)

If we take α = ρ then we have

ρ = ρ/√(ρ^2 + β^2) .

When we solve for β in the above we get β = ±√(1 − ρ^2). Thus the linear combination

U = ρX ± √(1 − ρ^2) Y

will have Corr(X, U) = ρ. Now if α = 0.6 and β = 0.8, using Equation 16 we get

Corr(U, X) = 0.6/√(0.6^2 + 0.8^2) = 0.6 .
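A quick simulation check of this construction (with an assumed ρ = 0.6 and an arbitrary seed):

set.seed(42)
rho = 0.6; N = 1e5
x = rnorm(N); y = rnorm(N)
u = rho * x + sqrt(1 - rho^2) * y
cor( x, u )   # close to 0.6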
Tests of Hypotheses Based on a Single Sample
Problem Solutions
Exercise 8.1
To be a statistical hypothesis the statement must be an assumption about the value of a
single parameter or several parameters from a population.
Part (b): This is not, since x̃ is the sample median and not a population parameter.

Part (c): This is not, since s is the sample standard deviation and not a population parameter.

Part (e): This is not, since X̄ and Ȳ are sample means and not population parameters.
Exercise 8.2
For this book's purpose the hypothesis H_0 will always be an equality claim while H_a will look like one of the following

H_a : θ > θ_0
H_a : θ < θ_0
H_a : θ ≠ θ_0 .
Part (a): Yes.
Part (b): No since Ha is a less than or equal statement.
Part (c): No since H0 is not an equality statement.
Part (d): Yes.
Part (e): No since S1 and S2 are not population statistics.
Part (f): No as Ha is not of the correct form.
Part (g): Yes.
Part (h): Yes.
Exercise 8.3
If we reject H0 we want to be certain that µ > 100 since that is the requirement.
Exercise 8.4
With the hypothesis test

H_0 : µ = 5
H_a : µ > 5 ,

a type I error would classify the water as contaminated when it is not. This does not seem like a very dangerous error. A type II error will have us failing to reject H_0 when we should, i.e. we classify the contaminated water as clean. This second error seems more dangerous.

With the hypothesis test

H_0 : µ = 5
H_a : µ < 5 ,

under a type I error we classify the water as safe when it is not (and this is the dangerous error). A type II error will have us failing to classify safe water as safe when it is.

In general it is easier to specify a fixed value of α (the probability of a type I error) rather than β (the probability of a type II error). Thus we would prefer to use the second test where we could make α very small (resulting in very few waters classified as safe when they are not).
Exercise 8.5
We would test the hypothesis that
H0 : σ = 0.05
Ha : σ < 0.05 .
A type I error will reject H0 when it is true and thus conclude that the standard deviation
is smaller than 0.05 when it is not. A type II error will fail to reject H0 when it is not true
or we will fail to notice that the standard deviation is smaller than 0.05 when it is.
Exercise 8.6
The hypothesis test we would specify would be

H_0 : µ = 40
H_a : µ ≠ 40 .

The type I error would be to reject H_0 when it is true, i.e. to state that the manufacturing process is producing fuses that are not in specification when in fact they are. A type II error would be to accept H_0 when it is not true, i.e. when H_a is true. This would mean that we are producing fuses outside specifications and we don't detect this.
Exercise 8.7
For this problem a type I error would indicate that we assume that the mean water temperature is too hot when in fact it is not. This would result in attempts to cool the water and
result in even cooler water. A type II error would result in failing to reject H0 when in fact
the water is too hot. Thus we could be working with water that is too hot and never know
it. This second error would seem more serious.
Exercise 8.8
Let µ_r and µ_s be the average warpage for the regular and the special laminate. Then we hope that µ_s < µ_r and our hypothesis test could be

H_0 : µ_r − µ_s = 0
H_a : µ_r − µ_s > 0 .

For this problem a type I error would indicate that we assume that the warpage under the new laminate is less when in fact it is not. Thus we would switch to the new laminate when it is not actually better. A type II error would result in failing to reject H_0 when in fact the new laminate is better.
Exercise 8.9
Part (a): Since we need a two sided test we would select R1 .
Part (b): A type I error is to conclude that the proportion of customers favors one cable
company over another. A type II error is to conclude that there is no favored company when
in fact there is.
Part (c): When H_0 is true we have X ∼ Bin(25, 0.5) and we have

α = Σ_{x∈R_1} Pr(X = x|H_0) ,

which we computed with the following R code

x = c( 0:7, 18:25 )
sum( dbinom( x, 25, 0.5 ) )
[1] 0.04328525
Part (d): These would be computed with
x = 8:17
c( sum( dbinom( x, 25, 0.3 ) ), sum( dbinom( x, 25, 0.4 ) ),
sum( dbinom( x, 25, 0.6 ) ), sum( dbinom( x, 25, 0.7 ) ) )
[1] 0.4881334 0.8452428 0.8452428 0.4881334
Part (e): According to R1 we would reject H0 in favor of Ha .
Exercise 8.10
Part (a): The hypothesis test we would specify would be
H0 : µ = 1300
Ha : µ > 1300 .
¯ is normal with a mean of 1300 and a standard deviation of
Part (b): When H0 is true X
60
√
= 13.41641. With this we find
20
α = P {¯
x ≥ 1331.26|H0} =
1331.26 − 1300
1331.26 − 1300
x¯ − 1300
= 0.009903529 ,
≥
|H0 = 1 − Φ
=P
13.41641
13.41641
13.41641
or about 1%.
¯ is normal with a mean of 1350 and a standard deviation of
Part (c): In this case X
60
√
= 13.41641. Following the same steps as above we find
20
1331.26 − 1350
.
1 − β = P {¯
x ≥ 1331.26|Ha} = 1 − Φ
13.41641
This gives β = 0.08123729.
Part (d): We could change the critical value xc to be such that
α = 0.05 = P{x̄ ≥ xc | H0} = P{(x̄ − 1300)/13.41641 ≥ (xc − 1300)/13.41641 | H0} = 1 − Φ((xc − 1300)/13.41641) .
Solving for xc gives xc = 1322.068. This would make β smaller since we will be rejecting H0
more often.
Part (e): We would put X̄ = 1331.26 into the expression for Z, where we get a rejection
region {z ≥ 2.329359}.
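These calculations can be checked with a short R sketch (all numbers are the ones used above; nothing new is assumed):

se = 60 / sqrt( 20 )                         # standard error of the sample mean, 13.41641
alpha = 1 - pnorm( (1331.26 - 1300)/se )     # Part (b): 0.009903529
beta  = pnorm( (1331.26 - 1350)/se )         # Part (c): 0.08123729
x_c   = 1300 + qnorm( 1 - 0.05 ) * se        # Part (d): 1322.068
c( alpha, beta, x_c )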
Exercise 8.11
Part (a): The hypothesis test we would specify would be
H0 : µ = 10
Ha : µ 6= 10 .
Part (b): This would be
α = P(x̄ ≤ 9.8968 | H0) + P(x̄ ≥ 10.1032 | H0) = Φ((9.8968 − 10)/0.04) + (1 − Φ((10.1032 − 10)/0.04)) = 0.009880032 .
Part (c): If µ = 10.1 then we have
β = P(9.8968 ≤ x̄ ≤ 10.1032) = P((9.8968 − 10.1)/0.04 ≤ Z ≤ (10.1032 − 10.1)/0.04) = Φ((10.1032 − 10.1)/0.04) − Φ((9.8968 − 10.1)/0.04) = 0.5318812 .
The same manipulations for µ = 9.8 give β = 0.007760254.
Part (d): We define z ≡ (x̄ − 10)/0.04 and we want to translate the critical region in Part (b) in
terms of x̄ into a critical region in terms of z. Letting x̄ equal the two values in the critical
region of Part (b) we get the numerical values −2.58 and +2.58, thus c = 2.58.
Part (e): We would now want to find a value of c such that
1 − α = P{−c ≤ z ≤ +c} ,
or if α = 0.05 we can write this as
0.95 = Φ(c) − Φ(−c) = (1 − Φ(−c)) − Φ(−c) = 1 − 2Φ(−c) .
When we solve this for c we find c = 1.959964. Then using this value for c and n = 10 we
get critical values on x̄ given by
10 ± (0.2/√10)(1.959964) = {9.876041 , 10.123959} .
Our rejection region would then be to reject H0 if either x̄ ≥ 10.123959 or x̄ ≤ 9.876041.
Part (f): For the given data set we compute x¯ = 10.0203. Since this is not in the rejection
region we have no evidence to reject H0 .
Part (g): This would be to reject H0 if either z ≥ +2.58 or z ≤ −2.58.
Exercise 8.12
Part (a): Our hypothesis test would be
H0 : µ = 120
Ha : µ < 120 .
Part (b): We reject H0 if X̄ is sufficiently small, which will happen in region R2 .
Part (c): We have
α = P(x̄ ≤ 115.20 | H0) = Φ((115.20 − 120)/(10/6)) = 0.001988376 .
To have a test with α = 0.001 we would want to pick a different critical value xc, i.e. we
pick xc such that
Φ((xc − 120)/(10/6)) = 0.001 .
Solving for xc in the above gives xc = 114.8496.
Part (d): This would be the value of β where
1 − β = P(x̄ ≤ 115.20 | Ha) = Φ((115.20 − 115)/(10/6)) = 0.5477584 .
Thus β = 0.4522416.
Part (e): These would be (using R notation)
α = pnorm(−2.33) = 0.0099 ≈ 0.01
α = pnorm(−2.88) = 0.00198 ≈ 0.002 .
Exercise 8.13
Part (a): We compute
α = P(x̄ > µ0 + 2.33(σ/√n) | H0) = 1 − Φ(2.33) = 0.009903 ≈ 0.01 .
Part (b): This would be
α = P(x̄ ≥ µ0 + 2.33(σ/√n) | µ = 99) = P((x̄ − µ)/(σ/√n) ≥ 2.33 + (µ0 − µ)/(σ/√n)) = 1 − Φ(2.33 + (µ0 − µ)/(σ/√n)) .
When µ0 = 100, n = 25, σ = 5 and µ = 99 the above gives 0.0004342299. When µ = 98 the
above gives 7.455467·10^−6. If the actual µ is less than µ0 we have (µ0 − µ)/(σ/√n) > 0 so
α in the above formula gets smaller. This makes sense since we are less likely to get a large
reading for X̄ and less likely to reject H0 .
Exercise 8.14
Part (a): We have
α = P(z ≤ −2.65 or z ≥ 2.51 | H0)
= Φ(−2.65) + (1 − Φ(2.51)) = 0.01006115 .
Part (b): The probability we don’t reject H0 given that H0 is not true is given by
β = P (9.894 ≤ x¯ ≤ 10.1004|µ = 10.1)
9.894 − 10.1
10.1004 − 10.1
−Φ
= 0.5031726 .
=Φ
(0.2/5)
(0.2/5)
Exercise 8.15
Using R notation we have
Part (a):
α = P (z ≥ 1.88) = 1 − pnorm(1.88) = 0.03005404 .
Part (b):
α = P (z ≤ −2.75) = pnorm(−2.75) = 0.002979763 .
Part (c):
α = P (z ≤ −2.88 or z ≥ 2.88) = Φ(−2.88) + (1 − Φ(2.88)) = 0.003976752 .
Exercise 8.16
Using R notation we have
Part (a):
α = P (t ≥ 3.733) = 1 − pt(3.733, 15) = 0.0009996611 .
Part (b):
α = P (t ≤ −2.5) = pt(−2.5, 23) = 0.009997061 .
Part (c):
α = P (t ≤ −1.697 or t ≥ 1.697) = pt(−1.697, 30) + (1 − pt(1.697, 30)) = 0.1000498 .
Exercise 8.17
Part (a): When x̄ = 30960 we have z = (30960 − 30000)/(1500/√16) = 2.56. As the rejection region for this
test is z ≥ zα = z0.01 = 2.33 we can reject H0 in favor of Ha .
Part (b): Since z0.01 = 2.33, from the formulas in the book we have
β = Φ(zα + (µ0 − µ′)/(σ/√n)) = Φ(2.33 + (30000 − 30500)/(1500/√16)) = 0.8405368 .
Part (c): From the formulas in the text we would have zβ = 1.644854 and n given by
n = (σ(zα + zβ)/(µ0 − µ′))² = (1500(2.33 + 1.644)/(30000 − 30500))² = 142.1952 ,
so we would need to take n = 143.
Part (d): For this given value of x̄ we had z = 2.56; using R we find a P-value given by
0.005233608, indicating that we should reject H0 for any α larger than this value.
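As a check, here is a sketch of these computations in R; it uses the exact qnorm values rather than the rounded table values 2.33 and 1.644 used above, so the last digits differ slightly:

z = ( 30960 - 30000 ) / ( 1500/sqrt(16) )                             # Part (a): 2.56
beta = pnorm( qnorm(1-0.01) + ( 30000 - 30500 )/( 1500/sqrt(16) ) )   # Part (b)
n = ( 1500*( qnorm(1-0.01) + qnorm(1-0.05) )/( 30000 - 30500 ) )^2    # Part (c)
p_value = 1 - pnorm( z )                                              # Part (d): 0.005233608
c( z, beta, ceiling(n), p_value )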
Exercise 8.18
Part (a): Here we have z = (x̄ − µ)/(σ/√n) = (72.3 − 75)/(9/5) = −1.5.
Part (b): Since zα = 2.326348 and z > −zα we cannot reject H0 in favor of Ha .
Part (c): Using the R command pnorm(-2.88) we find α = 0.001988376.
Part (d): Using the formulas in the book we have
β = 1 − Φ(−zα + (µ0 − µ′)/(σ/√n)) = 1 − Φ(−2.88 + (75 − 70)/(9/5)) = 0.5407099 .
Part (e): We would first compute zβ = qnorm(1 − 0.01) = 2.326348 and then
n = (σ(zα + zβ)/(µ0 − µ′))² = (9(2.88 + 2.326348)/(75 − 70))² = 87.82363 .
Thus we would take n = 88.
Part (f): We would compute
α = P(z ≤ −zα | H0), i.e. the probability we reject H0 given that it is true,
= P((x̄ − µ0)/(σ/√n) ≤ −zα | H0)
= P(x̄ ≤ µ0 − (σ/√n) zα | H0)
= P((x̄ − µ)/(σ/√n) ≤ −zα + (µ0 − µ)/(σ/√n) | H0)
= Φ(−zα + (µ0 − µ)/(σ/√n)) = Φ(−2.326 + (75 − 76)/(9/5)) = 0.001976404 .
It makes sense that this probability is small since as µ gets larger it is less likely that z ≤ −zα .
Exercise 8.19
Part (a): We first need to compute zα/2 = z0.005 = 2.575829. Now z =
As |z| ≤ zα/2 we cannot reject H0 in favor of Ha .
x
¯−95
1.2/4
= −2.266667.
Part (b): From the formulas in the book we have
β(µ) = Φ(zα/2 + (95 − 94)/(1.2/4)) − Φ(−zα/2 + (95 − 94)/(1.2/4)) = 0.224374 .
Part (c): From the formulas in the book we first need to compute zβ = qnorm(1 − 0.1) =
1.281552. Then for the value of n we have
n = (σ(zα/2 + zβ)/(µ0 − µ′))² = (1.2(zα/2 + zβ)/(95 − 94))² = 21.42632 .
Thus we would want to take n = 22.
Exercise 8.20
Note that the p-value of this test is 0.016 which is less than 0.05 so we can reject H0 in
favor of Ha . This p-value is not smaller than 0.01 and thus we cannot reject H0 at the 1%
significance level.
Exercise 8.21
We will assume that the hypothesis test we are performing is
H0 : µ = 0.5
Ha : µ ≠ 0.5 .
Part (a-b): Note that using R notation we have
tα/2 = qt(1 − 0.5(0.05), 12) = 2.178813 .
Since |t| < tα/2 we cannot reject H0 in favor of Ha . We thus conclude that the ball bearings
are manufactured correctly.
Part (c): In this case we have
tα/2 = qt(1 − 0.5(0.01), 24) = 2.79694 .
Since |t| < tα/2 we again cannot reject H0 in favor of Ha .
Part (d): In this case |t| > tα/2 so we can reject H0 in favor of Ha .
Exercise 8.22
We will assume that the hypothesis test we are performing is
H0 : µ = 200
Ha : µ ≠ 200 .
Part (a): From the given box plot it looks like the average coating weight is greater than
200.
Part (b): We compute z = (206.73 − 200)/1.16 = 5.801724 which has a P-value of 6.563648·10^−9. To
compute this we used the following R code
z = ( 206.73 - 200 ) / 1.16
p_value = pnorm( -z ) + ( 1 - pnorm( z ) )
Exercise 8.23
The hypothesis test we are performing is
H0 : µ = 6(60) = 360 seconds
Ha : µ > 360 seconds .
From the numbers in the problem we compute x¯ = 370.69 > 360 which indicates that
from this sample the response time might be greater than six minutes. Since we are given
the sample standard deviation, we will assume that we don't know the population standard
deviation and should be working with the t-distribution. We compute t = (x̄ − 360)/(24.36/√26) =
2.237624. This has a P-value given by 0.01719832. Since this is less than the value of 0.05
we should reject H0 in favor of Ha ; this result contradicts the prior belief.
Exercise 8.24
The hypothesis test we are performing is
H0 : µ = 3000
Ha : µ ≠ 3000 .
From the data in the problem we compute the sample mean x̄ and sample standard deviation s.
Since we are given a sample of data we will assume that we don't know the population standard
deviation and should be working with the t-distribution. We compute t = (x̄ − 3000)/(s/√5) =
−2.991161. This has a P-value given by 0.02014595. Since this is less than the value of 0.05
we should reject H0 in favor of Ha at the 5% level. Thus we have evidence that the true
average viscosity is not 3000.
Exercise 8.25
Part (a): The hypothesis test we would perform is
H0 : µ = 5.5
Ha : µ ≠ 5.5 .
Since we assume we know the population standard deviation we will work with the normal
distribution (and not the t-distribution). From the numbers given we compute z = (x̄ − 5.5)/(0.3/√16) =
−3.333333. This has a P-value given by 0.0008581207. Since this is less than the value of
0.01 we should reject H0 in favor of Ha indicating that the true average differs from 5.5.
Part (b): From the formulas in the text we have
β(µ′) = Φ(zα/2 + (µ0 − µ′)/(σ/√n)) − Φ(−zα/2 + (µ0 − µ′)/(σ/√n)) .
Using R we compute
zα/2 = qnorm(1 − 0.5(0.01)) = 2.575829 .
Using this value we have
β(5.6) = Φ(zα/2 + (5.5 − 5.6)/(0.3/√16)) − Φ(−zα/2 + (5.5 − 5.6)/(0.3/√16)) = 0.8929269 .
Since this is the type II error probability, the probability we detect a difference is one minus
this amount or 0.1070731.
Part (c): From the formulas in the text we have
n = (σ(zα/2 + zβ)/(µ0 − µ′))² .
Using the numbers in this problem we get n = 216.2821, thus we should take n = 217.
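A minimal R sketch of Parts (b) and (c) follows; the type II error target β = 0.01 used in the sample size formula is inferred from the computed n = 216.2821 above:

mu0 = 5.5; mu1 = 5.6; sigma = 0.3; n = 16
z_a2 = qnorm( 1 - 0.01/2 )
d = ( mu0 - mu1 )/( sigma/sqrt(n) )
beta = pnorm( z_a2 + d ) - pnorm( -z_a2 + d )         # Part (b): 0.8929269
z_b = qnorm( 1 - 0.01 )                               # assumed beta target of 0.01
n_needed = ( sigma*( z_a2 + z_b )/( mu0 - mu1 ) )^2   # Part (c): 216.2821
c( beta, 1 - beta, ceiling( n_needed ) )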
Exercise 8.26
The hypothesis test we will perform is
H0 : µ = 50
Ha : µ > 50 .
Since we are only told the sample standard deviation we will use the t-distribution (rather
than the normal one) to compute the P-values. We find t = 3.773365 which has a P-value
of 0.0002390189. Since this is smaller than 0.01 we can reject H0 in favor of Ha
at the 1% level.
Exercise 8.27
Part (a-b): A plot of the histogram of the data shows a distribution with a single peak and
a longish right tail. All of the theorems in this text are proven assuming normality conditions,
but using the normal-theory results on non-normal distributions may still be approximately correct,
especially when n is large.
Part (c): The hypothesis test we will perform is
H0 : µ = 1
Ha : µ < 1 .
Using the data in this exercise we compute t = −5.790507; this has a P-value of 2.614961·10^−7,
thus there is strong evidence against H0 and we should reject it in favor of Ha .
Part (d): A 95% confidence interval for µ is given by
x̄ ± tα/2,n−1 (s/√n) = {0.6773241, 0.8222677} .
Exercise 8.28
The hypothesis test we would perform is
H0 : µ = 20
Ha : µ < 20 .
As we don’t know the population standard deviation we will use the t-distribution. From the
x
¯−20
√
= −1.132577. This has a P-value given
numbers given in this exercise we compute t = 8.6/
73
by 0.1305747 indicating that the evidence of lateral recumbency time is not inconsistent with
H0 .
Exercise 8.29
Part (a): The hypothesis test we would perform is
H0 : µ = 3.5
Ha : µ > 3.5 .
As we don’t know the population standard deviation we will use hypothesis tests based
x
¯−3.5
√ =
on the t-distribution. From the numbers given in this exercise we compute t = 1.25/
8
0.4978032. This has a P-value given by 0.316939 indicating that we cannot reject H0 at the
5% significance level.
Part (b): From the formulas in the book we have
β(µ′) = Φ(zα + (µ0 − µ′)/(σ/√n)) = Φ(1.644854 + (3.5 − 4)/(1.25/√8)) = 0.6961932 ,
note that this result is slightly different from that given in the back of the book.
Exercise 8.30
The hypothesis test we would perform is
H0 : µ = 15
Ha : µ < 15 .
As we don’t know the population standard deviation we will use hypothesis tests based on
x
¯−15
√
the t-distribution. From the numbers given in this exercise we compute t = 6.43/
=
115
−9
−6.170774. This has a P-value given by 5.337559 10 indicating that we can reject H0 at
most significance levels.
Exercise 8.31
The hypothesis test we would perform is
H0 : µ = 7
Ha : µ < 7 .
As we don’t know the population standard deviation we will use hypothesis tests based on the
x
¯−7√
= −1.236364.
t-distribution. From the numbers given in this exercise we compute t = 1.65/
9
This has a P-value given by 0.1256946 indicating that we cannot reject H0 at the 10%
significance level.
Exercise 8.32
Part (a): The hypothesis test we would perform is
H0 : µ = 100
Ha : µ ≠ 100 .
As we don’t know the population standard deviation we will use hypothesis tests based
¯−100
√
=
on the t-distribution. From the numbers given in this exercise we compute t = xs/
12
−0.9213828. This has a P-value given by 0.3766161 indicating that we cannot reject H0 at
the 5% significance level.
Part (b): Using the formula in the book we have
n = (σ(zα/2 + zβ)/(µ0 − µ′))² .
We can compute zα/2 and zβ using the R code
z_alpha_over_two = qnorm( 1-0.05/2 ) # gives 1.959964
z_beta = qnorm( 1 - 0.1 ) # gives 1.281552
Then we compute
n = (7.5(1.959964 + 1.281552)/(100 − 95))² = 23.6417 .
Thus we would take n = 24.
Exercise 8.33
From the formula in the book we have
β(µ′) = Φ(zα/2 + (µ0 − µ′)/(σ/√n)) − Φ(−zα/2 + (µ0 − µ′)/(σ/√n)) .
Using this we evaluate
β(µ0 − ∆) = Φ(zα/2 + ∆/(σ/√n)) − Φ(−zα/2 + ∆/(σ/√n))
β(µ0 + ∆) = Φ(zα/2 − ∆/(σ/√n)) − Φ(−zα/2 − ∆/(σ/√n)) .
Since Φ(c) = 1 − Φ(−c), by applying this twice we can write β(µ0 + ∆) as
β(µ0 + ∆) = (1 − Φ(−zα/2 + ∆/(σ/√n))) − (1 − Φ(zα/2 + ∆/(σ/√n)))
= Φ(zα/2 + ∆/(σ/√n)) − Φ(−zα/2 + ∆/(σ/√n)) = β(µ0 − ∆) ,
as we were to show.
Exercise 8.34
Consider the case where Ha : µ > µ0; then we must have µ′ > µ0 to make a type II error. In
this case as n → ∞ we have
Φ(zα + √n (µ0 − µ′)/σ) → Φ(−∞) = 0 ;
the other cases are shown in the same way.
Exercise 8.35
The hypothesis test to perform is
H0 : p = 0.7
Ha : p ≠ 0.7 .
With n = 200 and p̂ = 124/200 = 0.62 we compute
z = (p̂ − p0)/√(p0(1 − p0)/n) = −2.468854 ,    (17)
which has a P-value of 0.01355467 indicating we should reject H0 at the 5% level.
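Since Equation 17 is used repeatedly in the exercises that follow, a small R helper may be useful (the function name is mine; the numbers are this exercise's):

prop_z_test = function( x, n, p0 ){
  p_hat = x / n
  z = ( p_hat - p0 ) / sqrt( p0 * ( 1 - p0 ) / n )
  c( z=z, p_two_sided = 2 * ( 1 - pnorm( abs(z) ) ) )
}
prop_z_test( 124, 200, 0.7 )   # z = -2.468854, P-value = 0.01355467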
Exercise 8.36
Part (a): The hypothesis test to perform is
H0 : p = 0.1
Ha : p > 0.1 .
With n = 100 and pˆ = 0.14 we compute z in Equation 17 and get z = 1.333333 which has
a P-value of 0.09121122. This does not provide compelling evidence that more than 10% of
all plates blister under similar circumstances.
Part (b): In this case we have p′ = 0.15 and so using the formulas in the book we need to
compute
β(p′) = Φ((p0 − p′ + zα √(p0(1 − p0)/n)) / √(p′(1 − p′)/n)) .    (18)
When we do that we find β = 0.4926891; if n = 200 we find β = 0.274806.
Part (c): We have zβ = qnorm(1 − 0.01) = 2.326348 and then n is given by
n = [(zα √(p0(1 − p0)) + zβ √(p′(1 − p′))) / (p′ − p0)]² = 701.3264 .
Rounding upwards we would take n = 702.
Exercise 8.37
Part (a): The hypothesis test to perform is
H0 : p = 0.4
Ha : p ≠ 0.4 .
With n = 150 and p̂ = 82/150, note that p0 n = 60 > 10 and (1 − p0)n = 90 > 10 so we
can use large sample asymptotics for this problem. We compute z in Equation 17 and get
z = 3.666667 which has a P-value of 0.0002457328. This does provide compelling evidence
that we can reject H0 .
Exercise 8.38
Part (a): The hypothesis test to perform is
H0 : p = 2/3
Ha : p ≠ 2/3 .
With n = 124 and p̂ = 80/124, note that p0 n = 82.66667 > 10 and (1 − p0)n = 41.33333 > 10
so we can use large sample asymptotics for this problem. We compute z in Equation 17
and get z = −0.5080005 which has a P-value of 0.611453. This does not provide compelling
evidence that we can reject H0 .
Part (b): With a P-value that large we cannot reject H0 and the value of 2/3 is plausible
for kissing behavior.
Exercise 8.39
Part (a): The hypothesis test to perform is
H0 : p = 0.02
Ha : p < 0.02 .
With n = 1000 and p̂ = 15/1000 = 0.015, note that p0 n = 20 > 10 and (1 − p0)n = 980 > 10
so we can use large sample asymptotics for this problem. We compute z in Equation 17
and get z = −1.129385 which has a P-value of 0.1293678. This does not provide compelling
evidence that we can reject H0 .
Part (b-c): Now we assume p′ = 0.01 and use the formula in the book to compute β(p′), the
probability we take the inventory when we don't need to, i.e. we fail to reject H0 . The
formula we use is given by
β(p′) = 1 − Φ((p0 − p′ − zα √(p0(1 − p0)/n)) / √(p′(1 − p′)/n)) .    (19)
With the numbers from this problem we compute β(0.01) = 0.1938455. In the same way we
compute 1 − β(0.05) = 3.160888·10^−8, which is the probability that when p = 0.05 we will
reject H0 in favor of Ha .
Exercise 8.40
The hypothesis test to perform is
H0 : p = 0.25
Ha : p ≠ 0.25 .
With n = 1050 and p̂ = 177/1050 = 0.1685714, note that p0 n = 262.5 > 10 and (1 − p0)n =
787.5 > 10 so we can use large sample asymptotics for this problem. We compute z in
Equation 17 and get z = −6.093556 which has a P-value of 1.104295·10^−9. This does provide
compelling evidence that we can reject H0 .
Exercise 8.41
Part (a): The hypothesis test to perform is
H0 : p = 0.05
Ha : p ≠ 0.05 .
With n = 500 and p̂ = 40/500 = 0.08, note that p0 n = 25 > 10 and (1 − p0)n = 475 > 10 so we
can use large sample asymptotics for this problem. We compute z in Equation 17 and get
z = 3.077935 which has a P-value of 0.002084403. This does provide compelling evidence
that we can reject H0 at the 1% level.
Part (b): We use the formula from the book to compute
β(p′) = Φ((p0 − p′ + zα/2 √(p0(1 − p0)/n)) / √(p′(1 − p′)/n)) − Φ((p0 − p′ − zα/2 √(p0(1 − p0)/n)) / √(p′(1 − p′)/n)) .    (20)
From the numbers given in this problem we compute β(0.1) = 0.03176361.
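For reference, a sketch of Equation 20 in R for this exercise (α = 0.01 is assumed from the 1% level used in Part (a)):

p0 = 0.05; pp = 0.1; n = 500
z_a2 = qnorm( 1 - 0.01/2 )
se0 = sqrt( p0*(1-p0)/n ); sep = sqrt( pp*(1-pp)/n )
pnorm( (p0 - pp + z_a2*se0)/sep ) - pnorm( (p0 - pp - z_a2*se0)/sep )   # 0.03176361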
Exercise 8.42
Part (a): The hypothesis test to perform is
H0 : p = 0.5
Ha : p ≠ 0.5 .
To test this we would use the third region, where "extreme" values of X in either direction
(small or large) indicate departures from p = 0.5.
Part (b): We would compute
α = P{ Reject H0 | H0 is true } = Σ_{k=0}^{3} (20 choose k) (1/2)^k (1/2)^{20−k} + Σ_{k=17}^{20} (20 choose k) (1/2)^k (1/2)^{20−k} = 0.002576828 .
We computed the above with the following R code
sum( c( dbinom( 0:3, 20, 0.5 ), dbinom( 17:20, 20, 0.5 ) ) )
Since this value is less than 0.05 we could use it as a 5% level test. To be the best 5%
level test we would want the largest rejection region such that α < 0.05. Trying to make
the above rejection region larger yields α > 0.05 so we conclude that this is the best 5%
rejection region.
Part (c): We would compute
β = P{ Accept H0 | Ha is true } = Σ_{k=4}^{16} (20 choose k) 0.6^k 0.4^{20−k} = 0.9839915 .
We computed the above with the following R code
sum( dbinom( 4:16, 20, 0.6 ) )
In the same way we find β(0.8) = 0.5885511.
Part (d): For α = 0.1 the rejection region should be {0, 1, 2, 3, 4, 5, 15, 16, 17, 18, 19, 20}.
Since 13 is not in this rejection region we do not reject H0 in favor of Ha .
Exercise 8.43
The hypothesis test to perform is
H0 : p = 0.1
Ha : p > 0.1 .
The probability of not proceeding when p = 0.1 is the value of α and we want α ≤ 0.1. This
means that we must pick a decision threshold c such that
α = P{ Reject H0 | H0 is true } = Σ_{k=c}^{n} (n choose k) 0.1^k 0.9^{n−k} ≤ 0.1 .
In addition we are told if p = 0.3 the probability of proceeding should be at most 0.1 which
means that
β = P{ Accept H0 | Ha is true } = Σ_{k=0}^{c−1} (n choose k) 0.3^k 0.7^{n−k} ≤ 0.1 .
We can find if a value of c exists for given values of n by looping over all possible values for
c and printing the ones that satisfy the above conditions. We can do this with the following
R code
search_for_critical_value = function( n=10 ){
  # print every threshold c with alpha <= 0.1 (when p = 0.1) and beta <= 0.1 (when p = 0.3)
  for( thresh in 0:n ){
    alpha = sum( dbinom( thresh:n, n, 0.1 ) )
    if( alpha <= 0.1 ){
      beta = sum( dbinom( 0:(thresh-1), n, 0.3 ) )
      if( beta <= 0.1 ){
        print( c( thresh, alpha, beta ) )
      }
    }
  }
}
Using the above function I find that the call search_for_critical_value(25) gives the
output
[1] 5.00000000 0.09799362 0.09047192
Thus the value of c should be 5 and with this value we have α = 0.09799362 and β =
0.09047192.
Exercise 8.44
The hypothesis test to perform is
H0 : p = 0.035
Ha : p < 0.035 .
With n = 500 and p̂ = 15/500 = 0.03, note that p0 n = 17.5 > 10 and (1 − p0)n = 482.5 > 10
so we can use large sample asymptotics for this problem. We compute z in Equation 17 and
get z = −0.6083553 which has a P-value of 0.2714759. This does not provide compelling
evidence that we can reject H0 at the 1% level.
Exercise 8.45
The null is rejected if the P-value is small; in this case we will reject when the P-value
is less than 0.05. Thus we will reject for (a), (b), and (d).
Exercise 8.46
We would reject H0 in cases (c), (d), and (f).
Exercise 8.47
We can compute the P-value in each of these cases using R with the expression 1-pnorm(z).
For the given values we compute
z_values = c( 1.42, 0.9, 1.96, 2.48, -0.11 )
p_values = 1 - pnorm( z_values )
p_values
[1] 0.077803841 0.184060125 0.024997895 0.006569119 0.543795313
Exercise 8.48
We can compute the P-value in each of these cases using R with the expression 2(1-pnorm(abs(z))).
For the given values we compute
z_values = c( 2.10, -1.75, -0.55, 1.41, -5.3 )
p_values = 2 * ( 1 - pnorm( abs(z_values) ) )
p_values
[1] 3.572884e-02 8.011831e-02 5.823194e-01 1.585397e-01 1.158027e-07
Exercise 8.49
We can evaluate the P-values using the following R code
c( 1-pt(2.0,8), pt(-2.4,11), 2*(1-pt(1.6,15)), 1-pt(-0.4,19), 1-pt(5,5), 2*(1-pt(4.8,40)) )
[1] 4.025812e-02 1.761628e-02 1.304450e-01 6.531911e-01 2.052358e-03 2.234502e-05
Exercise 8.50
Using the R code we compute
c( 1-pt(3.2,15), 1-pt(1.8,9), 1-pt(-0.2,24) )
[1] 0.002981924 0.052695336 0.578417211
As the first number is less than 0.05 we can reject H0 at the 5% level; as the second number
is larger than 0.01 we cannot reject H0 at the 1% level; as the third number is so large we
would not reject H0 at any reasonable level.
Exercise 8.51
Since our P-value is larger than 0.1 it is larger than 0.01. In each case we cannot reject H0
in favor of Ha at this significance level either.
Exercise 8.52
The hypothesis test to perform is
H0 : p = 1/3
Ha : p ≠ 1/3 .
With n = 855 and p̂ = 346/855 = 0.4046784, note that p0 n = 285 > 10 and (1 − p0)n = 570 > 10
so we can use large sample asymptotics for this problem. We compute z in Equation 17 and
get z = 4.425405 which has a P-value of 4.813073·10^−6. A P-value this small indicates that
we should reject H0 and that there seems to be evidence of ability to distinguish between
reserve and regular wines.
Exercise 8.53
The hypothesis test to perform is to assume the null hypothesis that the pills each weigh 5
grains or
H0 : µ = 5
Ha : µ ≠ 5 .
We compute t = (x̄ − µ0)/(s/√n) = −3.714286 which has a P-value of 0.0003372603. A P-value this
small indicates that we should reject H0 and that there seems to be evidence that the pills
are smaller than they should be.
Exercise 8.54
The hypothesis test to perform is
H0 : p = 0.2
Ha : p > 0.2 .
Part (a): With n = 60 and p̂ = 15/60 = 0.25, note that p0 n = 12 > 10 and (1 − p0)n = 48 > 10
so we can use large sample asymptotics for this problem. We compute z in Equation 17 and
get z = 0.9682458 which has a P-value of 0.1664608. A P-value this large indicates that we
don't have enough evidence to reject H0 and we don't need to modify the manufacturing
process.
Part (b): We want to compute 1 − β where β is given by Equation 18. When we use that
with the numbers from this problem we get 1 − β = 0.995158.
Exercise 8.55
The hypothesis test to perform is
H0 : p = 0.5
Ha : p < 0.5 .
With n = 102 and p̂ = 47/102 = 0.4607843, note that p0 n = (1 − p0)n = 51 > 10 so we
can use large sample asymptotics for this problem. We compute z in Equation 17 and get
z = −0.792118 which has a P-value of 0.2141459. A P-value this large indicates that we
don't have enough evidence to reject H0 .
Exercise 8.56
The hypothesis test to perform is
H0 : µ = 3
Ha : µ ≠ 3 .
We compute z = (x̄ − 3)/0.295 = −1.759322 which has a P-value of 0.07852283. Since this is smaller
than 0.1 we would reject H0 at the 10% level. Since it is not smaller than 0.05 we would not reject
H0 at the 5% level.
Exercise 8.57
The hypothesis test to perform is
H0 : µ = 25
Ha : µ > 25 .
Using the numbers in this problem we compute x̄ = 27.923077, s = 5.619335, and have
n = 13. This gives t = (x̄ − µ0)/(s/√n) = 1.875543 which has a P-value of 0.04262512. Since this
is smaller than 0.05 we would reject H0 and conclude that the mean response time seems
greater than 25 seconds.
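The same computation in R, using only the summary statistics above:

xbar = 27.923077; s = 5.619335; n = 13
t = ( xbar - 25 )/( s/sqrt(n) )   # 1.875543
1 - pt( t, n - 1 )                # P-value = 0.04262512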
Exercise 8.58
Part (a): The hypothesis test to perform is
H0 : µ = 10
Ha : µ ≠ 10 .
Part (b-d): Since this is a two sided test the P-values for each part are given by the
following R expressions
c( 2*(1 - pt(2.3,17)), 2*(1 - pt(1.8,17)), 2*(1 - pt(3.6,17)) )
[1] 0.034387033 0.089632153 0.002208904
Thus we would reject H0 , accept H0 , and reject H0 (under reasonable values for α).
Exercise 8.59
The hypothesis test to perform is
H0 : µ = 70
Ha : µ ≠ 70 .
Using the numbers in this problem we compute x̄ = 75.5, s = 7.007139, and have n = 6.
This gives t = (x̄ − µ0)/(s/√n) = 1.922638 which has a P-value of 0.1125473. Since this is larger than
0.05 we would accept H0 and conclude that the spectrophotometer is working correctly.
Exercise 8.60
Since the P-value given by the SAS output is larger than 0.01 and 0.05 we cannot reject H0
at the 1% or the 5% level. Since the P-value is smaller than 0.1 we can reject H0 at the 10%
level.
Exercise 8.61
Part (a): We would compute
β = P{accept H0 | Ha is correct} = P((x̄ − 75)/(σ/√n) > −zα | µ = 74)
= P(x̄ > 75 − zα (9/√n) | µ = 74) = P((x̄ − 74)/(9/√n) > −zα + 1/(9/√n)) .
Taking zα = qnorm(1 − 0.01) = 2.326348 and using the three values of n suggested in the exercise
we compute the above to be
[1] 0.8878620986 0.1569708809 0.0006206686
Part (b): We have z = (74 − 75)/(σ/√n) = −5.555556 which has a P-value given by 1.383651·10^−8.
Yes.
Part (c): This part has to do with the comments in the book about rejecting H0 when the
sample size is large since when n is very large almost any departure will be detected (even
if the departure is not practically significant).
Exercise 8.62
Part (a): Using Equation 18 we would compute for the given values of n
[1] 0.979279599 0.854752242 0.432294184 0.004323811
Part (b): I compute P-values of
[1] 4.012937e-01 1.056498e-01 6.209665e-03 2.866516e-07
Part (c): This has to do with the comments in the book about rejecting H0 when the
sample size is large since when n is very large almost any departure will be detected (even
if the departure is not practically significant).
Exercise 8.63
The hypothesis test to perform is
H0 : µ = 3.2
Ha : µ ≠ 3.2 .
Using the numbers in this problem we compute t = −3.119589 which has a P-value of
0.003031901. Since this is smaller than 0.05 we would reject H0 and conclude that the
average lens thickness is something other than 3.2 mm.
Exercise 8.64
We compute
zα/2 = qnorm(1 − 0.5(0.05)) = 1.959964
zβ = qnorm(1 − 0.05) = 1.644854 ,
then using the formulas in the book for sample size determination to select a value of β we
have
n = (σ(zα/2 + zβ)/(µ0 − µ′))² = (0.3(1.959964 + 1.644854)/(3.2 − 3.0))² = 29.2381 .
Thus we need only around 30 samples.
Exercise 8.65
Part (a): The hypothesis test to perform is
H0 : µ = 0.85
Ha : µ ≠ 0.85 .
Part (b): Since the given P-value is larger than both 0.05 and 0.1 we cannot reject H0 at
either level.
Exercise 8.66
Part (a): The hypothesis test we would perform is
H0 : µ = 2150
Ha : µ > 2150 .
Part (b-c): We would need to compute z = (x̄ − 2150)/(30/4) = 1.333333.
Part (d): The P-value is given by in R 1 − pnorm(z) = 0.09121122.
Part (e): Since the above P-value is larger than 0.05 we cannot reject H0 in favor of Ha .
Exercise 8.67
The hypothesis test we would perform is
H0 : µ = 548
Ha : µ > 548 .
Part (a): We compute z = (587 − 548)/(10/√11) = 12.93484 which has a P-value of essentially zero.
Part (b): We assumed a normal distribution for the errors in the measurement of phosphorus
levels.
Exercise 8.68
The hypothesis test we would perform is
H0 : µ = 29
Ha : µ > 29 .
Part (a): We compute z = (x̄ − µ)/(s/√n) = 0.7742408 which has a P-value given by 0.2193942. Since
this P-value is not smaller than 0.05 we cannot reject H0 in favor of Ha .
Exercise 8.69
Part (a): This distribution does not look normal since there are no negative values in a
distribution that has a mean value around 215 and a standard deviation around 235. With
a standard deviation this large and a normal distribution we would expect some negative
samples.
Part (b): The hypothesis test we would perform is
H0 : µ = 200
Ha : µ > 200 .
We compute z = (x̄ − µ)/(s/√n) = 0.437595 which has a P-value given by 0.33084. Since this P-value
is not smaller than 0.1 we cannot reject H0 in favor of Ha .
Exercise 8.70
From the given output the P-value is 0.043 which is smaller than 0.05 and thus we can reject
H0 in favor of Ha at the 5% level. The P-value is not less than 0.01 and thus we cannot
reject H0 in favor of Ha at the 1% level.
Exercise 8.71
The hypothesis test we would perform for this problem is
H0 : µ = 10
Ha : µ < 10 .
We would have zα = qnorm(1 − 0.01) = 2.326348.
Part (a): We want to compute β(9.5), the probability we don't reject H0 in favor of Ha
when we should. From the formulas in the book we have
β(µ′ = 9.5) = 1 − Φ(−zα + (µ0 − µ′)/(σ/√n)) = 1 − Φ(−2.326348 + (10 − 9.5)/(0.8/√10)) = 0.6368023 .
The same calculation when µ′ = 9.0 gives β = 0.05192175.
Part (b): We first need to compute zβ = qnorm(1 − 0.25) = 0.6744898 since we would have
a 25% chance of making an error. Then from the formulas in the book we calculate
n = (σ(zα + zβ)/(µ0 − µ′))² = (0.8(zα + zβ)/(10 − 9.5))² = 23.05287 .
Thus we would take n = 24. Note this result is slightly different than that given in the back
of the book, which might be due to the book's use of the tables in the Appendix.
Exercise 8.72
The hypothesis test we would perform for this problem is
H0 : µ = 9.75
Ha : µ > 9.75 .
From the data given we compute x̄ = 9.8525 and s = 0.09645697 with n = 20 so that
z = (x̄ − 9.75)/(s/√n) = 4.752315. This has a P-value of 1.005503·10^−6. Thus we would reject H0 in
favor of Ha .
Exercise 8.73
Part (a-b): The hypothesis test we would perform for this problem is
H0 : p = 1/75
Ha : p ≠ 1/75 .
We compute z = (p̂ − p0)/√(p0(1 − p0)/n) = 1.64399. Using the normal distribution this has a P-value of
0.1001783. As this is larger than 0.05 we can not reject H0 at the 5% level. We computed
this using the following R code
this using the following R code
n = 800
p_hat = 16/800
p_0 = 1/75
z = ( p_hat - p_0 ) / sqrt( p_0 * ( 1 - p_0 ) / n )
pnorm( -z ) + ( 1 - pnorm( z ) )
n * p_0
n * (1-p_0)
Note that np0 = 10.66667 > 10 and n(1 − p0) = 789.3333 > 10 so we can use these large
sample tests.
Exercise 8.74
The hypothesis test we would perform for this problem is
H0 : µ = 1.75
Ha : µ > 1.75 .
We compute t = (1.89 − 1.75)/(0.42/√26) = 1.699673. Using the t-distribution this has a P-value of
0.05080173. As this is larger than 0.05 we cannot reject H0 at the 5% level but we could at
the 10% level. We computed this using the following R code
t = ( 1.89 - 1.75 )/( 0.42/sqrt(26) )
1 - pt( t, 25 )
Exercise 8.75
The hypothesis test we would perform for this problem is
H0 : µ = 3200
Ha : µ < 3200 .
We compute z = (3107 − 3200)/(188/√45) = −3.31842. This has a P-value of 0.0004526412. Thus we can
reject H0 at α = 0.001.
Exercise 8.76
The hypothesis test we would perform for this exercise is
H0 : p = 0.75
Ha : p < 0.75 .
In the following R code
n = 72
p_0 = 0.75
p_hat = 42 / 72
z = ( p_hat - p_0 ) / sqrt( p_0 * ( 1 - p_0 ) / n )
p_value = pnorm( z )
We compute z = −3.265986 which has a P-value 0.0005454176. Thus we can reject H0 at
the 1% level and the true proportion of mechanics that could identify the given problem is
likely less than 0.75.
Exercise 8.77
The hypothesis test we would perform for this problem is
H0 : λ = 4
Ha : λ > 4 .
We compute x̄ = 160/36 = 4.444444 and then compute z = (x̄ − 4)/√(4/36) = 1.333333. The P-value for
this is 0.09121122. Thus we can reject H0 at the 10% level but not the 2% level.
Exercise 8.78
Part (a): The hypothesis test we would perform for this exercise is
H0 : p = 0.02
Ha : p > 0.02 .
In the following R code
n = 200
p_0 = 0.02
p_hat = 0.083
z = ( p_hat - p_0 ) / sqrt( p_0 * ( 1 - p_0 ) / n )
p_value = 1 - pnorm( z )
We compute z = 6.363961 which has a P-value 9.830803·10^−11. Thus we can reject H0 at all
reasonable levels and certainly at 5%.
Part (b): For the value of α = 0.05 we have zα = qnorm(1 − 0.05) = 1.644854 and the
formulas in the book give
β(p′) = Φ((p0 − p′ + zα √(p0(1 − p0)/n)) / √(p′(1 − p′)/n)) = 0.1867162 .
Exercise 8.79
The hypothesis test we would perform for this problem is
H0 : µ = 15
Ha : µ > 15 .
We compute t = (17.5 − 15)/(2.2/√32) = 6.428243. The P-value for this is 1.824991·10^−7. Thus we can
(easily) reject H0 at the 5% level.
Exercise 8.80
The hypothesis test we would perform for this problem is
H0 : σ = 0.5
Ha : σ > 0.5 .
We compute χ² = 9(0.58)²/0.5² = 12.1104. The P-value for this is 0.2071568. Thus the observation
is not large enough for us to reject H0 .
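In R this is (with n = 10 samples, inferred from the n − 1 = 9 degrees of freedom used above):

n = 10; s = 0.58; sigma0 = 0.5
chi2 = ( n - 1 ) * s^2 / sigma0^2   # 12.1104
1 - pchisq( chi2, n - 1 )           # P-value = 0.2071568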
Exercise 8.81
In this case we are told that χ2 = 8.58. The P-value for this is 0.01271886. We computed
this with the following simple R code
# the probability our statistic is less than or equal to the observed 8.58
chi2 = 8.58
pchisq( chi2, 20 )
Since this is larger than 0.01 (but not by much) we would not be able to reject H0 . We could
reject H0 at the 5% level however.
Exercise 8.82
Part (a): We would use as our estimator of θ
θ̂ = X̄ + 2.33 S ,
where S is the sample standard deviation.
Part (b): When we assume independence we get
Var(θ̂) = Var(X̄) + 2.33² Var(S) = σ²/n + 2.33² (σ²/(2n)) = 3.71445 (σ²/n) .
Using this we have
σθ̂ = 1.927291 (σ/√n) .
Part (c): The hypothesis test we would perform for this problem is
H0 : µ + 2.33σ = 6.75
Ha : µ + 2.33σ < 6.75 .
We compute z = (x̄ + 2.33s − (µ + 2.33σ))/σθ̂ = −1.224517. The P-value for this is 0.1103787. Since this
is larger than 0.01 we cannot reject H0 in favor of Ha .
Exercise 8.83
Part (a): We will consider hypothesis tests of the form
H0 : µ = µ0
Ha : µ < µ0 ,
and the other two forms. To do this we will consider the statistic
Z = (2λ Σ_{i=1}^{n} Xi − 2n)/(2√n) ,
where λ = 1/µ0. We use this form since under the assumptions of the problem the expression
2λ Σ_{i=1}^{n} xi has a mean of 2n and a variance of 4n.
Part (b): Using the following R code
data = c( 95, 16, 11, 3, 42, 71, 225, 64, 87, 123 )
n = length( data )
z = ( 2 * sum( data ) / 75 - 2 * n ) / ( 2 * sqrt( n ) )
pnorm( z )
We find that z = −0.05481281, which has a P-value of 0.4781438. This is so large that we
cannot reject H0 .
Simple Linear Regression and Correlation
Problem Solutions
Note all R scripts for this chapter (if they exist) are denoted as ex12_NN.R where NN is the
section number.
Exercise 12.1
Part (a-b): Using the stem command in R we would get for the Temp feature
17 | 02344
17 | 567
18 | 0000011222244
18 | 568

while for the Ratio feature we would get

0 | 889
1 | 0011344
1 | 55668899
2 | 12
2 | 57
3 | 01
It looks like there are a good number of temperatures around 180 and most ratios are around
1. The Ratio feature also looks slightly skewed to the right (has more larger values than
smaller values). A scatter plot of the data shows that Ratio is not determined only by Temp.
Part (c): See Figure 7 (left) for a scatter plot of Ratio as a function of Temp. From that
plot it looks like a line would do a decent but not great job at modeling this data.
Exercise 12.2
See Figure 7 (right) for a scatter plot of Baseline as a function of Age. From that plot it
looks like a line would do a decent job of fitting the data if the two points with values of Age
around 7 were removed (perhaps they are outliers).
Figure 7: Left: A scatter plot of the data for Exercise 12.1. Right: A scatter plot of the
data for Exercise 12.2.
Exercise 12.3
When this data is plotted it looks like it is well approximated by a line.
Exercise 12.4
Part (a-b): See Figure 8 for each of the requested plots. In the box plot it looks like
the amount removed is a smaller amount than the amount loaded and the amount removed
seems to have a smaller spread of the data around its center value. From the scatter plot
we have drawn the line y = x to emphasize that the amount removed is less than the amount
loaded. The linear fit would seem to be influenced by the point near (x, y) = (40, 10).
Exercise 12.5
Part (a-b): See Figure 9 for each of the requested plots. In the plot with the origin at
(55, 100) we see the potential for a quadratic fit as the points increase and then decrease
again.
Figure 8: Left: A box plot of the two features x BOD mass loading and y BOD mass
removal. Right: A scatter plot of y as a function of x for the data for Exercise 12.4.
Exercise 12.6 (tennis elbow)
From the given scatter plot in the text we see two points with very large x values (relative to
the other data points). Their presence could strongly affect the estimated linear relationship.
Exercise 12.7
Part (a): 1800 + 1.3(2500) = 5050.
Part (b): 1.3(1) = 1.3.
Part (c): 1.3(100) = 130.
Part (d): 1.3(−100) = −130.
Figure 9: The data for Exercise 12.5. Left: With the origin at (0, 0) Right: With the origin
at (55, 100).
Exercise 12.8
Part (a): When the accelerated strength is 2000 the distribution of the 28-day strength
is normal with a mean given by 1800 + 1.3(2000) = 4400 and a standard deviation of 350.
Then
P(Y ≥ 5000) = 1 − P(Y ≤ 5000) = 1 − P((Y − 4400)/350 ≤ (5000 − 4400)/350) = 0.04323813 .
Part (b): We have E(Y) = 1800 + 1.3(2500) = 5050 and the standard deviation is the same
as before (or 350). Then
P(Y ≥ 5000) = 1 − Φ((5000 − 5050)/350) = 0.5567985 .
Part (c): We have
E(Y1) = 1800 + 1.3(2000) = 4400
E(Y2) = 1800 + 1.3(2500) = 5050 .
We want to evaluate P(Y1 − Y2 ≥ 1000). As Y1 and Y2 are independent normal random variables,
Y1 − Y2 is also a normal random variable with a mean of E(Y1) − E(Y2) = 4400 − 5050 = −650
and a variance of 2(350²) = 245000 (so the standard deviation is 494.9747). Using these we would find
P(Y1 − Y2 ≥ 1000) = 1 − Φ((1000 − (−650))/494.9747) = 0.0004287981 .
Part (d): We have
P(Y2 > Y1) = P(Y2 − Y1 > 0) = 1 − P(Y2 − Y1 < 0) = 1 − Φ((0 − (E(Y2) − E(Y1)))/√(2(350²))) .
Since
E(Y2) − E(Y1) = 1800 + 1.3x2 − (1800 + 1.3x1) = 1.3(x2 − x1) ,
we have
P(Y2 > Y1) = 1 − Φ(−1.3(x2 − x1)/√(2(350²))) .
We want to find x2 − x1 such that the above equals 0.95. We can do this with the R code
- qnorm( 0.05 ) * sqrt( 2 * 350^2 ) / 1.3
which gives the value 626.2777.
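The probabilities in this exercise can also be computed directly in R from the model parameters above:

m = function( x ){ 1800 + 1.3*x }            # the given regression line
1 - pnorm( ( 5000 - m(2000) )/350 )          # Part (a): 0.04323813
1 - pnorm( ( 5000 - m(2500) )/350 )          # Part (b): 0.5567985
- qnorm( 0.05 ) * sqrt( 2 * 350^2 ) / 1.3    # Part (d): 626.2777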
Exercise 12.9
Part (a): 0.095(1) = 0.095.
Part (b): 0.095(−5) = −0.475.
Part (c): We would have
−0.12 + 0.095(10) = 0.83
−0.12 + 0.095(15) = 1.305 .
Part (d): We have
P(Y > 0.835) = 1 − P(Y < 0.835) = 1 − Φ((0.835 − (−0.12 + 0.095(10)))/0.025) = 0.4207403 .
To get the other part we would change the value of 0.835 in the above to 0.840.
Part (e): We have
P(Y10 > Y11) = P(Y10 − Y11 > 0) = 1 − Φ((0 − (E(Y10) − E(Y11)))/(√2 (0.025))) = 0.003604785 .
Exercise 12.10
If the given expression is true then we must have
P(Y > 5500 when x = 100) = 1 − Φ((5500 − E(Y))/SD(Y)) = 1 − Φ((5500 − (4000 + 10(100)))/SD(Y)) = 0.05 .
This requires that the standard deviation of Y be 303.9784. Given this value for the standard
deviation we can check if the second probability statement is true under that condition.
Using R we can evaluate that statement with
1 - pnorm( ( 6500 - (4000 + 10*200) )/ 303.9784 )
which gives the value of 0.04999999. This is less than the claimed value of 0.1, thus the two
statements given are inconsistent.
Exercise 12.11
Part (a): We would have
−0.01(1) = −0.01
−0.01(10) = −0.1 .
Part (b): We would have
5.00 − 0.01(200) = 3
5.00 − 0.01(250) = 2.5 .
Part (c): All measurements are independent and thus the probability that all five times are
between 2.4 and 2.6 is the fifth power of the probability that one time is between these two
values. To calculate this latter value we have
P(2.4 < Y < 2.6) = P(Y < 2.6) − P(Y < 2.4) = Φ((2.6 − E(Y))/sd(Y)) − Φ((2.4 − E(Y))/sd(Y)) = 0.8175776 ,
using the R code
pnorm( ( 2.6 - (5-0.01*250) )/0.075 ) - pnorm( ( 2.4 - (5-0.01*250) )/0.075 )
Then the probability that all five measurements are between these two times is the fifth
power of the above number or 0.3652959.
Part (d): We would evaluate
P(Y2 > Y1) = P(Y2 − Y1 > 0) = 1 − P(Y2 − Y1 < 0) = 1 − Φ((0 − (−0.01(1)))/(√2 (0.075))) ≈ 0.4624 .
Here we have used the fact that E(Y2) − E(Y1) = −0.01(x2 − x1) = −0.01(1), and that the
difference of the two independent measurements has variance 2(0.075²).
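As a check on Part (d) in R (using √2(0.075) for the standard deviation of the difference of two independent measurements, as in Part (e) of Exercise 12.9):

1 - pnorm( 0.01/( 0.075*sqrt(2) ) )   # about 0.4624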
Estimating Model Parameters
Exercise 12.12
Part (a): We would calculate
β̂1 = Sxy/Sxx = (Σ xi yi − (Σ xi)(Σ yi)/n)/(Σ xi² − (Σ xi)²/n) = (25825 − (517)(346)/14)/(39095 − 517²/14) = 0.6522902 and
β̂0 = ȳ − β̂1 x̄ = 0.6261405 .
Thus the regression line is given by
y = β̂0 + β̂1 x = 0.6261405 + 0.6522902 x .
Part (b): We get when x = 35 a prediction given by
ŷ = 0.6261405 + 0.6522902(35) = 23.45 .
Since this measurement is the 9th measurement it has a measured value of y = 21, thus a
residual of 21 − 23.45 = −2.45.
Part (c): We compute
SSE = Σ yi² − β̂0 Σ yi − β̂1 Σ xi yi = 394.45 ,
and
σ̂² = SSE/(n − 2) = 394.45/12 = 32.87 .
A square root then gives σ̂ = 5.73.
Part (d): The proportion of explained variation or r² can be written as
r² = 1 − SSE/SST .
We compute
SST = Σ (yi − ȳ)² = Σ yi² − (Σ yi)²/n = 17454 − 346²/14 = 8902.85 .
Thus r² = 1 − 394.45/8902.85 = 0.95569.
Part (e): Note that the x value of 103 has a y value of 75 and the x value of 142 has a y
value of 90. Thus the new sum values without these two sample points are given by
Σ xi = 517 − 103 − 142 = 272
Σ yi = 346 − 75 − 90 = 181
Σ xi² = 34095 − 103² − 142² = 3322
Σ yi² = 17454 − 75² − 90² = 3729
Σ xi yi = 25825 − 103(75) − 142(90) = 5320 ,
with n = 12. Using these values we get
β̂1 = −0.428, β̂0 = 24.78 and r² = 0.6878 ;
note that these are quite different values than what we obtained before (when we included
these two points in the regression).
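A small R helper implementing the summary-statistic formulas used above (the function name is mine; the sums are those of Part (a)):

slope_intercept = function( Sx, Sy, Sxx, Sxy, n ){
  b1 = ( Sxy - Sx*Sy/n )/( Sxx - Sx^2/n )
  b0 = Sy/n - b1*Sx/n
  c( b0=b0, b1=b1 )
}
slope_intercept( 517, 346, 39095, 25825, 14 )   # Part (a): b0 = 0.6261405, b1 = 0.6522902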
Exercise 12.13
We can plot the points to see what they look like. If we do we see that this data looks like
it comes from a line. Fitting a linear regression model gives for the proportion of explained
variance r 2 = 0.9716, indicating that a line fits very well.
Exercise 12.14
We will work this problem with R.
Part (a): We find the regression line given by
Ratio = −15.2449 + 0.09424Temp .
Part (c): They have y values on different sides of the least squares line.
Part (d): This is given by r 2 = 0.4514.
Exercise 12.15
Part (a): Using R a stem plot gives
2 | 034566668899
3 | 0133466789
4 | 2
5 | 3
6 |
7 | 0
8 | 00
which shows a cluster of points around MOE ≈ 30 − 40 and an island of other points near
70 − 80.
Part (b): From the scatter plot of strength as a function of MOE the strength is not
uniquely determined by MOE i.e. at a given MOE value there look to be several possible
values for strength.
Part (c): The least-squares line is y = 3.2925 + 0.10748x. At x = 40 we get y = 7.5917.
The point x = 40 is inside the data used to build our linear model and thus using our model
at that point should not cause worry. The value of x = 100 is outside of the data used to
build the least squares model so we would not be comfortable using the least squares line to
predict strength in that case.
Part (d): From the MINITAB output we have SSE = 18.736, SST = 71.605, and r 2 = 0.738
so yes the relationship is relatively good at fitting the data.
Exercise 12.16
We will use the R command lm to compute all needed items.
Part (a): A scatter plot of the data points looks very linear and has an r 2 of 0.953.
Part (b-d): See the lm output.
Part (e): This proportion is the same as the value of r 2 reported above or 0.953.
Exercise 12.17
Part (b): 3.678 + 0.144(50) = 10.878.
Part (c): We want to evaluate
σ̂² = SSE/(n − 2) .
Since we have
SSE = SST − SSR = 320.398 − r² SST = 320.398 − 320.398(0.860) = 44.85572 ,
thus σ̂² = 44.85572/(23 − 2) = 2.135987 and σ̂ = 1.4615.
Exercise 12.18
Part (a): We would calculate
β̂1 = Sxy/Sxx = (Σ xi yi − (Σ xi)(Σ yi)/n)/(Σ xi² − (Σ xi)²/n) = (987.645 − (1425)(10.68)/15)/(139037.25 − 1425²/15) = −0.0073602
β̂0 = ȳ − β̂1 x̄ = 10.68/15 − (−0.0073602)(1425/15) = 1.411223 .
Thus the regression line is given by
y = β̂0 + β̂1 x = 1.411223 − 0.0073602 x .
Note these values agree with the ones used for this data in Exercise 12.22.
Part (b): This would be ∆y = β̂1(1) = −0.0073602.
Part (c): We have
y = β̂0 + β̂1 x = β̂0 + β̂1 ((9/5) x̃ + 32) = β̂0 + 32 β̂1 + (9/5) β̂1 x̃ ,
which gives the new intercept and the new slope. Here x̃ is the temperature measured in
Celsius.
Part (d): I would think that we could use the regression results since 200 is inside the range
of x values.
Exercise 12.19
Part (a): We would use
β̂1 = Sxy/Sxx = (Σ xi yi − (Σ xi)(Σ yi)/n)/(Σ xi² − (Σ xi)²/n)
β̂0 = ȳ − β̂1 x̄ .
Using the numbers from the given dataset we compute
β̂1 = 1.711432
β̂0 = −45.55 .
Part (b): We need to evaluate βˆ0 + βˆ1 (225) = −45.55 + 1.711432(225) = 339.52.
Part (c): This would be the value of βˆ1 times the change in liberation area or
1.711432(−50) = −85.57 .
Part (d): The value of 500 is beyond the range of inputs and thus the regression line
predictions are not to be trusted.
Exercise 12.20
Part (a): From the MINITAB output we have βˆ1 = 0.9668 and βˆ0 = 0.36510.
Part (b): We have βˆ0 + βˆ1 (0.5) = 0.36510 + 0.9668(0.5) = 0.8485.
Part (c): We find σ̂ = 0.1932.
Part (d): We have SST = 1.4533 and r 2 = 0.717.
Exercise 12.21
Part (a): The value of r² is 0.985, a very "large" number. A scatter plot of the data looks
like a line would fit quite well.
Part (b): We have yˆ = βˆ0 + βˆ1 (0.3) = 32.1878 + 156.71(0.3) = 368.891.
Part (c): This would be the same as in Part (b).
Exercise 12.22
Part (a): This is
σ̂ = √(SSE/(n − 2)) .
Now
SSE = Σ yi² − β̂0 Σ yi − β̂1 Σ xi yi = 7.8518 − 1.41122(10.68) − (−0.00736)(987.64) = 0.049245 ,
so that we can evaluate σ̂ = √(0.049245/(15 − 2)) = 0.0615.
Part (b): Using r² = SSR/SST = 1 − SSE/SST. To evaluate this we first compute
SST = Σ yi² − (Σ yi)²/n = 7.8518 − 10.68²/15 = 0.24764 ,
thus r² = 1 − 0.049245/0.24764 = 0.8011.
Exercise 12.23
Part (a): We want to consider the two definitions
SSE = Σ (yi − ŷi)²
SSE = Σ yi² − β̂0 Σ yi − β̂1 Σ xi yi .
When I use R I get the same value for each of these given by
[1] "SSE_1= 16205.453351; SSE_2= 16205.453351"
Part (b): We have
SST = Σ yi² − (Σ yi)²/n = 533760.0 ,
and
r² = 1 − SSE/SST = 0.9696 .
With an r² this "large" we expect our linear model to be quite good. Note that these results
don't exactly match the back of the book. I'm not sure why. If anyone sees anything wrong
with what I have done please contact me.
Exercise 12.24
We will work this problem using R.
Part (a): Based on the scatter plot a linear fit looks like a good model.
Part (b): We get y = 137.876 + 9.312x.
Part (c): This is r 2 . From the R output this looks to be given by 0.9897.
Part (d): Dropping the sample where x = 112 and y = 1200 and refitting gives the least
squares line of
y = 190.352 + 7.581x ,
which is very different from the line we obtained with that data point included in the fit.
See Figure 10 where the line using all of the data points is given in red and the line with the
deleted point in green. Notice how different the two lines are.
Exercise 12.25
That b1 and b0 satisfy the normal equations can be shown by solving the normal equations
to obtain solutions that agree with the book's equations 12.2 and 12.3.
Exercise 12.26
Let's check that (x̄, ȳ) is on the line. Consider the right-hand-side of the least squares line
evaluated at x = x̄. We have
β̂0 + β̂1 x̄ = (ȳ − β̂1 x̄) + β̂1 x̄ = ȳ ,
showing that the given point (x̄, ȳ) is on the least-squares regression line.
Figure 10: The two least squares fit of the data in Exercise 12.24. See the text for details.
Exercise 12.27 (regression through the origin)
The least squares estimator for β1 is obtained by finding the value of β̂1 such that the given
SSE(β1) is minimized. Here SSE(β̂1) is given by
SSE(β̂1) = Σ (yi − β̂1 xi)² .
Taking the derivative of the given expression for SSE(β̂1) with respect to β̂1 and setting the
resulting expression equal to zero we find
(d/dβ̂1) SSE(β̂1) = 2 Σ (yi − β̂1 xi)(−xi) = 0 ,
or
−Σ xi yi + β̂1 Σ xi² = 0 .
Solving this expression for β̂1 we find
β̂1 = (Σ xi yi)/(Σ xi²) .    (21)
To study the bias introduced by this estimator of β1 we compute
E(β̂1) = (Σ xi E(yi))/(Σ xi²) = β1 (Σ xi²)/(Σ xi²) = β1 ,
Figure 11: Left: A scatter plot of the data (xi, yi) (in green) and the points (xi − x̄, yi) (in
red) for the data in Exercise 12.28. Right: the three data sets considered in Exercise 12.29;
the panel titles read 1: r2= 0.43, sigma= 4.03; 2: r2= 0.99, sigma= 4.03; 3: r2= 0.99, sigma= 1.90.
showing that this estimator is unbiased. To study the variance of this estimator we compute
Var(β̂1) = (1/(Σ xi²)²) Σ xi² Var(yi) = σ² (Σ xi²)/(Σ xi²)² = σ²/(Σ xi²) ,    (22)
the requested expression. An estimate of σ² is given by the usual
σ̂² = SSE/(n − 1) ,
which has n − 1 degrees of freedom.
Exercise 12.28
Part (a): See Figure 11 (left) where the original (xi , yi ) points are plotted in green and the
points (xi − x¯, yi ) points are plotted in red. The red points are a shift leftward of the green
points. Thus the least squares line using (xi − x¯, yi ) will be a shift to the left of the least
squares line using the data (xi , yi ).
Part (b): In the second model we have
Yi = β0* + β1*(xi − x̄) = β0* − β1* x̄ + β1* xi .
For this to match the first model means that
β0 = β0* − β1* x̄ and β1 = β1* .
Solving for β0* and β1* we have
β1* = β1 and β0* = β0 + β1 x̄ .
Exercise 12.29
From the plot given in Figure 11 (right) for the three data sets we see that two of the plots
have a large r 2 value of 0.99 (plots 2 and 3). From these two plots the value of σ is smaller
in the third one. In this case the linear fit would be the "best". In the first data set
the application of the linear fit would be the “worst”.
Inferences About the Slope Parameter β1
Exercise 12.30
Part (a): From the text we have that
σβ̂1 = σ/√Sxx with Sxx = Σ_{i=1}^{n} (xi − x̄)² .
From the given data in this problem we compute x̄ = 2500 and Sxx = 7·10^6, thus
σβ̂1 = 350/√(7·10^6) = 0.132288 .
Part (b): We have
P(1.0 ≤ β̂1 ≤ 1.5) = P((1.0 − β1)/σβ̂1 ≤ (β̂1 − β1)/σβ̂1 ≤ (1.5 − β1)/σβ̂1)
= pt((1.5 − 1.25)/0.132288, n − 2) − pt((1.0 − 1.25)/0.132288, n − 2) = 0.8826 ,
using R notation pt for the cumulative t-distribution.
Part (c): In the case of the eleven measurements we have x̄ = 2500 (the same as before)
but Sxx = 1.1·10^6, which is smaller than before. Thus σβ̂1 = σ/√Sxx in this case will be larger
than in the seven measurement case. What is interesting about this is that one might think
that having more measurements is always better; here is an example where that is not
true, since the spread in x of the eleven points is less than that in the seven point case.
Exercise 12.31
Part (a): To evaluate sβ̂1 we use
sβ̂1 = s/√Sxx = s/√(Σ xi² − (Σ xi)²/n) .
To use this we need to first compute s. This is computed using s = √(SSE/(n − 2)) and thus we need
to evaluate SSE, where we get
SSE = Σ yi² − β̂0 Σ yi − β̂1 Σ xi yi = 0.04924502 .
Using this value we find sβ̂1 = 0.001017034.
Part (b): To calculate a confidence interval we use the fact that the fraction (β̂1 − β1)/sβ̂1 follows
a t-distribution with n − 2 degrees of freedom. Using this we can derive the result in the
book that a 100(1 − α)% confidence interval for the slope β1 is given by
β̂1 ± tα/2,n−2 sβ̂1 .    (23)
For this part we want α = 0.05 and have n = 15 so in R notation we find that
tα/2,n−2 = qt(1 − 0.025, 13) = 2.1603 .
With these the confidence interval for β1 is then given by
[1] -0.009557398 -0.005163061
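The interval can be reproduced in R from the values above (the slope used here is the midpoint of the reported interval, consistent with the estimate in Exercise 12.18):

b1 = -0.0073602; s_b1 = 0.001017034; n = 15
b1 + c( -1, 1 ) * qt( 1 - 0.025, n - 2 ) * s_b1
# [1] -0.009557398 -0.005163061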
Exercise 12.32
Note that from the MINITAB output the p-value/t-statistic for the rainfall variable is
0.0/22.64 showing that the linear model is significant. Now the change in runoff associated
with the given change in rainfall would be approximated using
∆runoff = βˆ1 ∆rainfall = βˆ1 (1) = 0.827 .
A 95% confidence interval for this change would thus be the same as a 95% confidence
interval for β1 and is given by
βˆ1 ± tα/2,n−2 sβˆ1 .
Here from Exercise 16 we have that n = 16 so that tα/2,n−2 = 2.14478 and the confidence
interval obtained using the above is then (0.748, 0.905).
Exercise 12.33
Part (a): From the MINITAB output for Exercise 15 we have n = 27 and thus can compute
tα/2,n−2 = qt(1 − 0.025, 25) = 2.059 ,
so a 95% confidence interval for β1 using Equation 23 is given by (0.0811, 0.1338).
Part (b): For this part we want to know if the given measurements would reject the
hypothesis that β1 = 0.1. As the above confidence interval in fact contains this value we
cannot reject the hypothesis at the 5% level. Thus this sample of data does not contradict
the belief that β1 = 0.1.
Exercise 12.34
Part (a): In the rejection region approach we compute the confidence interval for β1 and see
if the value of zero is inside or outside of it. If the point zero is inside the confidence interval
we cannot reject the hypothesis that β1 = 0 and the linear model is not appropriate. Here we
desire to have α = 0.01, so that with n = 13 we get tα/2,n−2 = qt(1 − 0.005, 11) = 3.105807
and the 99% confidence interval is given by (0.3987, 1.534). As zero is not included in this
interval we reject the hypothesis that β1 = 0.
Part (c): From the statement in the problem we are told that the previous belief on the
value of β1 is such that β1 = 0.15/0.1 = 1.5. Since the value of 1.5 is inside the 99% confidence
interval for β1 computed above this new data does not contradict our original belief.
Exercise 12.35
Part (a): Using the given summary statistics we compute βˆ1 = 1.536018 with a 95%
confidence interval given by (0.6321, 2.439).
Part (b): Our p-value for β1 is computed as
p-value = P(β̂1 ≥ 1.536 | β1 = 0) = P(β̂1/sβ̂1 ≥ 1.536/sβ̂1 | β1 = 0) = P(β̂1/sβ̂1 ≥ 3.622 | β1 = 0)
= 1 − pt(3.622, n − 2) = 0.00125 .
As the p-value is less than 0.05 we conclude that this result is significant at the 95% level.
Part (c): The value 5.0 is outside of the range of observed samples so it would
not be sensible to use linear regression there since it would be extrapolation.
Part (d): Without this observation we get β̂1 = 1.683 with a 95% confidence interval for
β1 given by (0.53, 2.834). As the value of zero is not inside this 95% confidence interval
the regression is still significant even without this point and it does not seem to be exerting
undue influence.
Exercise 12.36
Part (a): If we plot the data (see the R code) a scatter plot looks like it can be well modeled
by a linear curve. From the scatter plot we notice that there is one point (965, 99) that
could be an outlier or might exert undue influence on the linear regression.
Part (b): The proportion asked for is the value of the r 2 coefficient which from the output
from the lm function is given by 0.9307.
Part (c): We want to know the increase in y when x is increased by 1000 − 100 = 900, which
has a point estimate given by
β̂1(900) = 6.211·10^−4 (900) = 0.55899 .
The 95% confidence interval on this product is
900 β̂1 ± tα/2,n−2 (900 sβ̂1) = (0.233, 0.8845) .
Since the value of 0.6 is inside of this confidence interval there is not substantial evidence
from the data that the true average increase in y would be less than 0.6.
Part (d): This would be the point estimate and a confidence interval for the parameter β1 .
Exercise 12.37
Part (a): We can compute a t-test on the difference between the two levels. We find a
t-value given by -7.993149 which due to its magnitude is significant.
Part (b): To answer this question we will compute the 95% confidence interval for β1 . Using
the given data points we find this confidence interval given by (0.467, 0.840). Since all points
in this interval are less than one we conclude that β1 is less than 1. Since the value of zero
is not inside this interval we can also conclude that the linear relationship is significant.
Exercise 12.38
Part (a): Using R and the data from Exercise 19 we get a t-value for β1 given by 17.168
and a p-value of 8.23·10^−10. Thus this estimate of β1 is significant.
Part (b): A point estimate of the change in emission rate would be estimated by 10βˆ1 . This
point estimate would have a 95% confidence interval given by (14.94231, 19.28633).
Exercise 12.39
Part (a): From Example 12.6 we have n = 20 and calculated SSE = 7.968. Next we
compute
SST = Syy = Σ (yi − ȳ)² = Σ yi² − (Σ yi)²/n = 124039.58 − 1574.8²/20 = 39.828 .
Using these two results we have SSR = SST − SSE = 31.86. Now that we have these items
we can use Table 12.2 as a template, where we find an f value for the linear fit given by
f = SSR/(SSE/(n − 2)) = 31.86/(7.968/18) = 71.97 .
The model utility test would compare this value to
Fα,1,n−2 = qf(1 − 0.05, 1, n − 2) = qf(1 − 0.05, 1, 20 − 2) = 4.413 .
Since f ≥ Fα,1,n−2 the linear regression is significant. We find a p-value given by
p-value = P (F ≥ 71.97) = 1 − pf(71.97, 1, n − 2) = 1 − pf(71.97, 1, 18) = 1.05 10−7 .
To compare this result to that when we use the t-test of significance we first have to estimate
β1 and σβˆ1 for the linear regression. From the text we have that βˆ1 = 0.04103 and sβˆ1 =
0.0048367 so the t-stat for β1 is given by

t = βˆ1/sβˆ1 = 8.483 .
Note that t² = 71.97 = f as it should. Using this t-statistic we compute the p-value using

p-value = P(|t| ≥ 8.483) = 1 − P(|t| < 8.483) = 1 − (pt(8.483, n − 2) − pt(−8.483, n − 2)) = 1.05 × 10−7 ,
the same value as calculated using the F -statistic.
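As a sketch, the arithmetic of this model utility test can be reproduced directly in R (all numbers are the ones quoted in this exercise):

n   <- 20
SSE <- 7.968
SST <- 124039.58 - 1574.8^2 / 20   # = 39.828
SSR <- SST - SSE                   # = 31.86
f   <- SSR / (SSE / (n - 2))       # = 71.97
1 - pf(f, 1, n - 2)                # F-test p-value, about 1.05e-7
t   <- 0.04103 / 0.0048367         # t = beta1_hat / s_beta1_hat = 8.483
2 * (1 - pt(abs(t), n - 2))        # the same p-value from the t statistic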
Exercise 12.40
We start with the expression for βˆ0 derived in this section of the text

βˆ0 = (Σ Yi − βˆ1 Σ xi)/n .
Then taking the expectation of both sides of this expression (and using E[βˆ1] = β1) gives

E[βˆ0] = (1/n) (Σ E[Yi] − β1 Σ xi) .

But by the linear hypothesis E[Yi] = β0 + β1 xi and using this in the above we get

E[βˆ0] = (1/n) (Σ (β0 + β1 xi) − β1 Σ xi) = β0 ,
as it should.
Exercise 12.41
Part (a): We start with the expression for βˆ1 derived in this chapter given by

βˆ1 = Σ ci Yi with ci = (xi − x¯)/Sxx .
Taking the expectation of the above expression we get

E[βˆ1] = Σ ci E[Yi] = Σ ci (β0 + β1 xi) = β0 Σ ci + β1 Σ ci xi .

Note that Σ ci = 0 since

Σ ci = (1/Sxx) Σ (xi − x¯) = (1/Sxx)(n x¯ − n x¯) = 0 .

Next consider the sum Σ ci xi . We have

Σ ci xi = (1/Sxx) Σ (xi − x¯) xi = (1/Sxx) Σ (xi − x¯)(xi − x¯ + x¯)
        = (1/Sxx) [Σ (xi − x¯)² + x¯ Σ (xi − x¯)] = (1/Sxx) Σ (xi − x¯)² = 1 .
Combining these two results we have shown that E[βˆ1] = β1 .
Part (b): By the independence of the Yi ’s we have

Var(βˆ1) = Σ ci² Var(Yi) = σ² Σ ci² = (σ²/Sxx²) Σ (xi − x¯)² = (σ²/Sxx²) Sxx = σ²/Sxx ,

the same expression for the variance of βˆ1 as given in the text.
Exercise 12.42
The t-statistic for testing the hypothesis H0 : β1 = 0 vs. Ha : β1 ≠ 0 is given by the
expression βˆ1/sβˆ1 . We will compute the new estimates for β1 and σβˆ1 under the transformation
suggested and show that the t-statistic does not change.
We compute the least-squares estimate of β1 using βˆ1 = Sxy/Sxx , so that under the given
transformation our new β1 estimate (denoted βˆ1′) is related to the old β1 estimate (denoted
βˆ1) as

βˆ1′ = (cd/c²) βˆ1 = (d/c) βˆ1 .
The estimate of β0 transforms as

βˆ0′ = d y¯ − (d/c) βˆ1 (c x¯) = d βˆ0 .
The error sum of squares (SSE) transforms as

SSE′ = Σ (d yi − d βˆ0 − (d/c) βˆ1 (c xi))² = d² Σ (yi − βˆ0 − βˆ1 xi)² = d² SSE .

Thus using s = √(SSE/(n − 2)) we have s′ = d s. Finally note that Sxx′ = c² Sxx . Using all of these
results we have

sβˆ1′ = s′/√(Sxx′) = d s/(c √Sxx) = (d/c) sβˆ1 .
Combining these results our new t-statistic of the transformed points is given by

βˆ1′/sβˆ1′ = ((d/c) βˆ1)/((d/c) sβˆ1) = βˆ1/sβˆ1 ,
the same expression for the t-statistic before the transformation.
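As a numerical check of this invariance, we can fit both the raw and the rescaled data (synthetic data; cc and d are arbitrary scale factors):

set.seed(1)   # synthetic data for illustration only
x <- runif(30)
y <- 2 + 3 * x + rnorm(30)
cc <- 10; d <- 0.5
t1 <- summary(lm(y ~ x))$coefficients[2, "t value"]
t2 <- summary(lm(I(d * y) ~ I(cc * x)))$coefficients[2, "t value"]
c(t1, t2)   # the two t statistics are identical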
Exercise 12.43
Using the formula given with β10 = 1, β1′ = 2 and data from this problem we have a value
of d given by

d = |1 − 2| / (4 √((15 − 1)/(11098 − 402²/15))) = 1.2034 .

As we have n − 2 = 13 degrees of freedom and are considering a two-tailed test, we would
consult the curves in the lower-right of Appendix A.17. There we read β ≈ 0.1.
Inferences Concerning µY ·x∗ and the Prediction of Future Y Values
Exercise 12.44
Part (a): The formula for sYˆ is given by

sYˆ = s √(1/n + (x∗ − x¯)²/Sxx) , (24)

and depends on the value of x∗ .
Part (b): The confidence interval with tα/2,n−2 = qt(1 − 0.025, 25) = 2.059539 is given by

yˆ ± tα/2,n−2 sYˆ , (25)

which evaluates to (7.223343, 7.960657).
Part (c): A 100(1 − α)% prediction interval is given by

yˆ ± tα/2,n−2 √(s² + sYˆ²) . (26)

From Exercise 15 in Section 12.2 we have s = √(SSE/(n − 2)) = √(18.736/(27 − 2)) = 0.865702
and the prediction interval for Y is given by (5.771339, 9.412661).
Part (d): We now ask: if we also measure a 95% confidence interval for the average
strength when the modulus of elasticity is 60, what is the simultaneous confidence level
for both intervals? We can use the Bonferroni inequality to conclude that the joint
confidence level is guaranteed to be at least 100(1 − kα)%, or since k = 2 this is 100(1 −
2(0.05))% = 100(0.9)% = 90%.
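As a sketch of how R produces these two kinds of intervals (on synthetic stand-in data, not the problem's data):

set.seed(2)   # synthetic stand-in data
DF <- data.frame(x = seq(20, 80, length.out = 27))
DF$y <- 5 + 0.04 * DF$x + rnorm(27)
m  <- lm(y ~ x, data = DF)
nd <- data.frame(x = 40)
predict(m, nd, interval = "confidence", level = 0.95)   # CI for the mean response
predict(m, nd, interval = "prediction", level = 0.95)   # wider interval for a new Y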
Exercise 12.45
Part (a): Here α = 0.1 and we want the confidence interval for the true mean response, whose
standard error is given by Equation 24. In the R code for this section we compute the confidence interval
[1] 77.82559 78.34441
Part (b): Here we want a prediction interval. Since all that changes is the standard error
of the prediction interval we get
[1] 76.90303 79.26697
This is an interval with the same center as the confidence interval but one that is wider.
Part (c): These two intervals would have the same center but the prediction interval would
be wider than the confidence interval. In addition, since x∗ = 115 is farther from the mean
x¯ = 140.895 these intervals will be wider than the ones computed at x∗ = 125.
Part (d): We compute the 99% confidence interval of β0 + 125β1 and see where the value of
80 falls relative to it. We find the needed confidence interval given by (77.65439, 78.51561).
Note that every point in this interval is less than 80. Thus we can reject the null hypothesis
in favor of the alternative.
Exercise 12.46
Part (a): We compute a confidence interval for the true average given by (0.320892, 0.487108).
Part (b): The value of x∗ = 400 is farther from the sample mean of x¯ = 471.5385 than the
value of x∗ = 500. Thus the confidence interval of the true mean at this point will be wider.
Part (c): Here we are interested in δy = β1 δx = β1 and thus want to consider a confidence
interval on β1 . This confidence interval is given by
βˆ1 ± tα/2,n−2 sβˆ1 with sβˆ1 = s/√Sxx .
Using the numbers in the text we compute this to be (0.0006349512, 0.0022250488) .
Part (d): We now need to compute a confidence interval on the true average silver content
when the crystallization temperature is 400. We find this given by (0.1628683, 0.3591317).
Since the value of 0.25 is inside of this interval the given data does not contradict the prior
belief.
Exercise 12.47
Part (a): We could report the 95% prediction interval for Y when x = 40 and find
(20.29547, 43.60613). This is a rather wide interval and thus we don’t expect to be able
to make very accurate predictions of the future Y value.
Part (b): With this new x value we compute the prediction interval given by (28.63353, 51.80747).
The simultaneous prediction level for the two intervals is guaranteed to be at least 100(1 − 2α)% or
(100 − 10)% = 90% in this case.
Exercise 12.48
Part (a): Plotting the data points shows that a line is not perfect but a good representation
of the data points.
Part (b): Using the lm function in R we get a linear fit given by
yˆ = 52.62689 − 0.22036x .
Part (c): This is given by the value of r 2 which is given by r 2 = 0.701.
Part (d): This is the model utility test. From the summary command we see that the t-value
for β1 is -4.331 with a p-value of 0.00251 indicating that the linear model is significant at
the 0.3% level.
Part (e): We want to consider ∆y = β1 ∆x = β1 (10) = 10β1 . We can compute a confidence
interval for 10β1 . From the summary command we see that sβˆ1 = 0.05088, thus s10βˆ1 = 0.5088,
and our confidence interval is given by (−3.376883, −1.030294). Since the value of −2 is
inside this interval there is not strong evidence that the value of ∆y is less than −2.
Part (f): We want the confidence interval for the true mean given x = 100 and x = 200.
We compute these in the R code and find
[1] 22.36344 38.81858
[1] -3.117929 20.228174
The mean of the x values is 125.1 which is closer to the value x = 100 than the value x = 200.
We expect the confidence interval when x = 200 to be larger (as it is).
Part (g): We want the prediction intervals for Y when x = 100 and x = 200. For each of
these we find
[1] 4.941855 56.240159
[1] -18.39754 35.50779
These intervals are larger than the corresponding confidence intervals (as they must be).
Exercise 12.49
The midpoint of the given confidence interval is the estimate yˆ and is given by 529.9. The
width of the interval is 2tα/2,n−2 sYˆ . For the 95% confidence interval the value of tα/2,n−2
is given by 2.306004. Thus we can compute sYˆ and find the value of 29.40151. For the 99%
confidence interval we compute a new value for tα/2,n−2 (since α is now different) and use yˆ
and syˆ to compute the new interval. We get (431.2466, 628.5534).
Exercise 12.50
Part (a): We want to consider the confidence interval for ∆y = β1 ∆x = β1 (.1) = 0.1β1 . In
the R code we compute a 95% confidence interval for 0.1β1 given by (−0.05755671, −0.02898329).
Part (b): This is confidence interval for the true mean of Y when x = 0.5. For a 95%
confidence interval we find (0.4565869, 0.5541655).
Part (c): This is a prediction interval for the value of Y when x = 0.5. For a 95% prediction
interval we find (0.3358638, 0.6748886).
Part (d): We want to guarantee 100(1 − 0.03)% joint confidence intervals for Y |x = 0.3,
Y |x = 0.5, and Y |x = 0.7. Using the Bonferroni inequality we need to measure each
individual confidence interval at the 99% (or α = 0.01) level to guarantee that the joint
confidence level is at least 97%. We compute the three individual confidence intervals in the
same way as the earlier ones.
Exercise 12.51
Part (a): The mean of the x values in this example is given by x¯ = 0.7495, from which
we see that the points x = 0.4 and x = 1.2 are 0.3495 and 0.4505 away from x¯. Thus the
confidence interval for x = 0.4 (since it is closer to the mean) should be smaller than the
confidence interval for x = 1.2.
Part (b): We get (0.7444347, 0.8750933).
Part (c): We get (0.05818062, 0.52320338).
Exercise 12.52
We will use the R command lm to answer these questions.
Part (a): A scatter plot of the points looks like a linear model would do a good job fitting
the data. The r² of the linear fit is 0.9415 and the p-value for β1 is 1.44 × 10−5, all of which
indicate that the linear model is a decent one.
Part (b): We want to consider a confidence interval of ∆y = β1 ∆x = β1 (1) = β1 . A
point estimate of β1 is given by 10.6026. The 95% confidence interval for β1 is given by
(8.504826, 12.700302).
Part (c): We get (36.34145, 40.17137).
Part (d): We get (32.57571, 43.93711).
Part (e): This depends on which point is farther from x¯ = 2.666667. Since x = 3 is farther
from x¯ than x = 2.5 its intervals will be the wider ones.
Part (f): The value x = 6 is outside of the x data used to fit the model. Thus it would not
be recommended.
Exercise 12.53
In the given example we find that x¯ = 109.34 and thus the point x = 115 is further away
from x¯ than x = 110 is. Given this we can state that (a) will be smaller than (b), (c) will
be smaller than (d), (a) will be smaller than (c), and (b) will be smaller than (d).
Exercise 12.54
Part (a): A scatter plot indicates that a linear model would fit relatively well.
Part (b): Using R we find that the p-value for β1 is given by 0.000879 indicating that the
model is significant.
Part (c): We compute (404.1301, 467.7439).
Exercise 12.55 (the variance of Yˆ )
Using the expression for Yˆ = βˆ0 + βˆ1 x derived in this section we have

Var(Yˆ) = Var(βˆ0 + βˆ1 x) = Var(Σ_{i=1}^{n} di Yi) = Σ_{i=1}^{n} di² Var(Yi) = σ² Σ_{i=1}^{n} di² .

From the expression in the book for di we have

di² = 1/n² + (2/n) (x − x¯)(xi − x¯)/Σ (xi − x¯)² + (x − x¯)² (xi − x¯)²/(Σ (xi − x¯)²)² .

Summing this expression for i = 1, 2, . . . , n we get

Σ di² = 1/n + (2 (x − x¯)/(n Σ (xi − x¯)²)) · 0 + (x − x¯)²/Σ (xi − x¯)² = 1/n + (x − x¯)²/Sxx .

Using the above we have shown that

Var(Yˆ) = Var(βˆ0 + βˆ1 x) = σ² (1/n + (x − x¯)²/Σ (xi − x¯)²) , (27)

as we were to show.
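As a numerical check of Equation 27, we can compare the hand formula against the standard error that predict reports (synthetic data):

set.seed(3)   # synthetic data; numeric check of Equation 27
x <- rnorm(15)
y <- 1 + 2 * x + rnorm(15)
m <- lm(y ~ x)
s <- summary(m)$sigma
xstar <- 0.5
se_hand <- s * sqrt(1 / length(x) + (xstar - mean(x))^2 / sum((x - mean(x))^2))
se_R    <- predict(m, data.frame(x = xstar), se.fit = TRUE)$se.fit
c(se_hand, se_R)   # the two standard errors agree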
Exercise 12.56
Part (b): The 95% confidence interval of the population mean for Torque is given by
(440.4159, 555.9841).
Part (c): The t-value of β1 is 3.88 with a p-value of 0.002 which is less than 0.01 thus the
result is significant at the 1% level.
Part (d): Let's look at the prediction interval when the torque is 2.0. We compute
(−295.2629, 1313.0629). This range is much larger than the range of observed y-values.
Correlation
Exercise 12.58
Part (a): We get r = 0.9231564 indicating that the two variables are highly correlated.
Part (b): If we exchange the two variables we get the same value for r (as we should).
Part (c): If we change the scale of one of the variables the value of the sample correlation
coefficient does not change.
Part (e): Using the R command lm we find that the t-value on the β1 coefficient is given by
7.594 with a p-value of 1.58 × 10−5, indicating that the linear model is significant.
Exercise 12.59
Part (a): We compute r = 0.9661835 indicating a strong linear relationship between the two variables.
Part (b): Since r > 0 we expect that if the percent of dry fiber is larger in the first specimen
it would also be larger in the second specimen.
Part (c): r would not change since the value of r is independent of the units used to measure
x or y.
Part (d): This would be the value of r 2 which is given by 0.9335106.
Part (e): Let's compute a test for the absence of correlation. We compute R, the sample
correlation, and then our test statistic T is given by

T = R √(n − 2)/√(1 − R²) . (28)
For the data here we compute t = 14.98799. Now T above is given by a t-distribution
with n − 2 degrees of freedom. If we take α = 0.01 we find tα,n−2 = 2.583487. Since our
t ≥ tα,n−2 we can reject the null hypothesis of no correlation ρ = 0 in favor of the hypothesis
of a positive correlation ρ > 0.
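As a sketch, the built-in cor.test reproduces the statistic of Equation 28 (synthetic data):

set.seed(4)   # synthetic data; the statistic below matches Equation 28
x <- rnorm(18)
y <- 0.8 * x + rnorm(18, sd = 0.5)
r <- cor(x, y)
r * sqrt(length(x) - 2) / sqrt(1 - r^2)    # Equation 28 computed by hand
cor.test(x, y, alternative = "greater")    # reports the same t and a one-sided p-value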
Exercise 12.60
We want to compute the confidence interval for the population correlation coefficient ρ. The
null hypothesis is that ρ = 0 and from the data (since r = 0.7217517) we hypothesize that
maybe ρ > 0. To test this we compute T given by Equation 28 and find t = 3.612242. The
p-value for this value is given by 0.001782464. Note that the summary statistics given in this
problem don’t match the data from the book provided text file ex12-60.txt. For example,
the data file has only 14 samples rather than the 24 claimed in the problem statement.
Exercise 12.61
Part (a): The data itself gives r = 0.7600203.
Part (b): The proportion of observed variation in endurance that could be attributed to
a linear relationship is given by r 2 = 0.5776308. As the value of r does not change if we
exchange the meaning of the x and y variables the proportion of variation explained would
not change either.
Note that the summary statistics given in this problem (for example Sxx = 36.9839) don’t
match what one would compute directly from the data Sxx = 2628930. In fact it looks like
x and y are exchanged.
Exercise 12.62
Part (a): We compute t = 1.740712 and a p-value given by 0.0536438. Thus at the 5%
level the population correlation does not differ from 0.
Part (b): This would be r 2 = 0.201601.
Exercise 12.63
For this data we get r = 0.7728735 with a t-value of t = 2.435934. This gives a p-value of
0.03576068. This is significant at the 5% level.
Note that the book computes tα,n−2 to be 2.776 (indicating no rejection of the null hypothesis) while I compute it to be 2.131847, which would allow rejection (at the 5% level).
Exercise 12.64
Part (a): From the numbers given we compute r = −0.5730081. Using this we get ν =
−0.65199 using

ν = (1/2) ln((1 + r)/(1 − r)) . (29)
Then compute the endpoints

(c1, c2) = (ν − zα/2/√(n − 3), ν + zα/2/√(n − 3)) ,

from which we get (c1, c2) = (−0.9949657, −0.3090143). Using these the confidence interval
for ρ is given by

((e^{2c1} − 1)/(e^{2c1} + 1), (e^{2c2} − 1)/(e^{2c2} + 1)) .

For this we get (−0.7594717, −0.2995401).
Part (b): We now want to test H0 : ρ = −0.5 versus Ha : ρ < −0.5 with α = 0.05.
Following the section in the book, we compute z = −0.4924543. This is to be compared
with −zα = −1.959964. Since our value of z is not less than −zα we cannot reject the null
hypothesis.
Part (c): Since we are considering in-sample results this would be r 2 = 0.3283383.
Part (d): Since now we are considering the population value of ρ we only know a range in
which it must lie and thus a range of the variance that can be explained. This range would
be the square of the confidence interval for ρ computed above or (0.08972426, 0.57679732).
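As a sketch, the whole back-transformed interval can be wrapped in a small helper function; the sample size n = 36 used in the call below is a placeholder, since n is not restated in this problem summary:

fisher_ci <- function(r, n, alpha = 0.05) {
  nu  <- 0.5 * log((1 + r) / (1 - r))        # Equation 29
  z   <- qnorm(1 - alpha / 2)
  c12 <- nu + c(-1, 1) * z / sqrt(n - 3)     # (c1, c2)
  (exp(2 * c12) - 1) / (exp(2 * c12) + 1)    # back-transform to the rho scale
}
fisher_ci(-0.5730081, 36)   # n = 36 is a placeholder value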
Exercise 12.65
Part (a): Analysis is difficult due to the small sample size. A histogram of the data for x
seems to be skewed, i.e. it does not seem to have values above 110. A qqnorm plot of x has
some curvature in it. A qqnorm plot of y looks very straight. The histogram of y looks normal
also.
Part (b): Lets test the hypothesis that H0 : ρ = 0 vs. the alternative that Ha : ρ > 0.
Computing the confidence interval for ρ we get (0.518119, 0.987159). Since zero is not in this
range we can conclude that there is a linear relationship between x and y and reject the null
hypothesis.
Exercise 12.67
Part (a): The P -value is less than the value of 0.001 and thus we can reject the null
hypothesis at this level of significance.
Part (b): Given the P-value we can compute the t value and then the value of r. The
P-value in this case is given by

P-value = P(T ≥ t) + P(T ≤ −t) = 2P(T ≤ −t) = 2 pt(−t, n − 2) .
Using this we compute t = 3.623906. Then since T is related to R using Equation 28 we can
solve for R to get R = 0.1603017 so that the variance explained is given by R2 = 0.02569664
or only about 2%.
Part (c): We get a P -value now given by 0.01390376 and thus is significant at the 5% level.
Notice however that the value of r is very small and the fraction of variance explained would
be r 2 = 0.000484 so small that this result is to be of no practical value.
Supplementary Exercises
Exercise 12.68
Part (a): When we plot Height as a function of Price we see that there is not a deterministic
relationship but a relationship that is very close to linear.
Part (b): In the R code we plot a scatterplot of the data. It suggests that a linear relationship
will work quite well.
Part (c): Using the R command lm we get the model
Price = 23.77215 + 0.98715Height .
Part (d): Using the R function predict with the option interval=’prediction’ we
get a prediction interval given by
       fit     lwr      upr
1 50.42522 47.3315 53.51895
Part (e): This is the value of R2 which we can read from the summary report to be the
value of 0.9631.
Exercise 12.69
Part (a): We are to compute the 95% confidence interval on the coefficient β1 . When we
do that we get the values
[1] 0.8883228 1.0859790
Part (b): In this case we want the confidence interval of the mean value of Price when
Height=25. Using the R function predict with the option interval=’confidence’ we get
a confidence interval for the mean given by
       fit      lwr      upr
1 48.45092 47.73001 49.17183
Part (c): This is the prediction interval and we find
       fit      lwr      upr
1 48.45092 45.37787 51.52397
Notice that the prediction interval and the confidence interval have the same center but the
prediction interval is wider than the confidence interval.
Part (d): The mean of the Height variable is 22.73684. The sample point x = 30 is farther
from the mean than the point x = 25 thus the confidence and the prediction intervals for
this point will be wider than the ones when x = 25.
Part (e): We can calculate the sample correlation coefficient by taking the square root of
the coefficient of determination r 2 = 0.9631 to get r = 0.9813766.
Exercise 12.70
First we follow the strategy suggested in this exercise. For this we will compute the prediction
interval of the linear regression when X.DAA=2.01 and see where the value 22 falls relative
to it. We find the prediction interval given by
       fit      lwr      upr
1 28.76064 17.39426 40.12701
Since the value of 22 is inside of this prediction range it is possible that the individual is of
age 22 years or less.
Next we consider the “inverse” use of the regression line by first fitting the model
%DAA = β0 + β1 Age .
Then with this model we estimate Age given a value of %DAA = 2.01 by solving the above
for Age. The linear regression we obtain is given by
%DAA = 0.58174 + 0.04973Age ,
so that when we take %DAA = 2.01 and solve for Age we get Age = 28.72179. Using the
formula in this exercise we get a standard error of SE = 0.4380511. This gives us a 95%
prediction interval given by
[1] 17.12924 40.31435
Note that this range is about the same as the range computed first using the prediction interval
approach. Thus we again conclude that since the value of 22 is inside this prediction range
it is possible that the individual is of age 22 years or less.
Exercise 12.71
Part (a): The change in the rate of the CI-trace method ∆y would equal β1 ∆x, or β1
times the change in the rate of the drain-weight method. If these two rate changes are to be
equal we must have β1 = 1. We will thus compute a confidence interval on β1 and observe
if the value of 1 is inside of it. If so this indicates that the data does not contradict the
hypothesis that β1 = 1. We find our confidence interval for β1 given by
[1] 0.7613274 1.0663110
Since the value of 1 is inside of this region the stated result is plausible.
Part (b): This would be given by the square root of the coefficient of determination or r 2 .
Taking this square root we get r = 0.9698089.
Exercise 12.72
Part (a): From the degrees of freedom column we see that the C Total degrees of freedom
is 7. Since this is equal to n − 1 we have that n = 8.
Part (b): Using the given parameter estimates we have
326.976038 − 8.403964(35.5) = 28.63532 .
Part (c): Given that the R-squared is 0.9134 we would conclude that yes there seems to be
a significant linear relationship.
Part (d): The sample correlation coefficient would be the square root of the coefficient of
determination or r 2 . From the SAS output we get r = 0.9557196.
Part (e): Loading in the data we see that the smallest value of the variable CO is 50. As we
are asked to evaluate the model where CO is 40 this would not be advised (it is extrapolation).
Exercise 12.73
Part (a): This would be the value of the coefficient of determination or r 2 = 0.5073.
Part (b): This would be the square root of the value of r 2 where we find r = 0.71225.
Part (c): We can observe the P -value for either the F -value for the entire model or the
T -value for the coefficient β1 (they must be the same for simple linear regression). We see
that this P -value is 0.0013. Since this is smaller than the value of 0.01 we have that the
linear model is significant.
Part (d): We would want to compute a confidence interval when the content is x = 0.5.
Using the numbers from the SAS output we find a 95% confidence interval given by (see the
R code)
[1] 1.056113 1.275323
The center of this confidence interval is the point prediction.
Part (e): We would predict 1.014318. The residual of the prediction when x = 30 is
0.8 − 1.014318 = −0.214318.
Exercise 12.74
Part (a): When we plot the data for this problem we see one point with a very large value of
CO that could be an outlier. It does appear to be close to the linear fit that would be present
if it were removed and thus might not be problematic to the linear fit. Using the R command lm
we get a P-value for the fit of O(10−6) and an r² = 0.9585, indicating a good linear fit.
Part (b): Computing a prediction interval when CO is 400 gives
fit
lwr
upr
1 17.22767 11.95623 22.4991
Part (c): We can remove the largest observation and refit to see if it has any impact on the
coefficients of the linear model. When we do this we get
> m$coefficients
(Intercept)          CO
 -0.2204143   0.0436202
> m2$coefficients
(Intercept)          CO
 1.00140419  0.03461564
Here m is the model fit to all of the data and m2 is the model with the potential outlier
removed. We see that the value of the intercept coefficient has changed. This is not surprising
since the P-value for this coefficient is large in both fits; it is also not necessarily meaningful
since the data points are relatively far from the origin. We also see that the slope coefficient
has not changed that much in-line with what we argued above. Thus for the data that
remains (after we remove the potential outlier) it does not appear to have a significant
impact.
Exercise 12.75
Part (a): This would be given by the linear regression
rate = 1.693872 + 0.080464 speed .
Part (b): This would be given by the linear regression
speed = −20.056777 + 12.116739 rate .
Part (c): The coefficient of determination or r 2 would be the same for each regression. We
find it given by 0.974967.
Exercise 12.76
Part (a): Using the R function lm we get a linear fit given by
y = −115.195 + 38.068x ,
with an R2 = 0.5125. Given this low value of R2 we conclude that a linear fit is not that
great of a modeling procedure for this data. A scatterplot of the data points confirms this
in that there seems to be two clusters of points.
Part (b): If a linear model is appropriate this would be a test on the value of the parameter
β1 . We can compute a confidence interval on this parameter and find
[1] 16.78393 59.35216
This is a rather wide confidence interval and the value of 50 is inside of it so we cannot reject
the null hypothesis that β1 = 50.
Part (c): Since the expression for the standard error for β1 is given by

sβˆ1 = s/√Sxx ,
under the given suggestion for changing the sample x values the expression for s should not
change while Sxx will. In the original formulation we compute that Sxx = 148.9108 while
under the new suggested sampling scheme we compute Sxx = 144. Since this value is smaller
than the original value of Sxx we will have the new value of sβˆ1 larger and thus have lost
accuracy. It is better to stick with the more dispersed original x values.
Part (d): We want confidence and prediction intervals when x = 18 and x = 22. We find
the confidence intervals for these two points given by
       fit      lwr      upr
1 570.0294 451.8707 688.1881
2 722.3016 655.9643 788.6388
The prediction intervals for these two points are given by
       fit      lwr      upr
1 570.0294 284.6873 855.3715
2 722.3016 454.2358 990.3673
Exercise 12.77
Part (a): A scatter plot of this data looks very linear.
Part (b): The linear fit would be given by
y = −0.082594 + 0.044649x .
Part (c): This would be R2 which is given by R2 = 0.9827.
Part (d): The prediction from the linear model when x = 19.1 is given by yˆ = 0.7701935.
The residual would be 0.68 − 0.7701935 = −0.09019349.
Part (e): The P -value for the linear fit is O(10−7) indicating a strong linear relationship.
Part (f): If a linear model holds then we would have ∆y = β1 ∆x = β1 and a confidence
interval on this change is a confidence interval on β1 . We compute this to be
[1] 0.03935878 0.04993829
The center value of this interval or 0.04464854 is the point estimate.
Part (g): In this case we want a confidence interval on 20β1 , which is just 20 times the
interval from above or
[1] 0.7871756 0.9987658
Exercise 12.78 (the standard deviation of β0 )
Using the formulas in the book we have

Var(βˆ0 + βˆ1 x∗) evaluated at x∗ = 0 is Var(βˆ0) = σ² (1/n + x¯²/Sxx) ,

so the standard error of βˆ0 is estimated by

sβˆ0 = s (1/n + x¯²/Sxx)^{1/2} ,
and the confidence interval for β0 is then given by
βˆ0 ± tα/2,n−2 sβˆ0 .
For the data in Example 12.11 we get an estimated standard deviation of βˆ0 given by 0.187008
and a 95% confidence interval for β0 given by
[1] 125.8067 126.6911
Exercise 12.79 (an expression for SSE)
From the definition of SSE we have that

SSE = Σ (yi − yˆi)² = Σ (yi − (βˆ0 + βˆ1 xi))² .

With βˆ0 = y¯ − βˆ1 x¯ the above is

Σ [yi − y¯ + βˆ1 x¯ − βˆ1 xi]² = Σ [yi − y¯ − βˆ1 (xi − x¯)]²
  = Σ [(yi − y¯)² − 2βˆ1 (yi − y¯)(xi − x¯) + βˆ1² (xi − x¯)²]
  = Syy − 2βˆ1 Σ (yi − y¯)(xi − x¯) + βˆ1² Sxx = Syy − 2βˆ1 Sxy + βˆ1² Sxx .
Since βˆ1 = Sxy/Sxx we get

SSE = Syy − 2 (Sxy/Sxx) Sxy + (Sxy²/Sxx²) Sxx = Syy − Sxy²/Sxx = Syy − βˆ1 Sxy ,

as we were to show.
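As a numerical check of this identity on synthetic data:

set.seed(5)   # synthetic data; numeric check of SSE = Syy - beta1_hat * Sxy
x <- rnorm(25)
y <- 3 - x + rnorm(25)
Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))
b1  <- Sxy / Sxx
c(sum(resid(lm(y ~ x))^2), Syy - b1 * Sxy)   # the two values agree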
Exercise 12.80
I think the answer is no. If x and y are such that r ≈ 1 then x and y almost lie on a straight
line with a positive slope, so the variable y is linearly related to x. The variable y² would
then be quadratically related to x and thus would not lie on a straight line when regressed
against x (as it would if its correlation with x were ≈ 1); thus the correlation between x and y²
should be different from 1.
Exercise 12.81
Part (a): We start with y = βˆ0 + βˆ1 x and replace βˆ0 = y¯ − βˆ1 x¯ to get
y = y¯ + βˆ1 (x − x¯) .
Next recall that βˆ1 and r are given/defined by

βˆ1 = Sxy/Sxx = Σ (xi − x¯)(yi − y¯)/Σ (xi − x¯)² and r = Sxy/√(Sxx Syy) ,

so we can write βˆ1 as

βˆ1 = r √(Sxx Syy)/Sxx = r √(Syy/Sxx) = r √(Σ (yi − y¯)²/Σ (xi − x¯)²) = r sy/sx ,

where the last step used s²y = Σ (yi − y¯)²/(n − 1) and s²x = Σ (xi − x¯)²/(n − 1), so that the
factors of 1/(n − 1) cancel.
Thus

y = y¯ + r (sy/sx)(x − x¯) ,

as we were to show.
Part (b): Using the data from Ex. 12.64 we compute that r = −0.5730081. If our patient's
age is below the average age by one standard deviation then (x − x¯)/sx = −1. Thus

r (sy/sx)(x − x¯) = −r sy = +0.5730081 sy ,

and the patient's predicted ∆CBG is then 0.5730081 standard deviations above the average
∆CBG.
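As a numerical check of this relationship on synthetic data:

set.seed(6)   # synthetic data; numeric check that beta1_hat = r * s_y / s_x
x <- rnorm(40)
y <- 2 + 0.7 * x + rnorm(40)
c(coef(lm(y ~ x))[2], cor(x, y) * sd(y) / sd(x))   # identical values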
Exercise 12.82
We start with the t-stat test for H0 : ρ = 0 where

T = R √(n − 2)/√(1 − R²) .

Since R = Sxy/(Sxx^{1/2} Syy^{1/2}) we have that R² = Sxy²/(Sxx Syy) so

1 − R² = (Sxx Syy − Sxy²)/(Sxx Syy) .

Thus T becomes

T = Sxy √(n − 2)/√(Sxx Syy − Sxy²) .
For the t-statistic of the test H0 : β1 = 0 we have, evaluating (βˆ1 − β1)/sβˆ1 at β1 = 0,

T = βˆ1/(s/√Sxx) = (Sxy/Sxx)/(√(SSE/(n − 2))/√Sxx) = Sxy √(n − 2)/(√Sxx √SSE) .

Using various relationships we have

SSE = Σ yi² − βˆ0 Σ yi − βˆ1 Σ xi yi
    = Syy + (Σ yi)²/n − (y¯ − βˆ1 x¯) n y¯ − βˆ1 Σ xi yi
    = Syy + n y¯² − n y¯² + n x¯ y¯ βˆ1 − βˆ1 Σ xi yi
    = Syy + βˆ1 (n x¯ y¯ − Σ xi yi) .
Using Equation 30 we have

SSE = Syy − βˆ1 Sxy = Syy − Sxy²/Sxx .
Thus the t-statistic of the test H0 : β1 = 0 becomes

T = Sxy √(n − 2)/√(Sxx Syy − Sxy²) ,
the same as we had earlier showing the two are the same.
Exercise 12.83
We start with the "computational" expression for SSE (the expression the book recommends
to use in computing SSE):

SSE = Σ (yi − yˆi)² = Σ [yi − (βˆ0 + βˆ1 xi)]² = Σ yi² − βˆ0 Σ yi − βˆ1 Σ xi yi .
Next recall that the total sum of squares SST is given by Σ (yi − y¯)² = Syy by the definition
of Syy, thus we have that

SSE/SST = Σ yi²/Syy − βˆ0 (n y¯/Syy) − βˆ1 (Σ xi yi/Syy) .
Using the facts that

βˆ0 = y¯ − βˆ1 x¯ ,
βˆ1 = Sxy/Sxx , and
Syy = Σ yi² − (Σ yi)²/n so that Σ yi² = Syy + (n y¯)²/n = Syy + n y¯² ,

we get for the ratio SSE/SST above

SSE/SST = (Syy + n y¯²)/Syy − (y¯ − (Sxy/Sxx) x¯)(n y¯/Syy) − (Sxy/(Sxx Syy)) Σ xi yi
        = 1 + n (Sxy/(Sxx Syy)) x¯ y¯ − (Sxy/(Sxx Syy)) Σ xi yi .
Next let's consider Sxy . We find

Sxy = Σ (xi − x¯)(yi − y¯) = Σ (xi yi − xi y¯ − x¯ yi + x¯ y¯)
    = Σ xi yi − n x¯ y¯ − n x¯ y¯ + n x¯ y¯ = Σ xi yi − n x¯ y¯ . (30)
Solving this for Σ xi yi and putting it in the expression for SSE/SST we get

SSE/SST = 1 + n (Sxy/(Sxx Syy)) x¯ y¯ − (Sxy/(Sxx Syy))(Sxy + n x¯ y¯)
        = 1 − Sxy²/(Sxx Syy) .
From the definition of r we see that this expression is 1 − r 2 .
Exercise 12.84
Part (a): A scatter plot of the data looks like a linear fit might be appropriate.
Part (b): Using the R command lm we find that a point prediction is given by 98.29335.
Part (c): This would be the estimate of σ which from the summary command we see is
given by 0.1552.
Part (d): This would be the value of R2 from the summary command again we see that this
is given by 0.7937.
Part (e): A 95% confidence interval for β1 is given by
[1] 0.06130152 0.09008124
The center of this interval is 0.07569138.
Part (g): When we add this new data point and then refit our linear model we get the two
different models (the first is the original model and the second is the new model)
Call: lm(formula = Removal ~ Temp, data = DF)
Coefficients:
             Estimate Std. Error  t value Pr(>|t|)
(Intercept) 97.498588   0.088945  1096.17  < 2e-16 ***
Temp         0.075691   0.007046    10.74 8.41e-12 ***
Residual standard error: 0.1552 on 30 degrees of freedom
Multiple R-squared: 0.7937, Adjusted R-squared: 0.7868

Call: lm(formula = Removal ~ Temp, data = DF_new)
Coefficients:
             Estimate Std. Error  t value Pr(>|t|)
(Intercept) 97.27831    0.16026   607.018  < 2e-16 ***
Temp         0.09060    0.01284     7.057 6.33e-08 ***
Residual standard error: 0.2911 on 31 degrees of freedom
Multiple R-squared: 0.6163, Adjusted R-squared: 0.604
This new point changes the model coefficients (especially β1), increases the estimate
of σ, and decreases the value of R². All of these indicate that this data point does not fit
well with the other data points.
Exercise 12.85
A plot of the data looks like a linear fit would do a reasonable job at modeling this data.
From the summary command in R we see that a linear model is significant (the P -value for
the F -statistics is small) and that the coefficients are estimated well (small P -values). The
percentage of the observed variance explained by the linear model is 0.6341.
Exercise 12.86
Part (a): A plot of the data points indicates that a linear model is appropriate. The linear
fit gives

HW = 4.7878 + 0.7432 BOD .
If the two techniques measure the same amount of fat then we would expect β1 ≈ 1. A 95%
confidence interval of β1 is given by
[1] 0.5325632 0.9539095
Since the value of 1 is not in this interval we conclude that the two techniques do not
measure the same amount of fat.
Part (b): We could fit a linear model with y = HW and x = BOD which is the “inverse”
of the model above. We get that this model is given by
HW = 0.581739 + 0.049727 BOD .
Exercise 12.87
From the given description and the data we compute that T = 0.2378182 while when H0
is true T is given by a t-distribution with n1 + n2 − 4 degrees of freedom. Since a 95%
confidence interval would constrain T to the values such that |t| ≤ 2.048407 we see that we
cannot reject H0 and therefore conclude that the two slopes are equal.
Nonlinear and Multiple Regression
Problem Solutions
Note all R scripts for this chapter (if they exist) are denoted as ex13 NN.R where NN is the
section number.
Exercise 13.1
Part (a): Recall the equation for the variance of the ith residual given in the book

V(Yi − Yˆi) = σ² (1 − 1/n − (xi − x¯)²/Sxx) . (31)

Using this on the given data gives standard deviations for each residual given by
Using this on the given data gives standard deviations for each residual given by
[1] 6.324555 8.366600 8.944272 8.366600 6.324555
Part (b): For this different set of data (where we have replaced the point x5 = 25 with
x5 = 50) we get
[1] 7.874008 8.485281 8.831761 8.944272 2.828427
Notice that in general the standard deviations (except the last) are larger than they were before.
Part (c): Since the last point has a smaller standard deviation in the second case the least
squares fit line must have moved to be “closer” to this data point. Thus the deviation about
the estimate line in this case must be less than previously.
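The quoted standard deviations appear to be consistent with design points x = (5, 10, 15, 20, 25) and σ = 10 (an inference from the numbers, not stated in this summary). Noting that the bracket in Equation 31 is 1 − hii, with hii the usual leverage, a short sketch reproducing both parts is:

# assumed design points and sigma = 10, inferred from the values quoted above
x <- c(5, 10, 15, 20, 25)
h <- 1 / length(x) + (x - mean(x))^2 / sum((x - mean(x))^2)   # leverages h_ii
10 * sqrt(1 - h)   # 6.324555 8.366600 8.944272 8.366600 6.324555
x[5] <- 50         # Part (b): move the last design point
h <- 1 / length(x) + (x - mean(x))^2 / sum((x - mean(x))^2)
10 * sqrt(1 - h)   # 7.874008 8.485281 8.831761 8.944272 2.828427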
Exercise 13.2
The standardized residual plot for this question would plot the pairs (xi , e∗i ). When we do
that we get a plot where all e∗i values are bounded between −2 and +2. Thus there is
nothing alarming to be seen in this plot (no evidence that the model fails to fit the data).
Exercise 13.3
Part (a): A plot of the raw residuals as a function of x shows samples above and below the
horizontal line y = 0. This plot does not show any atypical behavior of the model.
Part (b): Computing the standardized residuals and the ratios ei/s gives values that are very
close.
Part (c): The plot of the standard residuals as a function of x looks much the same as a
plot of the residuals as a function of x.
Exercise 13.4
Part (a): I don’t see much pattern in the plot of the residuals as a function of x. Thus this
plot does not contradict the assumed linear fit.
Part (b): The standardized residuals when plotted seem to have values bounded by O(1.5)
which does not contradict the assumed linear model.
Exercise 13.5
Part (a): The fact that the R2 is so large indicates that the linear model is quite a good
one for this data.
Part (b): A plot of the residuals as a function of Time indicates that there might be some
type of nonlinear relationship of the form (Time − 3)² and thus we might want to add this
quadratic predictor. The location of the "peak" of this quadratic term (at 3) was specified
"by eye". If one wanted to not have to specify that free parameter (the peak of the quadratic)
one could simply add the term Time² to the existing regression, or use the mean or the median
of the x values for the centering location.
Exercise 13.6
Part (a): The R2 = 0.9678 being so large indicates that a linear model fits quite well to
this data.
Part (b): A plot of the residuals as a function of x show a model with a quadratic term
might be more appropriate for this data (rather than just a linear model).
Part (c): A plot of the standardized residuals as a function of x does not show any significant
deviation from the expected behavior i.e. the values are not too large (bounded by ±2). This
plot still indicates that a quadratic model should be considered.
[Figure 12 appears here: for each of the four data sets a scatter plot (DF$Y1–DF$Y4 against
the corresponding x values) in the first column, and the standardized residuals rstandard(m)
against the fitted values m$fitted.values in the second column.]

Figure 12: Scatter plots and residual plots for Exercise 13.9.
Exercise 13.7
Part (a): A scatter plot of the data does not look that linear. In fact, it seems to have an
increasing component and then a decreasing component indicating that a quadratic model
may be more appropriate.
Part (b): A plot of the standardized residuals as a function of x does not show any very large
values for the standardized residuals. The residual plot still indicates that a quadratic term
might be needed in the regression.
Exercise 13.8
A plot of the standardized residuals as a function of HR shows that there are two values of HR
that have relatively large standardized residuals. These standardized residuals are near ±2,
and while this is not large in an absolute sense they are the largest found in the data set.
Thus if one has to look for outliers these would be the points to consider.
Exercise 13.9
In Figure 12 (in the first column) we present scatter plots of each of the four data sets
given in this problem. From this, we can see that even though each data set has the "same"
second order statistics the scatter plots look quite different. In the second column we present
residual plots of this data (here a residual plot means a plot with the fitted values yˆ on the
x-axis and the standardized residuals of the linear fit on the y-axis; another interpretation
would plot the independent variable on the x-axis instead). The residual plots give an
indication as to how well the linear fit is performing. Starting at the first row we see that
the data has a linear form and the residual plot shows no interesting behavior (i.e. looks
like the expected plot when a linear model is a decent approximation to the data). In the
second row our scatter plot has a curvature to it that will not be captured by a linear model.
The residual plot shows this in that it also has a quadratic (curved) shape. The third row
shows a scatter plot that seems to look very linear but that has an extreme outlier near the
right end of the x domain. The standardized residual plot shows this and we see a relatively
large (order of 3) value for this same sample. This gives an indication that this point is an
outlier. The fourth plot has almost all of the data at a single point x = 8 with a single point
at the value x = 19. The R implementation of the standardized residuals in rstandard gives
NaN for this data point. To alleviate this we compute the standardized residuals "by hand"
using the formula in the book. In the plot that results we see that this point is separated
from the main body of the data just as it is in the scatter plot.
Exercise 13.10
Part (a): Our predictions are given by yˆi = βˆ0 + βˆ1 xi with βˆ1 = Sxy/Sxx and βˆ0 = y¯ − βˆ1 x¯.
Now consider the residual ei given by

ei = yi − (βˆ0 + βˆ1 xi) .

We have that the sum of the ei is given by

Σ ei = Σ yi − βˆ0 n − βˆ1 Σ xi = n y¯ − n(y¯ − βˆ1 x¯) − βˆ1 n x¯ = 0 ,

as we were to show.
Part (b): The residuals are not independent: parts (a) and (c) show that they must satisfy
the two linear constraints Σ ei = 0 and Σ xi ei = 0, so specifying any n − 2 of the residuals
determines the remaining two.
Part (c): Consider the given sum. We have from the definition of ei that

Σ xi ei = Σ (xi yi − xi βˆ0 − xi² βˆ1)
        = Σ xi yi − βˆ0 n x¯ − βˆ1 Σ xi²
        = Σ xi yi − n x¯ (y¯ − βˆ1 x¯) − βˆ1 Σ xi² .
Next using the definitions of Sxy and Sxx we can show

Sxy = Σ xi yi − n x¯ y¯ and Sxx = Σ xi² − n x¯² .

Using these expressions in what we have derived for Σ xi ei gives

Σ xi ei = Σ xi yi − n x¯ y¯ + n x¯² βˆ1 − βˆ1 Σ xi²
        = Sxy + n x¯ y¯ − n x¯ y¯ + βˆ1 (n x¯² − Σ xi²)
        = Sxy − βˆ1 Sxx = Sxy − Sxy = 0 ,

as we were to show.
Part (d): It is not true that Σ e∗i = 0. We can see this by actually computing the
standardized residuals e∗i for a linear regression example and showing that Σ e∗i ≠ 0. This is
done in the R code for Ex. 13-3 where we get that Σ e∗i = 0.2698802 ≠ 0.
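As a numerical check of parts (a), (c) and (d) on synthetic data:

set.seed(7)   # synthetic data for illustration only
x <- runif(12)
y <- 1 + 2 * x + rnorm(12)
m <- lm(y ~ x)
sum(resid(m))        # essentially zero (part (a))
sum(x * resid(m))    # essentially zero (part (c))
sum(rstandard(m))    # generally nonzero (part (d))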
Exercise 13.11
Part (a): Using the expression from the previous chapter that relates Yˆi to the Yj we have

Yˆi = Σ_{j=1}^{n} dj Yj with dj = 1/n + (xi − x¯)(xj − x¯)/Sxx .

Using this we have the ith residual given by

Yi − Yˆi = (1 − di) Yi − Σ_{j≠i} dj Yj .
As Yi and Yj are independent, using the rules of variances we have that

Var(Yi − Yˆi) = (1 − di)² σ² + Σ_{j≠i} dj² σ² = σ² (1 − 2di + di² + Σ_{j≠i} dj²) = σ² (1 − 2di + Σ_{j=1}^{n} dj²) .

Using the result from Exercise 12.55 we have that

Σ_{j=1}^{n} dj² = 1/n + (xi − x¯)²/Sxx .
Thus the variance of the residual is given by

Var(Yi − Yˆi) = σ² [1 − 2(1/n + (xi − x¯)²/Sxx) + 1/n + (xi − x¯)²/Sxx]
             = σ² [1 − 1/n − (xi − x¯)²/Sxx] ,

the expression we were to show.
Part (b): If Yˆi and Yi − Yˆi are independent then by taking the variance of both sides of the
identity

Yi = Yˆi + (Yi − Yˆi) ,

we get

σ² = Var(Yˆi) + Var(Yi − Yˆi) or Var(Yi − Yˆi) = σ² − Var(Yˆi) .

Using Equation 27 in the above we get

Var(Yi − Yˆi) = σ² − σ² (1/n + (xi − x¯)²/Sxx) ,

which when we simplify is the same as before.
Part (c): From Equation 27 we see that as x moves away from x¯ the variance Var(Yˆi)
increases, while from the equation above we see that under the same condition Var(Yi − Yˆi)
decreases.
Exercise 13.12
For this exercise we will use properties derived in Exercise 13.10 above.
Part (a): Note that the given values do not satisfy Σ ei = 0 as residuals must. Thus the
given values cannot be residuals.

Part (b): Note that the given values do not satisfy Σ xi ei = 0 as residuals must. Thus the
given values cannot be residuals.
Exercise 13.13
A similar expression for the residuals would be

Z = (yi − yˆi − 0)/(σ (1 − 1/n − (xi − x¯)²/Sxx)^{1/2}) ,

which would have a standard normal distribution. If we estimate σ with s = √(SSE/(n − 2))
then the above expression would have a t-distribution with n − 2 degrees of freedom.

The requested probability is given by 2 * pt( -2.5, 25-2 ) and equals 0.01999412.
Exercise 13.14
Part (a): Here we follow the prescription given in the problem and compute SSE = 7241.013,
SSPE = 4361, and SSLF = 2880.013. This gives an F statistic of f = 3.30201. If we
compute the critical value of the F-distribution with the given degrees of freedom (at the
5% level) we get a critical value of 4.102821. As our statistic is not larger than this we
cannot conclude that the results are significant, cannot reject the H0 hypothesis, and have
no evidence to conclude that the true relationship is nonlinear. Note if we decrease our level
of significance to 10% we get a critical value of 2.924466, which is less than our statistic,
indicating we can reject the hypothesis H0 .
To make using this analysis easier the test H0 vs. Ha was written in a function H0 linear model.R.
This function computes the F statistic for our test and the probability of receiving a value
of F that large or larger by chance. For this data we get that probability to be 0.07923804.
Since this is larger than 5% and less than 10% we cannot reject H0 at the 5% level but we
can at the 10% level. In agreement with the statement earlier.
Part (b): A scatter plot of this data does look nonlinear in that the response y seems to
decrease as x increases. This is in contradiction to the above test at a significance of 5% but
not at 10%.
Notes on regressions with transformed variables
If our probabilistic model is Y = α e^{βx} ǫ then

E[Y] = α e^{βx} E[ǫ] .

If ǫ is log-normal, i.e. log(ǫ) ∼ N(µ, σ²), then it can be shown that E[ǫ] = e^{µ+σ²/2}. This latter
expression will be near one if µ = 0 and σ² ≪ 1 and we can conclude that in such cases
E[Y] ≈ α e^{βx} .
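As a quick Monte Carlo check of the claim E[ǫ] = e^{µ+σ²/2}:

set.seed(8)   # Monte Carlo check of the log-normal mean
mu <- 0; sigma <- 0.1
exp(mu + sigma^2 / 2)                            # 1.005..., close to one
mean(rlnorm(1e6, meanlog = mu, sdlog = sigma))   # agrees to Monte Carlo error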
Exercise 13.15
Part (a): A scatter plot shows what looks to be exponential decay.
Part (b): A scatter plot of the logs of both x and y looks much more linear.
Part (c): We would have log(Y ) = β0 + β1 log(X) + ǫ or Y = αxβ ǫ or a multiplicative power
law.
Part (d): Our estimated model is log(Y ) = 4.638 − 1.049 log(X) + ǫ with σ = 0.1449077.
If we want a prediction of moisture content when x = 20 we have log(20) = 2.995732 and
the prediction of yˆ′ from our linear model gives 1.495298 with a 95% prediction interval of
(1.119196, 1.8714). If we transform back to the original coordinates we get a mean value for
y given by 4.460667 and a prediction interval for Y of (3.062392, 6.497386).
Part (e): Given the small amount of data the residual plots look good.
Exercise 13.16
The plot of Load vs. log(Time) looks like it could be taken as linear. The estimate of β1 is
given by -0.4932 which has p-value given by 0.000253 indicating that it is estimated well with
the given data. When the load is 80 I get a center and a 95% prediction interval for the linear
model of (2.688361, −2.125427, 7.50215) and which becomes (14.70756, 0.119382, 1811.933)
when we transform back to the original variables. Notice that this last prediction interval is
so wide that it is unlikely to be of any practical utility (it covers almost the range of all the
data).
Exercise 13.17
Part (a): If we want to assume a multiplicative power law model Y = αxβ ǫ then to linearize
we will need to take logarithms of both the original variables x and y.
Part (b): The model parameters for the transformed variables are well estimated and the
variance explained by the model is large (R2 ≈ 0.9596) indicating that the linear model for
the transformed variables is a good one. We can next consider some diagnostic plots on the
transformed linear model to see if there are any ways that it could be improved. A plot of
the fitted values on the x-axis and the standardized residuals on the y-axis does not indicate
any model difficulties.
Part (c): For the one-sided test asked for here the T statistic we compute is given by

T = (βˆ − β0)/sβˆ ,

where the hypothesized value is β0 = 4/3. We reject H0 if the t-statistic for the test is less
than the threshold −tα,n−2 of the t-distribution. Here we have n = 13 data points and I
compute our t statistic to be t = −1.02918. I compute −tα,n−2 = −1.795885. Since our
t-statistic is not less than this number we cannot reject H0 in favor of Ha .
Part (d): We will have truth to the given statement when

y(5) = 2 y(2.5) .

Since the model fit is a power law the above expression is equivalent to

α 5^β = 2 α (2.5)^β or β = 1 ,

when we solve for β. Thus the hypothesis test we want to do is H0 : β = 1 vs. Ha : β ≠ 1.
The t-statistic for this test is given by 3.271051 and the critical value for a two-sided test
is tα/2,n−2 = 2.200985. Since our t-statistic is larger than the critical value we reject H0 in
favor of Ha .
Exercise 13.18
In plotting the raw data we see that a direct linear model would not fit the data very
well. The range of the variable Cycfail is quite large and thus a better model will result
if logarithms are applied to that variable. It is more difficult to decide whether or not to
take a logarithm of the variable Strampl since both scatter plots look very much the same
qualitatively. One could fit linear models to both transforms and compare the R² between
the two, selecting the model with the larger R². The R² of the model
Strampl = β0 + β1 log(Cycfail) ,
is 0.4966854 while that for the model
log(Strampl) = β0 + β1 log(Cycfail) ,
is 0.468891. Since the two models give similar values for R2 it is not clear which is the better
choice. Since it is somewhat simpler (and has a larger R2 ) we will consider the first model.
For a value of Cycfail of 5000 we get a mean value and a 95% prediction interval given for
Strampl given by (0.008802792, 0.003023201, 0.01458238).
Exercise 13.19
Part (a): A scatter plot shows an exponential or power law decay of Lifetime as a function
of Temp.
Part (b): If we think that the relationship between the predictor 1/Temp and the response
log(Lifetime) is linear then we would have

log(Lifetime) = β0 + β1/Temp + ǫ′ .

Solving for Lifetime we get

Lifetime = e^{β0} e^{β1/Temp} ǫ = α e^{β1/Temp} ǫ .
A scatter plot of the data transformed in the above manner looks linear.
Part (c): The predicted value for Lifetime is given by 875.5128.
Part (d): When we run the R code H0 linear model.R on the suggested data we find an F
statistic given by 0.3190668 and a probability of getting a value this large or larger (under
the H0 hypothesis) of 0.5805192. Thus we don’t have evidence to reject H0 and there is no
evidence that we need a nonlinear model to fit this data.
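As a sketch of this transformed fit in R (on synthetic stand-in data, since the problem's data is not reproduced here):

set.seed(9)   # synthetic stand-in for the Lifetime/Temp data
DF <- data.frame(Temp = runif(20, 150, 250))
DF$Lifetime <- exp(2 + 800 / DF$Temp + rnorm(20, sd = 0.1))
m <- lm(log(Lifetime) ~ I(1 / Temp), data = DF)
exp(predict(m, data.frame(Temp = 200)))   # back-transformed point prediction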
Exercise 13.20
Under the suggested transformations we get scatter plots that look more or less linear. One
could fit a linear model to each transformation and compute the R². The transformation
that resulted in the largest value for R² could be declared the best. This is a model selection
problem and there may be better methods of selecting the model to use. "By eye" the
best model to use looked to be one of

y = β0 + β1 (1/x) , or log(y) = β0 + β1 (1/x) .
Exercise 13.21
For this problem I chose to model

y = β0 + β1 (10⁴/x) ,

where I found β0 = 18.139488 and β1 = −0.148517. The predicted value of y when x = 500
is given by 15.16915.
Exercise 13.22
Part (a): Yes; this transformation models 1/y = α + βx .

Part (b): Yes; this transformation models log(1/y − 1) = α + βx .

Part (c): Yes; this transformation models log(log(y)) = α + βx .

Part (d): No, unless λ is given.
Exercise 13.23
Under both of these models, if ǫ is independent of x then we can calculate the variance of
the two suggested models. We find

Var(Y) = Var(α e^{βx} ǫ) = α² e^{2βx} σ²
Var(Y) = Var(α x^β ǫ) = α² x^{2β} σ² ,
both of which are values that depend on x. Sometimes if you find that simple linear regression
gives you a variance that depends on x you can try to find a transformation (like ones
discussed in this section) that removes this dependence.
Exercise 13.24 (does age impact kyphosis)
Looking at the MINITAB output we see that the coefficient of age in this logistic regression
is 0.004296, which has a Z value of 0.73 and a p-value of 0.463. Thus we have a 46.3% chance
of getting an estimated coefficient of age in this model this large or larger when it is in fact
zero. We conclude that age does not significantly impact kyphosis.
Exercise 13.25
In this case (in contrast to the previous exercise) we see that age does influence the regression
results in that the p-value for the age coefficient is small (0.007), indicating that there is less
than a 1% chance that we got a coefficient this large (or larger) by chance.
Exercise 13.26
Part (a): A scatter plot of the data appears consistent with a quadratic regression model.
Part (b): This is given by R2 , which from the MINITAB output we see is 0.931.
Part (c): The full model has a p-value of 0.016 which is significant at the 5% level.
Part (d): From the output we have the center of the prediction interval to be 491.10, the
value of σYˆ = 6.52. We need to compute the critical values tα/2,n−(k+1) for α = 0.01 to
compute the 99% confidence interval where we find (453.0173, 529.1827).
Part (e): We would keep the quadratic term at the significance level of 5%. The p-value
for the quadratic term is 0.032 thus we would not keep it at 2.5% level.
Exercise 13.27
Part (a): A scatter plot looks very much like a quadratic fit will be appropriate.
Part (b): Using the given regression we have that yˆ = 52.8764 which gives a residual of
0.1236.
Part (c): This is the R2 which we calculate to be 0.8947476.
Part (d): The plot of the standardized residuals as a function of x shows two points with
values close to 2. A normal probability plot also shows two points that deviate from the
line y = x. These two points could be considered in further depth if needed. Given the small
data set size I don't think these values are too worrisome.
Part (e): For a confidence interval we get (48.53212, 57.22068).
Part (f): For a prediction interval we get (42.8511, 62.9017).
Exercise 13.28
Part (a): This is given by 39.4113.
Part (b): We would predict 24.9303.
Part (c): Using the formula given for SSE we get SSE = 217.8198, s2 = MSE = 72.6066
and s = 8.52095.
Part (d): From the given value of SST we get R2 = 0.7793895.
Part (e): From the value of sβˆ2 we can compute tβˆ2 = −7.876106. The p-value for a t-value
this “large” is given by 0.002132356. Since this is less than 0.01 (the desired significance)
we can conclude that our results are significant at the 1% level.
Exercise 13.29
Part (a): For predicted values and residuals I get
[1] 82.12449 80.76719 79.84026 72.84800 72.14642 43.62904 21.57265
and
[1] -1.1244938  2.2328118 -0.8402602  2.1520000 -2.1464152 -0.6290368  0.4273472
We get SSE = 16.77249 and s2 = 5.590829.
Part (b): We get R2 = 0.9948128. This is a very large result indicating quite a good fit to
the data.
Part (c): We get a t value for β2 given by −6.548501 this has a p-value of 0.003619986 and
we are significant at the 1% level. This indicates that the quadratic term does belong in the
model.
Part (d): Using the Bonferroni correction we need to calculate each confidence interval
to at least 97.5% so that our combined confidence interval will be 95%. We find the two
confidence intervals given by (for β1 and then for β2 )
[1] 0.4970034 3.8799966
[1] -0.005185555 -0.001146845
The guarantee that we are within both of these is good to 95%.
Part (e): I get a confidence interval of (69.03543, 76.66057) and a prediction interval of
(64.4124, 81.2836).
Exercise 13.30
Note that the data given in the file ex13-30.txt does not seem to match the numbers quoted
in this problem.
Part (a): From the given output we see that R² = 0.853, indicating a good fit to the data.
Part (b): I compute a confidence interval for β2 given by (−316.02234, 45.14234).
Part (c): I compute a t-value of -1.828972. For the one-sided test suggested here I compute
a 5% critical threshold given by -2.919986. Since the t-value we compute is not less than
this number there is no evidence to reject H0 and accept Ha .
Part (d): I compute this to be (1171.958, 1632.342).
Exercise 13.31
Note that the data given in the file ex13-31.txt does not seem to match the numbers quoted
in this problem.
Part (a): Using the numbers given in this problem statement and the R command lm we
get a quadratic model given by
Y = 13.6359 + 11.4065x − 1.7155x2 + ǫ .
Part (b): A residual vs. x plot does not show any interesting features. A scatter plot of
the raw data shows that the point with the largest value of x might have large influence.
Part (c): From the output of the summary command we can read that σ = 1.428435 so that
s2 = 2.040426, and R2 = 0.9473. Such a large value for R2 indicates that the quadratic fit
is a good one.
Part (d): We are given the values of (when we square) Var(Yˆj) and will use the independence
of Yˆj and Yj − Yˆj to write

σ² = Var(Yˆj) + Var(Yj − Yˆj) .

If we replace σ² ≈ s² we can solve for Var(Yj − Yˆj) in the above to get

Var(Yj − Yˆj) = s² − Var(Yˆj) .

This gives

[1] 1.12840095 1.12840095 1.53348195 1.43669695 1.43669695 1.43669695 0.06077695

for the variances of each residual. To get the standard deviation of each residual we take
the square root of the above numbers to get
[1] 1.0622622 1.0622622 1.2383384 1.1986229 1.1986229 1.1986229 0.2465298
A plot of the standardized residuals looks much like the plot in Part (b) of this problem.
I don’t get a huge difference between using the correct sample standard deviation and the
value estimated for s.
Part (e): I get a prediction interval given by (27.29929, 36.32877).
Exercise 13.32
Part (a): The estimated regression function for the “centered” regression is
Y = 0.3463 − 1.2933(x − 4.3456) + 2.3964(x − 4.3456)2 − 2.3968(x − 4.3456)3 + ǫ .
Part (b): If we expand the above polynomials we can estimate the coefficient of any power
of x. In fact since we want the coefficient of x³ this is just the coefficient of the monomial
(x − 4.3456)³, which from the above we see is −2.3968. To compute the estimate of β2 we
have to compute the second order term in (x − 4.3456)³ and then add it to the second order
term from (x − 4.3456)². Expanding the cubic term we have

(x − x¯)³ = x³ − 3x¯ x² + 3x¯² x − x¯³ ,

which gives a coefficient of the quadratic term of −3x¯. We need to multiply this by β3∗ and
then add β2∗ to this to get the full coefficient of x². We get the value 33.64318.
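As a quick check of this arithmetic (coefficients copied from the centered fit above; the tiny discrepancy in the final digits comes from the rounding of the printed coefficients):

xbar <- 4.3456
b2s  <- 2.3964    # coefficient of (x - xbar)^2 in the centered fit
b3s  <- -2.3968   # coefficient of (x - xbar)^3 in the centered fit
b2s + (-3 * xbar) * b3s   # about 33.643, the coefficient of x^2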
Part (c): I would predict the value of yˆ = 0.1948998.
Part (d): The t-value for the cubic coefficient is -0.975, which is not that large. The p-value for this number is 0.348951, indicating that with almost 35% chance we could get a result this large or larger for the coefficient β3∗ when in fact it is zero.
Exercise 13.33
Part (a): We would compute ŷ = 0.8762501 and ŷ = 0.8501021, which are different from what the book reports. I'm not sure why; if anyone sees an error I made, please contact me.
Part (b): The estimated regression function for the unstandardized model would be given by expanding each of the terms ((x − x̄)/sx)^p for the values p ∈ {0, 1, 2, 3} to get polynomials in x and then adding these polynomials.
Part (c): The t-value for the cubic term is given by 2 and the critical value for a t-distribution with n − (k + 1) = 7 − 4 = 3 degrees of freedom is given by tα/2,n−(k+1) = 3.182446. Notice that our t-value is less than this, indicating that the cubic term is not significant at the 5% level and should be dropped.
Part (d): The values of R2 and MSE for each model would be the same, since the two models are equivalent in their predictive power.
Part (e): If we compute the adjusted R2 for each model we find an adjusted R2 for the quadratic model of 0.9811126 and an adjusted R2 for the cubic model of 0.9889571. Since the adjusted R2 is larger for the cubic model than for the quadratic model, the increase in R2 resulting from the addition of the cubic term is worth the "cost" of adding this variable. This result is different from what the book reports. I'm not sure why; if anyone sees an error I made, please contact me.
Exercise 13.34
Part (a): I would compute yˆ = 0.8726949.
Part (b): We compute SSE = 0.1176797 for the transformed regression (the one in terms
of x′ ). This gives R2 = 0.9192273.
Part (c): The estimated regression function for the unstandardized model would be given by expanding each of the terms ((x − x̄)/sx)^p for the values p ∈ {0, 1, 2} to get polynomials in x and then adding these polynomials.
Part (d): This would be the same as the estimated standard deviation of β2∗ which we see
is 0.0319.
Part (e): The t-value for β2 is given by 1.404389 while the tα/2,n−(k+1) (for α = 0.05) critical value is 2.228139. Since our t-value is less than this critical value we cannot reject H0 and conclude that we should take β2 = 0. For the second order term the two tests specified are the same and would reach the same conclusions.
Exercise 13.35
Using the R command lm I find the model
log(Y ) = 0.2826 − 0.008509x − 0.000003449x2 ;
we note that the O(x2) coefficient is not significant, however. This result is different from what the book reports. I'm not sure why; if anyone sees an error I made, please contact me.
Exercise 13.36
Part (a): The value of β1 represents the increase in maximum oxygen uptake with a unit increase in weight (i.e. increasing weight by one kilogram), holding all other predictors in the model fixed. Here the estimated value β1 = 0.1 > 0 indicates that as the male's weight increases his maximum oxygen uptake should increase. The value of β3 represents the change in maximum oxygen uptake when the time necessary to walk 1 mile increases by one unit (one minute), again holding all other predictors in the model fixed. Here the estimated value β3 = −0.13 means that for every additional minute needed to walk one mile the maximum oxygen uptake decreases by 0.13.
Part (b): I compute 1.8.
Part (c): The mean value of our distribution when the x values are as specified is given
by 1.8 and our variance is given by σ² = 0.4² = 0.16. Thus the probability is given by 0.9544997.
Exercise 13.37
Part (a): I compute 4.9.
Part (b): The value of β1 represents the increase in total daily travel time when the distance
traveled in miles increases by one mile (holding all other predictors in the model constant).
The same type of statement can be made for β2 .
Part (c): I compute this to be 0.9860966.
Exercise 13.38
Part (a): I compute this to be 143.5.
Part (b): When the viscosity is 30 we have x1 = 30 and our expression for the mean
becomes
Y = 125 + 7.75(30) + 0.095x2 − 0.009(30)x2 = 357.5 − 0.175x2 .
Thus the change in Y associated with a unit increase in x2 is then given by -0.175.
Exercise 13.39
Part (a): I compute this mean to be 77.3.
Part (b): I compute this mean to be 40.4.
Part (c): The numerical value of β3 represents the increase in sales (in 1000s of dollars) that the fast food outlet gets from having a drive-up window.
Exercise 13.40
Part (a): I compute the expected error percentage to be 1.96.
Part (b): I compute the expected error percentage to be 1.402.
Part (c): This would be the coefficient of the x4 variable or the value of β4 = −0.0006. If
we increase x4 by 100 (rather than 1) we would get ∆Y = 100β4 = −0.06.
Part (d): The answers in Part (c) do not depend on the other x values since the model is purely linear. They would depend on the other x values if there were interaction terms (like x1 x4 or x2 x4 etc.).
Part (e): I get R2 = 0.4897959. For the model utility test I get a test statistic f of 6. The
critical value is Fα,k,n−(k+1) = 2.75871. Since our value of f is larger than this value we can
reject the H0 hypothesis (the model is not predictive) in favor of Ha (the model is useful).
Exercise 13.41
For the model utility test I get a test statistic f of 24.41176. The critical value is Fα,k,n−(k+1) =
2.420523. Since our value of f is larger than this value we can reject the H0 hypothesis (the
model is not predictive) in favor of Ha (the model is useful).
Exercise 13.42
Part (a): From the MINITAB output we have that the f-value is 319.31, which has a p-value of zero to three digits. Thus this model is significant at better than the 0.05% level (otherwise the given p-value would have been written as 0.001).
Part (b): I get a 95% confidence interval for β2 given by (2.117534, 3.882466). Since this interval does not include 0 we can be confident (with an error of less than or equal to 5%) that β2 ≠ 0.
Part (c): I get a 95% confidence interval for Y at the given values of x given by (−15.26158, 203.26158).
Part (d): From the MINITAB output we have that MSE = σ̂² = 1.12. Using this we can estimate a 95% prediction interval to be (−15.28295, 203.28295).
Exercise 13.43
Part (a): I get the value of 48.31 and a residual of 3.689972.
Part (b): We cannot conclude this since there is an interaction term x1 x2 in the regression.
Part (c): There appears to be a useful relationship of the type that is suggested (with an
interaction term). We can see this from the p-value for the model as whole.
Part (d): The t-value and p-value for the coefficient β3 are given in the SAS output. From these we see that this term is significant at the α = 0.003 level.
Part (e): I compute a confidence interval given by (21.70127, 41.49134).
Exercise 13.44
Part (a): The estimate of β1 is the numerical change in the water absorption for wheat
flour for a unit change in x1 (flour protein). The interpretation of β2 is the same as for β1
(but now for a unit change in starch damage).
Part (b): This is the R2, which we find to be 0.98207.
Part (c): The p-value for the model fit is very small (zero to the precision given) thus the
model is a useful one.
Part (d): No. The 95% confidence interval of starch damage does not include zero indicating
that the predictor is useful.
Part (e): From the SPSS output we have MSE = 1.1971 = σ̂², or σ̂ = 1.09412 from the value reported in the Standard Error output. Using this we can compute the prediction and confidence intervals and find
[1] 21.70127 41.49134
[1] -15.28295 203.28295
Part (f): For the estimated coefficient β3 we have a t-value of -2.427524 while the α = 0.01 threshold tα/2,n−(k+1) (when we have three predictors) is 2.79694. Since our t-value is not larger than this in magnitude we conclude that we cannot reject the null hypothesis that β3 = 0 and we should not include this term in the regression.
Exercise 13.45
Part (a): Given the value of R2 we can compute a model utility test. We find f = 87.59259
and a critical value Fα,k,n−(k+1) = 2.866081. As f is greater than this critical value the model
has utility.
Part (b): I compute 0.9352.
Part (c): I get a prediction interval given by (9.095131, 11.086869).
Exercise 13.46
Part (a): From the given numbers I compute R2 = 0.835618 and an f value for the model
utility test of f = 22.88756. This has a p-value of 0.000295438. Since this p-value is less
than 0.05 there seems to be a useful linear relationship.
Part (b): The t-value for β2 is found to be 4.00641 where the tα/2,n−(k+1) critical value
is 2.262157. Since our t-value is larger than this critical value we conclude that the type
of repair provides useful information about repair time (when considered with elapsed time
since the last service).
Part (c): I get a confidence interval for β2 given by (0.544207, 1.955793).
Part (d): I get a prediction interval given by (2.914196, 6.285804).
Exercise 13.47
Part (a): β1 is the change in y (energy content) for a one unit change in x1 (% plastics by weight). β4 is the same for the fourth variable.
Part (b): The MINITAB output shows a p-value for the entire model as zero to three
digits. This indicates that there is a useful linear relationship between at least one of the
four predictors.
Part (c): The p-value for % garbage is 0.034 and thus it would stay in the model at a significance level of 5% but not at 1%.
Part (d): I compute a confidence interval given by (1487.636, 1518.364).
Part (e): I compute a prediction interval given by (1436.37, 1569.63).
Exercise 13.48
Part (a): The f -value for the given regression is 8.405018 which has a p-value of 0.01522663.
Note that this is not significant at the 1% level but is significant at the 5% level.
Part (b): I compute a confidence interval of (18.75891, 25.17509) for expected weight loss.
Part (c): I compute a prediction interval of (15.54913, 28.38487) for expected weight loss.
Part (d): We are asked for an F-test for a group of predictors. That is, we have a null
hypothesis H0 that
βl+1 = βl+2 = · · · = βk = 0 ,
so the reduced model
Y = β0 + β1 x1 + · · · + βl xl + ǫ ,
is correct vs. the alternative Ha that at least one of the βl+1 , βl+2 , · · · , βk is not 0. Given
the values of SSE I’m getting an f -value for this test to be f = 6.431734. This has a p-value
of 0.02959756 indicating that there is a useful relationship between weight loss and at least
one of the second-order predictors.
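In R this F test for a group of predictors can be carried out with anova on nested models; a sketch with illustrative formulas and data frame name DF (not the problem's actual variables):

m.red  <- lm(y ~ x1 + x2, data = DF)                              # reduced model
m.full <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = DF)  # full model
anova(m.red, m.full)   # reports the f-value and p-value for the added group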
Exercise 13.49
Part (a): I compute µ̂Y·18.9,43 = 96.8303 and a residual of -5.8303.
Part (b): This is a model utility test for the model as a whole. I compute an f-value of 14.89655 which has a p-value of 0.001395391. Thus this model is "useful" at the 1% level.
Part (c): I compute a confidence interval of (78.28061, 115.37999).
Part (d): I compute a prediction interval of (38.49285, 155.16775).
Part (e): To do this problem we first have to solve for the estimated standard deviation
of βˆ0 + βˆ1 x1 + βˆ2 x2 at the given point in question. I compute this to be 25.57073. Once
we have this we can compute the 90% prediction interval as we normally do. I compute
(46.88197, 140.63003).
Part (f): For this we look at the t-value for β1 . From the MINITAB output we see this
is given by -1.36 which has a p-value of 0.208, thus we should probably drop this predictor
from the analysis.
Part (g): The F-statistic for the model H0 : Y = β0 + β2 x2 + ǫ vs. Ha : Y = β0 + β1 x1 + β2 x2 + ǫ is given by 1.823276, while if we square the t-value for β1 in the MINITAB output we get 1.8496. These two numbers agree only to the first decimal place. I'm not sure why they are not closer in value. If anyone knows the answer to this, please contact me.
Exercise 13.50
Part (a): I get a t-value for β5 given by 0.5925532 with a p-value of 0.5751163, indicating
that we should not keep this term (when the others are included in the model).
Part (b): Each t-value is computed with all other variables in the model. Thus each can
be small but the overall model can still be a good one.
Part (c): I get an f -value for this test of 1.338129 which has a p-value of 0.3472839 indicating
that we should not keep the quadratic terms in the model.
Exercise 13.51
Part (a): The plots suggested don’t indicate that the model should be changed.
Part (b): I get an f -value for the model utility test of 5.039004 and a p-value for that of
0.02209509. This indicates that the model is valid at the 5% level.
Part (d): I get an f -value for this test of 3.452831 and a p-value of 0.07151767, thus there is
at least a 7% chance that the reduction in SSE when we use the full model is due to chance.
At the 5% level we should drop these quadratic terms.
Exercise 13.52
Part (a): We would want to do a model comparison test between the full quadratic model with k = C(3, 2) + 3 = 6 predictors (I'm assuming all cross product terms in addition to all quadratic terms) and the linear model with l = 3 predictors. I get an f-value of 124.4872 and a p-value for this of 8.26697 × 10^-9, indicating that we should keep the quadratic terms in the model.
Part (b): I get a prediction interval of (0.5618536, 0.7696064).
Exercise 13.53
The model as a whole is good (low p-value). The predictor x2 could perhaps be dropped; it is not significant at the 1% level but is significant at the 5% level.
Exercise 13.54
Part (a): We would map the numerical values given to the coded variables and then evaluate
the expression for y given in this problem.
Part (b): For each variable one would need to produce a mapping from the coded variables
to the uncoded variables. For x1 such a mapping could be
uncoded value = ((0.3 − 0.1)/(0 − (−2))) (coded value − 0) + 0.3 = 0.1 × coded value + 0.3 .
If we solve for the coded value in terms of the uncoded value we would replace each coded
value in the regression with an expression in terms of its uncoded value.
Part (c): I get an f-value of 2.281739 with a p-value of 0.06824885. This means that at the 5% level we cannot conclude that the quadratic and cross-product terms add significant improvement to the linear model.
Part (d): I get a 99% confidence interval of (84.34028, 84.76932); since this does not include the value of 85.0 we can conclude that the given information does contradict this belief (with a chance of being wrong of 1%).
Exercise 13.55
Part (a): We would take the logarithm of the multiplicative power model which would give
log(Q) = log(α) + β log(a) + γ log(b) + log(ǫ) .
Then if we find β0 , β1 , and β2 using linear regression we would have
α = e^β0 ,  β = β1 ,  γ = β2 .
We can use the R command lm to estimate the above parameters and find
α = 4.783400 ,  β = 0.9450026 ,  γ = 0.1815470 .
We can also use this linear model to make predictions. For the requested inputs we predict
a value of Q = 18.26608.
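A sketch of this fit in R (the column names Q, a, b and the prediction inputs below are illustrative placeholders, not the problem's data):

m     <- lm(log(Q) ~ log(a) + log(b), data = DF)
alpha <- exp(coef(m)[1])   # alpha = e^beta_0
beta  <- coef(m)[2]        # beta  = beta_1
gamma <- coef(m)[3]        # gamma = beta_2
exp(predict(m, newdata = data.frame(a = 10, b = 50)))  # point prediction of Q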
Part (b): Again taking logarithms of the suggested model we would have
log(Q) = log(α) + βa + γb + log(ǫ) .
Again we could fit the above model with least squares.
Part (c): We can get this by transforming the confidence interval given in terms of log(q) into one for the variable of interest q by taking the exponential of each endpoint to get (1.242344, 5.783448).
Exercise 13.56
Part (a): I compute an f-statistic (given the value of R2) of 9.321212; this has a p-value of 0.0004455507 indicating that there is a linear relationship between y and at least one of the predictors.
Part (b): I compute Ra2 = 0.6865 for the original model (the one with x2 ) and Ra2 = 0.7074
for the model when we drop x2 .
Part (c): This would be a model utility test where the null hypothesis is that the coefficients
of x1 , x2 , and x4 are all zero. For this test I get an f -value of 2.323232 with a p-value of
0.1193615. Since the p-value is so large we cannot reject H0 and must conclude that the
coefficients stated are “in fact” zero.
Part (d): Using the transformed equation I predict yˆ = 0.5386396.
Part (e): I compute a confidence interval for β3 given by (−0.03330515, −0.01389485).
Part (f): From the transformed equation we can write
y = β0 + β3 (x3 − x̄3)/s3 + β5 (x5 − x̄5)/s5
  = (β0 − β3 x̄3/s3 − β5 x̄5/s5) + (β3/s3) x3 + (β5/s5) x5 ,
which shows how to transform coefficients from the standardized model to the unstandardized model. The coefficient for x3 in the unstandardized model is then given by
β3/s3 = −0.0236/5.4447 = −0.00433449 .
We would compute the value of sβ̂3 using the normal rules for how variables transform under multiplication, i.e.
sβ̂3 = (1/s3) sβ̂3′ = (1/5.4447)(0.0046) = 0.0008448583 .
Part (g): I compute this prediction interval to be (0.4901369, 0.5769867).
Exercise 13.57
Part (a): We first start by converting the SSE into MSEk using
MSEk = SSE/(n − k − 1) .
We then pick the model with the smallest MSEk as a function of k. This seems to be when k = 2.
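A sketch of this bookkeeping in R (the sample size and SSE values below are illustrative placeholders, not the problem's numbers):

n   <- 20                          # sample size (illustrative)
k   <- 1:4                         # number of predictors in each candidate model
sse <- c(14.0, 10.5, 10.1, 10.0)   # SSE of the best model of each size (illustrative)
mse <- sse / (n - k - 1)           # MSE_k
k[which.min(mse)]                  # the k that minimizes MSE_k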
Part (b): No. Forward selection, when considering subsets of two variables, would start with the best single variable, which in this case is x4. Forward selection would then add each other variable to it in turn, keeping the pair that had the best metric (R2 or MSE) from all pairs that included x4 as one of the two variables. Thus there is no way to test the variable pairing (x1, x2), which is the optimal one for this problem if we search over all subsets of size two.
Exercise 13.58
In the first step we dropped the variable x3 since it had the smallest t-value and it was below the threshold tout. In the second step we dropped the variable x4 for the same reason. After that no remaining t-value was less than tout and the procedure stopped with the three variables x1, x2, and x5. Note that it looks like tout = 2.0.
Exercise 13.59
It looks like the MINITAB output is presenting the best 3 models over all model subsets
of size k where k = 1, 2, 3, 4, 5. From the best Cp result with k = 4 it looks like x1 , x3 ,
x4 and x5 are important. The best Cp results with k = 3 indicates that x1 , x3 , and x5 are
important. These are the two models I would investigate more fully.
Exercise 13.60
The first “block” of outputs (from the first “Step” row down to the second “Step” row)
represents the backward elimination method and the second block of outputs represents the
forward selection method.
In the backward elimination first we remove the feature sumrfib, then the feature splitabs
and then the feature sprngfib and then stop. At each step (it looks like) we are removing
the feature with the smallest t-value under 2.0.
In the forward selection method we first select the feature %sprwood and then the feature
sumltabs and then stop. If we assume the minimum t-value for inclusion in the forward
selection method is the same as in the backwards elimination procedure no single additional
feature (if added) would have a t-value larger than 2.0.
Exercise 13.61
Severe multicollinearity is a problem if Ri2 is larger than 0.9. All Ri2 values given here are
smaller than this threshold.
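A sketch of how the Ri² values can be computed in R, assuming the predictors are the columns of a data frame X (the name X is illustrative):

r2 <- sapply(names(X), function(v)
  summary(lm(reformulate(setdiff(names(X), v), response = v), data = X))$r.squared)
r2 > 0.9   # TRUE would flag a predictor with severe multicollinearity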
Exercise 13.62
We consider a sample to be unusual if its hii value is larger than the value 2(k + 1)/n = 0.4210526. From the values given we see that observations 14, 15, and 16 are candidates for unusual observations.
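With a fitted model in hand the hii values can be obtained directly in R; a sketch (the model name m is illustrative):

h <- hatvalues(m)                     # the h_ii leverage values
k <- length(coef(m)) - 1              # number of predictors
which(h > 2 * (k + 1) / length(h))    # candidates for unusual observations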
Exercise 13.63
The presence of an observation with a large influence means that this point could adversely affect the estimates of the parameters in the linear regression. That we have points like this casts some doubt on the appropriateness of the previously given model estimates. An observation with a large standardized residual is a point that does not fit the model very well and (if you believe your model) could be an outlier, i.e. one that is not really generated by the process we are attempting to study. Dropping the outlier point makes sense if it is found that it is not really representative of the system we are trying to study.
Exercise 13.64
Part (a): We would want to compare the given hii values to 2(k + 1)/n = 0.6 in this problem. The observation with hii = 0.604 appears to be influential.
Part (b): Let's look at the change in the values of βi when we include the second point and when we exclude it (divided by the standard error of βi). We find these numbers given by
[1] 0.4544214 0.5235602 0.7248858
The value of β3 changes by over 70% of a standard error. This is a relatively large change. The other values of β change quite a bit also. This observation does seem influential.
Part (c): In this case the relative changes in the values of βi are given by
[1] -0.1446507  0.2617801 -0.1712329
These are all relatively small in value and this observation does not seem influential.
Exercise 13.65
Part (a): See Figure 13 for the boxplots for this problem. From this plot we see that when a crack appeared the value of ppv was larger "on average" than when a crack did not appear. To test if the means of the two distributions are different we use the Z-test described in an earlier chapter. This gives a 95% confidence interval for the difference in mean ppv between samples where cracking happens and samples where cracking does not happen of (146.6056, 545.4499). Thus we can be "certain" that the ppv for cracked samples is larger than for un-cracked samples. Note this result is different from what the book reports. If anyone sees anything wrong with what I have done, please contact me.
Figure 13: Boxplots of ppv for samples where a crack appeared (1 = True) and where it did not (0 = False), for the data from Exercise 13.65.
Part (b): A scatter plot with a linear fit of ppv on the x-axis and Ratio on the y-axis looks
like a good model fit. To test if this can be improved we plot the standardized residuals e∗
as a function of the fitted values yˆ. This plot does not show anything that would make us
think the linear fit was not a good one. Note that one of the standardized residuals is -4
indicating that we should probably investigate that point.
Exercise 13.66
Part (a): The slope βˆ1 = 0.26 is the estimate of how much flux will change for a unit
increase in the value of inverse foil thickness. The coefficient of determination or R2 = 0.98
indicates that this is a good model fit.
Part (b): I compute this to be 5.712.
Part (c): Yes.
Part (d): Using the MINITAB output for sample 7 we have σŶ = 0.253 when invthick = 45.0. This gives a 95% confidence interval given by (10.68293, 11.92107).
Part (e): Note that the value found for β0 is perhaps not significant.
Exercise 13.67
Part (a): By increasing x3 by one unit we expect y to decrease by 0.0996, holding all other variables constant.
Part (b): By increasing x1 by one unit we expect y to increase by 0.6566 holding all other
variables constant.
Part (c): I would have predicted the value of 3.6689 which gives a residual of -0.5189.
Part (d): I compute R2 = 0.7060001.
Part (e): We compute an f-statistic of 9.005105 which has a p-value of 0.0006480373; thus we have a useful model.
Exercise 13.68
Part (a): A scatter plot of log(time) versus log(edges) does suggest a linear relationship
between the two variables.
Part (b): This would be
time = α edge^β ǫ ,
or a power law.
Part (c): Using the R command lm I compute the model
log(time) = −0.7601 + 0.7984 log(edge) + ǫ ,
with σǫ² = 0.01541125. A point prediction for log(time) is given by 3.793736, which gives a point prediction for time of 44.42206.
Exercise 13.69
Part (a): A plot of Temperature as a function of Pressure does not look linear but seems
to take a curved shape.
Part (b): I find that a plot of Temperature as a function of log(Pressure) looks very linear.
Using the R command lm I find a model given by
Temperature = −297.273 + 108.282 log(Pressure) + ǫ ,
with σǫ² = 72.45414. Note that a plot of the fitted values ŷ on the x-axis and the standardized residuals on the y-axis shows a cluster of points with almost a linear shape, and we see a point with a very large standardized residual. If we ignore these difficulties and use this
model in any case with a measured Pressure of 200 then we find a 95% prediction interval
for Temperature given by (257.8399, 295.0415).
We could try to fit a polynomial model to this data. A second order model has the same point with a large standardized residual that the log model did. A cubic model does not have any single point with a sufficiently large standardized residual, but the intercept term β0 is no longer significant. Residual plots of the second order model don't seem to indicate the need for a cubic term and thus I would probably stop with a quadratic model.
Exercise 13.70
Part (a): For the model without the interaction term I get R2 = 0.394152 and for the model
with the interaction term I get R2 = 0.6409357.
Part (b): For the reduced model (with only two terms) I find an f-statistic of 1.951737 which
has a p-value of 0.2223775, indicating that this model is perhaps spurious (will not hold up
out of sample). When we add the interaction term our f -statistic becomes 2.975027 with a
p-value of 0.1355349. This second model appears to be much more significant than the first.
Note that neither model is significant at the 5% level.
Exercise 13.71
Part (a): The p-value for the fit using all five predictors is 2.18 × 10^-8, indicating that there is "utility" in this model. The R command summary gives
> summary(m)

Call:
lm(formula = pallcont ~ pdconc + niconc + pH + temp + currdens, data = DF)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   25.0031    20.7656   1.204 0.239413
pdconc         2.2979     0.2678   8.580 4.64e-09 ***
niconc        -1.1417     0.2678  -4.263 0.000235 ***
pH             3.0333     2.1425   1.416 0.168710
temp           0.4550     0.2143   2.124 0.043375 *
currdens      -2.8000     1.0713  -2.614 0.014697 *

Residual standard error: 5.248 on 26 degrees of freedom
Multiple R-squared: 0.8017,  Adjusted R-squared: 0.7636
F-statistic: 21.03 on 5 and 26 DF,  p-value: 2.181e-08
Note that some of the predictors don’t appear to be as important. For example pH does not
seem to be significant at the 10% level (when the other predictors are included).
Part (b): I’ll assume the second order model includes 52 = 10 cross-terms and 5 quadratic
terms. In that case I’m getting R2 = 0.801736 for the model without quadratic and interaction terms and an R2 = 0.9196336. This is an expected result and is not sufficient to
conclude that the more complex model is better.
Part (c): We can perform a model utility test with the null hypothesis that the coefficients
of all quadratic terms are zero. The f -statistic for this test is 1.075801 with a p-value
of 0.4608437. This indicates that the reduction in SSE due to the quadratic terms is not
sufficient to warrant the more complex model.
Part (d): The model fit using all five predictors with pH² has all predictors important at the 5% level. The least important predictor is temp, which has a p-value of 0.015562.
Exercise 13.72
Part (a): I compute R2 = 0.9505627 thus the model seems to have utility.
Part (b): I get an f-value of 57.68292 with a p-value of 2.664535 × 10^-15.
Part (c): Following the hint we take the square root of the given F-value to get 1.523155; this gives the t-value for the test βcurrent-time = 0. A p-value for this t-value is 0.1393478, which indicates that this predictor can be eliminated.
Part (d): I’m not sure how to compute an estimate of σ 2 given the inputs to this problem
since we have removed predictors from the complete second-order model where we were told
that SSE = 0.80017 and thus the value of SSE would change from that value (it would
have to increase). If we assume that it does not change however with a new value of k (the
number of predictors used in our model) we get a prediction interval for the given point of
(7.146174, 7.880226). This seems relatively accurate.
Exercise 13.73
Part (a): I compute a value of R2 = 0.9985706 for the quadratic model. This implies an f-statistic of 1746.466 with a p-value of 7.724972 × 10^-8. All of this indicates a good fit to the data and a useful model.
Part (b): The quadratic coefficient has a t-value of -48.11 and a p-value of 7.330253 × 10^-8. This indicates that it is estimated well and a linear model would not do as well as the
quadratic model. In addition, the F -test (for the inclusion of the quadratic terms) has an
f -value given by t2 and would be significant.
Part (c): Only if the residual plots indicated that one might be needed. We only have n = 8 data points, so by adding another predictor we might start to overfit the data.
Part (d): I get a confidence interval for the mean value when x = 100 of (21.0700, 21.6566).
Part (e): I get a prediction interval for the value when x = 100 of (20.67826, 22.04834).
Exercise 13.74
We could perhaps model log10(y) as a function of x. This gives the model
log10(y) = −9.12196 + 0.08857x + ǫ .
The prediction of y when x = 35 is then given by y = 9.508204 × 10^-7.
Exercise 13.75
Part (a): This is a model utility test where we can consider whether or not the expanded
model including x2 decreases the mean square error enough to be included. We can recall
that the f -value for the test of including an additional predictor in a model is equal to t2 .
From the information given here we compute our f -value to be f = 59.08421. This has a
p-value of 0.0001175359 indicating that we should include the quadratic term.
Part (b): No.
Part (c): I compute a confidence interval for E[Y ] given the value of x to be (43.52317, 48.39903).
Exercise 13.76
Part (a): This is the R2 which we can read from the MINITAB output to be 0.807.
Part (b): The f -value for this fit is 12.51 which has a p-value of 0.007 and the model
appears to be useful.
Part (c): The p-value for this variable is 0.043 which is less than 5% and thus we would
keep it at that level of significance.
Part (d): I compute a confidence interval for β1 given by (0.06089744, 0.22244256).
Part (e): I compute a prediction interval for E[Y ] given by (5.062563, 7.565437).
Exercise 13.77
Part (a): I compute this point estimate to be 231.75.
Part (b): I compute R2 = 0.9029993.
Part (c): For this we need to consider a model utility test. We find an f -value of 41.8914
which gives a p-value of 2.757324 10−5. This indicates that there is utility to this model.
Part (d): I compute a prediction interval to be (226.7854, 232.2146).
Exercise 13.78
This is the application of the F test for a group of predictors where we want to see if the
addition of the quadratic terms has sufficiently reduced the mean square error to justify
their inclusion. I compute an f-statistic of 1.056076 and a p-value for this of 0.4465628. This
indicates there is not sufficient evidence to reject the hypothesis that the coefficients of the
quadratic terms are all zero. We should therefore not include them in the model.
Exercise 13.79
Part (a): When we plot the various metrics we don’t see an isolated extremum for any
of them. Thus we have to pick the number of predictors to use where the metric curves
start to asymptote (i.e. the marginal improvement in adding additional predictors begins to
decrease). From these plots this looks to be when we have around 4 predictors.
Part (b): Following the same logic as above but with the new values of Rk2 , MSEk and Ck
corresponding to the expanded model I would select five predictors.
Exercise 13.80
Part (a): I get an f-statistic of 2.4 which has a p-value of 0.2059335. This indicates that there is a greater than 20% chance that the model extracted is simply noise and will have no out-of-sample predictive power.
Part (b): A high R2 is helpful when the number of predictors k is small relative to n. In this
case k and n are about the same value indicating that the in-sample R2 might overestimate
the out-of-sample R2 .
Part (c): We can repeat the analysis above for larger and larger values of R2 . We do that
in the R code for this problem. When we do that we find that for a R2 value of around 0.96
the p-value is less than 0.05.
Exercise 13.81
Part (a): We have k = 4 and for the value of n given we get an f -statistic of 106.1237.
This has a p-value that is zero to the accuracy of the numerical codes used to calculate it.
Thus this model is significant.
Part (b): I find a 90% confidence interval for β1 given by (0.01446084, 0.06753916).
Part (c): I compute a t-value for β4 given by 5.857143 which has a p-value of 4.883836 × 10^-8, indicating that this variable is important in the model.
Part (d): I compute yˆ = 99.514.
Exercise 13.82
Part (a): This is a model where we have combined all predictors from the previous two models. The R2 in that case must be at least the larger of the two R2 values. This is because this expanded model has more degrees of freedom and can fit the data "no worse" than each of the component models. Thus we can conclude that for the combined model R2 ≥ max(0.723, 0.689) = 0.723.
Part (b): In this case, as x1 and x4 are both predictors in the first model with R2 = 0.723, removing the predictors x5 and x8 can only cause our R2 to decrease; thus with this smaller model we would expect R2 ≤ 0.723.
Distribution-Free Procedures
Note on the Text
The Large Sample Approximation in the Wilcoxon Signed-Rank Test
Recall that Wi is the indicator random variable representing whether or not the ith sample contributes its rank to S+ . Now under H0 , in forming the statistic S+ , every sample will contribute its rank to S+ with probability 1/2 (and with probability 1/2 it will contribute the value 0 to S+ ). Thus
E[S+] = E[ Σ_{i=1}^n Wi ] = Σ_{i=1}^n E[Wi] = Σ_{i=1}^n (1/2)(i + 0) = (1/2) Σ_{i=1}^n i = (1/2) · n(n + 1)/2 = n(n + 1)/4 .
Now the variance of S+ can be computed using independence as
Var (S+ ) =
n
X
Var (Wi ) .
i=1
To use this we need to estimate Var (Wi )
Var (Wi ) =
Thus
E[Wi2 ]
i2
− E[Wi ] = −
2
2
n
1X 2 1
i =
Var (S+ ) =
4 i=1
4
2
i
i2
= .
2
4
n(n + 1)(2n + 1)
6
.
Each of these matches the results given in the text.
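These two moments are easy to check by simulation; a small R sketch (the choice n = 10 is arbitrary):

n <- 10
s.plus <- replicate(1e5, sum(ifelse(runif(n) < 0.5, 1:n, 0)))  # each rank kept w.p. 1/2
c(mean(s.plus), n * (n + 1) / 4)                 # both should be about 27.5
c(var(s.plus), n * (n + 1) * (2 * n + 1) / 24)   # both should be about 96.25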
Notes on Ties in Absolute Magnitude
Given the signed ranks
1, 2, −4, −4, +4, 6, 7, 8.5, 8.5, 10 ,
the duplicate number four is because we had three tied values with sequential ranks 3, 4, and 5. When we average these three values we get
(3 + 4 + 5)/3 = 4 ,
and would get signed ranks of −4, −4, and 4; thus τ1 = 3. The negative numbers (vs. the positive numbers) are due to the fact that those samples must have magnitudes below the median. Next we must have had two tied values with sequential ranks of 8 and 9, which have an average rank of
(8 + 9)/2 = 8.5 ,
and we would have τ2 = 2.
Problem Solutions
The Wilcoxon signed rank test is in the R routine wilcoxon signed rank test.R and can be used to help in solving some of these problems. The function will return the numerical value of s+ given a sample of data and the mean of the null hypothesis. One will still need to look up critical values of S+ from the tables in the appendix to determine if we should reject or accept the null hypothesis.
Some of the computational steps needed for the Wilcoxon rank-sum test are in the R routine wilcoxon rank sum test.R, which can be used to help in solving some of these problems.
All R scripts for this chapter (if they exist) are denoted as ex15 NN.R where NN is the section number.
Exercise 15.1
Our hypothesis test for this data is
H0 : µ = 100
Ha : µ ≠ 100 .
We compute a value of s+ = 27 (the back of the book has the value of s+ = 35 but I think that is a typo). We are told to consider α = 0.05 so α/2 = 0.025. When we look in Table A.13 where n = 12 for the row with an α value near 0.025 we find that when α = 0.026 we have c = 64, and we reject when s+ is greater than c or less than n(n + 1)/2 − c = 14. Since our value of s+ is between these two limits we cannot reject H0 in favor of Ha . This agrees with the result (using the t-distribution) found when working Exercise 32 (see Page 193).
Exercise 15.2
Our hypothesis test for this data is
H0 : µ = 25
Ha : µ > 25 .
We compute a value of s+ = 11. We want to perform a one-sided test with α = 0.05. When
we look in Table A.13 where n = 5 we find for α = 0.062 that c1 = 14. Since our value of
s+ is smaller than this we cannot reject H0 in favor of the conclusion Ha . This result agrees
with the conclusions of Example 8.9 in the book.
Exercise 15.3
Our hypothesis test for this data is
H0 : µ = 7.39
Ha : µ ≠ 7.39 .
We compute a value of s+ = 18. We want to consider a two-sided test with α = 0.05 so α/2 = 0.025. When we look in Table A.13 where n = 14 for the row with an α value near 0.025 we find that when α = 0.025 we have c = 84. We will reject H0 when s+ is greater than c or less than n(n + 1)/2 − c = 21. Since our sample value of s+ = 18 is smaller than this smaller value we can reject H0 in favor of Ha .
Exercise 15.4
Our hypothesis test for this data is
H0 : µ = 30
Ha : µ < 30 .
This is a one-sided test with α = 0.1. When we look in Table A.13 where n = 15 for the row with an α value near 0.1 we find that when α = 0.104 we have c1 = 83. Since our sample value of s+ = 39 is larger than the critical value of c2 = n(n + 1)/2 − c1 = 37 we cannot reject H0 in favor of Ha .
Exercise 15.5
This is a two sided test so if we take α = 0.05 we would have α/2 = 0.025. Then for n = 12 in Table A.13 we find α = 0.026 gives c1 = 64. Thus we will reject H0 if s+ is larger than c1 = 64 or smaller than c2 = n(n + 1)/2 − c1 = 14. Since we have s+ = 72 we can reject H0 in favor of Ha .
Exercise 15.6
This is a two sided test so if we take α = 0.05 we would have α/2 = 0.025. Then for n = 9 in Table A.13 we find α = 0.027 gives c1 = 39. Thus we will reject H0 if s+ is larger than c1 = 39 or smaller than c2 = n(n + 1)/2 − c1 = 6. Since in this case we have s+ = 45 we can reject H0 in favor of Ha .
Exercise 15.7
For this exercise we will use the large sample version of the Wilcoxon test (ignoring the
correction to the estimate of σS+ due to duplicate differences). For this data we compute
s+ = 443 and a sample z = 2.903522, which has a P-value given by 0.003689908. Thus we should reject H0 in favor of Ha in this case.
Exercise 15.8
For this hypothesis test we compare
H0 : µ = 75
Ha : µ > 75 .
Since we have n = 25 > 20 we will use the large sample test (ignoring the correction to σS+
due to duplicate Xi − 75 values). For the given data set I compute s+ = 226.5, which gives z = 1.722042 with a P-value of 0.04253092. Thus at the 5% level we can reject H0 in favor of Ha .
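A minimal R sketch of the large-sample computation used in these two exercises (ignoring the correction for ties; the arguments x and mu0 are placeholders):

signed.rank.z <- function(x, mu0) {
  d <- x - mu0
  s.plus <- sum(rank(abs(d))[d > 0])   # s_+: sum of ranks of positive differences
  n <- length(d)
  z <- (s.plus - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
  c(s.plus = s.plus, z = z, p.one.sided = 1 - pnorm(z))
}

For Exercise 15.8, for example, one would call signed.rank.z(x, 75) on the data vector.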
Exercise 15.9
See the python code ex15 9.py where we enumerate the various ranks that the Xi could
have. For each of these ranks we then evaluate the D statistic. Possible values for the D
statistic (and a count of the number of times D takes on this value when n = 4) are given
by
counts of number of different D values=
Counter({6: 4, 14: 4, 2: 3, 18: 3, 8: 2, 10: 2, 12: 2, 0: 1, 4: 1, 16: 1, 20: 1})
We can then obtain the probability that we get each D value and find
probability of different D values=
{0: 0.041666666666666664,
2: 0.125,
4: 0.041666666666666664,
6: 0.16666666666666666,
8: 0.08333333333333333,
10: 0.08333333333333333,
12: 0.08333333333333333,
14: 0.16666666666666666,
16: 0.041666666666666664,
18: 0.125,
20: 0.041666666666666664}
value:  163 179 213 225 229 245 247 250 286 299
sample:  y   y   y   y   x   x   y   x   x   x
rank:     1   2   3   4   5   6   7   8   9  10
Table 15: The ranks of each xi and yj sample when combined.
If we want to then pick a value c such that under the null hypothesis we would have P (D ≤
c) ∼ 0.1 we have to take c = 0 for if c = 2 then P (D ≤ 2) ≈ 0.16666667 > 0.1.
Exercise 15.10
Our hypothesis test for this data is
H0 : µ 1 = µ 2 ⇒ H0 : µ 1 − µ 2 = 0
Ha : µ 1 > µ 2 ⇒ Ha : µ 1 − µ 2 > 0 .
We have five values for the adhesive strength for sample 1 (call these the xi s). We also have
five values for the adhesive strength for sample 2 (which we call the yi s). We don’t need to
use the large sample test (since the numbers are so small). For α = 0.05 we use Appendix
Table A.14 with m = 5 and n = 5 to find P (W ≥ 36 when H0 is true) = 0.048 ≈ 0.05. For
this data we have Table 15. Thus w = 5 + 6 + 8 + 9 + 10 = 38. Since w is greater than the critical value of 36 we can reject H0 in favor of Ha .
Exercise 15.11
We label the “Pine” data as the xi s (since there are fewer samples) and the “Oak” data as
the yi s. Then our hypothesis test for this data is
H0 : µ1 = µ2 ⇒ µ1 − µ2 = 0
Ha : µ1 ≠ µ2 ⇒ µ1 − µ2 ≠ 0 .
Thus for α = 0.05 we use Appendix Table A.14 with m = 6 and n = 8 to find P (W ≥ 61 when H0 is true) = 0.021 ≈ 0.025, thus c = 61 and we reject H0 if the rank-sum w is such that
w ≥ 61 or w ≤ m(m + n + 1) − c = 29 .
For the data given here using wilcoxon rank sum test.R we find that w = 37 and thus we
do not reject H0 in favor of Ha .
Exercise 15.12
We label the data from the “Original Process” as the xi s and the data from the “Modified
Process” as the yi s. Then our hypothesis test for this data is
H0 : µ 1 − µ 2 = 1
Ha : µ 1 − µ 2 > 1 .
For α = 0.05 we use Appendix Table A.14 with m = 8 and n = 8 to find
P (W ≥ 84 when H0 is true) = 0.052 ≈ 0.05 ,
thus c1 = 84 and we reject H0 if the rank-sum w is larger than this number. For the data
given here we find that w = 65 (remembering to subtract the one) and thus we do not reject
H0 in favor of Ha .
Exercise 15.13
We label the data with the “Orange Juice” as the xi s and the data from the “Ascorbic Acid”
as the yi s. Then our hypothesis test for this data is
H0 : µ1 − µ2 = 0
Ha : µ1 − µ2 ≠ 0 .
Since m = n = 10 we can use the large sample normal approximation to compute z =
2.267787. For this value of z we compute a P-value given by 0.0233422. As this is larger
than 0.01 we cannot reject H0 in favor of Ha .
Exercise 15.14
Again we label the data with the “Orange Juice” as the xi s and the data from the “Ascorbic
Acid” as the yi s. Then our hypothesis test for this data is again
H0 : µ1 − µ2 = 0
Ha : µ1 − µ2 ≠ 0 .
Again m = n = 10 we can use the large sample normal approximation to compute z =
2.532362. For this value of z we compute a P-value given by 0.0113297. As this is larger
than 0.01 we still cannot reject H0 in favor of Ha .
Exercise 15.15
Again we label the “Unexposed” data as the xi s and the “Exposed” data as the yi s. Then
our hypothesis test for this data is
H0 : µ1 − µ2 = −25
Ha : µ1 − µ2 < −25 .
For α = 0.05 we use Appendix Table A.14 with m = 7 and n = 8 to find
P (W ≥ 71 when H0 is true) = 0.047 ≈ 0.05 ,
thus c1 = 71 and we reject H0 if the rank-sum w is smaller than
m(m + n + 1) − c1 = 41 .
For the data given here we find that w = 39. Thus we can reject H0 in favor of Ha in this
case.
Exercise 15.16
Again we label the “good” data as the xi s and the “poor” data as the yi s. Then our
hypothesis test for this data is again
H0 : µ̃1 − µ̃2 = 0
Ha : µ̃1 − µ̃2 < 0 .
Part (a): Here m = n = 8 and using wilcoxon rank sum test we get a w = 41.
Part (b): For α = 0.01 we use Appendix Table A.14 to find
P (W ≥ 90 when H0 is true) = 0.01 ,
thus c1 = 90 and we reject H0 if the rank-sum w is smaller than
m(m + n + 1) − c1 = 46 .
For the data given here since w = 41 we can reject H0 in favor of Ha in this case.
Exercise 15.17
When we have n = 8, using Appendix Table A.15 we find c = 32. With this value and using
the R command wilcoxon signed rank interval.R we get a 95% confidence interval for µ
of (11.15, 23.8).
Exercise 15.18
When we have n = 14, using Appendix Table A.15 we find c = 93. With this value and
using the R command wilcoxon signed rank interval.R we get a 99% confidence interval
for µ of (7.095, 7.43).
Exercise 15.19
When we have n = 8, using Appendix Table A.15 we find c = 32. With this value and using
the R command wilcoxon signed rank interval.R we get a 95% confidence interval for
µ of (−0.585, 0.025). Note this is significantly different from the answer in the back of the book. If anyone sees anything wrong with what I have done, please contact me.
Exercise 15.21
For this problem we have m = n = 5, using Appendix Table A.16 for α = 0.1 we find c = 21.
With this value and using the R command wilcoxon rank sum interval.R we get a 90%
confidence interval for µ1 − µ2 of (16, 87).
Exercise 15.22
For this problem we have m = 6 and n = 8, using Appendix Table A.16 for α = 0.01 we find
c = 44. With this value and using the R command wilcoxon rank sum interval.R we get
a 99% confidence interval for µ1 − µ2 of (−0.79, 0.73).
Exercise 15.23
For this problem I used the R code kruskal wallis test.R to compute the Kruskal Wallis
K statistic and the α = 0.1 critical value. For the given data we find k = 14.06286 and
kcrit = 6.251389. Since k > kcrit we reject H0 in favor of Ha .
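Note that base R also provides kruskal.test, which can serve as a check on the script above; a sketch assuming the data are in long format with columns named value and group (names illustrative):

kt <- kruskal.test(value ~ group, data = DF)
kt$statistic                      # the Kruskal Wallis K statistic
qchisq(0.90, df = kt$parameter)   # the alpha = 0.1 critical value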
Exercise 15.24
For this problem I used the R code kruskal wallis test.R to compute the Kruskal Wallis
K statistic and the α = 0.05 critical value. For the given data we find k = 7.586587 and
kcrit = 7.814728. Since k < kcrit we cannot reject H0 in favor of Ha and conclude that diet
makes no difference on nitrogen production.
Exercise 15.25
For this problem I used the R code kruskal wallis test.R to compute the Kruskal Wallis
K statistic and the α = 0.05 critical value. For the given data we find k = 9.734049 and
kcrit = 5.991465. Since k > kcrit we can reject H0 in favor of Ha .
Exercise 15.26
For this problem I used the R code friedmans test.R to compute the Friedman’s Fr statistic
and the α = 0.01 critical value. For the given data we find fr = 28.92 and fcrit = 11.34487.
Since fr > fcrit we can reject H0 in favor of Ha .
Exercise 15.27
For this problem I used the R code friedmans test.R to compute the Friedman’s Fr statistic
and the α = 0.05 critical value. For the given data we find fr = 2.6 and fcrit = 5.991465.
Since fr < fcrit we cannot reject H0 in favor of Ha .
Exercise 15.28
We label the “Potato” data as the xi s and the “Rice” data as the yi s. Then our hypothesis
test for this data is
H0 : µ1 − µ2 = 0
Ha : µ1 − µ2 ≠ 0 .
For α = 0.05 we use Appendix Table A.14 with m = 8 and n = 8 to find
P (W ≥ 87 when H0 is true) = 0.025 ,
thus c = 87 and we reject H0 if the rank-sum w is larger than this number or smaller than
m(m + n + 1) − c = 49. For the data given here we find that w = 73 and thus we do not
reject H0 in favor of Ha .
Exercise 15.29
For this problem I used the R code kruskal wallis test.R (this test ignores the salesperson information) to compute the Kruskal Wallis K statistic and the α = 0.05 critical value. For the given data we find k = 11.95095 and kcrit = 7.814728. Since k > kcrit we can reject H0 in favor of Ha and conclude that the average cancellation rate does depend on year.
The book seems to have used Friedman's test on this problem, where the blocks are the salespeople and the treatments are the years. If we do that we find fr = 9.666667 and fcrit = 7.814728. Since fr > fcrit we again conclude that we reject H0 in favor of Ha .
Exercise 15.30
For this problem I used the R code kruskal wallis test.R to compute the Kruskal Wallis K statistic and the α = 0.05 critical value. For the given data we find k = 17.85714 and kcrit = 7.814728. Since k > kcrit we can reject H0 in favor of Ha and conclude that the true mean phosphorus concentration does depend on treatment type.
Exercise 15.31
For this problem we have m = n = 5, using Appendix Table A.16 for α = 0.05 we find
c = 22. With this value and using the R command wilcoxon rank sum interval.R we get
a 95% confidence interval for µII − µIII of (−5.9, −3.8).
Exercise 15.32
We label the “Diagonal” data as the xi s and the “Lateral” data as the yi s. Then our
hypothesis test for this data is
H0 : µ1 − µ2 = 0
Ha : µ1 − µ2 ≠ 0 .
Part (a): For α = 0.05 we use Appendix Table A.14 with m = 6 and n = 7 to find
P (W ≥ 56 when H0 is true) = 0.026 ≈ 0.025 ,
thus c = 56 and we reject H0 if the rank-sum w is larger than this number or smaller than
m(m + n + 1) − c = 28. For the data given here we find that w = 43 and thus we do not
reject H0 in favor of Ha .
Part (b): To get a 95% confidence interval for the difference in means we need to compute the ordered dij values and then our confidence interval is
(dij(mn−c+1) , dij(c) ) = (dij(8) , dij(35) ) = (−0.29, 0.41) ,
for c = 35 (using Appendix Table A.16) and with mn − c + 1 = 8.
Exercise 15.33
Part (a): Here Y counts the number of successes in n trials with a probability of success
p = 0.5 thus Y is a binomial random variable. Thus we can compute α as
α=
20
X
y=15
dbinom(y, 20, 0.5) = 1 − pbinom(14, 20, 0.5) = 0.02069473 .
Part (b): We want to find a value of c such that
1 − P{Y ≤ c} ≈ 0.05 .
We find that c = 13 gives P{Y ≤ 13} = 0.9423409 and c = 14 gives P{Y ≤ 14} = 0.9793053. If we use c = 14 then since y = 12 we cannot reject H0 in favor of Ha .
Exercise 15.34
Part (a): When H0 is true this means that µ̃ = 25, so Y is a binomial random variable with n = 20 and p = 0.5. Using this we can compute α using
α = Σ_{y=0}^{5} dbinom(y, 20, 0.5) + Σ_{y=15}^{20} dbinom(y, 20, 0.5) = 0.04138947 .
Part (b): For this part we pick a value of µ̃0 and see if H0 will be rejected, i.e. we count the number of times our sample xi is larger than µ̃0; call this value Y. We reject if Y ≥ 15 or Y ≤ 5. Then we let µ̃0 range over all possible values, and the confidence interval is the minimum and maximum of these values of µ̃0 such that H0 is not rejected. Note that we only need to test values for µ̃0 that are the same as our sample xi (since these are the only ones where the value of Y will change). The values of µ̃0 where we did not reject H0 are given by
> did_not_reject
[1] 14.4 16.4 24.6 26.0 26.5 32.1 37.4 40.1 40.5
Thus the 100(1 − 0.0414)% ≈ 95.9% confidence interval for our value of µ̃ is (14.4, 40.5).
Exercise 15.35
When I sort and label our combined data in the manner suggested I get the following
value:  3.7 4.0 4.1 4.3 4.4 4.8 4.9 5.1 5.6
sample:  y   x   y   y   x   x   x   y   y
rank:     1   3   5   7   9   8   6   4   2
Thus w′ = 3 + 8 + 9 + 6 = 26. Using Appendix Table A.14 to find c such that P (W ≥ c) = 0.05 with m = 4 and n = 5 gives c = 27. Since w′ < c we don't reject H0 .
Exercise 15.36
When we have m = 3 and n = 4 we have seven total observations and so there are C(7, 3) = 35 possible rank locations for the x samples. In the python code ex15 36.py we explicitly
enumerate all of the possible values for the W ′′ variable when H0 is true. When we run this
code we get the null distribution for W ′′ given by
probability of different W’’ values=
{4: 0.05714285714285714,
5: 0.11428571428571428,
6: 0.2571428571428571,
7: 0.22857142857142856,
8: 0.2,
9: 0.11428571428571428,
10: 0.02857142857142857}
From the above probabilities we see that if c = 10 we have P{w′′ ≥ c} = 0.02857 and if c = 9 then P{w′′ ≥ c} = 0.14286. Neither of these probabilities is very close to the value of 0.1.
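The same enumeration can also be done in R with combn; here each position is scored by its rank counted from the nearer end of the ordered sample, a scoring I inferred because it reproduces the probabilities listed above exactly:

m <- 3; n <- 4; N <- m + n
score <- pmin(1:N, N:1)          # scores 1, 2, 3, 4, 3, 2, 1
idx <- combn(N, m)               # all choose(7, 3) = 35 placements of the x's
wpp <- apply(idx, 2, function(j) sum(score[j]))
table(wpp) / ncol(idx)           # the null distribution of W''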