Chapter 1 - UP Diliman School of Statistics

Statistics 115
BASIC STATISTICAL
METHODS
Course Objectives
The major objective of this course is simply to introduce the basic principles of Inferential Statistics. In order
to enlighten the student on these basic principles, selected elementary tools in Inferential Statistics will be
presented. And by the end of the semester, the student of this course must be able to achieve the following:
Commit to memory the basic terms, concepts and notations used in Applied Statistics, especially those
that are useful in Inferential Statistics;
Know how to solve simple probability problems using the a priori and a posteriori approaches to assign
probabilities;
Know how to use the normal, t, χ², and F tables to compute for probabilities;
Understand the concept of sampling distributions and know the sampling distributions of common statistics
computed from a random sample from a normal distribution;
Comprehend the basic concepts used in estimating population parameters and testing statistical
hypotheses;
Compute for the interval estimates of usual parameters that describe a single population or are used to
compare two or more populations;
Conduct tests of hypotheses on these parameters and interpret the results of these tests;
Be acquainted with simple tools used to study the relationship between two variables;
Perform the chi-squared test for independence;
Estimate the regression coefficients of the simple linear regression model using the method of least
squares;
Be aware of the basic principles of experimental design;
Know the description and principal advantages and disadvantages of some basic experimental designs;
Know the classical smoothing methods in time series analysis; and,
Interpret particular computer outputs of Microsoft Excel.
Textbook
Elementary Statistics
By
Almeda, Capistrano, and Sarte
Assignment
1. Fill out the attendance card. Follow the format below:
Second Semester, AY 2013-2014
Stat 115 (Section)
Name: LAST NAME, FIRST NAME
Student No: _________________
Grades:
Math 17 ________
Stat 114 ________
1.5x1.5 to
2x2 Photo
Nickname (in ALL CAPS)
2. Read course syllabus posted at http://www.stat.upd.edu.ph/fcapistrano.htm.
Site also contains presentation materials and sample exams.
3. Make sure you belong in a group with 3 to 4 members.
Chapter 1
PRELIMINARIES
Population vs Sample
• Population data = {X1, X2, …, XN}
Parameter:
a summary measure describing a particular characteristic of the
population that is computed using population data
Examples:
population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$
population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2$
• Sample data = {X1, X2, …, Xn}
Statistic:
a summary measure describing a particular characteristic of the sample
that is computed using sample data
Examples:
sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
sample variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
Notes About the Parameter and Statistic
 Both the parameter and statistic are summary measures
that are computed using data.
 If you have population data then the computed summary
measure is a parameter. If you only have sample data then
the computed summary measure is a statistic.
 In a statistical inquiry, the answer to the research problem
is based on the value of the parameter that describes the
characteristic of interest of the population under study.
However, the value of this parameter can only be
computed using population data. If you only have sample
data, you cannot compute for the value of the parameter.
Descriptive Statistics vs Inferential Statistics
Descriptive Statistics
comprise those methods concerned with collecting, describing, and
analyzing a set of data without drawing conclusions or inferences
about a larger group
Inferential Statistics
comprise those methods concerned with the analysis of sample data leading
to predictions or inferences about the population
Notes About Inferential Statistics
 Although we cannot compute for the value of the parameter
using sample data, we can use the methods on Inferential
Statistics to infer on the value of this parameter.
 In Inferential Statistics, we compute for the value of the
statistic using sample data not for the purpose of describing
the sample but so that we can infer on the value of the
parameter of interest.
 It should be clear that we base our inferences on partial
information about the population. Thus, whatever inferences
we make will always be subject to some error. A background
on probability theory and distribution theory will help us
understand the errors that we commit in Inferential Statistics.
Random Experiment
(page 284)
Definition 10.4
A random experiment is a process that can be repeated
under similar conditions whose outcome cannot be
predicted with certainty beforehand.
Examples: tossing of a coin, tossing a die, drawing cards
from a standard deck of cards, selecting a sample from
the population using probability sampling methods
Sample Space
(pages 285-286)
Definition 10.5
The sample space, denoted by Ω (Greek letter omega),
is the collection of all possible outcomes of a random
experiment. An element of the sample space is called a
sample point.
Examples: Examples 10.1 and 10.2
Example
(pages 86-87)
Recall:
Simple random sampling is a probability sampling
method wherein all possible subsets consisting of n
elements selected from the N elements of the
population have the same chances of selection.
In simple random sampling without replacement
(SRSWOR), all the n elements in the sample must be
distinct from each other.
In simple random sampling with replacement
(SRSWR), the n elements in the sample need not be
distinct, that is, an element can be selected more than
once to be a part of the sample.
Example cont’d
Example 3.8:
a)
Suppose the population consists of N=5 children: a=Janine, b=Josiel, c=Jan,
d=Eryl, and e=Eariel.
Suppose a sample of size n=2 will be selected using SRSWOR. Specify the sample space.
We will denote a sample of size 2 by the set {x1, x2}, where x1 and x2 are the two distinct
elements included in the sample.
Ω = {{a,b}, {a,c}, {a,d}, {a,e}, {b,c}, {b,d}, {b,e}, {c,d}, {c,e}, {d,e}}
By definition of SRSWOR, all the 10 sample points (samples) will be given equal chances of
selection.
In general, a sample of size n selected using SRSWOR will be denoted by a set containing n
distinct elements, {x1,x2,…,xn}, where the xi's are the elements selected in the sample.
When the sample of size n is selected from a population of size N using SRSWOR, then the
sample space will contain $\binom{N}{n} = \frac{N(N-1)(N-2)\cdots(N-n+1)}{n(n-1)(n-2)\cdots(2)(1)}$ sets containing n
elements, and by definition, all of them will be given equal chances of selection.
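The roster of SRSWOR samples can also be generated by computer. The sketch below is only an illustration in Python (not part of the course materials, which use Microsoft Excel); the variable names `population` and `srswor_space` are made up for this example.

```python
from itertools import combinations
from math import comb

population = ["a", "b", "c", "d", "e"]  # N = 5 children in Example 3.8
n = 2                                   # sample size

# Under SRSWOR a sample is an unordered set of n distinct elements,
# so the sample space is the collection of all n-element subsets.
srswor_space = [set(sample) for sample in combinations(population, n)]

print(len(srswor_space))         # number of sample points listed
print(comb(len(population), n))  # N(N-1)...(N-n+1) / (n(n-1)...(2)(1))
```

Both printed counts should agree with the 10 sample points listed in the roster.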
Example cont’d
Example 3.8:
b)
Suppose the population consists of N=5 children: a=Janine, b=Josiel, c=Jan,
d=Eryl, and e=Eariel.
Suppose a sample of size n=2 will be selected using SRSWR. Specify the sample space.
We will denote a sample of size 2 by an ordered pair, (x1, x2), where x1 is the element selected
on the first draw while x2 is the element selected on the second draw.
Ω = {(a,a), (a,b), (a,c), (a,d), (a,e), (b,a), (b,b), (b,c), (b,d), (b,e), (c,a), (c,b), (c,c), (c,d), (c,e),
(d,a), (d,b), (d,c), (d,d), (d,e), (e,a), (e,b), (e,c), (e,d), (e,e)}
By definition of SRSWR, all the 25 sample points (samples) will be given equal chances of
selection.
In general, a sample of size n selected using SRSWR will be denoted by an ordered n-tuple
(x1,x2,…,xn) where xi is the element selected on the ith draw.
When the sample of size n is selected from a population of size N using SRSWR, then the
sample space will contain $N^n$ ordered n-tuples and, by definition, all of them will be given equal
chances of selection.
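A short Python sketch of the SRSWR sample space, offered only as an illustration; the names `population` and `srswr_space` are assumptions made for this example. Because order matters and repeats are allowed, the sample space is the set of all ordered pairs.

```python
from itertools import product

population = ["a", "b", "c", "d", "e"]  # N = 5 children in Example 3.8
n = 2                                   # sample size

# Under SRSWR a sample is an ordered n-tuple; an element may repeat.
srswr_space = list(product(population, repeat=n))

print(len(srswr_space))      # number of ordered pairs listed
print(len(population) ** n)  # N raised to the power n
```

Note that (a,b) and (b,a) are counted as different sample points here, unlike in SRSWOR.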
Assignment 1
Suppose the population consists of N=6 elements: a, b, c, d, e, and f.
Suppose a sample of size n=3 will be selected. Specify the sample
space by roster method for the following sampling schemes:
1. SRSWOR
2. SRSWR
3. Systematic sampling (Recall: Since n is a divisor of N then k=N/n.
Select the starting point at random, from 1 to k then take every
kth element thereafter.)
Note: Under systematic sampling where n is a divisor of N, the
sample space will contain k=N/n equally likely sample points.
Event
(pages 287-289)
Definition 10.6
An event is a subset of the sample space whose probability is
defined. We say that an event occurred if the outcome of the
random experiment is one of the sample points belonging in
the event; otherwise, the event did not occur.
• We will denote events by capital Latin letters.
• There are two special events:
  Sure event: Ω
  Impossible event: ∅
Examples
 Example 10.3 (page 287)
 The population consists of N=5 children: a=Janine, b=Josiel, c=Jan, d=Eryl, and
e=Eariel. Suppose a sample of size n=2 will be selected using SRSWR.
Ω = {(a,a), (a,b), (a,c), (a,d), (a,e), (b,a), (b,b), (b,c), (b,d), (b,e), (c,a), (c,b), (c,c),
(c,d), (c,e), (d,a), (d,b), (d,c), (d,d), (d,e), (e,a), (e,b), (e,c), (e,d), (e,e)}
A = event that Janine is included in the sample
= {(a,a), (a,b), (a,c), (a,d), (a,e), (b,a), (c,a), (d,a), (e,a)}
B = event that Janine and Jan are both included in the sample
= {(a,c), (c,a)}
Suppose the sample selected was (a,d). Did event A occur? Did event B
occur?
Probability of an Event
(pages 292-293)
 Definition 10.9. The probability of an event A, denoted by P(A), is
a function that assigns a measure of chance that event A will occur
and must satisfy the following properties:
i. 0 ≤ P(A) ≤ 1
ii. P(Ω) = 1 and P(∅) = 0
iii. Finite Additivity. If event A can be expressed as the union of n non-overlapping
events, A1, A2, …, An, then P(A)=P(A1)+P(A2)+…+P(An)
 Interpretation: A probability measure that is close to 1 means that
the event has a very large chance of occurrence. On the other
hand, if the probability measure is close to 0, then the event has a
very small chance of occurrence. A probability of 0.5, the midpoint
of the interval [0,1] means that the chance that the event will occur
is the same as the chance that the event will not occur.
A Priori Probability or Classical Definition of Probability
(page 297)
Definition 10.10
If a random experiment can result in any one of N
different equally likely outcomes and if exactly n of
these outcomes belong in event A, then
$P(A) = \frac{\text{no. of elements in } A}{\text{no. of elements in } \Omega} = \frac{n}{N}$
Examples
 Example 10.9 (page 298)
 The population consists of N=5 children: a=Janine, b=Josiel, c=Jan, d=Eryl, and
e=Eariel. Suppose a sample of size n=2 will be selected using SRSWR.
Ω = {(a,a), (a,b), (a,c), (a,d), (a,e), (b,a), (b,b), (b,c), (b,d), (b,e), (c,a), (c,b), (c,c),
(c,d), (c,e), (d,a), (d,b), (d,c), (d,d), (d,e), (e,a), (e,b), (e,c), (e,d), (e,e)}
By definition of SRSWR, the sample space contains equally likely outcomes.
A = event that Janine is included in the sample
= {(a,a), (a,b), (a,c), (a,d), (a,e), (b,a), (c,a), (d,a), (e,a)}
P(A) = 9/25
B = event that Janine and Jan are both included in the sample
= {(a,c), (c,a)}
P(B) = 2/25
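The counting in Example 10.9 can be checked with a few lines of Python. This is a minimal sketch, assuming equally likely sample points as stated above; the helper name `classical_probability` is hypothetical and not from the textbook.

```python
from fractions import Fraction
from itertools import product

population = ["a", "b", "c", "d", "e"]       # a=Janine, c=Jan
omega = list(product(population, repeat=2))  # 25 equally likely SRSWR samples

def classical_probability(event, sample_space):
    """A priori probability: (no. of elements in event) / (no. of elements in sample space)."""
    return Fraction(len(event), len(sample_space))

# A: Janine is included in the sample.
A = [s for s in omega if "a" in s]
# B: Janine and Jan are both included in the sample.
B = [s for s in omega if "a" in s and "c" in s]

p_A = classical_probability(A, omega)
p_B = classical_probability(B, omega)
print(p_A, p_B)
```

Using `Fraction` keeps the answers as exact ratios, which makes them easy to compare with the hand computation.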
Probabilities and Proportions
(page 298)
 The classical definition of probability allows us to view
proportions in terms of probabilities and vice versa. Consider
the experiment wherein we randomly select one element from a
population under study and observe if the selected element
possesses the characteristic of interest. We can define the
sample space as the collection of all elements in the population.
Let A=event that the selected element possesses the
characteristic of interest. Then by the classical definition of
probability:
$P(A) = \frac{\text{no. of elements in } A}{\text{no. of elements in } \Omega}$ = proportion of elements in the population that possess the characteristic
Example
(page 298)
Example 10.10. According to a study conducted by the Food and
Nutrition Research Institute (FNRI) in 1998, 30% of
children in Central Luzon aged 6 months to 5 years
old are afflicted with iron deficiency anemia while
50% have low to deficient Vitamin A levels.
Define
Ω = set of children in Central Luzon aged 6 mos. to 5 years
A = event of selecting a child who has iron deficiency anemia
B = event of selecting a child with low to deficient Vit. A levels
Then based on the given data, we can write
P(A) = 0.3 and P(B) = 0.5
A Posteriori Probability
(page 299)
Definition 10.11
If a random experiment is repeated many times under uniform conditions, use
the empirical probability of event A to assign its probability as follows:
$\text{empirical } P(A) = \frac{\text{no. of times event A occurred}}{\text{no. of times experiment was repeated}}$
The a posteriori definition of the probability of event A is the limiting value of
its empirical probability if we repeat the process endlessly.
Note:
The observed relative frequency of occurrence of event A is just the
empirical probability of event A. This empirical probability will be a
good approximate of the actual probability if we perform the process a
large number of times under uniform conditions.
Using Excel to Simulate the Experiment
Step 1: In Column A, list all possible numeric outcomes in a single stage of the
random experiment. In Column B, write the corresponding
probabilities.
Step 2: Select Data then click Data Analysis.
Step 3: Choose Random Number Generation.
Step 4: Fill up the dialogue box.
  Number of variables: no. of stages of the random experiment or
  no. of times an outcome is selected in a single trial
  Number of random nos: total number of trials
  Distribution: Discrete
  Value and Probability: Cells containing outcomes & probabilities
  Random Seed: any positive integer less than 2^15
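For readers without Excel, the same kind of simulation can be sketched in Python. This is only an illustration under stated assumptions: the function name `simulate_two_tosses`, the seed 12345, and the event "sum of dots equals 7" are all chosen for this example, and Python's random number generator is not the one Excel uses, so the same seed will not reproduce Excel's output.

```python
import random

def simulate_two_tosses(num_trials, seed):
    """Simulate tossing a fair die twice, num_trials times."""
    rng = random.Random(seed)      # seeded so the run can be repeated
    outcomes = [1, 2, 3, 4, 5, 6]  # like Column A: possible outcomes
    probabilities = [1 / 6] * 6    # like Column B: their probabilities
    trials = []
    for _ in range(num_trials):
        # Two stages per trial, like "Number of variables" = 2.
        first, second = rng.choices(outcomes, weights=probabilities, k=2)
        trials.append((first, second))
    return trials

trials = simulate_two_tosses(num_trials=10_000, seed=12345)

# Empirical probability = no. of times event occurred / no. of trials.
count_seven = sum(1 for first, second in trials if first + second == 7)
empirical_p = count_seven / len(trials)
print(empirical_p)  # compare with the a priori value 6/36
```

With many trials the empirical probability should settle near the a priori value, which is the idea behind the a posteriori definition.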
Exercise for Section 10.3
(page 302)
1. Consider the experiment of tossing a fair die twice. Specify
the following events using the roster method and compute
for their a priori probabilities:
a. A=event where the sum of the number of dots is less than 5
b. B=event of observing 2 dots on the first toss
c. C=event of observing the same number of dots on both tosses
2. Perform the experiment of tossing a fair die twice 1,000
times and under uniform conditions. Approximate the
probabilities of the events described in Problem no. 1 by
computing for the empirical probabilities.
Assignment 2
 Answer Exercise 1b and 1c.
 Use Microsoft Excel and the seed number assigned to
your group to answer Exercise 2.
Random Variable
(page 327)
Definition 10.18
A function whose value is a real number that is determined by each
sample point in the sample space is called a random variable. An
uppercase letter, say X, will be used to denote a random variable and its
corresponding lowercase letter, x in this case, will be used to denote one
of its values.
Note:
The use of the term variable is consistent with the way we use this
word in mathematics and the way we defined it in Chapter 1 as the
characteristic of interest whose value varies. The addition of the
term “random” emphasizes the requirement that the realized or
actual value of the random variable depends on the outcome of a
random experiment. Consequently, it is impossible to predict with
certainty what the realized value of the random variable X will be.
Example 10.34
(page 328)
Filipinos are so fascinated with elections and the polls conducted to predict the outcomes of these elections. For
illustration purposes, let us imagine a very small barangay consisting of 6 qualified voters. Let’s label these voters as
A1, A2, A3, A4, A5, and A6. There are two candidates vying for the position, say Renzo and Sandro. What we do not
know is that voters A1, A2, A3 and A4 have already decided to elect Renzo while voters A5 and A6 will elect
Sandro. We only have enough resources to get a sample of size 3. We will then use the information from this
sample to predict the outcome of the election.
Suppose we use SRSWOR to select our sample of size 3. Our sample space will contain all the 20 possible subsets
of size 3. The sample points in our sample space are:
{A1,A2,A3}
{A1,A2,A4}
{A1,A2,A5}
{A1,A2,A6}
{A1,A3,A4}
{A1,A3,A5}
{A1,A3,A6}
{A1,A4,A5}
{A1,A4,A6}
{A1,A5,A6}
{A2,A3,A4}
{A2,A3,A5}
{A2,A3,A6}
{A2,A4,A5}
{A2,A4,A6}
{A2,A5,A6}
{A3,A4,A5}
{A3,A4,A6}
{A3,A5,A6}
{A4,A5,A6}
Define X=number of voters who will elect Renzo.
X is a random variable. Its realized value depends on the outcome of the random experiment.
Sample point    X
{A1,A2,A3}      3
{A1,A2,A4}      3
{A1,A2,A5}      2
{A1,A2,A6}      2
{A1,A3,A4}      3
{A1,A3,A5}      2
{A1,A3,A6}      2
{A1,A4,A5}      2
{A1,A4,A6}      2
{A1,A5,A6}      1
{A2,A3,A4}      3
{A2,A3,A5}      2
{A2,A3,A6}      2
{A2,A4,A5}      2
{A2,A4,A6}      2
{A2,A5,A6}      1
{A3,A4,A5}      2
{A3,A4,A6}      2
{A3,A5,A6}      1
{A4,A5,A6}      1
Using the Random Variable to Express the Event of Interest
(page 329)
 We will use the notation, X ≤ x, to express the event containing
all sample points whose associated value for the random variable
X is less than or equal to x, where x is a specified real number.
 We will use the notation, X > x, to express the event containing
all sample points whose associated value for X is greater than x.
 We will use the notation, a<X<b, to express the event containing
all sample points whose associated value for X is in between a
and b, where a and b are specified real numbers.
 And so on.
Example 10.35
(page 329)
Define X=number of voters who will elect Renzo
A = event of selecting a sample with 1 voter electing Renzo
= {{A1,A5,A6}, {A2,A5,A6},{A3,A5,A6}, {A4,A5,A6}}
Event A can be expressed as X=1.
B = event of selecting a sample with more than 2 voters electing Renzo
= {{A1,A2,A3}, {A1,A2,A4}, {A1,A3,A4}, {A2,A3,A4}}
Event B can be expressed as X>2.
C = event of selecting a sample with at least 1 voter electing Renzo
= Ω = sure event
Event C can be expressed as X ≥ 1
D = event of selecting a sample with 5 voters electing Renzo
= ∅ = impossible event.
Event D can be expressed as X=5.
Discrete Random Variable and Its PMF
(pages 330 & 332)
Definition 10.20
If a sample space contains a finite number of sample points or has as many
sample points as there are counting/natural numbers then it is called a
discrete sample space.
Definition 10.21
A random variable defined over a discrete sample space is called a
discrete random variable.
Definition 10.22
The probability mass function (PMF) of a discrete random variable,
denoted by f(.), is a function defined for any real number x as:
f(x) = P(X = x).
The values of the discrete random variable X for which f(x)>0 are called
its mass points.
Example 10.38
(pages 332-333)
Define X=number of voters who will elect Renzo
X is a discrete random variable. The range of X={1,2,3}. The elements in the range of X are the mass points of
the discrete random variable X.
To derive the PMF of X, we need to compute P(X=x) for all x that are mass points of X. Since the sample
space contains equally likely outcomes then we can use the classical definition to compute for these
probabilities, that is, P(A) = (no. of sample points in A) / (no. of sample points in Ω).
x | Event Associated with X=x | P(X=x)
1 | {{A1,A5,A6}, {A2,A5,A6}, {A3,A5,A6}, {A4,A5,A6}} | 4/20 = 1/5
2 | {{A1,A2,A5}, {A1,A2,A6}, {A1,A3,A5}, {A1,A3,A6}, {A1,A4,A5}, {A1,A4,A6}, {A2,A3,A5}, {A2,A3,A6}, {A2,A4,A5}, {A2,A4,A6}, {A3,A4,A5}, {A3,A4,A6}} | 12/20 = 3/5
3 | {{A1,A2,A3}, {A1,A2,A4}, {A1,A3,A4}, {A2,A3,A4}} | 4/20 = 1/5
The PMF of X can be presented in tabular form as follows:
x      1      2      3
f(x)   1/5    3/5    1/5
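The PMF derivation above can be reproduced by enumerating the sample space. The following Python sketch is illustrative only; the names `voters`, `renzo_voters`, and `x_value` are assumptions introduced for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

voters = ["A1", "A2", "A3", "A4", "A5", "A6"]
renzo_voters = {"A1", "A2", "A3", "A4"}  # A5 and A6 will elect Sandro

# SRSWOR with n = 3: all 3-element subsets are equally likely.
samples = list(combinations(voters, 3))

def x_value(sample):
    """X = number of voters in the sample who will elect Renzo."""
    return sum(1 for voter in sample if voter in renzo_voters)

# f(x) = P(X = x) = (no. of samples with X = x) / (no. of samples).
counts = Counter(x_value(sample) for sample in samples)
pmf = {x: Fraction(count, len(samples)) for x, count in sorted(counts.items())}

for x, fx in pmf.items():
    print(x, fx)
```

The keys of `pmf` are the mass points of X, and the values add up to 1.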
Continuous Random Variable and Its PDF
(pages 335-336)

For a continuous random variable X, P(X=x) will always be 0 for any real number x. This
property may sound strange, but it is satisfied by variables that are measured
on some standard scale of real numbers or nonnegative real numbers, such as:
inches, centimeters, degrees Fahrenheit, degrees Celsius, pints, liters, grams, ounces. Any
particular measure taken on such scales could be recorded to as many decimal places as one
might care to take it. So if you take any interval containing the point x on such scales, even if this
interval is very, very short, this interval will always contain infinitely many other points on the
scale and the point x is just one of them.
Definition 10.23
The probability density function (PDF) of a continuous random variable X, denoted by f(.), is
a function that is defined for any real number x and satisfies the following properties:
a) f(x) ≥ 0 for all x;
b) the area below the whole curve, f(x), and above the x-axis is always equal to 1; and,
c) P(a ≤ X ≤ b) is the area bounded by the curve f(x), the x-axis and the lines x=a and x=b.
Graph of the PDF
(page 336)
The graph of the PDF is always above the
x-axis because the function cannot take on
negative values.
 If we remove the lines x=a and x=b and
measure the whole area below f(x) and
above the x-axis, this area is always exactly
equal to 1.
 The shaded area which is bounded by the
curve f(x), the x-axis, and the lines x=a and
x=b, represents the P(a ≤ X ≤ b). We can
also see from this illustration the reason
why we stated earlier that for a continuous
random variable X, the P(X=x)=0 for any
real number x. P(X=a) is just the same as
P(a ≤ X ≤ a). In this case, we will let b=a.
Then, the area representing P(X=a) will be
0 because we will only be left with a single
line.

Notes About the PMF and PDF
The PMF of the discrete random variable and the PDF of
the continuous random variable are what we refer to as
the distribution of the random variable. The distribution
of the random variable X provides us with complete
information about the behavior of the random variable X.
Although we cannot predict with certainty what the
realized value of the random variable X will be, we can
use its distribution to compute for the probability of any
event expressed in terms of the random variable X. We
will learn how to do this in Stat 121. In Stat 115, we will
use the CDF of a continuous random variable to
compute for probabilities.
Cumulative Distribution Function
(page 330)
Definition 10.19
The cumulative distribution function (CDF) of a
random variable X, denoted by F(.) is a function defined
for any real number x as
F(x) = P(X ≤ x)
Notes About the CDF
 The CDF of the random variable X is also referred to as its
distribution. Just like the PMF of a discrete random variable and the
PDF of a continuous random variable, the CDF provides us with
complete information about the behavior of the random variable. We
can use it to compute for the probability of any event expressed in
terms of the random variable X.
 (page 339) When X is a continuous random variable, we can express
the probability of the event in terms of the CDF as follows:
• P(X ≤ a) = P(X < a) = F(a).
• P(X > a) = P(X ≥ a) = 1 – F(a).
• P(a < X < b) = P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = F(b) – F(a).
Examples
 Example 10.41 (page 339)
 Exercise 4. (page 340)
Given the CDF of a continuous random variable X,
$$F(x) = \begin{cases} 1 & \text{when } x \ge 1 \\ x^3 & \text{when } 0 < x < 1 \\ 0 & \text{when } x \le 0 \end{cases}$$
find the following probabilities using the CDF:
a. P(X>0.25)
b. P(0.3<X<0.7)
c. P(0.4≤X<1.25)
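The CDF formulas for a continuous random variable translate directly into code. The sketch below is a hedged illustration: it assumes the CDF in this exercise is 0 to the left of 0, x cubed between 0 and 1, and 1 to the right of 1, and it uses different cut-off values from the exercise so the exercise itself is left to the reader. The function names are hypothetical.

```python
def cdf(x):
    """Assumed CDF: 0 for x <= 0, x**3 for 0 < x < 1, and 1 for x >= 1."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return x ** 3

def prob_greater(a):
    """P(X > a) = P(X >= a) = 1 - F(a) for a continuous random variable."""
    return 1.0 - cdf(a)

def prob_between(a, b):
    """P(a < X < b) = F(b) - F(a); endpoints do not change the value."""
    return cdf(b) - cdf(a)

p_tail = prob_greater(0.5)       # 1 - 0.5**3
p_mid = prob_between(0.2, 0.6)   # 0.6**3 - 0.2**3
p_wide = prob_between(0.5, 1.5)  # F(1.5) = 1 because 1.5 is beyond 1
print(p_tail, p_mid, p_wide)
```

Notice that when an endpoint lies outside the interval from 0 to 1, the CDF simply returns 0 or 1 for that endpoint.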
Mean and Variance of the Discrete Random Variable X
(pages 341 and 344)
Suppose X is a discrete random variable with probability mass function:
x               x1      x2      x3      …    xn
f(x) = P(X=x)   f(x1)   f(x2)   f(x3)   …    f(xn)
Definition 10.24
The mean of the discrete random variable
X, also called the expected value of X, is
$\mu_X = E(X) = x_1 f(x_1) + x_2 f(x_2) + \dots + x_n f(x_n).$
The mean tells us where the center of mass
is located. In other words, it tells us the
average value of X if we repeat the random
experiment endlessly.
Definition 10.25
The variance of the discrete random variable X
is
$\sigma_X^2 = Var(X) = (x_1 - \mu_X)^2 f(x_1) + (x_2 - \mu_X)^2 f(x_2) + \dots + (x_n - \mu_X)^2 f(x_n).$
The variance measures how close the values of
X are around the mean.
Examples
Examples 10.42 and 10.45: Let X=number of voters who will elect Renzo, as defined in Example
10.34. The PMF of this random variable as derived earlier is as follows:
x      1      2      3
f(x)   1/5    3/5    1/5
Use this PMF to find the mean and variance of X.
Solution:
$\mu_X = E(X) = (1)(1/5) + (2)(3/5) + (3)(1/5) = 2.$
Suppose we keep on repeating the process of selecting samples of size 3 and each
time observe how many will vote for Renzo. The average of these values, or the
average number of voters who will elect Renzo, is 2.
$\sigma_X^2 = Var(X) = (1 - 2)^2(1/5) + (2 - 2)^2(3/5) + (3 - 2)^2(1/5) = 0.4$
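The two definitions can be turned into short functions and checked against the solution above. This is a minimal Python sketch; the function names `mean` and `variance` and the dictionary layout of the PMF are choices made for this example.

```python
from fractions import Fraction

# PMF of X = number of sampled voters who will elect Renzo.
pmf = {1: Fraction(1, 5), 2: Fraction(3, 5), 3: Fraction(1, 5)}

def mean(pmf):
    """E(X) = sum of x * f(x) over all mass points."""
    return sum(x * fx for x, fx in pmf.items())

def variance(pmf):
    """Var(X) = sum of (x - mean)^2 * f(x) over all mass points."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * fx for x, fx in pmf.items())

mu_x = mean(pmf)
var_x = variance(pmf)
print(mu_x, var_x)
```

The variance comes out as the exact fraction 2/5, which is the 0.4 obtained by hand.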
Assignment 3
Exercise 2 (page 348)
1. Given the CDF of a continuous random variable X,
$$F(x) = \begin{cases} 0, & \text{if } x < -1 \\ \frac{x^2 + 2x + 1}{2}, & \text{if } -1 \le x < 0 \\ \frac{-x^2 + 2x + 1}{2}, & \text{if } 0 \le x < 1 \\ 1, & \text{if } x \ge 1 \end{cases}$$
compute for the following probabilities:
a) P(X ≤ 0.4)
b) P(X > 0.8)
c) P(X ≥ -0.6)
d) P(-0.5 < X < 0.2)
e) P(-0.1 < X ≤ 2.5)
2. Always show your solution. Write
the formula in terms of the CDF
first then present how you have
plugged-in the appropriate values
to compute the probability.
Normal Distribution
(page 349)
Definition 10.27 (page 349)
A continuous random variable X is said to be normally distributed if its
probability density function is given by :
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
for any real number x. The constants, µ and σ², are such that -∞ < µ < ∞
and σ² > 0. The values, e and π, are mathematical constants, wherein
e ≈ 2.71828 and π ≈ 3.14159.
The normal distribution has 2 parameters, namely, µ and σ². As stated in Definition 10.28, a
parameter in distribution theory is a constant that determines the specific form of the
probability distribution. We can view the parameter as a numerical descriptive measure of
the probability distribution. It carries vital information about the probability distribution.
The shape of the distribution, the location of its center, the value of its variance, and other
characterizations of the distribution all depend on the value of the parameter. We will notice
that this notion is consistent with our previous concept of a parameter as a summary
measure describing a specific characteristic of the population in Stat 114. This time though,
our population is specifically the set of all realized values of X if we were to repeat the
random experiment endlessly.
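To see how the two parameters enter the normal PDF, the formula can be coded directly. The Python sketch below is an illustration only: the function names, the choice µ=5 and σ=2, and the use of the trapezoid rule over µ ± 10σ to approximate the total area are all assumptions of this example.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal PDF with mean mu and standard deviation sigma (sigma > 0)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_pdf(mu, sigma, half_width=10.0, steps=20_000):
    """Trapezoid-rule area over [mu - half_width*sigma, mu + half_width*sigma]."""
    lo = mu - half_width * sigma
    hi = mu + half_width * sigma
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h

peak = normal_pdf(5.0, mu=5.0, sigma=2.0)  # height of the curve at x = mu
area = area_under_pdf(mu=5.0, sigma=2.0)   # should be very close to 1
print(peak, area)
```

Changing `mu` shifts the curve along the x-axis, while changing `sigma` makes it flatter or more peaked; the area stays at 1 in both cases.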
Graph of the Normal PDF
(page 350)
[Figure: normal curve marking areas of 0.68, 0.95, and >0.99 within µ ± 1σ, µ ± 2σ, and µ ± 3σ]
Bell-shaped curve that is symmetric about µ.
The area bounded by the curve and the x-axis is 1.
The curve will approach the x-axis as we proceed in either
direction away from µ, but will never touch the x-axis.
In Stat 114, if X~Normal(µ, σ²) then
a) P(µ - 1σ < X < µ + 1σ) ≈ 0.68
b) P(µ - 2σ < X < µ + 2σ) ≈ 0.95
c) P(µ - 3σ < X < µ + 3σ) > 0.99
Standard Normal Random Variable
(page 351)
Definition 10.29
If the normal random variable has mean 0 and variance 1, it is called a
standard normal random variable and is denoted by Z.
Standard Normal Table, Table B.1
(pages 603-604)
 Table B.1 presents the values of the CDF of a standard normal random
variable or P(Z ≤ z).
 We can compute for the probability of any event expressed in terms of the
standard normal random variable using these formulas:
• P(Z ≤ a) = P(Z < a) = F(a)
• P(Z > a) = P(Z ≥ a) = 1 – F(a)
• P(a<Z<b) = P(a ≤ Z ≤ b) = P(a ≤ Z<b) = P(a<Z ≤ b) = F(b) – F(a)
where Z is the standard normal random variable.
 See Examples 10.49 and 10.50 (page 352)
Property of the Normal Distribution
 Any random variable X that follows a normal distribution
with mean µ and variance σ² can be transformed into a
standard normal random variable Z with mean 0 and
variance 1.
 The transformation is the familiar formula that we use to
compute for the z-score in Stat 114:
$Z = \frac{X - \mu}{\sigma}$
Computing for Probabilities
(page 353)
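If X~N(µ, σ²) then
$P(X \le a) = P\left(\frac{X-\mu}{\sigma} \le \frac{a-\mu}{\sigma}\right) = P\left(Z \le \frac{a-\mu}{\sigma}\right),$
where Z is a standard normal random variable.
Example 10.51. Suppose X~Normal(µ=5, σ²=4).
a) $P(X \le 6) = P\left(\frac{X-5}{2} \le \frac{6-5}{2}\right) = P(Z \le 0.5) = 0.6915.$
b) $P(4.5 < X < 6) = P\left(\frac{4.5-5}{2} < \frac{X-5}{2} < \frac{6-5}{2}\right) = P(-0.25 < Z < 0.5)$
= F(0.5) – F(-0.25) = 0.6915 – 0.4013 = 0.2902.
c) $P(X > 4.5) = P\left(\frac{X-5}{2} > \frac{4.5-5}{2}\right) = P(Z > -0.25) = 1 - F(-0.25)$
= 1 – 0.4013 = 0.5987.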
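The table lookups in Example 10.51 can be cross-checked with Python's standard library class `statistics.NormalDist` (available in Python 3.8 and later). This is only a sketch for checking answers, not a replacement for Table B.1; the variable names are made up for this example.

```python
from statistics import NormalDist

X = NormalDist(mu=5, sigma=2)  # variance 4 means standard deviation 2
Z = NormalDist()               # standard normal: mean 0, sigma 1

p_a = X.cdf(6)                 # P(X <= 6)
p_b = X.cdf(6) - X.cdf(4.5)    # P(4.5 < X < 6) = F(6) - F(4.5)
p_c = 1 - X.cdf(4.5)           # P(X > 4.5) = 1 - F(4.5)

# Same answer for (a) after standardizing: Z = (X - mu) / sigma.
p_a_via_z = Z.cdf((6 - 5) / 2)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

Note that `NormalDist` takes the standard deviation, not the variance, so σ²=4 must be entered as `sigma=2`.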
Checking the 68-95-99 Rule
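Suppose X~Normal(µ, σ²)
$P(\mu - 1\sigma < X < \mu + 1\sigma) = P\left(-1 < \frac{X-\mu}{\sigma} < 1\right) = P(-1 < Z < 1) = F(1) - F(-1) = 0.8413 - 0.1587 = 0.6826$
$P(\mu - 2\sigma < X < \mu + 2\sigma) = P\left(-2 < \frac{X-\mu}{\sigma} < 2\right) = P(-2 < Z < 2) = F(2) - F(-2) = 0.9772 - 0.0228 = 0.9544$
$P(\mu - 3\sigma < X < \mu + 3\sigma) = P\left(-3 < \frac{X-\mu}{\sigma} < 3\right) = P(-3 < Z < 3) = F(3) - F(-3) = 0.9987 - 0.0013 = 0.9974$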
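The same check can be done without the table. The sketch below uses Python's `statistics.NormalDist` as an illustration; the dictionary name `within` is an assumption of this example. The computed values are about 0.6827, 0.9545, and 0.9973, and they differ from the table-based answers in the fourth decimal place only because the table entries are rounded.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal random variable

# P(-k < Z < k) = F(k) - F(-k) for k = 1, 2, 3 standard deviations.
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, prob in within.items():
    print(k, round(prob, 4))
```

Because of the standardization Z = (X - µ)/σ, these probabilities hold for every normal random variable, whatever its mean and variance.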
$z_\alpha$ Value of the Standard Normal Random Variable
(page 354)
The value $z_\alpha$ (read as “z sub alpha”) satisfies the condition that
$P(Z > z_\alpha) = \alpha$. This is equivalent to saying that $P(Z \le z_\alpha) = 1 - \alpha$.
[Figure: standard normal curve with area 1 - 2α between $-z_\alpha$ and $z_\alpha$]
$z_\alpha$ Values: Bottom of Table B.1
(page 604)
α            .10     .05     .025    .01     .005    .001    .0005   .00005
$z_\alpha$   1.282   1.645   1.960   2.326   2.576   3.090   3.291   3.891
Example: $z_{.05}$ = 1.645, so
P(Z > 1.645)=0.05, P(Z < 1.645) =0.95,
P(Z < -1.645) = 0.05, P(-1.645 < Z < 1.645) = 0.90.
[Figure: standard normal curve with area 0.90 between -1.645 and 1.645 and area 0.05 in each tail]
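The entries at the bottom of Table B.1 can be reproduced with the inverse of the standard normal CDF. The Python sketch below is an illustration using `statistics.NormalDist.inv_cdf`; the helper name `z_alpha` is hypothetical.

```python
from statistics import NormalDist

def z_alpha(alpha):
    """Value z such that P(Z > z) = alpha, i.e. P(Z <= z) = 1 - alpha."""
    return NormalDist().inv_cdf(1 - alpha)

for alpha in (0.10, 0.05, 0.025, 0.01, 0.005):
    print(alpha, round(z_alpha(alpha), 3))

# By symmetry, the area between -z and z is 1 - 2*alpha.
z_05 = z_alpha(0.05)
middle_area = NormalDist().cdf(z_05) - NormalDist().cdf(-z_05)
print(round(middle_area, 2))
```

The helper works from the right-tail area α, which matches how the table is indexed.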
Importance of the Normal Distribution
(page 354)
Normal distributions, or at least approximately
normal distributions, occur in many situations. Many
physical and mental traits tend to be at least
approximately normally distributed. If it is not X that is
normally distributed, it is some transformation of X that
is normal. Furthermore, as a consequence of the Central
Limit Theorem, the normal distribution is also used to
model characteristics of interest that are believed to be
the result of summing up a large number of small effects
that are independently generated by a process.
Examples
Examples 10.53 and 10.54 (page 355)
Exercise 1 (page 372) A wine’s distinctive taste is a result of ageing it in
wooden casks. Some of the wine evaporates while it is aging in the porous
wooden casks. Define X=percentage of wine in the cask that is lost due to
evaporation. Suppose X is normally distributed with mean 5% and a standard
deviation of 1%. What is the probability of losing more than 7.5% of the wine
due to evaporation?
Always define random variable: X=percentage of wine in the cask that is lost due to evaporation
Identify distribution of X:
Given: X~Normal(µ=5, σ²=1²)
Express problem in terms of the defined random variable: Find P(X>7.5).
$P(X > 7.5) = P\left(\frac{X-5}{1} > \frac{7.5-5}{1}\right) = P(Z > 2.5) = 1 - F(2.5) = 1 - 0.9938 = 0.0062.$
More Examples
Exercise 3 (page 373)
Suppose that the IQ’s of applicants of a certain science high school follow a normal
distribution with mean of 120 and a standard deviation of 9.
a) One of the requirements of the school in accepting a student is that the
student’s IQ must be at least 115. What proportion of the applicants will be
rejected on the basis of their IQ?
X=IQ of selected applicant
Given: X~Normal(µ=120, σ²=9²)
a) Find P(X<115).
$P(X < 115) = P\left(\frac{X-120}{9} < \frac{115-120}{9}\right) = P(Z < -5/9) \approx P(Z < -0.56) = F(-0.56) = 0.2877$
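Both worked exercises follow the same three steps: define X, identify its distribution, and express the problem in terms of X. A hedged Python check using `statistics.NormalDist` is sketched below; the variable names are assumptions of this example. The table-based IQ answer uses z rounded to -0.56, so the unrounded computation gives a slightly larger value (about 0.289).

```python
from statistics import NormalDist

# Wine exercise: X ~ Normal(mu=5, sigma=1); find P(X > 7.5).
wine = NormalDist(mu=5, sigma=1)
p_wine = 1 - wine.cdf(7.5)

# IQ exercise: X ~ Normal(mu=120, sigma=9); find P(X < 115).
iq = NormalDist(mu=120, sigma=9)
p_reject = iq.cdf(115)                       # uses z = -5/9 without rounding
p_reject_rounded = NormalDist().cdf(-0.56)   # z rounded to two decimals, as in the table

print(round(p_wine, 4), round(p_reject, 4), round(p_reject_rounded, 4))
```

The small gap between the two IQ answers is purely a rounding effect from using a two-decimal z-score with the table.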
Assignment 4
1. If Z is a standard normal random variable, find the value of z0 that satisfies each
of the following probability statements:
a. P(Z < z0) = 0.99995
b. P(Z > z0) = 0.99
c. P(-z0 < Z < z0) = 0.998
2. The existing machine setting of a factory produces bearings with a diameter
that is normally distributed with mean and standard deviation equal to 3.0005
cm and 0.0010 cm, respectively. Customer specifications require the bearing
diameters to lie in the interval [2.998, 3.002]. Those outside the interval are
considered scrap and must be remachined. With the existing machine setting,
what fraction of total production will not be considered as scrap?
3. The length of time for a college applicant to complete the college achievement
test is normally distributed with mean equal to 70 minutes and variance 144
min². What is the probability of selecting an applicant at random who will take
more than 85 minutes to complete the test?