numerická matematika a matematická štatistika (numerical

Transcription

Moderné vzdelávanie pre vedomostnú spoločnosť
Projekt je spolufinancovaný zo zdrojov EÚ
NUMERICKÁ MATEMATIKA
A MATEMATICKÁ ŠTATISTIKA
(NUMERICAL MATHEMATICS
& MATHEMATICAL STATISTICS)
Stavebná fakulta
Doc.Ing.Roman Vodička, PhD.
RNDr. PavolPurcz, PhD.
Táto publikácia vznikla za finančnej podpory z Európskeho sociálneho fondu v rámci Operačného programu VZDELÁVANIE.
Prioritná os 1 Reforma vzdelávania a odbornej prípravy.
Opatrenie 1.2 Vysoké školy a výskum a vývoj ako motory rozvoja vedomostnej spoločnosti.
Názov projektu: Balík doplnkov pre ďalšiu reformu vzdelávania na TUKE
ITMS 26110230093
NÁZOV:
AUTORI:
VYDAVATEĽ:
ROK:
VYDANIE:
NÁKLAD:
ROZSAH:
ISBN:
Numerická matematika a matematická štatistika (Numerical Mathematics and Mathematical Statistics)
Vodička Roman, Purcz Pavol
Technická univerzita v Košiciach
2015
prvé
50 ks
167 strán
978-80-553-2041-0
Rukopis neprešiel jazykovou úpravou.
Za odbornú a obsahovú stránku zodpovedajú autori.
Moderné vzdelávanie pre vedomostnú spoločnosť
Projekt je spolufinancovaný zo zdrojov EÚ
NUMERICKÁ MATEMATIKA A MATEMATICKÁ
ŠTATISTIKA
(NUMERICAL MATHEMATICS AND MATHEMATICAL
STATISTICS)
Stavebná fakulta
Doc.Ing.Roman Vodička, PhD.
RNDr. PavolPurcz, PhD.
Introduction
Aim
Providing theoretical knowledge required for studying specialized subjects and applying obtained knowledge to the solution of technically oriented
using the methods of numerical
mathematics and mathematical
statistics
Contents and objectives
The aim of the module is to familiarize students with the necessary mathematical
theory of numericalmathematics (NM) and mathematical
statistics
(MS) and their applications.Each chapter contains definitions of terms and their properties required to solve the problems.
The publication includes
the definitions of allnecessary notions and terms and required theorems and propositions, some of them being proved within the solved The
examples.
publication thus contains the solved examples and problems for solving and self-assessment according to the following contents.
1. NM — Linear and nonlinear equations
2. NM — Interpolation and approximation
3. NM — Differentialequations
4. MS — The probability theory
5. MS — Descriptive statistics
6. MS — Estimates and hypotheses
7. MS — Correlation and regression
Prerequisites
Matematika I (Mathematics I.) (P. Purcz, R. Vodička, TU Košice, 2012, ISBN 978-80-553-1279-8)
Matematika II (Mathematics II.) (P. Purcz, R. Vodička, TU Košice, 2015, ISBN 978-80-553-2046-5)
Introduction – 2
Chapter 1. (Linear and nonlinear equations)
Aim
Find the solution of various types of algebraic equations and of their systems.
Objectives
1. Provide students with some numerical techniques for solving the equation f (x) = 0.
2. Find an approximate solution of linear equation systems by simple iterative methods.
3. Use minimization of quadratic functional as a tool for solving the linear equation systems.
Prerequisites
matrices; linear equation; functions; derivative and its properties; functional minimization
Introduction
Main aim of this chapter is to define and demonstrate solutions of the basic problems of numerical mathematics in solving various types of algebraical
equations We start with the nonlinear equations f (x) = 0, which will be solved by two simple methods. In the next part, an approximate solution to
the linear equation systems will be presented. The methods will be split into two groups. One of them is base on a modification of the linear equation
system leading to a simple iterative formula. The other group uses the minimization algorithms for quadratic functionals as the solution of some types
of linear equation systems is provided by the minimizer of an appropriate functional. The method of this type are widely used in various commercial
software, used also by civil engineers. The final part of the chapter is devoted to the solution of the nonlinear equation systems, where we present one
common algorithm.
Chapter 1. (Linear and nonlinear equations) – 1
Linear and nonlinear equations – Numerical methods for solving the equation f (x) = 0.
There are many nonlinear equations which cannot be solved exactly, thus the methods for finding approximate solutions are useful.
First, we introduce some numerical methods for solving the equation f (x) = 0. In the first part, we will show how to make the separation of roots and
then we describe some specific procedures for calculating the root of an equation with one variable.
The root of the equation f (x) = 0 in the domain of real numbers is a real number x̂ for which f (x̂) = 0. For some specific types of equations, there
are known mathematical formulas and procedures that allow to find just one or all of their roots. In general, however, it is not true, for most problems,
we can find a solution only approximately, i.e. we can find the approximate value of the root of x̂. This value is called the approximation of the root.
The calculation of roots of the equation f (x) = 0 is usually split into two stages which proceed as follows:
• using some methods of separation, we determine the interval in which exactly one root lies
• using some suitable numerical methods, we calculate the approximate value of the root with the required precision
The separation of roots can be done by two ways. The first way is the graphical method. The equation f (x) = 0 is possible to be modified into a
form g(x) = h(x), so that we can easily draw graphs of both functions g(x) and h(x). The x-coordinates of the intersections represent found solutions.
Based on this fact, we can then determine various intervals, which cover only a single root.
Example 1.1 Let’s separate the roots of the equation x − sin x − π4 .
Solution. We modify the given equation to the form x = sin x + π4 and we construct graphs of both functions (see Fig. 1.1). At the same time this
picture easily prove that the given equation has exactly one real root, which lies somewhere in the interval h0, πi.
Example 1.2 Let’s separate the roots of the equation exp x − x − 2 = 0.
Solution. We modify the given equation into the form exp x = x + 2 and we again construct graphs of both functions (see Fig. 1.2). Similarly, this
figure clearly shows that the given equation has exactly one real root, which lies somewhere in the interval h1, 2i.
Besides, a root can be separated by investigating of the function properties. First, using the table of function’s values f (x) we find the values a and
b for which holds f (a).f (b) < 0 so that the root lies in the interval ha, bi. Moreover, if f 0 (x) does not change its sign within this interval, then it is
possible to say that there is only one real root of the equation in ha, bi as the function is monotone. If the sign changes, then, using a table of values,
Figure 1.1: The graphs of the functions y = x and y = sin x + π4 .
Figure 1.2: The graphs of the functions y = x + 2 and y = exp x.
we narrow this interval until the condition of constant sign of f 0 (x) is valid, i.e. we find new values c and d so that hc, di⊂hc, di, f (c).f (d) < 0 and
f 0 (x) does not change its sign in hc, di.
A similar root separation property can be formulated for the second derivative of the function f as it is closely connected to the curvature of the function
(i.e. convex or concave).
Example 1.3 Let’s separate the roots of the equation x3 + x − 1 = 0.
Solution. We construct Table 1.1 for some functional values and we compare their signs. We see that in the interval h0, 1i definitely lies at least one
Table 1.1:
-1 0 1
x
f(x) -3 -1 1
real root. Next, f 0 (x) = 3x2 + 1, and it follows that f 0 (x) does not change its sign in the given interval. Thus, there is only one root in this interval.
Example 1.4 Let’s separate a positive root of the equation −x4 + 2x2 + 1 = 0.
Solution. We construct Table 1.2 for some of the functional values and we compare their signs. We see that in the interval h0, 1i lies at least one real
Table 1.2:
x
0 1
f(x) 1 -1
root.
Table 1.3:
0 0.5 1
x
f(x) 1 1.25 -1
Next, f 0 (x) = −16x3 + 4x = 4x(1 − 4x2 ). Because f 0 (0.5) = 0, so it is clear, that f 0 (x) changes its sign in h0, 1i. We construct the next Table 1.3
which shows that in the new interval h0.5, 1i lies the same root and f 0 (x) does not change its sign.
Of course, there are more methods of dealing with the problem of separating the roots which can be found in references.
If we know where a root lies, a iteration method can be used to specify the root with the prescribed accuracy In numerical mathematics, the roots are
not determined exactly, but with some prescribed accuracy. Let xk+1 and xk be two sequential approaches obtained during the calculation of the root x.
Then xk+1 is called an approximate solution of the equation f (x) = 0 with precision ε if holds: | xk+1 − xk |< ε. There are several iterative methods
which require different input assumptions which depend on the character of the considered problem. The algorithms have different speed of convergence,
i.e. the number of iteration steps needed to reach a final solution satisfying the given accuracy.
The first and the simplest method is the bisection method. It does not require any additional conditions, only two principal: first, f is continuous and,
second, existence of two values a and b so that f (a).f (b) < 0 and a guaranteeing condition that in the interval ha, bi, there is only one real root of the
problem f (x) = 0. The procedure of solution is then as follows:
A1. Calculate new value of f (x) in the middle of interval ha, bi, i.e. f (c), where c =
a+b
.
2
B1. If f (c) = 0, then the value c is the exact solution of the equation f (x) = 0. Otherwise, the interval ha, bi is replaced by the right or the left half
of the considered original interval, depending on the situation, where the found solution belongs. If f (a).f (c) < 0, resp. f (a).f (c) > 0, the root
of the equation is located on the left, resp. right half of the interval ha, bi. Overwriting the respective boundaries of the interval, we get a new
interval hâ, b̂i with half length.
C1. If | b − a |< ε, then as a solution we can consider, for example, the last calculated value c, otherwise we go to the item A1 and whole process
repeats.
Example 1.5 Using the bisection method determine the negative root of the equation x4 + 2x2 − 25 = 0 with the precision ε = 0.0001 .
Solution. Step by step, we calculate some values of the given function (Table 1.4) and we can see that the solution of the problem is located in the
Table 1.4:
x
f(x)
0
-1 -2 -3
-25 -26 -13 -50
interval h−3; −2i. The middle of this interval is the number −2.5 and because f (−3).f (−2.5) > 0, the solution exists in the new interval h−2.5; −2i.
The middle of this new interval is number −2.25. Because f (−2.5).f (−2.25) < 0, the solution exists in the next interval h−2.5; −2.25i, and so on.
This way is possible to apply until the fourteenth step (see Table 1.5) where we have the interval h−2.33386; −2.33380i. Because the length of this
Table 1.5:
i
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
a
-3
-2.5
-2.5
-2.375
-2.375
-2.34375
-2.34375
-2.33594
-2.33594
-2.33398
-2.33398
-2.33398
-2.33398
-2.33386
-2.33386
b
-2
-2
-2.25
-2.25
-2.3125
-2.3125
-2.32813
-2.32813
-2.33203
-2.33203
-2.33301
-2.33350
-2.33374
-2.33374
-2.33380
interval is equal to 0.00006 we can consider the last value −2.3338 as the searched root.
The second method is the Newton method known also as the method of tangent lines. Compared with the previous method, this method is much
faster, but requires more specific conditions. Namely, additional conditions with respect to the second derivative f 00 (x) have to be satisfied. The second
derivative has the constant sign in the interval ha, bi and, to solve this problem, we can start from any point ξ∈ha, bi, where f (ξ).f 00 (ξ) > 0. The
procedure is then as follows:
A2. We put x0 = ξ.
k)
B2. Next, we calculate step by step xk+1 = xk − ff0(x
, for k = 0, 1, 2, .... It is possible to show that the sequence of values x0 , x1 , x2 , ... converges to
(xk )
the exact solution x̂ of the equation f (x) = 0. Geometric representation of this procedure in the case of determining the 3 points x0 , x1 , x2 , x3 is
presented in Fig. 1.3.
C2. If the condition | xk+1 − xk |< ε holds for a k ∈ N then this value xk+1 is considered as the approximate root of the equation, otherwise we return
to the item B2 and whole process repeats.
Figure 1.3: The idea of approximation using the Newton method of tangent lines.
The conditions given for this method indicate that there are 4 options for the form of the function in the interval ha, bi. The condition f (ξ).f 00 (ξ) > 0
in the cases I. and II. is satisfied, if for example ξ = a, in the cases III. and IV. it is for ξ = b (see Fig.1.4).
Example 1.6 Using the Newton method with precision ε = 0.0001, determine the negative root of the equation x4 + 2x2 − 25 = 0.
Solution. Step by step, we calculate some values of the given function (Table 1.6) and we can see that the solution of the problem is located in the
Table 1.6:
x
f(x)
0
-1 -2 -3
-25 -26 -13 -50
Figure 1.4: Graphs of functions.
Table 1.7:
x0
x1
x2
x3
x4
x5
-3
-2.5283
-2.35583
-2.33416
-2.33384
-2.333839
interval h−3; −2i. Because f 00 (x) = 12x2 > 0 in the whole interval h−3; −2i, the condition f (ξ).f 00 (ξ) > 0 holds for example for ξ = −3. We put
50
x0 = −3 and following the item B2, we get x1 = −3 − ff0(−3)
= −3 − −106
= −2.52983. Similarly, we calculate next values (see Table 1.7) until x5 ,
(−3)
where we check after each step the condition for the prescribed accuracy The condition is satisfied for the value x5 , it means | x5 − x4 |< ε, so it is
possible to consider the value −2.333839 as the approximate root of the given equation.
Linear and nonlinear equations – Approximate solution of linear algebraic equations
Practical solutions, especially those performed by a computer, of large systems of linear equations are usually obtained by approximate methods.
In linear algebra we have learned both, to find exact solutions of linear equations and to answer the question of their solvability. In some parts of numerical
mathematics, for example if we solve partial differential equations (as well as other common technical problems), we meet the need of solving systems
of linear equations with guaranteed unique solvability. Then, it is possible to apply some well-known simple and relatively fast numerical procedures that
provide only approximate solutions with an error which is sufficient for those applications. In this section, we show two such algorithms, which are based
on one common platform.
Let’s consider a system of n linear equations with n variables in the form:
a11 x1 + a12 x2 + . . .
a21 x1 + a22 x2 + . . .
..
..
.
.
an1 x1 + an2 x2 + . . .
+ a1n xn =
+ a2n xn =
..
.
b1
b2
(1.1)
+ ann xn = bn .
It is possible to write the system in a matrix form as follows:
A·x=b
where

a11 a12 . . .
 a21 a22 . . .

A =  ..
..
..
 .
.
.
an1 an2 . . .

a1n
a2n 

..  ,
. 
ann


x1
 x2 
 
x =  ..  ,
.
xn
 
b1
 b2 
 
b =  .. 
.
bn
From the linear algebra we know that the given system has a unique solution if the matrix A = (aij ) is regular, i.e. if its determinant is nonzero.
We define
P some important notion
Pwhich we use below. First, a square matrix A = (aij ) of the size n is called diagonally dominant, if holds:
|aii | > nj=1; j6=i |aij |, or |aii | > nj=1; j6=i |aji | for each i = 1, 2, ..., n. It means that a diagonally dominant matrix has substantially larger diagonal
terms than the off-diagonal ones.
The magnitude of a matrix is given by its norm. There are several
or column norm, respectively, of the square matrix
Pn norms used in practice. TheProw,
n
A = (aij ) of the size n is the number: kAkm = maxi=1,2,...,n j |aij |;or kAke = maxj=1,2,...,n i |aij |. The Frobenius norm of the matrix A = (aij ) of
qP P
n
n
2
the size n is the number kAkk =
i=1
j=1 aij .
Let us demonstrate the algorithms of two methods. The first one is the Jacobi iteration method. The application of this method is based on standard
matrix form of the system Ax = b. Let us suppose that the system has all the diagonal elements non-vanishing. Then we can modify each equation to
the following form:
n
X
1
(bi −
aij xj );
i = 1, 2, ..., n
xi =
aii
j=1; j6=i
by expressing explicitly the diagonal unknown, i.e. the i-th unknown from the i-th equation.
In a general iterative method, we modify the origin system from standard form Ax = b to a new form x = Ux + V From this new form is possible to
create following sequence of iteration steps:
x(k+1) = Ux(k) + V;
k = 0, 1, 2, ...
where x(0) is any starting approximation. If this new created sequence x(k) , k = 0, 1, 2, ... converges, we talk about a convergent iterative process, or
about a convergent method. We can state that:
A3. If the sequence x(k) , k = 0, 1, 2, ... converges, then it converges to the exact solution of given problem.
B3. If some of the matrix norms of U (row, column or Frobenius) is less than one, then the sequence x(k) , k = 0, 1, 2, ... converges to the unique
(∗)
(k)
solution of the problem x(∗) independently of the choice of the initial approximation x(0) ; e.g. limk→∞ xi = xi for each i = 1, 2, ..., n.
C3. If the matrix A of the system is diagonally dominant, then the Jacobi method converges to the exact solution of the system for any choice of the
initial approximationx(0) .
D3. Any system of linear equations can be equivalently modified to a system with diagonally dominant matrix.
As the initial approximation x(0) , we choose mostly zero vector (0, 0, ..., 0) as it is easily to substitute. Of course, there are another necessary and
sufficient conditions for the convergence of the solutions of this problem which can be found in the references.
Similarly to solving the nonlinear equation, we require to ha a stopping criterion for the iterative process. As soon as during the iterative process we find
(k+1)
(k)
(k+1)
(∗)
kU k
that the condition kxi
− xi k < ε is satisfied for each i = 1, 2, ..., n, where ε is the previously specified accuracy then also kxi
− xi k < ε 1−kU
k
and value x(k+1) can be considered as sufficiently precise approximation of the exact solution x(∗) .
The full system of modified equations used in practical calculations can be written as:
P
(k)
(k+1)
x1
= a111 (b1 − nj=1; j6=1 a1j xj )
P
(k+1)
(k)
= a122 (b2 − nj=1; j6=2 a2j xj )
x2
..
.
P
(k)
(k+1)
1
(bn − nj=1; j6=n anj xj )
xn
= ann
Example 1.7 Solve the following system using the Jacobi method with the precision ε = 0.001.
5x1
+ 0.12x2 + 0.09x3 = 10
0.08x1 + 4x2
− 0.12x3 = 20 .
0.18x1 − 0.06x2 + 3x3
= −4.5
Solution. Since the system satisfies the condition of the dominant diagonal, it is guaranteed that an iterative process will converge to the exact solution
of the task. The system thus modify to the form:
x1 =
2
− 0.024x2 − 0.018x3
x2 =
5
− 0.02x1 + 0.03x3 .
x3 = −1.5 − 0.06x1 + 0.02x2
We choose
x(0)
 
0
= 0
0
and continue in the iteration

x(1)

2
=  5 ,
−1.5

x(2)

1.907
=  4.915  ,
−1.52

x(3)

1.9094
=  4.91626  ,
−1.52612

x(4)

1.9093
=  4.91633 
−1.52624
Because for both approximations, x(3) and x(4) , the condition of given precision of solutions is satisfied, we can consider the vector of solutions x(4) as
the final solution of the original problem.
The other method is the Gauss-Seidel iterative method.
In this method, we use the same form of the system as in the Jacobi method as well as the same matrix notation Ax = b. In contrast to the Jacobi
(k+1)
of the vector x(k+1) , we use all the latest known components, e.g. from the
method for the calculation of each new approximated component xi
vector x(k+1) or from the vector x(k) by following instruction:
(k+1)
xi
i−1
n
X
X
1
(k+1)
(k)
=
(bi −
aij xj
−
aij xj );
aii
j=1
j=i+1
i = 1, 2, ..., n;
k = 0, 1, 2, ...
The choice of the initial approximation as well as the convergence conditions are controlled by the same rules as used and explained by the Jacobi
method. Especially, we stress the condition of the diagonal dominance of the matrix which guarantee the convergence of the process for arbitrary choice
of the initial approximation.
Example 1.8 Solve the following systems of linear equations, using the Gauss-Seidel method, with the precision ε = 0.05.
7x1 − 2x2 + x3 = 2
3x1 − 8x2 + 2x3 = −11 .
x1 + 6x2 + 5x3 = −8
Solution. Since the system does not satisfy the condition of the dominant diagonal, it needs to be modified to a new form that will satisfy such a
condition, for example, adding the second equation to the third, we get a new system:
7x1 − 2x2 + x3 = 2
3x1 − 8x2 + 2x3 = −11 .
4x1 − 2x2 + 7x3 = −19
This system already satisfies the condition of the dominant diagonal, thus it is guaranteed that the iterative process will converge to the exact solution
of the task. To calculate the next approximation of the solution for k = 0, 1, 2, ..., we use the following equations:
(k+1)
x1
x2
x3
=
=
=−
2
7
11
8
19
7
+
+
−
2 (k)
x
7 2
3 (k+1)
x
8 1
4 (k+1)
x
7 1
−
+
+
(k)
1
8x3
7
1 (k)
x
4 3
2 (k+1)
x
7 2
.
As the initial approximation, we can choose now, for example, the right-side vector, e.g.
 2 
x(0) = 
Then we obtain

x(1)

1.066326531
=  1.096301021  ,
−3.010386297
7
11 
8
− 19
7
.

x(2)

1.028998334
=  1.008277801  ,
−3.010386297

x(3)

1.004394428
=  0.998096563 
−3.003054941
Because for the last two approximations x(2) and x(3) the condition for given precision of the solution is satisfied, the vector of solutions x(3) can be
considered as the approximate solution of the origin problem with prescribed accuracy.
Linear and nonlinear equations – Gradient methods for solving linear equations systems
An effective way for solving systems with symmetric matrices
Recently, methods for solving certain systems of linear equations have bee developed which are based on the following idea. Let f(x)= 21 a x2 −b x be a
quadratic function of a real variable x, where a is a positive number and b is an arbitrary real number. The minimization point of this function can be
found by solving a simple linear equation: a x=b. Following this idea for an n-tuple of real numbers x= (x1 , x2 , . . . , xn )> , we need a quadratic functional
with the unknown x determined by a square matrix A with n rows and a column matrix b with n rows, too:
1
f(x)= x> Ax−x> b.
2
(1.2)
In the case of one variable, we needed positive a, so with n variables we also have to define positiveness of the matrix A in some sense.
Example 1.9 Find a condition for the matrix A which makes the functional f(x)= 12 x> Ax−x> b to have such minimum that the minimizer is determined
by the solution of the linear equations system Ax=b.
m
Solution. The necessary condition for minimization of f with respect to x= (x1 , x2 , . . . , xn )> is ∂f(x)
=0, for all k=1, 2, . . . , n. As ∂x
=1 for k=m and
∂xk
∂xk
otherwise it vanishes, we obtain
!
!
!
n
n
n
n
n
X
X
X
X
X
1
1
1
∂f(x)
∂
>
x i bi =
Ax + A x − b ,
=
xi
Aij xj −
Akj xj +
xi Aik − bk =
∂xk
∂xk 2 i=1
2 j=1
2
k
i=1
i=1
i=1
for each k=1, 2, . . . , n. Thus for the minimization point w, we have an equation
1
A + A> w = b.
2
If simultaneously we demand w to be a solution of the system Ax=b, the matrix A should be symmetric, i.e. A=A> .
Counting the second derivative of the functional f for all k=1, 2, . . . , n and for all l=1, 2, . . . , n and considering the matrix A symmetric, renders
!!
n
n
X
∂ 2 f(x)
∂
1 X
1
=
Akj xj +
xi Aik
= (Akl + Alk ) = Akl .
∂xl ∂xk
∂xl 2 j=1
2
i=1
The principal subdeterminants of the matrix A have to be positive to guarantee minimization of the function f with n unknowns.
A symmetric matrix is called positively definite, if for all nonzero x is the product x> Ax positive. The condition which proves positive definiteness of a
symmetric matrix is called the Sylvester criterion, and it requires all principal subdeterminants to be positive, i.e.:
A11 A12 · · · A1n A11 A12 A13 A21 A22 · · · A2n A
A12 x> Ax > 0 pre všetky x 6= 0 ⇔ A11 > 0 ∧ 11
> 0 ∧ A21 A22 A23 > 0 ∧ · · · ∧ ..
(1.3)
..
.. > 0
.
.
A21 A22 .
.
.
. A31 A32 A33 An1 An2 · · · Ann Chapter 1. (Linear and nonlinear equations) – 13
Thus, for a symmetric positively definite matrix A, the quadratic functional (1.2) reaches its minimum at the vector x which is the solution of the linear
equations system Ax=b for arbitrary vector b.
Example 1.10 Verify that the symmetric matrix A=
4 1 0
1 4 1
0 1 4
is positively definite. Use the definition and the Sylvester criterion, too.
Solution. Using the criterion is very easy. The first condition reads A11 =4>0. The other conditions can be checked by calculating the following
determinants:
4 1 0
4 1 = 16 − 1 = 15 > 0, 1 4 1 = 64 − 4 − 4 = 56 > 0.
1 4 0 1 4
All three values are positive which proves positive definiteness of the symmetric matrix.
Using the definition, the required matrix product should be modified. Having arbitrary x= (x1 , x2 , x3 )> , we obtain

 


4
1
0
x
4x
+
x
1
1
2
x> Ax = x1 x2 x3 1 4 1 x2  = x1 x2 x3 x1 + 4x2 + x3 
0 1 4
x3
x2 + 4x3
= 4x21 + 2x1 x2 + 4x22 + 2x2 x3 + 4x23 = 3x21 + (x1 + x2 )2 + 2x22 + (x2 + x3 )2 + 3x23 ≥ 0.
The inequality holds as in the last expression only nonnegative terms (squares) are added. Let us find when the equality occurs. It may happen only if
all the terms are equal to zero, hence 3x21 =0, (x1 + x2 )2 =0, 2x22 =0, (x2 + x3 )2 =0, 3x23 =0. This is possible only for x1 =0, x2 =0 and x3 =0 which is
valid only for a zero triple x. Thus, the matrix is positively definite, because for all nonzero x the product x> Ax is positive.
To find the minimum of the functional f (1.2), we use a one-step iterative method, where the iteration (k+1) is sought in the form x=x(k) +αd(k) . The
vector d(k) is determined prior to the calculation and represents a direction in which the minimum of f is searched for. Thus, the only unknown is α and
its value, which minimizes f is denoted α(k) and pertinent x is x(k+1) . Hence
f(x(k+1) ) = f(x(k) + α(k) d(k) ) = min f(x(k) + αd(k) ).
α
(1.4)
The functional f has a unique minimum, if A is a symmetric positively definite matrix, which also means that the stationary point with respect to α is
also a minimum for the function f̂(α)=f(x(k) + αd(k) ). Searching for the stationary point provides
> > df̂(α)
d 1 (k)
(k)
(k)
(k)
(k)
(k)
0=
=
x + αd
A x + αd
− x + αd
b = α(d(k) )> Ad(k) +(d(k) )> Ax(k) −(d(k) )> b = α(d(k) )> Ad(k) −(d(k) )> r(k) ,
dα
dα 2
(1.5)
where in the last expression we used the residue r(k) for the k-th iteration defined for a general vector x as r=b−Ax. The solution α(k) of this equation
reads
(d(k) )> r(k)
(k)
,
pričom
r(k) = b − Ax(k)
(1.6)
α = (k)
(k)
>
(d ) Ad
Example 1.11 Let us find a relation of the residue to the functional f.
Solution. Based on the Example 1.9, we know that for a symmetric matrix A holds
∂f(x(k) )
(k)
= Ax(k) − b i = −ri ,
∂xi
which written in a matrix form renders grad f(x(k) )= − r(k) . As long as the gradient determines the direction of the steepest ascent of the function, the
residue defines a direction of the steepest descent of the function.
Still, there is a question how to choose the directions d(k) in the iterative process. The direction should correspond to a direction in which the values of
the functional f decrease as much as possible. A natural choice for such a direction is the direction of the residue at the pertinent point as it defines the
steepest descent direction. It leads to a simple iterative method which is called the steepest descent method, whose algorithm can be written as follows,
using the formulae derived in the equations (1.5) and (1.6) and considering the option d(k) =r(k) :
A4. Choose initial iteration x(0) and accuracy ε; calculate the residue r(0) = b − Ax(0) ; put k=0.
B4. Calculate the coefficient α(k) by the relation (see (1.6))
α(k) =
(r(k) )> r(k)
>
(r(k) ) Ar(k)
.
C4. Find the next iteration and its residue
x(k+1) = x(k) + α(k) r(k) ,
r(k+1) = r(k) − α(k) Ar(k) .
D4. If kx(k+1) − x(k) k < ε (or kr(k+1) k < ε), terminate the calculation and put x̃ ≈ x(k+1) . Otherwise, increase k by one and continue by the item B4.
Let us note that the second formula in C4 was obtained from the first one using the second relation of (1.6). The described algorithm is not used
frequently in calculations due to its low convergence rate and the fact that there are faster algorithms.
The conjugate gradient method belongs to such improvements. It searches the next iteration so that all residues are mutually orthogonal, i.e. they satisfy
the relation (r(k) )> r(i) =0 for all i6=k. The orthogonality makes r(n) to vanish as there are only n non-zero orthogonal n-tuples. The zero residue means
that x(n) is an exact solution of the system Ax=b. It should be noted, however, that the zero residue is practically never obtained, if rounding errors
are considered. This is not a problem because our aim is to find the solution with a given accuracy ε. The sequence of residues can be obtained if the
minimizing conjugate directions d(k) are mutually A-orthogonal which means that (d(k) )> Ad(i) =0 for all i6=k. An effective algorithm for the conjugate
gradient method follows:
A5. Choose an initial iteration x(0) and accuracy ε; calculate the residue r(0) = b − Ax(0) ; put d(0) = r(0) ; k=0.
B5. Calculate the minimizing coefficient α(k) (see (1.6)), next iteration and its residue using the formulae
α(k) =
(r(k) )> r(k)
,
(d(k) )> Ad(k)
x(k+1) = x(k) + α(k) d(k) ,
r(k+1) = r(k) − α(k) Ad(k) .
C5. Find subsequent coefficient for the next conjugate direction:
β (k) =
(r(k+1) )> r(k+1)
,
(r(k) )> r(k)
d(k+1) = r(k+1) + β (k) d(k)
D5. If kx(k+1) − x(k) k < ε (or kr(k+1) k < ε), terminate calculation and put x̃ ≈ x(k+1) . Otherwise, increase k by one and continue by the item B5.
In both algorithms, the sopping criterion based on the residue is usually used as the residues are directly calculated in both algorithms, see formulae C4
and B5. Let us remark that if the direction d(k) is used to find x(k+1) , using the minimization determined in the relation (1.5), and the residue r(k+1)
has the same direction as gradient of f (1.2) at the point x(k+1) , the vectors d(k) and r(k+1) have to be orthogonal, i.e. (d(k) )> r(k+1) =0 for any k.
Example 1.12 Deduce the formulae for calculating α(k) and β (k) in the conjugate gradient method.
Solution. The formula (1.6) defines α(k) . To identify it with the relation in B5, the numerator has to be modified. Making the inner product of the
second formula in C5 with r(k+1) renders
(r(k+1) )> d(k+1) = (r(k+1) )> r(k+1) + β (k) (r(k+1) )> d(k) .
The last term vanishes due to orthogonality property mentioned in the paragraph above this example. The rest of the relation changes the formula (1.6)
to the first of B5.
To find the coefficient β (k) , the second formula in C5 is multiplied by the vector Ad(k) and realizing the A-orthogonality of the conjugate directions d(k)
renders
0 = (d(k+1) )> Ad(k) = (r(k+1) )> Ad(k) + β (k) (d(k) )> Ad(k) ,
(k+1) >
(k)
Ad
which leads to β (k) = − (r(d(k) )>) Ad
(k) .
The formula in C5 is a bit different. Nevertheless, the term in the numerator Ad(k) can be substituted and subsequently α(k) can be substituted, too,
both by the formulae from B5. Also, orthogonality of the residues: (r(k+1) )> r(k) =0 can be realized. The substitutions provide the relations
(r(k+1) )>
β
(k)
1
(r(k+1) )> α(k)
r(k+1) − r
(r(k+1) )> Ad(k)
= − (k)
=
(d )> Ad(k)
(d(k) )> Ad(k)
(k)
1
(r(k) )> r(k)
> (k)
d(k)
Ad
(
)
(k) >
(d ) Ad(k)
=
r(k+1)
=
(r(k+1) )> r(k+1)
,
(r(k) )> r(k)
where the final fraction contains the required form of β (k) .
Example 1.13 Find two iterations in the solution of the system
8x1 − x2 = 17,
−x1 +8x2 =−10,
by the conjugate gradient method and calculate norms of their residues.
Solution. Let us denote the matrices of the system
A=
8 −1
,
−1 8
b=
17
.
−10
First, we verify that the matrix A is symmetric positively definite. Symmetry is obvious and the principal subdeterminants are: det A>0 and A11 >0,
hence also positivity holds. We choose zero initial iteration and we calculate the two required iterations
0
146
(0)
(0)
(0)
(0)
(0)
x =
, r = d = b , Ad =
,
0
−97
> 17
17
·
−10
−10
389 .
0
17
1.921
17
146
0.502
(0)
(1)
(1)
α =
= 0.113, x =
+ 0.113
=
, r =
− 0.113
=
,
> =
0.961
0
−10
−1.130
−10
−97
3452
17
146
·
−10
−97
> 0.502
0.502
·
0.961
0.961
1.176 .
0.502
17
0.553
3.493
(1)
(1)
(1)
β = = 0.003, d =
+ 0.003
=
, Ad =
,
> =
0.961
−10
0.931
6.895
389
17
17
·
−10
−10
α
(1)
> 0.502
0.502
·
0.961
0.961
1.176 .
= 0.141,
=
> =
8.351
0.553
3.493
·
0.931
6.895
x
(2)
=
1.921
0.553
1.999
+0.141
=
,
−1.130
0.931
−0.999
r
(2)
0.502
3.493
0.009
=
−0.141
=
.
0.961
6.895
−0.011
Notice decreasing norms of the residues: kr(0) k = 17, kr(1) k = 0.961, kr(2) k = 0.011. The norm of the residues decreased three orders. Of course, if
we calculated exactly (no rounding errors), r(2) would vanish, because the system has only two unknowns. Compare:
0
146
(0)
(0)
(0)
(0)
(0)
x =
, r = d = b , Ad =
,
0
−97
> 17
17
6613 945 ·
−10
−10
389
389
389
0
17
17
146
(0)
(1)
(1)
3452
α =
, x =
+
, r =
,
= −1945
−
= 1726
=
> 3213
0
−10
3452
3452 −10
3452 −97
146
17
1726
3452
·
−97
−10
β (1)
945 > 945 1726
945 7131537 23159115 · 1726
3213
3213
35721
35721
17
(1)
(1)
3452
3452
5958152
=
, d = 1726
+
= 11916304
, Ad = 78740991
,
=
> 3213
5367033
11916304
11916304 −10
17
17
3452
5958152
11916304
·
−10
−10
945 > 945 1726
6613 7131537 945 · 1726
3213
3213
3452 11916304
3452 23159115
3452
2
0
(2)
(2)
(1)
3452
3452
3452
1726
5958152
, x = −1945 +
=
, r = 3213 −
=
.
α =
> 23159115 =
5367033
78740991
7131537
−1
0
24507
24507 5958152
24507 11916304
1726
3452
11916304
5958152
· 78740991
5367033
5958152
11916304
Nevertheless, nobody would calculate with such horrible fractions.
Finally, we stress that the matrix of the system A has to be symmetric positively definite. If this is not the case of a system Ax=b but the system still
has a unique solution, it can be easily modified to convert to a system with a symmetric positively definite matrix. It is sufficient to multiply the original
system by the matrix A> from the left to obtain
A> Ax=A> b.
(1.7)
The matrix of this system A> A is always symmetric and it is also positively definite provided that A is non-singular. However, such modifications are
not solved frequently, because the numerical properties of the matrix A> A cause the described algorithm to converge slowly. Nevertheless, there exist
gradient methods also for general systems of linear equations, but they are out of the scope of this course.
Linear and nonlinear equations – Approximate solution of systems of nonlinear equations
Nonlinear physical laws render nonlinear equations or systems in their solving.
The problem will be described in general, though in practical calculations we will focus only to solve two equations with two variables. Following the way
of numerical solution of one equation with one variable (that of the type f (x) = 0 solved in the first part of this chapter), something in the description
of the present approach may appear quite similar, but the expansion to the nonlinear systems also brings more complexity, e.g. in determination of an
initial approximation, in the proof of convergence of the iterative process, and so on.
We will present only one method which is widely used – the Newton method. Let’s consider a system of nonlinear equations in the form
fi (x1 , x2 , ..., xn ) = 0,
i = 1, 2, ..., n;
where x = (x1 , x2 , ..., xn ) and F = (f1 , f2 , ..., fn )T . In the vector form, the system can be simply written as F (x) = 0.
If the functions fi , i = 1, 2, ..., n are continuously differentiable in some domain around the point x, then the Taylor theorem renders
fi (x1 + τ1 , x2 + τ2 , ..., xn + τn ) = fi (x1 , x2 , ..., xn ) =
n
X
∂fi (x1 , x2 , ..., xn )
j=1
∂xj
τj + o(kτ k)
where o(kτ k) is a small function converging to zero faster than kτ k or in vector form:
kF (x + τ ) − F (x) − F 0 (x).ττ k = o(kττ k)
with τ = (τ1 , τ2 , ..., τn )T , kττ k = max[|τ1 |, |τ2 |, ..., |τn |], F 0 (x) – the matrix of partial derivatives ( ∂f∂xi (x)
)ni,j=1 .
j
A successful application of the Newton method, i.e. guaranteed convergence, requires satisfaction of some conditions. Let in some domain Ωa = {x :
kx − x∗ k < a} hold the following conditions for the exact solution x∗ of the equation F (x) = 0:
A6. k(F 0 (x))−1 k ≤ a1 , ∀x ∈ Ωa ,
B6. kF (x1 ) − F (x2 ) − F 0 (x2 )(x1 − x2 )k ≤ a2 kx2 − x1 k2 , ∀x1 , x2 ∈ Ωa ,
C6. x(0) ∈ Ωb , where b = min{a, (a1 .a2 )−1 },
where (F 0 (x))−1 is the inverse matrix to the matrix of partial derivatives F 0 (x). Then the iterative process given by the formula
x(k+1) = x(k) − (F 0 (x(k) ))−1 .F (x(k) )
converges to the exact solution x(∗) . The term −(F 0 (x(k) ))−1 .F (x(k) ) expresses the increment R(x(k) ) which satisfies the equation −(F 0 (x(k) ))−1 .R(x(k) ) =
F (x(k) ). The calculation according to the above iteration relation will be stopped, if |R(x(k) )| < ε, where ε is prescribed accuracy of the calculation.
A choice of the initial iteration x(0) can be obtained either by a graphical scheme or by a calculation so that x(0) were sufficiently close to searched
solution x∗ , e.g. in a surrounding of x(0) where the conditions of the convergence are fulfilled as stated above.
Example 1.14 Determine approximately the solution couple (x1 , x2 ) using the Newton method with the precision ε = 0.00003, which satisfies the
following system of linear equations
x1 + 0.9x2 − 1 = 0,
x21 − x2 = 0.
Solution. If all of the first partial derivatives of the given functions do not change very fast, then the initial approximation x(0) can be determined by
both ways, graphically or numerically using a similar system of equations
x1 + x2 = 1,
x21 − x2 = 0.
In our case, the situation is conveniently described in Fig. 1.5). From Fig. 1.5), we can see that the system has two different couples of solutions.
Choose, for example x(0) = [−1, 5; 2], and in the next process we will approximate the solution in the surrounding of this chosen point. In this case both,
the vector function F (x) and the matrix F 0 (x) read
x1 + 0.9x2 − 1
1
0.9
F (x) =
, F 0 (x) =
.
x21 − x2
2x1 −1
The inverse matrix (F 0 (x))−1 is possible to determine using well-known procedures from the linear algebra. Thus we have
−1
−1 −0.9
0
−1
(F (x)) =
.
1
1 + 1.8x1 −2x1
After substituting the last calculations into the iterative relation and after performing all the required arithmetic operations, we have
x(1) = [−1.779411;
3.088236],
Figure 1.5: Graphs of functions for determining the initial approximation.
x(2) = [−1.747517;
3.052797],
x(3) = [−1.747089;
3.052323],
x(4) = [−1.747089;
3.052322].
Due to the required precision, the last iteration x(4) can be seen as the approximate solution of the origin problem.
Similarly, choosing another initial iteration x(0) , it is possible to find the second solution, too. In this case, we rather determine it by a calculation. The
system
x1 + x2 = 1,
x21 − x2 = 0.
is possible to solve directly, meant analytically. The solution is defined by two couples:
√
√
−1 + 5 3 − 5
[
;
],
2
2
√
√
−1 − 5 3 + 5
[
;
].
2
2
√
If we choose x(0) = [ −1+2
√
5 3− 5
; 2 ],
then after repeating the above procedure we obtain the following sequence of iterations:
x(1) = [0.636116;
0.404316],
x(2) = [0.635978;
0.404468],
x(3) = [0.635978;
0.404468].
Due to the required precision, the last iteration x(3) is the approximate solution of the origin problem. The other couple considered as x(0) , provides the
solution which we already have known.
Example 1.15 Determine approximately a solution couple (x1 , x2 ) using the Newton method with precision ε = 0.00003, which satisfies the following
system of nonlinear equations
x21 + x22 = 1.2
x31 = 0.8
Solution. The situation is described by Fig.1.6). From Fig. 1.6), we can see again that the system will have two different couples of solutions. Let’s
Figure 1.6: Graphs of functions for determining the initial approximation.
choose for example x(0) = [1; 0, 5] and we will now approximate the solution lying in a vicinity of this point. In this case the vector function F (x) and
its derivative F 0 (x) read
2
x1 + x22 − 1.2
2x1 2x2
0
F (x) =
, F (x) =
.
x31 − 0.8
3x2 0
The inverse matrix (F 0 (x))−1 can be determined analogically to that in the previous example, using methods of linear algebra. We obtain
−1
0
−2x2
0
−1
.
(F (x)) = 2
6x2 −3x2 2x1
Substituting into the iterative relation and performing all required arithmetic operations renders a sequence of iterations as follows:
x(1) = [0.933333;
0.583333],
x(2) = [0.928345;
0.581553],
x(3) = [0.928318;
0.581572].
Due to the required precision, the last iteration x(3) can be seen as the approximate solutions of the original problem. It should be noted that the given
problem has one another solution, the reader should be able to determine it on the basis of the procedures described above.
Self-assessment questions and problems for solving
1. Describe the basic difference (input conditions vs. the speed of convergence) between the Newton method and bisection method!
2. Is the Gauss-Seidel iterative method more effective as Jacobi in any case? Try to find a contra-example!
3. Why is it important for the matrix A to be symmetric positively definite for the conjugate gradient method?
4. How do yo explain that the conjugate gradient method provides the exact solution after a finite number of iterations when exact arithmetics is used?
Can you prove all the formula for this method?
5. Show that the matrix A> A in the relation (1.7) is symmetric positively definite!
6. Try to find the the main cause of greater difficulty for the finding of solutions of the nonlinear system of equations opposite to a linear ones!
In the problems 1. to 10., solve the nonlinear equations with the accuracy ε = 0.001.
x
3. cos 8 − 5x = 0 .
−x
1. x − 2e − 5.5 = 0 .
x
2. ln 2 −
x
3
4. sin 5 − 2 x + 4 = 0 .
1 2
5. tan x − 5 x − 1 = 0 (the smallest positive root) .
1
8x
= 0.
6. arctan x − 2x + 4 = 0 .
9.
x
1+x
− x2 + 4x + 1 = 0 (the positive root) .
4
3
7. x − 2x − x − 1 = 0 (the positive root) .
√
1 2
2
8. x + 1 − 5 x − 4x + 1 = 0 .
3
2
10. 4x − 25x − 100x + 625 = 0 (the largest positive root) .
2x
2
11. e + x − 4 = 0 .
2
14. ln (x + 3) − x = 0 .
−x
2
12. e + x − 2 = 0 .
2
15. x (2 + ln x) − 1 = 0 .
−2x
− ln x = 0 .
13. e
2
16. x sin 2x − 1 = 0 (the smallest positive root) .
x
19. e ln x − 1 = 0 .
17. x sin 2x − 1 = 0 (the negative root) .
2
18. x ln x − 1 = 0 .
2
−x
20. x − 2x − 4 + e = 0 .
x
21. ln x − e + 9 = 0 .
In the problems 22. to 25., solve the linear equation systems by the Jacobi method with the accuracy ε = 0.001.
10x1 + x2 + x3 = 12
22. 2x1 + 10x2 + x3 = 13 .
2x1 + 2x2 + 10x3 = 14
10.9x1 + 2.1x2 + 0.9x3 = −7
23. 2.1x1 + 9.8x2 + 1.3x3 = 10.3 .
0.9x1 + 1.3x2 + 12.1x3 = 24.6
−x1 + −x2 + 6x3 = 42
24. 6x1 − 4x2 − x3 = 11.33 .
−x1 + 6x2 − x3 = 32
x1 + 3x2 − x3
= 1
25. x1 − 0.3x2 + 2.4x3 = 3 .
2x1 + 0.5x2 + x3
= 3
In the problems 26. to 29., solve the linear equation systems by the Gauss-Seidel method with the accuracy ε = 0.001.
x1 + x2 − 5x3 = −8
26. x1 − 10x2 + x3 = −12 .
5x1 − x2 − x3 = −4
0.95x1 − 12x2 + 1.2x3 = 2.2
27. 3.1x1 + 0.2x2 − 0.64x3 = −1 .
x1
− 1.9x2 + 10x3 = 12
x1 + 3x2 − x3
= 1
= 3 .
28. 2x1 + 0.5x2 + x3
x1 − 0.3x2 + 2.4x3 = 3
9x1 + x2 − x3 = −10
29. x1 + x2 + 9x3 = 28 .
2x1 + 9x2 + 2x3 = 22
In the problems 30. to 35., find an approximate solution of the linear equations system by the conjugate gradient method. The accuracy is given as ε.
Choose the initial iteration arbitrarily.
30. ε = 0.5,
32. ε = 0.001,
34. ε = 0.1,
6x1 − 3x2
−3x1 + 2x2
= 5
.
= 2
31. ε = 0.05,
10x1 +
x2 +
x3
2x1 + 10x2 +
x3
2x1 + 2x2 − 10x3
4x1 − x2 − x3
−x1 + 4x2 − x3
−x1 − x2 + 4x3
= 12
= 13 .
= 14
=
3
= −2 .
=
3
7x1 − 2x2
x1 − 6x2
=
5
.
= −5
33. ε = 0.001,
9x1 + x2 − x3
x1 + 9x2 + 2x3
−x1 + 2x2 + 6x3
= −10
=
22 .
=
28
35. ε = 0.01,
5x1 − 2x2 − x3
−2x1 + 5x2
−x1
+ 3x3
=
6
= −7 .
=
2
36. Solve the previous problems by the steepest descent method. Find the third iteration of the solution and calculate the norm of the residue of the
third iteration. Choose the initial iteration arbitrarily .
In the problems 37. to 41., solve the nonlinear equation system with prescribed accuracy and given type of the initial iteration choice.
37. x21 − x2 − 0.2 = 0;
x21 − x2 − 0.2 = 0; ε = 0.002;
graphically .
38. 12.01 − 3x1 − 2x2 = 0;
39. x2 −
50
x21
− 0.01 = 0;
18 − 3.01x1 − 4x2 = 0; ε = 0.000001;
1.05x1 −
π
= 0;
40. x2 − cos x1 − sin 1000
41. 1 −
x21
16
−
x22
2
= 0;
1−
x21
8
20
x22
= 0; ε = 0.00001;
calculation .
x2 − x41 − 0.5 = 0; ε = 0.00001;
−
x22
4
= 0; ε = 0.00001;
calculation .
graphically .
graphically .
Conclusion
In this chapter, we looked at some basic methods used to solve both, the nonlinear equation f (x) = 0 and the system of linear and nonlinear equations.
We did not discuss all methods, but we showed, the most simple and principal methods, which are efficient in applications so that they can be used to
solve various problems in applied mathematics and in technical practice as well.
References
[1] Budinský, B., Charvát, J.: Matematika I, Praha 1987.
[2] Charvát J., Hála M., Šibrava Z.: Příklady k Matematice I, ČVUT Praha, 2002.
[3] Eliáš J., Horváth J., Kajan J.: Zbierka úloh z vyššej matematiky, časť I. (3.vyd.1980), Alfa, Bratislava.
[4] Ivan, J.: Matematika I, Bratislava 1983.
[5] Kreyszig, E.: Advanced Engineering Mathematics, New York 1993.
[6] Šoltés V., Juhásová Z.: Zbierka úloh z vyššej matematiky I, Košice 1995.
[7] G. I. Marčuk Metody numerické matematiky. Academia, Praha, 1987.
[8] Z. Dostál Optimal Quadratic Programming Algorithms. Springer, Berlin, 2009.
Problem solutions
The answers to the self-assessment questions, if you are unable to formulate, and also a lot of other answers can be found in provided references.
separation :< 5; 6 >
separation : < 2; 3 >
1. Newton : x0 = 5, x3 = 5.50811,
2. Newton : x0 = 2, x3 = 2.12139,
3. Newton : x0 = 0, x2 = 0.2,
bisection :< 5.508; 5.509 >, 10 sections
bisection : < 2.121; 2.122 >, 10 sections
4. Newton : x0 = 4, x3 = 3.0484,
5. Newton : x0 = 1, x4 = 0.85314,
6. Newton : x0 = 3, x3 = 2.60194,
7. Newton : x0 = 3, x5 = 2.27747,
8. Newton : x0 = 1, x3 = 0.51813,
9. Newton : x0 = 5, x3 = 4.41148,
separation : < −1; 0 >
k=7
Newton : x0 = −1, x4 = −0.71766;
x1 ≈ 1.0001
10.
18.
20.
22.
Newton : x0 = 7, x4 = 6.25
Newton : x0 = 2, x5 = 1.53158
x2 ≈ 1.0002
Newton : x0 = 4, x4 = 3.22718;
x3 ≈ 1.0002
k=7
k=9
k=9
k=6
k=4
k=9
k=5
x1 ≈ −0.9999
x1 ≈ 4.6657
x1 ≈ 0.9907
x1 ≈ −0.1579
x1 ≈ −0.0716
x1 ≈ 0.9894
x1 ≈ −0.9999
23.
24.
25.
26.
27.
28.
29.
x2 ≈ 1.0001
x2 ≈ 7.6185
x2 ≈ 0.2945
x2 ≈ 1.3684
x2 ≈ −0.0696
x2 ≈ 0.2949
x2 ≈ 1.9999
x3 ≈ 2.0001
x3 ≈ 9.0471
x3 ≈ 0.8749
x3 ≈ 1.8421
x3 ≈ 1.1939
x3≈ 2.9999

 x3 ≈ 0.8746


−0.8538
−1.00
1.000
5.33
30.
31. unsuitable unless modified by (1.7) 32. unsuitable unless modified by (1.7) 33.  1.6566  34.  0.00  35. −1.000
9.00
3.9722
1.00
1.000
37. a)x(0) = (0; 0), x(3) ≈ (0.033; −0.1989); b)x(0) = (1; 1), x(3) ≈ (1.1787; 1.1893)
38. x(0) = (2; 3), x(2) ≈ (2.013376; 2.984934)
39. x(0) = (5; 2), x(4) ≈ (5.099588; 1.932649)
40. a)x(0) = ( π4 ; 0.8), x(4) ≈ (0.71352; 0.759201); b)x(0) = (− π4 ; 0.8), x(4) ≈ (−0.71352; 0.759201)
(0)
(4)
41. a)x = (2; 1), x ≈ (2.309401; 1.1547); b)x(0) = (2; −1), x(4) ≈ (2.309401; −1.1547); c)x(0) = (−2; 1), x(4) ≈ (−2.309401; 1.1547); d)x(0) =
(−2; −1), x(4) ≈ (−2.309401; −1.1547)
Chapter 2. (Interpolation and approximation)
Aim
Estimate values of an unknown function, estimate values of its derivative and also an integral
Objectives
1. Identify the aims of interpolation.
2. Find the equation of a polynomial function passing through given points.
3. Find the value of a polynomial given by several points.
4. Find the most appropriate polynomial of a low degree which approximates many experimental data.
5. Estimate value of the derivative of a function determined by a table of values.
6. Approximate the value of an integral of a function by approximate formulae.
Prerequisites
polynomials; functions; derivative; minimum of a function; Taylor’s theorem; integral
Chapter 2. (Interpolation and approximation) – 1
Introduction
Basic methods of function approximations are discussed in the chapter. The task is to determine the function or at least its approximate values at several
points, knowing only several other values. Practically, such estimations are required if we have several values of measured dependency between two
physical quantities and we need other value of the function for which the measured datum is not available. Another example includes a case of computer
modelling of a complicated curve which requires a simple algorithm for implementation. Finally, an amount of experimental data with measurement
errors can be available. In such a case, it is not suitable to find a function passing through all given data points, we rather find a simple function and
the data lie on its graph only approximately.
In any case, the function is required to be simple. What is a simple function, is questionable, nevertheless, usually such functions are supposed to be
polynomial with as small degree as possible. Thus, approximation of a function by a polynomial will be discussed in what follows.
We also show how to approximate values of derivative of a function. If a measurements provides us with values of some physical quantity, but we need
to know the other which is a derivative of the measured one. Indeed, many physical relations contain derivatives. Additionally, approximation of the
derivative is useful in other numerical algorithms, e.g. for solving differential equations which will be discussed in the next section.
Finally, the approximate formulae for calculating integrals will be presented. There are several reasons for doing it. First, a lot of common integrals
cannot be evaluated analytically, second, the numerical formula are extensively used in engineering softwares, and, third, many physical relations contain
integrals.
In the first section, we find a polynomial determined by given points. We will search either the equation of the polynomial or only its value. The given
data contain several values of the polynomial. This type of polynomial finding is called Lagrange interpolation. If only a value of the polynomial is
required, an algorithm for the calculation will use finite differences.
In the second section, a low order polynomial approximating prescribed data will be searched for. It is natural to suppose that such a polynomial passes
relatively close to each of the given points so that the distances are the smallest possible. Thus, it is a minimization problem which we will solve by the
well known least square method. It is a very useful method used in numerical mathematics as well as a regression method in mathematical statistics or
a variational method for solving physical problems.
Finally, we show how a classical theorem of calculus – the Taylor theorem – helps us to approximate derivatives of a function. We again use finite
differences to write the approximation formulae in a compact and transparent form. The introduced interpolation polynomials can be used for defining
simple formulae for numerical calculation of integrals. The error estimates, as usually in numerical calculations, form an important part of the numerical
integration: it is useful to know to estimate the error for a calculation and also vice versa, to make a calculation with prescribed accuracy.
[x4 , y4 ]
[x3 , y3 ]
[x0 , y0 ]
[x1 , y0 ]
[x2 , y2 ]
(a)
(b)
Figure 2.1: Interpolation and approximation: (a) Given points, (b) Graphs of appropriate approximative polynomials.
Interpolation and approximation – Introduction to approximation and interpolation
What does it mean to approximate or interpolate a function?
Let there be given several points in a plane. The points have been obtained e.g.q from a measurement of a relation between two physical quantities. The
experiment, however, provides only several points of the dependency, mathematically, several points of the graph of an unknown function f. The aim is
to find the function y=f(x) (its graph, equation or several values), using given points with coordinates [xi , yi ]. The function should pass directly through
the given points or at least at their vicinity. The given points are usually ordered according to the coordinates x, i.e. for n+1 points with coordinates
[xi , yi ], i = 0, 1, . . . n holds x0 < x1 < · · · < xn . If we find f(x) for x∈hx0 ; xn i and require that the function f passes through given points, we speak
about the interpolation, if x is out of given range the correct term is the extrapolation. It should be noted that having the extrapolation any physical
sense, the number x cannot be ‘far’ from the interval hx0 ; xn i. In the case, where the function does not pass directly through given points we rather
speak about the approximation of a function.
The function f is chosen in a simple form, usually a polynomial, so we will utilize polynomial functions in the next sections of the chapter.
Example 2.1 Draw an approximation polynomial which is given by five points at Fig. 2.1(a).
Solution. The polynomial interpolation leads to a fourth degree polynomial, which is uniquely determined by five points. Fig. 2.1(b) shows a graph of
the polynomial in yellow. The fourth degree polynomial has five independent parameters that may be too much. If the degree of the polynomial is lower,
the number of parameters is lower, too. However, in such a case the polynomial cannot satisfy all given data exactly. The blue line in Fig. 2.1 shows a
quadratic function which can be used to approximate prescribed values of an unknown function.
The picture also documents why extrapolation can be considered only at vicinity of the end points. While both approximation curves are relatively close
to each other in the interval between x0 and x4 , out of this interval the functions diverge (e.g. the blue function increases and the yellow function
decreases to the right of the shown picture until infinity), as both functions are polynomial, hence, they cannot have another local extrema.
Interpolation and approximation – Lagrange interpolation
Find the polynomial passing through given points or find its unknown value.
Let there be n+1 ordered points with coordinates [xi , yi ], i = 0, 1, . . . n satisfying x0 < x1 < · · · < xn−1 < xn . These points uniquely define an n-degree
polynomial given by the relation
!
n
n
X
Y
(x − xj )
yi
(2.1)
Ln (x) =
(xi − xj )
i=0
j=0,j6=i
naturally including the properties Ln (xk ) = yk for all k = 0, 1, . . . n. The polynomial is called the Lagrange interpolation polynomial and it can also be
written as
n
X
(x − x0 ) · (x − x1 ) · · · (x − xi−1 ) · (x − xi+1 ) · · · (x − xn )
yi
Ln (x) =
(x
−
x
)
·
(x
−
x
)
·
·
·
(x
−
x
)
·
(x
−
x
)
·
·
·
(x
−
x
)
i
0
i
1
i
i−1
i
i+1
i
n
i=0
=
(x − x1 ) · · · (x − x2 ) · · · (x − xn )
(x − x0 ) · (x − x2 ) · · · (x − xn )
y0 +
y1 +
(x0 − x1 ) · (x0 − x2 ) · · · (x0 − xn )
(x1 − x0 ) · (x1 − x2 ) · · · (x1 − xn )
|
{z
} |
{z
}
i=0
i=1
(x − x0 ) · (x − x1 ) · · · (x − xk−1 ) · (x − xk+1 ) · · · (x − xn )
··· +
yk + · · ·
(xk − x0 ) · (xk − x1 ) · · · (xk − xi−1 ) · (xk − xk+1 ) · · · (xk − xn )
|
{z
}
i=k
(x − x0 ) · (x − x1 ) · · · (x − xn−2 ) · (x − xn )
(x − x0 ) · (x − x1 ) · · · (x − xn−2 ) · (x − xn−1 )
+
yn−1 +
yn . (2.2)
(xn−1 − x0 ) · (xn−1 − x1 ) · · · (xn−1 − xn−2 ) · (xn−1 − xn )
(xn − x0 ) · (xn − x1 ) · · · (xn − xn−2 ) · (xn − xn−1 )
|
{z
} |
{z
}
i=n−1
i=n
Example 2.2 Show that the polynomial Eq. (2.1) satisfies the relation Ln (xk ) = yk for all k = 0, 1, . . . n.
Solution. If we consider the polynomial in the form Eq. (2.2), and if we substitute x=xk , then in each terms up to i=k the numerator contains a factor
(xk −xk ) so that the numerator vanishes. The term for i=k reads
(xk − x0 ) · (xk − x1 ) · · · (xk − xk−1 ) · (xk − xk+1 ) · · · (xk − xn )
yk = yk .
(xk − x0 ) · (xk − x1 ) · · · (xk − xi−1 ) · (xk − xk+1 ) · · · (xk − xn )
As long as it is the only non-vanishing term in the right-hand side of Eq. (2.2), for any xk , we have Ln (xk ) = yk .
Example 2.3 Find the Lagrange interpolation polynomial for points given by the table.
x 1.1
1.2
1.3
y 1.336 1.510 1.698
1.4
1.904
Solution. The four points in the table determine the Lagrange interpolation polynomial of the third degree
L3 (x) =
(x − 1.1)·(x − 1.3)·(x − 1.4)
(x − 1.2)·(x − 1.3)·(x − 1.4)
·1.336 +
·1.510
(1.1 − 1.2)·(1.1 − 1.3)·(1.1 − 1.4)
(1.2 − 1.1)·(1.2 − 1.3)·(1.2 − 1.4)
(x − 1.1)·(x − 1.2)·(x − 1.4)
(x − 1.1)·(x − 1.2)·(x − 1.3)
2
17
901
399
+
·1.698 +
·1.904 = x3 − x2 +
x−
(1.3 − 1.1)·(1.3 − 1.2)·(1.3 − 1.4)
(1.4 − 1.1)·(1.4 − 1.2)·(1.4 − 1.3)
3
10
300
500
The most problems appears in collecting correct coefficients, especially with decimal digits.
If only a value of an unknown function is required, it is not necessary to find the expression for the polynomial. We will show a special case of calculating
the values of the Lagrange interpolation polynomial. Before doing that, we introduce finite differences.
We have n+1 points with coordinates [xi , yi ], i = 0, 1, . . . n which are equidistant with respect to x, i.e. xi −xi−1 =h, h>0 for all i = 1, . . . n. The first
order finite difference ∆yi is called the difference of y coordinates of the neighbour points. The k-th order finite difference ∆k yi is called the difference
of the neighbouring k−1-th order differences. As the differences can be indexed by two ways, we have to choose one of them. Our definition of the
finite differences is written in following expressions:
∆yi = yi+1 − yi ,
∆k yi = ∆k−1 yi+1 − ∆k−1 yi
(2.3)
The differences are usually written to a table of finite differences as it is written in Tab. 2.1 for n=3. The table documents that the maximal order of
the difference, related to the number of points, is n.
Example 2.4 Prepare the table of all differences for the values given by the table in Example 2.3.
Solution. First, realize that the differences according to the relations Eq. (2.3) can be calculated, because the points are x-equidistant x1 −x0 =x2 −x1 =x3 −x2 =0.1.
Thus the table of differences reads:
i
xi
yi
∆yi
∆2 yi
∆3 yi
Table 2.1: Table of finite differences of all possible orders.
0
1
2
x0
x1
x2
y0
y1
y2
∆y0 = y1 − y0
∆y1 = y2 − y1
∆y2 = y3 − y2
∆2 y0 = ∆y1 − ∆y0 ∆2 y1 = ∆y2 − ∆y1
∆3 y0 = ∆2 y1 − ∆2 y0
i
x
y
∆y
∆2 y
∆3 y
3
x3
y3
0
1
2
3
1.1
1.2
1.3
1.4
1.336 1.510 1.698 1.904
0.174 0.188 0.206
0.014 0.018
0.004
Now, return back to the Lagrange interpolation. Again, we have n+1 points with x-equidistant coordinates [xi , yi ], i = 0, 1, . . . n (xi −xi−1 =h, h>0 for
all i = 1, . . . n). A value of the Lagrange interpolation polynomial at the point x can be calculated by the Newton interpolation formula. Various forms
of the formula can be found in literature, we will use its following form:
!
n
i−1
X
Y
q(q − 1) 2
q−j
q(q − 1)(q − 2) 3
q(q − 1) · · · (q − n + 1) n
x − x0
Ln (x) =
∆i y0 = y0 + q ∆y0 +
∆ y0 +
∆ y0 + · · · +
∆ y0 , where q =
.
j
+
1
2!
3!
n!
h
i=0
j=0
(2.4)
Example 2.5 Prove the relation Eq. (2.4) for some small values of n.
Solution. Let us try n=1. Eq. (2.2) for x1 −x0 =h and with substituted q renders
L1 (x) =
x − x0
qh − h
qh
x − x1
y0 +
y1 =
y0 + y1 = y0 + q(y1 − y0 ) = y0 + q∆y0 .
x0 − x1
x1 − x0
−h
h
This is exactly Eq. (2.4) for n=1.
Similarly
(x − x1 )(x − x2 )
(x − x0 )(x − x2 )
(x − x0 )(x − x1 )
y0 +
y1 +
y2
(x0 − x1 )(x0 − x2 )
(x1 − x0 )(x1 − x2 )
(x2 − x0 )(x2 − x1 )
qh(qh − 2h)
qh(qh − h)
(q − 1)(q − 2)
q(q − 2)
q(q − 1)
(qh − h)(qh − 2h)
y0 +
(y0 +∆y0 )+
(y0 +∆y0 +∆y1 ) =
y0 +
(y0 +∆y0 )+
(y0 +2∆y0 +∆2 y0 )
=
(−h)(−2h)
h(−h)
2h·h
2
−1
2
(q − 1)(q − 2) q(q − 2) q(q − 1)
q(q − 1) 2
(q − 2) (q − 1)
q(q − 1) 2
=
+
+
+
∆ y0 = y0 + q∆y0 +
∆ y0 ,
y0 + q
∆y0 +
2
−1
2
−1
1
2
2
L2 (x) =
which is Eq. (2.4) for n=2, because the terms in brackets in the bottom line can be seen as values of the Lagrange interpolation polynomial given by
the points [0, 1], [1, 1], [2, 1] in the case of the first couple of brackets and [1, 1], [2, 1] in the case of the second one. The y coordinates are constant,
thus the Lagrange polynomials are also constant and their values are in both cases unit.
P
Using a similar way, the formula Eq. (2.4) can be proved generally, if an arbitrary yi is expressed by ∆k y0 . The pertinent formula is je yi = ik=0 ki ∆k y0
and it can be proved e.g. by mathematical induction.
Example 2.6 Interpolate the value of the polynomial f at the point x=1.22 from the table in Example 2.3.
Solution. We use the Newton interpolation formula. The table of finite differences has been found in the solution of Example 2.4. Now, calculate the
parameter q for given x: q= 1.22−1.1
=1.2. The values from the first column can be substituted to the interpolation formula together with the parameter
0.1
q to provide
1.2·0.2
1.2·0.2·(−0.8)
.
f(1.22) = 1.336 + 1.2·0.174 +
0.014 +
0.004 = 1.5463.
2
6
The value x can also be substituted to the resulting polynomial in Example 2.3, the result should be the same. For calculating the special values, the New.
ton formula is preferred as it seem to be easier to apply. For a comparison, calculate the value by the substitution. We obtain f(1.22)=L3 (1.22)=1.5463.
Let us realize the accuracy of the interpolation. The y values in the table were obtained as the values of the function y= sinh x for given x, rounded to
.
three decimal digits. The value of the function is sinh 1.22=1.5460 which is pretty close to the calculated approximation. Of course, we do not know
the function in a general case.
Interpolation and approximation – The least square method
Find an approximation polynomial of a low degree for large amount of data.
The values obtained, e.g. from measurements of physical quantities, contain usually some inaccuracies. Thus, such data make us to take the inaccuracy
into account also when searching for an approximative relation between some quantities. The requirements of exact passing of the approximative
polynomial through the prescribed points is then unrealistic. Under such conditions, the number of experimental data points can by even much more
greater than the number of unknown parameters of the approximative function. In our case of the polynomial approximation, the number of parameters
is determined by the degree of the polynomial. The question is how to find such a polynomial. This type of problems is usually solved by the least square
method.
The method is explained with an assumption of polynomial character of unknown function, but it can be used to more general type of functions. However,
having other type of dependency, the resulting equation or a system of equations can be nonlinear.
Let there be N various points with coordinates [xi , yi ], i=1, 2, . . . N and the degree n of the required polynomial. Let us notice that the x coordinates
of the points do not have to be necessarily different, the only requirement is that not all xi are equal to the same value. The degree of the polynomial is
chosen according to the solved problem, but we try to keep it as low as possible so that the number of unknown parameters is also low. Hence, N >n
(usually N n). In the case of N =n+1, the resulting function is the Lagrange interpolation polynomial as it will be clear from the derivation below.
In general, the least square method searches for a function f(x ; a) with the unknown x which minimizes a nonnegative function S
S(a) =
N
X
(yi − f(xi ; a))2
(2.5)
i=1
in the set of all admissible vector parameters a. This also explains the name of the method. The method minimizes sum of squares of distances between
the given points and the points with the same x coordinates lying on the all possible graphs of the functions f(x ; a) with the unknown x and parameters
a. If the parameters â are found so that for all given points [xi , yi ] holds: yi =f(xi ; â), the values of the function S(â) is zero and it must be the
minimum. This is the case of the Lagrange interpolation polynomial for N =n+1 mentioned above.
Let us search for the function f which is defined as a polynomial of the degree n, the unknown is x:
f(x; a) = a0 + a1 x + a2 x2 + · · · + an xn ,
a = (a0 , a1 , . . . , an ).
(2.6)
The function can be substituted into S in Eq. (2.5), leading to
S(a0 , a1 , . . . , an ) =
N
X
yi − a0 − a1 xi − a2 x2i − · · · − an xni
2
.
(2.7)
i=1
f (x) = a0 + a1 x + a2 x2
f (x2 )
y2
x2
Figure 2.2: The least square method.
Minimization of the function S leads to the solution of the following linear equation system, called the normal system
!
!
!
N
N
N
N
X
X
X
X
2
n
N a0 +
x i a1 +
xi a2 + · · ·+
x i an =
yi ,
N
X
!
xi
a0 +
i=1
i=1
N
X
!
x2i a1 +
i=1
i=1
N
X
!
x3i
a2 + · · ·+
i=1
i=1
!
N
X
xn+1
an
i
i=1
=
i=1
N
X
yi xi ,
i=1
(2.8)
..
.
N
X
!
xni
a0 +
i=1
N
X
!
xn+1
i
i=1
a1 +
N
X
i=1
!
xn+2
i
a2 + · · ·+
N
X
!
x2n
i
an =
N
X
i=1
yi xni .
i=1
The chosen assumptions cause that the solution of the system is unique and it determines the coefficients of the approximative polynomial in the least
square sense.
Example 2.7 Derive the system Eq. (2.8).
Solution. The solution of the system Eq. (2.8) minimizes the function Eq. (2.7) for given N points [xi , yi ]. The point of minimum is the stationary point
∂S
=0 for all k=0, 1, . . . , n. It provides
of the first derivative of the function S, it means that ∂a
k
N
0=
X
∂S
=2
yi − a0 − a1 xi − a2 x2i − · · · − an xni (−xki ),
∂ak
i=1
which renders
N
X
!
xki
a0 +
i=1
N
X
!
xk+1
i
a1 +
i=1
N
X
!
xk+2
i
a2 + · · · +
i=1
N
X
!
xk+n
i
an =
i=1
N
X
yi xki ,
i=1
and this is the k-th equation of the system Eq. (2.8).
The stationary point is the unique minimizer, if the second derivative of the function S is a positively definite matrix. This property can be verified, but
we omit this calculation.
Example 2.8 Find a cubic polynomial which approximates values of the function, see the table, in the least square sense. Compare the results with the
linear approximation.
x 0 1
y 1 4
2 3 4 5
5 3 4 7
Solution. The coefficients of the polynomial f3 (x) = a + b x + c x2 + d x3 can be found by solving the normal system. To find the matrix of the system,
several sums have to be calculated from the given N =6 points, namely:
N
X
i=1
xi = 15,
N
X
x2i
= 55,
i=1
N
X
i=1
N
X
x3i
= 225,
i=1
yi = 24,
N
X
i=1
xi yi = 74,
N
X
x4i
= 979,
i=1
N
X
i=1
N
X
x5i
= 4425,
i=1
x2i
yi = 290,
N
X
N
X
x6i = 20515,
i=1
x3i yi = 1256.
i=1
The system of linear equations for the unknown parameters a, b, c, d reads
6a+ 15b+ 55c+ 225d =24,
15a+ 55b+ 225c+ 979d =74,
55a+225b+ 979c+ 4425d =290,
225a+979b+4425c+20515d =1256.
607
95
, b = 108
, c = − 36
,d=
Its solution can be found by any appropriate method. The result is a = 17
18
function f3 , we obtain
17 607
95
19
f3 (x) =
+
x − x2 + x3 .
18 108
36
54
19
.
54
Substituting the values to the approximation
Let us note that the created normal system is usually annoying in general numerical algorithms, because there is a several order difference in the diagonal
terms (the matrix is called ill-conditioned).
The values of the function f3 for given x naturally vary from prescribed y, nevertheless, the differences are small as can also be seen in Fig. 2.3.
8
f5 (x)
0
1
1
1− 18
1+1
1
4
5
4+ 18
4− 65
2
5
5− 59
5− 75
3
3
3+ 95
3+ 57
4
4
5
4− 18
4+ 65
5
7
1
7+ 18
7−1
y
x
y
f3 (x)
f1 (x)
f3 (x)
f1 (x)
6
4
2
0
0
1
2
3
4
5
x
Figure 2.3: The least square approximation by polynomials of various degree.
If the least square linear polynomial f1 were searched for, the system would be simpler, but the approximation of the function would be coarser. The
normal system for unknown parameters a, b in a linear function y=a+bx reads
6a+15b =24,
15a+55b =74,
and its solution is a=2, b= 45 . The table and graph in Fig. 2.2 document that the differences with respect to given points are greater. The accuracy can
q
also be estimated from the values of the function S from Eq. (2.7) at both minima, or from the average size of a square δ= NS , which provide
2 2 2 2 2 2
1
5
10
10
5
1
7
S3 = −
+
+ −
+
+ −
+
= ,
18
18
18
18
18
18
9
6
S1 = (1) + −
5
2
2
2 2 2
7
7
6
44
+ −
+
+
+ (−1)2 = ,
5
5
5
5
r
δ3 =
r
δ1 =
7
= 0.3600
54
44
= 1.2111
30
Finally, we stress that for N =6 the fifth degree approximation polynomial is the Lagrange interpolation polynomial and it passes through all given points
as it can be seen also in Fig. 2.3. There is no reason to search for a higher degree polynomial.
Interpolation and approximation – Approximation of the derivative
Estimate the value of derivative of a function given by several points.
A derivative of a function may define a physical quantity which is not available in measurements. Thus, it is also useful to estimate the derivative values
at least at some points of the function, e.g. at those where the functional values are known. The approximative calculation of the derivative is also a
useful tool for solving differential equations, especially of those which cannot be solved analytically.
The formulae for calculation of derivatives are based on the Taylor theorem, recall it. Let the function f be continuously differentiable n+1 times in the
vicinity of the point x0 hx0 − δ; x0 + δi and let x be an internal point of the interval. Then there exists a point ξ lying between x0 and x such that the
Taylor formula holds:
(n)
(n)
f(x) = Tf (x) + Rf (x) = f(x0 ) + f 0 (x0 ) (x − x0 ) +
1
1
1 00
f (x0 ) (x − x0 )2 + f 000 (x0 ) (x − x0 )3 + · · · + f (n) (x0 ) (x − x0 )n
2!
3!
n!
1
f (n+1) (ξ) (x − x0 )n+1 . (2.9)
+
(n + 1)!
(n)
Let us stress that ξ in the last term, denoted also Rf (x), is unknown. Thus, the term represents an error which is admitted if the value of the function
(n)
(n)
f(x) is replaced by the value of the Taylor polynomial Tf (x) at the point x. The term Rf (x), however, can usually be estimated and then it can
provide an idea of the accuracy of the approximation by the Taylor polynomial.
Example 2.9 Estimate the value sin 0.2 using some Taylor polynomial such that the error of the estimation is smaller than ε=0.001.
Solution. First, an appropriate x0 should be found such that (all) the derivatives of the function y= sin x are known and that x0 lies close to 0.2. The
function y= sin x can be differentiated so that
(sin x)(4k+1) = cos x,
(sin x)(4k+2) = − sin x,
(sin x)(4k+3) = − cos x,
(sin x)(4k) = sin x, for arbitrary natural number k.
Thus, we obtain at the point x0 =0
(sin x)(4k+1) |x=0 = 1,
(sin x)(4k+2) |x=0 = 0,
(sin x)(4k+3) |x=0 = −1,
(sin x)(4k) |x=0 = 0.
(n)
So far, we do not know which n to choose. Nevertheless, in the error term Rf (x), there appears f (n+1) (ξ), which can be easily estimated as |f (n+1) (ξ)|≤1.
Thus, we know that
1
0.2n+1
(n)
|Rf (0.2)| ≤
| (0.2 − 0)n+1 | =
.
(n + 1)!
(n + 1)!
(n)
We want the error to be less than 0.001 so we have to find n with |Rf (0.2)|<0.001. It can be easily verified that
sufficient to choose n=3 to render
0.23
>0.001
3!
and
0.24
<0.001.
4!
So, it is
1
1
1
.
sin 0 · 0.22 − cos 0 · 0.23 = 0.2 − 0.008 = 0.198667,
2
6
6
.
which is the value we are looking for. To compare, we use a calculator to obtain the value and to see the accuracy: sin 0.2 = 0.198669. If we additionally
try to put n=2
1
sin 0.2 ≈ T(2) (0.2) = sin 0 + cos 0 · 0.2 − sin 0 · 0.22 = 0.2,
2
we can see that the error is greater than 0.001.
sin 0.2 ≈ T(3) (0.2) = sin 0 + cos 0 · 0.2 −
The Tyalor polynomial can be used to find general formulae for the approximation of the derivative of a function at some of its points. The idea of the
approximation formulae can be seen in the following example for n=1 in Eq. (2.9). We substitute x for x0 and x+h for x using a suitably small number
h (usually hx). We obtain
00 f(x + h) − f(x)
f (ξ) f 00 (ξ) 2
0
0
h < C h,
h
and after a rearrangement
− f (x) = (2.10a)
f(x + h) = f(x) + f (x)h +
2
h
2 where the boundedness of the second derivative of the given function by an unknown constant C is used. Though, the constant is unknown, the formula
determines the approximation order, which is given by the power of h. In fact, the relation states that if h is halved, the error in the approximation is
also halved, because the approximation formula can be written as
f 0 (x) ≈
f(x + h) − f(x)
h
(2.10b)
and the term Ch provides the error of the approximation. Let us realize that if the error is generally expressed as Chn (the approximation order n), for
n>1, the formula is better: for small h (h<1) is hn <h. Of course, the constants C may be and do be different.
A similar way can be used to derive other formulae for the first order derivative approximation as follows:
f 0 (x) ≈
f(x + h) − f(x − h)
,
2h
f 0 (x) ≈
−f(x + 2h) + 4f (x + h) − 3f (x)
,
2h
f 0 (x) ≈
f(x − 2h) − 4f (x − h) + 3f (x)
.
2h
(2.11)
In these relations, the approximation order is 2 (the error is proportional to h2 ). Also the formulae for approximation of higher order derivatives can be
obtained from the Taylor theorem, e.g. for the second derivative, we can have a second order formula (h2 dependent error)
f 00 (x) ≈
f(x + h) − 2f (x) + f(x − h)
.
h2
(2.12)
Example 2.10 Derive the formula Eq. (2.12) and the last relation in Eq. (2.11).
Solution. First, we discuss the second derivative formula. We use Eq. (2.9) for n=3 which is written twice for x+h and for x−h
f 00 (x) 2 f 000 (x) 3 f IV (ξ+ ) 4
h +
h +
h,
f(x + h) = f(x) + f (x)h +
2
6
24
f 00 (x) 2 f 000 (x) 3 f IV (ξ− ) 4
f(x − h) = f(x) − f 0 (x)h +
h −
h +
h.
2
6
24
0
If both equations are added and rearranged, we come to
f IV (ξ+ ) 4 f IV (ξ− ) 4
h +
h
f(x + h) + f(x − h) = 2f (x) + f (x)h +
24
24
00
2
a teda
f(x + h) − 2f (x) + f(x − h)
00
< C h2
−
f
(x)
h2
which provides the approximation relation
f 00 (x) ≈
f(x + h) − 2f (x) + f(x − h)
h2
and also the order of the approximation 2 (the term Ch2 ).
Similarly, the other formula is obtained, if Eq. (2.9) for n=2 is written at x−2h and also at x−h
f 00 (x) 2
f 000 (ξ2 ) 3
h −8
h,
2
6
f 00 (x) 2 f 000 (ξ1 ) 3
f(x − h) = f(x) − f 0 (x)h +
h −
h.
2
6
f(x − 2h) = f(x) − 2f 0 (x)h + 4
The equations now have to by combined so that the terms containing the second derivative are eliminated. We see that it is sufficient to multiply the
second equation by (−4) and add it to the first one. A rearrangement of the terms renders
f(x − 2h) − 4f (x − h) + 3f (x)
f 000 (ξ2 ) 3
f 000 (ξ1 ) 3
0
0
f(x − 2h) − 4f (x − h) = −3f (x) + 2f (x)h − 8
h +4
h
a teda
− f (x) < C h2 .
6
6
2h
Again, we have an approximation formula with the order of the approximation 2. It can be written as follows:
f 0 (x) ≈
f(x − 2h) − 4f (x − h) + 3f (x)
.
2h
All mentioned formulae can be expressed in terms of the finite differences and the table of differences Tab. 2.1 provided that the values of the function
y = f(x) are given at several x-equidistant points, i.e. the coordinates [xi , yi ], i=0, 1, . . . n, with the property xi+1 −xi =h are known. The simplest case
is in the relation Eq. (2.10b), where x=xi , x+h=xi+1 and yi =f (xi )
f 0 (xi ) ≈
f(xi+1 ) − f(xi )
yi+1 − yi
∆yi
=
=
.
h
h
h
(2.13)
Similarly, other formulae in the relations Eqs. (2.11), (2.12) can be expressed by the finite differences. We obtain in the respective order
f 0 (xi ) ≈
∆yi + ∆yi−1
,
2h
f 0 (xi ) ≈
−∆yi+1 + 3∆yi
,
2h
f 0 (xi ) ≈
−∆yi−2 + 3∆yi−1
,
2h
f 00 (xi ) ≈
∆2 yi−1
.
h2
(2.14)
Example 2.11 Derive the last two relations in Eq. (2.14).
Solution. In the first case, the last relation in Eq. (2.11) is used. We substitute x=xi and obtain
f 0 (xi ) ≈
f(xi − 2h) − 4f(xi − h) + 3f(xi )
f(xi−2 ) − 4f(xi−1 ) + 3f(xi )
yi − 2 − 4yi−1 + 3yi
−yi−1 + yi − 2 + 3yi − 3yi−1
−∆yi−2 + 3∆yi−1
=
=
=
=
.
2h
2h
2h
2h
2h
The other formula is rendered, if the relation Eq. (2.12) is used with x=xi :
f 00 (xi ) ≈
yi+1 − 2yi + yi−1
∆yi − ∆yi−1
∆2 yi−1
f(xi + h) − 2f(xi ) + f(xi − h)
=
=
=
.
h2
h2
h2
h2
Having in mind these simple difference formulae, we can correctly guess a relation between the derivative and the difference: the k-th order finite
difference divided by hk approximates the value of the k-th order derivative at the pertinent point. Additionally, an appropriately weighted average of
the various differences of the same order may increase the approximation order of any formula as e.g. in (2.14)3 .
Example 2.12 Estimate the values of the first and the second derivatives of the function y= tanh x at the point x = 1.12 using appropriate difference
formulae.
Solution. We choose h=0.01 and use the following approximation relations for the derivatives:
f 0 (xi ) ≈
∆yi
(with an error ∼ h),
h
f 0 (xi ) ≈
−∆yi+1 + 3∆yi
(∼ h2 ),
2h
f 00 (xi ) ≈
∆2 yi−1
(∼ h2 ).
h2
As the last formula should have the accuracy dependent on h2 , the second order difference needs accuracy h4 . Thus, we calculate the values of the
function y= tanh x with nine decimal digits. The table of finite differences is prepared for N =3 with x0 = 1.11 in order to have appropriate xs for the
second order derivative estimation.
x
y
∆y
∆2 y
∆3 y
1.11
1.12
1.13
1.14
0.804062391
0.807568917 0.811019262 0.814414094
0.003506526
0.003450345 0.003394832
−0.000056181 −0.000055513
0.000000668
Numerical estimates of the derivatives using aforementioned formulae are then
f 0 (1.12) = f 0 (x1 ) ≈
f 0 (1.12) = f 0 (x1 ) ≈
∆y1
= 0.345 (with an error ∼ 0.01),
0.01
−∆y2 + 3∆y1
−0.003394832 + 3·0.003450345
=
= 0.34781 (with an error ∼ 0.0001),
0.02
0.02
∆2 y0
= −0.56181 (with an error ∼ 0.0001).
f 00 (1.12) = f 00 (x1 ) ≈
0.0001
We can also compare the results with analytical differentiation of the function f:y= tanh x at the given point
f 0 (1.12) =
1
.
= 0.347832445,
2
cosh 1.12
f 0 (1.12) = −
2 sinh 1.12 .
= −0.561797341.
cosh3 1.12
The result shows that the actual errors have not exceeded the values of expected errors, though they were defined only by the orders in the error
estimation.
Interpolation and approximation – Numerical integration
Evaluate an integral which cannot be calculated analytically or the function is given only in several points
In general, we are not able to find exactly the primitive function of a given function f (x). In some cases, this calculation may be too complicated. In
engineering practice, we mostly meet the problem of calculating the definite integral and then, in specific applications, it is possible to calculate only its
approximate value with an appropriately small error.
Rb
For the numerical computation of definite integrals a f (x) dx, where the function f (x) is continuous and sufficiently smooth in interval ha, i >, we
list at this point the two most commonly used methods in practice, namely the trapezoidal and the Simpson method. The second one is a bit more
complex, but on the other side it is also more accurate.
First, we discuss the trapezoidal method. Let’s consider dividing of interval ha, bi to N equal parts and let’s consider values of function in N +1 nodal
points xi , i = 1, ..., N + 1
fi = f (xi ), i = 1, ..., N + 1,
xi = a,
h = b−a
N
xi+1 = xi + h,
i = 1, ..., N.
where
The rule for approximate calculation of definite integral IL using the trapezoidal method is given by the following way:
Z b
f (x) dx ≈ IL , where
a
IL =
h
(f1 + 2f2 + 2f3 + ... + 2fN + fN +1 ).
2
(2.15)
Geometric interpretation of the trapezoidal method is shown in Fig. 2.4. The value IL in this case represent the full area of eight trapezoids (N = 8).
The first trapezoid is given by the points [x1 , x2 , f2 , f1 ], the second one by [x2 , x3 , f3 , f2 ], and so on.
If the function f (x) is two times continuously differentiable in (a, b), it is possible to prove that there exists a point ξ∈(a, b), for which holds:
Z b
f (x) dx = IL + RL
a
RL = −
(b − a)3 00
f (ξ)
12N 2
The other method is the Simpson method. Let’s consider the same division of the interval as in the trapezoidal method with the same notation for the
nodes and for the values of the function.
The rule for approximate calculation of definite integral IS using the Simpson method (defined for even N only), is given as follows:
Z b
f (x) dx ≈ IS , where
a
Figure 2.4: Approximation of definite integral using trapezoidal method.
IS =
h
(f1 + 4f2 + 2f3 + 4f4 + 2f5 + 4f6 + ... + 4fN + fN +1 ).
3
(2.16)
Geometric interpretation of the Simpson method is shown in Fig. 2.5. The value IS represent in this case the total area under four parabolic sections. The
first parabolic curve is given the points [f1 , f2 , f3 ], the second one by [f3 , f4 , f5 ], and so on. If the function f (x) is four times continuously differentiable
Figure 2.5: Approximation of definite integral using the Simpson method.
in (a, b), it is possible to prove that there exists a point ξ∈(a, b), for which holds:
Z
b
f (x) dx = IS + RS
a
RS = −
(b − a)5 IV
f (ξ)
180N 4
Example 2.13 Calculate the approximate value of the integral
Z
1.8
ln(1 + 2x) dx
0
in case N = 6 and estimate the error of the result using both methods: a) trapezoidal, b) Simpson.
Solution. a) First, we estimate the error of the result, so we calculate f 00 (x) = −4(2x + 1)−2 . Because f 00 (0) = −4; f 00 (1.8) = −0.189 and f 00 (x) is
monotonous in h0; 1.8i, holds: max |f 00 (x)| = 4. Then for determination of the error’s estimation RL holds:
|RL | ≤
(b − a)3
1, 83
00
max
|f
(x)|
=
· 4 = 0.054 ≤ 0.06 = ε
12N 2
12 · 62
and thus the error of the obtained result will be less than 0.06.
Then, the approximate value of the integral can be calculated by the formula (2.15), with division of the interval h0; 1, 8i into N = 6 equal parts, and
after finding 7 nodal points, we calculate the appropriate functional values. The results are presented in Tab. (2.2). Substituting values fi from Tab. 2.2
Table 2.2:
i
xi
fi
1
2
3
4
5
6
7
0 0.3
0.6
0.9
1.2
1.5
1.8
0 0.47 0.788 1.03 1.224 1.386 1.526
into formula (2.15) we get:
IL =
0.3
(0 + 2 · 0.47 + 2 · 0.788 + 2 · 1.03 + 2 · 1.224 + 2 · 1.386 + 1.526) = 1.6983
2
b) First, we estimate the error of the result, so we calculate f IV (x) = −96(2x + 1)−4 . Similarly to the previous case, we get: max |f IV (x)| = 96. Then
for determination of the error’s estimation RS , we obtain
|RS | ≤
1.85
(b − a)5
IV
max
|f
(x)|
=
· 96 = 0.0077 ≤ 0.008 = ε
180N 4
180 · 64
and thus the error of the integral result will be less than 0.008.
Substituting values fi from Tab. 2.2 into formula (2.16) we get:
IS =
0.3
(0 + 4 · 0.47 + 2 · 0.788 + 4 · 1.03 + 2 · 1.224 + 4 · 1.386 + 1.526) = 1.70943
3
The exact value of the integral (in this case can be calculated analytically) is:
Z 1.8
Z
1 4.6
1
ln(1 + 2x) dx =
ln(z) dz = [z · ln(z)]41 .6 = 0.5 · [4.6 · ln(4.6) − 3.6] ≈ 1.70993.
2 1
2
0
The result obtained by the Simpson method is closer to this value, as expected.
Example 2.14 Calculate the approximate value of the integral, satisfying the accuracy ε = 0.001 using both method a)trapezoidal, b) Simpson
Z 0.5
2
e−x dx
0
2
Solution. a) First we determine the smallest N satisfying the given accuracy. So we calculate f 00 (x) = 2e−x (2x2 − 1). It is easy to show that in the
given interval is max |f 00 (x)| = 2. As for the calculation of the error’s estimation RL we have
(b − a)3
max |f 00 (x)| < ε,
12N 2
and as |RL | < ε and |RL | ≤
(b−a)3
12N 2
max |f 00 (x)|, it renders
r
N>
(b − a)3
max |f 00 (x)| =
12 · ε
r
0.5
· 2 = 4.56.
12 · 0.001
Table 2.3:
i
xi
fi
1
2
3
4
5
6
0 0.1
0.2
0.3
0.4
0.5
1 0.99 0.9608 0.9139 0.8521 0.7788
Let’s put N = 5. Then, the approximate value of the integral is again possible to calculate by the formula (2.15) with division of the interval h0; 1.8i into
N = 5 equal parts and after finding 6 nodal points, we calculate the appropriate functional values. The results are presented in Tab. (2.3). Substituting
values fi from Tab. 2.2 into formula (2.15), we get that IL = 0.46062.
b) Similarly, we determine the smallest N satisfying the given accuracy. So we calculate f IV (x) = 4 exp −x2 (3 − 12x2 + 4x4 ). It is easy to show that
in the given interval is max |f IV (x)| = 12. As for calculation of the error’s estimation RS we have
(b − a)5
max |f IV (x)| < ε,
180N 4
and because |RS | < ε and |RS | ≤
(b−a)5
180N 4
max |f IV (x)|, the estimate of N reads
r
5
4 (b − a)
N>
max |f IV (x)|
180 · ε
From the last inequality, provides the sufficient N : in the following calculation we put N = 2. Substituting values fi into formula (2.16), we get that
IS = 0.25
(1 + 4 · 0.9394 + 0.7788) = 0.4614.
3
Rb
If we calculate the definite integral a f (x) dx by either trapezoidal, or the Simpson method, a problem to find the required number of nodal points
can sometimes be a problem with determining the maximum for the second, or fourth derivatives of the function, respectively. To ignore that difficulty,
in some cases, in practice, we often use so called the re-computing method. It is base on the fact that for any choice of the first division of N using
either trapezoidal or the Simpsons method, we calculate the approximate value of the definite integral IN and next I2N with number of nodal points
2N . If holds: |I2N − IN | < ε, then as the approximate solution we can consider the value I2N . Otherwise, we calculate the following approximation I4N
with number of nodal points 4N . This process is repeated until for some k = 2, 3, 4, ... holds: |I2k N − I2(k−1) N | < ε and then we consider I2k N as the
approximate value of the integral.
Example 2.15 Calculate the approximate value of the integral, satisfying the accuracy ε = 0.01 by the re-computing method
Z π
2
√
cos x 3 − x dx
0
using both method: a)trapezoidal, b) Simpson.
Solution. Tab. 2.4 contains the values of the given function (of the given problem) in the sufficient nodal points.
a) In the case of the starting choice N = 2, we get:
I2 =
and in case N = 4:
I4 =
π
(1.732 + 2x1.052 + 0) = 1.5064
8
π
(1.732 + 2x1.492 + 2x1.052 + 2x0.517 + 0) = 1.5421.
16
Because |I4 − I2 | > 0.01, we have to continue and calculate next approximation I8 = 1.5508. Now, |I8 − I4 | < 0.01 and the approximation I8 = 1.5508
can be considered as sufficiently accurate approximation of finding solutions with respect to the required accuracy.
Table 2.4:
i
xi
fi
1
2
3
4
5
6
7
8
9
π
π
π
π
π
π
π
π
π
1 16
2 16
3 16
4 16
5 16
6 16
7 16
8 16
9 16
1.732 0.642 1.492 1.291 1.052 0.789 0.517 0.249 0
b) In the case of the starting choice N = 2, we get:
I2 =
π
(1.732 + 4x1.052 + 0) = 1.551
12
and in the case N = 4:
π
(1.732 + 4x1.492 + 2x1.052 + 4x0.517 + 0) = 1.554
24
Because |I4 − I2 | < 0.01, the approximation I4 = 1.554 can be considered as sufficiently accurate approximation of the integral with respect to the
required accuracy.
I4 =
1. Think about other type of interpolation than the Lagrange polynomial!
2. Can you interpret the equation (2.2) by the ‘superposition principle’ ?
3. Implement the derivation from Example 2.5 for n=3! Can you apply the scheme for an arbitrary n? This requires the formula for yi mentioned at
the end of the example. Prove it!
4. Try to implement the least square method to an approximation by other function than a polynomial!
5. What is the highest polynomial degree which can be found by a least square approximation with n given points?
6. Prove the rest of the relations in Eq. (2.11)! Do you find similar relations for higher order derivatives? Can you write the found formulae by the
differences?
7. Describe the basic difference between both, the trapezoidal and Simpson method in terms of reached accuracy of calculation! What is the main
reason of greater difficulty for estimating the error in the Simpson method with respect to the trapezoidal one?
8. What is the main benefit of the re-computing method?
In the problems 1. to 4., find the Lagrange interpolation polynomial determined by the prescribed points.
1.
x
y
-1 0 2
.
6 7 15
2.
x 0 1 3 6
.
y 3 2 6 -9
3.
x
y
-1 0 1
.
2 1 2
4.
x -2 0 1 3
.
y 27 5 6 2
In the problems 5. to 8., apply the Newton interpolation formula to interpolate the value of the function at the point x0 , using all given data.
5. x0 = 1.25,
x 1.1 1.2 1.3 1.4
.
y 0.89 0.93 0.96 0.95
6. x0 = 107,
105
110
115
120
x 100
.
y 1.094 1.3151 1.504 1.782 2.221
x 1.1
1.2
1.3
.
y 0.371 0.370 0.382
7. x0 = 1.15,
x 1000 1010 1020 1030 1040
.
y 3.000 3.143 3.386 3.737 3.933
8. x0 = 1002,
In the problems 9. to 12., find the linear and the quadratic functions in the least square sense.
9.
x
y
0 1
0 0.9
2
3
4
.
1.9 3.1 4.2
11.
x
y
2 3 5 7 8
.
8 10 15 22 25
10.
x -2 -1
0
y 4.8 0.9 1.1
1
2
.
2.1 4.2
12.
x 16 17 19 22 23 23 24 25 25 27
.
y 51 57 55 62 61 66 72 74 77 82
In the problems 13. to 16., for a function given by the table of values, estimate the first and the second order derivatives at the first two points of the
pertinent table. Use various difference formulae.
13.
x
y
0
0.1
0.2
0.3
0.4
.
0.001 0.096 0.182 0.262 0.334
14.
x 1.1 1.2 1.3 1.4
.
y 0.89 0.93 0.96 0.95
15.
x
y
100
105
110
115
120
.
1.094 1.3151 1.504 1.782 2.221
16.
1
1.05
1.1
1.15
1.2
x
.
y 2.719 2.863 3.002 3.157 3.321
In the problems 17. to 20., estimate values of the first and the second order derivatives of the function f at the point x0 by various difference formulae.
17. f(x) = ln(1 + x),
19. f(x) = sin x,
x
18. f(x) = x ,
x0 = 0.1 .
x0 = 1.3 .
x0 = 1.02 .
20. f(x) = cosh x,
x0 = 4.24 .
In the problems 21. to 23., calculate the integrals by the trapezoidal method for N = 8 and estimate the error.
Z
1
21.
2
ln(1 + x ) dx .
0.2
Z
22.
1
3.4
√
Z
1 + x3 dx .
23.
0
0.8
ex
dx .
x+4
In the problems 24. to 26., calculate the integrals by the trapezoidal method with the accuracy ε = 0.01.
Z
3
24.
√ √x
xe dx .
Z
5
25.
1
(2x +
√
3
Z
1 + 4x) dx .
0
2
xe3x dx .
26.
−0.3
4
In the problems 27. to 29., calculate the integrals by the Simpson method for N = 6 and estimate the error.
Z
0
27.
−3
Z
1
dx .
x+5
2
28.
−0.5
Z
cos x
dx .
x+2
0.6
2
ex dx .
29.
0
In the problems 30. to 32., calculate the integrals by the Simpson method with the accuracy ε = 0.001.
Z
3
30.
√
5x − 2 dx .
Z
0
31.
e
2+3x
Z
1
( + cos 2x) dx .
x
0.5
√
32.
dx .
−2
2
1.5
1
In the problems 33. to 39., calculate the integrals with the given accuracy.
Z
0.7
2
33.
sin x dx;
Z
ε = 0.001 .
1.2
2
x sin 4x dx;
Z
ε = 0.05 .
−1
39.
0
cos x dx;
Z
ε = 0.005 .
35.
−0.3
36.
Z
2
34.
0
Z
0.1
1
37.
0
ln 2 + x2
dx;
2 + x2
1
x
e cos dx;
2
x
x − x2 dx;
ε = 0.001 .
0.1
Z
ε = 0.03 .
1
38.
0.5
arctan x
dx;
x
ε = 0.001 .
ε = 0.01 .
Conclusion
Basic methods of function approximations were discussed in the chapter. We learned to find a polynomial determined by several points, we can find either
the equation of the polynomial or the function values. The presented type of interpolation, called Lagrange, uses only the functional values for finding
the polynomial equation. Then we searched for a polynomial of a small degree which approximates a lot of experimental data. We learned how to use the
very important least square method for polynomial approximations. This method we revisit once more within this introductory course to mathematical
statistics. Finally, a method for approximation of the derivative of a function was presented, leading to formulae derived from the application the Taylor
theorem. We also tried to explain basic principles of numerical integration, based in the present case on the Lagrange interpolation.
References
[1] G. I. Marčuk Metody numerické matematiky. Academia, Praha, 1987.
[2] A. Ralston Základy numerické matematiky. Academia, Praha, 1978
Problem solutions
1. x2 +2x+7
2. − 25 x3 + 13
x2 − 16
x+3
3. x2 +1
4. −x3 +3x2 −x+5
5. 0.9481
6. 1.3866
7. 0.3689
5
5
2
2
10. f(x)=0.9143x +0x+0.7914
11. f(x)=0.1531x +1.3540x+4.6075
12. f(x)=2.6266x+7.6526
8. 3.0301
9. f(x)=1.06x−0.1
17. e.g. for h=0.01 according to (2.14)2 f 0 (0.1)=0.909, according to (2.14)4 f 00 (0.1)=0.827
18. e.g. for h=0.01 according to (2.14)2 f 0 (0.1)=1.041,
according to (2.14)4 f 00 (0.1)=2.062
19. e.g. for h=0.01 according to (2.14)2 f 0 (0.1)=0.268, according to (2.14)4 f 00 (0.1)=−0.964
20. e.g. for
2
3
3x(4+x )
)
, ε ≈ 0.002, I ≈ 0.2618
22. y 00 = (4(1+x
h=0.01 according to (2.14)2 f 0 (0.1)=34.696, according to (2.14)4 f 00 (0.1)=34.711
21. y 00 = 2(1−x
3 )1 .5) ,
(1+x2 )2
1
2
00
x
2
ε ≈ 0.05, I ≈ 8.578
23. y = e ( (4+x) − 2(x + 4) + (4+x)3 ), ε ≈ 0.02, I ≈ 0.276
24. N = 9, I ≈ 11.933
25. N = 1, I ≈ 11.665
24
27. y (4) = (5+x)
28. ε ≈ 0.0005, I ≈ 0.64852
29. ε ≈ 0.000015,I ≈ 0.6804991
30. N = 2, I ≈ 3.2326
5 , ε ≈ 0.0008, I ≈ 0.9164
31. N = 20, I ≈ 2.457
33. lich. N = 8, I ≈ 0.1132, Simpson N = 4, I ≈ 0.1124
35. lich. N = 8, I ≈ 0.1756, Simpson N = 4, I ≈ 0.1756
38. I ≈ 0.4287
39. I ≈ 0.359
Chapter 3. (Differential equations)
Aim
Solve the ordinary differential equations of the first order with the initial condition.
Objectives
1. Introduce common principles of numerical solution for solving differential equations.
2. Define and apply general m-point Runge-Kutta methods.
3. Apply the general method to particular cases of the Euler, Heune and Runge-Kutta algoritms.
Prerequisites
the first order ordinary differential equation; approximation of derivative
Differential equations – Numerical methods for solving of the Cauchy problem
Many physical quantities are bound by differential relations, the analysis of them shows how the dependent quantity can be expressed.
The solution of differential equations with initial conditions is usually denoted as the Cauchy problem. We will restrict the term to an ordinary differential
equation of the first order u0 = f (t, u), where u=u(t) is the searched real function of one real variable that satisfies the initial condition u(t0 ) = u0 .
The searched function cannot be determined analytically using a numerical method, for example in an implicit form as a general solution in the chosen
interval, but only as a set of isolated discrete nodal points. The searched functional values in these nodal points are calculated only approximately.
Chapter 3. (Differential equations) – 1
Let’s consider a differential equation in the form
u0 (t) = f (t, u(t)),
u(t0 ) = u0 .
with the initial condition
for
t ≥ t0 ,
(3.1)
As in the numerical solution we cannot solve the equation up to infinity, let’s consider dividing of an interval < t0 ; T > into N equal parts with the step
h with the following specification:
T − t0
,
tn+1 = tn + h;
n = 0, 1, 2, ..., N − 1.
h=
N
This process is called as the discretization of given area, in this case of the closed interval < t0 ; T >, to the set of discrete nodal points tn , n = 0, 1, 2, ..., N .
The role of a numerical method is then to determine the approximate values of the searched function u=u(t) in these nodal points exactly. The most
commonly used methods in this area are so called m-points Runge-Kutta methods, which in practical algorithms we restrict only to the first three
variants: one-, two- and four-points.
First, we briefly describe basic principles of the m-point Runge-Kutta methods in general. Let’s consider again the differential equation of the form
u0 (t) = f (t, u(t)) with the initial condition u(t0 ) = u0 in < t0 ; T >. We look the solution for given T and N in < t0 ; T > in points tn+1 = tn + h; where
0
n = 0, 1, 2, ..., N − 1 and h = T −t
. The m-points Runge-Kutta methods for solving the aforementioned Cauchy problem are defined using following
N
sequence of iterative steps in the form:
u0 = u(t0 )
un = un−1 + h
m
X
γi Kin ;
n = 1, 2, ..., N
i=1
where
Kin = f
K1n = f (tn−1 , un−1 )
!
i−1
X
tn−1 + hαi , un−1 + h
βiL KLn ;
i = 2, 3, ..., N.
L=1
The constants γi , αi and βiL referred in previous relations are clearly shown in Tab. 3.1.
These constants play an important role in achieving both the speed and the accuracy of the Runge-Kutta methods. In the case m = 1, we get one
equation with one variable and γ1 = 1 is determined exactly. This method is the oldest one and its name is the Euler method, too. In the case m > 1,
there are already infinitely many solutions and there is also an infinite number of methods of this type. For example, in case m = 4 and for the choice
of the coefficients referred in Tab. 3.2, we get perhaps the most popular method used for solving the Cauchy problem, and this method in particular
Table 3.1:
α2
α3
.
.
.
αm
β21
β31
.
.
.
βm1
γ1
β32
βm2
γ2
0.5
0.5
1
.
.
βm3
γ3
.
.
.
.
βm(m−1)
γm
Table 3.2:
0.5
0 0.5
0
0 1
1
6
1
3
1
3
1
6
is usually referred to as the Runge-Kutta method. It is more accurate than the Euler method, but also requires quite a bit more work. Between them,
there is still the Heune method, which we get, if we put m = 2.
Now, look at the simple types of these methods. We start with the Euler method. As we have learned above, the Euler method is actually the simplest
case of the generally defined Runge-Kutta methods in the choice of both parameters, m = 1 and γ1 = 1. The sequence of steps for calculating all
iterations can be written as follows:
un = un−1 + h · f (tn , un );
n = 1, 2, ..., N ;
where
u0 = u(t0 );
h=
T − t0
.
N
Example 3.1 Solve the following Cauchy problem using the Euler method
u
u0 (t) = 3te−t − u − ,
t
u(1) = 2
in < 1; 2 > with step h = 0.25 (N = 4).
Solution. Our role is to compute step by step the following values, based on the specified parameters of this problem: u1 , u2 , u3 and u4 as the approximate
values of the unknown function u(t) at points: t1 = 1.25, t2 = 1.5, t3 = 1.75 and t4 = 2. Thus:
if n = 1 then u1 = u0 + hf (t0 ; u0 ) = 2 + 0.25f (1; 2) = 2 + 0.25(−2.8964) = 1.2759
if n = 2 then u2 = u1 + hf (t1 ; u1 ) = 1.2759 + 0.25f (1.25; 1.2759) = 1.2759 + 0.25(−1.2222) = 0.9703
if n = 3 then u3 = u2 + hf (t2 ; u2 ) = 0.9703 + 0.25f (1.5; 0.9703) = 0.9703 + 0.25(−0.6132) = 0.8171
if n = 4 then u4 = u3 + hf (t3 ; u3 ) = 0.8171 + 0.25f (1.75; 0.8171) = 0.8171 + 0.25(−0.3716) = 0.7241
All results are summarized in Tab. 3.3.
Table 3.3:
n
tn
un
0
1
2
3
4
1 1.25
1.5
1.75
2
2 1.2759 0.9703 0.8171 0.7241
The next algorithm is the Heune method. The sequence of steps for calculating all iterations using the Heune method is given by the parameter m = 2
and the choice of the coefficients α2 = β21 = 1 and γ1 = γ2 = 0, 5. It can be written as follows:
h
un = un−1 + (K1n + K2n );
2
n = 1, 2, ..., N ;
where
K1n = f (tn−1 ; un−1 )
K2n = f (tn−1 + h; un−1 + hK1n )
T − t0
u0 = u(t0 );
h=
.
N
Example 3.2 Solve the following Cauchy problem using the Heune method
u
u0 (t) = 3te−t − u − ,
t
u(1) = 2,
in < 1; 2 > with step h = 0.25 (N = 4).
Solution. Our role is to compute step by step following values based on the specified parameters of this problem: u1 , u2 , u3 and u4 as the approximate
values of the unknown function u(t) at the points: t1 = 1.25, t2 = 1.5, t3 = 1.75 and t4 = 2 similarly as in previous example, but we have to calculate
additionally a couple coefficients K1 and K2 .
Let’s n = 1. Then:
K11 = f (t0 ; u0 ) = f (1; 2) = −2.8964
K21 = f (t0 + h; u0 + hK11 ) = f (1.25; 1.2759) = −1.2222
0.25
h
(−2.8964 − 1.2222) = 1.4852
u1 = u0 + (K11 + K21 ) = 2 +
2
2
K12 = f (t1 ; u1 ) = f (1.25; 1.4852) = −1.5989
K22 = f (t1 + h; u1 + hK12 ) = f (1.5; 1.0854) = −0.805
h
0.25
u2 = u1 + (K12 + K22 ) = 1.4852 +
(−1.5989 − −0.805) = 1.1847
2
2
The calculation of both approximate values, u3 and u4 continues analogically. All the results are summarized in Tab. 3.4.
Table 3.4:
n
K1n
K2n
tn
un
1
2
3
4
-2.8964 -1.5989 -0.9704 -0.6471
-1.2222 -0.805 -05681 -0.4339
1.25
1.5
1.75
2
1.4852 1.1847 0.9924 0.8572
Finally, the third method is the proper Runge-Kutta method. The sequence of steps for calculating all iterations using the Runge-Kutta method in the
case m = 4 and with the choice of coefficients given by Tab. 3.2, is summarized as
h
un = un−1 + (K1n + 2K2n + 2K3n + K4n );
6
n = 1, 2, ..., N ;
where
K1n = f (tn−1 ; un−1 )
h
h
K2n = f (tn−1 + ; un−1 + K1n )
2
2
h
h
K3n = f (tn−1 + ; un−1 + K2n )
2
2
n
K4 = f (tn−1 + h; un−1 + hK3n )
u0 = u(t0 );
h=
T − t0
.
N
Example 3.3 Solve the following Cauchy problem using the Runge-Kutta method
u
u0 (t) = 3te−t − u − ,
t
u(1) = 2
in < 1; 2 > with step h = 0.25 (N = 4).
Solution. Our role is to compute step by step following values based on the specified parameters of this problem: u1 , u2 , u3 and u4 as the approximate
values of the function u(t) at the points: t1 = 1.25, t2 = 1.5, t3 = 1.75 and t4 = 2 similarly as in the previous examples, but we must calculate
additionally a quartet of coefficients K1 – K4 for each time step.
K11 = f (t0 ; u0 ) = f (1; 2) = −2.8964
h
h
K21 = f (t0 + ; u0 + K11 ) = f (1.125; 1.6380) = −1.9982
2
2
h
h
K31 = f (t0 + ; u0 + K21 ) = f (1.125; 1.7502) = −2.2103
2
2
1
K4 = f (t0 + h; u0 + hK31 ) = f (1.25; 1.4474) = −1.531
0.25
h
u1 = u0 + (K11 + 2K21 + 2K31 + K41 ) = 2 +
(−2.8964 − 2t1.9982 − 2t2.2103 − 1.531) = 1.4648
6
6
K12 = f (t1 ; u1 ) = f (1.25; 1.4648) = −1.5623
h
h
K22 = f (t1 + ; u1 + K12 ) = f (1.375; 1.2695) = −1.1499
2
2
h
h
K32 = f (t1 + ; u1 + K22 ) = f (1.375; 1.3211) = −1.2389
2
2
2
K4 = f (t1 + h; u1 + hK32 ) = f (1.5; 1.1551) = −0.9211
0.25
h
(−1.5623 − 2t1.1499 − 2t1.2389 − 0.9211) = 1.1623
u2 = u1 + (K12 + 2K22 + 2K32 + K42 ) = 1.4648 +
6
6
The calculation of both approximate values, u3 and u4 is performed analogically. All the results are summarized in Tab. 3.5.
Let us note that this Cauchy problem has an analytical solution which reads
u(t) = e−t (t2 +
2e−1
).
t
If we calculate the also exact values, we can now compare the accuracy of all three previously used numerical methods. The comparison is shown clearly
in Tab. 3.6.
Table 3.5:
n
K1n
K2n
K3n
K4n
tn
un
1
2
3
4
-2.8964 -1.5623 -0.9931 -0.6166
-1.9982 -1.1499 -0.72921 -0.5111
-2.2103 -1.2389 -0.7703 -0.5313
-1.5310 -0.921 -0.6115 -0.4482
1.25
1.5
1.75
2
1.4648 1.1623
0.9730
0.8417
As we could see in the last example, the accuracy of the algorithms is various. The proof of accuracy of all m-point Runge-Kutta method is out of
scope of the present course. Nevertheless, it is useful to know how large the time step h should be chosen. So we can summarize the accuracy of the
introduced algorithms. Let us denote the exact solution of the Cauchy problem (3.1) u = u(t) and let us denote an interpolated solution obtained from
the nodal values, which are calculated by any numerical algorithm, u∗ = u∗ (t). Then there exist pertinent constants Ci such that the following estimates
hold:
Table 3.6:
t
1.25
1.5
1.75
2
u-exact
1.42955 1.07923 0.9728 0.84173
u-Euler
1.2759 0.9703 0.8171 0.7241
u-Heune
1.4852 1.1847 0.9924 0.8572
u-Runge-Kutta 1.4648 1.1623 0.97230 0.8417
A1. For the Euler method: |u(t) − u∗ (t)| < C1 h for all t ∈ ht0 ; T i.
B1. For the Heune method: |u(t) − u∗ (t)| < C2 h2 for all t ∈ ht0 ; T i.
C1. For the Runge-Kutta method: |u(t) − u∗ (t)| < C3 h4 for all t ∈ ht0 ; T i.
The estimates confirm
extreme accuracy of the Runge-Kutta method, for which splitting each time step to two of half lengths, reduces the error to one
1 4 4
sixteenth (i.e. 2 h in the right-hand side of the pertinent inequality).
1. Compare the computational difficulty of all three aforementioned methods!
2. Compare the accuracy of all three aforementioned methods!
3. Try to construct other m-point runge-Kutta algorithms.
4. Try to explain the role of extra constants "K" (intermediate steps) in numerical methods of the type Runge-Kutta!
1. Solve the Cauchy problem u0 = 2 ut + 2t3 , u(1) = −1 in the range < 1; 2 > for N = 5 by the Euler, Heune and Runge-Kutta method .
In the problems 2. to 10., solve the Cauchy problem by the Euler method for N = 10, by the Heune method for N = 4 and also by the Runge-Kutta
method for N = 2.
u(2) = 4, in the range < 2; 3 > .
0
3. u =
u
t
0
4. u = t(u − t cos t), u(0.1) = 1.6, in the range < 0.1; 0.6 > .
0
5. u =
t
2u
0
2
6. 3u + u +
0
2
7. 2t u + u = 1, u(6) = 0.5, in the range < 6; 7 > .
0
2. u =
0
8. u −
4t
,
t2 +u2
1
t−u2
2
t2
= 0, u(1) = 1, in the range < 1; 2 > .
= 0, u(2) = 0, in the range < 2; 3 > .
cos ln ut , u(1) = 2, in the range < 1; 2 > .
+
tu
,
2(t2 −1)
u(2) = 1, in the range < 2; 4 > .
π
π
0
9. u cos t = 1 − u sin t, u(− 4 ) = 0, in the range < − 4 ; 0 > .
0
2
10. u = cos(t ), u(−2) = 0, in the range < −2; −1 > .
Conclusion
In this chapter, we looked at some basic methods used for solving the Cauchy problem (but only) in numerical mathematics. In fact, there are extremely
few actual practical problems of mathematical physics which can be solved analytically. This gives many possibilities for developing approximate methods
for solving any types of differential equations or their systems. The knowledge acquired in this chapter is also needed for finding solutions for various
new tasks resulting from everyday technical experience.
References
[6] Minorskij, V.P.: Sbírka úloh z vyšší matematiky, Praha 1958.
Problem solutions
E: t1 = 1.2 t2 = 1.4 t3 = 1.6 t4 = 1.8 t5 = 2
u1 ≈ −1 u2 ≈ −0.6421 u3 ≈ 0.272 u4 ≈ 1.9784 u5 ≈ 4.7508
E: u10 ≈ 8.958
H: t1 = 1.2 t2 = 1.4 t3 = 1.6 t4 = 1.8 t5 = 2
1.
2. H: u4 ≈ 9.3
u1 ≈ −0.8211 u2 ≈ −0.1212 u3 ≈ 1.3473 u4 ≈ 3.3704 u5 ≈ 7.7732
RK: u2 ≈ 9.39
RK: t1 = 1.2 t2 = 1.4 t3 = 1.6 t4 = 1.8 t5 = 2
u1 ≈ −0.8065 u2 ≈ −0.0787 u3 ≈ 1.433 u4 ≈ 4.0167 u5 ≈ 7.9988
E: u10 ≈ 9.835
E: u10 ≈ 0.47
4. H: u4 ≈ 9.858
5. E: u10 ≈ 3.27
6. H: u4 ≈ 0.497
7. RK: u2 ≈ 0.5556
8. RK: u2 ≈ 0.4142
RK: u2 ≈ 9.875
RK: u2 ≈ 0.4997
3. E: u10 ≈ 3.513
E: u10 ≈ −0.499
10. H: u4 ≈ −0.4184
RK: u2 ≈ −0.4437
Chapter 4. (The probability theory)
Aim
Remind students basic principles of probability and introduce a new theory concerning the operations with random variables required for statistical
calculations.
Objectives
1. Remind principal notions of probability.
2. Learn to work with random variables and their cumulative distribution functions.
3. Gain a survey on characteristics of random variables and on their calculation.
4. Identify basic types of probability distributions for random variables.
Prerequisites
secondary-school knowledge of probability; function and its properties; definite integral; improper integral
Chapter 4. (The probability theory) – 1
Introduction
The chapter reminds basic notions of probability theory, because it is required for rigorous explantation and application of statistical methods. Some
of them are known from the secondary school, most of them being closely connected to combinatorics and its application in probability. In such cases
the random events can have values in a finite set of possible results so that the probability can be obtained by a combinatoric formula. Nevertheless, in
practical calculation, random events can have values e.g. in an interval so that the assumption of a finite number of possible results is no longer valid.
We start with repetition of known facts. Especially, what probability is, what a random event is etc.. Simultaneously, our knowledge will be deepened
to have wider application.
The principal part of the chapter discusses the work with a random variable. It describes random events, even if their results are not numbers. We will
define functions which determine how the values are reached by a random variable and what basic numerical values characterize the random variable.
In the second part we introduce basic types of probability distributions and their characteristics, especially those which are useful in engineering practice.
We will focus on the normal distribution which plays a key role in the theory itself, in the theory of mathematical statistics and in practical problems as
well, e.g. in those where occurs a measurement error.
At the very end, considered rather as an appendix, a special function used in many branches of applied mathematics, especially in mathematical statistics,
will be introduced. The function is know as Euler function Γ. The function, however, will be also used in the previous parts of the chapter.
The probability theory – Events and probability
Recall and update the known facts about probability and ways of its interpretation in the mathematical theory.
Let us consider an attempt, whose result depends on random influences. A particular result of the attempt is called a simple event. The set of all simple
events is then called the space of simple events and it is denoted Ω. A set of simple events is called an event. It is a subset of Ω and usually it is denoted
by a capital letter from the beginning of the alphabet e.g. an event A. Trivial examples of events are an event of certainty, an event which occurs in any
case, e.g. A=Ω, or an event of impossibility, an event which never occurs, e.g. A=∅. Recall that events, as sets, can be manipulated be set operation
like union (A ∪ B), intersection (A ∩ B), difference (A \ B), a complement (to Ω) (¬A).
Example 4.1 Describe the space of simple events for a throw of a standard dice and also some events.
Solution. The top face of the dice is understood as the result of the throw, so
Ω={
1, 2, 3, 4, 5, 6} .
9
An event of certainty is that the resulting face contains at least one dot. An event of impossibility is e.g. the top face like this . Usually, a result of
the throw is represented by a number – the number of the dots, unwillingly we define a random variable which will be discussed below. Finally, an event
A can be an even number of the dice, i.e. A = { , , }.
246
If the result of an attempt depends on a chance, it is good to know a measure of such a chance or how likely are the possibilities. The measure is
expressed by the probability of the event A and it will be denoted as P(A). Thus, in order to define the probability mathematically correctly, a function
defined on subsets of the space of simple events Ω is needed. Additionally, as the set operations can be used for manipulating the events, also the results
of such operations have to be assigned a probability. It implies the for mathematical working with chances the probability space has to be given as a
triple (Ω, S, P), where S is an appropriate system of subsets of Ω and Pis a function defined on S. From the above, it is clear that S contains whole
simple-event set Ω: Ω∈S and that for any set A∈S also its complement must belong to S: ¬A∈S. Following the reasoning, if two sets A, B (of course
A⊆Ω, B⊆Ω) are in S, then their union lies also in S: A∪B∈S. This should be valid even for a countable number of the sets in the union. Such a
system of subsets is called a σ-algebra.
Example 4.2 Find examples of σ-algebras on Ω from Example 4.1.
Solution. A trivial example is always a two element set S={Ω, ∅}, because Ω∪Ω=Ω, Ω∪∅=Ω, ∅∪∅=∅, ¬Ω=∅.
135
Another example can be the set S={Ω, ∅, A, ¬A}, where e.g. A={ , , }. Additionally to the relations from the previous example we have e.g.
these: A∪Ω=Ω, A∪∅=A, A∪¬A=Ω etc.. In each case an element of S is obtained.
The maximal σ-algebra is the set of all subsets of the set Ω, as the required set relations are valid automatically. This set is usually denoted 2Ω .
In the case of Ω with a finite number of elements, it is easy to describe a σ-algebra and to describe the probability P. Nevertheless, in the case of infinite
(uncountable) sets (e.g. if Ω=R) the situation is complicated, for the probability is expected Pto behave ‘naturally’ which can be summarized as basic
properties of probability:
A1. For an arbitrary event A holds 0≤P(A)≤1.
B1. P(Ω)=1
C1. For arbitrary mutually non-intersecting events A1 , A2 , . . . Ak , k≤+∞ (i.e. if i6=j, then Ai ∩Aj =∅) holds
P (A1 ∪ A2 ∪ · · · ∪ Ak ) = P(A1 ) + P(A2 ) + · · · + P(Ak )
Such a function is called a probability measure in the mathematical theory. The range of the function Pdetermined in the item A1 is a convention to
give certainty a unit probability which is in particular stressed by the item B1. The item C1 naturally determines adding of the possibilities. In the case
of Ω=R, the satisfaction of this item is problematic and it causes problems in defining an appropriate set S. Hence, S is not considered as the set of
all subsets of R, but the least σ-algebra containing all intervals (the Borel σ-algebra). The reason for defining such S is that the intervals can be easily
‘measured’.
Another properties of the probability cen be derived from the basic ones. Having A, B∈S, there especially hold
P(¬A) = 1 − P(A),
(4.1a)
P(∅) = 0,
P(A \ B) = P(A) − P(A ∩ B),
(4.1b)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
(4.1c)
A ⊆ B ⇒ P(A) ≤ P(B).
(4.1d)
It should be noted that the sketched definition of probability is called the Kolmogorov axiomatic definition of probability.
Example 4.3 Classical definition of probability: Consider a finite number of simple events which are indivisible and equally likely.
Solution. Expressed in introduced symbols it means that Ω={ω1 , ω2 , . . . Ωn } for some n, further for i6=j holds ωi ∩ωj =∅ and for all i is P(ωi )= n1 . Any
m-element subset A⊆Ω can be described as A={ωk1 , ωk2 , . . . ωkm }. Hence,
1
1
m
1
+ + ··· + = .
n n
n
n
This explains probability as it is known from the secondary school: a ratio between number of simple events pertaining to the event and of all simple
events. As we mentioned above the system of subsets S consists of all subsets of Ω: S=2Ω .
P(A) = P(ωk1 ) + P(ωk2 ) + · · · + P(ωkm ) =
Example 4.4 Geometrical definition of probability: Let a planar domain Ω⊆R2 be given and let G⊆Ω is also a domain. The probability of the event
that a randomly chosen point x∈Ω lies also in G is given by the ratio of the areas of the domains G and Ω.
Solution. The definition of probability can be written by means of a double integral
RR
dxdy
S(G)
P(x∈G) =
= RRG
.
S(Ω)
dxdy
Ω
The properties of the integral imply the basic properties of RR
probability, in particular the property of additivity C1. It can also be seen that G cannot
by an arbitrary subset of Ω, because generally the integral G dxdy does not exist. It has to be a so called measurable set: the system 2Ω in the
role of S does not work. The construction of the double integral usually starts with definition of the integral over a rectangle and then the set of
admissible-for-integration subsets of Ω is augmented. Similarly it works in the theory of probability and the definition of the system S, see also a note
about the Borel σ-algebra above.
Example 4.5 Statistical definition of probability: Under the same conditions, n attempts are performed. Let the event A occurred in mn of these n
attempts. Then P(a) = limn→+∞ mnn .
An important phenomenon in the theory of elasticity and in statistics, too, is a degree of influence which has one event on another. This degree is
expressed by the conditional probability. If A and B are to events in a probability space and if P(B)6=0, then the conditional probability P(A/B) of the
event A under the condition of the event B is meant as the probability of the event A if the event B certainly occurred, hence
P(A/B) =
P(A ∩ B)
.
P(B)
(4.2)
If the events do not influence each other, they are called independent. The independence of events is also important in mathematical statistics, e.g.
in assessing an experiment where we did several measurments. It is necessary to know, whether a particular measurement has an influence on other
measurments, it means whether the measurements are independent. As we will see, a lot of statistical calculations relies on such independence. If the
event B does not affect the event A, then it must hold P(A/B)=P(A) and the definition relation (4.2) provides an important relation for independent
events
P(A ∩ B) = P(A)P(B),
(4.3)
which is frequently considered as the definition of the events independence.
Example 4.6 We throw two dice a red one and a blue one. Determine the probability of the event A that both dice’s results are
probability that ‘the total number of spots is seven’ (the event B) if we know that at least one dice result is
(C).
5
1.
What is the
Solution. The set of simple events Ω is given by pairs
1 1), (1, 2), . . . , (1, 6), (2, 1), (2, 2), . . . (6, 6)}.
Ω = {( ,
1
. Of course, we can use independence
We use the classical probability: there are 36 pairs and there is only one that pertains to the event A, hence P(A)= 36
of the dice and define Ab as an event, where the blue dice result is
and an event Ar that the red dice result is , so that P(Ab ) = P(Ar ) = 16 . As
1
the dice are independent, we have P(Ab ∩Ar ) = P(Ab )P(Ar )= 61 16 = 36
.
1
Further, writing all possibilities we obtain P(B)= 16 , P(C) =
11
36
and P(B∩C) =
1
1
.
18
The events are not independent as
which has the event C on B can be calculated by the conditional probability P(B/C)=
which includes also the possibility (
5, 5).
1
18
11
36
2
= 11
.
1
6= 1 11 .
18 6 36
The degree of influence
The difference in relation to P(B) is caused by the result
The probability theory – The random variable
The result of a random event is described by numerical values and for all obtainable values, there should be determined probabilities.
The event can generally have numerical or non-numerical values. In calculation, however, it is advantageous to replace all values by the numerical ones,
which of course are random. The function ξ which uniquely assigns to each simple event ω∈Ω a value x is called a random variable, thus
ξ(ω) = x, pre ω∈Ω, x∈R.
(4.4)
The range of the function is a subset of R. If the value of the event is numerical, it means if Ω a numerical set, then the random variable is usually defined
naturally as ξ(ω)=ω. In the case that the probability space (Ω, S, P) includes S6=2Ω as we described above, for each element of S the probabilities have
to be reachable. It means that ξ has to satisfy the condition that for all x∈R holds: {ω∈Ω:ξ(ω)<x}∈S.
Some random variables can have only finite number of values or infinite but still countable number (i.e. they can be ordered in a sequence). Such random
variables are called discrete.
Example 4.7 Define a random variable for a classical dice.
1 2 3 4 5, 6}. The result of a throw is a number of spots, so the random variable ξ is natural to
ξ(3) = 3,
ξ(4) = 4,
ξ(5) = 5,
ξ(6) = 6.
2
Solution. From Example 4.1, we have Ω = { , , , ,
be defined as follows:
ξ( ) = 1,
ξ( ) = 2,
1
The definition of a random variable should take into account the whole probability space. It means that not only the values of the random variable are
important, the same importance pertains also to the probabilities of occurrence of all obtainable values. The rule of reaching of the values by a random
variable is called the probability distribution.
For a discrete random variable ξ with values xi , it is sufficient to define probabilities of all particular P
values, i.e. to determine pi =P({ξ(ωi )}=xi ) which
can be done by a table or a formula. The properties of probability imply that pi always sum to 1, ni=1 pi =1, where n is the number of the possible
values, either a natural number or +∞.
The probability distribution is usually determined by the cumulative distribution function of the random variable. The function y=Fξ (x), x∈R is call
cumulative distribution function of the random variable ξ, if holds
(4.5)
Fξ (x) = P ({ω : ξ(ω) < x}) .
We will usually write the probability in an abbreviated form Fξ (x)=P (ξ<x). It implies that for any cumulative distribution function hold:
A2. lim Fξ (x) = 0,
x→−∞
lim Fξ (x) = 1.
x→+∞
B2. The function Fξ is non-decreasing in R.
C2. The function Fξ is left continuous at each point x0 ∈R, i.e. lim− Fξ (x) = Fξ (x0 ).
x→x0
These properties also identify a function as a cumulative distribution function of an unknown random variable. If a function y=F (x) satisfies the
properties A2 to C2, then it is the cumulative distribution function for a random variable ξ and determines its probability distribution.
Let us note that the values of a random variable frequently cover only positive data, e.g. due to a physical meaning of pertinent random variable. For
such cases, the Heaviside function Θ is introduced which is defined by the relation
(
0, if x ≤ 0,
Θ(x) =
(4.6)
1, if x > 0.
Example 4.8 Prove some of the properties A2, B2, C2.
Solution. The most simple is B2. Choose two numbers a, b so that a<b, and think about the probabilities P ({ω:ξ(ω)<a}) and P ({ω:ξ(ω)<x}).
The events A={ω:ξ(ω)<a} and B={ω:ξ(ω)<b} naturally satisfy the condition A⊂B, and thence P(A)≤P(B). The last inequality is an equivalent
expression for the relation Fξ (a)≤Fξ (b). As the numbers a and b are arbitrary, the inequality proves the monotonicity of Fξ .
The rest of the properties include some limit. We show one of them, e.g. the first relation in A2. Let us define a sequence of events An ={ω:ξ(ω)<−n}.
They satisfy the condition
+∞
\
An+1 ⊆ An a Ak ∩ Ak+1 ∩ Ak+2 ∩ · · · ∩ Ak+100 ∩ · · · =
Aj = ∅
j=k
for any n and k. Let us define Bn =An \An+1 ={ω: − (n+1)≤ξ(ω)<−n}
for all possible n which are obviously disjoint so that the summing rule of
S+∞
probabilities C1 can be applied. It is also clear that for all n holds An = j=n Bj . Hence
P(An ) = P
+∞
[
j=n
!
Bj
=
+∞
X
j=n
P(Bj ),
where especially
1 ≥ P(A1 ) =
+∞
X
P(Bj ).
j=1
P
P
The last sum determines a convergent series (i.e. Sn = nj=1 P(Bj ), P(Bj )≥0, thus there exists limn→+∞ Sn =a≤1) so that its remainder Rn = +∞
j=n+1 P(Bj )
converges to zero (i.e. limn→+∞ Rn =0 as Rn +Sn =a). It implies
lim P(An ) = lim
n→+∞
+∞
X
n→+∞
P(Bj ) = lim Rn−1 = 0.
n→+∞
j=n
As x→−∞, there exists n such that x<−n and then also 0≤Fξ (x)=P(ξ<x)≤P(An ). The limit in the right-hand side is zero, hence limx→−∞ Fξ (x)=0.
The basic properties imply also other important relations. Some of them follow:
A3. P(ξ ≤ x) = lim Fξ (z)
z→x+
B3. P(ξ = x) = lim Fξ (z) − F ξ(x)
z→x+
C3. P(a ≤ ξ < b) = Fξ (b) − Fξ (a)
D3. P(a < ξ < b) = Fξ (b) − lim Fξ (z)
z→a+
E3. P(a < ξ ≤ b) = lim Fξ (z) − lim Fξ (z)
z→b+
z→a+
F3. P(a ≤ ξ ≤ b) = lim Fξ (z) − Fξ (a)
z→b+
Example 4.9 In a board game, there is used a dice which has three faces with
cumulative distribution function for this dice.
3, two with 4 and one with 2. Find a random variable and pertinent
Solution. The set of simple events and pertinent probabilities are:
Ω={
The random variable is defined naturally
2, 3, 4},
2
ξ( ) = 2,
2
1
P( ) = ,
6
ξ(
3) = 3,
3
1
P( ) = ,
2
ξ(
4
1
P( ) = .
3
4) = 4
which is a discrete random variable.
Now, we can find its cumulative distribution function. We choose a number x an try to find the probability
2 3, 4} : ξ(ω) < x}) = P({w ∈ {2, 3, 4} : w < x}), where the definition of ξ says P({2}) = 16 ,
1
1
P({3}) = , P({4}) = .
2
3
It implies that for x≤2 is P(ξ<x)=0 as it is an impossible event. Similarly for x>4 we obtain a certain event so that P(ξ<x)=1. Analogously it works
also for the values in the range (2; 4i. We obtain the following function:

x ≤ 2,

 0, for


1


 , for 2 <x ≤ 3,
Fξ (x) = 6
2


, for 3 <x ≤ 4,



3


1, for 4 <x.
P({ω ∈ { ,
It is important to be aware especially of the splitting points. There remains point 3: P(ξ < 3) = P(ξ = 2) =
cumulative distribution function (4.5).
1
6
due to inequality in the definition of the
We have defined the Heaviside function (4.6) in order not to write a function of this type by splitting it to cases. Using this notation, Fξ can be written
as
1
1
1
Fξ (x) = Θ(x − 2) + Θ(x − 3) + Θ(x − 4).
6
2
3
Let us notice the multiplicative constants which are equal to jumps in the cumulative distribution function.
The graph of the function is shown in Figure 4.1. All the properties A2, B2, C2 can be easily read from the graph.
Example 4.10 Show that the function F (x)=1−Θ(1−x) + Θ(x(1−x))x is the cumulative distribution function for a random variable ξ.
Solution. First, we rewrite F using elementary functions. The function Θ is discontinuous at zero. Thus, we find the values of F (x) for those x where
Θ is calculated at zero, i.e. for x=0 and x=1
F (0) = 1 − Θ(1) + Θ(0) · 0 = 0,
F (1) = 1 − Θ(0) + Θ(0) · 1 = 1.
For the rest of the values we obtain


: 1 − 1 + 0x = 0,
x < 0
F (x) = 1 − Θ(1 − x) + Θ(x(1 − x))x = 0 < x < 1 : 1 − 1 + x = x,


x>1
: 1 − 0 + 0x = 1.
1
5
6
2
3
1
2
1
3
1
6
0
-1
0
1
2
3
4
5
6
7
Figure 4.1: The cumulative distribution function for the dice of Example 4.9.
This expression easily verifies all the properties. First:
lim F (x) = lim 0 = 0,
x→−∞
x→−∞
lim F (x) = lim 1 = 1,
x→−∞
x→−∞
proves A2. The relations
lim F (x) = lim− 0 = 0,
x→0−
x→0
lim F (x) = lim− x = 1,
x→1−
x→1
and continuity of F at all other points prove C2. The function is even continuous at all points.
Monotonicity is shown for all possible choices of x, respectively to the definition of F
x1 < x2 < 0 : F (x1 ) = 0 ≤ 0 = F (x2 ),
x1 < 0 ≤ x2 < 1 : F (x1 ) = 0 ≤ x2 = F (x2 ),
0 ≤ x1 < x2 < 1 : F (x1 ) = x1 ≤ x2 = F (x2 ),
x1 < 0 < 1 ≤ x2 : F (x1 ) = 0 ≤ 1 = F (x2 ),
0 ≤ x1 < 1 ≤ x2 : F (x1 ) = x1 ≤ 1 = F (x2 ),
1 ≤ x1 < x2 : F (x1 ) = 1 ≤ 1 = F (x2 ).
Hence, the function F is non-decreasing.
The graph of the function in Figure 4.2 documents also the satisfaction of all three properties.
Let us suppose that the cumulative distribution function Fξ of a random variable ξ is continuous and increasing within an interval (a; b) (a may be −∞,
b may be +∞) and Fξ (a)=0, Fξ (b)=1. Then in this interval, there exists the inverse function Fξ−1 defined in the interval (0; 1). This inverse function
defines the quantile function, the value of it being quantile of the level δ xδ of the random variable ξ – a number from the interval (a; b) which satisfies
the relation
xδ = Fξ−1 (δ), teda P({ω : ξ(ω) < xδ }) = δ.
(4.7)
There are special names for some specific quantiles: the quantile at the level δ=0.5 is called median, the quantile at the level δ=0.25 is called lower
quartile, at the level δ=0.75 it is upper quartile.
1
0.75
0.5
0.25
0
-1
-0.5
0
0.5
1
1.5
2
Figure 4.2: The graph of the function F from Example 4.10.
Example 4.11 Find the median and both quartiles for a random variable with the cumulative distribution function from Example 4.10.
Solution. The cumulative distribution function is increasing in the interval (0; 1). In this interval, we have Fξ (x)=x so that Fξ−1 (t)=t. It implies
x0.5 = Fξ−1 (0.5) = 0.5,
x0.25 = Fξ−1 (0.25) = 0.25,
x0.75 = Fξ−1 (0.75) = 0.75.
If the cumulative distribution function Fξ is supposed to be piecewise smooth, then there exists a non-negative function pξ (cf. the property B2) for
which pξ (x)=Fξ0 (x) for (almost) all x. The function pξ is called the probability density of the random variable ξ. A random variable whose probability
density is an integrable function is called a continuous random variable. A continuous random variable ξ is usually given by its probability density and
the cumulative distribution function Fξ is then obtained by integration
Z x
Fξ (x) =
pξ (t)dt.
(4.8)
−∞
An important condition for an non-negative integrable function pξ to be a probability density follows from the second relation in the property A2:
R +∞
p (t)dt=1. Geometrically, the values of probabilities can be read as values of the cumulative distribution function or as areas below the graphs of
−∞ ξ
probability density.
The probability that a continuous random variable has a particular value is always zero, because the right-hand side limit in the relation B3 is equal to
the function value. Therefore, also the properties C3 to F3 provide all the same result of C3, where the inequalities can be strict or not.
Example 4.12 Find probability density for a random variable with the cumulative distribution function from Example 4.10.
Solution. The cumulative distribution function is continuous and differentiable besides the points 0 and 1. The density can be calculated by differentiation
pξ (x)=Fξ0 (x):

(

0, x < 0
1, 0 ≤ x ≤ 1
Fξ (x) = x, 0 ≤ x ≤ 1
⇒
pξ (x) =
.

0, otherwise

1, x > 1
The values of the function at 0 and 1 can be arbitrary.
When working with random variables, it is often required to have in mind an n-tuple of random variables given in the same probability space (Ω, S, P)
and it is important to know wether they affect one another.
Random variables ξ1 , ξ2 , . . . ξn are called independent, if for any n-tuple of numbers (x1 , x2 , . . . , xn ), the events A1 ={ω:ξ1 (ω)<x1 }, A2 ={ω:ξ2 (ω)<x2 },
. . . , An ={ω:ξn (ω)<xn } are independent which means that P(A1 ∩A2 ∩. . .∩An )=P(A1 )P(A2 ) . . . P(An ).
The probability in the left-hand side defines the joint distribution function for the random vector (ξ1 , ξ2 , . . . , ξn )
F (x1 , x2 , . . . , xn ) = P({(ω1 , ω2 , . . . , ωn ) ∈ Ωn : ξ1 (ω1 ) < x1 ∧ ξ2 (ω2 ) < x2 ∧ · · · ∧ ξ2 (ω2 ) < x2 }).
The relation can be rewritten for a continuous random variables using the joint probability density
Z x2 Z x1
Z xn
...
p(t1 , t2 , . . . , tn )dt1 dt2 . . . dtn .
F (x1 , x2 , . . . , xn ) =
−∞
−∞
(4.9)
(4.10)
−∞
Independence of random variables then means
F (x1 , x2 , . . . , xn ) = Fξ1 (x1 )Fξ2 (x2 ) . . . Fξn (xn ),
(4.11)
which can be transformed for continuous random variables to probability densities
p(x1 , x2 , . . . , xn ) = pξ1 (x1 )pξ2 (x2 ) . . . pξn (xn ).
(4.12)
If we need to create a new random variable ψ=h(ξ1 , ξ2 , . . . , ξn ) from continuous random variables ξ1 , ξ2 , . . . ξn using an appropriate function of n
variables y=h(x1 , x2 , . . . , xn ), the cumulative distribution function for ψ can be obtained in the form
Z
Z Z
Fψ (x) = P(ψ < x) = . . .
p(t1 , t2 , . . . , tn )dt1 dt2 . . . dtn .
(4.13)
h(t1 ,t2 ...,tn )<x
Example 4.13 Having given two independent continuous random variables ξ1 and ξ2 with probability density p1 and p2 , find the probability density for
ψ=ξ1 +ξ2 .
Solution. The cumulative distribution function for the random variable ψ can be written using the definition and relations 4.13 and 4.12 as
Z x
ZZ
Fψ (x) =
pψ (t)dt =
p1 (t1 )p2 (t2 )dt1 dt2 .
−∞
t1 +t2 <x
Using the substitution t=t1 +t2 in the integral over t1 and interchanging the order of integration, the second integral can be calculated as follows:
Z +∞ Z x−t2
Z +∞ Z x
Z x Z +∞
p1 (t1 )p2 (t2 )dt1 dt2 =
p1 (t − t2 )p2 (t2 )dtdt2 =
p1 (t − t2 )p2 (t2 )dt2 dt.
−∞
−∞
−∞
Hence, writing s instead of t2 , it renders
Z
−∞
−∞
x
Z
x
Z
+∞
p1 (t − s)p2 (s)ds dt.
pψ (t)dt =
Fψ (x) =
−∞
−∞
−∞
−∞
Comparing the two integrals for any x provides the relation
Z
+∞
p1 (t − s)p2 (s)ds.
pψ (t) =
−∞
This is the relation for the probability density of the random variable ψ.
Example 4.14 Use the result of Example 4.13 to a triple of independent random variables ξ1 , ξ2 , ξ3 with the same probability density from Example
4.12.
Solution. The probability density of all three variables is given by a simple function p(x)=Θ (x(1 − x)). We denote ψ1 =ξ1 +ξ2 and calculate first its
probability density
Z +∞
pψ1 (t) =
Θ ((t − s)(1 − t + s)) Θ (s(1 − s)) ds.
−∞
The integrand is non-vanishing if all following inequalities are satisfied: 0<s<1 and 0<t−s<1. The second relation can be changed to the form
t−1<s<t. Both imply that for t<0 or for t>2 the integrand is zero. Otherwise, there are two possibilities. If 0<t<1, then all inequalities render
non-zero integrand for 0<s<t which results to
Z +∞
Z t
Θ ((t − s)(1 − t + s)) Θ (s(1 − s)) ds =
1 · 1ds = t.
−∞
0
If 1<t<2, then all inequalities render non-zero integrand for t−1<s<1 and the result is
Z +∞
Z 1
Θ ((t − s)(1 − t + s)) Θ (s(1 − s)) ds =
1 · 1ds = 1 − (t − 1) = 2 − t.
−∞
t−1
The probability density can be written either by using the Heaviside function or by specifying


t,
pψ1 (t) = Θ(t(2 − t)) (1 − |t − 1|) = 2 − t,


0,
separate cases as follows:
0 < t < 1,
1 ≤ t < 2,
otherwise.
The random variable for all three summed variables is ψ=ψ1 +ξ3 . The probability density is obtained by the relation
Z +∞
pψ (t) =
Θ ((t − s)(1 − t + s)) Θ(s(2 − s)) (1 − |s − 1|) ds.
−∞
In this case, the integral is non-vanishing if: 0<s<2 and t−1<s<t. Both imply that for t<0 or for t>3 the integrand vanishes. For the rest of t, there
are three cases. If 0<t<1, then all inequalities render non-zero integrand for 0<s<t and the result is
Z +∞
Z t
Z t
1
Θ ((t − s)(1 − t + s)) Θ(s(2 − s)) (1 − |s − 1|) ds =
(1 − |s − 1|) ds =
sds = t2 .
2
−∞
0
0
If 1<t<2, then all inequalities render non-zero integrand for t−1<s<t which results in
Z t
Z 1
Z t
Z +∞
1
3
sds+ (2−s)ds =
(1 − |s − 1|) ds =
Θ ((t − s)(1 − t + s)) Θ(s(2−s)) (1 − |s − 1|) ds =
1 − (t − 1)2 + 1 − (2 − t)2 = −t2 +3t− .
2
2
1
t−1
t−1
−∞
Finally, if 2<t<3, then all inequalities provide non-vanishing integrand for t−1<s<2 and it renders
Z +∞
Z 2
Z
Θ ((t − s)(1 − t + s)) Θ(s(2 − s)) (1 − |s − 1|) ds =
(1 − |s − 1|) ds =
−∞
Hence, the probability density of the random variable ψ is given by the relation
1 2
t,

2


−t2 + 3t − 3 ,
2
pψ (t) = 1
2

(3 − t) ,


2
0,
t−1
2
1
(2 − s)ds = (3 − t)2 .
2
t−1
0 < t < 1,
1 ≤ t < 2,
2 ≤ t < 3,
inak,
its graph is shown in Figure 4.3.
0.8
0.6
0.4
0.2
0
-1
0
1
2
3
4
Figure 4.3: The probability density of the random variable ψ from Example 4.14.
The probability theory – Numerical characteristics of random variables
Some properties of random variables can be understood from easily obtainable numerical values.
A total description of a random variable includes its cumulative distribution function. Nevertheless, frequently we need only a piece of information about
it. Such information can be gained from numerical characteristics of random variables which describe basic properties of the random variables and can
be easily calculated and interpreted.
These characteristics of a random variable ξ are initial moments µ0k defined by the relations
µ0k (ξ)
n
X
=
pi xki ,
for a discrete random variable having n (even +∞) values xi with probabilities pi =P(ξ=xi ),
(4.14)
i=1
µ0k (ξ) =
Z
∞
p(t)tk dt,
for a continuous random variable with the probability density p,
−∞
and central moments µk defined by the relations
µk (ξ) =
n
X
pi (xi − µ01 )k ,
for a discrete random variable having n (even +∞) values xi with probabilities pi =P(ξ=xi ),
(4.15)
i=1
Z
∞
p(t)(t −
µk (ξ) =
µ01 )k dt,
for a continuous random variable with the probability density p.
−∞
In both cases, the absolute convergence of the
R ∞ integralk or series (if a discrete random variable has an infinite number of values) is supposed. It means
that e.g. in (4.14), there exists the integral −∞ p(t)|t| dt. Otherwise, the pertinent moment does not exist.
In what follows, we will use the mean value (expectation) Eof a random variable ξ which is the first initial moment E(ξ)=µ01 (ξ) and variance (dispersion)
Dwhich is the second central moment D(ξ)=µ2 (ξ). The formulae for their calculation are also summarized in Table 4.1
Table 4.1: Formulae for mean and variance of random variables.
a discrete random variable ξ
a continuous random variable ξ
+∞
n
R
P
Eξ =
xi p i
Eξ =
xp(x)dx
Eξ
i=1
Dξ
Dξ =
n
P
−∞
(xi − Eξ)2 pi =
i=1
n
P
x2i pi − (Eξ)2
Dξ =
i=1
+∞
R
+∞
R
−∞
−∞
(x − Eξ)2 p(x)dx =
x2 p(x)dx − (Eξ)2
If h(x) is an appropriate function and ξ is a random variable , then η=h(ξ) is also a random variable and holds
Eη =
n
X
g(xi )pi ,
resp.
Z+∞
Eη =
g(x)p(x)dx,
i=1
(4.16)
−∞
again supposing the absolute convergence.
The mean of a random variable is interpreted as an expected value of the random variable, the values from the vicinity of the mean are expected, they
have large probability. √The variance characterizes a level of dispersion of data of a random variables around its mean. In practical calculation, the
standard deviation σ= Dξ is used more frequently than variance, because physical units of the standard deviations are the same as for the mean or for
the values of the random variable itself.
Example 4.15 Consider the dice from Example 4.9 and calculate for its random variable mean and standard deviation.
Solution. We have from the example
2 3, 4},
Ω={ ,
2
1
P( ) = ,
6
3
1
P( ) = ,
2
4
P( ) =
1
3
and the discrete random variable ξ defined naturally
2) = 2,
ξ(
ξ(
3) = 3,
4
ξ( ) = 4.
The mean is
Eξ = 2 ·
1
1
1
19
+3· +4· = ,
6
2
3
6
and the variance gives
2
2
2
19
19
17
19
1
1
1
Dξ = 2 −
· + 3−
· + 4−
· = .
6
6
6
2
6
3
36
√
Thus, the standard deviation is σ=
17
.
6
Example 4.16 Calculate the mean and the standard deviation for random variable from Example 4.12 and Example 4.14.
Solution. First, we have the random variable ξ from Example 4.12 with probability density p(x)=Θ(x(x − 1)). Its required characteristics are
Z 1
Z +∞
1
xdx = ,
xΘ(x(x − 1))dx =
Eξ =
2
0
−∞
√
3
Z 1
Z +∞
1 2
1 1
3
1 2
1
(x − ) dx = 2
.
Dξ =
(x − ) Θ(x(x − 1))dx =
= ,⇒ σ =
2
2
3 2
12
6
0
−∞
In Example 4.14, we calculate the probability density ψ=ξ1 +ξ2 +ξ3 in the form
1 2
x,

2


−x2 + 3x − 3 ,
2
pψ (x) = 1
2

(3 − x) ,


2
0,
0 < x < 1,
1 ≤ x < 2,
.
2 ≤ x < 3,
otherwise,
Hence, the mean can be written as
Z
Eψ =
0
1
1
x x2 dx +
2
Z
1
2
3
x −x + 3x −
2
2
Z
dx +
2
3
1
1
3
3
x (3 − x)2 dx = + 1 + = .
2
8
8
2
The variance is given by the relation
2
2 2
Z 1
Z 2
Z 3
3
3
3 1
1
1
1
1
3 1 2
2
Dψ =
x−
x dx +
x−
−x + 3x −
dx +
x−
(3 − x)2 dx =
+
+
= ,
2 2
2
2
2 2
10 20 10
4
0
1
2
so that the standard deviation reads σ= 12 .
Let us notice that Eψ=Eξ1 +Eξ2 +Eξ3 =3Eξ and also Dψ=Dξ1 +Dξ2 +Dξ3 =3Dξ which are general properties of the mean and the variance, as we will
see below.
Example 4.17 Determine the constant c so that p is the probability density for a random variable ξ. Then, find the cumulative distribution function
for random variable ξ and its mean and variance. Calculate also the median of the probability distribution.

0,
x ≤ a,



c x−a , a < x ≤ v,
v−a
a < v < b.
p(x) =
x−b

c
v
<
x
≤
b,

v−b


0,
x > b,
Solution. First, we determine the constant c having in mind that the integral of p should be one:
Z +∞
Z v
Z b
x−a
x−b
1
1
1
1=
p(x)dx =
c
dx +
c
dx = c (v − a) + c (b − v) = c (b − a),
v−a
v−b
2
2
2
−∞
a
v
2
hence c= b−a
.
The cumulative distribution function Fξ can be find from the probability density (with the correct


0,



R x 2 x−a dx = (x−a)2 ,
v−a
R(b−a)(v−a)
Fξ (x) = Rav b−a
x 2 x−b
(x−b)2
2 x−a

dx + v b−a
dx = 1 − (b−a)(b−v)

v−b
a b−a v−a


1,
c)
x ≤ a,
a < x ≤ v,
v < x ≤ b,
x > b.
The characteristics can be calculated from definition formulae
Z b
Z b
Z v
2(x − a)
2(x − b)
1
dx +
x
dx = · · · = (a + b + v),
Eξ =
x p(x)dx =
x
(b − a)(v − a)
(b − a)(v − b)
3
v
a
a
Z
b
2
2
Z
x p(x)dx − (Eξ) =
Dξ =
a
a
v
Z b
2(x − a)
2(x − b)
1
x
dx +
x2
dx − (a + b + v)2
(b − a)(v − a)
(b − a)(v − b)
9
v
1
1 1 = ··· =
(v + a)2 + (b + v)2 + (b + a)2 − (a + b + v)2 =
(v − a)2 + (b − v)2 + (b − a)2 .
12
9
36
2
The median x0.5 is the argument of the cumulative distribution function whose value is 0.5. Thus e.g.
r
1
1
(x0.5 − a)2
=
⇒ x0.5 = a +
(b − a)(v − a),
(b − a)(v − a)
2
2
which occurs in the case of a<x0.5 ≤v. This happens, if
r
a<a+
1
(b − a)(v − a) ≤ v
2
1
(a + b) ≤ v.
2
⇒
Otherwise, the other option of the cumulative distribution function has to be used
1
(x − b)2
=
1−
(b − a)(b − v)
2
r
⇒
x0.5 = b −
1
(b − a)(b − v).
2
Summing it up, we obtain
x0.5

q
a + 1 (b − a)(v − a),
q2
=
b − 1 (b − a)(b − v),
2
1
(a
2
+ b) ≤ v,
1
(a
2
+ b) > v.
The mean or the variance can be also calculated without the definition formula, only from known properties of these characteristics.
Let ξ be an arbitrary random variable with Eξ=m, Dξ=σ 2 , σ>0 and let a, b be real numbers. Then hold
E(aξ + b) = am + b,
D(aξ + b) = a2 σ 2 .
(4.17)
A direct consequence of these relations for b=− m
and a= σ1 is that the random variable η= ξ−m
has a zero mean and a unit variance: Eη=0, Dη=1.
σ
σ
Such transformation we will use in statistics, the transformed random variable is called normalized (or standard).
The following properties are related to addition or multiplication of random variables (see also Example 4.16). Let ξ1 , ξ2 , . . . ξn be independent random
variables with finite means and variances. Then
E (ξ1 + ξ2 + · · · + ξn ) = Eξ1 + Eξ2 + · · · + Eξn ,
E (ξ1 · ξ2 · · · · · ξn ) = Eξ1 · Eξ2 · · · · · Eξn ,
D (ξ1 + ξ2 + · · · + ξn ) = Dξ1 + Dξ2 + · · · + Dξn .
(4.18a)
(4.18b)
(4.18c)
It should be noted that the first relation is valid also without the assumption of the random variable independence.
Example 4.18 Derive the relations (4.17) for a continuous random variable.
Solution. The equation (4.16) for the function h(x)=ax + b renders
Z
Z +∞
Z +∞
(at + b)p(t)dt = a
h(t)p(t)dt =
E(aξ + b) = E(h(ξ)) =
Z
+∞
p(t)dt = aEξ + b = am + b.
tp(t)dt + b
−∞
−∞
−∞
−∞
+∞
Similarly
2
Z
+∞
2
(at + b − E(aξ + b)) p(t)dt = a
D(aξ + b) = E((aξ + b − E(aξ + b)) ) =
−∞
2
Z
+∞
(t − Eξ)2 p(t)dt = a2 Dξ = a2 σ 2 .
−∞
Example 4.19 Let ξ1 , ξ2 , . . . ξn are independent random variables such that Eξi =m, Dξi =σ 2 and ξ¯ =
ξ1 +ξ2 +···+ξn
.
n
¯
Find Eξ¯ and Dξ.
Solution. The properties of the mean and the variance provide
Eξ¯ =
1
1
E(ξ1 + ξ2 + · · · + ξn ) = nm = m,
n
n
Dξ¯ =
1
σ2
1
2
D(ξ
+
ξ
+
·
·
·
+
ξ
)
=
nσ
=
.
1
2
n
n2
n2
n
The ‘average’ random variable has the same mean as its components, but its variance is n-times smaller. Then, the standard deviation is
smaller.
√
n-times
There exist also numerical characteristics which provide an information about the relation between random variables. If ξ and η are two random variables
, the strength of their mutual linear relationship is evaluated by the correlation coefficient %ξη which is defined by the relation
%ξη =
E ((ξ − Eξ)(η − Eη))
E (ξη) − EξEη
√
√
=
.
DξDη
DξDη
(4.19)
Such a relation between random variables is called correlative dependence, the random variables are correlated. Some properties of the correlation
coefficient are:
A4. |%ξη | ≤ 1
B4. %ξξ = 1
C4. If ψ=aη+b, a6=0, then %ξψ = sgn a%ξη . It means than a linear transformation of random variables does not change the absolute value of the
correlation coefficient (sgn a is the ‘sign’ of a: 1 for positive a, −1 for negative a).
Example 4.20 Show the property A4.
Solution. The definition (4.19) and the property (4.17) render
%ξψ =
aE ((ξ − Eξ)(η − Eη)))
a
E ((ξ − Eξ)(aη + b − E(aη + b)))
p
p
=
= √ %ξη = sgn a %ξη
DξD(aη + b)
a2
Dξa2 Dη
The strength of the correlation between random variables is expressed by the absolute value of %ξη . The farther from zero, the stronger correlative
dependence and |%ξη |=1 corresponds according to the properties B4 and C4 to exactly linear relation between ξ and η. If %ξη =0, then the random
variables are called correlatively independent.
Dependence and correlative dependence are two distinct notions. Nevertheless, the definition of the correlation coefficient (the last expression in (4.19))
and the property (4.18b) valid for independent random variables imply that two independent random variables are also correlatively independent.
The calculation of the correlation coefficient requires to know the probability distribution of the random vector (ξ, η) by the joint cumulative distribution
function (4.10), which of course provides also probability distributions of the random variables ξ and η according to the relation (4.13) with h(ξ, η)=ξ
or h(ξ, η)=η.
Example 4.21 Let
3
p(x, y) = Θ(x(1 − x))Θ(y(1 − y)) (1 − |x − y|) .
2
be the joint cumulative distribution function for a random vector (ξ, η). Calculate the correlation coefficient for ξ and η and find the probability density
for each ξ and η.
Solution. The relation (4.13) provides the probability density pξ in the form
Z x
Z +∞
Z
Z 1
3 1
3
3 1
2
pξ (x) =
p(x, y)dy =
(1 − |x − y|) dy =
(1 − y − x) dy +
(1 − x − y) dy =
+x−x
2 0
2 0
2 2
−∞
x
for x∈(0; 1), as pξ is non-vanishing only at this interval. The probability density for η is the same, because p(x, y) is symmetric when x and y are
interchanged.
The mean and the variance of the random variable ξ are
Z 1 3 1
1
2
Eξ =
+ x − x xdx = ,
2
2
0 2
Z
Dξ =
0
1
3
2
1
+ x − x2
2
2
1
3
x−
dx = .
2
40
The mean of the product can be obtained from the relations (4.13) and (4.16) as follows:
Z Z
Z +∞ Z +∞
11
3 1 1
xy (1 − |x − y|) dxdy = .
xy p(x, y)dxdy =
E(ξη) =
2 0 0
40
−∞
−∞
Thus, the correlation coefficient is
%ξη
11
−
E (ξη) − EξEη
√
q
= 40
=
3
DξDη
11
22
3
40 40
1
= .
3
The random variables are correlated, thus they are also dependent.
The probability theory – Probability distributions of random variables
Practical results affected by a chance are usually divided into special groups according to the nature of the random influences.
When random events in practice are observed, typical probability distributions of pertinent random variables are usually encountered. It is good to learn
at least some basic types of such random variables.
Standard examples of random variables with discrete probability distributions are the alternative distribution A(p), the binomial distribution B(n, p) and
the Poisson distribution P(λ). The parameters of these distributions which distinguish a characteristic of the distribution are written in parentheses.
Descriptions of the random variables with such probability distributions are included in Tab. 4.2. To express that a random variable ξ has a probability
distribution of a type G, we will use the notation ξ∼G.
The most simple random variable receives only two values, such random variable is called to have the alternative probability distribution and its values
are usually 1 and 0. The characteristics of the random variable can be calculated as in Example 4.15. Nevertheless, more frequently a repetition of such
a simple event occurs and the random variable under consideration counts occurrences of one of the possible results. The count is a random variable η
with the binomial probability distribution and it can be understood as a sum of n independent random variables ξi with the same alternative probability
distribution (with the values 1 and 0), hence
η∼B(n, p),
η = ξ1 + ξ2 + · · · + ξn ,
ξ∼A(p)
(4.20)
This property causes the values of the mean and the variance of binomial random variables which can be derived using the relations in (4.18a) and (4.18c)
from the alternative probability distribution.
Table 4.2: Description of basic discrete probability distributions.
Alternative
Binomial
Formula
Parameters
Characteristics
Usage
p ∈ (0; 1)
P(ξ=1) = p, P(ξ=0) = 1−p
Eξ = p,
Dξ = p(1 − p)
p ∈ (0;1), n ≥ 1
n k
P(ξ=k) =
p (1−p)n−k
k
k = 0, 1, . . . n
Eξ = np,
λ>0
λk −λ
P(ξ=k) =
e
k!
k = 0, 1, . . .
Dξ = np(1 − p)
For n independent equal attempts,
If ξ has just two values: the event A ei- where the event A occurs with probther occurs (wit probability p) or does ability p, ξ determines the number of
not occur (with probability 1−p)
occurrences of the event A in these n
attempts
0.15
n = 50, p =
0.12
Probability
graphs
Poisson’s
Eξ = λ,
For observing occurrences of the event
A during a determined period of time
(number of services, malfunctions,
...)
0.15
1
5
Dξ = λ
λt = 10
0.12
0.09
0.09
0.06
0.06
0.03
0.03
0
0
0
10
0.25
20
30
n = 15, p =
40
50
0
5
10
15
20
25
20
25
0.2
2
3
λt = 5
0.2
0.15
0.15
0.1
0.1
0.05
0.05
0
0
0
3
6
9
12
15
0
5
10
15
Example 4.22 Consider several standard dice. How many there should be for the probability of at least three
1 in dicing were at least 0.5?
1
Solution. Let us denote the unknown number of dice N . For one dice, the probability of
equals 16 . From such a point of view, the dice has the
alternative probability distribution : ξ( )=1, ξ(¬ )=0 and p= 16 . The count of
when throwing N dice is a random variable with the probability
distribution B(N, p). Our task is to calculate N so that the probability P(η≥3) were more than a half. Thus
1
1
1
N N −1 2 N −2
N
1
N
1
N
5
5
5
1
−
−
≤ P(η ≥ 3) = 1 − P(η < 3) = 1 − P(η = 0) − P(η = 1) − P(η = 2) = 1 −
1
2
2
0
6
6
6
6
6
N 5
1
1
=1−
1 + N + N (N − 1) ,
6
5
50
which implies the inequality
1
2
N
1
6
9
≥ 1 + N + N 2.
5
50
50
As long as N is a whole number, we can seek N by a trial-and-error way until the relation is satisfied. The result is N ≥16. Thus, we need to throw at
least 16 dice for the probability of having at least three
is greater than that of less number of .
1
1
The Poisson probability distribution observes the occurrences of an event in a specified period of time. If the length of the time intervale is one, t=1, and
the number of expected events is λ·1 (according to Tab. 4.2 Eξ=λ), then for an interval of a general length t the mean is Eξt =λ t and the parameter of
the probability distribution equals to λ t. Both Poisson and binomial probability distributions count the occurrences of a specified event, so it is expected
that they can be replaced by each other. For a large number n of attempts in the binomial probability distribution, the number n can be limited to +∞,
knowing the expected count of occurrences, i.e. limn→+∞ np(n)=λ. Then, the binomial probability distribution can be replaced by the Poisson one with
the parameter λ.
Example 4.23 Prove that for a large number n of attempts in the binomial probability distribution, it can be replaced by the Poisson probability
distribution.
Solution. For ξ∼B(n, p)n, we denote np=λ and we substitute it into the formula from Tab. 4.2 for p. We obtain
k
λ
n k
n
λ
n−k
P(ξ=k) =
p (1−p)
=
(1− )n−k .
n
n
k
k
Now, we calculate the limit for n→+∞
k n−k
n −k
n
λ
λ
n(n − 1)(n − 2) . . . (n − k + 1) λk
λ
λ
lim P(ξ=k) = lim
1−
= lim
1−
1−
n→+∞
n→+∞ k
n→+∞
n
n
k!
nk
n
n
"
−k # k n
λ
λ
λ
λk −λ
n(n − 1)(n − 2) . . . (n − k + 1)
e ,
1−
1−
=
1
= lim
n→+∞
nk
n
k!
n
k!
which is according to Tab. 4.2 P(ξ=k) for the Poisson probability distribution.
Example 4.24 Derive the formula for the mean of the Poisson distribution.
Solution. The mean requires to know the Taylor series of the function y=ex which is
+∞
X xj
1
1
1
ex = 1 + x + x2 + x3 + · · · + xk + · · · =
.
2
6
k!
j!
j=0
The mean of a random variable ξ is calculated from the relation in Tab. 4.1 using the formula for simple probabilities of the Poisson probability
distribution
X
+∞ +∞ j
+∞ k X
X
λj−1 −λ
λ −λ
λ
−λ
e
e
=
λ
= λe
= λe−λ eλ = λ.
Eξ =
j
j!
(j − 1)!
k!
j=1
j=0
k=0
Example 4.25 The average number of trains which pass over a bridge is 5. How many trains should be considered in constructing a new bridge for
the probability that there are more trains passing the bridge were at most 0.01?
Solution. Let ξ be a random variable expressing the number of passing trains (during an hour). According to the assumption, Eξ=5 so the parameter λ
is also five. We have to find for which k still holds P(ξ>k)≥0.01, equivalently written as P(ξ≤k)≤0.99. Thence, we obtain
1 2 1 3
1 k
−5
0.99≥P(ξ≤k) = e
1 + 5 + 5 + 5 + ··· + 5 .
2
6
k!
.
As 0.99e5 =146.9 and the sum in the parentheses is for k=10 approximately 146.4 and for k=11 roughly 147.6, the bridge has to be designed for ten
trains per hour.
Table 4.3: Description of basic continuous probability distributions.
Uniform
Exponential
Probability density
Parameters
Cumulative
distribution function
2
p(x) = Θ( w4 − (x − m)2 ) w1
m ∈ R, w > 0
F (x) =
Characteristics
Usage
1
2
1−
Eξ = m,
|x−b|−|x−a|
b−a
Dξ =
(x−m)2
1
p(x) = √2πσ
e− 2σ2
m ∈ R, σ 2 (σ > 0)
p(x) = Θ(x)δe−δx
δ>0
F (x) = Θ(x) 1 − e−δx
F (x) =
w2
12
Eξ = 1δ ,
Dξ =
1
δ2
√ 1 e−
2πσ
(t−m)2
2σ 2
Expresses a time between two occurrences of the observed event (life time,
service time, . . . )
Expresses measurement error, a characteristic of a population (height of
people,. . . ); the most frequent probability distribution in mathematical
statistics.
0.8
m=0
w = 0.5
1.5
0.6
σ = 0,5
1.5
1
0.5
Probability density
graphs
dt = Φ( x−m
)
σ
Dξ = σ 2
Eξ = m,
2
2
Rx
−∞
A random number generator
m=0
Normal (Gauss)
w=1
1
w=2
0.5
0
0.4
δ=2
δ=1
δ = 0,5
σ=2
0
-2
-1
0
1
2
σ=1
0.2
0
0
2
4
6
8
-4
-2
0
2
4
0.8
w=1
σ=1
2
0.6
1.5
0.4
m = −1 m = 0 m = 1
1
m = −3 m = 0 m = 2
0.2
0.5
0
0
-2
-1
0
1
2
-4
-2
0
2
4
There are also three basic types of random variables with continuous distribution which we discuss: the uniform distribution U(m, w), the exponential
distribution E(δ) and the normal (Gauss) distribution N (m, σ 2 ). The parentheses include the parameters of each particular probability distribution. The
basic description of random variables with these probability distributions is summarized in Tab. 4.3.
The random variables ξ with the uniform probability distribution have the property that for all intervals I of the same length, taken from the interval
hm− w2 ; m+ w2 i, the probability P(ξ∈I) is also the same. We have already met this type of probability distribution in Example 4.10, Example
4.12, Example 4.14 and Example 4.16. It is probably the simplest type of a continuous probability distribution.
The Poisson probability distribution discussed afore observes the number of occurrences of an event during a time span. Thus, the time between two
occurrences of the event is also random. The random variable which measures this time is considered to have the exponential probability distribution.
If ξ∼P(()λ) and if η expresses the time between two occurrences of the event, then for any t holds
∗
Fη (t) = P(η < t) = 1 − P(η > t) = 1 − P(ξt = 0) = 1 − e−λt
(4.21)
which proves that random variable η has the exponential probability distribution with the parameter δ=λ. The relation denoted by ∗ could be used
because there was no event during the time t. The exponential probability distribution is also interesting for its property of being independent of its past
as we intend to show in the next example.
Example 4.26 Let the service time is expressed by a random variable η with the exponential probability distribution and let the service has not been
terminated during the time t. Then the probability that it will not have terminated by the following time τ is equal to the probability that it does not
terminate during the time τ . Prove it!
Solution. First, sum it up. The service did not terminate during the time t, the probability of such an event is P(η > t). Having this assumption,
we want to calculate the probability that it will not have terminated by next time τ . Hence, the total time of non-termination is greater than t+τ .
Considering the condition, the conditional probability has to be calculated from its definition (4.2) and utilizing the cumulative distribution function of
the exponential probability distribution from Tab. 4.3. It renders
1 − 1 − e−δ(t+τ )
P(η > t + τ )
P({η > t + τ } ∩ {η > t})
=
=
= e−δτ = P(η > τ ),
P(η > t + τ /η > t) =
P(η > t)
P(η > t)
1 − (1 − e−δt )
which is the probability that the service does not terminate within the time τ . And this was to prove.
Example 4.27 There appeared an announcement in a hypermarket which had stated: If some cash desks are out of order and you wait longer than
five minutes, you receive a euro from us. Let us suppose that not all cash desks are in order and that a customer waits a hundred seconds in average.
Which is the probability that a customer receives the promised one euro?
Solution. We suppose that the service time has the exponential probability distribution with the parameter δ. As the average service time is determined
by the mean of a random variable ξ with the exponential probability distribution and Eξ= 1δ , the parameter δ can be obtained as
Eξ =
1
100
=
δ
60
(expressed in minutes)
⇒
3
δ= .
5
The required probability can be calculated using the cumulative distribution function of the exponential probability distribution
.
P(ξ > 5) = 1 − P(ξ ≤ 5) = 1 − Fξ (5) = 1 − 1 − e−5δ = e−3 = 0.0498.
Hance, a customer receives the present with the probability of roughly 5 per cent.
The most important probability distribution is the normal distribution. This is true for the theory and also for applications and its utilization in the
mathematical statistics. The cumulative distribution function is not expressed by an elementary function, see also Tab. 4.3, so it is usually expressed
only for a special form of the normal distribution N (0, 1), the standard normal distribution and then transformed. This standard function is denoted Φ
and reads
Z x
t2
1
(4.22)
e− 2 dt.
Φ(x) = √
2π −∞
The values of the function Φ are written in Tab. 4.4. The probability distribution is symmetric, i.e. the probability density is an even function, therefore
the relation Φ(x)=1−Φ(−x) holds and the table then includes only values of Φ for positive x. The normal probability distribution has a lot of interesting
and important properties, some of them are:.
A5. Let ξ1 ∼N (m1 , σ12 ), ξ2 ∼N (m2 , σ22 ),. . . ,ξn ∼N (mn , σn2 ) be independent and let η=c + c1 ξ1 + c2 ξ2 + · · · + cn ξn , with c, c1 , c2 , . . . cn being non-zero
real constants. Then
(4.23)
η ∼ N c + c1 m1 + c2 m2 + · · · + cn mn , c21 σ12 + c22 σ22 + · · · + c22 σ22 ,
which can be remembered as ‘the sum of the normal is normal’.
B5. If a random vector (ξ1 , ξ2 ) has the joint normal distribution determined by the probability distribution
p(x1 , x2 ) =
2πσ1 σ2
1
p
1 − %2
e
−
1
1−%2
(x1 −m1 )2
(x −m )2
(x −m )(x −m2 )
+ 2 2 2 −% 1 σ1 σ 2
2
2σ1
2σ2
1 2
,
(4.24)
where ξ1 ∼N (m1 , σ12 ), ξ2 ∼N (m2 , σ22 ) and % is the correlation coefficient between ξ1 and ξ2 . Then the random variable ξ1 and ξ2 are independent
if and only if they are correlatively independent (uncorrelated), i.e. %=0.
Φ
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
2.9
3.0
3.1
3.2
3.3
3.4
3.5
3.6
3.7
3.8
3.9
0
0.5000
0.5398
0.5793
0.6179
0.6554
0.6915
0.7257
0.7580
0.7881
0.8159
0.8413
0.8643
0.8849
0.9032
0.9192
0.9332
0.9452
0.9554
0.9641
0.9713
0.9772
0.9821
0.9861
0.9893
0.9918
0.9938
0.9953
0.9965
0.9974
0.9981
0.9987
0.9990
0.9993
0.9995
0.9997
0.9998
0.9998
0.9999
0.9999
1.0000
1
0.5040
0.5438
0.5832
0.6217
0.6591
0.6950
0.7291
0.7611
0.7910
0.8186
0.8438
0.8665
0.8869
0.9049
0.9207
0.9345
0.9463
0.9564
0.9649
0.9719
0.9778
0.9826
0.9864
0.9896
0.9920
0.9940
0.9955
0.9966
0.9975
0.9982
0.9987
0.9991
0.9993
0.9995
0.9997
0.9998
0.9998
0.9999
0.9999
1.0000
Table
2
0.5080
0.5478
0.5871
0.6255
0.6628
0.6985
0.7324
0.7642
0.7939
0.8212
0.8461
0.8686
0.8888
0.9066
0.9222
0.9357
0.9474
0.9573
0.9656
0.9726
0.9783
0.9830
0.9868
0.9898
0.9922
0.9941
0.9956
0.9967
0.9976
0.9982
0.9987
0.9991
0.9994
0.9995
0.9997
0.9998
0.9999
0.9999
0.9999
1.0000
4.4: Table of values of the
3
4
5
0.5120 0.5160 0.5199
0.5517 0.5557 0.5596
0.5910 0.5948 0.5987
0.6293 0.6331 0.6368
0.6664 0.6700 0.6736
0.7019 0.7054 0.7088
0.7357 0.7389 0.7422
0.7673 0.7704 0.7734
0.7967 0.7995 0.8023
0.8238 0.8264 0.8289
0.8485 0.8508 0.8531
0.8708 0.8729 0.8749
0.8907 0.8925 0.8944
0.9082 0.9099 0.9115
0.9236 0.9251 0.9265
0.9370 0.9382 0.9394
0.9484 0.9495 0.9505
0.9582 0.9591 0.9599
0.9664 0.9671 0.9678
0.9732 0.9738 0.9744
0.9788 0.9793 0.9798
0.9834 0.9838 0.9842
0.9871 0.9875 0.9878
0.9901 0.9904 0.9906
0.9925 0.9927 0.9929
0.9943 0.9945 0.9946
0.9957 0.9959 0.9960
0.9968 0.9969 0.9970
0.9977 0.9977 0.9978
0.9983 0.9984 0.9984
0.9988 0.9988 0.9989
0.9991 0.9992 0.9992
0.9994 0.9994 0.9994
0.9996 0.9996 0.9996
0.9997 0.9997 0.9997
0.9998 0.9998 0.9998
0.9999 0.9999 0.9999
0.9999 0.9999 0.9999
0.9999 0.9999 0.9999
1.0000 1.0000 1.0000
function Φ.
6
7
0.5239 0.5279
0.5636 0.5675
0.6026 0.6064
0.6406 0.6443
0.6772 0.6808
0.7123 0.7157
0.7454 0.7486
0.7764 0.7794
0.8051 0.8078
0.8315 0.8340
0.8554 0.8577
0.8770 0.8790
0.8962 0.8980
0.9131 0.9147
0.9279 0.9292
0.9406 0.9418
0.9515 0.9525
0.9608 0.9616
0.9686 0.9693
0.9750 0.9756
0.9803 0.9808
0.9846 0.9850
0.9881 0.9884
0.9909 0.9911
0.9931 0.9932
0.9948 0.9949
0.9961 0.9962
0.9971 0.9972
0.9979 0.9979
0.9985 0.9985
0.9989 0.9989
0.9992 0.9992
0.9994 0.9995
0.9996 0.9996
0.9997 0.9997
0.9998 0.9998
0.9999 0.9999
0.9999 0.9999
0.9999 0.9999
1.0000 1.0000
8
0.5319
0.5714
0.6103
0.6480
0.6844
0.7190
0.7517
0.7823
0.8106
0.8365
0.8599
0.8810
0.8997
0.9162
0.9306
0.9429
0.9535
0.9625
0.9699
0.9761
0.9812
0.9854
0.9887
0.9913
0.9934
0.9951
0.9963
0.9973
0.9980
0.9986
0.9990
0.9993
0.9995
0.9996
0.9997
0.9998
0.9999
0.9999
0.9999
1.0000
9
0.5359
0.5753
0.6141
0.6517
0.6879
0.7224
0.7549
0.7852
0.8133
0.8389
0.8621
0.8830
0.9015
0.9177
0.9319
0.9441
0.9545
0.9633
0.9706
0.9767
0.9817
0.9857
0.9890
0.9916
0.9936
0.9952
0.9964
0.9974
0.9981
0.9986
0.9990
0.9993
0.9995
0.9997
0.9998
0.9998
0.9999
0.9999
0.9999
1.0000
n
.
C5. If ξ1 , ξ2 ,. . . ,ξn ,. . . is a sequence of independent random variables with the same means and variances: Eξi =m, Dξi =σ 2 and if ξ¯(n) = ξ1 +ξ2 +···+ξ
n
Then
ξ¯(n) − Eξ¯(n)
lim Fη(n) (x) = Φ(x),
where η (n) = p
.
(4.25)
n→+∞
Dξ¯(n)
This proposition is called the central limit theorem and it plays a key role e.g. in many statistical calculations. Basically, it claims that averaged
random variables after normalization are very close to a random variable with the standard normal distribution.
Example 4.28 Show that the parameters of the normal probability distribution determine its mean and variance.
Solution. First, we look at the mean
x−m
t
=
σ
Z +∞
Z +∞
Z +∞
1
2
(x−m)
1
1 − t2
1 − t2
dt = σ dx
∗
−
2
2 dt = 2m
√
√
x√
Eξ =
e 2σ dx = (σt
+
m)
e
e 2 dt
=
2πσ
2π
2π
−∞
0
−∞
x → −∞ : t → −∞
x → +∞ : t → +∞
√
s = t2 , t = 2s 2
Z +∞
1
1
1
dt = √12s ds
−s
√ e ds = 2m √ Γ
=
= 2m
= m,
2
2 πs
2 π
t=0:s=0
0
t → +∞ : s → +∞
where in the
√ equation marked by ∗, we used evenness or oddness of the integrated functions and in the last expression we used a value of the function
1
Γ: Γ 2 = π, see below Example 4.33.
Similarly, the variance can by calculated (the same substitutions as above)
Z +∞
Z +∞
Z +∞
Z +∞
(x−m)2
1
1 − t2
1 − t2
1
1
3
−
2
2
2
2
2
2
−s
2
2
Dξ =
(x − m) √
σ t √ e 2 dt = 2σ
t √ e 2 dt = 2σ
2s √ e ds = 2σ √ Γ
e 2σ dx =
= σ2,
2
2
π
s
π
2πσ
2π
2π
−∞
−∞
0
0
√
where we again used a value of the function Γ: Γ 23 = 12 Γ 21 = 12 π.
Example 4.29 Show that for a random variable ξ∼N (m, σ 2 ) and arbitrary real constants a, b (a6=0) holds η=aξ+b ∼ N (am + b, a2 σ 2 ).
Solution. The value of the mean and of the variance can be calculated by using the property (4.17). The problem is normality of η. The cumulative
distribution function of η renders
Fη (x) = P(η < x) = P(aξ+b < x).
In dividing by a, we have to distinguish between positive and negative a. If a>0, then
x−b
x−b
= Fξ
=Φ
P(aξ+b < x) = P ξ <
a
a
x−b
a
−m
σ
!
=Φ
x − am − b
aσ
and the last expression confirms, according to Tab. 4.3, that η has a normal probability distribution. It even shows its mean and variance, of course.
If a<0, then
x−b
x−b
x−b
=1−P ξ <
= 1 − Fξ
=1−Φ
P(aξ+b < x) = P ξ >
a
a
a
x−b
a
−m
σ
!
=1−Φ
x − am − b
aσ
∗
=Φ
x − am − b
−aσ
and the last expression again confirms that η has a normal probability distribution, realizing symmetry, Φ(x)=1−Φ(−x) (used in the relation denoted
by ∗) and positiveness of the denominator −aσ.
Example 4.30 Show that for a pair of independent random variables ξ1 ∼N (m1 , σ12 ) and ξ2 ∼N (m2 , σ22 ) holds η=ξ1 +ξ2 ∼ N (m1 + m2 , σ12 + σ22 ).
Solution. The reasoning for the mean and for the variance of the random variable η respectively uses the relations (4.18a) and (4.18c). To show that it
has a normal probability distribution is a bit harder.
The result of Example 4.13 provides that
Z
+∞
Z
+∞
pξ1 (t)pξ2 (x − t)dt =
pη (x) =
−∞
−∞
t − m1 = s, µ = x − m1 − m2 Z +∞
2
(t−m1 )2
(x−t−m2 )2
(µ−s)2
− 12 s 2 +
−
−
1
1
1
dt
=
ds
2
2
2
σ1
σ2
2σ2
=
√
e 2σ1 √
e
dt = e
ds.
t
→
−∞
:
s
→
−∞
2πσ
σ
2πσ1
2πσ2
1
2
−∞
t → +∞ : s → +∞
We modify the exponent to a sum of squares so that s appears only in one addend. We obtain
2
s2
(µ − s)2
1
1
µ2
σ12 + σ22
µ
µσ12
µ2
2
+
=
+
s
−
2
s
+
=
s
−
+
.
σ12
σ22
σ12 σ22
σ22
σ22
σ12 σ22
σ12 + σ22
σ12 + σ22
The substitution to the last integral renders
Z
+∞
pη (x) =
−∞
1
e
2πσ1 σ2
−
2 +σ 2
σ1
2
2 σ2
2σ1
2
s−
2
µσ1
2 +σ 2
σ1
2
2
−
µ2
2 +σ 2
2 σ1
2
(
1
−
µ2
2 +σ 2
2 σ1
2
) ds = p
e (
2π (σ12 + σ22 )
)





Z +∞ 

−∞

−

1
q
e
σ12 σ22


2π

σ12 +σ22


1
µσ 2
s− 2 1 2
σ1 +σ2
σ2 σ2
2 21 22
σ1 +σ2
!2 






ds





−
µ2
2 +σ 2
2 σ1
2
=p
e (
2π (σ12 + σ22 )
1
−
) =p
e
2π (σ12 + σ22 )
µσ 2
(x−m1 −m2 )2
2 +σ 2
2 σ1
2
(
) ,
σ2 σ2
because in the last integral the probability density of the normal probability distribution with the mean σ2 +σ1 2 and with the variance σ21+σ22 is integrated
1
2
1
2
and such integral is always equal to one. The resulting function is the probability density of the normal probability distribution with required parameters.
Example 4.31 For ξ∼N (m, σ 2 ), derive the rule of ‘three sigma’: P(|ξ − m| > 3σ)<0.003.
Solution. The distribution of η= ξ−m
is normalized so that
σ
∗
P(|ξ − m| < 3σ) = P(−3 < η < 3) = Φ(3) − Φ(−3) = 2Φ(3) − 1 = 2 · 0.9987 − 1 = 0.9974 > 0.997,
and it renders
P(|ξ − m| > 3σ) = 1 − P(|ξ − m| < 3σ) < 1 − 0.997 = 0.003.
The probability that the value of a random variable with the normal probability distribution falls out of the interval spanning 3σ around the mean of the
random variable is negligibly small.
Example 4.32 Using Tab. 4.4, find the probability P(1.3 < ξ < 2.7), where ξ∼N (1, 3).
√ . Then
Solution. First, the random variable ξ is normalized: η= ξ−1
3
P(1.3 < ξ < 2.7) = P
1.3 − 1
ξ−1
2.7 − 1
√
< √ < √
3
3
3
= P(0.173 < η < 0.981) = Φ(0.981) − Φ(0.173).
The required values of the function Φ can be interpolated from the values in the table
Φ(0.981) = Φ(0.98) +
Φ(0.99) − Φ(0.98)
(0.981 − 0.98) = 0.8365 + (0.8389 − 0.8365) · 0.1 = 0.8367,
0.99 − 0.98
Φ(0.173) = Φ(0.17) +
Φ(0.18) − Φ(0.17)
(0.173 − 0.17) = 0.5675 + (0.5714 − 0.5675) · 0.3 = 0.5687
0.18 − 0.17
and we obtain
P(1.3 < ξ < 2.7) = Φ(0.981) − Φ(0.173) = 0.8367 − 0.5687 = 0.2680.
The probability theory – The function Γ
There are function which ar not elementary for us, but they are frequently used in advanced mathematics.
The Euler function Γ y=Γ(x) is defined by an improper integral. The definition relation is
Z +∞
tx−1 e−t dt,
for
x > 0.
Γ(x) =
(4.26)
0
The definition of the function introduces the function domain only by positive numbers, in order the integral to exist. Nevertheless, the function Γ can
be defined for all complex numbers with an exception of negative whole numbers.
Example 4.33 Calculate the values Γ(1) and Γ
Solution. We use the definition
Z
Γ(1) =
1
2
.
+∞
1−1 −t
t
0
Z
e dt =
0
+∞
e−t dt = lim
c→+∞
−t c
−e 0 = lim 1 − e−c = 1,
c→+∞
so Γ(1)=1.
For the other value, we use a substitution first
t = s2 , s ≥ 0
Z +∞
Z +∞ −t
Z +∞
1
1
e
2
dt
=
2sds
−1
−t
=2
√ dt = Γ
=
t 2 e dt =
e−s ds.
t
=
0
⇒
s
=
0
2
t
0
0
0
t → +∞ ⇒ s → +∞
The last integral can be calculated by means of polar coordinates as follows:
s = ρ cos ϕ
Z +∞
2 Z +∞
Z +∞
ZZ
p = ρ sin ϕ
−s2
−s2
−p2
−s2 −p2
e ds =
e ds ·
e dp =
e
dsdp = J =ρ
0
0
0
h0+∞)×h0+∞)
0 ≤ ρ, 0 ≤ ϕ ≤
2
ρ
=
u,
s
≥
0
Z π Z +∞
2
2
2ρdρ
=
du
−ρ
=
e ρdρdϕ = ρ=0⇒u=0
0
0
π
ρ → +∞ ⇒ u → +∞
2
Z
c
π
π
π +∞ 1 −u
e du =
lim −e−u 0 = .
=
2 0
2
4 c→+∞
4
The result and the previous formula render
r
Z +∞
1
π √
2
Γ
e−s ds = 2
=2
= π.
2
4
0
Important values of the function Γ can be calculated by many known formulae, we will need especially the following:
1 1 · 3 · 5 · · · (2n − 1) √
π,
Γ(x + 1) = xΓ(x),
(in particular for natural n) Γ(n + 1) = n!, Γ n +
2
2n
π
sin πx
The first formula makes us to think about the function Γ as a generalization of factorial (n!) for non-whole numbers.
Γ(x)Γ(1 − x) =
(4.27)
(4.28)
It also may be useful to draw the graph of the function, see Fig. 4.4(left). For x close to zero and for large x, too, the values of the function Γ tend to
infinity. The approximate values of the function Γ close to this limit points can be estimated by the relations
1
1
−γ+
6γ 2 + π 2 x, x → 0,
x
12
(4.29)
x x r 2π 1
Γ(x) ≈ κ+∞ (x) =
1+
, x → +∞,
e
x
12x
Pn 1
where we used the Euler-Mascheroni constant γ= limn→+∞
k=1 n − ln n ≈0.577215664901533 known also in many other special functions. The
graphs in Fig. 4.4(right) demonstrates the accuracy of the approximation formulae, where the vertical axis indicates relative differences δy= |κ0 (x)−Γ(x)|
Γ(x)
Γ(x) ≈ κ0 (x) =
(x)−Γ(x)|
and δy= |κ+∞Γ(x)
in a logarithmic scale. We can see that roughly from 0.5 up the difference between Γ(x) and κ+∞ (x) is small and in a vicinity of
zero the other estimation by κ0 offers very accurate values of the function Γ.
10−1
κ+∞
κ0
10−2
δy
y
6
5
4
3
2
1
0
10−3
10−4
10−5
10−6
0
0,5
1
1,5
2
2,5
3
3,5
4
0
0,5
1
x
1,5
2
2,5
3
x
Figure 4.4: The function y=Γ(x) and accuracy of its approximations y=κ0 (x) a y=κ+∞ (x).
1. Can you derived the relations (4.1) from the basic properties A1 to C1?
2. Do you demonstrate the properties of the cumulative distribution function A2, B2, C2 which were not proved in Example 4.8?
3. What form does any cumulative distribution function have?
4. According to Example 4.13, try to find a formula for probability density if the random variables are subtracted, multiplied, divided, . . .
5. Are you able to derive a formula similar to that of Example 4.13 valid for discrete random variables?
6. Suggest a way of demonstrating the relations (4.18) and (4.17)?
7. Can you explain the properties of the normal distribution A5, B5, C5 also for the possibilities not proved in the examples?
8. Prove the relations (4.27)!
In the problems 1. to 7., verify, whether the functions F are cumulative distribution functions for some random variables ξ. If so, find probability density
p and calculate the mean and the variance of the random variable ξ.
1. F (x) =
1
2
+
1
π
arctg x .
x
2. F (x) = Θ(x) x+1 .
(
ex ,
x ≤ 0,
3. F (x) =
.
−x
1−e , x>0
α
−( x )
, α>0, β>0 .
6. F (x) = Θ(x) 1 − e β
4. F (x) = Θ(x)x ln x .
5.
x(x+2)
7. F (x) = Θ(x) (x+1)2 .

0,



2x2 ,
F (x) =
1 − 2(1 − x)2 ,



1,
x ≤ 0,
0 < x ≤ 0.5,
0.5 < x ≤ 1,
x>1
.
In the problems 8. to 12., determine the constant c so that the function p is the probability density for a random variable ξ. Calculate also basic
characteristics of the random variable.
8. p(x) = Θ(x)c xe
− x2
2
.
−x
.
10. p(x) = Θ(x)c xe
9. p(x) = Θ(x)c
x
(2+x)6
x
11. p(x) = Θ(x)c x+1 .
.


0,



2


c x ,
p(x) = 2c − c (2 − x)2 ,



12.
c (4 − x)2 ,



0,
x ≤ 0,
0 < x ≤ 1,
1 < x ≤ 3,
3 < x ≤ 4,
x>4
.
In the problems 13. to 18., choose an appropriate probability distribution for the pertinent random variable and calculate the required probabilities.
13. In a classroom, there are 10 computers. The probability that a computer breaks during a school lesson is 0.05. Find the probability that (i) there
is right one broken computer in a lesson and (ii) there is at least one broken computer in a lesson .
14. The average time for the test corrector R.V. to correct one exam test is five minutes. What is the probability that an hour is not sufficient to
correct the exam test for ten students? Determine also the probability that in five minutes he manages two tests .
15. Under the conditions of the previous problem, determine the probability that for a test R.V. needs more than ten minutes, alternatively that he
manages to do it in a minute .
16. Let us suppose that an average height of students is 172cm, and that the population has standard deviation 12cm. What is the ratio of the
students which should be according to probability taller than 190cm? Calculate also how many are 170cm to 180cm tall .
17. An experiment successes with a particular specimen in 80% of cases. Determine how many specimens are required to claim with probability 0.99
that at least two specimens succeed in the experiment .
18. The average number of claims reported to an insurance company is two per day. What is the probability that one day is not reported any insured
event in the insurance company? In addition, calculate the probability that the insurance company does not deal with more than 30 claimed events in a
month .
Conclusion
The chapter helped us to learn those facts from the probability theory which will be useful in statistical calculations. We have learned what a random
variable is and how to use it to describe some random events e.g. a result of an experiment that is affected by a factor of chance. We know the cumulative
distribution function and probability density of random variables. For any random variable, we have also introduced basic numerical characteristics which
determine an expected value of the random variable or a measure of deviation from the expected value. We know a characteristic which describes the
importance of the linear relationship between two random variables. We also identified the basic types of random variables and their characteristics.
We have focused mainly on the properties of the normal probability distribution which seems to be the most important. Finally, basic facts about the
function Γ which often appears in applications of advanced mathematics have been introduced.
References
[1] G. K. Bhattacharyya, R. A. Johnson. Statistical Concepts and Methods. Wiley, New York, 1977
[2] C. Török. Úvod do teórie pravdepodobnosti a matematickej štatistiky. TU Košice, 1992.
[3] M. Kalina et al. Základy pravdepodobnosti a matematickej štatistiky. STU Bratislava, 2010.
Problem solutions
1. yes
2. yes
3. no
4. no
5. yes
6. yes
7. yes
8. c= 41 , Eξ=4, Dξ=8
11. c does not exist
12. c= 41 , Eξ=2, Dξ= 21
13. ξ∼B(10, 0.05), P(ξ(i) =1), P(ξ(ii) ≥1)
1
15. ξ1,2 ∼E( 5 ), P(ξ1 >10), P(ξ2 <1)
16. ξ1,2 ∼N (172, 144) , P(ξ1 >190), P(170<ξ2 <180)
18. ξ1 ∼P(2), P(ξ1 =0), ξ2 ∼P(60), P(ξ2 ≤30)
√
9. c=320, Eξ= 43 , Dξ= 20
10. c=2, Eξ= 2π , Dξ=1− π4
9
14. ξ1 ∼P(12), P(ξ1 <10), ξ2 ∼P(1), P(ξ2 ≥2)
17. ξ∼B(n, 0.8), P(ξ≥2)=0.99
Chapter 5. (Descriptive statistics)
Aim
Introduce to students basic notions of statistics, the ways of recording results of measurement and the source data analysis.
Objectives
1. Understand and learn to use correctly principal phrases of statistics.
2. Effectively write data to statistical tables.
3. Gain a specified information about data from numerical sample characteristics and graphs.
4. Connect sample characteristics to the probability theory.
5. Determine basic probability distributions for random variables used in statistics and understand their characteristics.
Prerequisites
random variable; cumulative distribution function; probability distribution
Chapter 5. (Descriptive statistics) – 1
Introduction
The chapter discusses the basis of mathematical statistics. Mathematical statistics deals with properties of random phenomena based on data obtained
by experiments. The experiment can be a true physical experiment and the results of its measurement, collection of data pertinent to a character or a
property of construction materials, and a survey on customers’ satisfaction etc..
In probability theory, we have shown how to work with random variables and how to describe their properties. Nevertheless, there has remained a question
of identifying the specific context of the pertinent type of the random variable. This is a task of mathematical statistics, but not only this, because
the properties or characteristics of random variables may be unknown and as the very minimum they should be able to be competently estimated and
assessed.
In mathematical statistics, therefore, we will analyze the collected data sets to estimate the type of the cumulative distribution function of a random
variable and we will also try to estimate some of its parameters. For this, we need to do a physical experiment or to collect data in order to properly
register our observations.
First of all, basic terms which we will use for describing such a collection will be summarized and we will show how to write the collected data effectively
and efficiently. Next, it will be specified which sample characteristics we will use in the statistical methods for proper assessment of the subsequent
analysis. And also we will show how to display the data collected by statistical graphs, because visual perception in such an analysis is very useful.
Then, a mathematical analysis of the data follows with respect to their unpredictable character and at this point the theory of probability has to be
applied. Therefore, in the next section of this chapter, the essential relationships of measured data with the theory of probability will be focused. In the
field of civil engineering, the data affected by a chance are often considered with the normal probability distribution so just this probability distribution
and the probability distributions closely related to it will form the basis of our subsequent work in the field of mathematical statistics. Therefore, at the
end of the chapter, the basic facts and properties of the probability distributions are listed, as a rigorous use of statistics in practice and its utilization
and connection to the numeric data analysis is required.
Descriptive statistics – Statistical sample, statistical variables and tables
Before any statistical analysis, we learn basic statistical terminology.
The statistical population is the complete set of all possible statistical units which can be considered in an experiment, in collection etc.
The random choice from the population is a sequence of independent random variables ξ1 , ξ2 , . . . , ξn with the same probability distribution determined
by the cumulative distribution function F .
The statistical sample is understood either as a collection of objects, statistical units, affected by unpredictable influences, i.e. as a set of random variables
obtained from a random choice, or as a collection of data gained from the chosen statistical units, i.e. as the values of the random variables.
Example 5.1 Demonstrate some examples of the notions in practice.
Solution. If we statistically investigate the population of Slovaks, then each Slovak is a statistical unit and all Slovaks form the statistical population.
When the population is tested, we randomly choose some people and we get the statistical sample, which is either a set of the objects (particular
Slovaks), or a set of data which characterize them, e.g. height, eye color, occupation etc.
Similarly, in testing a primer paint in a container, the paint itself is the statistical population and a statistical unit may by a specimen of this paint. In
testing the paint, we take several specimens from the container and we obtain the statistical sample which is a set of objects, i.e. all specimens, and
also we get a set of data about the specimens e.g. density, viscosity, permeability.
A starting point for a statistical analysis are the data pertinent to their typical properties, which represent statistical variables. The data are usually
written into tables. In the present methods we will use two types of tables: the primary table and the frequency table. The primary table summarizes
all data from an statistical sample with n elements, each elements having m different properties, i.e. the statistical variables. If the values (observed
values) of the pertinent random variables ξij , with i=1, . . . , n and j=1, . . . , m, are denoted xji , the result is something like Tab. 5.1. Of course, the
Table 5.1:
ξ1
1 x11
2 x12
..
..
.
.
n
x1n
A primary
ξ2 . . .
x21 . . .
x22 . . .
..
...
.
table.
ξm
xm
1
xm
2
..
.
x2n
xm
n
...
table can be written in a transposed form, if it is more appropriate.
The frequency table is used if the data are divided to categories Ci , as it is shown in a mask for the frequency table in Tab. 5.2. The categories can be
the values of the random variables, if they belong to discrete random variables with a small number of possible values. If a discrete random variable may
have too many or infinite number of possible results or if the variable is pertinent to a continuous random variable, the categories are formed by some
sets of the values, e.g. suitable disjoint intervals. In such a case, each category receives for the analysis a unique value, e.g. the mean of admissible
category values. The table then P
presents the number of values ni pertinent to the category Ci . Sometimes, instead of absolute counts ni , relative
counts νi = nni are used, where n= m
k=1 ni . The frequency table is prepared for having an overview of the measured data: which values are present more
frequently or which are only a few. This helps in having an idea of probability distribution of the random variables pertinent to the statistical population.
Thus, such a table is used in those statistical method, which try to infer just about the probability distribution.
Table 5.2: A frequency table.
Categories C1 C2 . . . Cm
Counts
n1 n2 . . . nm
Example 5.2 The hypermarket made a study of the costumers waiting times. The recorded waiting times is seconds (after reordering them into an
ascending order) are written in Tab. 5.3. Prepare a frequency table of the measure data.
2
44
92
149
7
47
92
165
Table
10 21
48 50
96 103
176 181
5.3: The costumers waiting times.
26 28 32 33 38 40 41 41 44
50 55 67 69 69 74 78 84 85
103 104 111 118 131 133 136 138 143
196 197 206 213 259 275 287 383 440
Solution. The table is a primary one, with just one property – the time. So many data cannot be effectively and clearly written in a row or a column.
The number of categories should be chosen so that the distribution of the values is clearly seen. In our case, the data are divided to time intervals of
the lengths 50s, where a value of a category is chosen as the midpoint of the interval, ni are counts of the results in pertinent categories and νi are their
relative counts, see Tab. 5.4.
Descriptive statistics – Descriptive statistics
There are many data in a measurement and it is necessary to get short and clear information in graphical or numerical forms about the data considered
as a statistical sample.
The sample characteristic (or statistic) T is a function of the random choice with n elements. It can be understood as a function of random variables
ξ1 , ξ2 , . . . , ξn of the random choice, or as a function of the values x1 , x2 , . . . , xn of these random variables performed by the statistical sample. Hence,
the sample characteristic can be interpreted either as a random variable ξ=T (ξ1 , ξ2 , . . . , ξn ), which is important from the point of view of the theory to
fill the probability character of the statistical analysis, or as a value x=T (x1 , x2 , . . . , xn ), which is necessary for practical application and explanation of
the experimental data.
Our sample characteristics will be: the sample mean and the sample variance. Alternatively, the sample standard deviation can be used which is the
square root of the variance. The formulae for characteristic calculation from the values of the sample x1 , x2 , . . . , xn are written in Tab. 5.5, where the
sample standard deviation is Sn−1 . It should be noted that in some calculation also the variance of the population Sn2 is used for the population with n
Table 5.4: The frequency table for the customers waiting times.
Interval
Category C
ni
νi
(0, 50i
(50, 100i
(100, 150i
(150, 200i
(200, 250i
(250, 300i
(300, 350i
(350, 400i
(400, 450i
25
18
0.3462
75
11
0.2115
125
11
0.2115
175
5
0.0962
225
2
0.0385
275
3
0.0577
325
0
0.0
375
1
0.0192
425
1
0.0192
n
1P
(xi − X̄)2 . Having a large n, the difference between both relations is small. Nevertheless, the difference
n i=1
n
becomes significant for small samples, as the ratio between them is n−1
, see Example 5.4 below.
elements, defined be the relation Sn2 =
For the frequency table Tab. 5.2, the sample characteristics are calculated by weighted formulae, where the weights are determined by the counts ni and
the values xi are in fact the (unique) values of the categories Ci , hence
m
1X
X̄ =
n i xi ,
n i=1
m
Sn2
m
X
1X
=
ni (xi − X̄)2 , where n=
ni .
n i=1
i=1
(5.1)
The statistical analysis of the distribution and dispersion of the sample data also uses the sample median, the sample lower quartile and the sample
upper quartile, which are determined from the ordered sample, i. e. in the statistical sample the values x1 , x2 , . . . , xn are arranged in an ascending order
x̃1 ≤x̃2 ≤ · · · ≤x̃n . The characteristics then are defined by those values which satisfy the conditions: there is a quarter of all observations with values
smaller than the lower quartile, there is a half of all observations with values smaller than the median, and there are three quarters of all observations
with values smaller than the upper quartile. These characteristics are also written in Tab. 5.5.
2
Example 5.3 Prove the equality between two formulae for Sn−1
in Tab. 5.5.
Solution. We modify one of the formulae
n
X
n
n
n
n
X
X
X
X
2
2
2
2
(xi − X̄) =
(xi − 2xi X̄ + X̄ ) =
xi − 2X̄
xi + nX̄ =
x2i − nX̄ 2 ,
2
i=1
because
n
P
i=1
xi =nX̄. Multiplying by the factor
i=1
1
,
n−1
i=1
i=1
i=1
we obtain the required relation.
Statistic
Sample mean
Sample variance
Sample lower quartile
Sample median
Sample upper quartile
Table 5.5: Basic sample characteristics.
Notation
Formula
n
1P
X̄
X̄ =
xi
n i=1
n
n
P 2
1 P
1
2
2
2
2
S
Sn−1 =
(xi − X̄) =
x − nX̄
n − 1 i=1
n − 1 i=1 i
n=4k
n=4k+1
n=4k+2
n=4k+3
3x̃k +x̃k+1
x̃k +x̃k+1
x̃k +3x̃k+1
Q1
x̃k+1
4
2
4
n=2k
n=2k+1
x̃k +x̃k+1
M (=Q2 )
|
x̃k+1
2
n=4k
n=4k+1
n=4k+2
n=4k+3
3x̃3k +x̃3k+1
x̃3k+1 +x̃3k+2
3x̃3k+2 +x̃3k+3
Q3
x̃3k+3
4
2
4
Example 5.4 In Example 5.2, Tab. 5.3 refers to the customers waiting times in a hypermarket. Calculate the basic characteristics of the sample.
Solution. We start with the mean:
52
n = 52,
1 X
1
.
(2 + 7 + 10 + · · · + 440) = 111.73.
X̄ =
xi =
52 i=1
52
The dispersion we characterize by the standard deviation, rather than the variance
v
! r
u
52
u1 X
1
.
.
Sn−1 = t
x2i − 52 x̄2 =
(1087654 − 52 · 111.732 ) = 92.73.
51 i=1
51
If this sample were the whole population, we would use the formula for Sn which renders
v
! r
u
52
u1 X
1
.
.
Sn = t
x2i − 52 x̄2 =
(1087654 − 52 · 111.732 ) = 90.94.
52 i=1
52
There is no important difference between it and the sample characteristic.
The original data are already ordered so the quartiles and the median can be easily read. Each row of the table contains a quarter of the data and as
n=52=2·26=4·13, the Tab. 5.5 provides
Q1 =
3x̃13 +x̃14 3·44+44
=
=44,
4
4
M=
x̃26 +x̃27 85+92
=
=88.5,
2
2
Q3 =
3x̃39 +x̃40 3·143+149
=
=144.5.
4
4
In order to compare the results, we calculate the sample characteristics also from the frequency table Tab. 5.4. We denote the sample mean X̄ and the
standard deviation S. For m=9 categories, we obtain
X̄ =
9
X
.
νi Ci = (0.3462 · 25 + · · · + 0.0192 · 425) = 107.69,
j=1
S=
9
X
.
νi Ci2 − X̄ 2 = 0.3462 · 252 + · · · + 0.0192 · 4252 − 107.692 = 92.45
j=1
which are, naturally, values close to x̄ or Sn−1 .
F̂
νi or ni
Besides the numerical values, it is also recommended to use graphical representation of the data. To this purpose, we may use the histogram, see Fig.
Ci
x
Figure 5.1: Histogram and empirical distribution function.
5.1, left. It is a rectangular graph which corresponds to the frequency table: each category has a segment on the horizontal axis (e.g. the range of the
values is split to m equally long intervals), the height of each rectangle is given by the count of the pertinent category, alternatively it can correspond to
the relative count. The histogram can be useful in an analysis of dispersion of the data, in identification of far values, or in determining the probability
distribution of the sample as it reminds one the probability density of a random variable.
The probability theory use besides the probability density also the cumulative distribution function. It can be estimated by the data plotting the graph
of the empirical distribution function F̂ presented in Fig. 5.1, right. The function is defined as
F̂ (x) =
nxi <x
,
n
(5.2)
where nxi ≤x is the number of those values xi which are smaller than x. An effective construction of the empirical distribution function thus requires
ordered data.
Example 5.5 In Example 5.2, the data in Tab. 5.3 present the customers waiting times in a hypermarket. Prepare the histogram and the empirical
distribution function.
Solution. The frequency table was created in the mentioned example, it can serve for drawing the histogram. The histogram is plotted in Fig. 5.2,
left, for the relative counts νi together with the graph of a function which expresses the probability that the value of the random variable ξ with the
exponential probability distribution defined by the probability density p(x)=δe−δx and with the parameter δ=0.01 lies in the interval hx−25, x+25i for
us to see a relation with this probability distribution function. The picture documents that an assumption of the exponential probability distribution for
the random variable ξ expressing the costumers waiting times is quite natural.
We try something similar with the empirical distribution function. If we
look like the following:


0




1

 52



F̂ (x) = 50


52



51


52


1
wanted to write an expression for the function F̂ based on Tab. 5.3, it would
, if x ≤ 2,
, if 2 < x ≤ 7,
..
.
, if 287 < x ≤ 383,
, if 383 < x ≤ 440,
, if 440 < x.
It is not very clear, so the graph is better,
see Fig. 5.2, right. This picture also includes the graph of cumulative distribution function for the exponential
−δx
probability distribution F (x)= 1−e
. It also proves exponential character of the random variable which provides the statistical sample.
0.4
1
x+25
R
0.1
x−25
0.6
F̂
ν, P
0.2
F̂ (x)
F (x)
0.8
νi
0.3
p(t)dt
0.4
0.2
0
0
0
50
100 150 200 250 300 350 400 450
t [s]
0
50
100 150 200 250 300 350 400 450
t [s]
Figure 5.2: the histogram and the empirical distribution function.
Descriptive statistics – The principal probability distribution in mathematical statistics
The random character of the collected data and also of the numerical characteristics has to be assessed by the theory of probability.
In many practical problems, the sample is considered as a set of data pertinent to the normal probability distribution. Thus, the normal probability
distribution is a real base for all statistical calculation which will be discussed in the present course. Nevertheless, another probability distributions have
to be taken into account, as long as e.g. calculating the sample mean or variance, the results are again random variables. It is useful to know which
probability distributions pertain to them or at least to know to estimate them. Having this in mind, we will cope with the following three probability
distributions.
Let η1 , η2 , . . . ηn , ηn+1 . . . ηn+k , be independent random variables with the standard normal probability distribution ηi ∼ N (0, 1).
We say that the random variable ξ has the probability distribution χ2 with the degree of freedom n, if
ξ = η12 + η22 + · · · + ηn2 ,
and we denote it ξ ∼ χ2 (n). Often, the symbol χ2 (n) will be used to denote a random variable with this distribution. This definition enables to draw
the probability distribution pχ2 (n) and to calculate the mean Eχ2 (n) and the variance Dχ2 (n):
n
x
x 2 −1 e− 2
pχ2 (n) (x) = Θ(x) n n ,
22Γ 2
Eχ2 (n) = n,
Dχ2 (n) = 2n.
(5.3)
The quantile xδ of such a random variable at the level δ is denoted χ2δ (n).
The distribution of the values of the random variable can be easily guessed from the graph of the probability density depending on the degree of freedom
n. The graphs in Fig. 5.3 pertains to three different values of n. In statistical calculation, we use the quantiles of the random variables either close to
zero or close to one. The values for χ2δ (n) are gathered in Tab. 5.6.
0.2
0.4
N (0, 1)
n=5
0.3
n=10
0.1
p
p
0.15
0.2
n=20
0.05
n=2
0.1
n=20
n=5
0
0
0
5
10
15
20
25
30
-4
-2
x
0
2
4
x
Figure 5.3: The probability densities for χ2 (n) and t(n).
Further, we say that the random variable ξ has the Student t distribution with the degree of freedom n, if
ηn+1
ξ=q
,
χ2 (n)
n
where χ2 (n) is define above. It is denoted as ξ ∼ t(n) and the same symbol t(n) is also used for denoting the random variable with such a probability
distribution itself. The definition may help in deriving the probability density pt(n) , and in calculating the mean E (t(n)) and the variance D (t(n)). The
are determined by the relations
Γ n+1
n
2
pt(n) (x) = √
E (t(n)) = 0,
D (t(n)) =
,
(5.4)
n+1 ,
2
n−2
nπΓ n2 1 + xn 2
assuming that the expressions have any sense, i.e. D only for n>2.
The pertinent quantile xδ is denoted t2δ (n). The probability distribution is symmetric, thus the table of quantiles, see Tab. 5.6, contains only quantile
levels close to one because tδ (n)=−t1−δ (n).
The expected or unexpected values of the random variable can be easily guessed from the graph of the probability density depending on the degree of
freedom n, which is plotted in Fig. 5.3 for three different values of n. The picture also include the graph of standard normal probability distribution
which is a limit of t probability distribution for large n.
Finally, the third probability distribution is the F distribution. The random variable ξ is said to have the Fisher F distribution with degrees of freedom
n, k, if
ξ=
χ2 (n)
n
,
χ2 (k)
k
2
2
2
where χ2 (n) is defined above and χ2 (k) = ηn+1
+ ηn+2
+ · · · + ηn+k
. As before, the notation ξ ∼ F(n, k) is also commonly used for denoting a random
variable with this probability distribution F(n, k). The probability density pF(n,k) , the mean E (F(n, k)) and the variance D (F(n, k)) can be obtained in
the following form:
n k n −1
Γ n+k
n2k2x2
2k 2 (k + n − 2)
k
2
,
D
(F(n,
k))
=
,
(5.5)
pF(n,k) (x) = Θ(x)
,
E
(F(n,
k))
=
n+k
k−2
n(k − 2)2 (k − 4)
Γ n2 Γ k2 (k + nx) 2
assuming also here that the expressions can be calculated, i.e. D only for k>4.
The graph of the probability density is a good visual representation of the values of the random variable depending on the degrees of freedom. Hence,
Fig. 5.4 shows the densities for three pairs (n, k). As χ2 (n),also F probability distribution is nonsymmetric, however, additionally there are two parameters
in F distribution, so that a simple table of quantiles has to be prepared separately for each level δ. The tables 5.7 to 5.10 summarize the quantiles which
we will use in this course.
Let us note that the cumulative distribution function of any of the aforementioned probability distributions is complicated and it is usually expressed by
an integral or using special functions (like the functions Γ, B or their incomplete versions). It is not important to define them here.
Example 5.6 Find, using the tables, the following quantiles: χ20.975 (14), t0.05 (19), F0.99 (13, 14).
Solution. The first value can be find directly in Tab. 5.6, left: χ20.975 (14)=26.1189.
Though, the second value is not in the table, we can use the symmetry of the t probability distribution to obtain: t0.05 (19)= − t1−0.05 (19)= −
t0.95 (19)=1.7291. The last value was found in Tab. 5.6, right.
Table 5.6: Quantiles for χ2 (n) and t(n).
χ2δ (n)
n
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
40
50
60
70
80
90
100
δ
0.005
0.00004
0.0100
0.0717
0.2070
0.4117
0.6757
0.9893
1.3444
1.7349
2.1559
2.6032
3.0738
3.5650
4.0747
4.6009
5.1422
5.6972
6.2648
6.8440
7.4338
8.0337
8.6427
9.2604
9.8862
10.5196
11.1602
11.8076
12.4613
13.1211
13.7867
20.7066
27.9907
35.5345
43.2752
51.1719
59.1963
67.3275
0.01
0.0002
0.0201
0.1148
0.2971
0.5543
0.8721
1.2390
1.6465
2.0879
2.5582
3.0535
3.5706
4.1069
4.6604
5.2294
5.8122
6.4078
7.0149
7.6327
8.2604
8.8972
9.5425
10.1957
10.8564
11.5240
12.1982
12.8785
13.5647
14.2565
14.9535
22.1642
29.7067
37.4848
45.4417
53.5401
61.7541
70.0649
0.025
0.0010
0.0506
0.2158
0.4844
0.8312
1.2373
1.6899
2.1797
2.7004
3.2470
3.8157
4.4038
5.0088
5.6287
6.2621
6.9077
7.5642
8.2307
8.9065
9.5908
10.2829
10.9823
11.6886
12.4012
13.1197
13.8439
14.5734
15.3079
16.0471
16.7908
24.4330
32.3574
40.4818
48.7576
57.1532
65.6466
74.2219
0.05
0.0039
0.1026
0.3518
0.7107
1.1455
1.6354
2.1674
2.7326
3.3251
3.9403
4.5748
5.2260
5.8919
6.5706
7.2609
7.9616
8.6718
9.3905
10.1170
10.8508
11.5913
12.3380
13.0905
13.8484
14.6114
15.3792
16.1514
16.9279
17.7084
18.4927
26.5093
34.7643
43.1880
51.7393
60.3915
69.1260
77.9295
0.95
3.8415
5.9915
7.8147
9.4877
11.0705
12.5916
14.0671
15.5073
16.9190
18.3070
19.6751
21.0261
22.3620
23.6848
24.9958
26.2962
27.5871
28.8693
30.1435
31.4104
32.6706
33.9244
35.1725
36.4150
37.6525
38.8851
40.1133
41.3371
42.5570
43.7730
55.7585
67.5048
79.0819
90.5312
101.8795
113.1453
124.3421
0.975
5.0239
7.3778
9.3484
11.1433
12.8325
14.4494
16.0128
17.5345
19.0228
20.4832
21.9201
23.3367
24.7356
26.1189
27.4884
28.8454
30.1910
31.5264
32.8523
34.1696
35.4789
36.7807
38.0756
39.3641
40.6465
41.9232
43.1945
44.4608
45.7223
46.9792
59.3417
71.4202
83.2977
95.0232
106.6286
118.1359
129.5612
0.99
6.6349
9.2103
11.3449
13.2767
15.0863
16.8119
18.4753
20.0902
21.6660
23.2092
24.7250
26.2170
27.6883
29.1412
30.5779
31.9999
33.4086
34.8053
36.1909
37.5662
38.9322
40.2894
41.6384
42.9798
44.3141
45.6417
46.9629
48.2782
49.5879
50.8922
63.6908
76.1539
88.3794
100.4251
112.3288
124.1163
135.8068
0.995
7.8794
10.5967
12.8382
14.8602
16.7496
18.5476
20.2778
21.9549
23.5893
25.1882
26.7569
28.2995
29.8195
31.3193
32.8013
34.2672
35.7184
37.1565
38.5822
39.9969
41.4011
42.7957
44.1813
45.5585
46.9278
48.2899
49.6449
50.9933
52.3355
53.6720
66.7660
79.4899
91.9518
104.2149
116.3210
128.2990
140.1695
tδ (n)
n
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
40
50
60
70
80
90
100
δ
0.950
6.3138
2.9200
2.3534
2.1318
2.0150
1.9432
1.8946
1.8595
1.8331
1.8125
1.7959
1.7823
1.7709
1.7613
1.7531
1.7459
1.7396
1.7341
1.7291
1.7247
1.7207
1.7171
1.7139
1.7109
1.7081
1.7056
1.7033
1.7011
1.6991
1.6973
1.6839
1.6759
1.6706
1.6669
1.6641
1.6620
1.6602
0.975
12.7062
4.3026
3.1824
2.7764
2.5706
2.4469
2.3646
2.3060
2.2622
2.2281
2.2010
2.1788
2.1604
2.1448
2.1315
2.1199
2.1098
2.1009
2.0930
2.0860
2.0796
2.0739
2.0687
2.0639
2.0595
2.0555
2.0518
2.0484
2.0452
2.0423
2.0211
2.0086
2.0003
1.9944
1.9901
1.9867
1.9840
0.990
31.8207
6.9646
4.5407
3.7469
3.3649
3.1427
2.9979
2.8965
2.8214
2.7638
2.7181
2.6810
2.6503
2.6245
2.6025
2.5835
2.5669
2.5524
2.5395
2.5280
2.5177
2.5083
2.4999
2.4922
2.4851
2.4786
2.4727
2.4671
2.4620
2.4573
2.4233
2.4033
2.3901
2.3808
2.3739
2.3685
2.3642
0.995
63.6566
9.9249
5.8409
4.6041
4.0321
3.7074
3.4995
3.3554
3.2498
3.1693
3.1058
3.0545
3.0123
2.9768
2.9467
2.9208
2.8982
2.8784
2.8609
2.8453
2.8314
2.8188
2.8073
2.7969
2.7874
2.7787
2.7707
2.7633
2.7564
2.7500
2.7045
2.6778
2.6603
2.6479
2.6387
2.6316
2.6259
Table 5.7: Quantiles for F(n, k), δ=0.95.
Fδ(n,k)
k
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
n
2
19.00
9.552
6.944
5.786
5.143
4.737
4.459
4.256
4.103
3.982
3.885
3.806
3.739
3.682
3.634
3.592
3.555
3.522
3.493
3.467
3.443
3.422
3.403
3.385
3.369
3.354
3.340
3.328
3.316
3
19.16
9.277
6.591
5.409
4.757
4.347
4.066
3.863
3.708
3.587
3.490
3.411
3.344
3.287
3.239
3.197
3.160
3.127
3.098
3.072
3.049
3.028
3.009
2.991
2.975
2.960
2.947
2.934
2.922
4
19.25
9.117
6.388
5.192
4.534
4.120
3.838
3.633
3.478
3.357
3.259
3.179
3.112
3.056
3.007
2.965
2.928
2.895
2.866
2.840
2.817
2.796
2.776
2.759
2.743
2.728
2.714
2.701
2.690
5
19.30
9.013
6.256
5.050
4.387
3.972
3.687
3.482
3.326
3.204
3.106
3.025
2.958
2.901
2.852
2.810
2.773
2.740
2.711
2.685
2.661
2.640
2.621
2.603
2.587
2.572
2.558
2.545
2.534
6
19.33
8.941
6.163
4.950
4.284
3.866
3.581
3.374
3.217
3.095
2.996
2.915
2.848
2.790
2.741
2.699
2.661
2.628
2.599
2.573
2.549
2.528
2.508
2.490
2.474
2.459
2.445
2.432
2.421
7
19.35
8.887
6.094
4.876
4.207
3.787
3.500
3.293
3.135
3.012
2.913
2.832
2.764
2.707
2.657
2.614
2.577
2.544
2.514
2.488
2.464
2.442
2.423
2.405
2.388
2.373
2.359
2.346
2.334
8
19.37
8.845
6.041
4.818
4.147
3.726
3.438
3.230
3.072
2.948
2.849
2.767
2.699
2.641
2.591
2.548
2.510
2.477
2.447
2.420
2.397
2.375
2.355
2.337
2.321
2.305
2.291
2.278
2.266
9
19.38
8.812
5.999
4.772
4.099
3.677
3.388
3.179
3.020
2.896
2.796
2.714
2.646
2.588
2.538
2.494
2.456
2.423
2.393
2.366
2.342
2.320
2.300
2.282
2.265
2.250
2.236
2.223
2.211
10
19.40
8.786
5.964
4.735
4.060
3.637
3.347
3.137
2.978
2.854
2.753
2.671
2.602
2.544
2.494
2.450
2.412
2.378
2.348
2.321
2.297
2.275
2.255
2.236
2.220
2.204
2.190
2.177
2.165
12
19.41
8.745
5.912
4.678
4.000
3.575
3.284
3.073
2.913
2.788
2.687
2.604
2.534
2.475
2.425
2.381
2.342
2.308
2.278
2.250
2.226
2.204
2.183
2.165
2.148
2.132
2.118
2.104
2.092
15
19.43
8.703
5.858
4.619
3.938
3.511
3.218
3.006
2.845
2.719
2.617
2.533
2.463
2.403
2.352
2.308
2.269
2.234
2.203
2.176
2.151
2.128
2.108
2.089
2.072
2.056
2.041
2.027
2.015
20
19.45
8.660
5.803
4.558
3.874
3.445
3.150
2.936
2.774
2.646
2.544
2.459
2.388
2.328
2.276
2.230
2.191
2.155
2.124
2.096
2.071
2.048
2.027
2.007
1.990
1.974
1.959
1.945
1.932
25
19.46
8.634
5.769
4.521
3.835
3.404
3.108
2.893
2.730
2.601
2.498
2.412
2.341
2.280
2.227
2.181
2.141
2.106
2.074
2.045
2.020
1.996
1.975
1.955
1.938
1.921
1.906
1.891
1.878
30
19.46
8.617
5.746
4.496
3.808
3.376
3.079
2.864
2.700
2.570
2.466
2.380
2.308
2.247
2.194
2.148
2.107
2.071
2.039
2.010
1.984
1.961
1.939
1.919
1.901
1.884
1.869
1.854
1.841
Fδ(n,k)
k
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
n
2
39.00
16.04
10.64
8.434
7.260
6.542
6.059
5.715
5.456
5.256
5.096
4.965
4.857
4.765
4.687
4.619
4.560
4.508
4.461
4.420
4.383
4.349
4.319
4.291
4.265
4.242
4.221
4.201
4.182
3
39.17
15.44
9.979
7.764
6.599
5.890
5.416
5.078
4.826
4.630
4.474
4.347
4.242
4.153
4.077
4.011
3.954
3.903
3.859
3.819
3.783
3.750
3.721
3.694
3.670
3.647
3.626
3.607
3.589
4
39.25
15.10
9.605
7.388
6.227
5.523
5.053
4.718
4.468
4.275
4.121
3.996
3.892
3.804
3.729
3.665
3.608
3.559
3.515
3.475
3.440
3.408
3.379
3.353
3.329
3.307
3.286
3.267
3.250
5
39.30
14.88
9.364
7.146
5.988
5.285
4.817
4.484
4.236
4.044
3.891
3.767
3.663
3.576
3.502
3.438
3.382
3.333
3.289
3.250
3.215
3.183
3.155
3.129
3.105
3.083
3.063
3.044
3.026
6
39.33
14.73
9.197
6.978
5.820
5.119
4.652
4.320
4.072
3.881
3.728
3.604
3.501
3.415
3.341
3.277
3.221
3.172
3.128
3.090
3.055
3.023
2.995
2.969
2.945
2.923
2.903
2.884
2.867
7
39.36
14.62
9.074
6.853
5.695
4.995
4.529
4.197
3.950
3.759
3.607
3.483
3.380
3.293
3.219
3.156
3.100
3.051
3.007
2.969
2.934
2.902
2.874
2.848
2.824
2.802
2.782
2.763
2.746
8
39.37
14.54
8.980
6.757
5.600
4.899
4.433
4.102
3.855
3.664
3.512
3.388
3.285
3.199
3.125
3.061
3.005
2.956
2.913
2.874
2.839
2.808
2.779
2.753
2.729
2.707
2.687
2.669
2.651
9
39.39
14.47
8.905
6.681
5.523
4.823
4.357
4.026
3.779
3.588
3.436
3.312
3.209
3.123
3.049
2.985
2.929
2.880
2.837
2.798
2.763
2.731
2.703
2.677
2.653
2.631
2.611
2.592
2.575
10
39.40
14.42
8.844
6.619
5.461
4.761
4.295
3.964
3.717
3.526
3.374
3.250
3.147
3.060
2.986
2.922
2.866
2.817
2.774
2.735
2.700
2.668
2.640
2.613
2.590
2.568
2.547
2.529
2.511
12
39.41
14.34
8.751
6.525
5.366
4.666
4.200
3.868
3.621
3.430
3.277
3.153
3.050
2.963
2.889
2.825
2.769
2.720
2.676
2.637
2.602
2.570
2.541
2.515
2.491
2.469
2.448
2.430
2.412
15
39.43
14.25
8.657
6.428
5.269
4.568
4.101
3.769
3.522
3.330
3.177
3.053
2.949
2.862
2.788
2.723
2.667
2.617
2.573
2.534
2.498
2.466
2.437
2.411
2.387
2.364
2.344
2.325
2.307
20
39.45
14.17
8.560
6.329
5.168
4.467
3.999
3.667
3.419
3.226
3.073
2.948
2.844
2.756
2.681
2.616
2.559
2.509
2.464
2.425
2.389
2.357
2.327
2.300
2.276
2.253
2.232
2.213
2.195
25
39.46
14.12
8.501
6.268
5.107
4.405
3.937
3.604
3.355
3.162
3.008
2.882
2.778
2.689
2.614
2.548
2.491
2.441
2.396
2.356
2.320
2.287
2.257
2.230
2.205
2.183
2.161
2.142
2.124
30
39.46
14.08
8.461
6.227
5.065
4.362
3.894
3.560
3.311
3.118
2.963
2.837
2.732
2.644
2.568
2.502
2.445
2.394
2.349
2.308
2.272
2.239
2.209
2.182
2.157
2.133
2.112
2.092
2.074
Fδ(n,k)
k
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
n
2
99.00
30.82
18.00
13.27
10.93
9.547
8.649
8.022
7.559
7.206
6.927
6.701
6.515
6.359
6.226
6.112
6.013
5.926
5.849
5.780
5.719
5.664
5.614
5.568
5.526
5.488
5.453
5.420
5.390
3
99.17
29.46
16.69
12.06
9.780
8.451
7.591
6.992
6.552
6.217
5.953
5.739
5.564
5.417
5.292
5.185
5.092
5.010
4.938
4.874
4.817
4.765
4.718
4.675
4.637
4.601
4.568
4.538
4.510
4
99.25
28.71
15.98
11.39
9.148
7.847
7.006
6.422
5.994
5.668
5.412
5.205
5.035
4.893
4.773
4.669
4.579
4.500
4.431
4.369
4.313
4.264
4.218
4.177
4.140
4.106
4.074
4.045
4.018
5
99.30
28.24
15.52
10.97
8.746
7.460
6.632
6.057
5.636
5.316
5.064
4.862
4.695
4.556
4.437
4.336
4.248
4.171
4.103
4.042
3.988
3.939
3.895
3.855
3.818
3.785
3.754
3.725
3.699
6
99.33
27.91
15.21
10.67
8.466
7.191
6.371
5.802
5.386
5.069
4.821
4.620
4.456
4.318
4.202
4.102
4.015
3.939
3.871
3.812
3.758
3.710
3.667
3.627
3.591
3.558
3.528
3.499
3.473
7
99.36
27.67
14.98
10.46
8.260
6.993
6.178
5.613
5.200
4.886
4.640
4.441
4.278
4.142
4.026
3.927
3.841
3.765
3.699
3.640
3.587
3.539
3.496
3.457
3.421
3.388
3.358
3.330
3.304
8
99.37
27.49
14.80
10.29
8.102
6.840
6.029
5.467
5.057
4.744
4.499
4.302
4.140
4.004
3.890
3.791
3.705
3.631
3.564
3.506
3.453
3.406
3.363
3.324
3.288
3.256
3.226
3.198
3.173
9
99.39
27.35
14.66
10.16
7.976
6.719
5.911
5.351
4.942
4.632
4.388
4.191
4.030
3.895
3.780
3.682
3.597
3.523
3.457
3.398
3.346
3.299
3.256
3.217
3.182
3.149
3.120
3.092
3.067
10
99.40
27.23
14.55
10.05
7.874
6.620
5.814
5.257
4.849
4.539
4.296
4.100
3.939
3.805
3.691
3.593
3.508
3.434
3.368
3.310
3.258
3.211
3.168
3.129
3.094
3.062
3.032
3.005
2.979
12
99.42
27.05
14.37
9.888
7.718
6.469
5.667
5.111
4.706
4.397
4.155
3.960
3.800
3.666
3.553
3.455
3.371
3.297
3.231
3.173
3.121
3.074
3.032
2.993
2.958
2.926
2.896
2.868
2.843
15
99.43
26.87
14.20
9.722
7.559
6.314
5.515
4.962
4.558
4.251
4.010
3.815
3.656
3.522
3.409
3.312
3.227
3.153
3.088
3.030
2.978
2.931
2.889
2.850
2.815
2.783
2.753
2.726
2.700
20
99.45
26.69
14.02
9.553
7.396
6.155
5.359
4.808
4.405
4.099
3.858
3.665
3.505
3.372
3.259
3.162
3.077
3.003
2.938
2.880
2.827
2.781
2.738
2.699
2.664
2.632
2.602
2.574
2.549
25
99.46
26.58
13.91
9.449
7.296
6.058
5.263
4.713
4.311
4.005
3.765
3.571
3.412
3.278
3.165
3.068
2.983
2.909
2.843
2.785
2.733
2.686
2.643
2.604
2.569
2.536
2.506
2.478
2.453
30
99.47
26.50
13.84
9.379
7.229
5.992
5.198
4.649
4.247
3.941
3.701
3.507
3.348
3.214
3.101
3.003
2.919
2.844
2.778
2.720
2.667
2.620
2.577
2.538
2.503
2.470
2.440
2.412
2.386
Fδ(n,k)
k
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
n
2
199.0
49.80
26.28
18.31
14.54
12.40
11.04
10.11
9.427
8.912
8.510
8.186
7.922
7.701
7.514
7.354
7.215
7.093
6.986
6.891
6.806
6.730
6.661
6.598
6.541
6.489
6.440
6.396
6.355
3
199.2
47.47
24.26
16.53
12.92
10.88
9.596
8.717
8.081
7.600
7.226
6.926
6.680
6.476
6.303
6.156
6.028
5.916
5.818
5.730
5.652
5.582
5.519
5.462
5.409
5.361
5.317
5.276
5.239
4
199.3
46.19
23.15
15.56
12.03
10.05
8.805
7.956
7.343
6.881
6.521
6.233
5.998
5.803
5.638
5.497
5.375
5.268
5.174
5.091
5.017
4.950
4.890
4.835
4.785
4.740
4.698
4.659
4.623
5
199.3
45.39
22.46
14.94
11.46
9.522
8.302
7.471
6.872
6.422
6.071
5.791
5.562
5.372
5.212
5.075
4.956
4.853
4.762
4.681
4.609
4.544
4.486
4.433
4.384
4.340
4.300
4.262
4.228
6
199.3
44.84
21.97
14.51
11.07
9.155
7.952
7.134
6.545
6.102
5.757
5.482
5.257
5.071
4.913
4.779
4.663
4.561
4.472
4.393
4.322
4.259
4.202
4.150
4.103
4.059
4.020
3.983
3.949
7
199.4
44.43
21.62
14.20
10.79
8.885
7.694
6.885
6.302
5.865
5.525
5.253
5.031
4.847
4.692
4.559
4.445
4.345
4.257
4.179
4.109
4.047
3.991
3.939
3.893
3.850
3.811
3.775
3.742
8
199.4
44.13
21.35
13.96
10.57
8.678
7.496
6.693
6.116
5.682
5.345
5.076
4.857
4.674
4.521
4.389
4.276
4.177
4.090
4.013
3.944
3.882
3.826
3.776
3.730
3.687
3.649
3.613
3.580
9
199.4
43.88
21.14
13.77
10.39
8.514
7.339
6.541
5.968
5.537
5.202
4.935
4.717
4.536
4.384
4.254
4.141
4.043
3.956
3.880
3.812
3.750
3.695
3.645
3.599
3.557
3.519
3.483
3.450
10
199.4
43.69
20.97
13.62
10.25
8.380
7.211
6.417
5.847
5.418
5.085
4.820
4.603
4.424
4.272
4.142
4.030
3.933
3.847
3.771
3.703
3.642
3.587
3.537
3.492
3.450
3.412
3.377
3.344
12
199.4
43.39
20.70
13.38
10.03
8.176
7.015
6.227
5.661
5.236
4.906
4.643
4.428
4.250
4.099
3.971
3.860
3.763
3.678
3.602
3.535
3.475
3.420
3.370
3.325
3.284
3.246
3.211
3.179
15
199.4
43.08
20.44
13.15
9.814
7.968
6.814
6.032
5.471
5.049
4.721
4.460
4.247
4.070
3.920
3.793
3.683
3.587
3.502
3.427
3.360
3.300
3.246
3.196
3.151
3.110
3.073
3.038
3.006
20
199.5
42.78
20.17
12.90
9.589
7.754
6.608
5.832
5.274
4.855
4.530
4.270
4.059
3.883
3.734
3.607
3.498
3.402
3.318
3.243
3.176
3.116
3.062
3.013
2.968
2.928
2.890
2.855
2.823
25
199.5
42.59
20.00
12.76
9.451
7.623
6.482
5.708
5.153
4.736
4.412
4.153
3.942
3.766
3.618
3.492
3.382
3.287
3.203
3.128
3.061
3.001
2.947
2.898
2.853
2.812
2.775
2.740
2.708
30
199.5
42.47
19.89
12.66
9.358
7.534
6.396
5.625
5.071
4.654
4.331
4.073
3.862
3.687
3.539
3.412
3.303
3.208
3.123
3.049
2.982
2.922
2.868
2.819
2.774
2.733
2.695
2.660
2.628
0.8
0.6
0.6
0.4
p
p
0.8
n=5, k=5
n=5, k=10
n=5, k=20
0.2
0.4
n = 5, k = 5
n = 10, k = 5
n = 20, k = 5
0.2
0
0
0
1
2
3
4
5
0
1
x
2
3
4
5
x
Figure 5.4: A comparison of the probability densities of F(n, k) for a constant n and a constant k.
The third quantile we will find in Tab. 5.9. The pertinent pair of freedom degrees does not occur in the table, so we have to find it by interpolation. In
the table, we find
F0.99 (12, 14)=3.800,
F0.99 (15, 14)=3.656.
An approximation of the required value is calculated by the linear interpolation from these two values
F0.99 (13, 14) = F0.99 (12, 14) +
13 − 12
1
(F0.99 (15, 14) − F0.99 (12, 14)) = 3.800 + (3.656 − 3.800) = 3.752.
15 − 12
3
Of course, we could use higher degree polynomial for the interpolation. Nevertheless, the simple linear interpolation is sufficient in many occasions.
Example 5.7 Derive the relation for pχ2 (n) for n=1 and n=2.
Solution. We start with χ2 (1). According to the definition, we have χ2 (1)= (N (0, 1))2 . The probability P(χ2 (1)<x), i.e. the value of the cumulative
distribution function, can be written using the unknown probability density as
Z x
2
P(χ (1)<x) =
pχ2 (1) (t)dt.
−∞
Now, we use the definition:
√
√
P(χ2 (1)<x) = P((N (0, 1))2 <x) = P(− x<N (0, 1) < x),
where the last equation holds only for positive x. For a negative x, it is obvious that the second probability vanishes so that the probability density of
χ2 (1) is zero. The last probability is calculated by the√probability density of the normal probability distribution N (0, 1) and exploiting the symmetry of
the normal distribution. Applying the substitution t= v in the integral, we come to
Z √x
Z x
Z x
Z √x
√
√
v
1
1 −v 1
1
1 − t2
2
2
√ e dt = 2
√ e
√
√ dv =
e− 2 v − 2 dv.
P(− x<N (0, 1)< x) = √ pN (0,1) (t)dt = 2
1
2 v
2π
2π
2Γ 2
0
0
− x
0
The resulting integral can be compared with the previous equation, recalling the definition of the Heaviside function Θ, it then renders
pχ2 (1) (t) = Θ(t)
1
1
2
2 Γ
t
1
e− 2 t− 2 ,
1
2
which corresponds to the relation (5.3) for n=1.
The calculation of the probability density for n=2 relies on the definition χ2 (2)= (N (0, 1))21 + (N (0, 1))22 = (χ2 (1))1 + (χ2 (1))2 and also on the formula
for calculating probability density of a sum of two independent random variables
Z +∞
pχ2 (2) (x) =
pχ2 (1) (t)pχ2 (1) (x − t)dt.
−∞
Now, the integral is to be calculated, having in mind that pχ2 (1) (t) vanishes for negative t. It means that for negative x the integral is zero. For positive
x, we obtain
#"
#
Z x
Z x"
Z x
x−t
t
1
1
1
1
dt
−2 −2
− 2
−2
− x2 1
p
pχ2 (2) (x) =
pχ2 (1) (t)pχ2 (1) (x − t)dt =
e
e
t
(x
−
t)
dt
=
e
.
1
1
2π 0
t(x − t)
2 2 Γ 12
2 2 Γ 21
0
0
The last integral is calculated as
Z
0
x
dt
p
=
t(x − t)
Z
0
x
Z
dt
q
x 2
2
− t−
x 2
2
x
2
=
− x2
dv
q x 2
2
−
v2
2v
= arcsin
x
x2
= π,
− x2
so that the resulting formula of the probability density is
x
pχ2 (2) (x) = Θ(x)e− 2
1
1 x
π = Θ(x) e− 2 .
2π
2
It is a special case of the relation (5.3) for n=2. Let us notice that the probability density is equal to probability density of a random variable with
exponential probability distribution with the parameter δ= 12 .
It should be noted that a similar way can be used to prove the formula (5.3) for larger n.
All of the aforementioned probability distributions will be used in the calculations. The numerical analysis of a sample can give an idea for particular
type of the random variable used in the sample characteristic. We start with the basic characteristics. According to Tab. 5.5, the formulae for sample
characteristics are written as functions of the data from the sample. If they are written as relations of random variables ξi , we obtain
n
1X
ξ¯ =
ξi ,
n i=1
n
ξS 2
2
1 X
ξi − ξ¯ .
=
n − 1 i=1
We will suppose, as usually in a sample, that the random variables are independent and additionally that they have normal distribution, i. e. ξi ∼N (m, σ 2 )
or alternatively ξi −m
∼N (0, 1). Then, the random variable ξ¯ and ξS 2 are independent and their probability distributions are
σ
σ2
n−1
¯
ξS 2 ∼ χ2 (n−1).
(5.6)
ξ ∼ N m,
,
n
σ2
The random variable ξ¯ is provided as a linear combination of ξi so that the proposition about ξ¯ is related to the properties of the normal probability
distribution and the properties of the mean and of the variance. Additionally, the relation can be rewritten using the standard parameters as
ξ¯ − m √
n ∼ N (0, 1).
σ
(5.7)
The fact that the random variable ξS 2 is connected with the probability distribution χ2 is obvious from its definition. The question, however, is what is
the degree of its freedom.
Example 5.8 Express the random variable ξS 2 as a random variable with χ2 distribution.
Solution. The expression for ξS 2 can be multiplied by the factor n−1
and modified as in Example 5.3 in order to use the relations (5.7). The result
σ2
shows that the degree of freedom for ξS 2 is n−1
!
2
¯
2
n
n n
n ¯− m 2 X
¯2 X
2 X
(ξ
−
m)
−
ξ
n−1
1 X
ξ
−
ξ
ξi − m
ξ−m
i
i
¯
ξi − ξ =
=
=
−n
ξS 2 = 2
σ2
σ i=1
σ
σ
σ
σ
i=1
i=1
i=1
=
n
X
(N (0, 1))2i − (N (0, 1))2 = χ2 (n) − χ2 (1) = χ2 (n − 1).
i=1
Two interesting relations can be added as they will be useful in the calculations. Still working with a sample of n elements, the following sample
characteristic T has t distribution
ξ¯ − m √
n ∼ t(n − 1).
(5.8)
T (ξ1 , ξ2 , . . . ξn ) = √
ξS 2
If we have
another sample P
η1 , η2 , . . . ηk of independent random variables with normal probability distribution N (m0 , σ 2 ) (the same σ) for which we define
P
2
k
1
η̄ = k1 ki=1 ηi , ηS 2 = k−1
i=1 (ηi − η̄) , then the following sample statistic T has F distribution
T (ξ1 , ξ2 , . . . ξn , η1 , η2 , . . . ηk ) =
ξS 2
∼ F(n − 1, k − 1).
ηS 2
(5.9)
If the sample characteristic is considered as a numerical value and the ξi , or ηi are substituted by the values xi , or yi , respectively, from the statistical
sample the relations (5.8) and (5.9) provide the numerical values
x̄ − m √
n,
T (x1 , x2 , . . . xn ) =
Sn−1
2
(x)
Sn−1
T (x1 , x2 , . . . xn , y1 , y2 , . . . yk ) = 2
Sk−1 (y)
which are the values of random variables with respective t or F distributions.
1. Can you correctly use the notions of random choice, statistical sample or population at a particular situation pertinent to your field of interest?
2. If a measurement observes more properties of the objects and a relationship between the properties is searched for, the data are usually written into
the contingency table which is in fact a two-dimensional frequency table. Can you prepare such a table? Find an example of its application.
3. Which graph corresponds to the contingency table?
4. Try to propose another graphical representation of experimental data.
5. Can you introduce other numerical characteristics of a sample? Where can they be used? Which probability distribution belongs to the proposed
characteristics assuming that the sample comes out from a normal distribution?
6. Try to continue in the solution of Example 5.7 for higher n. Alternatively, are you able to derive the formulae (5.4) or (5.5) using the definition?
7. Do you explain by a mathematically rigorous way the formulae (5.8) and (5.9)?
In the problems 1. to 5., make the numerical and graphical analysis of the measured data, i. e. calculate the sample characteristics, propose a division
of the data into categories and prepare the table of frequencies, plot the histogram and the graph of the empirical distribution function.
1. A measurement of the ozone concentration in a certain area of choice during the summer holidays of several years.
3.5
4.7
5.3
6.6
1.6
3.1
1.4
3.5
4.7
6.2
2.5
4.7
6.6
4.0
7.4
7.5
8.1
3.8
6.0
2.4
6.0
6.2
6.6
5.9
4.2
3.0
6.7
6.0
9.4
3.3
4.4
5.6
11.7
5.8
3.4
6.2
5.3
4.7
5.5
2.8
5.8
7.6
5.6
6.5
1.1
6.1
7.6
6.6
6.8 2.5 5.4 4.4 5.4
3.0 4.1 3.4 6.8 1.7
5.1 5.6 5.5 1.4 3.9
4.1 5.7 5.8 3.1 5.8
1.4 3.7 2.0 3.7 6.8
4.4 5.7 4.5 3.7 9.4
.
2. Measurement of the level of traffic noise (in dB).
64.6
62.1
63.1
62.7
62.6
55.8
52.0
55.7
54.5
54.4
58.9
56.7
57.6
57.0
56.8
60.0
59.4
59.8
59.0
59.4
67.1
65.7
67.0
66.8
66.2
56.4
55.9
54.4
56.2
55.9
60.8
60.2
60.6
60.5
60.3
62.0
61.0
61.8
61.7
61.4
64.9
63.8
64.8
64.6
64.0
77.1
67.9
69.4
68.9
68.2
.
3. Measurement of the water flow rate in a brook (in m3 s−1 ).
0.381
0.390
0.173
0.181
0.800
0.819
0.510
0.520
0.291
0.300
0.361
0.372
0.686
0.740
0.783
0.783
0.437
0.441
0.254
0.274
1.029
1.084
0.634
0.660
0.230
0.248
0.418
0.424
0.634
0.634
0.974
0.976
0.399
0.414
0.217
0.230
0.840
0.853
0.577
0.625
0.334
0.348
0.390
0.399
0.751
0.771
0.822
0.840
0.464
0.476
0.300
0.318
1.164
1.303
0.740
0.746
0.181
0.209
0.443
0.455
0.534
0.558
1.134
1.157
0.504
0.414
1.601
0.772
1.647
0.921
0.254
0.488
0,504
0,361
0,634
1.514
.
4. Determination of the strength of the material. The values of the stress at the moment of breakage of the specimen are expressed in MPa.
14.40
14.51
14.48
14.35
14.80
14.81
16.40
16.42
15.64
15.60
16.03
15.90
15.00
15.98
15.07
15.90
15.51
15.40
16.54
17.10
15.80
16.42
16.50
16.12
15.40
15.20
14.61
14.70
14.80
14.80
16.50
16.48
16.04
16.50
16.42
16.40
15.90
15.94
15.09
15.50
15.81
15.80
17.35
16.90
16.25
16.30
17.40
16.36
14.78
15.50
15.12
14.50
15.30
15.92
16.00
16.64
16.80
15.63
16.48
15.70
15.30
14.41
16.27
15.90
15.06
16.50
16.10
17.54
16.28
16.30
17.33
16.74
.
5. Number of computer network outages in the university computer center per week.
0
0
1
6
0
0
4
0
4
0
2
3
2
3
0
0
0
2
3
1
0
6
0
1
1
0
2
0
2
0
3
0
0
4
2
3
3
1
0
0
0
1
2
0
2
0
1
1
3
4
1
0
2
1
4
8
1
0
2
0
5
0
0
0
2
0
1
1
0
2
.
Conclusion
In the chapter, we learned to properly use the basic concepts of statistics and we understood their relevance. We learned how to record the data into
tables in a transparent and efficient form. From the table, we can get clear information about table data, using numerical sample characteristics and
reflecting some special features of these data sets. We also explained how to create pictures of graphs that describe the statistical nature of the data.
We implemented the obtained numerical sample characteristics in the sense of the probability theory and we know to assess the level and the nature
of randomness in the statistical data. Moreover, we have learned to distinguish between basic types of random variables used in statistics so that we
understand their characteristics.
References
Problem solutions
1. Sn−1 =2.0026, x̄=4.9744, M =5.3, Q1 =3.55, Q3 =6.175 2. Sn−1 =4.8868, x̄=61.352, M =60.9, Q1 =57.15, Q3 =64.6 3. Sn−1 =0.3356, x̄=0.6059,
M =0.507, Q1 =0.3693, Q3 =0.783
4. Sn−1 =0.8047, x̄=15.8226, M =15.9, Q1 =15.18, Q3 =16.42
5. Sn−1 =1.7588, x̄=1.4714, M =1, Q1 =0,
Q3 =2
Chapter 6. (Estimates and hypotheses)
Aim
Explain principal tools of mathematical statistics which are used to estimate unknown parameters of random variables and which infer relevance of
statistical propositions.
Objectives
1. Learn to estimate an unknown parameter of a random variable using a statistical sample.
2. Understand differences between point estimators and interval estimators of a parameter.
3. Get the idea of the randomness taken into account in measured data.
4. Understand principals of statistical hypotheses formulation and of the statistical assessments.
5. Learn to use tests based on the assumption of normality for the random choice.
6. Verify information about probability distribution in a random choice using an appropriate test.
Prerequisites
random variable; characteristics of random variables; conditional probability; cumulative distribution function; quantiles
Chapter 6. (Estimates and hypotheses) – 1
Introduction
The chapter discusses one of the fundamental objectives of the mathematical statistics which is the determination of the probability distributions of
random variables in a given random sample, or at least their parameters based on measured data which are available in a statistical sample.
First, the estimators of the random variable parameters are mentioned. In statistics, there are two types of parameter estimators which will be analyzed.
A point estimator is a sample characteristic with certain properties such that the numerical value obtained by the statistical sample approximates the
true value of the parameter. In this estimate, however, there is no information about the reached probability. After all, if a continuous random variable
is considered, the probability of a particular value of the random variable is zero. So, if the probability is to be felt as a measure of reliability of the
estimate, a range of values is required. In such a case, we are talking about interval estimators of parameters.
The second part of the chapter will be devoted to statistical hypotheses testing. In practice, a specific parameter value is not needed, only a comparison
of random data is relevant, e.g. two methods of measurement for one variable, or comparing the results with some standard. To this end, we show
how to construct appropriate propositions about random variables, which are called statistical hypotheses. Frequently, an assumption on normality of
the pertinent random variable can be made so that the arguments are then inferred from to the parameters of the normal distribution. Such tests are
called parametric. If the hypotheses about the parameters cannot be gained, e.g. if the random choice does not come out from the normal distribution,
non-parametric tests are used. The non-parametric tests are not addressed in this course.
Finally, a test will be described which concerns the probability distribution of the random variable provided by the random choice. As we mentioned
above, it is often needed to verify that the random variable has a specific type of distribution, e.g. the normal distribution, in order we could test the
parameters. The test of this type is called a goodness-of-fit test.
Estimates and hypotheses – Point estimators of the probability distribution parameters
Which are the principal methods of a random variable characteristic estimation, using descriptive statistics?
An estimator of a parameter Λ is called a value of a sample characteristic Λ̂=Λ̂(x1 , x2 , . . . xn ) of a random choice ξ1 , ξ2 , . . . ξn , determined by the
values of the statistical sample x1 , x2 , . . . xn , with the probability distribution dependent on the parameter Λ and defined by the cumulative distribution
function FΛ . The parameter Λ can be a scalar number, or an ordered N-tuple, e.g. in the case of the exponential probability distribution there is only
one parameter δ, thus Λ=δ, in the case of the normal probability distribution there are two parameters m and σ, so that Λ=[m, σ].
Actually, not all estimators have the same quality. It is required to determine some properties which guarantee admittance of the estimators. The
estimator Λ̂ is called the unbiased) estimator of the parameter Λ, if its mean is equal to the value of the parameter, hence:
An unbiased estimator: E(Λ̂) = Λ.
In the case where the relation holds only for large samples, i.e. if limn→+∞ E(Λ̂n ) = Λ, then the estimator is called asymptotically unbiased. The
estimator Λ̂ is called the best unbiased (effective) estimator of the parameter Λ, if its variance is the smallest one among all unbiased estimators, hence:
0
The best unbiased estimator: D(Λ̂) = min
D(Λ̂ )
Λ̂0 :E(Λ̂0 )=Λ
In descriptive statistics, there were mentioned two sample characteristics: the sample mean ξ¯ and the sample variance ξS 2 whose values for the data of
2
. If the sample x1 , x2 , . . . xn belongs to a random choice of the probability distribution whose parameters contain the mean
the sample are X̄ and Sn−1
2
].
and the variance Λ=[E(ξi ), D(ξi )], then the unbiased estimator for Λ is Λ̂=[X̄, Sn−1
¯ 1 Pn ξi , ξS 2 =
Example 6.1 Find the mean of random variables ξ=
i=1
n
with the mean m and with the variance σ 2 .
1
n−1
Pn
i=1
2
ξi − ξ¯ and ξS0 2 =
1
n
Pn
i=1
2
ξi − ξ¯ , where ξi are random variables
Solution. We use the properties of the mean. We start with ξ¯
n
n
X
1X
¯ = 1
E(ξ)
E(ξi ) =
m = m.
n i=1
n i=1
Modifying the random variable ξS 2 , we use the same way as we did in Example 5.8 and simultaneously we also apply the equation (5.6)
" n
#
" n
#
n
2
X
X
1
1
1
σ
1 X
2
2
2
2
¯ =
¯ =
E (ξi − ξ)
E (ξi − m) − nE (ξ¯ − m)
=
D(ξi ) − nD(ξ)
nσ − n
= σ2.
E(ξS 2 ) =
n − 1 i=1
n − 1 i=1
n − 1 i=1
n−1
n
As long as both means are equal to pertinent parameters, the numerical values of these variables for the data of a sample are unbiased estimators of
parameters.
Comparing with the previous relation we obtain
E(ξS0 2 ) =
n−1
n−1 2
E(ξS 2 ) =
σ , with
n
n
n−1 2
σ = σ2,
n→+∞
n
lim
hence, the estimator of the parameter σ 2 is only asymptotically unbiased. Nevertheless, let us notice that its variance is smaller than the variance of
ξS 2 , because
2
n−1
n−1
0
D(ξS 2 )=D
ξS 2 =
D(ξS 2 )<D(ξS 2 ).
n
n
Thus, the value of ξS 2 provides an unbiased estimator of σ 2 , which is not an effective estimator. Additionally, such an effective estimator does not have
to exist.
Various methods are used to construct other point estimators. The most simple method for such a construction is the method of moments. Let
x1 , x2 , . . . xn be a statistical sample which belongs to the random choice ξ1 , ξ2 , . . . ξn whose cumulative distribution function FΛ depends on k parameters:
Λ = [Λ1 , Λ2 , . . . Λk ] and let there exist µr (Λ1 , Λ2 , . . . Λk )=E(ξir ) for r=1, 2, . . . k. The estimator of the parameter Λ by the methods of moments Λ̂ is
determined by the solution of the following system of equations:
n
1X r
x
µ (Λ̂1 , Λ̂2 , . . . Λ̂k ) =
n i=1 i
r
for
(6.1)
r=1, 2, . . . k.
Example 6.2 Find an estimator of the parameter δ of the exponential probability distribution given by the cumulative distribution function
F (x)=Θ(x) 1−e−δx ,
δ>0.
Solution. We know from the probability theory that the mean of the exponential probability distribution satisfies E(ξ)= 1δ . As we have only one parameter
the system (6.1) is reduced to just one equation which provides the required estimator δ̂
1
n
1X 1
=
x =X̄
n i=1 i
δ̂
which renders
δ̂ =
1
.
X̄
Example 6.3 The distance of two objects was measured in meters. Supposing that the results of the measurement form a random choice from normal
distribution, find the unbiased estimators of the parameters of the pertinent normal distribution.
3398
3379
3359
3411
3331
3392
3420
3509
3175
3341
3236
3292
3491
3223
3367
3305
3305
3384
3298
3344
Solution. The parameters of the normal distribution N (m, σ 2 ) are determined by the mean m and the variance σ 2 of the pertinent random variable.
2
The unbiased estimators of these parameters can be calculated by the sample mean X̄ and the sample variance Sn−1
. We already know the formulae
for them, so we obtain
!
20
20
X
1 X
1
.
.
2
n = 20,
X̄ =
xi = 3348,
Sn−1
=
x2i − 20 X̄ 2 = 6867.58.
20 i=1
19 i=1
The estimator for the standard deviation, if we are interested in, can be obtained by the square root of the variance: Sn−1 =82.87.
Estimates and hypotheses – Interval estimators of the probability distribution parameters
Introduce the probability into the estimates in order to provide a measure of reliability of the estimators.
The estimation by a value Λ̂ does not provide any measure of reliability of the parameter Λ estimation. In the theory, the measure of reliability is
probability. Thus, in the present type of estimates the confidence level γ is chosen to determine the probability that an interval covers the actual value
of the parameter Λ. Numerically, the confidence level is chosen close to unity in order the probability to be high, the most frequent confidence levels
are γ=0.95 or γ=0.99. The resulting interval is called the confidence interval. Let us remark that the parameter Λ is fixed (no probability), though
unknown, and the boundaries of the confidence interval depend on the random choice, so they define random variables.
The confidence intervals are either one-sided, if only one interval bound is determined and the other is +∞ or −∞, or two-sided, if the interval is
bounded from both sides, as follows:
One-sided confidence interval, lower bounded (Λ̂D1 , ∞) : P Λ̂D1 < Λ = γ
One-sided confidence interval, upper bounded (−∞, Λ̂H1 ) : P Λ̂H1 > Λ = γ
Two-sided confidence interval
(Λ̂D2 , Λ̂H2 ) : P Λ̂D2 < Λ < Λ̂H2 = γ
The bounds of the two-sided confidence
interval
are usuallychosen such that both excluded unbounded intervals cover the value of the parameter with
. The decision on selecting the one-sided or two-sided confidence interval depends on
equal (small) probability, i. e. P Λ̂D2 > Λ =P Λ̂H2 < Λ = 1−γ
2
the formulation of each particular problem.
Example 6.4 Find the one-sided upper bounded confidence interval at the confidence level γ for the parameter δ of the random variable ξ with
exponential distribution.
−δx
Solution. The pertinent probability density of the random variable ξ is given by the relation pξ (x) = Θ(x)δe
. Up to the factor δ it reminds the
1
1 − 2 (2δx)
2
probability density of χ (2) according to the relation (5.3). We can easily verify that pξ (x) = Θ(x)2δ 2 e
=2δpχ2 (1) (δx), so we can use the
following relations:
n
X
2δξi ∼ χ2 (2), which implies 2δ
ξi ∼ χ2 (2n), hence η = 2nδ ξ¯ ∼ χ2 (2n).
i=1
Using the quantile of the χ2 distribution, we obtain the probability relation for the random variable η
χ2γ (2n) 1
2
2
¯
,
γ = P η < χγ (2n) = P 2nδ ξ < χγ (2n) = P δ <
2n ξ¯
χ2 (2n)
which provides the upper bound of the confidence interval: δ̂H1 = γ2n X̄1 , for the random choice which is represented by the sample x1 , x2 , . . . , xn with
P Pn
X̄ = n1
i=1 xi . The parameter δ is covered by the interval (−∞, δ̂H1 ) at the confidence level γ.
Example 6.5 Find the upper bounded confidence interval for δ from Example 5.4, giving the confidence level by γ=0.05.
Solution. In the referenced example, we found the sample mean X̄=111.73 with n=52. The quantile χ20.05 (2·52) cannot be found in Tab. ??, so it has
to be extrapolated
14 2
χ0.05 (100) − χ20.05 (90) = 113.1453 + 1.4¨(124.3421 − 113.1453) = 128.82
10
1
=(−∞, 0.011).
and we get the confidence interval −∞, 128.82
104 111.73
χ20.05 (104) ≈ χ20.05 (104) +
Example 6.6 Find the two-sided confidence interval at the confidence level γ for the parameter m of a random variable ξ with normal distribution
ξ∼N (m, σ 2 ).
2
Solution. We know that the point estimator of the parameter m is the sample mean X̄ and the point estimator of σ 2 is the sample variance Sn−1
. We
2
have the statistical sample x1 , x2 , . . . , xn , a set of values of n random variables ξ1 , ξ2 , . . . , ξn with normal distribution ξi ∼N (m, σ ). These assumptions
2
ξ̄−m √
n has
provide the random variable ξ¯ as a random variable with normal distribution ξ∼N m, σn . Using the relation (5.8), the random variable √
ξS 2
√
ξ̄−m
t distribution: √
n ∼ t(n − 1). In this relation, only the unknown parameter m cannot be obtained from the values of the sample. Hence, using
ξS 2
an appropriate quantile of the t distribution, we obtain
ξ¯ − m √
γ = P t 1−γ (n − 1) < √
n < t 1+γ (n − 1) .
2
2
ξS 2
The symmetry of the t distribution renders
√
√
ξS 2
ξS 2
¯
¯
γ = P ξ − √ t 1+γ (n − 1) < m < ξ + √ t 1+γ (n − 1) .
n 2
n 2
Finally, we can write the bounds of the required confidence interval
Sn−1
Λ̂H2 = X̄ + √ t 1+γ (n − 1).
n 2
Sn−1
Λ̂D2 = X̄ − √ t 1+γ (n − 1),
n 2
Table 6.1: The bounds of confidence intervals for the parameters of the normal distribution.
Λ̂D1
Λ̂H1
Λ̂D2
Λ̂H2
m
σ2
X̄ − tγ (n − 1) S√n−1
n
X̄ + tγ (n − 1) S√n−1
n
2
(n−1)Sn−1
2
χγ (n−1)
2
(n−1)Sn−1
2
χ1−γ (n−1)
Two-sided
One-sided
The most common types of confidence intervals cover those of the parameters of the normal distribution. One of them we have derived in Example
6.6. The formulae for the rest of the possibilities are gathered in Tab. 6.1. All derivations are analogical to the solution of Example 6.6.
m
X̄ − t 1+γ (n − 1) S√n−1
n
X̄ + t 1+γ (n − 1) S√n−1
n
2
(n−1)Sn−1
χ21+γ (n−1)
2
(n−1)Sn−1
χ21−γ (n−1)
2
σ2
2
2
2
Example 6.7 In Example 6.3, we calculated the point estimators of the normal distribution parameters obtained from a distance measurement. Based
on the results, find the two-sided confidence intervals for both parameters of the normal distribution, using the confidence level γ=0.95.
Solution. From Example 6.3 we have X̄ = 3348 and Sn−1 = 82.87, where n=20. The bounds of the intervals can be calculated from pertinent formulae
in Tab. 6.1. The necessary quantiles are found in Tab. 5.6
t 1+0.95 (20 − 1) = t0.975 (19) = 2.0930,
2
χ21+0.95 (20 − 1) = χ20.975 (19) = 32.8523, ,
2
χ21−0.95 (20 − 1) = χ20.025 (19) = 8.9065.
2
The confidence interval for the parameter m is
82.87
82.87
(3348 − √ 2.0930; 3348 + √ 2.0930 = (3309.2; 3386.8) .
20
20
Similarly, the confidence interval for σ 2 is
19 · 6867.58 19 · 6867.58
;
32.8523
8.9065
= (3971.8; 1.4650) ,
alternatively the confidence interval for the standard deviation σ can be obtained by extracting of roots of the bounds (63.023; 121.04).
The resulting intervals can be interpreted such that the measured data cover the actual value of pertinent parameter in 95 cases of 100. The width
of the interval depends on the character of the random variable and also on the number of measurements. If it seems that the interval is rather large,
the number of measurements should be increased as the nature of the randomness in the data probably cannot be influenced, unless we have another
equipment which is more accurate and provides data with smaller variance.
Estimates and hypotheses – Testing statistical hypotheses
A hypothesis is a proposition. Can it be verified by statistical methods?
A hypothesis in mathematical statistics is a proposition about a property of a random variable e.g. about the probability distribution of the random
variable , about the value of its parameter, about a mutual relationship of two random variables . . .
The hypothesis being verified is called the null hypothesis and it is denoted H0 . Nevertheless, the verification requires a comparison with another
possibility, i.e. with another hypothesis. Thus, the other hypothesis in the verification process is called the alternative hypothesis and it is denoted H1 .
The alternative hypothesis negates in some sense the null hypothesis in any case, so that only one of them can be considered acceptable.
Example 6.8 Demonstrate examples of statistical hypotheses.
Solution. We verify, whether an equipment provides the results with measurement errors declared by the producer. The measurement errors are expressed
by the standard deviation. Let the producer declare its value at σ0 . If we measure some data, we obtain a statistical sample which could be a random
choice from normal distribution: ξ∼N (m, σ 2 ). So that our task is to test the value of σ: σ=σ0 . The hypotheses are
H0 : σ = σ0 ,
H1 : σ 6= σ0 ,
where H1 represents the alternative that the error is different from that claimed by the producer of the equipment.
We could also verify, whether the measurement provides a random choice from the normal distribution N (m, σ 2 ). The pertinent hypotheses are
H0 : ξ ∼ N (m, σ 2 ),
H1 : ξ N (m, σ 2 ),
which can be interpreted e.g. by the probability density pξ
H0 : pξ (x) = √
(x−m)2
1
e− 2σ2 ,
2πσ
H1 : pξ (x) 6= √
(x−m)2
1
e− 2σ2 .
2πσ
Of course, there is a question of the parameters m and σ: Are they known?
Another possibility is whether the value of ξ measured by the aforementioned equipment is affected by the surrounding temperature T . A measure of
the mutual (linear) relation is the correlation coefficient ρξT which renders the following hypotheses:
H1 : ρξT 6= 0,
H0 : ρξT = 0,
because the value ρξT =0 pertains to uncorrelated random variables.
The acceptance or rejection of any hypothesis depends on some probability — the probability of an error which can appear in the inference. There are
two kinds of errors. We can reject the hypothesis H0 , though it is correct — this is the type I error. The probability of this error is called the significance
level of the test and it is denoted α. The value is chosen in a test as a small number, usually α=0.05 or α=0.01. On the other hand, if the hypothesis
H0 is not rejected though it is incorrect, we call the error the type II error. This error is denoted β and it is given by a particular typ of a test. The value
1−β is called the test strength and provides information about the reliability of the test. It will not be discussed in more details in this course.
The hypothesis testing, i.e. test of the hypothesis H0 against H1 , uses a sample characteristic T =T (x1 , x2 , . . . xn ) which is called the test statistic. The
value of the test statistic causes the rejection of the hypothesis H0 (or of H1 , if the test strength is known). Let K denotes a set of the values of the
test statistic T which rejects the hypothesis H0 against H1 . Thus we accept H1 . The set K is called the rejection range of the test. Otherwise, the
result of the test is formulated more carefully and the hypothesis H0 is not rejected. Nevertheless, the hypothesis H0 is not accepted, hence H1 is not
rejected, unless the test strength is known.
We determine a scheme which helps to make an inference in any type of a test:
•
•
•
•
•
Formulate the hypotheses H0 and H1 .
Set the significance level α.
Determine the test statistic T and pertinent rejection range K.
Calculate necessary sample characteristics an other required values.
Interpret the result based on rejection or acceptation of the hypotheses.
A common type of a test is a test about a parameter Λ of the probability distribution of pertinent random variable. Such a test uses the hypotheses
H0 : Λ=Λ0 , H1 : Λ∗Λ0 at the significance level α, where the relation ∗ is one of the relations 6=, >, <. The inferences of the test rely on the confidence
interval for the parameter with the confidence level 1−α. Thus, the rejection range K is the set of values out of the confidence interval, because such
an interval covers the value Λ0 with the probability 1−α. So e.g. we have
H0 : Λ=Λ0 , H1 : Λ6=Λ0
:
K = R \ (Λ̂D2 ; Λ̂D2 ).
A frequently used type of a test is the tests for one of the parameters m or σ 2 of the normal distribution. Here, the tests based on one measured sample
or two measured samples can be formulated.
If x1 , x2 , . . . xn is a sample pertinent to a random choice from normal distribution N (m, σ 2 ) and suppose that the variance σ 2 is known. The relation (5.7)
provides the confidence interval for m so it renders a formula for test statistic T and bounds of the rejection range. This characteristics of the test are
summarize in Tab. 6.2(A).
Table 6.2: The test of hypotheses for parameters of the random distribution at the significance level α.
(A)
X̄−m0 √
n
σ
Tm =
(B)
X̄−m0 √
n
2
Sn−1
Tm =
σ known
H1 : m > m0
H1 : m < m0
H1 : m 6= m0
H0 : m = m0
Tm > u1−α
Tm < −u1−α
|Tm | > u1− α2
σ unknown
H1 : m > m0
H1 : m < m0
H1 : m 6= m0
H0 : m = m0
Tm > t1−α (n − 1)
H1 : σ 2 > σ02
(C)
T
σ2
=
n−1 2
Sn−1
σ2
Tm < −t1−α (n − 1) |Tm | > t1− α2 (n − 1)
2
H0 : σ =
σ02
T
σ2
>
χ21−α (n
− 1)
H1 : σ 2 < σ02
T
σ2
<
χ2α (n
− 1)
H1 : σ 2 6= σ02
Tσ2 > χ21− α (n − 1)
2
or
Tσ2 < χ2α (n − 1)
2
The test requires the knowledge of the variance which, however, is usually unknown. Otherwise, the variance has to be estimated from the sample
values. Hence, according to the relation (5.8) we obtain the test statistic with the t distribution. The test characteristics are gathered in Tab. 6.2(B).
They are based on the confidence intervals for the parameter m whose bounds can be found in Tab. 6.1. This test is called the one-sample t test and
it is used to make an inference about the normal distribution whether its mean is a given number m0 (e.g. given by a standard). Thus, the hypothesis
H0 :m=m0 is tested against H1 .
Alternatively, the variance or standard deviation can be tested with respect to a reference value. The table 6.1 can again by useful as it also includes
the confidence interval for σ 2 . The characteristics of the test are written in Tab. 6.2(C). This test is called the one-sample χ2 test and it is used to
make an inference about the normal distribution whether its variance is a given number σ02 determined by a prescribed condition (e.g. according to a
standard). Thus, the hypothesis H0 :σ 2 =σ02 is tested against H1 .
Example 6.9 We examined whether the fuel consumption of a type of a car corresponds to the consumption data given by the manufacturer: 7 l per
100 km. We passed the same circuit of the length 227 km using ten cars. The circuit covered various types of roads and countryside. The results of the
consumption in liters per 100 km are shown in the table. Determine whether the measured consumption is statistically greater than that specified by the
manufacturer at the significance level α=0.05.
6.94
7.01
7.26
7.24
7.58
6.99
7.12
7.04
7.33
6.85
Solution. The result is inferred by the t test. As long as the question in the problem is whether the consumption is greater than the given standard,
we use the hypotheses H0 : m=7, H1 : m>7. The conditions for the test assessment are given in Tab. 6.2(B). We need to calculate the sample
2
, then the test statistics and pertinent quantile of the t distribution at the level 1−α=0.95. We obtain
characteristics X̄, Sn−1
X̄ = 7.136,
Sn−1 = 0.218,
T =
7.136 − 7 √
10 = 1.973,
0.218
t0.95 (10−1) = 1.833.
As T >t0.95 (9), the hypothesis H0 is rejected. The test confirmed statistically greater consumption than manufacturer provided.
Example 6.10 Similarly to the previous problem, the fuel consumption was examined. However, the aim of the test is to verify the deviations in the
consumption. Let the deviations from the standard consumption 7 l per 100 km are allowed at 5%, i. e. let us consider the standard deviation 0.35 l per
100 km. The consumption of 22 cars in l per 100 km are given in the table. Find, whether at the significance level α=0.05 the deviations are greater
than it is allowed.
7.56
7.42
7.71
7.19
6.66
7.19
6.98
6.54
6.86
7.32
7.06
6.6
7.64 7.06
7.36 7.17
6.66
6.4
7.78 7.15
7.37 8.12
Solution. The χ2 test is used for inference, because the data scattering is examined. The standard determines maximal value of the standard deviation
and the test is to demonstrate the violation of the standard. Thus, the hypotheses are H0 : σ 2 =0.352 , H1 : σ 2 >0.352 . The table 6.2(C) provides the
2
pertinent condition for this particular test. The test requires to calculate only the sample variance Sn−1
which provides the test statistic and then the
2
quantile of the χ distribution at the level 1−α=0.95. It renders
Sn−1 = 0.441,
T =
22 − 1
0.4412 = 33.409,
0.352
χ20.95 (22 − 1) = 32.6706.
As T >χ20.95 (21), the hypothesis H0 is rejected. The test confirmed that the deviations in consumption measurement are significantly greater than 0.35 l
per 100 km.
Let x1 , x2 , . . . xn be the sample pertinent to the random choice with the probability distribution N
N mx σx2 as above. Additionally, let y1 , y2 , . . . yk be
an independent sample pertinent to the random choice with the probability distribution N my , σy2 . If we suppose that the variances σx2 and σy2 are the
same, we obtain, according to the relation (5.8), the test statistic with the t distribution. The test characteristics are gathered in Tab. 6.3(A). The
test statistics are denoted as before, just the indices n−1 and k−1 in the sample variances are omitted and replaced by x and y respectively, to indicate
the pertinent sample. The test is called the two-sample t test and it is used to examine whether the both samples are pertinent to the same normal
probability distribution, as the variances are considered the same. The hypothesis H0 :mx =my is tested against the hypothesis H1 .
Table 6.3: Two-sample tests for parameters of the normal distribution at the significance level α.
(A)
X̄−Ȳ
q
1
S∗ ( n
+ k1 )
(n−1)Sx2 +(k−1)Sy2
n+k−2
T =
S∗2 =
(B)
T =
rX̄−Ȳ
2
S2
Sx
+ ky
n
2
S2 2
Sx
+ ky
n
2 2
2 2
Sy
Sx
n
k
+
n−1
k−1
σx = σy
H1 : mx > my
H1 : mx < my
H1 : mx 6= my
H0 : mx = my
T > t1−α (n + k − 2)
T < −t1−α (n + k − 2)
|T | > t1− α2 (n + k − 2)
σx 6= σy
H1 : mx > my
H1 : mx < my
H1 : mx 6= my
H0 : mx = my
T > t1−α (ν)
T < −t1−α (ν)
|T | > t1− α2 (ν)
2
Smax
= max{Sx2 , Sy2 }
2
Smin
= min{Sx2 , Sy2 }
2
2
H1 : σmax
> σmin
2
2
H1 : σmax
6= σmin
ν=
(C)
T =
2
Smax
2
Smin
H0 : σ 2 = σ02
T > F1−α (nmax − 1, nqmin − 1) T > F1− α2 (nmax − 1, nmin − 1)
nmax and nmin are respectively
numbers n and k of the sample
with greater and smaller variance
Example 6.11 Verify that the two-sample t test H0 :mx =my , H1 :mx >my summarized in Tab. 6.3(A) uses the correct rejection region bound at the
significance level α.
Solution. First, we show that the test statistic T has the t distribution. If ξi is the random variable whose value in the sample is xi , i=1, . . . , n and if
ηj similarly corresponds to yj , j=1, . . . , k, then the relations (5.6) and (5.7), properties of the sum of two independent random variables with normal
probability distribution and the assumption of equal variances σx2 =σy2 =σ 2 render
E ξ¯ − η̄ = mx − my ,
nσ 2 kσy2
D ξ¯ − η̄ = 2x + 2 = σ 2
n
k
1 1
+
n k
,
,
ξ¯ − η̄ − (mx − my )
q
∼ N (0, 1),
σ n1 + k1
and analogically
n−1
ξS 2 ∼ χ2 (n − 1),
2
σx
k−1
ηS 2 ∼ χ2 (k − 1),
2
σy
n−1
k−1
ξS 2 +
ηS 2
2
σx
σy2
=
n + k − 2 (n − 1)ξS 2 + (k − 1)ηS 2
∼ χ2 (n + k − 2).
2
σ
n+k−2
Thus, the random variable
ξ̄−η̄−(mx −my )
σ
r
√1
n
+ k1
n+k−2 (n−1)ξS 2 +(k−1)ηS 2
n+k−2
σ2
n+k−2
=q
ξ¯ − η̄ − (mx − my )
∼ t(n + k − 2)
1
1 (n−1)ξS 2 +(k−1)ηS 2
+
n
k
n+k−2
(n−1)ξ
+(k−1)η
S2
S2
has t distribution with given degree of freedom. The value of the random variable
for the sample is denoted S∗2 in Tab. 6.3. Therefore,
n+k−2
¯
the confidence interval with the level of confidence γ=1−α for the parameter m of the random variable ξ−η̄
reads


X̄ − Ȳ

P q
> t1−α (n + k − 2) = α,
1
1
S∗
+k
n
which confirms the rejection region of the pertinent test in Tab. 6.3(A). Similarly, the other relations on the table can be reasoned.
Nevertheless, the assumption of equal variances is not always acceptable. If such an assumption is unacceptable, the proves of Example 6.11 cannot be
used and the test statistic is given by Tab. 6.3(B). The table also defines other parameters of this test. The reason for such a calculation is out of the
scope of the present course. Let us only remark that the parameter ν expressing the degree of freedom is usually approximated by the closest integer
number and the pertinent quantile is found in Tab. 5.6.
The question is: Which variant of the test is to be used? The question can be answered by the statistics and requires one more type of a parametric
test. The test is called the two-sample F test and it is used to examine whether both samples come from the normal probability distribution with the
2
2
is tested against the hypothesis H1 . The characteristics of the test are written in Tab. 6.3(C),
=σmin
same variances. Hence, the hypothesis H0 :σmax
the proof for the formula of the test statistic and for the rejection region bounds uses, as in Example 6.11, the relation (5.9). Let us remark that the
indices max and min introduced in the table refer to the magnitude of the pertinent sample variances, so that e.g. if Sy2 >Sx2 , then nmax =k no matter
which of n and k is greater.
The two-sample F test is introduced in a close relation to two-sample t test. Nevertheless, it can be used also separately, if we are interested in a
comparison of the variances of two statistical samples, e.g. if the accuracy of the measuring devices for one physical quantity are to be compared.
Example 6.12 Two methods A and B were compared which assess the concentration c of nitrogen in soil. The method A is said to be more accurate
in the sense that the variance of the result should be smaller. Use the significance level α=0.05, to test the assumption of equal variances. Also prepare
a test to compare the measured values of the concentration.
cA
cB
0.166 0.173 0.179 0.192 0.193 0.194 0.194 0.196 0.198 0.199 0.201 0.203
0.206 0.207 0.208 0.208 0.228 0.246 0.249 0.254
0.153 0.161 0.161 0.164 0.165 0.172 0.176 0.179 0.185 0.193 0.225 0.228
0.229 0.233 0.237 0.239 0.242 0.243 0.254 0.258 0.286
Solution. Suppose that the results of both measurements have normal distribution. Our first task is to compare the variances in the two-sample F test,
which uses the hypotheses H0 : σcA =σcB against H1 : σcA <σcB . First we calculate all necessary sample characteristics:
NA = 20,
S2
NB = 21,
c̄A = 0.2046,
c̄B = 0.2087,
ScA = 0.02363,
ScB = 0.03986.
2
0.03986
The value of the test statistic is Tσ = Sc2B = 0.02363
2 =2.8461, where the condition for the rejection region is determined by the quantile of the F distribution
cA
F1−0.05 (21−1.20−1)=2.1555. According to the condition 2.8461>2.1555, the hypothesis H0 is rejected, so that the test confirmed statistical significance
of the smaller variance in the method A.
Our second task is to compare the means of both random variables using the two-sample t test, having in mind that the difference of the variances has
been accepted (even Tσ >F1−0.025 (21−1.20−1)=2.5089 for H0 : µcA =µcB , H1 : µcA 6=µcB ). In the test, we have to calculate the test statistic Tm and
to estimate the parameter ν
|c̄A − c̄B |
0.0031
Tm = q 2
=q
= 0.4001,
2
sc
sc
0.023632
0.039862
A
B
+ 21
+ NB
20
NA
s2
cA
ν=
NA
2
s2
cA
NA
NA −1
+
2
s2c
B
NB
2 2
sc
+
B
NB
NB −1
2
0.023632
0.039862
+
20
21
2 2
2 2
=
0.02363
20
19
+
0.03986
21
= 32.7826 ≈ 33.
20
The quantile of the t distribution which determines the rejection region is t1−0.025 (33)=1.6924. The rejection condition Tµ =0.4001≯1.6924 (H0 not
rejected) make us to infer that the difference in measured concentration by both methods A and B is not significant. Of course, this is what we want
as both methods measure in fact the same value of concentration.
Sometimes, the measured data in two samples are mutually related, e.g. if we measured the value of the same physical quantity at the samy place
by two devices simultaneously, or if we compared the values of the quantity in the beginning of the experiment and at its end, or if the value was
measured at two places which are mutually related etc.. In such cases, the experiment provides the paired sample determined by the statistical sample
of couples (x1 , y1 ), (x2 , y2 ), . . . (xn , yn ). In accordance with the previous notation, the pertinent random variables are denoted ξi , ηi and the differences
ζi =ξi −ηi are introduced. The differences are supposed to have the normal distribution ζi ∼N (m, σ 2 ). Fulfilling these conditions, the paired t test can
be performed. The test compares the means of ξi , ηi by transforming them to the one-sample t test for the sample of ζi . Thus, the hypotheses which
are tested are: H0 :m=0 (i. e. equality of the means for ξi and ηi ) against H1 utilizing Tab. 6.2(B).
Example 6.13 In Example 6.9 we verified the fuel consumption of cars in relation to manufacturer’s data. There are again 10 cars which circulated
the same test circuit in both directions. The results in consumption (l per 100 km) are summarized in the table. Find at the significance level α=0.05,
whether the consumption depend on the direction of circulation.
1st direction 6.94 7.01 7.26 7.24
2nd direction 6.81 7.25 7.41 7.19
7.58
7.2
6.99 7.12 7.04 7.33 6.85
7.64 7.31 7.48 7.05 7.14
Solution. The result can be inferred by he paired t test, as the specimens are the same (the same cars), only the direction of circulation changed. The
clockwise circulation is denoted as the sample ’x’ and the anti-clockwise circulation is the sample ’y’. To make it clearer, the cars are also numbered
to realize that we have the paired sample. The table is supplemented by on row which includes the differences in consumption with respect to the
orientation. If we ask whether the consumption depend on the direction we take the hypotheses H0 : m=0, H1 : m6=0, where m is the mean of the
consumption difference.
Auto i
1
2
3
1. smer xi 6.94 7.01
7.26
2. smer yi 6.81 7.25
7.41
zi =xi −yi 0.13 −0.24 −0.15
4
5
6
7
8
7.24 7.58 6.99
7.12
7.04
7.19 7.2
7.64
7.31
7.48
0.05 0.38 −0.65 −0.19 −0.44
9
10
7.33 6.85
7.05 7.14
0.28 −0.29
2
In Tab. 6.2(B), we find the condition for inference. It requires to calculate the characteristics of the sample Z̄, Sn−1
which give the test statistic and
α
also the quantile of t distribution at the level 1− 2 =0.975. We obtain
−0.112 − 0 √
Z̄ = −0.112,
Sn−1 = 0.322,
T =
10 = −1.010,
t0.975 (10−1) = 2.262.
0.322
As |T |≯t0.975 (9), the hypothesis H0 is not rejected. The test did not confirm that the consumption depends on the direction of circulation. Such a
result should be positive for a circuit representing an ‘average countryside’.
The analysis of data also requires to verify the probability distribution of the sample, e.g. if we want to make the t test, the sample should be of normal
distribution. The inference about such a fact is called the goodness-of-fit test. There are several tests of good fit, we will demonstrate one of them
which is called the Pearson χ2 goodness-of-fit test. This test is universal in the sense that it can be used for discrete as well as for continuous probability
distributions. A disadvantage of the generality of the test is that it is only an asymptotic test, requiring greater number of measured data, because the
probability distribution for the test statistic is exactly χ2 only in the limit case of a general probability distribution .
The starting point for this test is the frequency table which divides the measured data to k categories. The categories are defined by Ci = hai ; bi ) chosen
so that the distribution function is continuous at the end points (this is always satisfied for a continuous random variable) and such that bi =ai+1 . The
test is used to make an inference about the relation of the sample to particular probability distribution so the hypothesis H0 is formulated as: The random
variable ξ has the probability distribution Ξ given e.g. by the cumulative distribution function FΛ , dependent on the parameters Λ. The hypotheses H1
than negates it. If the probability that the random variable with cumulative distribution function FΛ reaches the values in the ith category is denoted
pi , the probabilities can be calculated by the relation
pi = FΛ (bi ) − FΛ (ai ),
or at least
pi = FΛ̂ (bi ) − FΛ̂ (ai ),
where Λ̂ denotes the point estimate of those parameters Λ which are unknown. If, e.g. the hypothesis H0 denotes the normal distribution N (3.2, 0.12 ),
then both parameters are given and both are included in Λ. However, if the hypothesis H0 declares a general normal distribution, then the parameters m
and σ 2 have to be estimated and both are in Λ̂, e.g. m can be estimated by the sample mean calculated from the frequency table by the formula (5.1).
The test is determined by the test statistic which has (asymptotically) χ2 distribution and thus provides also the condition for the rejection range K:
T =
k
X
(ni − n pi )2
i=1
n pi
,
K:T > χ2 (k − g − 1),
where g denotes the number of parameters which have to be estimated. As discussed above, g=0 in the first example and g=2 in the second example.
All data are usually written into an extended frequency table 6.4.
Let us note that no value npi may be small, in order the assumption of χ2 distribution of the test statistic to be acceptable. In various references, various
threshold values are applied, in what follows we will us the condition npi >1. In the case that a category does not satisfy this condition, it is associated to
a neighbour category. This causes the increase of probability for the category so that npi also increases. The number of categories, however, decreases.
Table 6.4: The extended frequency table for the goodness-of-fit
test.
P
C1
C2
...
Ck
n1
n2
...
nk
n
p1
p2
...
pk
np1
np2
...
npk
(n1 −n p1 )2
(n2 −n p2 )2
(nk −n pk )2
...
T
n p1
n p2
n pk
Example 6.14 Prepare the test which examines wether the service times in Example 5.2 (cf. also Example 4.27) are in contradiction with the
assumption of exponential character of the random variable ξ. Use the significance level α=0.05.
Solution. The goodness-of-fit test is used: the hypothesis H0 states that the random variable ξ has the exponential probability density and the hypothesis
H1 claims that it does not have the exponential distribution. We prepare the extended frequency table by the expected probabilities. However, it requires
to estimate the parameter δ of the exponential distribution. Knowing that the mean of exponential distribution is the reciprocal value of δ and that the
mean can be estimated by the sample mean (see the aforementioned example), we obtain
δ=
1 est. 1 .
=
= 0.009286.
Eξ
X̄
The probability of the values of the random variable ξ in the category pertinent to the interval Ii =(ai , ai +50i is
.
pi = P (ξ ∈ Ii ) = e−δ ai 1 − e−50δ = 0.3714e−0.009286ai .
The extended frequency table thus reads:
Interval
ni
pi
n pi
(0, 50i
(50, 100i
(100, 150i
(150, 200i
(200, 250i
(250, 300i
(300, 350i
(350, 400i
(400, 450i
18
0.3714
19.3128
11
0.2335
12.1395
11
0.1467
7.6306
5
0.0922
4.7964
2
0.0580
3.0149
3
0.0364
1.8951
0
0.0229
1.1912
1
0.0144
0.7488
1
0.0091
0.4707
The test requires each category to have a value greater than one in the last row. Otherwise some categories have to be gathered together – in our case
P
2
i)
these are the last two columns. The table is also supplemented by a line of terms of the test statistic T = ki=1 (ni −np
with k categories.
npi
Interval
ni
pi
n pi
(ni −npi )2
npi
(0, 50i
(50, 100i
18
11
0.3714 0.2335
19.3128 12.1395
0.08924 0.1070
(100, 150i
(150, 200i
(200, 250i
(250, 300i
(300, 350i
(350, 450i
11
0.1467
7.6306
1.4878
5
0.0922
4.7964
0.008643
2
0.0580
3.0149
0.3416
3
0.0364
1.8951
0.6442
0
0.0229
1.1912
1.1912
2
0.0235
1.2195
0.4995
We need the value of the test statistic T =4.3692 (sum of the numbers in the last row of the table) to make an inference, the number of categories k=8
(after the reduction) and the number g=1 of estimated parameters (δ). The table of quantiles provides the value χ21−α (k−g−1)=χ20.95 (6)=12.592. As
T ≯12.592 (the rejection condition), the hypothesis H0 cannot be rejected. The measured data did not confirm significant difference from the exponential
probability density for the random variable ξ. Thus, ξ can be considered as a random variable with exponential distribution and with the parameter
δ=0.009286.
.
Even though, the test strength is not known, we can say something about reliability of the result. We can find that χ20.3732 (6)=T , which corresponds to
α=0.6268. If we rejected the hypothesis H0 , we would have the probability more than 60% that the inference was incorrect.
Finally, we note that the hypothesis H0 could include particular value of δ, if we set δ=0.01 according to Example 5.2, we would obtain different
.
probabilities pi =e−0.01 ai (1 − e−0.5 ) =0.3935e−0.01ai . The test statistic T would of course have a different value and also we would have no estimated
parameter (g=0), so the rejection range bound would be determined by a different quantile: χ20.95 (7)=14.067.
1. Find point estimators of the parameters of theprobability distributions mentioned in the probability theory.
2. Can you prove all the relations for confidence intervals in Tab. 6.1? Do yo prove other types of confidence intervals for the parameter of the
exponential distribution? And what about the random variables with other probability distribution?
3. Demonstrate the relations in Tab. 6.2 and the relations in Tab. 6.1(A,C)!
4. Do you realize differences between the pair and the two-sample t tests?
5. Why is the Pearson goodness of fit test said to be universal?
6. In which sense is the test statistics of the goodness of fit test approximated in order to get the χ2 distribution?
In the problems 1. to 3., it is supposed that the measured values belong to random choices from normal distribution. Find the confidence intervals for
its parameters.
1. The hight of chosen students in cm: 172, 184, 177, 192, 188, 175. Find the two-sided confidence intervals at the confidence level γ=0.95 .
2. The diameter of the dust particles in µm: 24.5, 3.2, 7.6, 21.4, 18.3, 13.5, 16.2, 29. Find the two-sided confidence intervals at the confidence
level γ=0.95 and upper-bounded one-sided confidence intervals at the confidence level γ=0.99 .
3. The strength limit of the specimen in MPa: 56, 59, 58, 57, 60, 55, 55, 52, 53, 57, 56. Find the two-sided confidence intervals at the confidence
level γ=0.95 and lower-bounded one-sided confidence intervals at the confidence level γ=0.99 .
In the problems 4. to 12., based on the data, select an appropriate test and pertinent hypotheses to make an inference. Make a verbal conclusion..
4. Inspection of the Office for Environmental Protection requires that the lead concentration in waste production was less than 0.5 units. Eleven
independent measurements were measured with following values: 0.8, 0,5, 0.6, 0.6, 0.6, 0.5, 0.6, 0.4, 0.5, 0.4, 0.7. Is there proved a statistically
significant difference with respect to maximal allowed value? Test at the significance level α=0.01 .
5. In the production of ceramic tiles, it is necessary to maintain the constant thickness. If the standard deviation σ of the thickness of the tiles in
the supply exceeds 0.1 mm, the product is considered poor. Ten measurements of the thickness were made: 7.25, 7.18, 7.03, 7.35, 7.22, 7.31, 7.28,
7.11, 7.24, 7.41. Test at the significance level α=0.05, whether the measured data prove an inferior product .
6. Quality inspection requires that a year of operation does not affect the noise transmittance of windows. In an experiment, there were used ten
windows working in the same conditions and the levels of noise (in dB) was measured just after the installation and after one year of operation. Find,
using the significance level α=0.05, whether there is a statistically significant increase of noise for windows after a year of operation.
Window no.
On installation
After one year
1 2 3 4 5 6 7 8 9 10
44 40 40 41 42 41 43 43 40 42
44 41 44 42 40 42 43 44 41 43
.
7. Does the deflection of a beam change? The maximum deflection of ten equal beams (in µm) was measured in one year interval, the values
are denoted as w0 (initial) and wy (after one year). Set the level of significance at α=0.05 to infer about statistically significant change of the beam
deflection due to the operation of the construction.
Beam no.
w0
wy
1
132
132
2
128
129
3
128
132
4
129
128
5
6
130 128
129 131
7
131
132
8
131
133
9
129
130
10
130
129
.
8. In order to determine whether a special coating reduces moisture permeability, an experiment has been designed with several material samples
(under the same external conditions). The measured data (in kg m−1 s−1 Pa−1 × 1012 ) are in the table. Make a test at the significance level α=0.05.
No coating
15.3 15.6 15.1 15.8 15.9 15.2 15.8 15.8
With coating 14.8 14.9 15.0 15.8 15.5 15.3 15.5 15.7
15.2
.
9. A newly developed special admixture to concrete should increase its strength. The table summarizes measured values of strength limit (in MPa)
in a pressure test applied to a specified type of concrete. Find at the significance level α=0.05, whether addition of small amount of admixture confirms
an increase of the concrete strength.
No admixture
5.6 5.9 5.8 5.9 5.7 5.7 6.0 5.5 5.7 5.5
With admixture 6.2 5.7 5.1 6.5 6.3 5.8 5.7 5.8 6.0
5.5
.
10. Having a standard dice, each number should have the same probability. Nevertheless, that one used in a game by E.S. and T.O. seemed to them
a bit uneven. Make a test, which can provide at the significance level α=0.05 an inference on regularity of the used dice. The results of an experiment
with 600 attempts are gathered in the table.
Dice
Count
96
91
83
120
118
92
.
11. In Problem 1 of Chapter 5, it is reasonable to suppose that the measured data form a random sample from the normal distribution. Use the
significance level α=0.05 to verify whether such an assumption does not contradict the measured values. Does the result change if the assumption on
the probability distribution is N (5, 3)? .
12. Based on the numerical and graphical analysis of data in Problems 2 to 5 of Chapter 5, prepare hypotheses on the probability distribution of the
pertinent random variables and prepare tests in the same way as in the previous problem. .
Conclusion
In this section we explained estimations of parameters and testing of statistical hypotheses. We have learned to estimate the unknown parameters of
random variables from a statistical sample, and we have seen two types of estimators: point estimators and interval estimators, in order we would be
able to estimate the values as well as to interpret and, using probability, to express their reliability. By this we also understand, where the estimates take
into account the randomness of the measured data. From the chapter we also know what statistical hypotheses are and how to formulate them properly
to verify their truthfulness, or at least statistical relevance by appropriate methods. As examples of statistical hypotheses and their testing, we learned
to use tests based on normal random choices and on the evaluation of the parameters of the normal probability distribution using t, F, or χ2 tests. We
can also verify the information about the probability distribution of a random choice by using the test of goodness of fit.
References
Problem solutions
1.
4.
8.
9.
173.05<m<189.61,4.926<σ<19.35
2. 9.668<m<22.38,5.583<σ<15.84
3. 54.57<m<57.80, 1.677<σ<4.213
m=0.56,Sn−1 =0.121,T =1.75,H0
5. Sn−1 =0.112,T =11.256,H0
6. m=0.8,Sn−1 =1.476,T =1.714,H0
7. m=0.9,Sn−1 =1.729,T =1.646,H0
m1 =15.52,Sn−1 1 =0.319,m2 =15.31,Sn−1 2 =0.376,Tσ =1.386,H0 ,Tm =1.646,H0
est.
est.
m1 =5.71,Sn−1 1 =0.176,m2 =5.90,Sn−1 2 =0.412,Tσ =5.50,H1 ,ν=10,Tm =1.296,H0
10. T =11.74,H1
11. m = 4.985,σ = 1.868,T =6.613,H0
Chapter 7. (Correlation and regression)
Aim
Draw basic ways of finding relations and dependencies between measured data.
Objectives
1. Learn to estimate the existence of a dependency of measured values in a pair sample.
2. Understand differences between correlation and regression.
3. Catch the way of determination of linear regression model parameters using the least square method.
4. Get knowledge about tests for regression parameters and about confidence intervals for them.
5. Learn to use regression for basic simple nonlinear relations.
Prerequisites
random variable; characteristics of a random variable; statistical tests; the least square method
Chapter 7. (Correlation and regression) – 1
Introduction
Practical problems frequently require to discuss existence of a relation between given values, e.g. of measured physical quantities or specify a type of
such a relation. The chapter treats with an important field of mathematical statistics which assesses relations and dependencies between measured data.
First, stating a mere fact of existence of a relation will be discussed We restrict ourselves to verification of a linear relation between measured data which
is called correlation. Additionally, only a relation between two samples will be discussed an similarly to the previous chapter, we focus on the case where
the statistical data in random samples pertain to random variables with normal distribution.
Confirming a relation between the data by a statistical method makes one to think more about the relation and its description. Thus, in the next section,
linear regression and its methods will be discussed. Its tools enable to determine or to make estimations of parameters of the linear function based on
the measured data. The regression analysis, however, offers more, e.g. using an appropriate statistical test to infer about satisfaction of a technical
standard or to estimate how much it differs from it.
The regression, however, can be used not only for linear relations. So in the final part, some simple nonlinear relations between measured data will be
mentioned if a linear relation between experimental data is not statistically significant, e.g. a type of exponential relation is quite common in physics so
it is practically useful to identify it and to determine or estimate its parameters. Of course, this is true also in the cases of other simple relations which
will be discussed.
Correlation and regression – Correlation analysis
Accept or reject existence of a relation between observed variables.
Correlation studies the existence of a relation. To express a level of correlation in measured data, we will use the Pearson sample correlation coefficient.
If a pair statistical sample (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ) is given which belongs to a population characterized by a pair sample of random variables
(ξi , ηi ), the Pearson sample correlation coefficient r is defined by the relation
Pn
Pn
x
−
X̄
y
−
Ȳ
i
i
i=1
i=1 xi yi − nX̄ Ȳ
r = qP
=q P
(7.1)
Pn 2
.
P
2
2
n
n
n
2
2
2
x
−
n
X̄
y
−
n
Ȳ
i=1 i
i=1 i
i=1 xi − X̄
i=1 yi − Ȳ
This sample characteristic is an unbiased estimator of the correlation coefficient ρ between random variables ξi and ηi . It is also used in statistical test
to make a decision on the value of the parameter ρ. Similarly to ρ, the sample coefficient also lies in the range between -1 and 1 and its absolute value
expresses strength of the correlation and the sign reflects the slope of the relation: plus for increasing function, minus for a decreasing one. A linear
transformation of data does not vary the sample correlation as well, it means that if the values xi are changed to a+bxi and yi to c+dyi , provided
that the signs of b and d are the same, then r does not change. The value r=0 pertains to non-correlated data — there is no linear relation between
such data. Of course, there may exist other type of a relation between them. It is nicely demonstrated in Fig. 7.1, where the values of r close to +1
or -1 belong to measured data grouped closely around a line, e.g. r=0.9 or r=−0.9. The intermediate values of r, e.g. r=0.5 or r=−0.5, provide an
observable linear trend, however for r≈0 are the data either randomly scattered or are grouped around a graph of other function whose linear trend is
negligible as it is shown in the middle bottom graph.
r=0.9
r=0.0
r=−0.5
r=0.5
r=0.0
r=−0.9
Figure 7.1: The Pearson sample correlation coefficient and scattering of measured data.
There also exist other types of correlation coefficients than the Pearson one, but they are out of the scope of this course. The advantage of the Pearson
coefficient appears in the cases where the samples come from the normal distribution. In such a case, a simple test for significance of the correlation
coefficient ρ can be introduced using its point estimator r. The null hypothesis H0 :ρ=0 is tested against the alternative H1 :ρ6=0 in the test. On rejecting
the hypothesis H0 , the correlation coefficient is said to be significant, thus there exists statistically significant correlation between the random variables
ξ and η. In the other case, the coefficient is statistically insignificant and there is no reason to talk about a linear relation between ξ and η.
If the statistical data (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ) form a pair sample pertaining to a pair of random variables (ξi , ηi ) with normal distribution, than
the test for the aforementioned hypotheses at the level of significance α is defined by the following test characteristic T and the rejection region K:
√
n − 2 |r|
T = √
,
K : T > t1− α2 (n − 2).
(7.2)
1 − r2
Thus, it is a t test, because the characteristic T has the t distribution, see also Tab. 6.2.
Example 7.1 The relation between stress τ and deformation ∆l in clay was examined by a pressure test, the measured values are in the table. Test at
the significance level α=0.05 the (linear) relation between τ and ∆l.
∆l [mm]
τ [MPa]
0.003 0.006 0.010 0.015 0.025 0.050 0.075 0.100 0.125 0.150 0.175 0.200
0.128 0.231 0.312 0.377 0.439 0.542 0.621 0.653 0.688 0.719 0.750 0.767
Solution. First, we find the sample characteristics needed for the relation (7.1) and we denote xi =∆li and yi =τi . Thus, we obtain
X̄ = 0.0778,
Ȳ = 0.5189,
Sn−1 (x) = 0.0708,
Sn−1 (y) = 0.2169,
n
X
xi yi = 0.6390.
i=1
In the calculation, we usually use the second formula from (7.1) and in the denominator we can use the standard deviations to have
r=
0.6390 − 12 · 0.0778 · 0.5189
= 0.9133.
(12 − 1) · 0.0708 · 0.2169
Now, we can test the hypothesis H0 :ρ=0 against the alternative H1 :ρ6=0. The values of the test statistic and the pertinent quantile of the t distribution
from the relation (7.2) are
√
12 − 20.9133
T = √
= 7.0896,
t0.975 (10) = 2.228.
1 − 0.91332
As long as 7.0896>2.228, the hypothesis H0 is rejected, resulting in statistical significance of a linear relation between the stress and the deformation.
Correlation and regression – Linear regression
Determine the form of the linear relation between the observed variables.
The correlation provides only existence of a relation, now we intend to find the formula for the relation.
Let the random variable η depend on a deterministic variable x linearly which is given by the relation
η = ax + b + ε,
(7.3)
where ε is a random variable with a zero mean and a nonzero variance. Introducing a random variable in a relation of two physical quantities can be
caused e.g. by inaccuracy of η measurement, or alternatively also by random character of the relation itself, thus the result is also random. Such relation
is called the linear regression model and the parameters a, b are called the regression coefficients.
The aim of the linear regression analysis is to find the regression coefficients a, b or, better say, their estimators, â, b̂ based on a pair measurements
(x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ), where yi are the values of the random variable η in the sample ηi . Besides, the linear regression can test the appropriateness
of the model, expected values of regression coefficients, or can make an inference based on the confidence intervals for them.
The regression coefficients can be estimated by various methods. One of them is the least square method which we have already known. In following,
ŷi = âxi + b̂
ŷi
yi
xi
Figure 7.2: Linear regression and the least square method.
we find the regression line ŷ=âx + b̂, see Fig. 7.2 which is the best one in the least square sense. This means that it minimizes the function S=S(a, b)
S(a, b) =
n
X
h
2
(yi − axi − b) ,
i
â, b̂ = argmin[a,b]∈R2 S(a, b).
(7.4)
i=1
The minimization leads to the solution of the system
â
n
X
x2i + b̂
i=1
â
n
X
n
X
xi =
i=1
xi + b̂ n
i=1
n
X
xi y i
i=1
=
n
X
(7.5)
yi
i=1
which is called the normal system. The solution, hence the least square estimators, can be expressed in the form
â =
r Sn−1 y
,
Sn−1 x
b̂ = Ȳ − âX̄
(7.6)
where the standard notation for sample characteristics was used.
Example 7.2 Show that â and b̂ in (7.6) determine the minimum of the function S=S(a, b) in (7.4).
Solution. We find the minimum of S by differential calculus. The conditions for a stationary point are
0=
Sa0
=2
n
X
(yi − axi − b) (−xi ),
0=
Sb0
=2
n
X
i=1
(yi − axi − b) (−1),
i=1
which gives, after a small rearrangement, exactly the system (7.5). The second relation in (7.6) is a direct consequence of the second equation in (7.5)
after dividing by n. We use the Cramer rule to find the value of â:
P
P
n
n
P
n 2 P
n
n
n
xi x
y
x
xi
i
i
i
X
X
2
i=1
i=1
2
2
i=1
i=1
xi yi − n2 X̄ Ȳ ,
xi − n X̄ ,
Dâ = P
D = P
=n
=n
n
n
x
n yi
n i=1
i=1
i
i=1
i=1
n
P
Dâ
â =
= i=1
n
P
D
xi yi − nX̄ Ȳ
x2i − n X̄
2 =
r(n − 1)Sn−1 y Sn−1 x
r Sn−1 y
=
.
2
(n − 1)Sn−1 x
Sn−1 x
i=1
Finally, we use the second order derivative to verify that the solution actually provides minimum. It renders
00
Saa
=2
n
X
00
Sbb
x2i ,
00
Sab
= 2n,
=2
i=1
00
00
00
00 2
with Sbb
>0 (n>0) and Saa
Sbb
− (Sab
) >0 (4n
n
X
xi ,
i=1
Pn
2
2
i=1 xi − 4 (
i=1 xi ) >0). Thus the solution is a minimizer of S.
Pn
The last inequality is obtained from a well known relation between the quadratic xQ and the arithmetic xA =X̄ means
n
xQ ≥ xA , kde
x2Q
1X 2
=
x.
n i=1 i
The equality holds only in the case where all xi are mutually equal.
The minimum of S(a, b) is usually denoted SE and called the sum of errors. This sum obeys the following principal relation of linear regression:
n
X
i=1
yi − Ȳ
2
=
n
X
i=1
2
(yi − ŷi ) +
n
X
ŷi − Ȳ
2
,
(7.7)
i=1
2
2
2
and it provides the sum of errors expressed within the terms of other sample characteristics: SE=(n−1) Sn−1
y −Sn−1 x â . The principal relation is
written in the form ST =SE+SM which, beside SE, introduces also the total sum ST , the left-hand side of Eq. (7.7), and the sum of the model SM ,
the second term in the right-hand side of Eq. (7.7). It is good to see that the total sum does not depend on the regression model and the sum of the
model describes the deviations in the regression model. The term SE then expresses the errors unexplained by the used model, only by the random
variables, and these are minimal for the given regression model. The confidence of the regression model is thus determined by the ratio SM
and it is
ST
numerically equal to the square of the Pearson sample correlation coefficient.
Example 7.3 Show the relation r2 = SM
.
ST
Solution. The relation (7.6) renders
SM =
n
X
i=1
Thus, Eq. (7.7), the definition of
2
ST =(n−1)Sn−1
y
ŷi − Ȳ
2
=
n X
âxi + b̂ − âX̄ − b̂
2
2
= â2 (n − 1)Sn−1
x.
i=1
and Eq. (7.6) provide
2
â2 (n − 1)Sn−1
SM
x
=
=
2
ST
(n − 1)Sn−1
y
r Sn−1 y
Sn−1 x
2
2
Sn−1
x
= r2 .
2
Sn−1
y
Let us suppose now that the random variable ε in Eq. (7.3) has normal distribution ε∼N (0, σ 2 ). It is a common assumption if ε interprets a measurement
error in η depending on x. For the sample ηi , we can then write
1 2
2
ηi ∼ N (axi + b, σ ),
η̄ ∼ N aX̄ + b, σ
(7.8)
n
and there appears a new parameter σ 2 in the relation which characterizes the error. These assumptions enable us to think about the estimators â and
b̂ of the regression coefficients a and b as about the random variables with the normal distribution and about SE as a random variable with the χ2
distribution. It holds
σ2
1
SE
X̄ 2
2
â ∼ N a,
,
b̂ ∼ N b, σ
+
,
∼ χ2 (n − 2),
(7.9)
2
2
(n − 1)Sn−1 x
n (n − 1)Sn−1 x
σ2
so that â is an unbiased estimator of the parameter a, b̂ is an unbiased estimator of the parameter b and
σ2.
SE
n−2
is an unbiased estimator of the parameter
Example 7.4 Prove the relations (7.9) and unbiasedness of the pertinent estimators.
Solution. The relation (7.6) provides
Pn
n
n
X
X
(xi − X̄)
(x − X̄)
i=1 (xi − X̄)(ηi − η̄)
Pn
Pn
Pn i
=
(ηi − η̄) ==
η.
â =
2
2
2 i
i=1 (xi − X̄)
i=1 (xi − X̄)
i=1 (xi − X̄)
i=1
i=1
As ηi are independent and form a sample from a normal distribution, also their combination has a normal distribution. In addition, it holds
!
n
n
n
n
X
X
X
X
(xi − X̄)
(xi − X̄)
(xi − X̄)
(x − X̄)xi
Pn
Pn
Pn
Pn i
Eâ = E
ηi =
Eηi =
(axi + b) = a
= a,
2
2
2
2
i=1 (xi − X̄)
i=1 (xi − X̄)
i=1 (xi − X̄)
i=1 (xi − X̄)
i=1
i=1
i=1
i=1
Dâ = D
n
X
i=1
(x − X̄)
Pn i
η
2 i
i=1 (xi − X̄)
!
=
n
X
i=1
(xi − X̄)2
Pn
i=1 (xi
− X̄)2
2 Dηi =
n
X
i=1
(xi − X̄)2
Pn
i=1 (xi
− X̄)2
σ2
1
=
.
2
2
(n − 1)Sn−1
x
i=1 (xi − X̄)
2
2
2 σ = σ Pn
The first relation provides unbiasedness of the estimators and both relations give values of the parameters of the normal distribution in the first relation
of (7.9).
Similarly, the relation (7.6) renders the characteristics of the estimator b̂:
b̂ = η̄ − b̂X̄ =
n X
1
i=1
(xi − X̄)X̄
−
2
n (n − 1)Sn−1
x
ηi
which are
Eb̂ =
n X
1
i=1
Db̂ =
(xi − X̄)X̄
−
2
n (n − 1)Sn−1
x
n X
1
i=1
(xi − X̄)X̄
−
2
n (n − 1)Sn−1
x
Eηi =
n X
1
i=1
2
(xi − X̄)X̄
−
2
n (n − 1)Sn−1
x
(axi +b) = aX̄ +b−a
n
n
X
X
(xi − X̄)xi X̄
(xi − X̄)X̄
−b
X̄
= aX̄ +b−aX̄ = b,
2
2
(n
−
1)S
(n
−
1)S
n−1
x
n−1
x
i=1
i=1
n X
1
2 (xi − X̄)X̄
(xi − X̄)2 X̄ 2
2
Dηi = σ
−
+
= σ2
2
2
2S 4
n
n
(n
−
1)S
(n
−
1)
n−1 x
n−1 x
i=1
!
n
n
X
1
2 X (xi − X̄)X̄
(xi − X̄)2 X̄ 2
−
+
2
4
n n i=1 (n − 1)Sn−1
(n − 1)2 Sn−1
x
x
i=1
2
X̄
1
= σ2
+
.
2
n (n − 1)Sn−1
x
The first relation provides unbiasedness of the estimators and both relations give values of the parameters of the normal distribution in the second
relation of (7.9).
Now, look at the last relation in (7.9). The sum of errors SE gives
SE=
n
X
2
2
(ηi − η̄)2 − (n−1)Sn−1
x â .
i=1
We substitute the regression model into SE, use the relations (7.6), (7.1) and we obtain
SE =
n
X
(ηi − axi − b) − (η̄ − aX̄ − b) − (aX̄ + b − axi − b)
2
2
− â2 (n−1)Sn−1
x
i=1
=
n
X
2
2
(ηi − axi − b) + n(η̄ − aX̄ − b) + a
2
2
(n−1)Sn−1
x −2
i=1
n
X
(ηi − axi − b)(η̄ − aX̄ − b) − 2a
i=1
=
n
X
2
(ηi − axi − b − η̄ + aX̄ + b)(X̄ − xi ) − â2 (n−1)Sn−1
x
i=1
2
(ηi − axi − b)2 − n(η̄ − aX̄ − b)2 − a2 (n−1)Sn−1
x + 2a
i=1
=
n
X
n
X
2
(ηi − η̄)(xi − X̄) − â2 (n−1)Sn−1
x
i=1
n
X
2
2
2
2
(ηi − axi − b)2 − n(η̄ − aX̄ − b)2 − a2 (n−1)Sn−1
x + 2aâ(n−1)Sn−1 x − â (n−1)Sn−1 x
i=1
n
X
2
=
(ηi − axi − b)2 − n(η̄ − aX̄ − b)2 − (â − a)2 (n−1)Sn−1
x,
i=1
hence
2
n X
ηi − axi − b
σ
i=1
SE
= 2 +
σ
η̄ − aX̄ − b
√σ
n
!2
+
â − a
√
σ
n−1Sn−1 x
!2
.
The definition of the χ2 distribution, the first relation of (7.9) and Eq. (7.8) imply
η̄ − aX̄ − b
√σ
n
!2
â − a
2
∼ χ (1),
√
σ
n−1Sn−1 x
!2
2
∼ χ (1),
2
n X
ηi − axi − b
i=1
σ
∼ χ2 (n).
Taking into account assumption of independence of random variables SE, â and η̄ leads to defining of the distribution for SE according to the third
relation in (7.9). Finally, we calculate the mean
E(SE) =
n
X
2
2
E (ηi − axi − b)2 − nE (η̄ − aX̄ − b)2 − (n−1)Sn−1
x E (â − a)
i=1
=
n
X
2
2
Dηi − nDη̄ − (n−1)Sn−1
x Dâ = nσ − n
i=1
so that the unbiased estimator for σ 2 is
σ2
σ2
2
− (n−1)Sn−1
= (n − 2)σ 2 ,
x
2
n
(n − 1)Sn−1
x
SE
.
n−2
If the distributions for the random variables â and b̂ are known, the confidence intervals for regression coefficients can be found. Or a test for verification
of their values can be formulated. Both random variables have a normal distribution, so that for the confidence interval Tab. 6.1 can be used. The
two-sided confidence intervals at the confidence level γ are given by the relations
√
√
1 − r2
1 − r2
<a < â 1 + t 1+γ(n − 2) √
â 1 − t 1+γ(n − 2) √
2
2
r n−2
r n−2
v
v
n
n
uP
uP
(7.10)
u x2
u x2
√
√
1 − r2 t i=1 i
1 − r2 t i=1 i
b̂ − ât 1+γ(n − 2) √
<b < b̂ + ât 1+γ(n − 2) √
2
2
n
n
r n−2
r n−2
Based on these intervals, the test statistics and rejection regions can be derived for the tests characterized by the hypotheses
H 0 : a = a0 ,
H1 : a 6= a0 ,
or
H0 : b = b0 ,
H1 : b 6= b0 .
(7.11)
Choosing the significance level α, the t tests are obtained determined by the following characteristics (Ta for the test of a, Tb for the test of b):
p
(n − 2) r
(â − a0 ), Ka : |Ta | > t1− α2 (n − 2),
Ta = √
â 1 − r2
p
(7.12)
(n − 2) r (b̂ − b0 )
rP
Tb = √
, Kb : |Tb | > t1− α2 (n − 2).
n
â 1 − r2
x2i
i=1
n
The test characteristics were derived from the relations (7.9) and from the principal relations of the mathematical statistics (5.6).
Example 7.5 The relation between stress τ and deformation ∆l in clay was examined by a pressure test, the measured values are in the table.
In Example 7.1 we found a statistically significant linear relation between τ and ∆l. Find the estimators for the coefficients of the linear regression
model and determine the confidence intervals for them at the level of confidence γ=0.95.
∆l [mm]
τ [MPa]
0.003 0.006 0.010 0.015 0.025 0.050 0.075 0.100 0.125 0.150 0.175 0.200
0.128 0.231 0.312 0.377 0.439 0.542 0.621 0.653 0.688 0.719 0.750 0.767
Solution. First, we recall the sample characteristics calculated in Example 7.1 (where we denoted xi =∆li and yi =τi ) and find the required sum
X̄ = 0.0778,
Ȳ = 0.5189,
Sn−1 (x) = 0.0708,
Sn−1 (y) = 0.2169,
r = 0.9133,
n
X
Pn
i=1
x2i
x2i = 0.1279.
i=1
Now, we use the relations (7.6) to find the least square estimators
â =
0.9133 · 0.2169
= 2.7971,
0.0708
b̂ = 0.5189 − 2.7971 · 0.0778 = 0.3012.
We look out the quantile of the t distribution at the level 1+γ
=0.975: t0.975 (12 − 2) = 2.228 in Tab. ??. The confidence intervals are obtained from
2
the relations (7.10)
√
√
1 − 0.91332
1 − 0.91332
√
√
<a < 2.7971 1 + 2.228
2.7971 1 − 2.228
0.9133 10
0.9133 10
2.7971 − 0.3945 · 2.228 <a < 2.7971 + 0.3945 · 2.228
1.9180 <a < 3.6762,
r
r
√
√
1 − 0.91332 0.1279
1 − 0.91332 0.1279
√
√
0.3012 − 2.7971 · 2.228
<b < 0.3012 + 2.7971 · 2.228
12
12
0.9133 10
0.9133 10
0.3012 − 0.3945 · 2.228 · 0.1032 <b < 0.3012 + 0.3945 · 2.228 · 0.1032
0.2105 <b < 0.3920.
The intervals are relatively wide, as r2 =0.8341 which means that about 17% of errors are caused by the deviations from the linear model. The differences
can be seen in Fig. 7.3 and we can guess that a regression model other than linear could be better. Still, we do not know how to apply it.
τ [MPa]
0.8
0.6
0.4
measured
a∆l+b
0.2
0
0
0.05
0.1
0.15
∆l [mm]
0.2
Figure 7.3: Measured data and the linear regression model.
Correlation and regression – Transformations to linear regression models
The observed variables may behave arbitrarily and the relation between them does not have to be linear.
Let the points (xi , yi ), i = 1, 2, . . . , n represent measured values of the quantity η with respect to the values x and let the points lie ‘in a vicinity of
the graph’ of a simple unknown function f such that the relation between the values xi and yi can be expressed by the function f using a couple of
parameters a, b
yi = f(xi ; a, b).
(7.13)
Simultaneously it is supposed that the relation can be transformed by two functions Xi =gx (xi ) and Yi =gy (yi ) so that the resulting relation, containing
another couple of parameters A and B, is linear, it means that
(7.14)
Yi = gy f(gx−1 (Xi ); a, b) ⇒ Yi = AXi + B.
Then the relation η = f(x; a, b; ε) defines a regression model which can be transformed to a linear regression model Υ = Ax + B + ε.
The most common transformations use logarithmic and exponential functions, where gx (t)= ln t or gy (t)= ln t, alternatively gx (t)=tα or gy (t)=tα for
an appropriate α. Table 7.1 shows the most frequent transformations, which are used to linearize the pertinent nonlinear regression model.
Example 7.6 Transform a power function y=b xa to a linear one.
Solution. We apply a logarithmic function to both sides of the relation and we obtain
ln y = ln (b xa ) = ln b + a ln x.
y = beax
y = b xa
1
y=
ax + b
1
y=
(ax + b)2
y = ln(ax + b)
y = a ln x + b
√
y =a x+b
Y = ln y, X
Y = ln y, X
1
Y = , X
y
1
Y = √ ,X
y
y
Y =e , X
Y = y, X
Y = y, X
=x
= ln x
A = a, B = ln b
A = a, B = ln b
=x
A = a, B = b
=x
A = a, B = b
=x
= ln x
√
= x
A = a, B = b
A = a, B = b
A = a, B = b
Y = AX + B
Table 7.1: Linearization of a regression model.
y=f(x;a,b)
X=gx (x) a Y =gy (y) Transformed model
Hence, putting B= ln b and A=a, renders the relation Y =AX + B, where Y = ln y and X= ln x. This is exactly the statement of the second
transformation row in Tab. 7.1.
All the linearizing procedure of a regression model can be summarized as follows:
A1. Draw a line connecting points (xi , yi ).
B1. Choose a suitable relation y=f(x; a, b) and transform the data so that Y = AX + B.
C1. Calculate the correlation coefficient r for the transformed values Xi and Yi and test its significance.
D1. Having a statistically significant r, determine the regression coefficients A and B by the linear regression (they also determine the original coefficients
a, b).
E1. Possibly, make a test or find a confidence interval (using the relations (7.10) to (7.12)) which can help in making inferences about the searched
relation.
Physical relations often contain exponential, logarithmic or power functions which are sometimes hardly distinguishable. Nevertheless, a graph with
changed scales of axes can help in deducing the suitable relation. If the scale of an axis is changed to logarithmic, one of the mentioned functions
converts to linear, graphically to a line as we can observe in Fig. 7.4. The transformation of the axis correspond to the transformation relations in Tab.
7.1.
Y = AX + B
(a)
Y = AX + B
Y = AX + B
X = ln x
x
Y =y
Y = ln y
y
Y = ln y
y = b xa
y = b eax
y = a ln x + b
X=x
(b)
(c)
X = ln x
(d)
Figure 7.4: Linearization an a logarithmic scale: (a) no transformation, (b) power relation, (c) exponential relation, (d) logarithmic relation.
Example 7.7 Consider once more the pressure test in clay from Example 7.5. Find an appropriate nonlinear regression model to describe relation
between stress τ and the deformation ∆l and make the regression test to determine its significance. Calculate the estimators for the regression coefficients
in the proposed model and determine the confidence intervals at the level of confidence γ=0.99.
Solution. The linear regression model in Example 7.5 has not provided really satisfactory results, thus a more appropriate model can by chosen, the best
way is to draw a a graph. We plot the measured values and two possible type of curves which can approximate the relation: a power function τ =b (∆l)a ,
0<a<1 and a logarithmic function τ =a ln ∆l + b, a>0, Fig. 7.5. It seems that the logarithmic function better describes the relation between τ and
τ [MPa]
0.8
0.6
0.4
measured
b ∆la
a ln ∆l + b
0.2
0
0
0.05
0.1
0.15
∆l [mm]
0.2
Figure 7.5: Measured data and estimated types of relation.
∆l, but before the final inference we make the correlation test. In both nonlinear models (e.g. in he power ln τ = ln b + a ln ∆l) we need logarithms of
measured data:
−5.8091
−2.0557
−2.5903
−0.4764
λ= ln ∆l
η= ln τ
λ= ln ∆l
η= ln τ
−5.1160
−1.4653
−2.3026
−0.4262
−4.6052
−1.1648
−2.0794
−0.3740
−4.1997
−0.9755
−1.8971
−0.3299
−3.6889
−0.8233
−1.7430
−0.2877
−2.9957
−0.6125
−1.6094
−0.2653
In the power model, there is a linear relation between λ and η, in the logarithmic one between λ and τ . We calculate the respective sample correlation
coefficients (n=12)
λ̄ = −3.2197,
12
X
λ2i
= 146.9097,
i=1
η̄ = −0.7714,
12
X
ηi2
= 10.5448,
τ̄ = 0.5189,
12
X
i=1
12
X
i=1
12
X
i=1
i=1
λi ηi = 38.3184,
τi2 = 3.7488,
λi τi = −16.6371,
P12
rλτ = s
12
P
λi τi − 12λ̄τ̄
−16.6371 − 12 · (−3.2197) · 0.5189
12
= p(146.9097 − 12(−3.2197)2 )(3.7488 − 12 · 0.51892 )
P 2
λ2i − 12λ̄2
τi − 12τ̄ 2
i=1
i=1
i=1
= 0.9996,
P12
rλη = s
12
P
i=1
λi ηi − 12λ̄η̄
38.3184 − 12 · (−3.2197) · (−0.7714)
12
= p(146.9097 − 12(−3.2197)2 )(10.5448 − 12(−0.7714)2 )
P 2
λ2i − 12λ̄2
ηi − 12η̄ 2
i=1
i=1
= 0.9727.
√
r
The test of significance of r includes the hypotheses H0 : ρ=0, H1 : ρ6=0 and the test statistic T = √n−2
which respectively gets the values
1−r2
Tλτ = 113.2308,
Tλη = 13.2435.
In both cases, the hypothesis H0 is rejected at the level of significance α=0.01, because t1−0.005 (12−2) = 3.1692. The level of significance can even be
α=10−6 and the hypothesis H0 is still rejected as t1−0.5×10−6 (12−2) = 10.5165. For α=10−7 , the hypothesis H0 cannot be rejected in the power model
due to t1−0.5×10−7 (12−2) = 13.4395. Thus, the logarithmic model is more significant.
We determine the coefficients for the logarithmic regression model which converts to τ =aλ+b. The normal system (7.5) and its solution for the least
square estimators â and b̂ read
â = 0.1516,
12.2425â−3.2197b̂ =−1.3864,
⇒
b̂ = 1.0070,
−3.2197â+
b̂ = 0.5189,
so that we can write, see Fig. 7.6,
.
τ = 0.1516 ln ∆l + 1.0070.
τ [MPa]
0.8
0.6
0.4
measured
0,1516 ln ∆l + 1,0096
0.2
0
0
0.05
0.1
0.15
∆l [mm]
0.2
Figure 7.6: Approximation of measured data by a logarithmic function.
The confidence intervals for the coefficients a, b are
t0.995 (10)
t0.995 (10)
<a < â 1 +
,
â 1 −
Tλτ
Tλτ
3.1692
3.1692
0.1516(1 −
) <a < 0.1516(1 +
),
113.230
113.230
0.1473 <a < 0.1558,
v
v
u 12
u 12
uP 2
uP 2
u λi
u λ
t0.995 (10) t i=1
t0.995 (10) t i=1 i
b̂ − â
<b < b̂ + â
,
Tλτ
12
Tλτ
12
3.1692 √
3.1692 √
1.0070 − 0.1516
12.2425 <b < 1.0070 + 0.1516
12.2425,
113.230
113.230
0.9921 <b < 1.0218.
The relatively narrow confidence intervals prove the suitability and effectiveness of the logarithmic regression model.
Let us note that from the physical point of view, there is a problem close to zero where the logarithm is unbounded. In the vicinity of zero we should
look for a different model, usually for small deformations a linear model is appropriate, cf. Example 7.5, but we need more data to verify it statistically.
1. Can you explain why the sample correlation coefficient lies in the range of the interval h−1;1i?
2. Do you prove the principal relation of linear regression (7.7)?
3. Which test would you use for the ratio between SM and SE?
4. Can you deduce the confidence intervals for the regression coefficients in Eq. (7.10)? Write also relations for one sided confidence intervals.
5. How do you derive the test statistics for the regression coefficients in Eq. (7.12)?
6. Why is the transformation relation (7.13) required to depend on two parameters?
7. Prove all the transformations in Tab. 7.1.
8. Try to find also other linearizing formulae for nonlinear relations other than those in Tab. 7.1.
In the problems 1. to 3., make the test of correlation and find out the required characteristics using linear regression..
1. The examiner R.V. wanted to compare students’ effort to achieve good study results. Therefore, he chose randomly eighteen students and
compared their results in the in-course and the final exams, denoted respectively IC and FE, see the table. Based on the data, he suggested model
dependence. Find out, whether there exists a significant linear relationship between students’ results at the significance level α=0.05. If it exists, as
expected there, find the coefficients of the linear regression model and propose a test, the conclusion of which will provide information on the slope of
the regression line. Assess the test result in the sense of the first sentence of this paragraph.
IC
FE
16 16 17 17 17 17 18 18 19 22 23 23 24 25 25 26 27 29
35 38 34 35 35 40 33 35 38 40 41 43 48 49 52 51 55 58
.
2. The management of a water park planned construction of additional swimming pools and prepared a survey of dependence between the number
of visitors and so far constructed swimming pools. The table shows the average weekly visit rate V in thousands when operating with P constructed
swimming pools. Determine whether there is a statistically significant linear relationship between the visit rate and the number of pools at the
significance level α=0.05. If so, find the two-sided confidence intervals for regression coefficients at the confidence level γ=1−α. Can you then, based
on the confidence intervals, give statistically relevant information about the expected visit rate after construction of two additional swimming pools?
V
P
2
8
3
10
5
15
7
22
8
25
.
3. We experimentally investigated the dependence of cooling rates of ductile cast iron during the phase of heat treatment. We recorded data on
the change of temperature of the cast iron ∆T and the number ω of carbon particles in the cast iron, the measured values are shown in the table.
Determine whether there is a statistically significant linear relationship between the temperature change and the number of particles at the significance
level α=0.05. If so, find the two-sided confidence intervals for regression coefficients at the confidence level γ=1−α.
ω [mm−3 ]
∆T [K]
345
45
238
20
345
65
311
30
338
55
304
30
276
45
338
65
254
30
276
45
344
55
.
In the problems 4. to 8., check out the significance of the proposed regression model for the dependencies between the variables, optionally, if not
specified, estimate the type of the dependency between the measured values before. Follow the instructions in the individual problems and accordingly
perform further statistical analysis..
4. We measured the values of normal stress τ near the top of a V-punch depending on the distance X from it, the values are in the table. Set
the significance level α=0.05 to verify whether there exists a relationship of the type ’τ =a ln X+b’ between the stress and the position. If there is a
statistically significant dependence of this type, find the estimates of the parameters and determine for them the confidence intervals at the confidence
level γ=1−α. Try also other types of dependency and decide which one is the most appropriate regression model to describe the stress distribution.
X [cm]
τ [MPa]
0.08 0.19 0.37 0.63 1.02 1.61 2.50 3.82 5.83 8.80 13.28
10.29 9.25 8.29 7.53 6.83 6.18 5.54 4.92 4.32 3.70 3.08
.
5. We measured the values of shear stress close to an interface in a structural element depending on the distance from a reference point, the
measured values are in the table. Determine whether there is a relationship of the type ’τ = beaX between the stress and the position. If there exists a
statistically significant dependence of this type, find the estimates for the regression coefficients and determine the confidence interval for the parameter
a at the confidence level γ=1−α.
X [cm]
τ [MPa]
1.5 5
0.1 1.6
11
2.2
16
2.6
22
3.0
27
3.1
32
3.2
38
3.2
.
6. Pressing a rectangular block against flat foundation, there arises normal stress τ which should depend on the distance X and the linear theory
of elasticity provides this relation in the form ’τ =b X a . The theory also provides the parameter a which is equal to a=−0.226, if the materials of the
block and the foundation are the same. A numerical solution by the boundary elements method provided the results summarized in the table. Determine
whether there is a significant dependency of that type for numerical data at the significance level α=0.01. If so, propose a test to see if the value of the
parameter a estimated from the numerical solution contradicts its theoretical value at the significance level α.
X [mm]
τ [MPa]
0.01
429.8
0.02
419.6
0.04
349.3
0.08
300.4
0.16
256.6
0.32
219.6
0.63
188.1
1.25
161.3
2.5
138.7
5.0
119.8
.
7. The website of a computer vendor shows the evolution of prices of a new model of a computer at weekly intervals, see the table. Determine
whether it is possible to claim that the price decreases exponentially after the product placement on the market at the significance level α=0.05. If so,
find the two-sided confidence intervals for the coefficients of the exponential function at the confidence level γ=1−α. Try also other types of dependency
and decide which one is the most appropriate to describe the price evolution.
Week no.
Price [e]
1
610
2
610
3
590
4
575
5
560
6
560
7
530
8
530
9
520
10
500
11 12
500 500
.
8. We searched for a time dependence of the temperature in the heat transfer through an interface and we obtained experimental data written in the
table. Determine whether there exists a statistically significant exponential dependence of temperature change ∆T on time t at the significance level
α=0.05. If so, find the two-sided confidence intervals for the regression coefficients at the confidence level γ=1−α.
t [h]
0.250 0.375
∆T [K] 0.001 0.019
0.5 0.625 0.750 0.875 1.000
0.146 0.511 1.191 2.203 3.519
.
Conclusion
We showed how to make inferences about simple relations between various quantities by means of mathematical statistics. First, we found a characteristic
which determines a level of linear dependency between the quantities. It is the correlation coefficient. Confirming statistical significance of data correlation
we learned to quantify the relation by the parameters and their estimation by linear regression. As practical problems leads also to other type of functional
dependencies of measured data, we demonstrated in the last section of the chapter regression analysis for simple nonlinear functions. We described a
way for linearization of a nonlinear function so that all dependencies which can be transformed to a linear one, we can treat by correlation and linear
regression.
References
Problem solutions
1. r=0.9346, T =10.8356, a=2.6018, b=6.9208
2. r=0.998, T =26.568, a=3.077, b=1.215
4. r=0.9997, a= − 1.425, b=6.828
5. r=0.673, T =2.226, no regression
7. r=0.985, a= − 0.0205, b=625.3
8. r=0.952, a=10.315, b=0.0003
3. r=0.772, T =3.639, a=0.289, b= − 44.532
6. r= − 0.998, T =48.661, a= − 0.216, b=171.1

numerická matematika a matematická štatistika (numerical

Transcription

Similar documents

LRTrek - ActFX Algorithmic Trading

Monster Jobs - by Carpe Data

EM27/SUN

Afni Background Check Process Guide

NTREIS Internet Explorer Settings

1. Go to ezlm.adp.com and click the

Math English D 041-050

Thrust Force Analysis of a Rotating Ionocraft Under

AroundAboutTown.com Floorplan

1030 TABBER