Polynomial roots and approximate greatest common divisors by Joab Winkler Lecture notes for a Summer School at The Computer Laboratory The University of Oxford Oxford England 17-21 September 2007 Supported by the Engineering and Physical Sciences Research Council c Joab Winkler, 2007 Copyright Introduction The determination of the roots of a polynomial is a classical problem in mathematics that has played an important part in its development through the centuries. It has motivated the introduction of many important concepts of mathematics, including irrational, negative and complex numbers. Although the study of polynomial equations does not presently play a leading role in general computational mathematics, it forms an important part of computer algebra, which is applied widely in algebraic geometry computations. There exist many algorithms for the computation of the roots of a polynomial, but the results deteriorate as the degree of the polynomial increases, or the multiplicity of one or more of its roots increases, or the roots become more closely spaced. Furthermore, even roundoff errors are sufficient to cause totally incorrect results, and thus the problems are compounded when the coefficients are subject to uncertainty. These notes describe a new method for solving polynomial equations that has been shown to be significantly better, particularly for ‘difficult polynomials’ in the presence of noise, than results obtained by the standard methods, such as Newton’s method. The new method uses resultants and subresultants, structured matrices, constrained optimisation, information theory and the method of non-linear least squares. A very important part of this algorithm is the determination of an approximate greatest i common divisor of two inexact polynomials, and this topic is covered in detail. The theory is presented for the power (monomial) and Bernstein polynomial bases because the power basis is the most frequently used polynomial basis, and the Bernstein basis is used extensively in geometric modelling for the representation of curves and surfaces. These notes form the course material for a series of lectures on the computation of the roots of a polynomial and approximate greatest common divisors that were given at the Computer Laboratory, The University of Oxford, Oxford, England in September 2007. These lectures were given in response to the program call Maths for Engineers by the Engineering and Physical Sciences Research Council. Joab Winkler The University of Sheffield Sheffield, United Kingdom September 2007 ii Acknowledgements The author wishes to thank the Engineering and Physical Sciences Research Council for its financial support for the lecture course. He also wishes to thank John Allan, his PhD student, who developed the MATLAB computer code for all the examples in these notes. The material in this document forms a major part of his PhD thesis. iii Abbreviations and notation GCD ... greatest common divisor LS ... least squares LSE ... least squares with equality MDL ... minimum description length ML ... maximum likelihood STLN ... structured total least norm H(p(x)) . . . Shannon entropy of the discrete probability distribution p(x) L(x) ... code length of the symbols stored in x S(f, g) ... Sylvester resultant matrix for the power basis polynomials f (x) and g(x) Sk (f, g) ... Sylvester subresultant matrix of order k for the power basis polynomials f (x) and g(x) S(p, q) ... Sylvester resultant matrix for the Bernstein basis polynomials p(x) and q(x) Sk (p, q) ... 
Sylvester subresultant matrix of order k for the Bernstein basis polynomials p(x) and q(x) iv T (p, q) ... Sylvester resultant matrix for the scaled Bernstein basis polynomials p(x) and q(x) Tk (p, q) . . . Sylvester subresultant matrix of order k for the scaled Bernstein basis polynomials p(x) and q(x) ηc (x̃0 ) ... componentwise backward error of the approximate root x̃0 ηn (x̃0 ) ... normwise backward error of the approximate root x̃0 κc (x0 ) ... componentwise condition number of the root x0 κn (x0 ) ... normwise condition number of the root x0 ρ(x0 ) ... condition number of the root x0 that preserves its multiplicity h(k) (x) ... kth derivative of the polynomial h(x) h ... vector of coefficients of the polynomial h(x) log x ... log2 x k·k ... k·k2 v Contents Introduction i Acknowledgements iii Abbreviations and notation iv Contents vi List of Figures viii 1 Introduction 1.1 Historical background . . . . . . . . . . . 1.2 A review of some methods for computing 1.3 Examples of errors . . . . . . . . . . . . 1.4 Summary . . . . . . . . . . . . . . . . . . . the . . . . 2 Condition numbers and errors 2.1 Backward errors and the forward error . . . 2.2 Condition numbers . . . . . . . . . . . . . . 2.3 Condition numbers, backward errors and the 2.4 The geometry of ill-conditioned polynomials 2.5 A simple polynomial root finder . . . . . . . 2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . roots of a polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . forward error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 2 5 9 . . . . . . 10 11 18 21 25 34 40 3 The Sylvester resultant matrix 3.1 The Sylvester resultant matrix for power basis polynomials . . . . . . 3.1.1 Subresultants of the Sylvester matrix for power basis polynomials 3.2 The Sylvester resultant and subresultant matrices for Bernstein basis polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi 42 43 45 53 60 4 Approximate greatest common divisors 4.1 Previous work . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 The non-uniqueness of the Sylvester resultant matrix . . . 4.3 Structured matrices and constrained minimisation . . . . . 4.3.1 Algorithms for the solution of the LSE problem . . 4.3.2 Computational details . . . . . . . . . . . . . . . . 4.4 Approximate GCDs of Bernstein basis polynomials . . . . 4.5 An approximate GCD of a polynomial and its derivative . 4.6 GCD computations by partial singular value decomposition 4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 63 64 66 71 76 90 100 108 124 5 A robust polynomial root finder 5.1 GCD computations and the multiplicities of the roots . . . . . . . . . 5.2 Non-linear least squares for Bernstein polynomials . . . . . . . . . . . 5.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 127 128 132 6 Minimum description length 6.1 Minimum description length . . . . . . . . . . . . . . . . . 6.2 Shannon entropy and the length of a code . . . . . . . . . 6.3 An expression for the code length of a model . . . . . . . . 6.3.1 The precision that minimises the total code length . 6.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 134 138 146 147 153 162 Bibliography . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 vii List of Figures 1.1 1.2 2.1 2.2 4.1 4.2 4.3 4.4 The computed roots of (x − 1)100 . . . . . . . . . . . . . . . . . . . . . Perturbation regions of f (x) = (x − 1)m , m = 1, 6, 11, . . . , when the constant term is perturbed by 2−10 . . . . . . . . . . . . . . . . . . . . Backward and forward errors for y = g(x). The solid lines represent exact computations, and the dashed line represents the computed approximation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The root distribution of four polynomials after the coefficients have been perturbed: (a) f (x) = (x−0.6)3 (x−1)5 , (b) f (x) = (x−0.7)3 (x− 1)5 , (c) f (x) = (x − 0.8)3 (x − 1)5 , (d) f (x) = (x − 0.9)3 (x − 1)5 . . . . (i)(a) The maximum allowable value of kzf1 k, which is equal to kf1 (x)k /µ, (b) the computed value of kzf1 k; (ii)(a) the maximum allowable value of kzg1 k/α, which is equal to kg1 (x)k /µ, (b) the computed value of kzg1 k/α; (iii) the normalised residual krnorm k; (iv) the singular value ratio σ54 /σ55 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The normalised singular values of the Sylvester matrix, on a logarithmic scale, for (i) the theoretically exact data S(fˆ1 , ĝ1 ), ♦; (ii) the given inexact data S(f1 , g1 ), ; (iii) the computed data S(f˜1,0 , g̃1,0), ×, for α = 10−0.6. All the polynomials are normalised by the geometric mean of their coefficients. . . . . . . . . . . . . . . . . . . . . . . . . . . . . The normalised singular values of the Sylvester matrix, on a logarithmic scale, for (i) the theoretically exact data S(fˆ1 , ĝ1 ), ♦; (ii) the given inexact data S(f1 , g1 ), ; (iii) the computed data S(f˜1,0 , g̃1,0), ×, for α = 101.4 . All the polynomials are normalised by the geometric mean of their coefficients. . . . . . . . . . . . . . . . . . . . . . . . . . . . . (i)(a) The maximum allowable value of kzf1 k, which is equal to kf1 (x)k /µ, (b) the computed value of kzf1 k; (ii)(a) the maximum allowable value of kzg1 k/α, which is equal to kg1 (x)k /µ, (b) the computed value of kzg1 k/α; (iii) the normalised residual krnorm k; (iv) the singular value ratio σ51 /σ52 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii 6 8 13 24 84 85 88 89 4.5 4.6 4.7 6.1 The normalised singular values of the Sylvester matrix, on a logarithmic scale, for (i) the theoretically exact data S(fˆ2 , ĝ2 ), ♦; (ii) the given inexact data S(f2 , g2 ), ; (iii) the computed data S(f˜2,0 , g̃2,0), ×, for α = 100.1 . All the polynomials are normalised by the geometric mean of their coefficients. . . . . . . . . . . . . . . . . . . . . . . . . . . . . The variation with α of (i)(a) The maximum allowable value of kzp k, which is equal to kp(x)k /µ, (b) the computed value of kzp k; (ii)(a) the maximum allowable value of kzq k/α, which is equal to kq(x)k /µ, (b) the computed value of kzq k/α; (iii) the normalised residual krnorm k; (iv) the singular value ratio σ40 /σ41 . The horizontal and vertical axes are logarithmic in the four plots. . . . . . . . . . . . . . . . . . . . . . The normalised singular values, on a logarithmic scale, of the Sylvester resultant matrix for (i) the theoretically exact polynomials p̂(x) and q̂(x), ♦; (ii) the given inexact polynomials p(x) and q(x), ; (iii) the corrected polynomials p̃(x) and q̃(x) for α = 102.8 , ×. All the polynomials are scaled by the geometric mean of their coefficients. . . . . . . 
90 97 99 (a) A third order approximation polynomial, (b) a fifth order interpolation curve, and (c) a linear approximation, of a set of data points. . 136 ix Chapter 1 Introduction This chapter contains a brief historical review of the determination of the roots of a polynomial, which is one of the classical problems in applied mathematics. Some commonly used numerical methods for computing the roots of a polynomial are considered, and their advantages and disadvantages are stated. Several simple examples are included in order to show that roundoff errors due to floating point arithmetic, and errors in the polynomial coefficients, can cause a significant deterioration in the computed roots. These examples provide the motivation for the subsequent chapters, in which a robust method, that solves the problems that are highlighted in these examples, is described. 1.1 Historical background The computation of the roots of a polynomial f (x) m X f (x) = ai φi (x), i=0 1 (1.1) CHAPTER 1. INTRODUCTION 2 where φi (x), i = 0, . . . , m, is a set linearly independent basis functions, and ai is the coefficient, assumed to be real, of φi (x), has a rich and long history. An excellent historical review, and the branches of mathematics that have been motivated by solving (1.1), are discussed by Pan [33]. In particular, irrational and complex numbers, algebraic groups, fields and ideals are closely related to this problem, and in more recent times, the development of accurate and stable numerical methods for the solution of (1.1) has been a major area of research. The solution of (1.1) continues to be a major area of research, motivated mainly by problems in algebraic geometry and geometric modelling, because it arises in the calculation of the intersection points of curves and surfaces that are defined by polynomials. Closed form solutions that only involve arithmetic operations and radicals have been obtained for polynomials of degree up to and including four, and Galois (18111832) proved that a closed form solution of (1.1) cannot exist for polynomials of degree greater than four. In spite of this, the fundamental theorem of algebra states that (1.1) always has a complex solution for all positive integers m. This absence of a closed form solution motivated the introduction of iterative and numerical algorithms, and some of them are reviewed in the next section. 1.2 A review of some methods for computing the roots of a polynomial Many numerical methods, using linear algebra, linear programming and Fourier analysis, have been developed for the solution of (1.1). These methods include Bairstow’s method [11], Graeffe’s root-squaring method [11], the algorithm of Jenkins and Traub CHAPTER 1. INTRODUCTION 3 [21, 22], Laguerre’s method [10, 15], Müller’s method [11], and variants of Newton’s method [31]: (a) Bairstow’s method is only valid for polynomials that have real coefficients. Since complex roots for these polynomials occur in complex conjugate pairs, and Bairstow’s method computes the real quadratic factors that generate these complex conjugate roots, complex arithmetic is avoided in this method. Convergence requires a very good initial approximation of the exact root, in which case convergence is quadratic, but the method converges slowly for quadratic factors whose multiplicity is greater than one. (b) Graeffe’s root-squaring method replaces the given polynomial by another polynomial whose roots are the squares of the original polynomial. 
The roots of the transformed polynomial are therefore more widely separated than those of the original polynomial, particularly for roots whose magnitude is greater than one. This process of squaring the polynomial is repeated until the roots can be calculated directly from the coefficients, such that the approximation to the root has been calculated to a sufficient number of decimal places. The sign of each root is then easily computed from the original polynomial. It is clear that this method fails when there are roots of equal magnitude, but this problem can be overcome [35]. (c) The Jenkins-Traub algorithm involves three stages and is only valid for polynomials with real coefficients. The roots of the polynomial are computed one at a time, and roots of multiplicity m are found m times. They are computed in approximately increasing order of magnitude in order to avoid instability that CHAPTER 1. INTRODUCTION 4 arises when deflating with a large root [43]. It is fast and globally convergent for all distributions of the roots, and real arithmetic is used, from which it follows that complex conjugate roots are computed as quadratic factors. (d) Laguerre’s method is almost always guaranteed to converge to a root of the polynomial, for all values of the initial estimate. Empirical evidence shows that it performs well, which makes it a good choice for a general purpose polynomial root solver. Each iteration of Laguerre’s method requires that the first and second derivatives be evaluated at the estimated root, which makes the method computationally expensive. The method has cubic convergence for simple roots, and linear convergence for multiple roots. (e) Müller’s method is based on the approximation of the polynomial in the neighbourhood of the root by a quadratic function, which is better than the linear approximation used by Newton’s method. An estimate of the root is calculated from this quadratic function, and the process is repeated by updating the points at each stage of the iteration. Experience shows that Müller’s method converges at a rate that is similar to Newton’s method, but unlike Newton’s method, it does not require the evaluation of derivatives. The method may converge to a complex root from a real initial approximation. (f) Newton’s method is an iterative procedure based on a Taylor series of the polynomial about the approximate root. Convergence requires that the estimate be sufficiently near the exact root, and problems can occur at or near a multiple root if precautions are not taken. After a root is computed, it is removed from the polynomial by division, in order that repetition of the iterative scheme does CHAPTER 1. INTRODUCTION 5 not cause convergence to the same root. This removal process of a computed root is called deflation, and polynomials of reduced degrees are obtained as successive roots are deflated. These polynomials are subject not only to roundoff errors, but also to the errors that occur when an approximate root is deflated out, and this has a detrimental effect on the accuracy of the computed roots. The accuracy of the roots can be improved by polishing them against the original polynomial after all the roots have been deflated [43]. These methods yield satisfactory results on the ‘average polynomial’, that is, a polynomial of moderate degree with simple and well-separated roots, assuming that a good starting point in the iterative scheme is used. 
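The deflation strategy described in (f) can be written down in a few lines. The following MATLAB sketch is a minimal illustration only: the example polynomial, the crude starting point and the fixed iteration count are illustrative choices, not part of any production root finder.

    p = poly([2 3 5]);                        % p(x) = (x - 2)(x - 3)(x - 5)
    rts = zeros(3, 1);
    for k = 1:3
        x = 0;                                % crude initial estimate
        for it = 1:50                         % Newton step x <- x - p(x)/p'(x)
            x = x - polyval(p, x)/polyval(polyder(p), x);
        end
        rts(k) = x;
        p = deconv(p, [1 -x]);                % deflate the computed root
    end
    rts                                       % approximately 2, 3, 5

The errors made when each approximate root is deflated out accumulate in the reduced polynomials, which is why, as noted in (f), the computed roots are usually polished against the original polynomial.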
This heuristic for the ‘average polynomial’ has exceptions, the best example of which is the Wilkinson polynomial, 20 Y f (x) = (x − i) = (x − 1)(x − 2) · · · (x − 20), (1.2) i=1 because its roots are very difficult to compute reliably. More generally, as the degree of the polynomial and/or the multiplicity of a root increases, the quality of the results obtained by standard numerical methods deteriorates, such that they cannot be used for these classes of polynomials. The next section contains some examples that illustrate the problems of computing the roots of polynomials. 1.3 Examples of errors This section contains several examples that show the difficulties of accurately computing the roots of a polynomial that has multiple roots. Examples 1.1 and 1.2 show that roundoff errors can cause a significant deterioration in the computed roots, and 6 CHAPTER 1. INTRODUCTION Example 1.3 shows the effect of a perturbation in a coefficient of a polynomial of high degree. Example 1.1. Consider the fourth order polynomial x4 − 4x3 + 6x2 − 4x + 1 = (x − 1)4 , whose root is x = 1 with multiplicity 4. The roots function in MATLAB returns the roots1 1.0002, 1.0000 + 0.0002i, 1.0000 - 0.0002i, 0.9998, which shows that roundoff errors due to floating point arithmetic, which are about O(10−16 ), are sufficient to cause a relative error in the solution of 2 × 10−4 . Example 1.2. The roots of the polynomial (x − 1)100 were computed by the roots function in MATLAB, and the results are shown in Figure 1.1. 4 3 2 Imag 1 0 −1 −2 −3 −4 0 1 2 3 Real 4 5 6 Figure 1.1: The computed roots of (x − 1)100 . It is seen that the multiple root has split up into 100 distinct roots, such that it 1 This function uses the QR algorithm, which is numerically stable [17], to compute the eigenvalues of the companion matrix. 7 CHAPTER 1. INTRODUCTION cannot be deduced that the original polynomial contains a root of multiplicity 100 at x = 1. Example 1.3. Consider the effect of perturbing the constant coefficient of the polynomial (x − 1)10 by −ǫ. The roots of the perturbed polynomial are the solutions of (x − 1)10 − ǫ = 0, that is, 1 x = 1 + ǫ 10 2πk 2πk = 1+ǫ cos + i sin , 10 10 2πk 2πk 1 = 1+ cos + i sin , 2 10 10 1 10 k = 0, . . . , 9 k = 0, . . . , 9, if ǫ = 2−10 , and thus the perturbed roots lie on a circle in the complex plane, with centre at (1, 0) and radius 1/2. It follows that a relative error in the constant coefficient of 2−10 = 1/1024 causes a relative error of 1/2 in the solution, which corresponds to a condition number of 29 = 512. If the more general equation (x−1)m = 0 is considered, and the constant coefficient is perturbed by −2−10 , then the general solution is 1 2πk 2πk x = 1 + 10 cos + i sin , k = 0, . . . , m − 1. m m 2m These roots are shown in Figure 1.2, and it is seen that as m → ∞, they lie on a circle of unit radius and centred at (1, 0) in the complex plane. The results in these examples are in accord with the remarks of Goedecker [12], who performed comparative numerical tests on the Jenkins-Traub algorithm, a modified version of Laguerre’s algorithm, and the application of the QR decomposition to the companion matrix of the polynomial. Goedecker notes on page 1062 that None of the methods gives acceptable results for polynomials of degrees higher than 50, and he notes on page 1063 that If roots of high multiplicity exist, any . . . method has to 8 CHAPTER 1. 
INTRODUCTION 1 0.8 0.6 0.4 0.2 0 −0.2 −0.4 −0.6 −0.8 −1 0 0.5 1 1.5 Figure 1.2: Perturbation regions of f (x) = (x − 1)m , constant term is perturbed by 2−10 . 2 m = 1, 6, 11, . . . , when the be used with caution. It is seen that the error in the computed roots in Examples 1.1 and 1.2, and Example 1.3, arises from roundoff errors due to floating point arithmetic and inexact data, respectively. Roundoff errors are always present in numerical computations, and their effect on the solution may be benign or significant. Uncertainties in the data (experimental errors) are present in most practical examples, and it is therefore necessary to quantify the effect of this source of error, and roundoff errors, when considering the quality of a computed solution. These two sources of error have fundamentally different properties. In particular, it is usually assumed that data errors have a Gaussian distribution, and that the errors in the coefficients of the polynomial are independent. By contrast, roundoff errors are not random, they are often correlated, and they often behave like discrete, rather than continuous, random CHAPTER 1. INTRODUCTION 9 variables [24]. 1.4 Summary This chapter considered briefly the historical background of the problem of computing the roots of a polynomial, and some of the areas of mathematics that have developed as a consequence. Several commonly used methods for the solution of polynomial equations were considered, and examples that illustrate some of the difficulties that arise were presented. These examples were included in order to motivate the difficulty of the problem, and the next chapter will consider these problems in more detail, such that a better understanding of their origins and potential solutions is obtained. Chapter 2 Condition numbers and errors It was shown in Chapter 1 that the determination of a multiple root of a polynomial in the presence of errors, including roundoff errors, is sufficient to cause an incorrect and unacceptable solution. In this chapter, the forward error, backward error and condition number of a root of a polynomial are defined, and it is shown that they quantify the results in Chapter 1. The relationship between these quantities for a root of arbitrary multiplicity is established, and it is shown that a multiple root is ill-conditioned and that its condition number increases as its multiplicity increases when the perturbations applied to the coefficients of the polynomial are random (unstructured). This ill-conditioning must be compared with the situation that occurs when a structured perturbation that preserves the multiplicities of the roots is applied to the coefficients of the polynomial, in which case a multiple root is well-conditioned, even if it is of high multiplicity. A very simple polynomial root finder is described in the last section of the chapter, and it will be apparent that it differs significantly from the root finders described in Chapter 1. This root finder contains, from a high level, the features of the root 10 CHAPTER 2. CONDITION NUMBERS AND ERRORS 11 finder that is considered in this report. The operations that are required for this root finder are considered and it is shown that their implementation in a floating point environment is not trivial because they are either ill-conditioned or ill-posed. Moreover, the data in many practical examples is inexact, and thus a practical root finder must be robust with respect to minor perturbations in the coefficients of the polynomial. 
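The break up of a multiple root described in Chapter 1 is easy to reproduce, and it is this behaviour that the error measures defined in this chapter quantify. The following MATLAB sketch is a minimal experiment only; the polynomial and the noise level εc = 10^-7 are those used later for Figure 2.2, and the uniform perturbation of each coefficient anticipates the componentwise error model (2.1).

    a  = poly([0.9*ones(3,1); ones(5,1)]);    % coefficients of (x - 0.9)^3 (x - 1)^5
    ec = 1e-7;                                % componentwise noise level
    da = ec*a.*(2*rand(size(a)) - 1);         % |da_i| <= ec |a_i|
    roots(a + da)                             % the two multiple roots split into clusters

Repeating this experiment many times produces the clouds of simple roots shown in Figure 2.2.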
It is assumed that the coefficients of the polynomial (1.1) are real, and that a polynomial of degree m can be identified with a vector space of dimension (m + 1). The elements of vectors that lie in this space are the coefficients of f (x) with respect to the basis functions φi (x), i = 0, . . . , m. The 2-norm of f (x) is equal to the 2-norm of the vector a of its coefficients, kf (x)k = kak = m X i=0 a2i ! 21 . This norm is used exclusively in this document, unless stated otherwise. 2.1 Backward errors and the forward error The error of a computed quantity must be calculated in order to assess its computational reliability. This error measure can be statistical, for example, an average case measure, or deterministic, for example, an upper bound. There exist many sources of errors in computation, and they may degrade the quality of the computed solution. Some common sources of errors include: • Modelling errors These errors arise when a simplified computational model of a complex physical process is used. This may occur, for example, when a low dimensional approximation of a high dimensional system is used. CHAPTER 2. CONDITION NUMBERS AND ERRORS 12 • Roundoff errors These errors necessarily occur in floating point arithmetic. • Data errors These errors are usually significantly larger than roundoff errors, and they are therefore important when computations are performed on experimental data. • Discretisation errors These errors occur when a continuum is replaced by a discrete approximation, as occurs, for example, in the finite element and finite difference methods. These are the main sources of errors, but other sources must also be considered sometimes. For example, data may be sparse or the sample data may not be representative of the population, in which case statistical analysis is required for the correct analysis of the results. The simplest error measure is the forward error, which is defined as the relative error |δx0 | / |x0 |, in the solution x0 , but this cannot always be computed because the exact solution x0 is not known. Another error measure, the backward error, is therefore used, and it is based on the observation that the computed solution, which is in error, is the theoretically exact solution of a neighbouring problem, that is, a problem that is ‘near’ the problem whose solution is desired. Thus the forward error is a measure of the distance between the exact solution and the computed solution, and the backward error is a measure of the distance between the problem whose solution is sought and the problem whose solution has been computed. The difference between these two errors is shown in Figure 2.1, which is reproduced from [17], for the function evaluation y = g(x). It is seen that the forward error is measured in the output (solution) space, and the backward error is measured in the input (data) space. 13 CHAPTER 2. CONDITION NUMBERS AND ERRORS Input space Output space x y = g(x) backward error forward error x + δx ŷ = g(x + δx) Figure 2.1: Backward and forward errors for y = g(x). The solid lines represent exact computations, and the dashed line represents the computed approximation. The forward and backward errors of the root of a polynomial are related by its condition number, and there are two types of backward error and condition number. Specifically, each of these quantities can be measured in the componentwise sense and the normwise wise, and they differ in the error model that is assigned to the coefficients of the polynomials. 
In particular, it is assumed in the componentwise error model that each coefficient ai is perturbed to ai + δai such that ai + δai ≤ ai (1 + rεc ), i = 0, . . . , m, where r is a uniformly distributed random variable in the range [−1, +1] and ε−1 c is the upper bound of componentwise signal-to-noise ratio. It follows that the componentwise error model is defined by |δai | ≤ εc |ai | , i = 0, . . . , m, (2.1) that is, δai is a uniformly distributed random variable in the range [−εc |ai | , +εc |ai |]. This is a very simple model of roundoff error, which as noted in [24], contains many assumptions that are not satisfied in practice. This simplicity of (2.1) does not, CHAPTER 2. CONDITION NUMBERS AND ERRORS 14 however, diminish its use because it allows a closed form expression for the condition number of the roots of a polynomial to be calculated using a simple probabilistic error model. The normwise error model is defined by kδak ≤ εn kak , (2.2) where ε−1 n is the upper bound of the normwise signal-to-noise ratio. Let µ ≥ 0 be a random variable that has a one-sided Gaussian distribution, and thus if P denotes probability, then r 2 1 µ2 P (0 < µ ≤ kδak) = exp − 2 dµ π εn 2εn 0 Z εn kak r 2 1 µ2 exp − 2 dµ ≤ π εn 2εn 0 2 Z kak r 2 ν = exp − dν π 2 0 Z kδak = P (0 < ν ≤ kak) , where µ = νεn and ν has a one-sided Gaussian distribution. This inequality provides a probabilistic interpretation of the error model (2.2). The definitions of the componentwise and normwise error models (2.1) and (2.2) respectively show that a componentwise quantity (backward error and condition number) is more refined than its normwise equivalent. The componentwise and normwise backward errors are defined in Definitions 2.1 and 2.2, and formulae for them are developed in Theorems 2.1 and 2.2. Expressions for the componentwise and normwise condition numbers are developed in Theorems 2.3 and 2.4. Definition 2.1. The componentwise backward error of the approximation x̃0 , which 15 CHAPTER 2. CONDITION NUMBERS AND ERRORS may be complex, of the root x0 of f (x) is defined as ( ) m X ηc (x̃0 ) = min εc : ãi φi (x̃0 ) = 0 and |δai | ≤ εc |ai | ; ã = a + δa . i=0 Theorem 2.1. The componentwise backward error of the approximation x̃0 , which may be complex, of the root x0 of f (x) is given by |f (x̃0 )| . i=0 |ai φi (x̃0 )| ηc (x̃0 ) = Pm (2.3) The perturbations in the coefficients that achieve this backward error are ! ak f (x̃0 ) ak φk (x̃0 ) , k = 0, . . . , m, δak = − Pm |ak φk (x̃0 )| i=0 |ai φi (x̃0 )| (2.4) Proof. By definition, ãi = ai + δai , i = 0, . . . , m, and thus m m m m X X X X ãi φi (x̃0 ) = ai φi (x̃0 ) + δai φi (x̃0 ) = f (x̃0 ) + δai φi (x̃0 ) . (2.5) where (·) denotes the complex conjugate of (·). i=0 i=0 i=0 i=0 By assumption, the term on the left hand side is equal to zero, and thus m m m X X δak X δai |ai φi (x̃0 )| . |f (x̃0 )| = δai φi (x̃0 ) ≤ ai |ai φi (x̃0 )| ≤ max k ak i=0 i=0 i=0 It follows that |f (x̃0 )| , i=0 |ai φi (x̃0 )| εc ≥ Pm and the result (2.3) is established. Consider now the perturbations in the coefficients that achieve this backward error. By definition, these perturbations must satisfy δak ηc (x̃0 ) = min εc : εc ≥ , k = 0, . . . , m , ak (2.6) 16 CHAPTER 2. CONDITION NUMBERS AND ERRORS and (2.5) implies that they must also satisfy m X f (x̃0 ) = − δai φi (x̃0 ) . (2.7) i=0 Consider the perturbations, from (2.6), |δak | = ηc (x̃0 ) |ak |, or equivalently, |ak | |f (x̃0 )| |δak | = Pm , i=0 |ai φi (x̃0 )| k = 0, . . . , m. 
(2.8) It must be verified that these perturbations also satisfy (2.7), and this is now established. It follows from (2.8) that δak is given by ak f (x̃0 ) δak = Pm hk (x̃0 ) , i=0 |ai φi (x̃0 )| |hk (x̃0 )| = 1, k = 0, . . . , m, (2.9) where each of the functions hk (x̃0 ) is of unit magnitude and to be determined. The substitution of this equation into the right hand side of (2.7) yields Pm ak f (x̃0 ) φk (x̃0 ) hk (x̃0 ) f (x̃0 ) = − k=0 Pm , i=0 |ai φi (x̃0 )| and this equation determines the functions hk (x̃0 ) , k = 0, . . . , m. It follows that these functions must satisfy and thus Pm a φ (x̃ ) h (x̃ ) k=0 Pmk k 0 k 0 = −1, i=0 |ai φi (x̃0 )| hk (x̃0 ) = − ak φk (x̃0 ) , |ak φk (x̃0 )| k = 0, . . . , m. Equation (2.9) establishes the result (2.4). Definition 2.2. The normwise backward error of the approximation x̃0 , which may be complex, of the root x0 of f (x) is defined as ( ) m X ηn (x̃0 ) = min εn : ãi φi (x̃0 ) = 0 and kδak ≤ εn kak ; ã = a + δa . i=0 17 CHAPTER 2. CONDITION NUMBERS AND ERRORS Theorem 2.2. The normwise backward error, measured in the 2-norm, of the approximation (x̃0 ), which may be complex, of the root x0 of f (x) = 0 is given by ηn (x̃0 ) = |f (x̃0 )| . kφ (x̃0 )k kak (2.10) The perturbations in the coefficients that achieve this backward error are δai = f (x̃0 ) vi kφ (x̃0 )k v (x̃0 )T φ (x̃0 ) = − kφ (x̃0 )k where and kv (x̃0 )k = 1. (2.11) Proof. The proof of this theorem follows closely that of Theorem 2.1. In particular, since the term on the left hand side of (2.5) is equal to zero, it follows that m X kδak kak kφ (x̃0 )k . |f (x̃0 )| = δai φi (x̃0 ) ≤ kak i=0 Thus |f (x̃0 )| kδak ≤ ≤ εn , kak kφ (x̃0 )k kak (2.12) and (2.10) follows. The perturbations in the coefficients that achieve this backward error must satisfy (2.7) and (2.12). Specifically, consider the perturbation vector whose norm is given by kδak = ηn (x̃0 ) kak = |f (x̃0 )| , kφ (x̃0 )k from which it follows that f (x̃0 ) vk (x̃0 ) δak = kφ (x̃0 )k where kv(x̃0 )k = m X k=0 |vk (x̃0 )|2 ! 21 = 1, and the functions vk (x̃0 ) are to be determined. The substitution of these expressions 18 CHAPTER 2. CONDITION NUMBERS AND ERRORS for the perturbations δak into (2.7) yields m X vk (x̃0 )φk (x̃0 ) = − kφ (x̃0 )k , k=0 and thus the functions vk (x̃0 ) must satisfy v (x̃0 )T φ (x̃0 ) = − kφ (x̃0 )k kv(x̃0 )k = 1. and This establishes the result (2.11). The next section considers the componentwise and normwise condition numbers of a root of a polynomial. 2.2 Condition numbers Expressions for the componentwise and normwise condition numbers of a root of a polynomial are derived in Theorems 2.3 and 2.4 respectively. Theorem 2.3. Let the coefficients ai of f (x) in (1.1) be perturbed to ai + δai where |δai | ≤ εc |ai | , i = 0 . . . m. Let the real root x0 of f (x) have multiplicity r, and let one of these r roots be perturbed to x0 + δx0 due to the perturbations in the coefficients. Then the componentwise condition number of x0 is κc (x0 ) = 1 1 |δx0 | 1 = 1− 1 |δai |≤εc |ai | |x0 | εc εc r |x0 | max r! (x0 )| |f (r) Proof. Consider the perturbed polynomial f (x + δx), m X f (x + δx) = ai φi (x + δx) m X i=0 |ai φi (x0 )| ! r1 i=0 = m X i=0 (ai + δai ) φi (x + δx) − m X i=0 δai φi (x + δx) . . (2.13) CHAPTER 2. CONDITION NUMBERS AND ERRORS 19 Since x0 + δx0 is a root of the perturbed polynomial, that is, m X (ai + δai ) φi (x0 + δx0 ) = 0, i=0 it follows that f (x0 + δx0 ) = − By Taylor’s theorem, m X δai φi (x0 + δx0 ) . 
i=0 m X δxk 0 f (x0 + δx0 ) = k=0 and φi (x0 + δx0 ) = m X δxk 0 k=0 and hence m X δxk 0 k=0 k! f (k) (x0 ) = − f (k) (x0 ) , k! m X i=0 k! δai (k) φi (x0 ) , m X δxk 0 k=0 k! (k) φi (x0 ) . Since x0 is an r-tuple root, f (k) (x0 ) = 0 for 0 ≤ k ≤ r − 1, and the perturbation δx0 is assumed to be small, it follows that m m X X δxk0 (k) r! r δai φ (x0 ) , δx0 = − (r) f (x0 ) i=0 k! i k=0 and hence 1 r! r |δx0 | = (r) f (x0 ) 1 m m X r X δxk0 (k) δai φi (x0 ) k! i=0 k=0 ! 1r m m X X δxk0 (k) r! ≤ |δa | φ (x ) i 0 k! i |f (r) (x0 )| i=0 k=0 ! r1 m X 1 r! ≤ εcr |ai φi (x0 )| , |f (r) (x0 )| i=0 (2.14) since δx0 is small and thus only the term corresponding to k = 0 is retained. It 20 CHAPTER 2. CONDITION NUMBERS AND ERRORS follows that |δx0 | 1 1 1 ≤ 1− 1 |x0 | εc εc r |x0 | and the result (2.13) follows. r! (x0 )| |f (r) m X i=0 |ai φi (x0 )| ! r1 , Theorem 2.4. Let the coefficients ai of f (x) in (1.1) be perturbed to ai + δai where kδak ≤ εn kak. Let the real root x0 of f (x) have multiplicity r, and let one of these r roots be perturbed to x0 + δx0 due to the perturbations in the coefficients. Then the normwise condition number of x0 is 1 1 |δx0 | 1 = 1− 1 κn (x0 ) = max kδak≤εn kak |x0 | εn εn r |x0 | Proof. It follows from (2.14) that 1 r! r |δx0 | = (r) f (x0 ) 1 r! r ≤ (r) f (x0 ) r! kak kφ (x0 )k (r) |f (x0 )| r1 . (2.15) m 1 m X r k X δx0 (k) δai φi (x0 ) k! 1 r! r = (r) f (x0 ) since δx0 is small, and thus i=0 m X i=0 m X i=0 k=0 ! 1r m X δxk0 (k) φ (x ) |δai | 0 k! i k=0 |δai φi (x0 )| ! r1 , r1 r! |δx0 | ≤ kδak kφ (x0 )k |f (r) (x0 )| r1 1 r! r ≤ εn kak kφ (x0 )k . |f (r) (x0 )| It follows that |δx0 | 1 1 1 ≤ 1− 1 |x0 | εn εn r |x0 | and the result (2.15) follows. r! kak kφ (x0 )k (r) |f (x0 )| r1 , The condition numbers (2.13) and (2.15) assume that the perturbations δai are CHAPTER 2. CONDITION NUMBERS AND ERRORS 21 sufficiently small, such that only lowest order terms need be considered. There exist circumstances when then this assumption is not satisfied, for example, two close but distinct roots, and this situation is considered in [46]. Furthermore, condition numbers are, by definition, worst case error measures, but it is also possible to derive expressions for average case componentwise and normwise error measures [44]. The basis in which a polynomial is expressed may have a significant effect on the condition numbers of its roots. For example, the componentwise condition numbers of the roots of the power and Bernstein basis forms of the Wilkinson polynomial (1.2), scaled so that all the roots lie in the interval [0, 1], 20 Y i , f (x) = x− 20 i=1 are shown in Tables 1a and 1b in [9], and it is seen that they are consistently lower, by several orders of magnitude, for the Bernstein basis form of f (x) than for the power basis form of f (x). The next section establishes the connection between the forward errors, backward errors and condition numbers of a root of a polynomial. 2.3 Condition numbers, backward errors and the forward error Formulae for the componentwise and normwise backward errors of a root x0 of multiplicity r of f (x) are stated in (2.3) and (2.10) respectively, and formulae for the componentwise and normwise condition numbers of x0 are stated in (2.13) and (2.15) respectively. It is shown in this section that these quantities are related in a simple formula to the forward error of x0 . 22 CHAPTER 2. CONDITION NUMBERS AND ERRORS It is readily verified that P 1 1 ηc (x̃0 ) r 1 r! 
|f (x̃0 )| m |ai φi (x0 )| r i=0 P , κc (x0 ) εc = εc |x0 | |f (r) (x0 )| m i=0 |ai φi (x̃0 )| and since x0 is a root of multiplicity r, it follows that (δx0 )r (r) (δx0 )r (r) f (x̃0 ) = f (x0 + δx0 ) ≈ f (x0 ) + f (x0 ) = f (x0 ), r! r! and thus κc (x0 ) ηc (x̃0 ) εc 1r |δx0 | εc = |x0 | Pm |ai φi (x0 )| Pm i=0 |ai φi (x0 + δx0 )| i=0 r1 ≈ |δx0 | , |x0 | to lowest order. It is readily verified that an identical expression is valid for the normwise condition number and backward error, and thus to lowest order, 1 |δx0 | ηc (x̃0 ) r = κc (x0 ) εc , |x0 | εc (2.16) and |δx0 | = κn (x0 ) |x0 | ηn (x̃0 ) εn 1r εn . (2.17) It follows that if r = 1, that is, x0 is a simple root, its forward error is equal to the product of its condition number and the backward error of its approximation x̃0 . Furthermore, if x0 is ill-conditioned and simple, it has a large forward error even if its backward error is small, that is, with reference to Figure 2.1, a small error in the input space leads to a large error in the output space. If r is sufficiently large, then (2.16) and (2.17) reduce to |δx0 | ≈ κc (x0 )εc |x0 | and |δx0 | ≈ κn (x0 )εn , |x0 | which are the conditions for which (2.13) and (2.15) hold with equality, that is, κc (x0 ) and κn (x0 ) attain their maximum values as r increases. The multiplicity of x0 enters the expressions for ηc (x̃0 ) and ηn (x̃0 ) indirectly by the constraints that a multiple root places on the coefficients ai . This must be compared CHAPTER 2. CONDITION NUMBERS AND ERRORS 23 with the expressions for κc (x0 ) and κn (x0 ), for which r is both an implicit and explicit argument, which reveals the advantage of the backward errors. It is well known that finite precision arithmetic causes a multiple root to be computed as a cluster of closely spaced roots, from which it follows that a multiple root is a not a differentiable function of the coefficients of the polynomial. If the radius of the cluster is small, and it is sufficiently far from the nearest neighbouring cluster or isolated root, then a simple interpretation of the computed solution is the approximation of the cluster of roots by a multiple root at the arithmetic mean of the cluster. Although this appears a simple solution with an obvious justification - one is merely ‘undoing’ the effects of finite precision arithmetic - and procedures for the detection and validation of the clusters have been developed [20], it can lead to totally incorrect answers [32]. Figure 2.2 shows an example in which clustering fails to provide the correct multiplicities of the roots. The figure shows the computed roots, evaluated 1000 times, of four polynomials, each of whose coefficients is perturbed by noise, using the value εc = 10−7 . It is seen that when the roots are well spaced, their values can be estimated by simple clustering. As the roots merge, however, the clusters also merge, such that they cannot be distinguished for the polynomials in Figures 2.2(c) and (d), and moreover, it is impossible to deduce that the polynomials contain only two distinct roots. The perturbations considered in the derivation of (2.13) and (2.15) are random, and as noted above, they are associated with the break up of a multiple root. There exist, however, structured (non-random) perturbations that preserve the multiplicities 24 CHAPTER 2. 
CONDITION NUMBERS AND ERRORS 3 5 f(x) = (x−0.6) (x−1) 3 ε = 1e−007 5 f(x) = (x−0.7) (x−1) c 0.08 ε = 1e−007 c 0.1 0.06 0.04 0.05 Imag Imag 0.02 0 0 −0.02 −0.04 −0.05 −0.06 −0.08 0.5 0.6 0.7 0.8 0.9 Real 3 1 1.2 −0.1 1.3 5 0.1 0.05 0.05 0 0 Imag 0.1 −0.05 −0.1 −0.15 −0.15 0.9 0.95 Real 1 1.1 1 1.05 1.1 5 1.15 ε = 1e−007 c −0.05 −0.1 0.85 0.9 f(x) = (x−0.9) (x−1) c 0.15 0.8 0.8 3 ε = 1e−007 0.15 −0.2 0.75 0.7 Real f(x) = (x−0.8) (x−1) Imag 1.1 −0.2 0.8 0.9 1 1.1 1.2 1.3 Real Figure 2.2: The root distribution of four polynomials after the coefficients have been perturbed: (a) f (x) = (x − 0.6)3 (x − 1)5 , (b) f (x) = (x − 0.7)3 (x − 1)5 , (c) f (x) = (x − 0.8)3 (x − 1)5 , (d) f (x) = (x − 0.9)3 (x − 1)5 . of the roots, that is, the multiple roots do not break up, and furthermore, the multiple root is well-conditioned with respect to these perturbations. These structured perturbations are considered in the next section. 25 CHAPTER 2. CONDITION NUMBERS AND ERRORS 2.4 The geometry of ill-conditioned polynomials Polynomials with one or more multiple zeros form a subset of the space of all polynomials. In particular, a root x0 of multiplicity r introduces (r − 1) constraints on the coefficients, and since a monic polynomial of degree m has m degrees of freedom, it follows that the root x0 lies on a manifold of dimension (m − r + 1) in a space of dimension m. This manifold is called a pejorative manifold [19] because polynomials that are near this manifold are ill-conditioned [23]. This section introduces the pejorative manifold of a polynomial and it is shown that it plays an important role in determining when a polynomial is, and is not, ill-conditioned. Specifically, a polynomial that lies on a pejorative manifold is wellconditioned with respect to (the structured) perturbations that keep it on the manifold, which corresponds to the situation in which the multiplicity of the roots is preserved, but it is ill-conditioned with respect to perturbations that move it off the manifold, which corresponds to the situation in which a multiple root breaks up into a cluster of simple roots. Example 2.1 shows that the condition number of a multiple root of a polynomial is unbounded as the magnitude of the perturbation tends to zero, assuming that the perturbation does not preserve the multiplicity of the root, that is, the polynomial moves off its pejorative manifold. Example 2.1. Consider the problem of determining the smaller real root of the polynomial f (x) = (x + 1 + √ ǫ)(x + 1 − √ ǫ), (2.18) where |ǫ| ≤ 0.01. Since a real root does not exist for ǫ < 0, the problem is ill-posed. 26 CHAPTER 2. CONDITION NUMBERS AND ERRORS If it is assumed that 0 ≤ ǫ ≤ 0.1, then the problem is well-posed because the solution is unique and changes in a continuous manner as ǫ changes. It is, however, ill-conditioned because there exists a solution x0 , x1 for every value of ǫ, and an arbitrarily small change in ǫ leads to an arbitrarily large change in the solution as ǫ → 0. In particular, x0 = −1 − √ ǫ and x1 = −1 + √ ǫ, and thus dx0 1 =− √ dǫ 2 ǫ and dx1 1 = √ . dǫ 2 ǫ (2.19) A monic quadratic polynomial is of the form x2 + bx + c, and a double root exists if b2 = 4c. (2.20) All real quadratic monic polynomials whose coefficients lie on the curve (2.20) in (b, c) ∈ R2 have a double root, and this curve is therefore the pejorative manifold for this class of polynomial. The condition (2.20) is satisfied by ǫ = 0, and the polynomial (2.18) lies near this manifold. 
This proximity to the manifold is the cause of its ill-conditioning. The numerical condition (2.19) of the roots x0 , x1 of the quadratic polynomial (2.18) is inversely proportional to the square root of its distance ǫ from the quadratic polynomial that has a double root. This is a particular example of the more general fact that a polynomial is ill-conditioned if it is near a polynomial that has a multiple root. A polynomial that is near a polynomial with a multiple root is ill-conditioned, and it is recalled from above that a polynomial with a multiple root forms a subset of the space of all polynomials. This leads to the definition of a pejorative manifold 27 CHAPTER 2. CONDITION NUMBERS AND ERRORS of a polynomial. Definition 2.3. A pejorative manifold of a monic polynomial f (x), of degree m, P whose root multiplicities are r1 , r2 , . . . , rk , is a surface of dimension m − ki=1 (ri − 1) in the space Rm of real monic polynomials of degree m in which f (x) lies. Example 2.2. Consider a cubic polynomial f (x) with real roots x0 , x1 and x2 , (x − x0 )(x − x1 )(x − x2 ) = x3 − (x0 + x1 + x2 )x2 + (x0 x1 + x1 x2 + x2 x0 )x − x0 x1 x2 . • If f (x) has three distinct roots, then x0 6= x1 6= x2 , and the pejorative manifold is R3 , apart from the points for which xi = xj , i = 0, 1; j = i + 1, 2. • If f (x) has one double root and one simple root, then x0 = x1 6= x2 , and thus f (x) can be written as x3 − (2x1 + x2 )x2 + (x21 + 2x1 x2 )x − x21 x2 . The pejorative manifold of a cubic polynomial that has a double root is, therefore, the surface defined by −(2x1 + x2 ) (x21 + 2x1 x2 ) −x21 x2 x1 6= x2 , , x1 , x2 ∈ R. • If f (x) has a triple root, then x0 = x1 = x2 , and thus f (x) can be written as x3 − 3x0 x2 + 3x20 x − x30 . The pejorative manifold of a cubic polynomial that has a triple root is, therefore, the curve defined by −3x0 3x20 −x30 , x0 ∈ R. 28 CHAPTER 2. CONDITION NUMBERS AND ERRORS It was stated above that a multiple root is well-conditioned when the multiplicity of the root is preserved, in which case the polynomial stays on its pejorative manifold. This result is established in the next theorem. Theorem 2.5. The condition number of the real root x0 of multiplicity r of the polynomial f (x) = (x − x0 )r , such that the perturbed polynomial also has a root of multiplicity r is r ρ(x0 ) := 1 k(x − x0 ) k 1 ∆x0 = = r−1 ∆f r |x0 | k(x − x0 ) k r |x0 | where ∆f = kδf k kf k and ∆x0 = r 2 2i i=0 i (x0 ) Pr−1 r−12 (x0 )2i i=0 i Pr ! 12 , |δx0 | . |x0 | Proof. If f (x, x0 ) := f (x), then f (x, x0 ) = (x − x0 )r r X r r−i = x (−x0 )i i i=0 r X r r = x + (−1)i (x0 )i xr−i . i i=1 A neighbouring polynomial that also has a root of multiplicity r is f (x, x0 + δx0 ) = (x − (x0 + δx0 ))r , and hence r X r f (x, x0 + δx0 ) − f (x, x0 ) = (−1)i (x0 + δx0 )i − xi0 xr−i i i=1 r X r r−i = δx0 (−1)i ixi−1 + O(δx20 ). 0 x i i=1 (2.21) CHAPTER 2. CONDITION NUMBERS AND ERRORS 29 Since (x − x0 ) r−1 r−1 X r−1 xr−1−i (−x0 )i i i=0 r 1X r r−i = − (−1)i ixi−1 , 0 x r i=1 i = it follows that to first order, δf := f (x, x0 + δx0 ) − f (x, x0 ) = −rδx0 (x − x0 )r−1 , (2.22) and thus the condition number of x0 that preserves its multiplicity is 1 k(x − x0 )r k ∆x0 = . ∆f r |x0 | k(x − x0 )r−1 k Since r (x − x0 ) = the result (2.21) follows. r X r i i=0 (−x0 )i xr−i , Example 2.3. The condition number ρ(1) of the root x0 = 1 of the polynomial (x − 1)r is, from (2.21), ρ(1) = 1 r and since r 2 X r i=0 it follows that v u 1u ρ(1) = t r ! 
12 r 2 i=0 i Pr−1 r−12 i=0 i Pr i 2r r 2(r−1) r−1 , 2r = , r 1 = r r 2(2r − 1) 2 ≈ , r r if r is large. This condition number must be compared with the componentwise and normwise condition numbers, κc (1) ≈ |δx0 | εc and κn (1) ≈ |δx0 | , εn CHAPTER 2. CONDITION NUMBERS AND ERRORS 30 which are proportional to the signal-to-noise ratio. By contrast, ρ(1) is independent of the perturbation of the polynomial and it decreases as the multiplicity r of the root x0 = 1 increases. Example 2.4. Consider the Bernstein basis form of the polynomial f (x) = (x − x0 )r = (−x0 (1 − x) + x(1 − x0 ))r r X r (1 − x0 )i (−x0 )r−i xi (1 − x)r−i = i i=0 r X (r) r = ai (1 − x)r−i xi , i i=0 (r) where the superscript (r) denotes that ai is the ith coefficient of a polynomial of degree r, and (r) ai = (−1)r−i x0r−i (1 − x0 )i . A perturbation analysis similar to that in Theorem 2.5 shows that (2.22) is valid for the Bernstein basis form of (x − x0 )r , and thus where v u P 2 (r) u r r i=0 ai 1 k(x − x0 ) k 1 u ∆x0 u = = , ∆f r |x0 | k(x − x0 )r−1 k r |x0 | t Pr−1 (r−1) 2 i=0 ai (r) (r−1) ai a =− i , x0 i = 0, . . . , r − 1. It therefore follows that v u P 2 u r (r) i=0 ai ∆x0 1 k(x − x0 )r k 1u = = u , ∆f r |x0 | k(x − x0 )r−1 k r t Pr−1 (r) 2 i=0 ai is the condition number of the Bernstein basis form of (x − x0 )r that preserves the CHAPTER 2. CONDITION NUMBERS AND ERRORS 31 multiplicity r of x0 . The next theorem extends Theorem 2.5 to the more general polynomial f (x) = (x − x0 )r g(x), g(x0 ) 6= 0, (2.23) where g(x) is a polynomial of degree n [19]. Theorem 2.6. The condition number of x0 of (2.23) such that its multiplicity r is preserved is ρ(x0 ) = = where |∆x0 | k∆f k 1 sup degree k(x − x0 )r g(x)k |δh(x0 )| , r |g(x0 )| |x0 | k(x − x0 )r−1 δh(x)k δh(x) ≤ n δh(x) = (x − x0 ) δg(x) − rg(x)δx0 . (2.24) (2.25) Proof. Let δf (x) be a perturbation in f (x) such that the multiplicity r of x0 is preserved, f (x) + δf (x) = (x − (x0 + δx0 ))r (g(x) + δg(x)) , where δg(x) is a polynomial to be determined. It follows that δf (x) = (x − (x0 + δx0 ))r (g(x) + δg(x)) − (x − x0 )r g(x) = ((x − x0 ) − δx0 )r (g(x) + δg(x)) − (x − x0 )r g(x) = (x − x0 )r − r (x − x0 )r−1 δx0 (g(x) + δg(x)) − (x − x0 )r g(x) = (x − x0 )r−1 ((x − x0 ) δg(x) − rg(x)δx0 ) = (x − x0 )r−1 δh(x), to lowest order, where the polynomial δh(x), whose maximum degree is n, is defined CHAPTER 2. CONDITION NUMBERS AND ERRORS 32 in (2.25). It follows that δf (x) = (x − x0 )r−1 δh(x), (2.26) δh(x) + rg(x)δx0 . x − x0 (2.27) and from (2.25) that δg(x) = Since the polynomial δg(x) is non-rational, the denominator must be an exact divisor of the numerator, and this condition is satisfied if δx0 = − δh(x0 ) , rg(x0 ) (2.28) which defines δx0 . It therefore follows from (2.27) and (2.26), respectively, that δg(x) and δf (x) are uniquely specified for every polynomial δh(x). It follows from (2.26) and (2.28) that the ratio of the change in the root x0 to the change δf = δf (x) in the polynomial f (x) is |δx0 | 1 δh(x0 ) 1 = kδf k r g(x0 ) kδf (x)k 1 |δh(x0 )| = . |rg(x0 )| k(x − x0 )r−1 δh(x)k (2.29) The condition number (2.24) follows by dividing both sides of (2.29) by |x0 | / kf k, and taking the supremum over all polynomials δh(x) of degree less than or equal to n. 
Theorems 2.5 and 2.6, and Examples 2.3 and 2.4, consider the simplest type of polynomials that have a multiple root, but the general polynomial "K # Y f (x) = (x − xi )ri g(x), g(xi ) 6= 0, i=1 where the roots of g(x) are simple, must be considered in order to calculate the condition number of xi such that its multiplicity ri is preserved. The derivation of these condition numbers, one for each multiple root, follows closely the method in CHAPTER 2. CONDITION NUMBERS AND ERRORS 33 Theorems 2.5 and 2.6, and the resulting condition numbers are very similar to (2.24). Consider the polynomial (2.23), whose condition number is given in (2.24), ρ(x0 ) = σ , r |g(x0 )| |x0 | where σ is the function on the right hand side of (2.24), and the neighbouring polynomial, f˜(x) = (x − x0 )r (g(x) − g(x0 )) , which has a root x0 of multiplicity (r + 1). The magnitude of the difference between the polynomials is and thus kδf k = f (x) − f˜(x) = |g(x0 )| k(x − x0 )r k , kδf k = σ k(x − x0 )r k , r |x0 | ρ(x0 ) from which it follows that if the condition number ρ(x0 ) of x0 of f (x) is large, then there is a neighbouring polynomial that has a root of multiplicity (r + 1). This explanation of ill-conditioning requires that the nearest polynomial on the manifold of polynomials that have an (r + 1)-tuple root be computed. If this polynomial is also ill-conditioned, then it is necessary to compute the nearest polynomial on the manifold of polynomials with an (r + 2)-tuple root. The structured condition number ρ(x0 ) shows that a polynomial is ill-conditioned when it is near a pejorative manifold, but it is well-conditioned when it is on the manifold, except when it approaches manifolds of lower dimension. It follows, therefore, that if a polynomial with a root of multiplicity r is ill-conditioned, then it is near a submanifold, and if this polynomial is also ill-conditioned, then it is near a submanifold of this manifold. This procedure of locating manifolds that are defined CHAPTER 2. CONDITION NUMBERS AND ERRORS 34 by higher order multiplicities is continued until the roots of the computed polynomial are sufficiently well-conditioned, and close enough to the original polynomial. In this circumstance, the original polynomial may be considered to be a small perturbation of the computed polynomial, all of whose roots are well-conditioned. The computed polynomial is acceptable if it is sufficiently near the original polynomial, and it is reasonable to hypothesize that the original polynomial has a constraint that favours multiple roots. 2.5 A simple polynomial root finder It has been shown in the previous sections that a multiple root is ill-conditioned with respect to random perturbations because they cause it to break up into a cluster of simple roots, but that it is stable with respect to perturbations that maintain its multiplicity. A simple root is, in general, better conditioned than a multiple root, and it is therefore instructive to consider a polynomial root finder that reduces to the determination of the roots of a several polynomials, each of which only contains simple roots. This method, which is described in [42], pages 65-68, differs from the methods that are described in Chapter 1 because the multiplicities of the roots are calculated initially, after which the values of the roots are computed. The multiplicities of the roots are calculated by a sequence of greatest common divisor (GCD) computations. 
Consider the polynomial f (x) = (x − x1 )r1 (x − x2 )r2 · · · (x − xl )rl g0 (x), where ri ≥ 2, i = 1, . . . , l, g0 (x) contains only simple roots, and the multiple roots are arranged such that r1 ≥ r2 ≥ · · · ≥ rl . Since a root of multiplicity ri of f (x) is a root CHAPTER 2. CONDITION NUMBERS AND ERRORS 35 of multiplicity ri − 1 of its derivative f (1) (x), it follows that f (1) (x) = (x − x1 )r1 −1 (x − x2 )r2 −1 · · · (x − xl )rl −1 g1 (x), where g0 (x) and g1 (x) are coprime polynomials, and the roots of g1 (x) are simple. It follows that q1 (x) := GCD f (x), f (1) (x) = (x − x1 )r1 −1 (x − x2 )r2 −1 · · · (x − xl )rl−1 , and thus the polynomial f (x)/q1 (x) is equal to the product of all roots of f (x), f (x) = (x − x1 )(x − x2 ) · · · (x − xl )g0 (x). q1 (x) (1) The GCD of q1 (x) and q1 (x) is (1) q2 (x) := GCD q1 (x), q1 (x) = (x − x1 )r1 −2 (x − x2 )r2 −2 · · · (x − xk )rk −2 , where ri ≥ 2, i = 1, . . . , k, and thus q1 (x) = (x − x1 )(x − x2 ) · · · (x − xk ), q2 (x) which is the product of all the roots of f (x) whose multiplicity is greater than or equal to 2. This process of GCD computations and polynomial divisions is repeated, and it terminates when the division yields a polynomial of degree one, corresponding to the divisor of f (x) of maximum degree. In order to generalise this procedure, let w1 (x) be the product of all linear factors of f (x), let w2 (x) be the product of all quadratic factors of f (x), and in general, let wi (x) be the product of all factors of degree i of f (x). If f (x) does not contain a factor of degree k, then wk (x) is set equal to a constant, which can be assumed to be unity. It follows that to within a constant multiplier, max f (x) = w1 (x)w22 (x)w33 (x) · · · wrrmax (x), 36 CHAPTER 2. CONDITION NUMBERS AND ERRORS and thus max −1 q1 (x) = GCD f (x), f (1) (x) = w2 (x)w32 (x)w43 (x) · · · wrrmax (x). Similarly, q2 (x) = GCD (1) q1 (x), q1 (x) (1) max −2 = w3 (x)w42 (x)w53 (x) · · · wrrmax (x) max −3 q3 (x) = GCD q2 (x), q2 (x) = w4 (x)w52 (x)w63 (x) · · · wrrmax (x) (1) max −4 q4 (x) = GCD q3 (x), q3 (x) = w5 (x)w62 (x) · · · wrrmax (x) .. . and the sequence terminates at qrmax (x), which is a constant. A sequence of polynomials hi (x), i = 1, . . . , rmax , is defined such that h1 (x) = f (x) q1 (x) = w1 (x)w2 (x)w3 (x) · · · h2 (x) = q1 (x) q2 (x) = w2 (x)w3 (x) · · · h3 (x) = q2 (x) q3 (x) = w3 (x) · · · .. . hrmax (x) = qrmax −1 qrmax = wrmax (x), and thus all the functions, w1 (x), w2 (x), · · · , wrmax (x), are determined from w1 (x) = h1 (x) , h2 (x) w2 (x) = h2 (x) , h3 (x) ··· , wrmax −1 (x) = hrmax −1 (x) , hrmax (x) until wrmax (x) = hrmax (x). The equations w1 (x) = 0, w2 (x) = 0, ··· , wrmax (x) = 0, contain only simple roots, and they yield the simple, double, triple, etc., roots of f (x). In particular, if x0 is a root of wi (x), then it is a root of multiplicity i of f (x). Algorithm 2.1 contains pseudo-code for the implementation of Uspensky’s method for CHAPTER 2. CONDITION NUMBERS AND ERRORS the calculation of the roots of a polynomial. Algorithm 2.1: Uspensky’s algorithm for the roots of a polynomial Input A polynomial f (x). Output The roots of f (x). Begin 1. Set q0 = f . 2. Calculate the GCD of f and f (1) . q1 = GCD f, f (1) 3. Calculate h1 = q0 . q1 4. Set j = 2. 5. While degree qj−1 > 0 do (a) Calculate the GCD of qj−1 and its derivative. (1) qj = GCD qj−1 , qj−1 (b) Calculate hj = qj−1 . qj (c) Calculate wj−1 = hj−1 /hj . (d) Calculate the roots of wj−1 . (e) Set j = j + 1. 
End while % They are of multiplicity j − 1. 37 CHAPTER 2. CONDITION NUMBERS AND ERRORS 6. Set wj−1 = hj−1 and solve wj−1 = 0 % They are of multiplicity j − 1. End Example 2.5. Consider the polynomial f (x) = x6 − 3x5 + 6x3 − 3x2 − 3x + 2, whose derivative is f (1) (x) = 6x5 − 15x4 + 18x2 − 6x − 3. It follows that q1 (x) = GCD f (x), f (1) (x) = x3 − x2 − x + 1 (1) q1 (x) = 3x2 − 2x − 1, and hence (1) (1) q2 (x) = GCD q1 (x), q1 (x) = x − 1 and q3 (x) = GCD q2 (x), q2 (x) = 1. The polynomials h1 (x), h2 (x) and h3 (x) are h1 (x) = f (x) q1 (x) = x3 − 2x2 − x + 2 h2 (x) = q1 (x) q2 (x) = x2 − 1 h3 (x) = q2 (x) q3 (x) = x − 1, and thus the polynomials w1 (x), w2 (x) and w3 (x) are w1 (x) = h1 (x) h2 (x) = x−2 w2 (x) = h2 (x) h3 (x) = x+1 w3 (x) = h3 (x) = x − 1. 38 CHAPTER 2. CONDITION NUMBERS AND ERRORS 39 It follows that the factors of f (x) are f (x) = (x − 1)3 (x + 1)2 (x − 2), and thus f (x) has a triple root at x = 1, a double root at x = −1 and a simple root at x = 2. Example 2.5 contains the essential features of the algorithm for the computation of the roots of a polynomial that will be described in subsequent chapters of this document. Although it is easy to follow, it contains steps whose implementation in a floating point environment raises some difficult issues: • The computation of the GCD of two polynomials is an ill-posed problem because it is not a continuous function of their coefficients. In particular, the polynomials f (x) and g(x) may have a non-constant GCD, but the perturbed polynomials f (x) + δf (x) and g(x) + δg(x) may be coprime. Even if f (x) and g(x) are specified exactly and have a non-constant GCD, roundoff errors may be sufficient to imply that they are coprime when the GCD is computed in a floating point environment. • The determination of the degree of the GCD of two polynomials reduces to the determination of the rank of a resultant matrix, but the rank of a matrix is not defined in a floating point environment. In particular, the rank loss of a resultant matrix is equal to the degree of their GCD, and a minor perturbation in one of both of the polynomials is sufficient to cause their resultant matrix to have full rank, which suggests that the polynomials are coprime. The determination of the rank of a noisy matrix is a challenging problem that arises in many applications. CHAPTER 2. CONDITION NUMBERS AND ERRORS 40 • Polynomial division, which reduces to the deconvolution of their coefficients, is an ill-conditioned problem that must be implemented with care in order to obtain a computationally reliable solution. • The data in many practical examples is inexact, and thus the polynomials are only specified within a tolerance. The given inexact polynomials may be coprime, and it is therefore necessary to perturb each polynomial slightly, such that they have a non-constant GCD. This GCD is called an approximate greatest common divisor of the given inexact polynomials, and it is necessary to compute the smallest perturbations such that the perturbed polynomials have a non-constant GCD. • The amplitude of the noise may or may not be known in practical examples, and it may only be known approximately and not exactly. It is desirable that a polynomial root finder not require an estimate of the noise level, and that all parameters and thresholds be calculated from the data, that is, the polynomial coefficients. The subsequent chapters in this document address these issues in order to develop a robust polynomial root finder. 
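None of these difficulties arises in exact arithmetic, and Algorithm 2.1 is then very short. The following sketch uses SymPy for the GCD computations and the polynomial divisions (the use of SymPy, and the function name, are illustrative choices only); applied to the polynomial of Example 2.5 it returns w1 = x - 2, w2 = x + 1 and w3 = x - 1:

    import sympy as sp

    x = sp.symbols('x')

    def multiplicity_factors(f):
        # Algorithm 2.1: returns [w_1, w_2, ...], where the roots of w_i are the
        # roots of f of multiplicity i
        q = [f]                                              # q_0 = f
        while sp.degree(q[-1], x) > 0:
            q.append(sp.gcd(q[-1], sp.diff(q[-1], x)))       # q_j = GCD(q_{j-1}, q_{j-1}')
        h = [sp.quo(q[j], q[j + 1], x) for j in range(len(q) - 1)]   # h_j = q_{j-1} / q_j
        w = [sp.quo(h[j], h[j + 1], x) for j in range(len(h) - 1)]   # w_j = h_j / h_{j+1}
        w.append(h[-1])                                      # w_rmax = h_rmax
        return w

    f = x**6 - 3*x**5 + 6*x**3 - 3*x**2 - 3*x + 2            # the polynomial of Example 2.5
    for i, wi in enumerate(multiplicity_factors(f), start=1):
        print(i, sp.factor(wi))                              # 1: x - 2,  2: x + 1,  3: x - 1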
2.6 Summary Expressions for the componentwise and normwise backward errors, and componentwise and normwise condition numbers, were developed in this chapter. It was shown that the backward errors and condition numbers are related to the forward error and CHAPTER 2. CONDITION NUMBERS AND ERRORS 41 that the expressions for the condition numbers attain their maximum values for a multiple root of high multiplicity. A multiple root is well-conditioned when the perturbed polynomial has a root of the same multiplicity as the original (unperturbed) polynomial. The pejorative manifold of a polynomial was defined, and this allowed a geometric interpretation of ill-conditioning to be developed. A simple polynomial root finder was introduced and it was shown that it differs from the root finders that were discussed in Chapter 1 because the multiplicity of each root is calculated initially, after which the values of the roots are determined. It was shown, however, that the computational implementation of this algorithm in a floating point environment and with inexact data requires that some difficult problems be addressed. Chapter 3 The Sylvester resultant matrix The simple polynomial root finder in Section 2.5 is, from a high level, the polynomial root finder whose computational implementation is considered in this document. It requires that the GCD of pairs of polynomials be computed several times, and this chapter describes the application of the Sylvester resultant matrix and its subresultant matrices for this calculation. Two polynomials are coprime if and only if the determinant of their resultant matrix is equal to zero, and if they are not coprime, the degree and coefficients of their GCD can be calculated from their resultant matrix. In particular, the rank loss of this matrix is equal to the degree of the GCD of the polynomials, and the coefficients of the GCD are obtained by reducing the matrix to upper triangular form. The Sylvester resultant matrix for the power and Bernstein polynomial bases is considered in this chapter, and this leads on to a discussion of their subresultant matrices and their use in calculating the degree of a common divisor of two polynomials. There are several resultant matrices, including the Sylvester, Bézout and companion resultant matrices, and they may be considered equivalent because they all 42 43 CHAPTER 3. THE SYLVESTER RESULTANT MATRIX yield the same information on the GCD of two polynomials. Euclid’s algorithm is a classical method of calculating the GCD of two polynomials, and the connection between it and the Sylvester resultant matrix is established in [50]. 3.1 The Sylvester resultant matrix for power basis polynomials Let f = f (x) and g = g(x) be real polynomials of degrees m and n respectively, m n X X i f (x) = ai x and g(x) = bi xi , (3.1) i=0 i=0 where am , bn 6= 0. The Sylvester resultant matrix S(f, g) ∈ R(m+n)×(m+n) is equal to bn am bn−1 bn am−1 am . . . . .. .. .. .. b a n−1 m−1 .. .. .. .. . bn . am b1 . . a1 , S(f, g) = .. .. a0 . bn−1 . am−1 b0 b1 a1 . . . . .. .. .. .. b0 a0 .. .. . b1 . a1 b0 a0 | {z n columns } | {z m columns } (3.2) where the coefficients ai of f (x) occupy the first n columns of S(f, g), and the coefficients bi of g(x) occupy the last m columns of S(f, g). The most important property of the Sylvester matrix is that det S(f, g) = 0 is a necessary and sufficient condition CHAPTER 3. THE SYLVESTER RESULTANT MATRIX 44 for f (x) and g(x) to have a non-constant common divisor [45, 49]. 
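The structure of S(f, g) in (3.2) can be made concrete with a short construction. The sketch below (NumPy; the coefficients are ordered from a_m down to a_0, and the example polynomials are chosen here only for illustration) builds the matrix column by column and checks the determinant condition:

    import numpy as np

    def sylvester(a, b):
        # S(f, g) of (3.2); a = [a_m, ..., a_0] and b = [b_n, ..., b_0]
        m, n = len(a) - 1, len(b) - 1
        S = np.zeros((m + n, m + n))
        for j in range(n):                      # first n columns: coefficients of f(x)
            S[j:j + m + 1, j] = a
        for j in range(m):                      # last m columns: coefficients of g(x)
            S[j:j + n + 1, n + j] = b
        return S

    # f(x) = (x - 1)(x - 2) and g(x) = (x - 1)(x + 3) have the common divisor x - 1,
    # so det S(f, g) = 0; the coprime pair f(x) and x^2 + 2x + 3 does not.
    print(np.linalg.det(sylvester([1.0, -3.0, 2.0], [1.0, 2.0, -3.0])))   # approximately 0
    print(np.linalg.det(sylvester([1.0, -3.0, 2.0], [1.0, 2.0, 3.0])))    # non-zero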
Furthermore, if det S(f, g) = 0, then the degree and coefficients of the GCD of f (x) and g(x) can be computed from S(f, g). Theorem 3.1. Let S(f, g) be the Sylvester matrix of the polynomials f (x) and g(x) that are defined in (3.1). If the degree of their GCD is d, then 1. The rank of S(f, g) is equal to m + n − d. 2. The coefficients of their GCD are given in the last non-zero row of S(f, g)T after it has been reduced to upper triangular form by an LU or QR decomposition. Proof. See [4], page 36, or [6]. Example 3.1. Consider the polynomials f (x) and g(x), f (x) = −3x3 + 25 2 23 x − x+3 2 2 and g(x) = 6x2 − 7x + 2, whose GCD is g(x). The transpose S(f, g)T of S(f, g) − 23 3 −3 25 2 2 25 0 −3 − 23 2 2 T S(f, g) = 2 0 6 −7 6 −7 2 0 0 0 6 −7 and its reduction to row echelon (upper triangular) −3 25 − 23 3 0 2 2 25 0 −3 − 23 3 2 2 0 0 6 −7 2 0 0 0 0 0 0 0 0 0 0 is 0 3 0 , 0 2 form yields the matrix . CHAPTER 3. THE SYLVESTER RESULTANT MATRIX 45 The rank loss of this matrix is two, which is equal to the degree of the GCD of f (x) and g(x). Furthermore, the non-zero coefficients in its last non-zero row are 6, −7, 2, and thus the GCD is 6x2 − 7x + 2, which is equal to g(x), as required. The Sylvester matrix allows the degree and coefficients of the GCD of two polynomials to be calculated. The next section considers subresultant matrices that are derived from the Sylvester matrix, and it is shown that the order of a submatrix is related to the degree of a common divisor of two polynomials. Subresultant matrices will be used extensively in Chapter 4, where the GCD of two inexact polynomials is considered. 3.1.1 Subresultants of the Sylvester matrix for power basis polynomials The subresultant matrices of the Sylvester matrix are obtained by the deletion of some of its rows and columns. In particular, the k’th Sylvester matrix, or subresultant matrix, Sk (f, g) ∈ R(m+n−k+1)×(m+n−2k+2) is a submatrix of S(f, g) that is formed by deleting the last k − 1 rows of S(f, g), the last k − 1 columns of the coefficients of f (x), and the last k − 1 columns of the coefficients of g(x). 46 CHAPTER 3. THE SYLVESTER RESULTANT MATRIX Example 3.2. If m = 4 and n = 3, then a4 a3 a4 a2 a3 S1 = S(f, g) = a1 a2 a0 a1 a0 S2 = S2 (f, g) = a4 b3 a3 a4 b2 b3 a2 a3 b1 b2 a1 a2 b0 b1 a0 a1 a0 b0 b3 , b2 b1 b0 b3 b2 b3 a4 b1 b2 b3 a3 b0 b1 b2 a2 b0 b1 a1 b0 a0 b3 , b2 b1 b0 a 4 a3 S3 = S3 (f, g) = a2 a1 a0 b3 b2 b1 b0 b3 b2 . b1 b0 Theorems 3.2 and 3.3 establish the connection between the degree of a common divisor of two polynomials and the order of a subresultant matrix. Theorem 3.2. Let f (x) and g(x) be defined in (3.1), and let q(x) and p(x) be polynomials of degrees m − k and n − k respectively. If d(x) is a polynomial of degree k, then f (x)p(x) = g(x)q(x), (3.3) if and only if d(x) is a common divisor of f (x) and g(x). Proof. If d(x) is a common divisor of f (x) and g(x), then there exist polynomials CHAPTER 3. THE SYLVESTER RESULTANT MATRIX 47 p(x) and q(x) such that f (x) = q(x) d(x) and g(x) = p(x), d(x) and (3.3) follows. Conversely, assume that (3.3) holds, such that, without loss of generality, p(x) and q(x) are coprime. (If these polynomials are not coprime, then any common divisors can be removed.) It follows that since p(x) is of degree n − k and g(x) is of degree n, every divisor of p(x) is also a divisor of g(x). There therefore exists a polynomial d1 (x) of degree k such that p(x)d1 (x) = g(x), and substitution into (3.3) yields f (x) = d1 (x)q(x). 
(3.4) Similarly, consideration of the polynomials q(x) and f (x) leads to the equation g(x) = d2 (x)p(x), (3.5) where d2 (x) is of degree k. The substitution of (3.4) and (3.5) into (3.3) shows that d1 (x) = d2 (x), and thus the result is established. The main theorem can now be established [7]. Theorem 3.3. A necessary and sufficient condition for the polynomials f (x) and g(x), which are defined in (3.1), to have a common divisor of degree k ≥ 1 is that the rank of Sk (f, g) be less than (m + n − 2k + 2), or equivalently, the dimension of its null space is greater than or equal to one. Proof. Let f (x) and g(x) have a common divisor of degree k, where 1 ≤ k ≤ t, and t is the degree of the GCD of f (x) and g(x). There therefore exists a polynomial w(x) 48 CHAPTER 3. THE SYLVESTER RESULTANT MATRIX of degree k such that f (x) = w(x)f1 (x) and g(x) = w(x)g1 (x), where f1 (x) = m−k X i cm−k−i x and g1 (x) = i=0 i ai x i=0 dn−k−ixi . i=0 It follows from f (x)g1 (x) = f1 (x)g(x) that m X n−k X n−k X j=0 j dn−k−j x = m−k X i=0 i cm−k−i x n X bj xj , j=0 and since this equation is satisfied for all values of x, the coefficients on both sides of this equation can be equated. This yields the homogeneous equation bn am d0 0 .. am−1 . . . . . b n−1 .. .. . . .. .. .. . . am . bn . . . dn−k .. . . = . . a1 . . bn−1 a1 b1 −c0 ... . .. .. .. a0 . .. . . b0 . .. .. . .. .. . a1 . b1 −cm−k 0 a0 b0 , (3.6) where the coefficient matrix is Sk ∈ R(m+n−k+1)×(m+n−2k+2) . The coefficients ai of f (x) occupy the first (n − k + 1) columns, and the coefficients bj of g(x) occupy the last (m−k +1) columns. The coefficient matrix is square and reduces to the Sylvester resultant matrix if k = 1. If k > 1, the number of rows is greater than the number of columns, and since it is assumed that (3.6) possesses a solution, the coefficient matrix Sk (f, g) must be rank deficient. It therefore follows that if f (x) and g(x) have a common divisor of degree k, then the rank of Sk (f, g) is less than (m + n − 2k + 2). 49 CHAPTER 3. THE SYLVESTER RESULTANT MATRIX Assume now that the rank of Sk (f, g) is less than (m + n − 2k + 2), from which it follows that one or more of its columns is linearly dependent on the other columns. There therefore exist constants p0 , . . . , pn−k , q0 , . . . , qm−k , not all zero, such that n−k X i=0 pi u i − m−k X qj vj = 0, (3.7) j=0 where ui, i = 0, . . . , n−k, and vj , j = 0, . . . , m−k, are the vectors of the first (n−k+1) and last (m − k + 1) columns of Sk (f, g), respectively. If the polynomials p(x) and q(x) are defined as p(x) = n−k X i pi x and q(x) = m−k X qi xi , i=0 i=0 respectively, then (3.7) states that p(x)f (x) = q(x)g(x), and Theorem 3.2 shows that f (x) and g(x) have a common divisor of degree k. It therefore follows that if the rank of Sk (f, g) is less than (m + n − 2k + 2), then f (x) and g(x) have a common divisor of degree k. Theorem 3.3 allows the degree d of the GCD of f (x) and g(x) to be calculated because these polynomials possess common factors of degrees 1, 2, . . . , d, but they do not possess a common factor of degree d + 1. Thus rank Sk (f, g) < m + n − 2k + 2, k = 1, . . . , d, (3.8) k = d + 1, . . . , min (m, n), (3.9) and rank Sk (f, g) = m + n − 2k + 2, and hence d is equal to the index k of the last rank deficient subresultant matrix in the sequence S1 (f, g), S2(f, g), . . . , Sk (f, g). Alternatively, one can consider the sequence Smin (m,n) , Smin (m,n)−1 , . . . , thereby increasing the size of the subresultant matrix, and 50 CHAPTER 3. 
THE SYLVESTER RESULTANT MATRIX thus d is equal to the order of the first subresultant matrix that is rank deficient. Each matrix Sk (f, g) is partitioned into a vector ck ∈ R(m+n−k+1) and a matrix Ak ∈ R(m+n−k+1)×(m+n−2k+1) , where ck is the first column of Sk (f, g), and Ak is the matrix formed from the remaining columns of Sk (f, g), Sk (f, g) = ck Ak = ck coeffs. of f (x) coeffs. of g(x) , (3.10) where the coefficients of f (x) occupy n − k columns, and the coefficients of g(x) occupy m − k + 1 columns. The computation of the GCD of two univariate polynomials requires that the equation Ak y = ck , y ∈ Rm+n−2k+1 , (3.11) be considered. Theorem 3.4. Let f (x) and g(x) be polynomials that are defined in (3.1), and let k ≤ min (m, n) be a positive integer. Then the dimension of the null space of Sk (f, g) is greater than or equal to one if and only if (3.11) possesses a solution. Proof. The following proof is taken from [55], and another proof is in [50]. Assume initially that (3.11) has a solution, from which it follows that ck lies in the column space of Ak . The definition of Sk (f, g) then shows that the dimension of the null space of Sk (f, g) is greater than or equal to one. Assume now that the dimension of the null space of Sk (f, g) is greater than or equal to one, and consider the left and right hand sides of (3.11). In particular, if 51 CHAPTER 3. THE SYLVESTER RESULTANT MATRIX w T = w T (x) is defined as xm+n−k xm+n−k−1 · · · x 1 , then T w Ak = xn−k−1 f xn−k−2 f · · · f xm−k g xm−k−1 g · · · g , and w T ck = xn−k f, where f = f (x) and g = g(x). Let u(x) and v(x) be the polynomials u(x) = n−k−1 X n−k−1−i ui x and v(x) = i=0 m−k X vi xm−k−i , i=0 respectively, and let the vector t ∈ Rm+n−2k+1 be formed from the coefficients of u(x) and v(x), T t := u0 · · · un−k−2 un−k−1 v0 · · · vm−k−2 vm−k−1 vm−k . (3.12) It therefore follows that w T Ak t = uf + vg, where u = u(x) and v = v(x), and thus the equation w T Ak t = w T ck , (3.13) uf + vg = xn−k f, (3.14) reduces to for the polynomials u(x) and v(x). Let d = d(x) be the GCD of f (x) and g(x), and thus there exist polynomials CHAPTER 3. THE SYLVESTER RESULTANT MATRIX 52 f1 = f1 (x) and g1 = g1 (x) such that f1 = f d and g g1 = . d The form of Sk (f, g) for several values of k is shown in Example 3.2, and it is clear that if (k − 1) columns are added to the coefficients of f (x) and (k − 1) columns are added to the columns of g(x), with suitable vertical shifts, such that the matrix S(f, g) is obtained, then the dimension of the null space of S(f, g) is greater than or equal to 1 + (k − 1) = k, and thus the rank loss of S(f, g) is greater than or equal to k. This implies that the degree of the GCD of f (x) and g(x) is greater than or equal to k, and hence degree f1 ≤ m − k and degree g1 ≤ n − k. Consider the polynomial division r (1 − x)n−k =q+ , g1 g1 (3.15) where q = q(x) is the quotient, r = r(x) is the remainder, the degree of r(x) is less than or equal to (n − k − 1), and degree q = (n − k) − degree g1 = (n − degree g1 ) − k = (degree d) − k. It is now shown that u = r and v = qf1 are solutions of (3.13), where t is defined in CHAPTER 3. THE SYLVESTER RESULTANT MATRIX 53 (3.12). In particular, these forms of u(x) and v(x) satisfy (3.14) because uf + vg = rf + qf1 g = rf + qf1 dg1 = rf + qf g1 = f (r + qg1 ) = (1 − x)n−k f, from (3.15), and thus it follows that w T (Ak t − ck ) = 0, w = w(x), possesses a solution for all values of x. Since w 6= 0, the solution of (3.11) is given by y = t, where t is specified in (3.12). 
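The rank conditions (3.8) and (3.9) translate directly into a procedure for computing the degree of the GCD of two exact polynomials. A sketch follows (NumPy; the coefficients are ordered from the leading coefficient downwards, and the rank tolerance is an arbitrary choice, since, as emphasised elsewhere in these notes, the numerical rank is not well defined for inexact data):

    import numpy as np

    def subresultant(a, b, k):
        # S_k(f, g): delete the last k - 1 rows of S(f, g), the last k - 1 columns of
        # the coefficients of f(x) and the last k - 1 columns of the coefficients of g(x)
        m, n = len(a) - 1, len(b) - 1
        S = np.zeros((m + n - k + 1, m + n - 2 * k + 2))
        for j in range(n - k + 1):
            S[j:j + m + 1, j] = a
        for j in range(m - k + 1):
            S[j:j + n + 1, n - k + 1 + j] = b
        return S

    def gcd_degree(a, b, tol=1e-10):
        # d = the largest k for which S_k(f, g) is rank deficient, by (3.8) and (3.9)
        m, n = len(a) - 1, len(b) - 1
        d = 0
        for k in range(1, min(m, n) + 1):
            if np.linalg.matrix_rank(subresultant(a, b, k), tol) < m + n - 2 * k + 2:
                d = k
        return d

    # the polynomials of Example 3.1: their GCD is g(x), which has degree 2
    print(gcd_degree([-3.0, 12.5, -11.5, 3.0], [6.0, -7.0, 2.0]))         # 2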
The next theorem follows from Theorems 3.3 and 3.4. Theorem 3.5. A necessary and sufficient condition for the polynomials f (x) and g(x) to have a common divisor of degree k is that (3.11) possesses a solution. This result is important when calculating approximate GCDs, and it will be used extensively in the sequel. 3.2 The Sylvester resultant and subresultant matrices for Bernstein basis polynomials This section considers the Sylvester resultant matrix, and its subresultant matrices, for polynomials expressed in the Bernstein basis, which is, for a polynomial of degree 54 CHAPTER 3. THE SYLVESTER RESULTANT MATRIX m, m φi (x) = (1 − x)m−i xi , i i = 0, . . . , m. (3.16) It follows that, for example, second and third order polynomials expressed in the Bernstein basis are 2 2 2 2 2 c0 (1 − x) + c1 (1 − x)x + c2 x, 0 1 2 and 3 3 3 3 3 3 2 2 c0 (1 − x) + c1 (1 − x) x + c2 (1 − x)x + c3 x, 0 1 2 3 respectively. It is shown in [45, 49] that the Sylvester resultant matrix S(p, q) ∈ R(m+n)×(m+n) of the polynomials p = p(x) and q = q(x), m n X X m n m−i i p(x) = ci (1 − x) x and q(x) = di (1 − x)n−i xi , i i i=0 i=0 (3.17) is S(p, q) = D −1 T (p, q), where D, T (p, q) ∈ R(m+n)×(m+n) , −1 1 D = diag (m+n−1 ) 0 1 (m+n−1 ) 1 ··· (3.18) 1 1 m+n−1 (m+n−2 ) m+n−1 (m+n−1 ) , 55 CHAPTER 3. THE SYLVESTER RESULTANT MATRIX and c0 m0 c1 m 1 .. . m T (p, q) = cm−1 m−1 cm m m d0 n0 .. . .. . c0 m 0 .. . c1 m 1 .. . .. . cm−1 .. . cm .. . .. . .. . dn−1 dn m m−1 m m d1 n1 .. . n n−1 n n .. . .. . n d0 0 n , d1 1 .. . n dn−1 n−1 n dn n (3.19) is the Sylvester resultant matrix for polynomials expressed in the scaled Bernstein basis, whose basis functions for a polynomial of degree m are φi (x) = (1 − x)m−i xi , i = 1, . . . , m. (3.20) Comparison of S(f, g) and S(p, q) shows that the Bernstein basis Sylvester resultant matrix does not exhibit the diagonal property of its power basis equivalent because of the diagonal matrix D −1 that premultiplies T (p, q). Despite this difference, all of the properties of the Sylvester matrix for power basis polynomials apply to the Sylvester matrix for Bernstein basis polynomials, and thus Theorems 3.1, 3.2, 3.3 and 3.4 are valid for Bernstein basis polynomials. Example 3.3. Consider the Bernstein basis polynomials 3 5 3 1 3 3 3 3 2 2 p(x) = 3 (1 − x) − (1 − x) x − (1 − x)x + x, 0 6 1 2 2 3 and 2 3 2 2 2 2 q(x) = 2 (1 − x) − (1 − x)x + x, 0 2 1 2 whose GCD is q(x) because 1 p(x) = q(x) 3(1 − x) + 2x . 2 (3.21) 56 CHAPTER 3. THE SYLVESTER RESULTANT MATRIX The transpose of the Sylvester 3 0 S(p, q)T = 2 0 0 3 0 = 2 0 0 resultant matrix is 1 0 − 25 − 23 5 3 3 −2 −2 1 −3 1 0 0 2 −3 1 0 0 2 −3 1 1 − 58 − 41 0 4 3 5 3 − 12 − 8 1 4 1 . − 34 0 0 6 1 1 1 −2 0 2 4 1 3 0 −4 1 3 1 0 0 0 0 0 0 0 0 0 0 16 0 0 0 0 0 41 0 0 0 0 0 1 1 4 The reduction of S(p, q)T to row echelon form yields 1 0 3 − 85 − 41 4 3 5 3 0 − 12 − 8 1 4 0 , 1 3 0 − 1 3 4 0 0 0 0 0 0 0 0 0 0 (3.22) and thus the degree of the GCD is 2. The coefficients in the last non-zero row of this matrix yield the GCD, 1 4 3 4 4 4 2 2 3 (1 − x) x − (1 − x)x + x 3 2 4 3 4 2 3 2 2 2 2 2 =x 2 (1 − x) − (1 − x)x + x . 0 2 1 2 Deletion of the extraneous factor x2 yields the polynomial q(x), as required. Consider now the polynomial formed from the first row of the matrix in (3.22). CHAPTER 3. 
THE SYLVESTER RESULTANT MATRIX 57 In particular, this polynomial is equal to 4 5 4 1 4 1 4 4 3 2 2 3 (1 − x) − (1 − x) x − (1 − x) x + (1 − x)x3 , 0 8 1 4 2 4 3 which simplifies to 1 3 3 3 3 5 3 2 2 3 (1 − x) x − (1 − x)x + x , (1 − x) 3 (1 − x) − 6 1 2 2 3 0 and since the term in the square brackets is equal to p(x), it follows from (3.21) that this polynomial can be simplified further, 1 3(1 − x)2 + 2(1 − x)x q(x) 2 1 2 3 2 2 = (1 − x) + (1 − x)x q(x). 2 0 2 1 (1 − x)p(x) = It follows that the coefficients in the first row of the matrix in (3.22) define a polynomial, one of whose factors is q(x), that is, the GCD of p(x) and q(x). Consider now the polynomial formed from the coefficients in the second row in the matrix in (3.22). This polynomial is equal to 3 4 5 4 3 4 4 4 3 2 2 3 (1 − x) x − (1 − x) x − (1 − x)x + x, 4 1 12 2 8 3 4 which simplifies to 3 2 2 2 (1 − x)x + x q(x), 4 1 2 which is proportional to q(x), that is, the GCD of p(x) and q(x). 58 CHAPTER 3. THE SYLVESTER RESULTANT MATRIX Example 3.4. Consider the polynomials p(x) c0 (40) (60) c1 (41) c0 (40) 6 (1 ) (6) c 4 c 14 c 4 2 (2) 1 (1) 0 (0) (6 ) (62) (62) 2 c3 (4) c2 (4) c1 (4) 3 3 1 S(p, q) = S1 (p, q) = 6 6 (63) ( ) ( ) 3 3 c4 (44) c3 (43) c2 (42) 6 (4 ) (64) (64) 4 c 4 (4 ) c3 (43) 6 (5) (65) c4 (44) (66) and q(x) for m = 4 and n = 3, d0 (30) (60) d1 (31) d0 (30) 6 6 (1) (1) d2 (32) d1 (31) d0 (30) 6 6 6 (2) (2) (2) d3 (33) d2 (32) d1 (31) d0 (30) . (63) (63) (63) (63) d3 (33) d2 (32) d1 (31) (64) (64) (64) d3 (33) d2 (32) (65) (65) d3 (33) (66) The second and third subresultant matrices are, c0 (40) d0 (30) 6 (0) (60) 4 4 c1 (1) c0 (0) d1 (31) 6 (6) (6 ) (1) c2 (4) c1 (14) d2 (1 3) 1 2 2 (6) (62) (62) 2 S2 (p, q) = 4 4 3 c3 (63) c2 (63) d3 (6 3) (3) (3) (3 ) c4 (44) c3 (43) 6 (64) (4) c4 (44) (65) and c0 (40) d0 (30) 6 (0) (6) c 4 d 03 1 (1) 1 (1) (6) (61) 1 c2 (4) d2 (3) 2 2 S3 (p, q) = 6 (62) ( ) 2 c3 (43) d3 (33) 6 (3) (63) c 4 4 (4 ) (64) respectively, d0 (30) (61) d1 (31) (62) d2 (32) (63) d3 (33) (64) d0 (30) (61) d1 (31) (62) d2 (32) (63) d3 (33) (64) d0 (30) (62) d1 (31) (63) d2 (32) (64) d3 (33) (65) , . CHAPTER 3. THE SYLVESTER RESULTANT MATRIX 59 The next example illustrates Theorem 3.5 for polynomials expressed in the Bernstein basis. Example 3.5. It follows from Example 3.3 that the Sylvester resultant matrix S(p, q) is 3 0 2 0 0 5 3 3 1 − −4 0 4 2 8 5 1 1 1 . 1 S(p, q) = − − − 4 12 6 2 3 1 3 1 3 4 −8 0 −4 4 0 1 0 0 1 Consider the subresultant matrix formed k = 1, in which case it is necessary that the equation 0 2 0 0 3 1 3 − 0 4 4 2 1 1 1 −5 − 12 6 2 3 1 0 − 34 − 38 4 1 0 0 1 3 y1 5 −8 y2 = − 14 , y3 1 4 y4 0 have a solution in order that the degree of the GCD of p(x) and q(x) be greater than or equal to one. It is readily checked that the solution of this equation is T 3 1 y = 1 2 − 2 −1 , and thus the minimum degree of the GCD of p(x) and q(x) is one. Consider now the subresultant matrix for k = 2. A necessary and sufficient condition for the degree of the GCD of p(x) and q(x) to be greater than or equal to CHAPTER 3. THE SYLVESTER RESULTANT MATRIX two is that the equation 60 3 0 2 1 −5 −3 y 8 4 2 1 , = 1 1 1 − y − 6 4 2 2 1 1 0 4 4 have a solution. It is readily checked that the solution of this equation is T 3 y= , 1 2 and thus the minimum degree of the GCD of p(x) and q(x) is two. Since the degree of q(x) is two, it follows that the GCD of p(x) and q(x) is proportional to q(x). 
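The construction (3.18) requires only that the Bernstein coefficients be multiplied by their binomial factors to give T(p, q), and that the rows then be divided by the entries of D. A sketch follows (NumPy; the coefficients are those of Example 3.3):

    import numpy as np
    from math import comb

    def bernstein_sylvester(c, d):
        # S(p, q) = D^{-1} T(p, q) of (3.18); c and d are the Bernstein coefficients
        # of p(x) (degree m) and q(x) (degree n)
        m, n = len(c) - 1, len(d) - 1
        a = [c[i] * comb(m, i) for i in range(m + 1)]     # scaled Bernstein coefficients of p
        b = [d[i] * comb(n, i) for i in range(n + 1)]     # scaled Bernstein coefficients of q
        T = np.zeros((m + n, m + n))
        for j in range(n):
            T[j:j + m + 1, j] = a
        for j in range(m):
            T[j:j + n + 1, n + j] = b
        Dinv = np.diag([1.0 / comb(m + n - 1, i) for i in range(m + n)])
        return Dinv @ T

    # Example 3.3: p(x) has Bernstein coefficients [3, -5/6, -1/2, 1] and q(x) has
    # Bernstein coefficients [2, -3/2, 1]; the rank loss of S(p, q) is 2
    S = bernstein_sylvester([3, -5/6, -1/2, 1], [2, -3/2, 1])
    print(np.linalg.matrix_rank(S))                       # 3, that is, rank loss 2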
3.3 Summary This chapter has reviewed some properties of the Sylvester resultant matrix, and its subresultant matrices, for polynomials expressed in the power and Bernstein bases. It was shown that the Sylvester matrix for power basis polynomials has a strong diagonal pattern, which is not present in its Bernstein basis equivalent because it is premultiplied by a diagonal matrix. Despite this difference, all the properties of common divisors that are valid for the Sylvester matrix for power basis polynomials are valid for its Bernstein basis equivalent. For example, the rank loss of a resultant matrix is equal to the degree of the GCD of the polynomials, and the coefficients of the GCD can be obtained by reducing the resultant matrix to upper triangular form, for example, by an LU or QR decomposition, and considering the coefficients in the last non-zero row. The subresultant matrices of a Sylvester matrix are obtained by deleting some CHAPTER 3. THE SYLVESTER RESULTANT MATRIX 61 rows and columns of the Sylvester matrix. This yields a rectangular matrix, and some theorems about the degree of a common divisor of two polynomials and the dimensions of the subresultant matrices were established. The subresultant matrices of a Sylvester resultant matrix are important when it is required to compute an approximate GCD of two inexact polynomials. Chapter 4 Approximate greatest common divisors The application of the Sylvester matrix and its subresultant matrices to the calculation of the GCD of two polynomials was considered in Chapter 3. The methods required for these calculations are adequate if neither roundoff nor data errors are present, and all computations are performed in a symbolic environment. These conditions rarely prevail in practice because data is often inexact, and computations are performed in a floating point environment in which roundoff errors cannot be ignored. It is therefore assumed that the given inexact polynomials f (x) and g(x), which are defined in (3.1), are coprime, and a minor structured perturbation of the coefficients to ai + δai and bi + δbi , may cause the perturbed polynomials m n X X i ˜ f (x) = (ai + δai )x and g̃(x) = (bi + δbi )xi , i=0 (4.1) i=0 to have a non-constant GCD. The computed GCD is an approximate GCD of the given inexact polynomials f (x) and g(x), and moreover, it is not unique because 62 CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 63 different perturbations of the given inexact polynomials yield different approximate GCDs. The Sylvester matrix and its subresultant matrices can still be used if it is required to compute an approximate GCD of two polynomials, but modifications to the simple operations described in Chapter 3 are required, and they are considered in this chapter. Two methods for the computation of an approximate GCD are considered in this chapter. In particular, Section 4.3 describes a method that makes extensive use of the structured nature of the Sylvester resultant matrix for power basis polynomials [48], and it is extended in Section 4.4 to Bernstein basis polynomials [1, 47]. Section 4.6 describes a method that uses a partial singular value decomposition of the Sylvester subresultant matrices to obtain an initial estimate of the GCD, followed by a nonlinear refinement procedure in order to improve its accuracy [53]. 
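The sensitivity described above is easy to observe numerically: perturbing two polynomials that have a non-constant GCD in their eighth significant figure is enough to destroy the rank deficiency of their Sylvester matrix. The following sketch (NumPy; the polynomials and the noise model are illustrative only, and the Sylvester matrix is assembled as in Section 3.1) compares the smallest singular value before and after the perturbation:

    import numpy as np

    def sylvester(a, b):
        # S(f, g), assembled as in Section 3.1; a = [a_m, ..., a_0], b = [b_n, ..., b_0]
        m, n = len(a) - 1, len(b) - 1
        S = np.zeros((m + n, m + n))
        for j in range(n):
            S[j:j + m + 1, j] = a
        for j in range(m):
            S[j:j + n + 1, n + j] = b
        return S

    rng = np.random.default_rng(0)
    f = np.polymul([1.0, -1.0], [1.0, -2.0, 3.0])     # (x - 1)(x^2 - 2x + 3)
    g = np.polymul([1.0, -1.0], [1.0, 4.0])           # (x - 1)(x + 4); the GCD is x - 1

    exact = np.linalg.svd(sylvester(f, g), compute_uv=False)
    noisy = np.linalg.svd(sylvester(f + 1e-8 * rng.standard_normal(f.size),
                                    g + 1e-8 * rng.standard_normal(g.size)),
                          compute_uv=False)
    print(exact[-1], noisy[-1])   # roundoff level for the exact data, noise level afterwards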
The method of GCD computation that uses structured matrices takes advantage of the non-uniqueness of the Sylvester resultant matrix to introduce a parameter α that improves the computed results with respect to its default value α = 1. This non-uniqueness is described in Section 4.2, and the examples in Section 4.3.2 show that an incorrect value of α leads to poor results. Also, this method yields a low rank approximation of a Sylvester resultant matrix, which has applications for the calculation of the points of intersection of curves and surfaces. 4.1 Previous work The computation of an approximate GCD of the inexact polynomials (3.1) has been considered by several authors. For example, Corless et. al. [6], and Zarowski et. al. [52], use the QR decomposition of the Sylvester matrix S(f, g). Similarly, the CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 64 singular value decomposition of S(f, g) is used in [5], [8] and [40] in order to compute an approximate GCD, but both these decompositions do not preserve its structure. In particular, the smallest non-zero singular value of S(f, g) is a measure of its distance to singularity, but this is the distance to an arbitrary rank deficient matrix, and not the distance to the nearest rank deficient Sylvester matrix. Karmarkar and Lakshman [26] use optimisation techniques in order to compute the smallest perturbations that must be applied to the coefficients of two polynomials such that they have a nonconstant GCD, and Pan [34] uses Padé approximations to compute an approximate GCD. 4.2 The non-uniqueness of the Sylvester resultant matrix The Sylvester matrix S(f, g) has a very simple structure, and this makes it convenient for computations. This simple structure exhibits one property that has a significant effect on the quality of the computed approximate GCD. In particular, an approximate GCD of f (x) and g(x) is equal to, up to a scalar multiplier, an approximate GCD of f (x) and αg(x) where α is an arbitrary non-zero constant, and thus the resultant matrix S(f, αg) should be used when it is desired to compute an approximate GCD of f (x) and g(x). Since S(f, αg) 6= αS(f, g), the inclusion of α permits a family of approximate GCDs, rather than only one approximate GCD, to be computed. The restriction α = 1 yields unsatisfactory solutions, but it is shown that the inclusion of α allows significantly improved solutions to be obtained. The Sylvester matrix S(f, g) is defined in (3.2), but the discussion in the previous 65 CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS paragraph shows that it is more appropriate to consider the resultant matrix S(f, αg), where α is a real non-zero constant, a m am−1 am . . .. am−1 . . .. .. . am a1 . S(f, αg) = .. a0 . am−1 a1 .. .. . a0 . .. . a1 a0 αbn αbn−1 αbn .. . αbn−1 .. αb1 . αb0 αb1 αb0 .. . .. . αbn .. . αbn−1 .. .. . . .. . αb1 αb0 , for approximate GCD computations. The scalar α can be interpreted as the magnitude of g(x) relative to the magnitude f (x), but this interpretation is only valid provided that f (x) and g(x) are normalised in the same way. Since the coefficients of f (x) and g(x) may vary by several orders of magnitude, normalisation by the geometric means of their coefficients is convenient, and thus the polynomials f (x) and g(x) are redefined as 1 and f (x) := Q 1 m+1 ( m i=0 |ai |) 1 g(x) := Q 1 ( ni=0 |bi |) n+1 m X ai xi , (4.2) bi xi , (4.3) i=0 n X i=0 respectively, where ai , i = 0, . . . , m, and bi , i = 0, . . . 
, n, are the perturbed coefficients, and thus the Sylvester matrix S(f, αg) is constructed from these polynomials. If one or more of the coefficients of a polynomial is zero, then the normalisation by the CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 66 geometric mean of its coefficients, as shown in (4.2) and (4.3), requires modification. 4.3 Structured matrices and constrained minimisation This section describes the use of structured matrices and constrained optimisation for the computation of an approximate GCD of two inexact polynomials. This requires the calculation of the perturbations δai , i = 0, . . . , m, and δbi , i = 0, . . . , n, such that the perturbed polynomials f˜(x) and g̃(x), which are defined in (4.1), have a non-constant GCD. This problem reduces, therefore, to the calculation of a low rank ˜ g̃) of a full rank Sylvester matrix S(f, αg), where f (x) structured approximation S(f, and g(x) are the normalised polynomials (4.2) and (4.3) respectively. It is usual to require the smallest (structured) perturbations that perform this transformation, and thus the computation of an approximate GCD reduces to a constrained minimisation, where the function to be minimised (the objective function) is kδak2 + α2 kδbk2 , and ˜ the constraint is the requirement that f(x) and g̃(x) have a non-constant GCD. This condition is imposed by employing Theorem 3.5 for the perturbed polynomials f˜(x) and g̃(x) and determining the integers k for which (3.11) has a solution. The largest integer defines the degree of the approximate GCD of f˜(x) and g̃(x). The Sylvester matrix and subresultant matrices have very strong structures, and thus if Sk (f, αg) is a subresultant matrix, it is necessary to determine the smallest perturbations of f (x) and αg(x) such that Sk (f˜, g̃) has the same structure as Sk (f, αg). Since f (x) and g(x) are inexact and coprime, and their theoretically exact forms have a non-constant GCD, there exist perturbations δf (x) and αδg(x) such CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 67 that f (x) + δf (x) and α (g(x) + δg(x)) have a non-constant common divisor, that is, if hk ∈ Rm+n−k+1 and Ek ∈ R(m+n−k+1)×(m+n−2k+1) are structured perturbations of ck and Ak respectively, it follows from Theorem 3.5 that the equation (Ak + Ek ) y = ck + hk , (4.4) which is the perturbed form of (3.11), has an exact solution. It follows from Theorem 3.5 that, for a given value of k, (4.4) has a solution if and only if f˜(x) and g̃(x), where f˜(x) = f (x) + δf (x) and g̃(x) = α (g(x) + δg(x)) , have a common divisor of degree k. The computation of a structured low rank approximation of S(f, αg) therefore requires the determination of Ek and hk such that (4.4) possesses a solution for which Ak and Ek have the same structure, and ck and hk have the same structure. This is an overdetermined equation, and k is initially set equal to its maximum value, k = k0 = min (m, n). If a solution exists, then the degree of the GCD of f˜(x) and g̃(x) is equal to k0 . If this equation does not possess a solution, then k is reduced to k0 − 1, and if a solution exists for this value of k, then the degree of the GCD of f˜(x) and g̃(x) is equal to k0 − 1. If a solution does not exist, then k is reduced to k0 − 2, and this process is repeated until (4.4) possesses a solution. This result is used in the next section in order to compute a structured low rank approximation of S(f, αg). 
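The preprocessing of (4.2) and (4.3), and the construction of S(f, αg) from the normalised coefficients, can be sketched as follows (NumPy; excluding zero coefficients from the geometric mean is only one possible form of the modification mentioned above):

    import numpy as np

    def normalise(c):
        # normalisation by the geometric mean of the moduli of the coefficients, as in
        # (4.2) and (4.3); excluding zero coefficients is one possible modification
        c = np.asarray(c, dtype=float)
        nz = np.abs(c[c != 0.0])
        return c / np.exp(np.mean(np.log(nz)))

    def sylvester_alpha(f, g, alpha):
        # S(f, alpha*g): the last m columns hold the coefficients of alpha*g(x)
        a, b = normalise(f), alpha * normalise(g)
        m, n = len(a) - 1, len(b) - 1
        S = np.zeros((m + n, m + n))
        for j in range(n):
            S[j:j + m + 1, j] = a
        for j in range(m):
            S[j:j + n + 1, n + j] = b
        return S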
The perturbation matrix Ek and perturbation vector hk are structured, and thus ordinary least squares (LS) methods cannot be used for their computation because they do not preserve the structure of a matrix or vector. It is therefore necessary to use structure preserving matrix methods in order to guarantee that (4.4) has the same form as its unperturbed equivalent (3.11), and the method of structured total 68 CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS least norm (STLN) is therefore used [39]. If zi is the perturbation of the coefficient ai , i = 0, . . . , m, of f (x), and zm+1+i is the perturbation of the coefficient αbi , i = 0, . . . , n, of g(x), then the Sylvester matrix Bk = Bk (z) := S(δf, αδg) of the Bk := hk Ek zm zm−1 zm . .. z m−1 .. . z1 = z0 z1 z 0 perturbations is zm+n+1 .. . .. . zm+n .. . zm .. . zm−1 .. .. . . .. . z1 z0 zm+n+1 .. . .. . zm+n+1 zm+2 zm+n .. . zm+1 zm+2 .. . zm+1 .. . zm+n .. . .. . zm+2 zm+1 , (4.5) where hk = hk (z) is equal to the first column of Bk (z), and Ek = Ek (z) is equal to the last m + n − 2k + 1 columns of Bk (z). The matrix Bk (z) is a structured error matrix because Sk (f, αg) + Bk (z) is a subresultant matrix. It follows from the definitions of hk and z that there exists a matrix Pk ∈ R(m+n−k+1)×(m+n+2) such that 0m+1,n+1 Im+1 hk = Pk z = z, 0n−k,m+1 0n−k,n+1 where Im+1 is the identity matrix of order m + 1 and the subscripts on the zero matrices indicate their order. 69 CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS Example 4.1. If 0 b3 a3 b2 A2 = a2 b1 a1 b0 a0 0 m = n = 3 and k = 2, then 0 a3 1 0 a2 0 1 b3 a , P2 = 0 0 , c = b2 2 1 b1 a0 0 0 b0 0 0 0 T z = z3 z2 z1 z0 z7 z6 z5 z4 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 , 0 0 0 0 0 0 , and z 3 z2 h2 = z1 , z0 0 0 z3 E2 = z2 z1 z0 z7 z6 z5 z4 0 0 z7 z6 . z5 z4 The residual r(z, y) that is associated with an approximate solution of (4.4) due to the perturbations hk and Ek is given by r(z, y) = ck + hk − (Ak + Ek )y, hk = Pk z, Ek = Ek (z), (4.6) where the elements of z are zi , i = 0, . . . , m + n + 1, and it is required to minimise kzk subject to the constraint r(z, y) = 0, which is an equality constrained least squares (LSE) problem. It is necessary to replace the vector Ek y with a vector Yk z, that is, Yk z = Ek y, Yk = Yk (y), Ek = Ek (z), (4.7) and hence Yk δz = δEk y, (4.8) CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 70 where Yk ∈ R(m+n−k+1)×(m+n+2) , and thus the residual r(z, y) can be written as r(z, y) = ck + hk − Yk z − Ak y. Example 4.2. Consider the vectors and matrices in Example 4.8. It is readily checked that 0 0 0 0 y2 0 0 0 y1 0 0 0 y3 y2 0 0 , Y2 = 0 y 0 0 0 y y 0 1 3 2 0 0 y1 0 0 0 y3 y2 0 0 0 y1 0 0 0 y3 and that Y2 z = E2 y. The calculation of the perturbations to f (x) and αg(x) such that f˜(x) and g̃(x) have a non-constant common divisor requires the solution of r(z, y) = 0, which is a set of (m + n − k + 1) non-linear equations in z ∈ Rm+n+2 and y ∈ Rm+n−2k+1 . The calculation of a solution of these non-linear equations requires that they be linearised, and it is necessary to impose the constraint that f (x) and g(x) be perturbed by the minimum amount. 
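The structured matrices of (4.5) and Example 4.1 can be assembled directly from the vector z. A sketch follows (NumPy; z is stored with the perturbation of the leading coefficient of f(x) first, as in Example 4.1):

    import numpy as np

    def structured_perturbation(z, m, n, k):
        # B_k(z) of (4.5); the first m + 1 entries of z perturb the coefficients of f(x)
        # (leading coefficient first) and the last n + 1 entries perturb those of alpha*g(x).
        # Returns h_k (the first column) and E_k (the remaining columns).
        zf, zg = z[:m + 1], z[m + 1:]
        B = np.zeros((m + n - k + 1, m + n - 2 * k + 2))
        for j in range(n - k + 1):
            B[j:j + m + 1, j] = zf
        for j in range(m - k + 1):
            B[j:j + n + 1, n - k + 1 + j] = zg
        return B[:, 0], B[:, 1:]

    def P_matrix(m, n, k):
        # P_k of h_k = P_k z: it selects the perturbations of f(x) and pads with zeros
        P = np.zeros((m + n - k + 1, m + n + 2))
        P[:m + 1, :m + 1] = np.eye(m + 1)
        return P

    # Example 4.1: m = n = 3 and k = 2
    z = np.arange(1.0, 9.0)                   # stands in for [z3, z2, z1, z0, z7, z6, z5, z4]
    h2, E2 = structured_perturbation(z, 3, 3, 2)
    print(np.allclose(h2, P_matrix(3, 3, 2) @ z))     # True, since h_k = P_k z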
Iterative algorithms for the solution of r(z, y) = 0 require that it be linearised, and thus if it is assumed that second order terms are sufficiently small such that they can be neglected, then since Ek = Ek (z), r(z + δz, y + δy) = ck + (hk + δhk ) − Ak (y + δy) − (Ek + δEk )(y + δy) ≈ ck + (hk + δhk ) − Ak y − Ak δy − Ek y − Ek δy − δEk y = r(z, y) + Pk δz − Ak δy − Ek δy − Yk δz = r(z, y) − (Yk − Pk )δz − (Ak + Ek )δy, CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 71 using (4.8). The requirement that the perturbations be minimised is imposed by posing the problem in the form of a constrained minimisation, min kD(z + δz)k such that r(z + δz, y + δy) = 0, z+δz (4.9) where D ∈ R(m+n+2)×(m+n+2) is a diagonal matrix that accounts for the repetition of the elements of z in Bk (z). In particular, since each of the perturbations zi , i = 0, . . . , m, occurs (n−k +1) times, and each of the perturbations zi , i = m+1, . . . , m+ n + 1, occurs (m − k + 1) times, it follows that 0 D1 0 (n − k + 1)Im+1 D= = . 0 D2 0 (m − k + 1)In+1 The problem statement (4.9) is the LSE problem, and algorithms for its solution are considered in the next section. 4.3.1 Algorithms for the solution of the LSE problem The LSE problem (4.9) can be written in matrix form as min kEv − sk , Cv=t where (m+n−k+1)×(2m+2n−2k+3) (Yk − Pk ) (Ak + Ek ) ∈ R (m+n+2)×(2m+2n−2k+3) E = D 0 ∈R C = t = r(z, y) ∈ Rm+n−k+1 s = −Dz δz v = δy m+n+2 ∈ R 2m+2n−2k+3 , ∈R (4.10) CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 72 δz ∈ Rm+n+2 and δy ∈ Rm+n−2k+1 . For ease of notation, it is convenient to define m1 = m + n − k + 1, m2 = m + n + 2, m3 = 2m + 2n − 2k + 3, m4 = m3 − m1 = m + n − k + 2, where m1 < m2 , m3 , and thus C ∈ Rm1 ×m3 , E ∈ Rm2 ×m3 , t ∈ Rm1 and s ∈ Rm2 . It is assumed that (i) rank C = m1 , which guarantees that the constraint is consistent, and (ii) that N (E) \ N(C) = ∅ ⇔ E rank = m3 , C where N (X) denotes the null space of X, which guarantees that the LSE problem has a unique solution [16], page 396. There exist three principal methods for the solution of the LSE problem: The method of weights, the method of Lagrange multipliers and the QR decomposition. The method of weights The method of weights requires that the LSE problem (4.9) be written as an unconstrained LS problem, τC τ t min v − , s E τ ≫ 1, where v = v(τ ) ∈ Rm3 and τ is a weight whose large value guarantees that, in the limit, the equality constraint is satisfied. The normal equations associated with this minimisation are τ 2 C T C + E T E v = τ 2 C T t + E T s. If λ(τ ) ∈ Rm1 and r(τ ) ∈ Rm2 are defined as λ(τ ) := τ 2 (t − Cv(τ )) and r(τ ) := s − Ev(τ ), CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 73 respectively, then C T λ(τ ) + E T r(τ ) = 0, and these three equations can −2 τ Im1 0 CT be combined, 0 C λ(τ ) Im2 E r(τ ) T v(τ ) E 0 t = s . 0 (4.11) The method of weights is attractive because of its simplicity – a standard LS solver can be used – but a large value of τ may be necessary in order to obtain an acceptable solution. This may, however, cause numerical problems, as shown in Section 3 in [29], and care must therefore be exercised when the method is implemented computationally. The value of τ must be determined, and this issue is now considered. Van Loan 1 [29] recommends the value τ = µ− 2 , where µ is the machine precision, because it implies that kEv − sk2 + τ 2 kCv − tk2 = kEv − sk2 + 1 kCv − tk2 ≈ kCv − tk2 , µ from which it follows that the equality constraint is enforced exactly, to within the limits of machine precision. 
Barlow [2], and Barlow and Vemulapati [3], recommend 1 that τ = µ− 3 , which is a heuristic that is derived from experimental results. They note that the choice of τ is critical for the convergence of the algorithm because if τ is too small or too large, the algorithm may converge very slowly, or it may converge to inaccurate values, or it may not converge at all. 1 1 The values τ = µ− 2 and τ = µ− 3 are independent of the data E, C, s and t, and CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 74 it is therefore appropriate to multiply the constraint by a constant ν such that E s = ν C t . It is recommended that this scaled form of the equality constraint, that is, C1 v = t1 , C1 = νC and t1 = νt, be used when the LSE problem is solved by the method of weights. The method of weights is used for the solution of the LSE problem in [25], [27] and [55], but the disadvantages discussed above suggest that alternative methods for its solution be sought. The method of Lagrange multipliers The LSE problem can also be solved by the method of Lagrange multipliers. In particular, if λ ∈ Rm1 is a vector of Lagrange multipliers, then the LSE problem requires the minimisation of the function h(v, λ), h(v, λ) = 1 (Ev − s)T (Ev − s) − λT (Cv − t) , 2 and this leads to the equations E T Ev − E T s − C T λ = 0 Cv = t. The residual r of the objective function is equal to r = s − Ev, and thus E T r + C T λ = 0, CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS from which it follows that these equations can be written in matrix form, 0 C λ t 0 0 I r = s . E m2 CT ET 0 v 0 75 (4.12) It is seen that the solution of (4.11) approaches the solution of (4.12) as τ → ∞, which provides the equivalence between the method of weights and the method of Lagrange multipliers. The coefficient matrix in (4.12) is square and of order m1 + m2 + m3 , which is large for many problems of practical interest. Smaller matrices are required when the QR decomposition is used to solve the LSE problem. The QR decomposition The LSE problem can be solved directly by the QR decomposition [13], pages 585-586, and [16], pages 397-398. Let R1 C T = QR = Q , 0 where Q ∈ Rm3 ×m3 is an orthogonal matrix, R ∈ Rm3 ×m1 and R1 ∈ Rm1 ×m1 is a non-singular upper triangular matrix, be the QR decomposition of C T . If EQ = E1 E2 , where E1 ∈ Rm2 ×m1 and E2 ∈ Rm2 ×m4 , and w1 QT v = , w2 where w1 ∈ Rm1 and w2 ∈ Rm4 , the constraint Cv = t becomes R1T w1 = t. CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 76 Similarly, the objective function kEv − sk becomes kEv − sk = EQQT v − s w 1 = EQ − s w2 = kE1 w1 + E2 w2 − sk = kE2 w2 − (s − E1 w1 )k , and thus it is minimised when w2 = E2† (s − E1 w1 ) , from which it follows that the solution of the LSE problem is w1 v = Q = Q(:, 1 : m1 )w1 + Q(:, m1 + 1 : m3 )w2 . w2 This method for solving the LSE problem will be used because it does not possess the disadvantages of the method of weights, and the matrices are smaller than those required for the method of Lagrange multipliers. 4.3.2 Computational details The given inexact polynomials f (x) and g(x), which are defined in (4.2) and (4.3) respectively, are constructed by perturbing their theoretically exact forms fˆ(x) and ĝ(x), whose coefficients are âi , i = 0, . . . , m, and b̂i , i = 0, . . . , n, respectively. It therefore follows that if µ = 1/ε is the signal-to-noise ratio, then kδf (x)k = ε fˆ(x) and kδg(x)k = ε kĝ(x)k . 
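A compact sketch of the QR-based solution of the LSE problem described in Section 4.3.1 is given below (NumPy; the pseudo-inverse E2^+ is applied through a least squares solve, and the test data are arbitrary):

    import numpy as np

    def lse_qr(E, C, s, t):
        # minimise ||E v - s|| subject to C v = t by the QR decomposition of C^T
        m1 = C.shape[0]
        Q, R = np.linalg.qr(C.T, mode='complete')          # C^T = Q [R1; 0]
        R1 = R[:m1, :m1]
        w1 = np.linalg.solve(R1.T, t)                      # the constraint becomes R1^T w1 = t
        EQ = E @ Q
        E1, E2 = EQ[:, :m1], EQ[:, m1:]
        w2 = np.linalg.lstsq(E2, s - E1 @ w1, rcond=None)[0]   # w2 = E2^+ (s - E1 w1)
        return Q[:, :m1] @ w1 + Q[:, m1:] @ w2

    # an arbitrary consistent test problem
    rng = np.random.default_rng(1)
    E, C = rng.standard_normal((8, 6)), rng.standard_normal((3, 6))
    s, t = rng.standard_normal(8), rng.standard_normal(3)
    v = lse_qr(E, C, s, t)
    print(np.max(np.abs(C @ v - t)))                       # machine precision: Cv = t holds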
If cf ∈ Rm+1 and cg ∈ Rn+1 are vectors of random variables, all of which are uniformly distributed in the interval [−1, +1], then the perturbations δf (x) and δg(x) are given CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 77 by δf (x) = ε ˆ f (x) cf kcf k and δg(x) = ε kĝ(x)k cg , kcg k and thus the inexact polynomials f (x) and g(x) are ˆ m f (x) cf X ˆ f (x) = f (x) + ε = (âi + δai ) xi , kcf k i=0 and n kĝ(x)k cg X = b̂i + δbi xi , g(x) = ĝ(x) + ε kcg k i=0 respectively. It therefore follows that the forms of the inexact polynomials f (x) and g(x) that form the Sylvester matrix S(f, αg) are m X 1 f (x) = Q (âi + δai ) xi , 1 m ( i=0 |âi + δai |) m+1 i=0 and m X 1 i g(x) = b̂ + δb i i x . 1 Qn n+1 i=0 i=0 b̂i + δbi (4.13) (4.14) It was shown in Section 4.3 that the computation of an approximate GCD can be posed as an LSE problem, and algorithms for its solution were considered in Section 4.3.1. Algorithm 4.1 is a simple implementation of the QR decomposition for the solution of the LSE problem (4.9). Since the objective function r(z + δz, y + δy) is obtained by linearisation, the QR decomposition is applied to the linearised form, and a simple iterative procedure is used to obtain a solution of the non-linear equation. An initial estimate of the solution is required, and this is obtained by setting z = 0, that is, a simple LS problem is solved. The corresponding value of y is obtained by setting r(z, y) = hk = 0 and Ek = 0 in (4.6), and thus its initial value is given by the 78 CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS solution of the minimisation y = arg min kAk t − ck k . (4.15) t Termination of the algorithm occurs when the relative error between successive iterates is less than a specified tolerance. The perturbed polynomials that have a non-constant GCD are given by m X ˜ f(x) = (ai + zi ) xi and g̃(x) = i=0 n X i=0 zm+1+i i bi + x, α because zm+1+i , i = 0, . . . , n, are the structured perturbations of αg(x). Algorithm 4.1: STLN for the computation of an approximate GCD Input The polynomials f (x) and g(x), the scalar α, a value for k, where 1 ≤ k ≤ min (m, n), and the tolerances ǫy and ǫz . Output Polynomials f˜(x) = f (x) + δf (x) and g̃(x) = g(x) + δg(x) such that the ˜ degree of the GCD of f(x) and g̃(x) is greater than or equal to k. Begin 1. Form the k’th Sylvester matrix Sk (f, αg) from f (x), g(x) and α. 2. Set Ek = 0 and hk = 0, and compute the initial value of y from (4.15). Construct the residual r(z, y) = ck − Ak y, the matrix Yk from y, and the matrix Pk . 3. Form the matrices E and C, and the vectors t and s, as shown in (4.10). 4. Repeat CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS (a) Compute the QR decomposition of C T , 79 R1 C T = QR = Q . 0 (b) Set w1 = R1−T t. (c) Partition EQ as EQ = E1 E2 , where E1 ∈ R(m+n+2)×(m+n−k+1) and E2 ∈ R(m+n+2)×(m+n−k+2) . (d) Compute w2 = E2† (s − E1 w1 ) . (e) Compute the solution δz w1 = Q . δy w2 (f) Set y := y + δy and z := z + δz. (g) Update Ek and hk from z, and Yk from y. Compute the residual r(z, y) = (ck + hk ) − (Ak + Ek )y. Until kδyk kyk ≤ ǫy AND kδzk kzk ≤ ǫz . End Algorithm 4.1 can be improved by distinguishing between valid and invalid approximate GCDs. The method of STLN allows the vector z of perturbations of the coefficients of f (x) and αg(x) that solves the LSE problem to be calculated, but the CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 80 maximum permissible value of kzk is related to the signal-to-noise ratio µ of the coefficients of f (x) and g(x). 
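The noisy test polynomials of (4.13) and (4.14) might be generated as follows (a sketch; the exact polynomials, the random seed and the value of µ are placeholders):

    import numpy as np

    rng = np.random.default_rng(2)
    mu = 1e8                                     # signal-to-noise ratio, mu = 1/epsilon

    def add_noise(exact):
        # the noise model of (4.13) and (4.14): a uniformly distributed direction,
        # scaled so that the relative error of the coefficient vector is 1/mu
        c = rng.uniform(-1.0, 1.0, size=len(exact))
        return exact + (np.linalg.norm(exact) / mu) * c / np.linalg.norm(c)

    def normalise(c):
        # geometric mean normalisation of (4.2) and (4.3); assumes non-zero coefficients
        return c / np.exp(np.mean(np.log(np.abs(c))))

    f_hat = np.poly([0.25, 0.25, 0.5, 0.5, 0.5])          # placeholder exact polynomials
    g_hat = np.poly([0.25, 0.5, 0.5, -1.0])               # with a non-constant GCD
    f, g = normalise(add_noise(f_hat)), normalise(add_noise(g_hat))

The initial value of y in Step 2 of Algorithm 4.1 is then the least squares solution of (4.15), obtained, for example, with np.linalg.lstsq applied to Ak and ck.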
In particular, the smaller the value of µ, the larger the maximum permissible value of ‖z‖. This consideration leads to the definition of the legitimate solution space.

Definition 4.1 (Legitimate solution space). The legitimate solution space of f̂(x) is the region that contains all perturbations of its coefficients that are allowed by the signal-to-noise ratio µ. The maximum allowable magnitude of these perturbations is ρ, where

\rho = \frac{\| \hat{f}(x) \|}{\mu},

and all perturbations that are smaller than this bound lie in the legitimate solution space.

The errors consist of the data errors f(x) - f̂(x) and the structured perturbations from the method of STLN, and thus the perturbations must satisfy

\| f(x) - \hat{f}(x) \| + \| z_f \| \le \frac{\| \hat{f}(x) \|}{\mu},    (4.16)

where z_f ∈ R^{m+1} denotes the structured perturbations of f(x). This equation requires modification because f̂(x) is not known, and thus if it is assumed that ‖f̂(x)‖ ≈ ‖f(x)‖, then (4.16) can be approximated by

\| z_f \| \le \frac{\| f(x) \|}{\mu}.    (4.17)

This definition of the legitimate solution space is expressed in terms of f(x), and it is clear that (4.17) is also satisfied by g(x), but with a slight modification. Specifically, since z_{m+i+1}, i = 0, ..., n, are the perturbations of the coefficients αb_i, it follows that

\frac{\| z_g \|}{\alpha} \le \frac{\| g(x) \|}{\mu},    (4.18)

where z_g ∈ R^{n+1} stores the structured perturbations of the polynomial αg(x). Acceptable structured perturbations require that the conditions (4.17) and (4.18) be satisfied.

Algorithm 4.2 is an extension of Algorithm 4.1 that performs a sequence of tests in order to eliminate values of α, and therefore polynomials f̃(x) and g̃(x), from Algorithm 4.1 that do not satisfy error criteria with regard to the legitimate solution space, the magnitude of the normalised residual, and the rank of the structured low rank approximation. In particular, Algorithm 4.2 is executed for a range of values of α, and the results are stored. Each value of α yields a different pair of polynomials f̃(x) and g̃(x), and Step 2 of Algorithm 4.2 is used to eliminate the values of α for which the magnitude of the structured perturbations is greater than the error in the polynomials, that is, polynomials that lie outside the legitimate solution space are discarded. Values of α for which the normalised residual ‖r_norm‖ is too large are eliminated in Step 3 of Algorithm 4.2, which is therefore performed on a reduced set of solutions. Step 4 of Algorithm 4.2 calculates, for each of the remaining values of α, the singular values of the Sylvester matrix S(f̃, g̃) in order to determine its numerical rank. The value of α for which this quantity is most clearly defined is the optimal value α_0 of α, and a low rank approximation of S(f, αg) is constructed from the polynomials f̃_0(x) and g̃_0(x), which are the polynomials that are associated with α_0. An approximate GCD of f(x) and g(x) can be calculated by performing an LU or QR decomposition on S(f̃_0, g̃_0).
For each value of α, store the values of kzf k , kzg k and rnorm , rnorm = r(z, y) , kck + hk k where r(z, y) is calculated in Step 3g of Algorithm 4.1 and rnorm is the normalised form of r. 2. Retain the values of α for the values of kzf k and kzg k that satisfy (4.17) and (4.18), respectively. 3. Retain the values of α for which the normalised residual krnorm k satisfies the error criterion krnorm k ≤ 10−13 . 4. For each acceptable value of α, compute the singular values σi of S(f˜, g̃), where f˜(x) and g̃(x) are the polynomials that are computed by Algorithm 4.1 and are CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 83 normalised by the geometric mean of their coefficients, as shown in (4.2) and (4.3) for f (x) and g(x), respectively. Arrange the singular values σi in nonincreasing order, and choose the value α0 of α for which the numerical rank of ˜ g̃) is equal to (m + n − k), that is, the ratio S(f, σm+n−k σm+n−(k−1) , (4.19) is a maximum. The polynomials that correspond to the value α0 are f˜0 (x) and g̃0 (x). End Examples 4.3 and 4.4 implement Algorithm 4.2 [48]. The polynomials in these examples have several multiple roots of high degree, and they therefore provide a good test for the algorithm. An approximate GCD that lies within the legitimate solution space is obtained in both examples, and a simple test is included to show ˜ that the computed polynomials f(x) and g̃(x) are not coprime. Example 4.3. Consider the exact polynomials fˆ1 (x) = (x − 0.25)8(x − 0.5)9 (x − 0.75)10 (x − 1)11 (x − 1.25)12 , (4.20) ĝ1 (x) = (x + 0.25)4(x − 0.25)5(x − 0.5)6 , (4.21) and which have 11 common roots, from which it follows that the rank of S(fˆ1 , ĝ1 ) is equal to 54. The termination constants ǫy and ǫz , which are defined in Algorithm 4.1, were set equal to 10−6 and 10−8 , respectively. CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS −3.4 84 0 (a) −3.6 (a) −5 −3.8 −10 −4 (b) (b) −15 −4.2 −4.4 −5 0 −20 −5 5 log α 10 (i) (ii) 10 log −8 α=0 8 log10 σ54/σ55 10 || norm 10 log ||r 5 10 −6 −10 −12 −14 −16 −5 0 log α 6 log10 α = 0 4 2 0 log10α 5 (iii) 0 −5 0 log10α 5 (iv) Figure 4.1: (i)(a) The maximum allowable value of kzf1 k, which is equal to kf1 (x)k /µ, (b) the computed value of kzf1 k; (ii)(a) the maximum allowable value of kzg1 k/α, which is equal to kg1 (x)k /µ, (b) the computed value of kzg1 k/α; (iii) the normalised residual krnorm k; (iv) the singular value ratio σ54 /σ55 . Case 1: The computation of a family of approximate GCDs from a given structured low rank approximation of S(f1 , αg1). The exact polynomials (4.20) and (4.21) were perturbed by noise such that µ = 108 and then normalised by the geometric mean of their coefficients, as shown in (4.13) and (4.14), thereby yielding the polynomials f1 (x) and g1 (x). Figure 4.1 shows the results of applying the criteria in Steps 2, 3 and 4 in Algorithm 4.2. In particular, Figure 4.1(i) shows the ratio kf1 (x)k /µ, which is the maximum allowable perturbation of f1 (x), and the variation with α of the computed value of kzf1 k, which is calculated CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 85 α = 10−0.6 0 −5 −10 −15 −20 −25 −30 0 10 20 30 40 50 60 70 i Figure 4.2: The normalised singular values of the Sylvester matrix, on a logarithmic scale, for (i) the theoretically exact data S(fˆ1 , ĝ1 ), ♦; (ii) the given inexact data S(f1 , g1), ; (iii) the computed data S(f˜1,0 , g̃1,0 ), ×, for α = 10−0.6 . All the polynomials are normalised by the geometric mean of their coefficients. by the method of STLN. 
Figure 4.1(ii) is the same as Figure 4.1(i), but for g1 (x), and it is seen from (4.17) and (4.18) that valid solutions are obtained for log10 α > −0.9. Figure 4.1(iii) shows the variation of the normalised residual krnorm k with α, and it is seen that it ranges from O(10−16 ) to O(10−8) in the specified range of α. Figure 4.1(iv) shows the variation with α of the ratio σ54 /σ55 that is defined in (4.19), and it is seen that the profile of this curve could be produced (approximately) by calculating the reciprocal (to within a scale factor) of the normalised residual shown in Figure 4.1(iii). This result, which has been observed frequently, suggests that small values of the normalised residual are associated with large values of the CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 86 ratio (4.19). This example clearly shows the importance of including α in the analysis because there exist, in general, many values of α for which the normalised residual is sufficiently small and the ratio σ54 /σ55 is sufficiently large. The small value of the normalised residual implies that the perturbed equation (4.6) is satisfied to high accuracy, and the large value of σ54 /σ55 implies that the numerical rank of the structured low rank approximation S(f˜1 , g˜1) is well defined. Each of these values of α yields a different structured low rank approximation of S(f1 , αg1), and therefore a different approximate GCD of f1 (x) and g1 (x). It is shown in Figure 4.1(iv) that in the absence of scaling, that is, log10 α = 0, a poor solution is obtained because the ratio of the singular values (4.19) is approximately equal to 101.5 , which is about 7 orders of magnitude smaller than the ratio obtained for log10 α = −0.6, which is the optimal value of α. Figure 4.1(iii) shows that if log10 α = 0, the normalised residual is about 6 orders of magnitude larger than the value obtained for log10 α = −0.6. These observations show that an arbitrary choice of α can yield severely suboptimal results when it required to compute an approximate GCD of f (x) and g(x) from S(f, αg). Figure 4.2 shows the normalised singular values of the Sylvester resultant matrices S(fˆ1 , ĝ1 ), S(f1 , g1), and S(f˜1,0 , g̃1,0 ) for the optimal value of α, where all the polynomials are normalised by the geometric mean of their coefficients. The polynomials f˜1,0 (x) and g̃1,0 (x) are the polynomials computed in Algorithm 4.2 that form the structured low rank approximation of S(f1 , αg1 ), α = 10−0.6 . It is seen that the computed singular values of S(fˆ1 , ĝ1 ) do not show a sharp cut off, which would suggest that the polynomials (4.20) and (4.21) are coprime. The profile of the singular CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 87 values of S(f1 , g1 ) shows that the noise affects the small singular values severely, but significantly improved results are obtained when the Sylvester matrix S(f˜1,0 , g̃1,0) is considered. In particular, it is clear that the numerical rank of this matrix is equal to 54 because σ54 is about 7 orders of magnitude larger than σ55 . Since the Sylvester matrix is of order 65 × 65 and k = 11, the method of STLN has yielded an excellent result. Convergence of the algorithm was achieved in 45 iterations. It is clear that S(f˜1,0 , g̃1,0 ) can be used to compute an approximate GCD of f1 (x) and g1 (x). 
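The numerical rank test of Step 4, illustrated in Figures 4.1(iv) and 4.2, amounts to computing the normalised singular values of S(f̃, g̃) and measuring the gap (4.19). A sketch follows (NumPy; S is any Sylvester matrix assembled as in the earlier sketches, and k is an input):

    import numpy as np

    def rank_gap(S, k, m, n):
        # the ratio (4.19) of the normalised singular values of S; a large value
        # indicates that the numerical rank of S is m + n - k
        sigma = np.linalg.svd(S, compute_uv=False)
        sigma = sigma / sigma[0]                          # normalised as in Figure 4.2
        return sigma[m + n - k - 1] / sigma[m + n - k]    # sigma_{m+n-k} / sigma_{m+n-k+1}

In Step 4 of Algorithm 4.2 this ratio is evaluated for each retained value of α, and α_0 is the value for which it is largest.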
This example has considered the situation in which the correct subresultant has been selected because the degree of the GCD of fˆ1 (x) and ĝ1 (x) is 11, which is the chosen value of k, but this information is not, in general, known a priori. It is therefore necessary to consider how the solution changes as a function of k, and this is investigated in Case 2. Case 2: The effects of different subresultants. Computational experiments showed that the method of STLN is able to compute structured low rank approximations for k = 10, . . . , 1. Figure 4.3 shows the results for k = 8, and it is seen that the numerical rank of S(fˆ1 , ĝ1 ) is not defined, but the numerical rank of its structured low rank approximation S(f˜1,0 , g̃1,0) is equal to 57, corresponding to a loss in rank of 8. Convergence was achieved in 26 iterations. Consider now the situation that occurs for k = 12, 13 and 14. In particular, successful results were obtained for k = 12 and k = 13, but the computed solution for k ≥ 14 was not acceptable. This can be seen for k = 14 in Figures 4.4(i) and (ii), which show that although valid solutions exist for either f1 (x) or g1 (x), they do not exist for both f1 (x) and g1 (x). It is noted that if it is not required that the solution lie in the legitimate solution space, it is possible to construct structured low rank CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 88 α = 101.4 0 −5 −10 −15 −20 −25 −30 0 10 20 30 40 50 60 70 i Figure 4.3: The normalised singular values of the Sylvester matrix, on a logarithmic scale, for (i) the theoretically exact data S(fˆ1 , ĝ1 ), ♦; (ii) the given inexact data S(f1 , g1 ), ; (iii) the computed data S(f˜1,0 , g̃1,0 ), ×, for α = 101.4 . All the polynomials are normalised by the geometric mean of their coefficients. approximations matrices that can be used for the computation of approximate GCDs of f1 (x) and g1 (x), such that the ratio (4.19) is large and the normalised residual is small. Example 4.4. Consider the polynomials fˆ2 (x) = (x − 1)8 (x − 2)16 (x − 3)24 , and ĝ2 (x) = (x − 1)12 (x + 2)4 (x − 3)8 (x + 4)2 , which have 16 common roots, and thus the rank of S(fˆ2 , ĝ2 ) is 58. The polynomials CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS −2 89 0 (b) −2 −2.5 −4 −3 (a) −6 −8 (a) −3.5 (b) −10 −4 −12 −4.5 −5 0 log α 5 −14 −5 10 (i) (ii) 11 10.8 norm || log10 σ51/σ52 −14.5 10 log ||r 5 10 −14 −15 −15.5 −16 −16.5 −5 0 log α 10.6 10.4 10.2 10 0 log10α (iii) 5 9.8 −5 0 log10α 5 (iv) Figure 4.4: (i)(a) The maximum allowable value of kzf1 k, which is equal to kf1 (x)k /µ, (b) the computed value of kzf1 k; (ii)(a) the maximum allowable value of kzg1 k/α, which is equal to kg1 (x)k /µ, (b) the computed value of kzg1 k/α; (iii) the normalised residual krnorm k; (iv) the singular value ratio σ51 /σ52 . were perturbed by noise such that µ = 108 , and the result for k = 16 is shown in Figure 4.5. It is seen that although the numerical rank of S(fˆ2 , ĝ2 ) is not well defined, the rank of the structured low rank approximation S(f˜2,0 , g̃2,0 ) is 58, which is the correct value. Convergence was achieved in 22 iterations. CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 90 α = 100.1 0 −5 −10 −15 −20 −25 0 10 20 30 40 50 60 70 80 i Figure 4.5: The normalised singular values of the Sylvester matrix, on a logarithmic scale, for (i) the theoretically exact data S(fˆ2 , ĝ2 ), ♦; (ii) the given inexact data S(f2 , g2 ), ; (iii) the computed data S(f˜2,0 , g̃2,0 ), ×, for α = 100.1 . All the polynomials are normalised by the geometric mean of their coefficients. 
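The test polynomials in Examples 4.3 and 4.4 are built from their factored forms and then perturbed. A small sketch of this set-up is given below for Example 4.4. The componentwise noise model shown, in which each coefficient is perturbed by a relative amount of at most 1/µ, is one plausible reading of the perturbation used in these examples; the precise model and the normalisation are defined by (4.13) and (4.14).

```python
import numpy as np

def poly_from_roots(root_mult_pairs):
    # Coefficients (descending powers) of the monic product prod (x - r)^mult.
    p = np.array([1.0])
    for r, mult in root_mult_pairs:
        for _ in range(mult):
            p = np.polymul(p, [1.0, -r])
    return p

def gm_normalise(p):
    # Normalise by the geometric mean of the nonzero |coefficients|,
    # cf. the normalisation (4.13) and (4.14).
    return p / np.exp(np.mean(np.log(np.abs(p[p != 0]))))

rng = np.random.default_rng(0)
mu = 1e8

# Exact polynomials of Example 4.4; their GCD (x-1)^8 (x-3)^8 has degree 16.
f_hat = poly_from_roots([(1.0, 8), (2.0, 16), (3.0, 24)])
g_hat = poly_from_roots([(1.0, 12), (-2.0, 4), (3.0, 8), (-4.0, 2)])

# Componentwise perturbation with signal-to-noise ratio mu (an assumption),
# followed by the geometric mean normalisation.
f = gm_normalise(f_hat * (1.0 + rng.uniform(-1.0, 1.0, f_hat.shape) / mu))
g = gm_normalise(g_hat * (1.0 + rng.uniform(-1.0, 1.0, g_hat.shape) / mu))
```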
4.4 Approximate GCDs of Bernstein basis polynomials It was shown in Section 4.3 that the computation of an approximate GCD of two polynomials in the power (monomial) basis can be expressed as an LSE problem, and algorithms for its solution were discussed in Section 4.3.1. In this section, the computation of an approximate GCD of two Bernstein basis polynomials is considered, and it is shown that only minor changes are required to the theory and computational implementation for the power basis. CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 91 The calculation of a structured low rank approximation of the Sylvester matrix for Bernstein basis polynomials requires that the polynomials be transformed to the scaled Bernstein basis, the basis functions of which are stated in (3.20). Comparison of these basis functions with the basis functions (3.16) that define the Bernstein basis show that a polynomial expressed in the Bernstein basis can be transformed to the scaled Bernstein basis form by moving the combinatorial factor mi from the basis function to the coefficient, X m m X m m m−i i ci (1 − x) x = ci (1 − x)m−i xi . i i i=0 i=0 The expression on the left is a Bernstein basis polynomial, and the expression on the right is the scaled Bernstein basis form of this polynomial. The matrix T (p, q) in (3.19) is the Sylvester resultant matrix of the polynomials p(x) and q(x), which are defined in (3.17), when they are expressed in the scaled Bernstein basis. This must be compared with S(p, q), which is the Sylvester resultant of p(x) and q(x) when they are expressed in the Bernstein basis, and these two resultant matrices are related by a diagonal matrix, as shown in (3.18). The resultant matrices T (p, q) and S(f, g) have exactly the same structure, and thus the formulation of the LSE problem for the calculation of an approximate GCD of scaled Bernstein basis polynomials is very similar to its formulation for power basis polynomials. The subresultant matrix Sk (p, q) of the Bernstein basis resultant matrix S(p, q) can be decomposed as Sk (p, q) = Dk−1 Tk (p, q), where Tk (p, q) ∈ R(m+n−k+1)×(m+n−2k+2) is the k’th subresultant matrix of T (p, q), that is, Tk (p, q) is formed from T (p, q) by deleting the last (k − 1) columns of p(x), the last (k − 1) columns of q(x), and the last (k − 1) rows. Similarly, the diagonal CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 92 matrix Dk ∈ R(m+n−k+1)×(m+n−k+1) is obtained by deleting the last (k − 1) rows and the last (k − 1) columns of D. The matrix Tk (p, q) is written as Tk (p, q) = ek Fk = ek coeffs. of p(x) coeffs. of q(x) , where ek ∈ R(m+n−k+1) and Fk ∈ R(m+n−k+1)×(m+n−2k+1) , and thus −1 −1 −1 Sk (p, q) = Dk ek Fk = Dk ek Dk Fk , which is the same as (3.10), but for a Bernstein basis polynomial. Example 4.5. The Sylvester resultant matrix of the polynomials 4 4 4 4 3 p(x) = 3 (1 − x) − 2 (1 − x) x − 5 (1 − x)2 x2 0 1 2 4 4 4 +2 (1 − x)x3 + 6 x, 3 4 3 5 3 1 3 3 3 3 2 2 q(x) = 3 (1 − x) − (1 − x) x − (1 − x)x + x, 0 6 1 2 2 3 is 3 0 0 3 0 0 0 1 5 1 −4 0 − 0 0 3 2 12 2 8 1 1 1 1 − − 0 −2 − 15 5 10 6 5 1 3 1 3 S(p, q) = S1 (p, q) = 25 − 32 − 52 − 40 − 8 20 , 20 8 1 1 1 2 −2 0 − − 5 15 15 10 6 4 1 1 1 0 0 − 0 3 6 4 0 0 6 0 0 0 1 (4.22) CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 93 and this matrix can be decomposed as the product D1−1 T1 (p, q), 1 0 0 0 0 0 0 3 0 0 3 0 0 0 5 0 1 0 0 0 0 0 −8 3 0 0 3 0 −2 6 1 3 5 0 0 −30 −8 3 − 0 0 0 0 − 3 0 15 2 2 1 3 5 0 0 0 20 0 0 0 8 −30 −8 1 −2 −2 3 . 
3 5 0 0 0 0 1 0 0 6 8 −30 0 1 −2 −2 15 0 0 0 0 0 1 0 3 0 6 8 0 0 1 − 6 2 0 0 0 0 0 0 1 0 0 6 0 0 0 1 The vector e1 is equal to the first column of T1 (p, q), and the matrix F1 is formed from columns 2, . . . , 7, of T1 (p, q). Consider now the situation k = 2, for which the first column of S2 (p, q) is 3 3 1 0 0 0 0 0 − 4 0 1 0 0 0 0 −8 3 6 1 0 0 0 −30 −2 0 0 15 = D −1 e2 , 2 2 = 0 0 0 1 0 0 8 20 5 2 0 0 0 0 1 0 6 5 15 0 0 0 0 0 0 61 0 and the matrix formed by columns 0 3 0 0 1 5 1 0 1 − 0 2 12 2 8 1 1 − 61 0 − 15 − 10 5 = 3 1 3 1 − − 40 − 8 2 20 0 8 1 1 0 − 10 15 0 15 1 1 0 0 0 6 2, . . . , 5, of S2 (p, q) can be decomposed as D2−1 F2 , 0 0 0 0 0 0 3 0 0 1 5 0 0 0 0 3 0 3 − 6 2 1 0 15 0 0 0 −8 − 32 − 52 3 . 1 3 5 0 0 20 0 0 1 −2 −2 −30 1 3 0 0 0 15 0 8 0 1 − 2 0 0 0 0 16 6 0 0 1 CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS The first column of S3 (p, q) is 3 1 4 − 0 3 −2 = 0 2 5 0 2 0 5 0 0 0 0 3 94 −8 0 0 0 −1 1 0 15 0 0 −30 = D3 e3 , 1 0 0 20 0 8 1 0 0 0 15 6 1 6 and the matrix formed by the second and third columns of S3 (p, q) is 1 0 0 0 0 3 0 3 0 5 5 1 1 − 0 0 0 0 3 2 6 12 −2 − 1 −1 = 0 0 1 − 3 − 5 = D3−1 F3 . 0 0 2 10 6 15 2 1 3 1 20 − 40 0 0 0 20 0 1 − 32 1 1 0 0 0 0 0 15 0 1 15 Since (4.22) is the scaled Bernstein basis equivalent of (3.10), it follows from Theorem 3.5 that the scaled Bernstein basis polynomials p(x) and q(x) have a common divisor of degree k if and only if Fk y = ek possesses a solution. Since the vector ek 95 CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS and matrix Fk form the matrix Tk (p, q), Tk (p, q) = ek Fk m d0 n0 c0 0 .. c1 m . d1 n1 1 .. .. ... c0 m0 . . .. n m = cm−1 m−1 . c1 m1 dn−1 n−1 .. .. cm m . . dn nn m ... m cm−1 m−1 cm m m .. . .. . .. . .. . .. . d0 n0 n , d1 1 .. . n dn−1 n−1 n dn n and this matrix has exactly the same pattern as the power basis Sylvester matrix S(f, g), it follows that the theory in Sections 4.3 and 4.3.1 can be reproduced for scaled Bernstein basis polynomials. In particular, T (p, q) and its subresultant matrices contain the indeterminacy that is described in Section 4.2, and thus it is more appropriate to denote the Sylvester matrix of the scaled Bernstein basis polynomials p(x) and q(x) as T (p, αq). Furthermore, the error matrix Bk (z), which is defined in (4.5), is also the structured error matrix for Tk (p, αq), and thus the computation of an approximate GCD of two scaled Bernstein basis polynomials reduces to an LSE problem. The entries of the vector z are the perturbations of p(x) and αq(x) when they are expressed in the scaled Bernstein basis. z = zp Since z is partitioned as αz , q where zp ∈ Rm+1 and zq ∈ Rn+1 are the perturbation vectors of p(x) and q(x) CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 96 respectively, it follows that the corrected scaled Bernstein polynomials are m X m p̃(x) = ci + zi (1 − x)m−i xi , i i=0 and n X n zm+1+i q̃(x) = di + (1 − x)n−i xi . i α i=0 The Bernstein form of these polynomials is ! m X zi m p̃(x) = ci + m (1 − x)m−i xi , i i i=0 and q̃(x) = n X i=0 zm+1+i di + α ni ! n (1 − x)n−i xi , i respectively, and thus the perturbations of the coefficients of the Bernstein form of p(x) and q(x) are zi m , i respectively. i = 0, . . . , m, and zm+1+i , α ni i = 0, . . . , n, Example 4.6. Consider the Bernstein form of the exact polynomials [47], p̂(x) = (x − 0.6)8 (x − 0.8)9 (x − 0.9)10 (x − 0.95)5 , and q̂(x) = (x − 0.6)12 (x − 0.7)4 (x − 0.9)5 , whose GCD is of degree 13, and thus the rank of S(p̂, q̂) is equal 40. 
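Before the discussion of Example 4.6 continues, a short sketch of the basis conversion described above is given. The conversion simply moves the combinatorial factor C(m, i) between the basis function and the coefficient; the coefficient vector used in the round-trip check is hypothetical.

```python
import numpy as np
from math import comb

def bernstein_to_scaled(c):
    # Multiply the i-th Bernstein coefficient by C(m, i) to obtain the
    # coefficient of (1 - x)^(m - i) x^i in the scaled Bernstein basis.
    m = len(c) - 1
    return np.array([c[i] * comb(m, i) for i in range(m + 1)])

def scaled_to_bernstein(b):
    # Inverse map: divide the i-th scaled Bernstein coefficient by C(m, i).
    m = len(b) - 1
    return np.array([b[i] / comb(m, i) for i in range(m + 1)])

c = np.array([3.0, -2.0, -5.0, 2.0, 6.0])   # hypothetical Bernstein coefficients
assert np.allclose(scaled_to_bernstein(bernstein_to_scaled(c)), c)
```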
The 13’th subresultant matrix, corresponding to the value k = 13, was selected, and µ was set equal to 108 . Figure 4.6(i) shows the ratio kp(x)k /µ, which is the maximum allowable magnitude of the perturbations of the coefficients of p(x), and the variation with α of CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS −1 97 2 (a) −2 0 −3 −2 −4 (a) −4 (b) −5 −6 −6 −8 (b) −7 −5 0 log α 5 −10 −5 10 (i) (ii) 12 −6 10 log10 σ40/σ41 log10||rnorm|| 5 10 −4 −8 −10 −12 −14 8 6 4 2 −16 −18 −5 0 log α 0 log10α (iii) 5 0 −5 0 log10α 5 (iv) Figure 4.6: The variation with α of (i)(a) The maximum allowable value of kzp k, which is equal to kp(x)k /µ, (b) the computed value of kzp k; (ii)(a) the maximum allowable value of kzq k/α, which is equal to kq(x)k /µ, (b) the computed value of kzq k/α; (iii) the normalised residual krnorm k; (iv) the singular value ratio σ40 /σ41 . The horizontal and vertical axes are logarithmic in the four plots. the computed value of kzp k. Figure 4.6(ii) is the same as Figure 4.6(i), but for q(x) instead of p(x). It is seen that the four plots in this figure are identical in form to their equivalents in Figure 4.1 for the power basis polynomials considered in Example 4.3. The Bernstein basis equivalent of the inequality (4.17) for p(x) is satisfied for all values of α in the specified range, but the corresponding inequality (4.18) for q(x) is only satisfied for log10 α > 1.72. This is the minimum value of α for which CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 98 the bounds on the structured perturbations of the coefficients of p(x) and q(x) are satisfied. Figure 4.6(iii) shows the variation with α of the normalised residual krnorm k, where rnorm = r(z, y) , kek + hk k and hk is defined in (4.5) and r(z, y) is defined in (4.6). It is seen that this variation is significant, and in particular, the graph shows that there exist values of α for which the normalised residual is large. It therefore follows that there does not exist a structured matrix Ek and a structured vector hk such that (4.6) is satisfied for these values of α. By contrast, it is also seen that there exist values of α for which the normalised residual is equal to O (10−16 ), which implies that (4.6) is satisfied (to within machine precision). The normalised residual is a minimum when α = 102.8 , and this is therefore the optimal value of α. The theoretically exact rank of S(p̂, q̂) is equal to 40, and thus a measure of the effectiveness of the method of STLN is the ratio γ = σ40 /σ41 of the Sylvester resultant matrix S(p̃, q̃), where p̃ = p̃(x) and q̃ = q̃(x) are the polynomials that are computed by the method of STLN, σi is the ith singular value of S(p̃, q̃) and the singular values are arranged in non-increasing order. Figure 4.6(iv) shows the variation of this ratio with α, and it is seen that it is identical in form to Figure 4.1(iv), and thus the comments made in Example 4.3 are valid for this example. In particular, a poor choice of α can lead to unsatisfactory results (the ratio γ is small and the normalised residual krnorm k is large), and the default value (α = 1) may lead to poor results. Figure 4.7 shows the normalised singular values of the Sylvester matrix of the theoretically exact polynomials p̂(x) and q̂(x), the given inexact polynomials p(x) and q(x), and the corrected polynomials p̃(x) and q̃(x) for α = 102.8 , which is the CHAPTER 4. 
APPROXIMATE GREATEST COMMON DIVISORS 99 α = 102.8 0 −10 −20 −30 −40 −50 0 10 20 30 40 50 60 Figure 4.7: The normalised singular values, on a logarithmic scale, of the Sylvester resultant matrix for (i) the theoretically exact polynomials p̂(x) and q̂(x), ♦; (ii) the given inexact polynomials p(x) and q(x), ; (iii) the corrected polynomials p̃(x) and q̃(x) for α = 102.8 , ×. All the polynomials are scaled by the geometric mean of their coefficients. optimal value of α. All the polynomials are scaled by the geometric mean of their coefficients. Figure 4.7(i) shows that S(p̂, q̂) is of full rank, which is incorrect because p̂ and q̂ are not coprime, and Figure 4.7(iii) shows the results for S(p̃, q̃), which are significantly better because its (numerical) rank is 40, which is the correct value. Since the perturbations that are used for the formation of this matrix are, by construction, structured, and its rank is equal to 40, this matrix is a structured low rank approximation of S(p, αq), α = 102.8, that can be used to compute an approximate GCD of p(x) and q(x). It is seen from Figures 4.6(iii) and (iv) that there are many values of α > 101.72 CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 100 such that the ratio σ40 /σ41 is large and the normalised residual krnorm k is sufficiently small. There therefore exist many structured low rank approximations of S(p, αq) that satisfy tight bounds on krnorm k, which is an error bound for the satisfaction of (4.6), and also satisfy tight bounds on the ratio γ = σ40 /σ41 , which is a measure of the numerical rank of S(p̃, q̃). Each of these approximations yields a different approximate GCD, and additional constraints can be imposed on the optimisation algorithm in order to select a particular structured low rank approximation of S(p, αq), and thus a particular approximate GCD. Another example of the computation of an approximate GCD of two Bernstein polynomials is in [1]. 4.5 An approximate GCD of a polynomial and its derivative Uspensky’s method for the computation of the roots of a polynomial, which is described in Section 2.5, requires the determination of the GCD of a polynomial and its derivative. In this circumstance, the theory presented in earlier sections of this chapter for the independent polynomials f (x) and g(x) can be extended because the condition g(x) = f (1) (x) imposes additional structure on the matrices that arise in the LSE problem, such that their dimensions can be reduced. 101 CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS The Sylvester matrix S(f, αf (1) ) ∈ R(2m−1)×(2m−1) is mαam am (m − 1)αam−1 . . . am−1 . . . mαam .. . . .. . . . (m − 1)αam−1 . . . am 2αa2 a1 am−1 .. .. . . αa1 . . . a0 . . . .. .. . . 2αa2 a1 αa1 a0 , and it is seen that columns m, . . . , 2m − 1, that is, the columns occupied by the coefficients of f (1) (x) can be calculated from columns 1, . . . , m−1, which are occupied by the coefficients of f (x). This constraint can be imposed on the problem, with the consequence that the dimensions of the matrices in the LSE problem are reduced. The subresultant matrices Sk f, f (1) are defined in exactly the same way as the subresultant matrices Sk (f, g). Example 4.7. If m = 4, then S1 (1) (1) f, αf = S f, αf = a4 4αa4 a3 a4 3αa3 4αa4 a2 a3 a4 2αa2 3αa3 4αa4 a1 a2 a3 αa1 2αa2 3αa3 a0 a1 a2 αa1 2αa2 a0 a1 αa1 a0 4αa4 , 3αa3 2αa2 αa1 102 CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS and S2 = S2 f, αf (1) and 4αa4 a4 a a 3αa 3 3 4 a2 a3 2αa2 S2 = a1 a2 αa1 a a 0 1 a0 respectively. 
S3 = S3 f, αf (1) are 4αa4 3αa3 2αa2 αa1 4αa4 , 3αa3 2αa2 αa1 a 4αa4 4 a3 3αa3 4αa4 S3 = a2 2αa2 3αa3 , a1 αa1 2αa2 a0 αa1 Each matrix Sk f, αf (1) is partitioned into a vector ck ∈ R(2m−k) and a matrix Ak ∈ R(2m−k)×(2m−2k) , where ck is the first column of Sk f, αf (1) and Ak is the matrix formed from the remaining columns of Sk , where 1 ≤ k ≤ m − 1, (1) Sk f, αf = ck Ak (1) = ck coeffs. of f (x) coeffs. of αf (x) , where the columns of f (x) occupy m − k − 1 columns and the columns of αf (1) (x) occupy m−k+1 columns. Theorem 3.5 shows that a necessary and sufficient condition for f (x) and f (1) (x) to have a non-constant GCD is that the equation Ak y = ck , y ∈ R2(m−k) , (4.23) possess a solution. If hk ∈ R2m−k and Ek ∈ R(2m−k)×(2m−2k) are the perturbations of ck and Ak respectively, then ck and hk have the same structure, and Ak and Ek have the same structure. It therefore follows that if zi is the perturbation of the coefficient ai , i = 0, . . . , m, of f (x), the error vector z is equal to T z = zm zm−1 · · · z1 z0 ∈ Rm+1 , 103 CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS and the error matrix Bk = Bk (z) is equal to Bk := hk Ek mαzm zm zm−1 . . . (m − 1)αzm−1 . .. .. . . . z . m = z1 . . . zm−1 2αz2 . .. z0 . . . αz1 ... z1 z0 .. . .. . mαzm .. . (m − 1)αzm−1 .. .. . . .. . 2αz2 αz1 , where hk is equal to the first column of Bk (z), and Ek is equal to the last 2m − 2k columns of Bk (z). The first (m − 1 − k) columns of Ek store the perturbations of f (x), and the last (m + 1 − k) columns of Ek store the perturbations of αf (1) (x). It follows from the definitions of hk and z that there exists a matrix Pk ∈ R(2m−k)×(m+1) such that Example 4.8. If m = 4 a4 a a 3 4 a2 a3 (1) S2 f, αf = a1 a2 a a 0 1 a0 hk = Pk z = Im+1 0m−k−1,m+1 and k = 2, then 4αa4 3αa3 4αa4 2αa2 3αa3 αa1 2αa2 αa1 4αa4 , 3αa3 2αa2 αa1 z. c2 = a4 a3 a2 , a1 a0 0 z 4 z3 z = z2 , z1 z0 CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS hT2 and P2 = = z4 z3 z2 z1 z0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 , 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 E2 = , 4αz4 z4 3αz3 4αz4 z3 2αz2 3αz3 z2 αz1 2αz2 z1 αz1 z0 104 4αz4 . 3αz3 2αz2 αz1 The residual r(z, y) that is associated with an approximate solution of (4.23) due to the perturbations hk and Ek is given by r(z, y) = ck + hk − (Ak + Ek )y, hk = Pk z, Ek = Ek (z), and it was shown in Section 4.3 that the residual r(z, y) can be written as r(z, y) = ck + hk − Ak y − Yk z. Equation (4.7) allows the rule for the construction of the elements of Yk to be established because closed form expressions for the elements of Ek have been derived. In particular, it follows from (4.7) that m+1 X r=1 2(m−k) (Yk )i,r zr−1 = X s=1 (Ek )i,s ys , i = 1, . . . , 2m − k, CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS and thus for i = 1, . . . , 2m − k, where 1 ≤ k ≤ m − 1, m+1 X (Yk )i,r zr−1 = r=1 m−1−k X 2(m−k) (Ek )i,s ys + s=1 = m−1−k X = (Ek )i,s ys s=m−k (Ek )i,j yj + j=1 m−1−k X X m+1−k X (Ek )i,m−1−k+j ym−1−k+j j=1 zm+1+j−i yj + j=1 m+1−k X j=1 α(m + j − i)zm+j−i ym−1−k+j from the formulae for (Ek )i,j = m−k X zm+j−i yj−1 + j=2 = j=1 m−k X j=2 m+1−k X α(m + j − i)zm+j−i ym−1−k+j yj−1 + α(m + j − i)ym−1−k+j zm+j−i +α(m + 1 − i)ym−k zm+1−i +α(2m + 1 − k − i)y2(m−k) z2m+1−k−i , which enables closed form expressions for the elements of Yk to be obtained. Example 4.9. If m = 4 and k = 1, then 4αz4 z4 3αz3 4αz4 z z 2αz 3αz 4αz 3 4 2 3 4 E1 = z2 z3 αz1 2αz2 3αz3 4αz4 , z1 z2 αz1 2αz2 3αz3 z z αz 2αz 0 1 1 2 z0 αz1 105 CHAPTER 4. 
APPROXIMATE GREATEST COMMON DIVISORS and it follows from (4.7) that 4αy3 106 3αy3 (y1 + 4αy4 ) 2αy (y + 3αy ) (y + 4αy ) 3 1 4 2 5 Y1 = αy3 (y1 + 2αy4 ) (y2 + 3αy5) 4αy6 . (y1 + αy4) (y2 + 2αy5 ) 3αy6 y (y + αy ) 2αy 1 2 5 6 y2 αy6 If m = 4 and k = 2, then E2 = 4αz4 z4 3αz3 4αz4 z3 2αz2 3αz3 z2 αz1 z1 z0 and it is readily verified from (4.7) that Y2 = y1 2αz2 αz1 4αz4 , 3αz3 2αz2 αz1 4αy2 3αy2 (y1 + 4αy3 ) 2αy2 (y1 + 3αy3) 4αy4 . αy2 (y1 + 2αy3 ) 3αy4 (y1 + αy3) 2αy4 αy4 Equation (4.23) is a non-linear equation that is solved iteratively after it has been CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 107 linearised, and it was shown in Section 4.3 that the linearised form of this equation is r(z + δz, y + δy) = r(z, y) − (Yk − Pk )δz − (Ak + Ek )δy. It is necessary to solve r(z + δz, y + δy) = 0, subject to the constraint that the perturbations zi are minimised. Different perturbations zi occur a different number of times, and it is therefore necessary to minimise kDzk subject to r(z, y) = (Yk − Pk )δz + (Ak − Ek )δy, where the element dii = di , i = 1, . . . , m + 1, of the diagonal matrix D is calculated from the number of times that zi occurs in Bk (z): zm occurs (m − k) + m(m − k + 1) times zm−1 occurs (m − k) + (m − 1)(m − k + 1) times zm−2 occurs (m − k) + (m − 2)(m − k + 1) times ··· ··· ··· ··· z1 occurs (m − k) + (m − k + 1) times z0 occurs (m − k) times and thus D = diag {di} = diag {(m − k) + (m − i + 1)(m − k + 1)} . This is an LSE problem that is similar to the LSE problem in Section 4.3, and it can be written in matrix form as min kEv − sk , Cv=t CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 108 where (2m−k)×(3m−2k+1) (Yk − Pk ) (Ak + Ek ) ∈ R (m+1)×(3m−2k+1) E = D 0 ∈R C = t = r(z, y) ∈ R2m−k s = −Dz δz v = δy m+1 ∈ R 3m−2k+1 , ∈R δz ∈ Rm+1 and δy ∈ R2(m−k) . The LSE problem is solved by Algorithms 4.1 and 4.2. 4.6 GCD computations by partial singular value decomposition This section describes the method developed by Zeng [53] for the computation of an approximate GCD of two polynomials. It uses the Sylvester resultant matrix, but in a slightly rearranged form from that used in this report. In particular, he uses the Sylvester resultant matrix of the polynomials (1) W (f, f ) = Cm (f (1) ) f (x) and f (1) (x), C m−1 (f ) , CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS where Ck (f ) is the Cauchy matrix, with k a0 a 1 .. . Ck (f ) = am 109 columns, of f (x), .. . .. . a0 .. . .. . a1 .. . am . The Cauchy matrix arises when polynomial multiplication is expressed in terms of matrices and vectors. In particular, if g(x) is a polynomial of degree n, and f , g and h are the vectors of coefficients of f (x), g(x) and h(x) = f (x)g(x) respectively, then h = Cn+1 (f )g = Cm+1 (g)f . The subresultant matrix Wk f, f (1) ∈ R(2m−k)×(2m−2k+1) is obtained from W f, f (1) in the same way as the subresultant matrices Sk f, f (1) are obtained from S f, f (1) , that is, the last k − 1 columns of the coefficients of f (x), the last k − 1 columns of the coefficients of f (1) (x), and the last k − 1 rows, of W f, f (1) are deleted. Example 4.10. If m = 4, W (f, f (1) ) = W1 (f, f (1) ) = a1 0 0 0 2a2 a1 0 0 3a3 2a2 a1 0 4a4 3a3 2a2 a1 0 4a4 3a3 2a2 0 0 4a4 3a3 0 0 0 4a4 a0 0 0 a1 a0 0 a2 a1 a0 a3 a2 a1 , a4 a3 a2 0 a4 a3 0 0 a4 CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 110 and the subresultant matrices W2 (f, f (1) ) and W3 (f, f (1) ) are 0 0 a0 0 a1 2a a a a 0 1 2 1 0 3a 2a a a a 2 1 3 2 1 (1) , W2 (f, f ) = 4a4 3a3 2a2 a3 a2 0 4a 3a a a 3 4 3 4 0 0 4a4 0 a4 and respectively. 
(1) W3 (f, f ) = 0 2a2 a1 3a3 2a2 4a4 3a3 0 4a4 a1 a0 a1 a2 , a3 a4 The method described by Zeng [53] uses the relationships (3.8) and (3.9) between the order of a subresultant matrix Wk (f, f (1) ) and the degree d of the GCD of f (x) and f (1) (x), rank Wm−j (f, f (1) ) = 2j + 1, j = 1, . . . , m − d − 1, rank Wm−j (f, f (1) ) < 2j + 1, j = m − d, . . . , m − 1. and If the first rank deficient subresultant matrix Wm−j (f, f (1) ) ∈ R(m+j)×(2j+1) , as j increases from 1 to m − 1, occurs for j = k, then the degree d of the GCD of f (x) and f (1) (x) is given by d = m − k. It follows that if ξj , j = 1, . . . , m − 1, is the smallest CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS singular value of Wm−j (f, f (1) ), that is, ξj = σ2j+1 Wm−j (f, f (1) ) , then ξ1 , ξ2 , · · · , ξm−d−1 > 0, ξm−d = ξm−d+1 = · · · = ξm−1 = 0. 111 (4.24) The following theorem is proved in [53]. Theorem 4.1. Let u = u(x) and v = v(x) be polynomials of degrees t = m − d and t − 1 respectively, such that u(x)d(x) = f (x) v(x)d(x) = f (1) (x). and (4.25) Then (a) u(x) and v(x) are coprime. (b) The (column) rank of Wd (f, f 1 ) = Wm−t (f, f (1) ) is deficient by one. (c) The vector w= u , −v where u and v are column vectors of the coefficients of u(x) and v(x) respectively, is the right singular vector of Wd (f, f (1) ) that is associated with the smallest (zero) singular value ξm−d . (d) If u is known, then the vector d of the coefficients of d(x) is the solution of Cd+1 (u)d = f. Equation (4.24) shows that if the first rank deficient matrix Wm−j (f, f (1) ) for j = 1, . . . , m − 1, occurs for j = k, then the degree d of the GCD of f (x) and f (1) (x) is equal to m−k. The singular value decomposition is the natural method to calculate the rank of a matrix, but it follows from (4.24) that the calculation of the degree of CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 112 the GCD of f (x) and f (1) (x) only requires the determination of the index of the first zero singular value, and the associated right singular vector, and not all the singular values and singular vectors. It is shown in [53, 54] that if the columns of Wk (f, f (1) ) are rearranged, the QR decomposition can be updated and an inverse iteration used to compute these two quantities efficiently. This rearrangement requires that the columns of the coefficients of f (x) and f (1) (x) interlace, and the rearranged form of Wk (f, f (1) ) is obtained by adding two columns to the right hand side, and a row at the bottom, to the rearranged form of Wk+1 (f, f (1) ) for k = m − 2, m − 3, . . . , 1. Example 4.11. Consider Wk (f, f (1) ), k = 1, 2, 3, for f (x) = (x − 1)2 (x − 2)(x − 3) = x4 − 7x3 + 17x2 − 17x + 6, and thus f (1) (x) = 4x3 − 21x2 + 34x − 17. The subresultant matrix W3 (f, f 1 ) is a 0 1 2a2 a1 (1) W3 (f, f ) = 3a3 2a2 4a4 3a3 0 4a4 a0 a1 a2 , a3 a4 and this matrix is rearranged so that the coefficients of f (x) and f (1) (x) interlace. CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 113 The reordered form of W3 (f, f (1) ) is therefore given by a1 a0 0 2a2 a1 a1 r (1) W3 (f, f ) = 3a3 a2 2a2 . 4a4 a3 3a3 0 a4 4a4 The subresultant matrix W2 (f, f (1) ) is 0 0 a1 2a 0 2 a1 3a3 2a2 a1 (1) W2 (f, f ) = 4a4 3a3 2a2 0 4a 3a 4 3 0 0 4a4 and the reordered form of this matrix is a1 2a 2 3a3 r (1) W2 (f, f ) = 4a4 0 0 a0 a1 0 a0 0 0 a1 a0 a2 a1 , a3 a2 a4 a3 0 a4 a1 a0 a2 2a2 a1 a3 3a3 a2 a4 4a4 a3 0 0 a4 0 0 a1 , 2a2 3a3 4a4 and the columns of the coefficients of f (x) and f (1) (x) interlace. 
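As an aside, the Cauchy matrices from which W(f, f^(1)) and its subresultants are assembled are simply convolution matrices, and the identity h = C_{n+1}(f)g = C_{m+1}(g)f quoted above can be checked directly. The sketch below uses the polynomial of Example 4.11, with coefficient vectors stored in descending powers; with a consistent ordering the convolution identity is unaffected by that choice.

```python
import numpy as np

def cauchy(f, k):
    # Convolution ('Cauchy') matrix with k columns: column j holds the
    # coefficients of f shifted down by j rows, so that cauchy(f, k) @ g is
    # the coefficient vector of f(x) g(x) when g has k coefficients.
    f = np.asarray(f, dtype=float)
    C = np.zeros((len(f) + k - 1, k))
    for j in range(k):
        C[j:j + len(f), j] = f
    return C

f = np.array([1.0, -7.0, 17.0, -17.0, 6.0])   # (x-1)^2 (x-2)(x-3), Example 4.11
g = np.array([2.0, 1.0])                      # an arbitrary degree-1 polynomial
h = cauchy(f, len(g)) @ g
assert np.allclose(h, np.polymul(f, g))
assert np.allclose(h, cauchy(g, len(f)) @ f)
```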
The reordered matrix W2r (f, f (1) ) is obtained by the addition of a column of the coefficients of f (x) and a column of the coefficients of f (1) (x) on the right hand side, and an extra row, to W3r (f, f (1) ). CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS The subresultant matrix W1 (f, f (1) ) is a 0 0 0 1 2a2 a1 0 0 0 3a3 2a2 a1 W1 (f, f (1) ) = 4a4 3a3 2a2 a1 0 4a4 3a3 2a2 0 0 4a4 3a3 0 0 0 4a4 a0 0 0 a1 a0 0 a2 a1 a0 a3 a2 a1 a4 a3 a2 0 a4 a3 0 0 a4 114 , and the rearranged form of this matrix, such that the columns of f (x) and f (1) (x) interlace, is a1 a0 2a2 3a3 r (1) W1 (f, f ) = 4a4 0 0 0 0 0 0 0 a1 a0 0 0 0 a1 a0 0 a3 3a3 a2 2a2 a1 a1 a1 0 a2 2a2 a1 a4 4a4 a3 3a3 a2 2a2 0 0 a4 4a4 a3 3a3 0 0 0 0 a4 4a4 , which is formed from W2r (f, f (1) ) by the addition of two columns and one row, as described above. The computation of the GCD of f (x) and f (1) (x) reduces to the computation of the polynomials u(x) and v(x), which are defined in (4.25), and this involves three stages: • Obtain initial estimates of the coprime polynomials u(x) and v(x) (Section 4.6.1). CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 115 • Obtain an initial estimate of the GCD d(x) (Section 4.6.2). • Use the method of non-linear least squares to obtain improved estimates of u(x), v(x) and d(x) (Section 4.6.3). The degree of the GCD of f (x) and f (1) (x) is not known a priori, and it is therefore r necessary to interrogate the subresultant matrices Wm−j (f, f (1) ) for j = 1, . . . , m − 1, and determine the first matrix in this sequence that is rank deficient. This requires a criterion in order to determine when its smallest singular value ξj can be assumed to be (numerically) zero, and this is established in Section 4.6.4. Since it is only r required to determine the smallest singular value of Wm−j (f, f (1) ), and the associated r right singular vector, a complete singular value decomposition of Wm−j (f, f (1) ) is not required. It is shown, however, in Lemma 2 in [53] and Lemma 2.4 in [54] that ξj can r be calculated from the QR decomposition of Wm−j (f, f (1) ), and furthermore, a fast r update procedure for the calculation of the QR decomposition of Wm−j (f, f (1) ) from r the QR decomposition of Wm−j+1 (f, f (1) ) is derived. 4.6.1. The calculation of the coprime polynomials The accurate computation of the GCD of f (x) and f (1) (x) requires that their coprime polynomials u(x) and v(x) be calculated, and pseudo-code for this is shown in Algorithm 4.3. The coefficients of u(x) and v(x) are stored in the vectors u and v respectively, and they are calculated in Step 3(b) in Algorithm 4.3 from the right singular vector yj that is associated with the singular value ξj ≤ θ, as stated in Theorem 4.1. An expression for the threshold θ in terms of the noise level of the coefficients of f (x) and the singular values ξj is developed in Section 4.6.4. Algorithm 4.3 is very simple, such that the computational reliability of u and v CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 116 cannot be guaranteed, and thus their values must be improved. They are therefore initial estimates u0 and v0 in a refinement strategy, which is implemented by the method of non-linear least squares and considered in Section 4.6.3. Algorithm 4.3: The coprime factors of a polynomial and its derivative r Input The matrix Wm−1 (f, f (1) ) and a tolerance θ for the smallest singular values of r Wm−j (f, f (1) ), j = 1, . . . , m − 1. Output The coprime factors, whose coefficients are stored in u0 and v0 , of f (x) and f (1) (x). 
Begin r 1. Calculate Wm−1 (f, f (1) ) by rearranging the columns of Wm−1 (f, f (1) ). r 2. Calculate the QR decomposition Wm−1 (f, f (1) ) = Qm−1 Rm−1 . 3. Set j = 1. While j ≤ m − 1 (a) Use the inverse iteration algorithm in [53, 54] to calculate the smallest r singular value ξj and corresponding right singular vector yj of Wm−j (f, f (1) ). (b) If ξj ≤ θ Then d = m − j, and calculate u0 and v0 from yj . Go to End (c) Else Set j = j + 1. CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 117 r r (d) Calculate Wm−j (f, f (1) ) from Wm−j+1 (f, f (1) ). End While % f (x) and f (1) (x) have no common factors. End 4.6.2. The calculation of the initial estimate of the GCD The calculation of the initial estimates u0 and v0 allows an initial estimate d0 of the GCD of f (x) and f (1) (x) to be calculated. In particular, it follows from (4.25) that Cd+1 (u0 )d 0 = f and Cd+1 (v0 )d 0 = f (1) , and these equations can be combined into one equation Cd+1 (u0) f d0 = , Cd+1 (v0 ) f (1) (4.26) where the coefficient matrix is of order (2m + 1) × (d + 1). This is a linear least squares problem that is solved by standard methods. 4.6.3. Refinement by non-linear least squares Initial estimates u0 and v0 of the coprime polynomials of f (x) and f (1) (x) were calculated in Algorithm 4.3, and an initial estimate d0 of the GCD of f (x) and f (1) (x) was calculated in (4.26). It is shown in this section that the refinement of these estimates leads to a non-linear least squares minimisation that is solved iteratively by the Gauss-Newton method, in which u0 , v0 and d0 are the initial estimates of the solution. 118 CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS Equation (4.25), which shows how the polynomials u(x), v(x) and d(x) are related to f (x) and f (1) (x), can be cast in matrix form, Cd+1 (u)d = f and Cd+1 (v)d = f (1) . This pair of equations contains an arbitrary scale factor that must be removed before the initial estimates u0 , v0 and d0 of u, v and d, respectively, can be refined. It is therefore necessary to add a constraint on the magnitude of d, and thus the equations that are used for the refinement of u, v and d are Cd+1 (v)d = f (1) Cd+1 (u)d = f , and it is therefore required to minimise T r d −1 = 0 C (u)d = f d+1 Cd+1 (v)d = f (1) or kF (z ) − bk2 , where T r d −1 F (z ) = Cd+1 (u)d Cd+1 (v)d , d z = u v and 2 , and r T d = 1, 0 b= f f (1) , (4.27) in order to obtain improved estimates of u, v and d. The vectors Cd+1 (u)d and f are of length m + 1, and the vectors Cd+1 (v)d and f(1) are of length m. The equation that is satisfied at stationarity must be determined. Let r = kF (z ) − bk2 = F (z )T F (z ) − 2b T F (z ) + b T b, and thus δr = F (z + δz )T F (z + δz ) − F (z )T F (z ) − 2b T (F (z + δz ) − F (z )) . (4.28) 119 CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS Since, to first order, F (z + δz ) = F (z ) + 2m−d+2 X i=1 = F (z ) + = F (z ) + where ∂F (z ) ∂zi ∂F (z ) ∂z1 ∂F (z ) δzi ∂zi ∂F (z ) ∂z2 ··· ∂F (z ) ∂z2m−d+2 ∂F (z ) δz , ∂z is a column vector of length 2m + 2 and (2m + 2) × (2m − d + 2), it follows that F (z + δz )T = F (z )T + δz T ∂F (z ) ∂z δz is a matrix of order ∂F (z )T . ∂z The substitution of these expressions into (4.28) yields ∂F (z )T (F (z ) − b) , δr = 2δz ∂z T to lowest order, and thus the vector z that minimises r satisfies ∂F (z )T (F (z ) − b) = 0. 
∂z The solution of this equation requires that an expression for (4.29) ∂F (z )T ∂z be derived, and this is obtained from the definition of F (z ) in (4.27), from which it follows that T r (d + δd ) − 1 F (z + δz ) = Cd+1 (u + δu) (d + δd ) . Cd+1 (v + δv) (d + δd ) Since, to lowest order, Cd+1 (u + δu) (d + δd ) = Cd+1 (u + δu) d + Cd+1 (u + δu) δd = Cd+1 (u)d + Cd+1 (δu)d + Cd+1 (u)δd = Cd+1 (u)d + Cd+1 (u)δd + Cm−d+1 (d)δu, CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 120 and a similar expression is valid for Cd+1 (v + δv) (d + δd ), it follows that T r δd F (z + δz ) − F (z ) = Cd+1 (u)δd + Cm−d+1 (d)δu Cd+1 (v)δd + Cm−d (d)δv T 0 0 r δd δu = C (u) C (d) 0 m−d+1 d+1 Cd+1 (v) 0 Cm−d (d) δv = ∂F (z ) δz. ∂z The Jacobian matrix J(z )∈ R(2m+2)×(2m−d+2) is therefore given by T 0 0 r ∂F (z) , J(z ) = = C (u) C (d) 0 d+1 m−d+1 ∂z Cd+1 (v) 0 Cm−d (d) and thus (4.29) becomes J(z )T (F (z ) − b) = 0, (4.30) which is a set of 2m − d + 2 non-linear equations in 2m + 2 unknowns. It is solved iteratively by the Gauss-Newton iteration, in which the initial estimates of u0 and v0 are computed in Algorithm 4.3, and the initial estimate of d0 is calculated from the solution of the least squares problem (4.26). Equation (4.30) is solved by the Gauss-Newton iteration, zj+1 = zj − J(zj )† (F (zj ) − b) , CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS where and J(z)† = J(z)T J(z) −1 121 d0 z0 = u0 , v0 J(z)T is the left inverse of J(z). It is clear that this itera- tion requires that J(z j ) have full column rank, and the conditions for the satisfaction of this condition are stated in the following theorem, which is proved in [53]. Theorem 4.2. If the polynomials u(x) and v(x) are coprime, and rT d 6= 0, then J(z) has full (column) rank. The Jacobian matrix J(z ) contains the vector r which must satisfy the condition rT d 6= 0 for the convergence of the Gauss-Newton iteration, and Zeng [53] recommends that r=d0 . 4.6.4. The calculation of the threshold r Step 3(b) of Algorithm 4.3 requires that the smallest singular value ξj of Wm−j (f, f (1) ) be measured against a threshold θ in order to determine its numerical rank. Since the singular values of a matrix are invariant with respect to row and column permutations, r it follows that ξj is also the smallest singular value of Wm−j (f, f (1) ). An expression for θ in terms of ξj is developed in this section. The polynomial f (x) is usually not known exactly, and furthermore, roundoff errors are adequate to cause incorrect results. It is therefore necessary to consider the theoretically exact polynomials f (x) and f (1) (x), and their perturbed forms f˜ = f˜(x) and f˜(1) = f˜(1) (x), respectively, where 2 (1) 2 (1) f − f̃ + f − f̃ ≤ ǫ2 . CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 122 In general, f˜(x) and f˜(1) (x) are coprime, and thus if ξ˜j is the smallest singular value r ˜ f˜(1) ), then ξ˜j ≥ 0, j = 1, . . . , m − 1. It is assumed, however, that f (x) of Wm−j (f, and f (1) (x) have a GCD of degree d, and thus the singular values ξj satisfy (4.24). Since the Sylvester resultant matrix and its subresultant matrices have a linear structure, it follows that Wjr (f˜, f˜(1) ) = Wjr (f, f (1) ) + Wjr (f˜ − f, f˜(1) − f (1) ), j = 1, . . . , m − 1, where each subresultant matrix is of order (2m − j) × (2m − 2j + 1). Since there are m − j and m − j + 1 columns of the coefficients of f (x) and f (1) (x) respectively, it follows that 2 r ˜ (1) (1) ˜ Wj (f − f, f − f ) ≤ (m − j + 1)ǫ2 , F where k·kF denotes the Frobenius norm. 
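A quick numerical check of this Frobenius norm bound is sketched below. The subresultant matrices are built column by column from the convolution matrices of the perturbations, following the column counts stated above; the Frobenius norm is invariant under the column reordering that produces W_j^r, so the unreordered matrices suffice for the check, and the coefficient vectors are stored in descending powers.

```python
import numpy as np

def W_subresultant(f, fp, j):
    # W_j(f, f'): m - j + 1 columns built from the m coefficients of f'(x)
    # and m - j columns built from the m + 1 coefficients of f(x).
    m = len(f) - 1
    cols_fp, cols_f = m - j + 1, m - j
    W = np.zeros((2 * m - j, cols_fp + cols_f))
    for c in range(cols_fp):
        W[c:c + len(fp), c] = fp
    for c in range(cols_f):
        W[c:c + len(f), cols_fp + c] = f
    return W

rng = np.random.default_rng(1)
m = 6
df = 1e-8 * rng.standard_normal(m + 1)   # perturbation of the coefficients of f(x)
dfp = np.polyder(df)                     # induced perturbation of f'(x)
eps_sq = np.sum(df ** 2) + np.sum(dfp ** 2)
for j in range(1, m):
    E = W_subresultant(df, dfp, j)
    assert np.linalg.norm(E, 'fro') ** 2 <= (m - j + 1) * eps_sq
```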
It is shown in [13], page 428, that if σi (X), i = 1, . . . , min (m, n), are the singular values of X ∈ Rm×n , arranged in non-increasing order, then |σk (A + E) − σk (A)| ≤ σ1 (E) = kEk2 ≤ kEkF , and thus if A = Wjr (f, f (1) ), ˜ f˜(1) ), E = Wjr (f˜ − f, f˜(1) − f (1) ) and A + E = Wjr (f, then for k = 1, . . . , 2m − 2j + 1, p r ˜ ˜(1) r (1) σk (Wj (f , f )) − σk (Wj (f, f )) ≤ ǫ m − j + 1. If j = d + 1, then √ r r (f˜, f˜(1) )) − σk (Wd+1 (f, f (1) )) ≤ ǫ m − d, σk (Wd+1 (4.31) (4.32) r r where Wd+1 (f, f (1) ) = Wm−(m−d−1) (f, f (1) ) ∈ R(2m−d−1)×(2m−2d−1) , and thus its smallr est singular value is ξm−d−1 . The matrix Wd+1 (f, f (1) ) is non-singular because f (x) CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 123 and f (1) (x) have a common divisor of degree d, and thus it follows from (4.24) that r ξm−d−1 = σ2m−2d−1 (Wd+1 (f, f (1) )) > 0, and r ξ˜m−d−1 = σ2m−2d−1 (Wd+1 (f˜, f˜(1) )). The substitution of these results into (4.32) for k = 2m − 2d − 1 yields √ ˜ ξm−d−1 − ξm−d−1 ≤ ǫ m − d, or √ √ ξm−d−1 − ǫ m − d ≤ ξ˜m−d−1 ≤ ξm−d−1 + ǫ m − d, r which relates the smallest singular value of Wd+1 (f, f (1) ), the smallest singular value r ˜ f˜(1) ), and the noise level ǫ. of Wd+1 (f, If j = d, then (4.31) becomes √ r ˜ ˜(1) r (1) σk (Wd (f , f )) − σk (Wd (f, f )) ≤ ǫ m − d + 1, (4.33) r where Wdr (f, f (1) ) = Wm−(m−d) (f, f (1) ) ∈ R(2m−d)×(2m−2d+1) is singular because f (x) and f (1) (x) have a common divisor of degree d. It follows that ξm−d = σ2m−2d+1 Wdr (f, f (1) ) = 0, and the substitution k = 2m − 2d + 1 in (4.33) yields √ ξ˜m−d ≤ ǫ m − d + 1, √ θ = ǫ m − d + 1, (4.34) where the threshold θ is specified in Algorithm 4.3. This equation is an upper bound on the smallest singular value of Wdr (f˜, f˜(1) ). r ˜ f˜(1) ), j = 1, . . . , m − 1, is constructed and The sequence of matrices Wm−j (f, their smallest singular values ξ˜1 , ξ˜2, . . . , ξ˜m−1 , are computed. When (4.34) is satisfied, CHAPTER 4. APPROXIMATE GREATEST COMMON DIVISORS 124 there is a possibility that f˜(x) and f˜(1) (x) have an approximate GCD of degree d, and this is confirmed or rejected by applying the Gauss-Newton iteration. If this yields an acceptable approximate GCD, then the algorithm terminates. If, however, an r acceptable GCD is not obtained, then the next matrix Wm−j−1 (f˜, f˜(1) ) is considered, and the process repeated. This guarantees that an approximate common divisor of maximum degree is calculated. An algorithm for the calculation of the GCD of two polynomials is in Section 7.4 in [53]. 4.7 Summary This section has considered the application of structured matrix methods, and the partial singular value decomposition of the Sylvester resultant matrix, for the computation of an approximate GCD of two polynomials. It was shown that the method that uses structured matrices leads to an LSE problem, and three methods for its solution were considered. It was shown that although the method of weights is frequently used for the solution of this problem, it possesses some disadvantages, which are not shared by the method based on the QR decomposition. It was shown that the Sylvester matrix contains an indeterminacy that can be used to introduce an extra parameter α into the computations, and moreover, it was shown in the examples that the quality of the computed approximate GCD varies significantly as α varies. Examples of the computation of approximate GCDs of two inexact polynomials, expressed in the power and Bernstein bases, were given. 
The partial singular value decomposition of the Sylvester resultant matrix yields initial estimates of the coprime polynomials and the GCD, which are then refined by the method of non-linear least squares. Since it is only required to calculate the smallest singular value and the associated right singular vector of the subresultant matrices, a full singular value decomposition is not required. It was shown that these two quantities can be calculated efficiently using the QR decomposition of the subresultant matrix, and that a fast update procedure can be used to make the computations efficient.

Chapter 5  A robust polynomial root finder

It was shown by example in Chapter 1, and theoretically in Chapter 2, that the accurate computation of the multiple roots of a polynomial in a floating point environment is very difficult because, even if the coefficients are known exactly, roundoff errors are sufficient to cause a multiple root to break up into simple roots. Several popular methods for the computation of the roots of a polynomial were reviewed in Section 1.2, and a method due to Uspensky [42], pages 65-68, was discussed in Section 2.5. In this method, the multiplicities of the roots are determined initially, after which the roots are calculated. This procedure differs from the methods discussed in Section 1.2, in which the roots are calculated directly, without prior knowledge of their multiplicities. It is shown in this chapter that the GCD computations described in Chapter 4 enable Uspensky's algorithm to be implemented, such that the computed roots are numerically stable. In particular, these GCD computations enable the multiplicity of each root to be calculated, and initial estimates of the roots of a polynomial are obtained by solving several lower degree polynomials, all of whose roots are simple. These estimates are then refined by the method of non-linear least squares, using the multiplicities as prior information in this refinement, in order to obtain numerically accurate and robust computed roots.

5.1 GCD computations and the multiplicities of the roots

The multiplicities of the roots of a polynomial are calculated directly from the GCD computations of a polynomial and its derivative, as shown in Algorithm 2.1 and Example 2.5. It is seen from this example that the roots of f(x) are equal to the roots of h1(x), h2(x) and h3(x), where numerically identical roots are grouped together, or the roots of w1(x), w2(x) and w3(x) are the roots of f(x) with multiplicities 1, 2 and 3 respectively.

Example 5.1. Consider the polynomial f(x) in Example 2.5,

f(x) = x^6 − 3x^5 + 6x^3 − 3x^2 − 3x + 2,

whose roots are x = 1 with a multiplicity of 3, x = −1 with a multiplicity of 2, and a simple root x = 2. It is shown in Example 2.5 that

h1(x) = x^3 − 2x^2 − x + 2,    h2(x) = x^2 − 1,    h3(x) = x − 1,

and the roots of h1(x) are x = −1, 1, 2, the roots of h2(x) are x = −1, 1, and the root of h3(x) is x = 1. When these roots are grouped together, it is seen that they are equal to the roots of f(x). The polynomials w1(x), w2(x) and w3(x) are calculated in Example 2.5, and it is seen that the roots of wi(x), i = 1, 2, 3, are the roots of multiplicity i of f(x).
A ROBUST POLYNOMIAL ROOT FINDER 128 If the roots of the polynomials hi (x) are used to calculate the roots of f (x), then numerically identical roots are grouped together, thereby obtaining the multiplicity of each root and an approximation to its value. Alternatively, the roots of the polynomials wi (x) can be used to calculate the roots of multiplicity i of f (x). The first method requires a criterion to decide when two numerically distinct roots are theoretically the same root, and the second method requires that polynomial divisions be performed. This operation can be performed in a stable manner by the method of least squares, as shown in Section 4.6.2. It is therefore assumed that one of these methods has been used to obtain an estimate of each root of f (x), and its multiplicity, and these estimates are refined using the method of non-linear least squares. The application of this method to power basis polynomials is described in [54], and its extension to Bernstein polynomials is described in the next section. 5.2 Non-linear least squares for Bernstein polynomials It is shown in this section that if initial estimates of the roots of the Bernstein basis polynomial p(x) are given, and their multiplicities are known, their refinement leads to a non-linear equation that is solved by the method of non-linear least squares. This requires an expression for each coefficient of p(x) in terms of its roots. Example 5.2. Consider the quadratic polynomial 2 p1 (x) = (x − α)2 = − α(1 − x) + (1 − α)x , where the term on the right is written as the square of a linear polynomial in the 129 CHAPTER 5. A ROBUST POLYNOMIAL ROOT FINDER Bernstein basis. The convolution of [−α (1 − α)] with itself yields −α (1 − α) ⊗ −α (1 − α) = α2 −2α(1 − α) (1 − α)2 , which are the coefficients of p1 (x) in the scaled Bernstein basis, 2 2 2 p1 (x) = α (1 − x) + − 2α(1 − α) x(1 − x) + (1 − α) x2 . The ith coefficient of the Bernstein basis form of p1 (x) is recovered by dividing the ith coefficient of its scaled Bernstein basis form by 2i , i = 0, 1, 2, −2α(1−α) (1−α)2 α2 2 2 → α −2α(1 − α) (1 − α) (20) (21) (22) = α2 −α(1 − α) (1 − α)2 . Example 5.3. Consider the scaled Bernstein form of the cubic polynomial p2 (x), p2 (x) = (x − α1 )2 (x − α2 ) 2 = − α1 (1 − x) + (1 − α1 )x − α2 (1 − x) + (1 − α2 )x = −α12 α2 (1 − x)3 + α1 (α1 + 2α2 − 3α1 α2 )(1 − x)2 x −(1 − α1 )(2α1 + α2 − 3α1 α2 )(1 − x)x2 + (1 − α1 )2 (1 − α2 )x3 . The convolution α12 −2α1 (1 − α1 ) (1 − α1 )2 ⊗ −α2 (1 − α2 ) , is equal to the vector of coefficients of the scaled Bernstein basis form of p2 (x), and the Bernstein basis form of this polynomial is α1 (α1 + 2α2 − 3α1 α2 ) 2 3 p2 (x) = −α1 α2 (1 − x) + 3(1 − x)2 x 3 (1 − α1 )(2α1 + α2 − 3α1 α2 ) − 3(1 − x)x2 + (1 − α1 )2 (1 − α2 )x3 , 3 CHAPTER 5. A ROBUST POLYNOMIAL ROOT FINDER 130 where the coefficients are obtained by dividing the ith scaled Bernstein basis coeffi cient by 3i , i = 0, . . . , 3, α (α +2α −3α α ) (1−α )(2α +α −3α α ) 1 1 2 1 2 1 1 2 1 2 −α12 α2 − (1 − α1 )2 (1 − α2 ) . 3 3 These two examples show that each coefficient of a Bernstein basis polynomial can be expressed as the repeated convolution of linear Bernstein basis polynomials, followed by division by the combinatorial coefficients. Pseudo-code for this repeated convolution is in [54], and it can be reproduced for Bernstein basis polynomials, with the division of each term in the final coefficient vector by a combinatorial factor, as shown in Examples 5.2 and 5.3. 
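A minimal sketch of this repeated convolution for Bernstein basis polynomials is given below; it handles the monic product in (5.1), with the constant k applied separately, and the printed vector reproduces the coefficients derived in Example 5.2.

```python
import numpy as np
from math import comb

def bernstein_coeffs_from_roots(roots, mults):
    # Scaled Bernstein coefficients of prod (x - a)^l are obtained by
    # repeatedly convolving the linear factors [-a, 1 - a]; dividing the
    # i-th entry by C(m, i) then gives the Bernstein basis coefficients g_i.
    b = np.array([1.0])
    for a, l in zip(roots, mults):
        for _ in range(l):
            b = np.convolve(b, [-a, 1.0 - a])
    m = len(b) - 1
    return np.array([b[i] / comb(m, i) for i in range(m + 1)])

# Example 5.2: (x - a)^2 has Bernstein coefficients [a^2, -a(1-a), (1-a)^2].
a = 0.3
print(bernstein_coeffs_from_roots([a], [2]))   # [0.09, -0.21, 0.49]
```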
The Bernstein basis form of p(x) that has r distinct roots is m X m p(x) = ci (1 − x)m−i xi i i=0 = k r Y (x − αi )li i=1 m X m gi (z ) (1 − x)m−i xi , i i=1 = k = k r Y i=0 li − αi (1 − x) + (1 − αi )x where the root αi has multiplicity li , T z = α1 · · · αr , and (5.1) r X (5.2) li = m, i=1 m X m i k = (−1) (−1) ci , i i=0 m which is obtained by considering the coefficient of xm in the power basis form of p(x). Each coefficient gi (z ) in (5.2) is obtained by convolution followed by division by mi , CHAPTER 5. A ROBUST POLYNOMIAL ROOT FINDER 131 as shown in Examples 5.2 and 5.3. It follows from (5.1) and (5.2) that the distinct roots αi can be determined from the coefficients ci by solving the over-determined non-linear equations kgi (α1 , . . . , αr ) = ci , i = 0, . . . , m, which is a set of (m + 1) equations in r unknowns. These equations can be written as G(z) =p, where G(z), p∈ Rm+1 , kg0 (α1 , . . . , αr ) .. . kgm (α1 , . . . , αr ) and thus the minimisation problem is minm z ∈C = c0 .. . cm , 1 kkG(z ) − pk2 , 2 which is identical to the minimisation problem considered in Section 4.6.3. The stationary points of 1 2 kkG(z ) − pk2 are the solutions of J(z )T [kG(z ) − p] = 0, (5.3) where the Jacobian matrix J(z ), which is of order (m + 1) × r, is ∂g0 (z ) ∂g0 (z ) ∂g0 (z ) · · · ∂α2 ∂αr ∂α1 ∂g (z ) ∂g (z ) k∂g1 (z ) 1 1 ··· ∂α1 ∂α2 ∂αr J(z ) = k . . . . . . . . . . . . . ∂gm (z ) ∂gm (z ) ∂gm (z ) · · · ∂α1 ∂α2 ∂αr The Gauss-Newton iteration for the solution of (5.3) is z j+1 = z j − J(z j )† [kG(z j ) − p] , j = 0, 1, . . . , (5.4) where the initial estimates z 0 of the roots are calculated by Uspensky’s method. It is clear that the iteration (5.4) is only defined if the left inverse J(z )† of J(z ) CHAPTER 5. A ROBUST POLYNOMIAL ROOT FINDER 132 exists, which requires that J(z ) have full column rank. It is shown in [54], for power basis polynomials, that this condition is necessarily satisfied because the roots αi , i = 1, . . . , r, are distinct, and the proof is easily extended to Bernstein basis polynomials. This paper also contains examples of the application of this method to the computation of multiple roots of a polynomial. 5.3 Summary This section has considered the implementation of Uspensky’s algorithm for the calculation of the roots of a polynomial. The method relies extensively on the GCD computations discussed in Chapter 4 because they are used to obtain initial estimates of the multiplicities of the roots. This reduces the computation of the multiple roots of a polynomial to the computation of the simple roots of several polynomials. These simple roots are then refined by the method of non-linear least squares, using the calculated multiplicities as prior information to obtain improved solutions. Chapter 6 Minimum description length An important part of the polynomial root solver described in Chapter 5 is the GCD calculations that are used to obtain the multiplicities of the roots, which are then used as prior knowledge in the solution of the non-linear equation that is used for the refinement of the initial estimates of the roots. The calculation of the multiplicity of each root requires that the numerical rank of a matrix be determined, which necessitates that a threshold be set, below which the singular values of the matrix can be assumed to be zero. This threshold is dependent upon the amplitude of the noise in the coefficients of the polynomial, but this may not be known, or it may only be known approximately. 
It is therefore desirable to estimate the rank of a noisy matrix, assuming that this noise amplitude is not known. This section describes the principle of minimum description length (MDL), which is an information theoretic measure that provides an objective criterion for the selection of one hypothesis from a collection of hypotheses in order to explain or model a given set of data [14, 36, 37, 38, 41]. 133 CHAPTER 6. MINIMUM DESCRIPTION LENGTH 6.1 134 Minimum description length Every set of data can be represented by a string of symbols from a finite alphabet, which is usually the binary alphabet, but tertiary and higher alphabets can be used. The fundamental idea of the principle of MDL is that any regularity in a given set of data can be used to compress it, that is, the data can be described by fewer symbols than are needed to describe it literally. Grünwald [14], pages 6-7, considers three long sequences of bits that differ in their regularity: • The first example consists of the repetition of the sequence of 0001 2500 times. The entire sequence, which is 10000 bits long, is therefore highly regular and it can be compressed significantly, that is, a short code is required to describe the entire sequence. • The second example of 10000 bits consists of the outcomes of tossing a fair coin, and it is therefore totally random. The bit sequence does not contain any regularity, and it cannot therefore be compressed. • The third example contains elements of the first and second examples because this bit stream contains about four times as many 0s as 1s, and deviations from this regularity are statistical rather than deterministic. Compression of this data is possible, but some information will be lost. The length of the code required to describe this bit stream is therefore between the lengths of the codes required for the two examples above. The application of the principle of MDL requires that the data be coded, such that the regularity of the data can be quantified. In particular, highly regular data requires a short code length (bit stream), and the code length increases as the randomness CHAPTER 6. MINIMUM DESCRIPTION LENGTH 135 of the data increases, such that totally random data cannot be compressed. This is closely related to the complexity theory developed by Kolmogorov, which is the fundamental idea of the theory of inductive inference developed by Solomonoff. The complexity theory of Kolmogorov does not provide a practical method for performing inference because it cannot be computed, and the principle of MDL can be considered as an attempt to modify Kolomogorov’s complexity theory, such that the revised theory is amenable to practical implementation. The principle of MDL requires that several hypotheses for a set of data be postulated, and it selects the hypothesis that compresses the data the most, that is, requires the fewest bits for its description. The compressive measure of each hypothesis is equal to the sum of the code lengths, in bits, of encoding the data using the hypothesis, and then decoding the encoded data, that is, estimating the error between the actual data and the data calculated by the model. Example 6.1. Consider Figure 6.1, which shows a set of points through which it is required to fit a polynomial. Figure 6.1(a) is an approximation curve that is obtained with a third order polynomial. It is relatively simple, but the error between it and the data points is small. 
By contrast, the polynomial curve in Figure 6.1(b) is of higher order and follows the exact fluctuations in the data, rather than the general pattern that underlies it, and Figure 6.1(c) is too simple because it does not capture the regularities in the data. Example 6.1 illustrates the general point that a very good fit is obtained if a high degree polynomial (that is, a complex model) is fitted through a set of data points, and a poor fit is achieved by the simple linear approximation of these points. The third order polynomial achieves a compromise because the complexity is sufficient to 136 CHAPTER 6. MINIMUM DESCRIPTION LENGTH 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 (a) 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 (b) 1 0 0 1 (c) Figure 6.1: (a) A third order approximation polynomial, (b) a fifth order interpolation curve, and (c) a linear approximation, of a set of data points. capture the regularity in the data, but the errors are small. Furthermore, if these three curves are tested against another set of data that originates from the same source, the fit for the fifth order polynomial and linear curves will be poor. By contrast, the errors between the third order polynomial and this new data set will be small. The complexity of a model in the principle of MDL is measured by the length, in bits, of its description. This measure of complexity and Example 6.1 suggest that the ‘best’ model among a given collection of models is the one that yields the shortest description of the model and the data: • The length of the model: The length L(Hi ) of the model Hi in the collection of all the models H = {H1 , H2 , . . . , HM } is defined as the code length of its encoded form. • The length of the data: The length L(D| Hi) is equal to the code length of the data D using the model Hi . CHAPTER 6. MINIMUM DESCRIPTION LENGTH 137 With reference to Example 6.1, M = 3 because there are three models: • H1 : Cubic polynomial approximation • H2 : Fifth order polynomial interpolation • H3 : Linear polynomial approximation and • L(H1 ) is the length of encoding the number of coefficients (‘4’), and their values. • L(H2 ) is the length of encoding the number of coefficients (‘6’), and their values. • L(H3 ) is the length of encoding the number of coefficients (‘2’), and their values. The length of description of each model (hypothesis) L(D| Hi) is a measure of the goodness-of-fit of the model Hi . Thus: • L(D| H1 ) is small because the error between the approximating curve and data points is small. • L(D| H2 ) is zero because the curve passes through the data points. • L(D| H3 ) is large because the error between the straight line and data points is large. It is seen that L(H2 ) + L(D| H2 ) is dominated by L(H2 ) (the complexity of the model), and L(H3 ) + L(D| H3 ) is dominated by L(D| H3 ) (the large errors in the reconstruction of the data). The principle of MDL states that the ‘best’ hypothesis is the one for which the length L(Hi ) + L(D| Hi), i = 1, 2, 3, is a minimum, and for the data in Example 6.1, L(H1 ) + L(D| H1 ) < L(H2 ) + L(D| H2 ), L(H3 ) + L(D| H3 ), CHAPTER 6. MINIMUM DESCRIPTION LENGTH 138 that is, the model H1 would be selected by the principle of MDL. The cubic approximation provides a compromise between the complexity of the model and the reconstruction errors of the data, and it therefore avoids the problem of overfitting, which can be a problem in regression if care is not taken. 
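The selection rule based on L(Hi) + L(D|Hi) can be illustrated with a small sketch. The two-part score used below, (k/2) log N for the k model coefficients plus (N/2) log(RSS/N) for the residuals, is only one common way of approximating the two code lengths, and it is not the coding scheme developed in these notes; the data set and the candidate degrees are hypothetical.

```python
import numpy as np

def two_part_length(x, y, degree):
    # A rough two-part description length for a polynomial model of the data:
    # the first term grows with the number of coefficients (model cost), the
    # second with the residual sum of squares (cost of the data given the model).
    N = len(x)
    coeffs = np.polyfit(x, y, degree)
    rss = float(np.sum((np.polyval(coeffs, x) - y) ** 2))
    return 0.5 * (degree + 1) * np.log2(N) + 0.5 * N * np.log2(rss / N)

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 25)
y = 1.0 - 3.0 * x + 2.0 * x ** 3 + 0.05 * rng.standard_normal(x.size)
lengths = {d: two_part_length(x, y, d) for d in (1, 3, 5)}
print(lengths)    # the cubic model typically attains the smallest length
```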
This simple example shows that the principle of MDL is consistent with the principle of parsimony or Occam’s Razor if the parsimony of a model is interpreted as its length: Choose the model that gives the shortest description of the data It is clear that the principle of MDL requires that the length of description of a model be quantified, and the following examples show that this description is closely related to the Shannon entropy and the average length of a coded message. 6.2 Shannon entropy and the length of a code This section shows that the Shannon entropy of a binary string allows an expression for the lengths (number of bits) of a model L(H), and the data given the model L(D| H), to be quantified. Let x= x1 x2 . . . xN , be a string of symbols from a finite alphabet X . An efficient coding scheme for these symbols requires that symbols that occur frequently have shorter code lengths than symbols that occur rarely. If lk = l(xk ) is the length of the code for the symbol xk ∈ X , and pk = p(xk ) is the probability of occurrence of xk , then the average length CHAPTER 6. MINIMUM DESCRIPTION LENGTH 139 of a code is E {L(x)} = X li pi . i The Shannon entropy of the code is X H(p(x)) = − pi log pi , i log ≡ log2 , and a standard result in information theory states that if the code is a prefix (instantaneous) code, that is, the codeword for xk is not the prefix of the codeword for xl , k 6= l, then the Kraft inequality N X k=1 2−l(xk ) ≤ 1, (6.1) is satisfied [30], pages 94-97. In this circumstance, H(p(x)) ≤ E {L(x)} ≤ H(p(x)) + 1, from which it follows that the Shannon entropy is a lower bound for the average length of a prefix code. The minimum value of the average length of a prefix code occurs, therefore, when the length of the code for the symbol xk is lk = − log pk . (6.2) This is an intuitively appealing result because it states that a symbol that occurs more frequently (a large value of pk ) has a shorter code length than a symbol that occurs less frequently (a small value of pk ). It is clear that lk is an integer only if the probability pk is of the form 2−q , q > 0, which cannot be guaranteed in practice. If it is required to construct a code for the symbols in X , then lk is the smallest integer that is equal to or larger than − log pk . Example 6.2. The code length of a deterministic integer j ∈ {0, 1, . . . , N − 1} is approximately equal to log N. This result follows from (6.2) by assuming that each CHAPTER 6. MINIMUM DESCRIPTION LENGTH 140 of the N integers has an equal probability of occurring. Example 6.2 considers the code length of an integer that lies in a defined range, and thus an upper bound on the length can be specified. The situation is more complicated when the magnitude of the integer is not known, and this is considered in Example 6.3. Example 6.3. Consider the situation that occurs when it is required to transmit the binary representation of a natural number n > 0 of unknown magnitude. If this number is followed by other numbers whose binary forms are to be transmitted, then it is necessary to mark the junction between them. One possible way to achieve this is to precede each binary representation by its length, in bits, and transmit this number, in addition to the binary representation of the number. For example, if it is required to transmit the number n = 101101000010110, then the code 1111, which is the binary representation of 15, the length (number of bits) of n, is transmitted, and thus the actual bit stream transmitted is s = 1111101101000010110. 
This does not solve the problem, however, because the end of the bit stream that represents log n, and the beginning of the bit stream that represents n, must be defined. Furthermore, the string s will, in practice, be preceded and followed by other bit streams, and the entire bit stream can be decoded only if (a) the codes for n and log n, and (b) the codes for successive integers, can be distinguished.

In order to solve this problem, Rissanen [36] proposed that the binary representation of n be preceded by the binary representation of its length log n, which requires approximately log log n bits, and that this length in turn be preceded by the binary representation of its own length, and so on. This process is repeated, so that the total code length of n > 0 is

L^*(n) = \log^* n + \log c_0 = \log n + \log\log n + \log\log\log n + \cdots + \log c_0,    (6.3)

where the sum only involves positive terms, and log n is rounded up to the nearest integer.¹ The constant c_0 ≈ 2.865064 is added to make sure that the Kraft inequality (6.1) is satisfied with equality,

\sum_{j=1}^{\infty} 2^{-L^*(j)} = 1.    (6.4)

¹ It is necessary to round log n up to the nearest integer because, for example, 3 bits are required to represent 7 in binary and log 7 ≈ 2.807, and 4 bits are required to represent 8 in binary, but log 8 = 3.

If n can take both positive and negative values, then this result is generalised to [41]

L^*(n) = \begin{cases} 1 & \text{if } n = 0 \\ \log^* |n| + \log 4c_0 & \text{otherwise}, \end{cases}    (6.5)

where log* n is defined in (6.3), and the Kraft inequality (6.4) is replaced by

\sum_{j=-\infty}^{\infty} 2^{-L^*(j)} = 1,

because the summation is taken over all negative and positive integers. In particular,

\sum_{j=-\infty}^{\infty} 2^{-L^*(j)} = \sum_{j=-\infty}^{-1} 2^{-L^*(j)} + \frac{1}{2} + \sum_{j=1}^{\infty} 2^{-L^*(j)}
                                      = 2 \sum_{j=1}^{\infty} 2^{-L^*(j)} + \frac{1}{2}
                                      = 2 \sum_{j=1}^{\infty} 2^{-\log^* j - \log 4c_0} + \frac{1}{2}            from (6.5)
                                      = \frac{1}{2} \sum_{j=1}^{\infty} 2^{-\log^* j - \log c_0} + \frac{1}{2}
                                      = \frac{1}{2} + \frac{1}{2} = 1                                            from (6.3) and (6.4),

as required.

The coding induced by (6.3) and (6.5) is called Elias omega coding, and it can be used to code and decode the natural numbers. The following procedure is used to code a natural number using this scheme:

1. Place the bit 0 at the end of the representation.
2. IF the number to be encoded is 1, STOP. ELSE place the binary representation of the number, as a string, at the beginning of the representation.
3. Repeat the previous step, using one digit less than the number of digits just written as the new number to be encoded.

Example 6.4. The binary representation of 14 is 1110, and the length of this string is 4. The output of Stage 2 is therefore 1110 0, where the space is included for ease of reading, but it is not transmitted. Since 4 digits have been added, it follows that the binary representation of 4 − 1 = 3 must be added at the beginning of 1110 0, thus yielding the string 11 1110 0. Since 2 digits have been added, the next stage uses 2 − 1 = 1 as the new number to be encoded, and it follows from the IF statement that the algorithm terminates. The Elias omega code of 14 is therefore 11 1110 0.

The Elias omega codes for the integers 1-17 are shown in Table 6.1, and it is seen that the code for 1 is an exception because it is the only integer whose code starts with 0.

Integer    Elias omega code
1          0
2          10 0
3          11 0
4          10 100 0
5          10 101 0
6          10 110 0
7          10 111 0
8          11 1000 0
9          11 1001 0
10         11 1010 0
11         11 1011 0
12         11 1100 0
13         11 1101 0
14         11 1110 0
15         11 1111 0
16         10 100 10000 0
17         10 100 10001 0

Table 6.1: The Elias omega code for the integers 1-17.
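The coding procedure translates directly into code. The following Python sketch is an illustrative implementation (the function names are not from these notes): the encoder follows the three-step procedure above, and the decoder reads successive length fields until the terminating 0 is reached, anticipating the worked decodings of Example 6.5 below; the assertions check the sketch against Table 6.1.

    def elias_omega_encode(n: int) -> str:
        """Elias omega code of a natural number n >= 1, following the procedure above."""
        code = "0"                      # Step 1: place the bit 0 at the end
        while n > 1:                    # Step 2: stop when the number to be encoded is 1
            b = bin(n)[2:]              # binary representation of the number, as a string
            code = b + code             # placed at the beginning of the representation
            n = len(b) - 1              # Step 3: one digit less than the number just written
        return code

    def elias_omega_decode(bits: str) -> int:
        """Recover the integer at the start of an Elias omega coded bit string."""
        n, i = 1, 0
        while bits[i] == "1":           # a 0 in this position is the end-of-code marker
            width = n + 1               # read the next n + 1 bits as a binary number
            n = int(bits[i:i + width], 2)
            i += width
        return n

    # Checks against Table 6.1 and Examples 6.4 and 6.5:
    assert elias_omega_encode(14) == "1111100"        # 11 1110 0
    assert elias_omega_encode(17) == "10100100010"    # 10 100 10001 0
    assert elias_omega_decode("1111010") == 13
    assert elias_omega_decode("10100100010") == 17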
The following examples show how to decode an integer that has been coded using the Elias omega code. Example 6.5. It is seen from Table 6.1 that the integer to be decoded is 1 if the first bit is 0, and the integer is greater than 1 if the first bit is 1. 1. Consider the integer represented by the string 1111010. Since the first bit is 1, the integer cannot be 1. The first two bits 11 are therefore considered, and since this is the binary representation of 3, the next 3 + 1 = 4 bits, that is, 1101 are considered. This is the binary representation of 13, and since the next bit in the string is 0, the decoding procedure is terminated. It follows that the integer represented by the string 1111010 is 13. 2. Consider the integer represented by the string 10100100010. It is clear that the integer cannot be 1, and thus the first two bits 10 are considered. This is the binary representation of 2, and hence the next 2 + 1 = 3 bits, that is, 100 are considered. This is the binary representation of 4, and thus the next 4 + 1 = 5 bits, that is, 10001 are considered. This is the 144 CHAPTER 6. MINIMUM DESCRIPTION LENGTH binary representation of 17, which is the desired integer because the next bit in the string is 0, which is the end of string marker. It follows that the integer represented by the string 10100100010 is 17. It is clear from this example that the termination bit 0 enables a long string that represents the concatenation of several integers to be decoded. Equation (6.2) applies to a discrete random variable, and its extension to a continuous random variable θ is obtained by discretisation. In particular, consider the situation that occurs when θ ranges over a subset A of the real line, in which case A can be discretised into a finite number of intervals, which enables the result for a discrete random variable to be applied. This is considered in the next example, where the discretisation is performed in Rk , thereby yielding a cell of dimension k. Example 6.6. Consider a probability density function p(θ) = p(θ1 , . . . , θk ) of k continuous random variables θ1 , . . . , θk , where θ ∈ Rk . The probability that the random variable θ lies between θ = Θ and θ = Θ + δΘ is equal to p(Θ) k Y δΘi , i=1 and thus − log p(Θ) k Y i=1 δΘi ! = − log p(Θ) − k X log δΘi . (6.6) i=1 This equation is obtained by considering a cell of side lengths δΘi > 0 and centred at θ = Θ. The exact code for a deterministic real variable v ∈ R generally requires an infinite number of bits, which must be compared with the finite number of bits that are required for the code of an integer. Truncation is therefore necessary for the binary representation of v, and (6.6) allows the code length L(vǫ ) for v to precision ǫ > 0 to CHAPTER 6. MINIMUM DESCRIPTION LENGTH 145 be defined as L(vǫ ) = L∗ ([v]) − log ǫ, (6.7) where vǫ is the value of v truncated to precision ǫ, that is, |v − vǫ | < ǫ, [v] is the integer nearest to v, and L∗ (j) is defined in (6.3) for j > 0, and in (6.5) for j ∈ Z. The second term on the right hand side of (6.7) is the number of fractional bits of the error due to truncation. Example 6.7. [41] Consider a discrete degradation model of exact data f ∈ RN by an error vector e ∈ RN , such that only noisy data d ∈ RN is available, d = f + e, e ∼ N (0, s2 I). (6.8) It is required to estimate the exact signal f , given noisy data d, and an algorithm that solves this problem involves the construction of a library L = {B1 , B2 , . . . , BM } of bases. 
If it is assumed that the unknown signal f can be decomposed exactly by k < N elements of one basis Bm , then f = Wm a(k) m , (6.9) (k) where Wm ∈ RN ×N and am ∈ RN is the vector of coefficients in the basis Bm . It is noted that the correct integer k and correct basis Bm are not known, and that (6.9) is a chosen model of the exact data f , rather than a physical model that can be used to explain it. The estimation problem has been transformed into a model selection problem, where the models are defined by the bases in the library L, and the number of terms in each basis, assuming additive white Gaussian noise of known variance. Data compression requires that k be as small as possible, but it is also required to minimise the distortion between the noisy and exact signals by choosing the most suitable basis, and this distortion usually decreases as k increases. There therefore 146 CHAPTER 6. MINIMUM DESCRIPTION LENGTH exists a conflict between data compression and data reconstruction, and the method of MDL allows this conflict to be resolved. 6.3 An expression for the code length of a model The examples in Section 6.2 allow the principle of MDL to be stated more clearly. Specifically, let H = {Hm : m = 1, 2, . . . } be a collection of models, where the integer m is the index of a particular model, and let x be a data vector of length N. It is assumed that the true model that generated the data x is not known. The code length for the selection of best model from H is L(x, θm , m) = L(m) + L(θm | m) + L(x| θm , m) km X L(θm,j | m) + L(x| θm , m), = L(m) + (6.10) j=1 where θm ∈ Rkm is the parameter vector of the model Hm . It is seen that the total code length is composed of the sum of three terms: • The code length of the index m • The code length of the model Hm , given m • The code length of the reconstructed data x using the model Hm The second term on the right hand side of (6.10) must be replaced by the code length of θ, truncated to precision δθm , as given in (6.7), and thus (6.10) is written as L(x, θm , m) = L(m) + km X i=1 L∗ ([θm,i ]) − km X log δθm,i + L(x| θm , m). (6.11) i=1 As each component δθm,i increases, corresponding to a coarser precision, the third term on the right hand side decreases, but L(x| θm , m) increases because the truncated 147 CHAPTER 6. MINIMUM DESCRIPTION LENGTH parameter vector can deviate more from its optimal non-truncated value. It therefore follows that it is necessary to determine the optimal value of the precisions δθm,i , such that the total code length is minimised. This calculation can be simplified by noting that the precision is independent of the index m, that is, it can be assumed that there is only one model in the set H, in which case (6.10) and (6.11) reduce to L(x, θ) = L(θ) + L(x| θ), (6.12) and L(x, θ) = k X i=1 ∗ L ([θi ]) − k X log δθi + L(x| θ), i=1 respectively. The calculation of the precision δθ that minimises the total code length is considered in the next section. 6.3.1 The precision that minimises the total code length Let θ = α be the value of θ that minimises the code length L(x, θ), which is defined in (6.12). As shown above, this optimal value must be truncated, and thus let θ̄ be the truncated value, to precision δα, of α. The parameter vectors θ̄ ∈ Rk and α ∈ Rk are related by θ̄ = α + δα, and it is required to determine δα such that the code length L(x, θ̄) is minimised. 
Consider the Taylor expansion of L(x, θ̄) about α, k k X ∂L(x, θ) 1X ∂ 2 L(x, θ) L(x, θ̄) = L(x, α) + + δαi δαj δαi ∂θ 2 ∂θ ∂θ i i j θ=α θ=α i,j=1 i=1 3 +O kδαk k 1X ∂ 2 L(x, θ) 3 = L(x, α) + δαi δα + O kδαk , j 2 i,j=1 ∂θi ∂θj θ=α 148 CHAPTER 6. MINIMUM DESCRIPTION LENGTH since α is the optimal non-truncated value of θ. It therefore follows from (6.7) that up to and including second order terms, k X 1 L(x, θ̄) = L (x, [α]) + δαT Γδα − log δαi , 2 i=1 (6.13) where Γ ∈ Rk×k is the matrix of the second derivatives of the code length L(x, θ) evaluated at θ = α and L (x, [α]) is the code length of L(x, α) when α is evaluated to precision δα. The term on the right hand side of (6.13) is a minimum when T 1 Γδα = β, β = δα1 δα1 . . . δα1 , δα 1 2 k−1 (6.14) k and thus the optimal truncation parameters are δα = Γ−1 β. Equation (6.13) therefore yields k L(x, θ̄) min k X ≤ L (x, [α]) + − log δαi , 2 i=1 (6.15) where an inequality has been used because the truncations δαi that minimise the code length L(x, θ̄) must still be determined. It is now shown that if the data vector x is sufficiently long, that is, N ≫ 1, then an expression for an upper bound of the minimum of L(x, θ̄) can be obtained. It follows from (6.12) that L (x, [α]) = L([α]) + L (x| [α]) , and thus (6.15) becomes L(x, θ̄) min k k X ≤ L([α]) + L (x| [α]) + − log δαi . 2 i=1 (6.16) This expression for the upper bound of the code length contains two parameters, α and δα, whose values must be determined. Consider initially the value of α, after which the form of δα for large values of N will be determined. The value of α must be calculated from the data x, and it is usual to select 149 CHAPTER 6. MINIMUM DESCRIPTION LENGTH the maximum likelihood (ML) estimate of θ, or an estimate based on a Bayesian procedure. The ML estimate is used in this work, and thus α = θ̂. The second term on the right hand side of (6.16) can be approximated by a simpler expression. In particular, this term is the code length of the reconstruction of the data x from [θ̂], the truncated form of the ML estimate of the parameter vector. In practice, however, it is rarely required to obtain [θ̂], and its non-truncated form, up to machine precision, is adequate because the likelihood surface is usually smooth. The second term on the right of (6.16) is therefore written as − log p(x|θ̂) − N log δd , δd ≈ 10−16 , where δd is the machine precision. The term N log δd is constant for all models in H and it can therefore be omitted from the expression for the upper bound of the minimum of L(x, θ̄), and thus (6.16) can be written as k L(x, θ̄) min k X ≤ L([θ̂]) + L(x|θ̂) + − log δ θ̂i . 2 i=1 (6.17) Let q(θ) be the prior probability density function of the parameter vector θ. Typically, it is obtained from training data, and thus it is independent of the data x. It therefore follows that (6.17) becomes k k X L(x, θ̄ǫ ) min ≤ − log q([θ̂]) − log p(x|θ̂) + − log δ θ̂i 2 i=1 k k X = − log p(x|θ̂)q([θ̂]) + − log δ θ̂i 2 i=1 k k X = − log p(x|θ̂)q(θ̂) + − log δ θ̂i , 2 i=1 because, as noted above, the non-truncated value of θ̂ is usually adequate. CHAPTER 6. MINIMUM DESCRIPTION LENGTH 150 Since it follows that L(x, θ̂) = L(θ̂) + L(x| θ̂) = − log p(x|θ̂)q(θ̂) , ∂ 2 log (p(x| θ)q(θ)) ∂ 2 L(x, θ) Γij = =− , ∂θi ∂θj θ=θ̂ ∂θi ∂θj θ=θ̂ (6.18) where Γ is defined in (6.13). 
Since p(x, θ) = p(x| θ)q(θ), (6.19) where p(x, θ) denotes the joint probability density function of θ and x, it follows that if the reconstruction errors of the data are independent, then p(x| θ) = N Y i=1 and thus − log p(x, θ) = − N X i=1 p(xi | θ), (6.20) log p(xi | θ) − log q(θ). Differentiation of both sides of (6.19) with respect to θ and the evaluation of the derivatives at θ = θ̂, followed by division by N, yields 1 ∂ 2 log p(x, θ) 1 ∂ 2 log p(x| θ) 1 ∂ 2 log q(θ) − =− − . N ∂θi ∂θj θ=θ̂ N ∂θi ∂θj N ∂θi ∂θj θ=θ̂ θ=θ̂ The second term on the right decreases to zero as N → ∞ because q(θ) is independent of N, and thus if N is sufficiently large, ∂ 2 log p(x, θ) ∂ 2 log p(x| θ) ≈ . ∂θi ∂θj θ=θ̂ ∂θi ∂θj θ=θ̂ It follows from (6.20) that − log p(x| θ) = − N X i=1 log p(xi | θ) ≈ −N (E {log p(x| θ)}) , (6.21) 151 CHAPTER 6. MINIMUM DESCRIPTION LENGTH where the expectation is taken with respect to x, and thus ∂ 2 log p(x| θ) ∂ 2 (E {log p(x| θ)}) − ≈ −N = µij N, ∂θi ∂θj θ=θ̂ ∂θi ∂θj θ=θ̂ where µij is a finite constant that is independent of N. Equations (6.18) and (6.21), and this approximation, therefore yield ∂ 2 L(x, θ) ∂ 2 log p(x, θ) ∂ 2 log p(x| θ) Γij = =− ≈− ≈ µij N, ∂θi ∂θj θ=θ̂ ∂θi ∂θj θ=θ̂ ∂θi ∂θj θ=θ̂ for i, j = 1, . . . , k, which implies that the elements of the matrix Γ/N are independent of N. This result is used in (6.14) to obtain an approximate expression for the elements of the vector δ θ̂. In particular, this equation can be written as k X 1 Γij δ θ̂j = , i = 1, . . . , k, N N δ θ̂ i j=1 which is approximated by N k X j=1 µij δ θ̂j = 1 δ θ̂i , i = 1, . . . , k. Since the constants µij and k are independent of N, it follows that where ci ≪ √ ci δ θ̂i ≈ √ , N ci = ci (µij , k), i, j = 1, . . . , k, N because δ θ̂i ≪ 1. Equation (6.17) therefore becomes L(x, θ̄) k min X k k ≤ L([θ̂]) + L(x|θ̂) + + log N − log ci , 2 2 i=1 and since it is assumed that N is large, this inequality is simplified to k L(x, θ̄)min ≤ L([θ̂]) + L(x|θ̂) + log N, 2 (6.22) which is an expression for the upper bound of the shortest code length with which long data sequences can be encoded. The last term on the right states that each parameter √ is encoded to precision 1/ N, and it is noted that this result is in agreement with the 152 CHAPTER 6. MINIMUM DESCRIPTION LENGTH distribution of the sample mean from a population that has a normal distribution, σ σx̄ = √ , N √ that is, the standard deviation of the sample mean σx̄ varies as 1/ N with the population standard deviation σ, where N is the size of the sample. The derivation of (6.22) assumes that the collection H contains only one model, and it is therefore necessary to extend this expression when there are several models. It follows from (6.11) and (6.22) that the expression for the minimum code length is L(x, θ̂m , m) = L(m) + km X L([θ̂m,j ]) + L(x| θ̂m , m) + j=1 km log N, 2 (6.23) where the model θm has km parameters. The minimum value of this expression yields the best compromise between the low complexity of the model and the high likelihood of the data. The code length L(m) = − log p(m) is the probability of selecting the mth model, and p(m) should reflect prior information about the models, such that models that are more likely to describe the data should be assigned a higher value of p(m) than other models. If this prior information does not exist, then it is assumed that all models are equally likely, and the uniform distribution should be used. 
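For concreteness, the terms in (6.23) are straightforward to assemble once the maximum likelihood fit of each model is available. The following Python sketch is a simplified illustration: it implements the universal integer code length of (6.3) and (6.5) without the rounding of log n to the nearest integer, assumes a uniform prior over the M models, and leaves the negative log2-likelihood term L(x|θ̂m, m) to be supplied by the caller. The names and structure are illustrative assumptions, not a prescription from these notes.

    import math

    C0 = 2.865064          # the constant in (6.3)

    def log_star(n: int) -> float:
        """log* n = log n + log log n + ...  (positive terms only, logarithms to base 2)."""
        total, t = 0.0, float(n)
        while True:
            t = math.log2(t)
            if t <= 0.0:
                break
            total += t
        return total

    def L_universal(n: int) -> float:
        """Code length of an integer of unknown magnitude and sign, equation (6.5)."""
        return 1.0 if n == 0 else log_star(abs(n)) + math.log2(4.0 * C0)

    def code_length(neg_log2_likelihood: float, theta_hat, N: int, M: int = 1) -> float:
        """Approximate total code length (6.23) of one model, in bits."""
        k = len(theta_hat)
        L_m = math.log2(M)                                          # uniform prior over the M models
        param_bits = sum(L_universal(round(t)) for t in theta_hat)  # sum of L*([theta_hat_j])
        return L_m + param_bits + neg_log2_likelihood + 0.5 * k * math.log2(N)

The model that minimises this quantity over the collection H would then be selected.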
The following two points are noted: • Even if the collection H of models does not include the correct model, the principle of MDL achieves the best result among the available models. • It is not claimed that the principle of MDL computes the absolute minimum description of the data. Rather, it requires an available collection of models and provides a criterion for selecting the best model from the collection. This must be compared with the Kolmogorov complexity, which provides the true CHAPTER 6. MINIMUM DESCRIPTION LENGTH 153 minimum description of the data, but it cannot be computed. 6.4 Examples This section contains several examples that show the practical application of the principle of MDL for the solution of some common problems in applied mathematics. Example 6.8. [41] Consider N data points (xi , yi ) ∈ R2 through which it is required to fit a polynomial. It is clear that the maximum degree of an approximating polynomial is N − 1, and thus the class of models is the set of polynomials of orders {0, 1, . . . , N − 1}. The parameters of the mth model, m = 0, . . . , N − 1, are the coefficients of a polynomial of degree m, θm = [a0 , a1 , . . . , am ]. It is assumed that the data is corrupted by Gaussian white noise of zero mean and standard deviation s2 , and it is required to compute the polynomial f , where yi = f (xi ) + ei , ei ∼ N (0, s2). The discussion above shows that it is first necessary to compute the code length of the mth model, that is, the code length of the m + 1 coefficients of a polynomial of degree m. The ML estimates of these coefficients are â = â0 , â1 , . . . , âm , and since the noise is composed of independent random samples, the ML estimates of these coefficients are equal to the least squares estimates. It is assumed that the independent variables xi , i = 1, . . . , N, and the noise variance s2 are known to the encoder and decoder, and they need not therefore be transmitted. If a polynomial of degree N − 1 were used to interpolate the data, then the fit would be perfect (no reconstruction errors) but information would not have been 154 CHAPTER 6. MINIMUM DESCRIPTION LENGTH gained because compression has not occurred. The other extreme occurs when the approximating polynomial is the constant polynomial. Although this is the simplest model and it permits maximum compression, a large number of bits are required to quantify the reconstruction errors, unless the underlying data is constant, in which case the reconstruction errors are zero. Consider the situation that occurs when prior information on the degree m of the approximating polynomial is not available, in which case L(m) = − log p(m) = log N. The error in the ith data point, i = 1, . . . , N, is m X ei = yi − âj xji , j=0 and thus its probability density function is √ 1 2πs2 exp − e2i 2s2 =√ 1 2πs2 yi − exp − Pm j j=0 âj xi 2s2 2 . Since there are N data points and the errors are assumed to be independent, the joint probability density function p(e) of these errors is 2 PN Pm j N Y i=1 yi − j=0 âj xi 1 e2 1 exp − i2 = , N N exp − 2 2s 2s (2πs2 ) 2 i=1 (2πs2 ) 2 and thus the code length of the reconstructed data is − log p(e), !2 N m X X N log e j − log p(e) = log(2πs2 ) + yi − âj xi . 2 2s2 i=1 j=0 CHAPTER 6. 
MINIMUM DESCRIPTION LENGTH 155 It follows that the expression (6.23) for the total code length is m X (m + 1) N L(y, θm , m) = log N + L∗ ([âj ]) + log N + log(2πs2 ) 2 2 j=0 !2 N m X log e X + 2 yi − âj xji , 2s i=1 j=0 and the principle of MDL requires that the ‘best polynomial’ is obtained by choosing the degree m∗ that minimises this expression. Since N is constant and it is assumed that the variance s2 is known to the encoder and decoder, the first and fourth terms are constant, and they can therefore be neglected in the minimisation. The next example extends Example 6.7 by considering the formulae for the code lengths of the index m, the model Hm given m, and the reconstruction of the data x using the model Hm . Example 6.9. Consider an encoder and decoder for the data in Example 6.7, in which the library L consists of M orthogonal bases. Given the integers k and m in (6.9), then 1. The encoder expands the data d in the basis Bm . 2. The number of terms k, the specification of the basis indexed by m, the k expansion coefficients, the variance s2 of the Gaussian noise, and the reconstruction errors are transmitted to the decoder. 3. The decoder receives this information, in bits, and attempts to reconstruct the data d. The total code length to be minimised is expressed as the sum of the following code lengths: 156 CHAPTER 6. MINIMUM DESCRIPTION LENGTH • The natural numbers k and m (k) • The (k + 1) real parameters am and s2 , given k and m (k) • The deviations of the observed data d from the estimated signal f = Wm am , (k) given k, m, am and s2 . The approximate total code length is therefore given by (k) ˆ2 (k) ˆ2 ˆ2 L(d, âm , s , k, m) = L(k, m) + L âm , s | k, m + L(d| â(k) m , s , k, m), (6.24) (k) (k) where âm and sˆ2 , the ML estimates of am and s2 respectively, are now derived. Since it is assumed that the noise is white and Gaussian, it follows from (6.8) that the probability density function of the data, given all the model parameters, is 2 (k) 1 d − Wm am 2 exp p(d| a(k) , s , k, m) = − , N m 2s2 (2πs2 ) 2 and thus the loglikelihood of this density function is 2 (k) d − W a m m N 2 2 ln 2πs − . ln p(d| a(k) , s , k, m) = − m 2 2s2 (6.25) Differentiation of both sides of this expression with respect to s2 yields the ML estimate sˆ2 of s2 , sˆ2 = 2 (k) d − Wm am N , and thus the loglikelihood expression (6.25) becomes 2 (k) 2π d − Wm am N ˆ2 , k, m) = − N ln ln p(d| a(k) , s − . m 2 N 2 (6.26) (6.27) Since Wm is orthogonal, the vector of expansion coefficients of d in the basis Bm is 157 CHAPTER 6. MINIMUM DESCRIPTION LENGTH equal to d˜m = WmT d, and thus 2 2 ˜ T (k) 2 (k) d − Wm a(k) = W W d − a = d − a m m m m m m . It follows from this equation that the loglikelihood expression (6.27) is maximised 2 (k) (k) when d˜m − am is minimised, and since am contains exactly k non-zero elements, this minimum occurs when these k elements are equal to the k largest coefficients in (k) magnitude of d˜m . The ML estimate âm is therefore given by (k) ˜ (k) T â(k) m = Θ dm = Θ Wm d, where Θ(k) ∈ RN ×N is a threshold matrix that retains the k largest elements of d˜m , and sets the other elements equal to zero. The substitution of this expression for the (k) ML estimate of am into (6.26) yields (I − Θ(k) )W T d2 m , sˆ2 = N (6.28) for the ML estimate of s2 , where I is the identity matrix of order N. The expressions for the ML estimates âm and sˆ2 enable (6.24) to be considered in (k) more detail. 
It is assumed that prior information on the value of m is not available, and thus L(m) = log M, which is a constant and can therefore be neglected. The integer k < N, the number of non-zero coefficients, must be transmitted, and this requires a maximum of about k log N bits, and thus L(k, m) = L(k) = k log N. The second term in (6.24) represents the code lengths of transmitting the ML (k) estimates âm and sˆ2 , and these lengths are (k + 1) ∗ (k) ∗ ˆ2 ˆ2 L(â(k) log N. m , s | k, m) = L ([âm ]) + L ([s ]) + 2 The third term in (6.24) is calculated from (6.27), but using logarithm to base 2 CHAPTER 6. MINIMUM DESCRIPTION LENGTH 158 instead of natural logarithms. In particular, it is easily verified that, using (6.28), 2 (k) d − W a m m N 2 2 log(2πs ) + log e − log p(d| a(k) , s , k, m) = m 2 2s2 2 N 2πe N = log + log (I − Θ(k) )WmT d , 2 N 2 where the first term can be ignored because it is independent of k and m, and similarly, the noise s2 is independent of k and m. It therefore follows that the total code length to be minimised is, ignoring all constant terms and assuming that prior information on m is not available, 2 ∗ (k) (k) ˆ2 ˆ2 L(d, â(k) m , ŝ , k, m) = k log N + L ([âm ], [s ]| k, m) + L(d| âm , s , k, m) k X 3k ∗ ˆ2 log N = log∗ â(k) m i + log [s ] + 2 i=1 2 N (6.29) + log (I − Θ(k) )WmT d , 2 where log∗ j, j ∈ Z, is defined in (6.3) and (6.5). The expression (6.29) is minimised over all values of k and m in the ranges 1≤k<N and 1 ≤ m ≤ M, respectively. If prior information on the values of k and m is available, then it can be included in (6.29). For example, if it is known that the number of terms k in the basis functions satisfies k1 ≤ k ≤ k2 and the uniform distribution is assumed in this range of k, then L(m) + log(k2 − k1 + 1) if k1 ≤ k ≤ k2 L(k, m) = +∞ otherwise. Example 6.10. Zaarowski [51] considers the application of the principle of MDL for the estimation of the rank of a noisy matrix when the noise is not known. CHAPTER 6. MINIMUM DESCRIPTION LENGTH 159 Consider a matrix A ∈ Rm×n of have rank r ≤ p = min (m, n) whose singular values are σ1 ≥ σ2 ≥ · · · ≥ σr > 0, σj = 0, j = r + 1, . . . , p. In many practical problems, the singular values are known approximately and not exactly, in which case only estimates σ̂i of the exact singular values σi are available, σi + ei i = 1, . . . , r σ̂i = (6.30) ei i = r + 1, . . . , p. It is assumed that the errors ei are statistically independent random variables with Gaussian and Laplacian probability density functions, √ 1 exp − e2i2 i = 1, . . . , r 2s 2πs p(ei ) = α exp(−αei ) i = r + 1, . . . , p, (6.31) where s, α > 0. This simple model is used because it enables considerable analytical progress to be made, and in particular, it provides a trade-off between a physically accurate model and a mathematically simple model. It is assumed that a polynomial provides a good model for the variation of the exact non-zero singular values σj , j = 1, . . . , r, with j, σ1 = a0 10 + a1 11 + · · · + ak 1k σ2 = a0 20 + a1 21 + · · · + ak 2k .. . σr = a0 r 0 + a1 r 1 + · · · + ak r k , that is, σj = k X l=0 al j l = b(j)T a, j = 1, . . . , r, (6.32) 160 CHAPTER 6. MINIMUM DESCRIPTION LENGTH where the vectors a and b(j) are, respectively, T a = a0 a1 · · · ak ∈ Rk+1 , and b(j) = j0 j1 · · · jk T ∈ Rk+1 . The integer k is the degree of the polynomial model of the singular values, which are arranged in non-increasing order. 
It follows that the interpolating polynomial cannot have maxima or minima, and thus a low degree polynomial, k = 2 or k = 3, is adequate. The integer k is therefore assumed to be a known constant. The r equations in (6.32) can be combined into one equation T b(1) σ 1 σ2 b(2)T . = . a, .. .. T b(r) σr and since the matrix in this equation is of order r × (k + 1), it follows that the least squares solution of this equation is unique if r ≥ (k + 1). Furthermore, since n ≥ r, it follows that r satisfies the inequalities (k + 1) ≤ r ≤ n. It follows from (6.30) and (6.31) that the joint probability density function of the random variables ei is p r X αp−r 1 X 2 − 2 e −α σ̂i r exp 2s i=1 i (2πs2 ) 2 i=r+1 ! , and the substitution of (6.30) and (6.32) into this expression yields the probability 161 CHAPTER 6. MINIMUM DESCRIPTION LENGTH density function for the estimates σ̂j of the exact singular values σj , !2 p r k p−r X X X α l − 1 pσ̂ = σ̂ − a j − α σ̂i r exp j l 2s2 j=1 (2πs2 ) 2 i=r+1 l=0 ! p r X 1 X αp−r 2 = − 2 σ̂j − b(j)T a − α σ̂i , r exp 2 2 2s (2πs ) j=1 i=r+1 (6.33) where pσ̂ = pσ̂ (σ̂| a, s2 , α, k, r). The ML estimate of α is obtained by setting the partial derivative of pσ̂ with respect to α equal to zero, which yields p−r α̂ = Pp ˆj j=r+1 σ . (6.34) Similarly, the ML estimate of the variance s2 satisfies r X 2 σ̂j − b(j)T a = r sˆ2 , (6.35) j=1 and the ML estimate of the vector a satisfies Hâ = q, where the coefficient matrix H is Hankel, and r X H= b(j)b(j)T ∈ R(k+1)×(k+1) and j=1 (6.36) q= r X j=1 σ̂j b(j) ∈ Rk+1 . Zarowski [51] uses (6.33), (6.34), (6.35) and (6.36) to derive an expression for the total code length, which he then minimises in order to find the best estimate of the rank r of A. This procedure yields an ill-conditioned linear algebraic equation because of the poor numerical properties of the interpolating polynomials that are stored in the vector b(j). This problem is overcome by using orthogonal polynomials, which are numerically well-behaved, and Zarowski therefore uses Gram polynomials [18], [28] and [35] in order to obtain an equation that has better numerical properties. He gives examples in order to show the effectiveness of the principle of MDL. CHAPTER 6. MINIMUM DESCRIPTION LENGTH 6.5 162 Summary This chapter has considered the theoretical principles of MDL for the selection of a hypothesis, from a collection of hypotheses, that best explains a given set of data. The principle of MDL, which is closely related to Occam’s razor, does not find the globally best model because it makes its selection from the given hypotheses. It was shown that it is necessary to distinguish between the code length of a deterministic and probabilistic parameter, and between an integer variable and a real variable. The calculation of the code length of a random variable uses the Shannon entropy, and that the code length of an integer is given by its Elias omega code. Several examples of the application of the principle of MDL were given. Bibliography [1] J. D. Allan and J. R. Winkler. Structure preserving methods for the computation of approximate GCDs of Bernstein polynomials. In P. Chenin, T. Lyche, and L. L. Schumaker, editors, Curve and Surface Design: Avignon 2006, pages 11–20. Nashboro Press, Tennessee, USA, 2007. [2] J. Barlow. Error analysis and implementation aspects of deferred correction for equality constrained least squares problems. SIAM J. Numer. Anal., 25(6):1340– 1358, 1988. [3] J. Barlow and U. Vemulapati. 
A note on deferred correction for equality constrained least squares problems. SIAM J. Numer. Anal., 29(1):249–256, 1992. [4] S. Barnett. Polynomials and Linear Control Systems. Marcel Dekker, New York, USA, 1983. [5] R. M. Corless, P. M. Gianni, B. M. Trager, and S. M. Watt. The singular value decomposition for polynomial systems. In Proc. Int. Symp. Symbolic and Algebraic Computation, pages 195–207. ACM Press, New York, 1995. 163 164 BIBLIOGRAPHY [6] R. M. Corless, S. M. Watt, and L. Zhi. QR factoring to compute the GCD of univariate approximate polynomials. IEEE Trans. Signal Processing, 52(12):3394– 3402, 2004. [7] I. Emiris, A. Galligo, and H. Lombardi. Numerical univariate polynomial GCD. In J. Renegar, M. Schub, and S. Smale, editors, The Mathematics of Numerical Analysis. Volume 32 of Lecture Notes in Applied Mathematics, pages 323–343. AMS, 1996. [8] I. Emiris, A. Galligo, and H. Lombardi. Certified approximate univariate GCDs. J. Pure and Applied Algebra, 117,118:229–251, 1997. [9] R. T. Farouki and V. T. Rajan. On the numerical condition of polynomials in Bernstein form. Computer Aided Geometric Design, 4:191–216, 1987. [10] L. Foster. Generalizations of Laguerre’s method. SIAM J. Numer. Anal., 18:1004–1018, 1981. [11] C. F. Gerald and P. O. Wheatley. Applied Numerical Analysis. Addison-Wesley, USA, 1994. [12] S. Goedecker. Remark on algorithms to find roots of polynomials. SIAM J. Sci. Stat. Comput., 15:1059–1063, 1994. [13] G. H. Golub and C. F. Van Loan. Matrix Computations. John Hopkins University Press, Baltimore, USA, 1996. [14] P. Grünwald. A tutorial introduction to the minimum description length principle http://www.grunwald.nl, 2005. BIBLIOGRAPHY 165 [15] E. Hansen, M. Patrick, and J. Rusnack. Some modificiations of Laguerre’s method. BIT, 17:409–417, 1977. [16] N. J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, USA, 1996. [17] N. J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, USA, 2002. [18] F. B. Hildebrand. Introduction to Numerical Analysis. Tata McGraw-Hill, New Delhi, India, 1974. [19] D. G. Hough. Explaining and Ameliorating the Ill Condition of Zeros of Polynomials. PhD thesis, Department of Computer Science, University of California, Berkeley, USA, 1977. [20] V. Hribernig and H. J. Stetter. Detection and validation of clusters of polynomial zeros. Journal of Symbolic Computation, 24:667–681, 1997. [21] M. A. Jenkins and J. F. Traub. A three-stage variable-shift iteration for polynomial zeros and its relation to generalized Raleigh iteration. Numerische Mathematik, 14:252–263, 1970. [22] M. A. Jenkins and J. F. Traub. Algorithm 419: Zeros of a complex polynomial. Comm. ACM, 15:97–99, 1972. [23] W. Kahan. Conserving confluence curbs ill-condition. Technical report, Department of Computer Science, University of California, Berkeley, USA, 1972. BIBLIOGRAPHY 166 [24] W. Kahan. The improbability of probabilistic error analyses for numerical computations. http://www.cs.berkeley.edu/∼wkahan/improber.ps, 1996. [25] E. Kaltofen, Z. Yang, and L. Zhi. Structured low rank approximation of a Sylvester matrix, 2005. Preprint. [26] N. Karmarkar and Y. N. Lakshman. Approximate polynomial greatest common divsior and nearest singular polynomials. In Proc. Int. Symp. Symbolic and Algebraic Computation, pages 35–39. ACM Press, New York, 1996. [27] B. Li, Z. Yang, and L. Zhi. Fast low rank approximation of a Sylvester matrix by structured total least norm. J. Japan Soc. Symbolic and Algebraic Comp., 11:165–174, 2005. 
[28] J. S. Lim and A. V. Oppenheim. Advanced Topics in Signal Processing. Prentice Hall, Englewood Cliffs, New Jersey, USA, 1988. [29] C. Van Loan. On the method of weighting for equality-constrained least squares problems. SIAM J. Numer. Anal., 22(5):851–864, 1985. [30] D. J. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge, UK, 2003. [31] K. Madsen. A root-finding algorithm based on Newton’s method. BIT, 13:71–75, 1973. [32] D. Manocha and J. Demmel. Algorithms for intersecting parametric and algebraic curves II: Multiple intersections. Graphical Models and Image Processing, 57(2):81–100, 1995. BIBLIOGRAPHY 167 [33] V. Y. Pan. Solving a polynomial equation: Some history and recent progress. SIAM Review, 39(2):187–220, 1997. [34] V. Y. Pan. Computation of approximate polynomial GCDs and an extension. Information and Computation, 167:71–85, 2001. [35] A. Ralston. A First Course in Numerical Analysis. McGraw Hill, USA, 1965. [36] J. Rissanen. A universal prior for integers and estimation by minimum description length. Ann. Statist., 11(2):416–431, 1983. [37] J. Rissanen. Universal coding, information, prediction and estimation. IEEE Trans. Information Theory, 30(4):629–636, 1984. [38] J. Rissanen. Stochastic Complexity in Statistical Inquiry. World Scientific, Singapore, 1989. [39] J. Ben Rosen, H. Park, and J. Glick. Total least norm formulation and solution for structured problems. SIAM J. Mat. Anal. Appl., 17(1):110–128, 1996. [40] D. Rupprecht. An algorithm for computing certified approximate GCD of n univariate polynomials. J. Pure and Applied Algebra, 139:255–284, 1999. [41] N. Saito. Simultaneous noise suppression and signal compression using a library of orthonormal bases and the minimum description length criterion. In E. Foufoula-Georgiou and P. Kumar, editors, Wavelets in Geophysics, pages 299–324, Boston, MA, 1994. Academic Press. [42] J. V. Uspensky. Theory of Equations. McGraw-Hill, New York, USA, 1948. BIBLIOGRAPHY 168 [43] J. Wilkinson. Rounding Errors In Algebraic Processes. Prentice-Hall, Englewood Cliffs,N.J., USA, 1963. [44] J. R. Winkler. A statistical analysis of the numerical condition of multiple roots of polynomials. Computers and Mathematics with Applications, 45:9–24, 2003. [45] J. R. Winkler. Numerical and algebraic properties of Bernstein basis resultant matrices. In T. Dokken and B. Jüttler, editors, Computational Methods for Algebraic Spline Surfaces, pages 107–118, Germany, 2005. Springer-Verlag. [46] J. R. Winkler. High order terms for condition estimation of univariate polynomials. SIAM J. Sci. Stat. Comput., 28(4):1420–1436, 2006. [47] J. R. Winkler and J. D. Allan. Structured low rank approximations of the Sylvester resultant matrix for approximate GCDs of Bernstein polynomials, 2006. Submitted to Computer Aided Geometric Design. [48] J. R. Winkler and J. D. Allan. Structured total least norm and approximate GCDs of inexact polynomials, 2006. To appear in Journal of Computational and Applied Mathematics. [49] J. R. Winkler and R. N. Goldman. The Sylvester resultant matrix for Bernstein polynomials. In T. Lyche, M. Mazure, and L. L. Schumaker, editors, Curve and Surface Design: Saint-Malo 2002, pages 407–416, Tennessee, USA, 2003. Nashboro Press. [50] J. R. Winkler and J. Zı́tko. The transformation of the Sylvester matrix and the calculation of the GCD of two inexact polynomials, 2007. In preparation. BIBLIOGRAPHY 169 [51] C. J. Zarowski. 
The MDL criterion for rank determination via effective singular values. IEEE Trans. Signal Processing, 46(6):1741–1744, 1998. [52] C. J. Zarowski, X. Ma, and F. W. Fairman. QR-factorization method for computing the greatest common divisor of polynomials with inexact coefficients. IEEE Trans. Signal Processing, 48(11):3042–3051, 2000. [53] Z. Zeng. The approximate GCD of inexact polynomials. Part 1: A univariate algorithm, 2004. Preprint. [54] Z. Zeng. Computing multiple roots of inexact polynomials. Mathematics of Computation, 74:869–903, 2005. [55] L. Zhi and Z. Yang. Computing approximate GCD of univariate polynomials by structured total least norm. Technical report, Institute of Systems Science, AMSS, Academia Sinica, Beijing, China, 2004.