PDF version - The Institute for Computational

Transcription

PDF version - The Institute for Computational
TICAM REPORT 97-05
April 1997
Krylov-Secant Methods for Solving Large Scale
Systems of Coupled Nonlinear Parabolic Equations
Hector Klie
Krylov-secant methods for solving large scale
systems of coupled nonlinear parabolic equations
Hector Klfe
September
TR96-2S
1996
RICE CNIVERSITY
Krylov-secant methods for solving large scale systems
of coupled nonlinear parabolic equations
by
Hector Klie
.-\ THESIS
IN PARTIAL
SUBMITTED
FULFILL~lENT
REQl~IRE~IENTS
OF THE
FOR THE DEGREE
Doctor of Philosophy
ApPROVED,
THESIS
I):
.'
I
\' ....(
.-;
t.. (
!
COMMITTEE:
;;
~IClrYF. \Vhe~r;r, Chairman
Ernp~t and vlrginia Cockrell Chair
in Engineering
Ulliver~ty of Texas
DallllY <)/:'orensen
Prof~sllr of Computational
~ I it t ht> 11 lilt ic~
(;~()rgt> Hiri\saki
pror~ssor of ('hemical
and Applied
Engineering
('lint ~. O"w:;on
Associat.t> Profes~or of Aerospace
Enginpt"ring and Engineering Mechanics
{:l1i\'~rsity of Texa~ at Austin
~Iarcelo Rame
CR PC Research Scientist IV
Houston. Texas
Sept.elllbE'l', 1996
Abstract
Krylov-secant methods for solving large scale
systems of coupled nonlinear parabolic equations
by
Hector Klie
This dissertation
of applications
Cf.'llters on two major aspects dictating
The former aspect leads to the conception of a
way of reusing the l(rylov information
systems arising within a ~ewton method.
developed on a nonlinear
soh'ill!!; non-symmetric
(IS
time
based on the solution of systems of coupled nonlinear parabolic equa-
tions: nonlinear and linear iterations.
\1O\'e\
the computational
generated
by GMRES for solving linear
The approach
version of the Eirola-Nevanlinna
stems from theory recently
algorithm
(originally
linear systems) which is capable of converging
Broydpn's I1lpthod . ..\ secant update strategy
for
twice as fast
of the Hessenberg matrix resulting
from the .-\rnoldi process in G:\IRES amounts to reflecting a secant update of the current .Jacobian with the rank-one term projected onto the generated
(Krylov-Broyden
~evanlinna
update).
This allows the design of a new nonlinear
Krylov subspace
Krylov-Eirola-
(KEN) algorithm and a higher-order version of Newton's method (HOK\)
as well. The underlying
development
by cheaper Richardson
iterations
tion. Hence, three algorithms
the nonlinear
Eirola-:\evanlinna
is also auspicious to replace the use of G MRES
for the sake of fulfilling the inexact Newton condi-
derived from ='lewton's method,
algorithm
are proposed
Broyden's method and
as a part of a new family
\\'ith carefulness
and diligent
work: Carol San Soucie. for those never-ending
worthy moments of developing solver code for RPARSIM
the questions
I addressed
me that Latin enthusiastic
in this dissertation;
attitude
which promoted
many of
and ~liguel Argaez, for sharing with
towards research.
Finally. I acknowledge Intevep S.A. for giving me the opportunity
Fnivf"rsity and for its continuous
hut
financial support
all these years.
to come to Rice
Contents
.-\bstract
v
Acknowledgments
VII
List of Illustrations
XIII
List of Tables
XXI
1 Introduction
1.1
~Iotivation.......
1.2
Structure
1.:3 ~ otation
of the thesis.
6
. . . . . . . .
8
1.-1 Some preliminary
9
results
1.-1.1
Matrix analysis results
10
1.-1.2
Fundamentals
12
1.-1.:3 :\onlinear
2 Newton-Krylov
LIThe
2.2
1
of iterative solution methods.
convergence
'
.
16
19
methods
19
inexact ~ewton framework
2.1.1
Algorithm
..
20
2.1.2
Forcing terms
22
2.1.:3
Globalization
2,)
2.1.-1
\Vhy Krylov subspace methods
28
G\IRES
. ,
.
2.2.1
The Arnoldi factorization
2.2.2
~linimization
2.2.:]
Convergence
of residuals.
.
.) ')1
_._.-1
'I gon.th m
.--\
2.2.;)
The role of G\IRES
:n
.
:2.:3 BiCGSTAB
in Newton's method
.
2.:3.1
General remarks
2.:3.2
Algorithm
....
3 Secant methods
41
:3.1 The family of rank-one
solvers.
-1:3
:3.1.1
Broyclen's method.
. . .
:3.1.2
The family of E01-like methods
-l8
1.1.:3
Inexactness
.55
4-l
in secant methods
:3.2 Secant preconditioners
:3.2.1
Secant preconditioners
:3.2.2
Preconditioners
:3.:3 Exploiting
:3.:3.1
:JA
.
based on multiple secant updates
....
62
6-l
the Arnoldi factorization
6.5
:3.:3.2 On the I\rylov-Broyclen
update
11
\olllirlf'ar
....
,t
Krylov-E\
:3.-1.t
The nonlinear
:3.-1.2
.-\ higher-order
methods
I\E\
algorithm
KryIO\'-\ewton
4 Hybrid Krylov-secant
-l.t
for inexact Newton methods
I\rylov basis information
r pdating
.57
algorithm
76
methods
83
Hybrid Krylov methods
8:3
-t.1.1
Projection
S.}
-t.1.2
Reducing t he cost of solving the new Jacobian
-1.1.:3 Spectra
onto the Krylov subspace
vs. Pseudospectra
equation.
. . . . . . . . . . . . . . . . .
S8
91
XI
"".1A
Richardson iteration
9.)
and Leja ordering
-1.2 The family of HKS methods
98
-1.2,1
The HKS algorithms
"".2.2
The role of preconditioning
.4:.2.:3
Globalization
.4:.:3 Computational
.4:.:3.1
9S
in Krylov-secant
methods
106
.
considerations
118
for HKS methods
121
Limiting memory compact representations
"".:3.2 Computational
complexity
122
.
12.5
5 Preconditioning Krylov methods for systems of coupled
nonlinear equations
129
.).1
\Iotivation..........
129
:).2
Description of the Problem.
1:31
.5.2.1
Differential Equations
131
:).2.2
Discretization.....
1:3""
5.:3
·SA
5.·S
.1.2.:3 ~ewton and linear system formulation
1:36
The algebraic coupled linear system framework.
1:36
:).:3.1
Structure
1:37
5.:3.2
An algebraic analysis of the coupled Jacobian
of Resulting
Decoupling operators
Linear System
. .
matrix
1:38
. .
.S.4.1
Block decoupling
.S.-L2
Properties
1.4:2
.
of the partially decoupled blocks
1.4::3
1-17
Two-stage preconditioners
I·S0
,S.·5.1
Background....
150
:)..1.2
Combinative
:)..1.:3
.-\dditive and multiplicative
two-stage preconditioner
extensions
.
1.'):3
1·5""
XII
·1..5.4
Consecutive
block two-stage
.J.:).:)
Relation
between
·).:),6
Efficient
implement.ation
alternate
preconditioners
...
and consecutive
forms
l:)'j
l6i
.
6 Computational experiments
6.1
Evaluating
Krylov-secant
6.1.1
Preliminary
6.1.2
The modified
6.1.:3
Richards'
171
methods.
examples.
172
. . .
problem.
tT,
equation
. . . . . .
tS?)
for coupled
Bratu
6.2
Evaluat.ing
preconditioners
6.:3
Evaluating
parallel
preconditioners
171
Krylov-secant
for systems
6.:3.1
Brief description
6.:3.2
Considerations
methods
of coupled
193
systems.
and two-stage
nonlinear
of the model
for implementing
202
equations
202
.
the HOKN
algorithm
with
the :2GSS preconditioner
6.:3.:3
~ lImerical
results
. . . .
7 Conclusions and further work
20,
221
Bibliography
227
Glossary
245
Illustrations
2.1
The use of the forcing term criteria for dynamic control of linear
tolerances.
The solid line represents a standard
implementation
with fixed linear tolerances 0.1. The dotted line is
the inexact :\ewton implementation
Each symbol
inexact Newton
* indicates
with the forcing term criterion.
the start of a new Newton iteration.
:3.1 Convergence comparison
of ~ewton's
method (dash-dotted
Broyden's method (dotted line), the composite
Newton's
. . ..
2.:'
line),
method
(dashed line) and the NEN algorithm (solid line) in their exact versions. ;'):3
:3.:2
C'onvergenge comparison
of ~ ewton's method (dash-dotted
Broyden's method (dotted
line). the composite
Newton's
line),
method
(dashed line) and the NEN algorithm (solid line) in their inexact
versIons.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
:3.:3 Convergence comparison
:3.-1
between Broyden's
..
,:3
. . . . . . . . ..
,9
method (solid line). . . . . . . . . . ..
Convergence comparison
between the nonlinear
I~EN algorithm
(solid line).
Surface and contour plot of the Rosenbrock Jacobian
at the first and last nonlinear iteration.
-!.2
line)
and the Krylov-Broyden
(dotted line) and the HOKN algorithm
-l.1
method (dotted
iteration
pseudospectra
. . . . . . . . . . . . . . . .
Surface and contour plot of the Powell Jacobian
first and last nonlinear
.16
pseudospectra
92
at the
'
9:3
XIV
-1.:3
Surface and contour plot of the Chandrasekhar
pseudospectra
-1.-1
at the first and last nonlinear iteration (easy case).
Surface and contour plot of the Chandrasekhar
pseudospectra
.1.,:)
.Jacobian
~n
..
9:3
.Jacobian
at the first and last nonlinear iteration
Convergence comparison
..
bet\veen the HKS-Broyden
(hard case).
(dash-dotted
10:3
line) and the HK5-EN (solid line) algorithm.
-1.6 Convergence of t.he HKS-N algorithm..
-l.T
Pseudospectra
of preconditioned
. . .
104
.Jacobian matrices for the extended
Rosenbrock fUIlct.ioIl. C pper left corner: Al\I-l;
upper right corner:
.-\.+.U-1: lower left. corner .-\+(.\[-1)+ and, lower right corner: (AiVI-1)+.11:3
-l.S
Pseudospectra
of preconditioned
singular function.
.Jacobian matrices for the Powell
Upper left corner: AA[-l; upper right corner:
'-\'+.U-l: lower left corner .4.+p.[-I)+ and, lower right corner: (AkJ-l)+.11:3
UJ
Pseudospectra
of precondit.ioned
of the Chandrasekhar
H-equation.
.Jacobian matrices for the easy case
Upper left corner: AAJ-1; upper
right corner: .-\.+.\I-l: 10\,,'er left corner ,4+(;\1-1)+ and, lower right
corner: (._lJ[-I)+. . . . . . . . . . . . . . . . . . . . . . . . . . . ..
LlO Pseudospectra
of preconditioned
of the C'handrasekhar
H-equation.
.Jacobian matrices for the hard case
r pper
left corner: ,4.~\I-I: upper
right corner: .4+J[-I: lower left corner ..{+(.\I-I)+
corner: (.·U[-I)+
L 1t Convergence
11-1
bet\veen the HKS-Broyden
line) and the HI\S-EN (solid line) algorithm
preconditioning.
-l.12 Convergence
and. lower right
,
comparison
1l-t
(dashed-dotted
with tridiagonal
. . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
of the HKS-N algorithm
with tridiagonal
preconditioning.
117
117
Xli
,1,1
~Iatrix structure
of linear systems in the Newton iteration.
.').2 Spectra of the blocks composing the sample Jacobian
top to bottom,
:').:3
they correspond to Jpp, Jpe, Jep and Jee•
matrix.
1:38
From
1-1:0
.••.•••
Spectrum of the sample Jacobian matrix to be used throughout
discussiQQ..,on two-stage preconditioners.
.j.-l:
. . . ..
the
. . . . . . . . . . . . . . ..
141
Spectra of the partially decoupled forms of the sample Jacobian
matrix. The one above corresponds to D-l J, and the one below to
W J (or equivalently,
.j..j
~V J.) . . . . . . . . . . . . . . . . . . . . . ..
Spectra of each block after decoupling with D. From top to bottom,
they correspond
to the (1,1), (1,2), (2,1) and (2,2) blocks ....
:').6 Spectra of the .Jacobian right-preconditioned
the combinative
C). j
1-1:9
operator.
the two-stage additive operator.
the two-stage multiplicative
1.54
by the exact version of
. . . . . . . . . . . . . . . . . . ..
.1.S Spectra of the .Jacobian right-preconditioned
the two-stage block .Jacobi operator..
"
the two-stage block Gauss-Seidel operator.
5.11 Spectra of the .Jacobian right-preconditioned
l·j7
by the exact version of
. . . . . . . . . . . . . . . ..
right-preconditioned
l.j6
by the exact version of
operator
,).~) Spectra of the .Jacobian right-preconditioned
UjD
by the exact version of
,......................
Spectra of the .Jacobian right-preconditioned
,).10 Spectra of the Jacobian
,.
1·19
by the exact version of
. . . . . . . . . . . . ..
162
by the exact version of
the two-stage block discrete projection operator.
. . . . . . . . . ..
162
XVI
6.1
Performance
in millions of floati'ng point operations
method (dash-dotted
line), composite Newton's method (dashed line)
and the HOKN algorithm
Rosenbrock's
(solid line) for solving the extended
function, the extended Powell's function and two levels
of difficulty of the Chandrasekhar
6.2
Performance
H-equation.
Rosenbrock's
Powell's function
H-equation.
Performance
in millions of floating point operations
(dash-dotted
line). HKS-E~
for solving the extended
:"ionlinear iterations
of \e\\·ton-like
(clashed line) and the HKS-N (solid line)
Rosenbrock's
function, the extended
iterations
Performance
HKS-\
algorithm
-to grid.
1T)
-w
vs. Relative ~onlinear
1SO
Residuals ~orms (RNRN)
for the modified Bratu problem on a -10x -10grid. lSI
in millions of floating point operations
method, composite
x
H-equation.
.
\olliinear
to
Powell's
methods for the modified Bratu problem on a 40 x
of secallt-like methods
6.6
174
vs. Relative Nonlinear Residuals Norms (RNRN)
grid
G.:)
. . ..
of HKS-B
function ancl two levels of difficulty of the Chandrasekhar
6.-!
algorithm
(solid line) for solving
function, the extended
and two levels of clifficulty of the Chandrasekhar
17:3
of Broyden's
line). the nonlinear Eirola-Nevanlinna
(dashed line) and the nonlinear KEN algorithm
6.:3
. . . . . . . . . . . ..
in millions of floating point operations
method (dash-dotted
the extended
of Newton's
of Newton's
:"Jewton's method, the HOKN algorithm
and the
for solving the modified Sratu problem on a
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
182
XVII
6.7
Performance
in millions of floating point operations
method, the nonlinear Eirola-:'Jevanlinna
KEN algorithm,
algorithm,
of Broyden's
the nonlinear
the HKS-B algorithm and the HKS-EN algorithm for
solving the modified Bratu problem on a -10 x 40 grid ..
6.8
\-Vater content, distribution
6.9
Dispersivityat
1st and 1000th time steps
6.10 Hydraulic Conductivity
6.11 Performance
~ewton's
at 1st and 1000th time steps.
.
IS7
coefficients at 1st and 1000th time steps.
in accumulated
and the HKS-N algorithm
method,
for solving Richards'
in accumulated
187
of
the HOKN algorithm
equation.
. . . . ..
millions of floating point operations
Broyden's method, the nonlinear Eirola-Nevanlinna
nonlinear KE)I algorithm,
. ..
millions of floating point operations
method. composite ~ewton's
6.12 Performance
IS7
the HKS-B algorithm
algorithm,
IS9
of
the
and the HKS-EN
algorithm for solving Richards' equation
.
190
6.1:3 Relative residual norms vs. iteration of G~IRES for different
preconditioners.
The performance
with different preconditioners
are
organized in matrix form. Subplot (1.1): ILU (dot), Trid(dash),
block .Jacobi (solid). Subplot (1. 2): two-stage combinative
two-stage additive (dash), two-stage multiplicative
(solid). Subplot
(2,1): two-stage block Jacobi (dot), two-stage Gauss-Seidel
two-stage discrete projection
(dot), two-stage multiplicative
(dot),
(dash).
(solid). Subplot (2,2): block Jacobi
(dash), two-stage discrete projection
(solid): Problem Size: -1 x 8 x S. :It = 0.1. . . . . . . . . . . . . . ..
200
XVIII
6.1-! Relative residual norms vs. iteration of BiCGSTAB for different
preconditioners.
The performance
with different preconditioners
organized in matrix form. Subplot (1,1):
ILU (dot). Trid(dash).
block .Jacobi (solid). Subplot (L 2): two-stage combinative
two-stage additive (dash), two-stage multiplicative
(dot),
(solid). Subplot
(2,1): two-stage block Jacobi (dot). two-stage Gauss-Seidel
two-stage discrete projection
(dash), two-stage discrete projection
~t=O.1.
(solid). Problem Size: -!:x8xS.
function (RIGHT).
(dash),
(solid). Subplot (2.2): block Jacobi
(dot). two-stage multiplicative
6.1:) Relative permeability
are
of both phases (LEFT)
201
and capillary pressure
. . . . . . . . . . . . . . . . . . . . . . . . . . ..
204
6.16 Speedup \'s. number of processors for the two-phase problem using
the HOK~/2SGS
solver on an Intel Paragon after 20 time steps. . ..
211
6.1" Speedup \'s. number of processors for the two-phase problem using
the HOK~/2SGS
solver on an IBM SP2 after 20 time steps.
6.18 Log-log plot of the number of processors vs. execution
two-phase problem using the HOK~/2SGS
Paragon after 20 time steps.
. . . ..
211
time for the
sol\'f~r on an Intel
. . . . . . . . . . . . . . . . . . . . ..
212
6.19 Log-log plot of the number of processors \'s. execution time for the
two-phase problem using the HOK~ /2SGS solver on an IB~:{ SP2
after 20 time steps.
. . . . . . . . . . . . . . . . . . . . . . . . . . ..
6.20 Number of accumulated
G\IRES
iterations
vs Relative nonlinear
residual norms ();,RN'R) using the HOKN /2SGS, Newton/2SGS
\'ewtonf2SComb
21:1
and
solvers on 12 nodes of the IBM SP2 for a problem
size of 16 x -!S x -~Sat the third time step with :It
=
.05 day. . . ..
21:1
XIX
6.21 CPU time vs Relative nonlinear residual norms (:\'RNR) using the
HOK:'J/2SGS,
Newton/2SGS
and Newton/2SComb
solvers on 12
nodes of the IB\>I SP2 for a problem size of 16 x 18 x 18 at the third
time step with ~t = .05 day..
6.22 Performance
in accumulated
and Newton/2SComb
.
,..........
G~lRES iterations
of the HOKN/2SGS
solvers after 100 time steps of simulation with
DT = .0.5 of a 16 x -lS x 48 problem size on 16 SP2 nodes.
6.2:3 Performance
in accumulated
:Jewton/2SC'omb
CPU time of the HOKN/2SGS
solvers after 100 time steps of simulation
~t = .05 of a l6 x -lS x -lS problem size on 16 SP2 nodes ...
. . . . ..
21.j
and
with
..
21.j
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
216
6.2-l Performance
in accumulated
Newtonj2SGS
simulation
nodes.
2U
nonlinear iterations
and ~ewton/2SComb
"
of HOKN/2SGS.
solvers after 100 time steps of
with ~t = .0.5 of a 16 x 48 x 48 problem size on 16 SP2
XXII
6.:3 Total of nonlinear iterations
applicable)
Richardson
(NI), G\IRES
iterations
iterations
(G1) and (when
(Rich) for inexact versions of
se\'eral nonlinear solvers. The problem size considered
is of size
16 x 16 gridblocks after 100 time steps of simulation.
. . . . . . . ..
6.-1 Results for G:\IRES preconditioned
by the nine schemes tested in this
work .. Vit: number of outer iterations;
the solver iteration:
Ts: elapsed time in seconds for
Tp: elapsed time in seconds to form the
preconditioner:
:Vi.a: average number of inner iterations
outer iteration.
Preconditioners
Tridiagonal
(Tridiag.).
Incomplete
LU factorization
with no infill
(2SComb.),
t.\vo-stage .-\clclitive (2SAcld.), t\vo-stage \lultiplicative
(2S1Vlult.),
two-stage block .Jacobi (2SB.J), two-stage Gauss-Seidel
(2SGS) and
two-stage Discrete Projection
(2SDP).
Results for 8iC'GSTAB preconditioned
194:
by the nine schemes tested in
this work. :Vit: number of outer iterations:
seconds for the solver iteration:
the preconditioner:
outer iteration.
tridiagonal
Ts: elapsed time in
Tp: elapsed time in seconds to form
.Vi.a: average number of inner iterations
Preconditioners
(Tridiag.),
t\vo-stage .\dditive
incomplete
(2S:\dd.),
Lr factorization
with no infill
(2SComb.).
two-stage :\hdtiplicative
two-stage block .Jacobi (2SB.J), two-stage Gauss-Seidel
two-stage Discrete Projection
Physical input data.
per unit
shown are from top to bottom:
(ILC(O)), block .Jacobi (8.J). two-stage Combinative
6.6
per unit
shown are from top to bottom:
(ILU(O)). block .Jacobi (8.1), two-stage Combinative
6..j
L92
(2SDP).
(2S:Ylult.),
(2SGS) and
197
20-1:
XXIII
6.;
Summary of linear iterations
of backtracks
(LI), nonlinear iterations
(NI), number
(~B) and execution times of GMRES and Bi-CGSTAB
with the use of the 2SComb and the 2SGS preconditioners.
simulation
The
covers 20 time steps with ~t = .05 and ~t = ..j for a
problem of size 8 x 24 x 24 grid blocks on a mesh of 4 x 4 nodes of the
Intel Paragon.
("'): Backtracking
method failed after the 17th time
step; ("''"): ~t was halved after the 16th time step..
6.S
Results for the HOKN /2SGS and Newton/2SComb
. . . . . . . ..
20S
solvers for
different large problem sizes vs. different number of processors of the
Intel Paragon.
simulation
Execution figures are measured after 10 days of
with ~t = 1 day. CPU times (T) are measured in minutes,
(E) indicates parallel efficiency. ("') Abnormal efficiency due to
.
pagmg
6.9
0f
.
t h e operatmg
system.
. . . . . . . . . . . . . . ..
....
'>1-I
:..
CPU time measured in minutes of a million and six hundred
thousand
simulation
of unknowns on 16 nodes of the SP2 for 10 days of
''''ith ~t = 1 day.
. . . . . . . . . . . . . . . . . . . . . ..
219
1
Chapter 1
Introduction
1.1
Motivation
For the last thirty years. reservoir simulation has played a major role in the petroleum
industry
and, as a by product,
tors have been instrumental
reco\'ery strategies
stantly expanded
algorithm
industry
as well. Numerical simula-
in helping reservoir engineers locate oil reserves, design
and optimize oil field management.
the computing
development
development
in the computer
These packages have con-
frontiers by driving a major part of the research in
of the last three decades and by creating
of vector and, more recently, parallel computing
growing concern for underground
a market for the
systems.
pollution has made it possible to adapt much of the
same software and hardware technology to the numerical simulation
contaminant
migration
and of contaminant
The advent of increasing computing
larger scientific and engineering
power has been a driving force for solving
problems.
indeed.
Consequently,
have been coming forth with this computer
~owadays.
the idea of solving partial
differential
technology
equations
lions of unknowns is becoming plausible and attractive
programmer.
of underground
clean-up strategies.
algorithms
the application
The recently
(POE's)
new numerical
sophist.ication.
involving mil-
to the numerical analyst and
The need for solving problems with at least one million
grid blocks. and several unknowns per grid block, has become one of the main challenges in the reservoir community.
solvers plays an important
Therefore the conception
role in the oil industry
research.
of robust and efficient
~Iajor challenges
arise
In
connection
to solving coupled sets of nonlinear equations
implicit discretization
This dissertation
application
of multi-phase
and practical
We are primarily
mining the overall computing
by a fully
models .
is an immediate
context.
as obtained
interested
response to the aforementioned
in enhancing
time of a simulation:
those aspects deter-
linear and nonlinear iterations.
To this end. we have divided the present work in two major points
• Efficient and robust implementations
Krylov-subspace
iterations.
• Efficient preconditioners
of the inexact Newton method based on
and .
for coupled systems of linear equations.
[n the following. \ve describe the role that both ideas play in this research. This
shall serve as further motivation
and defining scope of the present work.
Ideas on performing inexact Newton steps have been around for some time. Ortega
and Rheinboldt
had already suggested this type of computation
the se\'enties [10,1]. However. maturity
Dembo. Eisenstat
\\'11<'11
of these ideas came up as result of the work of
[4:1]. Their work provided mechanisms for deciding
and Steihaug
the relat i\'e residuals
at the beginning of
of the linear solver are sufficiently small to ensure an
accc>ptable \'ewton step. Since then, the reliability and acceptance of inexact )iewton
methods
have been continuously
growing according
to the sophistication
of linear
iterative solvers for large scale problems.
On the other hand, quasi-Newton
with the high computational
.Jacobian operators.
cost and difficulties
associated
to deal
to the evaluation
of
These methods rely upon low-rank updates that serve as correc-
tion terms for secant approximations
of these low rank updates
problems.
methods have been a good alternative
is Broyden's
to the Newton equation.
method [26,49].
the main difficulty resides in maintaining
The most widely used
In the context of large scale
the convergence of the method
without destroying
the sparsity pattern
of the Jacobian
advances aimed at overcoming this can be categorized
Comprehensive
matrix.
Hence, most of the
as limited memory methods.
discussion and pointers to the literature
about using these methods
for large scale problems can be found in [28,49, 101].
This dissertation
looks at both approaches in a complementary
way: we perform
inexactness through a Krylov iterative method (i.e., GMRES) and perform (or rather
reflect) secant updates of the .Jacobian matrix restricted
to the generated
Krylov sub-
space basis informat.ion.
To make this possible, we exploit the information
by the I\:rylov iterative
method by solving a sequence of minimal residual approxi-
mation problems or by propagating
eigenvalue information
generated
of the current Jacobian
matrix to the .Jacobian matrix at the next new point. The former procedure'leads
the generation
to
of faster versions of Broyden's method and a higher order version of
N' ewton's method.
cheaper Richardson
The latter allows one to replace the use of G MRES iterations
by
it.erations. The reliability of both approaches depends upon how
well the Broyden update
restricted
to the Krylov subspace resembles the one given
by Broyden's update in the entire subspace.
Sillc~ G~IRES is based on the .-\rnoldi process to generate a basis for the KrylO\'
subspace.
we propose to update the Hessenberg matrix and preserve the basis for fu-
ture :\'"ewton steps. Such updates are based on Broyden's method but restricted
current
Krylov subspace giving rise to what we call Krylov-Broyden
underlying
mechanism
allows one to reformulate
Therefore.
The
solutions of future linear Newton
equation as a sequence of minimal residual approximation
ing the G:\IRES method.
updates.
to the
problems without reinvok-
faster and higher-order
nonlinear
methods can
be built and their effectiveness relies upon how well the linear directions contained
in
the Krylov basis are able to generate a descent direction for the norm of the nonlinear
function.
6
1.2
Structure of the thesis.
The present work is organized
as follows. The remainder
of the chapter is devoted
to an overview of some notational
conventions and results of linear algebra.
review of linear iterative
solution
methods with emphasis
ation and its connections
to I\.rylov subspace methods
groundwork
for further analysis of Hybrid Krylov-Secant
the chapter
wit h a review of types of nonlinear
A brief
on the Richardson
iter-
are in order to prepare the
(HKS) methods.
convergence
\Ve end
and some addi tional
notation.
In Chapter
method.
2. we establish
the framework that distinguishes
It comprises the use of local and global convergence as well as forcing term
criteria to dynamically
adjust linear tolerances
is followed \vith a brief description
method
our inexact Newton
is the cornerstone
for the linear solver. The discussion
of G~IRES and BiCGSTAB.
The former iterative
in developing the family of I\rylov-secant
latter serves for comparison
purposes in eventual
numerical
methods.
experiments
the
with two-
stage preconditioners.
In Chapt.er :3. we review the fundamentals
I'nt rallk-one type of solvers:
intprpretations
\'iewpoints
methods.
Broyden's met.hod and the EN algorithm.
Both ha\'e
in t.he linear and nonlinear world. \Ve also re\·iew a couple of recent
along the notion of complementarity
That is. the need to incorporate
inexact Newton iterations.
incorporate
of secant methods through two differ-
between inexact and quasi-;\f'wton
the preconditioner
A very appealing enhancement
not only the preconditioner
into the convergence of
to this original idea is to
but also the I\rylov information
produced by
G~[RES. This requires looking more closely at the .-\rnoldi factorization
method and
its modifications
analyzing
after secant updates.
The main contribution
of the thesis focuses in
these updates
under the figure of Krylov-Broyden
updates
more efficient algorithms
for both inexact and quasi-Newton
methods.
and producing
In Chapter 4, we complete the second part of Krylov-secant
the previous chapter.
Besides updating
ideas introduced
the Arnoldi factorization
in
without destroying
the Krylov basis, we need to account for the changes of the nonlinear function after
each Newton step (i.e., at every new point).
This implies extending
the approach
to the change of right hand sides arising at different Newton linear equations.
find that switches to the Richardson
t.he underlying Krylov information.
HKS algorithms
iteration
are appropriate
to take advantage of
\Ve propose and discuss two different version of
according to the type of secant updates
with Broyden update)
employed:
are devoted to address the problem of preconditioning
methods.
subsequent
We also dis-
A couple of sections
and implementation
of line-
We end the chapter with some special considerations
for efficient large scale implementations.
pact representations
HKS-B (HKS
and HKS-E:'-i (HKS with EN type of update).
cuss a Newton's method variant called the HKS-N algorithm.
search globalization
We
This leads to revise limited memory com-
for the implicit computation
of accumulated
updates
through
nonlinear iterations.
In Chapter
.) we focus the attention
plexity associated
on the issue of preconditioning.
to coupled, non-symmetric
to study strategies of decoupling
The com-
and indefinite linear systems leads
and their role in preconditioning
liS
the whole system.
Since we are looking at a specific problem of two-phase flow. we devote some preliminary discussion to the numerical
system. This aids to understand
model. its discretization
algebraic
the convenience of an aggressive decoupling strategy
for the typical coupled systems arising in these problems.
clecoupled operators
and associated
A detailed discussion on
is in order, followed by coverage of different two-stage precon-
ditioners (those based in nested or inexact iterations).
out some implementation
issues.
\Ve end the chapter pointing
Chapter
6 covers extensive
numerical
experimentation.
[n agreement
two main points of the thesis, this chapter is divided into experiments
implementations
of the Krylov-secant
with the
for sequential
methods and for the two-stage preconditioners.
At t.he end of the chapter. both ideas are integrated
and tested upon a parallel two-
phase reservoir simulat.or.
Chapter 7 summarizes the main results and conclusions of this dissertation.
recommendations
Further
and directions of work are ranked in the order that the author con-
siclers worthwhile within the setting of large scale implementations.
1.3
Notation
For notational
simplicity. all scalars, vectors and matrices are treated as being in the
real vector space. We closely follow Householder notational
ing most of the entities.
:\Iatrices, spaces and functions are denoted by capital Roman
or Greek letters and vectors by lower case Roman letters.
lower case Greek letters.
engineering
conventions for represent-
Only exception
and physics notation.
All scalars are denoted by
to this rule. in order to respect customary
applies to differential equation
terms.
The norm 11·11 refers to the Euclidean norm and induced matrix norms. The inner
product
of two vectors
represents
the transpose
\Ve indicate
by
I\: (.-\.)
=
ll.
v E R is indistinctly
operation
IIAll IIA-III
denoted
hy ( ... ) or Ih~. (Here. /
of a "ector or a matrix in the real vector space.)
the condition
numher of an invertible operator
A E lRnxn. The spectrum or set of all eigem'alues of a matrix A is denoted hy AC-\.)
and it is a suhset of the complex space C. Given::
+ bi
E
C, the real part is
va
+ b2•
\Ve denote hy Inxn the n x n identity matrix; if the dimension n is apparent
from
denoted
by Re(=)
we represent
the context.
=
a and
= a
the imaginary
part by 1m (=) = b. With
Izi
=
2
the modulus of z.
we simply write I. The symhol
a
is used for the scalar zero and for
9
the zero matrix: in the latter case, the dimension is assumed to be evident from the
context.
\Ve represent with
the vector of zeroes with value 1 at the ith position.
ei
Its length should be apparent
from the context.
We also use the following notation
Pm,
=
{<PP)
=
f>l'i,\i I
aj
E
m},
IR,O ~ i ~
1=0
for the set of polynomials
of degree at most m.
For any vector u E IRn and any matrix A. E IRnxn,
we use
Km (A, u) = span {v, Av, A2v, ... Am-Iv},
to indicate the mth Krylov subspace of lIe, generated
Linear iterates are denoted by subscripts
(1.1)
by A and v.
(usually i and j) and nonlinear iterates
k enclosed between brackets. For the remainder of the chapter ancl
by the superscript
in some forthcoming sections of the thesis, we refer to the solution of a linear system
of the form
A.r = b,
where A = (ai.j)
non-symmetric
.fO
E
nxn
!R.
,
and x,b E
in general.
!R.n.
( 1.2)
\Ve assume the matrix A is nonsingular
In order to indicate inst.ances of linear iterations,
as the initial guess for (1.2),
Xi,
as the ith iterate
and
ri
= b-
AXi,
and
we use
as the ith
linear residual.
1.4
Some preliminary results
In this section a subset of linear and n'onlinear iteration
reference in the forthcoming
discussion.
results are established
as
10
1.4.1
Matrix
analysis
results
The following definitions
iterati"e
methods
properties
are found in the standard
(e.g .. [:3. 12]). \Ve state them as self-reference
1.4.1
The class znxnof
znxn
Definition
1.4.2
nxn
\I-matrices
= {A E ~nxn
=
when we discllss
when a given iterative
is given by
i
: ai,) ~ 0;
# j}.
of M-matrices
{A. E znxn
:
(A-1) .. ~
is given by
O}.
I,)
play an important
in\'f'rse of a \I-rnatrix
Z-matrices
The class of Mnxn
.u
t hat
of matrix theory or
of coupled linear systems.
Definition
termining
literature
role in matrix analysis.
method is convergent.
is lIonnegatil'f.
They are the basis for de-
Definition lA.1 says that the
and therefore monotone.
'We remark. however.
a matrix could be monotone even though it is not a M-matrix.
Other important
class of matrices are those that have all eigenvalues in the right half of the complex
plane.
Definition
1.4.3
The class of
of po"itirf.
pnxll
stable (or positive real)
matrices is given by
pnxn
= {A E lRnxn : all t.igftll'aluu.
The class of positive stable
matrices
gO\'erned by systems of coupled ordinary
of A. hal'(; positive
appears
frequently
differential
real part}.
in dynamical
equations.
systems
Their occurrence
implies the stabilit.y of the numerical model solution. The following result established
in [:3] characterizes
the relation between \I-mat.rices
and positive stable matrices.
II
Theorem
1.4.1
Some additional
Let A E
then A E
znxn,
if and only if A E
pnxn.
results are in order to estimate the location of eigenvalues.
we point out that a matrix is irreducible
if there is no permutation
such that PA.P! is a hlock upper triangular
Definition
JInxn
1.4.4
First.
matrix P E llexn•
matrix.
The matrix A is diagonally
dominant
if
n
laijl > Llaijl,j
= 1,2, ... n.
)=1
#i
and irreducibly
least one
ro\V
diagonally
dominant
if the strict inequality
holds for at
and
n
L laij!.j
laijl ~
= 1, 2, ... n.
)=1
):¢:.i
Theorem
1.4.2
(Gershgorin)
A (A) is enclosed in the union of the discs
and in the union of the discs
For implementation
Woodbury formula
original matrix.
where B E
of single or multiple secant updates,
the Sherman-Alorrison-
is useful to relate inverses of rank-one updates
to inverses of the
That is
jRmxm
and C \I·t E
jRnxm.
Here, we have implicitly
expressions between brackets are invertible.
assumed
that
the
1:2
1.4.2
Fundamentals
Iterative
methods
on whether
general
T-I.\[
(T-I
initial
guess
methods;
The scheme
.ro. The
that
most
.\1 is assumed
to be invertible
non-stationary
value
iteration
T.
scheme known.
the optimal
relaxation
parameter.
is callecl the /'ela.ration
Topt,
(see e.g .. [:3. 7·L 11:3]). Here.
'\max
and
).min
i.
as conjugate
by (1.4).
applied
iteration
to the preconis the simplest
stated
parameter
positive
Topt
character
by choosing
of the iteration
by some a priori spectral
is determined
is given by
(such
It is simply
when A is a symmetric
and can
for all values of
methods
described
of as the Richardson
(1.4)
The stationary
to a given constant
Ti
in convergence
of A. For instance.
=
definite
L .. : L._
are the largest
matrix.
thell
for a stationary
and minimum
of A. respectin:{v.
eigenvalue
In the positive
parameters
interval.
i = 0,1, ... ,
of the iteration.
known
parameter.
information
A = ;\1.,. - ,'iT =
+ T;JI-lrj,
.f;
operator
iterative
its effectiveness
In a
C. Hence, the following
.\/-1A.e = .\I-lb. In fact. the Richardson
.\[ = l. The damping
the iterations.
splitting
E
T
see e.g .. [S]) can not be exclusively
and lion-stationary
and rl'gularly
parameter
(b - A..r;) =
can be thought
system
:-;tatiollary
iteration
by t.he following damped
here as the preconditioner
however.
during
depending
results
+ T;.\[-l
1';
or non-stationary
of their parameters
by fixing the parameters
We remark.
ditioned
scheme
.r;+1 =
be interpreted
is induced
are changes
as stationary
JI - .4). for a given nonzero
iterative
for a gi\'en
classified
this can be realized
non-stationary
gradient
are generally
or not there
format,
-
of iterative solution methods
reduces
which
is customary
definite
case.
to finding
is given
to assume
the computation
the best
by a Chebyshev
that
of optimal
approximation
polynomial
the symmetric
part
Richardson
polynomial
[3].
of the
on a continuous
In more
matrix
relaxation
general
is positive
cases.
definite
it
(i.e., the matrix
across
is positive
stable)
or that the eigenvalues
each side of the complex
are absent
in the original
these requirements.
special
plane
matrix,
(the indefinite
then adequate
In the broad
literature,
well kno\ ....n stationary
.\1.,. is defined:
e.g., the damped
preconditioning
methods
of A. When
part
.Jacobi and Gauss
Ti
iterative
Jacobi
(A.) and the SOR iterative
triangular
\\ihen
in two ellipses
these conditions
should
iteration
try to meet·
is considered
a
(see e.g., [72, 74, 145] for further
.
Note that
T-Idiag
case).
the Richardson
case of the large class of Chebyshev
discussion)
are clustered
=
methods
iterative
method
when
=
0,1, ...
1, Vi
derive
method
their
arises
AIr is defined as
name
when
from how
taking
T-1 times
we get the non-damped
AIr =
the lower
versions
of
Seidel iterations.
From (1.4) it follows that
successive
non-stationary
Richardson
iterations
yields
the ith residual
(1..5 )
where
Oi ().)
nomial
1-
=
rIJ=o
(1 -
and it is monic
.\I.'i-I (,\)
= 1 - ,\
Tj).)
(i.e.,
Pi. This polynomial
E
4>(0) = 1). Therefore,
L~~~"·O).j,
is known
as the residual
it can be rewritten
poly-
as 4>;().)
with U' E Pi-I' Furthermore.
,ri = .ro
+ l!'i-I
(.-\)"0
(1.6 )
i-I
= .ro
+ 2: Tjrj.
j=O
This
implies
that
the solution
at the ith
iteration
is determined
by the affine
subspace
Xi
The relation
becomes
evident.
between
E
Xo
the Richardson
The Richardson
+ Ki (A,
iteration
iteration
( 1.7)
ro) .
and Krylov subspace
delivers
elements
in Krylov
methods
subspaces
now
of
1-1
increasing
dimension.
the same I\rylov
To generate
defining
subspace
polynomial
and with V represent.ing
Tj'S
(j)j
iterations
elements
in
set Q enclosing>'
(.4),
parameters
1. However.
assumptions
T,
these
are made.
Let
the eigenvalues
of .-\.
If we have some knowledge
eigenvectors.
then we can reformulate
the problem
of finding
[6.5]to
116dA) roll :S I\: (V)
min
As said above.
polynomials
the origin.
~early
optimal
obtained
in the general
expensive
to compute
On the other
subspace
[:3]
j
polynomial
when
all eigenvalues
(i.e.,
asymptotically
case of non-symmetric
(1.T)
implies
that
(.4., ro) without
and Arnoldi
processes).
obtain
than
of the required
relaxation
there
on a Krylov
• ~[inimal
shifted
Q
are real and
and scaled
does
polynomials
not
include
can be only
[62, S6]. Incidentally,
systems
are two types
subspace
residual
iterates
)iote that
we can readily
Basically,
through
optimal)
any prior
and from there.
reciprocals
is computable
they are
numerically.
hand.
K
(1.8)
.\eQ
,p(0)=1
the optimal
Chebyshev
maxl<i>d>')lllroli.
4lEPj
0(0)= I
defined
<
IIct>j(A)II
unless some suitable
the corresponding
·>E",
Lanczos
other
we seek relaxation
in such a way that
().)
available
IIrjll = min
Krylo\'
we produce
in the form A = V-1 DV, with D containing
A be diagonalizah1e
the
Richardson
are not readily
of a compact
after each iteration
as (1.6) suggests.
convergent
the residual
parameters
:\'[oreover,
information
from a basis of the
on A (.-\.) (as happens
such basis allows us to compute
Pl.
The
Tj
for solving
prior information
Choose
for the Richardson
Zm
E
the linear
of the spectrum
Km (.-\.. ro)
in
1;'j_1 (,\)
of d>j (>') are nothing
roots
parameters
of approaches
without
approximation:
OJ
can be obtained
more
iteration.
system
(1.2)
of .4.:
and solve (e.g., :\lI~RES.
G:\IRES)
min
:EKm(.-\.ro)
lib -
A (xo
+ z)11 =
min
:EKm(.-\.ro)
lira -
Azli.
(1.9 )
1.)
or a,
• Petrov-Galerkin
approximation:
Choose
Zrn
E
Km (A. ro) so that
(1.10)
.~'
The Petrov-Galerkin
approximation
reduces to the Galerkin approximation
Sm == Km (A. ro) (e.g .. CG). For solving non-symmetric
based on Petrov-Galerkin
wElle.
approximations
These algorithms
use 8m
problems,
== Km (.4t•
when
most methods
w). for some initial vector
are built over Lanczos bi-orthogonalization
procedures:
e.g, BiCG, CGS and BiCGSTAB.
As regards to (1.6), both approaches
solution by set.ting
Xm
The non-stationary
tation.
In Chapter
practical
standpoint
=
IO
(1.9) and (1.10) obtain
an approximate
+ =m·
Richardson
iteration
plays an important
-I:. we discuss in more detail this method
in connection
role in this disser-
from a theoretical
to near optimal residual polynomials
and
and Krylov
subspaces.
\Ve say that the splitting of A is a (weak regular splitting)
is monotone and the term (AI;l NT) NT is nonnegative.
block splitting
for M-matrices
regular splitting
if .\/,.
Convergence of point. line or
A and .\1. following the scheme (1.-1:)is guaranteed
a~
the following theorem reveals.
Theorem
splitting
Proof
1.4.3
If A = .\1 - N is a weak regular splitting,
then the
is convergent if and only if A is monotone.
o
See e.g., [:3;p. 213].
~ote that (l.-l:) is a form of the classical Picard's
method of successive approxi-
mations converging to a unique fixed point when the mapping is a contraction
[1 V5].
l6
From that point of view. ~evanlinna
for stat.ionary and non-stationary
[99] makes a rigorous analysis of convergence
(including the GMRES iterative solver) methods.
Finally, as regards to the following 2 x 2 block partitioning
-\.r. .
-
All
(
All
we define the Schu.. complement
,412)
u )
(
An
v
( f
--b
9
of A with respect
A21i"li} A12 (5'22 = All - .412.4.221 A21)
)
of (1.2)
-.
to All (.4.22)
as 511 = An -
The Schur complement
arises when pivoting
with the diagonal blocks to generate a block LG decomposition.
We use it to generate
an efficient t.wo-stage preconditioner
coup:ted equations.
1.4.3
Nonlinear
•
for pressure coefficients associated with the linear
For a helpful source on Schur complement
analysis see [106].
convergence
For !the convergence theory of inexact ~ewton
methods,
we need the following defi-
nitiorrs (see e.g .. [-19. 10,5] for further details).
Definition
1.4.5
A function F : 0 C IRn -+ IRn is Lipschitz continuous
if it sat isfies
IIF (.r)
- F (!I)lI
lor some,,: > 0, and V.r. yEO.
F E
~ :
lI·r -
The condition
!III·
is summarized
by denot.ing
.c.., (0).
Definition
{.rlk)}
k=O
1.4.6
Let.r .. r(k)
E IRn, k = O. 1. ...
Then the sequence
converges to .r
• q-li/lwrly
if there is a constant. c E (0,1) and an integer
that for all J.: ~ k
k ~
0 such
17
if there exist an integer k ~
• q-superlinearly
a such
converging to
k
k
that for all k ~
if there is a constant c >
• n-step q-quadratically
such that for all k ~
function F :
1.4.1
n ~ IR
n
(Standard
~
• The equation
assumptions).
• F' (u")
IRn x n E L, (n) .
is invertible.
an integer
Consider
k~ a
a nonlinear
IRn for which we seek to solve the equation
above has a solution at u·.
~
a and
k
F(u)=O.
• F': n ~ IRn
and a sequence {Ck}
if there is a constant c > 0 and an integer k ~ 0 such
• 'I-quadratically
Assumption
that for all k ~
a
19
Chapter 2
Newton-Krylov
2.1
methods
The inexact Newton framework
Interest in using )rewton's method combined with a Krylov subspace method to solve
large scale nonlinear problems dates from the middle 19S0's [143]. At that time, these
methods were rapidly evolving together with their applicability
arising from systems of nonlinear ordinary differential equations
and references therein).
for algebraic problems
(see e.g., [19, 22, :36]
In the context of partial differential equations their suitability
for solving large nonlinear systems was finally established
through the work of Brown
and Saad [20]. In their paper, Brown and Saad include extensions
globalization
techniques,
to several types of partial
scaling and preconditioning.
differential equations.
is still going on fr0111both the theoretical
_. • •'3')
_.
[')'3
- :):).
-0]
I;J
for application
of
They also discuss application
Currently.
and the practical
intensive investigation
standpoint;
see e.g.,
•
This chapter discusses the globally convergent inexact method used in the present
work and formalized by Eisenstat
and Walker [.55.. =)6].The basic components
of this
method is a Krylov subspace iterative procedure for solving the Jacobian equation.
forcing term selection criteria on dynamically
search backtracking
The discussion starts
adjusting
a
linear tolerances and a line-
method to provide global convergence under suitable conditions.
from the presentation
to describe each of these components.
devoted to the G~lRES Krylov-subspace
the Arnoldi factorization
of the general algorithm
and proceeds
This comprises § 1. In § 2, the discussion
iterative method with particular
attention
is
to
on which it is based. Some words are devoted to the method
:W
and its implementation.
§:3 outlines the BiCGSTAB
Krylov iterative
solver.
Its
analysis is given with less emphasis than GMRES since it is only used for comparative
purposes
2.1.1
when
we study
two-stage preconditioners
in Chapter
.5.
Algorithm
Consider finding a solution u" of the nonlinear system of equations
(2.1 )
F(u)=O.
where F :
and .I(k)
hh
{1 ~
IRn
== .I (/t(k»)
-+
IRn.
For the remainder
denote the evaluation
~ewton step. respectively.
of the thesis. let
of the function
== F (u(k»)
F(k)
and its derivative
at the
Algorithm 2.1.1 describes an inexact Newton method
applied to equation (2.1).
Algorithm
1. Let
2.1.1
11(0)
(Inexact
:\ewton method)
be an initial guess.
2. For /..~= O. 1. 2....
2.1 Choose
until convergence do
E [0, 1).
'l(k)
2.2 {'sing some Krylov iterative method, compute a vector
S(k)
sat-
isfying
(2.2)
The residual solution.
r(k),
represents the amount by which the solution,
by the I\.rylov iterati\'e solver (namely, GMRES or BiCGSTAB
satisfy the .Vewton
Equation
(or Jacobian
equation)
s(k).
given
in this work) fails to
21
(2.:3 )
The use of an iterative solver stems from the high cost associated with solving (2.:3)
exactly. Obviously, if
Ol't
is far from u· then iterating
ll(k)
to a low tolerance may produce
/'soluing of the Ne\\,ton equation since there may be a considerable
between the nonlinear
disagreement
function F and its local linear model. The residual norm in
the linear solver must be reduced strictly for a locally valid Newton step.
The step length
is computed
,\(k)
ensures a decrease of f(ll)
= ~F(ll)t
to be a descent direction for f(u(k»).
using a line-search backtracking
F(ll).
method which
The step given by (2.2) should force
s(k)
That is,
(2.4)
In this case. we can assure that there is a (0 such that
0 < ( < (0.
'V
\ote
that in view of (2.2), (F(k)r
that the required condition
is sufficient for
s(k)
=
-IIF(
+ (F{k)r
inequality argument tells us that
being a descent direction.
linear solver must be reduced strictly
which means
r(k),
Ilr(k) II = IIF(kl
+ J(kls(k) II <
Thus, the residual norm in the
for a locally valid Newton step.
this condition is achieved by setting a linear tolerance,
residuals generated
2
klI1
(2.4) is achieved whenever
.\ simple Cauchy-Schwarz
II Fik) II
J(k)s{k)
say
7](k),
In practice,
to be met by relative
in the linear solver. That is,
o < 7] (k) < 7]max
< 1,
(2 ..j)
where m indicates
act ~olution
s(k)
the number of linear iterations
is acceptable
employed.
condition.
condition
The predefined linear tolerance
term of (2.,5) (i.e., the term that forces such condition
Dembo, Eisenstat
inexact )l'ewton iterates
'Ilk)
:::; 'Imax
TJ(k)
< L and if
Tl(k)'S
uniformly bounded
is known as the forcin[J
to be satisfied).
is nonsingular.
1.4.1, for
u(k)
suf-
below one, that the sequence of
If the sequence converges to zero with
converge q-linearly,
Jl·)
This
or more generally. the
and Steihaug [.tI] showed under Assumption
ficiently close to u· and
the lOex-
for the Newton step whenever (2.5) is satisfied.
condition is known as the Dembo-Eisenstat-Steihaug
ine.mcf Newton
Therefore,
then the iterates generated
If
2.1.1 converge to the solution superlinearly.
TJ(k)
=
0
(IIF(k)II),
by Algorithm
then the sequence
converges quadratically.
2.1.2
Forcing terms
Empirical criteria t.o select forcing terms can be seen, for instance,
Although
the choices proposed
in [20. :3.5.10:3].
therein are somehow simple. cheap to compute and.
some of them promise fast local convergence. their ad hoc nature preclude them from
a broader applicability.
small
,/(k)
In some situations.
these choices may generate a prematurely
for large values of /...with iterates
causing oversoh'ing
of the :\ewton
equation.
of 'Ilk) (close to 1) after se\'eral iterations
convergence.
Therefore.
u(k)
relati\'ely
far from the solution
u·
Conversely. keeping quite high values
may severely affect the nonlinear
rate of
the point is to select these forcing terms in a systematic
way
in order to ensure efficiency and rapid convergence at the same time.
Criteria for choosing the forcing term,
,,(k).
in (2 ..5) have been extensively stucliecl
by Eisenstat and \Valker [.5.5,.56]. Although their choices still have a heuristic blend.
they are designed together
becoming arbitrarily
with an efficient mechanism
small or large.
Besides providing
for safeguarding
them from
robust and efficient choices
:2:3
that prevent a significant amount of oversolving in (:2.:3) without sacrifying desirable
rates of convergence.
their results formalize much of the trial and error judgment
specifying dynamic linear tolerances
\Ve have incorporated
mentation
of this thesis.
choice for
7](k)
Eisenstat
in an inexact Newton method.
and Walker ideas in the Newton-Krylov
imple-
In fact, it has been observed that the following particular
works \vell in practice [-1:0, .56]:
(2.6)
\\·here
The choice of
17(k)
-(k)
_IIIF(k)II-IIF(k-l)
'1
-
+ .j(k-l)s(k-l)11I
I\F(k-l)1\
given above reflects the agreement
model at the previous iteration.
between F and its linear
Thus, the linear solver tolerance
~e\Vton step is less likely to be productive
(2.7)
.
is larger when the
and smaller when the step is more likely
to lead to a good approximation.
Expression (2.6) is a safeguard that prevents (2.7) from becoming arbitrarily
especially when the iterates are still far from the solution.
there is coincidentally
a good agreement
smalL
This may happen either if
between F and its local linear model (e.g ..
regions where the function behave almost linearly away from the solution) or if there is
a \'Cry small :\'ewton step (i.e .. there is little progress as consequence of being far from
the solution).
and
,](0)
In most practical
cases. Eisenstat and vValker suggest
= 0.5 as a fair initialization
"max
= 1 - 10--1
strategy.
Eisenstat and \Valker suggest other choices for forcing term criteria and safeguards.
Among alL the one proposed here has been the most robust combination
observed in
practice (\ve refer the reader to [.56]).
The following result formalizes the local convergence predicted
ing term criterion.
by the above forc-
Theorem
2.1.1
(Eisenstat
ditions given by Assumption
r
and \Valker, [,56]) nder the standard
1.4.1, the sequence {1](kl} generated
conby the
forcing term criterion (2.7) produces a sequence of inexact Newton iterates
{/t(k)}
that converges as follows
.\ couple of remarks are in order.
Remark
2.1.1
The theorem immediately
gence and t\vo-step q-quadratic
Remark
I.:
2.1.2
implies q-superlinear
COl1\'er-
convergence.
Clearly, this safeguard
ensures
(1](k-l)
f
~1](k)
for all
> O. One can t hen argue that expression (2.6) may prevent this con-
\'ergence result (2.S) from happening.
(:2.6) eventually
r-quadratic
becomes inactive.
In practice. however, the safeguard
Eisenstat
and 'vValker [.36] show that
convergence proceeds p\'en if the safeguard condition is active
for all \'allles of k.
The following example illustrates
on a real application
application
problem
the high le\'el of efficiency that can be achif>\'ed
with this forcing term criterion.
Incidentally.
this
represents one of the immediate targets to be pursued in this dissertation.
Example
2.1.1
Fig 2.1 shows a loglo scale plot of accumulated
of G~IRES iterations
against
IIPII
for a moderate
number
problem size of 24 x
:24x ,32 (composed of more than SOOOO unknowns) taken from a given time
step of a :3-0 two-phase black oil reservoir simulation
details).
The staircase
shape displayed
(see [40] for further
by an inexact
Newton method
:2.5
2
-2
-3
o
0,5
1
1.5
10910 of number of GMRES
2
iterations
3
Figure 2.1 The lise of the forcing term criteria for dynamic control of
linear tolerances. The solid line represents a standard inexact Newton
implementation with fixed linear tolerances 0.1. The dotted line is the
inexact ~ewton implementation with the forcing term criterion. Each symbol
* indicates the start of a new Newton iteration.
with a fixed forcing term suggests the amount
in generating
decreasing
Newton directions.
of wasted computation
In contrast,
the criterion
given by (2.6) and (2.7) avoids flat portions of curve resulting then in a
significant overall saving of G~IRES iterations
2.1.3
(about -WO iterations).
Globalization
The condition (2.5) itself is not sufficient for converging to the root of the nonlinear
function F if the inexact ~ewton iteration starts at any arbitrary
For this method, we need to find a step s^{(k)} that not only satisfies (2.5) for a given η^{(k)}, but also a condition ensuring a sufficient decrease of ||F^{(k)}||. In this work we use the line-search backtracking method in order to remedy this situation. Global convergence can also be achieved by trust region methods (e.g., [49, 124]). However, we restrict our implementation to line-search backtracking methods since they tend to be more flexible, simpler to use, and more widely applicable. Some developments on the use of trust region methods within inexact Newton iterations can be found in [20, 23, 35, 77].

In any case, the key point is to guarantee that the actual reduction is greater than or equal to some fraction of the predicted reduction given by the local linear model (i.e., the direction obtained from solving the linear Newton equation). This translates to accepting a new Newton step if

||F(u^{(k)} + s^{(k)})|| ≤ [1 − t(1 − η^{(k)})] ||F^{(k)}||,  t ∈ (0, 1).  (2.10)

The above expression can also be seen as the result of combining the α-condition of Goldstein-Armijo [49],

f(u^{(k)} + s^{(k)}) ≤ f(u^{(k)}) + α (F^{(k)})^t J^{(k)} s^{(k)},  0 < α < 1,  (2.11)

with (2.5). Here, f = (1/2)||F||^2. In fact, applying the Cauchy-Schwarz inequality and (2.5), it readily follows from (2.11) that

2 f(u^{(k)} + s^{(k)}) = ||F^{(k+1)}||^2 ≤ ||F^{(k)}||^2 + 2α (F^{(k)}, J^{(k)} s^{(k)})
                     = ||F^{(k)}||^2 + 2α (−||F^{(k)}||^2 + (F^{(k)}, r^{(k)}))
                     ≤ [1 − 2α (1 − η^{(k)})] ||F^{(k)}||^2.  (2.12)

Therefore, since [1 − 2α(1 − η^{(k)})]^{1/2} ≤ 1 − α(1 − η^{(k)}) for 0 < 2α(1 − η^{(k)}) ≤ 1, by taking square roots on both sides of the above inequality we obtain (2.10) with α ≡ t.

Note that the coefficient in (2.10) is less than 1, implying a decrease of ||F^{(k)}|| as the iteration advances. It is straightforward to observe that the "sufficient reduction" condition (2.5) alone is not enough: the sequence {u^{(k)}} computed by the nonlinear method may not converge to a minimizer of f, that is, lim_{k→∞} f(u^{(k)}) = 0 may not be met for a given computed step. See [49, Chapter 6] for two identified sources of this problem. As said above, conditions (2.10) and (2.5) impose simultaneous requirements of "sufficient agreement" between the nonlinear function F and the local linear model given by Newton's method. The non-negativity of the coefficient in (2.10) translates into the restriction 0 < t(1 − η^{(k)}) < 1 for any 0 < t ≤ 1 and 0 < η^{(k)} ≤ η_max < 1. Note that since t (= α) is less than unity (due to the Goldstein-Armijo condition), for a value of t close to one the margin between the predicted and the actual reduction of norms in (2.12) is small. Moreover, it is not true that (2.5) and (2.10) are necessarily satisfied simultaneously for a given pair s^{(k)} and η^{(k)}.

To overcome this, it may be necessary to shorten the current step in order to avoid small relative decreases of ||F^{(k)}||, that is, to replace the current s^{(k)} by λ^{(k)} s^{(k)} for a suitable λ^{(k)} ∈ [λ_min, λ_max] ⊂ (0, 1). It can be shown that the α-condition of Goldstein-Armijo (2.11) still holds in this case, and it additionally follows that the fractional step leads to

||F^{(k)} + J^{(k)} λ^{(k)} s^{(k)}|| ≤ [1 − λ^{(k)} (1 − η^{(k)})] ||F^{(k)}||.  (2.13)

Therefore, (2.5) and (2.10) are satisfied with s^{(k)} and η^{(k)} replaced by λ^{(k)} s^{(k)} and 1 − λ^{(k)} + λ^{(k)} η^{(k)}, respectively. The only subtlety left is to prevent λ^{(k)} from approaching zero, which would make η^{(k)} approach 1, at the same time making it possible for t to grow larger than 1, contradicting (2.10).
The discussion above provides almost all the necessary ingredients for the backtracking globalization method. To generate adequate fractional steps λ^{(k)}, it is standard in backtracking implementations to find the minimizer of a quadratic polynomial p(λ) interpolating the function

g(λ) = ||F(u^{(k)} + λ s^{(k)})||^2  (2.14)

over a predefined interval [λ_min, λ_max] ⊂ (0, 1). (Initially, [λ_min, λ_max] ≡ [0.1, 0.5] in most practical implementations.) This polynomial is constructed in such a way that p(0) = g(0) = ||F^{(k)}||^2, p(1) = g(1) = ||F(u^{(k)} + s^{(k)})||^2 and p'(0) = g'(0) = 2 (F^{(k)}, J^{(k)} s^{(k)}). The mechanics for computing this interpolating polynomial is standard (see e.g., [49]). Once λ^{(k)} is computed, the Newton step and forcing term are redefined as s^{(k)} = λ^{(k)} s^{(k)} and η^{(k)} = 1 − λ^{(k)} (1 − η^{(k)}), until condition (2.10) is eventually met.

Further theoretical results on globalization procedures for inexact Newton methods can be seen in [23, 55].
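The quadratic backtracking rule just described admits a compact implementation. The following is a minimal numpy sketch under illustrative assumptions (F is a callable residual; since J^{(k)} s^{(k)} is typically not stored, g'(0) is modeled from the inexact Newton condition (2.5), which is an assumption of this sketch rather than a prescription of the text):

    import numpy as np

    def backtrack(F, u, s, eta, t=1e-4, lam_bounds=(0.1, 0.5), max_back=10):
        """Quadratic-interpolation backtracking for an inexact Newton step.

        Accepts s once ||F(u + s)|| <= [1 - t*(1 - eta)] * ||F(u)||, cf. (2.10),
        otherwise shortens s by a factor lam in [lam_min, lam_max].
        """
        norm0 = np.linalg.norm(F(u))
        g0 = norm0 ** 2                      # g(0) = ||F(u)||^2
        gp0 = -2.0 * (1.0 - eta) * g0        # rough model of g'(0) under (2.5)
        lam_min, lam_max = lam_bounds
        for _ in range(max_back):
            norm1 = np.linalg.norm(F(u + s))
            if norm1 <= (1.0 - t * (1.0 - eta)) * norm0:
                return s, eta                # sufficient decrease achieved
            g1 = norm1 ** 2                  # g(1) for the current trial step
            # minimizer of the quadratic p with p(0)=g0, p'(0)=gp0, p(1)=g1
            denom = 2.0 * (g1 - g0 - gp0)
            lam = -gp0 / denom if denom > 0 else lam_max
            lam = min(max(lam, lam_min), lam_max)
            s = lam * s                      # fractional step
            eta = 1.0 - lam * (1.0 - eta)    # updated forcing term, cf. (2.13)
        return s, eta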
2.1.4 Why Krylov subspace methods

Krylov subspace iterative methods are basically based on a set of computational kernels such as inner products, AXPY's and matrix-vector multiplications. This feature makes them particularly attractive for solving large sparse linear systems at high performance rates on vector and parallel machines. In the context of Newton's method, iterative methods are suitable for reducing the frequently unaffordable cost associated with the computation of an exact Newton step (i.e., implied by a direct procedure). Besides high cost reasons, the computation of the Jacobian matrix can be impractical in several respects:

• The function F(u) is non-differentiable.

• Even if the Jacobian exists, it may be expensive to store or compute either analytically or numerically.

• Even if the storage and computation of the Jacobian is affordable, it may be a really ill-conditioned system.

Fortunately, since in Krylov subspace iterative methods explicit knowledge of the Jacobian is not required, but rather its action on a vector in the form of a matrix-vector multiplication, the first two aforementioned points can be overcome. The action of the Jacobian J^{(k)} on an arbitrary vector v can be approximated by finite differences,

J^{(k)} v ≈ [F(u^{(k)} + εv) − F(u^{(k)})] / ε,

for a suitable small ε > 0, and preconditioning can even be performed to overcome the third point above. For instance, right preconditioning via an operator M can be realized as follows,

J^{(k)} M^{-1} v ≈ [F(u^{(k)} + ε M^{-1} v) − F(u^{(k)})] / ε,  ε > 0.

In problems modeled by PDE's, if the Jacobian is not available, the preconditioner can be constructed by approximating some operators of the Jacobian, time-lagging, a lower order discretization, or conceiving domain decomposition strategies. In this last respect, there has been a lot of activity in the areas of nonlinear and non-symmetric problems (see [15, 30, 32, 31, 78, 108] to mention just a few). Theoretical convergence results of inexact Newton methods based on finite differences have been analyzed in depth by Brown [17].

In our particular scenario, we do not make a special consideration for the choice to rely upon finite differences or explicit computation of the Jacobian matrix, whatever representation turns out to be applicable if the user has it. We are primarily concerned with the general mechanics of algorithms not necessarily supporting explicit availability of the operators therein involved. Throughout this dissertation, the implicitness of the Jacobian and its corresponding preconditioner is achieved by use of limited memory quasi-Newton methods to draw the maximum savings in computation.
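For concreteness, the finite-difference Jacobian action just described can be sketched as follows (a minimal numpy illustration; the step-size heuristic is one common choice, an assumption of this sketch and not one prescribed in this dissertation):

    import numpy as np

    def jac_vec(F, u, v, eps=None):
        """Forward-difference approximation of the Jacobian action J(u) v."""
        if eps is None:
            # common heuristic: sqrt(machine eps) scaled by the iterate size
            nv = max(np.linalg.norm(v), 1e-30)
            eps = np.sqrt(np.finfo(float).eps) * (1.0 + np.linalg.norm(u)) / nv
        return (F(u + eps * v) - F(u)) / eps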
2.2 GMRES
Much has been written about the GMRES algorithm proposed by Saad and Schultz in 1986 [114]. Currently, it is considered the most robust Krylov iterative method in scientific and engineering applications (see e.g., [32, 39, 59, 73, 146]). This feature has driven further research towards the generation of other equally robust but more efficient iterative solvers, several preconditioning strategies and the issue of inexactness in Newton's method as analyzed here. Instructive findings have come about from new iterative solvers in connection with the GMRES algorithm [50, 81, 133, 144].

One of the strongest arguments for using GMRES is its capability of producing monotonically decreasing residual norms. For a problem of size n, convergence is guaranteed within n iterations in the absence of roundoff errors. However, m iterations of GMRES require O(m^2 n) floating point operations and O(mn) of memory storage, making the procedure infeasible for large values of m. Restarting GMRES after m steps (with m ≪ n) has alleviated the problem but, in this case, convergence is no longer guaranteed. However, the restarted version of GMRES (i.e., GMRES(m)) has been observed to work well in practice.

Remark 2.2.1 Research on generalized minimal residual algorithms had started several years before the launching of GMRES. In 1969, Kuznetsov [85] proposed the minimization of weighted residual norms over suitable lower dimensional subspaces. His work on m-step residual methods leads in fact to the minimization of residuals subject to a Krylov subspace. Pointers and discussion of particular cases of this approach can be found in [88, 116]. Later related advances and unifying ideas are shown in [45, 53].

In this section, we discuss those details relevant to the situation analyzed in this thesis. Further details on GMRES implementations are given at length in [113, 136, 137].
2.2.1 The Arnoldi factorization

The GMRES algorithm generates a basis for the Krylov space through the Arnoldi process. The Arnoldi process constructs an orthogonal basis for the Krylov subspace through the Gram-Schmidt algorithm. Given the notation in § 1.3, the output of such process creates a decomposition of the following form

A V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^t,  (2.15)

which can be equivalently expressed as

A V_m = V_{m+1} H̄_m,  (2.16)

where

H̄_m = ( H_m ; h_{m+1,m} e_m^t ) ∈ R^{(m+1)×m}.

The matrix V_{m+1} is orthogonal and its columns represent a basis for the Krylov subspace. The matrix H_m is upper Hessenberg. Considering (1.2) as the system to be solved, the initial vector v_1 is defined as the normalized initial residual r_0 = b − A x_0, i.e., v_1 = r_0 / ||r_0||.

Note that it immediately follows that H_m = V_m^t A V_m. Clearly, the upper Hessenberg matrix H_m is nothing but the matrix representation of the projection of A onto K_m(A, v_1) with respect to the orthogonal basis V_m. The main idea of GMRES (and in general, of Krylov subspace methods) is to project the original problem into K_m(A, v_1) and to apply any of the two approaches described by (1.9) or (1.10).
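The Arnoldi decomposition (2.15)-(2.16) is compactly computed as follows (a minimal numpy sketch using modified Gram-Schmidt; the function name is ours):

    import numpy as np

    def arnoldi(A, r0, m):
        """Arnoldi process with modified Gram-Schmidt.

        Returns V (n x (m+1)) with orthonormal columns and the augmented
        upper Hessenberg Hbar ((m+1) x m) so that A @ V[:, :m] = V @ Hbar.
        """
        n = r0.shape[0]
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        V[:, 0] = r0 / np.linalg.norm(r0)
        for j in range(m):
            w = A @ V[:, j]
            for i in range(j + 1):           # modified Gram-Schmidt sweep
                H[i, j] = w @ V[:, i]
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] == 0.0:           # happy breakdown: invariant subspace
                return V[:, : j + 1], H[: j + 1, : j]
            V[:, j + 1] = w / H[j + 1, j]
        return V, H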
2.2.2 Minimization of residuals

In view of (2.16), the minimal residual approximation (1.9) can be reformulated as

min_{z ∈ K_m(A, r_0)} ||r_0 − A z|| = min_{y ∈ R^m} ||r_0 − A V_m y|| = min_{y ∈ R^m} ||β e_1 − H̄_m y||,  (2.17)

where β = ||r_0|| and e_1 ∈ R^{m+1}, due to the orthogonality of the columns of V_{m+1}. If y_m minimizes (2.17), the solution of (1.2) is obtained by means of x_m = x_0 + V_m y_m.

In practice, the minimization problem is solved by a QR factorization of H̄_m via Givens rotations. To achieve higher efficiency, the QR factorization is performed as the Arnoldi process advances. This allows to compute ||r_m|| without explicit use of x_m. More precisely, consider that m Givens rotations, G_i, have been applied to H̄_m to reduce it to an augmented upper triangular matrix,

Q̄_m^t H̄_m = ( R_m ; 0 ),

where Q̄_m ∈ R^{(m+1)×(m+1)} is a unitary matrix and R_m ∈ R^{m×m} is an upper triangular matrix. Therefore, we can rewrite (2.17) as the following minimization problem

min_{y ∈ R^m} ||β Q̄_m^t e_1 − ( R_m ; 0 ) y||,

whose solution is given by y_m = β R_m^{-1} Q_m^t e_1. Furthermore, with Q̄_m = (Q_m, q_{m+1}), Q_m ∈ R^{(m+1)×m}, an easy calculation reveals that the residual of the minimization problem reduces to

||r_m|| = β |q_{m+1}^t e_1|.  (2.18)

Hence, the mth residual depends on the size of the initial residual and the resulting entry (1, m+1) of the unitary matrix Q̄_m intervening in the QR factorization. It is shown in [19] that this entry corresponds to the accumulated product of the sines involved in every Givens rotation G_i. In other words, one can further say that

||r_m|| = β |s_1 s_2 ⋯ s_m|.  (2.19)

Therefore, one does not need to explicitly compute r_m = b − A x_m. Instead, one scalar product per iteration retrieves the norm of the current residual. Note that the QR factorization can be carried out efficiently since it is applied to an upper Hessenberg matrix. The cost incurred in this factorization is of only O(m) floating point operations per iteration.
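The cheap residual-norm recurrence (2.19) can be verified with a short sketch (a numpy illustration; it assumes the whole Hessenberg matrix is available at once, whereas a real GMRES applies the rotations progressively):

    import numpy as np

    def gmres_residual_norms(Hbar, beta):
        """GMRES residual norms from the Givens sines, cf. (2.18)-(2.19).

        Hbar is the (m+1) x m Arnoldi Hessenberg matrix, beta = ||r0||;
        returns ||r_j|| for j = 1..m without ever forming x_j.
        """
        m = Hbar.shape[1]
        R = Hbar.astype(float).copy()
        g = np.zeros(m + 1)
        g[0] = beta                          # rotated right-hand side beta*e1
        norms = []
        for j in range(m):
            h = np.hypot(R[j, j], R[j + 1, j])
            c, s = R[j, j] / h, R[j + 1, j] / h
            G = np.array([[c, s], [-s, c]])  # Givens rotation on rows j, j+1
            R[[j, j + 1], j:] = G @ R[[j, j + 1], j:]
            g[[j, j + 1]] = G @ g[[j, j + 1]]
            norms.append(abs(g[j + 1]))      # = beta * |s_1 s_2 ... s_j|
        return norms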
2.2.3 Convergence

If A is diagonalizable, then the basic result (1.8) follows. However, for a particular case Eisenstat, Elman and Schultz [53] established the following result.

Theorem 2.2.1 If the symmetric part A_s = (A + A^t)/2 of A is positive definite, then the norm of the mth residual produced by GMRES is bounded by

||r_m|| ≤ [1 − λ_min^2(A_s) / λ_max(A^t A)]^{m/2} ||r_0||,  (2.20)

where λ_min(A_s) and λ_max(A^t A) denote the smallest eigenvalue of A_s and the largest eigenvalue of A^t A, respectively.
2.2.4 Algorithm

In this work, we use right preconditioning; i.e., we solve

A M^{-1} x̃ = b,

and then solve M x = x̃ to carry out the action of the preconditioner M. It is well known that this form is preferable over left preconditioning since it makes relative residual norm measures within the iterative solver invariant for different preconditioners. This norm size invariance simplifies the implementation of globalization methods, such as the line-search backtracking described in § 2.1. There is an even more compelling reason for adopting right preconditioning. If a left preconditioner is inexact or unavailable in closed form, there is no way to measure norms accurately, or rather consistently, for comparing throughout the steps of an inexact Newton method.

Interestingly enough, there are no substantial differences between right and left preconditioning when the preconditioner is well-conditioned. Both approaches generate the solution in the same Krylov subspace but with the basic difference that the latter leads to minimization of the preconditioned residual norm ||M^{-1}(b − A x_m)|| instead of (2.19). Further discussion on this subject can be seen in [113].
Given a matrix A, an initial guess, x_0, right hand side vector, b, right preconditioner M, restart parameter m and stopping tolerance ε, the restarted GMRES algorithm is

Algorithm 2.2.1 GMRES(A, x_0, b, M, m, ε)

1. Solve M y = x_0.
2. r_0 = b − A y.
3. v_1 = r_0 / ||r_0||.
4. For j = 1, 2, ..., m do
   4.1 Solve M y = v_j.
   4.2 w = A y.
   4.3 For i = 1, 2, ..., j do
        4.3.1 h_{i,j} = (w, v_i).
        4.3.2 w = w − h_{i,j} v_i.
   4.4 h_{j+1,j} = ||w||.
   4.5 v_{j+1} = w / h_{j+1,j}.
5. x_m = x_0 + V_m y_m, where y_m minimizes (2.17).
6. Solve M y = x_m.
7. r = b − A y. If ||r|| < ε return.
8. x_0 = x_m and goto 1.

The loop in step 4 defines the Arnoldi process which is based here on the modified Gram-Schmidt orthogonalization. We should point out that when h_{m+1,m} = 0 at step 4.4 then it is not possible to generate the next Arnoldi vector. This implies that the residual vector is zero and the algorithm delivers the exact solution at this step (i.e., a happy breakdown). Conversely, if the algorithm stops at step j with r_j = 0, then h_{j+1,j} = 0. Theoretically, this implies the delivery of an exact Newton step instead of an inexact Newton step in Algorithm 2.1.1.
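The following is a minimal numpy sketch of Algorithm 2.2.1 (names and the dense least-squares solve in place of the progressive Givens QR are illustrative simplifications, not the implementation used in this dissertation):

    import numpy as np

    def gmres_right(A, b, x0, M_solve, m=30, tol=1e-8, max_restarts=50):
        """Restarted GMRES with right preconditioning, cf. Algorithm 2.2.1.

        A may be a matrix or a callable v -> A v; M_solve(v) solves M y = v.
        The loop variable u lives in the preconditioned variable, u = M x.
        """
        matvec = A if callable(A) else (lambda v: A @ v)
        n = b.shape[0]
        u = x0.copy()
        for _ in range(max_restarts):
            r0 = b - matvec(M_solve(u))
            beta = np.linalg.norm(r0)
            if beta < tol:
                break
            V = np.zeros((n, m + 1))
            H = np.zeros((m + 1, m))
            V[:, 0] = r0 / beta
            mm = m
            for j in range(m):
                w = matvec(M_solve(V[:, j]))
                for i in range(j + 1):           # modified Gram-Schmidt
                    H[i, j] = w @ V[:, i]
                    w -= H[i, j] * V[:, i]
                H[j + 1, j] = np.linalg.norm(w)
                if H[j + 1, j] == 0.0:           # happy breakdown
                    mm = j + 1
                    break
                V[:, j + 1] = w / H[j + 1, j]
            e1 = np.zeros(mm + 1)
            e1[0] = beta
            y, *_ = np.linalg.lstsq(H[:mm + 1, :mm], e1, rcond=None)  # (2.17)
            u = u + V[:, :mm] @ y
        return M_solve(u)                        # map back: x = M^{-1} u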
2.2.5 The role of GMRES in Newton's method

The GMRES algorithm offers the opportunity to compute several entities within Newton's method in a more efficient way. For instance, in light of (2.18) the forcing term criterion (2.7) can be computed as

η^{(k)} = | ||F^{(k)}|| − β |q_{m+1}^t e_1| | / ||F^{(k−1)}||.  (2.21)
2.3.1 General remarks

In the Bi-CGSTAB algorithm the iterates are constructed in such a way that the residual r_i is orthogonal with respect to a sequence of "shadow" vectors {r̃_j}_{j<i} and, in the same way, r̃_i is orthogonal to {r_j}_{j<i} (bi-orthogonality condition). The ith residual can be expressed as r_i = P_i(A) Q_i(A) r_0, where P_i is a monic polynomial of degree less than or equal to i. The shadow residuals r̃_i are implicitly computed as r̃_i = Q_i(A^t) r̃_0, where

Q_j(x) = ∏_{l=1}^{j} (1 − ω_l x),

and the ω_l are chosen so that (r_i, r̃_j) = 0, i ≠ j. This last condition can be enforced without explicitly referring to A^t. Bi-CGSTAB has small storage requirements, requires two matrix-vector products per iteration and produces a solution x_k ∈ x_0 + K_{2k}(A, r_0). Typically, this method produces much smoother residual norm behavior than CGS, but the residual norms still are not guaranteed to decrease from one iteration to the next.
2.3.2 Algorithm

Given a matrix A, an initial guess, x_0, right hand side vector, b, preconditioner M and stopping tolerance ε, the BiCGSTAB algorithm is

Algorithm 2.3.1 BiCGSTAB(A, x_0, b, M, ε)

1. r_0 = b − A x_0.
2. Choose r̃_0 so that r̃_0^t r_0 ≠ 0.
3. ρ_0 = α = ω_0 = i = 1; v_0 = p_0 = 0.
4. While (||r_{i−1}|| > ε) do
   4.1 ρ_i = r̃_0^t r_{i−1}; β = (ρ_i / ρ_{i−1})(α / ω_{i−1}).
   4.2 p_i = r_{i−1} + β (p_{i−1} − ω_{i−1} v_{i−1}).
   4.3 Solve M p̂ = p_i.
   4.4 v_i = A p̂.
   4.5 α = ρ_i / (r̃_0^t v_i).
   4.6 s = r_{i−1} − α v_i.
   4.7 Solve M ŝ = s.
   4.8 t = A ŝ.
   4.9 ω_i = (t^t s) / (t^t t).
   4.10 x_i = x_{i−1} + α p̂ + ω_i ŝ.
   4.11 r_i = s − ω_i t.
   4.12 i = i + 1.
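A minimal numpy sketch of Algorithm 2.3.1 follows (names are illustrative; breakdown safeguards are omitted for brevity):

    import numpy as np

    def bicgstab(A, b, x0, M_solve, tol=1e-8, max_iter=500):
        """Preconditioned Bi-CGSTAB, cf. Algorithm 2.3.1.

        A may be a matrix or a callable; M_solve(v) solves M y = v.
        """
        matvec = A if callable(A) else (lambda v: A @ v)
        x = x0.copy()
        r = b - matvec(x)
        r_tilde = r.copy()                   # shadow residual, r_tilde^t r != 0
        rho_old = alpha = omega = 1.0
        v = np.zeros_like(b, dtype=float)
        p = np.zeros_like(b, dtype=float)
        for _ in range(max_iter):
            if np.linalg.norm(r) <= tol:
                break
            rho = r_tilde @ r
            beta = (rho / rho_old) * (alpha / omega)
            p = r + beta * (p - omega * v)
            p_hat = M_solve(p)
            v = matvec(p_hat)
            alpha = rho / (r_tilde @ v)
            s = r - alpha * v
            s_hat = M_solve(s)
            t = matvec(s_hat)
            omega = (t @ s) / (t @ t)
            x = x + alpha * p_hat + omega * s_hat
            r = s - omega * t
            rho_old = rho
        return x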
Chapter 3

Secant methods

We now introduce secant methods as iterative procedures for solving linear and nonlinear equations. They are better known for solving the latter type of problems (e.g., Broyden's method, DFP, BFGS, and so forth; see e.g., [49, 79, 98, 101]) but, recently, there has been important activity formulating them as non-symmetric linear solvers [50, 52, 135, 144]. Recent developments have shed light on new connections between secant methods and other established iterative algorithms.

In our particular context, Broyden's method has been one of the most effective ways to solve nonlinear equations when the computation of the Jacobian matrix is highly expensive or infeasible to obtain. Traditionally, this method has been considered impractical as a linear solver and consequently, almost forgotten throughout the literature. Eirola and Nevanlinna [52] revitalized the subject since they developed a new algorithm based on rank-one updates. Their algorithm (the EN algorithm, as it is currently known) is an extension of the linear version of Broyden's method and it is mathematically equivalent to the GMRES algorithm for a special selection of the type of update. Van der Vorst and Vuik proposed more efficient ways to implement the former EN algorithm, which led to a recursive version of GMRES (RGMRES) [133, 134, 135]. Motivated by these results, Deuflhard, Freund and Walter [50] revisited the family of secant updates in combination with a suitable line-search strategy. In this way they were able to improve the effectivity of Broyden's method as a linear iterative method up to the level of being competitive with the GMRES algorithm.

A unifying algorithmic approach of all these previous advances was recently introduced by Yang in her Doctoral thesis [144]. It is important to remark that her work provides a nonlinear version of the EN algorithm which plays an important role in the present dissertation.

In the scenario of nonlinear equations some isolated activity has been devoted to exploiting the conjunction of inexact Newton methods and secant methods. One important attempt dates back to the work of Eisenstat and Steihaug in extending inexactness issues to Broyden's method [54]. Basically, they characterize the local q-superlinear rate of convergence by absorbing secant approximations of the Jacobian matrix into the linear residuals. Other efforts on the sources of inaccuracy of low-rank updates are treated in [48, 47]. Most recent approaches regard Broyden's method as a vehicle to reuse information left or required by the linear iterative method for solving the Jacobian equation. Approaches like these were proposed by Martinez [89] and Brown, Saad and Walker [21]. Their work was primarily meant to generate preconditioners for Krylov iterative methods such as GMRES. In the optimization arena, more specifically in truncated Newton methods, similar attempts were previously reported by Nash [95, 96]. An approach based on the combination of limited memory BFGS and truncated Newton methods is reported by Byrd, Nocedal and Zhu [29].

This chapter reviews part of the above ideas but, more importantly, provides the foundation to justify and describe a couple of new nonlinear algorithms for large scale problems. Instead of building preconditioners out of secant updates, we rather suggest to propagate Krylov subspace information through Broyden's updates restricted to that subspace. The idea allows the efficient implementation of the nonlinear EN (NEN) algorithm and, furthermore, it can be recurrently extended to develop higher-order versions of this algorithm and of Newton's method. As a result, these algorithms can afford the use of any desired preconditioner to the current problem and avoid the overhead in computing secant updates and Jacobian evaluations simultaneously.
The discussion sets out with Broyden's method and the EN method. We present the linear version in order to contrast both methods and motivate the arguments for the nonlinear case. This encompasses § 3.1. Thereafter, we highlight some ideas of Martinez and of Brown, Saad and Walker enhanced by the possibility of using secant updates as a preconditioning strategy. Although they may turn out of limited practical scope, they certainly drive the need of using the secant method as a device to alleviate the high cost involved in solving the Jacobian system. This describes the contents of § 3.2. In § 3.3, we suggest a way to perform rank-one updates of the Hessenberg matrix resulting from the Arnoldi factorization. In § 3.4, we find that the previous development allows us to reuse the Krylov information and devise an efficient version of the NEN algorithm (potentially faster than the inexact Newton's method). The overall procedure is named the nonlinear KEN algorithm and its cost is about the same as that of a GMRES solution plus the almost negligible cost of updating a Hessenberg matrix and an extra minimal residual solution such as (2.17). An even higher-order version of this algorithm for Newton's method is also introduced.

3.1 The family of rank-one solvers

Rank-one updates for solving nonlinear equations are sometimes referred to as secant or quasi-Newton methods since no "true" Newton equation is ever formed throughout the iterations. These updates obey a secant condition that has to be satisfied by the new Jacobian approximation. The best of these methods still seems to be the first one originally introduced by Broyden [24, 25]*.

*Throughout this dissertation, we restrict the attention to the "good" versions of Broyden's method and the NEN method since they have been commonly observed to be the most effective in practice and do not introduce a loss of generality to our discussion.
Under the same philosophy, rank-one updates for solving linear systems progressively update an approximation to the matrix (or its inverse) in order to converge to the desired solution. These approximations are nothing more than variable preconditioners approaching or acting as the original operator of the system. (Note that this variability implies a non-stationary iterative method, as discussed in §§ 1.4.2.) Unfortunately, this type of methods has had a bad reputation in solving linear systems for a long time. Recent efforts such as [50, 52] have contributed to a reconsideration of this position. Remarkably enough, these works were valuable for Yang in yielding a nonlinear interpretation of the EN algorithm [144].

Our goal in this section is to introduce Broyden's method and the NEN algorithm. The evolutionary path leading to the current NEN algorithm requires that some of the developments in the linear case be covered first. However, this should serve as further motivation of the ideas in this chapter. We emphasize the essence of the EN algorithm, which incidentally presents a close affinity to higher-order methods derived from Newton's method and already known for around thirty years. The key result is that inexactness can be introduced into these rank-one methods without losing much of their local rapid convergence.
3.1.1 Broyden's method

Given u ≈ u* and M ≈ J(u), we can find an approximate new Newton step, u_+, by

u_+ = u − M^{-1} F(u).  (3.1)

Broyden's method computes a new M_+ by means of the following rank-one update

M_+ = M + [F_+(u) − F(u) − M s] g^t / (g^t s),  (3.2)

which, whenever the approximated Jacobian equation is solved exactly, i.e. M s = −F(u), reduces to

M_+ = M + F_+(u) g^t / (g^t s),

for g^t s ≠ 0. The vector g can be chosen in several ways. For instance, when g = s we obtain the "good Broyden's update", and when g = M^t [F_+(u) − F(u)] we have the "bad Broyden's update" (see [49]).

Applying the Sherman-Morrison-Woodbury formula (1.3) we obtain the corresponding inverse form of (3.2),

M_+^{-1} = M^{-1} + (s − M^{-1} y) g^t M^{-1} / (g^t M^{-1} y),  (3.3)

where y = F_+(u) − F(u), provided that g^t M^{-1} y ≠ 0.

In particular, if F(u) = Ax − b = 0 is a linear function, then it is not hard to see that (3.1) represents the instance of a stationary iterative method with preconditioner M. In such case, we have the following formula to update M^{-1} at the ith iteration,

M_{i+1}^{-1} = M_i^{-1} + (p_i − M_i^{-1} q_i) ĝ_i^t / (ĝ_i^t q_i),  (3.4)

with q_i = r_{i+1} − r_i, ĝ_i^t q_i ≠ 0. Here, r_i denotes the ith residual of the linear iteration.
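The inverse form (3.3) with the "good" choice g = s is a one-line rank-one correction; a minimal numpy sketch (the function name is ours):

    import numpy as np

    def good_broyden_inverse_update(M_inv, s, y):
        """Inverse 'good' Broyden update (3.3) with g = s (Sherman-Morrison).

        s = u_plus - u, y = F(u_plus) - F(u); returns the updated M^{-1}.
        """
        Minv_y = M_inv @ y                   # M^{-1} y
        denom = s @ Minv_y                   # s^t M^{-1} y, assumed nonzero
        return M_inv + np.outer(s - Minv_y, s @ M_inv) / denom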
As in the nonlinear case, there are several possible choices for ĝ_i. Yang cites a comprehensive list of choices, for which we refer the interested reader to [144]. In summary, she concludes that ĝ_i = (I − A M_{i-1}^{-1})^t (I − A M_{i-1}^{-1}) q_i and ĝ_i = M_i^{-t} (p_i − Σ_{j=0}^{i-1} (p_j, p_i) p_j), where p_0, p_1, ..., p_{i-1} form an orthogonal basis for the space spanned by p_j, j = 0, 1, ..., i − 1, are two of the best options. The former is mathematically equivalent to the generalized conjugate residual (GCR) method, and to GMRES with the initial guess x̂_0 = x_0 + M_0^{-1} r_0, if all directions were mutually orthogonal. The second option coincides with the projected updates studied by Gay and Schnabel [68] and implies the computation of updates as the iteration progresses. Both options make Broyden's method converge in at most n steps.

Deuflhard, Freund and Walter [50] incorporate a line-search strategy to refine the proper step length, α_i, for updating intermediate residuals and solutions, that is,

x_{i+1} = x_i + α_i p_i,  r_{i+1} = r_i − α_i q_i,  α_i = (r_i, q_i) / (q_i, q_i).  (3.5)

This makes the method also competitive with GMRES even for simple choices of ĝ_i such as ĝ_i = M_i^{-t} p_i (i.e., the "good Broyden's update").

All the above features were absent in the former Broyden's version [25], which was established to terminate within at most 2n steps [67]. Therefore, an improved version of Broyden's algorithm for the linear case looks as follows (we leave open the definition of ĝ_i):
Algorithm 3.1.1 (Linear Broyden iterative solver)

1. Give an initial guess x_0 and inverse preconditioner M_0^{-1}.
2. Compute r_0 = b − A x_0.
3. For i = 0, 1, ... until convergence do
   3.1 p_i = M_i^{-1} r_i.
   3.2 q_i = A p_i.
   3.3 M_{i+1}^{-1} = M_i^{-1} + (p_i − M_i^{-1} q_i) ĝ_i^t / (ĝ_i^t q_i), provided that ĝ_i^t q_i ≠ 0.
   3.4 α_i = (r_i, q_i) / (q_i, q_i).
   3.5 x_{i+1} = x_i + α_i p_i.
   3.6 r_{i+1} = r_i − α_i q_i.

Note that except for the update in step 3.3 and defining ĝ_i = q_i, for all i = 0, 1, ..., this algorithm is a general form of a descent method for linear systems. Eisenstat, Elman and Schultz [53] use this presentation to derive the GCR method and other three closely related methods.
Broyden's method for the nonlinear case relies on equations (3.1) and (3.2) above. One of the major virtues of the method consists of finding the minimal solution in Frobenius norm (i.e., minimizing ||M_+ − M||_F) over all matrices satisfying the secant equation

M_+ s = y = F_+ − F.  (3.6)

For simplicity, we omit the line-search step length determination in the next algorithm.

Algorithm 3.1.2 (Nonlinear Broyden)

1. Give an initial guess u^{(0)} and Jacobian approximation M_0.
2. For k = 0, 1, ... until convergence do
   2.1 Solve M^{(k)} s^{(k)} = −F^{(k)}.
   2.2 Update solution u^{(k+1)} = u^{(k)} + s^{(k)}.
   2.3 q^{(k)} = F^{(k+1)} − F^{(k)}.
   2.4 M^{(k+1)} = M^{(k)} + (q^{(k)} − M^{(k)} s^{(k)}) (s^{(k)})^t / ((s^{(k)})^t s^{(k)}).

It can be shown (see e.g., [49, 79]) that Broyden's method iterates converge q-superlinearly to F* = F(u*) = 0 under Assumption 1.4.1 and lim_{k→∞} u^{(k)} = u*, if and only if

lim_{k→∞} ||(M^{(k)} − J(u*)) s^{(k)}|| / ||s^{(k)}|| = 0.  (3.7)

Condition (3.7) is better known as the Dennis-Moré characterization and it is the cornerstone in proving local q-superlinear convergence for general secant updates in optimization (see e.g., [43, 49]).
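A minimal numpy sketch of Algorithm 3.1.2 (a dense illustration without line search; names are ours):

    import numpy as np

    def broyden_solve(F, u0, M0, tol=1e-10, max_iter=100):
        """'Good' Broyden iteration, cf. Algorithm 3.1.2 (no line search)."""
        u = u0.copy()
        M = M0.astype(float).copy()
        Fu = F(u)
        for _ in range(max_iter):
            if np.linalg.norm(Fu) <= tol:
                break
            s = np.linalg.solve(M, -Fu)                  # step 2.1
            u = u + s                                    # step 2.2
            F_new = F(u)
            q = F_new - Fu                               # step 2.3
            M = M + np.outer(q - M @ s, s) / (s @ s)     # step 2.4
            Fu = F_new
        return u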
3.1.2 The family of EN-like methods

The family of EN-like methods refers to a generalization given by Yang [144] of the Eirola-Nevanlinna algorithm formerly proposed in [52]. The generalization resides in the way the step length, α_i, and the components intervening within the rank-one update are specified. Depending on how they are selected, Yang shows that more efficient algorithms than Broyden's method are obtained which, consequently, may outperform other well established methods such as the GCR algorithm and GMRES.

In general terms, the EN algorithm computes directions based on M_{i+1} rather than on M_i as depicted in Algorithm 3.1.1. This implies that the algorithm is looking one step ahead compared to Broyden's method. Hence, the computational complexity of the EN algorithm approximately doubles both Broyden's and the GMRES algorithm [50, 144]. However, the EN algorithm is about twice as fast as the other two. Careful implementations in terms of memory management (through restarts, truncation and implicit updates) and computation give additional slight advantages to the EN algorithm [144]. The linear version can be described as follows:

Algorithm 3.1.3 (Linear EN iterative solver)

1. Give an initial guess x_0 and inverse preconditioner M_0^{-1}.
2. Compute r_0 = b − A x_0.
3. For i = 0, 1, ... until convergence do
   3.1 p_i = M_i^{-1} r_i.
   3.2 q_i = A p_i.
   3.3 M_{i+1}^{-1} = M_i^{-1} + (p_i − M_i^{-1} q_i) ĝ_i^t / (ĝ_i^t q_i), provided that ĝ_i^t q_i ≠ 0.
   3.4 p̄_i = M_{i+1}^{-1} r_i.
   3.5 q̄_i = A p̄_i.
   3.6 α_i = (r_i, q̄_i) / (q̄_i, q̄_i), provided that q̄_i^t q̄_i ≠ 0.
   3.7 x_{i+1} = x_i + α_i p̄_i.
   3.8 r_{i+1} = r_i − α_i q̄_i.

This algorithm is equivalent to Algorithm 3.1.1 when p̄_i = p_i and q̄_i = q_i in steps 3.4 and 3.5. Moreover, one iteration of the EN algorithm can be regarded as the composition of two Broyden's iterations: one with unity step length (i.e., α = 1) followed by another one with the optimal choice (3.5), without updating the approximation M_i^{-1} to A^{-1}. In light of the similarities in the linear case, one can expect that the nonlinear version performs two Jacobian system solutions, as suggested by steps 3.1 and 3.4 above. Indeed,
Algorithm 3.1.4 (Nonlinear EN)

1. Give an initial guess u^{(0)} and Jacobian approximation M_0.
2. For k = 0, 1, ... until convergence do
   2.1 Solve M^{(k)} s^{(k)} = −F^{(k)}.
   2.2 q^{(k)} = F(u^{(k)} + s^{(k)}) − F^{(k)}.
   2.3 M^{(k+1)} = M^{(k)} + (q^{(k)} − M^{(k)} s^{(k)}) (s^{(k)})^t / ((s^{(k)})^t s^{(k)}).
   2.4 Solve M^{(k+1)} s̄^{(k)} = −F^{(k)}.
   2.5 Update solution u^{(k+1)} = u^{(k)} + s̄^{(k)}.

Notice that the direction s̄^{(k)} computed by the nonlinear EN algorithm is a linear combination of the direction delivered by Broyden's method and an extra direction coming from step 2.4. In fact, it can be shown after some algebraic manipulation that

u^{(k+1)} = u^{(k)} + s̄^{(k)},  (3.8)

where

s̄^{(k)} = s^{(k)} + θ^{(k)} s̃^{(k)},  s^{(k)} = −(M^{(k)})^{-1} F^{(k)},  s̃^{(k)} = −(M^{(k)})^{-1} F(u^{(k)} + s^{(k)}),

with θ^{(k)} a scalar damping parameter, provided that (s^{(k)})^t s^{(k)} ≠ 0. Furthermore,

u^{(k+1)} = u^{(k)} − (M^{(k)})^{-1} F^{(k)} + θ^{(k)} [ −(M^{(k)})^{-1} F(u^{(k)} − (M^{(k)})^{-1} F^{(k)}) ],  for k = 0, 1, ...  (3.9)

The last expression clearly exhibits that the updated solution is formed by combining a Broyden's step and a damped chord method step. The chord step is defined by fixing the Jacobian (its approximation in this case) for some iterations. Kelley presents an updated analysis of this method in [79].
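To make the two-solve structure of the nonlinear EN step concrete, here is a minimal numpy sketch of one pass of Algorithm 3.1.4 (a dense illustration; a practical large-scale version would replace the direct solves by an iterative solver such as GMRES, as developed later in this chapter):

    import numpy as np

    def nen_step(F, u, M):
        """One step of the nonlinear EN iteration, cf. Algorithm 3.1.4."""
        Fu = F(u)
        s = np.linalg.solve(M, -Fu)                  # step 2.1 (Broyden direction)
        q = F(u + s) - Fu                            # step 2.2
        M = M + np.outer(q - M @ s, s) / (s @ s)     # step 2.3 (rank-one update)
        s_bar = np.linalg.solve(M, -Fu)              # step 2.4 (look-ahead solve)
        return u + s_bar, M                          # step 2.5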
Since the angle between the two directions s^{(k)} and s̃^{(k)} is defined by

cos γ = (s^{(k)})^t s̃^{(k)} / (||s^{(k)}|| ||s̃^{(k)}||),

the damping parameter θ^{(k)} can be reformulated in terms of direction and magnitude as

θ^{(k)} = 1 − (||s̃^{(k)}|| / ||s^{(k)}||) cos γ.

This clearly shows that for mutually orthogonal directions s^{(k)} and s̃^{(k)}, a full chord step is performed. On the other hand, if both entities are identical in direction and magnitude, then the chord step contribution vanishes. Incidentally, if M^{(k)} = J^{(k)} and θ^{(k)} = 1 for k = 0, 1, ..., then (3.9) becomes

u^{(k+1)} = u^{(k)} − (J^{(k)})^{-1} [ F^{(k)} + F(u^{(k)} − (J^{(k)})^{-1} F^{(k)}) ].  (3.10)

This recurrence represents a higher-order modification of Newton's method. Iterates generated by (3.10) converge q-superlinearly with q-order 3 [105]. These methods were studied by Shamanskii [118] and Traub [129]. They pointed out that even higher-order methods can be built out of a longer sequence of chord steps alternated with regular Newton steps. In a more recent treatment, Kelley names those methods after Shamanskii and compares the particular case (3.10) numerically against Newton's method [79]. Here, we rather adopt the term composite Newton's method when referring to recurrence (3.10).
Along the lines of Gay's local convergence analysis for Broyden's method, Yang was able to show that the NEN algorithm converges n-step q-quadratically for n-dimensional problems [144]. Therefore, as in the linear case, the NEN method converges twice as fast as Broyden's method. The following theorem summarizes this important result.

Theorem 3.1.1 Let the standard assumptions in Assumption 1.4.1 hold, so that for any x, y ∈ Ω ⊆ R^n, F(x) − F(y) = ∫_0^1 J[y + t(x − y)] (x − y) dt. Then there exist δ, ε, C_B, C_EN > 0 such that if ||u^{(0)} − u*|| ≤ δ and ||M^{(0)} − J*|| ≤ ε, Broyden's method converges as follows,

||u^{(k+2n)} − u*|| ≤ C_B ||u^{(k)} − u*||^2,

and the NEN method converges as follows,

||u^{(k+n)} − u*|| ≤ C_EN ||u^{(k)} − u*||^2.

Hence, the NEN algorithm converges q-quadratically as does Newton's method in the one-dimensional case. Note that the method reduces to a forward finite difference method in 1-D [144], which is sometimes referred to as Steffensen's method [105].
In such case, the above equations give rise to the recurrence

s^{(k)} = −F^{(k)} / a^{(k)},  a^{(k+1)} = [F(u^{(k)} + s^{(k)}) − F^{(k)}] / s^{(k)},  u^{(k+1)} = u^{(k)} − F^{(k)} / a^{(k+1)}.

The first equality provides a systematic way to adjust the step length within the forward finite difference scheme as the iteration progresses: the steeper the slope a^{(k)}, the shorter the step s^{(k)}, and vice versa. Moreover, current derivatives are estimated in terms of the previous derivative rather than two consecutive function values, as occurs with the secant method. It has been proven that the secant method for the one-dimensional problem converges 2-step q-quadratically [67]. In terms of complexity, we can easily determine that the EN algorithm requires one extra function evaluation and two extra floating point operations per iteration compared to Broyden's method.
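The 1-D recurrence above is short enough to state in full (a minimal sketch; the function name is ours):

    def steffensen(F, u0, a0, tol=1e-12, max_iter=50):
        """1-D Steffensen-type recurrence from the text: the slope a is
        re-estimated by a forward difference whose step is set by the
        previous slope, s = -F(u)/a."""
        u, a = u0, a0
        for _ in range(max_iter):
            Fu = F(u)
            if abs(Fu) <= tol:
                break
            s = -Fu / a                  # step length from the current slope
            a = (F(u + s) - Fu) / s      # updated derivative estimate
            u = u - Fu / a               # quasi-Newton update with new slope
        return u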
A key point can be made: Broyden's method is to Newton's method what the NEN method is to composite Newton's method. Hence, it is possible (in fact, not rare in practice) that the NEN method produces faster converging iterates than those of Newton's method, especially when M^{(0)} and u^{(0)} are sufficiently good.

The following example corroborates the previous observation. The cases shown there will be frequently brought up as the ideas are developed throughout the present and next chapters. We momentarily look at convergence in terms of nonlinear iterations and leave the discussion on computational cost (i.e., in terms of floating point operations) to Chapter 6.
Example 3.1.1 We consider the extended versions of the Rosenbrock function and Powell function described in Appendix B of [49], with initial guesses u^{(0)} = (0, 1, 0, 1, ..., 0, 1)^t and u^{(0)} = (0, −1, 0, 1, ..., 0, −1, 0, 1)^t, respectively. We also consider two variants of a more physically sound problem which arises in radiative heat transfer applications and is modeled by the so-called Chandrasekhar H-equation (see [38, 79]):

F(H)(μ) = H(μ) − [ 1 − (c/2) ∫_0^1 μ H(ξ) / (μ + ξ) dξ ]^{-1} = 0,  with μ ∈ [0, 1].

There are two solutions known for each c ∈ (0, 1) and, as this value approaches one, the problem becomes harder to solve. Here, we closely follow the specifications given in [79]; that is, H(u) = u, u^{(0)} = (0, 0, 0, ..., 0, 0)^t, and the composite midpoint rule to discretize the integral. The two variants of the H-equation are determined by setting c = .9 and c = .999999.

For all four different cases we specify 100 unknown variables. Figure 3.1 shows the relative nonlinear residual norms (NRNR) against the number of nonlinear iterations for Broyden's method (dotted line), Newton's method (dash-dotted line), the composite Newton's method (dashed line) and the NEN method (solid line). For the first and last methods, the initial Jacobian approximation M^{(0)} = J^{(0)} was defined.

[Figure 3.1: Convergence comparison of Newton's method (dash-dotted line), Broyden's method (dotted line), the composite Newton's method (dashed line) and the NEN algorithm (solid line) in their exact versions. Panels: Rosenbrock, Powell, Chandrasekhar (c = .9), Chandrasekhar (c = .999999).]
The backtracking line-search method was utilized in all methods. In the case of the Rosenbrock function, both Broyden's method and the NEN method were unable to generate a descent direction for ||F|| at the first few steps of the process. In such case it was required to reevaluate the Jacobian by the finite difference approximation. However, in all cases we can see that the NEN method takes roughly half the number of iterations employed by Broyden's method. This reduction surpasses by 50% the reduction in iterations showed by the composite Newton's method over Newton's method. The NEN method appears to converge faster than Newton's method, except in the Rosenbrock case at relatively small nonlinear residual norms. In the remaining cases, the NEN method appears to converge superlinearly with a q-order between 2 and 3. Again, this trend breaks down in the Rosenbrock case, where also Broyden's method has serious difficulties and seems to have a q-order close to unity.
For small and moderate problem sizes, the exact versions of the composite Newton's and the NEN methods can be efficiently implemented by reusing the underlying LU factorization to solve two linear systems with different right hand sides. This implies significant savings in pivoting operations and rank-one updates, whereas the total number of function evaluations is reduced due to the faster convergence of both methods. Kelley observes that alternation of chord steps and Newton's steps in composite Newton is potentially attractive for large scale problems where building the Jacobian is computationally expensive compared to function evaluations [79]. Chord steps may be a plausible and effective option in the setting of algebraic systems arising from transient problems (i.e., implicit formulation of parabolic equations), where initial Newton iterates may be close to the root, particularly in simulations approaching the steady state (see e.g., [59]).

However, in large scale implementations where linear iterative methods are virtually a must, the high efficiency promised by the composite Newton's and NEN methods fades away on account of the fact that two Jacobian systems must be solved from scratch. The pitfall is that most iterative methods (including some popular Krylov subspace methods) do not offer reusable information in the event of changes in the linear system coefficients. Consequently, this makes an inexact step of the NEN algorithm as computationally expensive as two steps of an inexact Broyden's method.

Fortunately, as we saw in §§ 2.2, the GMRES algorithm preserves Krylov information delivered by its intrinsic Arnoldi factorization. However, until now, this information has been restricted to build preconditioners in subsequent utilizations of GMRES within the inexact Newton's method. We show that chord steps can be still performed upon the current underlying Krylov basis. In this way, we are able to preserve much of the integrity of an inexact nonlinear EN algorithm and recover the efficiency that it promises compared to Newton's and Broyden's method.
3.1.3 Inexactness in secant methods

The issue of inexactness in quasi-Newton methods has been examined in [54, 126]. Reference [54] is of particular interest since it is shown there that the local q-superlinear rate of convergence is still attained for the inexact Broyden's method. In fact, those results are a generalization of the work previously developed by Dembo, Eisenstat and Steihaug in [41].

Since the same conditions stated in [54] can be also imposed upon the inexact NEN algorithm, it is straightforward to show that it produces q-superlinearly convergent iterates. These conditions are given by

lim_{k→∞} ||M^{(k)} s^{(k)} + F^{(k)}|| / ||F^{(k)}|| = 0,

and

lim_{k→∞} ||M^{(k)} s^{(k)} + F^{(k)} − F^{(k+1)}|| / ||s^{(k)}|| = 0.
[Figure 3.2: Convergence comparison of Newton's method (dash-dotted line), Broyden's method (dotted line), the composite Newton's method (dashed line) and the NEN algorithm (solid line) in their inexact versions. Panels: Rosenbrock, Powell, Chandrasekhar (c = .9), Chandrasekhar (c = .999999).]
Clearly, the first condition follows if the forcing terms converge to zero as k → ∞. The second one suggests that the residual should look like the value of the function at the new point, with a discrepancy size converging faster to zero than the size of the direction produced for k → ∞. Eisenstat and Steihaug show that whenever both of these conditions hold and u^{(k)} → u*, it follows that the sequence {u^{(k)}} converges in a q-superlinear way.

Rather than going over the lengthy details of this proof, we consider it more illuminating to present the convergence results for the cases exposed in Example 3.1.1 with GMRES solving the Jacobian equations.

Example 3.1.2 Figure 3.2 presents the convergence history when GMRES was used as inexact solver of the Jacobian equation. We follow the backtracking line-search strategy and the forcing term selection discussed in Chapter 2. The GMRES restart parameter was chosen to be 30, η_max = .1 and no preconditioning was specified. As Figure 3.2 shows, there is no apparent change in the convergence of the composite Newton's method and Newton's method. The secant methods instead show a slight increase in the number of iterations, but without altering the convergence margin that both have between each other. Rarely enough, the inexactness and the reevaluation of the Jacobian were more beneficial to the NEN algorithm, achieving better convergence rates than Newton's method itself for the Rosenbrock function. Table 3.1 and Table 3.2 complement these results by illustrating the number of GMRES iterations and values of η^{(k)} along the iterations for the particular case of the Chandrasekhar H-equation with c = .999999.
3.2 Secant preconditioners

Secant procedures have been traditionally conceived as an alternative way to solve nonlinear equations at a lower cost. As happens with inexact Newton solvers, they have been studied to tackle large scale problems. Recently, both methods began to be regarded more as complementary than competing procedures. Theory supporting this position has been mainly driven by Martinez [89, 90].
Table 3.1 Comparison of Broyden's method and Newton's method for solving the Chandrasekhar H-equation with c = .999999.

                 Broyden                        Newton
  k     RNR        η(k)      ILI      RNR        η(k)      ILI
  1     2.24e-01   1.00e-01    2      2.24e-01   1.00e-01    2
  2     8.45e-02   1.00e-01    1      4.98e-02   1.00e-01    1
  3     3.47e-02   1.00e-01    1      1.44e-02   1.00e-01    2
  4     1.15e-02   1.00e-01    2      3.40e-03   1.00e-01    2
  5     3.96e-03   1.00e-01    2      8.25e-04   1.00e-01    2
  6     1.51e-03   1.00e-01    2      2.28e-04   1.00e-01    2
  7     6.25e-04   1.00e-01    2      5.49e-05   1.00e-01    3
  8     1.21e-04   1.00e-01    2      1.29e-05   1.00e-01    3
  9     1.10e-04   1.00e-01    3      2.53e-06   1.00e-01    3
 10     3.13e-05   1.00e-01    3      2.65e-07   1.00e-01    3
 11     9.05e-06   1.00e-01    3      4.74e-09   1.65e-02    3
 12     3.25e-06   1.00e-01    3      4.54e-11   2.72e-03    4
 13     9.95e-07   1.00e-01    3      1.29e-15
 14     1.55e-07   1.00e-01    3
 15     5.12e-09   2.88e-02    3
 16     2.53e-10   2.73e-02    3
 17     4.39e-12

Table 3.2 Comparison of the NEN and the composite Newton's method for solving the Chandrasekhar H-equation with c = .999999.

                 NEN                            Comp. Newton
  k     RNR        η(k)      ILI      RNR        η(k)      ILI
  1     9.16e-02   1.00e-01    5      1.04e-01   1.00e-01    1
  2     1.60e-02   8.50e-02    2      1.46e-02   1.00e-01    2
  3     2.66e-03   1.00e-01    2      2.03e-03   1.00e-01    2
  4     4.04e-04   1.00e-01    2      3.14e-04   1.00e-01    2
  5     9.09e-05   1.00e-01    3      4.36e-05   1.00e-01    3
  6     3.17e-06   1.00e-01    3      5.20e-06   1.00e-01    3
  7     6.68e-07   2.66e-02    3      2.50e-07   1.00e-01    3
  8     2.82e-08   4.01e-02    3      2.23e-10   1.93e-02    4
  9     6.24e-10   1.45e-02    4      1.05e-15
 10     2.23e-11   3.57e-02    3
 11     2.90e-14
Brown, Saad and Walker [21] propose multiple secant updates of a given Arnoldi factorization to eventually generate good preconditioners for GMRES in subsequent nonlinear iterations. Additionally, a few scattered efforts aim at the possibility of combining secant updates with sparsity preserving methods (i.e., structured least-change secant updates) for solving Jacobian systems (e.g., [49] and references therein).

Nevertheless, the integration of secant updates into other traditional iterative methods is barely in its beginnings. In our view, we consider Martinez's theory of secant preconditioners and Brown, Saad and Walker's ideas on multiple-secant-update preconditioners to be the most representative works. At this point, we momentarily digress and review these two works. The discussion highlights some of the practical limitations to be taken into consideration in the development that follows in the present and next chapter.
3.2.1 Secant preconditioners for inexact Newton methods

One of the most instructive points in Martinez's work is his complementary view of inexact and quasi-Newton methods. For the purpose of complying with (2.5), Martinez relies on quasi-Newton updates of some approximation to the Jacobian matrix. When this condition is not fulfilled for a fixed linear tolerance η, then the condition is forced to be satisfied by a standard inexact Newton step. Martinez argues that, eventually, the quasi-Newton steps should predominate and dictate the convergence of the method.

Consider solving (2.1), and let M^{(k+1)} be the preconditioner of the Jacobian matrix J^{(k+1)} corresponding to the (k+1)th Newton iteration. The secant preconditioner M^{(k+1)} is required to satisfy the secant equation

M^{(k+1)} s^{(k)} = F^{(k+1)} − F^{(k)},

where s^{(k)} = u^{(k+1)} − u^{(k)}.

Hence, the secant preconditioner for the inexact Newton method can be algorithmically described as follows:
Algorithm 3.2.1 (Newton with secant preconditioner)

Let 0 < η < 1 and η^{(k)} ∈ (0, η) such that lim_{k→∞} η^{(k)} = 0; then

1. Give an initial guess u^{(0)} and preconditioner M^{(0)}.
2. For k = 0, 1, ... until convergence do
   2.1 If (M^{(k)} is nonsingular) then
       2.1.1 Solve M^{(k)} s^{(k)} = −F^{(k)}.
       2.1.2 If ||J^{(k)} s^{(k)} + F^{(k)}|| ≤ η^{(k)} ||F^{(k)}|| then
             2.1.2.1 Accept the quasi-Newton step s^{(k)}.
             2.1.2.2 goto step 2.3.
   2.2 Find s^{(k)} such that ||J^{(k)} s^{(k)} + F^{(k)}|| < η^{(k)} ||F^{(k)}|| by some iterative method.
   2.3 Update solution u^{(k+1)} = u^{(k)} + s^{(k)}.
   2.4 Update preconditioner M^{(k)} to M^{(k+1)} satisfying the secant equation.

Note that this algorithm may be quite expensive if no care is taken in the selection of the preconditioner. To make this algorithm efficient, step 2.1.2 should be preferably satisfied in most cases. The initial preconditioner M^{(0)} is to be chosen as a fair approximation of J^{(0)} and conveniently expressed in some factored form
(e.g., block Jacobi, ILU). We remark that step 2.4 should be performed so that the low-rank update strategy can take advantage of the underlying data structure. To that purpose, limited memory compact representations could be a reasonable choice [28].

The main drawback of this approach is to carry Jacobian evaluations and secant updates simultaneously. When secant updates are doing a good job in generating decrease directions for ||F^{(k)}||, then an overhead is incurred in evaluating J^{(k)} in order to check the condition in step 2.1.2. Conversely, if the secant updates are performing poorly, this overhead is added indefectibly to the inexact iteration suggested by step 2.2. One may argue that the operator M^{(k)} can be still useful as a preconditioner for the inexact iteration. However, there are cases where the secant updates can generate ill-conditioned systems and hence they may be inadequate for accelerating the iterative solver. Besides, there is no reason to believe that the updates should eventually generate an operator close to the true Jacobian (see [49] for discussion on this). In summary, it is hard to find a case when this algorithm outperforms inexact versions of both Broyden's method and Newton's method.

In order to guarantee fast local convergence, M^{(k+1)} has to satisfy the Dennis-Moré condition (3.7), and ||M^{(k+1)}||, ||(M^{(k+1)})^{-1}|| should be bounded operators. Under these assumptions, the potentialities of the above algorithm can be formalized as follows:

Theorem 3.2.1 Let the assumptions above hold for all k = 0, 1, .... Let Assumption 1.4.1 also hold. Then there exists a k̄ such that the quasi-Newton step is accepted (i.e., s^{(k)} = s̄^{(k)}) for all k > k̄, and the convergence is q-superlinear.

Proof See [89].

The theorem states that there should be an eventual good approximation to the Jacobian (or its action onto a descent direction) that insures the condition at step
and H_m. Notice that this block form (3.18) implies the application of m consecutive Broyden's updates.

In terms of floating point operations the method may turn out to be more expensive than other standard preconditioners such as block Jacobi or ILU. The other disadvantage is that using the identity matrix as an initial approximation may not be enough to get significant improvement on the rate of convergence of GMRES, as shown by Brown, Saad and Walker for a rather simple Bratu problem in 1-D. In light of (3.18), something more elaborated than this initial approximation induces a prohibitive overhead. For that problem they observed a three-fold lesser accumulated GMRES iteration count compared with no preconditioning for the entire Newton's method, where the secant update preconditioner is built in a least-change secant update format and combined with a fixed part given by a 1-D Laplacian. Brown, Saad and Walker also investigate a "bad" Broyden's update and a hybrid version leading to similar conclusions.

Although their experiences are limited, we consider them inspiring in the sense of how secant updates may be used as a device to exploit underlying Krylov information. The key point shall become more evident in the remainder of the chapter.
3.3 Exploiting Krylov basis information

The previous discussion motivates us to take advantage of the Krylov information associated with J^{(k)} or its approximation in a different way. Rather than building preconditioners, we restrict the generation of successive descent directions for ||F|| to the current Krylov basis. This implies performing rank-one updates of the Hessenberg matrix resulting from the Arnoldi factorization (2.16) and implicitly reproducing an approximation of Broyden's update of the Jacobian matrix. Hence, the main objective here is to minimize the direct manipulation of the Jacobian matrix and the use of GMRES as much as possible in the process of converging to the root of F. Note that, in contrast to Martinez's approach, we do not perform Jacobian evaluations and secant updates at the same time.

Consider A as an approximation to the current Jacobian matrix J. We are interested in looking at a minimum change to A, consistent with A_+ s = F_+ − F, restricted to the underlying Krylov subspace. A basis for this subspace arises as a result of using an iterative linear solver such as GMRES for solving the approximated Jacobian system with A.

We quote, however, that the present development is not only valid for the GMRES algorithm. The Full Orthogonalization Method (FOM), also known as the Arnoldi iterative method [113], can be employed for the purposes underlined here. It is important to remark, however, that the GMRES algorithm is still more robust and efficient than this approach [18].
3.3.1 Updating the Arnoldi factorization

In § 2.2 we discussed the role that the Arnoldi process plays in GMRES. It is basically the vehicle to express the minimal residual approximation (1.9) in a more manageable way. The Arnoldi factorization provides valuable information that should not be discarded at all every time a GMRES solution starts over. We now show how to reflect secant updates on the Jacobian matrix without altering the current Krylov basis. For the sake of simplicity, let us omit the sources of inexactness induced by the use of GMRES, whose relative residuals are supposed to converge to a predefined tolerance (i.e., to a prescribed forcing term value).

Consider the solution to the following approximated Jacobian equation at the kth nonlinear iteration,

A^{(k)} s^{(k)} = −F^{(k)},  (3.19)

obtained with m steps of the GMRES algorithm. The associated Krylov subspace is given by K_m^{(k)}(A^{(k)}, r_0^{(k)}), with corresponding Krylov basis V^{(k)}. This linear solution, s_m^{(k)} = s_0^{(k)} + V^{(k)} y^{(k)}, can be regarded as embedded in an inexact Broyden's method. Now, we wish to use the information gathered during the solution of (3.19) to provide an approximation to the solution of the system

A^{(k+1)} s^{(k+1)} = −F^{(k+1)},  (3.20)

in a way that guarantees K_m^{(k+1)}(A^{(k+1)}, r_0^{(k+1)}) = K_m^{(k)}(A^{(k)}, r_0^{(k)}). Clearly, in general we cannot guarantee this equality. However, rank-one updates of the operator in (3.19) can be done without destroying the current Krylov basis if they take the form

A^{(k+1)} = A^{(k)} + V^{(k)} z w^t (V^{(k)})^t,  (3.21)

for any vectors z, w ∈ R^m, or equivalently, H_m^{(k+1)} = H_m^{(k)} + z w^t onto the corresponding projected Hessenberg matrix. Expression (3.21) suggests to introduce the secant equation in terms of H_m^{(k)} rather than A^{(k)}, that is, to update the current Jacobian approximation by a rank-one matrix whose range lies on K_m^{(k)}(A^{(k)}, r_0^{(k)}). The secant equation for the updated operator reads

A^{(k+1)} s_m^{(k)} = F^{(k+1)} − F^{(k)},  with s_m^{(k)} = s_0^{(k)} + V^{(k)} y^{(k)}.  (3.23)

Before proceeding, it would be convenient to express the secant equation in terms of a solution lying strictly on the Krylov subspace. Otherwise, the shift s_0^{(k)} from the origin appears in the secant equation. To remove this shift, we reformulate (3.19) as

A^{(k)} s^{(k)} = −F^{(k)} − A^{(k)} s_0^{(k)} = r_0^{(k)},

and redefine the final solution as s_m^{(k)} = V^{(k)} y^{(k)}, that is, as if the initial guess were zero. Obviously, the associated Krylov basis is the same depicted above. Therefore, the secant equation (3.23) becomes

A^{(k+1)} s_m^{(k)} = F^{(k+1)} − F^{(k)},  for s_m^{(k)} = V^{(k)} y^{(k)}.

Multiplying both sides by (V^{(k)})^t, it readily follows that H_m^{(k+1)} should satisfy the following secant equation,

H_m^{(k+1)} y^{(k)} = (V^{(k)})^t F^{(k+1)} + β e_1,  (3.26)

where β = ||r_0^{(k)}||. Hence, the Krylov subspace projected version of the secant equation (3.23) can be written as

H_m^{(k+1)} = H_m^{(k)} + [ (V^{(k)})^t F^{(k+1)} + β e_1 − H_m^{(k)} y^{(k)} ] (y^{(k)})^t / ((y^{(k)})^t y^{(k)}).  (3.27)

Remark 3.3.1 The form (3.21) has been previously used in the context of partial pole assignment problems in control theory. The idea is to replace a few eigenvalues conforming the spectrum of a matrix A by another set of eigenvalues representing more stable modes within the system. This technique is applied once the Arnoldi process has delivered K_m(A, v) as a small invariant subspace under A for a given vector v. Further details and pointers to this problem can be seen in [111].
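Update (3.27) costs only O(m^2) work on quantities GMRES already holds. A minimal numpy sketch (names are ours; an efficient implementation would instead operate on the QR factors of H_m, as discussed below):

    import numpy as np

    def krylov_broyden_update(H, V, y, F_new, beta):
        """Secant (Broyden) update of the projected Hessenberg, cf. (3.27).

        H     : m x m Hessenberg H_m^(k) from the Arnoldi process
        V     : n x m Krylov basis V^(k)
        y     : coefficients of the GMRES solution, s = V @ y
        F_new : F at the new nonlinear iterate, F^(k+1)
        beta  : ||r_0|| for the zero-initial-guess system A s = -F
        """
        rhs = V.T @ F_new                    # (V^(k))^t F^(k+1)
        rhs[0] += beta                       # + beta * e_1
        return H + np.outer(rhs - H @ y, y) / (y @ y)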
The following theorem states that update (3.27) yields a modified version of Broyden's update for A^{(k)}:

Theorem 3.3.1 Let (3.27) be the rank-one update of H_m^{(k)}; then the corresponding update of A^{(k)} according to (3.21) is given by

A^{(k+1)} = A^{(k)} + [ F^{(k+1)} − F^{(k)} − A^{(k)} s^{(k)} ] (s^{(k)})^t / ((s^{(k)})^t s^{(k)})
          − [ (I − P^{(k)}) (F^{(k+1)} − A^{(k)} s^{(k)}) ] (s^{(k)})^t / ((s^{(k)})^t s^{(k)}),  (3.28)

where P^{(k)} = V^{(k)} (V^{(k)})^t.

Proof For notational convenience, let us drop the superscripts k and replace the superscripts k + 1 by the symbol +. Thus, in view of (3.21), choose

z = V^t F_+ + β e_1 − H_m y  and  w = y / (y^t y),

so that A_+ = A + V z w^t V^t. Using the Arnoldi factorization, we substitute H_m = V^t A V into the above expression. Thus

A_+ = A + (V V^t F_+ + r_0 − V V^t A V V^t s) s^t / (s^t s),

which can be split up in the desired form (3.28). Notice that P^{(k)} s^{(k)} = s^{(k)}, since s^{(k)} ∈ K_m(A^{(k)}, r_0^{(k)}).

We refer to the update (3.28) as the Krylov-Broyden update. Note that the operator P^{(k)} is an orthogonal projector onto the Krylov subspace K_m(A^{(k)}, r_0^{(k)}). That is,

• (P^{(k)})^2 = P^{(k)} (idempotency),

• (P^{(k)})^t = P^{(k)} (symmetry).

The update of H_m^{(k)} reflects an update of A^{(k)} on a lower dimensional space. The larger the value of m, the closer both updates (3.28) and (3.30) are to Broyden's update. The following observation provides us with further insights.

Remark 3.3.2 If s_0^{(k)} = 0 then

P^{(k)} A^{(k+1)} = P^{(k)} A_B^{(k+1)},

where A_B^{(k+1)} is the Jacobian operator resulting from Broyden's update. This stems from the fact that the third term of (3.28) is orthogonal to K_m^{(k)}(A^{(k)}, r_0^{(k)}).

A little algebra leads to the following alternative form of (3.28),

A^{(k+1)} = A^{(k)} + [ F^{(k+1)} − F^{(k)} − A^{(k)} s^{(k)} ] (s^{(k)})^t / ((s^{(k)})^t s^{(k)})
          − [ (I − P^{(k)}) F^{(k+1)} − h_{m+1,m} (e_m^t y^{(k)}) v_{m+1} ] (s^{(k)})^t / ((s^{(k)})^t s^{(k)}).  (3.30)

Assuming s_0^{(k)} = 0, the above expression tells us that the departure of (3.30) from Broyden's update does not only depend on the acute angle between F^{(k+1)} and the underlying Krylov subspace but also on how nearly the columns of V^{(k)} span an invariant subspace of A^{(k)}.

Clearly, H_m^{(k+1)} is not necessarily an upper Hessenberg matrix. However, we quote that expression (3.27) can be efficiently performed by updating a given QR form of H_m^{(k)} (see e.g., [49, 71]). This form is not readily available; instead, most standard implementations of GMRES progressively compute a QR factorization of H̄_m^{(k)} as every new column enters the Arnoldi process (recall the discussion in §§ 2.2.2). Fortunately, there are efficient ways to perform the QR factorization of H_m^{(k)} by just deleting the last row of H̄_m^{(k)}, already factorized in QR form. This requires O(m^2) floating point operations (see [71, pp. 596-597]). An even more efficient way to obtain this factorization consists of keeping an immediate copy of the QR factorization of H̄_{m-1}^{(k)} before applying all previous Givens rotations to the new entering column. In other words, if

H̄_{m-1}^{(k)} = Q̄_{m-1} ( R_{m-1} ; 0 )

is the QR factorization of the augmented Hessenberg at the (m−1)th GMRES step and ĥ_m is the entering column, with Q̄_{m-1}^t ĥ_m = (r̃^t, γ̃)^t and r̃ ∈ R^{m-1}, then the QR factorization of H_m^{(k)} at the mth GMRES step is given by

H_m^{(k)} = Q̄_{m-1} ( R_{m-1}  r̃ ; 0  γ̃ ).  (3.31)

In both cases, it is necessary to use O(m^2) memory locations for storing the factor Q to keep update (3.27) within a cost of O(m^2) floating point operations.
3.3.2
On the Krylov-Broyden
(:3.28) is the solution
Expression
liB -
min
1!l)n.n
A(k)11 F
update
to the problem
subject
s(k) = p(k)
to (p(k) BP(k))
+ r(k)0
p(k+l)
.
BEJ1'l..
In fact.
IIA.lk+l) _ Alk)11
=
II [(P(k)BPlkl)
(P(k),4(klP(kl)
s(k) -
F
(.s(kl)t
(s(k1r
s(k)]
s(k)
II
F
(:L32)
due
to the consistency
12-norm
lution
( p(k)
property
of an orthogonal
projector
follows from t.he convexity
B P(k)) ..,(k) = p(k)
On the otlwr
of the
F(k+ 1)
Frobenius
is bounded
norm
and
to the
by 1. Uniqueness
above
liB - A(kIII
of the functional
fact
over all
F
that
the
of the so-
B satisfying
+ r&k) .
hane\. it similarly
follows that
exprpssion
(:3,27) is the solution
to
tilt-' problem
GE1R~)(m
:3.:3.1 establishes
Theorem
lems.
IIG - FI~k)IIF
However.
set of matrix
other
of
.1J
=
F(k+l)
.r. the
That
is.
set of matrices
generating
Gy =
(\.'(k)r
between
-
F(q by .:;
I
=
F(k)
+ Jel.
these two minimization
(:3.28) can be stated
Q = {B E IRnxn
and
to
the equi\'alence
\,iew of updat.e
quotients
subject
lI(k+11
as follows.
-
u(k)
Consider
defined
prob-
Q. the
by
Bs = !J}.
the same Krylov subspace
Km
= Km
(A(k).
r&k)) .
The resulting matrix in (3.28) can be thought of as the nearest matrix to $A^{(k)}$ in $\mathcal{X}$, the set of matrices consistent with the secant equation and generating the same Krylov subspace. Furthermore, if the intersection among these two sets is not empty, then $A^{(k+1)} \in \mathcal{X} \cap Q$. This observation is key in the construction of least-change secant updates with operators satisfying other standard prescribed properties (e.g., sparsity pattern, positive definiteness) in addition to the secant condition (see [44, 46]). On the other hand, since by assumption the vectors $z$ and $w$ in (3.21) are arbitrary in $\mathbb{R}^m$, we could pick $w = s^{(k)}/\left(\left(s^{(k)}\right)^t s^{(k)}\right)$ and $z$ consistent with the minimization (3.25), so that the rank-one correction $V^{(k)} z \left(V^{(k)} w\right)^t$ lies on a given affine subspace of $\mathbb{R}^{n\times n}$. Hence, finding the nearest $A^{(k+1)}$ to $A^{(k)}$ in $\mathcal{X}$ amounts to solving the problem (3.35).
Since the solution of (3.35) is given by $z = \left(V^{(k)}\right)^t \left(y - A^{(k)} s\right)$, the nearest $A^{(k+1)}$ to $A^{(k)}$ in $\mathcal{X} \cap Q$ is given by (3.28). This interpretation is nothing more than a particular case of the general least-change characterization established by Dennis and Schnabel [44]. Exhaustive experimentation reveals that the last rank-one term on the right hand side of (3.27) is "harmless", in the sense that update (3.30) or (3.28) produces almost the same convergence behavior as Broyden's method in its exact and inexact implementations, as can be verified. The same convergence theory holds as a consequence of the bounded deterioration property of update (3.28) with respect to (3.19): almost all theoretical tools already developed for the convergence of Broyden's method (e.g., the bounded deterioration and Dennis-More characterizations used to show q-superlinear convergence) can be extended to update (3.28) with few adaptations.
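In matrix terms, the update is a Broyden correction whose rank-one term is first projected onto the span of the Krylov basis. A minimal numpy sketch, assuming $V^{(k)}$ has orthonormal columns (the function name is ours):

```python
import numpy as np

def krylov_broyden_update(A, V, s, F_new, r0):
    """Krylov-Broyden update in the spirit of (3.28): the Broyden
    rank-one correction is projected by P = V V^t onto the Krylov
    subspace spanned by the (orthonormal) columns of V."""
    resid = (F_new + r0) - A @ s      # secant residual of the current A
    proj = V @ (V.T @ resid)          # P (F_new + r0 - A s)
    return A + np.outer(proj, s) / (s @ s)
```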
In the same way the
~::":r====:J1
Powell
Rosenbrock
=
~
~ 10··
:§> 10··
1
iJ
5
10
I
10·'0
10-"[o
15
5
10
10
Chandrasekhar
J
Chandrasekhar
[c~. 9}
10
.•
10'
10
Z
a: 10
.•
o
~10··
,at
g. 10··
f
-
_10
10
-1210
J
2
4
-'2
10
8
6
0
5
Figure 3.3
(dotted
Example
Illethod
3.3.1
(dotted
Convergence
comparison
between Broyden's
method
line) and the Krylov-Broyden
method (solid line).
In this example, the convergence behavior of Broyden's method (dotted line) and of the Krylov-Broyden method (solid line) are compared for the same four cases presented previously, all of them converging at relative nonlinear residual norms approaching $1.0 \times 10^{-10}$. The more noticeable differences between both methods are detected in the case of the Rosenbrock function and the Chandrasekhar equation for $c = .9$. In the former case, although the Krylov-Broyden version gets stuck within a given region, it starts delivering iterates more rapidly than Broyden's method does at some points, and eventually surpasses the performance of Broyden's method; the crossing (not shown) does not happen but at relative nonlinear residual norms below the plotted range. In the easy case of the Chandrasekhar equation, both convergence curves look alike but with several crossing points, and again Broyden's method produces faster iterates at some stages as the problem gets more difficult.
The case of the extended
a case \\"t're the [\:rylov-Broyden
identically.
rolumns
In t his situation.
of \ -Iq after
was reached).
approach
that
\ ot t' that
H~~) implyin~
.Jacobian
,\s it can he obsen·ecl.
is no major
t he last
that
operators
\evertheless.
difference
term
and Broyden's
an in\'ariant
four G~[RES
may be better.
there
method
Powell function
subspace
iterations
nothing
after
remains
constant.
Therefore.
of ~-(.\:) span
an invariant
not only
to the current
.Jacobian
the eige!1\"alues of subsequent
implicit
obsen'at.ion
shall become
3.4
and its Ilsefulness
.Jacobians
which
indicates
in general.
secant
the eigenvalues
(i.e .. the clo~er the columns
the approximation
about
several
the smaller
by the
breakdown
experimentation
both approaches
size in approximating
perform
were generated
can be asserted
of (:3.21) is preserved
t he error
method
(i.e .. a happy
broader
between
illustra.tes
updates
of
of corresponding
the term
Ilh~~l.mt'm+lll
for A.(k») the better
subspace
but also the approximation
with this approach.
more evident
in Chapter
This
to
is a key
-!.
Nonlinear Krylov-EN methods
In this section we present two algorithms for the inexact solution of nonlinear problems that make use of the Krylov information generated by GMRES. The first algorithm is an extension of the NEN algorithm and uses GMRES as a device to generate several consecutive descent directions for decreasing $\|F\|$, as long as the Krylov basis produced by GMRES is capable of delivering a sufficient amount of decrease in relative nonlinear residuals. The second algorithm is a higher-order version of Newton's method based on only one GMRES solution per iteration; it amounts to solving residual minimization problems in $\mathbb{R}^{m\times m}$ (with $m \ll n$) until the basis is exhausted and unable to generate a descent direction, or a maximum prespecified user value is exceeded.
3.4.1 The nonlinear KEN algorithm

We are now in a position to describe an inexact nonlinear version of the EN algorithm that exploits the information left behind by GMRES. Hence, we introduce the nonlinear Krylov-Eirola-Nevanlinna (KEN) algorithm as follows.

Algorithm 3.4.1 (Nonlinear Krylov-EN)
1. Give an initial guess $u^{(0)}$ and a Jacobian approximation $A^{(0)}$.
2. For $k = 0, 1, \ldots$, until convergence do
   2.1 $\left[s^{(k)}, y^{(k)}, H_m^{(k)}, V^{(k)}, h_{m+1,m}^{(k)}, \left\|r_m^{(k)}\right\|\right] = \mathrm{GMRES}\left(A^{(k)}, -F^{(k)}, s^{(k)}\right)$.
   2.2 $u^{(k+1)} = u^{(k)} + s^{(k)}$; compute $F^{(k+1)}$.
   2.3 $q^{(k)} = \left(V^{(k)}\right)^t F^{(k+1)} + \beta e_1$.
   2.4 $H_m^{(k+1)} = H_m^{(k)} + \dfrac{\left(q^{(k)} - H_m^{(k)} y^{(k)}\right)\left(y^{(k)}\right)^t}{\left(y^{(k)}\right)^t y^{(k)}}$.
   2.5 Solve $\min_{y \in K_m} \left\| q^{(k)} + \bar H_m^{(k+1)} y \right\|$, with $\bar H_m^{(k+1)} = \begin{pmatrix} H_m^{(k+1)} \\ h_{m+1,m}^{(k)} e_m^t \end{pmatrix}$; denote its solution by $y^{(k+1)}$ and set $s^{(k+1)} = V^{(k)} y^{(k+1)}$.
   2.6 Perform the Krylov-Broyden update (3.28) of $A^{(k)}$, with $P^{(k)} = V^{(k)} \left(V^{(k)}\right)^t$.
3. EndFor
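The kernel of steps 2.3 and 2.4 is a rank-one secant correction of the small Hessenberg matrix. A one-line numpy sketch (our naming):

```python
import numpy as np

def hessenberg_broyden_update(H, y, q):
    """Secant (Broyden) update of the m x m Hessenberg matrix H along the
    GMRES coefficient vector y, with q the projected right hand side."""
    return H + np.outer(q - H @ y, y) / (y @ y)
```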
Some comments are in order.

• The Jacobian is not required in explicit form and, consequently, step 2.1 could be addressed by a matrix-free GMRES. Note that $P^{(k)}$ and the rest of the values returned by GMRES are readily available.

• We have not included the update of the Jacobian approximation (usually obtained by finite differences or by Broyden's method) before the next GMRES call. Note that the update in step 2.4 may fail to generate a sufficient amount of decrease of $\|F\|$ in some situations, which may lead to resetting the current Hessenberg matrix together with a new Jacobian approximation, such as in the context of the inexact Newton method. Discussion on this topic for the particular case of Broyden's method can be found in [49].

• In order to carry out step 2.5 efficiently, we suggest retrieving the QR factorization of the Hessenberg matrix by means of updates of the form (3.31), so that the minimization problem is solved with no extra storage and the process can be restarted whenever the update gives rise to an ill-conditioned Hessenberg matrix.

We stress that a faster version of the KEN algorithm can be obtained by performing further uses of the Krylov-Broyden update, verifying before each update whether a descent direction is delivered and checking instead the condition (2.10). The point is that this alternative requires simultaneous updates of $H_m^{(k)}$ and $A^{(k)}$, which may readily increase the total number of updates. Of course, this may be a desirable situation in terms of rapid convergence, but it may turn out to be expensive in terms of computer memory use (a possibility we handle by limited memory formulations in the next chapter).

3.4.2 A higher-order Krylov-Newton formulation

Basically, the idea is to exhaust the capability of the Krylov basis produced by GMRES in producing further descent directions for $\|F\|$ before a new Jacobian system, and hence a new GMRES call, is required. This presentation allows us to illustrate a higher-order version of the nonlinear Krylov-Newton method.
The algorithm can be outlined as follows.
Algorithm 3.4.2 (Higher-Order Krylov-Newton)
1. Give an initial guess $u^{(0)}$ and define $l_{\max}$.
2. For $k = 0, 1, \ldots$, until convergence do
   2.1 $\left[s^{(k)}, y^{(k)}, H_m^{(k)}, V_m^{(k)}, h_{m+1,m}^{(k)}, \left\|r_m^{(k)}\right\|\right] = \mathrm{GMRES}\left(J^{(k)}, -F^{(k)}, s^{(k)}\right)$.
   2.2 $l = 0$.
   2.3 Repeat
       2.3.1 $q^{(k+l)} = \left(V_m^{(k)}\right)^t F^{(k+l+1)} + \beta e_1$.
       2.3.2 $H_m^{(k+l+1)} = H_m^{(k+l)} + \dfrac{\left(q^{(k+l)} - H_m^{(k+l)} y^{(k+l)}\right)\left(y^{(k+l)}\right)^t}{\left(y^{(k+l)}\right)^t y^{(k+l)}}$.
       2.3.3 Solve $\min_{y \in K_m} \left\| q^{(k+l)} + \bar H_m^{(k+l+1)} y \right\|$, with $\bar H_m^{(k+l+1)} = \begin{pmatrix} H_m^{(k+l+1)} \\ h_{m+1,m}^{(k)} e_m^t \end{pmatrix}$. Denote its solution by $y^{(k+l+1)}$.
       2.3.4 $s^{(k+l)} = V_m^{(k)} y^{(k+l+1)}$.
       2.3.5 $l = l + 1$.
   2.4 Until ($l = l_{\max}$) OR ($s^{(k+l)}$ is not a decreasing step for $\left\|F^{(k+l)}\right\|$).
   2.5 If $s^{(k+l)}$ is a decreasing step for $\left\|F^{(k+l)}\right\|$ then $u^{(k+1)} = u^{(k)} + s^{(k+l)}$,
   2.6 else
       2.6.1 $u^{(k+1)} = u^{(k)} + s^{(k+l-1)}$.
3. EndFor
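A compact numpy sketch of the repeat loop 2.3 follows. The names, the plain monotonicity test in place of the sufficient-decrease criterion, and the handling of $\beta$ are our simplifying assumptions, not the thesis implementation; the point is that, after the single GMRES call, all linear algebra happens in $\mathbb{R}^m$ plus one function evaluation per pass.

```python
import numpy as np

def hokn_inner_loop(F, u, V, H, h_sub, y, l_max):
    """Sketch of the repeat loop 2.3 of Algorithm 3.4.2.  F: callable
    nonlinear residual; V: (n, m) Krylov basis; H: (m, m) Hessenberg;
    h_sub = h_{m+1,m}; y: GMRES coefficients of the step s = V y."""
    m = H.shape[1]
    beta = np.linalg.norm(F(u))         # = ||r_0|| for a zero initial guess
    e1 = np.zeros(m); e1[0] = 1.0
    s_acc = V @ y                       # step delivered by GMRES
    fnorm = np.linalg.norm(F(u + s_acc))
    for _ in range(l_max):
        q = V.T @ F(u + s_acc) + beta * e1               # step 2.3.1
        H = H + np.outer(q - H @ y, y) / (y @ y)         # step 2.3.2
        Hbar = np.vstack([H, h_sub * np.eye(m)[-1:]])    # augmented Hessenberg
        y = np.linalg.lstsq(Hbar, -np.append(q, 0.0), rcond=None)[0]  # 2.3.3
        s_new = V @ y                                    # step 2.3.4
        fnew = np.linalg.norm(F(u + s_new))
        if fnew >= fnorm:        # 2.4: chain of decreasing steps exhausted
            break                # 2.6.1: keep the previously accepted step
        s_acc, fnorm = s_new, fnew
    return u + s_acc             # 2.5 / 2.6.1
```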
This algorithm can be devised as a variant of the composite Newton's method that seeks chord directions belonging to the underlying Krylov subspace. The faster version of the nonlinear KEN algorithm can be easily stated from the above presentation by just including the Krylov-Broyden update of $A^{(k)}$ within the repeat loop 2.3. This version should be appealing in situations where Broyden's method or the nonlinear EN are effective compared to Newton's method. To verify that $s^{(k+l)}$ represents a sufficient decrease for $\left\|F^{(k+l)}\right\|$ implies one extra evaluation of $F$. However, this computation can be reused by a line-search backtracking method following the end of the repeat loop. In general, the failure of this sufficient decrease can be corrected by shortening the step afterwards or by accepting the previous acceptable step, as suggested in step 2.6.1.

The following example illustrates the performance of the last two algorithms presented.
Example 3.4.1 In this particular example, we use the line-search backtracking method with the same parameter specifications as in the previous examples. Figure 3.4 shows the relative nonlinear residual history of the nonlinear KEN algorithm (dotted line) and the higher-order Krylov-Newton implementation (solid line). Table 3.3 supports part of the convergence behavior of both approaches. In all cases, GMRES was able to converge within a prespecified restart value of $m = 20$, a zero initial guess vector and no preconditioning. For the higher-order version of the Krylov-Newton algorithm, we set $l_{\max} = 10$.
[Figure 3.4 Convergence comparison between the nonlinear KEN algorithm (dotted line) and the HOKN algorithm (solid line). Panels: Rosenbrock, Powell, Chandrasekhar [c=.9], Chandrasekhar [c=.999999]; iteration versus relative nonlinear residual norm.]
Table 3.3 Number of successful Hessenberg updates (NHU) and GMRES iterations (LI) in the HOKN algorithm.

      Rosenbrock     Powell        Chand.(c=.9)   Chand.(c=.999999)
 k    NHU   LI       NHU   LI      NHU   LI       NHU   LI
 1     0    10       10    4        3     3        3     4
 2     0    12       10    4        3     2        2    10
 3     1    12       10    4        2     4        3     6
 4     1    10       10    4                       2     4
 5     1     8       10    4
 6     1     6       10    4
 7     2     4
 8     2     4
As was observed before, the Rosenbrock function represents the hardest case. Hence, the algorithms do not show important improvements compared to their Newton's method and Broyden's method counterparts. The plateau portion exhibited by the KEN algorithm at the first iterations obeys the difficulties encountered by the Krylov-Broyden update for the same case (see Figure 3.2). Not surprisingly, Table 3.3 confirms the lack of success of the Krylov-Broyden update for the higher-order version of the Krylov-Newton algorithm (the zeros denote the occurrence of backtracking steps at the first two nonlinear cycles).

The Powell function introduces an opposite situation. The solution to the minimal residual problem resulting from every Hessenberg update was always able to generate a decreasing step for $\|F\|$. Consequently, the nonlinear KEN algorithm reproduces almost exactly the behavior of the NEN algorithm, and the higher-order Krylov-Newton (HOKN) algorithm dramatically outperforms the composite Newton's method (with only one GMRES call per nonlinear cycle). It is important to remark that GMRES generates an invariant Krylov subspace under the Jacobian after 4 iterations, to the level of double precision roundoff errors (i.e., the residual term in (3.21) was of order $1.0 \times 10^{-16}$). Note, however, that this does not necessarily imply that the value of the function at the new point belongs to that invariant subspace, as it seems to be the case here.

An intermediate behavior is shown by the Chandrasekhar equation, with a more favorable tendency as the difficulty of the problem increases, though. In the easy case, the higher-order Krylov-Newton method is competitive with the composite Newton's method. The nonlinear KEN algorithm outperforms Broyden's method but it is slightly worse than the NEN algorithm. The difficult case delivers similar conclusions to the Powell function case. The reader can verify that each convergence history qualitatively reflects that observed in Figure 3.2. As a final comment, Table 3.3 clearly illustrates that a larger dimension of the Krylov subspace does not imply a longer chain of decreasing directions for $\|F\|$ in step 2.3 of Algorithm 3.4.2.
Before concluding, a couple of points need to be addressed. First, how does one perform line-search globalization strategies and forcing term selection criteria in this context of Krylov-Broyden updates? We have implicitly commented on their practical use throughout the examples without much detailing of their implementation. Secondly, what are the effects, if any, on both algorithms due to the use of preconditioners for the Jacobian or its approximations? The reader may have already suspected some implications due to preconditioning, since the update (3.27) does not produce a new unpreconditioned Jacobian to be used in the next nonlinear iteration. Both questions are discussed in Chapter 4 in conjunction with three new algorithms based upon the same Krylov-Broyden update philosophy.
Chapter 4

Hybrid Krylov-secant methods

4.1 Hybrid Krylov methods
As an attempt to reduce the amount of work of GMRES and other affine methods, a plethora of hybrid Krylov subspace methods has been proposed. The principle of these methods is to start with an iterative solver that requires no a priori information about the matrix but which itself can produce useful information (such as the spectrum of the matrix); the computation is then switched to a more economical method that requires such information. Research on this subject has been going on for some years [87, 94, 117, 124]. Algorithms along these lines are primarily distinguished by the manner in which the spectrum is estimated and passed over to Richardson or Chebyshev iterations. A historical overview of all these algorithms is reported by Nachtigal et al. [94]. This last work is of particular importance since it is the first one to avoid the explicit computation of eigenvalue estimates of the matrix spectrum. Instead, Nachtigal et al. suggest the construction of the associated GMRES polynomial, whose roots happen to be the reciprocals of relaxation parameters for Richardson iteration. (The reader may recall the discussion in Chapter 1 and realize that these polynomial roots approximate eigenvalues [65].) In fact, as we shall see later, they are rather pseudo-Ritz values.

On the other hand, there has been an increasing interest in adapting Krylov iterative methods to handle linear systems with several right hand sides. These problems arise frequently in engineering and scientific applications (see e.g. [33, 61, 123]), and there seems to be no evident way to overcome the need to start the Krylov iterative method from scratch every time a new right hand side becomes available. Moreover, the Krylov subspace definition (1.1) expresses how tight its dependence is on a particular right hand side (i.e., the initial linear residual).
Among several efforts, perhaps the most effective is the one proposed by Simoncini and Gallopoulos [120, 121]. Their work is a direct consequence of that of Nachtigal et al. [94] and therefore does not require explicit estimation of matrix eigenvalues. An appealing point of their approach compared to previous ones (e.g., [33, 104, 119]) is that right hand sides may be either simultaneously or sequentially available during the processing.
The intention of bringing experiences on multiple right hand sides is not merely casual here. We are interested in reducing the computational cost due to several GMRES iterations. More precisely, to the end of generating descent directions for $\|F\|$, GMRES can be replaced by a cheaper Richardson iteration, and at every new nonlinear cycle (i.e., when a new nonlinear residual comes out) we use the underlying spectrum (or pseudospectrum) information to handle the new Jacobian equation. The framework previously developed in §3.3 fits entirely here: the Krylov-Broyden updates do not destroy the current Krylov basis, and the modifications to the Hessenberg matrix can march along with the modifications to the Jacobian matrix, making the required spectrum estimation available at the next nonlinear iteration.

The present chapter is organized as follows. We complete the current section with some additional motivation and insights about the main ingredients toward the generation of a new family of Hybrid Krylov-Secant (HKS) methods: problems with projecting onto the current Krylov basis, how to handle efficiently the new Jacobian linear system, the spectra versus pseudospectra concept, and the Richardson iteration with Leja ordering as a mechanism to apply effectively its relaxation parameters. In §2 we describe three new algorithms: the HKS-B, based on Broyden's method; the HKS-N, based on Newton's method; and the HKS-EN, based on the nonlinear Eirola-Nevanlinna algorithm (hence superior to the HKS-B algorithm). In §3 and §4 we address the two last questions pending at the end of Chapter 3, namely, preconditioning and globalization of Krylov-secant methods. Therefore, the discussion in these two sections revises the KEN and the HOKN together with the new HKS algorithms. §5 is devoted to some computational issues in all HKS algorithms; concerns on operation savings via limited memory quasi-Newton updates and complexity analysis are detailed.

Before getting into further details, we remark that HKS methods in general provide an infinite menu of alternatives oriented to reduce the computational cost of forthcoming GMRES iterations. In response to this reality, future advances on hybrid methods may be fitted in our framework.
4.1.1 Projection onto the Krylov subspace
In the previous chapter we were able to generate and solve several minimal residual approximation problems out of the Krylov basis generated by GMRES. At that moment, we were looking at the norm of the same nonlinear residual function, but with an updated version of the Hessenberg matrix reflecting Krylov-Broyden updates of the current Jacobian matrix. Right now, it would be desirable to carry out the same procedure even in forthcoming nonlinear cycles and account for the changes of nonlinear residuals, that is, changes of the right hand sides in the linear Newton equation as well.

Given that GMRES was used at the $k$th nonlinear iteration, one would be naturally tempted to solve the following minimal residual approximation problem associated to (3.20):
$$\min_{y \in K_m\left(A^{(k)}, -F^{(k)}\right)} \left\| \left(V^{(k)}\right)^t F^{(k+1)} + \bar H_m^{(k+1)} y \right\| \qquad (4.1)$$
•
and take
.(~·+l)
_
.~
-'
where
jj(k+1)
is the minimizer of (.l.t).
. t hat problem was
-(k+l)
Hessenberg. Hm
== a and
(k+l)
.50
.
.
IS
\ '(k)-
y.
Here, it is assumed that the initial guess for
no restart has taken place. Of course. the augmented
defined as
)
7t;+11 = (
In terms of a normal equation
(\ -/q
is projected
F(k+l)
r.
onto the current
Howe\'t:'L the quality of
nothing
F(k).
if
on how close
to do if
$~k)
F(k+1)
F(k+I)
already
I\:rylov basis then
totally useless.
In general,
s(k+l)
,-\s Saad [L lO]
may be far from sufficient.
,:;(k+1l
is to the current
the quality of the projection depends
I\rylov subspace.
Ideally. there is
lies there and shares no components
(i.e ..
F(k+l)
E
K
On the other extreme.
= ,O;bk+1)
if
F(k+l)
with
r6~') (or
specified in
is orthogonal
to t.he
== O. implying that the solution of (.l.l) is
we expect to be between these two extremes \vhich not
necessarily lead to satisfactory
situation
(4.1) and (4.2) indicate
= 0). This implies convergence at the same linear tolerance
the previous G :\IRES solution.
current
Expressions
I\:rylov basis by means of the operator
points out in the context of Lanczos iterations.
primarily
(L3)
solution. (.l.t) and (.l.2) reduce to
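Computationally, the warm start (4.1)-(4.2) is just a small least squares solve. A numpy sketch, assuming V holds the $m+1$ Arnoldi vectors of the previous GMRES call and Hbar the (already updated) augmented Hessenberg (the names are ours):

```python
import numpy as np

def project_new_rhs(V, Hbar, F_new):
    """Warm start (4.1)-(4.2): project the new nonlinear residual onto
    the old Krylov basis, solve the small (m+1) x m least squares
    problem with the augmented Hessenberg, and expand back to R^n."""
    m = Hbar.shape[1]
    q = V.T @ F_new                               # project onto the basis
    y = np.linalg.lstsq(Hbar, -q, rcond=None)[0]  # small LS problem (4.1)
    return V[:, :m] @ y                           # s_0^{k+1} per (4.2)
```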
This approach presents a serious limitation. Note that even in the ideal extreme situation we may need to obtain an even better linear solution in order to guarantee rapid local convergence (recall the discussion on forcing terms in § 2.1.2).
Since $F$ is a nonlinear function, the chances of being close to the current Krylov subspace (i.e., having a small angle with respect to that subspace) are minimal. Chances may increase in any of the following situations:

• There is no reasonable progress toward the solution (i.e., we may be far from the region of rapid local convergence),

• The function $F$ is slightly nonlinear (i.e., it is virtually linear or constant), or

• $K_m\left(A^{(k)}, r_0^{(k)}\right)$ is (almost) invariant under $A^{(k)}$.
Certainly, these conditions are unrealistic in practice and, therefore, it is necessary to refine the solution obtained for (4.1) in order to satisfy a predefined linear tolerance. Further improvements can be drawn if we try to increase the dimension of $K_m\left(A^{(k)}, -F^{(k)}\right)$ with $F^{(k+1)}$. This implies reorthogonalizing the new residual against all vectors already present in the basis (i.e., all columns of $V^{(k)}$) and having extra storage to accommodate the new vector. This approach has been suggested by Carnoy and Geradin [34], Parlett [107] and Saad [110] in the context of symmetric problems, but could be equally applicable to nonsymmetric problems under the light of the Arnoldi instead of the Lanczos algorithm. However, we do not follow this direction due to the following reasons:

• It steadily increases storage requirements and floating point operations at the same pace as does GMRES with a higher restart value,

• Reorthogonalization to prevent loss of orthogonality may be a costly issue and,

• There is not much gain in terms of parallel implementations of the method.

In summary, the remedy to the problem of efficiently handling the changing values of the nonlinear function $F$ relies on advances along the solution of systems with several right hand sides by Krylov subspace methods such as GMRES. In the following section, we discuss how hybrid Krylov subspace methods could be an appealing solution to this problem.
4.1.2 Reducing the cost of solving the new Jacobian equation

We have seen that projection and exhaustive reorthogonalization are not enough to meet the solution requirements of a new Jacobian equation. The objective now is to refine or replace (by inexpensive means) the solution of (4.1) without affecting the underlying Krylov subspace information. To that end, we suggest using Richardson iterations instead of GMRES and extracting suitable relaxation parameters for them out of the recently updated Arnoldi factorization.
The work of Simoncini and Gallopoulos offers a satisfactory answer to such objective [120, 121]. This work was motivated by the need to solve iteratively nonsymmetric linear systems with several right hand sides. They propose a version of GMRES (the MHGMRES algorithm) that combines the Arnoldi process with Richardson iteration. The resulting algorithm turns out to be more flexible and efficient than other methods such as block BiCG and BGMRES (see discussions on these two algorithms in [113, 122, 120]), since there is no restriction in the order and availability of the right hand side vectors, and the cost of the Arnoldi process is somehow alleviated by cheaper Richardson iterations.

The MHGMRES algorithm has seven basic components: the Arnoldi process, projections of linear residuals onto the generated Krylov subspace, computation of generalized eigenvalues (or pseudo-eigenvalues), Leja ordering of relaxation parameters, Richardson iteration and a seed system procedure. The Arnoldi process serves to create a basis for the Krylov subspace associated to the linear system for a given seeded residual. Hence, a block Arnoldi solution is provided to the current residual together with the remaining residuals previously projected onto the Krylov subspace. (Projection and corresponding solution of each non-seeded residual have the same flavor depicted by the steps (4.1) and (4.2).)
In order to define a meaningful set of relaxation parameters for the Richardson iteration procedure, the generalized eigenvalue problem (4.5) is solved. The eigenvalues of (4.5) are roots of the GMRES residual polynomial, which, incidentally, have been useful in determining a suitable compact set on which to minimize residual polynomials. To be more precise, level curves (i.e., lemniscates) associated to the GMRES residual polynomials bound regions (not necessarily convex) that potentially exclude the origin, thus removing the possibility of generating coefficients for nonmonic polynomials (zero can not be a root of a residual polynomial, as was discussed in Chapter 1). In the next subsection, we elaborate more on this point. Related literature on this topic can be seen in [64, 65, 94].
The Leja ordering provides a stable way to apply the reciprocals of the GMRES residual polynomial roots as the relaxation parameters for Richardson iteration. Basically, these parameters are fed to Richardson as if they were equally distributed points (in the potential sense) in the region of interest. Therefore, Richardson iteration with Leja ordering produces residual polynomials that tend to decrease in norm (i.e., convergence is not as erratic as in other known types of parameter orderings).

The seed system is a heuristic way to provide an effective processing order of the linear residuals. Simoncini and Gallopoulos suggest that residuals should be processed in norm decreasing order. Their choice is motivated by the minimization of those residuals having the maximum distance in norm to the underlying Krylov subspace. However, this distance is not known beforehand, since the projection onto that subspace depends on residuals that have not undergone the Arnoldi process. In the absence of better information, they assume that all residuals are orthogonal to the current Krylov subspace in order to come up with the norm decreasing selection criteria.
In our particular context, we are mainly interested in the computation of generalized eigenvalues and in the implementation of Richardson iteration and its Leja ordering. The Arnoldi factorization is readily available from the last GMRES solution. Moreover, it has been modified due to the Broyden update of $H_m$. Therefore, solution of (4.5) in terms of the updated Hessenberg matrix should provide the suitable relaxation parameters associated to the Krylov-Broyden update of the current Jacobian approximation $A^{(k)}$.

In the last subsection, we exhibited some of the deficiencies in projecting the value of the nonlinear function at the new point (i.e., the new right hand side) onto the current Krylov basis. Although in ideal conditions this may provide a small initial residual norm for Richardson iteration, we argued above on the minimal numerical advantage of this projecting step. A guess $s_0^{(k+1)} = 0$ works fine in practice, and it amounts to assuming that the new right hand side is orthogonal to the current Krylov basis (i.e., the hypothetical worst case). We need not be concerned about a seed mechanism. In fact, the seed selection heuristics of Simoncini and Gallopoulos fits naturally in our case, since the initial guess to the linear system is zero and nonlinear residuals are supposed to decrease monotonically as the solution is approached.
4.1.3 Spectra vs. Pseudospectra
In view of (1.8), we can realize how the Richardson iteration depends heavily upon the shape and size of the region $\Omega$ enclosing $\Lambda(A)$. This region can not include the origin, otherwise no residual polynomial satisfying the normalization condition $q(0) = 1$ can ever be smaller than unity on such region. In general, the way $\Omega$ is computed introduces numerical difficulties, overestimating or underestimating the desired set. The former situation may lead to slow convergence rates of the minimal residual polynomial (i.e., slow reduction of residual norms), while the latter may lead to divergence of the method (i.e., some of the $\lambda_i$ may be left out of the set). Therefore, computing $\Omega$ is a critical issue that has been subject of careful treatment in the literature.

There are several approaches to obtaining $\Omega$, and many of them may lead to the extreme and undesirable situations described above. However, the theory on this can be roughly divided in two: approaches computing the spectrum and those computing the pseudospectrum. The division obeys the ideas introduced by Trefethen [130] and later justified for hybrid Krylov methods in [94]. The pseudospectrum $\Lambda_\epsilon(A)$ is defined as follows.

Definition 4.1.1 For $\epsilon \geq 0$ and $A \in \mathbb{C}^{n\times n}$, the $\epsilon$-pseudospectrum of $A$ is given by
$$\Lambda_\epsilon\left(A\right) = \left\{ z \in \mathbb{C} : \left\|\left(zI - A\right)^{-1}\right\| \geq \epsilon^{-1} \right\}. \qquad (4.6)$$
In terms of perturbation of eigenvalues, this definition can be restated as the set of all complex numbers that are eigenvalues of $A + E$ for some perturbation $E$ bounded in $l_2$-norm by $\epsilon$. In practice, the value of $\epsilon$ is intended to be relatively smaller than the norm of $A$ but larger than roundoff error.
The introduction of the pseudospectrum framework responds primarily to the high sensitivity associated to spectrum computations, especially when the matrix $A$ is highly non-normal. In this respect, spectrum computations may lead to misleading definitions of $\Omega$; in fact, expression (1.8) does not follow in such case. On the other hand, it has been found in practice that rough eigenvalue estimations may be more reliable as reciprocals of relaxation parameters than even exact eigenvalues. It turns out that these estimations somehow approach the pseudospectrum of the matrix. It is worth adding, however, that when $A$ is normal the pseudospectrum is the union of balls of radius $\epsilon$ around the eigenvalues of $A$. Hence, $\lim_{\epsilon \to 0} \Lambda_\epsilon(A) = \Lambda(A)$.
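Definition 4.1.1 translates directly into a brute-force computation: on a grid of complex points $z$, the $\epsilon$-pseudospectrum is the sublevel set $\sigma_{\min}(zI - A) \leq \epsilon$. A minimal numpy sketch (our naming; plots like Figures 4.1-4.4 follow by contouring the returned surface at several $\log_{10} \epsilon$ levels):

```python
import numpy as np

def pseudospectrum_levels(A, re, im):
    """Smallest singular value of (zI - A) over a grid of complex points
    z = x + iy; Lambda_eps(A) is the region where this value <= eps."""
    n = A.shape[0]
    sig = np.empty((len(im), len(re)))
    for i, y in enumerate(im):
        for j, x in enumerate(re):
            z = (x + 1j * y) * np.eye(n) - A
            sig[i, j] = np.linalg.svd(z, compute_uv=False)[-1]
    return sig
```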
[Figure 4.1 Surface and contour plot of the Rosenbrock Jacobian pseudospectra at the first and last nonlinear iteration.]
Strategies to compute the spectrum usually lead to the inclusion of the origin. For instance, the field of values of $A$ (i.e., the set of all Rayleigh quotients $\left\{ z^* A z : z \in \mathbb{C}^n, \|z\| = 1 \right\}$) is a convex set at least as large as the convex hull of $\Lambda(A)$. Consequently, eigenvalues distributed on either side of the real or imaginary axis may easily define regions that enclose the origin. In this respect, approximations to the pseudospectrum can avoid this situation due to its non-convex defining region feature.
[Figure 4.2 Surface and contour plot of the Powell Jacobian pseudospectra at the first and last nonlinear iteration.]

[Figure 4.3 Surface and contour plot of the Chandrasekhar Jacobian pseudospectra at the first and last nonlinear iteration (easy case).]

[Figure 4.4 Surface and contour plot of the Chandrasekhar Jacobian pseudospectra at the first and last nonlinear iteration (hard case).]
Nachtigal et al. [94] found out that the set (4.7), with $r_m$ standing for the $m$th linear residual of GMRES, defines a lemniscate that effectively excludes the origin.
Computing the pseudospectrum can be computationally demanding. However, in a GMRES environment, the previous observation gives clues for defining a fair approximation to it. To that end, we can explicitly compute the GMRES polynomial and take reciprocals of its roots as suitable relaxation parameters of Richardson iteration. This view was given by Nachtigal et al. [64]. A more efficient approach is to consider the GMRES polynomial as a pseudo-kernel polynomial. The idea is to solve the generalized eigenvalue problem (i.e., obtain the pseudo-Ritz values) depicted in (4.5), where each eigenvalue represents a root of such kernel polynomial.
The generalized eigenvalue problem can be solved via the QZ method [71]. However, direct application of the method may not be that advisable due to the potential ill-conditioning of $H_m^t H_m$. A more efficient approach is to work with the QR factorization of $\bar H_m$, which is readily available from rank-one updates of the Hessenberg matrix generated in GMRES (recall the discussion at the end of §§3.3.1). Thus, having $\bar H_m = QR \in \mathbb{R}^{(m+1)\times m}$, we can transform (4.5) to the following equivalent problem
$$R z = \lambda\, Q_1 z, \qquad (4.8)$$
where $Q_1$ is the $m \times m$ leading principal submatrix of $Q^t$. The QZ method can be now applied more efficiently due to the presence of the upper triangular matrix $R$ in (4.8). Besides, this is a much better conditioned problem than (4.5), since there is no need to form the normal equation associated to $\bar H_m$. Unfortunately, the complexity of the QZ algorithm is of $O\left(m^3\right)$ floating point operations, even though roughly half of the operations are saved due to the upper triangular form $R$. This complexity is still affordable, however, for small $m \ll n$.
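A sketch of this computation, delegating the generalized eigenvalue solve to scipy (the thesis applies QZ directly; the function name is ours):

```python
import numpy as np
import scipy.linalg as sla

def pseudo_ritz_values(Q, R):
    """Roots of the GMRES residual polynomial (pseudo-Ritz values) via
    the equivalent problem (4.8): R z = lambda * Q1 z, with Q1 the
    m x m leading principal submatrix of Q^t and Hbar_m = Q R."""
    m = R.shape[1]
    Q1 = Q.T[:m, :m]                             # leading block of Q^t
    return sla.eig(R[:m, :], Q1, right=False)    # generalized eigenvalues
```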
Example 4.1.1 Figures 4.1-4.4 show pseudospectra levels of the Jacobian (taken at the first and last nonlinear iteration) for the four problem cases used for testing the nonlinear solver algorithms, as three-dimensional surface and contour plots. Different values of $\epsilon$ (indicated in $\log_{10}$ scale) are measured by different contours in the plots; the number of contours gives an idea of the sensitivity of the eigenvalues of the Jacobian. In order to reflect changes in difficulty, the pseudospectra associated to a particular linear system are also presented.

4.1.4 Richardson iteration and Leja ordering

The simplest nonstationary iterative method is the Richardson iteration. Given the linear system (1.2) and relaxation parameters $\tau_i$, $i = 0, 1, \ldots, m-1$, it can be stated as follows.

Algorithm 4.1.1 (Richardson iteration)
1. Give an initial guess $x_0$.
2. For $i = 0, 1, \ldots$, until convergence do
   2.1 Compute the residual $r_i = b - A x_i$.
   2.2 Compute the relaxation parameter $\tau_i$.
   2.3 Update the solution $x_{i+1} = x_i + \tau_i r_i$.

The parameters $\tau_j$ are applied in a cyclic fashion, $\tau_j = \tau_{j \bmod m}$, where $m$ is the rank of $H$. This cycling is a way of restarting the method.
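In its simplest real-parameter form, Algorithm 4.1.1 with cyclic parameters is a few lines of numpy (our naming; Leja-ordered parameters are assumed to be supplied, and complex pairing is handled separately as discussed next):

```python
import numpy as np

def richardson(A, b, x0, taus, tol=1e-8, max_sweeps=20):
    """Cyclic Richardson iteration (Algorithm 4.1.1) with precomputed,
    Leja-ordered real relaxation parameters taus; tau_j = taus[j % m]."""
    x = x0.copy()
    m = len(taus)
    for j in range(m * max_sweeps):
        r = b - A @ x                  # step 2.1: residual
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, True             # converged
        x = x + taus[j % m] * r        # steps 2.2-2.3: cyclic update
    return x, False
```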
If $\tau_j$ is complex, the iteration is carried out in real arithmetic by pairing $\tau_j$ with its complex conjugate in two consecutive iterations, as is expected from most nonsymmetric problems. That is, the product with the pair is expanded as
$$s_{j+1} = s_j + \left(2\,\mathrm{Re}\,\tau_j - |\tau_j|^2 A\right)\left(-F - A s_j\right). \qquad (4.9)$$
The Richardson method is a particular case of the more general family of Chebyshev methods. These inner product free methods are really attractive since they are much cheaper computationally, when compared to traditional Krylov subspace methods [8], which makes them even more attractive for parallel implementations. Therefore, they are a better choice than GMRES whenever we have a good selection of relaxation parameters $\tau_j$. Moreover, Richardson iteration does not try to expand the Krylov basis with new residual vectors that are not in the Krylov subspace, as we already discussed at the end of §§4.1.1. The major cost of GMRES resides in carrying out the Gram-Schmidt process for building up the Krylov basis; Richardson, in turn, is a more economical linear solver in terms of inner products (DOTs), vector updates (AXPYs), matrix-vector products (MVPs) and use of preconditioners (PRECs). Table 4.1 compares the operation counts of GMRES and Richardson iteration.
Table 4.1 Operation count comparison between GMRES and Richardson iteration. The value $i$ indicates the number of iterations employed by GMRES and Richardson.

 Algorithm    DOT       AXPY      MVP   PREC
 GMRES        i(i+1)    i(i+1)    i     i
 Richardson   0         2i        i     i

The effectiveness of Richardson iteration depends heavily upon the order in which the parameters $\tau_j$ are applied. This is fully discussed by Reichel [109], who proposed the Leja ordering as a robust way to define the parameter sequence for Richardson. This ordering is also suggested in [94] and successfully used by Simoncini and Gallopoulos [120].
The Leja ordering for different values of $\tau_j$ is defined by the recurrence
$$\prod_{l=0}^{j-1} \left| \frac{1}{\tau_j} - \frac{1}{\tau_l} \right| = \max_{j \leq s \leq m} \prod_{l=0}^{j-1} \left| \frac{1}{\tau_s} - \frac{1}{\tau_l} \right|, \qquad j = 1, 2, \ldots, m-1. \qquad (4.10)$$
In algorithmical terms, the above recurrence can be expressed as follows.

Algorithm 4.1.2 (Leja ordering)
Input: An array $p$ with entries $p_i \in \mathbb{C}$, for $i = 1, 2, \ldots, n$.
Output: An array $q$ with entries $q_i \in \mathbb{C}$, for $i = 1, 2, \ldots, n$.
1. $q_1 = p\left(\mathrm{MaxMod}\left(|p_i|,\ 1 \leq i \leq n\right)\right)$.
2. Complex = TRUE.
3. For $l = 1, 2, \ldots, n-1$ do
   3.1 If (Complex = TRUE) or ($\mathrm{im}\left(p_l\right) \neq 0$) then
       3.1.1 Take $q_{l+1}$ as the complex conjugate partner of the current entry;
       3.1.2 Complex = FALSE.
   3.2 else
       3.2.1 Take $q_{l+1}$ as the remaining entry maximizing the product in (4.10);
       3.2.2 Complex = TRUE.
   3.3 end if
4. end for
The function MaxMod returns the index of the maximum modulus element of the set specified in its argument. Similarly as in the Richardson iteration, the procedure takes care of the complex conjugate pairs. Note that the resulting order depends on both the complex number modulus and its relative distance with respect to the rest of the points. This algorithm requires only $O\left(m^2\right)$ floating point operations; in fact, this is the same order of operations required to perform the QR factorization of rank-one updates of $H_m$.
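A numpy sketch of the core of the ordering (our naming; the complex-conjugate pairing of Algorithm 4.1.2 is omitted, and for large point sets the products should be accumulated in log scale to avoid overflow):

```python
import numpy as np

def leja_order(p):
    """Leja ordering per the recurrence (4.10): start from the largest
    modulus, then repeatedly pick the point whose product of reciprocal
    distances to the already chosen points is largest.  Points at the
    origin are excluded upstream (Richardson diverges there anyway)."""
    p = list(p)
    q = [max(p, key=abs)]          # step 1: MaxMod
    p.remove(q[0])
    while p:
        nxt = max(p, key=lambda t: np.prod([abs(1/t - 1/s) for s in q]))
        q.append(nxt)
        p.remove(nxt)
    return np.array(q)
```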
4.2 The family of HKS methods

In our context, a Hessenberg matrix consistent with the Krylov-Broyden update of the current Jacobian approximation is already available after each nonlinear iteration. That is, the eigenvalue information associated to the linear system at the new point has been updated by means of (3.28), and is ready to be used for the computation of the Richardson relaxation parameters via the generalized eigenvalue problem and the Leja ordering procedure. In this way, we can use a fair set of relaxation parameters to solve the new Jacobian equation, and make possible the switch of the entire flow of computation from GMRES to the cheaper Richardson iteration. Once the Richardson iteration at the current nonlinear step has converged, the method switches back to the original one (e.g., given by Newton's method or Broyden's method): the next nonlinear iteration is solved by GMRES. This forces the nonlinear method to alternate between GMRES and Richardson iteration, with a spectrum approximation consistent with the Krylov-Broyden update delivered at every other nonlinear step.
4.2.1 The HKS algorithms

One of the main results on Hybrid Krylov-Secant methods is synthesized in the following algorithm:
Algorithm 4.2.1 (HKS-B)
1. Set skipgmres = FALSE, ok = FALSE and $k = 0$.
2. Give an initial guess $u^{(0)}$, a function value $F^{(0)}$, a Jacobian approximation $A^{(0)}$ and a tolerance $\eta^{(0)}$.
3. Do until convergence
   3.1 If (ok = TRUE) then
       3.1.1 Compute $F^{(k)}$.
       3.1.2 If (skipgmres = FALSE) then perform a Broyden update of $A^{(k-1)}$;
       3.1.3 Else perform a Krylov-Broyden update of $A^{(k-1)}$; Endif
   3.2 Else ok = TRUE; Endif
   3.3 If (skipgmres = FALSE) then
       3.3.1 $\left[s^{(k)}, H_m^{(k)}, V^{(k)}, h_{m+1,m}^{(k)}, \left\|r_m^{(k)}\right\|\right] = \mathrm{GMRES}\left(A^{(k)}, -F^{(k)}, s^{(k)}\right)$.
       3.3.2 Update $P^{(k)} = V^{(k)} \left(V^{(k)}\right)^t$.
       3.3.3 skipgmres = TRUE;
   3.4 Else
       3.4.1 $[\Omega] = \mathrm{Eig}\left(H_m^{(k)}\right)$.
       3.4.2 $[\tau] = \mathrm{Leja}(\Omega)$.
       3.4.3 $\left[s^{(k)}, ok\right] = \mathrm{Richardson}\left(A^{(k)}, -F^{(k)}, s^{(k)}, \eta^{(k)}, \tau\right)$.
       3.4.4 skipgmres = FALSE; Endif
   3.5 If (ok = TRUE) then
       3.5.1 $\left[\lambda^{(k)}, \eta^{(k+1)}\right] = \mathrm{Backtracking}\left(A^{(k)}, F^{(k)}, s^{(k)}, u^{(k)}, \eta^{(k)}\right)$.
       3.5.2 $u^{(k+1)} = u^{(k)} + \lambda^{(k)} s^{(k)}$.
       3.5.3 $k = k + 1$; Endif
4. Enddo
The above algorithm is written in terms of Broyden's method. That is, an approximation to the Jacobian is provided by means of Broyden updates (see step 3.1.2). With some slight modifications, other instructive and important nonlinear solvers can be obtained as well. From now on, we shall refer to the above algorithm as the HKS-B algorithm.
Several comments are in order. The boolean variables skipgmres and ok are flags to control the use of GMRES and the Richardson iteration. The variable $k$ is the nonlinear iteration counter. Step 2 leaves open the option to approximate the Jacobian by the most suitable procedure available (e.g., finite-difference approximations if the exact operation is either computationally expensive or unavailable). At that step we also define an initial linear tolerance $\eta^{(0)}$, which is dynamically adjusted (in accordance with (2.6) and (2.7)) by the backtracking procedure in step 3.5.1.
The nonlinear loop starts with the GMRES iteration. Then, all elements belonging to the Arnoldi factorization are retrieved to provide the minimal residual approximation problem, the Hessenberg matrix and the least squares solution of the Newton linear equation; the relaxation parameters for Richardson iteration are then computed from them. Since the rank-one update of the Hessenberg matrix is performed only once for a given GMRES call, we do not consider the checking of its possible singularity or ill-conditioning. Since the variable ok is always true after GMRES (and after a successful Richardson iteration), the line-search backtracking procedure proceeds to verify and correct the step length and the linear tolerance according to the agreement between the nonlinear function $F$ and its linear model. If backtracking steps take place, a fraction of the nonlinear step is taken. Otherwise, the full step is accepted to improve the solution so far.
The value of skipgmres in step 3.1.2 allows the selection between the two types of updates. Clearly, the Krylov-Broyden update follows the Broyden update of the Hessenberg matrix in the previous iteration. In §§4.3.1 we shall devote some discussion to compactly collecting these updates for the sake of efficiency.

Steps 3.4.1 and 3.4.2 summarize the discussion given in §§4.1.4. The variable $\Omega$ holds the list of eigenvalues associated to (4.5) (i.e., the roots of the GMRES polynomial). These eigenvalues are then reordered and assigned to the list $\tau$ by the Leja ordering described in Algorithm 4.1.2.

If Richardson iteration fails, we recommend here to reuse the operators constructed so far and resume the computation with GMRES. Note that the variable ok determines whether to avoid the backtracking procedure, the recomputation of the Jacobian matrix and the value of the function at the current point. Another possibility (which, in fact, is used in our implementations) is to flush out the current Jacobian matrix and reevaluate it. If this is not expensive compared to the rest of the computation, it may be advantageous to take a better solution step than that provided by a Krylov-Broyden update. We remark, however, that a failure of Richardson does not necessarily mean that a poor Krylov-Broyden update has occurred. The Richardson iteration method can fail due to a poor definition of the set where the residual polynomial is minimized. We already know that indefiniteness (eigenvalues located on both sides of the complex plane) or eigenvalues close to the origin cause severe problems for the method. In this regard, good preconditioning strategies should remove these undesirable situations (even for an efficient utilization of GMRES).
[Figure 4.6 Convergence of the HKS-N algorithm. Panels: Rosenbrock, Powell, Chandrasekhar [c=.9], Chandrasekhar [c=.999999]; iteration versus relative nonlinear residual norm.]
In the Powell singular function case, although this is the most favorable case for the KEN and the HOKN algorithms, the eigenvalues located on either side of the complex plane make, in the absence of preconditioning, the Richardson iteration diverge at all nonlinear steps (indicated by the absence of circles on the curves). The Chandrasekhar H-equation problem is an illustrative example of how the distribution of eigenvalues plays a decisive role in the success of Richardson iteration (see Figures 4.3 and 4.4). All Jacobian matrices are positive stable for both the easy and the hard case. However, as the nonlinear solution is approached, the eigenvalues tend to be more spread out and closer to the origin. This situation is more pronounced in the hard case, which explains the success of HKS methods only in the first half of the total number of nonlinear iterations. The major clustering of eigenvalues in the easier case was certainly beneficial to Richardson iteration. The other instructive point is the deterioration in convergence of
'.
lO.)
all methods (particularly,
for the HKS-B algorithm)
with the Krylov-Broyden
update.
\ote
due to the switching
t hat switching between Newton's method and Broyden's
itly depictpd
in Algorithm
force a Krylov-Broyden
:L2.1.
update
linear step (i.e .. after C\[RES
step). \Ioreover.
Broyden's
The difference here. is that
we systematically
solution at least to take place in every even nonhas computed
the new nonlinear direction at the odd
the already observed quality of the Krylov-Broyden
update and its direct application
alternative
method is implic-
approximation
to the current Jacobian
with respect to
rather than an
of it. provides reasons to believe that this approach is by
far more appealing than the one suggested by Algorithm :3.2.1. In other words. there
are more chances to succeed with a Krylov-Broyden
\"ewton's method step t han a Broyden's
gence rate of an inexact ~ewton's
r nfort unately.
iteration
t here is no guarantee
I he
y =
repeated
hand).
the conver-
t hat the solution obtained by the Richardson
the formal possibility
iteration solutions before returning
within the :\rnoldi
use of Richardson
(neglecting
an inexact
(if ever
!'P-
The basic question is how to update the Hessenberg matrix and
hClt l\pdate consistent
\.:1$
overtaking
lies on the current Krylov basis. [\m. This truncates
'1l\irt~d) 10 G \[RES.
1
method eventually
preceding
method (recall discussion in §§ :3.2.1.)
of creating a long chain of Richardson
kepI'
update
decomposition.
One way to rf'tain
is by simply taking its solution vector s and doing
of course, any shift from the I\:rylov subspace
Then the vector y is used to update
of Hessenberg matrix
that one has in
and relaxation
parameters
are computed again. This vector is nothing but the least squares solution
of llIinqelR"'
IIVy - ·~iland
step
$.
thus. Vy represents the closest vector in [\m to the current
As an approximation
in §§ ·U.1.
and therefore.
to t he problem \ve fall into the same issues discussed
this approach
may result insufficient in many situat ions.
Howewr. it is worth to mention that this approach was reported
in [8:3] with reason-
IOf)
ably good results in practice.
[n this dissertation.
however. this idea is not. pursued
any further.
Of course, a refinement to the above approach can be achieved by projecting the Richardson solution vector and increasing the dimension of the Krylov basis by including it. However, this introduces additional difficulties, such as those due to reorthogonalization, which were also discussed in §§4.1.1.

The good news is that in many nonlinear problems the spectrum of Jacobian matrices may not change dramatically during the course of converging to the nonlinear solution. This opens the opportunity to further reuse the current relaxation parameters. Hence, the nonlinear method may rely on Richardson iterations until its convergence deteriorates significantly (or breaks down), in which case GMRES is invoked once again. This idea deserves a separate and exhaustive treatment and goes beyond the scope of the present work.
4.2.2 The role of preconditioning in Krylov-secant methods

So far, our algorithms have been described under the assumption that we have not employed a preconditioner in GMRES. It is not difficult to realize that, if a preconditioner is used, the update (3.27), rather than reflecting a secant update of the Jacobian, reflects a secant update of the Jacobian times the inverse of its preconditioner (i.e., assuming right preconditioning). In other words, given $M^{(k)}$ as the preconditioner, (3.28) would become
$$\left(AM^{-1}\right)^{(k+1)} = A^{(k)}\left(M^{-1}\right)^{(k)} + P^{(k)} \frac{\left(F^{(k+1)} + r_0^{(k)} - A^{(k)} s^{(k)}\right)\left(\tilde s^{(k)}\right)^t}{\left(\tilde s^{(k)}\right)^t \tilde s^{(k)}}, \qquad (4.11)$$
where $\tilde s^{(k)} = M^{(k)} s^{(k)}$.
This means that the spectrum information on which the Richardson relaxation parameters are based corresponds to the form above. Therefore, in order to apply Algorithm 4.2.1 effectively, we should ensure that the Jacobian operator $J^{(k+1)}$ together with its preconditioner is somehow consistent with the associated Richardson relaxation parameters. There are three possible ways to overcome this problem. Firstly, we may perform the update (4.11) and carry out the matrix-vector products within the Richardson iteration in terms of $\left(AM^{-1}\right)^{(k+1)}$; this certainly makes the relaxation parameters consistent with the preconditioned Richardson iteration. This approach is equivalent to solving, by Richardson iteration, the preconditioned Jacobian system
$$\left(AM^{-1}\right)^{(k+1)} \tilde s = -F^{(k+1)}, \qquad s = \left(M^{-1}\right)^{(k)} \tilde s. \qquad (4.12)$$
Clearly, in order to obtain a meaningful nonlinear step, we need to remove the preconditioning effect embedded in the operator $\left(AM^{-1}\right)^{(k+1)}$, leading to the right unpreconditioned solution. Unfortunately, there is no explicit form of the preconditioner at hand. One possible approximation to the problem is to perform the Krylov-Broyden update of $\left(M^{-1}\right)^{(k)}$ by means of the Sherman-Morrison-Woodbury formula. That is, compute
$$\left(M^{-1}\right)^{(k+1)} = \left(M^{-1}\right)^{(k)} + \frac{\left[s^{(k)} - \left(M^{-1}\right)^{(k)} q^{(k)}\right]\left(s^{(k)}\right)^t \left(M^{-1}\right)^{(k)}}{\left(s^{(k)}\right)^t \left(M^{-1}\right)^{(k)} q^{(k)}}, \qquad (4.13)$$
where $q^{(k)} = F^{(k+1)} + r_0^{(k)}$, and apply it to the solution delivered at convergence by the preconditioned Richardson iteration. This implies the solution of the linear system (4.14).
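The inverse-form update (4.13) is a standard Sherman-Morrison identity and costs only a couple of matrix-vector products. A numpy sketch (our naming):

```python
import numpy as np

def sherman_morrison_inverse_update(Minv, s, q):
    """Inverse form of the rank-one secant update, cf. (4.13): if
    M+ = M + (q - M s) s^t / (s^t s), then (M+)^{-1} follows from the
    Sherman-Morrison-Woodbury formula without refactorizing M."""
    Minv_q = Minv @ q
    return Minv + np.outer(s - Minv_q, s @ Minv) / (s @ Minv_q)
```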
Note that the solution of this linear system does not necessarily represent the solution obtained from a Krylov-Broyden update. Moreover, the operator $\left(AM^{-1}\right)^{(k+1)}$, or rather $M^{(k+1)}$, may introduce a significant overhead in the implementation of a globalization strategy and in the computation of the future updates where the Jacobian (or an approximation of it) is required. Consequently, its manipulation may cause misleading situations where even rapidly convergent Richardson iterants for solving (4.14) lead to poorly updated nonlinear steps (i.e., insufficient in producing a descent direction for $\|F\|$).
The following theorem provides an upper bound for this approximation strategy with respect to the Krylov-Broyden update of the Jacobian. For notational simplicity (as in the proof of Theorem 3.3.1), we drop the superscript $k$ and adopt the conventional $+$ sign to indicate the operators updated by the Krylov-Broyden update.

Theorem 4.2.1 Let the Krylov-Broyden update of $AM^{-1}$ be given by (4.11). Also, let the Krylov-Broyden update of both $A$ and $M$ be given by the formula (3.28). Then, provided that $\|s\| \neq 0$,
$$\left\| \left(AM^{-1}\right)^+ M^+ - A^+ \right\| \leq \frac{\left\|I - AM^{-1}\right\|}{\|s\|}\left(\|q - As\| + \kappa(M)\,\|A\|\,\|s\|\right) + \frac{\|q - As\|}{\|s\|}\left(1 + \|q\|\right)\kappa(M), \qquad (4.15)$$
where $q = F^+ + r_0$ and $\kappa(M) = \|M\|\left\|M^{-1}\right\|$.
Proof. A simple algebraic manipulation yields the following expression:
$$\left(AM^{-1}\right)^+ M^+ = A + AM^{-1}\,P\,\frac{(q - Ms)\,s^t}{s^t s} + P\,\frac{(q - As)(Ms)^t}{(Ms)^t Ms}\,M + P\,(q - As)\,\frac{(Ms)^t (q - Ms)}{(Ms)^t Ms}\cdot\frac{s^t}{s^t s}.$$
In this development we have used the fact that $(Ms)^t P = (Ms)^t$ (i.e., $Ms$ belongs to the Krylov subspace) in order to split the product of the two rank-one terms into the last two terms appearing at the right hand side of the above expression. Hence, subtracting $A^+ = A + P\,(q - As)\,s^t/\left(s^t s\right)$, taking norms on both sides and noting that $\|P\| \leq 1$, $\|Ms\| \leq \|M\|\,\|s\|$ and $1/\|Ms\| \leq \left\|M^{-1}\right\|/\|s\|$, the resulting terms are bounded by the two groups displayed in (4.15), which completes the proof. $\Box$
Remark 4.2.1 Note that the first term at the right hand side of (4.15) vanishes if $M = A$, leaving us with the sharp relative error upper bound
$$\left\| I^+ A^+ - A^+ \right\| \leq \frac{\|q - As\|}{\|s\|}\left(1 + \|q\|\right)\kappa(M).$$
Moreover, a relative error bound associated directly with the Krylov-Broyden update can be easily obtained by working with the update of the identity matrix:
$$\frac{\left\| I^+ A^+ - A^+ \right\|}{\left\|A^+\right\|} \leq \frac{\left\|q - As\right\|}{\left\|A^+\right\|\,\|s\|}.$$
This bound arises since $I^+$ is a rank-one perturbation of the identity matrix, which, by itself, perturbs the Krylov-Broyden update of $A$.

The preconditioner, then, does not evolve in agreement with the undergoing Krylov-Broyden updates of each Jacobian matrix.
In view of this approach and the particular case of the nonlinear KEN and the HOKN algorithms, the incorporation of preconditioning forces us to work with $\left(AM^{-1}\right)^{(k+1)} M^{(k)}$. It is clear there that the solution of the minimal residual approximation problems (3.36) and (3.37) is referred to the linear system (4.12), with the exception that the value of the nonlinear function is taken at the $l$th step. Although this may imply the adoption of some of the potential difficulties already discussed in the case of that approach, fortunately this does not occur here. In order to perform globalization strategies, we do not need the explicit form of the Jacobian matrix, nor is there any purpose in updating the Jacobian by a Krylov-Broyden update. There is an even stronger reason: the failure of the Krylov-Broyden step (i.e., the step delivered by the least squares solution of the preconditioned problems (3.36) and (3.37)) compromises much less possible unwasted computation than in the HKS-B and the HKS-N algorithms. The key observation stems from the fact that the KEN and HOKN algorithms do not imply recomputation of the Jacobian and its preconditioner if providing the improved step fails. Obviously, any attempt to update the preconditioner introduces a relatively high overhead to a computation that does not require direct manipulation of the Jacobian matrix. Furthermore, the following theorem shows that the best approach is to keep the preconditioner fixed in all Krylov-secant algorithms.
[Figure 4.7 Pseudospectra of preconditioned Jacobian matrices for the extended Rosenbrock function. Upper left corner: $AM^{-1}$; upper right corner: $A^+M^{-1}$; lower left corner: $A^+\left(M^{-1}\right)^+$ and, lower right corner: $\left(AM^{-1}\right)^+$.]

[Figure 4.8 Pseudospectra of preconditioned Jacobian matrices for the Powell singular function. Upper left corner: $AM^{-1}$; upper right corner: $A^+M^{-1}$; lower left corner: $A^+\left(M^{-1}\right)^+$ and, lower right corner: $\left(AM^{-1}\right)^+$.]

[Figure 4.9 Pseudospectra of preconditioned Jacobian matrices for the easy case of the Chandrasekhar H-equation. Upper left corner: $AM^{-1}$; upper right corner: $A^+M^{-1}$; lower left corner: $A^+\left(M^{-1}\right)^+$ and, lower right corner: $\left(AM^{-1}\right)^+$.]

[Figure 4.10 Pseudospectra of preconditioned Jacobian matrices for the hard case of the Chandrasekhar H-equation. Upper left corner: $AM^{-1}$; upper right corner: $A^+M^{-1}$; lower left corner: $A^+\left(M^{-1}\right)^+$ and, lower right corner: $\left(AM^{-1}\right)^+$.]
Theorem 4.2.2 Let the Krylov-Broyden update of $AM^{-1}$ be given by (4.11). Also, let $M$ be a preconditioner for $A$, and let the Krylov-Broyden update of $A$ be given by the formula (3.28). Then, provided that $\|s\| \neq 0$,
$$\left\| \left(AM^{-1}\right)^+ M - A^+ \right\| \leq \frac{\|q - As\|}{\|s\|}\,\kappa(M), \qquad (4.18)$$
where $q = F^+ + r_0$ and $\kappa(M) = \|M\|\left\|M^{-1}\right\|$.

Proof. It easily follows that
$$\left(AM^{-1}\right)^+ M - A^+ = P\,(q - As)\left[\frac{(Ms)^t M}{(Ms)^t Ms} - \frac{s^t}{s^t s}\right]. \qquad (4.19)$$
Taking the $l_2$-norm on both sides, it results that
$$\left\| \left(AM^{-1}\right)^+ M - A^+ \right\| \leq \|P\|\,\frac{\|q - As\|}{\|s\|}\,\|M\|\left\|M^{-1}\right\| \leq \frac{\|q - As\|}{\|s\|}\,\kappa(M). \qquad \Box$$
The result abow~ applies
To characterize
the difference
t hat implicitly
HKS-E:'\
directly
associated
algorithms.
to the nonlinear
between
the new preconditioned
to the Richardson
we again
II (.-t.\1-1) +
KE:--i and the HOK~
relaxation
.Jacobian
parameters
algorithms.
matrix
and
in the HKS-B
and
ha\'e that
- .4+,\[-111::;
II (.-t.\r )+
l
.\I -
A+IIII·\[-111
< Ilq - As II II ,\I-Ii\ . ( \'I)
I\sll
. I/'i,·
.
The
entire
enhancing)
change
the quality
boils down
Krylo\'-secant
and
to realizing
of the preconditioner.
in close correspondence
conditioned
.Jacobian
discll~sion
to update
methods
its preconditioner
that
rather
than
we should
better
let the preconditioner
(3.28). Therefore,
is dictated
are to (.-\..U-I
maintaining
the performance
by how close the combined
)(k+I)
and how this itself.
(or
of preupdated
is close to
II()
t
he identity mat rix. Obviously, maintaining
these operators
time the G:\[RES
does not prevent the use of the best preconditioning
iterati\'e
4.2.2
strategy each
much clearly all the above discussion.
Figures -l.'--LiO show pseudospectra
tended Rosenhrock
function.
the extended
plots of the ex-
Powell funct.ion and the two
cases of the Chandrasekhar
H-equation.
.-\.\[-1,
and (:_\..\/-1)+ are presented.
..\+.\/-1 .. _\+(.\/-1)+
generated
In every Figure, subplots
in terms of the first and second nonlinear
agonal preconditioner
All plots are
iteration.
A tricli-
similarity
shown by the operators
.4.+ ,\I-I and
(.,\.\[-1)+
ill ilil problem cases. which confirms the result established
Theorem
l.~.~.
The Rosenbrock
the Krylov-Broyden
a slightly better
updates
function case perfectly illustrates
may cause certain
new .Jacobian.
condition
IIo\\·c\·er. this situation
t
ion subplots indicate.
number
than both
in
how
quality deterioration
In this particular . .4.+ (.\1-1)+
.4.+.\[-1
of
presents
and (A.\/-I)+.
does not al\\'ays hold as the Powell singular func·
\ote.
t hat one conjugate eigenvalue pair of A + (.\[-1) "-
would be alit of the com'ex hull (i.e .. the G:\[RES
lemniscate)
to (.4..\[-1)+ . This may negati\'ely affect Richardson's
. as it was discussed in §§4.1.:3. The Chandrasekhar
one may obtain a better conditioned
trend is emphasized
for
was employed in all cases. The reader can observe
the close pselldospectra
the precollditioned
among
solver is required.
The following example illustrates
Example
this consistency (or resemhlance)
associated
rate of convergence
H-equation shows that
matrix A + (.\1-1)+
than .4.'\1-1, This
from the easiest to the most difficult case of this non-
linear integral equation.
liT
Powell
Rosenbrock
100
10°
-.
10
10-'
~ 10-'
~ 10"
Z
_.
a: 10
a
-10
..
10
10 0
10-12
5
10
~
10
15
0
10
15
Iteration
Chandrasekhar (ca.999999]
,
20
0
10
-.
10-'
10-·
Z
5
Ileration
Chandrasekhar [c2.9]
0
10
~-t
~::-:·l
8',0"
-
~
10-'
~
~
-
G
-
EO -
'0-
_.
a: 10
~ 10-·
a
~10-·
a
8',0"
-
-10
10
10
,'.
10-10
0
10-12
2
4
6
8
0
5
Iteration
10
15
Iteration
Figure 4.11 Convergence comparison between the HKS-Broyden
(dashed-dotted line) and the HKS-EN (solid line) algorithm with tridiagonal
preconditioning.
Rosenbrock
,.
10
Z
10"
-
10
a:
z
a:
a
10"
.
•.
~10
,
~ 10-'
~
~10'
Z
8
~ 10-
"'J:
10
10"0
•
,.!
0
10
10°
10-12
10
Ileration
Chandrasekhar [c2.9)
5
15
0
-. tI
10"
~ 10-·
f
t
~ 10'·
10
15
20
Z
~10-·
,.
to- °L
1
10
Iteration
10"
~ 10"
~'0"1
-
5
Chandrasekhar [c=.999999]
100
,
10
~
Powell
100
10°
0
10-10
10-12
2
4
Iteration
6
B
0
5
10
Ileration
Figure 4.12 Convergence of the HKS-N
algorithm with tridiagonal preconditioning.
15
Since the procedure to introduce preconditioning
established.
in the HKS algorithms has Iwen
it is now convenient to show its performance
Example 4.2.3
and 4.2.2.
This example complements
Clearly. the introduction
the success of the Richardson
in practice.
results of Examples -l.2.1
of preconditioning
iteration
helps notably in
in all HKS algorithms
(indicated
by circles on?r the curves depicted in Figures -l.ll and -l.12). The Powell
function is a particularly
instructive
case. The previous inclusion of the
origin in the domain of minimal residual polynomials
by preconditioning
here but also A+.H-1 does a good job of approximat-
ing both 04.\1-1 and (AJI-1)+
succeeds in every turn.
in generating
nonlinear
. Consequently.
the Richardson
::\ote that the more Richardson
a descent. direction for
iterations
is not only removed
IIFII.
iteration succeeds
the greater the total number of
is. This reflects the fact that Krylov-Broyden
are not as good as ::\ewton's steps.
\\'e shall later address in §§6.1.1 The few Richardson
Rosenbrock case and the C'handrasekhar
iIlcrease of "'.(.\!) at those !lonl i l1t'ar
steps
\Ve stress this does not necessarily
mean an increase (of the same order) in the overall computing
4.2.3
iteration
cost as
failures (as in the
easy case) is due to a noticeahle
sl f'pS.
Globalization
We ha\'e already
discussed
ill §§2.1.:3 t hat a globalization
strategy
is necessary
t.o
pre\"ent possible mo\'(-'ments away from the solution or even divergence of t.he nonlinear
procedure.
This is a consequence of the poor approximation
of eq1\at ions produced by a linearization
the solution.
In that opportunity.
of the nonlinear system
by Taylor series whenever a point is far from
we argued that manipulation
of .Jacobians at the
it y
current point are required to both carry out parabolic line-searches and computation
of t he forcing terms.
[n our particular
context.
the Krylov-Broyden
~ociated quasi-:-.rewton directions.
that is
to be a direction of decrease
IIF!I.
for
$(1.:)
update delivers systems whose as-
(.4.(k»)
= -
the f'xact .Jacobian matrix at the current point.
customary
globalization
Since our secant methods
eration to implement. Eisenstat
tolerances
dynamically
rarely occurs and therefore. it
IS
as if they were exact for the purpose
are inherently inexact.
we also adopt the last consid-
and \Valker ideas [.56]. That
is. we compute
as if we were dealing with an inexact
linear
Newton method.
Of
to the .Jacobian are sufficiently good to avoid
possible breakdowns of the line-search backtracking
additional
explicit knowledge of
strategies (see [79]).
course. we assume that approximations
The incorporation
without
On
This is a common problem of any
In pract ice, howe\·er. t his situation
to handle .Jacobian approximations
of implementing
are not guaranteed
In fact. they may fail to satisfy (2 ..)).
top of that. there is no \vay to verify this situation
Sf'cant method.
-1 F(I.:),
strategy.
of line-search in the HKS-B and the HKS-~ does not bring any
considerations
to the discussion in §§2.t.:3 since a .Jacobian approximation
is ['t'i1dily a\'ailable via the E:rylov-Broyden update.
IIowt.'\·er. an important. note call
he hrought up in relation to the directions obtained from t he least squares solutioll uf
the minimal residual approximation
HOK\
l:J,:36) ill the HKS-E~.
nonlinear
KE\
and the
algorithms.
The point is that the least squares solution of (:3.:36) does not require the explicit
knowledge of the approximated
search backtracking
.Jacobian matrix.
and t he forcing term criterion selection we should try to avoid its
f'xplicit use as well. :\ot. surprisingly.
t
In order to develop an efficient line-
the underlying
.-\rnoldi factorization
his possibility which in advance was presented in §§2.2 ..'j,
provides
I~O
Hence
in \'ipw of the solution
fo!lo\\'ing
two quant.ities
im'oh'ing
to
(:3.:36). we can infer from §§:2.2..) that only t.h~
.·l+ are required:
and
11. =
II .-l+s
=
"
Both expressions
HOI~:\ algorithm.
is given
by
rpmain
CI
F. ~':n+l ( Jel - -+))
HmYm
111'",112 - !IF112 - 2 (F.A+s).
The complexity
for computing
Hm
of consecutive
the inner product
updates
in the
(F, V~+1 (3el - H:!J,,,
0 (n + 1111. + ml1) which. for small values of m. may be preferable
Since. we are using right preconditioning,
than
these expressions
the samp.
[n particular.
for
+ IIFII 1. + 2 (
1.
arE' also \'alid for any number
+.., directly.
computing.-\.
Ilrmll
zero initial
if we are luckily able to com'erge
guess
(recall
within
the
GMRES restart
2.:2.:2). we have the corresponding
Remark
window
simplified
expressIons
;llld
This
:\ote
that
simplifiration
implies
additional
-II Fill. + 111'",112
computing
advantages
does not require
quanti t jes ha \'e beell computed
previously
the contrary.
distribution
whatever
(F. :ls} demands
overhead
tions.
the data
communication
of communication
This special
for parallel
among
communication
and are therefore
layout
readily
because
available.
of .4, F and s, the inner
all processors.
but also a synchronization
case can be fully exploited
implementations.
This
point
tht'st'
011
product
not only introducps
an
for all local computa-
in any of the Krylov-secant
algorithms.
i:2i
4.3
Computational considerations for HKS methods
\Ve can further exploit the secant update (:3.28) to save morecomputation
B and the HKS-E0i algorithms.
features of Krylov subspace
We already know that one of the most important
methods is that they do not require explicit knO\"'ledge
of the matrix that multiplies a vector and the preconditioner.
action. This feature motivates
multiplications
for any starting
the Jacobian
to perform
matrix.
In this way.
of the .Jacobian, we are able to avoid problematic
to that purpose
the .Jacobian sparsity structure
valuable information
representation
fill-in due to secant rank-one updates.
change secant updates
maintain
without recomputing
approximation
issues such as matrix
We only require their
the present section.
In this section we propose limited memory compact
matrix-vector
in the H[~S-
There are also least-
[-!9]. However. since they are thought
constant throughout
secant updates. some
may be left out and spoil rapid convergence properties.
On the other hane\. in large scale settings and cases of ill-conditioned
systems. preconditioners
iteratiw
.Jacobian
tend to be expensive in response to the limitations of Krylov
solvers in tackling these problems.
problems not only demands
We can expect that solving large nonlinear
costly function and deri\'ative
significant computer time in setting up these preconditioners.
computations
type of iterations
implicit forms for updating
(see Chapter
(see e.g .. [,lj. [72]). Hence. it is important
.Jacobian::; and preconditioner
but also a
[n fact. preconditiollcrs
may not be a\'ailable in an explicit form. as it occurs in two-stage
or inner-outer
operators.
the purpose of using Richardson
iteration.
."»)
to employ
We point out
according to our discussion in §§-!.2.2. that \ve need not update the preconditioner
preconditioner
to
for
However. it may be desirable to update the
(\'ia Broyden's update) to accelerate the convergence rate ofG~lRES.
Let us remind that
Krylov-Broyden
and Broyden updates
fashion in the HKS-B and the HKS-E~ algorithms.
occur in an interlean>d
The final part of the section is de\'oted to studying
of these limited
operation
4.3.1
memory compa.ct representations
the computational
impact
in terms of their floating point
complexity.
Limiting memory compact
representations
\Ye ca.n either adapt mult.iple secant. updates or limited memory quasi-:"iewton
pact representations
particularly
to carry out implicit and efficient secant updates.
rOlll-
They are
llseful when analytica.l deri\·at.ives are not available or are costly to com-
pute. The former approach is widely' known and it was formerly suggested by Barnes
[71 and Gay and Schnabel [681. ~[ultiple secant updates enforce a set of secant conditions to hold but are unstable numerical representations
linearly dependent.
if the directions are nearly
The latter one. cOl1\·ersely. does not present these numerical dif-
ficulties but t he secant equation
is only guaranteed
to hold for the previous update.
This type of scheme was recently proposed by Byrd. ~ocedal and Schnabel [281. They
claim t hat there is not a clear distinction
[n t his work. we limit our attention
a nalyzed by Byrd. \'ocedal
t
hat obtained
is key in the deri\'ation
Lemma 4.3.1
R.nand
to the compact limited memory representat ions
and Schnabel.
by multiple secant updates
bela\\' is lower triangular
p(k)
=
instead of
as to which one of the two is the best.
Iwill!?;
([n fact. these representations
eli!fer from
in that the definition of the matrix
a full dense mat rix.) The following
1(-'111111<1
of such representations.
Let
{.s(k)
[(.~(k)r
8(1.)]
H~o and
-I
{y(k)
H~o be
(provided that
ing the followi ng mat ri x recurrence
s(k)
=1=
.\'(k)
sequences of \'ectors in
0), T:/k = 0.1. ... , defin-
with
<{>(O)
= O. Then
=
<f>lk+l)
y(k)
(.Vlk))
= 0,1.
V~~
-I S(k),
... ,
where
and
if i S: j.
otherwise.
Proof
This is proved by induction as part of Theorem
The following theorem
Krylov-Broyden
updates
provides
a compact
and Broyden updates
representation
for the alternating
of the .Jacobian matrix.
should be applicable in both the Richardson iteration
Broyden update is followed by a Broyden update.
pact representation
o
6.1 in [28].
This form
and GMRES. Since the Krylov-
the formula is similar to that com-
of Broyden's method with the exception that a projection onto a
KrylO\' subspace occurs at e\'ery' ewn nonlinear step.
Theorem
4.3.1
Let
AY)
tained by updating
Broyden's update
A(O)
be a nonsingular
r~ 1 times
matrix and let
with formula (:3.:28) and
A(k)
be ob-
l ~j times
with
for I = O. 1. .... ~~- 1. Then
(4.20)
\vhere
S(k)
and
.v(l,)
are defined as in Lemma 4.:3.1 and
Q(k)
=(
with
for I even.
for I odd.
q(O).
q(1),
...
• q(l:-I)
)
.
I~ I
Proof
\\'h('re
whf're
\Ve express .-\(1.:) as
.\(0)
is a constant
B(O)
=
o.
B(/;)
=
8(1,-1)
,
[
=
p(i.')
term and the recurrence
[I -
(. ...(1.:)
p(k) s(k)
(.,(k)) t ,~y)]-1 . Hence.
r] +
is defined as
B(k)
p(klq(k)
t • Vk
(.-;(1.:))
= O. L ....
using Lemma4.:3.1 we obtain (..J:.20) in a straight-
forward wav.
0
The result is a n'ry simple extension to the limited memory compact representation of Broydf'Il'S Ilpdate.
compact
representation
For the purpose of applying the preconditioner.
of Broyden's
update is the most convenient.
to he aware t hilt. t he formula is used after a Richardson
'iurcessfully
obtained.
Applying
iteration
the inverse
We only have
solution has been
the Sherman-\Iorrison-\Voodbury
formula we call
('asi ly obtai n the f'xpression
.\,(1)
=
.v(/)
~(l)
"-
()(I)
an(.I
\ote
~\'(l)' IS
that
= (
F('2) _
+ (.~,(/))'(.\/(l)lrl
_
('(
.~ 1)11)
...
F('!)
F{l).
(eI fi neeI·'SlIl1I '1ar I y to ,\'(1..) as
.\'(1) -
u'(I)r
S(l)
is a I
;<
~\
_
.III
..
F(3).
u,(11r
()(I) -
. ~,(k)
...
...
)
..
• Flk+I)
L emma 't...
I '3 1 Wi
V
,;,(1).
=
_
(k+1)
..!
F(k)
).
. h k.
WIt
I strict upper diagonal matrix .
=
1. '3
.. ,)....
4.3.2
Since
Computational
.S'(k).
putation
Q(k)
complexity
Q(l)
.. V(k) .. ~(I).
and Sf(l) increase in one column at the time. some com-
and bookkeeping can be performed prior to the application
of the Jacobian
and its preconditio-ner in both the Richardson iteration and GMRES. In addition.
note that there are some common operations
Starting
we
between (4.20) and (4.21).
the analysis with .Jacobian products,
we observe that the following oper-
ations ran be prepared beforehand:
• Formation of
(.v(k))
-I .
~ot.e that
(S(k-1lr
(p(k))-l
V(k-I)
o
s(k)
thus
(S(k1r' =
Therefore.
matrix
0(11
(.\'(k:"r'
for e\'ery incoming step ,./ we only require one dot product.
\'ector product
+ kn + k"!)
• Formation of
inal.Jacobian
and one backward
t\lat\'ec
indicates
impll'llwnt.at iOIl.
This yields roughly
floating point operations .
This only requires one matrix \'ector product (wit h the orig-
A(O))
and two :\XPY operations.
associated
+ mn) + 0 (\[atVec)
Ci \'f'n a \'ector
substitution.
one
Q(k).
the computation
o (n
(
It
with the orthogonal
projector
ha\'e
P. Overall. we have
t floating point operations.
E IRn• the product
the typical
[f /,:is even we additionally
A. (k)
cost of performing
It
can be carried out as follows:
a matrix-vector
multiplication
in a particular
l~(i
It is not hard to show t hat the complexity
o (\[at
Vec).
respect
Thus.
to /".
governed
this operation
~evertheless.
[n a similar
fashion.
has a contribution
for small
by the matrix-yector
of this operation
that
of k, the
values
0 (1m
+ /...2)+
grows quadratically
whole operation
with the initial .Jacobian
we can determined
is given by
wit h
should
1)('
approximation.
the complexity
associated
to the prerOI\-
ditioner.
• Formation
right
of
(.y(l))
to left. we obs~rve
one .-\X PY operation.
llpdate
the mat.rix
opplication
with
We need to form
-1.
operat.ions.
of this partial
of [(II
==
(.\f(0)
.5(1) -
here is one AXP\'.
of
implying
of 0 (nl)
result
r
to
.\'(1).
Therefore.
(Prec)
(S(l)r
of nonlinear
The
floating
S(/)
point
and the LC
point operations.
j-:
perforl1wd
0 (III floating point operations.
It
E IRn is clone as follo\\'s:
~ote
that
I i~ approximately
is characterized
half of the total
numlwr
iterations.
The analysis
the nonlinear
(Prec).
twice.
to
term of this expression
From there it is t'asy to H'rify that the overall cost of this operation
+ nl + il) + 0
il1\'oln.>s
in ordt'r
the only operation
The act ion of the preconclit ioner onto a given vector
by 0 (n
Q(I)
and the multiplication
+0
X(I) -
Q(l). The second
1
in
this contribution
0 (II + III + 13) more floating
of S(l) introduces
first. Going from
performed
onto one \'ector
the result gives rise to a total
ill t he computation
required
has been already
preconditioner
Q(I)
of a new column
so we do not need to count
The addition
factorization
the formation
This operat.ion
of the initial
(.5(/)) t of
• Formation
that
(S(/)r C\J(O))-t
shows that
method
the cost of Richardson
ach'ances
iteration
to the solution .. \s noted
and
G~IRES
increases
as
above. k (i.e .. the numher
of
~e\vton steps since the last actualJacobian
to t he problem size in practice.
recomputation)
should be small compared
However, this linear growth in the operation
may be an import.ant concern \...
·hen .Jacobian and preconditioner
COllnt
assembly costs are
overtaken by the cost. of these linear solvers.
Storage requirements are also a delicate matter for significant values of k. \'evert heless.
one of the main aC!\'antages of the Byrd. :'-Jocedal and Schnabel limited memory rOnlpact represent at ions is t hat they are well defined for any value of
Consequently,
S(k)
(or
~'(k)).
we can prefix a maximum value for 11:, say 7 or 8, and start replacing
the oldest columns in the each of the above defined operators.
This can be done.
without affecting the process considerably.
We finally remark that this implicit manner of performing low-rank updates sacrifices part of the parallelism.
duced in the matrix-vector
Specifically, some new synchronization
multiplication
points are intro-
and in the application of the preconditioner.
Ho\\'e\·er. on the other hand, the block structure
a good degree of coarse grain parallelism
of compact representations
t hat grows as more nonlinear
proceed. This is certainly a trade-off that deserves particular
attention.
contains
itera t ions
129
Chapter 5
Preconditioning Krylov methods for systems of
coupled nonlinear equations
5.1
Motivation
In this chapter we focus our attention
on two-stage procedures
in the literature as nested or inner-outer
We address their use as preconditioners
ing from the cell-centered
l"lement discretization
procedures; see e.g .. [3,4, 16,41. 57. 72, 11:2].
for the several large sparse linear systems aris-
finite difference or. equivalently,
(with an appropriate
sequent. :\ewton linearization
of the coupled algebraic system of nonlinear equations.
surprisingly,
are not frequent in the literature
trasting
lowest-order mixed finite
rule: see [141]) and the sub-
quadrature
These linear systems (i.e .. instances of ~ewton equations)
and indefinit.e. \ot
which are also known
specific preconditioners
are highly non-symmetric
for these type of problems
due in part to the complexity suggested by the con-
ph~"sical beha\'ior of the \'ariables invol\'ed:
pressures
(elliptic or parabolic
rom ponent) ilnd sat Ilrat ions (hyperbolic or convect ion-dominated
component.)
De:-;pite t he difficulty of these linear systems. there are certainly some "nice" propl,rties associated to the coefficient blocks that affect each type of variable. Cnde(' mild
conditions.
which are regularly
is irreducible and diagonally
met at a modest time step siz.e. each of these blocks
dominant.
\Ioreover.
the strict diagonal dominance
in
some of these blocks leads to the :\[-matrix property. These block algebraic properties
can be exploited so that better
system.
t
\Ioreover.
conditioning
can be achieved in the entire coupled
de\"ices leading to this desirable situation
he coupling of the discretized
also aid in \veakening
nonlinear partial differential equations
represented
by
the off-diagonal blocks. \Ve call these devices decoupling operators
a preprocessing
:itep to facilitate the effectiveness of two-stage preconditioners.
We remark
intermediate
t hOllgh.
that different solvers or preconditioning
to multi-stage
methods
\Ve rather center our attention
complement
The combinati\'e
based system.
as a possible inexact
order to strengthen
algorithms
that anse
block .Jacobi. block Gauss-Seidel and Schur
met hod relies primarily
[ll
In fact. the idea can be
on those two-stage
based. We include in our analysis a combinative
in [11] and later restated
can he used as
[~7]. \Ve do not pursue this idea further here.
natlll'ally in block type of preconditioning:
proposed
strategies
steps within these two-stage preconditioners.
generalized
and use them as
preconditioner
procedure
in [1:38. 1:391.
upon the solution of a reduced
its robustness
originally
pressure
we propose an additive
llluitiplicati\'c
extension of this combinative
preconditioner
concentration
(i.e .. density times saturation
of a given phase in our particular
residuals.
We also aim these preconditioners
two \\'('11 known I\rylov-subspace
[I
is \\'ort h mentioning
ill1plicilllt'SS
I iOIl rUl'llIlt!at
iterative
Seq uential solut ion methods
by means of operator
of
equations
or time-lagging
(i.e .. remove part of the flllly
role not only in the time discreti7.a-
flow and transport
can be regarded
splitting
case)
methods:' G~"IRES and BiCGST.-\B.
in time) ha\'e played an important
also in the solution of ~avier-Stokes
in terms of pressure and
at adding efficiency and robustness
that ideas to sequentialize
ion of multi-phase
and
in porous media simulation
gO\'erning fluid dynamics
as strategies
hilt
problems.
to decouple the system
some of the variables present in the
physical model. Along this trend, we have the \vell knO\vn [\IPES (L\Jplicit PressuresExplicit Saturations)
formulation in reservoir simulation (see. e.g., [.5]) and. for .\il\·jer-
Stokes problems. t he segregated
methods
in CFD [/5. 16]. Such strategies
.tainly be inspiring to generate preconditioners
can cer-
for coupled linear systems arising from
t he fully implicit scheme. This general idea motivates
our discussion here.
as follows. We begin § 2 with a presentation
This chapter is organized
equations
governing the multi-phase
discretization
structure
and the linearization
flow in porous media.
by the Newton method.
We then describe their
In § :3, we analyze !'ht-'
of the linear system to be solved at every Newton step.
discussion on two different decoupling operators
§ 4 focuses the
and their implications
in cluster-
§ 5 is devoted to discussing the
ing the eigenvalues of the original coupled system.
philosophy behind the family of two-stage procedures
ditioners that the author considers most appropriate
of the
and to describing those preconfor the type of modeling problem
addressed in this dissertation.
5.2
Description of the Problem
This research concentrates
\'v'hich constitute
on the analysis of the equations
the simplest way to realistically
port in porous underground
formations.
look at the two-phase model.
for black-oil simulation
model multi-phase
flow and trans-
To further simplify the presentation
Extensions
to multiple unknowns
we only
per grid block are
readilye\·ident.
5.2.1
Differential
The basic equations
Equations
for black-oil reservoir simulation
tions for oil. gas and water.
However. for simplicity.
wetting (i.e .. water) and a non-wetting
n. respecti\Oelyo .\ more thorough
[91]. The mass conservation
consist of conservation
we limit the presentation
(i.e .. oil) phase. denoted by subscripts
description
equato a
wand
of the model can be found in (0)] and
of each phase is given by
(:).1 )
(.).2 )
where PI is the density.
term
with denotes
the phase
where
0
the production/injection
permeability
the \'iscosity.
PI is the pressure.
ca.n be either
tV
the following
and non-wetting
• ('apillary
depend
TIll' model
also allows
for slight
t-'IHries. porosity.
The simulator
problellls
and
CI
\'iscosity
are gi\'en
and depth
general
boundary
is
III
is
The subscript
I
permeability.
and Z is the depth.
phase.
add up to one:
on both
location
compressibility
physical
depend
used in the experiments
from both the petroleum
it can specify
UI
These
equations
are
Sw
+ Sn =
1.
= PrJ - P" ..
permeabilit.ies
PUI
and
relations:
• Relati\'e
where
conditions.
krl is the relative
tensor.
sat.urations
P... (S'L')
prt'ssure:
at reservoir
or n for the non-wetting
extra
ql is the SOI.Il't't-'
t is time.
as
9 is the gra\'ity
for the wetting
through
• \Vetting
POlf.'"!P/.
rates
Darcy \'f'?locity which is expressed
K is the absolute
coupled
SI is the saturation.
is t he porosity.
constants
and saturation.
of both
.. ·\bsolute
au", . ii
i.e .. PI
permeability
=
(PI)
tensor
only upon location.
presented
in t his work can accommodatl'
and the environmental
conditions
phases.
engineering
disciplines
for
given by
+ VP'l:
= hl.L"
(0.:3 )
(·~.I )
1:~:3
\vhere a and v are spatially varying coefficients, ;'i is the outward. unit, normal vector
and hi is a spatially
varying function.
Initially. Pn and S'l' are specified.
to solve for an initial value of Sn.
:\ gravity equilibrium
On reservoir engineering,
conditions are of :'-!eumann type for both the saturation
condition
is then used
the typical boundary
and pressure unknowns.
The
resulting (possibly) rank deficient linear system is solved by choosing the bottom hole
pressure at a given reservoir location.)
Frequently, t.he primary unknowns in the preceding system of parabolic equations
are pressures and saturations
of one phase or two different phases (see the discussion
of [.S]about other possible formulations.)
are Pn and
respectively.
en. standing
All other variables can then be computed
parabolic-hyperbolic
unknowns in our simulator
for pressure and concentration
In the case of slight compressibility.
character.
pressure and one nonlinear
U>l].
The primary
of the non-wetting
phase.
explicitly based on these tvv·o.
it can be shown that the system is of mixed
with one nonlinear
convection-diffusion
In this model. there are weak nonlinearities
parabolic
equation
equation
in terms of
in terms of concentration
related to those \'ariahles that depend
upon pressures of one phase (e.g., densities) and their effect depends on the degree
of pressure change.
In contrast,
basically depend on saturations
strong nonlinearities
such as relative permeability
:'\ote t.hat a combined nonlinearity
depend upon pressure.
incompressibility
are present in \'ariables that
and capillary pressure.
effect is present for concentrations
The pressure equation degenerates
of both phases (i.e .. en
= Cw = 0).
since densities
into an elliptic equation for
On the other hand. the diffusive
term in the latter equation
vanishes in the absence of capillary pressure. giving rise
to a first order quasi-linear
hyperbolic equation.
1 :\ l
5.2.2
Discretization
,\'owadays.
resen'oir
disc·ussionsl.
[n between
discretizations
Howe\'f'r.
th(> fully implicit
altel'llatiw's
methods
resides
len>1. [f :'iewton
systems
in long term
method
t he
and
context
dislTt't ization
blocks
discretization
blocks.
U =
1/.1(')
and adaptive
robustness
drawback
system
implicit
among
these
of fully implicit
of equations
then several non-symmetric
at each time
and indefinite
linear
at each time step.
unknowns
and
The main
of a large nonlinear
of the two-phase
COl\l"t'ntration
[102]
offers the highest
simulation.
is employed
need to be soh'ed
in time.
[.5] and [91] for detailed
(see
semi-implicit
schemes
[6:3].
formulation
in tlw solut.ion
of discretization
formulations
these two extremes.
ha\'e been proposed
possible
Slln'
rely on a variety
from the L\[PES to the fully implicit
ranging
III
simulators
problem
being discussed
(degrees
of freedom)
n~locities
are approximated.
The components
jwt\\'t't't1 t\\'o grid elements
in this work. both pres-
occupy
on the edges
of the flow coefficients
are defined
the centers
of the
or faces of the
or mass mobilities.
/\1
as follows
T+l
T+I
.
\.,+1/2.Jk
\\'11l'("1'the sll(H'rscript
ilt'ralt'S
T
+
to a \'aiue at the (1/
. grid block location.
1 denotes
+
upstream
direction
of the flow to account
finite differences
.;c'\'pn point stencil
the
weighting
and
of t he model
(or. equi\·alently.
for pressures
!!;l'lleral to ~8 different
.
[\·i+I/1.)/.:.
i+I/2.).k
iT
+
l)-th
approximation
l)·tl1 time le\'el: the subscripts
The first fractional
through
Discretization
= p!k
_l
( j/I)
coefficients
of the \f'\\'tOIl
i.) and I..~ indicate
tht'
fact.or on the right hand side is approximated
t he permeability
for variations
equations
harmonically
in the
in grid block sizes.
(.).1 )-( .).2) is performed
by lowest-order
and concentrations
associat.ed
is weighted
mixed
by block-centered
finite elements)
of both phases.
with a given internal
obeying
a
thus giving rise in
grid point locat.ion.
1:~.i
This discretization
leads to a system of nonlinear algebraic equations given by
+ similar
terms for the y and z directions.
(.5 . .5 )
where ~.i·i+l/2
= (.ri+1 - .rd/'2.
similar way. ~.l/i+ln
and ~=i+l/2
i.e .. the cell midpoint
along the .r direction.
are defined. Higher degree of discretization
considered in the context of Cv[PES formulations
heterogeneities.
together with general boundary
has been
[127]. Dawson et ai. [40] consider a
19-point stencil in space within a fully implicit parallel reservoir simulator
. underground
In a
They use a full permeability
to handle
tensor implementation
condition specifications.
TIll-' t'xtra relations mentioned in the previous sllbsection and their corresponding
partial differentiation
the \ewton
with respect to the primary
linearization
Ilnknowns are used in obtaining
of the nonlinear conservation
ibility allows for sOllle simplifications.
equations.
Small compress-
without affecting the validity of t.he numerical
approximation.
The abo\'(:>procedure
follows the description
veloping a parallel hlack-oil simulator.
equations can be found in [.5] and [.51].
by \Vheeler and Smith [14:2] on de-
Further insights about discretization
of these
L:\6
5.2.3
The
Newton
and linear system formulation
fully implicit.
parabolic
formulation
equations
leads
for the numerical
solution
to solving the following
liS
of systems
nonlinear
of nonlinear
problem
for each time
step
F(u)=O.
F : IR"
where
conccntrations
large-scale
systems
arising
in the \"e\vton
methods
pract ice in resen'oi
the
I'
engineering
applications
L~8j bllt its plf<,ctin'ness
o problems.
and G\[RES
e.g ..
[7:3]
5.3
Lately.
general
and
typical
unknowns
problem
Consequently.
~ewton
ces
some others)
of current
inexact
solvers
[91]
of years (see
physical
conditions
\Iultigrid
methods
in
Newton
methods
are
recent
[,!1].
ORTHO~H;-.J" have been of common
and
interest.
and
to solve the lin(~ar
is relatively
ha\'e lost popularity
with
in pressures
sizes encountered
methods
for a general O\·en·iew.)
over time on account
common
in resen'oir
[12.
has been also im'estigated
has only been shown for moderate
rock heterogeneity
like BiCGST:\B.
as inner so!\"ers for inexact
Chebyshe\'
\"ewton
and 2iteral ions
methods
IS('('
Iherein.)
coupled linear system framework
description
in step 2.2 of .\lgorithm
system
The
for a number
in dealing
The algebraic
of the partitioned
analysis
SOR,
I\rylov-subspace
and rt'ferences
represents
of inexact
han:' been employed
\Ve now provide
arising
iteration.
engineering
lack of robustness
tL
mle Ollt the use of direct
theory
(and
vector
phase.
such as SIP.
four algorithms
of tlwir
the
sillllllation
Although
iteratin:'
Here.
•
of olle particular
resen"oir
preferred.
Tlwst'
Tl
lR
-+
linear
2.1.1 We identify
and establish
the de\'elopment
of the
systems
properties
some moderate
of the procedures
(i.e .. :'-Jewton equation)
associated
assumptions
on \vhich
with the blocks
to facilitate
the preconditioners
the
art'
tTi
based.
These assumptions
are not intended
to give a definitive characterization
of
real life simulation matrices but are met when the time step is short enough to ensure
conwrgence
e\'aluating
of the :\"e\vton method
itself and, therefore,
the latest advances in preconditioning
provide a framework
for
coupled linear systems in reservoir
engl neenng.
5.3.1
Structure of Resulting
Each linear system associated
Linear System
with the two phase model depicted in (.S.l)- (.j.2) can
be part.itioned in the following :2 x :2 block form
.lpp .Jpc
( .l.:p .lec
Jx-/¢:}
Each block
and
In
(f'L')'
.Ii.).
)
(
p ) C
-
(
In)
(.j.6 )
Iw
-
i. j = c. p is of size nb x nb. where nb. is the number of grid blocks
is the residual vector corresponding
to the non-wetting
(wetting)
phase
coefficients.
Each group of unknowns
is numbered
pressure unknowns are numbered
(nh) and the concentrations
The block
.Ipp.
in the non-\vetting
parabolic
from nb
+ 1 through
fashion:
phase pressures.
The block
.fpc
of a purely 1'1of the .Jacobian
similar to that of a discret.ized first-order hyperbolic
phase concentrations.
problem in the non-wetting
parabolic (convective-diffusi\'e)
the
2nb.
presslll't' coefficif'nts. has the structure
liptic problem in the the non-wetting
matrix has a structure
lexicografic
from one through the total number of grid blocks
are numbered
containing
in a sequential
problem
Jep has the coefficients of a convection-free
phase pressure and. finally_ lec represents
problem in the oil concentrations
a
.
The position of nonzero entries of a given .Jacobian matrix is sho\',;n in Figure
.j.l.
[n this particular
example. we can observe the effect of the upstream
weighting
0,........•
501-
.•••,•••,•••
100~
'.'.
....•
I
",
!
~.~
150~
.
'.
200~
.
.....
.......
250l
..........
300l·········.•
!
....
I
.'
".",
'.
350~
'~'"
,
I
"
'.
400~
""'"
.
.....
J
......
'.
".
",
"'"
"'
450'
•
"
.....
",
'.".
'.
100
Figure
5.1
200
:\[atrix structure
within the block .fpc:
300
nz = 5504
.
400
j
"" I
",
"~'"
500
of linear systems in the Newton iteration.
the moving front is one block behind giving the only nonzero
c<wlficients in the lowpr part of the block. However, the absent values in the upper
part
added po~iti\'ely to the main diagonal of that block.
iUP
5.3.2
An algebraic
analysis
of the coupled
The presence of slight compressibility
(further
matrix
ensllrPs im'ertibility
of the .Jacohian matrix
this issue is gi\'en in [;')].) In general. in system (.1.6). the
discussion about
block coefficients
Jacobian
and Jcp share
.f/>p' .fpc
ther physical insights and
p.
t
he following properties
l2] for mathematical
(see e.g .. [2..j] for fur-
definitions and related theoretical
results ):
• Diagonal dominance .
• Positi\'e diagonal
matrices).
and
entries
and negative off-diagonal
entries (i.e., they are Z-
• Irreducibility.
Strict diagonal dominance in all rows is only present in
pressibility and pore \'olume term rontribution
In COl)sequence. these blocks are nonsingular.
diagonal dominance for ~ome of the rows of
and
.fpc
.fep
as result of
COI11-
to the main diagonal of these blocks.
positive stable and M-matrices.
.fpp
Strict
can be achieved by the contribution
of hottom hole pressures specified as part of the boundary
conditions.
In this case.
this block is an irreducibly
In addition,
under small
diagonally
changes of formation \'olume factors:
can expect both blocks Jpp and
The concentration
t he other blocks.
.r
p
dominant
matrix.
and flow rates between adjacent grid blocks we
to be nearly symmetric.
coefficient block -Jee presents algebraic properties
It has a convection-diffusion
behavior characterized
similar to
by capillary
pressure derivative terms (the diffusive part) and wetting phase relative permeability
derivat.i\·e terms (the convective part). The diffusive part becomes dominant over the
roI1vecti\'e part when capillary pressure gradients are higher than relative permeability
gradients of the wetting phase. It is likely that this occurs at the beginning and end
of
lilt:'
~imulation
when the capillary
pressure curve tends to be steeper.
illtprll1cdiatt' time steps of the simulation.
the wetting-phase
pressure gradients
rt'lat i\'p permeability gradients with respect to wetting saturations
and affect negati\'e!y the magnitude
affecting negatively
and
are less pronounced
of the convective part. However, under the same
trend the capillary pressure derivati\'es
prominent
During
with respect to wetting saturations
are less
the amount of dispersivity.
Desirable diagonal dominance
in -J.:.: can be indeed achieved by shortening
time step. \Ve have observed that the conditioning
the
of this block has an immediate
Formation volume factors of each phase are defined as the ratio of the volume occupied by the
phase at reservoir conditions to the volume occupied at stock-tank or atmospheric conditions.
l
ll-l
III pln'sical
('ients
il~
terms.
the decoupling
if ("()Ij('('lltration
operator
derivatiw's
tends
to approximate
in the transmissibility
computa-
or e\'aluating
some transmissibilities
explicit ly.
U
We prefer the form D-1J O\'er .10-1 ~ince the latter
~Il bsect
of J. Other
implications
may spoil the inherent
of this choice will be discussed
in the
diagIlext
ion.
Tlw abon~ decoupling
('all a~sociate
smaller
or .-\BF operator
matrix
rows and columns
pressure
followed
unknown
to repeat
pt'rmut ation
this
an alternate
unknowns
in an interleaved
by the concentration
for P\'ery
grid block.
within
fashion
unknown
P
Let
representation.
PJ pi =
_
Ji.J
.
if
_
-
.11.1
.11.2
.12,1
.7.2,2
.7,11:0.1
JnU
1·I.lp"
be the matrix
representing
. .:
(
-'I> ; ,j
the coupling
follows for an invertihle
(./ pc ) i,J
)
•
f)
(.
is the '2 x :2 matrix
\
0 that
(JcJi.j
between
the mesh.
and to number
at the same
and define
./ =
[1. clearly
admits
blocks with individual
mealls to pt'rmllte
and
COt'Hi-
were neglected
t ion. Hf'nce. this is Ii kt>..t iIlle-lagging
onal dominance
pressure
unknowns.
that
We
This
en~ry
grid
block
performs
such
Hence. D-l is a block diagonal matrix whose blocks are the inverses of the .Jacobian
blocks associated
to a local problem at each grid block. That is.
-
D-I ==
I
.11.1
0
0
. 2.2
I
.JD ==
0
J-I
...
0
To follow the underlying
.
"
0
(.1.9)
0
-
1
.J;;b.nb
notation, let us define the alternate
decoupled system as
D-l.J = P D-I.J pt.
This idea appears rather natural.
the possibility of decoupling
In fact. Behie and Vinsome [11] comment about
more equations
with respect to pressure coefficients.
shall note below. that
in their combinative
method but only
They did not foresee the positive effect, as we
a full decollpling of the grid block has in conditioning
the
system.
The core of the combinati\'e
. "ystems.
In this situation.
appr,
ch is the effective solution of pressure-based
there is no need to go further in the decoupling process as
eXIHPsseclin (;).9). The coefficients introducing
the coupling with pressures are zeroed
out wit hin the grid block by Gaussian elimination
so that corresponding
coefficients
at Ilt'ighboring grid blocks are expected to become small. To be more precise. let
(IVp)
\Vp =
0
0
\\There
1
0
...
0
(\Vpb
0
0
...
( H/p)nb
I.
(5.10)
I ! Ii
and (-I = (1.0)/.
Therefore.
in each :2 x :2 diagonal
the coupling
fact. it readily
IFp is a block diagonal
the operator
block with respect
matrix
that
to the pressure
unknown.
( .Icc)
The
[n
follows that
o
Similarly.
remon's
we could define
Wp was introduced
operator
(rc' with the canonical
an operator
by "'allis
in his I~IPES
)
i.j
vector
two-stage
f2
=
(0. I)'.
preconditionf'r
[1:39].
The consecuti\'e
lVT>' of the alternate
counterpart.
representation
of the operator
IV,., is gi n~n by
J"p
.
== IFpJ
=
( :,.-ID,
0
o
If.'
) (
D".J" - DpJp D"J" - D"J" )
J.:p
I',bxnb
.lee
(.J.II)
( JIl,
t>p
pl'p
ep
Clearly.
J \l'p
ec
the lower blocks
suit ing pressul't'
[n order
block (i.e ..
are unmodified
say.
t
as well as the main diagonal
of the' ("('-
J}/~'P == .Ic'p and Jc~,~·.o== .lee.)
to !"l'duce the already
of coefficients.
is defined
JlI,
)
IX
decu'lpll'd
hose associated
--ystem to ,me ill\'oh'ing
to pressure
unknowns.
a particulilr
the operator
,.;('1
R~ E JR"'" ~,,'
by
if
i = k and
j = I
+ :2 (k
- 1) .
otherwise
for k = 1. 2 .....
we could
IIh. [n this particular
also define j = :2
concentration
coefficients.
+ :2 (k
lexicographic
alternate
- 1) for R~ in order
ordering
to obtain
of unknowns.
the corresponding
1.t7
Finally, we stress that this presentation
can be easily extended to more unknowns
sharing a given grid point (e.g., three phases and multi-component
5.4.2
Properties
of the partially
decoupled
systems.)
blocks
In generaL it is a difficult task (and in fact. an open problem in many related fields
[1.6. lL ·)8. 9:3]) to characterize
properties associated with the entire coupled Jacobian
matrix and even more so if it has been affected by some of the operators
described
above. This is one of the reasons that theory concerning existence and applicability
different linear solvers or preconditioners
of
is based on some specific assumptions on the
matrix J. For the class of matrices that we obtain, there is not yet an easy-to-check
theoretical
result that determines
\vhen the symmetric
when a matrix
part of a matrix could have only positive eigenvalues although
the matrix has some blocks that are \l-matrices
In the applications
spectrum
is positive stable and moreover,
of iterative
of the operators
and present diagonal dominance.
solvers it is fundamental
on which they are applied.
to have an idea of the
Specially, one would like to
know if the eigenvalues are located on the right half of the complex plane to guarantee
theoretical com'ergence of the iterative method . .\lso important
is to detect a possible
clustering of the eigem'alues since this may increase the rate of convergence.
section. we briefly present two immediate
blocks of the already partially
In this
results related to the individual diagonal
decoupled .Jacobian matrix through
Consider the decoupled matrix with a block-partitioned
the action of D.
representation
as showed in
(.).6) .
Theorem
5.4.1
Let Jpp and -Jcc be diagonally
and let Jpc and Jcp be ~l-matrices
matrices.
in IRnbxnb.
irreducibly
then J~
Z-matrices
and Jc~ are ~v[-
I IS,
Proof
\Ve prove separately
dominant
they
matrices.
are
proceeds
J~ and Jc~ are Z-matrices
by [:3. Lemma
Then
\[-matrices.
that
\Ve only show
and strictly
6.:3. page 20-1] it immediately
J~ is an \I-matrix.
that
diagonally
follows that
The
proof
for J~
similarly.
First.
note that
<
(~-I)i.i
O.Vi = 1.2 .... l1b. In fact. (DpP)i.i,(DpJi.i
and (D~~ )i., is negative
are all positive
for
i
=
and (o':p),.,
nb. so that
1. 2 ....
Therefore
( J[?[J)' .
.
<.J
= (~-
rL ( o',J
I ).
1.1
1.1
since (JpP)i ..j ::; 0 and (f'P)i.j
For the strict
of the elements
diagonal
(f'xcept
(J p P ). ".J.
::; 0, Vi
dominance,
D pc );.. , (J cp ).
-
(
=1=
j, i. j = L. 2 ....
consider
I.J
.]
~
O. Vi
=1=
=
j. i. j
1.:2.. .. n b
I1b.
the summat.ion
over the absolute
the one lying on the main diagonal)
along
the i-th
value
row of
J~. then
L l(.Ipp ),J = L I(~-It, (D,.,..l
"b
"b
pl'
:=:
:==t
J:;t.1
j:;t.l
=
-
OP''!''p
)i.J
I
.
1(..\ )",1 [IWJ, it. 1(.1,·,I" I + I(Dp, )iil t.1'
-I
I]
J#l
J~l
<
J,p) ij
1(..\-I),J [1(Dccl,.,II(OpP),..!
+ I(Dpcl,.,II(Dcp),
.• I]
=1.
The
(~ote
inequality
is obtained
due
to the strict
inequality
that we can not affirm this with the exclusive
diagonal
dominance
are bounded
in Jpp.)
held in f.'\'t'ry row of
contribution
from the irreducibh'
\Ve can say t hen. that all entries of the transformed
by 1. which is the value that all entries
II'
have in the main diagonal.
blocks
0
149
_~E><'O-'
-4
o
)II
i :0.4
0.8
-3500
-3000
..
.-)114-.• (
........... -"".!
0.2
,
0.9
:~~~--- ---- --,
-44bOO
""........
iI
...
.
I
,
1
1.2
1.4
1.6
, .:
-2500
-2000
-11500
~
1.8
2
~ j
-'000
-500
0
500
Figure 5.4 Spectra of the partially decoupled forms of the sample
.Jacobian matrix. The on~a~ove corresponds to D-IJ, and the one below to
\tV J (or equivalently. \tv J.)
.\n inmecliate consequence
Corollary
Proof
5.4.1
of the above result is given by the following corollary.
The diagonal blocks J£ and Jc~ are positive stable.
This is just the result stated in [:3. Theorem
6.12, page 211].
o
In Figure .j...t we show the spectrum of the resulting Jacobian matrix after applying
t.he decollpling operators
has
!Jt't'll
D-l and W. [nterestingly
enough, the Jacobian
spectrum
significantly compressed and shifted to the right half of the complex plane by
the action of D-l.
In contrast.
t.he original structure
strategies
that intent to presen'e as much as possible
of the matrix perform very poorly as preconditioners
(see the
great resemblance between the spectrum of \tV J and J.) Several experiments
like this
one ha\'e indicated that the best strategy is to break as much as possible the coupling
between equations
(or unknowns)
of the individual blocks.
than trying to preserve some desirable properties
L.iO
....
)( 10-;)
or
21
-2
--
!
a
0.2
0.4
1 )( 10-3
_~t ..
:
n
-0.4
-0.3
0.6
.. t_
0.6
:
-'"
1-6
1.4
'-6
....
: :
::..: .. -___
- ~I~ 1-:
I:
1
:
~
-0.1
-0.2
, .2
1
I: -- ..- -- J.
I
.... ---
0
0.1
0.2
- U.;"
0.4
::J
0.6
...... ..
• ;:.. :-- .:
~
!!!.!!!,""
1
'o.g
'-,
0.03
.
.
---
j
1.2
1.3
1-4
1.5
Figure
5.5
Spectra of each block after decoupling
with D. From top to
hottom. they correspond
to the (1.1). (1.2). (2.1) and (2.2) blocks
5.5
Two-stage preconditioners
We Iwgin by gi\'ing a brief background
order
of t he following
presentation
obeys roughly
led to the formulat ion of the \'arious
detail
\\'alli<
ildditl\'E'
(\lId
t\\'o-~tilge preconditioner
11I1lItiplicati\'E' form.
(·()w.;i~till~ of thl' combination
Background
Efforts
to den~lop
Behie
equations
and Vinsome
preconclitioners
minor
change
format
Pilei
for the forthcoming
a chronology
preconditioners
ideas.
of how the ideas that
arose.
\Ve discuss
[1:39]. and a couple of extensions
thi~ section
with a more efficient
opprator
such as block .J acobi.
The
in
to it in
approach
D-I and the inexact solutioll
block Gauss
Seidel and Schur
based.
5.5.1
parabolic
\\"e
two-stage
of the decoupling
\)1' ~t a ndard block preconditioners
complt-'ment
as motivation
general
and
efficient
soh'ers
for systems
ha\-e started
to emerge
[11]
to be the first researchers
appear
as a form of c1ecotlpled
to the idea but seeking
strongly
elliptic
in the last few years.
preconditioners
to incorporate
of coupled
to consider
in reservoir
saturation
Howe\-er.
combinnfit'f:
engineering.
information
and
.-\
was later
1.11
proposed by Behie and Forsyth [10] and 'Wallis [138].
with the introduction
of the constrained
Wallis generalizes
residuals preconditioning
the idea
which allows the
freedom to choose those residuals to be driven exactly (or almost exactly) to zero. In
particular,
if this residual corresponds to pressure variables, the method coincides with
that proposed by Behie and Vinsome. However, Wallis later suggests that the iterat.ive
solution of reduced pressures systems are one of the most appealing
tackle larger reservoir simulation
concatenation
or combination
problems [139]. ~Ieanwhile. developments
of inexact preconditioning
for general symmetric and non-symmetric
of domain decomposition
approaches
to
on the
stages have been proposed
problems [138] but specially in the context
[13, 1.5,80] for flow in porous media. These works, hO\vever.
do not address the topic of specialized preconditioners
for coupled equations.
In CFD the idea of using decoupled matrix blocks for the construction
ditioners for iterative methods and for the implementation
of precon-
of solvers has been around
longer. Segregated algorithms have been successfully applied to solving Navier-Stokes
equations
alternate
(see e.g .. [76. 1:31] and references therein).
These methods
solution of pressures and velocities or on the exhaustive
them to get a good overall solution of the problem.
oped in sequential
formulations
solution of one of
Similar ideas have been den'l-
at the level of time discretization
of linear solvers or preconditioners
rely upon the
rather at the len,1
for fully implicit formulations.
The use of two-stage methods is not new (see e.g .. [LOa] and references cited there).
In fact, these methods are known under different names and are scattered
throughout
the literature.
[41, 57. 72].
They are also known as inner-outer
In the context of preconditioning
or inner-outer
preconditioners
parallel computing
or inexact iterations
they have been referred to as nonlinear,
[3, 4, 112]. They have been also subject
settings (e.g. see [27] and further references therein.)
variable
of study in
However. in
t he context of large-scale systems of coupled equations
t hey strangely seem to han'
been owr\ooked.
The renewed interest in using two-stage methods obeys primarily to the
tional cost associated with solving large inner linear systems.
KrylO\'-subspace
methods have also contributed
For example. the t"zawa algorithm
Recent dewlopments
in
to the renewed interest in t his area.
formalized the inexact version of this algorithm
In same fashion. intensive work has been recently devoted to extending cur-
rE'nt non-symmetric
iterat ive soh'ers to be able to accommodate
\'ariability of the preconditioner
from iteration
to iteration:
.\s said in Chapter 2. we use right preconditioning
step for the consecuti\'e
ordering of unknowns.
cheap and that its proper application
That is. if r =
(I'fl'
or
through this work. \Ve make an
D-l as a preprocessing
In view that this operator
is fixed.
introduces the desirable diagonal dominance of
tlw lIlain diagonal blocks of the coupled system (.).6).
sidf'. Ilencf'. t1H' application
the inexactness
e.g. [3.112.1:3:3. l:H) .
exceptIon to this rule when we include the decoupling operator
norms.
(1-
has been around for more than :3.5years and it was
only recently that some researchers
[16. :Yi].
COmp1\I
Wf'
consider its use on the If'ft
of D-' implies the use of weighted norms for all \'I'ctor
,)1 is a gi\'f~n residual (\\'hich concatenates
,.,...
hlJlh lite wetting and the non-wetting
residuals of
phases) whose norm needs to be complltl'l\.
t 11f' n
Clearly,
t his
mentation.
dot's not int roduce any major complication
or o\-erhead into the implf"
\Ioreovpr. this step can be also regarded as a scaling step for the couplcd
\'ariables of the nonlinear -function in a given ~ewton step. This incidentally improws
the robustness of the whole :'\ewton method.
\ewton
method can be seen in [22. -19].
Further discussion on scaling within the
5.5.2
Combinative
two-stage
preconditioner
Consider the two-stage preconditioner
."1 expressed as the solution of the precondi-
tioned residual equation ;\-Ipv = r. Also, denote
preconditioner
JWp ==
vVpJ.
Then the action of the
Alp is described by the following steps,
Algorithm
5.5.1
(Two-stage
Combinative)
1. Solve the reduced pressure system (R~Jwp Rp) p
note its solution by
=
R~ VVpT and de-
p.
2. Obtain expanded solution p = Rpp.
:3. Compute new residual
4. Precondition
r
= r - J p.
and correct v =
.\/-1,. + p.
The action of the whole preconditioner
can be compactly
written as
(5.12)
The preconditioner
ation.
.il
is to be preferably
computed
This means that :~l should be easily factored.
once for each Newton iterThe system
R~IFpr is solved iteratively giving rise to a nested procedure.
(R~.JwpRp) p =
We finally remark that
.lIp is an exact left inverse of .J on the subspace spanned by the columns of Rp. That
IS, (.\-/;11)
Rp
=
Rp•
This is the preconc\itioner as stated by Wallis [1:39]. In contrast to the combinati\'e
method of Behie and Vinsome [11], he proposes to solve the pressure system iteratively
and formalizes the form of the operators
the preconditioner
as two-step
stage combinative
preconditioner
accepted
terminology
IMPES
preconditioner,
(2SComb)
for convergent
vVp and Rp•
Although
we consider the term two-
more appropriate,
nested inexact
Wallis refers to
procedures
according to a more
and to the former
0_08
I
1
O.C6f
I
. .
:- .. .o:a .
002-
.
"tl
•
..
I
II I
'.
1
I
]
.1... -
I
I
J
-J.:JoI-
I
,B8~
oJO
Figure
df'~igllation
operator
~_7
employed
"'il h t hp lise of
i,dditiull
solution
tin'ly
and
[)-l
14
'_5
system.
by
.1.6 shows the spectrum
In this particular
of the
example . .\[
extensions
ing solllt ion to a n-'dll('('d concentration
\Ve propose
system
two different
In the following we present
residual
preconditiol1er
Figure
to the redllCt'd prf'ssure
and Olultiplicati\·ely.
two-stage
j
t.J
part of .J.
multiplicative
preconditioner.
the preconditioned
12
of the pressure
and illcorporat
to the ...olution
the previous
.
I_'
by B('hie and \'insome.
for an exact
Additive
J.3
5.6
Spectra of the .Jacobian right-preconditioned
the f'xact \'ersioll of the combinati\'e
operator.
was t a kpn to he the tridiagonal
5.5.3
08
i
t·
=
(2S.-\dd)
.\[~t~I" and
is gi\-en by
t'
=
Wf'
ill
can imprO\'e the qualit.\, of
ways to accomplish
both procedures
JI~~ltr.
systefll
The additive
this: mldi-
for computing;
combinati\'e
L,j.j
Algorithm 5.5.2
(Two-stage
1. Solve the reduced
Additive)
pressure system (R~JD Rp) p = R~D-l r and de-
p.
note its solution by
system (R~JD Rc)
2. Solve the reduced concentration
C
=
R~D-lr and
denote its solution by C.
:3. Obtain expanded solutions p = Rpp and c = Rcc.
4. Add both approximate
.J.
Compute new residual
r=
The nlultiplicati\'e
combinative
+ C.
Jy.
r -
and correct v = ;\:/-li-
6. Precondition
sequential treatment
solutions y = p
+y.
two-stage preconditioner
of the partially preconditioned
(2SMult)
residuals instead.
proposes the
In algorithmic
terms it is given by
Algorithm
5.5.3
(Two-stage
1. So1\'e the reduced
pressure system
note its sol u t ion by
.J
Obtain expanded
3. Construct
\lultiplicative)
denote its solution by
r
= r -
Jp.
system (R~JDRc) c
c.
solutions c = Rec.
6. Compute new residual w = r 7. Precondition
p = R~D-l r and de-
solutions µ = Rpp.
new residuals
Obtain expanded
p)
p.
4. Solve the reduced concentration
::J.
(R~.JD R
and correct u =
J( c + p).
.~I-IW
+ c + p.
=
R~D-lr and
~~
i
;--
·......--..
·· ~III-· , .....'"
.. .I t<""
··
-·11-
1.112
Figure 5.7
exact
As:,uming
that
both
introducing
the notation
for I = p. c:
t IIf> (\cl
'-3
14
1.5
Spect ra of the .J acobian right-preconditioned
H'rsion of the two-stage additive operator.
reduced
pressures
al1d concentrations
ion nf these preconditioners
by the
are solved exactly
can be characterized
and
by
(;').1~ I
and
(:).1.1.1
The dilTerencf> het\\'ppn
term
1~.7tp reslllting
·').;').:3. Prelimil1ary
sented
in
[82].
precol1ditioners
the two preconditioners
from the computation
computational
[n Figure
perform
;j. j"
of anew
experIences
and
in clustering
Figure
resides in the inclusion
with
residual
these
in step 6 of .-\lgorithm
preconditioners
-1.8 we can observe
the spectrum
around
of the cross
the task
the point
were prethat
thest'
(1. 0) of the
IY,
complex
major
plane.
Note that
the l11ultiplicati\'e
two-stage
d usteri ng of t he real part s -of the eigem'alues
('ven though
the resulting
preconditioned
system
preconditioner
around
producf>s thf>
uni ty among
has a negative
t hf> t h !'C'('.
eigenvalue.
~.06.
...
I
I
lI.
..
I
~06'
-0.2
Figure
5.5.4
In the
block
way that
form. \\.f>can express
Howe\·er.
coupled
operator
system.
the correcting
decouplil1g
the o\"erhead
),6
0.8
1.2'
4
1.6
In other
\·ia
additive
il1trorluced
have intprpretation
described
we pn'st-'Ilt them
a "good"
words.
-'I
preconditioners
operators
the preconditiollers
performs
step
its corresponding
},4
two-stage
in this opportunity
decoupling
,)2
5.8
Spectra of the .Jacobian right-preconditioned
by the
exact n~rsion of the two-stage multiplicative
operator.
Consecutive
:0;(1111('
I
J
in a ~impler
form.
the spectrum
we apply the block \·ersions
and multiplicative
block
abo\-e in cOllsecuti\-f> block
job in clustering
as it is depicted
in alternate
directly
in the combinative
extensions.
by this opf>ration is difficult
gin:'n
that
to compensate
tilt'
of the original
to
JD and omit
preconditioner
The reason
fOl'lll.
and
for this is that
for by its own
l·,)8
preconditioning
action
effecti\"eness.
At the end of this section we extend the analysis to reinforce this view. For the ease of presentation, we consider the factored form of the block-partitioned J^D,

J^D = \begin{pmatrix} I_{nb \times nb} & J_{pc}^D (J_{cc}^D)^{-1} \\ 0 & I_{nb \times nb} \end{pmatrix} \begin{pmatrix} S_D & 0 \\ 0 & J_{cc}^D \end{pmatrix} \begin{pmatrix} I_{nb \times nb} & 0 \\ (J_{cc}^D)^{-1} J_{cp}^D & I_{nb \times nb} \end{pmatrix},     (5.16)

so that

(J^D)^{-1} = \begin{pmatrix} I_{nb \times nb} & 0 \\ -(J_{cc}^D)^{-1} J_{cp}^D & I_{nb \times nb} \end{pmatrix} \begin{pmatrix} S_D^{-1} & 0 \\ 0 & (J_{cc}^D)^{-1} \end{pmatrix} \begin{pmatrix} I_{nb \times nb} & -J_{pc}^D (J_{cc}^D)^{-1} \\ 0 & I_{nb \times nb} \end{pmatrix},     (5.17)

where S_D = J_{pp}^D - J_{pc}^D (J_{cc}^D)^{-1} J_{cp}^D is the Schur complement of J^D with respect to J_{cc}^D. Therefore, if r^0 = (r_p^t, r_c^t)^t is a given residual, the inexact action of the partitioned blocks associated to (5.17) can be described as follows.

Algorithm 5.5.4 (Block solving)
1. Solve J_{cc}^D q = r_c and denote its solution by q̂.
2. Compute w = r_p - J_{pc}^D q̂.
3. Solve S_D p = w and denote its solution by p̂.
4. Compute y = r_c - J_{cp}^D p̂.
5. Solve J_{cc}^D c = y and denote its solution by ĉ.
6. Return (p̂, ĉ).
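A minimal dense sketch of Algorithm 5.5.4 (illustrative only, not the thesis code; NumPy direct solves stand in for the iterative inner solves of steps 1, 3 and 5):

    import numpy as np

    def block_solve(Jpp, Jpc, Jcp, Jcc, rp, rc):
        """Exact block solve of J^D (p, c)^t = (rp, rc)^t via (5.17)."""
        q = np.linalg.solve(Jcc, rc)                 # step 1
        w = rp - Jpc @ q                             # step 2
        S = Jpp - Jpc @ np.linalg.solve(Jcc, Jcp)    # Schur complement S_D
        p = np.linalg.solve(S, w)                    # step 3
        y = rc - Jcp @ p                             # step 4
        c = np.linalg.solve(Jcc, y)                  # step 5
        return p, c                                  # step 6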
If steps 1, 3 and 5 are solved iteratively instead of via a direct method, we obtain a two-stage preconditioner. Obviously, the convergence of the whole procedure depends heavily upon the convergence of each individual inner solve, and its efficiency is dictated by the way in which tolerances are chosen and satisfied for every new outer iteration. Clearly, a preconditioner like this is costly to implement in our context: it demands the solution of three different linear systems, with S_D probably dense. However, it is straightforward to devise approximations to (J^D)^{-1} by neglecting the steps for carrying out the action of the off-diagonal blocks. Discarding both J_{pc}^D and J_{cp}^D (i.e., assuming them to be zero matrices), we obtain the two-stage block Jacobi (2SBJ) preconditioner, whereas the two-stage block Gauss-Seidel (2SGS) results from neglecting only the block J_{pc}^D. This eliminates the first step of Algorithm 5.5.4 and reduces step 3, for both preconditioners, to the solution of J_{pp}^D p = r_p.

Figure 5.9 Spectra of the Jacobian right-preconditioned by the exact version of the two-stage block Jacobi operator.
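A sketch of the two reduced operators just described (again a dense, illustrative NumPy version under the same assumptions as before):

    import numpy as np

    def two_stage_block_jacobi(Jpp, Jcc, rp, rc):
        """2SBJ: both off-diagonal blocks of J^D are discarded."""
        p = np.linalg.solve(Jpp, rp)
        c = np.linalg.solve(Jcc, rc)
        return p, c

    def two_stage_gauss_seidel(Jpp, Jcp, Jcc, rp, rc):
        """2SGS: only J_pc^D is discarded; concentrations still see p."""
        p = np.linalg.solve(Jpp, rp)             # reduced step 3
        c = np.linalg.solve(Jcc, rc - Jcp @ p)   # steps 4-5 of Algorithm 5.5.4
        return p, c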
A more robust preconditioner can be obtained by means of a better approximation of the Schur complement, or a better approximation to (J_{cc}^D)^{-1}. In order to maintain reasonable computational costs, it is customary to provide a simple approximation under the assumption that the block subsystems are solved exactly. The spectra produced by these two-stage preconditioners are shown in Figures 5.9-5.11. Not surprisingly, they do a better job of clustering the eigenvalues around the point (1, 0) of the complex plane. In Figure 5.9 we can observe the significant clustering produced by 2SBJ preconditioning. The 2SGS block preconditioner does an even better job of clustering the eigenvalues, except for one that appears separated from the rest on the left half of the complex plane, as shown in Figure 5.10. There is also a certain resemblance between the spectrum of this preconditioner and that of Figure 5.8, where all blocks of the original matrix are involved. This fact illustrates the close relationship between the action of the multiplicative and the Gauss-Seidel two-stage preconditioners, which shall become more evident in the next section.
Strategies involving the Schur complement concept have been employed in several CFD problems (see e.g., [76]), ranging from incompressible Navier-Stokes formulations to flow in porous media applications. A classical example, inspired by the saddle point structure of the global discretized equations, is the Uzawa algorithm, which departs from solving separately the Schur complement equation with respect to velocities. Many variations are possible: segregated-type formulations work with one or some (but not all) of the primary unknowns for each nodal unknown, whereas in fully implicit formulations the global discretized equation is assembled for all the degrees of freedom and solved in its entirety. Among the several variants, we construct a two-stage preconditioner inspired by the discrete projection algorithms proposed by Turek [139] for the Navier-Stokes equations. The algorithm departs from an approximation to the Schur complement with respect to pressures given by the velocities (role represented by the concentrations in our case, given the hyperbolic character of that equation) and solves it iteratively via a Richardson iteration. That is, to obtain the preconditioned residual we perform the following steps.
Algorithm 5.5.5 (Two-stage Discrete Projection)
1. Set v_c = (J̃_{cc}^D)^{-1} r_c, where J̃_{cc}^D ≈ J_{cc}^D is computationally cheap.
2. Solve [J_{pp}^D - J_{pc}^D (J̃_{cc}^D)^{-1} J_{cp}^D] v_p = r_p - J_{pc}^D (J̃_{cc}^D)^{-1} r_c iteratively.
3. Obtain v_p.
4. Return (v_p, v_c), i.e., the preconditioned residual corresponding to (r_p, r_c).
The idea behind this preconditioner is to give a sharper solution to pressures which, akin to the IMPES philosophy, would resolve more accurately the concentration components. We propose this algorithm with some approximation to the concentration coefficients on the Schur complement with respect to pressures, rather than on the Schur complement with respect to concentrations. Throughout this work, we refer to this preconditioner as the discrete two-stage projection preconditioner (2SDP). The first step in Algorithm 5.5.5 is introduced to avoid an additional iteration to solve J_{cc}^D q = r_c, as suggested in step 1 of Algorithm 5.5.4. The operator (J̃_{cc}^D)^{-1} is chosen to be computationally cheap. Turek [139] suggests it to be the inverse of the diagonal matrix part of J_{cc}^D (i.e., a Jacobi preconditioner), since it is more closely related to the identity, which clearly reduces the cost in our case. The eigenvalue spectrum thus generated is shown in Figure 5.11.
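A sketch of the 2SDP operator (illustrative, not the thesis code; the Jacobi approximation of J_cc^D follows the Turek suggestion above, and a plain Richardson iteration with an assumed damping parameter omega stands in for the inexact inner solve of step 2):

    import numpy as np

    def two_stage_discrete_projection(Jpp, Jpc, Jcp, Jcc, rp, rc,
                                      n_rich=20, omega=0.5):
        """Return the preconditioned residual (v_p, v_c) for (rp, rc)."""
        d_inv = 1.0 / np.diag(Jcc)               # (J~_cc^D)^{-1}: Jacobi part
        vc = d_inv * rc                          # step 1
        S = Jpp - Jpc @ (d_inv[:, None] * Jcp)   # approximate Schur complement
        b = rp - Jpc @ vc
        vp = np.zeros_like(rp)
        for _ in range(n_rich):                  # step 2: Richardson iteration
            vp = vp + omega * (b - S @ vp)
        return vp, vc                            # steps 3-4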
Figure 5.10 Spectra of the Jacobian right-preconditioned by the exact version of the two-stage block Gauss-Seidel operator.
Figure 5.11 Spectra of the Jacobian right-preconditioned by the exact version of the two-stage block discrete projection operator.
5.5.5 Relation between alternate and consecutive two-stage preconditioners

If M indicates any of the preconditioners described above, it is desirable to predict in what way they help the convergence of the iterative method. Two conditions are important for the preconditioned Jacobian:

• Continuity or boundedness: ||I - J M^{-1}|| ≤ η, with η as small as possible; the better the preconditioner, the smaller η is expected to be.

• Positive definiteness: the symmetric part of J M^{-1} is positive definite (recall Theorem 2.1.1), which implies an error bound for GMRES with a contraction factor of the form (1 - η)²/(1 + η)².

In this sense, it is important to observe how the preconditioners developed here are related to the Jacobian matrix J. Unfortunately, a similar characterization is absent for BiCGSTAB. For the sake of simplicity, we assume that we are able to solve exactly (convergence up to machine precision) every inner reduced system at the first stage of each preconditioner. Evidently, there is then a relation between the additive and the block Jacobi two-stage preconditioners, and between the multiplicative and the Gauss-Seidel two-stage preconditioners: if the inner reduced systems are solved exactly for the two-stage additive preconditioner, consecutive block Jacobi preconditioning of J has a comparable effect, and similarly for the multiplicative and Gauss-Seidel pair.
These two properties form an important characterization of the way the preconditioners help the convergence of the iterative method. Although M seems to be a better global preconditioner, expected to eliminate the low error frequencies of the original Jacobian matrix, it still has to capture the remaining part of the spectrum from a linear problem whose block structure is less compact than the one obtained by the consecutive decoupling strategy (recall the spectrum plots for the alternate and consecutive counterparts in Figures 5.2-5.6). Of course, a more elaborated M may eventually provide the desired effectiveness, but at a significant cost. Not surprisingly, the decoupling operation does a good job preconditioning in concentrations: the use of the operator t_p + t_c takes advantage of the easier reduced concentration problem, which is additionally justified by the hyperbolic error propagation. Note that the omission of the decoupling operation leaves one with the difficult task of finding an efficient preconditioner for the original coupled problem. The last point can be made more precise by looking at the computation of

(I - J M^{-1}) [I - J (t_p + t_c)] = (I - J M^{-1}) [I - J^D (t_p + t_c) - (J - J^D)(t_p + t_c)].     (5.19)
By taking norms in (5.19) we obtain

||(I - J M^{-1}) [I - J (t_p + t_c)]|| ≤ γ (δ + α),     (5.20)

where γ = ||I - J M^{-1}||, δ = ||I - J^D (t_p + t_c)|| and α = ||(J - J^D)(t_p + t_c)||. Equation (5.20) shows the penalty α introduced into the final error estimate as a result of leaving the coupling terms out of the decoupled operator. The variable γ is introduced by preconditioning with the global operator M; for the overall error propagation to be smaller, decreasing this variable has to compensate the penalty factor α. It is unlikely that this penalty can be paid off under simple terms on sequential implementations and, within more relaxed bounds, it should be designed for vector and parallel implementations. Other acceptable forms of M could be a coarse representation of the original discretization, intended for enhancing, at lower computational cost, the information contained in the decoupled operator. For example, it can be used effectively for retrieving information lost in a line correction method, although, in general, reliable coarse meshes for hyperbolic problems are not easy to obtain and their use seems to be only justified in special cases. We believe that better results can be obtained by incorporating more information from the coupled problem into the decoupled blocks and improving the performance of each subsystem solution.
Finally, we remark that Bank et al. just recently arrived at similar observations in the context of microelectronic device simulation [60]. They experimentally observed that further preconditioning after the decoupling stage with the ABF transformation proposed in that work results in significant improvement compared to block preconditioning alone.

5.5.6 Efficient implementation
In this section we propose several strategies to enhance the computational efficiency of the two-stage preconditioners. In order to decrease the computational requirements of our preconditioners, we can use the old but still effective method of line correction. In particular, the line-correction method can be formulated in the algebraic setting of the decoupled Jacobian and carried out with moderate computation by an iterative method, playing a role analogous to the coarse-grid component of domain decomposition preconditioners.

We can certainly employ a variety of domain decomposition algorithms, both overlapping and non-overlapping, to precondition the Krylov-subspace method (see [37] for detailed convergence theory on the subject). In particular, overlapping methods, e.g., additive and multiplicative Schwarz with a block-type component preconditioner (e.g., block Jacobi), can be used to precondition the residual in the preconditioned iteration. Convergence results of this kind are applicable to non-symmetric systems arising from parabolic convection-diffusion problems formulated in terms of M-matrices which are diagonally dominant, making them very appealing for the 2-D and 3-D systems arising in this way. Additionally, these methods can be implemented in parallel. Very robust non-overlapping domain decomposition schemes can also be employed but, contrary to overlapping approaches, their success for non-symmetric and highly convective systems is uncertain.
Chapter 6
Computational experiments

This chapter encompasses the numerical experimentation of both the Krylov-secant methods and the preconditioners for coupled linear systems. The first two sections of the chapter are devoted to analyzing the performance of each one separately. The last section introduces ideas from these two approaches in a parallel black-oil reservoir simulator described by Wheeler and Smith in [142] and later improved by Dawson et al. in [10].

6.1 Evaluating Krylov-secant methods
In this section we present numerical experiments to illustrate the effectiveness of the four secant algorithms devised in this thesis, namely, the nonlinear KEN, the higher order Krylov-Newton (HOKN) algorithm and the HKS-B, HKS-EN and HKS-N algorithms. The discussion begins by reviewing the example cases shown throughout Chapter 3 and Chapter 4: the extended Rosenbrock's function, the extended Powell's singular function and two levels of difficulty of the Chandrasekhar H-equation. Two additional problems were also chosen for these tests. The first of them is a nonlinear steady-state model for the temperature distribution in reacting systems in two space dimensions, known as the (modified) Bratu problem; it is included here because it has been used repeatedly as a test bed for inexact Newton methods [20, 56, 108]. The second example involves a simplification of Richards' equation, which is used to model groundwater transport in the unsaturated zone. This time-dependent problem in two space dimensions serves as a window to observe the Krylov-secant algorithms in action for underground simulation applications. All Krylov-secant methods should benefit reservoir simulators in use by the petroleum and environmental industries, and we believe that this (or a similar) type of methods should prepare the ground for the forthcoming experimentation on a parallel two-phase reservoir simulator.
6.1.1 Preliminary examples

We present in this subsection the performance of the nine methods discussed in this thesis in their inexact versions: Newton's method, the composite Newton method, the HOKN algorithm, Broyden's method, the nonlinear Eirola-Nevanlinna algorithm, the nonlinear KEN algorithm and the HKS algorithms. More specifically, the Jacobian equation is solved by GMRES each time, in conjunction with the backtracking line-search globalization criteria described in Chapter 2. All numerical experiments in this section were run on Matlab v4.2c on a Sun SPARC 10 workstation. A tridiagonal approximation to the Jacobian was employed as preconditioner at every step to accelerate the rate of convergence of GMRES in each implementation, and the use of the forcing terms follows § 3. Figures 6.1, 6.2 and 6.3 show the computational work, in millions of accumulated floating point operations, for each one of the methods discussed in this section. All of them are presented according to their order of appearance; however, we rather have them categorized as Newton type (those that evaluate the Jacobian: Newton's method, the composite Newton method and the HOKN algorithm), secant type (those that provide an approximation to the Jacobian: Broyden's method, the nonlinear Eirola-Nevanlinna algorithm and the nonlinear KEN algorithm) and the last set of three based upon the hybrid Krylov-secant idea (the HKS-B, HKS-EN and HKS-N algorithms). In the next subsections, nonlinear convergence is shown in terms of relative residual norms against accumulated floating point operations.
Figure 6.1 Performance in millions of floating point operations of Newton's method (dash-dotted line), composite Newton's method (dashed line) and the HOKN algorithm (solid line) for solving the extended Rosenbrock's function, the extended Powell's function and two levels of difficulty of the Chandrasekhar H-equation (c = .9 and c = .999999).
Comparison of the set of Newton-like methods is given in Figure 6.1. The extended Rosenbrock function is definitely the most difficult case: it requires several backtracking steps before entering the region of rapid convergence. In this case, the clear winner is the composite Newton's method, which is incidentally the one that converges in the fewest number of nonlinear iterations; the HOKN algorithm spends about the same effort due to the poor steps performed by the latter. The reader can confirm the same trend on Figures 3.1 and 3.4. The extended Powell equation case reveals the great potential of the HOKN algorithm. In this case, the four consecutive Krylov-Broyden steps drive nonlinear residual norms much faster to zero than even the composite Newton's method, theoretically a q-cubic locally convergent method.
1-;-1
Ext.
10°
--
10-2
.....
-
'
-\
10-'&
;Z
_.
10
22
I
,,
..
<=>
~10-·
",\
'0
10
0
2
-.
-
2
ChandraS6khar
[c-.9]
;Z
10
_8
<=>
~10··
,
10
,
..
10-·
~
<=>
10--
,
0_5
, ,
, ,
,
-'0
-12
'-5
1
MFIOp
,
10-·
10
,
0
(c-_999999)
-.
2E
jo
-'0
_
10
6
10°
,
10-
a: 10
4
MFIOp
4
~
-'0
6
Chandrasekhar
10
10
4
MFIOp
10°
_.
a: 10
\
10·'
4
"10-
;Z
'.'!
a: 10
<=>
~10'·
-.
10°
,
~
E)(I. Pow611
Rosenbrock
10
2
0
2
MFIOp
3
Figure 6.2
Performal1ce in millions of floating point operations
of
Broyden's Illethod (dash-dotted
line). the nonlinear Eirola-Nevanlinna
alg;orithm (dashed line) and the 110nlinear KEN algorithm (solid line) for
solving the extended
Rosenbrock's
function, the extended Powell's function
and t\\'o levels of difficulty of the C'handrasekhar
H-equation,
Note that the composite Newton's method spends basically the same effort even though it requires two GMRES solutions per nonlinear iteration. The Chandrasekhar H-equation reflects the same trend seen before in terms of the nonlinear iteration count. In this particular case, increasing the nonlinearity of the problem favors the HOKN algorithm; this favorable circumstance underlines a certain robustness of the algorithm in handling harder situations, which comes to light also in Figures 6.2 and 6.3 when a Krylov-Broyden step is performed.

Figure 6.2 shows the performance for the secant-like group of methods compared with Newton type of approaches. Once again, these methods perform poorly in dealing with the extended Rosenbrock function; the nonlinearity played against this type of approaches. The Krylov-Broyden step is slightly better exploited in the nonlinear KEN algorithm, which explains in part the wasted effort displayed by Broyden's method and the NEN algorithm. The Krylov-Broyden update is very effective for small nonlinear tolerances and, therefore, for the easier cases, all of which converge with the extended Powell function. In the Chandrasekhar H-equation case (i.e., c = .9), the plateau is due to the difficulty of these methods in solving the associated linear systems. However, the saving obtained at small tolerances, with a clear step towards the solution in every nonlinear iteration, suggests the nonlinear KEN algorithm as the best choice: it performs better than the NEN algorithm and Broyden's method, both of which spend about the same amount of work in solving the associated linear systems. A similar behavior is obtained in the hardest case, with c = .999999.

Figure 6.3 Performance in millions of floating point operations of HKS-B (dash-dotted line), HKS-EN (dashed line) and HKS-N (solid line) for solving the extended Rosenbrock's function, the extended Powell's function and two levels of difficulty of the Chandrasekhar H-equation.
The last set of methods, depicted in Figure 6.3, are the extended algorithms, i.e., those alternatively using Richardson iteration steps to handle the linear systems. The failure of Broyden and Krylov-Broyden steps on the extended Rosenbrock function produces no clear distinction among the three algorithms. However, the use of a cheaper Richardson iteration explains the slight saving in millions of floating point operations in comparison to a Newton's method primarily based on GMRES. The interleaved action of Krylov-Broyden updates produces a stairway shape in the convergence rates of all methods for the extended Powell function. This indicates a loss of convergence rate each time such an update is executed. Despite this, the HKS-N algorithm is able to outperform Newton's method in almost 50% of the computational effort. A similar observation can be made of the HKS-B and HKS-EN algorithms with respect to Broyden's method and the NEN algorithm. However, the HKS-EN algorithm does not take advantage of the Krylov-Broyden step in the same way that the nonlinear KEN algorithm does. The success of the Richardson iteration at the first steps of HKS-B and HKS-EN still introduces additional savings with respect to the corresponding counterparts, Broyden's method and the NEN algorithm; the KEN algorithm, however, remains the most efficient among all. For the Chandrasekhar H-equation, this group of HKS methods performed modestly well. The reader can observe that the HKS-N algorithm is hardly more efficient than Newton's method in the easy case. Additionally, the HKS-EN is competitive with Broyden's method, especially in the hardest case. However, the performance of the HKS-B is disappointing due to an excessive number of iterations in solving the linear systems with both GMRES and the Richardson iteration.
6.1.2 The modified Bratu problem

The modified Bratu problem is given by

    ∇²u + α ∂u/∂x + λ e^u = f   in Ω,
    u = 0                        on ∂Ω.

This problem plays an important role in combustion modeling and semiconductor design and processing, and represents a simplified model for nonlinear diffusion phenomena. In the absence of the convection term, this operator is monotone and hence the equation always has a solution for λ ≤ 0. When λ > 0, there is a threshold value λ* for which the equation has no solution for λ > λ* and at least one solution for λ ≤ λ*. For more details, we refer the reader to [70, 92] and pointers therein.
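A small illustration of the discretized problem may help. The sketch below (not from the thesis) evaluates the nonlinear residual on a uniform interior grid using second-order centered differences for both the Laplacian and the convective term, with f = 0 and homogeneous Dirichlet conditions assumed for brevity; centered differencing of the convective term matches the "no upwinding" choice described below.

    import numpy as np

    def bratu_residual(U, alpha, lam, h):
        """F(U) = u_xx + u_yy + alpha*u_x + lam*exp(u) on an n-by-n
        interior grid with zero Dirichlet boundary values."""
        Up = np.pad(U, 1)                        # embed the boundary zeros
        lap = (Up[:-2, 1:-1] + Up[2:, 1:-1] +
               Up[1:-1, :-2] + Up[1:-1, 2:] - 4.0 * U) / h**2
        ux = (Up[2:, 1:-1] - Up[:-2, 1:-1]) / (2.0 * h)
        return lap + alpha * ux + lam * np.exp(U)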
We solve this problem in the unit square Ω with homogeneous Dirichlet boundary conditions; see, e.g., Glowinski, Keller and Reinhart in [92] for a detailed description. (The actual Bratu, or Gelfand, problem has α = 0.) In this work, the problem is discretized by a block-centered finite-difference scheme, and no upwinding was considered for the convective coefficient. The problem becomes harder as α and λ grow; in this particular situation, we consider the values of λ and α = 128 suggested in [108]. A block Jacobi preconditioner (with blocks of approximately equal size) was used for the linear systems generated by the Newton iteration, except where indicated in the tables. A Newton tolerance equal to 10⁻¹² was used for these experiments, and the linear solution tolerances were computed by means of equations (2.6) and (2.7). Table 6.1 shows the comparison of all nonlinear methods utilized in these tests for different problem sizes N, indicated on the first row of the table.
Table 6.1 Total number of linear iterations (LI) and nonlinear iterations (NI) for all methods discussed in this thesis applied to the modified Bratu problem. The quantities in parentheses indicate the number of Richardson iterations employed by the HKS algorithms.
The problem was discretized on evenly spaced meshes with 10, 20, 30, 40 and 50 grid blocks in each of the coordinate directions. The legible portion of the table follows.

    Method         N = 20      N = 30      N = 40
    Newton         46          70          98
    Comp. Newton   60          92          131
    HOKN           40          61          83
    HKS-N          27(18)      45(36)      64(52)
    Broyden        74          113         156
    NEN            84          147         178
    KEN            102         135         186
    HKS-B          48(28)      59(46)      80(66)
    HKS-EN         42(25)      62(38)      86(57)

[The NI rows and the LI columns for N = 10 and N = 50 are illegible in this transcription.]

A tridiagonal preconditioner was used to accelerate GMRES in all these runs. Several interesting points can be made on these results. All HKS algorithms employed Richardson iterations (shown in parentheses) for part of the linear work. The bottom line is that the Krylov-secant algorithms represent savings of about half the number of linear iterations employed by their counterparts, at the price of a small increment in the number of nonlinear iterations; conversely, the Krylov-secant algorithms reproduce well the convergence behavior governing each nonlinear method. The problem size affects the number of GMRES iterations in each method but, even adding up the Richardson and GMRES iterations employed by the HKS algorithms, we can still appreciate a reduction in the overall degree of linear work. This last observation is important because the Krylov-Broyden updates are performed on the Hessenberg matrix, i.e., an operator of much smaller dimension than that
of the Jacobian. Therefore, the HKS methods promise to approximate well the convergence quality of Broyden's method, with the added savings in floating point operations stemming from the fact that updates are performed on a matrix of considerably lower order.

On the other hand, we can observe the savings delivered by the HOKN algorithm compared with the composite Newton's method. As has been observed before, it takes fewer nonlinear steps than the composite Newton's method, although the norm of the final nonlinear residual delivered by the composite Newton's method was slightly smaller. It is important to remark that the number of nonlinear iterations may be deceiving in some situations: more GMRES iterations do not necessarily imply proportionally more computational work, since the number of floating point operations taken in a particular GMRES cycle grows quadratically with the number of iterations [49]. Usually, the number of GMRES iterations is higher as the nonlinear solution is approached, due to the tightening of linear tolerances (i.e., the decrease of η_k) prescribed by the Eisenstat and Walker criteria for the convergence analysis; in other words, more accumulated GMRES iterations are taken as the nonlinear solution is approached. Finally, it is important to remark that the nonlinear residual norms delivered by the KEN algorithm are also smaller, by several orders of magnitude, than those produced by Broyden's method; this fact shall be important to take into account below (see e.g., Figure 6.4). The quadratic growth of the number of floating point operations in GMRES becomes more pronounced as the problem size increases. This implies that the savings in operations relative to the HOKN algorithm also grow quadratically, even though the table shows almost the same number of linear iterations among all methods.
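To make the quadratic-growth remark concrete, here is a rough cost model (our own back-of-envelope sketch, not a formula from the thesis): m GMRES iterations on a system with nnz nonzeros in the matrix cost one matrix-vector product per iteration plus orthogonalization against all previous basis vectors, so the orthogonalization term grows like m²n.

    def gmres_flops(m, n, nnz):
        """Rough flop count for m GMRES iterations on an n-unknown system."""
        matvecs = 2 * nnz * m                             # one mat-vec per iteration
        orthog = sum(4 * n * k for k in range(1, m + 1))  # Gram-Schmidt, ~2*n*m^2
        return matvecs + orthog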
Figure 6.4 Nonlinear iterations vs. relative nonlinear residual norms (RNRN) of Newton-like methods for the modified Bratu problem on a 40 x 40 grid.

Figures 6.4 and 6.5 show the number of nonlinear iterations taken for all methods to converge to the solution. As in the example cases, we can observe that the composite Newton's method appears as the best in terms of total number of nonlinear iterations, and that the HKS-N algorithm takes the highest number of nonlinear iterations among all. Note that the higher-order HOKN takes fewer nonlinear iterations than Newton's method but more than the composite Newton's method. In a similar fashion, the convergence curve described by the nonlinear KEN algorithm falls between that of Broyden's method and that of the NEN algorithm. The HKS-EN performs similarly to the nonlinear KEN algorithm, so that one may expect the use of the Richardson iteration to make the HKS-EN algorithm more efficient whenever Richardson succeeds at every attempt. The HKS-B mimics the behavior of Broyden's method, so this last observation applies as well.
Figure 6.5 Nonlinear iterations vs. relative nonlinear residual norms (RNRN) of secant-like methods for the modified Bratu problem on a 40 x 40 grid.
Figure 6.5 calibrates once more the quality of the Krylov-Broyden update compared to the well known Broyden update. The closeness of the curves between the HKS-B and the HKS-EN algorithms suggests that not much is lost when the updates are restricted to the current Krylov basis. Under the same light, we can explain the intermediate behavior of the nonlinear KEN algorithm between Broyden's method and the NEN algorithm: Broyden's method performs only Broyden updates, the NEN algorithm alternates Broyden and Krylov-Broyden updates, and the HKS-EN performs only Krylov-Broyden updates; all three share the feature of being secant versions of the same update idea. As before, measuring floating point operations instead of the number of nonlinear iterations provides more conclusive insights. Figures 6.6 and 6.7 illustrate the computational efficiency of the new methods.
Figure 6.6 Performance in millions of floating point operations of Newton's method, composite Newton's method, the HOKN algorithm and the HKS-N algorithm for solving the modified Bratu problem on a 40 x 40 grid.
Figure 6.6 shows how the HOKN algorithm outperforms Newton's method. The penalty introduced by solving two linear systems with GMRES per iteration spoils, for the composite Newton's method, the nice capabilities suggested in Figure 6.4; the HOKN provides higher convergence rates without incurring in such penalty. Although it may not be as effective as the composite Newton's method in driving the nonlinear residual norms down, it saves a sensible amount of computation by taking advantage of the underlying Krylov information. In this particular case, however, the quality of the Krylov-Broyden step deteriorates as the solution is approached, making Newton's method more efficient for nonlinear tolerances in the order of 1.0 x 10⁻⁷, which may be considered fairly small in most practical situations. The lack of success of the final Krylov-Broyden steps explains the poor results of the HKS-N algorithm: the Richardson iteration was always able to converge, but the nonlinear steps were not as good as those delivered by GMRES.
by G\IRES.
Mooilied
Sratu
PrOtllem
r
I - , Sroyden
1- -
NEN
'-KEN
II
. -
HKS S
-
HKS_EN
10
5
20
15
MFlop
25
30
Figure 6.7
Performance
in millions of floating point, operations
of
Broyden's method, the nonlinear Eirola-:\'e\'anlinna
algorithm,
the nonlinear
I\E\" algorithm.
the HKS-B algorithm and the HKS-E\" algorithm
for
soh'ing the modified Bratu problem on a -lQ x 10 grid.
Figure 6.7 shows a much closer resemblance among all secant methods. They were less effective than those methods evaluating the Jacobian, spending more computing work on the nonlinear step in more than 50% of the cases. This fact stems, firstly, from the increasing deterioration of the Krylov-Broyden update and, secondly, from the increasing difficulty of the linear systems (see Table 6.1 above). Broyden's method tends to become more efficient at small relative nonlinear residual norms, but the methods based on the Krylov-Broyden update yield the desired secant pay-off at every nonlinear step and provide a more consistent savings behavior.

In general, the contrasting picture between these two extremes shows how a rapid nonlinear convergence, i.e., converging to the relative nonlinear residual norms in the fewest nonlinear iterations, and the lowest accumulated computational cost do not necessarily coincide, and makes appropriate an analysis in terms of the computational cost of a computer implementation. From being the methods converging in the fewest nonlinear steps, the composite Newton's method and Broyden's method go to being almost the most expensive of all methods, whereas the HKS-EN and the nonlinear KEN algorithms maintain a balance between the lowest number of total linear iterations and the lowest accumulated computational effort. In this sense, the new family of Krylov-secant methods proposed in this dissertation sound as promising as the traditional Newton's method, composite Newton's method and Broyden's method, exceeding them in the properties that make them attractive for an efficient computer implementation.

To end the analysis of all methods on the modified Bratu problem, we present a discussion of the use of preconditioning and of how the preconditioner affects the convergence rates of the associated preconditioned linear systems; in Chapter 4 we devoted a theoretical discussion to the use of preconditioners with all the Krylov-secant methods, and we present the computational point of view here. We consider a family of standard preconditioners: tridiagonal, ILU(0) (i.e., incomplete LU factorization with no infill inside the matrix bandwidth) and block-Jacobi (i.e., diagonal blocks without scaling) with 4 and 8 blocks. Both the GMRES and the Richardson iterations (the latter in the HKS algorithms) were preconditioned in each case, although the theoretical convergence rate may not appear for all of them (see Table 6.2). Note that the tridiagonal preconditioner is quite poor in this case, due to the strong convective part that makes the inversion of the operator unstable; consequently, some eigenvalues of the preconditioned linear system lie on the left side of the complex plane and the preconditioned matrix inherits some indefiniteness.
Table 6.2 Summary of the total number of linear iterations shown with several preconditioners for all nonlinear methods. The quantities in parentheses indicate the number of Richardson iterations employed by the HKS methods. The problem considered is on a 40 x 40 grid.
    Method         Jacobi      Tridiagonal
    Newton         320         98
    Comp. Newton   407         131
    HOKN           359         83
    HKS-N          321(39)     64(52)
    Broyden        431         156
    NEN                        178
    KEN                        186
    HKS-B                      80(66)
    HKS-EN                     86(57)

[The ILU(0) and block-Jacobi columns, and the remaining Jacobi entries, are illegible in this transcription.]

The main point of Table 6.2 is to show the stability of the methods for different preconditioners. Recall that the Krylov-Broyden updates are applied to the preconditioned system solved by GMRES, and that there is no way to reflect (at least in terms of computational cost) the updated system in the fixed preconditioner as the nonlinear process advances. A large inconsistency between the updates and the fixed preconditioner (recall the discussion in §4.2.2) may result in failure to reduce ||F|| consistently according to the quality of the preconditioner. The table shows that there were no differences in the total number of nonlinear iterations of the methods (which are summarized in Table 6.1 for this problem size of 40 x 40), and that the preconditioned systems were solved effectively in every case.
6.1.3 Richards' equation

This example models the infiltration of water from the ground surface in a vertical cross-section of the near-surface underground, i.e., the region between the ground surface and the water table, the so-called vadose zone. This is a case of unsaturated flow. [The governing equation is illegible in this transcription], where ψ and ρ are the ground water capillary head and density, respectively. Different functional forms are often used to describe the dependence of both coefficients, dispersivity and hydraulic conductivity, on the subsurface water content. For this example, our choices are, respectively,

    D(c) [illegible in this transcription]   and   K(c) = K₀ e^c̄,   with   c̄ = (c - c₀)/(c_s - c₀),

where c₀ is the irreducible water content. The hydraulic conductivity at saturation, K₀, is a position dependent coefficient whose nonzero values have been chosen according to a prescribed pattern K₀(i, j) for 1 ≤ i, j ≤ 5. This choice of K₀, although contrived, represents a narrow channel of permeable rock where the moisture is allowed to move, of the kind sometimes found in underground formations. The hydraulic conductivity is proportional to the rock permeability, which has been shown to change over a few orders of magnitude over relatively short distances in underground formations. In our computational experiments, the value of the irreducible moisture content was c₀ = 0.15, which represents a typical value.

Figure 6.8 shows the solution over the two-dimensional domain for a mesh of 16 x 16 at the 1st and 1000th time steps of simulation. We see the effect of the heterogeneity in the resulting distribution of subsurface water content. A constant time step was used for these simulations, chosen small enough to allow the inexact Newton method to converge within 40 nonlinear iterations and given by

    Δt = Δh²/16.

This small time step was required in order to use the solution of the previous time step as an acceptable initial guess for the nonlinear iteration. Figures 6.9 show the dispersivity coefficient D(c) and the hydraulic conductivity K(c) for the same discretization mesh as in Figure 6.8, at the 1st and 1000th time steps, and Figure 6.10 shows the distribution of the transport coefficient. Both figures are intended to give the reader a feel for the combined effect of heterogeneity and nonlinearities of the model. The coefficients are shown scaled to vary within the interval (0, 5) as a result of scaling D(c) and K(c) by K₀,max; this scaling hides the effect of the spatial coordinates.
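The time-stepping strategy described above is simple enough to sketch (illustrative pseudo-setup, not the thesis code; newton_solve and residual are hypothetical stand-ins for any of the nonlinear solvers compared in this section and for the discretized Richards' residual):

    def simulate(c_initial, dh, n_steps, newton_solve, residual):
        dt = dh**2 / 16.0   # parabolic-scale step keeps Newton within ~40 iterations
        c = c_initial
        for _ in range(n_steps):
            # warm start: the previous solution is the nonlinear initial guess
            c = newton_solve(lambda u: residual(u, c, dt), x0=c)
        return c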
Figure 6.11 Performance in accumulated millions of floating point operations of Newton's method, composite Newton's method, the HOKN algorithm and the HKS-N algorithm for solving Richards' equation.
Figures 6.11 and 6.12 show the accumulated millions of floating point operations for all the nonlinear methods as the simulation progresses up to 100 time steps.
Figure 6.12 Performance in accumulated millions of floating point operations of Broyden's method, the nonlinear Eirola-Nevanlinna algorithm, the nonlinear KEN algorithm, the HKS-B algorithm and the HKS-EN algorithm for solving Richards' equation.
A discretization mesh of 32 x 32 was used, and no preconditioning was employed. The HOKN algorithm exhibits the lowest computational cost among the Newton-like methods; the curve clearly shows a significant saving in computational cost from start to end of this short simulation (see Figure 6.11). The increasing difficulty of the nonlinear problems as simulation advances produces a superlinear growth in the number of floating point operations. This growth is not only due to the complexity of the nonlinear problems but also to that of the linear problems. This is an example where the region of rapid convergence is far from the initial guess given at every time step, causing unexpected difficulties to Newton's method before reaching the solution. It can be observed in Figure 6.12 that secant methods produce more efficient steps toward the solution, making them preferable to Newton type of approaches.
The HOKN algorithm delivers between one and two successful Krylov-Broyden steps per nonlinear iteration for this problem case. Note, however, that the HKS-EN algorithm is more efficient than this algorithm during the first 30 time steps. Throughout the whole simulation, the HKS-EN algorithm turns out to be more efficient than Newton's method and the composite Newton's method owing to the substitution of GMRES iterations by Richardson iterations; in the absence of that beneficial secant step, however, it shows a cost of similar order to that of the other two nonlinear methods. Figure 6.12 shows again that the nonlinear KEN algorithm and the HKS-EN are close competitors, perhaps with a marginal advantage for the latter. In this case, the HKS-B performs badly as a result of a sequence of poor Krylov-Broyden steps that somehow are corrected in the HKS-EN algorithm. Also, there does not seem to be a clear winner between Broyden's method and the NEN algorithm (as it also occurs between Newton's method and the composite Newton's method), but both the nonlinear KEN and the HKS-EN algorithms perform better yet.

Table 6.3 summarizes the convergence of the previous plots. The table confirms the excessive work (in terms of nonlinear iterations) carried out by Newton's method compared to the HOKN algorithm and all secant methods. The composite Newton's method halves the number of nonlinear iterations of Newton's method, but both spend about the same total number of linear iterations. The figures for the HOKN algorithm perfectly justify what is observed in Figure 6.11: it reduces in half the number of nonlinear iterations taken by the composite Newton's method and, besides, it reduces by an almost 4-fold the total number of linear iterations with respect to this higher order method. Hence, the HOKN algorithm not only tackles the nonlinearities efficiently but also leads to much easier linear problems at the neighborhood of the solution.
Table 6.3 Total of nonlinear iterations (NI), GMRES iterations (GI) and (when applicable) Richardson iterations (Rich) for inexact versions of several nonlinear solvers. The problem size considered is of 16 x 16 gridblocks after 100 time steps of simulation.
    Method         NI      GI       Rich
    Newton         1627    11890    0
    Comp. Newton   835     12186    0
    HOKN           391     2673     0

[The remaining rows of Table 6.3 are illegible in this transcription.]

The NEN algorithm converges in fewer nonlinear iterations than Broyden's method, but its efficiency is marked by the high cost of the GMRES iterations. The table clearly shows that the nonlinear KEN algorithm takes an intermediate situation in the number of nonlinear iterations between Broyden's method and the NEN algorithm, but reduces the combined number of linear iterations of both. One of the other highlights of this table is the reduction in the number of GMRES iterations displayed by the HKS-N algorithm: the relative savings obtained in GMRES iterations appear to compensate the additional work induced by the extra nonlinear iterations and by relying on Richardson iterations. In this particular situation, the HKS-B algorithm exhibits more nonlinear iterations than Broyden's method, and the overwork of Richardson iterations did not alleviate the cost of merely relying on GMRES iterations, which accounts for roughly 12% of the total number of linear iterations. Conversely, the HKS-EN algorithm also halves the number of GMRES iterations, displaying a fewer number of linear iterations in exchange for a few more nonlinear ones. These results corroborate those of the last section in that the combined number of GMRES and Richardson iterations employed by the HKS algorithms is approximately half the number of GMRES iterations in Newton's and Broyden's methods.
6.2 Evaluating preconditioners for coupled systems

In this section we discuss the results of the numerical experiments shown in Tables 6.4 and 6.5, which were designed to test the ideas on preconditioners for coupled systems previously covered in Chapter 5. The matrices and right hand side vectors were generated by the two-phase black oil simulator ParSim according to the description given in Chapter 4. Our test model consists of one production and one injection well located at opposite corners of the reservoir in the areal sense. The permeability is uniform within the areal plane and 1.1 times higher in the vertical direction, and we use non-uniform grid spacing in the vertical discretization. The data for the tests was downloaded from the simulation after 1 time step and 3 Newton iterations within the current time level. We consider two different discretization sizes, 8 x 8 x 4 and 16 x 16 x 4, and we ran both cases with time steps Δt = 0.1, 1.0 days, testing all combinations of linear solver and preconditioner. The code including each of the schemes was written in FORTRAN 77, and all of the tests were run on a single node of an IBM SP1 (RS6000/370, 62.5 MHz clock); these nodes give a peak performance of 125 MFlops and have 128 MB of RAM.

The tests included runs made with both GMRES and BiCGSTAB, preconditioned with each of the schemes analyzed in this work and, additionally, with three preconditioners of common use in reservoir simulation (particularly the last two), i.e., tridiagonal, ILU(0) (i.e., incomplete LU factorization with no infill) and block Jacobi. Table 6.4 shows the results for all the preconditioners applied to GMRES, and Table 6.5 shows the corresponding results for BiCGSTAB preconditioned with each scheme.
The elapsed times are in general higher for BiCGSTAB. This owes to the fact that BiCGSTAB has two matrix-vector multiplies per iteration instead of the single one needed by GMRES. Additionally, as is well known, the convergence of BiCGSTAB is erratic, as can be appreciated in Figure 6.14.
Comparison between the results for the two time steps shows (with a few exceptions) a greater number of outer iterations for the longer time step for all of the two-stage preconditioners; the increased difficulty of the problem with a longer time step is reflected in all cases by the growth of the number of outer iterations. Notice that the 2SComb preconditioner, which only solves the pressure block and relies on its own global step to precondition the system, shows a greater number of outer iterations for Δt = 1.0 than the fully decoupling preconditioners, whereas its count is less for the shorter time step. We believe that these results show that full decoupling is more effective as a preconditioner. The key in interpreting these results is in the average number of inner iterations per unit outer iteration of each case: the number of inner iterations for the first four preconditioners (except for the 2SComb) is comparable for Δt = 0.1 and smaller for the longer time step, whereas the accumulated number of inner iterations grows with a longer time step for both iterative solvers, except for minor differences due to the particular convergence history of each case. To this point, the action of the full-decoupling preconditioners is tied to the weight of the off-diagonal blocks of the Jacobian. An increase in the time step size damages the diagonal dominance of both the pressure and the concentration main-diagonal blocks of the decoupled Jacobian, thus producing harder inner solves, along with a growth in the size of the linear system, as reflected by the results on both tables.
Table 6.5 Results for BiCGSTAB preconditioned by the nine schemes tested in this work. Nit: number of outer iterations; Ts: elapsed time in seconds for the solver iteration; Tp: elapsed time in seconds to form the preconditioner; Ni.a: average number of inner iterations per unit outer iteration. Preconditioners shown are, from top to bottom: tridiagonal (Tridiag.), incomplete LU factorization with no infill (ILU(0)), block Jacobi (BJ), two-stage Combinative (2SComb.), two-stage Additive (2SAdd.), two-stage Multiplicative (2SMult.), two-stage block Jacobi (2SBJ), two-stage Gauss-Seidel (2SGS) and two-stage Discrete Projection (2SDP).
    Problem size 4 x 8 x 8:
                   Δt = 0.1                    Δt = 1.0
    Precond.     Nit     Ts        Tp          Nit
    Tridiag.     127     3.42      0.26        176
    ILU(0)       239     57.90     0.37        >1000
    BJ           80      2.98      0.17        57
    2SComb.      106     113.74    0.50        170
    2SAdd.       24      88.74     1.00        68
    2SMult.      23      61.38     1.00        44
    2SBJ         11      24.97     0.02        41
    2SGS         10      11.91     0.02        17
    2SDP                 20.04     0.03        13

[The remaining entries of Table 6.5, including the Ni.a averages, the Δt = 1.0 timings and the 16 x 16 x 4 results, are illegible in this transcription.]
As for the question of efficiency, the consecutive and discrete projection preconditioners achieve, on average, friendlier times to converge the linear systems than the alternate ones, although the savings for the problem sizes presented here are only modest. As was mentioned above, the combinative preconditioner is not robust enough for these problems, even though it performs well at the shorter time step. In all experiments we used the identity as the approximation of the Schur complement in the consecutive preconditioners. The alternate family differs by the addition of the global preconditioned step given by M (this step is absent in the consecutive type), and the total elapsed times testify to the high overhead incurred in the application of the global decoupled preconditioner. Moreover, the 2SDP, which approximates J_{cc}^D in the construction of the Schur complement matrix with respect to pressures, performs as effectively as its equivalent among the alternate schemes, as anticipated by the theory of the preceding sections.

A final word is devoted to the comparison of the 2SDP with its closer competitor, the 2SGS. Note that the Schur complement matrix with respect to concentrations resulting from the decoupling process is not guaranteed to have the required properties, even when the individual pressure and concentration blocks are M-matrices. Somehow this appreciation can be misleading when looking at overall simulation results: the 2SDP appears, surprisingly, as the best preconditioner in inner iterations in every case, whereas the 2SGS achieves the best elapsed times. Both display similar robustness for problems arising in fully implicit black-oil simulation.
It should be mentioned that the properties we require from the individual matrix blocks (see § 5.3) are met for reasonable time step sizes, i.e., clearly our assumptions will not all be valid for Δt beyond a threshold value. The performance of the combinative preconditioner is expected to deteriorate as the time step size increases, because the pressure components (which it is based on) are no longer dominant. We also remarked that the time step increase damages the convergence of the iterative method but does not noticeably affect the convergence of the Newton method. In spite of these results, we believe there is still considerable room to increase the robustness of the two-stage preconditioners by means of the algebraic analysis of the Jacobian blocks.

Regarding the global preconditioner, M is a preconditioner for the full Jacobian, so it has to beat the action of a given two-stage preconditioner by itself. Here M was chosen as an incomplete factorization of J that retained the number of bands of nearest-neighbor coupling in the numbering scheme of the grid (notice that the grid cases have 4 grid blocks, or layers, in the z-direction, which results in a bandwidth of 19). These experiments show clearly that this is a losing proposition: its main effect seems to be an increase of the elapsed times Ts with no reduction in the wasted time of the iterative solver. Allowing complete infill inside the factorization throws us back to the beginning of the path, i.e., to choosing for J a preconditioner by itself, without the decoupled blocks. In view of these experiments, we believe the decoupling step can be applied without substantially damaging the application of any of the five new two-stage preconditioners proposed in this work.
Figure 6.13 summarizes the convergence behavior of GMRES for the discretization size of 8 x 8 x 4 and Δt = 0.1. On the upper left corner, the plot shows the results for the three standard preconditioners; the plot on the upper right shows the convergence of the three alternate two-stage preconditioners.
right shows the convergence
200
1..
\
I
en
E
0.8~ \
en
...
"
~ 0.6~.
enQ)
i,
~0.4
~en 0'6~l
\
~ 0.4 \
l
"" I
Q)
a:
o.a
E
I
\
0.2
Q):
a:
" ,-
\
\
0.2
,
\
o
o
50 100 150 200 250
Iter.
"
20
10
30
40
50
Iter.
0.1 r
I
I
en
o.oaf
en
E...
g 0.06
g 0.6
Ul
Q)
~
1\
Q)
I
~O.4t \
~ 0.04~\
a:
o.a
E...
:\,
,\
a:
O .02~"
i \~
i
j
O~
10
5
Iter.
15
0.2
i
\
\
1
"
I
j
Or--..
5
15
10
Iter.
Figure 6.13 Relative residual norms vs. iteration of GMRES for different preconditioners, organized in matrix form. Subplot (1,1): ILU (dot), Trid (dash), block Jacobi (solid). Subplot (1,2): two-stage combinative (dot), two-stage additive (dash), two-stage multiplicative (solid). Subplot (2,1): two-stage block Jacobi (dot), two-stage Gauss-Seidel (dash), two-stage discrete projection (solid). Subplot (2,2): block Jacobi (dot), two-stage multiplicative (dash), two-stage discrete projection (solid). Problem size: 4 x 8 x 8, Δt = 0.1.
Figure 6.14 Relative residual norms vs. iteration of BiCGSTAB for different preconditioners, organized in matrix form. Subplot (1,1): ILU (dot), Trid (dash), block Jacobi (solid). Subplot (1,2): two-stage combinative (dot), two-stage additive (dash), two-stage multiplicative (solid). Subplot (2,1): two-stage block Jacobi (dot), two-stage Gauss-Seidel (dash), two-stage discrete projection (solid). Subplot (2,2): block Jacobi (dot), two-stage multiplicative (dash), two-stage discrete projection (solid). Problem size: 4 x 8 x 8, Δt = 0.1.
conditions. This implies the manipulation of Jacobian matrices with 64 pressure and concentration coefficient arrays.
Figure 6.15: Relative permeability of both phases (LEFT) and capillary pressure function (RIGHT).
Table 6.6: Physical input reservoir data.

    Initial non-wetting phase pressure at 40 ft          300 psi
    Initial wetting saturation at 40 ft                  0.5
    Non-wetting phase density                            48 lb/ft^3
    Non-wetting phase compressibility                    1.2 x 10^-5 psi^-1
    Wetting phase compressibility                        3.2 x 10^-6 psi^-1
    Non-wetting phase viscosity                          1.6 cp
    Wetting phase viscosity                              0.23 cp
    Areal permeability                                   150 md
    Permeability along 1st and 2nd half of vertical      10 md and 30 md
Additionally, the data are decomposed in an areal sense (i.e., each processor holds the same original number of grid blocks along the depth direction). This is effective for two reasons: the vertical permeability (10 md and 30 md along the first and second halves of the vertical direction) is relatively much smaller than the areal permeability, so the phases flow mainly in the horizontal plane; and, as in most reservoir data, the number of grid blocks in the horizontal plane is much larger than in the vertical direction.

In our particular implementation, a 19-point stencil discretization of the linearized pressure equation and a 1-point stencil for the linearized concentration equation of each phase are used (this gives rise to the 64 coefficient arrays for the pressures, concentrations, densities and saturations of each grid block). The matrix-vector products in the GMRES solver (i.e., the product of the linearized system with a block of pressures and concentrations for the non-wetting and wetting phases) involve communication of each grid block with its four lateral and four corner neighbors (see [40] for further details). A tridiagonal preconditioner is used to accelerate the convergence rate of the inner GMRES. Table 6.6 summarizes the physical input parameters specified for this problem, and Figure 6.15 shows the associated relative permeability and capillary pressure functions used. The model consists of a water injection well (with bottomhole pressure specified) located at the coordinate (1,1) of the plane and a production well (with bottomhole pressure specified) at the opposite corner of the plane.

6.3.2 Considerations for implementing the HOKN algorithm with the 2SGS preconditioner
Before presenting the numerical results, it is important to establish some especial considerations arising from the joint implementation of the 2SGS preconditioner and the HOKN nonlinear solver. Since the application of the 2SGS preconditioner demands a previous decoupling of the linear system, the secant equation on which the Krylov-Broyden update is based has to be expressed in terms of the decoupled system. For a given kth nonlinear iteration, the decoupling operator (D^(k))^-1 (expressed as 2 x 2 blocks) acts upon the Jacobian matrix, and M^(k) represents the preconditioner matrix for the resulting decoupled matrix A^(k). Consequently, the value of the function at the new point needs to be decoupled before being projected onto the underlying Krylov subspace. Therefore, using the associated Arnoldi factorization and a GMRES solution as in (5.1), one determines that the secant update of the Hessenberg matrix at each iteration follows Broyden's update with the rank-one term restricted to the Krylov subspace generated for the decoupled system.
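The following sketch illustrates one plausible form of such an update, assuming the standard Broyden rank-one correction with both the step and the (decoupled) function difference expressed in the Arnoldi basis; the function and array names are illustrative, not the simulator's.

    import numpy as np

    def krylov_broyden_update(H, V, s, y):
        """Broyden rank-one secant correction of the (m+1) x m Hessenberg
        matrix H from an Arnoldi factorization A V[:, :m] = V H, with the
        correction restricted to the Krylov basis V (orthonormal columns).

        s : nonlinear step, y : difference of (decoupled) residuals.
        Assumes s lies (approximately) in the span of V[:, :m]."""
        m = H.shape[1]
        sigma = V[:, :m].T @ s        # the step expressed in the Krylov basis
        w = V.T @ y - H @ sigma       # projected secant residual V^T (y - B s)
        return H + np.outer(w, sigma) / (sigma @ sigma)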
Technically, the implementation is accomplished by carrying out the decoupling operation in place over all arrays holding the matrix coefficients of the entire linear system. Five additional arrays are employed to hold the main diagonal entries of each block of the original Jacobian and the vector entries of s, so that the original system can be restored before new operations take place. This allows maintaining the same standard Euclidean norm in the line-search backtracking strategy, in the forcing term selection and in the nonlinear stopping criteria.
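As a concrete illustration of this in-place decoupling, the sketch below left-multiplies every 2 x 2 grid-block coefficient quadruple and the residual by the inverse of the corresponding diagonal 2 x 2 block; the array names are hypothetical stand-ins for the simulator's coefficient arrays.

    import numpy as np

    def decouple_in_place(dpp, dpc, dcp, dcc, connections, rp, rc):
        """In-place decoupling of a 2 x 2 block system stored as coefficient
        arrays over grid blocks. dpp..dcc: diagonal-block entries;
        connections: list of (pp, pc, cp, cc) coefficient-array quadruples,
        one per stencil connection; rp, rc: residual arrays. All arrays
        are overwritten in place."""
        det = dpp * dcc - dpc * dcp           # per-grid-block determinants
        a, b = dcc / det, -dpc / det          # rows of each 2 x 2 inverse
        c, d = -dcp / det, dpp / det
        for pp, pc, cp, cc in connections:    # scale every stencil connection
            npp, npc = a * pp + b * cp, a * pc + b * cc
            ncp, ncc = c * pp + d * cp, c * pc + d * cc
            pp[:], pc[:], cp[:], cc[:] = npp, npc, ncp, ncc
        rp[:], rc[:] = a * rp + b * rc, c * rp + d * rc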
As explained in Chapter 4, there is no need to explicitly form the Jacobian matrix for the implementation of the Krylov-Broyden update: all operations can be done in terms of the updated Hessenberg matrix, the orthogonal Krylov basis V_{m+1} and the minimal residual y^(k) in R^m. Additionally, the preconditioner M^(k) is kept fixed after each Krylov-Broyden update, and the coefficient values are restored after all Krylov-Broyden steps in the HOKN algorithm have been completed. We remark that the HOKN algorithm can be easily changed to the standard inexact Newton method with a single flag inhibiting the computation of the Krylov-Broyden direction.

6.3.3 Numerical results
We compare the performance of GMRES (cf. Algorithm 2.2.1) and BiCGSTAB (cf. Algorithm 2.3.1) with the 2SComb and the 2SGS preconditioning for two different values of Δt, for a problem of modest difficulty. This is shown in Table 6.7. The table shows that both GMRES and BiCGSTAB perform similarly in simple problems (i.e., for Δt = .05). Notice that BiCGSTAB doubles the number of matrix-vector and preconditioner multiplications at each linear iteration but, on the other hand, employs almost half of the total iterations made by GMRES. This makes the cost associated to the matrix-vector multiplication and the application of any of the two-stage preconditioners comparable between these two linear solvers. However, as the simulation problem becomes harder (as the case Δt = .5 reveals), the effect of the Jacobian's deterioration is felt, and GMRES tends to outperform BiCGSTAB and to be more robust and efficient in more complex problems.
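A crude per-iteration cost model makes this trade-off concrete; the weights and iteration counts below are illustrative placeholders, not measured constants.

    def linear_solve_cost(iters, matvecs_per_iter, precs_per_iter,
                          c_matvec=1.0, c_prec=3.0):
        """Total cost ~ iterations x (matvec cost + preconditioner cost),
        with c_prec > c_matvec when the two-stage preconditioner dominates."""
        return iters * (matvecs_per_iter * c_matvec + precs_per_iter * c_prec)

    # Hypothetical comparison: GMRES (1 matvec, 1 prec per iteration)
    # vs BiCGSTAB (2 of each, but roughly half the iterations).
    gmres_cost = linear_solve_cost(100, 1, 1)
    bicgstab_cost = linear_solve_cost(55, 2, 2)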
Also remarkable is the performance of the 2SGS preconditioner in relation to the 2SComb preconditioner. For this particular problem, the 2SGS reduces by more than a 10-fold the total number of linear iterations and, since the number of nonlinear iterations is practically unchanged, it reduces the computer times of both linear solvers by almost 10 times. This result corroborates the observations made in the previous section for sample matrices extracted from this physical model (recall the discussion on the cost of both schemes).
Table 6.7: Summary of linear iterations (LI), nonlinear iterations (NI), number of backtracks (NB) and execution times of GMRES and Bi-CGSTAB with the use of the 2SComb and the 2SGS preconditioners. The simulation covers 20 time steps with Δt = .05 and Δt = .5 for a problem of size 8 x 24 x 24 gridblocks on a mesh of 4 x 4 nodes of the Intel Paragon. (*): Backtracking method failed after the 17th time step; (**): Δt was halved after the 16th time step.
    Linear solver/Prec.        Δt     LI    NI    NB    Time (Hrs.)
    GMRES/2SComb               .05
    GMRES/2SGS                 .05
    Bi-CGSTAB/2SComb           .05
    Bi-CGSTAB/2SGS             .05
    GMRES/2SComb               .5
    GMRES/2SGS                 .5
    Bi-CGSTAB/2SComb (*)       .5
    Bi-CGSTAB/2SGS (**)        .5
    [The individual cell values are garbled in this transcription; legible
    entries include LI counts from 45 to 107 for Δt = .05 and from 190 to
    above 1450 for Δt = .5, and times ranging from 0.07 to 8.10 hours.]

For Δt = .5, the 2SGS preconditioner forces a reduction of the time step due to the high changes of pressures and concentrations within the time step, and BiCGSTAB fails twice. In many reservoir simulation codes it is customary to regulate the time step according to the changes of pressures and saturations within the current time step, up to a maximum allowable change. This prevents loss of material balance or eventual failure of the nonlinear solution.
Shortening the time step increases the chances of convergence for the nonlinear method. The failure of BiCGSTAB with the 2SComb preconditioner is more serious, because the linear solver was unable to converge to an acceptable tolerance (0.1, in our case) at the maximum allowable number of steps, and the line-search could not provide a direction for decreasing ||F||. This was enough for the execution to fail before the simulation had undergone a high number of time steps.

Figures 6.16, 6.17, 6.18 and 6.19 summarize the issue of parallel scalability. For four different problem sizes, we compare the efficiency and timings of the HOKN/2SGS solver on both machines, the Intel Paragon and the IBM SP2. The reader can observe that the simulator scales better on the IBM SP2 than on the Intel Paragon, due to the low latency and high bandwidth of the former machine compared with the latter one. Note also that the larger the problem size, the larger the efficiencies obtained; for the smallest problem size the gains are practically exhausted at 12 processors. As expected, the efficiency for a given problem size is mainly governed by the communication overhead in both machines.
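For reference, the parallel efficiency quoted in these comparisons can be computed as below; taking the 4-processor run as the baseline is an assumption consistent with the smallest configuration the simulator supports.

    def speedup_and_efficiency(times, base_procs=4):
        """times: {processor count: wall-clock time}. Speedup and efficiency
        relative to the `base_procs` run: S(p) = T(base)/T(p) and
        E(p) = S(p) * base_procs / p, so E(base) = 1 by construction."""
        t_base = times[base_procs]
        speedup = {p: t_base / t for p, t in times.items()}
        efficiency = {p: s * base_procs / p for p, s in speedup.items()}
        return speedup, efficiency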
The major bulk of parallelism resides in the inner and outer GMRES iterations, most of whose operations (i.e., AXPY's and inner products) are parallelizable at the level of BLAS-1 operations. The block tridiagonal preconditioner used in the innermost GMRES is chosen so that its construction and application are parallelizable as well. In this regard, the classical Gram-Schmidt process is used in the computation of the Krylov basis in order to exploit the most parallelism without totally sacrificing orthogonality. (However, the implementation contemplates a further refinement of the modified Gram-Schmidt type, if required, to preserve stability requirements.)
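The sketch below contrasts the two orthogonalization variants just mentioned: classical Gram-Schmidt batches its inner products (one global reduction per new basis vector, friendlier to distributed memory), while modified Gram-Schmidt interleaves them for better numerical stability. A minimal NumPy illustration, not the simulator's code.

    import numpy as np

    def classical_gs_step(V, w):
        """Orthogonalize w against the columns of V with ONE batched
        projection: all inner products fuse into a single reduction."""
        h = V.T @ w                 # all coefficients at once
        return w - V @ h, h

    def modified_gs_step(V, w):
        """Orthogonalize column by column: more stable, but one reduction
        (synchronization point) per column of V."""
        h = np.empty(V.shape[1])
        for j in range(V.shape[1]):
            h[j] = V[:, j] @ w
            w = w - h[j] * V[:, j]
        return w, h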
The Krylov-Broyden step is a part of the code with a small degree of parallelism, since the Hessenberg matrix arising from the Arnoldi process ends up replicated in all processors and only the operations with the Krylov basis vectors are distributed; the updates of the Hessenberg matrix are thus performed redundantly on that basis. This is a consequence of partitioning and distributing the matrix-vector products in the HOKN method while keeping the small dense part replicated, in spite of the significant loss of fine grain parallelism this means.

The timing results shown in both figures are encouraging. The execution times decay more rapidly for increasing problem sizes on the SP2 than on the Paragon. This is explained by the fact that interprocessor communication on the Paragon is latency-bound: the much shorter time needed by the SP2 to set up a message of zero length makes the ratio of computation to communication scale more linearly with the problem size, whereas on the Paragon a high penalty is paid in long times to set up each message transfer. Keeping the part of the matrix held in each processor in roughly constant proportion, the Paragon may display a fairly constant efficiency over a wider range of problem sizes, but at the same time it loses a significant part of its performance to the message latencies. In both cases it is important to watch the number of Krylov-secant steps allowed in the HOKN algorithm, since if they do not imply major reductions of ||F||, a high penalty may be paid in terms of parallel efficiency.
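The latency argument is the usual linear communication model; a one-line sketch, with the machine constants as placeholders only.

    def message_time(nbytes, latency, bandwidth):
        """t(n) = alpha + n / beta: even a zero-length message pays the full
        startup latency alpha, which dominates for the short messages
        exchanged in the inner products."""
        return latency + nbytes / bandwidth

    # Hypothetical values only: a zero-length message still costs alpha.
    t0 = message_time(0, latency=100e-6, bandwidth=50e6)   # = 1e-4 s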
Having in mind the above discussion, it is important to add that the simulator was not designed to work with less than 4 processors, and this explains why timings are compared from this number of processors onwards. The log-log plots in Figures 6.18 and 6.19 show the deviation from ideal speedup (indicated with slope -1) that all problem size cases present on both the Paragon and the SP2. This complements the fact that, as more processors are added, the timings in the SP2 decay relatively faster than on the Paragon and that the larger problem sizes are less sensitive to degradation. In these experiments the efficiency ranges somehow from 50% to 100%. In theory, this margin is expected to be larger, but the author suspects that memory hierarchy effects may be deteriorating the performance of the SP2.
Figure 6.16: Speedup vs. number of processors for the two-phase problem using the HOKN/2SGS solver on an Intel Paragon after 20 time steps.
Figure 6.17: Speedup vs. number of processors for the two-phase problem using the HOKN/2SGS solver on an IBM SP2 after 20 time steps.
-4"12,,12
I
1- -
I
[
8><24.24
12><36><36
-
- 16><48><46
g t
- - - i
~I
'~l
_
1
! ~------------
----
-------d
~
fL
I
,
1
.
j
10'
Logl0 or N. orocessors
Figure 6.18: Log-log plot of the number of processors vs. execution time for the two-phase problem using the HOKN/2SGS solver on an Intel Paragon after 20 time steps.
Figure 6.20 illustrates the relatively strong impact that the Krylov-Broyden steps in the HOKN have in the simulation. For a moderate problem size, GMRES with 2SComb preconditioning takes many more accumulated linear iterations than with 2SGS preconditioning. Both HOKN/2SGS and Newton/2SGS solvers preserve a rapid nonlinear convergence, and the HOKN reduces the accumulated linear iterations slightly more than the Newton method; the more Krylov-Broyden steps succeed, the more this number is reduced. Figure 6.21 expresses the difference between the solvers in terms of computer time. The line correction method, which solves the pressure system and contributes to reducing the effort per GMRES iteration in the 2SComb preconditioner, was not introduced in the 2SGS in order to preserve the highest possible robustness; for this reason the difference in cost per iteration between the 2SComb and the 2SGS is less prominent than the iteration counts alone suggest.
Figure 6.19: Log-log plot of the number of processors vs. execution time for the two-phase problem using the HOKN/2SGS solver on an IBM SP2 after 20 time steps.
Figure 6.20: Number of accumulated GMRES iterations vs. relative nonlinear residual norms (NRNR) using the HOKN/2SGS, Newton/2SGS and Newton/2SComb solvers on 12 nodes of the IBM SP2 for a problem size of 16 x 48 x 48 at the third time step with Δt = .05 day.
Figure 6.21: CPU time (s) vs. relative nonlinear residual norms (NRNR) using the HOKN/2SGS, Newton/2SGS and Newton/2SComb solvers on 12 nodes of the IBM SP2 for a problem size of 16 x 48 x 48 at the third time step with Δt = .05 day.
The decoupling performed in the 2SGS has some difficulties at large time steps due to the lack of diagonal dominance of the pressure block matrix where there are relatively greater gradients of the wetting phase saturation compared to the pressure gradients and small capillary pressure coefficients (i.e., this situation does not happen for small time steps, where the system solved by GMRES is really easy to solve). This loss of diagonal dominance violates the conditions of Theorem 5.4.1, and then the decoupling is only partial. Since the line-search backtracking method allows handling poor initial guesses of the pressure to the nonlinear solution, it is preferred to reinforce the robustness of the nonlinear method in order to be able to take larger time steps. We remark that the approach still works fine in the 2SGS preconditioner where, contrarily to the 2SComb approach, more of the elliptic properties of pressures are retained.
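A quick way to monitor this effect is to measure the row-wise diagonal dominance of the (decoupled) pressure block; a sketch with a dense array, for illustration only.

    import numpy as np

    def dominance_ratio(A):
        """min_i |a_ii| / sum_{j != i} |a_ij|: values >= 1 mean strict row
        diagonal dominance; values well below 1 flag the loss of dominance
        (e.g., at large time steps) discussed above."""
        d = np.abs(np.diag(A))
        off = np.abs(A).sum(axis=1) - d
        return float((d / np.maximum(off, np.finfo(float).tiny)).min())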
Figure 6.22: Performance in accumulated GMRES iterations of the HOKN/2SGS and Newton/2SComb solvers after 100 time steps of simulation with Δt = .05 of a 16 x 48 x 48 problem size on 16 SP2 nodes.
Figure 6.23: Performance in accumulated CPU time of the HOKN/2SGS and Newton/2SComb solvers after 100 time steps of simulation with Δt = .05 of a 16 x 48 x 48 problem size on 16 SP2 nodes.
Figure 6.24: Performance in accumulated nonlinear iterations of the HOKN/2SGS, Newton/2SGS and Newton/2SComb solvers after 100 time steps of simulation with Δt = .05 of a 16 x 48 x 48 problem size on 16 SP2 nodes.
Despite this, the HOKN/2SGS solver spends a considerably smaller amount of GMRES iterations than the inexact Newton/2SComb solver for a moderately long simulation. Figure 6.22 shows that the savings in GMRES iterations of the new solver compared to the Newton/2SComb solver tend to increase as the simulation time gets longer and Δt is larger, reducing the accumulated iterations by almost a three-fold margin. This explains clearly the timings, since one GMRES iteration is more expensive with the 2SComb preconditioner than with the 2SGS preconditioner. Nevertheless, Figure 6.23 exhibits a fairer reality: the computer times of the simulations are reduced by more than a third with the new solver. As the previous analysis anticipated, the line correction was more effective with this problem for a particular smaller time step size; however, the HOKN/2SGS solver still outperforms the Newton/2SComb solver.
Figure 6.24 shows that not only linear iterations but also nonlinear iterations are reduced. This figure illustrates the effect of using only one Krylov-Broyden step per nonlinear iteration. Although the HOKN does not imply a noticeable speedup over the inexact Newton method (due mostly to its limited parallel capabilities and the ill-conditioning of the Jacobian matrix), its use is still advisable for achieving better material balance: in most cases, relative nonlinear residuals are driven closer to the solution than in those cases where the Krylov-Broyden step was disabled. We also believe that the HOKN effectiveness is attenuated by the decoupling operation, which acts as a left preconditioner of the system. This introduces further concerns in the approximation of the Krylov-Broyden update for simultaneous left and right preconditioning of the Jacobian matrix.
Table 6.8: Results for the HOKN/2SGS and Newton/2SComb solvers for different large problem sizes vs. different number of processors of the Intel Paragon. Execution figures are measured after 10 days of simulation with Δt = 1 day. CPU times (T) are measured in minutes; (E) indicates parallel efficiency. (*) Abnormal efficiency due to paging of the operating system.
    Problem size       Solver            T / E per number of processors
    16 x 48 x 48       HOKN/2SGS
                       Newton/2SComb
    30 x 100 x 100     HOKN/2SGS
                       Newton/2SComb
    50 x 100 x 100     HOKN/2SGS
                       Newton/2SComb
    [The individual cell values are garbled in this transcription; legible
    entries include times from 20.47 to 557.26 minutes and efficiencies from
    0.78 to 1.00, with one abnormal value of 1.57 (*).]
at large scale we run
and one million of unknowns (addint?;
on the Intel Paragon.
These problem sizes
are quite
challenging
homogeneous
here.
physical
situation
These results
The
so!\'er
unknown
specti\'ely.
cases
An a\'erage
this short
simulation.
:\ewtonj2SC'omb
preconditioner
that
sol\·er.
from
the deterioration
appretiate
that
HOI\:\j2SCS
timings
l.(i('l.
llleans
increase
more rapidly
On the other
\vith the ~SGS
of the HOK~/2SGS
solver
can be drawn
problem
sizes
of the table.
than
that the HOK:\/2SGS
\\"1"
for the
at the case of 16 x -t8
tht? \ewtonj2SComb
the
seconds.
the first column
looking
than
:-<.
;;01\"1-'('
solver going from lei,
can not be appreciated
·l~
:2,)
for the large ...,
machine.
sol\'er is determined
by its execution
times rela-
It is at least five times as fast as the :\ewton/2SComll
de\'ice.
o.;o!\·er is caused
G~IRES
-to
for increasing
we obserH'
of the parallel
t he line correction
This restart
hancl.
7::?000 unknowlIs).
solver.
it.eration
for the \"ewtonj2SComb
limitations
ifier.! for the outermost
rates.
along
alld
stp\>. rf'-
iteration
of robustness
method
Going
thousand
per time
linear
G~IRES
The aspect
this trend
ti\'e to the \ewtonj2SC'omb
of the
per time step was neeec\ec\ for
and efficiency
approach.
The efficiency of the HOI~:\/2SGS
:\e\\"ton;2SComb
every
r"nfortunately.
cases due to memory
501\'er without
that
robustness
L7:2 and ;).0:3 tinw~ faster than
and :l!i processors.
modeled
the difficulty
lO minutes
case takes approximately
employed.
approach.
increase
the six hundred
·S and
of the line correction
grid blocks (i,e,. more than
is
to further
70 linear it.erations
t.he largest
of processors
of permeabilities)
is. l,)-20 times less accumulated
both
even for the qua~l-
in Table 6.8,
to tlw \ewton/2S('omb
and number
changes
ill approximately
This
for solving
formulation
was able to sol\'e both
of roughly
The table also exhibits
compared
(i.e .. moderate
are compiled
HOK~/2SGS
one million
in a fully implicit
a ~t = L day was specified
:\evertheless.
problem,
to handle
The anomalous
by the high restart
in order
\'alue is reduced
to maintain
to 12 for GNIRES
paging
situation
for till'
\'alue (of -to) that has to be specacceptable
linear
com;ergenCt'
using the 2SGS preconditioner.
:2l !J
Although the deterioration of the line correction affects negatively the performance of GMRES with the 2SComb preconditioner, its use is still justified in these cases due to the significant savings of floating point operations introduced.
Table 6.9: CPU time measured in minutes for a million and six hundred thousand unknowns on 16 nodes of the SP2 for 10 days of simulation with Δt = 1 day.

    Solver            30 x 100 x 100    50 x 100 x 100
    HOKN/2SGS              50.49             78.24
    Newton/2SComb         156.26            435.75
The two largest cases were also executed on 16 nodes of the IBM SP2. Timings obtained on both machines show that the HOKN/2SGS and the Newton/2SComb solvers perform similarly on 36 nodes of the Intel Paragon and on 16 nodes of the IBM SP2. The relatively slight time reduction of the latter solver is due to the higher bandwidth and lower latency of the SP2 (in addition to the fact that a smaller communication overhead is incurred on a smaller number of processors). The reduction in execution time is about 1.25-1.5 fold on 16 nodes of the SP2 compared to 36 nodes of the Paragon. Note that a similar amount of degradation is observed in the line correction method as the problem size increases, which denotes once again that the HOKN/2SGS solver is more robust than the Newton/2SComb solver for solving large scale problems.
Chapter 7
Conclusions and further work
In this research we have proposed a novel way to solve coupled systems of nonlinear equations at a lower cost. To achieve this objective, we have concentrated the effort on propagating useful Krylov subspace information in two consecutive nonlinear steps of an inexact Newton or inexact quasi-Newton method. We have found that Krylov-Broyden updates (or Broyden updates restricted to the Krylov subspace) are a reasonable vehicle to propagate this information in the form of efficient steps toward the solution of the nonlinear problem.
Five algorithms were proposed to solve large scale nonlinear problems: a higher-order version of Newton's method (HOKN algorithm); a faster version of Broyden's method (nonlinear KEN algorithm), which appears as a more efficient variant of the recent nonlinear Eirola-Nevanlinna method (NEN); a hybrid Krylov-secant version of Newton's method (HKS-N); a hybrid Krylov-secant version of Broyden's method (HKS-B); and a hybrid Krylov-secant version of the nonlinear KEN (HKS-EN). The first two algorithms lead to the least squares solution of two or more minimal approximation problems (in a lower dimensional space) for every GMRES call. The last three sets of algorithms are rather characterized by the alternative use of Richardson iterations instead of the more expensive GMRES iterative linear solver in every nonlinear cycle. Among all, only the HKS-EN algorithm combines effectively these two approaches.
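For concreteness, the sketch below shows the kind of cheap Richardson sweep these hybrid algorithms substitute for a full GMRES call; how the relaxation parameters are obtained (and recycled across nonlinear steps) is the subject of the preceding chapters, so here they are simply given as input.

    import numpy as np

    def richardson(A, b, x0, omegas, tol=1e-8, maxit=200):
        """Cyclic Richardson iteration x <- x + omega_k (b - A x);
        `omegas` are precomputed relaxation parameters (illustrative)."""
        x = x0.copy()
        r = b - A @ x
        for k in range(maxit):
            if np.linalg.norm(r) <= tol * np.linalg.norm(b):
                break
            x += omegas[k % len(omegas)] * r
            r = b - A @ x
        return x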
We have observed that explicit knowledge of the Jacobian is not required for some of the implementations of the above algorithms. This introduces further savings in floating point operations. Additionally, the methods can effectively accommodate the use of any desired preconditioner, whose effect turns out to be hidden (but not trivially separable) after a given Krylov-Broyden update. In general, our Krylov-secant algorithms seem to adapt well to the efficient globalized inexact Newton methods proposed lately in the literature. Computational experiments have exhibited the potential of Krylov-secant algorithms in saving a large amount of operations. This feature makes them attractive for large scale implementations. We strongly encourage further experimentation in this direction. We have found that some aspects need to be further explored:

• Detailed analysis of Krylov-secant methods for solving optimization problems. The author believes that, in the arena of unconstrained optimization and nonlinear programming, quasi-Newton updates (BFGS, in particular) can be reinterpreted in terms of the HKS algorithms, and that new formulations of Krylov-secant methods may be derived from them. The eigenvalue redistributions produced after low-rank updates seem to be more manageable for nonlinear programming (e.g., optimization problems), where many results of eigenvalue interlacing theory can be applied.

• The viability of prolongating the life of useful relaxation parameters of Richardson iterations in inexact Newton methods; that is, to keep solving future linear systems with the current parameters until the strategy fails. This argument is heuristically sound but deserves formalization. Recent experiments in this direction are reported in [83] and briefly discussed in [29]. The analysis falls short where linear systems are non-symmetric, as they usually are in most problems arising in this dissertation: the theory for predicting the adequacy of the parameters in terms of Lanczos combinations was developed for symmetric problems (see [44]). This is a challenging point; however, it is possible that ideas based on the Arnoldi process encourage further interest in exploiting this approach.
• Future theory on Krylov-Broyden updates should be in order. In this dissertation we have just given a preliminary motivation around this idea in order to develop the algorithms. However, it is necessary to characterize their convergence and identify situations where the update may work badly or well. This will help to determine the scope of the Krylov-secant algorithms and come up with possible enhancements to them.
• Extend the ideas above to other linear iterative solvers. We have used GMRES as a framework to develop all Krylov-secant algorithms. However, the linear Eirola-Nevanlinna algorithm keeps track of search directions generated during the process which may also be re-utilized or propagated as we did here. The situation seems to be less clear in those algorithms whose functionality depends upon Petrov-Galerkin approximations. However, any positive advance in that direction capable of handling systems with multiple right hand sides or combining several low-rank updates may result in important extensions to the present work.
On the other hand, the effort of this work has been complemented with a careful analysis of the physical and corresponding algebraic properties of the linear systems associated to the Jacobian matrix (specifically arising in coupled multi-phase flow problems). This study leads to the conception of a new family of preconditioners which are basically inexact extensions (i.e., with blocks solved inexactly) of the frequently used block preconditioners, but with the peculiarity of relying on a strong but simple decoupling strategy of the entire system. It was established that the decoupling is a good preconditioner by itself and that this, combined with the block solution of the decoupled system, gives rise to an efficient preconditioner for the original linear system.
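A minimal sketch of the resulting two-stage application (here with a block Gauss-Seidel second stage, as in the 2SGS variant); `decouple` and `inner_solve` stand for the first-stage scaling and any inexact inner solver, and all names are illustrative.

    def two_stage_apply(decouple, App, Acp, Acc, r, inner_solve):
        """First stage: decouple the residual (scale by the inverse diagonal
        2 x 2 blocks). Second stage: one block Gauss-Seidel sweep over the
        decoupled 2 x 2 block structure. App, Acp, Acc are blocks of the
        decoupled matrix; inner_solve(A, b) solves each block inexactly
        (e.g., a few preconditioned GMRES steps)."""
        rp, rc = decouple(r)                    # decoupled residual halves
        xp = inner_solve(App, rp)               # pressure block solve
        xc = inner_solve(Acc, rc - Acp @ xp)    # concentration block, GS update
        return xp, xc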
Therefore, these two-stage preconditioners are easy to implement and can afford the use of several efficient iterations and the general theory already developed for the individual inner linear systems. In our particular case, the consideration of full decoupling translated the algebraic properties of the problem into two main diagonal blocks producing convection-diffusion linear systems amenable to efficient inner iterative solvers. The author believes that this simplicity and generality provide a satisfactory vehicle to extend these ideas to coupled systems of PDE's under especial enhancements, and that the decoupling information may lead to further interpretations. Although the ideas presented here could be fitted into several scenarios, the considerations have been developed in a way to concentrate on the physics behind the problem.
Our numerical results show that the new preconditioners proposed here outperform a few traditional approaches addressed in the literature for preconditioning of the entire system (block Jacobi or banded preconditioners and ILU(0)), as well as an inexact version of the combinative approach originally developed for coupled problems in reservoir engineering. We consider that the following issues on two-stage preconditioners need to be addressed in the future:
• Dynamic characterization of the tolerances controlling the inner component solvers of the preconditioner.
• Theoretical analysis and extension of the preconditioners to several unknowns per grid block. It is important to know what choice of primary variables and what properties can be exploited when solving large coupled systems of nonlinear equations. We have observed that some decoupled blocks are easier to solve than others, so this may determine the type of linear solvers to be used within the preconditioner.
In general, further computational experiences are promising but need to be evaluated in more stringent situations and at a larger scale. Among the target applications, we consider the typical nonlinearities of the reservoir model coming from three-phase, compositional and thermal formulations, where the coupling between the pressure equation and the equations for the saturations (or concentrations) is even more stringent, and where the flow driving force itself demands a full understanding of the underlying physical situation. The reservoir model is not only challenging but also extremely important in several industrial areas, which gives good reasons to investigate further the conception of efficient preconditioners for coupled linear systems of equations. Only a few experiences have been reported in this direction, and sufficiently general preconditioners are desirable in the varied application scenarios where large scale coupled systems arise.

In contrast to the two-stage preconditioners, the Krylov-secant algorithms studied here stand by themselves as general potential solvers for nonlinear systems of equations. Hence, further experiences are required to calibrate their strengths under more stringent situations. In summary, we consider that the topic definitely encourages more research, both on the preconditioning conception and on the Krylov-secant ideas displayed in this dissertation.
Bibliography
[1] J. AARDEN AND K. KARLSSON, Preconditioned CG-type methods for solving the coupled system of fundamental semiconductor equations, BIT, 29 (1989), pp. 916-937.

[2] M. ALLEN, G. BEHIE, AND J. TRANGENSTEIN, Multiphase flow in porous media, in Lecture Notes in Engineering, Springer-Verlag, Berlin, 1988.

[3] O. AXELSSON, Iterative Solution Methods, Cambridge University Press, 1994.

[4] O. AXELSSON AND P. VASSILEVSKI, A black box generalized conjugate gradient solver with inner iterations and variable-step preconditioning, SIAM J. Matrix Anal. Appl., 12 (1991), pp. 625-644.

[5] K. AZIZ AND A. SETTARI, Petroleum Reservoir Simulation, Applied Science Publisher, 1979.

[6] R. BANK, T. CHAN, W. COUGHRAN JR., AND K. SMITH, The alternate-block factorization procedure for systems of partial differential equations, BIT, 29 (1989), pp. 938-954.

[7] J. BARNES, An algorithm for solving nonlinear equations based on the secant method, Computer Journal, 8 (1965), pp. 66-67.

[8] R. BARRETT, M. BERRY, T. CHAN, J. DEMMEL, J. DONATO, J. DONGARRA, V. EIJKHOUT, R. POZO, C. ROMINE, AND H. VAN DER VORST, Templates for the solution of linear systems: building blocks for iterative methods, SIAM, Philadelphia, 1994.

[9] J. BEAR, Dynamics of Fluids in Porous Media, Dover Publications, Inc., 1972.

[10] G. BEHIE AND P. FORSYTH, Incomplete factorization methods for fully implicit simulation of enhanced oil recovery, SIAM J. Sci. Statist. Comput., 5 (1984), pp. 543-561.

[11] G. BEHIE AND P. VINSOME, Block iterative methods for fully implicit reservoir simulation, Soc. of Pet. Eng. J., 22 (1982), pp. 658-668.

[12] A. BERMAN AND R. PLEMMONS, Nonnegative Matrices in the Mathematical Sciences, Classics in Applied Mathematics, SIAM, 1994.

[13] R. BHOGESWARA AND J. E. KILLOUGH, Parallel linear solvers for fully implicit flow simulation in porous media on distributed memory parallel processors, Journal of Scientific Computing, 4 (1995), pp. 151-165.

[14] P. BJØRSTAD AND T. KÅRSTAD, Domain decomposition, parallel computing and petroleum engineering, in Domain-Based Parallelism and Problem Decomposition Methods in Computational Science and Engineering, D. Keyes, Y. Saad, and D. Truhlar, eds., SIAM, 1994, pp. 39-56.

[15] P. BJØRSTAD, W. COUGHRAN JR., AND E. GROSSE, Parallel domain decomposition applied to coupled transport equations, in Seventh International Conference on Domain Decomposition Methods in Scientific Computing, D. Keyes and J. Xu, eds., American Mathematical Society, 1993.

[16] J. BRAMBLE, J. PASCIAK, AND A. VASSILEV, Analysis of the inexact Uzawa algorithm for saddle point problems, in Copper Mountain Conference on Multigrid Methods, 1995.

[17] P. BROWN, A local convergence theory for combined inexact-Newton/finite-difference projection methods, SIAM J. Numer. Anal., 24 (1987), pp. 407-434.

[18] —, A theoretical comparison of the Arnoldi and GMRES algorithms, SIAM J. Sci. Statist. Comput., 12 (1991), pp. 58-78.

[19] P. BROWN AND A. HINDMARSH, Matrix-free methods for stiff systems of ODE's, SIAM J. Numer. Anal., 23 (1986), pp. 610-638.
[20] P. BROWN, A. HINDMARSH, AND L. PETZOLD, Using Krylov methods in the solution of large-scale differential-algebraic systems, SIAM J. Sci. Comput., 15 (1994), pp. 1467-1488.

[21] P. N. BROWN, Private Communication, 1995.

[22] P. N. BROWN AND Y. SAAD, Hybrid Krylov methods for nonlinear systems of equations, SIAM J. Sci. Statist. Comput., 11 (1990), pp. 450-481.

[23] P. N. BROWN AND Y. SAAD, Convergence theory of nonlinear Newton-Krylov algorithms, SIAM J. Optim., 4 (1994), pp. 297-330.

[24] C. BROYDEN, A class of methods for solving nonlinear simultaneous equations, Mathematics of Computation, 19 (1965), pp. 577-593.

[25] —, A new method for solving nonlinear simultaneous equations, Computer Journal, 12 (1969), pp. 94-99.

[26] —, The convergence of single-rank quasi-Newton methods, Mathematics of Computation, 24 (1970), pp. 365-382.
[45] J. E. DENNIS AND R. B. SCHNABEL, Numerical methods for unconstrained optimization and nonlinear equations, Prentice-Hall, Englewood Cliffs, New Jersey, 1983.

[46] J. E. DENNIS AND K. TURNER, Generalized conjugate directions, Linear Alg. and Appl., 88 (1987), pp. 187-209.

[47] J. E. DENNIS AND H. WALKER, Convergence theorems for least-change secant update methods, SIAM J. Numer. Anal., 18 (1981), pp. 949-987.

[48] —, Inaccuracy in quasi-Newton methods: local improvement theorems, in Mathematical Programming at Oberwolfach, Mathematical Programming Study 22, North-Holland, 1984.

[49] —, Least-change sparse secant update methods with inaccurate secant conditions, SIAM J. Numer. Anal., 22 (1985), pp. 760-778.

[50] P. DEUFLHARD, R. FREUND, AND A. WALTER, Fast secant methods for the iterative solution of large nonsymmetric linear systems, IMPACT of Computing in Science and Engineering, 2 (1990), pp. 244-276.

[51] T. EIROLA AND O. NEVANLINNA, Accelerating with rank-one updates, Linear Alg. and Appl., 121 (1989), pp. 511-520.

[52] R. EWING, The mathematics of reservoir simulation, SIAM, Philadelphia, 1983.

[53] S. EISENSTAT, H. ELMAN, AND M. SCHULTZ, Variational iterative methods for nonsymmetric systems of linear equations, SIAM J. Numer. Anal., 20 (1983), pp. 345-357.

[54] S. EISENSTAT AND T. STEIHAUG, Local analysis of inexact quasi-Newton methods, Tech. Rep. MASC TR 82-7, Dept. of Mathematical Sciences, Rice University, 1982.

[55] S. EISENSTAT AND H. WALKER, Globally convergent inexact Newton methods, SIAM J. Optimization, 4 (1994), pp. 393-422.

[56] —, Choosing the forcing terms in an inexact Newton method, SIAM J. Sci. Comput., 17 (1996), pp. 16-32.

[57] H. ELMAN AND G. GOLUB, Inexact and preconditioned Uzawa algorithms for saddle point problems, SIAM J. Numer. Anal., 31 (1994), pp. 1645-1661.

[58] L. ELSNER AND V. MEHRMANN, Convergence of block iterative methods for linear systems arising in the numerical solution of Euler equations, Numer. Math., 59 (1991), pp. 541-559.

[59] A. ERN, V. GIOVANGIGLI, D. KEYES, AND M. D. SMOOKE, Towards polyalgorithmic linear system solvers for nonlinear elliptic problems, SIAM J. Sci. Comput., 15 (1994), pp. 681-703.

[60] Q. FAN, P. FORSYTH, J. MCMACKEN, AND W. TANG, Performance issues for iterative solvers in device simulation, SIAM J. Sci. Statist. Comput., 17 (1996), pp. 100-117.

[61] C. FARHAT, L. CRIVELLI, AND F. ROUX, Extending substructure based iterative solvers to multiple load and repeated analyses, Tech. Rep. CU-CSSC-93-17, Center for Space Structures and Controls, College of Engineering, University of Colorado, Boulder, Colorado, July 1993.

[62] B. FISCHER AND R. FREUND, Chebyshev polynomials are not always optimal, J. Approx. Theory, 65 (1991), pp. 261-272.

[63] P. FORSYTH AND P. SAMMON, Practical considerations for adaptive implicit methods in reservoir simulation, J. of Comp. Physics, 62 (1986), pp. 265-281.

[64] R. FREUND, Quasi-kernel polynomials and their use in non-Hermitian matrix iterations, J. Computational and Applied Math., 43 (1992), pp. 135-158.

[65] R. FREUND, G. GOLUB, AND N. NACHTIGAL, Iterative solution of linear systems, in Acta Numerica, Cambridge University Press, New York, 1991, pp. 57-100.

[66] R. FREUND AND F. JARRE, A QMR-based interior-point algorithm for solving linear programs, tech. rep., AT&T Bell Laboratories, Murray Hill, N.J., 1995.

[67] D. GAY, Some convergence properties of Broyden's method, SIAM J. Numer. Anal., 16 (1979), pp. 623-630.

[68] D. GAY AND R. SCHNABEL, Solving systems of nonlinear equations by Broyden's method with projected updates, in Nonlinear Programming 3, O. Mangasarian, R. Meyer, and S. Robinson, eds., Academic Press, N.Y., 1978, pp. 245-281.

[69] P. GILL, W. MURRAY, D. PONCELEON, AND M. SAUNDERS, Preconditioners for indefinite systems arising in optimization, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 292-311.

[70] R. GLOWINSKI, H. KELLER, AND L. REINHART, Continuation-conjugate gradient methods for the least squares solution of nonlinear boundary value problems, SIAM J. Sci. Stat. Comput., 6 (1985), pp. 793-833.
[71] G. GOLUB AND C. VAN LOAN, Matrix Computations, John Hopkins University Press, 1989.

[72] G. GOLUB AND M. OVERTON, The convergence of inexact Chebyshev and Richardson iterative methods for solving linear systems, Numer. Math., 53 (1988), pp. 571-593.

[73] S. GOMEZ AND J. MORALES, Performance of the Chebyshev iterative method, GMRES and ORTHOMIN on a set of oil reservoir simulation problems, in Mathematics for Large Scale Computing, J. C. Diaz, ed., New York, Basel, 1989, pp. 265-295.

[74] W. HACKBUSCH, Iterative Solution of Large Sparse Systems of Equations, Applied Mathematical Science, Springer-Verlag, 1994.

[75] R. HANBY, D. SILVESTER, AND J. CHEW, A comparison of coupled and segregated iterative solution techniques for incompressible swirling flow, Tech. Rep. 216, University of Manchester, 1994.

[76] V. HAROUTUNIAN, M. ENGELMAN, AND I. HASBANI, Segregated finite element algorithms for the numerical solution of large-scale incompressible flow problems, Int. J. for Numer. Meth. in Fluids, 17 (1993), pp. 323-348.

[77] M. HEINKENSCHLOSS AND L. VICENTE, Analysis of inexact trust-region interior-point SQP algorithms, Tech. Rep. TR95-18, Dept. of Computational and Applied Mathematics, Rice University, 1995.

[78] M. HOLST, A robust and efficient numerical method for nonlinear protein modeling equations, 1994.

[79] C. KELLEY, Iterative methods for linear and nonlinear equations, in Frontiers in Applied Mathematics, SIAM, Philadelphia, 1995.

[80] J. KILLOUGH AND M. WHEELER, Parallel iterative linear equation solvers: An investigation of domain decomposition algorithms for reservoir simulation, in Ninth SPE Symposium on Reservoir Simulation, SPE paper no. 16021, San Antonio, Texas, 1987.

[81] K. JBILOU AND H. SADOK, Analysis of some vector extrapolation methods for solving systems of linear equations, Numer. Math., (1995), pp. 73-89.

[82] H. KLIE, M. RAME, AND M. WHEELER, Krylov-secant methods for solving systems of nonlinear equations, Tech. Rep. TR95-27, Dept. of Computational and Applied Mathematics, Rice University, 1995.

[83] H. KLIE, M. RAME, AND M. WHEELER, Two-stage preconditioners for inexact Newton methods in multi-phase reservoir simulation, Tech. Rep. CRPC-TR96-11, Center for Research on Parallel Computation, Rice University, 1996.

[84] —, Krylov-secant methods for multiphase flow simulation, Industrial Affiliates Meeting, Center for Research on Parallel Computation, Rice University, 1996.

[85] Y. KUZNETSOV, Some problems in the theory and applications of iterative methods, PhD thesis, Novosibirsk, Russia, 1969.

[86] T. MANTEUFFEL, The Tchebychev iteration for nonsymmetric linear systems, Numer. Math., 28 (1977), pp. 307-327.

[87] —, Adaptive procedure for estimating parameters for the nonsymmetric Tchebychev iteration, Numer. Math., 31 (1978), pp. 183-208.

[88] G. MARCHUK, Methods of Numerical Mathematics, Applications of Mathematics, Springer-Verlag, 1975.

[89] J. MARTINEZ, Theory of secant preconditioners, Mathematics of Computation, 60 (1993), pp. 699-718.
[90] —, SOR-secant methods, SIAM J. Numer. Anal., 31 (1994), pp. 217-226.

[91] C. MATTAX AND R. DALTON, Reservoir Simulation, vol. 13, SPE-Monograph Series, Richardson, TX, 1990.

[92] J. MORE, A collection of nonlinear model problems, in Lectures in Applied Mathematics, Vol. 26, E. Allgower and K. Georg, eds., American Mathematical Society, 1990, pp. 723-762.

[93] R. NABBEN, A new application for generalized M-matrices, in Numerical Linear Algebra: proceedings of the conference in Numerical Linear Algebra and Scientific Computing, L. Reichel, A. Ruttan, and R. Varga, eds., Walter de Gruyter, 1993.

[94] N. NACHTIGAL, L. REICHEL, AND L. TREFETHEN, A hybrid GMRES algorithm for nonsymmetric linear systems, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 796-825.

[95] S. NASH, Newton-type minimization via the Lanczos method, SIAM J. Num. Anal., 21 (1984), pp. 770-778.

[96] —, Preconditioning of truncated-Newton methods, SIAM J. Sci. Stat. Comput., 6 (1985), pp. 599-616.

[97] S. NASH AND J. NOCEDAL, A numerical study of the limited memory BFGS method and the truncated Newton method for large scale problems, SIAM J. Optim., 1 (1991), pp. 358-372.

[98] S. NASH AND A. SOFER, Linear and Nonlinear Programming, McGraw-Hill, 1996.

[99] O. NEVANLINNA, Convergence of Iterations for Linear Equations, Lectures in Mathematics, Birkhauser Verlag, Basel, 1993.

[100] N. NICHOLS, On the convergence of two-stage iterative processes for solving linear equations, SIAM J. Numer. Anal., 10 (1973), pp. 460-469.

[101] J. NOCEDAL, Theory of algorithms for unconstrained optimization, in Acta Numerica, Cambridge University Press, New York, 1991, pp. 199-242.

[102] J. NOLEN AND D. BERRY, Tests of the stability and time-step sensitivity of semi-implicit reservoir simulation techniques, Trans. SPE of AIME, 253 (1973), pp. 253-266.

[103] B. NOUR-OMID, B. PARLETT, AND R. TAYLOR, A Newton-Lanczos method for solution of nonlinear finite element equations, Computers and Structures, 16 (1983), pp. 241-252.
[104] D. O'LEARY, The block conjugate gradient algorithm and related methods, Linear Alg. and Appl., 29 (1980), pp. 293-322.

[105] J. ORTEGA AND W. RHEINBOLDT, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.

[106] D. OUELLETTE, Schur complements and statistics, Linear Alg. and Appl., 36 (1981), pp. 187-295.

[107] B. PARLETT, A new look at the Lanczos algorithm for solving symmetric systems of linear equations, Linear Alg. and Appl., 29 (1980), pp. 323-346.

[108] M. PERNICE, L. ZHOU, AND H. WALKER, Parallel solution of nonlinear partial differential equations using inexact Newton methods, Tech. Rep. TR-18-94, Utah Supercomputing Institute, 1994.

[109] L. REICHEL, The application of Leja points to Richardson iteration and polynomial preconditioning, Linear Alg. and Appl., 154-156 (1991), pp. 389-414.

[110] Y. SAAD, On the Lanczos method for solving symmetric linear systems with several right-hand sides, Mathematics of Computation, 48 (1987), pp. 651-662.

[111] —, An overview of Krylov subspace methods with applications to control problems, in Signal Processing, Scattering, Operator Theory and Numerical Methods, M. Kaashoek, J. Van Schuppen, and A. Ran, eds., Birkhauser, 1990, pp. 401-410.

[112] —, A flexible inner-outer preconditioned GMRES algorithm, SIAM J. Sci. Comput., 14 (1993), pp. 461-469.

[113] —, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, 1996.

[114] Y. SAAD AND M. SCHULTZ, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7 (1986), pp. 856-869.

[115] T. SAATY, Modern nonlinear equations, Dover, 1981.

[116] A. A. SAMARSKII AND E. NIKOLAEV, Numerical Methods for Grid Equations, vol. II: Iterative Methods, Birkhauser Verlag, 1989.
[117] P. SAYLOR AND D. SMOLARSKI, Implementation of an adaptive algorithm for Richardson's method, Linear Alg. and Appl., 154-156 (1991), pp. 615-646.

[118] V. SHAMANSKII, A modification of Newton's method, Ukran. Mat. Zh., 19 (1967), pp. 133-138. In Russian.

[119] H. SIMON AND A. YEREMIN, A new approach to construction of efficient iterative schemes for massively parallel algorithms: variable block CG and BiCG methods and variable block Arnoldi procedure, in Sixth SIAM Conf. in Parallel Processing for Scientific Computing, D. Keyes, M. Leuze, L. Petzold, R. Sincovec, and D. Reed, eds., SIAM, 1993, pp. 57-69.

[120] V. SIMONCINI AND E. GALLOPOULOS, A memory-conserving hybrid method for solving linear systems with multiple right-hand sides, Tech. Rep. CSRD-1203, Center for Supercomputing Research and Development, University of Illinois, Urbana-Champaign, Feb. 1992.

[121] —, Convergence properties of block GMRES for solving systems with multiple right-hand sides, Tech. Rep. CSRD-1316, Center for Supercomputing Research and Development, University of Illinois, Urbana-Champaign, Oct. 1993.

[122] —, An iterative method for nonsymmetric systems with multiple right-hand sides, SIAM J. Sci. Comput., 16 (1995), pp. 917-933.

[123] C. SMITH, A. PETERSON, AND R. MITTRA, A conjugate gradient algorithm for the treatment of multiple incident electromagnetic fields, IEEE Trans. Ant. Prop., 37 (1989), pp. 1490-1493.

[124] D. C. SORENSEN, Newton's method with a model trust region modification, SIAM J. Numer. Anal., 19 (1982), pp. 409-426.

[125] G. STARKE AND R. VARGA, A hybrid Arnoldi-Faber iterative method for nonsymmetric systems of linear equations, Numer. Math., 64 (1993), pp. 213-240.

[126] T. STEIHAUG, Local and superlinear convergence for truncated iterated projection methods, Mathematical Programming, 27 (1983), pp. 176-190.

[127] I. TAGGART AND W. PINCZEWSKI, The use of higher-order differencing techniques in reservoir simulation, SPE Reservoir Engineering, (August 1987), pp. 360-372.

[128] R. TEIGLAND AND G. FLADMARK, Cell centered multigrid methods in porous media flow, in Multigrid Methods III: proceedings of the 3rd European Multigrid Conference, Birkhauser Verlag, 1991.

[129] J. TRAUB, Iterative methods for the solution of equations, Prentice Hall, Englewood Cliffs, 1964.

[130] L. TREFETHEN, Pseudospectra of matrices, in Numerical Analysis 91, D. Griffiths and G. Watson, eds., vol. 13, Longman Scientific and Technical, 1992, pp. 778-795.
[131] S. TUREK, On discrete projection methods for the incompressible Navier-Stokes equations, In preparation, 1994.

[132] H. VAN DER VORST, BICGSTAB: a fast and smoothly convergent variant of BI-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 13 (1992), pp. 631-644.

[133] H. VAN DER VORST AND C. VUIK, GMRESR: A Family of Nested GMRES Methods, Tech. Rep. TR91-80, Technological University of Delft, 1991.

[134] C. VUIK, Further experiences with GMRESR, Tech. Rep. TR92-12, Technological University of Delft, 1992.

[135] C. VUIK AND H. VAN DER VORST, A comparison of some GMRES-like methods, Linear Alg. and its Appl., 160 (1992), pp. 131-162.

[136] H. WALKER, Implementation of the GMRES method using Householder transformations, SIAM J. Sci. Stat. Comput., 9 (1988), pp. 152-163.

[137] —, Implementations of the GMRES method, Computer Physics Communications, 53 (1989), pp. 311-320.

[138] J. WALLIS, Incomplete Gaussian elimination as a preconditioning for generalized conjugate gradient acceleration, in SPE Symposium on Reservoir Simulation, SPE paper no. 12265, San Francisco, 1983.

[139] J. WATTS, A method of improving line successive overrelaxation in anisotropic problems: a theoretical analysis, Soc. of Pet. Eng. J., (1973), pp. 105-118.

[140] A. WEISER AND M. WHEELER, On convergence of block-centered finite differences for elliptic problems, SIAM J. Numer. Anal., 25 (1988), pp. 351-375.

[141] J. WHEELER AND R. SMITH, Reservoir simulation on a hypercube, in 64th Annual Technical Conference and Exhibition of the Society of Petroleum Engineers, SPE paper no. 19801, San Antonio, Texas, 1989.

[142] M. WHEELER, Private Communication, 1993.

[143] L. WIGTON, N. YU, AND D. YOUNG, GMRES acceleration of computational fluid dynamics codes, in Proceedings of the 1985 AIAA Conference, Denver, CO, 1985.
[144] U. YANG, A family of preconditioned iterative solvers for sparse linear systems, PhD thesis, Dept. of Computer Science, University of Illinois, Urbana-Champaign, 1995.

[145] D. YOUNG, Iterative Solution of Large Linear Systems, Academic Press, 1971.

[146] S. ZENG, C. VUIK, AND P. WESSELING, Solution of the incompressible Navier-Stokes equations in general coordinates by Krylov subspace and multigrid methods, Tech. Rep. TR93-64, Technological University of Delft, 1993.

[147] Z. GAJIC AND X. SHEN, Parallel algorithms for optimal control of large scale linear systems, Springer-Verlag, 1993.
Glossary

2SAdd      Two-stage additive.
2SBJ       Two-stage block Jacobi.
2SComb     Two-stage combinative.
2SDP       Two-stage discrete projection.
2SGS       Two-stage block Gauss-Seidel.
2SMult     Two-stage multiplicative.
ABF        Alternate block factorization.
EN         Eirola-Nevanlinna.
HKS        Hybrid Krylov-secant.
HKS-B      Hybrid Krylov-secant based on Broyden's method.
HKS-EN     Hybrid Krylov-secant based on the Eirola-Nevanlinna algorithm.
HKS-N      Hybrid Krylov-secant based on Newton's method.
HOKN       Higher-order Krylov-Newton.
IMPES      Implicit pressures-explicit saturations.
KEN        Krylov-Eirola-Nevanlinna.
MHGMRES    Modified hybrid GMRES.
NEN        Nonlinear Eirola-Nevanlinna.