Presentation (transcription) - PMAA'06
Numerical experiments with additive Schwarz preconditioner for non-overlapping domain decomposition in 3D
Azzam Haidar
CERFACS, Toulouse
joint work with
Luc Giraud (N7-IRIT, France) and Shane Mulligan (Dublin Institute of Technology, Ireland)
4th International Workshop on Parallel Matrix Algorithms and Applications,
September 7-9, 2006, IRISA, Rennes, France
Outline

1. General Framework
2. Algebraic Additive Schwarz preconditioner
   - Structure of the Local Schur Complement
   - Description of the preconditioner
   - Variant of the Additive Schwarz preconditioner MAS
   - MAS vs. Neumann-Neumann
3. Parallel numerical experiments
   - Numerical scalability
   - Parallel performance
4. Perspectives
Background
The PDE

\[
\begin{cases}
-\operatorname{div}(K\,\nabla u) = f & \text{in } \Omega \\
u = 0 & \text{on } \partial\Omega_{\mathrm{Dirichlet}} \\
(K\,\nabla u, n) = 0 & \text{on } \partial\Omega_{\mathrm{Neumann}}
\end{cases}
\]

The associated linear system (two sub-domain case)

\[
A u = f \equiv
\begin{pmatrix}
A_{11} & 0 & A_{1\Gamma} \\
0 & A_{22} & A_{2\Gamma} \\
A_{1\Gamma}^T & A_{2\Gamma}^T & A_{\Gamma}^{(1)} + A_{\Gamma}^{(2)}
\end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ u_\Gamma \end{pmatrix}
=
\begin{pmatrix} f_1 \\ f_2 \\ f_\Gamma \end{pmatrix}
\]
Background
Algebraic splitting and block Gaussian elimination (N sub-domains case)

\[
\begin{pmatrix}
A_{I_1 I_1} & & & A_{I_1 \Gamma_1} \\
& \ddots & & \vdots \\
& & A_{I_N I_N} & A_{I_N \Gamma_N} \\
A_{\Gamma_1 I_1} & \cdots & A_{\Gamma_N I_N} & A_{\Gamma\Gamma}
\end{pmatrix}
\begin{pmatrix} u_{I_1} \\ \vdots \\ u_{I_N} \\ u_\Gamma \end{pmatrix}
=
\begin{pmatrix} f_{I_1} \\ \vdots \\ f_{I_N} \\ f_\Gamma \end{pmatrix}
\]

Eliminating the interior unknowns leaves the interface problem

\[
S u_\Gamma = \Bigl( \sum_{i=1}^{N} R_{\Gamma_i}^T S^{(i)} R_{\Gamma_i} \Bigr) u_\Gamma
= f_\Gamma - \sum_{i=1}^{N} R_{\Gamma_i}^T A_{\Gamma_i I_i} A_{I_i I_i}^{-1} f_{I_i},
\]

where

\[
S^{(i)} = A_{\Gamma_i \Gamma_i} - A_{\Gamma_i I_i} A_{I_i I_i}^{-1} A_{I_i \Gamma_i}.
\]

Spectral properties for elliptic PDEs

\[
\kappa(A) = O(h^{-2}), \qquad \kappa(S) = O(h^{-1}), \qquad
\|e^{(k)}\|_A \le 2 \left( \frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} \right)^{k} \|e^{(0)}\|_A .
\]
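The two-step solve implied by these formulas — reduce to the interface, solve with the Schur complement, then back-substitute for the interior unknowns — can be checked on a toy example. A minimal NumPy sketch, with a random SPD matrix standing in for the assembled PDE operator (all sizes and names are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Small SPD matrix partitioned into interior (I) and interface (G) blocks.
n_i, n_g = 8, 3
M = rng.standard_normal((n_i + n_g, n_i + n_g))
A = M @ M.T + (n_i + n_g) * np.eye(n_i + n_g)  # SPD by construction

A_II, A_IG = A[:n_i, :n_i], A[:n_i, n_i:]
A_GI, A_GG = A[n_i:, :n_i], A[n_i:, n_i:]

# Schur complement of the interior block: S = A_GG - A_GI A_II^{-1} A_IG
S = A_GG - A_GI @ np.linalg.solve(A_II, A_IG)

# Interface problem: S u_G = f_G - A_GI A_II^{-1} f_I
f = rng.standard_normal(n_i + n_g)
f_I, f_G = f[:n_i], f[n_i:]
u_G = np.linalg.solve(S, f_G - A_GI @ np.linalg.solve(A_II, f_I))

# Back-substitution recovers the interior unknowns.
u_I = np.linalg.solve(A_II, f_I - A_IG @ u_G)

# The reduced solve matches a direct solve of the full system.
u_full = np.linalg.solve(A, f)
print(np.allclose(np.concatenate([u_I, u_G]), u_full))  # True
```

This is exactly the elimination the talk performs per sub-domain, with MUMPS playing the role of the interior factorization.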
Structure of the Local Schur Complement
Non-overlapping domain decomposition

[Figure: two neighbouring sub-domains Ω_i and Ω_j; the interface of Ω_i is split into the edges E_m, E_g, E_k, E_ℓ, with Γ_i = E_ℓ ∪ E_k ∪ E_m ∪ E_g.]

Distributed Schur complement

\[
S^{(i)} =
\begin{pmatrix}
S_{mm}^{(i)} & S_{mg} & S_{mk} & S_{m\ell} \\
S_{gm} & S_{gg}^{(i)} & S_{gk} & S_{g\ell} \\
S_{km} & S_{kg} & S_{kk}^{(i)} & S_{k\ell} \\
S_{\ell m} & S_{\ell g} & S_{\ell k} & S_{\ell\ell}^{(i)}
\end{pmatrix},
\qquad
S_{gg} = S_{gg}^{(i)} + S_{gg}^{(j)} .
\]

If A is SPD then S is also SPD ⇒ CG.
In a distributed memory environment, S is distributed and non-assembled.
A simple mathematical framework
The local component

- U: an algebraic space of vectors associated with the unknowns on Γ
- U_i: subspaces of U such that U = U_1 + ... + U_n
- R_i: the canonical pointwise restriction from U to U_i

\[
M_{\mathrm{loc}} = \sum_{i=1}^{n} R_i^T M_i^{-1} R_i, \qquad M_i = R_i S R_i^T .
\]

Examples:
- U_i associated with each edge: block Jacobi
- U_i associated with ∂Ω_i: additive Schwarz
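The abstract definition M_loc = Σ R_i^T (R_i S R_i^T)^{-1} R_i is easy to realize concretely. A toy NumPy sketch (the index subsets and sizes are mine; overlapping subsets give the additive Schwarz flavour, disjoint ones block Jacobi):

```python
import numpy as np

rng = np.random.default_rng(1)

# SPD stand-in for the interface Schur complement S.
n = 6
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)

def restriction(idx, n):
    """Canonical pointwise restriction R_i: selects the entries of U in idx."""
    R = np.zeros((len(idx), n))
    R[np.arange(len(idx)), idx] = 1.0
    return R

# Subspaces U_i covering U (overlapping -> additive Schwarz flavour).
subsets = [[0, 1, 2], [2, 3, 4], [4, 5, 0]]

# M_loc = sum_i R_i^T (R_i S R_i^T)^{-1} R_i
M_loc = np.zeros((n, n))
for idx in subsets:
    R = restriction(idx, n)
    Mi = R @ S @ R.T
    M_loc += R.T @ np.linalg.inv(Mi) @ R

# Since the subsets cover all unknowns, M_loc is SPD, hence usable with CG.
print(np.all(np.linalg.eigvalsh((M_loc + M_loc.T) / 2) > 0))  # True
```

In the talk, each subset corresponds to the interface Γ_i of one sub-domain, and each local inverse is a dense (or sparsified) factorization held by one processor.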
Additive Schwarz preconditioner [Carvalho, Giraud, Meurant, 01]

Preconditioner properties

- U_i associated with the entire interface Γ_i of sub-domain Ω_i

\[
M_{AS} = \sum_{i=1}^{\#\mathrm{domains}} R_i^T \,(\bar S^{(i)})^{-1} R_i ,
\]

built from the assembled local Schur complement

\[
\bar S^{(i)} =
\begin{pmatrix}
S_{mm} & S_{mg} & S_{mk} & S_{m\ell} \\
S_{gm} & S_{gg} & S_{gk} & S_{g\ell} \\
S_{km} & S_{kg} & S_{kk} & S_{k\ell} \\
S_{\ell m} & S_{\ell g} & S_{\ell k} & S_{\ell\ell}
\end{pmatrix}
\]

rather than the (non-assembled) local Schur complement

\[
S^{(i)} =
\begin{pmatrix}
S_{mm}^{(i)} & S_{mg} & S_{mk} & S_{m\ell} \\
S_{gm} & S_{gg}^{(i)} & S_{gk} & S_{g\ell} \\
S_{km} & S_{kg} & S_{kk}^{(i)} & S_{k\ell} \\
S_{\ell m} & S_{\ell g} & S_{\ell k} & S_{\ell\ell}^{(i)}
\end{pmatrix} .
\]

Remarks

- M_AS is SPD if S is SPD.
Cheaper additive Schwarz preconditioner forms

Main characteristics

- Cheaper in memory space
- Reduction in FLOPS
- No additional communication cost

Sparsification strategy (ξ denotes the dropping threshold; the symbol is lost in the transcription, but values such as 10^-4 and 10^-5 appear in the experiments below)

\[
\hat s_{ij} =
\begin{cases}
\bar s_{ij} & \text{if } |\bar s_{ij}| \ge \xi\,(|\bar s_{ii}| + |\bar s_{jj}|) \\
0 & \text{otherwise}
\end{cases}
\]

Mixed arithmetic strategy

- Compute and store the preconditioner in single precision arithmetic.
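Both strategies have direct one-line realizations. A hedged NumPy sketch of the dropping rule and of the single-precision cast (the function name and the threshold symbol `xi` are mine, not from the slides):

```python
import numpy as np

def sparsify(S_bar, xi):
    """Keep entry s_ij only when |s_ij| >= xi * (|s_ii| + |s_jj|); drop the rest."""
    d = np.abs(np.diag(S_bar))
    thresh = xi * (d[:, None] + d[None, :])
    return np.where(np.abs(S_bar) >= thresh, S_bar, 0.0)

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
S_bar = M @ M.T + 5 * np.eye(5)  # SPD stand-in for an assembled local Schur complement

S_hat = sparsify(S_bar, xi=1e-1)
# Dropping can only remove entries (the diagonal always survives for xi <= 0.5).
print(np.count_nonzero(S_hat) <= np.count_nonzero(S_bar))  # True

# Mixed-arithmetic variant: store the preconditioner blocks in single precision.
S_single = S_bar.astype(np.float32)
print(S_single.nbytes == S_bar.nbytes // 2)  # True: half the memory
```

The memory table later in the talk quantifies exactly this effect: the sparsified preconditioner keeps only a few percent of the dense storage.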
MAS vs. Neumann-Neumann

Neumann-Neumann preconditioner
[J.F. Bourgat, R. Glowinski, P. Le Tallec and M. Vidrascu - 89]
[Y.H. De Roeck, P. Le Tallec and M. Vidrascu - 91]

For two identical sub-domains,

\[
S^{(1)} = S^{(2)} = \tfrac{1}{2} S \;\Rightarrow\; S^{-1} = \tfrac{1}{4}\bigl((S^{(1)})^{-1} + (S^{(2)})^{-1}\bigr).
\]

The block factorization of the local matrix

\[
A^{(i)} =
\begin{pmatrix} A_{ii} & A_{i\Gamma} \\ A_{i\Gamma}^T & A_{\Gamma}^{(i)} \end{pmatrix}
=
\begin{pmatrix} I & 0 \\ A_{i\Gamma}^T A_{ii}^{-1} & I \end{pmatrix}
\begin{pmatrix} A_{ii} & 0 \\ 0 & S^{(i)} \end{pmatrix}
\begin{pmatrix} I & A_{ii}^{-1} A_{i\Gamma} \\ 0 & I \end{pmatrix}
\]

shows that

\[
(S^{(i)})^{-1} = \begin{pmatrix} 0 & I \end{pmatrix} (A^{(i)})^{-1} \begin{pmatrix} 0 \\ I \end{pmatrix},
\]

i.e. applying (S^{(i)})^{-1} amounts to a solve with the whole local matrix A^{(i)}. Hence

\[
M_{NN} = \sum_{i=1}^{\#\mathrm{domains}} R_i^T \, D_i (S^{(i)})^{-1} D_i \, R_i
\quad\text{while}\quad
M_{AS} = \sum_{i=1}^{\#\mathrm{domains}} R_i^T \,(\bar S^{(i)})^{-1} R_i .
\]
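The key identity behind Neumann-Neumann — that (S^{(i)})^{-1} is the trailing interface block of (A^{(i)})^{-1} — can be verified numerically. A NumPy sketch with a random SPD local matrix (sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Local matrix A_i partitioned into interior (i) and interface (G) blocks.
n_i, n_g = 7, 3
M = rng.standard_normal((n_i + n_g, n_i + n_g))
A_i = M @ M.T + (n_i + n_g) * np.eye(n_i + n_g)  # SPD by construction

A_ii = A_i[:n_i, :n_i]
A_iG = A_i[:n_i, n_i:]
A_GG = A_i[n_i:, n_i:]

# Local Schur complement: S_i = A_GG - A_iG^T A_ii^{-1} A_iG (A_i symmetric).
S_i = A_GG - A_iG.T @ np.linalg.solve(A_ii, A_iG)

# (S_i)^{-1} = [0 I] (A_i)^{-1} [0; I]: the bottom-right block of A_i^{-1}.
S_i_inv_via_A = np.linalg.inv(A_i)[n_i:, n_i:]
print(np.allclose(S_i_inv_via_A, np.linalg.inv(S_i)))  # True
```

This is why Neumann-Neumann never forms S^{(i)} explicitly: one local solve with A^{(i)} applies its inverse, whereas MAS works with the assembled S̄^{(i)} directly.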
Computational framework
Target computers

- IBM SP4 (CINES)
- SGI O3800 (CINES)
- Cray XD1 (CERFACS)
- System X (Virginia Tech), jointly with Layne T. Watson, Virginia Polytechnic Institute

Local direct solver: MUMPS [Amestoy, Duff, Koster, L'Excellent - 01]

Main features

- Parallel distributed multifrontal solver (F90, MPI)
- Symmetric and unsymmetric factorizations
- Elemental entry matrices, distributed matrices
- Efficient Schur complement calculation
- Iterative refinement and backward error analysis
- Public domain: new version 4.6.3
  www.enseeiht.fr/apo/MUMPS - [email protected]
Numerical scalability

3D Poisson problem. Number of CG iterations where either H/h is kept constant while the # sub-domains varies (read vertically ↓), or the mesh size H/h increases while the # sub-domains is kept constant (read horizontally →). # sub-domains ≡ # processors.

sub-domain size   precond.   27   64   125   216   343   512   729   1000
20 × 20 × 20      MAS        16   23    25    29    32    35    39     42
                  MSpAS      16   23    26    31    34    39    43     46
25 × 25 × 25      MAS        17   24    26    31    33    37    40     43
                  MSpAS      17   25    28    34    37    42    45     49
30 × 30 × 30      MAS        18   25    27    32    34    39    42     45
                  MSpAS      18   26    29    36    40    44    48     52
35 × 35 × 35      MAS        19   26    30    33    35    43    44     47
                  MSpAS      19   28    30    38    46    46    50     56

- The solved problem size varies from 1.1 up to 42.8 million unknowns.
- The number of iterations increases only slightly when increasing the # sub-domains.
- This increase is less significant when the local mesh size H/h grows.
Numerical scalability

3D difficult discontinuous problem: jumps in the diffusion coefficient functions a(·) = b(·) = c(·) ranging from 1 to 10³.
Number of CG iterations for the discontinuous problem, where either H/h is kept constant while the # sub-domains varies (read vertically ↓), or the mesh size H/h increases while the # sub-domains is kept constant (read horizontally →). # sub-domains ≡ # processors.

sub-domain size   precond.   27   64   125   216   343   512   729   1000
20 × 20 × 20      MAS        32   37    44    53    58    68    78     82
                  MSpAS      32   42    48    58    63    75    85     91
25 × 25 × 25      MAS        29   41    46    52    60    71    80     85
                  MSpAS      34   45    51    63    66    82    89     99
30 × 30 × 30      MAS        34   43    46    57    61    75    84     87
                  MSpAS      30   47    52    68    70    90    96    105
35 × 35 × 35      MAS        31   43    49    62    63    80    87     92
                  MSpAS      29   51    58    71    84    92   105    116
Parallel performance

3D difficult discontinuous problem.

Implementation details
- Setup Schur: MUMPS
- Setup Precond: dense Schur (LAPACK), sparse Schur (MUMPS)
- Target computer: System X (Xserve Mac G5), jointly with Layne T. Watson, Virginia Polytechnic Institute

Parallel elapsed times on 10³ processors; H/h varies; dropping threshold 10^-4; jumps in the diffusion coefficient functions a(·) = b(·) = c(·) from 1 to 10³. MAS uses the dense local Schur complement, MSpAS the sparse one.

sub-domain size   precond.   setup Schur   setup Precond   time per iter   # iter   total
20 × 20 × 20      MAS            1.30          0.93            0.08          82      8.79
                  MSpAS          1.30          0.50            0.05          91      6.17
25 × 25 × 25      MAS            4.20          3.05            0.23          85      26.8
                  MSpAS          4.20          1.60            0.13          99      18.6
30 × 30 × 30      MAS            11.2          8.73            0.50          87      63.0
                  MSpAS          11.2          3.51            0.28         105      44.1
35 × 35 × 35      MAS            26.8          21.4            0.77          92      119
                  MSpAS          26.8          6.22            0.37         116      75.9
Local data storage

MAS vs. MSpAS memory behaviour (percentages relative to the dense MAS storage):

sub-domain size      MAS        MSpAS, threshold 10^-5   MSpAS, threshold 10^-4
20 × 20 × 20        35.85 MB        7.5 MB (21%)              1.8 MB (5%)
25 × 25 × 25        91.23 MB       12.7 MB (14%)              2.7 MB (3%)
30 × 30 × 30        194.4 MB       19.4 MB (10%)              3.8 MB (2%)
35 × 35 × 35        367.2 MB       28.6 MB ( 7%)             10.2 MB (2%)
Perspectives

Objective

- Control the growth of iterations when increasing the # processors.

Various possibilities (future work)

- Numerical remedy: two-level preconditioner
  - Coarse space correction, i.e. solve a coarse problem on a coarse space
  - Various choices for the coarse component (e.g. one d.o.f. per sub-domain)
- Computer science remedy: several processors per sub-domain
  - Two levels of parallelism
  - 2D cyclic data storage
Numerical alternative: preliminary results

- Domain-based coarse space: M = M_AS + R_0^T A_0^{-1} R_0
- "As many" d.o.f. in the coarse space as sub-domains [Carvalho, Giraud, Le Tallec, 01]
- Partition of unity: R_0^T is the simplest constant interpolation

Anisotropic and discontinuous 3D problem, H/h = 30. For each # procs, the first column is with the coarse space, the second without:

# procs            125          216          343          512          729         1000
setup Schur     11.2  11.2   11.2  11.2   11.2  11.2   11.2  11.2   11.2  11.2   11.2  11.2
setup Precond   8.70  8.70   8.70  8.70   8.70  8.70   8.70  8.70   8.70  8.70   8.70  8.70
setup coarse    0.80   -     0.83   -     0.87   -     0.92   -     0.96   -     1.30   -
time per iter   0.50  0.50   0.50  0.50   0.51  0.50   0.51  0.50   0.52  0.50   0.53  0.50
# iter            39    50     52    62     61    71     67    80     73    88     78    95
total           40.2  44.9   46.7  50.9   51.8  55.4   55.0  59.9   58.5  64.0   62.5  67.6
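The additive coarse correction M = M_AS + R_0^T A_0^{-1} R_0, with one d.o.f. per sub-domain and a piecewise-constant R_0^T, can be sketched on a toy interface operator. In this NumPy illustration the partition of the interface unknowns into "sub-domains" is mine, and the one-level part is simply block Jacobi standing in for M_AS:

```python
import numpy as np

rng = np.random.default_rng(4)

# SPD stand-in for the interface operator S: 8 unknowns in 2 "sub-domains".
n = 8
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)

# One-level part: block Jacobi on the two groups (stand-in for M_AS).
groups = [np.arange(0, 4), np.arange(4, 8)]
M_AS = np.zeros((n, n))
for g in groups:
    M_AS[np.ix_(g, g)] = np.linalg.inv(S[np.ix_(g, g)])

# Coarse space: one d.o.f. per sub-domain, R_0^T = piecewise-constant interpolation.
R0 = np.zeros((len(groups), n))
for k, g in enumerate(groups):
    R0[k, g] = 1.0
A0 = R0 @ S @ R0.T                         # Galerkin coarse operator
M_2level = M_AS + R0.T @ np.linalg.inv(A0) @ R0

# The two-level operator stays SPD (sum of an SPD and a PSD term),
# so it remains admissible for CG.
sym = (M_2level + M_2level.T) / 2
print(np.all(np.linalg.eigvalsh(sym) > 0))  # True
```

The coarse term couples all sub-domains at a cost of one small global solve per application, which is what the "setup coarse" line in the table pays for, in exchange for the lower iteration counts.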
Parallel computing alternative

Main characteristics of the two levels of parallelism. Anisotropic and discontinuous 3D problem; very preliminary results, ongoing work:

                     1 level        2 levels
# sub-domains         1000            125
sub-domain size   20 × 20 × 20    39 × 39 × 39
# iter                 186             99
setup Schur           1.30            21.0
setup MAS             0.95            11.2
time/iter             0.08            0.26
total time            17.0            58.5