Short Update on Computing Complex Symmetric Eigenvalue

Short Update on Computing Complex Symmetric Eigenvalue Problem Derived from KKM Reaction Theory - A Simplified Interface for Parallel Processing
G. Arbanas, C. Bertulani, A. Kerman, K. Roche, K. Ushkala
First-order effort: map the complex problem to a numerically real problem
Err = Cx-cx
Max backward error vs. matrix size N

          random complex matrix                symmetric random complex matrix
          double precision  single precision   double precision  single precision
          direct complex    direct real        direct complex    direct real
N         factorization     factorization      factorization     factorization
64        7.35E-14          9.50E-07           2.52E-14          3.25E-07
128       5.22E-14          5.19E-07           8.41E-14          1.36E-06
256       1.15E-13          7.67E-07           1.88E-13          1.03E-06
512       7.13E-13          2.13E-06           5.80E-13          2.60E-06
1024      2.65E-13          1.93E-06           3.75E-13          2.16E-06
The table checks the feasibility of the transformation. Be careful when drawing conclusions from the accuracy data: this study fixes the memory demand and shows the loss in accuracy when going from double to single precision with the transformation from complex to real arithmetic.
(double) Complex variant: n * n * 16 bytes
(double) Real variant:    4 * n * n * 8 bytes
(single) Real variant:    4 * n * n * 4 bytes
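The factor of 4 in the real variants is consistent with the standard real embedding of a complex matrix: writing C = A + iB, the real 2n x 2n block matrix [ A -B ; B A ] acts on (Re x, Im x) the same way C acts on x, and it stores 4 * n * n real entries. Whether the authors used exactly this embedding is an assumption, and the helper names below are hypothetical. A minimal sketch:

```c
#include <complex.h>
#include <stddef.h>

/* Embed an n-by-n complex matrix c (row-major) into the 2n-by-2n real
 * matrix r (row-major) with block layout [ A -B ; B A ], where c = A + iB.
 * Assumption: this is the standard embedding, not necessarily the one
 * used for the table above. */
void embed_complex_as_real(size_t n, const double complex *c, double *r)
{
    size_t ld = 2 * n;                      /* row length of r */
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double a = creal(c[i * n + j]);
            double b = cimag(c[i * n + j]);
            r[i * ld + j]             =  a; /* top-left:     A */
            r[i * ld + (j + n)]       = -b; /* top-right:   -B */
            r[(i + n) * ld + j]       =  b; /* bottom-left:  B */
            r[(i + n) * ld + (j + n)] =  a; /* bottom-right: A */
        }
    }
}

/* Storage in bytes for the three variants listed above. */
size_t bytes_complex_double(size_t n) { return n * n * 16; }
size_t bytes_real_double(size_t n)    { return 4 * n * n * 8; }
size_t bytes_real_single(size_t n)    { return 4 * n * n * 4; }
```

For any n, bytes_complex_double(n) equals bytes_real_single(n), which is the fixed memory demand the study refers to; the double-precision real variant needs twice as much.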
Try a Dense Direct Method
This method is not generally stable.
Block Cyclic Decomposition of Natural Data
Read routines: kfil_2dbc_rd(); kstr_2dbc_rd();
Write routines: kfil_2dbc_wr(); kstr_2dbc_wr();
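As a reminder of what a block-cyclic layout does, the sketch below maps a global index to its owning process coordinate and local index along one dimension; applying it to rows and columns separately gives the 2D layout that ScaLAPACK expects. The type and function names are illustrative and are not taken from the kfil_2dbc / kstr_2dbc routines, whose signatures are not shown here.

```c
#include <stddef.h>

/* Owner and local position of one global index under a 1D block-cyclic
 * distribution with block size nb over nprocs processes, assuming the
 * first block lives on process 0. */
typedef struct {
    size_t proc;   /* process coordinate that owns the index */
    size_t local;  /* index within that process's local storage */
} bc_loc;

bc_loc bc_global_to_local(size_t g, size_t nb, size_t nprocs)
{
    bc_loc loc;
    size_t block = g / nb;                      /* global block number */
    loc.proc  = block % nprocs;                 /* blocks are dealt round-robin */
    loc.local = (block / nprocs) * nb + g % nb; /* earlier local blocks + offset */
    return loc;
}
```

For element (i, j) on a Pr x Pc process grid, call it once with (i, row block size, Pr) and once with (j, column block size, Pc).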
• need a method for generating a large data set
• trying hard to create techniques that keep the user out of the decision-making process for parallelization
• want to leverage existing and useful software
ScaLAPACK
LAPACK
PBLAS
BLACS
BLAS
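#include <complex.h>
#include <mpi.h>

/* Prototype of the parallel complex symmetric eigensolver. */
int csyeig( MPI_Comm commw, int n, double complex *a,
            double complex *z, double complex *c );

/* Invoke the supplied generator callback with (a, m, n, seed). */
void test_cfnc( double complex *a, int m, int n, int seed,
                void (*gen_fnc)( double complex *, int, int, int ) ) {
    (*gen_fnc)( a, m, n, seed );
}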
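A generator matching the gen_fnc signature could look like the sketch below. The name gen_csym, the column-major square layout, and the use of rand() are assumptions for illustration; the authors' csym-mat-gen() is not shown in the source.

```c
#include <complex.h>
#include <stdlib.h>

/* test_cfnc as given above: forwards its arguments to the generator. */
void test_cfnc(double complex *a, int m, int n, int seed,
               void (*gen_fnc)(double complex *, int, int, int))
{
    (*gen_fnc)(a, m, n, seed);
}

/* Hypothetical generator: fills an n-by-n column-major array with random
 * entries whose real and imaginary parts lie in [-1, 1], mirrored so that
 * a(i,j) == a(j,i). The result is complex symmetric, not Hermitian
 * (no conjugation). Does nothing if the matrix is not square. */
void gen_csym(double complex *a, int m, int n, int seed)
{
    if (m != n)
        return;
    srand((unsigned)seed);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i <= j; ++i) {
            double re = 2.0 * rand() / RAND_MAX - 1.0;
            double im = 2.0 * rand() / RAND_MAX - 1.0;
            double complex v = re + im * I;
            a[i + (size_t)j * n] = v;   /* upper triangle */
            a[j + (size_t)i * n] = v;   /* mirrored entry */
        }
    }
}
```

A call such as test_cfnc(a, n, n, seed, gen_csym) then fills a without the caller needing to know how the matrix is produced, which is the point of passing the generator as a function pointer.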
MPI
Tests (more testing needed)
• against KKM.f (n = 512)
• against zgeev() for general complex systems (n = 8192)
• self-consistency tests ( |AZ-ZD| ) (n = 65536)
• against Toeplitz form (n = 32768)
csym-mat-gen() n = 32768
PAPI_TOT_INS : Tot[ 111865019536195 ] Rt[ 109248056535 ]
PAPI_FP_INS : Tot[ 2199627218944 ] Rt[ 2148089464 ]
PAPI_L2_DCM : Tot[ 321977754 ] Rt[ 263514 ]
PAPI_real_cyc = 157816348692 PAPI_real_usec = 68615804
PAPI_user_cyc = 157826000000 PAPI_user_usec = 68620000
Example XT5 Run:
csyeig() n = 32768
PAPI_TOT_INS : Tot[ 7502114505715511 ] Rt[ 6706228558246 ]
PAPI_FP_INS : Tot[ 967948054098509 ] Rt[ 1332067487703 ]
PAPI_L2_DCM : Tot[ 3181343933369 ] Rt[ 2894696203 ]
PAPI_real_cyc = 6407554186034 PAPI_real_usec = 2785893130
PAPI_user_cyc = 6406765000000 PAPI_user_usec = 2785550000
Err|AZ-DZ| n = 32768
PAPI_TOT_INS : Tot[ 523619868353838 ] Rt[ 519582386553 ]
PAPI_FP_INS : Tot[ 584968040019504 ] Rt[ 589249121346 ]
PAPI_L2_DCM : Tot[ 306773490444 ] Rt[ 287988286 ]
PAPI_real_cyc = 219179680380 PAPI_real_usec = 95295514
PAPI_user_cyc = 219121000000 PAPI_user_usec = 95270000
|AZ-ZD{a}|_inf ~ 4.05821e-10
[thy_err=3.68073e-07]
real 5h49m11.314s
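Two quantities can be read directly off the cycle and time counters: the wall time in seconds and the average clock rate. The sketch below does that arithmetic; the helper names are made up for illustration and are not part of PAPI.

```c
/* Wall time in seconds from a PAPI_real_usec reading. */
double usec_to_sec(long long usec)
{
    return (double)usec / 1.0e6;
}

/* Average clock rate in MHz: cycles per microsecond. */
double clock_mhz(long long cycles, long long usec)
{
    return (double)cycles / (double)usec;
}
```

With the csym-mat-gen() readings (PAPI_real_cyc = 157816348692, PAPI_real_usec = 68615804) this gives about 68.6 s at roughly 2300 MHz; the csyeig() readings give about 2786 s at the same clock rate.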
herm-mat-gen() n = 16384
PAPI_TOT_INS : Tot[ 7034091200126 ] Rt[ 27481087715 ]
PAPI_FP_INS : Tot[ 137581551616 ] Rt[ 537443960 ]
PAPI_L2_DCM : Tot[ 87160426 ] Rt[ 259682 ]
PAPI_real_cyc = 39183316211 PAPI_real_usec = 17036224
PAPI_user_cyc = 39192000000 PAPI_user_usec = 17040000
pzheev_() n = 16384
PAPI_TOT_INS : Tot[ 430200433828620 ] Rt[ 1667297054044 ]
PAPI_FP_INS : Tot[ 103533019765351 ] Rt[ 428900801349 ]
PAPI_L2_DCM : Tot[ 195552827152 ] Rt[ 697091250 ]
PAPI_real_cyc = 1598982738786 PAPI_real_usec = 695209893
PAPI_user_cyc = 1598500000000 PAPI_user_usec = 695000000
Err|AZ-DZ| n = 16384
PAPI_TOT_INS : Tot[ 65548973697866 ] Rt[ 260164538213 ]
PAPI_FP_INS : Tot[ 73124481940711 ] Rt[ 294638569624 ]
PAPI_L2_DCM : Tot[ 37741819129 ] Rt[ 144649371 ]
PAPI_real_cyc = 109914371587 PAPI_real_usec = 47788858
PAPI_user_cyc = 109848000000 PAPI_user_usec = 47760000
|AZ-ZD{a}|_inf ~ 1.46099e-10
[thy_err=9.22587e-08]
real 12m41.374s
user 0m0.180s
sys 0m0.112s
Playing with Hermitian Problems too
Summary and Plan
• parallel complex symmetric diagonalization routine
• software that removes the user from the process of
parallelizing their dense numerical problem
• more testing necessary in the resource allocation /
selection process
• pkkm.c needs to be tested: a FILE-based version exists now and is being tested; an in-core variant is easier
• io routines
• more numerical testing
