Linear Scaling DFT based on Daubechies Wavelets for Massively Parallel Architecture
L. Ratcliff (1), Stephan Mohr (1,2), Paul Boulanger (1,3), Luigi Genovese (1), Damien Caliste (1), Stefan Goedecker (2), Thierry Deutsch
(1) Laboratoire de Simulation Atomistique (L_Sim), CEA Grenoble
(2) Institut für Physik, Universität Basel, Switzerland
(3) Institut Néel, CNRS, Grenoble
Centre Blaise Pascal, Lyon
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L Sim
Thierry Deutsch

Outline
1. The machinery of BigDFT (cubic scaling version)
• Mathematics of the wavelets
• Main Operations
• MPI/OpenMP and GPU performances
2. O(N) (linear scaling) BigDFT approach
• Minimal basis set approach
• Performance and Accuracy
• Fragment approach
• Applications
• Perspectives
3. Conclusions

A basis for nanosciences: the BigDFT project
STREP European project: BigDFT (2005-2008)
Four partners, 15 contributors:
CEA-INAC Grenoble (T. Deutsch), U. Basel (S. Goedecker), U. Louvain-la-Neuve (X. Gonze), U. Kiel (R. Schneider)
Aim: to develop an ab initio DFT code based on Daubechies wavelets, to be integrated in ABINIT and distributed under the GNU GPL license
After six years, the BigDFT project is still alive and well:
• BigDFT formalism from an HPC perspective
• Wavelets for O(N) DFT
• Resonant states of open quantum systems

Why do we use wavelets in BigDFT?
Adaptivity
One grid, two resolution levels in BigDFT:
• 1 scaling function (“coarse region”)
• 1 scaling function and 7 wavelets (“fine region”)
Ideal for big inhomogeneous systems
Efficient Poisson solver, capable of handling different boundary conditions: free, wire, surface, periodic
Explicit treatment of charged systems
Established code with many capabilities
[Figure: scaling function φ(x) and wavelet ψ(x) plotted against x]

A brief description of wavelet theory
Two kinds of basis functions
A multi-resolution real-space basis
The functions can be classified by the resolution level they span.
Scaling functions
The functions of a low resolution level are linear combinations of high-resolution functions:
$\phi(x) = \sum_{j=-m}^{m} h_j \, \phi(2x - j)$
Centered on a resolution-dependent grid: $\phi_j = \phi_0(x - j)$.

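The refinement relation for the scaling functions is fully determined by the filter coefficients h_j, so the basic properties of a Daubechies family can be checked numerically. The sketch below is an illustration only, not BigDFT code. It assumes the shortest non-trivial Daubechies filter (4 taps, far shorter than what a production code would use) and the common normalisation in which the coefficients sum to sqrt(2), which may differ by a constant factor from the convention of the slides. The function names are hypothetical.

```python
import numpy as np

def daubechies4_lowpass():
    """4-tap Daubechies low-pass filter, normalised so that sum(h) = sqrt(2)."""
    s3 = np.sqrt(3.0)
    return np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))

def highpass_from_lowpass(h):
    """Quadrature-mirror construction g_j = (-1)^j h_{L-1-j}."""
    signs = (-1.0) ** np.arange(len(h))
    return signs * h[::-1]

def shifted_overlap(a, b, shift):
    """sum_j a_j b_{j + 2*shift} for finite filters (shift >= 0)."""
    k = 2 * shift
    return float(np.dot(a[: len(a) - k], b[k:]))

h = daubechies4_lowpass()          # scaling-function filter
g = highpass_from_lowpass(h)       # wavelet filter

print("sum(h)          =", h.sum())
print("<h, h>          =", shifted_overlap(h, h, 0))
print("<h, h shifted>  =", shifted_overlap(h, h, 1))
print("<h, g>          =", shifted_overlap(h, g, 0))
print("sum(g)          =", g.sum())
```

The unit norm of h and its vanishing overlap with its own even shifts are the filter-level statement of the orthonormality of the scaling functions; the vanishing overlap with g expresses the orthogonality between scaling functions and wavelets.
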
A brief description of wavelet theory
Wavelets
They contain the degrees of freedom (DoF) needed to complete the information which is lacking due to the coarseness of the resolution:
$\phi(2x) = \sum_{j=-m}^{m} \tilde{h}_j \, \phi(x - j) + \sum_{j=-m}^{m} \tilde{g}_j \, \psi(x - j)$
Increase the resolution without modifying the grid spacing
SF + W = same DoF as SF of higher resolution
$\psi(x) = \sum_{j=-m}^{m} g_j \, \phi(2x - j)$
All functions have compact support, centered on grid points.

Adaptivity of the mesh
[Figure: atomic positions (H2O)]
[Figure: fine grid (high resolution)]
[Figure: coarse grid (low resolution)]

Basis set features
Orthogonality, scaling relation
$\int dx \, \phi_k(x) \, \phi_j(x) = \delta_{kj}$
$\phi(x) = \frac{1}{\sqrt{2}} \sum_{j=-m}^{m} h_j \, \phi(2x - j)$
The Hamiltonian-related quantities can be calculated up to machine precision in the given basis.
Exact evaluation of kinetic energy
The accuracy is only limited by the basis set ($O(h_{grid}^{14})$)
Obtained by convolution with filters:
$f(x) = \sum_\ell c_\ell \, \phi_\ell(x)$ , $\nabla^2 f(x) = \sum_\ell \tilde{c}_\ell \, \phi_\ell(x)$ ,
$\tilde{c}_\ell = \sum_j c_j \, a_{\ell - j}$ , $a_\ell \equiv \int dx \, \phi_0(x) \, \partial_x^2 \phi_\ell(x)$

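Applying an operator as a filter convolution, with the new coefficients obtained as the sum over j of c_j a_{l-j}, can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: it uses the three-point finite-difference stencil as a stand-in for the filter a_l (the actual Daubechies filter values are not given here), periodic boundary conditions, and a hypothetical function name `apply_filter`.

```python
import numpy as np

def apply_filter(c, a, offsets):
    """Periodic convolution  c_tilde[l] = sum_j c[j] * a[l - j].

    a[k] holds the filter value for the offset offsets[k] = l - j.
    """
    c = np.asarray(c, dtype=float)
    out = np.zeros_like(c)
    for value, d in zip(a, offsets):
        # np.roll(c, d)[l] == c[l - d], i.e. the term with j = l - d
        out += value * np.roll(c, d)
    return out

# Stand-in filter (assumption): second-order finite-difference Laplacian
h = 0.1                                   # grid spacing
offsets = [-1, 0, 1]
a = np.array([1.0, -2.0, 1.0]) / h**2

# Periodic test function sampled on the grid
n = 64
x = np.arange(n) * h
k = 2 * np.pi / (n * h)                   # one full period in the box
c = np.sin(k * x)
lap = apply_filter(c, a, offsets)

print("max deviation from -k^2 sin(kx):", np.max(np.abs(lap + k**2 * c)))
```

With this low-order stencil the result differs from the exact second derivative at order h^2; the point of the wavelet filters is that the same convolution structure gives a much higher order of accuracy.
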
BigDFT version 1.7.x
http://www.bigdft.org
• Isolated, surfaces and 3D-periodic boundary conditions (k-points, symmetries)
• All XC functionals of the ABINIT package (libXC library)
• Hybrid functionals, Fock exchange operator
• Direct Minimisation and Mixing routines (metals)
• Local geometry optimizations (with constraints)
• External electric fields (surfaces BC)
• Born-Oppenheimer MD
• Vibrations
• Unoccupied states
• Empirical van der Waals interactions
• Saddle point searches (NEB, Granot & Baer)
• All these functionalities are GPU-compatible

Optimal for isolated systems
Test case: cinchonidine molecule (44 atoms)
[Figure: absolute energy precision (Ha) versus number of degrees of freedom, comparing plane waves (Ec = 40 Ha, 90 Ha, 125 Ha) with wavelets (h = 0.4 bohr, h = 0.3 bohr)]
Allows a systematic approach for molecules
Considerably faster than plane-wave codes: 10 (5) times faster than ABINIT (CPMD)
Charged systems can be treated explicitly in the same computation time

Systematic basis set
Two parameters for tuning the basis:
• The grid spacing hgrid
• The extension of the low resolution points crmult

Massively parallel
Two kinds of parallelisation
• By orbitals (Hamiltonian application, preconditioning)
• By components (overlap matrices, orthogonalisation)
A few (but big) packets of data
• More demanding in bandwidth than in latency
• No need for a fast network
• Optimal speedup (eff. ∼ 85%), also for big systems
Cubic scaling code
For systems bigger than 500 atoms (1500 orbitals), the orthonormalisation operation is predominant ($N^3$)

Orbital distribution scheme
Used for the application of the Hamiltonian
The Hamiltonian (convolutions) is applied separately to each wavefunction
[Figure: the orbitals ψ1 to ψ5 are shared out as whole orbitals among the processes MPI 0, MPI 1 and MPI 2]

Coefficient distribution scheme
Used for scalar products and orthonormalisation
BLAS routines (level 3) are called, then the result is reduced
[Figure: the coefficients of every orbital ψ1 to ψ5 are split among the processes MPI 0, MPI 1 and MPI 2]
Communications are performed via MPI_ALLTOALLV

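The switch between the orbital distribution and the coefficient distribution can be mimicked without MPI. In the sketch below, which is an illustration and not the BigDFT implementation, lists of NumPy arrays stand in for the memory of each MPI process, a block exchange stands in for the all-to-all communication, and a plain sum stands in for the reduction. All function names are hypothetical.

```python
import numpy as np

def orbital_layout(psi, nproc):
    """Each 'process' holds a few complete orbitals (rows of psi)."""
    return np.array_split(psi, nproc, axis=0)

def transpose_layout(blocks, nproc):
    """Mimic the all-to-all exchange: afterwards each 'process' holds a
    slice of the coefficients of every orbital."""
    # pieces[p][q] is what process p would send to process q
    pieces = [np.array_split(b, nproc, axis=1) for b in blocks]
    return [np.vstack([pieces[p][q] for p in range(nproc)])
            for q in range(nproc)]

def overlap_matrix(coeff_blocks):
    """Local matrix products followed by a sum (the 'reduction')."""
    return sum(b @ b.T for b in coeff_blocks)

rng = np.random.default_rng(0)
norb, ncoeff, nproc = 5, 12, 3
psi = rng.standard_normal((norb, ncoeff))   # one orbital per row

by_orbital = orbital_layout(psi, nproc)     # layout for applying the Hamiltonian
by_coeff = transpose_layout(by_orbital, nproc)
S = overlap_matrix(by_coeff)                # overlap matrix <psi_i | psi_j>

print([b.shape for b in by_orbital])
print([b.shape for b in by_coeff])
```

In the coefficient layout every process can form its partial overlap matrix with a single dense matrix product, which is why level-3 BLAS followed by a reduction is a natural fit.
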
OpenMP parallelisation
Innermost parallelisation level
(Almost) any BigDFT operation is parallelised via OpenMP
✓ Useful for memory-demanding calculations
✓ Allows further increase of speedups
✓ Saves MPI processes and intra-node message passing
✗ Less efficient than MPI
✗ Compiler and system dependent
✗ OpenMP sections should be regularly maintained
[Figure: OMP speedup versus number of MPI processes (0 to 120), for 2, 3 and 6 OpenMP threads]

High Performance Computing
Localisation & Orthogonality → Data locality
Principal code operations can be intensively optimised
Optimal for application on supercomputers
Little communication, big packets of data
• No need for a fast network
• Optimal speedup
Efficiency of the order of 90%, up to thousands of processors
[Figure: efficiency (%) versus number of processors (1 to 1000) for systems of 32, 65, 173, 257 and 1025 atoms, with the number of orbitals per processor marked on the data points]

Task repartition for a small system (ZnO, 128 atoms)
[Figure: 1 OpenMP thread per core; percentage of the run time spent in each task (Conv, LinAlg, Comms, CPU, Other), together with the time in seconds (log. scale) and the efficiency (%), versus number of cores (16 to 576)]
Data repartition optimal for hardware accelerators (GPU)
Graphics Processing Units can be used to speed up the computation

Using GPUs in a Big systematic DFT code
Nature of the operations
• Operators approach via convolutions
• Linear algebra due to orthogonality of the basis
• Communications and calculations do not interfere
→ A number of operations which can be accelerated
Evaluating GPU convenience
Three levels of evaluation
1. Bare speedups: GPU kernels vs. CPU routines
Are the operations suitable for GPU?
2. Full code speedup on one process
Amdahl’s law: are there hot-spot operations?
3. Speedup in a (massively?) parallel environment
The MPI layer adds an extra level of complexity

Hybrid and Heterogeneous runs with OpenCL
NVidia S2070 and ATI HD 6970, each connected to a Nehalem workstation
BigDFT may run on both
Sample BigDFT run: Graphene, 4 C atoms, 52 kpts
No. of Flop: 8.053 · 10^12

MPI   GPU        Time (s)   Speedup   GFlop/s
1     NO         6020       1         1.34
1     NV         300        20.07     26.84
4     NV         160        37.62     50.33
1     ATI        347        17.35     23.2
4     ATI        197        30.55     40.87
8     NV + ATI   109        55.23     73.87

Next step: handling of load (un)balancing

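The Speedup and GFlop/s columns of the graphene run follow directly from the wall times and the quoted operation count. The short sketch below recomputes them; the timings and the Flop count are taken from the table, while the function name and the data layout are choices made for this illustration.

```python
FLOP = 8.053e12          # total floating-point operations quoted for the run
runs = [                 # (MPI processes, GPU type, wall time in seconds)
    (1, "NO", 6020.0),
    (1, "NV", 300.0),
    (4, "NV", 160.0),
    (1, "ATI", 347.0),
    (4, "ATI", 197.0),
    (8, "NV + ATI", 109.0),
]

def derived_columns(runs, flop):
    """Return (speedup, GFlop/s) per run; the first run is the reference."""
    t_ref = runs[0][2]
    return [(t_ref / t, flop / t / 1e9) for _, _, t in runs]

for (mpi, gpu, t), (speedup, gflops) in zip(runs, derived_columns(runs, FLOP)):
    print(f"{mpi:>3} {gpu:<9} {t:>7.0f} {speedup:>7.2f} {gflops:>7.2f}")
```

The recomputed values agree with the table to within the last printed digit, which suggests the table entries were truncated rather than rounded in a few places.
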
Configuration space of the cage-like boron clusters
Stabilize the buckyball configuration of B80 systems
PRB 83, 081403(R) (2011)
[Figure: energy comparison of the B80 and B12@B68 structures; panel labels 14:6 and 8:12]
[Figure: Sc3N @ B80 and Sc3N on B12@B68; labelled energies −12.76 eV and −13.55 eV]

Scaling of BigDFT (I)
We can reach systems containing up to a few hundred electrons thanks to wavelet properties and efficient parallelization:
• MPI
• OpenMP
• GPU ported
But the various operations scale differently:
• O(N log N): Poisson solver
• O(N^2): convolutions
• O(N^3): linear algebra
and have different prefactors:
• $c_{O(N^3)} \ll c_{O(N^2)} \ll c_{O(N \log N)}$
[Figure: time per iteration (s) versus number of atoms (0 to 2000)]
→ for bigger systems the O(N^3) term will dominate, so we need a new approach

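How the three scalings compete can be illustrated with a toy cost model. The sketch below is not fitted to any BigDFT timing: the prefactors are hypothetical numbers chosen only so that the cubic term has the smallest prefactor and the N log N term the largest, and the function names are made up for this example.

```python
import math

def iteration_cost(n, c_log=1.0e-3, c_quad=1.0e-5, c_cubic=1.0e-8):
    """Toy cost model: c_log*N*log(N) + c_quad*N^2 + c_cubic*N^3.

    The prefactors are hypothetical and only respect the ordering
    c_cubic << c_quad << c_log.
    """
    return {
        "poisson (N log N)": c_log * n * math.log(n),
        "convolutions (N^2)": c_quad * n**2,
        "linear algebra (N^3)": c_cubic * n**3,
    }

def dominant_term(n, **prefactors):
    """Name of the most expensive operation for a system of size n."""
    costs = iteration_cost(n, **prefactors)
    return max(costs, key=costs.get)

for n in (10, 800, 5000):
    print(n, dominant_term(n))
```

In this model the quadratic and cubic terms cross at N = c_quad / c_cubic, so a small cubic prefactor only postpones the point at which linear algebra takes over.
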
Scaling of BigDFT (II)
To improve the scaling and thus the scope of BigDFT, we must take advantage of the nearsightedness principle:
• the behaviour of large systems is short-ranged, or near-sighted
• the density matrix decays exponentially in insulating systems
• we can take advantage of this locality to create linear-scaling O(N) DFT formalisms by using localized orbitals, e.g. as in ONETEP, Conquest, SIESTA...
• we want to adapt this formalism for BigDFT

Minimal basis and density kernel
Write the KS orbitals as linear combinations of a minimal basis set $\phi_\alpha(\mathbf{r})$:
$\Psi_i(\mathbf{r}) = \sum_\alpha c_i^\alpha \, \phi_\alpha(\mathbf{r})$
The basis functions are:
• localized
• atom-centred
• expanded in wavelets
Define the density matrix $\rho(\mathbf{r}, \mathbf{r}')$ and kernel $K^{\alpha\beta}$:
$\rho(\mathbf{r}, \mathbf{r}') = \sum_i |\Psi_i(\mathbf{r})\rangle \langle \Psi_i(\mathbf{r}')| = \sum_{\alpha,\beta} |\phi_\alpha(\mathbf{r})\rangle K^{\alpha\beta} \langle \phi_\beta(\mathbf{r}')|$

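The relation between the orbital coefficients and the density kernel can be made concrete with small dense matrices. The sketch below is an illustration under stated assumptions: a random symmetric matrix stands in for the Hamiltonian in the minimal basis, a random positive-definite matrix close to the identity stands in for the overlap of the basis functions, and the kernel is built from the lowest orbitals of the generalised eigenproblem. The function name `density_kernel` is hypothetical.

```python
import numpy as np

def density_kernel(H, S, n_occ):
    """Kernel K = C_occ C_occ^T from the generalised problem H c = e S c."""
    # Loewdin orthogonalisation: S^(-1/2) from the eigendecomposition of S
    s_val, s_vec = np.linalg.eigh(S)
    S_inv_half = s_vec @ np.diag(s_val ** -0.5) @ s_vec.T
    e, U = np.linalg.eigh(S_inv_half @ H @ S_inv_half)
    C = S_inv_half @ U            # columns satisfy C^T S C = identity
    C_occ = C[:, :n_occ]          # eigh returns eigenvalues in ascending order
    return C_occ @ C_occ.T

rng = np.random.default_rng(1)
n_basis, n_occ = 8, 3
A = rng.standard_normal((n_basis, n_basis))
H = 0.5 * (A + A.T)                              # stand-in Hamiltonian
B = rng.standard_normal((n_basis, n_basis))
S = np.eye(n_basis) + 0.1 * B @ B.T / n_basis    # positive-definite overlap

K = density_kernel(H, S, n_occ)
print("trace(K S) =", np.trace(K @ S))
print("idempotency error =", np.max(np.abs(K @ S @ K - K)))
```

The two printed quantities check that the kernel carries the right number of occupied orbitals and that it is idempotent with respect to the overlap, K S K = K.
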
The algorithm
Minimal basis set
• Initialized to atomic orbitals, optimized using a quartic confining potential subject to an orthogonality constraint
• Minimize the ‘trace’, the band structure energy or a combination at fixed potential
• SD or DIIS with kinetic energy preconditioning
Density kernel
Three methods for self-consistent optimization:
• Diagonalization
• Direct minimization
• Fermi operator expansion (linear-scaling)
[Flowchart boxes: Initial guess: atomic orbitals; Basis optimization; Kernel optimization: diagonalize Hamiltonian; Update potential; Forces]

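The idea behind a Fermi operator expansion can be sketched with dense matrices: the Fermi function of the Hamiltonian is approximated by a Chebyshev polynomial, which needs only matrix products and no diagonalisation. The example below is a minimal illustration, not the BigDFT implementation. It assumes a random symmetric matrix as Hamiltonian, Gershgorin bounds for the spectrum, and a chemical potential and inverse temperature expressed in the scaled variable. A linear-scaling code would use sparse matrices instead of dense ones.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def fermi(x, mu, beta):
    """Fermi function in the scaled variable x in [-1, 1]."""
    return 1.0 / (1.0 + np.exp(beta * (x - mu)))

def foe_density_matrix(H, mu=0.0, beta=10.0, degree=80):
    """Chebyshev expansion of the Fermi function of a symmetric matrix H."""
    # Gershgorin bounds on the spectrum, so no diagonalisation is needed
    radius = np.sum(np.abs(H), axis=1) - np.abs(np.diag(H))
    e_min = np.min(np.diag(H) - radius)
    e_max = np.max(np.diag(H) + radius)
    # Map the spectrum into [-1, 1]
    X = (2.0 * H - (e_max + e_min) * np.eye(len(H))) / (e_max - e_min)
    coeffs = cheb.chebinterpolate(fermi, degree, args=(mu, beta))
    # Three-term recurrence T_{k+1}(X) = 2 X T_k(X) - T_{k-1}(X)
    T_prev, T_curr = np.eye(len(H)), X
    F = coeffs[0] * T_prev + coeffs[1] * T_curr
    for c in coeffs[2:]:
        T_prev, T_curr = T_curr, 2.0 * X @ T_curr - T_prev
        F += c * T_curr
    return F, X

rng = np.random.default_rng(2)
A = rng.standard_normal((12, 12))
H = 0.5 * (A + A.T)
F, X = foe_density_matrix(H)

# Reference obtained by explicit diagonalisation, for comparison only
w, V = np.linalg.eigh(X)
F_exact = (V * fermi(w, 0.0, 10.0)) @ V.T
print("max error:", np.max(np.abs(F - F_exact)))
```

The polynomial degree needed grows with the inverse temperature times the spectral width, which is the main cost parameter of the method.
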
Scaling
[Figure: time per iteration (s) and memory (GB) versus number of atoms (up to 8000) for the Cubic, Dmin and FOE methods. Fits of the form ax^3 + bx^2 + cx, with coefficients (a; b; c) listed as 1.0e+00; 3.0e+03; 1.5e+06, then 4.6e-02; 2.2e+02; 1.4e+06, then 7.6e-03; 4.3e-03; 1.4e+06]
Improved time and memory scaling
• 301 MPI, 8 OpenMP for above results
• ∼ O(N) for FOE
• need sparse matrix algebra for direct minimisation → O(N)

Parallelization – MPI
[Figure: speedup with respect to 160 cores and efficiency (%) versus number of cores (MPI tasks x 4 OMP threads, up to 4000), showing the measured speedup, the ideal speedup and Amdahl’s law with P = 0.97; 960 atoms]
Efficient parallelization for 1000s of CPUs
• ∼ 97% of code parallelized with MPI
• ∼ 93% of code parallelized with OpenMP

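The Amdahl’s law curve quoted with P = 0.97 can be reproduced with a few lines. The sketch below rests on an interpretation that the slide does not state explicitly: since the speedup is reported with respect to 160 cores, the law is applied to the ratio of core counts n / 160, with the 160-core run taken as the reference. The function names are hypothetical.

```python
def amdahl_speedup(ratio, parallel_fraction):
    """Amdahl's law for a resource ratio relative to the reference run."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / ratio)

def efficiency(ratio, parallel_fraction):
    """Speedup divided by the ideal speedup (the ratio itself)."""
    return amdahl_speedup(ratio, parallel_fraction) / ratio

P = 0.97          # parallel fraction quoted in the figure
N_REF = 160       # reference number of cores (assumed unit of the curve)

for n_cores in (160, 640, 1600, 3200):
    r = n_cores / N_REF
    print(f"{n_cores:>5} cores: speedup {amdahl_speedup(r, P):6.2f}, "
          f"efficiency {100 * efficiency(r, P):5.1f}%")
```

Whatever the number of cores, the speedup in this model stays below 1 / (1 - P), about 33 for P = 0.97, which is why the remaining serial fraction matters so much at large scale.
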
Accuracy: Total energies and forces
Several systems, 44 – 450 atoms
• compared with standard BigDFT
• total energy accuracy: ∼ 1 meV/atom
• force accuracy: ∼ a few meV/bohr

Geometry optimization
[Figure: 288 atoms; RMS force (eV/Å), time (min.), cumulative time (min.) and ∆R (Å) along the optimization, comparing the Cubic, Reset and Reformat schemes]
Further savings through support function reuse
• Pulay forces not needed
• lower prefactor than single point

Energy differences
Silicon cluster with a vacancy defect
• 291 atoms
• need 9 support functions per Si
• accuracy in defect energy of 12 meV

                 pristine (eV)   vacancy (eV)   ∆ (eV)    ∆ − ∆cubic (meV)
cubic            −20674.2        −20563.1       111.167   –
4/1 (sp-like)    −20667.6        −20556.5       111.038   129
9/1 (spd-like)   −20672.9        −20561.7       111.155   12

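The last column of the vacancy table is simply the deviation of each energy difference ∆ from the cubic-scaling reference, converted to meV. The snippet below redoes that arithmetic with the ∆ values from the table; the dictionary layout and the function name are choices made for this illustration.

```python
# Energy differences (eV) between the vacancy and pristine clusters
delta_ev = {
    "cubic": 111.167,
    "4/1 (sp-like)": 111.038,
    "9/1 (spd-like)": 111.155,
}

def error_vs_cubic_mev(delta_ev, reference="cubic"):
    """Absolute deviation of each energy difference from the reference, in meV."""
    ref = delta_ev[reference]
    return {name: abs(value - ref) * 1000.0
            for name, value in delta_ev.items() if name != reference}

errors = error_vs_cubic_mev(delta_ev)
for name, err in errors.items():
    print(f"{name:<15} {err:6.1f} meV")
```

Although the absolute total energies differ from the cubic reference by more than 1 eV, the error in the energy difference with 9 support functions per atom is two orders of magnitude smaller, which is the quantity of physical interest here.
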
Fragment approach
• By dividing a system into fragments, we can avoid optimizing the support functions entirely for large systems
• Substantially reduces the cost (needs an efficient reformatting)
• Many useful applications, including the explicit treatment of solvents
• A necessary first step towards a tight-binding-like approach, with each atom as a fragment
Reformatting the minimal basis set in the same grid

Support function reuse
As we have seen, the reuse of support functions lends itself to some interesting applications, e.g. geometry optimizations and charged systems.
For this, we need an accurate and efficient reformatting scheme.
[Figure: E − Ecubic (meV/atom, logarithmic scale from 0.001 to 100) for the cubic eggbox, linear eggbox and linear template cases]

Transfer integrals and site energies
[Figure: four panels of energy (eV) versus rotation angle (0 to 90): J12 for the LUMO, J12 for the HOMO, e1,2 for the LUMO and e1,2 for the HOMO, each with curves labelled DZ, TZP and TMB]
BigDFT compared to the ADF fragment approach
• support functions from molecules reused in dimers
• Application to OLEDs (Organic Light-Emitting Diodes)

Transfer integrals for OLEDs (I)
We want to consider environment effects in realistic ‘host-guest’ morphologies: 6192 atoms, 100 molecules
[Figure: host molecule and guest molecule]

Transfer integrals for OLEDs (II)
Transfer integrals (left) and site energies (right) calculated with (bottom) and without (top) constrained DFT
[Figure: distributions of the transfer integrals (energy from −0.06 to 0.06 eV) for host-host, guest-guest and host-guest pairs, and of the site energies (eV) for host, guest and all molecules]
Generate good statistics.
Then use the Marcus model to calculate the efficiency of OLEDs.

Perspectives of linear scaling
Combination of wavelets and localized basis functions
• Highly accurate (energies and forces) and efficient
• Linear scaling – thousands of atoms
• Ideal for massively parallel calculations
• Flexible, e.g. reuse of support functions to accelerate geometry optimizations and for charged systems
Fragment approach
• Accurate reformatting scheme – can accelerate calculations on large systems, e.g. explicit treatment of solvents
• Wide range of applications: constrained DFT, transfer integrals
• Crucial for future developments, e.g. tight-binding, embedded systems (QM/QM)

Conclusions
Wavelets: Powerful formalism
• Linear scaling – thousands of atoms
• Accurate reformatting scheme – can accelerate calculations on large systems, e.g. explicit treatment of solvents
• Wide range of applications: constrained DFT, transfer integrals
• Towards a multi-scale approach (embedded systems)
Beyond DFT
• Resonant states are the way to fully describe a system
• They give a compact description (few states) of the systems
• Linear-response, TD-DFT, ...
• Many-body perturbation theory (GW, ...)

Acknowledgments
Main maintainer: Luigi Genovese
Group of Stefan Goedecker: A. Ghazemi, A. Willand, S. De, A. Sadeghi, N. Dugan (Basel University)
Link with ABINIT and bindings: Damien Caliste (CEA-Grenoble)
Order N methods: S. Mohr, L. Ratcliff, P. Boulanger (Néel Institute, CNRS)
Exploring the PES: N. Mousseau (Montreal University)
Resonant states: A. Cerioni, A. Mirone (ESRF, Grenoble), I. Duchemin (CEA), M. Morinière (CEA)
Optimized convolutions: B. Videau (LIG, computer scientist, Grenoble)
Applications: P. Pochet, E. Machado, S. Krishnan, D. Timerkaeva (CEA-Grenoble)