BigDFT tutorial

Transcription

BigDFT tutorial
BigDFT tutorial
F IRST YARMOUK S CHOOL
Introduction
Inputs
BigDFT tutorial
DFT
Pseudopotential
Basis set
SCF
Thierry Deutsch
Performances
Parallel
GPU
L_Sim - CEA Grenoble
Conclusion
November, 2. 2010
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Outline
1
Introduction
Inputs
2
Density Functional Theory (quick view)
Pseudopotential
Basis set
SCF
3
Performances
Parallel
GPU
4
Conclusion
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
A basis for nanosciences: the BigDFT project
STREP European project: BigDFT(2005-2008)
Four partners, 15 contributors:
CEA-INAC Grenoble, U. Basel, U. Louvain-la-Neuve, U. Kiel
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Aim: To develop an ab-initio DFT code
based on Daubechies Wavelets, to be
integrated in ABINIT, distributed freely
(GNU-GPL license)
Parallel
GPU
Conclusion
L. Genovese, A. Neelov, S. Goedecker, T. Deutsch, et al.,
“Daubechies wavelets as a basis set for density functional pseudopotential calculations”,
J. Chem. Phys. 129, 014109 (2008)
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
BigDFT version 1.3: capabilities
http://inac.cea.fr/L_Sim/BigDFT
Free, surface and periodic boundary conditions
Local geometry optimizations (with constraints)
Global geometry optimization
Introduction
Inputs
Saddle point searches
Vibrations
DFT
Pseudopotential
External electric fields
Basis set
SCF
Performances
Parallel
Born-Oppenheimer MD
Unoccupied KS orbitals
GPU
Conclusion
Collinear and Non-collinear magnetism
All XC functionals of the ABINIT package
Empirical van der Waals interactions
Also available from within the ABINIT package
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
BigDFT version 1.4 (June 2010)
http://inac.cea.fr/L_Sim/BigDFT
Better performance for periodic systems (convolutions)
Exact exchange
Introduction
Inputs
DFT
Pseudopotential
Hybrid functionals (libXC, G2-1 tests)
Molecular Dynamics from ABINIT
K points (periodic)
Basis set
SCF
Performances
Parallel
Symmetries from ABINIT
Preliminary version for XANES
GPU
Conclusion
GPU version with S_GPU
BigDFT/ABINIT training workshop: June 30 – Juny 1
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Spirit of BigDFT
Simple for developers: no parser,
use input files with mandatory parameters
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Few input files
Optional input files
as input.kpt (for periodic systems) or input.perf
Few input parameters
ex. not parameters for parallel calculations
Conclusion
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
BigDFT: Executables
bigdft Single point calculation, geometry
optimisation, molecular dynamics
Execute simply: bigdft
Introduction
Inputs
DFT
memguess ’number_of_cores’ Check input files and
determine the memory amount
Pseudopotential
Basis set
SCF
Performances
Parallel
frequencies Vibration properties
abscalc XANES calculations
GPU
Conclusion
bart BigDFT + ART (global minima search)
NEB Determination of saddle point
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
BigDFT: input files
posinp.xyz Atomic positions + boundary conditions
input.dft Everything about DFT calculations
psppar.N Pseudopotential file with the name of atom
(ABINIT format)
Introduction
Inputs
DFT
Pseudopotential
Optional files
Basis set
SCF
Performances
Parallel
input.geopt Geometry optimisation or molecular dynamics
parameters
GPU
Conclusion
input.freq Frequencies calculations
input.perf Options for performances
input.kpt K-points mesh for periodic systems
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Atomic positions: posinp.xyz or posinp.ascii
units reduced, angstroem, bohr or atomic
Boundary conditions periodic, surface
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
8 reduced
periodic 10.26085 10.26085
10.26085
Si 0. 0. 0.
Si 0.5 0.5 0.
Si 0.5 0. 0.5
Si 0. 0.5 0.5
Si 0.25 0.25 0.25
Si 0.75 0.75 0.25
Si 0.75 0.25 0.75
Si 0.25 0.75 0.75
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Atomic positions: posinp.xyz or posinp.ascii
units reduced, angstroem, bohr or atomic
Boundary conditions periodic, surface
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
3 angstroem
surface 3.0 0.0 3.0
O 1.5 0.0 1.5
H .7285 0.620919 1.5
H 2.2715 0.620919 1.5
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Atomic positions: posinp.xyz or posinp.ascii
units reduced, angstroem, bohr or atomic
Boundary conditions periodic, surface
Introduction
Inputs
5 atomic
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
Si 1.62 1.62 1.62
H 3.24 3.24 3.24
H 0 0 3.24
H 3.24 0 0
H 0 3.24 0
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Kohn-Sham formalism
H-K theorem: E is an unknown functional of the density
E = E [ρ] → Density Functional Theory
Kohn-Sham approach
Introduction
Mapping of an interacting many-electron system into a
system with independent particles moving into an effective
potential.
Inputs
DFT
Pseudopotential
Basis set
Find a set of orthonormal orbitals Ψi (r) that minimizes:
SCF
Performances
Parallel
GPU
Conclusion
E =−
1
N /2 Z
∑
2 i =1
Ψ∗i (r)∇2 Ψi (r)dr +
1
Z
2
ρ(r)VH (r)dr
Z
+ Exc [ρ(r)] +
Vext (r)ρ(r)dr
N /2
ρ(r) = 2 ∑ Ψ∗i (r)Ψi (r)
∇2 VH (r) = −4πρ(r)
i =1
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Performing a DFT calculation
A self-consistent equation
ρ(r) = ∑ Ψ∗i (r)Ψi (r), where |ψi i satisfies
i
1 2
− ∇ + VH [ρ] + Vxc [ρ] + Vext + Vpseudo |ψi i = Ei |ψi i ,
2
Introduction
Inputs
DFT
(Kohn-Sham) DFT “Ingredients”
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
An XC potential, functional of the density
several approximations exists (LDA,GGA,. . . )
A choice of the pseudopotential (if not all-electrons)
(norm conserving, ultrasoft, PAW,. . . )
A basis set for expressing the |ψi i
An (iterative) algorithm for finding the wavefunctions |ψi i
A (good) computer. . .
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Exchange-Correlation Energy (input.dft)
Exc [ρ] = Kexact [ρ] − KKS [ρ] + Ve−e exact [ρ] − VHartree [ρ]
KKS [ρ]: Kinetic energy for a non-interaction electron gas
Introduction
Inputs
DFT
Pseudopotential
Basis set
Kohn-Sham formalism is exact (mapping of an
interacting electron system into a non-interacting
system)
SCF
Performances
Parallel
Many types from
GPU
Conclusion
ABINIT: Use same codes
(ixc: LDA=1, PBE=11, see manual)
libXC: − ’code for exchange’ ’code for correlation’
(−101130 see manual)
Hybrid functionals from libXC
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Performing a DFT calculation
A self-consistent equation
ρ(r) = ∑ Ψ∗i (r)Ψi (r), where |ψi i satisfies
i
1 2
− ∇ + VH [ρ] + Vxc [ρ] + Vext + Vpseudo |ψi i = Ei |ψi i ,
2
Introduction
Inputs
DFT
(Kohn-Sham) DFT “Ingredients”
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
An XC potential, functional of the density
several approximations exists (LDA,GGA,. . . )
A choice of the pseudopotential (if not all-electrons)
(norm conserving, ultrasoft, PAW,. . . )
A basis set for expressing the |ψi i
An (iterative) algorithm for finding the wavefunctions |ψi i
A (good) computer. . .
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Gaussian type separable Pseudopotentials (HGH)
Local part
Vloc (r )
−Zion
=
r
r
"
1
erf √
+ exp −
2
2rloc
(
C1 + C2
Introduction
r
rloc
2
+ C3
r
2 #
rloc
4
r
+ C4
rloc
r
6 )
rloc
Inputs
DFT
Pseudopotential
Nonlocal (separable) part H (~r ,~r 0 )
Basis set
2
SCF
Performances
H sep (~r ,~r 0 )
=
∑ ∑ Ys,m (r̂ ) pis (r ) his pis (r 0 ) Ys∗,m (rˆ0 )
i =1 m
Parallel
GPU
+
Conclusion
∑ Yp,m (r̂ ) p1p (r ) h1p p1p (r 0 ) Yp∗,m (rˆ0 )
m
p1l (r )
=
√
2
rle
l + 23
rl
− 12 ( rr
l
)2
q
Γ(l + 32 )
Laboratoire de Simulation Atomistique
p2l (r )
http://inac.cea.fr/L_Sim
=
√
2
r l +2 e
l+ 7
rl 2
− 12 ( rr )2
l
q
Γ(l + 72 )
T. Deutsch
The nonlocal part of the pseudopotential
H sep Ψ = |P >< P |Ψ >
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
If the expansion coefficients cJ of P in a Daubechies basis
are known then
!
!
Z
< P |Ψ >=
cI φ(x − I )
ΨJ φ(x − J ) dx = cJ ΨJ
∑
I
∑
J
∑
J
Because a Gaussian is separable and a 3-dim scaling basis
is a product of 1-dim scaling functions/wavelets,
ci ,j ,k can easily be calculated with machine precision.
Z Z Z
ci ,j ,k =
φki (x )φkj (y )φkk (z ) exp(−(x 2 + y 2 + z 2 )) dx dy dz
Z
=
φki (x ) exp(−x 2 ) dx
Z
Z
× φki (y ) exp(−y 2 ) dy φki (z ) exp(−z 2 ) dz
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Pseudopotential file: psppar.N
In the same directory as the calculation.
Large library:
http://www.abinit.org/downloads/psp-links
Accurate and transferable
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
Goedecker pseudopotential for N
7 5 070301 zatom,zion,pspdat
10 11 1 2 2001 0 pspcod,pspxc,lmax,lloc,mmax,r2well
0.28379051 2 -12.41522559 1.86809592 rloc nloc c1 c2
2 nnonloc
0.25540500 1 13.63026257 rs ns hs11
0.24549453 1 0.00000000 rp np hp11
0.00414750 kp11
Save memory: projectors calculated on-the-fly
(cost < 1%, experimental option)
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Performing a DFT calculation
A self-consistent equation
ρ(r) = ∑ Ψ∗i (r)Ψi (r), where |ψi i satisfies
i
1 2
− ∇ + VH [ρ] + Vxc [ρ] + Vext + Vpseudo |ψi i = Ei |ψi i ,
2
Introduction
Inputs
DFT
(Kohn-Sham) DFT “Ingredients”
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
An XC potential, functional of the density
several approximations exists (LDA,GGA,. . . )
A choice of the pseudopotential (if not all-electrons)
(norm conserving, ultrasoft, PAW,. . . )
A basis set for expressing the |ψi i
An (iterative) algorithm for finding the wavefunctions |ψi i
A (good) computer. . .
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Basis sets in real space: Wavelet basis sets
Properties of wavelet basis sets:
localized both in real
and in Fourier space
allow for adaptivity
(for core electrons)
Introduction
Inputs
are a systematic basis set
DFT
1.5
φ(x)
Pseudopotential
Basis set
1
ψ(x)
SCF
0.5
Performances
Parallel
0
GPU
-0.5
Conclusion
-1
-1.5
-6
-4
-2
0
2
4
6
8
x
A basis set both adaptive and systematic, real space based
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
A brief description of wavelet theory
Two kind of basis functions
A Multi-Resolution real space basis
The functions can be classified following the resolution level
they span.
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Scaling Functions
The functions of low resolution level are a linear combination
of high-resolution functions
Performances
=
Parallel
+
GPU
m
Conclusion
φ(x ) =
∑
hj φ(2x − j )
j =−m
Centered on a resolution-dependent grid: φj = φ0 (x − j ).
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
A brief description of wavelet theory
Wavelets
They contain the DoF needed to complete the information
which is lacking due to the coarseness of the resolution.
=
Introduction
Inputs
DFT
Pseudopotential
1
2
+ 12
m
m
φ(2x ) =
∑
h̃j φ(x − j ) +
∑
g̃j ψ(x − j )
j =−m
j =−m
Basis set
SCF
Performances
Increase the resolution without modifying grid space
Parallel
GPU
SF + W = same DoF of SF of higher resolution
Conclusion
m
ψ(x ) =
∑
gj φ(2x − j )
j =−m
All functions have compact support, centered on grid points.
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Adaptivity of the mesh
Adaptivity
Resolution can be refined following the grid point.
The grid is divided in:
Low resolution pts
(SF, 1 DoF)
Introduction
High resolution pts
(SF + W, 8 DoF)
Inputs
DFT
Pseudopotential
Basis set
Points of different resolution
belong to the same grid.
SCF
Performances
Parallel
GPU
Conclusion
Data can thus be stored in compressed form.
Localization property
Empty regions must not be “filled” with basis functions.
Optimal for big inhomogeneous systems, O (N ) approach.
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Adaptivity of the mesh
Atomic positions (H2 O), hgrid
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Adaptivity of the mesh
Fine grid (high resolution, frmult)
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Adaptivity of the mesh
Coarse grid (low resolution, crmult)
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
input.dft: hgrid, crmult, frmult
hgrid Step grid: twice as bigger than the lowest
gaussian coefficient of pseudopotential
Roughly 0.45 (only orthorhombic mesh)
Introduction
Inputs
DFT
Pseudopotential
Basis set
crmult Extension of the coarse resolution (factor 6.):
depends on the extension of the HOMO for the
atom
SCF
Performances
Parallel
GPU
frmult Extension of the fine resolution (factor 8.):
depends on the pseudopotential coefficient
Conclusion
In practice, we use only hgrid from 0.3 to 0.55.
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
No integration error
Orthogonality, scaling relation
Z
dx φk (x )φj (x ) = δkj
Introduction
Inputs
DFT
Pseudopotential
1
φ(x ) = √
m
∑
2 j =−m
hj φ(2x − j )
The hamiltonian-related quantities can be calculated up to
machine precision in the given basis.
14
The accuracy is only limited by the basis set (O hgrid
)
Basis set
SCF
Performances
Parallel
GPU
Exact evaluation of kinetic energy
Obtained by convolution with filters:
Conclusion
f (x ) =
∑ c` φ` (x ) ,
`
c̃` =
∑ cj a`−j ,
∇2 f (x ) = ∑ c̃` φ` (x ) ,
`
a` ≡
Z
φ0 (x )∂2x φ` (x ) ,
j
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Separability in 3D
The 3-dim scaling basis is a tensor product decomposition of
1-dim Scaling Functions/ Wavelets.
e ,e ,e
e
φjxx,jy ,yjz z (x , y , z ) = φejxx (x )φjyy (y )φejzz (z )
Introduction
Inputs
With (jx , jy , jz ) the node coordinates,
(0)
φj
(1)
and φj
the SF and the W respectively.
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
Gaussians and wavelets
The separability of the basis allows us to save computational
time when performing scalar products with separable
functions (e.g. gaussians):
Initial wavefunctions (input guess)
Poisson solver
Non-local pseudopotentials
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Wavelet families used in BigDFT code
Daubechies f (x ) = ∑` c` φ` (x )
R
Orthogonal c` = dx φ` (x )f (x )
LEAST ASYMMETRIC DAUBECHIES-16
1.5
Interpolating f (x ) = ∑j fj ϕj (x )
Dual to dirac deltas fj = f (j )
INTERPOLATING (DESLAURIERS-DUBUC)-8
1
scaling function
wavelet
scaling function
wavelet
1
0.5
0.5
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
0
0
-0.5
-0.5
-1
-1.5
-6
-4
-2
0
2
4
6
Used for wavefunctions,
scalar products
-1
-4
-2
0
2
4
Used for charge density,
function products
Conclusion
Magic Filter method (A.Neelov, S. Goedecker)
The passage between the two basis sets can be performed
without losing accuracy
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Performing a DFT calculation
A self-consistent equation
ρ(r) = ∑ Ψ∗i (r)Ψi (r), where |ψi i satisfies
i
1 2
− ∇ + VH [ρ] + Vxc [ρ] + Vext + Vpseudo |ψi i = Ei |ψi i ,
2
Introduction
Inputs
DFT
(Kohn-Sham) DFT “Ingredients”
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
An XC potential, functional of the density
several approximations exists (LDA,GGA,. . . )
A choice of the pseudopotential (if not all-electrons)
(norm conserving, ultrasoft, PAW,. . . )
A basis set for expressing the |ψi i
An (iterative) algorithm for finding the wavefunctions |ψi i
A (good) computer. . .
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Kohn-Sham Equations: Operators
Apply different operators
Having a one-electron hamiltonian in a mean field.
Introduction
Inputs
1 2
− ∇ + Veff (r )(r ) ψi = εi ψi
2
DFT
Basis set
Veff (r ) = Vext (r ) +
SCF
Performances
Parallel
GPU
Conclusion
ρ(r 0 ) 0
dr + µxc (r )
V |r − r 0 |
Z
Pseudopotential
E [ρ] can be expressed by the orthonormalized states of one
particule: ψi (r ) with the fractional occupancy number fi
( 0 ≤ f i ≤ 1) :
ρ(r ) = 2 ∑ fi |ψi (r )|2
i
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Kohn-Sham Equations: Computing Energies
Calculate different integrals
E [ρ] = K [ρ] + U [ρ]
Introduction
K [ρ] = −
Inputs
DFT
2 ~2
2 me
Z
∑
i
V
dr ψ∗i ∇2 ψi
Pseudopotential
Basis set
SCF
Performances
Parallel
Z
U [ρ] =
V
dr Vext (r ) ρ(r )+
1
2 V
|
GPU
Z
ρ(r )ρ(r 0 )
+
E [ρ]
| xc{z }
|r − r 0 |
{z
} exchange−correlation
dr dr 0
Hartree
Conclusion
We minimise with the variables ψi (r ) and fi
Z
with the constraint
V
Laboratoire de Simulation Atomistique
dr ρ(r ) = Nel .
http://inac.cea.fr/L_Sim
T. Deutsch
KS Equations: Self-Consistent Field
−
Set of self-consistent equations:
1 ~2
2 me
2
∇ + Veff
ψi = εi ψi
with an effective potential:
Introduction
Inputs
Veff (r ) = Vext (r ) +
Z
V
DFT
Pseudopotential
|
Basis set
ρ(r 0 )
|r − r 0 |
{z
}
dr 0
Hartree
+
δExc
δρ(r )
| {z }
exchange−correlation
SCF
Performances
Parallel
and:
GPU
ρ(r ) = 2 ∑i fi |ψi (r )|2
Conclusion
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
KS Equations: Self-Consistent Field
−
Set of self-consistent equations:
1 ~2
2 me
2
∇ + Veff
ψi = εi ψi
with an effective potential:
Introduction
Inputs
Veff (r ) = Vext (r ) +
Z
V
DFT
Pseudopotential
|
Basis set
ρ(r 0 )
|r − r 0 |
{z
}
dr 0
Hartree
+
δExc
δρ(r )
| {z }
exchange−correlation
SCF
Performances
Parallel
and:
GPU
ρ(r ) = 2 ∑i fi |ψi (r )|2
Conclusion
Poisson Equation:
∆VHartree = ρ
Laboratoire de Simulation Atomistique
2
2
2
(Laplacian: ∆ = ∂∂x 2 + ∂∂y 2 + ∂∂z 2 )
http://inac.cea.fr/L_Sim
T. Deutsch
KS Equations: Self-Consistent Field
−
Set of self-consistent equations:
1 ~2
2 me
2
∇ + Veff
ψi = εi ψi
with an effective potential:
Introduction
Inputs
Veff (r ) = Vext (r ) +
Z
V
DFT
Pseudopotential
|
Basis set
ρ(r 0 )
|r − r 0 |
{z
}
dr 0
Hartree
+
δExc
δρ(r )
| {z }
exchange−correlation
SCF
Performances
Parallel
and:
GPU
ρ(r ) = 2 ∑i fi |ψi (r )|2
Conclusion
Poisson Equation:
∆VHartree = ρ
Laboratoire de Simulation Atomistique
2
2
2
(Laplacian: ∆ = ∂∂x 2 + ∂∂y 2 + ∂∂z 2 )
http://inac.cea.fr/L_Sim
T. Deutsch
KS Equations: Self-Consistent Field
−
Set of self-consistent equations:
1 ~2
2 me
2
∇ + Veff
ψi = εi ψi
with an effective potential:
Introduction
Inputs
Veff (r ) = Vext (r ) +
Z
V
DFT
Pseudopotential
|
Basis set
ρ(r 0 )
|r − r 0 |
{z
}
dr 0
Hartree
+
δExc
δρ(r )
| {z }
exchange−correlation
SCF
Performances
Parallel
and:
GPU
ρ(r ) = 2 ∑i fi |ψi (r )|2
Conclusion
Poisson Equation:
∆VHartree = ρ
Laboratoire de Simulation Atomistique
2
2
2
(Laplacian: ∆ = ∂∂x 2 + ∂∂y 2 + ∂∂z 2 )
http://inac.cea.fr/L_Sim
T. Deutsch
KS Equations: Self-Consistent Field
−
Set of self-consistent equations:
1 ~2
2 me
2
∇ + Veff
ψi = εi ψi
with an effective potential:
Introduction
Inputs
Veff (r ) = Vext (r ) +
Z
V
DFT
Pseudopotential
|
Basis set
ρ(r 0 )
|r − r 0 |
{z
}
dr 0
Hartree
+
δExc
δρ(r )
| {z }
exchange−correlation
SCF
Performances
Parallel
and:
GPU
ρ(r ) = 2 ∑i fi |ψi (r )|2
Conclusion
Poisson Equation:
∆VHartree = ρ
Laboratoire de Simulation Atomistique
2
2
2
(Laplacian: ∆ = ∂∂x 2 + ∂∂y 2 + ∂∂z 2 )
http://inac.cea.fr/L_Sim
T. Deutsch
KS Equations: Self-Consistent Field
−
Set of self-consistent equations:
1 ~2
2 me
2
∇ + Veff
ψi = εi ψi
with an effective potential:
Introduction
Inputs
Veff (r ) = Vext (r ) +
Z
V
DFT
Pseudopotential
|
Basis set
ρ(r 0 )
|r − r 0 |
{z
}
dr 0
Hartree
+
δExc
δρ(r )
| {z }
exchange−correlation
SCF
Performances
Parallel
and:
GPU
ρ(r ) = 2 ∑i fi |ψi (r )|2
Conclusion
Poisson Equation:
∆VHartree = ρ
Laboratoire de Simulation Atomistique
2
2
2
(Laplacian: ∆ = ∂∂x 2 + ∂∂y 2 + ∂∂z 2 )
http://inac.cea.fr/L_Sim
T. Deutsch
Direct Minimisation: Flowchart
ψi = ∑ cai φa
Basis: adaptive mesh (0, . . . , Nφ )
a
Orthonormalized
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Direct Minimisation: Flowchart
ψi = ∑ cai φa
Basis: adaptive mesh (0, . . . , Nφ )
a
Orthonormalized
inv FWT + Magic Filter
occ
Introduction
ρ(r ) = ∑ |ψi (r )|2
Fine non-adaptive mesh: < 8Nφ
i
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Direct Minimisation: Flowchart
ψi = ∑ cai φa
Basis: adaptive mesh (0, . . . , Nφ )
a
Orthonormalized
inv FWT + Magic Filter
occ
ρ(r ) = ∑ |ψi (r )|2
Introduction
Fine non-adaptive mesh: < 8Nφ
i
Inputs
DFT
Poisson solver
Pseudopotential
Basis set
SCF
Performances
Vxc [ρ(r )]
VH (r ) =
R
G(.)ρ
VNL ({ψi })
Veffective
Parallel
GPU
Conclusion
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Direct Minimisation: Flowchart
ψi = ∑ cai φa
Basis: adaptive mesh (0, . . . , Nφ )
a
Orthonormalized
inv FWT + Magic Filter
occ
ρ(r ) = ∑ |ψi (r )|2
Introduction
Fine non-adaptive mesh: < 8Nφ
i
Inputs
DFT
Poisson solver
Pseudopotential
Basis set
SCF
Vxc [ρ(r )]
Performances
VH (r ) =
R
G(.)ρ
VNL ({ψi })
Veffective
Parallel
GPU
Conclusion
− 12 ∇2
Kinetic Term
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Direct Minimisation: Flowchart
ψi = ∑ cai φa
Basis: adaptive mesh (0, . . . , Nφ )
a
Orthonormalized
inv FWT + Magic Filter
occ
ρ(r ) = ∑ |ψi (r )|2
Introduction
Fine non-adaptive mesh: < 8Nφ
i
Inputs
DFT
Poisson solver
Pseudopotential
Basis set
SCF
Performances
Vxc [ρ(r )]
VH (r ) =
Parallel
G(.)ρ
VNL ({ψi })
Veffective
FWT
GPU
Conclusion
R
− 12 ∇2 FWT
δcai = −
∂Etotal
+ Λij caj
∂ci∗ (a) ∑
j
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
Kinetic Term
Λij =< ψi |H |ψj >
T. Deutsch
Direct Minimisation: Flowchart
ψi = ∑ cai φa
Basis: adaptive mesh (0, . . . , Nφ )
a
Orthonormalized
inv FWT + Magic Filter
occ
ρ(r ) = ∑ |ψi (r )|2
Introduction
Fine non-adaptive mesh: < 8Nφ
i
Inputs
DFT
Poisson solver
Pseudopotential
Basis set
SCF
Performances
VH (r ) =
Vxc [ρ(r )]
Parallel
G(.)ρ
VNL ({ψi })
Veffective
FWT
GPU
Conclusion
R
− 12 ∇2 FWT
δcai = −
∂Etotal
+ Λij caj
∂ci∗ (a) ∑
j
Kinetic Term
Λij =< ψi |H |ψj >
preconditioning
new ,i
ca
= cai + hstep δcai
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
Steepest Descent,
DIIS
T. Deutsch
Direct Minimisation: Flowchart
ψi = ∑ cai φa
Basis: adaptive mesh (0, . . . , Nφ )
a
Orthonormalized
inv FWT + Magic Filter
occ
ρ(r ) = ∑ |ψi (r )|2
Introduction
Fine non-adaptive mesh: < 8Nφ
i
Inputs
DFT
Poisson solver
Pseudopotential
Basis set
SCF
Performances
VH (r ) =
Vxc [ρ(r )]
Parallel
G(.)ρ
VNL ({ψi })
Veffective
FWT
GPU
Conclusion
R
− 12 ∇2 FWT
δcai = −
Stop when δcai small
new ,i
ca
∂Etotal
+ Λij caj
∂ci∗ (a) ∑
j
Kinetic Term
Λij =< ψi |H |ψj >
preconditioning
= cai + hstep δcai
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
Steepest Descent,
DIIS
T. Deutsch
input.dft: idsx DIIS history
Use DIIS algorithm
Direct Invesion of Iterative Subspace
Introduction
Inputs
DFT
Pseudopotential
Use idsx=6 previous iterations Memory-consuming
(decrease if too big=)
Steepest descent if idsx=0
Basis set
SCF
Performances
Parallel
itermax=50 Specify maximum number of iterations
Typically 15 iterations to converge
GPU
Conclusion
gnrm_cv=1.e-4 convergence criterion
Residue |δψ|2 < gnrm_cv
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Description of the file input.dft
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
0.450 0.450 0.450
hx, hy, hz
5.0 8.0
crmult, frmult, coarse and fine radius
1
ixc: XC parameter (LDA=1,PBE=11)
0 0.0
ncharge: charge of the system, Electric field
10
nspin=1 non-spin polarization, total magnetic moment
1.E-04
gnrm_cv: convergence criterion gradient
50 10
itermax, nrepmax
76
# CG (preconditioning), length of diis history
0
dispersion correction functional (0, 1, 2, 3)
write "CUDAGPU" to use GPU (GPU.config file needed)
0F0
InputPsiId, output_wf, output_grid
5.0 30
length of the tail, # tail CG iterations
0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals
2
verbosity of the output 0=low, 2=high
T
disable the symmetry detection
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Description of the file input.dft
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
0.450 0.450 0.450
hx, hy, hz
5.0 8.0
crmult, frmult, coarse and fine radius
1
ixc: XC parameter (LDA=1,PBE=11)
0 0.0
ncharge: charge of the system, Electric field
10
nspin=1 non-spin polarization, total magnetic moment
1.E-04
gnrm_cv: convergence criterion gradient
50 10
itermax, nrepmax
76
# CG (preconditioning), length of diis history
0
dispersion correction functional (0, 1, 2, 3)
write "CUDAGPU" to use GPU (GPU.config file needed)
0F0
InputPsiId, output_wf, output_grid
5.0 30
length of the tail, # tail CG iterations
0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals
2
verbosity of the output 0=low, 2=high
T
disable the symmetry detection
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Description of the file input.dft
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
0.450 0.450 0.450
hx, hy, hz
5.0 8.0
crmult, frmult, coarse and fine radius
1
ixc: XC parameter (LDA=1,PBE=11)
0 0.0
ncharge: charge of the system, Electric field
10
nspin=1 non-spin polarization, total magnetic moment
1.E-04
gnrm_cv: convergence criterion gradient
50 10
itermax, nrepmax
76
# CG (preconditioning), length of diis history
0
dispersion correction functional (0, 1, 2, 3)
write "CUDAGPU" to use GPU (GPU.config file needed)
0F0
InputPsiId, output_wf, output_grid
5.0 30
length of the tail, # tail CG iterations
0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals
2
verbosity of the output 0=low, 2=high
T
disable the symmetry detection
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Description of the file input.dft
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
0.450 0.450 0.450
hx, hy, hz
5.0 8.0
crmult, frmult, coarse and fine radius
1
ixc: XC parameter (LDA=1,PBE=11)
0 0.0
ncharge: charge of the system, Electric field
10
nspin=1 non-spin polarization, total magnetic moment
1.E-04
gnrm_cv: convergence criterion gradient
50 10
itermax, nrepmax
76
# CG (preconditioning), length of diis history
0
dispersion correction functional (0, 1, 2, 3)
write "CUDAGPU" to use GPU (GPU.config file needed)
0F0
InputPsiId, output_wf, output_grid
5.0 30
length of the tail, # tail CG iterations
0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals
2
verbosity of the output 0=low, 2=high
T
disable the symmetry detection
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Description of the file input.dft
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
0.450 0.450 0.450
hx, hy, hz
5.0 8.0
crmult, frmult, coarse and fine radius
1
ixc: XC parameter (LDA=1,PBE=11)
0 0.0
ncharge: charge of the system, Electric field
10
nspin=1 non-spin polarization, total magnetic moment
1.E-04
gnrm_cv: convergence criterion gradient
50 10
itermax, nrepmax
76
# CG (preconditioning), length of diis history
0
dispersion correction functional (0, 1, 2, 3)
write "CUDAGPU" to use GPU (GPU.config file needed)
0F0
InputPsiId, output_wf, output_grid
5.0 30
length of the tail, # tail CG iterations
0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals
2
verbosity of the output 0=low, 2=high
T
disable the symmetry detection
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Description of the file input.dft
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
0.450 0.450 0.450
hx, hy, hz
5.0 8.0
crmult, frmult, coarse and fine radius
1
ixc: XC parameter (LDA=1,PBE=11)
0 0.0
ncharge: charge of the system, Electric field
10
nspin=1 non-spin polarization, total magnetic moment
1.E-04
gnrm_cv: convergence criterion gradient
50 10
itermax, nrepmax
76
# CG (preconditioning), length of diis history
0
dispersion correction functional (0, 1, 2, 3)
write "CUDAGPU" to use GPU (GPU.config file needed)
0F0
InputPsiId, output_wf, output_grid
5.0 30
length of the tail, # tail CG iterations
0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals
2
verbosity of the output 0=low, 2=high
T
disable the symmetry detection
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Description of the file input.dft
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
0.450 0.450 0.450
hx, hy, hz
5.0 8.0
crmult, frmult, coarse and fine radius
1
ixc: XC parameter (LDA=1,PBE=11)
0 0.0
ncharge: charge of the system, Electric field
10
nspin=1 non-spin polarization, total magnetic moment
1.E-04
gnrm_cv: convergence criterion gradient
50 10
itermax, nrepmax
76
# CG (preconditioning), length of diis history
0
dispersion correction functional (0, 1, 2, 3)
write "CUDAGPU" to use GPU (GPU.config file needed)
0F0
InputPsiId, output_wf, output_grid
5.0 30
length of the tail, # tail CG iterations
0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals
2
verbosity of the output 0=low, 2=high
T
disable the symmetry detection
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Description of the file input.dft
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
0.450 0.450 0.450
hx, hy, hz
5.0 8.0
crmult, frmult, coarse and fine radius
1
ixc: XC parameter (LDA=1,PBE=11)
0 0.0
ncharge: charge of the system, Electric field
10
nspin=1 non-spin polarization, total magnetic moment
1.E-04
gnrm_cv: convergence criterion gradient
50 10
itermax, nrepmax
76
# CG (preconditioning), length of diis history
0
dispersion correction functional (0, 1, 2, 3)
write "CUDAGPU" to use GPU (GPU.config file needed)
0F0
InputPsiId, output_wf, output_grid
5.0 30
length of the tail, # tail CG iterations
0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals
2
verbosity of the output 0=low, 2=high
T
disable the symmetry detection
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Description of the file input.dft
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
0.450 0.450 0.450
hx, hy, hz
5.0 8.0
crmult, frmult, coarse and fine radius
1
ixc: XC parameter (LDA=1,PBE=11)
0 0.0
ncharge: charge of the system, Electric field
10
nspin=1 non-spin polarization, total magnetic moment
1.E-04
gnrm_cv: convergence criterion gradient
50 10
itermax, nrepmax
76
# CG (preconditioning), length of diis history
0
dispersion correction functional (0, 1, 2, 3)
write "CUDAGPU" to use GPU (GPU.config file needed)
0F0
InputPsiId, output_wf, output_grid
5.0 30
length of the tail, # tail CG iterations
0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals
2
verbosity of the output 0=low, 2=high
T
disable the symmetry detection
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Description of the file input.dft
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
0.450 0.450 0.450
hx, hy, hz
5.0 8.0
crmult, frmult, coarse and fine radius
1
ixc: XC parameter (LDA=1,PBE=11)
0 0.0
ncharge: charge of the system, Electric field
10
nspin=1 non-spin polarization, total magnetic moment
1.E-04
gnrm_cv: convergence criterion gradient
50 10
itermax, nrepmax
76
# CG (preconditioning), length of diis history
0
dispersion correction functional (0, 1, 2, 3)
write "CUDAGPU" to use GPU (GPU.config file needed)
0F0
InputPsiId, output_wf, output_grid
5.0 30
length of the tail, # tail CG iterations
0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals
2
verbosity of the output 0=low, 2=high
T
disable the symmetry detection
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Description of the file input.dft
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
0.450 0.450 0.450
hx, hy, hz
5.0 8.0
crmult, frmult, coarse and fine radius
1
ixc: XC parameter (LDA=1,PBE=11)
0 0.0
ncharge: charge of the system, Electric field
10
nspin=1 non-spin polarization, total magnetic moment
1.E-04
gnrm_cv: convergence criterion gradient
50 10
itermax, nrepmax
76
# CG (preconditioning), length of diis history
0
dispersion correction functional (0, 1, 2, 3)
write "CUDAGPU" to use GPU (GPU.config file needed)
0F0
InputPsiId, output_wf, output_grid
5.0 30
length of the tail, # tail CG iterations
0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals
2
verbosity of the output 0=low, 2=high
T
disable the symmetry detection
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Description of the file input.dft
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
0.450 0.450 0.450
hx, hy, hz
5.0 8.0
crmult, frmult, coarse and fine radius
1
ixc: XC parameter (LDA=1,PBE=11)
0 0.0
ncharge: charge of the system, Electric field
10
nspin=1 non-spin polarization, total magnetic moment
1.E-04
gnrm_cv: convergence criterion gradient
50 10
itermax, nrepmax
76
# CG (preconditioning), length of diis history
0
dispersion correction functional (0, 1, 2, 3)
write "CUDAGPU" to use GPU (GPU.config file needed)
0F0
InputPsiId, output_wf, output_grid
5.0 30
length of the tail, # tail CG iterations
0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals
2
verbosity of the output 0=low, 2=high
T
disable the symmetry detection
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Description of the file input.dft
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
0.450 0.450 0.450
hx, hy, hz
5.0 8.0
crmult, frmult, coarse and fine radius
1
ixc: XC parameter (LDA=1,PBE=11)
0 0.0
ncharge: charge of the system, Electric field
10
nspin=1 non-spin polarization, total magnetic moment
1.E-04
gnrm_cv: convergence criterion gradient
50 10
itermax, nrepmax
76
# CG (preconditioning), length of diis history
0
dispersion correction functional (0, 1, 2, 3)
write "CUDAGPU" to use GPU (GPU.config file needed)
0F0
InputPsiId, output_wf, output_grid
5.0 30
length of the tail, # tail CG iterations
0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals
2
verbosity of the output 0=low, 2=high
T
disable the symmetry detection
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Description of the file input.dft
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
0.450 0.450 0.450
hx, hy, hz
5.0 8.0
crmult, frmult, coarse and fine radius
1
ixc: XC parameter (LDA=1,PBE=11)
0 0.0
ncharge: charge of the system, Electric field
10
nspin=1 non-spin polarization, total magnetic moment
1.E-04
gnrm_cv: convergence criterion gradient
50 10
itermax, nrepmax
76
# CG (preconditioning), length of diis history
0
dispersion correction functional (0, 1, 2, 3)
write "CUDAGPU" to use GPU (GPU.config file needed)
0F0
InputPsiId, output_wf, output_grid
5.0 30
length of the tail, # tail CG iterations
0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals
2
verbosity of the output 0=low, 2=high
T
disable the symmetry detection
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Basis sets: wavelets
pseudopotential
self-consistent (SCF)
norm-conserving
PAW
all electrons
Hybrid functionals
Harris-Foulkes functional
relativistic
non-relativistic
Introduction
Inputs
DFT
h
beyond LDA
GW
LDA,GGA
LDA+U
i
1 2
− 2 ∇ +V (r ) +µxc (r ) ψki= εki ψki
Pseudopotential
periodic
Basis set
SCF
Performances
non-periodic
atomic orbitals
Gaussians
plane waves
Slater
augmented
numerical
Parallel
GPU
Conclusion
N 3 scaling
O (N ) methods
non-spin polarized
real space
non-collinear
finite difference
spin polarized
Wavelet
collinear
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Optimal for isolated systems
Test case: cinchonidine molecule (44 atoms)
101
Introduction
Inputs
DFT
Pseudopotential
Absolute energy precision (Ha)
Plane waves
0
10
Ec = 40 Ha
10-1
Ec = 90 Ha
10-2
h = 0.4bohr
Performances
Parallel
GPU
Conclusion
Ec = 125 Ha
10-3
h = 0.3bohr
Basis set
SCF
Wavelets
10-4
105
Number of degrees of freedom
106
Allows a systematic approach for molecules
Considerably faster than Plane Waves codes.
10 (5) times faster than ABINIT (CPMD)
Charged systems can be treated explicitly with the same time
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Systematic basis set
Two parameters for tuning the basis
The grid spacing hgrid
The extension of the low resolution points crmult
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Poisson Solver using interpolating scaling function
The Hartree potential is calculated in the interpolating scaling
function basis.
Poisson solver with interpolating scaling functions
From the density ρ(~j ) on an uniform grid it calculates:
Introduction
Inputs
VH (~j ) =
R
d~x
ρ(~x )
|~x −~j |
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
4 Very fast and accurate, optimal parallelization
4 Can be used independently from the DFT code
4 Integrated quantities (energies) are easy to extract
8 Non-adaptive, needs data uncompression
Explicitly free boundary conditions
No need to subtract supercell interactions
→ charged systems.
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Poisson Solver for surface boundary conditions
A Poisson solver for surface problems
Based on a mixed reciprocal-direct space representation
Z
Vpx ,py (z ) = −4π dz 0 G(|~p|; z − z 0 )ρpx ,py (z ) ,
4 Can be applied both in real or reciprocal space codes
Introduction
4 No supercell or screening functions
Inputs
DFT
4 More precise than other existing approaches
Pseudopotential
Basis set
SCF
Example of the plane capacitor:
Performances
Parallel
GPU
Hockney
Periodic
Our approach
Conclusion
V
V
V
y
x
Laboratoire de Simulation Atomistique
y
x
http://inac.cea.fr/L_Sim
y
x
T. Deutsch
input.geopt: Relaxation or molecular dynamics
DIIS
500
1.d0
Introduction
Inputs
ncount_cluster: max steps during geometry relaxation
frac_fluct: geometry optimization stops
if force norm less than frac_fluct of noise
0.0d0
randdis: random amplitude for atoms
4.d0 5
betax: steepest descent step size, diis history
DFT
Pseudopotential
Basis set
Different algorithms:
SCF
Performances
Parallel
GPU
Conclusion
DIIS: Direct Inversion in the iterative subspace
VSSD: Variable Size Steepest Descent
LBFGS: Limited BFGD
AB6MD: Use molecular dynamics routines from
ABINIT 6
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
input.geopt: Relaxation or molecular dynamics
DIIS
500
1.d0
Introduction
Inputs
ncount_cluster: max steps during geometry relaxation
frac_fluct: geometry optimization stops
if force norm less than frac_fluct of noise
0.0d0
randdis: random amplitude for atoms
4.d0 5
betax: steepest descent step size, diis history
DFT
Pseudopotential
Basis set
Different algorithms:
SCF
Performances
Parallel
GPU
Conclusion
DIIS: Direct Inversion in the iterative subspace
VSSD: Variable Size Steepest Descent
LBFGS: Limited BFGD
AB6MD: Use molecular dynamics routines from
ABINIT 6
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
input.geopt: Relaxation or molecular dynamics
DIIS
500
1.d0
Introduction
Inputs
ncount_cluster: max steps during geometry relaxation
frac_fluct: geometry optimization stops
if force norm less than frac_fluct of noise
0.0d0
randdis: random amplitude for atoms
4.d0 5
betax: steepest descent step size, diis history
DFT
Pseudopotential
Basis set
Different algorithms:
SCF
Performances
Parallel
GPU
Conclusion
DIIS: Direct Inversion in the iterative subspace
VSSD: Variable Size Steepest Descent
LBFGS: Limited BFGD
AB6MD: Use molecular dynamics routines from
ABINIT 6
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
input.geopt: Relaxation or molecular dynamics
DIIS
500
1.d0
Introduction
Inputs
ncount_cluster: max steps during geometry relaxation
frac_fluct: geometry optimization stops
if force norm less than frac_fluct of noise
0.0d0
randdis: random amplitude for atoms
4.d0 5
betax: steepest descent step size, diis history
DFT
Pseudopotential
Basis set
Different algorithms:
SCF
Performances
Parallel
GPU
Conclusion
DIIS: Direct Inversion in the iterative subspace
VSSD: Variable Size Steepest Descent
LBFGS: Limited BFGD
AB6MD: Use molecular dynamics routines from
ABINIT 6
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
input.geopt: Relaxation or molecular dynamics
DIIS
500
1.d0
Introduction
Inputs
ncount_cluster: max steps during geometry relaxation
frac_fluct: geometry optimization stops
if force norm less than frac_fluct of noise
0.0d0
randdis: random amplitude for atoms
4.d0 5
betax: steepest descent step size, diis history
DFT
Pseudopotential
Basis set
Different algorithms:
SCF
Performances
Parallel
GPU
Conclusion
DIIS: Direct Inversion in the iterative subspace
VSSD: Variable Size Steepest Descent
LBFGS: Limited BFGD
AB6MD: Use molecular dynamics routines from
ABINIT 6
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Operations performed
Different numerical
operations performed:
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Interpolating
Daubechies
For each
wavefunction
(Hamiltonian
application)
Between
wavefunctions
(Linear algebra)
Parallel
GPU
Comput. operations
Conclusion
Convolutions with
short filters
BLAS routines
FFT (Poisson Solver)
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Massively parallel
Two kinds of parallelisation
By orbitals (Hamiltonian application, preconditioning)
By components (overlap matrices, orthogonalisation)
Introduction
Inputs
DFT
A few (but big) packets of data
More demanding in bandwidth than in latency
Pseudopotential
Basis set
SCF
Performances
No need of fast network
Optimal speedup (eff. ∼ 85%), also for big systems
Parallel
GPU
Conclusion
Cubic scaling code
For systems bigger than 500 atomes (1500 orbitals) :
orthonormalisation operation is predominant (N 3 )
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
High Performance Computing
Localisation & Orthogonality → Data locality
Principal code operations can be intensively optimised
Optimal for application on supercomputers
Little communication, big
packets of data
Inputs
DFT
SCF
Performances
Parallel
GPU
Conclusion
2
4
4
No need of fast
network
Pseudopotential
Basis set
2
16
98
Efficiency (%)
Introduction
100
Optimal speedup
96
8
94
44
1
1
4
22
92
11
5
(with orbitals/proc)
Efficiency of the order of
90%, up to thousands of
processors
32 at
65 at
173 at
257 at
1025 at
90
88
1
2
3
10
1
100
Number of processors
1000
Data repartition optimal for material accelerators (GPU)
Graphic Processing Units can be used to speed up the
computation
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Performance in parallel (BigDFT)
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
Distribution per orbitals (wavelets)
Nothing to specify in input files
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
GPU-ported operations in BigDFT (double precision)
20
Convolutions
(CUDA rewritten)
Introduction
Inputs
GPU speedups between
10 and 20 can be
obtained for different
sections
GPU speedup (Double prec.)
18
16
14
12
10
8
6
locden
locham
precond
4
2
0
DFT
10
20
Pseudopotential
Basis set
30
40
50
Wavefunction size (MB)
60
70
70
SCF
Parallel
GPU
Conclusion
60
GPU speedup (Double prec.)
Performances
Linear algebra
(CUBLAS library)
50
40
30
20
10
DGEMM
DSYRK
0
0
500
1000
1500
Number of orbitals
Laboratoire de Simulation Atomistique
2000
2500
The interfacing with
CUBLAS is immediate,
with considerable
speedups
http://inac.cea.fr/L_Sim
T. Deutsch
Data repartition on a node
The S_GPU approach
S_GPU library manages GPU resource within the node
GPU
Introduction
Inputs
DFT
S_GPU
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
MPI 0
MPI 1
MPI 2
MPI 3
DATA
DATA
DATA
DATA
Conclusion
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
BigDFT code on Hybrid architectures
BigDFT code can run on hybrid CPU/GPU supercomputers
In multi-GPU environments, double precision calculations
No Hot-spot operations
Introduction
Inputs
DFT
Different code sections can be ported on GPU
up to 20x speedup for some operations,
7x for the full parallel code (under improvements)
Pseudopotential
Basis set
100
1000
SCF
Performances
80
Parallel
100
Seconds (log. scale)
GPU
Percent
60
Conclusion
40
10
20
0
1
2
4
6 8 10 12 14 16
1 2
CPU code
Number of cores
Laboratoire de Simulation Atomistique
4 6 8 10 12 14 16
Hybrid code (rel.)
http://inac.cea.fr/L_Sim
1
Time (sec)
Comm
Other
PureCPU
precond
locham
locden
LinAlg
T. Deutsch
CUDA + S_GPU
Put "CUDAGPU" in input.dft
Use a GPU.config file (affinity between CPU and GPU)
USE_SHARED=0
Introduction
Inputs
DFT
Pseudopotential
Set if the repartition is shared (1) or static (0)
MPI_TASKS_PER_NODE=8
Basis set
Define the number of mpi tasks per node
SCF
Performances
Parallel
GPU
Conclusion
NUM_GPU=2
Define the number of GPU device per node
GPU_CPUS_AFF_0=0,1,2,3
Define the affinity of the processes among the GPUs.
GPU_CPUS_AFF_1=4,5,6,7
USE_GPU_BLAS=1
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
Use CUDABLAS
T. Deutsch
OpenCL (without S_GPU)
Put "OCLGPU" in input.dft
Introduction
Inputs
DFT
Pseudopotential
Basis set
No GPU.config needed presently (should be removed also
for CUDA)
SCF
Performances
Parallel
GPU
Fermi is better for concurrent access
Conclusion
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Conclusion
Functionality: Mixed between physics and chemistry
approach
Order N (real space basis set as wavelets)
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
QM/MM (Quantum Mechanics/Molecular Modeling)
Multi-scale approach (more than 1000 atoms feasible for
a better parametrization)
Numerical experience:
One-day simulation
Better exploration of atomic configurations
Molecular Dynamics (4s per step for 32 water
molecules)
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Introduction
Inputs
THE END
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Introduction
Inputs
Future developments, Discussions
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
O(N) approach
Introduction
Versatile boundary
conditions (surface,
periodic, isolated)
Inputs
DFT
Pseudopotential
Basis set
SCF
Localization regions
Performances
Wannier functions
Parallel
GPU
Easily embedded
Conclusion
TCPU
Crossover point
N
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Embedding a system (QM/MM)
Introduction
Inputs
DFT
Pseudopotential
Basis set
SCF
Performances
Parallel
GPU
Conclusion
QM/MM
Charge defects in bulk materials
Semi-infinite system as surface
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch
Developments
Introduction
Inputs
DFT
Pseudopotential
Embedding atoms, O(N) method, region of localisation
Many-body: TD-DFT, GW, Bethe-Salpeter
PAW pseudopotentials (better accuracy, larger step grid)
Basis set
SCF
Mixed basis sets
Performances
Parallel
GPU
Conclusion
Laboratoire de Simulation Atomistique
http://inac.cea.fr/L_Sim
T. Deutsch

Similar documents

Electronic Structure Calculations, Density Functional

Electronic Structure Calculations, Density Functional Diagonalisation Gaussians Real space basis sets

More information