BigDFT tutorial
Transcription
BigDFT tutorial
BigDFT tutorial F IRST YARMOUK S CHOOL Introduction Inputs BigDFT tutorial DFT Pseudopotential Basis set SCF Thierry Deutsch Performances Parallel GPU L_Sim - CEA Grenoble Conclusion November, 2. 2010 Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Outline 1 Introduction Inputs 2 Density Functional Theory (quick view) Pseudopotential Basis set SCF 3 Performances Parallel GPU 4 Conclusion Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch A basis for nanosciences: the BigDFT project STREP European project: BigDFT(2005-2008) Four partners, 15 contributors: CEA-INAC Grenoble, U. Basel, U. Louvain-la-Neuve, U. Kiel Introduction Inputs DFT Pseudopotential Basis set SCF Performances Aim: To develop an ab-initio DFT code based on Daubechies Wavelets, to be integrated in ABINIT, distributed freely (GNU-GPL license) Parallel GPU Conclusion L. Genovese, A. Neelov, S. Goedecker, T. Deutsch, et al., “Daubechies wavelets as a basis set for density functional pseudopotential calculations”, J. Chem. Phys. 129, 014109 (2008) Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch BigDFT version 1.3: capabilities http://inac.cea.fr/L_Sim/BigDFT Free, surface and periodic boundary conditions Local geometry optimizations (with constraints) Global geometry optimization Introduction Inputs Saddle point searches Vibrations DFT Pseudopotential External electric fields Basis set SCF Performances Parallel Born-Oppenheimer MD Unoccupied KS orbitals GPU Conclusion Collinear and Non-collinear magnetism All XC functionals of the ABINIT package Empirical van der Waals interactions Also available from within the ABINIT package Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch BigDFT version 1.4 (June 2010) http://inac.cea.fr/L_Sim/BigDFT Better performance for periodic systems (convolutions) Exact exchange Introduction Inputs DFT Pseudopotential Hybrid functionals (libXC, G2-1 tests) Molecular Dynamics from ABINIT K points (periodic) Basis set SCF Performances Parallel Symmetries from ABINIT Preliminary version for XANES GPU Conclusion GPU version with S_GPU BigDFT/ABINIT training workshop: June 30 – Juny 1 Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Spirit of BigDFT Simple for developers: no parser, use input files with mandatory parameters Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Few input files Optional input files as input.kpt (for periodic systems) or input.perf Few input parameters ex. not parameters for parallel calculations Conclusion Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch BigDFT: Executables bigdft Single point calculation, geometry optimisation, molecular dynamics Execute simply: bigdft Introduction Inputs DFT memguess ’number_of_cores’ Check input files and determine the memory amount Pseudopotential Basis set SCF Performances Parallel frequencies Vibration properties abscalc XANES calculations GPU Conclusion bart BigDFT + ART (global minima search) NEB Determination of saddle point Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch BigDFT: input files posinp.xyz Atomic positions + boundary conditions input.dft Everything about DFT calculations psppar.N Pseudopotential file with the name of atom (ABINIT format) Introduction Inputs DFT Pseudopotential Optional files Basis set SCF Performances Parallel input.geopt Geometry optimisation or molecular dynamics parameters GPU Conclusion input.freq Frequencies calculations input.perf Options for performances input.kpt K-points mesh for periodic systems Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Atomic positions: posinp.xyz or posinp.ascii units reduced, angstroem, bohr or atomic Boundary conditions periodic, surface Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion 8 reduced periodic 10.26085 10.26085 10.26085 Si 0. 0. 0. Si 0.5 0.5 0. Si 0.5 0. 0.5 Si 0. 0.5 0.5 Si 0.25 0.25 0.25 Si 0.75 0.75 0.25 Si 0.75 0.25 0.75 Si 0.25 0.75 0.75 Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Atomic positions: posinp.xyz or posinp.ascii units reduced, angstroem, bohr or atomic Boundary conditions periodic, surface Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion 3 angstroem surface 3.0 0.0 3.0 O 1.5 0.0 1.5 H .7285 0.620919 1.5 H 2.2715 0.620919 1.5 Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Atomic positions: posinp.xyz or posinp.ascii units reduced, angstroem, bohr or atomic Boundary conditions periodic, surface Introduction Inputs 5 atomic DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion Si 1.62 1.62 1.62 H 3.24 3.24 3.24 H 0 0 3.24 H 3.24 0 0 H 0 3.24 0 Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Kohn-Sham formalism H-K theorem: E is an unknown functional of the density E = E [ρ] → Density Functional Theory Kohn-Sham approach Introduction Mapping of an interacting many-electron system into a system with independent particles moving into an effective potential. Inputs DFT Pseudopotential Basis set Find a set of orthonormal orbitals Ψi (r) that minimizes: SCF Performances Parallel GPU Conclusion E =− 1 N /2 Z ∑ 2 i =1 Ψ∗i (r)∇2 Ψi (r)dr + 1 Z 2 ρ(r)VH (r)dr Z + Exc [ρ(r)] + Vext (r)ρ(r)dr N /2 ρ(r) = 2 ∑ Ψ∗i (r)Ψi (r) ∇2 VH (r) = −4πρ(r) i =1 Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Performing a DFT calculation A self-consistent equation ρ(r) = ∑ Ψ∗i (r)Ψi (r), where |ψi i satisfies i 1 2 − ∇ + VH [ρ] + Vxc [ρ] + Vext + Vpseudo |ψi i = Ei |ψi i , 2 Introduction Inputs DFT (Kohn-Sham) DFT “Ingredients” Pseudopotential Basis set SCF Performances Parallel GPU Conclusion An XC potential, functional of the density several approximations exists (LDA,GGA,. . . ) A choice of the pseudopotential (if not all-electrons) (norm conserving, ultrasoft, PAW,. . . ) A basis set for expressing the |ψi i An (iterative) algorithm for finding the wavefunctions |ψi i A (good) computer. . . Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Exchange-Correlation Energy (input.dft) Exc [ρ] = Kexact [ρ] − KKS [ρ] + Ve−e exact [ρ] − VHartree [ρ] KKS [ρ]: Kinetic energy for a non-interaction electron gas Introduction Inputs DFT Pseudopotential Basis set Kohn-Sham formalism is exact (mapping of an interacting electron system into a non-interacting system) SCF Performances Parallel Many types from GPU Conclusion ABINIT: Use same codes (ixc: LDA=1, PBE=11, see manual) libXC: − ’code for exchange’ ’code for correlation’ (−101130 see manual) Hybrid functionals from libXC Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Performing a DFT calculation A self-consistent equation ρ(r) = ∑ Ψ∗i (r)Ψi (r), where |ψi i satisfies i 1 2 − ∇ + VH [ρ] + Vxc [ρ] + Vext + Vpseudo |ψi i = Ei |ψi i , 2 Introduction Inputs DFT (Kohn-Sham) DFT “Ingredients” Pseudopotential Basis set SCF Performances Parallel GPU Conclusion An XC potential, functional of the density several approximations exists (LDA,GGA,. . . ) A choice of the pseudopotential (if not all-electrons) (norm conserving, ultrasoft, PAW,. . . ) A basis set for expressing the |ψi i An (iterative) algorithm for finding the wavefunctions |ψi i A (good) computer. . . Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Gaussian type separable Pseudopotentials (HGH) Local part Vloc (r ) −Zion = r r " 1 erf √ + exp − 2 2rloc ( C1 + C2 Introduction r rloc 2 + C3 r 2 # rloc 4 r + C4 rloc r 6 ) rloc Inputs DFT Pseudopotential Nonlocal (separable) part H (~r ,~r 0 ) Basis set 2 SCF Performances H sep (~r ,~r 0 ) = ∑ ∑ Ys,m (r̂ ) pis (r ) his pis (r 0 ) Ys∗,m (rˆ0 ) i =1 m Parallel GPU + Conclusion ∑ Yp,m (r̂ ) p1p (r ) h1p p1p (r 0 ) Yp∗,m (rˆ0 ) m p1l (r ) = √ 2 rle l + 23 rl − 12 ( rr l )2 q Γ(l + 32 ) Laboratoire de Simulation Atomistique p2l (r ) http://inac.cea.fr/L_Sim = √ 2 r l +2 e l+ 7 rl 2 − 12 ( rr )2 l q Γ(l + 72 ) T. Deutsch The nonlocal part of the pseudopotential H sep Ψ = |P >< P |Ψ > Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion If the expansion coefficients cJ of P in a Daubechies basis are known then ! ! Z < P |Ψ >= cI φ(x − I ) ΨJ φ(x − J ) dx = cJ ΨJ ∑ I ∑ J ∑ J Because a Gaussian is separable and a 3-dim scaling basis is a product of 1-dim scaling functions/wavelets, ci ,j ,k can easily be calculated with machine precision. Z Z Z ci ,j ,k = φki (x )φkj (y )φkk (z ) exp(−(x 2 + y 2 + z 2 )) dx dy dz Z = φki (x ) exp(−x 2 ) dx Z Z × φki (y ) exp(−y 2 ) dy φki (z ) exp(−z 2 ) dz Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Pseudopotential file: psppar.N In the same directory as the calculation. Large library: http://www.abinit.org/downloads/psp-links Accurate and transferable Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion Goedecker pseudopotential for N 7 5 070301 zatom,zion,pspdat 10 11 1 2 2001 0 pspcod,pspxc,lmax,lloc,mmax,r2well 0.28379051 2 -12.41522559 1.86809592 rloc nloc c1 c2 2 nnonloc 0.25540500 1 13.63026257 rs ns hs11 0.24549453 1 0.00000000 rp np hp11 0.00414750 kp11 Save memory: projectors calculated on-the-fly (cost < 1%, experimental option) Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Performing a DFT calculation A self-consistent equation ρ(r) = ∑ Ψ∗i (r)Ψi (r), where |ψi i satisfies i 1 2 − ∇ + VH [ρ] + Vxc [ρ] + Vext + Vpseudo |ψi i = Ei |ψi i , 2 Introduction Inputs DFT (Kohn-Sham) DFT “Ingredients” Pseudopotential Basis set SCF Performances Parallel GPU Conclusion An XC potential, functional of the density several approximations exists (LDA,GGA,. . . ) A choice of the pseudopotential (if not all-electrons) (norm conserving, ultrasoft, PAW,. . . ) A basis set for expressing the |ψi i An (iterative) algorithm for finding the wavefunctions |ψi i A (good) computer. . . Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Basis sets in real space: Wavelet basis sets Properties of wavelet basis sets: localized both in real and in Fourier space allow for adaptivity (for core electrons) Introduction Inputs are a systematic basis set DFT 1.5 φ(x) Pseudopotential Basis set 1 ψ(x) SCF 0.5 Performances Parallel 0 GPU -0.5 Conclusion -1 -1.5 -6 -4 -2 0 2 4 6 8 x A basis set both adaptive and systematic, real space based Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch A brief description of wavelet theory Two kind of basis functions A Multi-Resolution real space basis The functions can be classified following the resolution level they span. Introduction Inputs DFT Pseudopotential Basis set SCF Scaling Functions The functions of low resolution level are a linear combination of high-resolution functions Performances = Parallel + GPU m Conclusion φ(x ) = ∑ hj φ(2x − j ) j =−m Centered on a resolution-dependent grid: φj = φ0 (x − j ). Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch A brief description of wavelet theory Wavelets They contain the DoF needed to complete the information which is lacking due to the coarseness of the resolution. = Introduction Inputs DFT Pseudopotential 1 2 + 12 m m φ(2x ) = ∑ h̃j φ(x − j ) + ∑ g̃j ψ(x − j ) j =−m j =−m Basis set SCF Performances Increase the resolution without modifying grid space Parallel GPU SF + W = same DoF of SF of higher resolution Conclusion m ψ(x ) = ∑ gj φ(2x − j ) j =−m All functions have compact support, centered on grid points. Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Adaptivity of the mesh Adaptivity Resolution can be refined following the grid point. The grid is divided in: Low resolution pts (SF, 1 DoF) Introduction High resolution pts (SF + W, 8 DoF) Inputs DFT Pseudopotential Basis set Points of different resolution belong to the same grid. SCF Performances Parallel GPU Conclusion Data can thus be stored in compressed form. Localization property Empty regions must not be “filled” with basis functions. Optimal for big inhomogeneous systems, O (N ) approach. Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Adaptivity of the mesh Atomic positions (H2 O), hgrid Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Adaptivity of the mesh Fine grid (high resolution, frmult) Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Adaptivity of the mesh Coarse grid (low resolution, crmult) Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch input.dft: hgrid, crmult, frmult hgrid Step grid: twice as bigger than the lowest gaussian coefficient of pseudopotential Roughly 0.45 (only orthorhombic mesh) Introduction Inputs DFT Pseudopotential Basis set crmult Extension of the coarse resolution (factor 6.): depends on the extension of the HOMO for the atom SCF Performances Parallel GPU frmult Extension of the fine resolution (factor 8.): depends on the pseudopotential coefficient Conclusion In practice, we use only hgrid from 0.3 to 0.55. Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch No integration error Orthogonality, scaling relation Z dx φk (x )φj (x ) = δkj Introduction Inputs DFT Pseudopotential 1 φ(x ) = √ m ∑ 2 j =−m hj φ(2x − j ) The hamiltonian-related quantities can be calculated up to machine precision in the given basis. 14 The accuracy is only limited by the basis set (O hgrid ) Basis set SCF Performances Parallel GPU Exact evaluation of kinetic energy Obtained by convolution with filters: Conclusion f (x ) = ∑ c` φ` (x ) , ` c̃` = ∑ cj a`−j , ∇2 f (x ) = ∑ c̃` φ` (x ) , ` a` ≡ Z φ0 (x )∂2x φ` (x ) , j Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Separability in 3D The 3-dim scaling basis is a tensor product decomposition of 1-dim Scaling Functions/ Wavelets. e ,e ,e e φjxx,jy ,yjz z (x , y , z ) = φejxx (x )φjyy (y )φejzz (z ) Introduction Inputs With (jx , jy , jz ) the node coordinates, (0) φj (1) and φj the SF and the W respectively. DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion Gaussians and wavelets The separability of the basis allows us to save computational time when performing scalar products with separable functions (e.g. gaussians): Initial wavefunctions (input guess) Poisson solver Non-local pseudopotentials Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Wavelet families used in BigDFT code Daubechies f (x ) = ∑` c` φ` (x ) R Orthogonal c` = dx φ` (x )f (x ) LEAST ASYMMETRIC DAUBECHIES-16 1.5 Interpolating f (x ) = ∑j fj ϕj (x ) Dual to dirac deltas fj = f (j ) INTERPOLATING (DESLAURIERS-DUBUC)-8 1 scaling function wavelet scaling function wavelet 1 0.5 0.5 Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU 0 0 -0.5 -0.5 -1 -1.5 -6 -4 -2 0 2 4 6 Used for wavefunctions, scalar products -1 -4 -2 0 2 4 Used for charge density, function products Conclusion Magic Filter method (A.Neelov, S. Goedecker) The passage between the two basis sets can be performed without losing accuracy Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Performing a DFT calculation A self-consistent equation ρ(r) = ∑ Ψ∗i (r)Ψi (r), where |ψi i satisfies i 1 2 − ∇ + VH [ρ] + Vxc [ρ] + Vext + Vpseudo |ψi i = Ei |ψi i , 2 Introduction Inputs DFT (Kohn-Sham) DFT “Ingredients” Pseudopotential Basis set SCF Performances Parallel GPU Conclusion An XC potential, functional of the density several approximations exists (LDA,GGA,. . . ) A choice of the pseudopotential (if not all-electrons) (norm conserving, ultrasoft, PAW,. . . ) A basis set for expressing the |ψi i An (iterative) algorithm for finding the wavefunctions |ψi i A (good) computer. . . Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Kohn-Sham Equations: Operators Apply different operators Having a one-electron hamiltonian in a mean field. Introduction Inputs 1 2 − ∇ + Veff (r )(r ) ψi = εi ψi 2 DFT Basis set Veff (r ) = Vext (r ) + SCF Performances Parallel GPU Conclusion ρ(r 0 ) 0 dr + µxc (r ) V |r − r 0 | Z Pseudopotential E [ρ] can be expressed by the orthonormalized states of one particule: ψi (r ) with the fractional occupancy number fi ( 0 ≤ f i ≤ 1) : ρ(r ) = 2 ∑ fi |ψi (r )|2 i Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Kohn-Sham Equations: Computing Energies Calculate different integrals E [ρ] = K [ρ] + U [ρ] Introduction K [ρ] = − Inputs DFT 2 ~2 2 me Z ∑ i V dr ψ∗i ∇2 ψi Pseudopotential Basis set SCF Performances Parallel Z U [ρ] = V dr Vext (r ) ρ(r )+ 1 2 V | GPU Z ρ(r )ρ(r 0 ) + E [ρ] | xc{z } |r − r 0 | {z } exchange−correlation dr dr 0 Hartree Conclusion We minimise with the variables ψi (r ) and fi Z with the constraint V Laboratoire de Simulation Atomistique dr ρ(r ) = Nel . http://inac.cea.fr/L_Sim T. Deutsch KS Equations: Self-Consistent Field − Set of self-consistent equations: 1 ~2 2 me 2 ∇ + Veff ψi = εi ψi with an effective potential: Introduction Inputs Veff (r ) = Vext (r ) + Z V DFT Pseudopotential | Basis set ρ(r 0 ) |r − r 0 | {z } dr 0 Hartree + δExc δρ(r ) | {z } exchange−correlation SCF Performances Parallel and: GPU ρ(r ) = 2 ∑i fi |ψi (r )|2 Conclusion Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch KS Equations: Self-Consistent Field − Set of self-consistent equations: 1 ~2 2 me 2 ∇ + Veff ψi = εi ψi with an effective potential: Introduction Inputs Veff (r ) = Vext (r ) + Z V DFT Pseudopotential | Basis set ρ(r 0 ) |r − r 0 | {z } dr 0 Hartree + δExc δρ(r ) | {z } exchange−correlation SCF Performances Parallel and: GPU ρ(r ) = 2 ∑i fi |ψi (r )|2 Conclusion Poisson Equation: ∆VHartree = ρ Laboratoire de Simulation Atomistique 2 2 2 (Laplacian: ∆ = ∂∂x 2 + ∂∂y 2 + ∂∂z 2 ) http://inac.cea.fr/L_Sim T. Deutsch KS Equations: Self-Consistent Field − Set of self-consistent equations: 1 ~2 2 me 2 ∇ + Veff ψi = εi ψi with an effective potential: Introduction Inputs Veff (r ) = Vext (r ) + Z V DFT Pseudopotential | Basis set ρ(r 0 ) |r − r 0 | {z } dr 0 Hartree + δExc δρ(r ) | {z } exchange−correlation SCF Performances Parallel and: GPU ρ(r ) = 2 ∑i fi |ψi (r )|2 Conclusion Poisson Equation: ∆VHartree = ρ Laboratoire de Simulation Atomistique 2 2 2 (Laplacian: ∆ = ∂∂x 2 + ∂∂y 2 + ∂∂z 2 ) http://inac.cea.fr/L_Sim T. Deutsch KS Equations: Self-Consistent Field − Set of self-consistent equations: 1 ~2 2 me 2 ∇ + Veff ψi = εi ψi with an effective potential: Introduction Inputs Veff (r ) = Vext (r ) + Z V DFT Pseudopotential | Basis set ρ(r 0 ) |r − r 0 | {z } dr 0 Hartree + δExc δρ(r ) | {z } exchange−correlation SCF Performances Parallel and: GPU ρ(r ) = 2 ∑i fi |ψi (r )|2 Conclusion Poisson Equation: ∆VHartree = ρ Laboratoire de Simulation Atomistique 2 2 2 (Laplacian: ∆ = ∂∂x 2 + ∂∂y 2 + ∂∂z 2 ) http://inac.cea.fr/L_Sim T. Deutsch KS Equations: Self-Consistent Field − Set of self-consistent equations: 1 ~2 2 me 2 ∇ + Veff ψi = εi ψi with an effective potential: Introduction Inputs Veff (r ) = Vext (r ) + Z V DFT Pseudopotential | Basis set ρ(r 0 ) |r − r 0 | {z } dr 0 Hartree + δExc δρ(r ) | {z } exchange−correlation SCF Performances Parallel and: GPU ρ(r ) = 2 ∑i fi |ψi (r )|2 Conclusion Poisson Equation: ∆VHartree = ρ Laboratoire de Simulation Atomistique 2 2 2 (Laplacian: ∆ = ∂∂x 2 + ∂∂y 2 + ∂∂z 2 ) http://inac.cea.fr/L_Sim T. Deutsch KS Equations: Self-Consistent Field − Set of self-consistent equations: 1 ~2 2 me 2 ∇ + Veff ψi = εi ψi with an effective potential: Introduction Inputs Veff (r ) = Vext (r ) + Z V DFT Pseudopotential | Basis set ρ(r 0 ) |r − r 0 | {z } dr 0 Hartree + δExc δρ(r ) | {z } exchange−correlation SCF Performances Parallel and: GPU ρ(r ) = 2 ∑i fi |ψi (r )|2 Conclusion Poisson Equation: ∆VHartree = ρ Laboratoire de Simulation Atomistique 2 2 2 (Laplacian: ∆ = ∂∂x 2 + ∂∂y 2 + ∂∂z 2 ) http://inac.cea.fr/L_Sim T. Deutsch Direct Minimisation: Flowchart ψi = ∑ cai φa Basis: adaptive mesh (0, . . . , Nφ ) a Orthonormalized Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Direct Minimisation: Flowchart ψi = ∑ cai φa Basis: adaptive mesh (0, . . . , Nφ ) a Orthonormalized inv FWT + Magic Filter occ Introduction ρ(r ) = ∑ |ψi (r )|2 Fine non-adaptive mesh: < 8Nφ i Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Direct Minimisation: Flowchart ψi = ∑ cai φa Basis: adaptive mesh (0, . . . , Nφ ) a Orthonormalized inv FWT + Magic Filter occ ρ(r ) = ∑ |ψi (r )|2 Introduction Fine non-adaptive mesh: < 8Nφ i Inputs DFT Poisson solver Pseudopotential Basis set SCF Performances Vxc [ρ(r )] VH (r ) = R G(.)ρ VNL ({ψi }) Veffective Parallel GPU Conclusion Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Direct Minimisation: Flowchart ψi = ∑ cai φa Basis: adaptive mesh (0, . . . , Nφ ) a Orthonormalized inv FWT + Magic Filter occ ρ(r ) = ∑ |ψi (r )|2 Introduction Fine non-adaptive mesh: < 8Nφ i Inputs DFT Poisson solver Pseudopotential Basis set SCF Vxc [ρ(r )] Performances VH (r ) = R G(.)ρ VNL ({ψi }) Veffective Parallel GPU Conclusion − 12 ∇2 Kinetic Term Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Direct Minimisation: Flowchart ψi = ∑ cai φa Basis: adaptive mesh (0, . . . , Nφ ) a Orthonormalized inv FWT + Magic Filter occ ρ(r ) = ∑ |ψi (r )|2 Introduction Fine non-adaptive mesh: < 8Nφ i Inputs DFT Poisson solver Pseudopotential Basis set SCF Performances Vxc [ρ(r )] VH (r ) = Parallel G(.)ρ VNL ({ψi }) Veffective FWT GPU Conclusion R − 12 ∇2 FWT δcai = − ∂Etotal + Λij caj ∂ci∗ (a) ∑ j Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Kinetic Term Λij =< ψi |H |ψj > T. Deutsch Direct Minimisation: Flowchart ψi = ∑ cai φa Basis: adaptive mesh (0, . . . , Nφ ) a Orthonormalized inv FWT + Magic Filter occ ρ(r ) = ∑ |ψi (r )|2 Introduction Fine non-adaptive mesh: < 8Nφ i Inputs DFT Poisson solver Pseudopotential Basis set SCF Performances VH (r ) = Vxc [ρ(r )] Parallel G(.)ρ VNL ({ψi }) Veffective FWT GPU Conclusion R − 12 ∇2 FWT δcai = − ∂Etotal + Λij caj ∂ci∗ (a) ∑ j Kinetic Term Λij =< ψi |H |ψj > preconditioning new ,i ca = cai + hstep δcai Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Steepest Descent, DIIS T. Deutsch Direct Minimisation: Flowchart ψi = ∑ cai φa Basis: adaptive mesh (0, . . . , Nφ ) a Orthonormalized inv FWT + Magic Filter occ ρ(r ) = ∑ |ψi (r )|2 Introduction Fine non-adaptive mesh: < 8Nφ i Inputs DFT Poisson solver Pseudopotential Basis set SCF Performances VH (r ) = Vxc [ρ(r )] Parallel G(.)ρ VNL ({ψi }) Veffective FWT GPU Conclusion R − 12 ∇2 FWT δcai = − Stop when δcai small new ,i ca ∂Etotal + Λij caj ∂ci∗ (a) ∑ j Kinetic Term Λij =< ψi |H |ψj > preconditioning = cai + hstep δcai Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Steepest Descent, DIIS T. Deutsch input.dft: idsx DIIS history Use DIIS algorithm Direct Invesion of Iterative Subspace Introduction Inputs DFT Pseudopotential Use idsx=6 previous iterations Memory-consuming (decrease if too big=) Steepest descent if idsx=0 Basis set SCF Performances Parallel itermax=50 Specify maximum number of iterations Typically 15 iterations to converge GPU Conclusion gnrm_cv=1.e-4 convergence criterion Residue |δψ|2 < gnrm_cv Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Description of the file input.dft Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: XC parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 10 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 76 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) write "CUDAGPU" to use GPU (GPU.config file needed) 0F0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals 2 verbosity of the output 0=low, 2=high T disable the symmetry detection Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Description of the file input.dft Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: XC parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 10 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 76 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) write "CUDAGPU" to use GPU (GPU.config file needed) 0F0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals 2 verbosity of the output 0=low, 2=high T disable the symmetry detection Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Description of the file input.dft Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: XC parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 10 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 76 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) write "CUDAGPU" to use GPU (GPU.config file needed) 0F0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals 2 verbosity of the output 0=low, 2=high T disable the symmetry detection Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Description of the file input.dft Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: XC parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 10 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 76 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) write "CUDAGPU" to use GPU (GPU.config file needed) 0F0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals 2 verbosity of the output 0=low, 2=high T disable the symmetry detection Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Description of the file input.dft Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: XC parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 10 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 76 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) write "CUDAGPU" to use GPU (GPU.config file needed) 0F0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals 2 verbosity of the output 0=low, 2=high T disable the symmetry detection Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Description of the file input.dft Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: XC parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 10 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 76 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) write "CUDAGPU" to use GPU (GPU.config file needed) 0F0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals 2 verbosity of the output 0=low, 2=high T disable the symmetry detection Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Description of the file input.dft Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: XC parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 10 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 76 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) write "CUDAGPU" to use GPU (GPU.config file needed) 0F0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals 2 verbosity of the output 0=low, 2=high T disable the symmetry detection Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Description of the file input.dft Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: XC parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 10 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 76 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) write "CUDAGPU" to use GPU (GPU.config file needed) 0F0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals 2 verbosity of the output 0=low, 2=high T disable the symmetry detection Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Description of the file input.dft Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: XC parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 10 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 76 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) write "CUDAGPU" to use GPU (GPU.config file needed) 0F0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals 2 verbosity of the output 0=low, 2=high T disable the symmetry detection Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Description of the file input.dft Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: XC parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 10 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 76 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) write "CUDAGPU" to use GPU (GPU.config file needed) 0F0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals 2 verbosity of the output 0=low, 2=high T disable the symmetry detection Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Description of the file input.dft Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: XC parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 10 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 76 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) write "CUDAGPU" to use GPU (GPU.config file needed) 0F0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals 2 verbosity of the output 0=low, 2=high T disable the symmetry detection Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Description of the file input.dft Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: XC parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 10 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 76 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) write "CUDAGPU" to use GPU (GPU.config file needed) 0F0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals 2 verbosity of the output 0=low, 2=high T disable the symmetry detection Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Description of the file input.dft Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: XC parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 10 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 76 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) write "CUDAGPU" to use GPU (GPU.config file needed) 0F0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals 2 verbosity of the output 0=low, 2=high T disable the symmetry detection Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Description of the file input.dft Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion 0.450 0.450 0.450 hx, hy, hz 5.0 8.0 crmult, frmult, coarse and fine radius 1 ixc: XC parameter (LDA=1,PBE=11) 0 0.0 ncharge: charge of the system, Electric field 10 nspin=1 non-spin polarization, total magnetic moment 1.E-04 gnrm_cv: convergence criterion gradient 50 10 itermax, nrepmax 76 # CG (preconditioning), length of diis history 0 dispersion correction functional (0, 1, 2, 3) write "CUDAGPU" to use GPU (GPU.config file needed) 0F0 InputPsiId, output_wf, output_grid 5.0 30 length of the tail, # tail CG iterations 0 0 0 davidson treatment, # virtual orbitals, # plotted orbitals 2 verbosity of the output 0=low, 2=high T disable the symmetry detection Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Basis sets: wavelets pseudopotential self-consistent (SCF) norm-conserving PAW all electrons Hybrid functionals Harris-Foulkes functional relativistic non-relativistic Introduction Inputs DFT h beyond LDA GW LDA,GGA LDA+U i 1 2 − 2 ∇ +V (r ) +µxc (r ) ψki= εki ψki Pseudopotential periodic Basis set SCF Performances non-periodic atomic orbitals Gaussians plane waves Slater augmented numerical Parallel GPU Conclusion N 3 scaling O (N ) methods non-spin polarized real space non-collinear finite difference spin polarized Wavelet collinear Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Optimal for isolated systems Test case: cinchonidine molecule (44 atoms) 101 Introduction Inputs DFT Pseudopotential Absolute energy precision (Ha) Plane waves 0 10 Ec = 40 Ha 10-1 Ec = 90 Ha 10-2 h = 0.4bohr Performances Parallel GPU Conclusion Ec = 125 Ha 10-3 h = 0.3bohr Basis set SCF Wavelets 10-4 105 Number of degrees of freedom 106 Allows a systematic approach for molecules Considerably faster than Plane Waves codes. 10 (5) times faster than ABINIT (CPMD) Charged systems can be treated explicitly with the same time Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Systematic basis set Two parameters for tuning the basis The grid spacing hgrid The extension of the low resolution points crmult Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Poisson Solver using interpolating scaling function The Hartree potential is calculated in the interpolating scaling function basis. Poisson solver with interpolating scaling functions From the density ρ(~j ) on an uniform grid it calculates: Introduction Inputs VH (~j ) = R d~x ρ(~x ) |~x −~j | DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion 4 Very fast and accurate, optimal parallelization 4 Can be used independently from the DFT code 4 Integrated quantities (energies) are easy to extract 8 Non-adaptive, needs data uncompression Explicitly free boundary conditions No need to subtract supercell interactions → charged systems. Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Poisson Solver for surface boundary conditions A Poisson solver for surface problems Based on a mixed reciprocal-direct space representation Z Vpx ,py (z ) = −4π dz 0 G(|~p|; z − z 0 )ρpx ,py (z ) , 4 Can be applied both in real or reciprocal space codes Introduction 4 No supercell or screening functions Inputs DFT 4 More precise than other existing approaches Pseudopotential Basis set SCF Example of the plane capacitor: Performances Parallel GPU Hockney Periodic Our approach Conclusion V V V y x Laboratoire de Simulation Atomistique y x http://inac.cea.fr/L_Sim y x T. Deutsch input.geopt: Relaxation or molecular dynamics DIIS 500 1.d0 Introduction Inputs ncount_cluster: max steps during geometry relaxation frac_fluct: geometry optimization stops if force norm less than frac_fluct of noise 0.0d0 randdis: random amplitude for atoms 4.d0 5 betax: steepest descent step size, diis history DFT Pseudopotential Basis set Different algorithms: SCF Performances Parallel GPU Conclusion DIIS: Direct Inversion in the iterative subspace VSSD: Variable Size Steepest Descent LBFGS: Limited BFGD AB6MD: Use molecular dynamics routines from ABINIT 6 Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch input.geopt: Relaxation or molecular dynamics DIIS 500 1.d0 Introduction Inputs ncount_cluster: max steps during geometry relaxation frac_fluct: geometry optimization stops if force norm less than frac_fluct of noise 0.0d0 randdis: random amplitude for atoms 4.d0 5 betax: steepest descent step size, diis history DFT Pseudopotential Basis set Different algorithms: SCF Performances Parallel GPU Conclusion DIIS: Direct Inversion in the iterative subspace VSSD: Variable Size Steepest Descent LBFGS: Limited BFGD AB6MD: Use molecular dynamics routines from ABINIT 6 Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch input.geopt: Relaxation or molecular dynamics DIIS 500 1.d0 Introduction Inputs ncount_cluster: max steps during geometry relaxation frac_fluct: geometry optimization stops if force norm less than frac_fluct of noise 0.0d0 randdis: random amplitude for atoms 4.d0 5 betax: steepest descent step size, diis history DFT Pseudopotential Basis set Different algorithms: SCF Performances Parallel GPU Conclusion DIIS: Direct Inversion in the iterative subspace VSSD: Variable Size Steepest Descent LBFGS: Limited BFGD AB6MD: Use molecular dynamics routines from ABINIT 6 Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch input.geopt: Relaxation or molecular dynamics DIIS 500 1.d0 Introduction Inputs ncount_cluster: max steps during geometry relaxation frac_fluct: geometry optimization stops if force norm less than frac_fluct of noise 0.0d0 randdis: random amplitude for atoms 4.d0 5 betax: steepest descent step size, diis history DFT Pseudopotential Basis set Different algorithms: SCF Performances Parallel GPU Conclusion DIIS: Direct Inversion in the iterative subspace VSSD: Variable Size Steepest Descent LBFGS: Limited BFGD AB6MD: Use molecular dynamics routines from ABINIT 6 Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch input.geopt: Relaxation or molecular dynamics DIIS 500 1.d0 Introduction Inputs ncount_cluster: max steps during geometry relaxation frac_fluct: geometry optimization stops if force norm less than frac_fluct of noise 0.0d0 randdis: random amplitude for atoms 4.d0 5 betax: steepest descent step size, diis history DFT Pseudopotential Basis set Different algorithms: SCF Performances Parallel GPU Conclusion DIIS: Direct Inversion in the iterative subspace VSSD: Variable Size Steepest Descent LBFGS: Limited BFGD AB6MD: Use molecular dynamics routines from ABINIT 6 Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Operations performed Different numerical operations performed: Introduction Inputs DFT Pseudopotential Basis set SCF Performances Interpolating Daubechies For each wavefunction (Hamiltonian application) Between wavefunctions (Linear algebra) Parallel GPU Comput. operations Conclusion Convolutions with short filters BLAS routines FFT (Poisson Solver) Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Massively parallel Two kinds of parallelisation By orbitals (Hamiltonian application, preconditioning) By components (overlap matrices, orthogonalisation) Introduction Inputs DFT A few (but big) packets of data More demanding in bandwidth than in latency Pseudopotential Basis set SCF Performances No need of fast network Optimal speedup (eff. ∼ 85%), also for big systems Parallel GPU Conclusion Cubic scaling code For systems bigger than 500 atomes (1500 orbitals) : orthonormalisation operation is predominant (N 3 ) Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch High Performance Computing Localisation & Orthogonality → Data locality Principal code operations can be intensively optimised Optimal for application on supercomputers Little communication, big packets of data Inputs DFT SCF Performances Parallel GPU Conclusion 2 4 4 No need of fast network Pseudopotential Basis set 2 16 98 Efficiency (%) Introduction 100 Optimal speedup 96 8 94 44 1 1 4 22 92 11 5 (with orbitals/proc) Efficiency of the order of 90%, up to thousands of processors 32 at 65 at 173 at 257 at 1025 at 90 88 1 2 3 10 1 100 Number of processors 1000 Data repartition optimal for material accelerators (GPU) Graphic Processing Units can be used to speed up the computation Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Performance in parallel (BigDFT) Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion Distribution per orbitals (wavelets) Nothing to specify in input files Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch GPU-ported operations in BigDFT (double precision) 20 Convolutions (CUDA rewritten) Introduction Inputs GPU speedups between 10 and 20 can be obtained for different sections GPU speedup (Double prec.) 18 16 14 12 10 8 6 locden locham precond 4 2 0 DFT 10 20 Pseudopotential Basis set 30 40 50 Wavefunction size (MB) 60 70 70 SCF Parallel GPU Conclusion 60 GPU speedup (Double prec.) Performances Linear algebra (CUBLAS library) 50 40 30 20 10 DGEMM DSYRK 0 0 500 1000 1500 Number of orbitals Laboratoire de Simulation Atomistique 2000 2500 The interfacing with CUBLAS is immediate, with considerable speedups http://inac.cea.fr/L_Sim T. Deutsch Data repartition on a node The S_GPU approach S_GPU library manages GPU resource within the node GPU Introduction Inputs DFT S_GPU Pseudopotential Basis set SCF Performances Parallel GPU MPI 0 MPI 1 MPI 2 MPI 3 DATA DATA DATA DATA Conclusion Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch BigDFT code on Hybrid architectures BigDFT code can run on hybrid CPU/GPU supercomputers In multi-GPU environments, double precision calculations No Hot-spot operations Introduction Inputs DFT Different code sections can be ported on GPU up to 20x speedup for some operations, 7x for the full parallel code (under improvements) Pseudopotential Basis set 100 1000 SCF Performances 80 Parallel 100 Seconds (log. scale) GPU Percent 60 Conclusion 40 10 20 0 1 2 4 6 8 10 12 14 16 1 2 CPU code Number of cores Laboratoire de Simulation Atomistique 4 6 8 10 12 14 16 Hybrid code (rel.) http://inac.cea.fr/L_Sim 1 Time (sec) Comm Other PureCPU precond locham locden LinAlg T. Deutsch CUDA + S_GPU Put "CUDAGPU" in input.dft Use a GPU.config file (affinity between CPU and GPU) USE_SHARED=0 Introduction Inputs DFT Pseudopotential Set if the repartition is shared (1) or static (0) MPI_TASKS_PER_NODE=8 Basis set Define the number of mpi tasks per node SCF Performances Parallel GPU Conclusion NUM_GPU=2 Define the number of GPU device per node GPU_CPUS_AFF_0=0,1,2,3 Define the affinity of the processes among the GPUs. GPU_CPUS_AFF_1=4,5,6,7 USE_GPU_BLAS=1 Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Use CUDABLAS T. Deutsch OpenCL (without S_GPU) Put "OCLGPU" in input.dft Introduction Inputs DFT Pseudopotential Basis set No GPU.config needed presently (should be removed also for CUDA) SCF Performances Parallel GPU Fermi is better for concurrent access Conclusion Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Conclusion Functionality: Mixed between physics and chemistry approach Order N (real space basis set as wavelets) Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion QM/MM (Quantum Mechanics/Molecular Modeling) Multi-scale approach (more than 1000 atoms feasible for a better parametrization) Numerical experience: One-day simulation Better exploration of atomic configurations Molecular Dynamics (4s per step for 32 water molecules) Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Introduction Inputs THE END DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Introduction Inputs Future developments, Discussions DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch O(N) approach Introduction Versatile boundary conditions (surface, periodic, isolated) Inputs DFT Pseudopotential Basis set SCF Localization regions Performances Wannier functions Parallel GPU Easily embedded Conclusion TCPU Crossover point N Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Embedding a system (QM/MM) Introduction Inputs DFT Pseudopotential Basis set SCF Performances Parallel GPU Conclusion QM/MM Charge defects in bulk materials Semi-infinite system as surface Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch Developments Introduction Inputs DFT Pseudopotential Embedding atoms, O(N) method, region of localisation Many-body: TD-DFT, GW, Bethe-Salpeter PAW pseudopotentials (better accuracy, larger step grid) Basis set SCF Mixed basis sets Performances Parallel GPU Conclusion Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim T. Deutsch
Similar documents
Electronic Structure Calculations, Density Functional
Diagonalisation Gaussians Real space basis sets
More information