Slides
Transcription
Slides
CAA: Covariant Approximation Averaging a new class of error reduction techniques Taku Izubuchi, with T. Blum E. Shintani, C. Lehner, T. Kawanai, T. Ishikawa, J.Yu, R. Arthur, P. Bolye, …. RBC/UKQCD in preparation RIKEN BNL Research Center INT La'ce QCD Summer School, August 21 will be posted just aAer this lecture ! RBRC-967 A new class of variance reduction techniques using lattice symmetries Thomas Blum,1, 2 Taku Izubuchi,3, 2 and Eigo Shintani2 2 1 Physics Department, University of Connecticut, Storrs, CT 06269-3046, USA RIKEN-BNL Research Center, Brookhaven National Laboratory, Upton, NY 11973, USA 3 Brookhaven National Laboratory, Upton, NY 11973, USA We present a general class of unbiased improved estimators for physical observables in lattice gauge theory computations which significantly reduces statistical errors at modest computational cost. The error reduction techniques, referred to as covariant approximation averaging, utilize approximations which are covariant under lattice symmetry transformations. We observed cost reductions from the new method compared to the traditional one, for fixed statistical error, of 16 times for the nucleon mass at Mπ ∼ 330 MeV (Domain-Wall quark) and 2.6-20 times for the hadronic vacuum polarization at Mπ ∼ 480 MeV (Asqtad quark). These cost reductions should improve with decreasing quark mass and increasing lattice sizes. PACS numbers: 11.15.Ha,12.38.Gc,07.05.Tp n As non-perturbative computations using lattice gauge theory are applied to a wider range of physically interesting observables, it is increasingly important to find numerical strategies that provide precise results. In Monte Carlo simulations our reach to important physics is still often limited by statistical uncertainties. Examples include hadronic contributions to the muon’s anomalous magnetic moment [1], nucleon form factors and structure functions [2], including nucleon electric dipole moments [3–6], hadron matrix elements relevant to flavor physics (e.g., K → ππ amplitudes) [7], and multi-hadron state physics [8], to name only a few. As a generalization of low-mode averaging (LMA) [9, 10], we present a class of unbiased statistical error reduction techniques, utilizing approximations that are covariant under lattice symmetry transformations. LMA has worked well in cases where low eigenmodes of the Dirac operator dominate [11]: low energy constants in the εregime [9, 12–15], pseudoscalar meson masses and decay constants [16–18], an so on. With a modest increase in computational cost, the generalized method can reduce statistical errors by an order of magnitude, or more, even in cases where LMA fails. Unlike LMA, we account for all modes of the Dirac In lattice gauge theory simulations an ensemble of gauge field configurations {U1 , · · · , UNconf } is generated randomly, according to the Boltzmann weight, e−S[U ] , where S[U ] is the lattice-regularized action. The expectation value of a primary, covariant observable, O, Precise theoretical calculation becomes even more important to confirm or reject the standard model More than half of CPU cycles of lattice QCD are for valence calculations • • �O� = 1 Nconf N conf � i=1 O[Ui ] + O � � 1 √ , Nconf (1) is estimated as the ensemble average, over a large number of configurations, Nconf ∼ O(100 − 1000). Here, we primarily consider observables made of fermion propagators SF [U ] computed on the background gauge configuration U. By exploiting lattice symmetry transformations g ∈ G, that transform U → U g , a general class of variance reduction techniques is introduced. First construct an approximation O(appx) to O which must fulfill the following conditions, on physics point QCD simulation multi hadron simulation It’s shame to be limited by statistical error appx-1: O(appx) should fluctuate closely with O, (appx) � r ≡ Corr(O, O(appx) ) = √ �∆O∆O ≈ 1, (appx) 2 2 �(∆O) ��(∆O ) � and �(∆O)2 � ≈ �(∆O(appx) )2 � , where ∆X = X − statistical noise reduction techniques n LMA L. Giusti, P. Hernandez, M. Laine, P. Weisz and H. Wittig, JHEP 0404, 013 (2004) see also H. Neff, N. Eicker, T. Lippert, J. W. Negele and K. Schilling, Phys. Rev. D 64 (2001) 114509 and T. DeGrand and S. Schaefer, Comput. Phys. Commun. 159 (2004) 185 works for low mode dominant quantities n Truncated Solver Method (TSM) G. Bali, S. Collins, A. Schaefer, Comput. Phys. Commun. 181 (2010) 1570 uses stochastic noise to avoid systematic error n All-to-all propagator [S. Ryan’s lecture ] J.Foley, K.Juge, A. O’Cais, M. Peardon, S. Ryan, J-I. Skullerud, Comput.Phys.Commun. 172 (2005) 145 uses stochastic noise could use CAA as a part of A2A n Also closely related to the improved solvers Deflations, EigCG, Domain decomposition, MultiGrid, ….. n Other application specific reductions multi-hit (pure Gauge) multi level: Luscher, M. and Weisz, P. (2001). J. High Energy Phys., 09, 010 Multiple timestep in HMC n Multiple time steps in MD integrators n n Sexton & Weingarten trick Hasenbusch trick : introduce intermediate mass n Clark & Kennedy RHMC (quotient force term) A. Kennedy 06 Berlin Wall was torn down by Smart Work Sharings Similar tricks for valence ? 100% 75% Residue (") L! Force "/(!+0.125) CG iterations 50% 25% 0% -12.6 -10.1 -8.5 -7.1 -5.8 -4.4 -3.1 -1.7 -0.3 1.5 Shift [ln(!)] State of Obvious n Many interesting physics are limited by statistical error n Do more number of measurements, Nmeas n Change to observable with smaller fluctuation, C n Covariant Approximation Averaging (CAA) Combine the above using • symmetries of the lattice action • (crude) approximations Covariant Approximation Averaging ( CAA ) n Original observable n Covariant approximation of the observable under a lattice symmetry n Unbiased improved estimator Covariant approximation n O(appx) needs to be precisely (to the numerical accuracy required) covariant under the symmetry of lattice action to avoid systematic errors. Delta x = 4 O^g(x,y), y=4 O(x,y), y=1 U^g(x) U(x) 0 1 2 3 4 5 X 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 X g Figure c 1:heck Transformed linkcode field Uµu (x) and e a xplicitly bi-local observable Og (x, y). If OcisonfiguraRon covariant One should i n t he sing s hiAed g auge g observable, the shape of O (x, y) are exactly same as O(x, y). Unbiasness proof n Consider a element g of lattice symmetry G e.g. transformation of fields n Observable (and its approximation) is called to have covariance under g iff n or, more explicitly, n When g is a symmetry of lattice, and O(appx) is covariant Why expect improvements ? n O(imp) has smaller error, smaller C <= accuracy of approximation controls error, need not to be too accurate (0.1% is good enough) n NG suppresses the bulk part of noise cheaply Valence version of Hasenbushing in HMC AMA : a smart work sharing n Ideal approximation Oappx is strongly correlated with original one. ε ensemble R(corr) b/w O and O(appx) needs to be larger than 0.5 [C. Lehner] ε ensemble • ε, accuracy of approximaRon should be smaller than Oave appx • ΔΟrest which is staRsRcal error of Orest depends on the strength of correlaRon. • The computaRonal cost of Oappx should be much smaller than original. 10 AMA : not working n Nightmare case • Anti-correlated or bad approximation ensemble ensemble ensemble ensemble 11 Examples of covariant approximations n Low mode approximation used in the Low Mode Averaging ( LMA ) L. Giusti et al (2004), see also T. DeGrand et al. (2004) accuracy control : # of eigen mode 1000 100 10 1 0 0.5 1 1.5 2 2.5 Deflation using low eigenmodes from Lanczos [alsoNeff et al, JLQCD ] asfermion We describe the Dirac operator of domain-wall fermion 4D even-odd b Wecan can also describe the Dirac operator of domain-wall as 4D n 4D even/odd preconditioning [ R. Arthur ] DDW ! " " M! K(M4 )eo 5 = , 4 )eo M5M K(M DDWK(M = 4 )oe , 5 K(M4 )oe and the inverse of Dirac operator is given as 4D even-odd preconditioned form, and the inverse!of Dirac operator is given preconditio " ! as 4D "even-odd ! " −1 −1 −1 DDW = Dee 0 1 −K(M4 )eo M5 "! "! −10 11 −K(M 0 0 0 1 Dee 1 0 ! −1 −KM5 (M4 )1oe M5−1 −1 D = 2 −1 −1 (M ) −1 DW Dee = M5 − K−KM (M4 )eo 4 oe M5 5 4 )oe 5 M(M n n 0 1 0 2 Polynomial accelerated = solve M5 the −K (M4 )ofeoD M5−1by(M and then we D need inverse CG method. The inverse of M is ee to 4 )oe Pn( H_DWF) M (s, = to A solve P +B + δ , of Dee by CG method. The inv and then wet)need theP inverse With shift m κ m κ m κ ··· m κ Bs,t M5−1 (s, t) = As,t PR + P + δ , −κ m κ m κ · · · m κ L st H-> H-c 1 −κ −κ m κ L Lκ−1 · · · Lm−2 A = − m κ m κ m κ · f f f 1+m κ .. .. .. .. eigen Compression L L −1 . . . . −κ m κ m κ · f f / decompression 1 −κ −κ2 −κ ·m ·· κ mL κ · −κ −κ ee −1 5 s,t R s,t L f st Ls 5 Ls −1 f f Ls 2 n M5 s,t s Ls f Ls −2 f f Ls −1 f f s f Ls sf s Ls −1 Ls −2 As,t = − .. Bs,t = ATs,t , 1 + mf κLs . 1 L s −1 κ = . −κ 5 − M5 ψ = v1 + v2 H (ψ) = λ1 v1+λ2 v2 Bs,t = ATs,t , It has important property that the Γ 5 −κLs −2 3 s Ls −3 .. . 2 f fs Ls .. . −κLs −3 · (not γ5 ) Hermicity of Dee does not flip the 1 bases, and therefore Hermitian matrix is given bases in contrast to 5D even-odd κ = . multiplication of Γ5 , 5 − M5 † !"#$%"&'(&')"%*"+,-,".( /0('1'.$"&&(&')"%*"+,-,".( ( ( ( ( ( !"#(%"&'(&')"%*"+,-,".( ( Examples of Covariant Approximations (contd.) n All Mode Averaging AMA Sloppy CG or Polynomial approximations 1000 100 10 1 0 1 0.5 1.5 2 2.5 accuracy control : • low mode part #10, of theemini-max ig-‐mode Figure 3: Polynomial approximation of 1/λ, N : = approximati the relative error, for λ ∈ [0.05 , 1.67 ]. • mid-‐high mode : degree of poly. poly 2 2 All mode approximation via sloppy CG poly(lam) * lam vs lam poly(lam) * lam vs lam 16x32x16, ml=0.01, raj 2200, nev=0 16x32x16, ml=0.01, traj 2200, nev=100 1.05 rsd = 1e-‐2 1.05 rsd = 1e-‐3 1 1 rsd = 10.95 e-‐4 inv_poly.nev0_rsd1e-2_k14 inv_poly.nev0_rsd1e-3_k73 inv_poly.nev0_rsd1e-4_k296 inv_poly.nev0_rsd1e-6_k863 inv_poly.nev0_rsd1e-8_k1514 inv_poly.nev0_rsd1e-10_k2061 0.95 inv_poly.nev_rsd1e-2_k14 inv_poly.nev_rsd1e-3_k46 inv_poly.nev_rsd1e-4_k84 inv_poly.nev_rsd1e-6_k158 inv_poly.nev_rsd1e-8_k234 inv_poly.nev_rsd1e-10_k310 rsd = 1e-‐6 0.9 0.9 0.001 0.0001 0.01 no eigenvector assists 1 0.01 0.0001 1e-06 1e-08 0.01 0.1 1 1 100 eigenvector assists | poly(lam) * lam - 1 | vs lam • Conjugate residual ith sloppy convergence criteria, wml=0.01, hich trajis enev=100 quivalent to | poly(lam) * lam - 1 | vs w lam 16x32x16, 2200, 16x32x16, ml=0.01, traj 2200, nev=0 construct a polynomial approximaRng 11/λ • The starRng vector needs to be translaRon invariant to be a covariant approx. 0.01 • low eigenvectors reduces the size of the dynamic range of 1/λ 0.0001 → Beher approximaRon with smaller polynomial degrees err-inv_poly.nev_rsd1e-10_k310 err-inv_poly.nev_rsd1e-2_k14 1e-06 • low λ region has larger r elaRve e rrors err-inv_poly.nev_rsd1e-3_k46 err-inv_poly.nev0_rsd1e-10_k2061 err-inv_poly.nev_rsd1e-4_k84 err-inv_poly.nev0_rsd1e-2_k14 err-inv_poly.nev_rsd1e-6_k158 • One could employ other c onstrucRon o f p olynomial a pproximaRon f or 1/λ, 1e-08 err-inv_poly.nev0_rsd1e-3_k73 err-inv_poly.nev_rsd1e-8_k234 err-inv_poly.nev0_rsd1e-4_k296 such as min-‐max, conjugate rasidual1e-10 err-inv_poly.nev0_rsd1e-6_k863 Correlation n NN propagator at short time-slice 18 Correlation n NN propagator (LMA) at short time-slice 19 Correlation n NN propagator (AMA) at short time-slice 20 AMA results for hadron 2pt functions [ E. Shintani ] Nucleon effective mass using DWF imp O O (LMA,NG=32) imp O (AMA,NG=32) Effective mass 0.8 0.7 0.6 0 Point sink Smeared sink 5 10 Time slice 15 0 5 10 Time slice 15 0 5 10 Time slice 15 Hadronic vacuum polarization( AsqTad ) 0.2 0.2 0.18 0.18 0.16 2 -Π(Q ) 0.16 0.14 0.14 0.12 0 0.2 0.4 0.1 0.08 0.06 0 2 4 6 2 2 Q (GeV ) 8 10 TABLE IV. Computational cost. The unit of cost is one quark propagator without deflated CG, per configuration. NG = 32 for nucleon masses and 708 for HVP. The last column gives the cost to achieve the same error for each method, normalized to [2] (nucleon mass mN ) and [1] (HVP) and scaled by the errors in Tab. III. HVP scaled costs are maximum and minimum 2 n x 16 for Nucleon inDWF the range Q2 = mass 0 − 1 (M GeV . For m = 3fm) 0.005, in [2], nonPS=330MeV, relativistic spinors(MPS=470 were used which scaled costs in n x 20 for AsqTad HVP MeV, means 5 fm) the(appx) this case were increased by two. The cost of OG for AMA n should be better for lighter mass & larger volume ? is split to show the sloppy CG and low-mode costs separately. Aoki, Rudy A man Christ, T Hashimoto, T Kaneko, Chris Ruth Van de Jianglei Yu an collaborations CPS QCD lib are used, wh DOE SciDAC the Japanese 22540301 (TI) DOE grants D 92ER40716 (T Research Cen necessary for c Cost comparison for test cases (appx) mN AMA LMA Ref. [2] AMA LMA Ref. [2] HVP AMA LMA Ref. [1] a Nconf Nmeas LM O OG m = 0.005, 400 LM 110 1 213 18 91+23 110 1 213 18 23 932 4 - 3728 m = 0.01, 180 LM 158 1 297 74 300+22 158 1 297 74 22 356 4 - 1424 m = 0.0036, 1400 LM 20 1 96 11 504+420 20 1 96 11 420 292 2 - 584 - Tot. scaled cost gauss pt 350 0.063 0.065 254 0.279 0.265 3728a 1 1 693 0.203 0.214 393 0.699 0.937 1424 1 1 max min 1031 0.387 0.050 527 10.3 3.56 584 1 1 [1] C. Aubin, arXiv:1205 [2] T. Yamaza et al., Phy [hep-lat]. [3] E. Shinta Variants of CAA n CAA (Covariant Approximation Averaging) • Name approximation, approximation accuracy control • LMA (Low Mode Averaging) • • • low mode approx of propagator, # of eigen vectors AMA (All Mode Averaging), low mode (optional)+Polynomial approx, (# of eigenV) Polynomial degree (also other type of minimization) Heavy quark averaging [T. Kawanai] heavier mass quark prop as an approx of light prop quark mass ????? Other Examples of Covariant Approximations n Less expensive (parameters of) fermions : • • • • n n Larger mf Smaller Ls DWF Mobius even staggered or Wilson ….. Different boundary conditions More than one kinds of approximation (c.f. multi mass Hasenbushing) Strongly depends on Observables / Physics (YMMV) Would work better for EXPENSIVE observables and/or fermion, potentially a game changer ? reduces the dispersion of rest part. Also I plot the parameter t and q. These parameters satisfy the relation r = q + 1. We need r > 1/2 to reduce the statistical error in final analysis. Finally I show the results for the effective mass in the Fig.3. and Table 1. The results for improvement are consistent with original ones within error bars and its statustical error bars are slightly smaller than original ones. Actually approximation plotted as green points is corresponding to Bs meson in this test. The ratio of dispersion for improvement to original one is also plotted as gray line on the right axis. Fitting results are shown in Table 1. As a result, the improvement finaly gives 82 % statistical error of original. This improvement rate is corresponding to statistics increased by half. As 24^3x64x16, 20 config , an experiment, I have tried to extrapolate the data by using the improvement and approximate one. moff=0.01 (target) mtechniques f=0.04 “approximaRon” The advantage this error reduction using heavier quark propagator is what approximate is used to extrapolate to physical point. Finally I show the results for the effective mass in the Fig.3. Larger mass as CAA [ Taichi Kawanai ] 3.14 100 3.1 75 3.08 50 3.06 25 3.04 0 3.02 3.04 4 6 8 10 12 time slice [lattice unit] 14 16 improvement original 3.1 3.08 mass effective mass 3.12 3.12 original improvement approxmation 3.06 18 0 0.5 1 1.5 2 Figure 3: (left) The effective mass plots. The ratio of dispersion for improvement to original one is also plotted as gray line. (right)Extrapolation and Table 1. 3 Estimation for error reduction per cost. Summary n CAA , LMA, AMA, …. : Class of Statistical error reduction technique • AMA is a valence version of the Hasenbush trick • AMA could improve existing data easily 1. Do Full CG for selected config / source (existing data : This expensive part is already done ) 2. 3. Find a good approximation (accuracy of sloppiness / number of eigenvalue) that reproduce your exact CG result by, say, 95% (mathematically find a strongly correlated approximation, R(corr) > 0.5 ) Subtract the approx obs with same source location as full CG 4. Perform many source location using approx obs, average, add back You could use other config. l l Your Millage May Varies…. Home Work : find a good / cheap / funny approximations AMA in USQCD Static-light [ PI Tomomi Ishikawa ] 16^3x64x16, 20 conf, 100 eigenvectors 1 40 O O - Oapprox 0.5 Oapprox src summed 20 10 O - Oapprox 0.5 0 Oapprox 30 Oapprox src summed 20 10 Oapprox Oapprox 0 O - Oapprox Oapprox error normalized by O [%] O - Oapprox 1 portion of 2pt function error normalized by <O> [%] portion of 2pt function 30 5 10 15 t 0 0 0 5 10 t LMA 15 0 5 10 15 t 0 0 5 10 t AMA 15 3pt function [ E. Shintani ] n Application to the form factor measurement • CP-even and CP-odd nucleon EM form factor • Complicated structure in the ratio method Cf. Yamazaki et al., PRD79, 114505 (2009) Ratio has complicated combination of both low and high mode, so AMA has more advantage than LMA even if AMA need larger cost. 30 LMA AMA q2 GeV2 Ge (LMA) Ge (AMA) 0.0 0.96(11) 0.98(3) 0.198 0.72(12) 0.73(3) 0.382 0.58(10) 0.56(3) 0.574 0.48(10) 0.45(2) 0.733 0.52(12) 0.44(3) StaRsRcal error of AMA is about 3-‐-‐5 Rmes smaller than LMA. 31 Comparison of isovector F1,2 [ E. Shintani ] m=0.01 m=0.01 • Results are well consistent with full staRsRcs. • StaRsRcal error is much reduced in AMA rather than LMA. • Compared to full staRsRcs, AMA results (m=0.01) have sRll 1.2 -‐-‐ 1.5 Rmes larger staRsRcal error (except for F1(0)). • This may be due to correlaRon between different source points. 32 CP-odd part n Nucleon 2pt function with θ reweighting • Q is topological charge. • α which is CP-odd phase is necessary to extract EDM form • • factor. It is good check of applicability of LMA/AMA to CP-odd sector. Effective mass plot shows the consistency of the above formula 33 CP-odd part [ E. Shintani ] • There is good plateau in AMA, and this figure actually shows CP-‐odd part has consistent exponent with CP-‐even(nucleon mass) part as expected. • CP-‐odd part has both contribuRon from high and low lying mode. • AMA works well even in CP-‐odd sector ! 34 -1 -1.2 Nucleon Magnetic formfactor -1.4 -1.6 0 2 4 6 8 10 12 t Gm at m=0.01 for N Gm at m=0.01 for N 0 q22=0.202 q =0.395 q22=0.583 -0.2 0 GeV22 GeV GeV22 -0.2 q =0.753 GeV -0.4 -0.4 -0.6 -0.6 -0.8 -0.8 -1 -1 -1.2 -1.2 -1.4 -1.4 -1.6 -1.6 0 2 4 6 8 10 t q22=0.200 GeV22 q =0.388 GeV q22=0.569 GeV22 q =0.721 GeV 12 0 2 4 6 8 10 12 t AMA Original C G Figure 7: G for neutron at m = 0.01. (Top) Original, (Middle) LMA m G at m=0.01 for N m 0 -0.2 q22=0.199 GeV22 q =0.389 GeV q22=0.576 GeV22 q =0.731 GeV N, m=0.01, point sink N, m=0.005, point sink Full staRsRcs LMA [7,15] m = 0.01 AMA [7,15] staRsRcs Full staRsRcs (Gaussian sink) 0.712(16) 0.710(5) Nconf=80, N’mes=32 0.703(4), Nconf = 356, Nmes=4 m = 0.005 0.673(22) 0.666(13) Nconf=26, N’mes=32 0.663(4), Nconf = 932, Nmes=4 Yamazaki et al., PRD79, 114505 (2009) 36 Cost (in the case of 24cube m=0.01) Use of unit of quark propagator “prop” in full CG w/o deflation Yamazaki et al., PRD79, 114505 (2009) n Case of full statistics In Nconf = 356, Nmes=4, Total : 356×4 = 1424 prop n Case of AMA w/o deflation Since calculation of Oappx need 1/50 prop, then in Nconf=81, N’mes=32 Total : 80 + 80×32/50 = 131 prop ⇒ 10 times fast n Case of AMA w/ deflation When using 180 eigenmode, calculation of Oappx need 1/80 prop, but in this case the calculation of lowmode is ~1 prop/configs. Deflated CG makes reduction of full CG to 1/3 prop, then Total : 80/3 + 80×32/80 + 80 = 138 prop ⇒ 10 times fast Note that stored eigehmode is useful for other works. 37 Correlation n NN propagator at long time-slice 38 Correlation n NN propagator (LMA) at long time-slice 39 Correlation n NN propagator (AMA) at long time-slice 40 Other technical details n n n n Implicitly Restarted Lanczos with Polynomial acceleration and spectrum shifts for DWF and staggered in CPS++ [ E. Shintani, T. Blum, TI ]. Eigen Vector compression / decompression Sea Electric Charge is now controlled by QED reweighting [ T. Ishikawa et. al. arXiv:1202.6018 ] Aslash-SeqSrc method