Data Analysis and Monte Carlo Methods (Summer 2009) - Lecture Transcription
Ising Spin Model
Model of a ferromagnet: a collection of magnetic moments associated with the spins of atoms. Treat the spins as located at fixed points on a 2D lattice. Ignore quantum mechanical effects (an active area of research).
See: Computational Physics, N. Giordano, H. Nakanishi, 2nd Ed., McGraw Hill.
Consider spin projections along either the +z or -z direction, and only nearest neighbor interactions:

E = -J \sum_{\langle ij \rangle} s_i s_j , \qquad s_i = \pm 1

where \langle ij \rangle means all nearest neighbor pairs of spins.
Ising Spin Model
A pair of anti-parallel nearest neighbor spins (s = -1, s = +1) contributes E = +J, and a parallel pair (s = +1, s = +1) contributes E = -J. Each nearest neighbor pair is a link. There are

(N_{rows} - 1) N_{columns} + (N_{columns} - 1) N_{rows} = 2 N_{columns} N_{rows} - N_{rows} - N_{columns}

links. Since simulation of large arrays is limited by computer time, we often employ periodic boundary conditions, where we also have links between spins in rows 1 and N_{rows}, and between spins in columns 1 and N_{columns}. In this case, there are 2 N_{columns} N_{rows} links.
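As a quick check of the counting, take a 10 x 10 lattice:

(10-1)\cdot 10 + (10-1)\cdot 10 = 180 \text{ links (open boundaries)}, \qquad 2 \cdot 10 \cdot 10 = 200 \text{ links (periodic boundaries)}.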
Statistical Mechanics
In the spirit of Statistical Mechanics, we consider the energy associated with a spin state (which corresponds to knowing the spin directions of all atoms on the lattice). For a system in equilibrium with a heat bath, the probability of being in a given state depends only on its energy:

P_\alpha \propto e^{-E_\alpha / kT}

For a square lattice with N sites per direction, there are N^2 sites and a total of 2^{N^2} possible states. Even a small lattice with N=10 is impossible to study by 'brute force'. On the other hand, such a small lattice is very far from typical real systems, where we deal with O(10^{23}) spins. Boundary conditions in the simulation become important.
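To put a number on 'impossible': for N = 10 the state count is

2^{N^2} = 2^{100} \approx 1.3 \times 10^{30},

far too many states to enumerate directly.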
Statistical Mechanics
Typical quantities of interest are:

\langle E \rangle = \sum_\alpha P_\alpha E_\alpha, \qquad \langle M \rangle = \sum_\alpha P_\alpha M_\alpha, \qquad \text{where } M_\alpha = \sum_i s_i

Note that there are often many 'microstates' - here, arrangements of spins - with the same energy. The normalized probability is

P_\alpha = \frac{e^{-E_\alpha / kT}}{\sum_\alpha e^{-E_\alpha / kT}}

There is a competition between the highest probability state (minimum energy) and the large number of lower probability states (maximum entropy). The balance is controlled by the temperature, T.
Energy vs Entropy
[Figure: spin configurations with all spins aligned, and with a single spin flipped]

Minimum energy when all spins are lined up:
E = -18, n_\alpha = 2 (all up or all down)

If one spin points in the opposite direction:
E = -14, n_\alpha = 18 (one up or one down)

Which is more likely? Depends on T.
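With the energies and multiplicities quoted above (energies in units of J), the relative weight of the two classes of states is

\frac{P(\text{one spin flipped})}{P(\text{all aligned})} = \frac{18\, e^{+14 J/kT}}{2\, e^{+18 J/kT}} = 9\, e^{-4J/kT},

so the single-flip states dominate once kT \gtrsim 4J/\ln 9 \approx 1.8\, J.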
Mean Field Theory
Before starting with the simulations, get some guidance on the expected results from an approximate calculational approach, called 'Mean Field Theory'.
For an infinitely large system, all spins are equivalent (edge effects unimportant). Then, e.g.,

\langle M \rangle = \langle \sum_i s_i \rangle = N^2 \langle s \rangle

where we consider the average value of the spin projection for one spin, and the average is taken over all states weighted by their probability.
Imagine we add an external magnetic field, H. Then, the energy becomes:

E = -J \sum_{\langle ij \rangle} s_i s_j - \mu H \sum_i s_i
Mean Field Theory
Suppose for the moment we have only one spin in the external magnetic field. Then, there are two possible states, with the spin aligned or anti-aligned:

s = +1: \quad E_+ = -\mu H, \quad P_+ = C e^{+\mu H / kT}
s = -1: \quad E_- = +\mu H, \quad P_- = C e^{-\mu H / kT}

Requiring total probability 1 gives

P_\pm = \frac{e^{\pm \mu H / kT}}{e^{+\mu H / kT} + e^{-\mu H / kT}}

so that

\langle s \rangle = \sum_{s = \pm 1} s P_\pm = \frac{e^{+\mu H / kT} - e^{-\mu H / kT}}{e^{+\mu H / kT} + e^{-\mu H / kT}} = \tanh(\mu H / kT)
Mean Field Theory
The Mean Field approximation consists of representing the interaction of a spin with its neighbors by the interaction with an effective magnetic field, H_eff:

E = -\mu H_{eff} \sum_i s_i - \mu H \sum_i s_i, \qquad H_{eff} = \frac{J}{\mu} \sum_{[j]} s_j

and further replacing the values of the spins by their average values:

H_{eff} = \frac{zJ}{\mu} \langle s \rangle

where z is the effective number of spins producing H_eff. If we now drop the external magnetic field, we get the implicit relation

\langle s \rangle = \tanh( zJ \langle s \rangle / kT )
Mean Field Theory
Need to find the crossing points, i.e., the roots of the equation

\langle s \rangle - \tanh( zJ \langle s \rangle / kT ) = 0

\langle s \rangle = 0 is always a solution (paramagnetic phase). For T < T_C, other solutions are also possible (ferromagnetic phase); these have net magnetization.

[Figure: left- and right-hand sides of the implicit relation for several temperatures, T in units of J/k]
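The location of T_C follows from the small-\langle s \rangle expansion: near \langle s \rangle = 0, \tanh( zJ \langle s \rangle / kT ) \approx ( zJ / kT ) \langle s \rangle, so nonzero crossings exist only when the slope zJ/kT exceeds 1, i.e. for

T < T_C = zJ/k,

which for a square lattice (z = 4) gives the mean-field value T_C = 4J/k quoted later.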
Root Finding
To find the solutions for the average spin, we need to find the roots of

\langle s \rangle - \tanh( zJ \langle s \rangle / kT ) = 0

Recall the Newton-Raphson method:

x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}

We can easily differentiate f(x), so this method converges quickly:

f(x) = x - \tanh( zJ x / kT ) = 0
f'(x) = 1 - \frac{4\, zJ/kT}{e^{2zJx/kT} + e^{-2zJx/kT} + 2}
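A minimal sketch of this root finder (not the lecture's code), in units J = k = 1 with z = 4; the temperature and starting guess below are arbitrary choices:

! Sketch only (assumed parameters): Newton-Raphson for x - tanh(z*J*x/kT) = 0,
! in units J = k = 1, z = 4 (square lattice).
program meanfield_root
  implicit none
  double precision :: x, xnew, t, a, f, fp
  double precision, parameter :: z = 4.d0   ! number of nearest neighbors
  integer :: i
  t = 3.0d0              ! temperature in units of J/k (below T_C = 4)
  a = z / t              ! zJ/kT
  x = 0.9d0              ! starting guess away from the trivial root x = 0
  do i = 1, 100
     f  = x - tanh(a*x)
     fp = 1.d0 - 4.d0*a / (exp(2.d0*a*x) + exp(-2.d0*a*x) + 2.d0)
     xnew = x - f/fp
     if (abs(xnew - x) < 1.d-12) exit
     x = xnew
  end do
  print *, 'T =', t, '  <s> =', x
end program meanfield_root

Starting from x = 0.9 this converges to the nonzero root; starting too close to zero it converges to the trivial solution <s> = 0, which is why the slides recommend trying several starting values.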
Aside – Root finding and Optimization
We want to solve problems of the sort

\max_{\theta \in \Theta} h(\theta)

We start with some standard numerical techniques for doing this (steepest descent, conjugate gradient, Newton-Raphson). They work well in a small number of dimensions, but not in large dimensional spaces. They also require some analytic knowledge of the function to work well.
Recall - optimization is very similar to root finding (zero of the derivative).
Later we consider Monte Carlo methods (following Monte Carlo Statistical Methods, C. Robert, G. Casella, 2nd Ed., Chapter 5).
Newton-Raphson Method for Root Finding
Here we use the slope (or also the 2nd derivative) at a guess position to extrapolate to the zero crossing. This easily generalizes to many parameters and many equations. However, it also has its drawbacks, as we will see.

[Figure: Newton-Raphson extrapolation from a guess to the zero crossing. What if this is our guess?]
Several Roots
Suppose we have a function which has several roots; e.g.,

0 = -\kappa^3 + 8\kappa^2 (a^2 + b^2) - \kappa (16 a^4 + 39 a^2 b^2 + 16 b^4) + 28 a^2 b^2 (a^2 + b^2)

It is important to analyze the problem and set the bounds correctly. For the Newton-Raphson method, if you are not sure where the root is, you need to try several different starting values and see what happens.
Several Variables
The Newton-Raphson method is easily generalized to several functions of several variables:

\vec{f}(\vec{x}) = \begin{pmatrix} f_1(x_1, \ldots, x_n) \\ \vdots \\ f_n(x_1, \ldots, x_n) \end{pmatrix} = 0

We form the derivative matrix:

Df = \begin{pmatrix} \partial f_1/\partial x_1 & \cdots & \partial f_1/\partial x_n \\ \vdots & \ddots & \vdots \\ \partial f_n/\partial x_1 & \cdots & \partial f_n/\partial x_n \end{pmatrix}

If the matrix is not singular, we solve the system of linear equations

0 = \vec{f}(\vec{x}^{(0)}) + Df(\vec{x}^{(0)}) \, (\vec{x}^{(1)} - \vec{x}^{(0)})

The iteration is

\vec{x}^{(r+1)} = \vec{x}^{(r)} - \left[ Df(\vec{x}^{(r)}) \right]^{-1} \vec{f}(\vec{x}^{(r)})
Optimization
We now move to the closely related problem of optimization (finding the zeroes of a derivative). This is a very widespread problem in physics (e.g., finding the minimum χ², the maximum likelihood, the lowest energy state, ...). Instead of looking for zeroes of a function, we look for extrema.
Finding global extrema is a very important and very difficult problem, particularly in the case of several variables. Many techniques have been invented, and we look at a few here.
Here, we look for the minimum of h(\vec{x}). For a maximum, consider the minimum of -h(\vec{x}). We assume the function is at least twice differentiable.
Optimization
First and second derivatives:

\vec{g}(\vec{x})^t = \left( \frac{\partial h}{\partial x_1}, \ldots, \frac{\partial h}{\partial x_n} \right) \quad \text{(a vector)}

H(\vec{x}) = \begin{pmatrix} \partial^2 h/\partial x_1 \partial x_1 & \cdots & \partial^2 h/\partial x_1 \partial x_n \\ \vdots & \ddots & \vdots \\ \partial^2 h/\partial x_n \partial x_1 & \cdots & \partial^2 h/\partial x_n \partial x_n \end{pmatrix} \quad \text{(the Hessian matrix)}

General technique:
1.  Start with an initial guess \vec{x}^{(0)}
2.  Determine a direction \vec{s} and a step size \lambda
3.  Iterate \vec{x}^{(r+1)} = \vec{x}^{(r)} + \lambda_r \vec{s}_r until |\vec{g}| < \epsilon, or no smaller h can be found
Steepest Descent
A reasonable first try: steepest descent,

\vec{s}_r = -\vec{g}_r

with the step length determined from 0 = \frac{d h(\vec{x}_r - \lambda \vec{g}_r)}{d\lambda}.
Note that consecutive steps are in orthogonal directions.
As an example, we consider minimizing a χ²:

\chi^2 = \sum_{i=0}^{n} \frac{(y_i - f(x_i; \vec{\lambda}))^2}{w_i^2}

where \vec{\lambda} are the parameters of the function to be fit, y_i are the measured points at the values x_i, and w_i is the weight given to point i.
In our example f(x; A, \theta) = A \cos(x + \theta) and w_i = 1 \;\forall i, and we want to minimize χ² as a function of A and \theta.
Steepest Descent
The data to be fit:

   x      y
   0.00   0.00
   1.26   0.95
   2.51   0.59
   3.77  -0.59
   5.03  -0.95
   6.28   0.00
   7.54   0.95
   8.80   0.59

Recall \chi^2 = \sum_i (y_i - f(x_i; \vec{\lambda}))^2 / w_i^2; with unit weights, the function to minimize is

h(A, \theta) = \chi^2 = \sum_{i=1}^{8} \left( y_i - A \cos(x_i + \theta) \right)^2
Steepest Descent
To use our method, we need the derivatives:

\vec{g}(A, \theta) = \begin{pmatrix} \sum_{i=1}^{8} 2 (y_i - A\cos(x_i + \theta)) (-\cos(x_i + \theta)) \\ \sum_{i=1}^{8} 2 (y_i - A\cos(x_i + \theta)) (A \sin(x_i + \theta)) \end{pmatrix}

Recall that the step length follows from

0 = \frac{d h(\vec{x}_r - \lambda \vec{g}_r)}{d\lambda_r} = \frac{d}{d\lambda_r} h(\vec{x}_{r+1})

\frac{d}{d\lambda_r} h(\vec{x}_{r+1}) = \nabla h(\vec{x}_{r+1})^T \cdot \frac{d \vec{x}_{r+1}}{d\lambda_r} = -\nabla h(\vec{x}_{r+1})^T \vec{g}_r

Setting this to zero, we see that the step length is chosen so as to make the next step orthogonal to the current one. We proceed in a zig-zag pattern.
Steepest Descent
Determining the step size through the orthogonality condition is often difficult. It is easier to do it by trial and error, as in this code fragment (the functions chisq, dchisqdA and dchisqdp, which evaluate χ² and its two derivatives, are defined elsewhere):

* Starting guesses for parameters
      A=0.5
      phase=1.
      step=1.
* Evaluate derivatives of function
      gA=dchisqdA(A,phase)
      gp=dchisqdp(A,phase)
* Current value of the chi squared
      h=chisq(A,phase)
      Itry=0
 1    continue
      Itry=Itry+1
* update parameters for given step size
      A1=A-step*gA
      phase1=phase-step*gp
* reevaluate the chi squared
      h1=chisq(A1,phase1)
* if the chi squared increased, halve the step size and try again
      If (h1.gt.h) then
         step=step/2.
         goto 1
      Endif
* Chi squared decreased: keep this update and recompute the gradient
      A=A1
      phase=phase1
      h=h1
      gA=dchisqdA(A,phase)
      gp=dchisqdp(A,phase)
* (the outer iteration over steps continues; remainder not shown in the transcription)
Steepest Descent
With the starting guesses A = 0.5, phase = 1., step = 1., the minimum found is A = -1, phase = \pi/2. Previously, the minimum A = 1, phase = -\pi/2 was found. The two are equivalent: -\cos(x + \pi/2) = \cos(x - \pi/2) = \sin x, so both describe the same fitted curve.
Other Optimization Techniques
Conjugate Gradient
Similar to steepest descent, but with a slightly different way of choosing the direction of the next step:

\vec{x}_{r+1} = \vec{x}_r + \lambda_r \vec{s}_r
\vec{s}_0 = -\vec{g}_0
\vec{s}_{r+1} = -\vec{g}_{r+1} + \beta_{r+1} \vec{s}_r \qquad \text{(new term)}

\lambda_r is chosen to minimize h(\vec{x}_{r+1}); this yields \vec{g}_{r+1}^t \vec{g}_r = 0.
Here we allow a further step in the \vec{s}_r direction. One choice (Fletcher-Reeves) for \beta_{r+1} is

\beta_{r+1} = \frac{\vec{g}_{r+1}^{\,2}}{\vec{g}_r^{\,2}}
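A minimal sketch of the Fletcher-Reeves iteration (not from the lecture), applied to a quadratic test function h(x) = 0.5 xᵀAx - bᵀx, for which the exact line-search step is \lambda = -(\vec{g}\cdot\vec{s})/(\vec{s}^T A \vec{s}); the matrix A and vector b are illustrative choices:

! Sketch (assumed test problem): Fletcher-Reeves conjugate gradient on the
! quadratic h(x) = 0.5*x^T A x - b^T x, with exact line search.
program cg_sketch
  implicit none
  double precision :: A(2,2), b(2), x(2), g(2), gnew(2), s(2), As(2)
  double precision :: lambda, beta
  integer :: r
  A(:,1) = [4.d0, 1.d0]                 ! symmetric, positive definite
  A(:,2) = [1.d0, 3.d0]
  b = [1.d0, 2.d0]
  x = 0.d0
  g = matmul(A, x) - b                  ! gradient of h at x
  s = -g                                ! first step: steepest descent direction
  do r = 1, 2                           ! exact after n steps for an n-dim quadratic
     As = matmul(A, s)
     lambda = -dot_product(g, s) / dot_product(s, As)   ! exact line search
     x = x + lambda*s
     gnew = matmul(A, x) - b
     beta = dot_product(gnew, gnew) / dot_product(g, g) ! Fletcher-Reeves beta
     s = -gnew + beta*s
     g = gnew
  end do
  print *, 'minimum at x =', x
end program cg_sketch

For a general h the exact line search is replaced by a 1-dimensional minimization along \vec{s}_r, but the direction update is the same.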
Newton-Raphson
Assume the function that we want to minimize is twice differentiable. Then, a Taylor expansion gives

h(\vec{x} + \vec{\alpha}) \approx a + \vec{b}^t \vec{\alpha} + \frac{1}{2} \vec{\alpha}^t C \vec{\alpha}

where

a = h(\vec{x}), \qquad \vec{b} = \nabla h(\vec{x}) = \vec{g}(\vec{x}), \qquad C = \left[ \frac{\partial^2 h}{\partial x_i \partial x_j} \right] = H

Now, because C is symmetric (check),

\nabla h(\vec{x} + \vec{\alpha}) \approx \vec{b} + C \vec{\alpha}

For an extremum, we require \vec{b} + C\vec{\alpha} = 0, i.e. \vec{\alpha} = -C^{-1}\vec{b}, or

\vec{x}_{r+1} = \vec{x}_r - H(\vec{x}_r)^{-1} \vec{g}(\vec{x}_r)
Newton-Raphson
\vec{x}_{r+1} = \vec{x}_r - H(\vec{x}_r)^{-1} \vec{g}(\vec{x}_r)

i.e., the search direction is \vec{s} = -H(\vec{x}_r)^{-1} \vec{g}(\vec{x}_r) with \lambda = 1.
This converges quickly (if you start with a good guess), but the penalty is that the Hessian needs to be calculated (usually numerically).
Again, convergence is declared when \vec{s} is sufficiently small.
How would we calculate the Hessian numerically? Use a Lagrange polynomial in several dimensions and work it out.
Bounded Regions
The standard tool for minimization in particle physics is the MINUIT program (CERN library). It has also made its way well outside the particle physics community. Author: Fred James.
Here is how MINUIT handles bounded search regions - it transforms the parameter to be optimized as follows:

\lambda' = \arcsin\left( 2\,\frac{\lambda - a}{b - a} - 1 \right), \qquad \lambda = a + \frac{b - a}{2}\left( \sin\lambda' + 1 \right)

\lambda is the external (user) parameter; \lambda' is the internal parameter.
MINUIT is available within PAW, ROOT, ...
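A quick check of the transformation: at \lambda = a the argument of the arcsin is -1, so \lambda' = -\pi/2; at \lambda = b it is +1, so \lambda' = +\pi/2. Conversely, \lambda = a + \frac{b-a}{2}(\sin\lambda' + 1) lies in [a, b] for any \lambda', so MINUIT can vary the internal parameter freely while the user parameter always respects its bounds.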
MINUIT
MINUIT uses a (variable metric) conjugate gradient search algorithm (along with others). Basic idea:
•  assume that the function to minimize can be approximated by a quadratic form near the minimum
•  build up iteratively an approximation for the inverse of the Hessian matrix. Recall

h(\vec{x} + \vec{\alpha}) \approx h(\vec{x}) + \nabla h(\vec{x}) \cdot \vec{\alpha} + \frac{1}{2} \vec{\alpha}^t H \vec{\alpha}

The approximation is updated as follows:

H_{i+1} = H_i + \frac{ (\vec{x}_{i+1} - \vec{x}_i) \otimes (\vec{x}_{i+1} - \vec{x}_i) }{ (\vec{x}_{i+1} - \vec{x}_i) \cdot (\nabla h_{i+1} - \nabla h_i) } - \frac{ \left[ H_i (\nabla h_{i+1} - \nabla h_i) \right] \otimes \left[ H_i (\nabla h_{i+1} - \nabla h_i) \right] }{ (\nabla h_{i+1} - \nabla h_i) \cdot H_i \cdot (\nabla h_{i+1} - \nabla h_i) }

where the \otimes symbol represents an outer product of two vectors (a matrix): (\vec{a} \otimes \vec{b})_{ij} = a_i b_j.
Stochastic Exploration
Brute force:
•  generate values of \theta using a uniform distribution, and find the maximum using the approximation

\max_{\theta \in \Theta} h(\theta) \approx h^* = \max( h(u_1), h(u_2), \ldots, h(u_m) ), \qquad u_i \sim U_\Theta

If h^* = h(u^*), then \theta^* \approx u^*.
This will always work, but it may be extremely slow. Obviously, if we can sample according to h(\theta) we will be much more efficient.
Let's try it out on our old friend:

h(x) = \left[ \cos(50x) + \sin(20x) \right]^2
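A minimal sketch of this brute-force search (not the lecture's code), taking the domain of the 'old friend' to be [0,1] and an arbitrary sample size m:

! Sketch (assumed domain [0,1] and sample size): brute-force stochastic search
! for the maximum of h(x) = (cos(50x) + sin(20x))**2 with uniform sampling.
program brute_force
  implicit none
  integer, parameter :: m = 100000
  double precision :: u, h, hstar, xstar
  integer :: i
  hstar = -1.d0
  xstar = 0.d0
  do i = 1, m
     call random_number(u)                       ! u_i ~ U(0,1)
     h = (cos(50.d0*u) + sin(20.d0*u))**2
     if (h > hstar) then                         ! keep the best point so far
        hstar = h
        xstar = u
     end if
  end do
  print *, 'h* =', hstar, '  at x =', xstar
end program brute_force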
Stochastic Exploration
Let's look at a somewhat more complicated function:

h(x, y) = \left[ x \sin(20y) + y \sin(20x) \right]^2 \cosh(\sin(10x)\,x) + \left[ x \cos(10y) - y \sin(10x) \right]^2 \cosh(\cos(20y)\,y)

Many local minima; the global minimum is at (0,0).
Stochastic Exploration
About 40000 iterations are needed to reach the real minimum, and the result may not be stable.
Can we do better? The fundamental problem is that many of the local minima have values very close to the absolute minimum.
Mean Field Theory - Average Spin
Ferromagnetic material: spontaneous magnetization. Mean field theory predicts a 2nd order phase transition (discontinuous slope) at T_C = zJ/k = 4J/k.
M (or \langle s \rangle) is an order parameter: it tells us what phase the system is in. At high T the system is disordered, at low T it is ordered.
Paramagnetic material: M = 0.

[Figure: mean-field average spin vs temperature, dropping to zero at T_C]
Order Parameter & Critical Exponent
Writing x = \langle s \rangle, we have

x = \tanh(zJx/kT) \;\xrightarrow{\;x \to 0\;}\; x \approx \frac{zJx}{kT} - \frac{1}{3}\left( \frac{zJx}{kT} \right)^3

with solutions x = 0 and

x = \sqrt{ 3 \left( \frac{kT}{zJ} \right)^3 \frac{zJ/k - T}{T} } = \sqrt{ 3 \left( \frac{T}{T_C} \right)^3 \frac{T_C - T}{T} } \;\propto\; T\,(T_C - T)^{1/2} \;\propto\; (T_C - T)^{\beta}

\beta \approx 1/2 is the 'critical exponent'.
Metropolis-Hastings
We now look at the Ising spin system with a simulation. Recall that a pair of anti-parallel nearest neighbor spins contributes E = +J and a parallel pair contributes E = -J, with

E = -J \sum_{\langle ij \rangle} s_i s_j

and each nearest neighbor pair is a link.
Use a Markov Chain approach (Metropolis algorithm). The Metropolis algorithm has the desired properties: aperiodic, recurrent, irreducible. We know the form of the target distribution up to a normalization, and Markov Chain theory guarantees sampling according to this distribution (asymptotically).
Metropolis-Hastings
Algorithm:
1.  Generate a random assignment of spins on a square NxN lattice.
2.  Calculate the starting energy and magnetization:

E_0 = -\sum_{\text{links } ij} s_i s_j, \qquad M = \sum_{i=1}^{N^2} s_i

3.  Loop over temperatures. There is a unique probability distribution of the states at each T. We therefore want to evolve the Markov Chain until we have reached a stationary distribution for each T.
  a)  Loop over 'time', where 1 unit of time is defined as N^2 attempted spin flips.
      i.   Select a site at random.
      ii.  Flip the spin at that site, and calculate the new energy. We apply periodic boundary conditions, so that sites at i=1 connect to i=N, and j=1 connect to j=N.
      iii. If E_new <= E_0, accept. If E_new > E_0, accept with probability e^{-(E_new - E_0)/kT}, and, if the flip is accepted, set E_0 = E_new.
  b)  Keep track of the mean energy and rms over the last t_min time steps. Declare convergence if t > t_min and E_rms < T.

A code sketch of the inner loop is given below.
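A minimal sketch of one such time unit (not the lecture's program), in units J = k = 1 and with an assumed lattice size and temperature. Instead of recomputing the whole lattice energy after a trial flip, it uses the standard local update: flipping s(i,j) changes the energy by \Delta E = 2\, s_{ij} \sum_{\text{nn}} s, the sum running over the four neighbors.

! Sketch (assumed lattice size and temperature): one Metropolis time unit
! (N*N attempted flips) for the 2D Ising model with periodic boundaries,
! in units J = k = 1.
program metropolis_sketch
  implicit none
  integer, parameter :: n = 10
  integer :: s(n,n), i, j, k, ip, im, jp, jm, dE
  double precision :: t, r, e, m
  t = 2.0d0                                    ! temperature in units of J/k
  do j = 1, n                                  ! random starting configuration
     do i = 1, n
        call random_number(r)
        s(i,j) = merge(1, -1, r < 0.5d0)
     end do
  end do
  do k = 1, n*n                                ! one 'time unit'
     call random_number(r); i = 1 + int(r*n)   ! pick a site at random
     call random_number(r); j = 1 + int(r*n)
     ip = mod(i, n) + 1;  im = mod(i - 2 + n, n) + 1   ! periodic neighbors
     jp = mod(j, n) + 1;  jm = mod(j - 2 + n, n) + 1
     dE = 2*s(i,j)*(s(ip,j) + s(im,j) + s(i,jp) + s(i,jm))    ! energy change of a flip
     call random_number(r)
     if (dE <= 0 .or. r < exp(-dble(dE)/t)) s(i,j) = -s(i,j)  ! Metropolis acceptance
  end do
  e = 0.d0                                     ! recompute E and M for bookkeeping
  do j = 1, n
     do i = 1, n
        e = e - dble(s(i,j)*(s(mod(i,n)+1, j) + s(i, mod(j,n)+1)))  ! each link once
     end do
  end do
  m = dble(sum(s))
  print *, 'E =', e, '   M =', m
end program metropolis_sketch

In the full simulation this loop sits inside the loops over time and temperature, with E and M accumulated to form the averages and the rms used in the convergence test.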
Example 3x3
[Figure: 3x3 spin configurations before and after the first attempted flip]

Starting configuration: E_0 = +2, M_0 = +1. Start with T = 5.
First attempt: flip the spin in row 3 (counting from the bottom), column 2. The energy decreases by 4 units (2 for every net link sign change), so the flip is accepted. The new configuration has E_0 = -2, M_0 = +3.
...
Example 3x3
At lower T, we start to get stuck in some configurations for many time cycles.
Example 3x3
The transition occurs near T = 2, not T = 4. The analytic calculation for the square lattice gives

T_C = \frac{2}{\ln(1 + \sqrt{2})} \approx 2.27
Example 10x10
See what happens when we go to a larger lattice: the transitions get sharper for larger lattices. There are big fluctuations near the critical point, so the simulation needs to run for sufficient time.
Comparison to Mean Field Theory
The value of T_C from the simulation is much closer to the exact value than mean field theory. Why?
Also, the behavior of M vs T is very different (sharper in the simulation). Check why this happens. First, look at the critical exponent. Mean field theory gave \beta = 0.5, whereas the exact value for an Ising model on a square lattice is 1/8. What does the simulation yield?
As we have seen, we need a large enough grid to have sharp predictions. Go to a 20x20 grid, with a minimum of 10^5 iterations per temperature and a maximum of 10^6.
Example 20x20
Note that E_avg does not go to zero as M goes to zero for T > T_C. This indicates that there is a correlation between the spins of nearest neighbors. More on this later.
To get the critical exponent, we need to extract the exponent of (T_C - T)^\beta in the region near T_C. This is difficult (time consuming) because the fluctuations are huge in this region.
Critical Exponent
Fit of the function

M_{avg} = A (T_C - T)^\beta

with fit parameters P1 = A, P2 = T_C, P3 = \beta. Errors are chosen proportional to M_rms.
Try with more T steps near T_C.
Critical Exponent
Fit of the function

M_{avg} = A\, T\, (T_C - T)^\beta

(P1 = A, P2 = T_C, P3 = \beta), i.e. including the prefactor T suggested by mean field theory.
Specific Heat
Recall the relation

C = \frac{\langle (\Delta E)^2 \rangle}{kT^2} = \frac{E_{rms}^2}{kT^2}

The specific heat becomes discontinuous at T_C, the sign of a phase transition from a disordered to an ordered system. The peak becomes sharper for larger grids.
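Since \Delta E = E - \langle E \rangle, the specific heat can be estimated directly from the energy samples of the Markov chain at fixed T,

C = \frac{\langle E^2 \rangle - \langle E \rangle^2}{kT^2},

i.e. by accumulating the sums of E and E^2 that are already kept for the convergence test.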
Correlations between Spins
As we have seen, the average energy can be non-zero even when the net magnetization is zero. The reason is that neighboring spins are correlated. Let's look at the correlation.

[Figure: snapshot of the lattice; black square = spin up. One sees blocks of spins oriented in one direction.]

Let's calculate \langle s_i s_{i+n} \rangle (T), where i and i+n are in the same column but separated by n rows.
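A minimal sketch (an assumption, not the lecture's code) of this measurement for a single configuration; in the actual analysis it would be averaged over the configurations generated by the Markov chain at each T:

! Sketch: column spin-spin correlation <s_i s_{i+n}> of an n x n configuration,
! averaged over all sites, with periodic boundary conditions.  The all-up test
! configuration below is just a placeholder for a Metropolis-generated one.
program correlation_sketch
  implicit none
  integer, parameter :: n = 20
  integer :: s(n,n), i, j, nsep, ip
  double precision :: corr
  s = 1                                  ! placeholder configuration: all spins up
  do nsep = 1, n/2
     corr = 0.d0
     do j = 1, n                         ! column index
        do i = 1, n                      ! row index
           ip = mod(i + nsep - 1, n) + 1 ! row i+nsep, with periodic wrap-around
           corr = corr + dble(s(i,j)*s(ip,j))
        end do
     end do
     corr = corr / dble(n*n)             ! average over all lattice sites
     print *, 'n =', nsep, '  <s_i s_i+n> =', corr
  end do
end program correlation_sketch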
Correlations 20x20
These correlations are not properly handled in Mean Field Theory - in particular the rapid change near the critical temperature.