Slides. - Ceremade

Cooperative and Non Cooperative Mean Field
Game Methods in Energy Systems
Roland P. Malhamé
Department of Electrical Engineering
École Polytechnique de Montréal and GERAD
Mean Fields Games and Related Topics - 3
June, 2015
1 / 43
Outline
Motivations
Non Cooperative Collective Target Tracking Mean Field
Control
Cooperative Collective Target Tracking Mean Field Control
Conclusions and future directions
2 / 43
Energy Systems
3 / 43
Motivation
A higher level of penetration of renewable sources of energy in
the energy mix of power systems (wind or solar) is
synonymous with greater variability.
Energy storage becomes an essential asset to compensate
generation/load mismatch.
Fundamental idea: Use the energy storage from electrical
sources naturally present in the power system at customer
sites based on mutually beneficial agreements (electric water
heaters, electric space heating, electric space cooling).
4 / 43
Challenges and Previous Work
Challenges: Literally millions of control points to model,
monitor and control; severe computational requirements; large
communication costs.
Past approaches: Direct load control. Send the same
interruption/reconnection signals to large collections of
devices.
Aggregate modeling scheme:
Develop elemental stochastic load model of individual load
behavior.
Build aggregate load model by developing ensemble statistics
of the devices, much as in the statistical mechanics framework.
5 / 43
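An Example: A Diffusion Model of Heating/Cooling Loads
A hybrid state stochastic system
(Malhamé-Chong, TAC 1985)
Continuous State:
$$C_a\,dx^{in}_t = -U_a\,(x^{in}_t - x^{out})\,dt + Q_h\,m_t\,b_t\,dt + \sigma\,dw_t$$
divide by $C_a$ and obtain
$$dx^{in}_t = -a\,(x^{in}_t - x^{out})\,dt + Q'_h\,m_t\,b_t\,dt + \sigma'\,dw_t$$
Discrete State:
$$m_{t+\Delta t} = m_t + \pi(x^{in}_t, m_t; x_+, x_-)$$
$$\pi(x^{in}, m; x_+, x_-) = \begin{cases} 0 & x_- < x^{in} < x_+ \\ -m & x^{in} \ge x_+ \\ 1 - m & x^{in} \le x_- \end{cases}$$
$m_t$: the operating state of the device (1 for “on” or 0 for “off”)
$b_t$: the state of the power supply (1 for “on” or 0 for “off”).
[Figure: schematic of the elemental load model, labeled with $dw$, $Q_h$, $m$, $b$, $U_a$, $x^{out}$, $x^{in}$, $C_a$.]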
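The hybrid-state dynamics of this slide can be simulated with an Euler-Maruyama step for the temperature, followed by the thermostat switching rule. The following is a minimal sketch and not the authors' code: the function names, the step size `dt`, and every numerical value are illustrative assumptions, and `sigma` stands for the noise coefficient after division by C_a.

```python
import math
import random


def switch_increment(x, m, x_minus, x_plus):
    """Thermostat rule pi(x, m; x_+, x_-).

    Returns 0 inside the dead band, -m at or above x_+ (switch off),
    and 1 - m at or below x_- (switch on).
    """
    if x >= x_plus:
        return -m
    if x <= x_minus:
        return 1 - m
    return 0


def simulate_thermostat(x0, m0, a, x_out, q_h, sigma, x_minus, x_plus,
                        dt, n_steps, b=1, seed=0):
    """Euler-Maruyama simulation of one heating load (illustrative sketch).

    x follows dx = [-a (x - x_out) + q_h * m * b] dt + sigma dw,
    and m is updated by the switching rule after each step.
    """
    rng = random.Random(seed)
    x, m = x0, m0
    xs, ms = [x], [m]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        x = x + (-a * (x - x_out) + q_h * m * b) * dt + sigma * dw
        m = m + switch_increment(x, m, x_minus, x_plus)
        xs.append(x)
        ms.append(m)
    return xs, ms
```

With `sigma=0.0` the trajectory cycles deterministically between the two thresholds, which is a convenient sanity check before adding noise.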
6 / 43
An Example: A Diffusion Model of Heating/Cooling Loads
The Coupled Fokker-Planck Equations
The resulting coupled Fokker-Planck equation model describes the
evolution of temperature distributions within controlled residences:
$$T^{k,t}_{\lambda}[f] = \frac{\partial f}{\partial t} - \frac{\partial}{\partial \lambda}\big[(a(\lambda - x_a(t)) - k\,b(t)\,R)\,f\big] - \frac{\sigma^2}{2}\frac{\partial^2 f}{\partial \lambda^2}, \qquad k = 0, 1$$
[Figure: Fokker-Planck equation simulation.]
7 / 43
Remarks
The optimal control problem becomes one of controlling
PDEs using on-off signals.
A fraction of customers is inevitably penalized.
The smaller this fraction, the less effective the control is.
8 / 43
Implementation Principles
1. Each controller has to be situated locally: decentralized.
2. Data exchange should be kept to a minimum, both with the
central authority and among users.
3. User disturbance should be kept to a minimum.
9 / 43
Envisioned Overall Architecture: The case of a single central authority
[Figure: block diagram with the following elements: forecasts of wind and solar generation; aggregate models of energy storage capable devices; uncontrollable component of load; mathematical programming; energy/temperature schedule of large aggregates of devices (electrical space heaters, electrical water heaters, etc.).]
10 / 43
Mean Field Games: The Reasons?
Two fundamental reasons:
Games are a natural device for enforcing decentralization.
The large numbers involved induce decoupling effects which
allow the law of large numbers to kick in.
Practical benefits:
The resulting control laws can be computed in an open loop
manner by individual devices thus significantly reducing
communication requirements.
Control implementation is local unlike direct control, thus
permitting local enforcement of comfort and safety
constraints.
11 / 43
Non Cooperative Collective Target
Tracking Mean Field Control for
Space Heaters
12 / 43
Elemental space heating model: two changes
1. The thermostat controlled heating element is replaced by a continuous controller.
$$dx^{in}_t = \frac{1}{C_a}\big[-U_a\,(x^{in}_t - x^{out}) + Q_h(t)\big]\,dt + \sigma\,dw_t$$
$$\equiv \quad dx_t = \big[-a\,(x_t - x^{out}) + b\,u_t\big]\,dt + \sigma\,dw_t$$
[Figure: schematic of the elemental space heating model, labeled with $dw$, $Q_h$, $U_a$, $x^{out}$, $x^{in}$, $C_a$.]
2. The control input is redefined so that no control effort is required on average to remain at the initial temperature.
$$dx_t = \big[-a\,(x_t - x^{out}) + b\,(u_t + u^{free})\big]\,dt + \sigma\,dw_t$$
where $u^{free} \triangleq a\,(x^i_0 - x^{out})$.
Change 2 is made so that diversity is preserved in the heater
population while the mean population temperature tracks the target.
With no control effort, the temperature stays at $x_0$. We do not penalize
the control effort that is used to stay at the initial temperature at the
start of the control law horizon.
13 / 43
Constant Level Tracking Problem Setup
[Figure: temperature (°C, 17 to 25) versus time (hours, 0 to 5), showing the high comfort line $h$, the target mean trajectory $y$, the initial mean temperature, and the low comfort line $z$.]
14 / 43
Classical LQG Target Tracking
$$J^N_i(u^i, u^{-i}) = E\int_0^\infty e^{-\delta t}\big[(x^i_t - y)^2 q + (u^i_t)^2 r\big]\,dt, \qquad 1 \le i \le N$$
$x^i$: temperature
$y$: tracking target
$u^i$: control
[Figure: Agents Applying LQG Tracking. Temperature (°C) versus time (hours), showing the target trajectory $y$, the low comfort line $z$, the high comfort line $h$, the mean temperature, and individual trajectories.]
15 / 43
A Novel Integral Control Based Cost Function
$$J^N_i(u^i, u^{-i}) = E\int_0^\infty e^{-\delta t}\big[(x^i_t - z)^2 q_t + (u^i_t)^2 r\big]\,dt$$
$x^i$: temperature
$z$: lower comfort bound
$u^i$: control
Integral controller embedded in the mean-target deviation coefficient:
$q_t$, $t \in [0, T]$, is calculated as the following integrated error signal:
$$q_t = \left|\lambda \int_0^t \big(\bar{x}^{(N)}_\tau - y\big)\,d\tau\right|$$
$\bar{x}^{(N)}$: mean temperature of the population
$y$: mean target
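The integrated error signal can be evaluated on a sampled mean trajectory with a running trapezoidal sum. The sketch below is illustrative only: the function name, the uniform grid spacing `dt`, the gain `lam`, and the use of an absolute value to keep the weight nonnegative are assumptions of this example.

```python
def integral_error_coefficient(xbar, y, lam, dt):
    """Return q_t = |lam * integral_0^t (xbar_tau - y) d tau| on a uniform grid.

    xbar is a list of sampled mean temperatures, y the constant target,
    lam the integral gain, and dt the grid spacing.
    """
    q = [0.0]          # the integral is zero at t = 0
    integral = 0.0
    for k in range(1, len(xbar)):
        # trapezoidal rule on the deviation from the target
        integral += 0.5 * ((xbar[k - 1] - y) + (xbar[k] - y)) * dt
        q.append(abs(lam * integral))
    return q
```

As long as the mean stays away from the target, the weight on the deviation from the comfort bound keeps growing, which is the integral action the cost function relies on.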
16 / 43
Mean Field Based Collective Target Tracking
The novelty is that the mean field effect is mediated by the
quadratic cost function parameters, in the form of an
integral error, as compared to prevailing mean field theory,
where the mean field effect is on the tracking signal.
17 / 43
Fixed Point Equation System
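MF Equation System on $[0, \infty)$
$$-\frac{d\pi^\theta_t}{dt} = -\delta\,\pi^\theta_t + 2a\,\pi^\theta_t - b^2 r^{-1} (\pi^\theta_t)^2 + q^*_t,$$
$$-\frac{ds^\theta_t}{dt} = (-\delta + a - b^2 \pi^\theta_t r^{-1})\,s^\theta_t - a\,x^\theta_0\,\pi^\theta_t - q^*_t\,z,$$
$$\frac{d\bar{x}^\theta_t}{dt} = (a - b^2 \pi^\theta_t r^{-1})\,\bar{x}^\theta_t - b^2 r^{-1} s^\theta_t - a\,x^\theta_0,$$
$$\bar{x}_t = \int_\Theta \bar{x}^\theta_t\,dF(\theta),$$
$$q^*_t = \left|\lambda\int_0^t (\bar{x}_\tau - y)\,d\tau\right|.$$
Define the function space $G \subset C_b[0, \infty)$, where for a function $f \in G$,
$f(0) = \bar{x}_0$ and $z \le f(t) \le \bar{x}_0$ for all $t \ge 0$.
$\theta \in \Theta$ corresponds to all sources of heterogeneity including initial
conditions.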
Theorem: Schauder’s fixed point theorem guarantees the existence of a fixed
point to the MF Equation System on space G.
18 / 43
Preliminary Numerical Experimentation Steps
Successful experiment with a naive iterative algorithm.
Note that lower comfort bound is 17, whereas the mean
tracking target is 21.
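A naive iteration of this kind can be sketched as repeated application of the map "mean trajectory, then integral weight, then backward Riccati and offset equations, then forward mean dynamics". The code below is only a sketch under strong assumptions that are not in the slides: a homogeneous population (a single θ), a finite horizon with zero terminal conditions for π and s, explicit Euler steps, and the convention that `a` enters the mean field equations as the state coefficient (negative for a stable load). All names and values are hypothetical.

```python
import math


def integral_error(xbar, y, lam, dt):
    """q_t = |lam * integral_0^t (xbar - y)| by the trapezoidal rule."""
    q, integral = [0.0], 0.0
    for k in range(1, len(xbar)):
        integral += 0.5 * (xbar[k - 1] + xbar[k] - 2.0 * y) * dt
        q.append(abs(lam * integral))
    return q


def mean_field_map(xbar, *, a, b, r, delta, lam, x0, y, z, dt):
    """One application of the map xbar -> T(q(xbar)) (homogeneous sketch)."""
    n = len(xbar)
    q = integral_error(xbar, y, lam, dt)
    c = -a * x0                      # constant drift keeping x0 at rest
    pi = [0.0] * n                   # assumed terminal condition pi_T = 0
    s = [0.0] * n                    # assumed terminal condition s_T = 0
    for k in range(n - 2, -1, -1):   # backward sweep
        p1, s1, q1 = pi[k + 1], s[k + 1], q[k + 1]
        pi[k] = p1 + dt * ((2.0 * a - delta) * p1 - b * b * p1 * p1 / r + q1)
        s[k] = s1 + dt * ((a - delta - b * b * p1 / r) * s1 + p1 * c - q1 * z)
    new = [x0]
    for k in range(n - 1):           # forward sweep for the mean
        x = new[-1]
        drift = (a - b * b * pi[k] / r) * x - b * b * s[k] / r + c
        new.append(x + dt * drift)
    return new


def naive_iteration(n_iter, n_steps, **params):
    """Start from the constant trajectory x0 and apply the map n_iter times."""
    xbar = [params["x0"]] * (n_steps + 1)
    for _ in range(n_iter):
        xbar = mean_field_map(xbar, **params)
    return xbar
```

Nothing in this plain iteration enforces convergence, which is consistent with the slides reporting both a successful and an unsuccessful run for different parameters.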
[Figure: mean temperature (°C) versus time (hours, 0 to 800) for iterations 1 to 16 and the final iterate, with the target shown.]
19 / 43
Numerical Computation of the Mean Field Control Law (II)
Unsuccessful experiment with a naive iterative algorithm
(different parameters).
Note that lower comfort bound is 17, whereas the mean
tracking target is 21.
[Figure: mean temperature (°C) versus time (hours, 0 to 800) for iterations 1 to 16 and the final iterate, with the target shown.]
20 / 43
Observations
[Figure: trajectories labeled $x^{(0)}$, $x^*$, and $x^{(1)}$; vertical axis from 16 to 26, horizontal axis from 0 to 50.]
From the experimentation, one concludes that the cases of
interest correspond to convergence within the region $[x_0, y]$;
one also observes monotonicity of the temperature trajectories
in the early part of their behavior (before crossing the target $y$). This
suggests the idea of using a so-called restricted operator
whereby, any time the mean state hits the target $y$, it is frozen
at $y$.
21 / 43
Operator Definitions
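Define the function space $G^r \subset G$, where for functions $f \in G^r$, $f(0) = \bar{x}_0$
and $y \le f(t) \le \bar{x}_0$ for all $t \ge 0$.
Define the operator $\Delta(\bar{x}; \lambda) : C_b[0, \infty) \to C[0, \infty)$:
$$q_t = \Delta(\bar{x}_t; \lambda) \triangleq \left|\lambda\int_0^t (\bar{x}_\tau - y)\,d\tau\right|$$
Define $T : C[0, \infty) \to C[0, \infty)$:
$$\bar{x}_t = (Tq)(t) \triangleq \begin{cases}
-\dfrac{d\pi^\theta_t}{dt} = -\delta\,\pi^\theta_t + 2a\,\pi^\theta_t - b^2 r^{-1} (\pi^\theta_t)^2 + q_t, \\[4pt]
-\dfrac{ds^\theta_t}{dt} = (-\delta + a - b^2 \pi^\theta_t r^{-1})\,s^\theta_t - a\,x^\theta_0\,\pi^\theta_t - q_t\,z, \\[4pt]
\dfrac{d\bar{x}^\theta_t}{dt} = (a - b^2 \pi^\theta_t r^{-1})\,\bar{x}^\theta_t - b^2 r^{-1} s^\theta_t - a\,x^\theta_0, \\[4pt]
\bar{x}_t = \int_\Theta \bar{x}^\theta_t\,dF(\theta)
\end{cases}$$
Hence, write the MF Eq. System for Collective Target Tracking as
$$\bar{x}_t = (T \circ \Delta)(\bar{x})(t) = (M\bar{x})(t).$$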
22 / 43
Restricted Operator
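Define $T_h \in (\mathbb{R}_+ \cup \infty)$ as the first time that $\bar{x}$ hits $y$.
Define $T^r : C[0, \infty) \to C_b[0, \infty)$:
$$T^r \triangleq \begin{cases} T(q_t) & \text{for } t \in [0, T_h) \\ y & \text{for } t \in [T_h, \infty) \end{cases}$$
Define $M^r \triangleq (T^r \circ \Delta) : C_b[0, \infty) \to C_b[0, \infty)$.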
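On a sampled trajectory, the freezing step of the restricted operator amounts to holding the value at y from the first grid point at which the target is reached. The following sketch illustrates only that step; the function name and the discrete notion of "hitting" (reaching or crossing y between grid points) are assumptions of this example.

```python
def restrict_at_target(xbar, y):
    """Freeze a sampled trajectory at y from the first time it reaches y.

    If the trajectory never reaches y (hitting time infinite), it is
    returned unchanged.
    """
    if not xbar:
        return []
    start_above = xbar[0] >= y
    out = []
    hit = False
    for value in xbar:
        reached = value <= y if start_above else value >= y
        if not hit and reached:
            hit = True
        out.append(y if hit else value)
    return out
```

For example, `restrict_at_target([22.0, 21.5, 21.0, 20.5, 21.2], 21.0)` keeps the first two samples and holds the remainder at 21.0.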
23 / 43
A Numerical Algorithm for the Restricted Operator
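k = 0
while $\|\bar{x} - \bar{x}_{old}\|_\infty > \epsilon_1$ do
    $\bar{x}_{old(2)} = \bar{x}_{old}$
    $\bar{x}_{old} = \bar{x}$
    if mod(k, 2) == 1 then
        $q = \Delta(\bar{x}; \lambda)$
        if $q_\infty > q^*_\infty$ then
            $\lambda = \lambda \times \frac{1}{1+\epsilon_2}$
            $q = \Delta(\bar{x}; \lambda)$
        end if
    elseif $|\bar{x} - \bar{x}_{old(2)}| == 0$ then
        $q = \Delta^r(\bar{x}; \lambda)$
    else
        $q = \Delta(\bar{x}; \lambda)$
    end if
    $\bar{x} = T^r(q)$
    k = k + 1
end while
return $\bar{x}$.
Define $q^*_\infty(\bar{x})$ of $\bar{x}$ for $\bar{x}_t \to y$ as $t \to \infty$:
$$q^*_\infty = \frac{a(a-\delta)\,r}{b^2}\left(\frac{\bar{x}_0 - x_\infty}{x_\infty - z}\right)$$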
24 / 43
Observations
Coefficients can be tuned to guarantee successful convergence
at the cost of slowing collective dynamics.
The algorithm calculates a $\lambda^*$ such that there exists a
desirable fixed point trajectory for the operator
$M = T \circ \Delta(\bar{x}; \lambda^*) : G \to G$.
The numerical algorithm always converges to a smooth fixed
point trajectory.
25 / 43
ε-Nash Theorem
Theorem. Under technical conditions, the collective target tracking
MF stochastic control law generates a set of controls
$U^N_{col} \triangleq \{(u^i)^\circ;\ 1 \le i \le N\}$, $1 \le N < \infty$, with
$$(u^i_t)^\circ = -\frac{b}{r}\,(\pi^i_t x^i_t + s^i_t), \qquad t \ge 0,$$
such that
(i) all agent system trajectories $x^i$, $1 \le i \le N$, are stable in the
sense that
$$E\int_0^\infty e^{-\delta t}\,\|x^i_t\|^2\,dt < \infty;$$
(ii) $\{U^N_{col};\ 1 \le N < \infty\}$ yields an ε-Nash equilibrium for all $\epsilon > 0$;
there exists $N(\epsilon)$ such that for all $N \ge N(\epsilon)$
$$J^N_i\big((u^i)^\circ, (u^{-i})^\circ\big) - \epsilon \;\le\; \inf_{u^i \in U^N_g} J^N_i\big(u^i, (u^{-i})^\circ\big) \;\le\; J^N_i\big((u^i)^\circ, (u^{-i})^\circ\big).$$
26 / 43
Simulation
400 heaters with maximum power: 5 kW
2 experiments:
increase the mean temperature by 0.5 °C,
decrease the mean temperature by 0.5 °C,
over a 6-hour control horizon
the central authority provides the target; local controllers
apply collective target tracking mean field control
27 / 43
Energy Release: Accelerated Engineering Solution
Accelerated Engineering Solution: no control until the agent's temperature
reaches its individual steady state:
$$x^i_\infty = x^i_0 - \frac{b^2 q^*_\infty\,(x^i_0 - z)}{a\,r\,(a - \delta) + q^*_\infty b^2}$$
[Figure: Agents Applying Collective Target Tracking MF Control, Accelerated Engineering Solution. Temperature (°C, 17 to 25) versus time (hours, 0 to 6), showing the target trajectory $y$, the low comfort line $z$, the high comfort line $h$, the mean temperature, and individual trajectories.]
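The individual steady state can be checked numerically against the limiting weight. The sketch below assumes the coefficient reads a(a − δ) in both formulas, with `a` the state coefficient of the mean field equations (negative for a stable load); function names and numerical values are hypothetical.

```python
def q_star_inf(a, delta, r, b, xbar0, x_inf, z):
    """Limiting integral weight for which the mean settles at x_inf."""
    return a * (a - delta) * r / b**2 * (xbar0 - x_inf) / (x_inf - z)


def individual_steady_state(x0, q_inf, a, delta, r, b, z):
    """Steady-state temperature of an agent that started at x0."""
    return x0 - b**2 * q_inf * (x0 - z) / (a * r * (a - delta) + q_inf * b**2)


# Illustrative values: mean initial temperature 22, target 21, comfort bound 17.
q_inf = q_star_inf(-0.2, 0.1, 10.0, 1.0, xbar0=22.0, x_inf=21.0, z=17.0)
mean_agent = individual_steady_state(22.0, q_inf, -0.2, 0.1, 10.0, 1.0, 17.0)
warm_agent = individual_steady_state(23.0, q_inf, -0.2, 0.1, 10.0, 1.0, 17.0)
```

Under these assumptions an agent starting at the mean initial temperature settles exactly at the target, while warmer agents settle proportionally higher, so diversity in the population is preserved.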
28 / 43
Cooperative Collective Target
Tracking Mean Field Control for
Space Heaters
29 / 43
Methodology
Agents collectively minimize the social cost function instead
of their individual cost functions.
Person by person optimization at the population limit
Solution Concept: ε-optimality instead of ε-Nash
30 / 43
Social Cost
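Remember the non-cooperative cost function:
$$J^N_i(u^i, u^{-i}) = E\int_0^\infty e^{-\delta t}\,l_N(\cdot)\,dt, \qquad l_N(\cdot) = \big[(x^i_t - z)^2 q_t + (u^i_t)^2 r\big],$$
where
$$q_t = \left|\lambda\int_0^t \big(\bar{x}^{(N)}_\tau - y\big)\,d\tau\right|.$$
Social Cost Function:
$$J^N_{soc}(u) = \sum_{j=1}^N J^N_j(u^j, u^{-j}).$$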
31 / 43
Non Tractability of the Centralized Problem
Social running loss function:
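$$\sum_{j=1}^N \big[(x^j_t - z)^2 q_t + (u^j_t)^2 r\big].$$
For smoothness of $q_t$, change $q_t$ to
$$q_t = \left(\lambda\int_0^t \big(\bar{x}^{(N)}_\tau - y\big)\,d\tau\right)^2.$$
Define
$$\mathbf{x}^j_t = \begin{bmatrix} x^j_t \\ \int_0^t x^j_\tau\,d\tau \end{bmatrix} = \begin{bmatrix} (x^j_t)_1 \\ (x^j_t)_2 \end{bmatrix} \quad \text{and} \quad \mathbf{x}_t = \begin{bmatrix} \mathbf{x}^1_t \\ \vdots \\ \mathbf{x}^N_t \end{bmatrix}.$$
Optimizing the social cost is essentially equivalent to centralized
control. However, there is no closed-form solution for any finite N because
of terms of the form $(x^j_t)_1^2\,(x^j_t)_2^2$. One would need to solve the HJB equation.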
32 / 43
Person by Person Optimization (following HCM, 2012)
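The corresponding loss function as seen from an agent $i$:
$$l_N(\cdot)|_i = (x^i_t - z)^2 q_t + (u^i_t)^2 r + \sum_{j \ne i}^N \big[(x^j_t - z)^2 q_t + (u^j_t)^2 r\big] \triangleq I^N_1 + I^N_2 + (u^i_t)^2 r + \sum_{j \ne i}^N (u^j_t)^2 r,$$
where $q_t = \left(\lambda\int_0^t (\bar{x}^{(N)}_\tau - y)\,d\tau\right)^2$.
$$I^N_1 = ((x^i_t)_1 - z)^2 \left[\frac{\lambda^2}{N^2}((x^i_t)_2 - yt)^2 + \frac{2\lambda^2}{N}((x^i_t)_2 - yt)((x^{-i}_t)_2 - yt) + \lambda^2((x^{-i}_t)_2 - yt)^2\right],$$
$$I^N_2 = \sum_{j \ne i}^N ((x^j_t)_1 - z)^2 \left(\frac{\lambda^2}{N^2}\big((x^i_t)_2 - yt\big)^2 + \frac{2\lambda^2}{N}((x^i_t)_2 - yt)((x^{-i}_t)_2 - yt)\right) + \text{terms independent of } i,$$
where $(x^{-i}_t)_2 = \frac{1}{N}\sum_{j \ne i}^N (x^j_t)_2$.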
33 / 43
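Person by Person Optimization as N → ∞
Limit as $N \to \infty$:
$$\lim_{N\to\infty} I^N_1 = ((x^i_t)_1 - z)^2 \lim_{N\to\infty} \lambda^2\left[\frac{1}{N^2}((x^i_t)_2 - yt)^2 + \frac{2}{N}((x^i_t)_2 - yt)((\bar{x}_t)_2 - yt)\right] + ((x^i_t)_1 - z)^2\,\lambda^2\,((\bar{x}_t)_2 - yt)^2$$
$$\lim_{N\to\infty} I^N_2 = \lim_{N\to\infty} \frac{1}{N}\sum_{j \ne i}^N ((x^j_t)_1 - z)^2 \times \lim_{N\to\infty} \frac{\lambda^2}{N}\big((x^i_t)_2 - yt\big)^2 + \lim_{N\to\infty} \frac{1}{N}\sum_{j \ne i}^N ((x^j_t)_1 - z)^2\,2\lambda^2\,((x^i_t)_2 - yt)((x^{-i}_t)_2 - yt)$$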
34 / 43
“Optimistic” Anticipation of Limiting Behaviour
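Assuming boundedness of the yellow terms through optimality (those highlighted on the preceding slide, which carry a factor $1/N$ or $1/N^2$), all
the limits of the yellow terms go to zero as N goes to infinity:
$$l_\infty(\cdot)|_i = \lambda^2\,[(\bar{x}_t)_2 - yt]^2\,\big[(x^i_t)_1^2 - 2(x^i_t)_1 z\big] + 2\lambda^2 \lim_{N\to\infty}\frac{1}{N}\sum_{j \ne i}^N \Big[(x^j_t)_1 - z\Big]^2 [(\bar{x}_t)_2 - yt]\,\big[(x^i_t)_2 - yt\big]$$
Denote
$$\nu_t \triangleq \lim_{N\to\infty}\frac{1}{N}\sum_{j \ne i}^N \Big[(x^j_t)_1 - (\bar{x}_t)_1\Big]^2.$$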
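The role of $\nu_t$ can be made explicit with a short expansion of the limiting sum in the loss. This is a sketch under the assumption that the empirical mean of $(x^j_t)_1$ converges to $(\bar{x}_t)_1$, so that the cross term vanishes in the limit:

```latex
\begin{align*}
\lim_{N\to\infty}\frac{1}{N}\sum_{j\neq i}^{N}\big[(x^j_t)_1 - z\big]^2
 &= \lim_{N\to\infty}\frac{1}{N}\sum_{j\neq i}^{N}
    \big[(x^j_t)_1 - (\bar{x}_t)_1 + (\bar{x}_t)_1 - z\big]^2 \\
 &= \nu_t
    + 2\big[(\bar{x}_t)_1 - z\big]
      \lim_{N\to\infty}\frac{1}{N}\sum_{j\neq i}^{N}\big[(x^j_t)_1 - (\bar{x}_t)_1\big]
    + \big[(\bar{x}_t)_1 - z\big]^2 \\
 &= \nu_t + \big[(\bar{x}_t)_1 - z\big]^2 .
\end{align*}
```

The combination $\nu_t + [(\bar{x}_t)_1 - z]^2$ is therefore the coefficient multiplying the term that is linear in $(x^i_t)_2$ in the limiting loss.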
35 / 43
Mean Field Equations
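$l_\infty(\cdot)|_i$ can be written as:
$$l_\infty(\cdot)|_i = \mathbf{x}^{i\top}_t Q_t\,\mathbf{x}^i_t + 2 D_t^\top \mathbf{x}^i_t$$
Recall:
$$\mathbf{x}^i_t = \begin{bmatrix} x^i_t \\ \int_0^t x^i_\tau\,d\tau \end{bmatrix} = \begin{bmatrix} (x^i_t)_1 \\ (x^i_t)_2 \end{bmatrix}$$
Assume zero initial variance:
$$Q_t = \begin{bmatrix} \lambda^2[(\bar{x}_t)_2 - yt]^2 & 0 \\ 0 & 0 \end{bmatrix}, \qquad D_t = \lambda^2 \begin{bmatrix} -z\,[(\bar{x}_t)_2 - yt]^2 \\ \{\nu_t + [(\bar{x}_t)_1 - z]^2\}\,[(\bar{x}_t)_2 - yt] \end{bmatrix},$$
$$-\frac{d\Pi_t}{dt} = \Pi_t\Big(A - \frac{\delta}{2} I\Big) + \Big(A - \frac{\delta}{2} I\Big)^\top \Pi_t - \Pi_t B R^{-1} B^\top \Pi_t + Q_t,$$
$$-\frac{ds_t}{dt} = (A - \delta I - B R^{-1} B^\top \Pi_t)^\top s_t + \Pi_t c + D_t,$$
$$\frac{d\bar{x}_t}{dt} = (A - B R^{-1} B^\top \Pi_t)\,\bar{x}_t - B R^{-1} B^\top s_t + c,$$
$$\frac{d\nu_t}{dt} = (A - B R^{-1} B^\top \Pi_t)\,\nu_t + \nu_t\,(A - B R^{-1} B^\top \Pi_t)^\top + G G^\top$$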
36 / 43
Simulation: unstable behavior!!!
[Figure: temperature (°C, -2500 to 500) versus time (hours, 0 to 50) for iterations 1 and 2, with the target shown.]
37 / 43
Instability of the Anticipated Limiting Problem
Remembering the yellow terms, we notice that
$$\lim_{N\to\infty}\frac{1}{N}\sum_{j \ne i}^N ((x^j_t)_1 - z)^2 \times \lim_{N\to\infty}\frac{\lambda^2}{N}\big((x^i_t)_2 - yt\big)^2$$
will always be present for finite N. In the hope of controlling the
negative drift of the linear $(x^i_t)_2$ term multiplying the mean
deviation of position with respect to variance, we reinject a small
term of this form and test its effect on the stability of the calculations.
38 / 43
Mean Field Equations
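$l_\infty(\cdot)|_i$ can be written as:
$$l_\infty(\cdot)|_i = \mathbf{x}^{i\top}_t Q_t\,\mathbf{x}^i_t + 2 D_t^\top \mathbf{x}^i_t$$
Recall:
$$\mathbf{x}^i_t = \begin{bmatrix} x^i_t \\ \int_0^t x^i_\tau\,d\tau \end{bmatrix} = \begin{bmatrix} (x^i_t)_1 \\ (x^i_t)_2 \end{bmatrix}$$
Assume zero initial variance:
$$Q_t = \begin{bmatrix} \lambda^2[(\bar{x}_t)_2 - yt]^2 & 0 \\ 0 & \lambda^2\{\nu_t + [(\bar{x}_t)_1 - z]^2\} \end{bmatrix}, \qquad D_t = \lambda^2 \begin{bmatrix} -z\,[(\bar{x}_t)_2 - yt]^2 \\ \{\nu_t + [(\bar{x}_t)_1 - z]^2\}\,[(\bar{x}_t)_2 - yt] \end{bmatrix},$$
$$-\frac{d\Pi_t}{dt} = \Pi_t\Big(A - \frac{\delta}{2} I\Big) + \Big(A - \frac{\delta}{2} I\Big)^\top \Pi_t - \Pi_t B R^{-1} B^\top \Pi_t + Q_t,$$
$$-\frac{ds_t}{dt} = (A - \delta I - B R^{-1} B^\top \Pi_t)^\top s_t + \Pi_t c + D_t,$$
$$\frac{d\bar{x}_t}{dt} = (A - B R^{-1} B^\top \Pi_t)\,\bar{x}_t - B R^{-1} B^\top s_t + c,$$
$$\frac{d\nu_t}{dt} = (A - B R^{-1} B^\top \Pi_t)\,\nu_t + \nu_t\,(A - B R^{-1} B^\top \Pi_t)^\top + G G^\top$$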
39 / 43
Simulation
[Figure: mean temperature (°C, 20.8 to 21.5) versus time (hours, 0 to 50) for iterations 1 to 15 and the final iterate, with the target shown.]
40 / 43
Decaying ξ
Define ξ:
$$Q_t = \begin{bmatrix} \lambda^2[(\bar{x}_t)_2 - yt]^2 & 0 \\ 0 & \xi\,\lambda^2\{\nu_t + [(\bar{x}_t)_1 - z]^2\} \end{bmatrix}$$
[Figure: three simulation panels, for ξ = 0.5, ξ = 0.35, and ξ = 0.2. Each shows temperature (°C) versus time (hours, 0 to 50) for iterations 1 to 15 and the final iterate, with the target shown.]
41 / 43
Non Cooperative vs Cooperative
[Figure: Non cooperative. One panel shows the mean field iterates 1 to 16 and the final iterate against the target, temperature (°C) versus time (hours); the other shows the target trajectory $y$, the low comfort line $z$, the high comfort line $h$, the mean field, the mean temperature, and individual trajectories, temperature (°C, 17 to 25) versus time (hours).]
[Figure: Cooperative. One panel shows the mean field iterates 1 to 15 and the final iterate against the target, temperature (°C) versus time (hours); the other shows the target trajectory $y$, the low comfort line $z$, the high comfort line $h$, the mean field, the mean temperature, and individual trajectories, temperature (°C, 17 to 25) versus time (hours).]
42 / 43
Conclusions / Future Work
Still Needed:
More rigorous theory for identifying the limiting set of fixed point
equations
Existence theory for a fixed point
ε-optimality properties
Future work:
Develop online device model parameter identification and adaptation
algorithms
Consider time varying collective target tracking problems
Better address the impact of local constraints on global target generation
43 / 43