Slides. - Ceremade
Cooperative and Non Cooperative Mean Field Game Methods in Energy Systems
Roland P. Malhamé
Department of Electrical Engineering, École Polytechnique de Montréal and GERAD
Mean Fields Games and Related Topics - 3 June, 2015

Outline
 - Motivations
 - Non Cooperative Collective Target Tracking Mean Field Control
 - Cooperative Collective Target Tracking Mean Field Control
 - Conclusions and future directions

Energy Systems

Motivation
 - A higher level of penetration of renewable sources of energy (wind or solar) in the energy mix of power systems is synonymous with greater variability.
 - Energy storage becomes an essential asset to compensate generation/load mismatch.
 - Fundamental idea: use the energy storage from electrical sources naturally present in the power system at customer sites, based on mutually beneficial agreements (electric water heaters, electric space heating, electric space cooling).

Challenges and Previous Work
Challenges:
 - Literally millions of control points to model, monitor and control; severe computational requirements; large communication costs.
Past approaches:
 - Direct load control: send the same interruption/reconnection signals to large collections of devices.
 - Aggregate modeling scheme: develop an elemental stochastic model of individual load behavior, then build an aggregate load model by developing ensemble statistics of the devices, much as in the statistical mechanics framework.

An Example: A Diffusion Model of Heating/Cooling Loads
A hybrid state stochastic system (Malhamé-Chong, TAC 1985)

Continuous state:
\[ C_a\,dx^{in}_t = -U_a\,(x^{in}_t - x^{out})\,dt + Q_h\,m_t b_t\,dt + \sigma_0\,dw_t \]
Divide by $C_a$ and obtain
\[ dx^{in}_t = -a\,(x^{in}_t - x^{out})\,dt + Q'_h\,m_t b_t\,dt + \sigma\,dw_t \]

Discrete state:
\[ m_{t+\Delta t} = m_t + \pi(x^{in}_t, m_t; x_+, x_-), \qquad
\pi(x^{in}, m; x_+, x_-) = \begin{cases} 0 & x_- < x^{in} < x_+ \\ -m & x^{in} \ge x_+ \\ 1 - m & x^{in} \le x_- \end{cases} \]
 - $m_t$: the operating state of the device (1 for "on" or 0 for "off")
 - $b_t$: the state of the power supply (1 for "on" or 0 for "off")
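The hybrid-state model above can be explored numerically. What follows is only a minimal sketch, assuming an explicit Euler-Maruyama discretization; the function name `simulate_thermostat`, its argument names and the parameter values in the usage note are illustrative assumptions and do not come from the slides.

```python
import numpy as np


def simulate_thermostat(x0, m0, a, q_h, x_out, x_lo, x_hi, sigma, dt, n_steps,
                        supply=None, seed=0):
    """Euler-Maruyama sketch of the thermostat-controlled heating load.

    x is the inside temperature, m the on/off state of the device, and
    `supply` the on/off state b of the power supply (defaults to always on).
    All names and the discretization are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    m = np.empty(n_steps + 1, dtype=int)
    x[0], m[0] = x0, m0
    for k in range(n_steps):
        b = 1 if supply is None else supply[k]
        # Continuous state: dx = -a (x - x_out) dt + Q'_h m b dt + sigma dw
        drift = -a * (x[k] - x_out) + q_h * m[k] * b
        x[k + 1] = x[k] + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        # Discrete state: switch off at the upper edge, on at the lower edge,
        # otherwise keep the previous state (the function pi of the slide).
        if x[k + 1] >= x_hi:
            m[k + 1] = 0
        elif x[k + 1] <= x_lo:
            m[k + 1] = 1
        else:
            m[k + 1] = m[k]
    return x, m
```

For instance, `simulate_thermostat(20.0, 1, a=0.5, q_h=20.0, x_out=0.0, x_lo=19.0, x_hi=21.0, sigma=0.1, dt=0.001, n_steps=20000)` produces the familiar sawtooth cycling inside the dead band; setting `sigma=0.0` gives the deterministic cycle.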
[Figure: equivalent thermal circuit of a residence, with heat source $Q_h$ gated by $m$ and $b$, noise $dw$, thermal resistance $1/U_a$, capacitance $C_a$, and temperatures $x^{in}$, $x^{out}$.]

An Example: A Diffusion Model of Heating/Cooling Loads
The Coupled Fokker-Planck Equations
The resulting coupled Fokker-Planck equation model describes the evolution of temperature distributions within controlled residences:
\[ T^{k}_{\lambda,t}[f] = \frac{\partial f}{\partial t} - \frac{\partial}{\partial \lambda}\Big[\big(a(\lambda - x_a(t)) - k\,b(t)\,R\big) f\Big] - \frac{\sigma^2}{2}\,\frac{\partial^2 f}{\partial \lambda^2}, \qquad k = 0, 1 \]
[Figure: Fokker-Planck Equation Simulation]

Remarks
 - The optimal control problem becomes one of controlling PDEs using on-off signals.
 - A fraction of customers is inevitably penalized. The smaller this fraction, the less effective the control is.

Implementation Principles
1. Each controller has to be situated locally: decentralized.
2. Data exchange should be kept at a minimum, both with the central authority and among users.
3. User disturbance should be kept at a minimum.

Envisioned Overall Architecture: The case of a single central authority
[Diagram: "Forecasts of wind and solar generation", "Aggregate models of energy storage capable devices" and "Uncontrollable component of load" feed a "Mathematical Programming" block, which outputs the "Energy/temperature schedule of large aggregates of devices":
 - electrical space heaters
 - electrical water heaters
 - etc.]

Mean Field Games: The Reasons?
Two fundamental reasons:
 - Games are a natural device for enforcing decentralization.
 - The large numbers involved induce decoupling effects which allow the law of large numbers to kick in.
Practical benefits:
 - The resulting control laws can be computed in an open loop manner by individual devices, thus significantly reducing communication requirements.
 - Control implementation is local, unlike direct control, thus permitting local enforcement of comfort and safety constraints.

Non Cooperative Collective Target Tracking Mean Field Control for Space Heaters

Elemental space heating model: two changes
1. Thermostat controlled heating element is replaced by a continuous controller.
\[ dx^{in}_t = \frac{1}{C_a}\big[-U_a (x^{in}_t - x^{out}) + Q_h(t)\big]dt + \sigma\,dw_t \;\equiv\; dx_t = \big[-a(x_t - x^{out}) + b\,u_t\big]dt + \sigma\,dw_t \]
[Figure: equivalent thermal circuit with $Q_h$, $dw$, $1/U_a$, $C_a$, $x^{in}$, $x^{out}$.]
2. The control input is redefined so that no control effort is required on average to remain at the initial temperature:
\[ dx_t = \big[-a(x_t - x^{out}) + b\,(u_t + u^{free})\big]dt + \sigma\,dw_t, \qquad u^{free} \triangleq a\,(x^i_0 - x^{out}). \]
Change 2 is made so that diversity is preserved in the water heater population while the mean population temperature tracks the target. With no control effort, the temperature stays at $x_0$. We do not penalize the control effort that is used to stay at the initial temperature at the start of the control law horizon.

Constant Level Tracking Problem Setup
[Figure: temperature (°C) vs. hours, showing the high comfort line $h$, the target mean trajectory $y$, the initial mean temperature and the low comfort line $z$.]

Classical LQG Target Tracking
\[ J^N_i(u^i, u^{-i}) = E \int_0^\infty e^{-\delta t}\big[(x^i_t - y)^2 q + (u^i_t)^2 r\big]\,dt, \qquad 1 \le i \le N \]
 - $x^i$: temperature
 - $y$: tracking target
 - $u^i$: control
[Figure: Agents Applying LQG Tracking. Temperature (°C) vs. hours; target trajectory $y$, comfort lines $z$ (low) and $h$ (high), mean temperature, individual trajectories.]

A Novel Integral Control Based Cost Function
\[ J^N_i(u^i, u^{-i}) = E \int_0^\infty e^{-\delta t}\big[(x^i_t - z)^2 q_t + (u^i_t)^2 r\big]\,dt \]
 - $x^i$: temperature
 - $z$: lower comfort bound
 - $u^i$: control
Integral controller embedded in the mean-target deviation coefficient: $q_t$, $t \in [0, T]$, is calculated as the following integrated error signal:
\[ q_t = \left| \lambda \int_0^t \big(\bar{x}^{(N)}_\tau - y\big)\,d\tau \right| \]
 - $\bar{x}^{(N)}$: mean temperature of the population
 - $y$: mean target

Mean Field Based Collective Target Tracking
[Figure: Mean Field Based Collective Target Tracking]
The novelty is that the mean field effect is mediated by the quadratic cost function parameters, in the form of an integral error, whereas in prevailing mean field theory the mean field effect acts on the tracking signal.

Fixed Point Equation System
MF Equation System on $[0, \infty)$:
\[ -\frac{d\pi^\theta_t}{dt} = -(\delta + 2a)\,\pi^\theta_t - b^2 r^{-1} (\pi^\theta_t)^2 + q^*_t, \]
\[ -\frac{ds^\theta_t}{dt} = -(\delta + a + b^2 \pi^\theta_t r^{-1})\,s^\theta_t + a\,x^\theta_0\,\pi^\theta_t - q^*_t z, \]
\[ \frac{d\bar{x}^\theta_t}{dt} = -(a + b^2 \pi^\theta_t r^{-1})\,\bar{x}^\theta_t - b^2 r^{-1} s^\theta_t + a\,x^\theta_0, \]
\[ \bar{x}_t = \int_\Theta \bar{x}^\theta_t\,dF(\theta), \qquad q^*_t = \left| \lambda \int_0^t (\bar{x}_\tau - y)\,d\tau \right|. \]
Define the function space $G \subset C_b[0, \infty)$, where for a function $f \in G$, $f(0) = \bar{x}_0$ and $z \le f(t) \le \bar{x}_0$ for all $t \ge 0$. The parameter $\theta \in \Theta$ corresponds to all sources of heterogeneity, including initial conditions.
Theorem: Schauder's fixed point theorem guarantees the existence of a fixed point to the MF Equation System on the space $G$.

Preliminary Numerical Experimentation Steps
Successful experiment with a naive iterative algorithm. Note that the lower comfort bound is 17, whereas the mean tracking target is 21.
[Figure: mean temperature (°C) vs. hours over 800 hours; legend: Target, Iterations 1 to 16, Final.]

Numerical Computation of the Mean Field Control Law (II)
Unsuccessful experiment with a naive iterative algorithm (different parameters). Note that the lower comfort bound is 17, whereas the mean tracking target is 21.
[Figure: mean temperature (°C) vs. hours over 800 hours; legend: Target, Iterations 1 to 16, Final.]

Observations
[Figure: trajectories labelled $x^{(0)}$, $x^{(1)}$ and $x^*$.]
From the experimentation, one concludes that the cases of interest correspond to convergence within the region $[x_0, y]$; one also observes monotonicity of the temperature trajectories in the early part of their behavior (before crossing the target $y$). This suggests the idea of using a so-called restricted operator, whereby anytime the mean state hits the target $y$, it is frozen at $y$.

Operator Definitions
Define the function space $G^r \subset G$, where for functions $f \in G^r$, $f(0) = \bar{x}_0$ and $y \le f(t) \le \bar{x}_0$ for all $t \ge 0$.
Define the operator $\Delta(\bar{x}; \lambda) : C_b[0, \infty) \to C[0, \infty)$:
\[ q_t = \Delta(\bar{x}; \lambda)(t) \triangleq \left| \lambda \int_0^t (\bar{x}_\tau - y)\,d\tau \right| \]
Define $T : C[0, \infty) \to C[0, \infty)$ by $\bar{x}_t = (Tq)(t)$, where
\[ \begin{cases}
-\dfrac{d\pi^\theta_t}{dt} = -(\delta + 2a)\,\pi^\theta_t - b^2 r^{-1} (\pi^\theta_t)^2 + q_t, \\[4pt]
-\dfrac{ds^\theta_t}{dt} = -(\delta + a + b^2 \pi^\theta_t r^{-1})\,s^\theta_t + a\,x^\theta_0\,\pi^\theta_t - q_t z, \\[4pt]
\dfrac{d\bar{x}^\theta_t}{dt} = -(a + b^2 \pi^\theta_t r^{-1})\,\bar{x}^\theta_t - b^2 r^{-1} s^\theta_t + a\,x^\theta_0, \\[4pt]
\bar{x}_t = \int_\Theta \bar{x}^\theta_t\,dF(\theta).
\end{cases} \]
Hence, write the MF Eq. System for Collective Target Tracking as
\[ \bar{x}_t = (T \circ \Delta)(\bar{x})(t) = (M\bar{x})(t). \]

Restricted Operator
Define $T_h \in (\mathbb{R}^+ \cup \infty)$ as the first time that $\bar{x}$ hits $y$.
Define $T^r : C[0, \infty) \to C_b[0, \infty)$:
\[ T^r \triangleq \begin{cases} T(q_t) & \text{for } t \in [0, T_h) \\ y & \text{for } t \in [T_h, \infty) \end{cases} \]
Define $M^r \triangleq (T^r \circ \Delta) : C_b[0, \infty) \to C_b[0, \infty)$.

A Numerical Algorithm for the Restricted Operator
    k = 0
    while $\|\bar{x} - \bar{x}^{old}\|_\infty > \epsilon_1$ do
        $\bar{x}^{old(2)} = \bar{x}^{old}$
        $\bar{x}^{old} = \bar{x}$
        if mod(k, 2) == 1 then
            $q = \Delta(\bar{x}; \lambda)$
            if $q_\infty > q^*_\infty$ then
                $\lambda = \lambda \times \frac{1}{1 + \epsilon_2}$
                $q = \Delta(\bar{x}; \lambda)$
            end if
        elseif $|\bar{x} - \bar{x}^{old(2)}| == 0$ then
            $q = \Delta^r(\bar{x}; \lambda)$
        else
            $q = \Delta(\bar{x}; \lambda)$
        end if
        $\bar{x} = T^r(q)$
        k = k + 1
    end while
    return $\bar{x}$
Define $q^*_\infty(\bar{x})$ as the limiting value of $q$ associated with $\bar{x}$ for $\bar{x}_t \to y$ as $t \to \infty$:
\[ q^*_\infty = \frac{a(a + \delta)\,r}{b^2} \left( \frac{\bar{x}_0 - x_\infty}{x_\infty - z} \right) \]

Observations
 - Coefficients can be tuned to guarantee successful convergence, at the cost of slowing the collective dynamics.
 - The algorithm calculates a $\lambda^*$ such that there exists a desirable fixed point trajectory for the operator $M = T \circ \Delta(\bar{x}; \lambda^*) : G \to G$.
 - The numerical algorithm always converges to a smooth fixed point trajectory.

$\epsilon$-Nash Theorem
Theorem: Under technical conditions, the collective target tracking MF stochastic control law generates a set of controls $U^N_{col} \triangleq \{(u^i)^\circ;\ 1 \le i \le N\}$, $1 \le N < \infty$, with
\[ (u^i_t)^\circ = -\frac{b}{r}\,(\pi^i_t x^i_t + s^i_t), \qquad t \ge 0, \]
such that
(i) all agent system trajectories $x^i$, $1 \le i \le N$, are stable in the sense that
\[ E \int_0^\infty e^{-\delta t}\,\|x^i_t\|^2\,dt < \infty; \]
(ii) $\{U^N_{col};\ 1 \le N < \infty\}$ yields an $\epsilon$-Nash equilibrium for all $\epsilon > 0$; that is, there exists $N(\epsilon)$ such that for all $N \ge N(\epsilon)$
\[ J^N_i\big((u^i)^\circ, (u^{-i})^\circ\big) - \epsilon \;\le\; \inf_{u^i \in U^N_g} J^N_i\big(u^i, (u^{-i})^\circ\big) \;\le\; J^N_i\big((u^i)^\circ, (u^{-i})^\circ\big). \]
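The map from a gain trajectory $q$ to the mean trajectory $\bar{x}$, and the naive iteration built on it, can be sketched as follows. This is only an illustration under several assumptions that are not in the slides: a homogeneous population (a single $\theta$), explicit Euler integration, a horizon truncated at $T$ with terminal condition $\pi_T = s_T = 0$, and signs taken from the closed-loop dynamics $dx = [-a(x - x_0) + bu]dt$ with $u = -(b/r)(\pi x + s)$. The function names `integral_gain`, `apply_T` and `naive_iteration` are hypothetical.

```python
import numpy as np


def integral_gain(x_bar, dt, y, lam):
    """q_t = |lam * integral_0^t (x_bar - y) dtau|, left Riemann sum (assumed)."""
    integral = np.concatenate(([0.0], np.cumsum((x_bar[:-1] - y) * dt)))
    return np.abs(lam * integral)


def apply_T(q, dt, a, b, r, delta, x0, z):
    """Map q -> x_bar for one homogeneous class of agents (illustrative sketch)."""
    n = len(q)
    p = b * b / r
    pi = np.zeros(n)  # assumed truncation: pi(T) = 0
    s = np.zeros(n)   # assumed truncation: s(T) = 0
    # Backward sweep for the Riccati and offset equations.
    for k in range(n - 1, 0, -1):
        dpi = (delta + 2 * a) * pi[k] + p * pi[k] ** 2 - q[k]
        ds = (delta + a + p * pi[k]) * s[k] - a * x0 * pi[k] + q[k] * z
        pi[k - 1] = pi[k] - dt * dpi
        s[k - 1] = s[k] - dt * ds
    # Forward sweep for the mean temperature.
    x_bar = np.empty(n)
    x_bar[0] = x0
    for k in range(n - 1):
        dx = -(a + p * pi[k]) * x_bar[k] - p * s[k] + a * x0
        x_bar[k + 1] = x_bar[k] + dt * dx
    return x_bar


def naive_iteration(y, lam, n_iter, n, dt, **model):
    """Plain iteration x_bar <- T(Delta(x_bar)); convergence is not guaranteed."""
    x_bar = np.full(n, model["x0"])
    for _ in range(n_iter):
        x_bar = apply_T(integral_gain(x_bar, dt, y, lam), dt, **model)
    return x_bar
```

With a constant $q$, the mean trajectory returned by `apply_T` settles (away from the truncated end of the horizon) at $x_0 - b^2 q (x_0 - z) / (a r (a + \delta) + b^2 q)$, which is consistent with the expression for $q^*_\infty$ above. The plain iteration is the "naive" scheme; as the experiments in the slides show, it may or may not converge depending on the parameters.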
Simulation
 - 400 heaters with maximum power: 5 kW
 - 2 experiments: increase the mean temperature by 0.5 °C, decrease the mean temperature by 0.5 °C
 - over a 6 hour control horizon
 - the central authority provides the target, local controllers apply collective target tracking mean field control

Energy Release: Accelerated Engineering Solution
Accelerated Engineering Solution: no control until the agent's temperature reaches its individual steady state:
\[ x^i_\infty = x^i_0 - \frac{b^2 q^*_\infty\,(x^i_0 - z)}{a\,r\,(a + \delta) + b^2 q^*_\infty} \]
[Figure: Agents Applying Collective Target Tracking MF Control, Accelerated Engineering Solution. Temperature (°C) vs. hours; target trajectory $y$, comfort lines $z$ (low) and $h$ (high), mean temperature, individual trajectories.]

Cooperative Collective Target Tracking Mean Field Control for Space Heaters

Methodology
 - Agents collectively minimize the social cost function instead of their individual cost functions.
 - Person by person optimization at the population limit.
 - Solution concept: $\epsilon$-optimality instead of $\epsilon$-Nash.

Social Cost
Remember the non-cooperative cost function:
\[ J^N_i(u^i, u^{-i}) = E \int_0^\infty e^{-\delta t}\,l_N(\cdot)\,dt, \qquad l_N(\cdot) = \big[(x^i_t - z)^2 q_t + (u^i_t)^2 r\big], \]
where
\[ q_t = \left| \lambda \int_0^t \big(\bar{x}^{(N)}_\tau - y\big)\,d\tau \right|. \]
Social cost function:
\[ J^N_{soc}(u) = \sum_{j=1}^N J^N_j(u^j, u^{-j}). \]

Non Tractability of the Centralized Problem
Social running loss function:
\[ \sum_{j=1}^N \big[(x^j_t - z)^2 q_t + (u^j_t)^2 r\big]. \]
For smoothness of $q_t$, change $q_t$ to
\[ q_t = \left( \lambda \int_0^t \big(\bar{x}^{(N)}_\tau - y\big)\,d\tau \right)^2. \]
Define
\[ \mathbf{x}^j_t = \begin{bmatrix} x^j_t \\ \int_0^t x^j_\tau\,d\tau \end{bmatrix} = \begin{bmatrix} (\mathbf{x}^j_t)_1 \\ (\mathbf{x}^j_t)_2 \end{bmatrix}
\qquad \text{and} \qquad
\mathbf{x}_t = \begin{bmatrix} \mathbf{x}^1_t \\ \vdots \\ \mathbf{x}^N_t \end{bmatrix}. \]
Optimizing the social cost is essentially equivalent to centralized control. However, there is no closed form solution for any finite $N$ because of terms of the form $(\mathbf{x}^j_t)_1^2\,(\mathbf{x}^j_t)_2^2$. One would need to solve the HJB equation...
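To make the quartic structure concrete, the social running loss can be evaluated directly from the augmented states. The snippet below is a small sketch, not the author's code: the function name `social_running_loss` and its argument layout are assumptions, and it uses the smoothed $q_t$ together with the identity $\int_0^t (\bar{x}^{(N)}_\tau - y)\,d\tau = \frac{1}{N}\sum_j (\mathbf{x}^j_t)_2 - yt$.

```python
import numpy as np


def social_running_loss(x1, x2, u, t, z, y, lam, r):
    """Sum over agents of (x1_j - z)^2 q_t + u_j^2 r at time t.

    x1[j] is agent j's temperature and x2[j] its time integral, so that
    mean(x2) - y * t equals the integrated mean tracking error and
    q_t = (lam * (mean(x2) - y * t))**2 (the smoothed gain).
    """
    x1, x2, u = (np.asarray(v, dtype=float) for v in (x1, x2, u))
    q_t = (lam * (x2.mean() - y * t)) ** 2
    return float(np.sum((x1 - z) ** 2 * q_t + u ** 2 * r))
```

With zero control and $z = y = 0$, doubling every state multiplies the loss by 16, which is the degree-four dependence that rules out a standard linear-quadratic solution.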
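Person by Person Optimization (following HCM, 2012)
The corresponding loss function as seen from an agent $i$:
\[ l_N(\cdot)|_i = (x^i_t - z)^2 q_t + (u^i_t)^2 r + \sum_{j \ne i}^N \big[(x^j_t - z)^2 q_t + (u^j_t)^2 r\big]
\;\triangleq\; I^N_1 + I^N_2 + (u^i_t)^2 r + \sum_{j \ne i}^N (u^j_t)^2 r, \]
where $q_t = \left( \lambda \int_0^t (\bar{x}^{(N)}_\tau - y)\,d\tau \right)^2$,
\[ I^N_1 = \big((\mathbf{x}^i_t)_1 - z\big)^2 \left[ \frac{\lambda^2}{N^2}\big((\mathbf{x}^i_t)_2 - yt\big)^2 + \lambda^2 \big((\mathbf{x}^{-i}_t)_2 - yt\big)^2 + \frac{2\lambda^2}{N}\big((\mathbf{x}^i_t)_2 - yt\big)\big((\mathbf{x}^{-i}_t)_2 - yt\big) \right], \]
\[ I^N_2 = \sum_{j \ne i}^N \big((\mathbf{x}^j_t)_1 - z\big)^2 \left( \frac{\lambda^2}{N^2}\big((\mathbf{x}^i_t)_2 - yt\big)^2 + \frac{2\lambda^2}{N}\big((\mathbf{x}^i_t)_2 - yt\big)\big((\mathbf{x}^{-i}_t)_2 - yt\big) \right) + \text{terms independent of } i, \]
and $(\mathbf{x}^{-i}_t)_2 = \frac{1}{N} \sum_{j \ne i}^N (\mathbf{x}^j_t)_2$.

Person by Person Optimization as $N \to \infty$
Limit as $N \to \infty$:
\[ \lim_{N \to \infty} I^N_1 = \big((\mathbf{x}^i_t)_1 - z\big)^2 \lambda^2 \big((\bar{\mathbf{x}}_t)_2 - yt\big)^2
+ \lim_{N \to \infty} \big((\mathbf{x}^i_t)_1 - z\big)^2 \left[ \frac{\lambda^2}{N^2}\big((\mathbf{x}^i_t)_2 - yt\big)^2 + \frac{2\lambda^2}{N}\big((\mathbf{x}^i_t)_2 - yt\big)\big((\bar{\mathbf{x}}_t)_2 - yt\big) \right], \]
\[ \lim_{N \to \infty} I^N_2 = \lim_{N \to \infty} \frac{1}{N} \sum_{j \ne i}^N \big((\mathbf{x}^j_t)_1 - z\big)^2 \times \lim_{N \to \infty} \frac{\lambda^2}{N}\big((\mathbf{x}^i_t)_2 - yt\big)^2
+ \lim_{N \to \infty} \frac{1}{N} \sum_{j \ne i}^N \big((\mathbf{x}^j_t)_1 - z\big)^2\, 2\lambda^2 \big((\mathbf{x}^i_t)_2 - yt\big)\big((\mathbf{x}^{-i}_t)_2 - yt\big). \]

"Optimistic" Anticipation of Limiting Behaviour
Assuming boundedness of the yellow terms through optimality (on the slide, the terms that still carry a factor $1/N$ or $1/N^2$), all the limits of the yellow terms go to zero as $N$ goes to infinity:
\[ l_\infty(\cdot)|_i = \lambda^2 \big[(\bar{\mathbf{x}}_t)_2 - yt\big]^2 \big[(\mathbf{x}^i_t)_1^2 - 2 (\mathbf{x}^i_t)_1 z\big]
+ 2\lambda^2 \lim_{N \to \infty} \frac{1}{N} \sum_{j \ne i}^N \big[(\mathbf{x}^j_t)_1 - z\big]^2 \big[(\bar{\mathbf{x}}_t)_2 - yt\big]\,(\mathbf{x}^i_t)_2. \]
Denote
\[ \nu_t \triangleq \lim_{N \to \infty} \frac{1}{N} \sum_{j \ne i}^N \big[(\mathbf{x}^j_t)_1 - (\bar{\mathbf{x}}_t)_1\big]^2. \]

Mean Field Equations
$l_\infty(\cdot)|_i$ can be written as:
\[ l_\infty(\cdot)|_i = \mathbf{x}^{i\top}_t Q_t\,\mathbf{x}^i_t + 2 D_t^\top \mathbf{x}^i_t. \]
Recall:
\[ \mathbf{x}^i_t = \begin{bmatrix} x^i_t \\ \int_0^t x^i_\tau\,d\tau \end{bmatrix} = \begin{bmatrix} (\mathbf{x}^i_t)_1 \\ (\mathbf{x}^i_t)_2 \end{bmatrix}. \]
Assume zero initial variance:
\[ Q_t = \begin{bmatrix} \lambda^2 [(\bar{\mathbf{x}}_t)_2 - yt]^2 & 0 \\ 0 & 0 \end{bmatrix}, \qquad
D_t = \lambda^2 \begin{bmatrix} -z\,[(\bar{\mathbf{x}}_t)_2 - yt]^2 \\ \big(\nu_t + [(\bar{\mathbf{x}}_t)_1 - z]^2\big)\,[(\bar{\mathbf{x}}_t)_2 - yt] \end{bmatrix}, \]
\[ -\frac{d\Pi_t}{dt} = \Pi_t \Big(A - \frac{\delta}{2} I\Big) + \Big(A - \frac{\delta}{2} I\Big)^{\!\top} \Pi_t - \Pi_t B R^{-1} B^\top \Pi_t + Q_t, \]
\[ -\frac{ds_t}{dt} = \big(A - \delta I - B R^{-1} B^\top \Pi_t\big)^\top s_t + \Pi_t c + D_t, \]
\[ \frac{d\bar{\mathbf{x}}_t}{dt} = \big(A - B R^{-1} B^\top \Pi_t\big)\,\bar{\mathbf{x}}_t - B R^{-1} B^\top s_t + c, \]
\[ \frac{d\nu_t}{dt} = \big(A - B R^{-1} B^\top \Pi_t\big)\,\nu_t + \nu_t \big(A - B R^{-1} B^\top \Pi_t\big)^\top + G G^\top. \]

Simulation: unstable behavior!!!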
[Figure: mean temperature (°C) vs. hours over 50 hours; legend: Target, Iterations 1, 2.]

Instability of the Anticipated Limiting Problem
Remembering the yellow terms, we notice that
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{j \ne i}^N \big((\mathbf{x}^j_t)_1 - z\big)^2 \times \lim_{N \to \infty} \frac{\lambda^2}{N} \big((\mathbf{x}^i_t)_2 - yt\big)^2 \]
will always be present for finite $N$. In the hope of controlling the negative drift of the linear $(\mathbf{x}^i_t)_2$ term, which multiplies the mean deviation of position with respect to variance, we reinject a small term of this form and test its effect on the stability of the calculations.

Mean Field Equations
$l_\infty(\cdot)|_i$ is again written as $\mathbf{x}^{i\top}_t Q_t\,\mathbf{x}^i_t + 2 D_t^\top \mathbf{x}^i_t$, with the same augmented state $\mathbf{x}^i_t$, the same $D_t$, and the same equations for $\Pi_t$, $s_t$, $\bar{\mathbf{x}}_t$ and $\nu_t$ as before. The only change is in $Q_t$: its $(2,2)$ entry is no longer zero but carries the reinjected term, built from $\lambda^2 \big(\nu_t + [(\bar{\mathbf{x}}_t)_1 - z]^2\big)$.

Simulation
[Figure: mean temperature (°C) vs. hours over 50 hours, between 20.8 and 21.5 °C; legend: Target, Iterations 1 to 15, Final.]

Decaying $\xi$
Define $\xi$:
\[ Q_t = \begin{bmatrix} \lambda^2 [(\bar{\mathbf{x}}_t)_2 - yt]^2 & 0 \\ 0 & \xi \lambda^2 \big\{ \nu_t + [(\bar{\mathbf{x}}_t)_1 - z]^2 \big\} \end{bmatrix} \]
[Figure: three panels of mean temperature (°C) vs. hours over 50 hours, for $\xi = 0.5$, $\xi = 0.35$ and $\xi = 0.2$; legend in each: Target, Iterations 1 to 15, Final.]

Non Cooperative vs Cooperative
[Figure, Non cooperative: fixed point iterations (Target, Iterations 1 to 16, Final) over 800 hours, and temperature trajectories over 10 hours showing the target trajectory $y$, comfort lines $z$ (low) and $h$ (high), mean field, mean temperature and individual trajectories.]
[Figure, Cooperative: fixed point iterations (Target, Iterations 1 to 15, Final) over 50 hours, and temperature trajectories over 10 hours showing the target trajectory $y$, comfort lines $z$ (low) and $h$ (high), mean field, mean temperature and individual trajectories.]

Conclusions / Future Work
Still needed:
 - More rigorous theory for identifying the limiting set of fixed point equations
 - Existence theory for a fixed point
 - $\epsilon$-optimality properties
Future work:
 - Develop online device model parameter identification and adaptation algorithms
 - Consider time varying collective target tracking problems
 - Better address the impact of local constraints on global target generation