Computational Nonlinear Stochastic Control based on the Fokker
Transcription
Computational Nonlinear Stochastic Control based on the Fokker
Computational Nonlinear Stochastic Control based on the Fokker-Planck-Kolmogorov Equation Mrinal Kumar ∗S. Chakravorty †John L. Junkins ‡ February 13, 2008 Abstract The optimal control of nonlinear stochastic systems is considered in this paper. The central role played by the Fokker-Planck-Kolmogorov equation in the stochastic control problem is shown under the assumption of asymptotic stability. A computational approach for the problem is devised based on policy iteration/ successive approximations, and a finite dimensional approximation of the control parametrized diffusion operator, i.e., the controlled FokkerPlanck operator. Several numerical examples are provided to show the efficacy of the proposed computational methodology. 1 Introduction Numerous fields of science and engineering present the problem of uncertainty propagation through nonlinear dynamic systems.1 One may be interested in the determination of the response of engineering structures - beams/plates/entire buildings under random excitation (in structure mechanics2 ), or the motion of particles under the influence of stochastic force fields (in particle physics3 ), or the computation of the prediction step in the design of a Bayesian filter (in filtering theory)4, 5 among others. All these applications require the study of the time evolution of the Probability Density Function (PDF), p(t, x), corresponding to the state, x, of the relevant dynamic system. The pdf is given by the solution to the Fokker-Planck-Kolmogorov equation (FPE), which is a PDE in the pdf of the system, defined by the underlying dynamical system’s parameters. In this paper, approximate solutions of the FPE are considered, and subsequently leveraged for the design of controllers for nonlinear stochastic dynamical systems. In the past few years, the authors have developed a generalized multi-resolution meshless FEM methodology, partition of unity FEM (PUFEM), utilizing the recently developed GLOMAP (Global Local Orthogonal MAPpings) methodology to provide the partition of unity functions6, 7 and the orthogonal local basis functions, for the solution of the FPE. The PUFEM is a Galerkin projection method and the solution is characterized in terms of an finite dimensional representation of the Fokker-Planck operator underlying the problem. The methodology is also highly amenable to parallelization.8–12 Though the FPE is invaluable in quantifying the uncertainty evolution through nonlinear systems, perhaps its greatest benefit may be in the stochastic analysis, design and control of nonlinear systems. In the context of nonlinear stochastic control, Markov Decision Processes have long been one of the most widely used methods for discrete time stochastic control. However, the Dynamic Programming equations underlying the MDPs suffer from the curse of dimensionality.13–15 Various approximate Dynamic Programming (ADP) methods have been proposed in the past several years for overcoming the curse of dimensionality,15–19 and broadly can be categorized under the category of functional reinforcement learning. These methods are essentially model free method of approximating the optimal control policy in stochastic optimal control problems. These methods generally fall under the category of value function approximation ∗ Mrinal Kumar is a Ph.D student in the department Aerospace Engineering at the Texas A&M University, College Station, TX-77843, [email protected], http://people.tamu.edu/˜mrinal. † Suman Chakravorty is an Assistant Professor with the Faculty of Aerospace Engineering at Texas A&M University, College Station ‡ John L. Junkins is the Royce E. Weisenbaker Distinguished Professor of Aerospace Engineering at Texas A&M University, College Station 1 methods,15 policy gradient/ approximation methods17, 18 and actor-critic methods.16, 19 These methods attempt to reduce the dimensionality of the DP problem through a compact parametrization of the value functions (with respect to a policy or the optimal function) and the policy function. The difference in the methods is mainly through the parametrization that is employed in order to achieve the above goal, and range from nonlinear function approximators such as neural networks16 to linear approximation architectures.15, 20 These methods learn the optimal policies by repeated simulations on a dynamical system and thus, can take a long time to converge to a good policy, especially when the problem has continuous state and control spaces. In contrast, the methodology proposed here is model-based, and uses the finite dimensional representation of the underlying parametrized diffusion operator to parametrize both the value function as well as the control policy in the stochastic optimal control problem. Considering the low order finite dimensional controlled diffusion operator allows us to significantly reduce the dimensionality of the planning problem, while providing a computationally efficient recursive method for obtaining progressively better control policies. The literature on computational methods for solving continuous time stochastic control problems is relatively sparse when compared to the discrete time problem. One of the approaches is through the use of locally consistent Markov decision processes.21 In this approach, the continuous controlled diffusion operator is approximated by a finite dimensional Markov chain which satisfies certain local consistency conditions, namely that its drift and diffusion coefficients match that of the original process locally. The resulting finite state MDP is solved by standard DP techniques such as value iteration and policy iteration. The method relies on a finite difference discretization and thus, can be computationally very intensive in higher dimensional spaces. In another approach,22, 23 the diffusion process is approximated by a finite dimensional Markov chain through the application of generalized cell to cell mapping.24 However, even this method suffers from the curse of dimensionality because it involves discretizing the state space into a grid which becomes increasingly infeasible as the dimension of the system grows. Also, finite difference and finite element methods have been applied directly to the nonlinear Hamilton-Jacobi-Bellman partial differential equation.25, 26 The method proposed here differs in that it uses policy iteration in the original infinite dimensional function space, along with a finite dimensional representation of the controlled diffusion operator in order to solve the problem. Considering a lower order approximation of the underlying operator results in a significant reduction in dimensionality of the computational problem. Utilizing the policy iteration algorithm results in typically having to solve a sequence of a few linear equations (typically < 5) before practical convergence is obtained, as opposed to solving a high dimensional nonlinear equation if the original nonlinear HJB equation is solved. The literature for solving deterministic optimal control problems in continuous time is relatively mature when compared to its stochastic counterpart. The method of successive approximations/ policy iteration has widely been used to solve deterministic optimal control problems. Various different methods have been proposed for solving the policy evaluation step in the policy iteration algorithms that include Galerkin-based methods27, 28 and neural network based methods29–33 among others. The methodology outlined here can be viewed as an extension of this extensive body of work to the stochastic optimal control problem. However, it should be noted, that the extension is far from trivial as the stochastic optimal control problem, especially the the control of degenerate diffusions, has pathologies that have to be treated carefully in order to devise effective computational methods for solving such problems. We would like to mention that we are interested in the feedback control of dynamical systems here, i.e, in the solution of the HJB equation as opposed to solving the open loop optimal control problem based on the Pontryagin minimum principle34 that results in a two-point boundary value problem involving the states and co-states of the dynamical system. Please see references34–36 for more on this approach to solving the optimal control problem. Also, we would like to state here that, to the best of our knowledge, there is no stochastic equivalent to the two-point boundary value problems that result from the Pontryagin minimum principle being applied to the deterministic optimal control problem. The rest of the paper is arranged as follows. Section 2 introduces the forward and backward Kolmogorov equations and shows their equivalence under the condition of asymptotic stability. Section 3 outlines the computational method used to obtain a finite dimensional representation of the infinite dimensional Fokker-Planck or forward Kolmogorov (FPE), and the backward Kolmogorov Equation (BKE). In Section 4, the finite dimensional representations of the FPE and BKE operators are used to outline a recursive procedure, based on policy iteration, to obtain the solution to the stochastic nonlinear control problem. In Section 5, the methodology is applied to various different nonlinear control problems. 2 2 The Forward and Backward Kolmogorov Equations In this section, we shall introduce the forward Kolmogorov or the Fokker-Planck Equation (FPE) and the Backward Kolmogorov equation. While the former is central to the propagation of uncertainty and nonlinear filtering of dynamical systems, the latter is paramount in the solution of stochastic control problems. It can be shown that if the FPE is asymptotically stable, then the forward and backward equations are equivalent in a sense to be made precise later in this section. For the sake of simplicity, we shall consider the case of a one-dimensional system in this section, the results are easily generalized to multidimensional systems. Consider the stochastic differential equation: ẋ = f (x) + g(x)W, (1) where W represents white noise. Associated with the above equation is the Forward Kolmogorov or the Fokker-Planck Equation (FPE): ∂p ∂(f p) 1 ∂ 2 (g 2 p) , =− + ∂t ∂x 2 ∂x2 (2) and the Backward Kolmogorov Equation (BKE): ∂(p) g 2 ∂ 2 p ∂p =f + . ∂t ∂x 2 ∂x2 (3) The FPE governs the evolution of the probability density function of the stochastic system Eq.1 forward in time while the BKE governs the evolution of the uncertainty (pdf) in the system backward in time. The FPE is said to be asymptotically stable if there exists a unique pdf p∞ such that any initial condition, either deterministic or probabilistic, decays to the pdf p∞ as time goes to infinity. In the following,we show the equivalence of the BKE and the FPE under the condition of asymptotic stability of the FPE. Since the FPE is asymptotically stable, let the pdf at any time be represented as follows: p(x, t) = p∞ (x)p̄(x, t), (4) i.e., as the product of the stationary pdf and a time varying part. Substituting into the FPE, we obtain: ∂p∞ p̄ ∂(f p∞ p̄) 1 ∂ 2 (g 2 p∞ p̄) =− + , ∂t ∂x 2 ∂x2 (5) which after using the fact that p∞ is the stationary solution, i.e., that ∂p∞ ∂(f p∞ ) 1 ∂ 2 (g 2 p∞ ) =− + = 0, ∂t ∂x 2 ∂x2 (6) and expanding the various terms in Eq. 5 leads us to the following identity: p∞ ∂ p̄ Consider the term −f p∞ ∂x + ∂ p̄ ∂ p̄ 1 2 ∂ 2 p̄ ∂ p̄ ∂(g 2 p∞ ) = −f p∞ + g p∞ 2 + . ∂t ∂x 2 ∂x ∂x ∂x ∂ p̄ ∂(g 2 p∞ ) ∂x ∂x (7) in the above equation. Note that from Eq. 6 that ∂ 1 ∂g 2 p∞ (−f p∞ + ) = 0, ∂x 2 ∂x (8) 1 ∂(g 2 p∞ ) = const. 2 ∂x (9) and hence, −f p∞ + Hence, if we assume that the invariant distribution is well-behaved such that the terms f p∞ → 0, and x → ±∞, i.e, if the stationary distribution is not heavy-tailed, then −f p∞ + 1 ∂(g 2 p∞ ) = 0, 2 ∂x 3 ∂g 2 p∞ ∂x → 0 as (10) and hence, substituting the above into Eq. 7, and dividing throughout by p∞ (x), it follows that ∂ p̄ ∂(p̄) g 2 ∂ 2 p̄ , =f + ∂t ∂x 2 ∂x2 (11) i.e, the time varying part of the pdf follows the BKE. Denote by Lf (.) and Lb (.) the FPE and BKE operators respectively, i.e., ∂(f p) 1 ∂ 2 (g 2 p) , + ∂x 2 ∂x2 ∂p g 2 ∂ 2 p Lb (p) = f . + ∂x 2 ∂x2 Lf (p) = − (12) (13) The BKE operator is the formal adjoint of the FPE operator. Under the condition of asymptotic stability of the FPE, and in view of the above development, the BKE and FPE are equivalent in the following sense: if ψ(x) is and eigenfunction of the BKE operator Lb (.) with corresponding eigenvalue λ, then p∞ (x)ψ(x) is an eigenfunction of the FPE operator with the same eigenvalue λ. Hence, the knowledge of the eigenfunctions of the BKE operator along with knowledge of the stationary distribution of the FPE is sufficient for identifying the eigenstructure of the FPE, and vice-versa. Hence, under the condition of asymptotic stability, knowledge of the stationary solution along with any one of the FPE or the BKE operators is sufficient to completely determine the characteristics of the other operator. In the following, we outline some of the other important properties of the FPE/ BKE operators under the condition of asymptotic stability of the FPE. Consider the following Hilbert space, L2 (p∞ ), i.e, the L2 space where the inner product is weighted by the stationary distribution: Z < f, g >∞ = f (x)g(x)p∞ (x)dx. (14) Then, from the development above it is quite simple to show that the BKE operator is a symmetric operator in L2 (p∞ ), i.e., < Lb (f ), g >∞ =< f, Lb (g) >∞ , ∀ f, g ∈ L2 (p∞ ). (15) Under the condition that the stationary distribution is not heavy tailed, the semi-group associated with the BKE operator is compact as well as self-adjoint and thus, the operator has a discrete spectrum. Moreover, the eigenfunctions of the operator also form a complete orthonormal basis for L2 (p∞ ). However, the sufficient conditions for the existence of a discrete spectrum are not well-known in the multidimensional case, especially for degenerate diffusions (please see below), to the best of the knowledge of the authors. However, in obtaining the finite dimensional representations of the operators, the symmetry of the operator is enough to assure the well-posedness of the associated problems. In the case of multidimenesional problems, the FPE and the BKE operators are respectively given by the following expressions: Lf (p) = − Lb (V ) = X ∂ 1 X ∂2 fi p + (GGt )ij p, ∂x 2 ∂x x i i j i i,j (16) ∂ 1X ∂2 V + (GGt )ij V, ∂xi 2 i,j ∂xi xj (17) X i fi given the nonlinear stochastic multidimensional system Ẋ = F (X) + G(X)W, F (X) = [f1 (X), · · · , fn (X)], (18) (19) where W is multidimensional white noise. Note that in most multidimensional systems, especially mechanical ones, the matrix G is rank deficient since the input to the systems is at only the velocity level variables, and hence, so is the matrix GGt . In general, a diffusion operator may be written as D(V ) ≡ X i ai (X) ∂V 1X ∂2V + dij (X) , ∂xi 2 i,j ∂xi xj (20) where the A(x) = [a1 (X), · · · , aN (X)] is a drift vector and D(X) = [dij (X)] is a diffusion matrix. In the case when D(x) is not strictly positve definite for all x, the diffusion process is termed degenerate (also sometimes it is said the 4 diffusion process violates the uniform ellipticity condition). For diffusions arising out of stochastic dynamical systems, i.e., the BKE operator, it follows that they are degenerate diffusions due to the rank defficiency of G. In general, the theoretical treatment of the solution of such equations is made very difficult by the degeneracy because classical solutions to PDEs governed by such operators may not exist. In these cases, the generalized notion of viscosity solutions needs to be introduced.37–39 The FPE and the BKE are paramount in the propagation of uncertainty through, and the filtering and control of stochastic nonlinear systems. As mentioned above though the theoretical treatment of these operators in the multidimensional case is very difficult, nevertheless, we may obtain a finite dimensional, i.e., a matrix representation of the above operators in order to practically solve these problems, atleast in an approximate fashion within the resulting finite dimensional space. We shall show the exact application of the finite dimensional representation of the operators to uncertainty propagation, filtering and control in the next two sections. 3 Finite Dimensional Representation of the FPE/ BKE operators As mentioned in the previous section, the FPE and the BKE operators are the key to the analysis, filtering and control of stochastic dynamical systems. However, these are infinite dimensional operators and in order to be practically feasible, a finite dimensional representation of these operators needs to be obtained. The Galerkin residual projection method is one of the most powerful methods of obtaining such a representation and is outlined in the following. The Galerkin projection method is a general method to solve partial differential equations and is often used to solve the FPK equation. Let p(t, x) represent the time varying solution to the FPK equation (2). Let {φ1 (x), · · · , φn (x), · · · } represent a set of basis functions on some compact domain D. the various choices of the basis functions lead to different methodologies. The Galerkin method makes the assumption that the pdf p(t, x) can be written as a linear combination of the basis functions φi (x) with time varying coefficients, i.e., p(t, x) = ∞ X ci (t)φi (x). (21) i=1 The key to obtaining the solution to the pdf p(t, x) is to find the coefficients ci (t) in the above equation. The Galerkin method achieves this by truncating the expansion of p(t, x) for some finite N and projecting the equation error onto the set of basis functions {φi (x)}, i.e., N X ci (t)φi (x), (22) p(t, x) ≈ i=1 and < ∂ PN i=1 ci (t)φi (x) ∂t N X ci (t)φi (x)), φj (x) >= 0, − Lf ( (23) i=1 where < ., . > represents the inner product on L2 (D), the space of Lebesgue integrable functions on the domain D, and Lf represents the FPE operator. Note that the above represents a finite set of differential equations in the co-efficients {ci (t)}, which can be solved for the coefficients ci (t), given by: N X i=1 ċi < φi , φj > − N X ci < Lf (φi ), φj >= 0, ∀j. (24) i=1 Let M = [< φi , φj >], K = [< Lf (φi ), φj >]. (25) (26) The pair (M, K) represents the finite dimensional approximation of the FPE operator. If the basis functions are mutually orthogonal, then the stiffness matrix K represents the finite dimensional representation of the FPE operator. An eigendecompositon of the pair (M, K) yields the approximation of the eigenstructure of the infinite dimensional 5 FPE operator. The uncertainty propagation problem, as well as the prediction step of the filtering problem involve the solution of the linear differential equation M Ċ + KC = 0, (27) over a finite or infinite time interval. Given the eigenstructure of the above LTI system has been found and stored a priori, the above equation can be trivially solved in real-time. Moreover, if the FPK equation is asymptotically stable, it can be shown after some work that the eigenvalues of the FPE operator are all real and non-positive. In the following section, we shall show how the FPE/ BKE operator may be used to solve the stochastic nonlinear control problem. The Galerkin method is not the only method that is available in order to obtain the finite dimensional representation of the FPE/ BKE operator. The generalized Galerkin residual projection method can be used to obtain the representation. In the generalized procedure, the residual (or equation error) obtained by substituting the approximate solution into the equation of interest, is projected onto a set of generalized functions instead of the basis functions themselves. For instance, collocation/ pseudospectral methods involve projecting the equation error onto a set of delta functions located at a specially selected set of collocation points.35, 36 A full comparison of these methods for the computational solution of the FPE/ BKE equations is beyond the scope of this paper. However, we would like to mention that the generalized FEM based methods developed by the authors8–12 have consistently outperformed other competing Galerkin residual projection methods. 4 Nonlinear Stochastic Control An iterative approach to solving the Hamilton-Jacobi-Bellman (HJB) equation is presented now for the optimal control problems of nonlinear stochastic dynamical systems. Consider the following dynamical system: ẋ = f (x) + g(x)u + h(x)w(t); x(t = 0) = x0 (28) where, u is the control input (vector) and w represents white noise excitation. We are interested in minimizing the following discounted cost functional over the infinite horizon: Z ∞ J(x0 ) = E [l(ϕ(t)) + ku(ϕ(t))k2R ]e−βt dt (29) 0 where, ϕ(t) is a trajectory that solves Eq.28 and l is often referred to as the state-penalty function, and β is a given discount factor. The infinite horizon problem results in a stationary, i.e., time invariant control law. However, for the case of systems with additive noise, i.e., when h(x) is independent of x, as would be the case for instance in the LQG problem, the cost-to-go is undefined since the trajectories of the system never decay to zero. Hence, in that case, the discounting is a practical step to ensure a bounded cost-to-go function. The discount factor may also be interpreted as a finite horizon over which the control law tries to optimize the system performance. For this problem, the optimal control law, u∗ (x) is known to be given in terms of a value function V ∗ (x) as follows: ∂V ∗ 1 (x) u∗ (x) = − R−1 gT 2 ∂x (30) where, the value function V ∗ (x) solves the following well known (stationary) Hamilton-Jacobi Bellman equation: ∂V ∗ T 1 ∂2V ∗ 1 ∂V ∗ ∂V ∗ f + hQhT +l− gR−1 gT − βV ∗ = 0, 2 ∂x 2 ∂x 4 ∂x ∂x V ∗ (0) = 0 (31) The above framework (solving for the optimal control law, u∗ (x)) is known as the H2 control paradigm. Notice that designing the optimal control in the above framework requires solving a nonlinear PDE (Eq.31), which is in general a very difficult problem. It is however possible to restructure the above equations so that the central problem can be reduced to solving a sequence of linear PDEs - a much easier proposition. This is achieved via the substitution of Eq.30 into the quadratic term involving the gradient of the value function in the HJB (Eq.31). Doing so gives us the following equivalent form the the HJB: 1 ∂2V ∗ ∂V ∗ T (f + gu) + hQhT + l + kuk2R − βV ∗ = 0 (32) ∂x 2 ∂x2 The above form of the HJB is known as the generalized HJB. Notice that the substitution discussed above has converted the nonlinear PDE to a linear PDE in the value function, V ∗ (x). Eq.32 forms the core of a policy iteration algorithm in which the optimal control law can be obtained by iterating upon an initial stabilizing control policy as follows: 6 1. Let u(0) be an initial stabilizing control law (policy) for the dynamical system (Eq. 28), i.e., the FPE corresponding to the closed loop under u(0) is asymptotically stable. 2. For i = 0 to ∞ • Solve for V (i) from: T 1 ∂V (i) ∂ 2 V (i) (f + gu(i) ) + hQhT + l + ku(i) k2R − βV (i) = 0 ∂x 2 ∂x2 (33) 1 ∂V (i) u(i+1) (x) = − R−1 gT (x) 2 ∂x (34) • Update policy as: 3. End [Policy Iteration] The convergence of the above algorithm has been proven for the deterministic case (Q = 0) in references27, 28 and is true for the stochastic case too.40 However, conditions for the existence of classical solutions to the linear elliptic PDE in the policy evaluation step (Eq. 33), and the asymptotic stability of the closed loop systems under the resulting control policies, in the sense that the associated FPE is stable, have not been obtained to the best knowledge of the authors. In this paper, we shall assume that the policy evaluation step admits a classical solution as well as that the sequence of control policies generated asymptotically stabilize the closed loop system. In fact, these assumptions are borne out by the numerical examples considered in the next section. The great advantage of using the policy iteration algorithm over directly trying to solve the nonlinear HJB equation is that the algorithm typically converges in very few steps (≤ 5). Let us take a closer look at Eq.33 - writing it in operator form, we have: (L − β)V = q (35) where, 1 ∂2 ∂ + hQhT 2 ∂x 2 ∂x = −(l + kuk2R ) L = q (f + gu)T (36) (37) Notice that the operator L() is identical to the BKE operator of the closed loop under control policy u. The approximation for the value function is then written in terms of basis functions in the following manner: N X V̂ = ci φi (x) (38) i=1 where, φi (x) could be local or global basis functions depending on the nature of the approximation being used. In the current paper, we use global polynomials (i.e. φ()i are polynomials supported on Ω). Then, following the Galerkin approach, we project the residual error resulting from plugging the approximation (Eq.38) into the generalized HJB (Eq.33) on to the space of polynomials on Ω as follows: N X i=1 Z Z (L − β)[φi (x)]φj (x)dΩ = ci qφj dΩ, Ω j = 1, 2, . . . , N (39) Ω Eq.39 is equivalent to a linear system of algebraic equations in ci : Kc = f (40) where Z Kij = (L − β)[φj ]φi dΩ (41) qφi dΩ (42) ZΩ fi = Ω 7 Note that K = Kb − βI, (43) where Kb is the finite dimensional representation of the backward Kolmogorov operator. Some notes on the admissibility of basis functions (φi ) are due here. Notice that the BK operator involves only derivatives of the unknown function, V (x). It is thus clear that when the discount factor (β) is zero, it is not possible to include a use a complete set of basis functions without making the stiffness matrix K singular (φ ≡ 1 causes a rank deficiency of 1). However, so far as solving Eq. 40 is concerned, it is not important to have the constant basis function in the approximation space. In theory, a nontrivial β allows us to include φ ≡ 1 in the basis set; but in practice it would require a large discount factor before the K is reasonably well-conditioned for inversion; which in turn would take the solution far from being optimal since then the control policy becomes very short sighted. The above formulation can also be carried out on local subdomains by choosing φi to be supported on subdomains forming a cover for the global domain Ω. Then the projection integrals (39) would be evaluated over the local subdomains. This is the basic approach in finite element methods and meshless finite element methods, where the local subdomains overlap each other. The authors have developed a method based on the Partition of Unity Finite Element method (PUFEM) to solve the forward (Fokker Planck) and backward Kolmogorov equations.8–12 5 5.1 Numerical Examples Two Dimensional System - Van der Pol Oscillator Consider the following two dimensional stochastic Van der Pol oscillator: ẍ + v1 x + v2 (1 − x2 )ẋ = u + gw(t) with, v1 = 1, v2 = −1, g = 1. The objective is to minimize the following cost function: Z ∞ 1 2 2 2 (x + ẋ ) + u dt J(x0 , ẋ0 ) = E 2 0 (44) (45) The approximation for the value function was written using a complete set of global polynomials up to fourth order (using polynomials up to the sixth order did not provide significant improvement in the approximation accuracy). It is notable that in the above formulation, no direct means of enforcing state or control input constraints have been used. In order to obtain reasonable bounds on the control input, the cost function (Eq.45) was scaled so as to modify the generalized HJB (Eq.33) as follows (in addition to including the discount factor (β)): T ∂V (i) 1 ∂ 2 V (i) (f + gu(i) ) + hQhT + βV = −eγ (l + ku(i) kR ) ∂x 2 ∂x2 (46) i.e., l(·) → eγ l(·) R → eγ R, γ<0 (47) (48) Fig. 1 shows the converged results obtained for the above system. A linear starting policy was used to start the iteration process, i.e., u(0) = K ẋ, with K = −20. Tuning parameters β = 0.05 and γ = −0.15 were found to give acceptable results after about 13 iterations of the control policy. Fig.2(i) shows the system response without a control input (open loop). Clearly, the states diverge. Fig.2(ii) shows the system response with the initial stabilizing control. For both these results, nominal initial conditions were used inside the domain of operation, Ω = [−5, 5] × [−5, 5]. Fig.3 shows the optimal state trajectories obtained and the optimal control law. 5.2 Two Dimensional System - Duffing Oscillator Next consider the following hard spring stochastic duffing oscillator: ẍ − x + x3 = u + gw(t) 8 (49) (i) Converged Value Function Surface. (ii) Converged Control Input Surface. The maximum control required was modulated using the tuning parameter γ. Figure 1 Converged Results for the Stochastic Van der Pol Oscillator. (i) Open Loop System Response for Eq.44. (ii) System Response to Initial Stabilizing Control, u(0) = −20ẋ. Figure 2 System Response of the Stochastic Van der Pol Oscillator I. 9 Response to u(0) Response to uconverged (ii) Optimal Control Law Converged upon and u(0) . (i) Optimal Path Obtained, Shown Alongside Response to u(0) . Figure 3 System Response of the Stochastic Van der Pol Oscillator II. (i) Converged Value Function Surface. (ii) Converged Control Input Surface. The maximum control required was modulated using the tuning parameter γ. Figure 4 Converged Results for the Stochastic Hard Spring Duffing Oscillator. The above system represents case ( = +1.5) with an unstable open-loop equilibrium at (x, ẋ) = (0, 0) √ the hard-spring √ and stable equilibria at ( , 0) and (− , 0). The cost function to be minimized is the same as for the previous system, and we use a linear stabilizing control given by: u(0) = −30ẋ. Similar convergence characteristics as for the Van der Pol oscillator were obtained and are shown in Figs. 4-6. 5.3 Four Dimensional System - Missile Pitch Control Autopilot Finally, we consider a four-dimensional system that models the pitch motion control for a missile autopilot. This model is the same as that considered in refrences27, 28 with a noise term added to all kinetic equations: q̇ = α̇ = δ̈ My + gq w1 (t), Iy cos2 α Fz + q + gα w2 (t), mU = −2ζωn δ̇ + ωn2 (δc − δ) + gδ w3 (t) 10 My = Cm (α, δ)QSd (50) Fz = Cn (α, δ)Sd (51) (52) (ii) System Response to Initial Stabilizing Control, u(0) = −30ẋ. (i) Open Loop System Response for Eq.49. Figure 5 System Response of the Stochastic Duffing Oscillator I. Response to u(0) Response to uconverged (i) Optimal Path Obtained, Shown Alongside Response to u(0) . (ii) Optimal Control Law Converged upon and u(0) . Figure 6 System Response of the Stochastic Duffing Oscillator II. 11 (i) Converged Value Function Surface. (ii) Converged Control Input Surface. Figure 7 Converged Results for the Missile Pitch Controller, Showing Various Sections of the Four-D State Space. The control input in the above equations is δc (t), which is the commanded tail-fin deflection, while δ(t) denotes the actual tail-fin deflection. Pitch rate and angle of attack are denoted by q and α respectively. wq (t), wα (t) and wδ (t) are independent components of a 3 dimensional white noise process. The aerodynamic coefficient are given by the following nonlinear functions: Cm (α, δ) = b1 α3 + b2 α|α| + b3 α + b4 δ Cn (α, δ) = a1 α3 + a2 α|α| + a3 α + a4 δ The values of the various constants can be found in Beard et. al. (1998). The cost by: "Z 2 ∞ 1 Fz − Fzss 2 2 J(x0 ) = E (q − qss ) + 25(α − αss ) + + 2 m 0 (53) (54) function to be minimized is given 1 ku − δss k2R 2 # (55) where the ‘ss’ subscripted quantities denote steady state values. We use the same starting stabilizing control as that used in Beard et. al.: Fz u(0) = 0.08(q + qss ) + 0.38(α + αss ) + 0.37 (56) m A complete basis of global quadratic shape functions was used to approximate the value function in the policy iteration algorithm. Various sections of the converged value function and optimal control surface are shown in Fig.7. The optimal trajectories are shown in Fig.7. Fig.8 shows the system response to the optimal control law obtained. Notice that the pitch rate settles to zero and the angle of attack acquires the desired steady state value. 5.4 Notes on Modal Analysis It is easy to see that any constant function, ψ = c, is a stationary solution of the BK equation, and an eigenfunction of the BK operator with trivial eigenvalue. It was mentioned in Section that the constant basis function, φ ≡ 1, is not admissible while solving for the coefficients, ci . However, when considering the eigenvalue problem for the BK operator, Lψ = λψ, it is imperative that the constant basis function be included in the set; because the stationary solution of the BK equation is indeed the constant function. The modified BK operator (β > 0), however, does not admit a constant stationary solution and an eigenfunction with a trivial eigenvalue (Fig 9). We reiterate that it is important while solving the eigenvalue problem to utilize a complete basis set in the approximation space (i.e. including the constant function) in order to recover the stationary eigenfunction of the BK operator. On the other hand, it is possible to solve for the coefficients, ci , without using the constant basis function. Fig.9 shows the absolute values of the eigenvalues for both the Fokker-Planck and the Backward Kolmogorov operators. 12 Open Loop Response to uconverged (i) Optimal Path Obtained, Shown Alongside Open Loop Trajectory. (ii) Optimal Control Law Obtained. Figure 8 System Response of the Missile Pitch Controller. (i) Comparison of the Spectra of the BK and FP Operators (ii) The Modified BK Operator does not Contain the Trivfor the Van der Pol Oscillator. ial Eigenvalue in its Spectrum (β = 0.02). Figure 9 Spectra of the BK, modified BK and FP Operators for the Van der Pol Oscillator 13 6 Conclusion In this paper, we have outlined a computational methodology for solving the nonlinear stochastic optimal control based on the policy iteration algorithm and a finite dimensional approximation of the controlled Fokker-Planck-Kolomogorov/ diffusion operator. The computational methodology was also tested on a number of test cases where it was shown to have satisfactory performance. The next step in our research will be to extend these methods to more complex stochastic systems such as jump diffusion processes and stochastic hybrid systems. We envisage using a hierarchical hybrid methodology in order to solve the abovementioned problems wherein the methods outlined in this paper would form the lower, continuous level of the hierarchy. REFERENCES 1. H. Risken, The Fokker Planck Equation: Methods of Solution and Applications, ser. Springer Series in Synergetics. 2. D. C. Polidori and J. L. Beck, “Approximate solutions for nonlinear vibration problems,” Prob. Engr. Mech., vol. 11, pp. 179–185, 1996. 3. M. C. Wang and G. Uhlenbeck, “On the theory of brownian motion ii,” Reviews of Modern Physics, vol. 17, no. 2-3, pp. 323–342, 1945. 4. A. H. Jazwinski, Stochastic Processes and Filtering Theory. New york, NY: Academic Press, 1970. 5. A. Doucet, J. F. G. deFrietas, N. J. Gordon, and eds., Sequential Monte Carlo Methods in Practice. New York: Springer-Verlag, 2001. 6. J. L. Junkins, P. Singla, T. D. Griffith, and T. Henderson, “Orthogonal global/local approximation in ndimensions: Applications to input/output approximation,” in 6th International conference on Dynamics and Control of Systems and Structures in Space, Cinque-Terre, Italy, 2004. 7. J. L. Junkins and P. Singla, “Orthogonal global/local approximation in n-dimensions: Applications to input/output approximation,” in to appear in Computer Methods in Engg. and Science, Submitted. 8. M. Kumar, P. Singla, J. Junkins, and S. Chakravorty, “A multi-resolution meshless approach to steady state uncertainty determination in nonlinear dynamical systems,” in 38th IEEE Southeastern Symposium on Systems Theory, Cookeville, TN, Mar 5-7, 2006. 9. M. Kumar, P. Singla, S. Chakravorty, and J. L. Junkins, “The partition of unity finite element approach to the stationary fokker-planck equation,” in AIAA/AAS Astrodynamics Specialist Conference, Keystone, CO, USA, 21-24 August, 2006, AIAA Paper 2006-6285. 10. S. Chakravorty, “A homotopic galerkin approach to the solution of the fokker-planck-kolmogorov equation,” in Proceedings of the 2006 American Control Conference, 2006. 11. M. Kumar, S. Chakravorty, and J. L. Junkins, “An eigendecomposition approach for the transient fokker-planck equation,” in Proceedings of the 18th Engineering Mechanics Division Conference of the ASCE, Blacksburg, VA, 2007. 12. ——, “A semianalytic meshless approach to the fokker-planck equation with application to hybrid systems,” in Proc. IEEE Conf. Dec. and Control, vol. to appear, 2007. 13. R. E. Bellman, Dynamic Programming. Princeton, NJ: Princeton University Press, 1957. 14. D. P. Bertsekas, Dynamic Programming and Optimal Control, vols I and II. Cambridge: Athena Scientific, 2000. 15. D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Boston, MA: Athena Scientific, 1996. 16. P. Werbos, “Approximate dynmaic programming for real-time control and neural modeling,” in Handbook of Intelligent Control, White and Sofge, Eds. Van Nostrand Reinhold. 17. P. Marbach, Simulation based Optimization of Markov Reward Processes, PhD Thesis. sachusetts Institute of Technology, 1999. 14 Boston, MA: Mas- 18. J. Baxter and P. Bartlett, “Infinite horizon policy-gradient estimation,” Journal of Artificial Intelligence Research, vol. 15, pp. 319–350, 2001. 19. R. S. S. et. al., “Policy gradient methods for reinforcement learning with function approximation,” in Proc. 1999 Neural Information Proc. Sys., 1999. 20. M. Lagoudakis and R. Parr, “Least squares policy iteration,” Journal of Machine Learning Research, vol. 4, pp. 1107–1149, 2003. 21. H. J. Kushner and P. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time. york, NY: Springer, 2001. New 22. L. G. Crespo and J. Q. Sun, “Stochastic optimal control of nonlinear systems via short-time gaussian approximation and cell mapping,” Nonlinear Dynamics, vol. 28, pp. 323–342, 2002. 23. ——, “Stochastic optimal control via bellman’s principle,” Automatica, vol. 39, pp. 2109–2114, 2003. 24. C. S. Hsu, Cell to Cell Mapping: Amethod for the Global Analysis of Nonlinear Systems. Springer-Verlag, 1987. New York, NY: 25. F. B. Hanson, “Techniques in computational stochastic dynamic programming,” in Control and Dynamical Systems: Advances in Theory and Applications, C. T. Leondes, Ed., vol. 76. New York, NY: Academic Press, 1996. 26. ——, “Computational dynamic programming on a vector multiprocessor,” IEEE Transactions on Automatic Control, vol. 36, pp. 507–511, 1991. 27. R. Beard, G. Saridis, and J. Wen, “Galerkin approximation of the generalized hamilton-jacobi-bellman equation,” Automatica, vol. 33, pp. 2159–2177, 1997. 28. ——, “Approximate solutions to the time-invariant hamilton-jacobi-bellman equation,” Journal of Optim. Th. Appl., 1998. 29. F. L. L. et. al., Neural Network control of Robotic Manipulators and nonlinear systems. Taylor and Francis, 1999. 30. M. A. Kahlaf and F. L. Lewis, “Nearly optimal state feedback control of constrained nonlinear system using a neural network hjb,” Annual Reviews in Control, vol. 28, pp. 239–251, 2004. 31. M. Abu-Khalaf, J. Huang, and F. L. Lewis, Nonlinear H2/ H-∞ constrained Feedback control: a practical approach using neural networks. New York: Springer Verlag, June 2006. 32. R. Padhi and S. N. Balakrishnan, “Proper orthogonal decomposition based optimal neurocontrol synthesis of a chemical reactor process using approximate dynamic programming,” Neural Netw., vol. 16, no. 5-6, pp. 719–728, 2003. 33. S. Ferrari, Algebraic and Adaptive Learning in Neural Control Systems, PhD thesis. Princeton University, 2002. 34. A. E. Bryson and Y. Ho, Applied Optimal Control. Washington, D.C.: Hemsiphere Publsihing Co., 1975. 35. I. M. Ross and F. Fahroo, “Pseudospectral knotting methods for solving nonsmooth optimal control problems,” AIAA J. of Guid. Cont. Dyn., vol. 27, pp. 397–405, 2004. 36. ——, “Issues in real-time computation of optimal control,” Mathematical and Computer Modeling, vol. 43, pp. 1172–1188, 2006. 37. W. H. Fleming and H. Soner, Controlled Markov Processes and Viscosity Solutions. 2001. New york, NY: Springer, 38. M. G. Crandall and P. L. Lions, “Viscosity solutions of hamilton-jacobi equations,” Transactions of the American Mathematical Society, vol. 277, pp. 1–42, 1983. 39. M. G. Crandall, H. Ishii, and P. L. Lions, “User’s guide to viscosity solutions of second order partial differential equations,” Bulletin of the American Mathematical Society, vol. 27, pp. 1–67, 1992. 40. M. Puterman, Markov Decision Process. Hoboken, NJ: Wiley-Interscience, 1994. 15