Distributed Asynchronous Time-Varying
Constrained Optimization
Andrea Simonetto and Geert Leus
Faculty of EEMCS, Delft University of Technology, 2628 CD Delft, The Netherlands
e-mails: a.simonetto, [email protected]
Abstract—We devise a distributed asynchronous gradient-based algorithm to enable a network of computing and communicating nodes to solve a constrained discrete-time time-varying convex optimization problem. Each node updates its
own decision variable only once every discrete time step. Under
some assumptions (strong convexity, Lipschitz continuity of
the gradient, persistent excitation), we prove the algorithm’s
asymptotic convergence in expectation to an error bound whose
size is related to the constant stepsize choice and the variability
in time of the optimization problem. Moreover, the convergence
rate is linear.
In addition, we present an interesting by-product of the
proposed algorithm in the context of time-varying consensus, and
we discuss some numerical evaluations in multi-robot scenarios
to assess the algorithm's performance and the tightness of the
proven asymptotic bounds.
I. INTRODUCTION
We consider a time-varying optimization problem defined
on time-varying functions that are distributed over a network
of computing and communicating nodes. Let the nodes be
labeled with $i \in \mathcal{V} = \{1, \dots, n\}$, and for each discrete time $k \in \mathbb{N}$, we equip each of them with the private local function $f_{i,k}(x): \mathbb{R}^d \to \mathbb{R}$.
The main goal for the computing nodes at each discrete
time k is to solve the optimization problem
$$\underset{x \in X}{\text{minimize}} \;\; \sum_{i \in \mathcal{V}} f_{i,k}(x) \qquad\qquad (1)$$
where each of the $f_{i,k}(x)$ is a convex function of $x$, while $X$ is a nonempty, closed, convex set. And, by solving, we mean computing an optimizer of (1) for each $k$.
We allow the computing nodes to communicate with their
immediate neighbors, defined via the undirected communication graph $\mathcal{G}_k = (\mathcal{V}, \mathcal{E}_k)$, with time-varying edge set $\mathcal{E}_k$. In particular, at each time $k$, every node $i$ can communicate with all the nodes $j \in \mathcal{N}_{i,k} := \{j \in \mathcal{V} \,|\, (i,j) \in \mathcal{E}_k\}$ (that is, we assume an edge-asynchronous protocol).
Problems such as (1) appear in distributed estimation of stochastic time-varying signals [1], in distributed control of mobile multi-robot systems with time-varying tasks [2], and even as a result of sequential convex programming approaches to multi-agent non-convex problems [3]–[5].
When each of the $f_{i,k}(x)$'s is time-invariant, several approaches can be applied to solve (1). These techniques differ in the assumptions they require and the properties they can ensure (convergence, convergence rate, resilience to asynchronous communication protocols, among others).
Examples of such approaches are the subgradient method [6],
dual averaging [7], and the alternating direction method of
multipliers [8], [9].
All the mentioned techniques are iterative and they require
communication among the nodes to converge to an optimizer
of (1). In the case of time-varying $f_{i,k}(x)$'s, they would converge (in theory) only when each node could exchange an infinite number of messages with its neighbors between consecutive time steps $k$ and $k+1$. Specific (so-called running)
methods that account for a finite number of messages between
consecutive time steps and still guarantee convergence have
been proposed in [2], [10]–[17], but they are all limited to
specific versions of (1). Notably, in [1], [17], the authors
work under the same general assumptions that we use, but
consider unconstrained optimization, while in [16], the authors
use subgradient methods but they assume that the optimizers
of (1) are not time-varying.
Contributions. We propose an asynchronous gradient-based
distributed algorithm for the computing nodes to converge to
an optimizer of (1), here presented in Algorithm 1. In fact, due
to the time-varying nature of the problem, the convergence
will be shown up to an error bound, whose size is directly
dependent on the change in time of the optimizer of (1). This
algorithm can be seen as a generalization to a time-varying context of the work in [6], where only one iteration of the algorithm is performed between consecutive time steps, as well as a generalization of the work in [1], [17] to asynchronous and constrained settings. In addition, in contrast to [1], [17], our algorithm does not hinge on dual variables to reach a common decision vector among the nodes (which significantly complicates the theoretical analysis of convergence), but is instead based on consensus protocols, which are easier to analyze and to implement on real hardware.

This research was supported in part by STW under the D2S2 project from the ASSYS program (project 10561).
II. DISTRIBUTED ALGORITHM
We want to enable the computing nodes to solve (1) in a
distributed fashion, where each of the nodes communicates with its neighbors only. For this task, we introduce local copies of the decision variable $x_k$. These local copies are referred to as $y_{i,k}$. We formally formulate the problem at hand
as follows.
Devising a distributed algorithm in order to enforce that
the local decision variable $y_{i,k}$ eventually converges (up to a bounded error) to the optimal solution of (1) at time step $k$ ($x^\star_k$), or formally,
$$\liminf_{k\to\infty}\; \mathbb{E}\big[\|y_{i,k} - x^\star_k\|^2\big] \le \delta, \qquad \forall i \in \mathcal{V},$$
for some $\delta \ge 0$.
We now describe our proposed algorithm: it consists of two basic steps, where the first is a single consensus iteration, while the second is a projected gradient descent step. In order to enforce consensus among the local decision variables, let us define the time-varying consensus matrix $\mathbf{W}_k$ and two different stepsizes $\alpha > 0$ and $\beta > 0$. The consensus matrix $\mathbf{W}_k$ is a symmetric matrix (owing to the edge-asynchronous protocol assumption) constructed based on the edge set $\mathcal{E}_k$, and thus on the adjacency matrix $\mathbf{A}_k$, as
"
´rA s
for j ‰ i
rWk si,j “ řn k i,j
.
rA
s
for
j“i
k i,l
l“1
As for any consensus matrix, we assume that $\mathbf{W}_k$ has nonzero elements if and only if the related nodes can communicate with each other, that it is rank deficient and in particular $\mathbf{W}_k \mathbf{1}_n = \mathbf{0}_n$, and finally, for the sequence of matrices $\{\mathbf{W}_k\}$, we define $\mathbb{E}[\mathbf{W}_k] = \bar{\mathbf{W}} = \bar{\mathbf{W}}^{\mathsf{T}}$.
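For concreteness, a minimal sketch of how such a consensus matrix could be assembled is given below; it assumes the adjacency matrix $\mathbf{A}_k$ is available as a symmetric 0/1 NumPy array with zero diagonal, and the helper name consensus_matrix is ours, not the paper's.

```python
import numpy as np

def consensus_matrix(A_k):
    """Build W_k from a symmetric 0/1 adjacency matrix A_k (zero diagonal):
    off-diagonal entries are -[A_k]_{ij}, diagonal entries are the row sums,
    so that W_k is symmetric and W_k @ ones(n) = zeros(n)."""
    W = -A_k.astype(float)
    np.fill_diagonal(W, A_k.sum(axis=1))
    return W
```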
With this in place, we are ready to describe our gradient-based distributed algorithm, as in Algorithm 1.
Algorithm 1 Asynchronous distributed gradient algorithm
Initialize by picking locally an arbitrary $y_{i,1} \in X$. Then, for $k \ge 1$:
1) compute the local variable $v_{i,k+1}$ by local communication as
$$v_{i,k+1} = y_{i,k} - \beta \sum_{j=1}^{n} [\mathbf{W}_k]_{i,j}\, y_{j,k}; \qquad (2)$$
2) compute locally the gradient of $f_{i,k}$ with respect to $x$ at $v_{i,k+1}$, as
$$g_{i,k} = \nabla_x f_{i,k}(x)\big|_{v_{i,k+1}};$$
3) update the local variable $y_{i,k}$ as
$$y_{i,k+1} = P_X\big[v_{i,k+1} - \alpha\, g_{i,k}\big]; \qquad (3)$$
where $P_X[\cdot]$ indicates the projection operator onto $X$;
4) go to step 1.
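To make the three steps concrete, here is a minimal per-time-step sketch of Algorithm 1 over all nodes at once; the names grad_f, proj_X, and algorithm1_step are ours, and the gradient oracles and the projection onto $X$ are assumed to be supplied by the application.

```python
import numpy as np

def algorithm1_step(Y, W_k, grad_f, proj_X, alpha, beta):
    """One discrete time step k of Algorithm 1.

    Y      : (n, d) array, row i is the local variable y_{i,k}
    W_k    : (n, n) consensus matrix at time k
    grad_f : list of n callables, grad_f[i](x) = gradient of f_{i,k} at x
    proj_X : callable projecting a d-vector onto the constraint set X
    alpha  : gradient stepsize (alpha < m / L**2)
    beta   : consensus stepsize (beta < 1 / n)
    """
    # Step 1: single consensus iteration, v_{i,k+1} = y_{i,k} - beta * sum_j [W_k]_{ij} y_{j,k}
    V = Y - beta * (W_k @ Y)
    # Steps 2-3: local gradient at v_{i,k+1}, then projected gradient update (3)
    return np.array([proj_X(V[i] - alpha * grad_f[i](V[i])) for i in range(Y.shape[0])])
```

Each node $i$ only needs the entries $[\mathbf{W}_k]_{i,j}$ for itself and its neighbors $j \in \mathcal{N}_{i,k}$, so step 1 amounts to a single exchange of $y_{j,k}$ with the neighbors.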
For Algorithm 1, convergence goes as follows.
Theorem 1: Assume that:
A1) each one of the functions $f_{i,k}(x)$ is strongly convex with parameter $m$ for all $k$, and their gradient is Lipschitz continuous with constant $L$ for all $k$;
A2) the distance between the optimizer of (1) at two subsequent time steps is bounded as $\|x^\star_k - x^\star_{k-1}\| \le \delta_x$;
A3) the second smallest eigenvalue of $\bar{\mathbf{W}}$ is positive, i.e., $\lambda_2(\bar{\mathbf{W}}) > 0$.
Call $y_k$ the stacked vector of all the $y_{i,k}$'s. Define
$$r := 1 + \alpha^2 L^2 - \alpha m, \qquad \gamma := 1 - \beta \lambda_2(\bar{\mathbf{W}}),$$
$$M := \max_{i,k} \Big\{ \Big\| \sum_{j \in \mathcal{V},\, j \neq i} \nabla_x f_{j,k}(x)\big|_{x^\star_k} \Big\| \Big\}.$$
If we choose $\beta < 1/n$ and $\alpha < m/L^2$, then the sequence $\{y_k\}$ generated by Algorithm 1 converges as
$$\mathbb{E}\big[\|y_{k+1} - \mathbf{1}_n \otimes x^\star_{k+1}\|^2\big] \le r\, \mathbb{E}\big[\|y_k - \mathbf{1}_n \otimes x^\star_k\|^2\big] + \frac{\alpha^2 n M^2}{\sqrt{\gamma}\,(1-\sqrt{\gamma})} + \frac{n \delta_x^2}{1-\sqrt{\gamma}}.$$
Furthermore, we have $0 < r < 1$ and thus the convergence rate is linear.
Proof: The proof can be found in [18, Theorem 1].
A few words are in order for the assumptions. Assumption A1) makes the solution set of (1) a unique point, and it is a recurrent assumption in the time-varying optimization literature, see for instance [19, Chapter 6]. Assumption A2) gives a handle on the variability of the optimizer, and it is also quite standard. Assumption A3) is required for the nodes to reach an agreement: it basically says that the communication graph $\mathcal{G}_k$ is connected in expectation, i.e., $\mathbb{E}[\mathcal{G}_k]$ is connected. Finally, the scalar $M$ in the theorem quantifies how different the local optimal gradients are from their mean value, and it is bounded, given A1).
Corollary 1: Under the same assumptions as in Theorem 1,
we obtain
$$\liminf_{k\to\infty}\; \mathbb{E}\big[\|y_{k+1} - \mathbf{1}_n \otimes x^\star_{k+1}\|^2\big] \le \delta,$$
where
$$\delta = \frac{n}{(1-r)(1-\sqrt{\gamma})}\,\big(\alpha^2 M^2/\sqrt{\gamma} + \delta_x^2\big).$$
Proof: Straightforward by applying the properties of
geometric series.
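For the reader's convenience, a sketch of the standard geometric-series step: unrolling the recursion of Theorem 1 over $k$ steps gives
$$\mathbb{E}\big[\|y_{k+1} - \mathbf{1}_n \otimes x^\star_{k+1}\|^2\big] \le r^{k}\, \mathbb{E}\big[\|y_{1} - \mathbf{1}_n \otimes x^\star_{1}\|^2\big] + \sum_{t=0}^{k-1} r^{t} \left( \frac{\alpha^2 n M^2}{\sqrt{\gamma}\,(1-\sqrt{\gamma})} + \frac{n \delta_x^2}{1-\sqrt{\gamma}} \right),$$
and, since $0 < r < 1$, the first term vanishes as $k \to \infty$ while the geometric sum converges to $1/(1-r)$ times the bracketed constant, which is exactly the $\delta$ above.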
The last result shows the bounded error floor the algorithm converges to. In particular, $\delta$ depends on the constant stepsize choice $\alpha$, on the dissimilarity of the local optimal gradients $M$, on the network connectivity $\gamma$, and on the variability of the optimizer $\delta_x$. In principle, one could optimize the choices of $\alpha$ and $\beta$ to trade off the convergence rate, $r$, and the asymptotic error, $\delta$.
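To illustrate this trade-off numerically, the short sketch below evaluates $r$, $\gamma$, and $\delta$ for a few stepsizes; the constants $m$, $L$, $M$, $\delta_x$, $n$, and $\lambda_2(\bar{\mathbf{W}})$ are hypothetical placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical problem constants (placeholders, not taken from the paper)
m, L, M, delta_x = 1.0, 2.0, 5.0, 0.01
n, lambda2 = 10, 0.5

def rate_and_error(alpha, beta):
    """Contraction factor r (Theorem 1) and asymptotic error delta (Corollary 1)."""
    r = 1 + alpha**2 * L**2 - alpha * m            # needs 0 < r < 1, i.e., alpha < m / L**2
    gamma = 1 - beta * lambda2                     # needs lambda_2(W_bar) > 0 and beta < 1/n
    delta = n / ((1 - r) * (1 - np.sqrt(gamma))) * (alpha**2 * M**2 / np.sqrt(gamma) + delta_x**2)
    return r, delta

# For alpha below m / (2 L**2), a smaller alpha gives r closer to 1 (slower convergence)
# but a smaller asymptotic error delta, and vice versa.
for alpha in (0.02, 0.06, 0.12):
    r, delta = rate_and_error(alpha, beta=0.05 / n)
    print(f"alpha={alpha:.2f}  r={r:.4f}  delta={delta:.1f}")
```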
III. EXAMPLE: TIME-VARYING CONSENSUS
In this section, we present an interesting by-product of the
proposed algorithm. In particular, we show that it can be used
to solve time-varying consensus problems. A time-invariant
consensus problem can be written as the following strongly
convex program
$$\underset{x}{\text{minimize}} \;\; \frac{1}{2} \sum_{i \in \mathcal{V}} \|x - c_i\|^2. \qquad\qquad (4)$$
[Figure 1: twelve snapshots of the waypoint configuration, at k = 2, 79, 313, 702, 1247, 1947, 2804, 3816, 4983, 6306, 7785, and 9420.]
Fig. 1. Snapshots of the algorithm's waypoint generation (red points) and reference ones (blue diamonds) for the chosen example. Black lines between red nodes represent the edge set $\mathcal{E}$, while the light red lines are the waypoints' trajectories from $\tau = 0$ till $\tau = k$. The reference points move along a circular trajectory, although the radius is too big to be appreciated in these snapshots. A video of this simulation result is available at http://ens.ewi.tudelft.nl/~asimonetto/.
The solution of (4) is $x^\star = \frac{1}{n}\sum_{i \in \mathcal{V}} c_i$, that is, the average value of the vectors $c_i$ across the network. In a time-varying case, $c_i$ is time-dependent, i.e., $c_{i,k}$, and we want the nodes to agree on a time-varying average. In particular, we want the nodes to solve (for each $k$) the optimization problem
$$\underset{x}{\text{minimize}} \;\; \frac{1}{2}\sum_{i \in \mathcal{V}} \|x - c_{i,k}\|^2, \qquad\qquad (5)$$
which perfectly fits our problem setting (1). Applying Algorithm 1 to this problem yields the iterate
$$y_{i,k+1} = (1-\alpha)\Big(y_{i,k} - \beta \sum_{j=1}^{n} [\mathbf{W}_k]_{i,j}\, y_{j,k}\Big) + \alpha\, c_{i,k}. \qquad (6)$$
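This form follows directly from the steps of Algorithm 1: for $f_{i,k}(x) = \frac{1}{2}\|x - c_{i,k}\|^2$ the gradient at $v_{i,k+1}$ is $v_{i,k+1} - c_{i,k}$, so the (unconstrained) update becomes $y_{i,k+1} = v_{i,k+1} - \alpha(v_{i,k+1} - c_{i,k}) = (1-\alpha)v_{i,k+1} + \alpha c_{i,k}$. As a minimal sketch (assuming the stacked variables are stored as an $(n, d)$ NumPy array and C_k holds the rows $c_{i,k}$), one step of (6) could read:

```python
import numpy as np

def consensus_step(Y, C_k, W_k, alpha, beta):
    """One step of iterate (6): a consensus mixing step followed by a
    convex combination with the current reference values c_{i,k}."""
    V = Y - beta * (W_k @ Y)               # v_{i,k+1} = y_{i,k} - beta * sum_j [W_k]_{ij} y_{j,k}
    return (1 - alpha) * V + alpha * C_k   # y_{i,k+1} = (1 - alpha) v_{i,k+1} + alpha c_{i,k}
```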
Corollary 2: Assume that the second smallest eigenvalue of $\bar{\mathbf{W}}$ is positive, i.e., $\lambda_2(\bar{\mathbf{W}}) > 0$. Choose $\beta < 1/n$ and $\alpha < 1/2$. Then iteration (6) converges to the solution of (5) as
$$\liminf_{k\to\infty}\; \mathbb{E}\Big[\big\|y_{i,k+1} - \tfrac{1}{n}\textstyle\sum_{i \in \mathcal{V}} c_{i,k}\big\|^2\Big] \le \delta, \qquad \forall i.$$
Proof: Straightforward, given that $m = 1/2$ and $L = 1$.
Notice that the value of δ is the same as in Theorem 1.
Remark 1: The algorithms in [1], [17] could also be applied
to time-varying unconstrained consensus problems. The benefit of using ours is that we can allow the nodes to communicate asynchronously via a time-varying edge set $\mathcal{E}_k$. A thorough
comparison with [1], [17] is a matter of future research.
IV. NUMERICAL EVALUATIONS
A. Multi-Robot Control
The numerical example we are presenting is a formation
control problem, where a number of mobile nodes need to
track a defined point in space and maintain a certain formation.
The example is inspired by [20] and has the added aim of showing that the proposed algorithm can work with partially
overlapping decision variables x (i.e., there is no need for
each of the computing nodes to agree on the total decision
variable x but only on subsets of it).
We consider $n = 36$ mobile nodes that have a fixed connection structure $\mathcal{E}$, and need to track a square pattern in two dimensions (Figure 1). At a given discrete time $k$, each mobile node $i$ needs to compute a waypoint $x_{i,k}$ to head to; this waypoint depends on the current value of the reference point $x^{\mathrm{ref}}_{i,k}$ and on the neighboring waypoint/reference values.
We consider each of the neighboring reference points to be
known to the nodes.
Putting this together, the computing mobile nodes have to
solve the optimization problem
$$\underset{x_k}{\text{minimize}} \;\; \sum_{i \in \mathcal{V}} \Big( \theta \|x_{i,k} - x^{\mathrm{ref}}_{i,k}\|^2 + \sum_{j \in \mathcal{N}_i} \|x_{i,k} - x_{j,k} - (x^{\mathrm{ref}}_{i,k} - x^{\mathrm{ref}}_{j,k})\|^2 \Big), \qquad (7)$$
where $x_k$ is the stacked version of all the $x_{i,k}$'s and $\theta > 0$ is a chosen scaling factor. It is easy to see that problem (7) fits our problem formulation (1). It is sufficient to call
$$f_{i,k}(x) = \theta \|x_i - x^{\mathrm{ref}}_{i,k}\|^2 + \sum_{j \in \mathcal{N}_i} \|x_i - x_j - (x^{\mathrm{ref}}_{i,k} - x^{\mathrm{ref}}_{j,k})\|^2.$$
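For illustration, a sketch of the corresponding local gradient, restricted to the variables node $i$ actually handles (its own waypoint and those of its neighbors), could be written as follows; the dictionary-based interface and the helper name grad_f_i are our assumptions.

```python
import numpy as np

def grad_f_i(i, neighbors, x, x_ref, theta):
    """Gradient of f_{i,k} in (7) with respect to the waypoints node i handles.
    x and x_ref map a node index to its 2-vector waypoint / reference point."""
    grad = {i: 2 * theta * (x[i] - x_ref[i])}       # tracking term
    for j in neighbors:
        e = x[i] - x[j] - (x_ref[i] - x_ref[j])     # formation mismatch with neighbor j
        grad[i] = grad[i] + 2 * e
        grad[j] = -2 * e
    return grad
```

Only the entries for $i$ and its neighbors are nonzero, which is what allows the nodes to agree on overlapping subsets of $x$ rather than on the full stacked variable.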
The reference states $x^{\mathrm{ref}}_{i,k}$ evolve along circular trajectories with constant angular velocity $\omega$. At each iteration $k$, the symmetric adjacency matrix of the communication graph $\mathbf{A}_k$ is generated by an i.i.d. Bernoulli process with $\Pr[[\mathbf{A}_k]_{ij} = 1] = 0.7$ for all $(i,j) \in \mathcal{E}$ (that is, $\mathcal{E}_k \subseteq \mathcal{E}$). The weight $\theta = 0.5$, while the stepsizes $\alpha$ and $\beta$ are determined according to Theorem 1; in fact, in this example, the bounds are analytically computable. For the simulations, we set $\alpha = 1.75\,\theta/(2(\theta + \lambda_n(\bar{\mathbf{W}})/0.7))^2$ and $\beta = 0.3/n$.

We select the angular velocity as $\omega = 0.5/40\alpha$, and run the distributed asynchronous time-varying optimization problem up to $k = 10{,}000$. By using snapshots of the agents' trajectories, we show the algorithm's behavior (Figure 1). The blue diamonds are the reference waypoints, while the red points are the agents' computed waypoints at discrete time $k$. The black lines represent the fixed edge set $\mathcal{E}$, while the light red lines are the waypoints' trajectories from $\tau = 0$ till $\tau = k$. As we further see, the convergence performance is in line with the asymptotic bound of Theorem 1, which is reasonable in this particular case (Figure 2, where we have called $y^\star_k = \mathbf{1}_n \otimes x^\star_k$).

[Figure 2: semilog plot of $\mathbb{E}[\|y_k - y^\star_k\|^2]$ versus discrete time $k$, together with the bound of Theorem 1.]
Fig. 2. Convergence performance of the algorithm for the chosen example.
B. Time-Varying Consensus
A second numerical example involves a time-varying consensus problem in two dimensions. In particular, with the
notation of Section III, we consider $n = 50$ nodes, the vectors $c_{i,0}$ generated by using a uniform probability distribution of width 1 around the point $(10, 0)$, and $c_{i,k}$ following a circular trajectory of angular velocity $\omega = 10^{-4}$. The initial vectors $y_{i,0}$ are randomly picked, the stepsizes are $\alpha = 1/(15n)$ and $\beta = 0.2/n$, $\lambda_2(\bar{\mathbf{W}}) = 0.4846$, while $\Pr[[\mathbf{A}_k]_{ij} = 1] = 0.8$ for all $(i,j) \in \mathcal{E}$.
Figure 3 depicts how the nodes reach consensus and follow
the time-varying optimizer. The black circles are the values
of $y_{i,1}$ for all $i$, the red square represents the value of the optimizer at the last simulated discrete time, $k = 50{,}000$, while the black diamonds close by are the values of $y_{i,k}$ at
the same discrete time.
In Figure 4, we display the convergence of the proposed
algorithm, which is linear up to the asymptotic bound of Corollary 2. In this case, the bound is also reasonably tight (once again, $y^\star_k = \mathbf{1}_n \otimes x^\star_k$).
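For completeness, a minimal end-to-end sketch of this experiment could be assembled from the helpers above (consensus_matrix and consensus_step); the underlying fixed edge set $\mathcal{E}$, the rotation center of the references, and the random seed are our assumptions, since they are not fully specified here.

```python
import numpy as np

n, d, K = 50, 2, 50_000
alpha, beta, omega = 1 / (15 * n), 0.2 / n, 1e-4
rng = np.random.default_rng(0)

# Hypothetical fixed edge set E: a ring graph over the 50 nodes
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

C0 = np.array([10.0, 0.0]) + rng.uniform(-0.5, 0.5, size=(n, d))   # c_{i,0}: width-1 cloud around (10, 0)
Y = rng.uniform(-5.0, 5.0, size=(n, d))                            # arbitrary initial y_{i,0}

for k in range(K):
    # References c_{i,k}: circular trajectories with angular velocity omega (rotation about the origin assumed)
    rot = np.array([[np.cos(omega * k), -np.sin(omega * k)],
                    [np.sin(omega * k),  np.cos(omega * k)]])
    C_k = C0 @ rot.T
    # Edge-asynchronous activation: each edge of E is on with probability 0.8
    mask = np.triu((rng.random((n, n)) < 0.8).astype(float), 1)
    A_k = A * (mask + mask.T)
    Y = consensus_step(Y, C_k, consensus_matrix(A_k), alpha, beta)

print("final tracking error:", np.linalg.norm(Y - C_k.mean(axis=0)))
```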
[Figure 3: trajectories in the plane of the vectors $y_{i,k}$, from the initial values $y_{i,1}$ to the values $y_{i,50{,}000}$, together with the optimizers $x^\star_1$ and $x^\star_{50{,}000}$.]
Fig. 3. Trajectories of the vectors $y_{i,k}$ while reaching consensus and tracking the optimizer.
[Figure 4: log-log plot of $\mathbb{E}[\|y_k - y^\star_k\|^2]$ versus discrete time $k$, together with the bound of Corollary 2.]
Fig. 4. Convergence performance of the algorithm for the chosen example.

V. CONCLUSIONS

We have proposed a distributed stochastic gradient asynchronous algorithm to solve a convex separable time-varying program. The overall scheme converges linearly to an error bound whose size depends on the constant stepsize $\alpha$ and the
variability in time of the optimizer. The numerical evaluations
well depict the performance of the proposed approach.
In addition to the results presented in this paper, some
extensions have already been studied: in [18], we present a
variant of the algorithm for cases in which the constraint set
X is time-varying, the optimization problem is stochastic, and
the gradients are computed only up to a defined accuracy $\epsilon$.
In [21], we extend the algorithm presented here to deal with
non-strongly convex multiuser optimization.
Nonetheless, several open questions are still present, and
have been left for future research. For example, the derived
bounds could be tightened. Moreover, if some knowledge of how the functions $f_{i,k}(x)$ are varying with time can be acquired or learned by the nodes, a predictive-corrective tracking scheme could be
added to the algorithm, as proposed in centralized nonlinear
programming [22], [23].
R EFERENCES
[1] F. Y. Jakubiec and A. Ribeiro, “D-MAP: Distributed Maximum a Posteriori Probability Estimation of Dynamic Systems,” IEEE Transactions
on Signal Processing, vol. 61, no. 2, pp. 450 – 466, 2013.
[2] S.-Y. Tu and A. H. Sayed, “Mobile Adaptive Networks,” IEEE Journal
of Selected Topics in Signal Processing, vol. 5, no. 4, pp. 649 – 664,
2011.
[3] A. Simonetto, T. Keviczky, and R. Babuška, “On Distributed Algebraic
Connectivity Maximization in Robotic Networks,” in Proceedings of the
American Control Conference, San Francisco, USA, June – July 2011,
pp. 2180 – 2185.
[4] A. Simonetto, “Distributed Estimation and Control for Robotic Networks,” Ph.D. dissertation, Delft University of Technology, Delft, The
Netherlands, 2012.
[5] A. Simonetto, T. Keviczky, and R. Babuška, “Constrained Distributed
Algebraic Connectivity Maximization in Robotic Networks,” Automatica, vol. 49, no. 5, pp. 1348 – 1357, 2013.
[6] K. Srivastava and A. Nedić, “Distributed Asynchronous Constrained
Stochastic Optimization,” IEEE Journal of Selected Topics in
Signal Processing, vol. 5, no. 4, pp. 772 – 790, 2011.
[7] J. C. Duchi, A. Agarwal, and M. Wainwright, “Dual Averaging for
Distributed Optimization: Convergence Analysis and Network Scaling,”
IEEE Transactions on Automatic Control, vol. 57, no. 3, pp. 592 – 606,
2012.
[8] I. D. Schizas, A. Ribeiro, and G. B. Giannakis, “Consensus in Ad
Hoc WSNs With Noisy Links— Part I: Distributed Estimation of
Deterministic Signals,” IEEE Transactions on Signal Processing, vol. 56,
no. 1, pp. 350 – 364, 2008.
[9] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed Optimization and Statistical Learning via the Alternating Direction Method
of Multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1 – 122, 2011.
[10] M. Kamgarpour and C. Tomlin, “Convergence Properties of a Decentralized Kalman Filter,” in Proceedings of the 47th IEEE Conference
on Decision and Control, Cancun, Mexico, December 2008, pp. 3205 –
3210.
[11] P. Braca, S. Marano, V. Matta, and P. Willett, “Asymptotic Optimality of
Running Consensus in Testing Binary Hypotheses,” IEEE Transactions
on Signal Processing, vol. 58, no. 2, pp. 814 – 825, 2010.
[12] F. S. Cattivelli and A. H. Sayed, “Diffusion Strategies for Distributed
Kalman Filtering and Smoothing,” IEEE Transactions on Automatic
Control, vol. 55, no. 9, pp. 2069 – 2084, 2010.
[13] M. Farina, G. Ferrari-Trecate, and R. Scattolini, “Distributed Moving
Horizon Estimation for Linear Constrained Systems,” IEEE Transactions
on Automatic Control, vol. 55, no. 11, pp. 2462 – 2475, 2010.
[14] D. Bajovic, D. Jakovetic, J. Xavier, B. Sinopoli, and J. M. F. Moura,
“Distributed Detection via Gaussian Running Consensus: Large Deviations Asymptotic Analysis,” IEEE Transactions on Signal Processing,
vol. 59, no. 9, pp. 4381 – 4396, 2011.
[15] M. M. Zavlanos, A. Ribeiro, and G. J. Pappas, “Network Integrity in
Mobile Robotic Networks,” IEEE Transactions on Automatic Control,
vol. 58, no. 1, pp. 3 – 18, 2013.
[16] R. L. G. Cavalcante and S. Stanczak, “A Distributed Subgradient Method
for Dynamic Convex Optimization Problems Under Noisy Information
Exchange,” IEEE Journal of Selected Topics in Signal Processing, vol. 7,
no. 2, pp. 243 – 256, 2013.
[17] Q. Ling and A. Ribeiro, “Decentralized Dynamic Optimization Through
the Alternating Direction Method of Multipliers,” IEEE Transactions on
Signal Processing, vol. 62, no. 5, pp. 1185 – 1197, 2014.
[18] A. Simonetto, L. Kester, and G. Leus, “Distributed Time-Varying
Stochastic Optimization and Utility-based Communication,” Submitted
to IEEE Transactions on Control of Network Systems, 2014, available
at http://arxiv.org/abs/1408.5294.
[19] B. T. Polyak, Introduction to Optimization. Optimization Software,
Inc., 1987.
[20] F. Borrelli and T. Keviczky, “Distributed LQR Design for Identical
Dynamically Decoupled Systems,” IEEE Transactions on Automatic
Control, vol. 53, no. 8, pp. 1901 – 1912, 2008.
[21] A. Simonetto and G. Leus, “Double Smoothing for Time-Varying
Distributed Multi-user Optimization,” in Proceedings of the IEEE Global
Conference on Signal and Information Processing, Atlanta, US, December 2014.
[22] V. M. Zavala and M. Anitescu, “Real-Time Nonlinear Optimization as
a Generalized Equation,” SIAM Journal of Control and Optimization,
vol. 48, no. 8, pp. 5444 – 5467, 2010.
[23] A. L. Dontchev, M. I. Krastanov, R. T. Rockafellar, and V. M. Veliov,
“An Euler-Newton Continuation method for Tracking Solution Trajectories of Parametric Variational Inequalities,” SIAM Journal of Control
and Optimization, vol. 51, no. 51, pp. 1823 – 1840, 2013.