SAMPLE PATH OPTIMIZATION TECHNIQUES FOR DYNAMIC
RESOURCE ALLOCATION IN DISCRETE EVENT SYSTEMS
A Dissertation Presented
by
CHRISTOS PANAYIOTOU
Submitted to the Graduate School of the
University of Massachusetts, Amherst in partial fulfillment
of the requirements for the degree of
DOCTOR OF PHILOSOPHY
May 1999
Department of Electrical and Computer Engineering
© Copyright by Christos Panayiotou 1999
All Rights Reserved
SAMPLE PATH OPTIMIZATION TECHNIQUES FOR DYNAMIC
RESOURCE ALLOCATION IN DISCRETE EVENT SYSTEMS
A Dissertation Presented
by
CHRISTOS PANAYIOTOU
Approved as to style and content by:
Christos G. Cassandras, Chair
Theodore E. Djaferis, Co-Chair
Wei-Bo Gong, Member
Agha Iqbal Ali, Member
Seshu Desu, Department Head
Department of Electrical and Computer Engineering
ABSTRACT
SAMPLE PATH OPTIMIZATION TECHNIQUES FOR DYNAMIC
RESOURCE ALLOCATION IN DISCRETE EVENT SYSTEMS
SEPTEMBER 1999
CHRISTOS PANAYIOTOU, B.S.E.C.E., UNIVERSITY OF MASSACHUSETTS AMHERST
M.B.A., UNIVERSITY OF MASSACHUSETTS AMHERST
Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST
Directed by: Professor Christos G. Cassandras
The main focus of this dissertation is the dynamic allocation of discrete resources in stochastic environments. For this purpose, we develop two algorithms that can be used to address such problems.
The first one is descent; in other words, at every iteration it moves to an allocation with a lower cost, and it is suitable for problems with a separable convex structure. Furthermore, at every iteration it visits feasible allocations, which makes it appropriate for on-line use. The second one is incremental; that is, it starts with zero resources and at every step allocates an additional resource. Both algorithms are proven to converge in deterministic as well as stochastic environments. Furthermore, because they are driven by ordinal comparisons, they are robust with respect to estimation noise and converge fast.
To complement the implementation of the derived optimization algorithms, we develop two techniques for predicting the system performance under several parameters while observing a single sample path under a single parameter. The first technique, Concurrent Estimation, can be directly applied to general DES, while for the second one, Finite Perturbation Analysis (FPA), we demonstrate a general procedure for deriving such an algorithm from the system dynamics. Moreover, both procedures can be used for systems with general event lifetime distributions.
The dissertation ends with three applications of the derived resource allocation methodologies to three different problems. First, the incremental algorithm is used on a kanban-based manufacturing system to find the kanban allocation that optimizes a given objective function (e.g., throughput, mean delay). Next, a variation of the descent algorithm is used to solve the channel allocation problem in cellular telephone networks so as to minimize the number of lost calls. Finally, a combination of the FPA and kanban approaches is used to solve the ground-holding problem in air traffic control so as to minimize congestion at busy airports.
Contents

ABSTRACT
LIST OF FIGURES
1 INTRODUCTION
1.1. Classification of Resource Allocation Problems
1.2. Dissertation Overview
1.3. Contributions
1.4. Organization of the Dissertation
2 BACKGROUND ON STOCHASTIC OPTIMIZATION
2.1. Problem Formulation
2.2. Ordinal Comparison
2.3. Stochastic Ruler (SR)
2.4. Stochastic Comparison (SC)
2.5. Nested Partitioning (NP)
2.6. Multi-armed Bandit Theory
2.7. Noise Effects
3 DESCENT ALGORITHMS FOR DISCRETE-RESOURCE ALLOCATION
3.1. Introduction
3.2. Characterization of the Optimal Allocation
3.3. Deterministic Descent On-Line Optimization Algorithm
3.3.1. Interpretation of D-DOP
3.3.2. Properties of the D-DOP Process
3.3.3. Convergence of the D-DOP Process
3.4. Stochastic On-Line Optimization Algorithm
3.4.1. Properties of the S-DOP Process
3.4.2. Convergence of the S-DOP Process
3.4.3. A Stronger Convergence Result
3.5. Future Directions
3.6. Summary
4 INCREMENTAL ALGORITHMS FOR DISCRETE-RESOURCE ALLOCATION
4.1. Problem Formulation
4.2. Deterministic Case
4.2.1. Deterministic Incremental Optimization Algorithm (DIO)
4.2.2. Complementary Deterministic Incremental Optimization Algorithm
4.2.3. Extension of the Incremental Optimization Algorithms
4.3. Stochastic Case
4.3.1. Stronger Convergence Results
4.4. Discussion on the Incremental Algorithm
4.5. Summary
5 PERTURBATION ANALYSIS
5.1. Introduction
5.2. Problem Definition
5.3. Concurrent Simulation
5.3.1. Notation and Definitions
5.3.2. Timed State Automaton
5.3.3. Coupling Dynamics
5.3.4. Extensions of the TWA
5.4. Finite Perturbation Analysis
5.4.1. Notation and Definitions
5.4.2. Derivation of Departure Time Perturbation Dynamics
5.5. Summary
6 OPTIMIZATION OF KANBAN-BASED MANUFACTURING SYSTEMS
6.1. Introduction
6.2. More on the Smoothness Condition
6.3. Application of the Incremental Optimization Algorithms
6.3.1. Application of SIO on a Serial Manufacturing Process
6.3.2. Application of SIO on a Network
6.3.3. Application of SIO on a Network
6.4. Summary
7 CHANNEL ALLOCATION IN CELLULAR TELEPHONE NETWORKS
7.1. Introduction
7.2. Overlapping Cells and Modeling Assumptions
7.3. DR and DH Schemes
7.4. Performance Enhancements
7.4.1. Simple Neighborhood (SN)
7.4.2. Extended Neighborhood (EN)
7.4.3. On-Line Implementation of SN and EN
7.5. Simulation Results
7.6. Conclusions and Future Directions
8 GROUND-HOLDING PROBLEM IN AIR TRAFFIC CONTROL
8.1. Introduction
8.2. System Model
8.3. Kanban-Smoothing (KS) Control Policy
8.3.1. Representation of KS as a Timed State Automaton
8.3.2. Evaluation of GHD Under KS
8.4. Airplane Scheduling Using Finite Perturbation Analysis
8.4.1. FPA-Based Control Algorithm
8.4.2. Global Optimality of the FPA Approach
8.4.3. Algorithm Complexity
8.5. Numerical Results
8.5.1. Performance of KS
8.5.2. Performance of L-FPA
8.6. Conclusions and Future Directions
9 EPILOGUE
9.1. Summary
9.2. Future Directions
A SELECTED ALGORITHMS
A.1 S-DOP Pseudo Code
A.2 Time Warping Algorithm (TWA)
A.3 Finite Perturbation Algorithm for Serial Queueing Systems
B PROOFS FROM CHAPTER 3
B.1 Proof of Theorem 3.2.1
B.2 Proof of Theorem 3.2.2
B.3 Proof of Lemma 3.3.1
B.4 Proof of Lemma 3.3.2
B.5 Proof of Theorem 3.3.1
B.6 Proof of Lemma 3.4.1
B.7 Proof of Lemma 3.4.3
B.8 Proof of Theorem 3.4.1
B.9 Proof of Theorem 3.4.2
C PROOFS FROM CHAPTER 4
C.1 Proof of Theorem 4.2.1
C.2 Proof of Theorem 4.3.1
C.3 Proof of Theorem 4.3.2
D PROOFS FROM CHAPTER 5
D.1 Proof of Theorem 5.4.1
E PROOFS FROM CHAPTER 8
E.1 Proof of Theorem 8.4.1
E.2 Proof of Lemma 8.4.1
E.3 Proof of Lemma 8.4.2
E.4 Proof of Theorem 8.4.2
BIBLIOGRAPHY
List of Figures

3.1 Evolutions of the modified D-DOP
5.1 The sample path constructability problem for DES
5.2 FPA System Model
6.1 Manufacturing systems consisting of N stations in series
6.2 Manufacturing network
6.3 Queueing network
6.4 Evolution of the SIO algorithm
6.5 Evolution of the SIO algorithm
6.6 Ranking of the allocations picked by SIO
6.7 Evolution of the SIO algorithm
6.8 Ranking of the allocations picked by SIO
7.1 Overlapping Cell Structure
7.2 Cell overlapping as a function of the cell radius
7.3 Call loss probabilities as a function of the traffic intensity ρ when the cell radius is 1.14
7.4 Call loss probabilities as a function of the traffic intensity ρ when the cell radius is 1.4
7.5 Average number of induced handoffs for EN and DH
7.6 Call loss probabilities as a function of the cell radius
7.7 Call loss probabilities as a function of the parameter τ
7.8 Call loss probabilities for non-uniform traffic
8.1 Destination airport queueing model
8.2 Stage representation for the KS control policy
8.3 Assignment of Ground-Holding Delay (GHD) under KS: (a) GHD until the beginning of the next stage; (b) GHD until the end of the next stage; (c) GHD until the previous airplane clears the runway
8.4 Timing diagram for ground-holding delay
8.5 Global optimality: (a) L-FPA result; (b) global optimum
8.6 FPA-based algorithms: (a) Local FPA (L-FPA) controller triggered by airplane departures; (b) Global FPA (G-FPA) controller triggered at any time
8.7 Hourly landings at airport D
8.8 Trade-off between airborne and ground delays for the KS controller
8.9 Overall cost improvement under the KS scheme
8.10 Trade-off between airborne and ground delays for the L-FPA controller
8.11 Overall cost improvement under the L-FPA scheme
E.1 τ∗ for Case I
E.2 τ∗ for Case II
E.3 τ∗ for Case III
E.4 Case IV subcases: (a) P > 0, (b) P < 0
Chapter 1
INTRODUCTION
This dissertation focuses on optimal resource allocation in the context of Discrete-Event Systems
(DES). These are systems, mainly “man-made”, where state changes are due to asynchronous occurrences of discrete events [12]. For example, consider a computer that processes jobs on a first-come, first-served basis. For this system, the state is described by the number of jobs that are either being processed or waiting to be processed. The state changes only when a new job is submitted to the computer (a job arrival) or when the computer finishes processing a job (a job departure). Therefore, all activity is observed only at the instants of job arrivals or departures; at any other point in time, the state of the system remains unchanged.
For the purposes of this thesis, a “resource” corresponds to any means that can be used by a
“user” to achieve a goal. This interpretation of resource allocation can be applied to a broad class
of systems. For example, in the context of wireless communications, a “channel” is a resource, a
mobile phone is the user, and the goal is to allow two people that are physically located in distant
areas to communicate. Another example may be the automatic teller machine (resource) that allows
people (users) to get cash. Also, note that many entities may be viewed as either users or resources
depending on the context. For example, in a queueing system, buffers may be the resources while
servers may be the users. From another perspective though, servers may be viewed as the resources
while customers become the users. Finally, it is worth pointing out that several other problems may
be mapped into resource allocation problems. For example, consider the ground-holding problem in
air traffic control (see Chapter 8) where it is required to schedule airplane arrivals so that congestion
is avoided. In this case, the time that the runway can be used is divided into small intervals, each
representing a resource which is then allocated to each flight to facilitate its landing or takeoff.
1.1. Classification of Resource Allocation Problems
Resources, depending on their nature, can be classified as “continuous” or “discrete”. The basic premise of this classification is whether or not a resource is divisible. Note, however, that this distinction may not always be clear. Take, for example, a computer link with 1 megabit per second (Mbps) of capacity. If a user (computer) can request any amount of the available capacity, then the resource allocation problem is considered continuous. In this case, one user may request 100 Kbps while another may request 133 Kbps. In another setting, the capacity allocation may be viewed as a discrete resource allocation problem. For example, suppose the aforementioned link is divided into 10 discrete channels with a capacity of 100 Kbps each. While the first user will request a single channel, the second one, with the 133 Kbps requirement, must decide whether to accept a single channel and suffer a loss in quality of service (QoS), or to request two channels and pay a higher cost while wasting 67 Kbps.
Another classification of resource allocation problems is whether they are “static” or “dynamic”.
In static problems, the objective function corresponds to a long, possibly infinite, time horizon. In
this case, the optimization problem is solved once and it is not revisited until the end of this long
interval. On the other hand, in dynamic optimization problems, the objective function is defined
over a finite horizon, the length of which is much smaller than the time horizon of the static problem.
In this case, the optimization problem is solved multiple times, once at the end of each short interval.
Therefore, a dynamic controller can reallocate resources so as to optimize the objective function based on the information available at the end of each interval, which, of course, is at least as much as the information that was available at the beginning of the first interval. Since, in general, more information leads to better decisions, a dynamic controller will perform better than a static one, at the expense of collecting more information.
Yet another classification for resource allocation problems refers to the environment that the
underlying system operates in. If the event times and state transitions are known exactly, and if
the objective function under any allocation can be calculated exactly, then the system is said to be
“deterministic”. On the other hand, if any of the event times or state transitions are random variables, then the system is said to be “stochastic”.
1.2. Dissertation Overview
In general, resources are scarce while there are many users that compete to gain control over them.
The first goal of this thesis is to derive ways in which resources can be allocated to users so that
an objective function is optimized. The main focus of this dissertation is the dynamic allocation
of discrete resources in stochastic environments. Discrete resource allocation problems often arise in the context of Discrete Event Systems (DES). Classic examples include the channel allocation problem in communication networks [15, 76] and the buffer allocation problem in queueing models [36, 78]. Our second goal is to apply the derived techniques to real, complex systems where finding a closed-form solution for the system’s performance, under any allocation, is very difficult if possible at all. Thus, performance must be estimated through Monte Carlo simulation or by direct measurements made on the actual system. In these systems one is forced to make resource reallocation decisions based on noisy estimates of the system’s performance.¹
While the area of stochastic optimization over continuous decision spaces is rich and usually
involves gradient-based techniques as in several well-known stochastic approximation algorithms
[46, 64], the literature in the area of discrete stochastic optimization is relatively limited. The known approaches are based on (i) multi-armed bandit theory [28, 6], (ii) branch-and-bound techniques [66, 53], and (iii) random search algorithms [77, 31, 3] (for more details see Chapter 2). The main
difficulty in solving optimization problems over a discrete parameter set is that gradients are not
defined. Therefore, all mathematical tools that have been developed for solving continuous optimization problems simply do not apply to discrete optimization problems. Since gradient information is
not meaningful for the type of systems we are investigating, we substitute it with a finite difference,
¹ When appropriate, or to help the clarity of the presentation, we may assume a deterministic system.
which is of the form
∆L(n) = L(n + 1) − L(n),    (1.1)
where L(n) is the value of the objective function under n resources. We derive necessary and sufficient conditions that such a finite difference must satisfy at the optimal allocation, and we use them to develop optimization algorithms that yield the optimal allocation.
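To make the role of this pseudo-sensitivity concrete, here is a minimal sketch (not from the dissertation); the quadratic cost `L`, with its minimum at n = 5 resources, is a hypothetical stand-in for a measured performance, and the sign of the finite difference indicates whether adding a resource helps:

```python
def finite_difference(L, n):
    """Pseudo-sensitivity of Eq. (1.1): Delta L(n) = L(n + 1) - L(n)."""
    return L(n + 1) - L(n)

# Hypothetical convex cost with its minimum at n = 5 resources.
L = lambda n: (n - 5) ** 2

assert finite_difference(L, 3) < 0  # below the optimum: one more resource lowers the cost
assert finite_difference(L, 7) > 0  # above the optimum: one more resource raises the cost
```

In a stochastic environment L(n) would be replaced by a noisy estimate, which is one reason the algorithms developed here rely on the sign of such differences (an ordinal comparison) rather than their magnitude.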
The first problem we consider is that of allocating discrete resources to a set of users when the
objective function has the separable structure

J(x) = Σ_{i=1}^{N} J_i(x_i),    (1.2)
where x_i is the number of resources allocated to user i. For this problem, we identify necessary and sufficient conditions that the optimal allocation must satisfy in a deterministic environment.
Based on these conditions, we develop a “descent” optimization algorithm that yields the optimal allocation in a finite number of steps. Subsequently, we adapt the optimization algorithm to stochastic environments and show that the modified algorithm converges to the optimal allocation in probability. These results were published in [13]. Furthermore, we show that the modified algorithm, under some additional mild assumptions, converges to the optimal allocation almost surely. This result has been accepted for publication in [23], where it is also shown that the rate of convergence of this algorithm for the class of “regenerative” systems is exponential.
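To convey the flavor of such a scheme, the sketch below performs one exchange step for a separable convex cost: it moves a single resource from the user whose last resource is most costly to the user where an extra resource is cheapest. This is a generic exchange heuristic under those assumptions, not the exact D-DOP rule of Chapter 3, and the two quadratic user costs are hypothetical:

```python
def descent_step(J, x):
    """One exchange step for minimizing sum_i J[i](x[i]) with a fixed
    total number of resources; a sketch of the descent idea, not the
    exact D-DOP rule of Chapter 3."""
    users = range(len(x))
    # Marginal cost saved by withdrawing the last resource of user i.
    i_star = max((i for i in users if x[i] > 0),
                 key=lambda i: J[i](x[i]) - J[i](x[i] - 1))
    # Marginal cost paid by granting one extra resource to user j.
    j_star = min(users, key=lambda j: J[j](x[j] + 1) - J[j](x[j]))
    saved = J[i_star](x[i_star]) - J[i_star](x[i_star] - 1)
    paid = J[j_star](x[j_star] + 1) - J[j_star](x[j_star])
    if i_star != j_star and saved > paid:
        x[i_star] -= 1          # every iterate stays feasible:
        x[j_star] += 1          # the total allocation is unchanged
        return True             # the cost strictly decreased
    return False                # no improving exchange exists

# Two hypothetical users with convex costs and 6 resources in total.
J = [lambda n: (n - 1) ** 2, lambda n: (n - 5) ** 2]
x = [3, 3]
while descent_step(J, x):
    pass
assert x == [1, 5]  # each user ends at its unconstrained minimizer
```

Note the two properties emphasized above: every iterate is a feasible allocation (the total never changes), and every accepted step strictly lowers the cost.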
Subsequently, we consider the problem of resource allocation for a system that does not have the nice separable structure of (1.2), but satisfies a “smoothness” condition (defined in Chapter 4).
For this type of system, we have developed an “incremental” optimization algorithm that yields the optimal allocation in a finite number of steps when the performance estimates are known exactly.
Moreover, we modified the algorithm for stochastic environments and showed that it converges to the optimal allocation in probability and, under some additional mild assumptions, almost surely.
These results have been accepted for publication in [57].
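The incremental principle can be sketched as follows: starting from zero resources, each of the available resources is handed out one at a time to the user whose marginal benefit is currently greatest. This is only an illustration of the idea, not the DIO/SIO algorithms of Chapter 4, and the non-separable cost `J` below is hypothetical:

```python
def incremental_allocation(J, total, n_users):
    """Start from zero resources and hand them out one at a time, each
    to the user whose marginal benefit is currently greatest; a sketch
    of the incremental principle, not the DIO/SIO algorithms."""
    x = [0] * n_users
    for _ in range(total):
        # Try giving the next resource to each user and keep the best.
        best = min(range(n_users),
                   key=lambda i: J(x[:i] + [x[i] + 1] + x[i + 1:]))
        x[best] += 1
    return x

# Hypothetical non-separable cost: two congestion terms plus a small
# interaction term coupling the two users.
def J(x):
    return (x[0] - 2) ** 2 + (x[1] - 4) ** 2 + 0.1 * x[0] * x[1]

x = incremental_allocation(J, 6, 2)
assert sum(x) == 6  # every step allocates exactly one more resource
```

Note that such greedy growth reaches the optimum only under conditions like the “smoothness” condition of Chapter 4, which is exactly why that condition matters.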
Two features of the resource allocation schemes we analyze are worth noting because of their
practical implications. All iterative reallocation steps are driven by ordinal comparisons, which are
particularly robust with respect to noise in the estimation process (see Chapter 2). Consequently, (i) as in other ordinal optimization schemes (e.g., [37, 38]), convergence is fast because short estimation intervals are adequate to guide allocations towards the optimum, and (ii) there is no need for the “step size” or “scaling” parameters which arise in algorithms driven by cardinal estimates of derivatives or finite differences; instead, based on the result of comparisons of various quantities, allocations are updated by reassigning one resource with respect to the current allocation. This avoids the difficult problem of selecting appropriate values for these parameters, which is often crucial to the convergence properties of such algorithms.
In order to be able to apply the proposed optimization algorithms on-line, it is necessary to
develop efficient ways of calculating the finite differences of the form of (1.1). For this reason we
have developed the following two schemes:
(i) Concurrent Estimation (CE): a fairly general method of constructing sample paths under any
parameter by observing a sample path under a single parameter. The results of this work are
presented in [16].
(ii) Finite Perturbation Analysis (FPA): a more efficient but less general way of constructing a sample path under some “neighboring” allocations.² This scheme takes advantage of the special structure of some systems.
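As a toy illustration of the “neighboring sample path” idea (not the FPA algorithm itself), consider a single first-come, first-served server whose departure times obey the Lindley-type recursion D_k = max(A_k, D_{k−1}) + S_k; a finite perturbation in one service time propagates through the recursion until an idle period absorbs it. The arrival and service times below are hypothetical:

```python
def departures(arrivals, services):
    """Departure times of a first-come, first-served single server via
    the Lindley-type recursion D[k] = max(A[k], D[k-1]) + S[k]."""
    D, prev = [], 0.0
    for a, s in zip(arrivals, services):
        prev = max(a, prev) + s
        D.append(prev)
    return D

A = [0.0, 1.0, 2.0, 6.0]   # hypothetical arrival times
S = [2.0, 2.0, 1.0, 1.0]   # nominal service times
nominal = departures(A, S)

# Perturb the first service time by +0.5 and track the finite
# perturbation of each departure on the neighboring sample path.
perturbed = departures(A, [2.5] + S[1:])
deltas = [p - n for p, n in zip(perturbed, nominal)]
assert deltas == [0.5, 0.5, 0.5, 0.0]  # absorbed by the idle period
```

The point of FPA-style schemes is to obtain such perturbations from the nominal sample path alone, without re-running the recursion under the perturbed parameter; the explicit re-computation above is only for illustration.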
Subsequently, we use the principles from the derived optimization schemes to solve the resource allocation problems for three different applications. The first application is from the area of kanban-based manufacturing systems, where kanban (tokens) are used to maintain low work-in-process inventory (WIP). In that context, kanban constitute discrete resources that are allocated to the various production stations (users) so as to optimize an objective function while maintaining low WIP. To solve this problem, we used the “incremental” optimization algorithm described in Chapter 4 in conjunction with the FPA scheme mentioned earlier, as described in [57].
Our second application comes from the area of wireless communications and deals with the problem of channel allocation in cellular telephone networks. In this case, we assume the model of “overlapping” cells described in [26] and apply a variation of the descent algorithm to distribute subscribers over the various base stations so as to minimize the probability that an arriving call will be lost due to the unavailability of a free channel.
Finally, the last application is from the area of air traffic control. In that context it is generally true that airborne delays are more expensive than ground-holding delays; hence the objective is to determine the ground-holding delay of each airplane so as to minimize the overall waiting cost. For this problem we propose two solutions. The first approach is referred to as the Kanban Smoothing (KS) flow control policy, first proposed in [56]. KS is designed to “smooth” an arrival process by systematically reducing its variance. The second approach uses FPA to determine the change in the value of the cost function if a new airplane is allowed to arrive at the destination airport, as a function of its ground delay; hence, it determines the delay that minimizes that cost. These schemes are also presented in [58].
1.3. Contributions
The contributions of this dissertation are the following.
• For the separable convex resource allocation problem, we have extended the results of [14] in two ways. First, in the deterministic case, we derived several properties of the on-line descent optimization algorithm. Second, we have modified the algorithm so that it is applicable in a stochastic environment and have investigated its convergence properties. The resulting stochastic optimization algorithm uses pseudo-sensitivities (finite differences of the form (1.1)) to determine the next allocation and is considerably different from the existing approaches, which are based on bandit theory, branch-and-bound techniques, and random search.
• Using the pseudo-sensitivities again, we have shown that the algorithm INCREASE described in [41] for deterministic, separable, convex objective functions can also be used for non-separable resource allocation problems that satisfy the “smoothness” condition defined in Chapter 4. In addition, we modified the algorithm so that it can be used in stochastic environments and have proved that it converges to the optimal allocation.
² These are the allocations resulting from adding or removing a single resource from the allocation of the observed sample path.
• We have developed the Time Warping Algorithm (TWA), which implements “concurrent estimation/simulation” and can be used to solve the sample path constructability problem for DES (see Chapter 5). Even though the basic idea behind concurrent estimation/simulation is not new (it was often implied in the literature, e.g., see [12, 80]), it was never developed for general systems. TWA is a fairly general simulation algorithm which can solve the sample path constructability problem for DES with arbitrary lifetime distributions, unlike Augmented System Analysis (ASA) [10] and Standard Clock (SC) [74], which require exponentially distributed lifetimes.
• We illustrate a procedure that can be used to obtain estimates of finite perturbations for systems whose dynamics can be described through Lindley-type recursions.
• We apply the resource allocation techniques we developed to cellular telephone networks.
Specifically, we extend the channel allocation methodologies to the case where “overlapping cells” are allowed. In this context, Karlsson [44] and Everitt [26] have developed heuristic channel allocation algorithms for overlapping cells. We have developed two new channel allocation algorithms which are based on the aforementioned discrete resource allocation algorithms, and have used simulation to show that our algorithms improve the system performance, which is usually measured by the call loss probability.
• Another area where we apply our resource allocation techniques is the ground-holding problem in air traffic control. For this problem, we developed two new and efficient approaches, Kanban-Smoothing and the FPA approach, which are unrelated to the linear programming approaches that have previously been used to attack it.
1.4. Organization of the Dissertation
This dissertation can be divided into two main parts. The first part deals with the development of new methodologies for attacking resource allocation problems, while the second part deals with applications of resource allocation techniques to real systems.
The first part starts with Chapter 2 which reviews some of the literature on approaches for solving
stochastic optimization problems. Furthermore, it presents some relevant results that will be used
in the development of the methodologies proposed in this thesis. Subsequently, in Chapter 3 we
address the problem of allocating identical resources to a set of users when the objective function to
be optimized is separable and convex and propose a “descent” optimization algorithm. In Chapter 4
we relax the separability assumption and propose an “incremental” algorithm that yields the optimal
allocation under the “smoothness” condition. In both chapters, we first address the deterministic
problem and show that the proposed algorithms yield the optimal allocation in a finite number of
steps. Subsequently, we address the stochastic problem and show that under some mild assumptions
the proposed algorithms converge to the global optimum. The first part ends with Chapter 5 where
we developed techniques for obtaining the finite differences of (1.1), by observing a single sample
path under a single parameter.
The second part includes the applications. In Chapter 6 we deal with the problem of allocating a finite number of kanban to the various production stages of a manufacturing system. In Chapter 7 we address the issue of allocating communication channels to base stations in a TDMA/FDMA cellular communication network, and in Chapter 8 we address the ground holding problem in air traffic
control. Summary and conclusions are presented in Chapter 9. Finally, the appendices include some
of the derived algorithms as well as the proofs of all theorems and lemmas presented in the main
part of the thesis.
Chapter 2
BACKGROUND ON STOCHASTIC OPTIMIZATION
This chapter describes the problem of stochastic optimization and reviews some of the attempts at solving it.
2.1. Problem Formulation
In many resource allocation problems, we wish to find the allocation x that optimizes a performance measure J(x), where x belongs to a finite set X. For many real systems, however, it is often impossible to derive a closed-form expression for J(x); as a result, when evaluating system performance, we are forced to resort to estimates obtained through simulation or on-line system observation over an interval of length t. For the purposes of this thesis, we assume that the effect of estimation noise decreases as the observation interval t increases.
A2.1: Ergodicity Assumption: For every allocation x the performance estimate Ĵt(x) taken over a sample path of length t approaches its true value as t goes to infinity. That is,

lim_{t→∞} Ĵt(x) = J(x), a.s.
This assumption is mild and in the context of discrete-event systems it usually holds. Note that the performance measures of DES are often expressed in the form of an expectation

J(x) = E[L(x, ξ)]   (2.1)

where L(x, ξ) is a random variable corresponding to the system performance under parameter x, while ξ represents the uncertainty. When evaluating (2.1), only realizations of L(x, ξ) are available, so a standard approach to estimating the performance of such systems is through sampling, i.e.,

Ĵ(x) = (1/t) Σ_{i=1}^{t} Li(x, ξi).   (2.2)
In this case, assuming that the realizations Li(x, ξi) form an i.i.d. process with E[Li(x, ξ)] < ∞ and Var(Li(x, ξ)) < ∞, then A2.1 holds due to the strong law of large numbers. However, it is also true that the rate of convergence is only O(1/√t) in the sense that

E[Ĵ(x) − E[L(x, ξ)]]² = (1/t) Var(L(x, ξ)) = O(1/t)   (2.3)
Note that for large systems, this rate of convergence is prohibitively slow. It implies that to obtain an
accurate performance estimate under a single parameter, very long simulations are required. Then,
it is easy to see that for systems with a large number of feasible allocations it would be practically
impossible to obtain estimates under all possible allocations.
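The estimator (2.2) and the rate (2.3) are easy to illustrate numerically. A minimal sketch (the Gaussian observation with mean 2.0 and unit variance is a hypothetical stand-in for L(x, ξ), not from the text): averaging t i.i.d. realizations gives a mean squared error close to Var(L)/t, so quadrupling the sample-path length only halves the RMS estimation error.

```python
import random

def sample_mean_estimate(L, t, seed=0):
    """Estimate J(x) = E[L(x, xi)] by averaging t i.i.d. realizations, as in (2.2)."""
    rng = random.Random(seed)
    return sum(L(rng) for _ in range(t)) / t

def L(rng):
    # Hypothetical cost observation: true mean 2.0, Var(L) = 1.0.
    return rng.gauss(2.0, 1.0)

# t * MSE should hover near Var(L) = 1.0 for every t, i.e. MSE = O(1/t)
# and hence RMS error O(1/sqrt(t)), as in (2.3).
for t in (100, 400, 1600):
    reps = [sample_mean_estimate(L, t, seed=s) for s in range(200)]
    mse = sum((r - 2.0) ** 2 for r in reps) / len(reps)
    print(t, round(t * mse, 2))
```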
Next, we present some of the techniques that were developed to solve the problem of stochastic
optimization.
2.2. Ordinal Comparison
The ordinal comparison technique [37] is based on two main principles.
1. Goal Softening.
2. Ordinal rather than cardinal optimization.
Goal softening reflects the realization that in many applications, rather than spending a lot of resources on finding the optimal allocation, it is often more desirable to find an allocation that is good enough with the minimum amount of effort; in other words, to find an allocation that is within the top α% of all possible designs.
The second principle of ordinal comparison suggests comparing the relative goodness (rank) between different designs without knowing the exact values of the corresponding performance measures.
For example, suppose that we want to choose between two allocations x1 and x2, and suppose that J(x1) < J(x2). Then, even when the performance estimates Ĵ(x) have a large noise component, it is highly probable that Ĵ(x1) < Ĵ(x2). This suggests that it is possible to perform the resource allocation without having to obtain accurate estimates of J(x).
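The strength of this principle is easy to demonstrate: even when the noise dwarfs the performance gap between two designs, the probability of ranking them correctly approaches one long before the cardinal estimates themselves become accurate. A minimal sketch (the Gaussian noise model, the gap d, and the noise level sigma are illustrative assumptions):

```python
import random

def correct_order_freq(d, sigma, t, trials=2000, seed=1):
    """Frequency with which t-sample estimates rank two designs correctly when
    J(x2) - J(x1) = d and each observation carries N(0, sigma^2) noise."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        j1 = sum(rng.gauss(0.0, sigma) for _ in range(t)) / t   # estimates J(x1) = 0
        j2 = sum(rng.gauss(d, sigma) for _ in range(t)) / t     # estimates J(x2) = d
        hits += j1 < j2
    return hits / trials

# Noise std 10x larger than the performance gap, yet a modest sample
# size already orders the two designs correctly most of the time.
print(correct_order_freq(d=0.1, sigma=1.0, t=400))
```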
Next, we present without proof two results from [22] that reveal some of the properties of ordinal comparison, which will prove useful in the sequel. The first lemma is a direct consequence of A2.1 and states that as the observation interval t increases, the probability that the performance estimates give the correct order goes to one.
Lemma 2.2.1 Let J(x1) < J(x2) and suppose that assumption A2.1 holds. Then,

lim_{t→∞} Pr[Ĵt(x1) ≥ Ĵt(x2)] = 0, and lim_{t→∞} Pr[Ĵt(x1) < Ĵt(x2)] = 1
The second lemma establishes the rate of convergence for comparing δ̂t against 0.

Lemma 2.2.2 Suppose that {δ̂t, t ≥ 0} is a stochastic process satisfying
(a) lim_{t→∞} δ̂t = δ, a.s.;
(b) lim_{t→∞} E[δ̂t] = δ;
(c) Var[δ̂t] = O(1/t).
If δ > 0, then Pr[δ̂t ≤ 0] = O(1/t).
The assumptions in Lemma 2.2.2 are very mild and almost always satisfied in the simulation or direct sample path observation of discrete-event dynamic systems.
Another interesting result proven by Dai [22] indicates that for the class of “regenerative systems” (see [43] for details) the order of the performance estimates converges at a rate which is exponentially fast. Finally, note that the principles and properties of ordinal comparison can be used to complement other optimization schemes. This is true of the algorithms described next, as well as of some of the schemes that will be developed later in the sequel.
2.3. Stochastic Ruler (SR)
The Stochastic Ruler [77] is motivated by the Simulated Annealing [1] method. In essence, this algorithm
defines a sequence of allocations {xi , i = 1, 2, · · ·} and for every allocation it defines a neighborhood
N(xi) ⊂ X. To determine the next allocation xi+1, the algorithm randomly picks an allocation y ∈ N(xi) and compares its performance H(y) to a random variable Θ(a, b), uniformly distributed in the interval (a, b) (Θ(a, b) represents the stochastic ruler). If the system's performance is better than the random variable, then SR adopts y as the new allocation (xi+1 = y); otherwise xi+1 = xi.
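The acceptance rule above can be sketched on a toy one-dimensional problem (the quadratic cost, the noise model, and the ruler range are illustrative assumptions; the full method in [77] includes additional details omitted here):

```python
import random

def stochastic_ruler(perf, x0, neighbors, a, b, iters=500, seed=2):
    """Sketch of the SR acceptance rule: adopt a randomly picked neighbor y
    only if its noisy cost beats a uniform 'ruler' draw Theta ~ U(a, b)."""
    rng = random.Random(seed)
    x = x0
    for _ in range(iters):
        y = rng.choice(neighbors(x))
        if perf(y, rng) < rng.uniform(a, b):   # beat the ruler -> adopt y
            x = y
    return x

# Toy problem: minimize (x - 7)^2 over {0, ..., 10} under additive noise.
# The ruler range (a, b) must cover the costs actually encountered --
# the a priori information SR cannot do without.
def noisy_cost(x, rng):
    return (x - 7) ** 2 + rng.gauss(0.0, 1.0)

nbrs = lambda x: [max(0, x - 1), min(10, x + 1)]
print(stochastic_ruler(noisy_cost, x0=10, neighbors=nbrs, a=0.0, b=10.0))
```

The chain typically settles in the low-cost neighborhood of 7; if (a, b) were set below the neighboring costs, no move would ever be accepted, which is exactly the range-information issue discussed next.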
Application of SR is complicated for two reasons. First, a priori information is needed on the
range of the performance estimates in order to determine the stochastic ruler range (a, b). Second,
it is necessary to define the neighborhood structure N(xi) for all i = 1, 2, ···. Clearly, identifying a good neighborhood structure will benefit the algorithm's performance; however, in general this is a very difficult task. These restrictions of SR have motivated the development of Stochastic Comparison, which is described next.
2.4. Stochastic Comparison (SC)
The Stochastic Comparison approach [32] uses the principles of random search [9] to overcome the shortcomings of SR. The SC approach also defines a sequence of allocations {xi, i = 1, 2, ···}, but in order to determine the next allocation, it randomly picks an allocation y from the entire search space and compares its performance to that of the current allocation. If the performance of the new allocation is better than that of the current allocation, then SC adopts the new allocation (xi+1 = y); otherwise xi+1 = xi.
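A sketch of SC on the same kind of toy problem (the quadratic cost and noise model are illustrative assumptions); note that neither a neighborhood structure nor a ruler range is required:

```python
import random

def stochastic_comparison(perf, space, iters=300, seed=3):
    """Sketch of SC: draw a candidate y uniformly from the entire search space
    and keep it if its noisy cost beats the incumbent's noisy cost."""
    rng = random.Random(seed)
    x = rng.choice(space)
    for _ in range(iters):
        y = rng.choice(space)
        if perf(y, rng) < perf(x, rng):   # noisy pairwise comparison
            x = y
    return x

def noisy_cost(x, rng):
    return (x - 7) ** 2 + rng.gauss(0.0, 1.0)

# The incumbent drifts to the low-cost designs near 7.
print(stochastic_comparison(noisy_cost, space=list(range(11))))
```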
SC eliminates the neighborhood identification problem by always selecting an allocation from
the entire search space. In other words, the neighborhood of any allocation includes every feasible
allocation. Furthermore, SC does not require any a priori information on the system performance
since the comparison is always between the performances of the current allocation and the allocation
under test. Different variations of SR and SC are proposed in [3] and in references therein.
2.5. Nested Partitioning (NP)
The Nested Partitioning approach [68, 67] combines the principles of random search with branch-and-bound techniques to determine the global optimum allocation. The algorithm consists of four basic steps: (i) partitioning, (ii) random sampling, (iii) identification of the most promising region, and (iv) backtracking. Specifically, the algorithm works as follows. First, it divides the entire search space into M regions σ_i^0, i = 1, ···, M, and randomly samples allocations from each region. Using the obtained samples, it determines the most promising region (say σ_1^0). Subsequently, it divides the selected region σ_1^0 into M smaller regions σ_i^1, i = 1, ···, M, and aggregates all other regions into a single region σ_{M+1}^1. Then it randomly samples allocations from all M + 1 regions and identifies the new most promising region. If the most promising region is any of the regions 1 to M, it divides that region into another M sub-regions and again aggregates the remaining regions into a single region M + 1. This continues until the most promising region is a singleton. In the event that the surrounding region M + 1 becomes the most promising region, NP backtracks to a larger region.
Implementation of the four steps of NP can vary depending on the application. Note however
that the region partitioning rule is crucial to the algorithm’s performance. If the partitioning is
such that most of the good allocations tend to be clustered in the same sub-region, it is likely that
the algorithm will concentrate its effort in these regions. On the other hand, a bad selection of the
region partitioning rule may have adverse effects on its performance. Unfortunately, identifying a
good region partitioning strategy is not a trivial task.
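The four NP steps can be sketched on a one-dimensional search space with M = 2 (the halving partition, the sampling budget, and backtracking straight to the full space are simplifying assumptions, as are the quadratic cost and noise model):

```python
import random

def nested_partitioning(noisy_cost, lo, hi, samples=8, rounds=200, seed=4):
    """1-D sketch of NP with M = 2: split the current region in half, sample
    each half plus the aggregated surrounding region, recurse into the most
    promising half, and backtrack (here: to the full space) if the
    surrounding region wins."""
    rng = random.Random(seed)
    full = (lo, hi)
    region = full
    for _ in range(rounds):
        a, b = region
        if b - a <= 1:
            return a                           # most promising region is a singleton
        mid = (a + b) // 2
        halves = [(a, mid), (mid, b)]
        surround = [(lo, a), (b, hi)]          # everything outside the current region

        def best_sample(parts):
            pts = [rng.randrange(p, q) for p, q in parts if q > p for _ in range(samples)]
            return min((noisy_cost(x, rng) for x in pts), default=float("inf"))

        scores = [best_sample([h]) for h in halves] + [best_sample(surround)]
        k = scores.index(min(scores))
        region = halves[k] if k < 2 else full  # k == 2: backtrack
    return (region[0] + region[1]) // 2        # round budget exhausted

def noisy_cost(x, rng):
    return (x - 37) ** 2 + rng.gauss(0.0, 2.0)

print(nested_partitioning(noisy_cost, 0, 64))
```

Because the halving rule clusters good designs in the same sub-region here, the search concentrates near the optimum at 37; a partition that scattered them would degrade it, as the paragraph above warns.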
2.6. Multi-armed Bandit Theory
The bandit theory approach [28] addresses a slightly different problem from the stochastic optimization problem described earlier. In this case, rather than allocating several resources to the users, the goal is to determine how to dynamically allocate a single resource among all possible users so as to optimize the objective function over time.
In the basic version of the multi-armed bandit problem there are N possible choices, each carrying a random reward r_i, i = 1, ···, N, derived from a distribution f_{r_i}(r_i). At the nth iteration, the system reward R(n) is given by the reward of the selected choice, i.e., R(n) = r_j, where j is the selected choice. The objective is to optimize the discounted reward over an infinite horizon

R = Σ_{n=1}^{∞} β^n R(n)

where 0 < β < 1 is a discount factor.
The solution suggested by Gittins and Jones [29] involves the association of an index ν_i(n) with each option; the choice with the largest index is then selected. The calculation of the index ν_i(n), which depends on the underlying distributions of each option, is beyond the scope of this thesis and is omitted. Variations of the problem appear in [6] and references therein.
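The discounted-reward objective above can be illustrated with a toy policy comparison. Since Gittins indices are beyond this sketch, a naive play-the-best-empirical-mean heuristic stands in for the index policy (the arm means, noise level, and finite horizon are hypothetical choices, not from the text):

```python
import random

def discounted_reward(select, arm_means, beta=0.9, horizon=200, seed=5):
    """Accumulate R = sum_n beta^n R(n), where R(n) is the reward of the arm
    chosen by the policy `select` at iteration n."""
    rng = random.Random(seed)
    pulls = {a: [] for a in range(len(arm_means))}
    total = 0.0
    for n in range(1, horizon + 1):
        j = select(pulls, rng)
        r = rng.gauss(arm_means[j], 0.1)       # reward draw for the chosen arm
        pulls[j].append(r)
        total += beta ** n * r
    return total

means = [0.2, 0.5, 0.9]                        # unknown to the policies

def greedy(pulls, rng):
    # Try each arm once, then always play the best empirical mean.
    for a, history in pulls.items():
        if not history:
            return a
    return max(pulls, key=lambda a: sum(pulls[a]) / len(pulls[a]))

def uniform(pulls, rng):
    return rng.randrange(len(pulls))

print(round(discounted_reward(greedy, means), 2),
      round(discounted_reward(uniform, means), 2))
```

The policy that concentrates on the best arm collects a noticeably larger discounted reward than the one that spreads the single resource uniformly.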
2.7. Noise Effects
For all of the algorithms described above, in order to achieve convergence to the optimal allocation
it is necessary that the effect of noise in the performance estimates is gradually reduced. There are
several possible approaches that can allow us to achieve this goal. First, if the performance measure
of interest satisfies assumption A2.1, then, at every iteration one can increase the length of the
observation interval. This approach will be used in the methodologies that will be presented later in
Chapters 3 and 4.
Another possibility is the approach described in [31]. In this case, the observation interval is kept constant but the number of comparisons is increased: a new allocation is adopted only if its performance is found to be better than that of the current allocation more than M_k times, where M_k is a monotonically increasing sequence.
Chapter 3
DESCENT ALGORITHMS FOR DISCRETE-RESOURCE ALLOCATION
In this chapter we develop a descent optimization algorithm that can be used for discrete resource allocation problems with separable and convex structure. This algorithm is very efficient and can be used in real-time (on-line) applications. Furthermore, it is shown that it converges to the optimal allocation in both deterministic and stochastic environments.
3.1. Introduction
In this chapter we consider the problem of allocating K identical resources over N user classes so as to optimize some system performance measure (objective function). Let the resources be sequentially indexed so that the “allocation” is represented by the K-dimensional vector s = [s1, ···, sK]^T
where sj ∈ {1, · · · , N } is the user class index assigned to resource j. Let S be the finite set of feasible
resource allocations
S = {[s1 , · · · , sK ] : sj ∈ {1, · · · , N }}
where “feasible” means that the allocation may have to be chosen to satisfy some basic requirements
such as stability or fairness. Let Li (s) be the class i cost associated with the allocation vector s. The
class of resource allocation problems we consider is formulated as:
(RA1)   min_{s∈S} Σ_{i=1}^{N} Li(s)
(RA1) is a special case of a nonlinear integer programming problem (see [41, 59] and references
therein) and is in general NP-hard [41]. However, in some cases, depending upon the form of the
objective function (e.g., separability, convexity), efficient algorithms based on finite-stage dynamic
programming or generalized Lagrange relaxation methods are known (see [41] for a comprehensive
discussion on aspects of deterministic resource allocation algorithms). Alternatively, if no a priori
information is known about the structure of the problem, then some form of a search algorithm is
employed (e.g., Simulated Annealing [1], Genetic Algorithms [39]).
3.2. Characterization of the Optimal Allocation
In order to specify the class of discrete resource allocation problems we shall study in this chapter,
we define
ni = Σ_{j=1}^{K} 1[sj = i],   i = 1, ···, N   (3.1)
where 1[·] is the standard indicator function and ni is simply the number of resources allocated to
user class i under some allocation s. We shall now make the following assumption:
A3.1: Li (s) depends only on the number of resources assigned to class i, i.e., Li (s) = Li (ni ).
This assumption asserts that resources are indistinguishable, as opposed to cases where the identity
of a resource assigned to user i affects that user’s cost function. Even though A3.1 limits the applicability of the approach to a class of resource allocation problems, it is also true that this class
includes a number of interesting problems. Examples include: (a) Buffer allocation in parallel queueing systems where the blocking probability is a function of the number of buffer slots assigned to each
server (for details, see [54]), (b) Cellular systems where the call loss probability of each cell depends
only on the number of channels assigned to each cell (see also Chapter 7), (c) Scheduling packet
transmissions in a mobile radio network, where the resources are the time slots in a transmission
frame (see [15, 76, 55]).
Under A3.1, we can see that an allocation written as the K-dimensional vector s = [s1 , · · · , sK ],
can be replaced by the N -dimensional vector s = [n1 , · · · , nN ]. In this case, the resource allocation
problem (RA1) is reformulated as
(RA2)   min_{s∈S} Σ_{i=1}^{N} Li(ni)   s.t.   Σ_{i=1}^{N} ni = K
Although (RA2) is not NP-hard, the state space is still combinatorially explosive (|S| = (K+N−1)!/(K!(N−1)!)),
so that an exhaustive search of the state space is generally not feasible. Several off-line algorithms
based on the theory of generalized Lagrange multipliers are presented in Chapter 4 of [41] where the
optimal solution can be determined in polynomial time. Our objective however, is to solve stochastic
resource allocation problems where the cost function is not available in closed-form. This requires that
(a) We resort to estimates of Li (ni ) and ∆Li (ni ) for all i = 1, · · · , N over some observation period,
and (b) Iterate after every such observation period by adjusting the allocation which, therefore,
must remain feasible at every step of this process. It is for this reason that we wish to derive on-line
discrete optimization algorithms. We shall first deal with issue (b) above in section 3.3. We will then
address issue (a) in section 3.4.
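To get a feel for the combinatorial explosion of |S| noted above, the closed form can be evaluated directly (a quick illustration, not part of the development):

```python
from math import comb

def num_allocations(K, N):
    # |S| = (K + N - 1)! / (K! (N - 1)!): allocations of K identical
    # resources among N user classes.
    return comb(K + N - 1, K)

print(num_allocations(10, 5))     # 1001 feasible allocations
print(num_allocations(50, 20))    # already far too many to enumerate
```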
In addition to A3.1, we will make the following assumption regarding the cost functions of interest:
A3.2: For all i = 1, ···, N, Li(ni) is such that ∆Li(ni + 1) > ∆Li(ni), where

∆Li(ni) = Li(ni) − Li(ni − 1),   ni = 1, ···, K   (3.2)

with boundary values ∆Li(0) ≡ −∞ and ∆Li(K + 1) ≡ ∞.
This assumption is the analog of the usual convexity/concavity requirement in the vast majority of gradient-driven optimization schemes over continuous search spaces. It is the assumption that typically allows an extremum to be a global optimum; the alternative is to settle for local optima. From a
practical standpoint, most common performance criteria in systems where resource allocation arises
are quantities such as throughput, mean delay, and blocking probability which generally satisfy such
properties. Next, we present two key results that will provide a stopping condition for the proposed
algorithm.
Theorem 3.2.1 Under assumptions A3.1–A3.2, an allocation s̄ = [n̄1, ···, n̄N] is a global optimum (i.e., a solution of (RA2)) if and only if

∆Li(n̄i + 1) ≥ ∆Lj(n̄j) for any i, j = 1, ···, N   (3.3)

The proof of the theorem is included in Appendix B.
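Condition (3.3) is straightforward to check mechanically for a candidate allocation. A sketch on a toy separable convex problem (the quadratic class costs are illustrative assumptions):

```python
def delta(L, n):
    # Delta L(n) = L(n) - L(n-1), with the boundary convention Delta L(0) = -inf.
    return float("-inf") if n == 0 else L(n) - L(n - 1)

def is_optimal(Ls, alloc, K):
    """True iff condition (3.3) holds: Delta L_i(n_i + 1) >= Delta L_j(n_j)
    for every pair of user classes i, j."""
    assert sum(alloc) == K
    return all(delta(Li, ni + 1) >= delta(Lj, nj)
               for Li, ni in zip(Ls, alloc)
               for Lj, nj in zip(Ls, alloc))

# Two users with convex costs L1(n) = (n - 3)^2, L2(n) = (n - 1)^2 and K = 4.
Ls = [lambda n: (n - 3) ** 2, lambda n: (n - 1) ** 2]
print(is_optimal(Ls, [3, 1], 4))   # True: each user sits at its cost minimum
print(is_optimal(Ls, [0, 4], 4))   # False: moving a resource to user 1 helps
```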
Note that Theorem 3.2.1 gives a necessary and sufficient condition that the optimal allocation must satisfy in terms of the cost differences ∆Li(·), i = 1, ···, N, in only a small set of feasible allocations, namely the neighborhood of the optimal allocation B(s∗)¹. Also, s∗ denotes the solution of the optimization problem (RA2), i.e., s∗ is such that L(s∗) ≤ L(s) for all s ∈ S, where S is redefined as

S = { s = [n1, ···, nN] : Σ_{i=1}^{N} ni = K }.
Next, we will derive a different necessary and sufficient condition for global optimality in solving
(RA2), expressed in terms of maxi=1,···,N {∆Li (ni )}. As will be seen in the proof of Theorem 3.2.2,
necessity still relies on assumptions A3.1-A3.2 alone, but sufficiency requires an additional technical
condition:
A3.3: Let [n̄1, ···, n̄N] be an allocation such that

max_{i=1,···,N} {∆Li(n̄i)} ≤ max_{i=1,···,N} {∆Li(ni)}

for all s = [n1, ···, nN] ∈ S. If i∗ = arg max_{i=1,···,N} {∆Li(n̄i)}, then ∆Li∗(n̄i∗) > ∆Lj(n̄j) for all j = 1, ···, N, j ≠ i∗.
This assumption guarantees a unique solution to (RA2) and, as mentioned above, it is only used
to prove sufficiency of Theorem 3.2.2. If the condition is violated, i.e. there is a set of optimal
allocations, then, in the deterministic case, the algorithm will converge to one member of the set
dependent on the initial allocation. In the stochastic case, the algorithm will oscillate between the
members of the set as mentioned in the remark at the end of Section 3.4.
Theorem 3.2.2 Under assumptions A3.1–A3.2, if an allocation s̄ = [n̄1, ···, n̄N] is a global optimum (i.e., a solution of (RA2)) then

max_{i=1,···,N} {∆Li(n̄i)} ≤ max_{i=1,···,N} {∆Li(ni)}   (3.4)

for all s ∈ S. If in addition A3.3 holds, then (3.4) also implies that s̄ is a solution of (RA2).
¹ B(s) = {x : x = s + ei − ej, i, j = 1, ···, N}, where ei is an N-dimensional vector with all of its elements equal to 0 except the ith one, which is equal to 1.
The proof of the theorem is included in Appendix B.
Note that Theorem 3.2.2 provides a characterization of the optimal allocation in terms of only
the largest ∆Li (·) element in the allocation. What is interesting about condition (3.4) is that it can
be interpreted as the discrete analog to continuous variable optimization problems. In such problems
the partial derivatives of the cost function with respect to control variables must be equal (e.g.,
see [27]). In order to derive a similar result for a discrete optimization problem, one must replace
derivatives by finite cost differences, such as the quantities ∆Li (·), i = 1, · · · , N , defined in (3.2)
and keep them as close as possible. This is expressed in terms of the maximum value of such finite
differences at the optimal point in condition (3.4).
Having established some necessary and sufficient conditions that characterize the optimal allocation, namely Theorems 3.2.1 and 3.2.2, our next task is to develop an algorithm that iteratively
adjusts allocations on-line. These conditions then serve to determine a stopping condition for such
an algorithm, guaranteeing that an optimal allocation has been found.
3.3. Deterministic Descent On-Line Optimization Algorithm
In this section, we present an iterative process, referred to as the deterministic descent optimization process (D-DOP), for determining the solution to (RA2). In particular, we generate sequences {ni,k}, k = 0, 1, ···, for each i = 1, ···, N as follows. We define a set C0 = {1, ···, N} and initialize all sequences so that the allocation s0 = [n1,0, ···, nN,0] is feasible. Then, let

ni,k+1 = ni,k − 1  if i = i∗k and δk > 0;   ni,k + 1  if i = j∗k and δk > 0;   ni,k  otherwise   (3.5)

and

Ck+1 = Ck − {j∗k}  if δk ≤ 0;   Ck  otherwise   (3.6)
where i∗k, j∗k and δk are defined as follows:

i∗k = arg max_{i∈Ck} {∆Li(ni,k)}   (3.7)

j∗k = arg min_{i∈Ck} {∆Li(ni,k)}   (3.8)

δk = ∆Li∗k(ni∗k,k) − ∆Lj∗k(nj∗k,k + 1)   (3.9)
To complete the specification of this process, we will henceforth set ∆Li (0) ≡ −∞ for all i = 1, · · · , N .
Finally, note that ties in equations (3.7) and (3.8) (i.e., if more than one index qualifies as either i∗k or j∗k) can be broken arbitrarily, but for simplicity we will adopt the following convention:

If i∗k = p and δk ≤ 0, then i∗k+1 = p   (3.10)
This statement is trivial if the maximization in (3.7) gives a unique value. If, however, this is not
the case then we simply leave this index unchanged as long as δl ≤ 0 for l > k, which implies that
all ∆Li (ni,k ) values remain unchanged.
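The process (3.5)-(3.9) translates almost line-for-line into code. A sketch on a toy separable convex problem (the quadratic class costs and targets are illustrative assumptions; the actual D-DOP operates on whatever cost differences the application supplies):

```python
def delta_L(L, n):
    # Delta L(n) = L(n) - L(n-1), with Delta L(0) = -inf by convention.
    return float("-inf") if n == 0 else L(n) - L(n - 1)

def d_dop(Ls, alloc):
    """Sketch of the D-DOP iteration (3.5)-(3.9): exchange one resource from
    the least sensitive user j*_k to the most sensitive user i*_k while the
    potential improvement delta_k is positive; otherwise shrink C_k."""
    n = list(alloc)
    C = set(range(len(Ls)))
    while len(C) > 1:
        i_star = max(C, key=lambda i: delta_L(Ls[i], n[i]))        # (3.7)
        j_star = min(C, key=lambda i: delta_L(Ls[i], n[i]))        # (3.8)
        d = delta_L(Ls[i_star], n[i_star]) - delta_L(Ls[j_star], n[j_star] + 1)  # (3.9)
        if d > 0:
            n[i_star] -= 1                                         # (3.5)
            n[j_star] += 1
        else:
            C.discard(j_star)                                      # (3.6)
    return n

# Toy separable convex problem: minimize sum_i (n_i - t_i)^2 with K = 8.
targets = [5, 2, 1]
Ls = [lambda m, t=t: (m - t) ** 2 for t in targets]
print(d_dop(Ls, [8, 0, 0]))   # converges to the targets: [5, 2, 1]
```

Every intermediate allocation is feasible (the total K never changes), which is what makes the process usable on-line.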
3.3.1. Interpretation of D-DOP
Looking at (3.7), i∗k identifies the user “most sensitive” to the removal of a resource among those
users in the set Ck , while in (3.8), jk∗ identifies the user who is “least sensitive”. Then, (3.5) forces a
natural exchange of resources from the least to the most sensitive user at the kth step of this process,
provided the quantity δk is strictly positive (an interpretation of δk is provided below). Otherwise,
the allocation is unaffected, but the user with index jk∗ is removed from the set Ck through (3.6).
Thus, as the process evolves, users are gradually removed from this set. As we will show in the next
section, the process terminates in a finite number of steps when this set contains a single element
(user index), and the corresponding allocation is a globally optimal one.
As defined in (3.9), δk represents the “potential improvement” (cost reduction) incurred by a
transition from allocation sk to sk+1 . That is,
δk = L(sk) − L(sk+1)   (3.11)
which is seen as follows:

L(sk) − L(sk+1) = Σ_{i=1}^{N} Li(ni,k) − Σ_{i=1}^{N} Li(ni,k+1)
= Li∗k(ni∗k,k) + Lj∗k(nj∗k,k) − Li∗k(ni∗k,k − 1) − Lj∗k(nj∗k,k + 1)
= ∆Li∗k(ni∗k,k) − ∆Lj∗k(nj∗k,k + 1) = δk
Note that if δk > 0, which implies that the cost will be reduced by allocation sk+1 , then the reallocation is implemented in (3.5). If, on the other hand, δk ≤ 0, this implies no cost reduction under
the candidate allocation sk+1 , and sk remains unchanged as seen in (3.5).
3.3.2. Properties of the Process D-DOP
We begin by establishing in Lemma 3.3.1 below a number of properties that the sequences {ni,k } and
{Ck } in (3.5) and (3.6) respectively satisfy. Based on these properties, we will show that {sk }, where
sk = [n1,k , · · · , nN,k ], converges to a globally optimal allocation. We will also use them to determine
an upper bound for the number of steps required to reach this global optimum.
Lemma 3.3.1 The D-DOP process defined by (3.5)-(3.9) is characterized by the following properties:
P1. ∆Li∗k(·) is non-increasing in k = 0, 1, ···, that is,

∆Li∗k+1(ni∗k+1,k+1) ≤ ∆Li∗k(ni∗k,k) for all k = 0, 1, ···   (3.12)
P2. ∆Lj∗k(·) is non-decreasing in k = 0, 1, ···, that is,

∆Lj∗k+1(nj∗k+1,k+1) ≥ ∆Lj∗k(nj∗k,k) for all k = 0, 1, ···   (3.13)
P3. Let p = i∗k and suppose there exists some m > k such that j∗m = p and p ≠ i∗l for all k < l < m. Then,

Cm+1 = Cm − {p}   (3.14)
P4. Let p = j∗k and suppose there exists some m > k such that i∗m = p and p ≠ j∗l for all k < l < m. Then, there exists some q, 1 ≤ q ≤ N − 1, such that

Cm+q+1 = Cm+q − {p}  if |Cm+q+1| > 1;   Cm+q+1 = {p}  if |Cm+q+1| = 1   (3.15)
P5. Let i∗k = p. Then,

np,m ≤ np,k for any k = 0, 1, ··· and for all m > k   (3.16)

P6. Let j∗k = p. Then,

np,m ≥ np,k for any k = 0, 1, ··· and for all m > k   (3.17)
The proof of the lemma is included in Appendix B.
Properties P3 and P4 are particularly important in characterizing the behavior of D-DOP and in establishing the main results of this section. In particular, P3 states that if any user p is identified as i∗k at any step k of the process and as j∗m, m > k, then this user is immediately removed from the Cm set. This also implies that np,m is the number of resources finally allocated to p. Property P4 is a dual statement, with a different implication. Once a user p is identified as j∗k at some step k and as i∗m, m > k, then there are two possibilities: either p will be the only user left in Cl, l > m, and, therefore, the allocation process will terminate, or p will be removed from Cl for some m < l < m + N − 1.
This discussion also serves to point out an important difference between P5 and P6, which, at
first sight, seem exact duals of each other. In P5, a user p = i∗k for some k will never in the future
take any resources from other users. On the other hand, in P6 it is not true that a user p = jk∗ will
never in the future give away any resources to other users; rather, as seen in P4, user p may give
away at most one resource to other users. This happens if δm > 0 when p = i∗m , m > k, as is clear
from the proof of P4, since np,m = np,k+1 = np,k + 1 and then np,m+1 = np,m − 1.
3.3.3. Convergence of the D-DOP Process
The next result establishes an upper bound on the number of steps required for the process D-DOP to converge to a final allocation, where a final allocation sL is defined to be one at step L with |CL| = 1. Furthermore, in this section we show that the final allocation is also a global optimum.
Lemma 3.3.2 The D-DOP process reaches a final state (s̄, C̄) in M steps such that |C̄| = 1 and M ≤ K + 2(N − 1).
The proof of the lemma is included in Appendix B.
Theorem 3.3.1 Let s̄ = [n̄1, ···, n̄N] be the final allocation of the D-DOP process. Then, s̄ is a global optimum (i.e., a solution of (RA2)).
The proof of the theorem is also included in Appendix B.
Corollary 3.3.1 The D-DOP process defines a descent algorithm, i.e., L(sk ) ≥ L(sl ) for any l > k
The proof of the corollary follows immediately from equations (3.5) and (3.9) and the fact that
δk = L(sk ) − L(sk+1 ) in (3.11).
3.4. Stochastic On-Line Optimization Algorithm
In this section, we turn our attention to discrete resource allocation performed in a stochastic setting.
In this case, as mentioned in Chapter 2, the cost function L(s) is usually an expectation whose exact
value is difficult to obtain (except for very simple models). We therefore resort to estimates of L(s) which may be obtained through simulation or through direct on-line observation of a system. In either case, we denote by L̂t(s) an estimate of L(s) based on observing a sample path for a time period of length t. We are now faced with the problem of finding the optimal allocation using the noisy information L̂t(s).
It should be clear that D-DOP described by equations (3.5)-(3.9) does not work in a stochastic environment if we simply replace L(s) by its estimate L̂t(s). For instance, suppose that δk > 0; however, due to noise, we may obtain an estimate of δk, denoted by δ̂k, such that δ̂k ≤ 0. In this case, rather than reallocating resources, we would remove a user from the C set permanently. This implies that this user can never receive any more resources, hence the optimal allocation will never be reached.
Two modifications are necessary. First, we will provide a mechanism through which users can
re-enter the C set to compensate for the case where a user is erroneously removed because of noise.
Second, we will progressively improve the estimates of the cost differences ∆L(s) so as to eliminate
the effect of estimation noise; this can often be achieved by increasing the observed sample path
length over which an estimate is taken. We will henceforth denote the length of such a sample path
at the kth iteration of our process by f (k).
The following is the modified stochastic descent optimization process (S-DOP), adjusted for a stochastic environment. The state is now denoted by {(ŝk, Ĉk)}, with ŝk = [n̂1,k, ···, n̂N,k]. After proper initialization, at the kth iteration we set:

n̂i,k+1 = n̂i,k − 1  if i = î∗k and δ̂k(î∗k, ĵ∗k) > 0;   n̂i,k + 1  if i = ĵ∗k and δ̂k(î∗k, ĵ∗k) > 0;   n̂i,k  otherwise   (3.18)

and

Ĉk+1 = Ĉk − {ĵ∗k}  if δ̂k(î∗k, ĵ∗k) ≤ 0;   C0  if |Ĉk| = 1;   Ĉk  otherwise   (3.19)

where

î∗k = arg max_{i∈Ĉk} {∆L̂i^{f(k)}(n̂i,k)}   (3.20)

ĵ∗k = arg min_{i∈Ĉk} {∆L̂i^{f(k)}(n̂i,k)}   (3.21)

δ̂k(î∗k, ĵ∗k) = ∆L̂_{î∗k}^{f(k)}(n̂_{î∗k,k}) − ∆L̂_{ĵ∗k}^{f(k)}(n̂_{ĵ∗k,k} + 1)   (3.22)
It is clear that equations (3.18)-(3.22) define a Markov process {(ŝk, Ĉk)}, whose state transition probability matrix is determined by î∗k, ĵ∗k, and δ̂k(î∗k, ĵ∗k). Before proceeding, let us point out that the only structural difference in S-DOP compared to the deterministic process D-DOP of the previous section occurs in equation (3.19), where we reset the Ĉk set every time it contains only one element. By doing so, we allow users that have been erroneously removed from the Ĉk set due to
noise to re-enter the user set at the next step. Another difference is of course that the actual values of all ∆Li(·) are now replaced by their estimates, ∆L̂i^{f(k)}(·). An implementation of S-DOP is included in Appendix A.
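A sketch of the S-DOP iteration on the same kind of toy problem, with an additive-noise model whose magnitude shrinks as the observation length f(k) grows (the noise model, the choice f(k) = k, and the quadratic class costs are all illustrative assumptions, not the dissertation's estimator):

```python
import random

def s_dop(Ls, alloc, iters=400, seed=6):
    """Sketch of the S-DOP iteration (3.18)-(3.22): like D-DOP but driven by
    noisy estimates of the cost differences, with the candidate set reset
    whenever it shrinks to one element so users can re-enter."""
    rng = random.Random(seed)
    N = len(Ls)
    n = list(alloc)
    C = set(range(N))
    for k in range(1, iters + 1):
        f_k = k                                    # growing observation interval

        def d_hat(i, m):
            # Noisy estimate of Delta L_i(m); noise shrinks as f(k) grows.
            if m == 0:
                return float("-inf")
            return Ls[i](m) - Ls[i](m - 1) + rng.gauss(0.0, 4.0 / f_k ** 0.5)

        est = {i: d_hat(i, n[i]) for i in C}
        i_s = max(C, key=est.get)                  # (3.20)
        j_s = min(C, key=est.get)                  # (3.21)
        d = est[i_s] - d_hat(j_s, n[j_s] + 1)      # (3.22)
        if d > 0:
            n[i_s] -= 1                            # (3.18): exchange one resource
            n[j_s] += 1
        else:
            C.discard(j_s)                         # (3.19): drop least sensitive user
        if len(C) <= 1:
            C = set(range(N))                      # (3.19): reset, users re-enter
    return n

targets = [5, 2, 1]
Ls = [lambda m, t=t: (m - t) ** 2 for t in targets]
print(s_dop(Ls, [8, 0, 0]))
```

Because the iteration is driven by ordinal comparisons of the estimates, early wrong moves caused by noise are corrected once the observation interval grows, and the allocation settles at the optimum of the toy problem.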
3.4.1. Properties of the S-DOP Process
Before we begin describing the properties of S-DOP, let us make some assumptions on the behavior of the objective function under consideration. As stated earlier, the second modification we impose is to eliminate the effect of estimation noise by increasing the observed sample path length as the number of iterations increases. In addition to the ergodicity assumption (A2.1), we make the following assumption.
A3.4: Let δk(i, j) = ∆Li(n̂i,k) − ∆Lj(n̂j,k + 1). For every δk(î∗k, ĵ∗k) = 0, there is a constant p0 such that

Pr[δ̂k(î∗k, ĵ∗k) ≤ 0 | δk(î∗k, ĵ∗k) = 0, (ŝk, Ĉk) = (s, C)] ≥ p0 > 0

for any k and any pair (s, C).
Assumption A3.4 guarantees that an estimate does not always give one-side-biased incorrect
information. This assumption is mild and it usually holds in the context of discrete-event dynamic
systems where such problems arise.
Now we are ready to describe some useful properties of the process {(ŝk, Ĉk)} in the form of the
following lemmas, the proofs of which are included in Appendix B. These properties pertain to the
asymptotic behavior of probabilities of certain events crucial in the behavior of {(ŝk, Ĉk)}.
First, let

  dk(s, C) = 1 − Pr[ L(ŝk+1) ≤ L(ŝk) | (ŝk, Ĉk) = (s, C) ]      (3.23)
so that [1 − dk (s, C)] is the probability that either some cost reduction or no change in cost results
from the kth transition in our process (i.e., the new allocation has at most the same cost). The next
lemma shows that the probability of this event is asymptotically 1, i.e., our process corresponds to
an asymptotic descent resource allocation algorithm.
Lemma 3.4.1 For any s = [n1, n2, ..., nN] ∈ S and any C,

  lim_{k→∞} dk(s, C) = 0      (3.24)

Moreover, define

  d̄k = sup_{i≥k} max_{(s,C)} di(s, C).      (3.25)

Then d̄k ≥ dk(s, C), d̄k is monotone decreasing, and

  lim_{k→∞} d̄k = 0.      (3.26)
Next, given any state (ŝk, Ĉk) reached by the S-DOP process, define

  Ak^max = { j | ∆Lj(n̂j,k) = max_i {∆Li(n̂i,k)} },      (3.27)

  Ak^min = { j | ∆Lj(n̂j,k) = min_i {∆Li(n̂i,k)} }.      (3.28)

Observe that Ak^max and Ak^min are, respectively, the sets of indices i*k and j*k defined in (3.7) and (3.8)
of the deterministic optimization process (with exact measurement). Recall that i*k, j*k need not be
unique at each step k, hence the need for these sets. We then define

  ak(s, C) = 1 − Pr[ î*k ∈ Ak^max | (ŝk, Ĉk) = (s, C) ],      (3.29)

  bk(s, C) = 1 − Pr[ ĵ*k ∈ Ak^min | (ŝk, Ĉk) = (s, C) ].      (3.30)

Here, [1 − ak(s, C)] is the probability that our stochastic resource allocation process at step k correctly
identifies an index î*k as belonging to the set Ak^max (similarly for [1 − bk(s, C)]).
Lemma 3.4.2 Suppose that Assumption A2.1 holds. Then, for every pair (s, C), we have

  lim_{k→∞} ak(s, C) = 0,   lim_{k→∞} bk(s, C) = 0      (3.31)

Moreover, define

  āk = sup_{i≥k} max_{(s,C)} ai(s, C),   b̄k = sup_{i≥k} max_{(s,C)} bi(s, C).      (3.32)

Then āk ≥ ak(s, C), b̄k ≥ bk(s, C), both āk and b̄k are monotone decreasing, and

  lim_{k→∞} āk = 0,   lim_{k→∞} b̄k = 0.      (3.33)

The proof of the first part of the lemma follows immediately from Lemma 2.2.1 given the definition
of the sets Ak^max and Ak^min. The second part then follows from the fact that, by their definitions, āk
and b̄k are monotone decreasing.
The last asymptotic property we need establishes the fact that there will be an improvement
(i.e., strictly lower cost) to an allocation at step k if that allocation is not optimal. However,
this improvement may not occur within a single step; rather, we show in Lemma 3.4.3 that such
improvement may require a number of steps, αk, beyond the kth step, where αk is an increasing
sequence that satisfies certain technical requirements (see Appendix B.7).
Next, define:

  ek(s, C) = 1 − Pr[ L(ŝ_{k+αk}) < L(ŝk) | (ŝk, Ĉk) = (s, C) ]      (3.34)

and observe that [1 − ek(s, C)] is the probability that strict improvement (i.e., strictly lower cost)
results when transitioning from a state whose allocation is not optimal to a future state αk
steps later. Lemma 3.4.3, the proof of which is included in Appendix B, establishes that this
probability is asymptotically 1.
Lemma 3.4.3 Suppose that the ergodicity assumption (A2.1) as well as A3.4 hold. For any allocation s = [n1, ..., nN] ≠ s* and any set C,

  lim_{k→∞} ek(s, C) = 0.      (3.35)

Moreover, define

  ēk = sup_{i≥k} max_{s∈S,C} ei(s, C).      (3.36)

Then ēk ≥ ek(s, C), ēk is monotone decreasing, and

  lim_{k→∞} ēk = 0.      (3.37)
3.4.2. Convergence of the S-DOP Process
With the help of the properties established in the previous section, we can prove the following
theorem on the convergence of the S-DOP process, the proof of which is included in Appendix B.
Theorem 3.4.1 Suppose that the ergodicity assumption A2.1 and assumption A3.4 hold and that the optimum
s* is unique. Then the S-DOP process described by equations (3.18)-(3.22) converges in probability
to the optimal allocation s*.
Remark: If the optimal allocation s∗ is not unique, the analysis above can be extended to show
that convergence is to a set of “equivalent” allocations as long as each optimum is neighboring at
least one other optimum. When this arises in practice, what we often observe is oscillations between
allocations that all yield optimal performance.
3.4.3. A Stronger Convergence Result
By proper selection of the sample path length f(k), i.e., the kth estimation period, and under the
additional mild assumptions of Lemma 2.2.2, we can show that the following lemma holds:

Lemma 3.4.4 Assume that, for every i, the estimate L̂i^t(ni) satisfies the assumptions of Lemma 2.2.2.
Then, for any i, j, i ≠ j,

  Pr[ ∆L̂i^t(ni) ≥ ∆L̂j^t(nj) ] = O(1/t)

and

  Pr[ ∆L̂i^t(ni) < ∆L̂j^t(nj) ] = 1 − O(1/t)

provided that ∆Li(ni) < ∆Lj(nj).
Using Lemma 3.4.4, we can show that the S-DOP process converges almost surely. This result
is formally stated in the following theorem, the proof of which is included in Appendix B.
Theorem 3.4.2 Suppose A3.1-A3.3 and the assumptions of Lemma 3.4.4 hold. If f(k) ≥ k^{1+c}
for some constant c > 0, then the S-DOP process converges almost surely to the global optimal
allocation.
3.5. Future Directions
The algorithms described in this chapter are easy to implement either on-line or off-line and are
robust with respect to estimation noise. In addition, they converge fast and, as shown in [23],
for the class of regenerative systems they converge exponentially fast. Given these advantages, it
is interesting to investigate whether such algorithms can work for more general systems that do not fall under
(RA2) and/or systems that violate assumptions A3.1 and A3.2 (i.e., systems that are neither convex
nor separable).
The first question that arises is what would happen if we simply applied either D-DOP or S-DOP
to a general system. The answer is that there are two potential problems:
1. The algorithm might oscillate between two or more allocations and so it will never converge.
2. The algorithm may converge to some allocation other than the global optimum (we will refer
to such allocations as "local" optima).
For the first problem, we can find a quick fix using the properties that were derived in Section 3.3.2.
Specifically, properties P3 and P5 state that once a user has given up one or more resources, it cannot
receive any resource back. Enforcing such a policy guarantees that the algorithm will converge to
an allocation; however, there is no guarantee that such an allocation is going to be the global optimum.
To solve the second problem, i.e., to escape a local optimum, we can use principles from
Random Search techniques [37]. The basic idea is to randomly pick an initial allocation and then let
D-DOP or S-DOP evolve until they reach their final allocation. Once this allocation is reached,
we randomly pick a new initial allocation and repeat the same process.
Figure 3.1 shows several sample paths of this algorithm when it is used to find the minimum of
the deterministic function shown below:

                       { 10                                                     if x1 ≤ 2
                       { 5 |cos(0.01π x2 x3) + cos(0.005π x4)|                  if 2 < x1 ≤ 15
  f(x1, x2, x3, x4) =  { (1500 / (2x1 + 3x2 + 4x3 + 5x4)²)                      if 15 < x1 ≤ 25      (3.38)
                       {   · (5/(1 + x2) + 5/(1 + x3))
                       { 9                                                      if x1 > 25

where x1, ..., x4 are integers such that Σ_{i=1}^4 xi = 30 and xi ≥ 0, i = 1, ..., 4. The minimum value
for this problem is 0 and it occurs at eight allocations out of the possible 5,456. If we apply the
D-DOP scheme as presented in Section 3.3 then it does not converge because the cosine function
causes the algorithm to oscillate. Thus we enforce P3 and P5 (i.e., we do not allow a user to get
a resource if it has given up one or more resources). As seen in Figure 3.1, the modified D-DOP
starts at a bad allocation and quickly converges to a local minimum. At this point, in order to get
out of the local minimum, it randomly selects different initial allocations and implements D-DOP
until one results in an allocation with better performance.
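The pieces of (3.38) and the counts quoted above can be checked directly by brute force. A short Python sketch follows; the grouping of the third branch is a reconstruction from the transcription, so treat it as an assumption.

```python
import math
from itertools import product

def f(x1, x2, x3, x4):
    """The deterministic test function of equation (3.38)."""
    if x1 <= 2:
        return 10
    if x1 <= 15:
        return 5 * abs(math.cos(0.01 * math.pi * x2 * x3)
                       + math.cos(0.005 * math.pi * x4))
    if x1 <= 25:
        # reconstructed grouping of the third branch (assumption)
        return (1500 / (2*x1 + 3*x2 + 4*x3 + 5*x4)**2) \
               * (5 / (1 + x2) + 5 / (1 + x3))
    return 9

# All feasible allocations: integers with x1 + x2 + x3 + x4 = 30, xi >= 0
feasible = [(a, b, c, 30 - a - b - c)
            for a, b, c in product(range(31), repeat=3) if a + b + c <= 30]
```

A scan over `feasible` confirms the text: there are 5,456 feasible allocations and the minimum value is 0, attained e.g. at x = (10, 10, 10, 0), where the two cosine terms cancel.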
3.6. Summary
In this chapter we considered a class of DES with separable convex cost functions, i.e., problems of the form
of (RA2) that satisfy assumptions A3.1 and A3.2. For this class of systems, we derived necessary and
sufficient conditions that the optimal allocation must satisfy. Based on these conditions, we developed
an optimization algorithm which, in a deterministic environment, i.e., when the objective function
is known with certainty, yields the optimal allocation in a finite number of steps. Subsequently,
we adapted the optimization algorithm for stochastic environments, i.e., when only noisy estimates
of the performance measure are available, and showed that it converges to the optimal allocation in
probability and, under some mild assumptions, almost surely as well. Finally, these algorithms have
several desirable properties. They are easy to implement and can be used for both on-line and
off-line applications. They converge fast and are robust with respect to estimation noise.
Figure 3.1: Evolutions of the modified D-DOP (cost vs. iteration)
Chapter 4
INCREMENTAL ALGORITHMS
FOR DISCRETE-RESOURCE
ALLOCATION
Not all problems have a nice separable structure. In this chapter we develop incremental optimization algorithms that can be applied to systems with non-separable structure. The optimality
properties of the algorithms are proven under a necessary and sufficient "smoothness condition".
Furthermore, it is shown that the incremental algorithms converge to the optimal allocation in probability and, under additional mild conditions, almost surely as well.
4.1. Problem Formulation
In this chapter we consider systems with performance measures J(x). The major difficulty here
is due to the fact that the performance measures are not separable. In other words, one cannot
express J(x) as J(x) = Σ_{i=1}^N Ji(xi); therefore the algorithms described in the previous chapter do
not directly apply. We will make use of the following definitions. First, ei = [0, ..., 0, 1, 0, ..., 0] is
an N-dimensional vector with all of its elements zero except the ith element which is equal to 1.
Second,

  ∆Ji(x) = J(x + ei) − J(x)      (4.1)

is the change in J(x) due to the addition of a new resource to the ith element of an allocation
x = [x1, ..., xN]. In other words, it is the sensitivity of J(x) with respect to xi. Finally, let

  Ak = { x : Σ_{i=1}^N xi = k, xi ≥ 0 },   k = 0, 1, ...

be the set of all possible allocations of k resources to N stages. Using the above definitions, the
optimization problem is formally stated as:

  (RA3)   max_{x∈AK} J(x)

In addition, we define the following conditions that apply to J(x):
• Smoothness Condition or Condition (S):
If J(x*) ≥ J(x) for some x* ∈ Ak and any x ∈ Ak, k = 1, ..., K, then

  max_{i=1,...,N} J(x* + ei) ≥ max_{i=1,...,N} J(x + ei)      (4.2)

• Complementary Smoothness Condition or Condition (S̄):
If J(x*) ≤ J(x) for some x* ∈ Ak and any x ∈ Ak, k = 1, ..., K, then

  min_{i=1,...,N} J(x* + ei) ≤ min_{i=1,...,N} J(x + ei)      (4.3)

• Uniqueness Condition or Condition (U):
Let i* = arg max_{i=1,...,N} {∆Ji(x)}; then

  ∆Ji*(x) > ∆Jj(x)      (4.4)

for any x ∈ Ak, k = 1, ..., K, and any j ≠ i*.

Conditions (S) and (S̄) imply that if in an optimal allocation of k resources a user i is given ni
resources, then, in an optimal allocation of k + 1 resources, user i will receive at least ni resources.
These conditions might sound restrictive, but as indicated in a later chapter, they are satisfied by a
wide range of systems (see Section 6.2). Condition (U) requires that at every allocation the maximum
finite difference as defined in (4.1) is unique. This is a rather technical condition, as will become
clear in the sequel, and it may be relaxed as shown in Section 4.2.3.
4.2. Deterministic Case
Problem (RA3) falls in the class of discrete resource allocation problems. Under conditions (S)
and (U), the following simple incremental allocation process similar to one found in [41] provides an
optimal allocation of K resources in K steps.
4.2.1. Deterministic Incremental Optimization Algorithm (DIO)
Define the sequence {xk}, k = 0, ..., K, such that

  xk+1 = xk + e_{i*k}      (4.5)

where

  i*k = arg max_{i=1,...,N} {∆Ji(xk)}      (4.6)

and x0 := [0, ..., 0]. After K steps, xK is the optimal solution of (RA3) as shown in the theorem
that follows.

Theorem 4.2.1 For any k = 0, 1, 2, ..., xk defined in (4.5) yields a solution to problem (RA3) if
and only if J(x) satisfies conditions (S) and (U).
The proof of this theorem is included in Appendix C.
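The recursion (4.5)-(4.6) is a greedy incremental allocation and can be sketched in a few lines of Python; here delta_J is a caller-supplied finite-difference oracle for (4.1), and all names are hypothetical.

```python
def dio(delta_J, N, K):
    """Deterministic Incremental Optimization (DIO), equations (4.5)-(4.6).

    delta_J : delta_J(x, i) -> J(x + e_i) - J(x), the finite difference (4.1)
    N       : number of stages
    K       : total number of resources to allocate
    """
    x = [0] * N                                   # x0 = [0, ..., 0]
    for _ in range(K):
        # (4.6): the stage with the largest marginal gain gets the next resource
        i_star = max(range(N), key=lambda i: delta_J(x, i))
        x[i_star] += 1                            # (4.5)
    return x
```

Replacing max by min gives the complementary algorithm of the next section. As an illustration (an assumed example, not from the text), for the concave separable measure J(x) = −Σi (xi − ti)² the finite difference is ∆Ji(x) = 2(ti − xi) − 1, and the K greedy steps land on the optimal integer split.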
It should come as no surprise that a similar algorithm will deliver an allocation that minimizes
an objective function satisfying conditions (S̄) and (U).
4.2.2. Complementary Deterministic Incremental Optimization Algorithm (D̄IO)
Consider the complementary (RA3) problem

  (R̄A3)   min_{x∈AK} J(x)

To solve (R̄A3), define the sequence {xk}, k = 0, ..., K, such that

  xk+1 = xk + e_{i*k}      (4.7)

where

  i*k = arg min_{i=1,...,N} {∆Ji(xk)}      (4.8)

and x0 := [0, ..., 0]. After K steps, xK is the optimal solution of (R̄A3) as shown in the theorem
that follows.

Theorem 4.2.2 For any k = 0, 1, 2, ..., xk defined in (4.7) yields a solution to problem (R̄A3)
if and only if J(x) satisfies conditions (S̄) and (U).
The proof of this theorem is similar to the proof of Theorem 4.2.1 and it is omitted.
Remarks: If there are K available resources to be allocated to N stages, then the DIO as well as the D̄IO processes
require K steps before they deliver the optimal allocation. In contrast, exhaustive search
requires a number of steps which is combinatorially explosive, (K + N − 1)! / ((N − 1)! K!). It is possible to relax the
Uniqueness condition (U) through a straightforward extension of the DIO algorithm, as described
in the next section.
4.2.3. Extension of the Incremental Optimization Algorithms
Suppose that the sequence {xk} in (4.5) yields x̄k after k steps and assume that (4.6) gives i*k = i = j
for two indices i, j ∈ {1, ..., N}. In this case, it is clear that J(x̄k + ei) = J(x̄k + ej), but the process has
no way of distinguishing between i and j in order to define a unique new state xk+1 given xk = x̄k.
Note also that random selection cannot guarantee convergence to the optimal since it is possible
that at the next iteration only one of the two allocations (either x̄k + ei or x̄k + ej) can yield the
optimum. Since there is inadequate information to choose between i and j, it is natural to postpone
the decision until more information is available. To achieve this we modify the process as described
next, by using a recursion on a set of allocations Uk ⊆ Ak. In particular, we define a sequence of sets
{Uk}, k = 0, ..., K, such that

  Uk+1 = { xk + ei | ∆Ji(xk) = ∆J_{i*k}(xk), i = 1, ..., N, xk ∈ Uk }      (4.9)

where

  i*k = arg max_{i=1,...,N; xk∈Uk} {∆Ji(xk)}      (4.10)

and U0 = {x0}, x0 = [0, 0, ..., 0]. After K steps, it is easy to see that any allocation in UK is an
optimal solution to (RA3). The extra cost incurred by this scheme compared to (4.5)-(4.6) involves
storing additional information. It is straightforward to show that a similar extension applies to
(D̄IO) and so it is omitted.
4.3. Stochastic Case
In this section we focus our attention on the resource allocation problem in a stochastic environment.
Following the definitions of Chapter 2, we assume that the performance measure is of the form of
an expectation, J(x) = E[L(x)], where L(x) is a sample function used as the noisy performance
estimate. The problem then is to determine the optimal resource allocation based on a scheme
similar to the DIO algorithm defined by (4.5)-(4.6), now driven by estimates of J(xk). In particular,
let Ĵt(xk) denote a noisy estimate of J(xk) obtained through simulation or on-line observation of
the system over an "estimation period" t. Clearly, the DIO algorithm (as well as D̄IO and their
extensions of Section 4.2.3) can no longer guarantee convergence in such a stochastic setting. For
instance, suppose that at the kth step of the allocation process i*k = j; however, due to noise, we
obtain an estimate î*k = m ≠ j. In this case, the mth stage will get an additional resource, whereas
it is possible that at the optimal allocation the mth stage has only as many resources as it had
prior to the kth iteration. Since there is no way of reallocating resources to another stage, the
optimal allocation will never be reached. With this observation in mind, we introduce a number of
modifications to the DIO algorithm. First, as in the previous chapter, there should be a mechanism
through which resources erroneously allocated to some stage are reallocated to other stages. Second,
it must be possible to progressively improve the performance estimates so as to eliminate the effects of
estimation noise. Toward this goal, let f(l) denote the length of the sample path on the lth iteration
and let it be such that lim_{l→∞} f(l) = ∞. We then define a stochastic process {x̂k,l}, k = 0, ..., K,
l = 1, 2, ..., as follows:
  x̂k+1,l = x̂k,l + e_{î*k,l},   k = 0, ..., K − 1      (4.11)

for all l = 1, 2, ..., and every K iterations, i.e., after allocation x̂K,l, the process is reset to

  x̂0,l+1 = [0, ..., 0],   l = 1, 2, ...      (4.12)

where

  î*k,l = arg max_{i=1,...,N} { ∆Ĵi^{f(l)}(x̂k,l) }.      (4.13)

We will subsequently refer to the above allocation scheme as the Stochastic Incremental Optimization
(SIO) algorithm. Note that in order to derive the stochastic version of D̄IO, (S̄IO), we simply replace
the max operator in (4.13) with a min operator.
Again we make the ergodicity assumption A2.1 about the performance estimates Ĵt(x) and prove
the following result.
Theorem 4.3.1 For any performance measure J(x) that satisfies assumption A2.1 and conditions
(S) and (U), {x̂K,l} converges in probability to the global optimal allocation as l → ∞.
The proof of the theorem is included in Appendix C.
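The SIO scheme (4.11)-(4.13) can be sketched as follows; the true finite-difference oracle, the Gaussian noise model, and the choice f(l) = l² are illustrative assumptions (f(l) ≥ l^{1+c} with c = 1, consistent with the growth condition used later in Theorem 4.3.2).

```python
import random

def sio(delta_J_true, N, K, rounds, noise=1.0, seed=0):
    """Stochastic Incremental Optimization (SIO), equations (4.11)-(4.13).

    Each round l rebuilds the allocation from x0 = [0, ..., 0], per (4.12),
    using finite-difference estimates averaged over f(l) = l**2 noisy samples.
    """
    rng = random.Random(seed)
    x = None
    for l in range(1, rounds + 1):
        f_l = l * l                               # growing estimation period f(l)
        x = [0] * N                               # (4.12): reset to x0
        for _ in range(K):                        # (4.11): K incremental steps
            def estimate(i):
                # sample mean of f(l) noisy observations of ∆Ji(x)   (4.13)
                return sum(delta_J_true(x, i) + rng.gauss(0.0, noise)
                           for _ in range(f_l)) / f_l
            i_star = max(range(N), key=estimate)
            x[i_star] += 1
    return x                                      # allocation of the last round
```

With noise = 0 this reduces to repeated DIO passes; with noise, longer estimation periods in later rounds make the arg max in (4.13) increasingly reliable.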
4.3.1. Stronger Convergence Results
Under some additional mild conditions and by properly selecting the "observation interval" t = f(l),
it is possible to show that the SIO process converges to the optimal allocation almost surely. First,
assume that Ĵ(x) satisfies the conditions of Lemma 2.2.2; therefore Lemma 3.4.4 also holds. As
in Chapter 3, we prove the following result, the proof of which is included in Appendix C.
Theorem 4.3.2 For any performance measure J(x) that satisfies conditions (S) and (U) and the
conditions of Lemma 3.4.4, if the observation interval f(l) ≥ l^{1+c} for some constant c > 0, the
process {x̂k,l} converges to the global optimal allocation almost surely.
Note that it is straightforward to derive similar convergence results for S̄IO and so they are
omitted.
4.4. Discussion on the Incremental Algorithm
First note that if the Uniqueness condition (U) is not satisfied, then there is no guarantee that
either SIO or S̄IO will converge to the optimal in any sense. In this case, one can proceed in
a way similar to the set-iterative process described in Section 4.2.3. For example, one can include
in the set Uk all the allocations whose estimated performance lies within some distance rk from the
observed maximum/minimum, where rk is a decreasing sequence.
Second, using the "Law of Diminishing Returns" from Economic Theory (e.g., see [2]), it is expected
that as the number of resources increases, then, assuming no economies of scale, the marginal impact
of every additional resource on the objective function will be decreasing. This implies the following:
1. ∆J(x*k) = J(x*k) − J(x*k−1), where x*k is the solution to (RA3) when there are k available
resources, is a decreasing function of k.
2. As k increases, the set of allocations that exhibit optimal or near-optimal performance grows
larger.
3. Pr[x̂k,l = x*k] is a decreasing function of k, since the number of near-optimal allocations increases
and hence the performance distance between two or more allocations decreases.
These repercussions have practical implications on the usage of the developed incremental algorithms.
First, if the total number of resources is not fixed, then one can determine the maximum number
of resources based on a cost-benefit analysis. In other words, one can continue adding resources as
long as the marginal benefit of the new resource, ∆J(x*k), is greater than the cost of the resource.
Second, there are implications in terms of the convergence of the stochastic version of the algorithm. For small values of k, there are fewer optimal allocations and the difference of the performance
of such allocations from any other allocation is large. As a consequence, the probability of identifying the optimal allocation for small k is also large. As k increases, there are more near-optimal
allocations, hence the probability of identifying the true optimal decreases. As a result, the
algorithm may oscillate within a set of allocations with near-optimal performance. This can be used to
make the algorithm even more efficient. For example, in SIO every K steps the algorithm is reset
to x0 = [0, ..., 0] as shown in (4.12). Rather than resetting to x0 for all l = 1, 2, ..., it may be more
efficient to reset to x0 only for l = n, 2n, ..., where n > 1 is an integer. For all other iterations
we may reset to an allocation xz, z < K, such that x^i_z ≤ x̂^i_K for all i = 1, ..., N. That is, find
an allocation that is common to all allocations that are picked as optimal and make it the initial
allocation.
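One way to obtain such a common warm-start allocation xz is the element-wise minimum of the allocations recently returned as optimal; a two-line Python sketch (the helper name is hypothetical):

```python
def common_base(allocations):
    """Element-wise minimum of recently picked allocations: a warm start x_z
    with x_z[i] <= x_K[i] for every picked x_K, as suggested above."""
    return [min(column) for column in zip(*allocations)]
```

By construction the result allocates to each stage no more resources than any of the picked allocations does, so the incremental process can reach every one of them from this starting point.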
4.5. Summary
In this chapter we considered another class of DES which does not have separable cost functions but
which satisfies either the "smoothness" (S) or the "complementary smoothness" (S̄) condition. For
this class of systems, we developed an incremental optimization algorithm which, in a deterministic
environment, yields the optimal allocation in a finite number of steps. Finally, we adapted the
optimization algorithm for use in stochastic environments and showed that it converges to the optimal
allocation in probability and, under some mild assumptions, almost surely as well.
Chapter 5
PERTURBATION ANALYSIS
The optimization algorithms presented in the previous two chapters are contingent upon the availability
of the value of the finite difference ∆J, or at least the availability of an unbiased estimate of its
value. This implies the observation of at least two sample paths, one under parameter x and one
under x + ei. Hence, at least two sample paths are required to obtain one such ∆. Our objective,
however, is to use the optimization algorithms on-line. In other words, we would like to develop
controllers that would be able to observe a real system and automatically reallocate the system's
resources to maintain optimal performance. Note that, when observing a real system operating under
some parameter x, it is usually straightforward to obtain one of the necessary estimates needed to
calculate the required difference, i.e., Ĵ(x). The objective of this chapter is to develop techniques
that enable us to predict the system's performance under other parameters while observing
the system under x, without actually switching to the other parameter.
5.1. Introduction
It is by now well-documented in the literature that the nature of sample paths of DES can be exploited so as to extract a significant amount of information, beyond merely an estimate of J(θ). It
has been shown that observing a sample path under some parameter value θ allows us to efficiently
obtain estimates of derivatives of the form dJ/dθ which are in many cases unbiased and strongly
consistent (e.g., see [12, 30, 35] where Infinitesimal Perturbation Analysis (IPA) and its extensions
are described). Similarly, Finite Perturbation Analysis (FPA) has been used to estimate finite differences of the form ∆J(∆θ) or to approximate the derivative dJ/dθ through ∆J/∆θ when other PA
techniques fail. In the discrete-resource allocation context, of particular interest are often parameters
θ that take values from a discrete set {θ1 , · · · , θm } (e.g., queueing capacities), in which case we desire
to effectively construct sample paths under any θ1 , · · · , θm by just observing a sample path under
one of these parameter values.
In this chapter, we develop Concurrent Estimation (CE) which is a general approach for constructing the sample paths under any parameter {θ1 , · · · , θm } using observations on a single sample
path under θ. Subsequently, we develop a Finite Perturbation Scheme which takes advantage of
the special structure of some systems to construct sample paths under neighboring allocations, i.e.
allocations that result from adding a resource to one of the users.
5.2. Problem Definition
We will concentrate on the general sample path constructability problem for DES. That is, given
a sample path under a particular parameter value θ, the problem is to construct multiple sample
paths of the system under different values using only information available along the given sample
path. A solution to this problem can be obtained when the system under consideration satisfies the
Constructability Condition (CO) presented in [11, 10]. Suppose that a sample path of the system is
observed under parameter θ and we would like to construct the corresponding sample path under
some θ′. Then (CO) consists of two parts. The first part is the Observability Condition (OB), which
states that at every state the feasible event set of the constructed sample path must be a subset of
the feasible event set of the observed sample path. The second part is a requirement that all lifetimes
of feasible events conditioned on event ages are equal in distribution.
Unfortunately, CO is not easily satisfied. Nonetheless, two methods have been developed that
solve CO for systems with exponential lifetime distributions. In particular, the Standard Clock (SC)
approach [74] solves the sample path constructability problem for models with exponentially distributed event lifetimes by exploiting the well-known uniformization technique for Markov chains.
This approach allows the concurrent construction of multiple sample paths under different (continuous or discrete) parameters at the expense of introducing “fictitious” events. Chen and Ho [18]
have proposed a Generalized Standard Clock approach that uses approximation techniques to extend
the SC approach to systems with non-exponential event lifetime distributions. On the other hand,
Augmented System Analysis (ASA) [11, 10] solves the constructability problem by “suspending” the
construction of one or more paths during certain segments of the observed sample path in a way
such that the stochastic characteristics of the observed sample path are preserved. In ASA, it is still
necessary to assume exponential event lifetime distributions, although, with a minor extension it is
possible to allow at most one event to have a non-exponential lifetime distribution (see [12, 10] for
details).
We consider a DES and adopt the modeling framework of a stochastic timed state automaton
(E,X ,Γ,f ,x0 ) (see [12]). Here, E is a countable event set, X is a countable state space, and Γ(x) is
a set of feasible (or enabled) events, defined for all x ∈ X such that Γ(x) ⊆ E. The state transition
function f (x, e) is defined for all x ∈ X , e ∈ Γ(x), and specifies the next state resulting when e occurs
at state x. Finally, x0 is a given initial state.
Remark: The definition is easily modified to (E, X, Γ, p, p0) in order to include probabilistic state
transition mechanisms. In this case, the state transition probability p(x′; x, e′) is defined for all
x, x′ ∈ X, e′ ∈ E, and is such that p(x′; x, e′) = 0 for all e′ ∉ Γ(x). In addition, p0(x) is the pmf
P[x0 = x], x ∈ X, of the initial state x0.
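The automaton (E, X, Γ, f, x0) can be made concrete with a toy single-server queue; in the minimal Python sketch below, the event names, the queue model, and the clock bookkeeping are illustrative assumptions, not the dissertation's implementation.

```python
def gamma(x):
    """Feasible event set Γ(x): 'a' = arrival, 'd' = departure (needs x > 0)."""
    return ('a', 'd') if x > 0 else ('a',)

def f(x, e):
    """State transition function f(x, e) on the queue length x."""
    return x + 1 if e == 'a' else x - 1

def run(lifetimes, x0=0, steps=6):
    """Generate the event sequence ξ = {(e_k, t_k)} from given lifetime
    sequences V_i = {v_i(1), v_i(2), ...} (one iterator per event type)."""
    x, t, clocks, path = x0, 0.0, {}, []
    for _ in range(steps):
        feasible = gamma(x)
        for e in feasible:                    # newly enabled events draw a lifetime
            if e not in clocks:
                clocks[e] = next(lifetimes[e])
        e_next = min(feasible, key=lambda e: clocks[e])
        dt = clocks[e_next]
        t += dt
        for e in feasible:                    # age the residual clocks
            clocks[e] -= dt
        del clocks[e_next]                    # the triggering event draws a new
        x = f(x, e_next)                      # lifetime when next enabled
        path.append((e_next, t))
    return path
```

With deterministic lifetimes of 1.0 for arrivals and 0.5 for departures, the queue alternates between arrivals and departures, illustrating how the input lifetime sequences drive the output ξ.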
Assuming the cardinality of the event set E is N , the input to the system is a set of event
lifetime sequences {V1 , · · · , VN }, one for each event, where Vi = {vi (1), vi (2), · · ·} is characterized
by some arbitrary distribution. Under some system parameter θ0 , the output is a sequence ξ(θ0 ) =
{(ek , tk ), k = 1, 2, · · ·} where ek ∈ E is the kth event and tk is its corresponding occurrence time (see
Figure 5.1). Based on any observed ξ(θ0 ), we can evaluate L[ξ(θ0 )], a sample performance metric
for the system. For a large family of performance metrics of the form J(θ0 ) = E[L[ξ(θ0 )]], L[ξ(θ0 )]
is therefore an estimate of J(θ0 ). Defining a set of parameter values of interest {θ0 , θ1 , · · · , θM }, the
sample path constructability problem is:

For a DES under θ0, construct all sample paths ξ(θ1), ..., ξ(θM) given a realization of
lifetime sequences V1, ..., VN and the sample path ξ(θ0).

Figure 5.1: The sample path constructability problem for DES (the lifetime sequences V1, ..., VN drive DES models under θ0, θ1, ..., θM, producing the sample paths ξ(θ0), ξ(θ1), ..., ξ(θM))
We emphasize that the proposed schemes are suited to on-line sample path construction, where
actual system data are processed for performance estimation purposes. Furthermore, unlike SC and
ASA they can be used for arbitrary lifetime distributions.
5.3. Concurrent Simulation
For simplicity, in the rest of this section we assume that the DES under investigation satisfies the
following three assumptions.
A5.1: Feasibility Assumption: Let xn be the state of the DES after the occurrence of the nth event.
Then, for any n, there exists at least one r > n such that e ∈ Γ(xr ) for any e ∈ E.
A5.2: Invariability Assumption: Let E be the event set under the nominal parameter θ0 and let Em
be the event set under θm 6= θ0 . Then, Em = E.
A5.3: Similarity Assumption: Let Gi(θ0), i ∈ E, be the event lifetime distribution for event i
under θ0 and let Gi(θm), i ∈ E, be the corresponding event lifetime distribution under θm.
Then, Gi(θ0) = Gi(θm) for all i ∈ E.
Assumption A5.1 guarantees that in the evolution of any sample path all events in E will always
become feasible at some point in the future. If for some DES assumption A5.1 is not satisfied, i.e.,
there exists an event α that never gets activated after some point in time, then, as we will see, it is
possible that the construction of some sample path will remain suspended forever waiting for α to
happen. Note that a DES with an irreducible state space immediately satisfies this condition.
Assumption A5.2 states that changing a parameter from θ0 to some θm 6= θ0 does not alter the
event set E. More importantly, A5.2 guarantees that changing to θm does not introduce any new
events so that all event lifetimes for all events can be observed from the nominal sample path.
Finally, assumption A5.3 guarantees that changing a parameter from θ0 to some θm ≠ θ0 does
not affect the distribution of one or more event lifetime sequences. This allows us to use exactly
the same lifetimes that we observe in the nominal sample path to construct the perturbed sample
path. In other words, our analysis focuses on structural system parameters rather than distributional
parameters, which is appropriate for the resource allocation problems that we are dealing with in this
thesis.
Note that these assumptions can be relaxed at some additional computational cost, as discussed
later in this chapter. Before presenting the coupling approach we use to solve the constructability
problem and the explicit procedure we will refer to as the Time Warping Algorithm, let us present
the necessary notation.
5.3.1. Notation and Definitions
First, let ξ(n, θ) = {e_j : j = 1, · · · , n}, with e_j ∈ E, be the sequence of events that constitute the
observed sample path up to n total events. Although ξ(n, θ) is clearly a function of the parameter
θ, we will write ξ(n) to refer to the observed sample path and adopt the notation ξ̃(k) = {ẽ_j : j =
1, · · · , k} for any constructed sample path under a different value of the parameter up to k events in
that path. It is important to realize that k is actually a function of n, since the constructed sample
path is coupled to the observed sample path through the observed event lifetimes. However, again
for the sake of notational simplicity, we will refrain from continuously indicating this dependence.
Next we define the score of an event i ∈ E in a sequence ξ(n), denoted by s_i^n = [ξ(n)]_i, to
be the non-negative integer that counts the number of instances of event i in this sequence. The
corresponding score of i in a constructed sample path is denoted by s̃_i^k = [ξ̃(k)]_i. In what follows, all
quantities with the symbol “˜” refer to a typical constructed sample path.
Associated with every event type i ∈ E in ξ(n) is a sequence of s_i^n event lifetimes

V_i(n) = {v_i(1), · · · , v_i(s_i^n)}    for all i ∈ E

The corresponding set of sequences in the constructed sample path is:

Ṽ_i(k) = {v_i(1), · · · , v_i(s̃_i^k)}    for all i ∈ E

which is a subsequence of V_i(n) with k ≤ n. In addition, we define the following sequence of lifetimes:

V_i(n, k) = {v_i(s̃_i^k + 1), · · · , v_i(s_i^n)}    for all i ∈ E

which consists of all event lifetimes that are in V_i(n) but not in Ṽ_i(k). Associated with any one of
these sequences are the following operations. Given some W_i = {w_i(j), · · · , w_i(r)},
Suffix Addition: W_i + {w_i(r + 1)} = {w_i(j), · · · , w_i(r), w_i(r + 1)} and,
Prefix Subtraction: W_i − {w_i(j)} = {w_i(j + 1), · · · , w_i(r)}.
Note that the addition and subtraction operations are defined so that a new element is always added
as the last element (the suffix) of a sequence, whereas subtraction always removes the first element
(the prefix) of the sequence.
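Operationally, these two rules make each lifetime sequence behave as a FIFO queue: new observations enter at the tail and lifetimes are consumed from the head. A minimal Python sketch (the class name is ours, not the dissertation's):

```python
from collections import deque

class LifetimeSequence:
    """FIFO sequence of event lifetimes supporting the two operations
    defined above: suffix addition (append at the tail) and prefix
    subtraction (remove and return the head)."""

    def __init__(self, lifetimes=()):
        self._q = deque(lifetimes)

    def suffix_add(self, lifetime):
        # W + {w(r+1)}: the new element becomes the last element.
        self._q.append(lifetime)

    def prefix_subtract(self):
        # W - {w(j)}: the first (oldest) element is removed and returned.
        return self._q.popleft()

    def __len__(self):
        return len(self._q)

# Example: lifetimes observed in order 1.2, 0.7; then 2.1 is suffix-added,
# and the oldest lifetime is consumed first by prefix subtraction.
W = LifetimeSequence([1.2, 0.7])
W.suffix_add(2.1)
oldest = W.prefix_subtract()  # returns 1.2
```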
Next, define the set

A(n, k) = {i : i ∈ E, s_i^n > s̃_i^k}    (5.1)

which is associated with V_i(n, k) and consists of all events i whose corresponding sequence V_i(n, k)
contains at least one element. Thus, every i ∈ A(n, k) is an event that has been observed in ξ(n)
and has at least one lifetime that has yet to be used in the coupled sample path ξ̃(k). Hence, A(n, k)
should be thought of as the set of available events to be used in the construction of the coupled path.
Finally, we define the following set, which is crucial in our approach:

M(n, k) = Γ(x̃_k) − (Γ(x̃_{k−1}) − {ẽ_k})    (5.2)

where, clearly, M(n, k) ⊆ E. Note that ẽ_k is the triggering event at the (k − 1)th state visited in
the constructed sample path. Thus, M(n, k) contains all the events that are in the feasible event
set Γ(x̃_k) but not in Γ(x̃_{k−1}); in addition, ẽ_k also belongs to M(n, k) if it happens that ẽ_k ∈ Γ(x̃_k).
Intuitively, M(n, k) consists of all missing events from the perspective of the constructed sample
path when it enters a new state x̃_k: those events already in Γ(x̃_{k−1}) which were not the triggering
event remain available to be used in the sample path construction as long as they are still feasible;
all other events in the set are “missing” as far as residual lifetime information is concerned.
The concurrent sample path construction process we are interested in consists of two coupled
processes, each generated by a timed state automaton. This implies that there are two similar sets
of equations that describe the dynamics of each process. In addition, we need a set of equations that
captures the coupling between them.
5.3.2. Timed State Automaton
We briefly review here the standard timed state automaton dynamics, also known as a Generalized
Semi-Markov Scheme (GSMS) (see [12, 30, 35]). We introduce two additional variables: t_n, the
time when the nth event occurs, and y_i(n), i ∈ Γ(x_n), the residual lifetime of event i after the
occurrence of the nth event (i.e., the time left until event i occurs). On a particular sample
path, just after the nth event occurs the following information is known: the state x_n, from which we
can determine Γ(x_n), the time t_n, the residual lifetimes y_i(n) for all i ∈ Γ(x_n), and all event scores
s_i^n, i ∈ E. The following equations describe the dynamics of the timed state automaton.
step 1: Determine the smallest residual lifetime among all feasible events at state x_n, denoted by y_n*:

y_n* = min_{i ∈ Γ(x_n)} {y_i(n)}    (5.3)

step 2: Determine the triggering event:

e_{n+1} = arg min_{i ∈ Γ(x_n)} {y_i(n)}    (5.4)

step 3: Determine the next state:

x_{n+1} = f(x_n, e_{n+1})    (5.5)

step 4: Determine the next event time:

t_{n+1} = t_n + y_n*    (5.6)

step 5: Determine the new residual lifetimes for all new feasible events i ∈ Γ(x_{n+1}):

y_i(n + 1) = y_i(n) − y_n*     if i ≠ e_{n+1} and i ∈ Γ(x_n)
           = v_i(s_i^n + 1)    if i = e_{n+1} or i ∉ Γ(x_n)    (5.7)

step 6: Update the event scores, for all i ∈ Γ(x_{n+1}):

s_i^{n+1} = s_i^n + 1    if i = e_{n+1}
          = s_i^n        otherwise    (5.8)
Equations (5.3)-(5.8) describe the sample path evolution of a timed state automaton. These
equations apply to both the observed and the constructed sample paths. Next, we need to specify
the mechanism through which these two sample paths are coupled in a way that enables event
lifetimes from the observed ξ(n) to be used to construct a sample path ξ̃(k).
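The six steps above can be collected into a single state-update function. The following Python sketch is our own reconstruction (all function and argument names are ours; the lifetime source is supplied by the caller), not the algorithm of the appendix:

```python
def gsms_step(x, t, y, s, feasible, f, lifetime):
    """One state transition of a timed state automaton (GSMS), following
    equations (5.3)-(5.8):
      x        -- current state x_n
      t        -- current event time t_n
      y        -- dict: residual lifetime y_i(n) for each i in Gamma(x_n)
      s        -- dict: event score s_i^n for each i in E
      feasible -- Gamma(.), mapping a state to its feasible event set
      f        -- state transition function f(x, e)
      lifetime -- lifetime(i, j), the j-th lifetime v_i(j) of event i
    """
    Gx = feasible(x)
    y_star = min(y[i] for i in Gx)             # (5.3) smallest residual lifetime
    e = min(Gx, key=lambda i: y[i])            # (5.4) triggering event
    x_next = f(x, e)                           # (5.5) next state
    t_next = t + y_star                        # (5.6) next event time
    y_next = {}                                # (5.7) new residual lifetimes
    for i in feasible(x_next):
        if i != e and i in Gx:
            y_next[i] = y[i] - y_star          # still active: age the clock
        else:
            y_next[i] = lifetime(i, s[i] + 1)  # newly activated: fresh lifetime
    s_next = dict(s)                           # (5.8) update the scores
    s_next[e] = s[e] + 1
    return x_next, t_next, y_next, s_next, e
```

Iterating `gsms_step` from an initial state, with `lifetime` reading from either observed or simulated lifetime sequences, reproduces the sample path evolution described by (5.3)-(5.8).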
5.3.3. Coupling Dynamics
Upon occurrence of the (n + 1)th observed event, e_{n+1}, the first step is to update the event lifetime
sequences V_i(n, k) as follows:

V_i(n + 1, k) = V_i(n, k) + {v_i(s_i^n + 1)}    if i = e_{n+1}
              = V_i(n, k)                       otherwise    (5.9)

The addition of a new event lifetime implies that the “available event set” A(n, k) defined in (5.1)
may be affected. Therefore, it is updated as follows:

A(n + 1, k) = A(n, k) ∪ {e_{n+1}}    (5.10)

Finally, note that the “missing event set” M(n, k) defined in (5.2) remains unaffected by the occurrence of observed events:

M(n + 1, k) = M(n, k)    (5.11)
At this point, we are able to decide whether all lifetime information needed to proceed with a state
transition in the constructed sample path is available or not. In particular, we check the condition

M(n + 1, k) ⊆ A(n + 1, k).    (5.12)

Assuming (5.12) is satisfied, equations (5.3)-(5.8) may be used to update the state x̃_k of the
constructed sample path. In so doing, lifetimes v_i(s̃_i^k + 1) for all i ∈ M(n + 1, k) are used from
the corresponding sequences V_i(n + 1, k). Thus, upon completion of the six state update steps, all
three variables associated with the coupling process, i.e., V_i(n, k), A(n, k), and M(n, k), need to be
updated. In particular,

V_i(n + 1, k + 1) = V_i(n + 1, k) − {v_i(s̃_i^k + 1)}    for all i ∈ M(n + 1, k)
                  = V_i(n + 1, k)                       otherwise    (5.13)
This operation immediately affects the set A(n + 1, k), which is updated as follows:

A(n + 1, k + 1) = A(n + 1, k) − {i : i ∈ M(n + 1, k), s̃_i^{k+1} = s_i^{n+1}}    (5.14)

Finally, applying (5.2) to the new state x̃_{k+1},

M(n + 1, k + 1) = Γ(x̃_{k+1}) − (Γ(x̃_k) − {ẽ_{k+1}})    (5.15)
Therefore, we are again in a position to check condition (5.12) for the new sets M (n + 1, k + 1) and
A(n + 1, k + 1). If it is satisfied, then we can proceed with one more state update on the constructed
sample path; otherwise, we wait for the next event on the observed sample path until (5.12) is again
satisfied. The analysis above is summarized by the Time Warping Algorithm (TWA) included in
Appendix A.2.
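The bookkeeping behind equations (5.9)-(5.15) can be sketched as follows. This is an illustrative reconstruction (the class and method names are ours): the state update of the constructed path, equations (5.3)-(5.8), and the computation of the new missing set are left to the caller, since the full TWA appears in Appendix A.2.

```python
from collections import defaultdict, deque

class TimeWarpingBookkeeper:
    """Tracks, per event type, the observed-but-unused lifetimes V_i(n, k),
    the available event set A(n, k), and the missing event set M(n, k)."""

    def __init__(self, initial_missing):
        self.V = defaultdict(deque)    # V_i(n, k): unused lifetimes of event i
        self.A = set()                 # A(n, k): events with an unused lifetime
        self.M = set(initial_missing)  # M(n, k): events the constructed path needs

    def observe(self, event, lifetime):
        """Observed path advances one event: equations (5.9)-(5.11)."""
        self.V[event].append(lifetime)   # suffix addition (5.9)
        self.A.add(event)                # (5.10); M is unchanged (5.11)

    def ready(self):
        """Condition (5.12): all missing lifetimes are available."""
        return self.M <= self.A

    def consume(self, new_missing):
        """Constructed path advances one state: equations (5.13)-(5.15).
        Returns the lifetimes used, one per event in M(n, k)."""
        used = {}
        for i in self.M:
            used[i] = self.V[i].popleft()    # prefix subtraction (5.13)
            if not self.V[i]:
                self.A.discard(i)            # (5.14)
        self.M = set(new_missing)            # (5.15), computed by the caller
        return used

# Example: the constructed path needs fresh lifetimes for events a and b.
bk = TimeWarpingBookkeeper({"a", "b"})
bk.observe("a", 1.0)             # only a is available: cannot advance yet
bk.observe("b", 2.0)             # now M is a subset of A
if bk.ready():
    used = bk.consume({"a"})     # advance; suppose the new missing set is {a}
```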
5.3.4. Extensions of the TWA
Earlier in this chapter we stated a few assumptions that were made to simplify the development of
our approach and keep the TWA notationally simple. It turns out that we can extend the application
of TWA to DES by relaxing these assumptions at the expense of some extra work.
In A5.2 we assumed that changing a parameter from θ0 to some θm ≠ θ0 does not alter the
event set E. Clearly, if the new event set E_m is such that E_m ⊆ E, the development and analysis of
TWA is not affected. If, on the other hand, E ⊂ E_m, this implies that events required to cause state
transitions under θm are unavailable in the observed sample path, which makes the application of our
algorithm impossible. In this case, one can introduce phantom event sources which generate all the
unavailable events as described, for example, in [17], provided that the lifetime distributions of these
events are known. The idea of phantom sources can also be applied to DES that do not satisfy A5.1.
In this case, if a sample path remains suspended for a long period of time, then a phantom source
can provide the required event(s) so that the sample path construction can resume.
In A5.3 we assumed that changing a parameter from θ0 to some θm ≠ θ0 does not affect the distribution of one or more event lifetime sequences. This assumption is used in (5.9), where the observed
lifetime v_i(s_i^n + 1) is directly suffix-added to the sequence V_i(n + 1, k). Note that this problem can be
overcome by transforming the observed lifetimes V_i = {v_i(1), v_i(2), · · ·} with an underlying distribution
G_i(θ0) into samples of a similar sequence corresponding to the new distribution G_i(θm) and then
suffix-adding them in V_i(n + 1, k). This is indeed possible, if G_i(θ0), G_i(θm) are known, at the expense
of some additional computational cost for this transformation (for example, see [12]). One interesting
special case arises when the parameter of interest is a scale parameter of some event lifetime distribution (e.g., it is the mean of a distribution in the Erlang family). Then, simple rescaling suffices to
transform an observed lifetime v_i under θ0 into a new lifetime v̂_i under θm:

v̂_i = (θm/θ0) v_i
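As an illustration, the rescaling above is a one-line transformation. The helper below (name ours) applies only when θ is a scale parameter of the lifetime distribution, such as the mean of an Erlang distribution:

```python
def rescale_lifetimes(lifetimes, theta_0, theta_m):
    """Transform lifetimes observed under scale parameter theta_0 into
    samples under theta_m, via v_hat = (theta_m / theta_0) * v.
    Valid only when theta is a scale parameter of the distribution."""
    ratio = theta_m / theta_0
    return [ratio * v for v in lifetimes]

# Example: lifetimes observed under mean 2.0, rescaled to mean 3.0.
observed = [1.8, 2.4, 2.1]
rescaled = rescale_lifetimes(observed, theta_0=2.0, theta_m=3.0)
```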
Finally, note that in a simulation environment it is possible to eliminate the overhead qK, which is
due to checking the subset condition in step 2.5. In order to achieve this, we need to eliminate the
coupling between the observed and constructed sample paths. Towards this goal, we can simulate
the nominal sample path but, rather than discarding the event lifetimes, save them all in memory. Once
the simulation is done, we simulate one by one all the perturbed sample paths exactly as we do with
the brute-force simulation scheme, but rather than generating the required random variates we read
them directly from computer memory. In this way we trade off computer memory for higher
speedup. A quantification of this tradeoff is the subject of ongoing research.
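A minimal sketch of this memory-for-speed trade-off (all names are ours): the nominal run records every variate it draws, and each perturbed run replays the recorded values instead of generating new ones.

```python
import random
from collections import defaultdict

class RecordingLifetimeSource:
    """Generates event lifetimes for the nominal simulation while recording
    every variate, so that perturbed runs can replay them from memory
    instead of drawing new random numbers."""

    def __init__(self, samplers, seed=0):
        self.samplers = samplers            # event -> sampler taking an RNG
        self.rng = random.Random(seed)
        self.recorded = defaultdict(list)   # event -> recorded lifetimes

    def draw(self, event):
        v = self.samplers[event](self.rng)
        self.recorded[event].append(v)
        return v

    def replayer(self):
        """Return a draw() function that reads recorded lifetimes in order."""
        cursors = defaultdict(int)
        def draw(event):
            i = cursors[event]
            cursors[event] += 1
            return self.recorded[event][i]
        return draw

# The nominal run draws (and records) lifetimes; a perturbed run replays them.
src = RecordingLifetimeSource({"arrival": lambda r: r.expovariate(1.0)})
nominal = [src.draw("arrival") for _ in range(3)]
replay = src.replayer()
perturbed = [replay("arrival") for _ in range(3)]  # identical variates
```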
5.4. Finite Perturbation Analysis
As noted before, the optimization algorithms of the previous chapters do not require the system
performance under arbitrary allocations, but only under neighboring allocations, i.e., allocations that
differ by ±1 resources in two distinct users. In this section we try to take advantage of the structure
of queueing systems in order to derive a more efficient constructability scheme that will provide us
with the necessary information. Even though the method of deriving this scheme is quite general
and can be applied to any queueing model, we will develop it using the serial queueing model shown
in Figure 5.2. For this model, our objective is to observe the system under some buffer allocation,
and based on that predict the performance of the system if we had added an extra buffer slot to one
of the queues.
Figure 5.2: FPA System Model — a serial line of queues Q0, Q1, · · · , QN with service rates µ0, µ1, · · · , µN and external arrival rate λ.
5.4.1. Notation and Definitions
We begin by establishing some basic notation and defining quantities we will use in our analysis.
First, for any x, let [x]^+ ≡ max{0, x}. The pair (k, n) will be used to denote the kth job in the nth
stage. Associated with such a job are

Z_k^n : the service time of (k, n).
C_k^n : the service completion time of (k, n) at stage n.
D_k^n : the departure time of (k, n) from stage n; if no blocking occurs, then D_k^n = C_k^n.

We also define

I_k^n ≡ D_k^{n−1} − D_{k−1}^n ≡ −W_k^n    (5.16)
Observe that when I_k^n > 0, this quantity is the length of an idle period at stage n that starts with the
departure of (k − 1, n) and ends with the arrival of (k, n) at time D_k^{n−1}. Conversely, if W_k^n = −I_k^n > 0,
this is the waiting time of (k, n), which can only begin processing at time D_{k−1}^n > D_k^{n−1}. Similarly, we
define

B_k^n ≡ D_{k−x_{n+1}}^{n+1} − C_k^n    (5.17)

which, if B_k^n > 0, provides the length of a blocking period for the job (k, n) completing service at
time C_k^n. Finally,

Q_k^n ≡ D_k^n − D_{k−1}^n = Z_k^n + [I_k^n]^+ + [B_k^n]^+    (5.18)

so that Q_k^n represents the interdeparture time of (k − 1, n) and (k, n) at stage n.
For our purposes, a perturbed sample path is one that would have resulted if the exact same
nominal sample path had been reproduced under an allocation with one buffer slot added at some
queue. To distinguish between quantities pertaining to the nominal path and their counterparts on a
perturbed path we will use a tilde (“˜”) as follows: if the number of buffers allocated to queue n is x_n
in the nominal path, then x̃_n denotes the number of buffers in the perturbed path. Similar notation
applies to other quantities such as D_k^n, etc. With this in mind, we define the indicator function

1[n + 1] = 1[x̃_{n+1} = x_{n+1} + 1] = 1 if x̃_{n+1} = x_{n+1} + 1,  0 if x̃_{n+1} = x_{n+1}

to identify the downstream stage to any stage n where an additional buffer would have been added
in a perturbed path. We also define

∆D_k^n ≡ D_k^n − D̃_k^n    (5.19)
to be the departure time perturbation for (k, n) due to the addition of a buffer to the nominal
allocation.
Finally, we will find useful the following quantity, defined as the relative perturbation in departure
times for two jobs (k_1, n_1) and (k_2, n_2):

∆_{(k_2,n_2)}^{(k_1,n_1)} ≡ ∆D_{k_1}^{n_1} − ∆D_{k_2}^{n_2}    (5.20)
5.4.2. Derivation of Departure Time Perturbation Dynamics
We begin with the simple observation that the departure time D_k^n satisfies the following Lindley-type
recursive equation:

D_k^n = max{ D_k^{n−1} + Z_k^n , D_{k−1}^n + Z_k^n , D_{k−x_{n+1}}^{n+1} }    (5.21)
There are three cases captured in this equation:
1. The departure of (k, n) was activated by the departure of (k, n − 1). This corresponds to the
case where (k, n) starts a new busy period at stage n and, upon completion of service, it is
not blocked by the downstream stage n + 1. Thus, D_k^n = D_k^{n−1} + Z_k^n and from the definitions
(5.16) and (5.17) it is easy to see that

D_k^n = D_k^{n−1} + Z_k^n ⇐⇒ W_k^n ≤ 0, B_k^n ≤ 0    (5.22)
2. The departure of (k, n) was activated by the departure of (k − 1, n). This corresponds to
the case where (k, n) belongs to an ongoing busy period (hence, experiencing some waiting in
queue before receiving service) and it is not blocked by the downstream server n + 1. Thus,
D_k^n = D_{k−1}^n + Z_k^n and from (5.16) and (5.17) it is once again easy to check that

D_k^n = D_{k−1}^n + Z_k^n ⇐⇒ W_k^n ≥ 0, B_k^n ≤ 0    (5.23)
3. The departure of (k, n) was activated by the departure of (k − x_{n+1}, n + 1). This corresponds
to the case where (k, n) is blocked and must remain at the nth stage after service completion¹.
In this case, D_k^n = D_{k−x_{n+1}}^{n+1} and from (5.17) it is easy to check that

D_k^n = D_{k−x_{n+1}}^{n+1} ⇐⇒ B_k^n ≥ 0    (5.24)

¹ Actually, this case combines two subcases, one where (k, n) starts a new busy period and leaves stage n after
being blocked for some time, and another where (k, n) belongs to an ongoing busy period and leaves stage n after being
blocked.
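Recursion (5.21) can be evaluated job by job, since each term refers only to earlier jobs or to upstream stages of the same job. A minimal Python sketch (the function name and input conventions are ours):

```python
def departure_times(arrivals, services, x):
    """Departure times D_k^n for a serial line with blocking, via the
    Lindley-type recursion (5.21):
        D_k^n = max(D_k^{n-1} + Z_k^n, D_{k-1}^n + Z_k^n, D_{k-x_{n+1}}^{n+1})

    arrivals : arrival times, treated as departures from a virtual stage -1
    services : services[k][n] is the service time Z_k^n of job k at stage n
    x        : x[n] is the buffer/kanban count of stage n (only downstream
               counts x[1], ..., x[N-1] are used; the last stage never blocks)
    Out-of-range indices are treated as -infinity (constraint absent)."""
    K, N = len(arrivals), len(services[0])
    NEG = float("-inf")
    D = [[NEG] * N for _ in range(K)]

    def d(k, n):
        if n < 0:
            return arrivals[k]   # "departure" from the external source
        if k < 0 or n >= N:
            return NEG           # constraint absent
        return D[k][n]

    for k in range(K):
        for n in range(N):
            block = d(k - x[n + 1], n + 1) if n + 1 < N else NEG
            D[k][n] = max(d(k, n - 1) + services[k][n],
                          d(k - 1, n) + services[k][n],
                          block)
    return D
```

For example, with three simultaneous arrivals, unit service at stage 0, service time 2 at stage 1, and a single downstream buffer slot, the second job is blocked at stage 0 until the first job leaves stage 1.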
Next, we consider the perturbation ∆D_k^n defined by (5.19). In order to find the perturbation we
apply (5.21) to both the nominal and perturbed paths, and identify 9 distinct cases. After some
algebra we arrive at the following theorems:

Theorem 5.4.1 If W_k^n ≤ 0 and B_k^n ≤ 0, then:

∆D_k^n = ∆D_k^{n−1} − max{ 0 , ∆_{(k−1,n)}^{(k,n−1)} − I_k^n ,
(∆_{(k−x_{n+1}−1[n+1],n+1)}^{(k,n−1)} + B_k^n − Q_{k−x_{n+1}}^{n+1}) · 1[n + 1] }    (5.25)
The proof of this theorem is included in the Appendix D.
Theorem 5.4.2 If W_k^n > 0 and B_k^n ≤ 0, then:

∆D_k^n = ∆D_{k−1}^n − max{ 0 , ∆_{(k,n−1)}^{(k−1,n)} − W_k^n ,
(∆_{(k−x_{n+1}−1[n+1],n+1)}^{(k−1,n)} + B_k^n − Q_{k−x_{n+1}}^{n+1}) · 1[n + 1] }    (5.26)
Theorem 5.4.3 If B_k^n > 0, then:

∆D_k^n = ∆D_{k−x_{n+1}}^{n+1} − max{ ∆_{(k,n−1)}^{(k−x_{n+1},n+1)} − B_k^n − [W_k^n]^+ ,
∆_{(k−1,n)}^{(k−x_{n+1},n+1)} − B_k^n − [I_k^n]^+ ,
(∆_{(k−x_{n+1}−1,n+1)}^{(k−x_{n+1},n+1)} − Q_{k−x_{n+1}}^{n+1}) · 1[n + 1] }    (5.27)
The proofs of Theorems 5.4.2 and 5.4.3 are similar to the proof of Theorem 5.4.1 and are omitted.
Theorems 5.4.1-5.4.3 directly lead to the FPA algorithm presented in Appendix A.3. Note that
this algorithm exactly constructs the sample path that would have been observed if one of the stages
had an extra kanban. Also, note that all quantities used are directly observable from the nominal
sample path. Finally, note that this algorithm subsumes the FPA results presented in [36], where
perturbations were associated with stages rather than individual jobs.
5.5. Summary
Several optimization algorithms, including the ones described in earlier chapters, require “derivative-like”
information in order to determine their next step. In general, to provide such information it is
necessary to observe two sample paths under two different parameter values. It is well known that
for the class of discrete-event systems it is possible to obtain such information by observing only a
single sample path. This is referred to as the constructability problem, which is the subject of this
chapter. First we develop a general algorithm that can be used to construct a sample path under
any parameter value in {θ1, · · · , θm} while observing a single sample path under θ0. Subsequently, we take
advantage of the special structure of queueing systems and develop a more efficient algorithm for
constructing sample paths, but its applicability is restricted to a neighborhood of allocations “close”
to the parameter of the observed sample path (i.e., allocations that differ by ±1 resources).
Chapter 6
OPTIMIZATION OF
KANBAN-BASED
MANUFACTURING SYSTEMS
This chapter presents the first application of the developed resource allocation methodologies on
kanban-based manufacturing systems. In this context, kanban constitute the discrete resources while
the machine stages represent the users. We show that such systems satisfy the smoothness or
complementary smoothness conditions defined in Chapter 4 and hence use the incremental
optimization algorithms developed there to allocate a fixed number of kanban to the stages so as to
optimize an objective function.
6.1. Introduction
The Just-In-Time (JIT) manufacturing approach (see Sugimori et al. [71] and Ashburn [7]) was
developed to reduce the work-in-process inventory and its fluctuations and hence reduce production
costs. The main principle of the technique is to produce material only when it is needed. Its most
celebrated component is the so called kanban method, the basic idea of which is the following. A
production line is divided into several stages and at every stage there is a fixed number of tags (or
tickets) called kanban. An arriving job receives a kanban at the entrance of the stage and maintains
possession of it until it exits the stage. If an arriving job does not find an available kanban at the
entrance, it is not allowed to enter that stage until a kanban is freed; in this case, the job is forced
to wait in the previous stage and becomes blocked.
There are several variations of the basic kanban production system (see [24] for an overview)
and, over the past years, much work has been devoted to the analysis and performance evaluation of
such schemes. One of the main issues associated with the kanban method is the determination of the
number of kanban at every stage. It is obvious that in order to achieve a minimum work-in-process
inventory, no more than one job should be allowed at every stage. This, however, would severely
restrict other objective functions such as throughput, mean delay etc. Therefore, the selection of the
number of kanban is closely linked to a tradeoff between work-in-process inventory and some other
possible objective. Several authors have investigated such tradeoffs. Philipoom et al. [60] investigated the relation of the number of kanban with the coefficient of variation in processing times,
machine utilization, and the autocorrelation of processing times, and proposed an empirical methodology for determining the number of kanban. In related work, Gupta and Gupta [34] investigated
additional performance measures such as production idle time and shortage of final products. In
studying the performance of kanban systems, both analytical models (e.g., [47, 52, 70]) and simulation (e.g., [40, 50, 65]) have been used. In the former case, one must resort to certain assumptions
regarding the various stochastic processes involved (e.g., modeling demand and service processes
at different stages through exponential distributions); however, even in the case of a simple finite
Markov chain model, the large state space of such models necessitates the use of several types of
approximations. In the case where simulation is used, any kind of parametric analysis of the model
requires a large number of simulation runs (one for each parameter setting) to be performed. For a
comparative overview see [73].
In this chapter we use simulation to first show that several manufacturing systems satisfy the
“smoothness” or “complementary smoothness” conditions introduced in Chapter 4. Hence, depending on the objective, we apply the (SIO) or (S̄IO) algorithm developed in that chapter to
allocate a fixed number of kanban K to the N stations so as to either maximize the system throughput
or minimize the average delay of each part (i.e., minimize system time).
6.2. More on the Smoothness Condition
Unfortunately, conditions (S) and (S̄) are difficult to test. Since no easy method for testing
these conditions exists, we simulated the investigated systems exhaustively and obtained the system
performance under every possible allocation for all possible numbers of kanban K = 1, 2, · · ·, and verified
that (S) and (S̄) indeed hold. Specifically, we performed the following experiments.
First, we simulated the serial manufacturing system of Figure 6.1 for N = 4, 5, 6, i.e., for systems
with 5, 6, and 7 queues respectively, where the first queue (Q0 ) was assigned an infinite number of
kanban. The objective function under consideration was the system throughput and it was found
that smoothness was satisfied for all test cases.
Figure 6.1: Manufacturing systems consisting of N stations in series — queues Q0, Q1, · · · , QN with service rates µ0, µ1, · · · , µN and external arrival rate λ.
Subsequently, we tested the system shown in Figure 6.2, where we again assumed that Q0 had an
infinite number of kanban. For this system, we considered throughput as the objective function of
interest and found that smoothness was satisfied for all test cases. In addition, we considered the
mean delay as another possible objective function and tested whether such a system satisfies the
complementary smoothness condition. The findings were that for all examined cases, (S̄) was valid.
Similar results were reported for the system of Figure 6.3. In this case, however, all stages have
a finite capacity. As a result, entities (parts, customers, etc.) are lost if they arrive at queues
Figure 6.2: Manufacturing network — queues Q0, · · · , Q6 with service rates µ0, · · · , µ6, external arrival rate λ, and routing probabilities p1, p2, p3.
Q1, · · · , Q3 when no kanban is available¹. In this case, we found that for all examined cases, the
throughput satisfies (S) while the mean delay satisfies (S̄). In addition, we considered the customer
loss probability as another possible objective measure and found that it too satisfies (S̄).
Figure 6.3: Queueing network — queues Q1, · · · , Q6 with service rates µ1, · · · , µ6 and external arrival rates λ1, λ2, λ3.
The results reported above suggest that even though conditions (S) and (S̄) may seem restrictive,
they are satisfied by several important systems; this reflects the fact that the addition of a single
kanban to a stage of a non-optimal allocation will not cause a “non-smooth” jump in the overall
performance of the system.
¹ Note that such a system is more relevant in the context of communication systems rather than manufacturing
systems, but we consider it just to test the applicability of the smoothness condition.
6.3. Application of the Incremental Optimization Algorithms
In this section we consider two manufacturing processes modeled as a kanban system consisting of
N + 1 stages. The entrance to stage 0 contains an infinite-capacity buffer, i.e., stage 0 has infinite
kanban. When a job completes service at any stage, it continues to a downstream stage if that stage has
an available kanban; otherwise it waits, blocking the operation of the corresponding server.
The exception is the last stage (N), which is assumed to be connected to an infinite sink. Finally, jobs
at all stages are processed on a First-In-First-Out (FIFO) basis and no distinction among job types
is made.
Let x_i denote the number of kanban allocated to stage i and define the N-dimensional vector
x = [x_1, · · · , x_N] to represent a kanban allocation. We will assume that at least one kanban is initially
allocated to each of stages 1, · · · , N (x_i ≥ 1), otherwise the throughput of the system is zero. We will
further assume that an upper bound on the work-in-process is given such that Σ_{i=1}^N x_i = K′. Note
that since every stage must have at least one kanban, only K = K′ − N kanban are available to be allocated
to the N stages. Therefore, the search space must be redefined as follows:

A_k = { x : Σ_{i=1}^N x_i = k + N, x_i ≥ 1 },  k = 0, 1, · · ·
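For small N and k the search space A_k can be enumerated directly by a stars-and-bars construction. The sketch below (function name ours) reproduces the allocation counts quoted later in this chapter (220 for N = 4, K = 9 and 2,002 for N = 6, K = 9):

```python
from itertools import combinations

def feasible_allocations(N, k):
    """Enumerate A_k = {x : sum(x) = k + N, x_i >= 1}, i.e., all ways to
    distribute k extra kanban over N stages that each already hold one
    kanban (stars-and-bars).  Illustrative helper for small instances."""
    total = k + N
    allocations = []
    # Choose N-1 cut points among the total-1 gaps between "stars";
    # consecutive differences of the cut points give the x_i >= 1.
    for cuts in combinations(range(1, total), N - 1):
        bounds = (0,) + cuts + (total,)
        allocations.append(tuple(bounds[i + 1] - bounds[i] for i in range(N)))
    return allocations

# The serial example of Section 6.3.1: N = 4 stages and K = 9 kanban
# to distribute, giving the 220 allocations mentioned in the text.
print(len(feasible_allocations(4, 9)))   # 220
```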
6.3.1. Application of SIO on Serial Manufacturing Process
First, we consider the serial manufacturing process shown in Figure 6.1. In this case, the objective
function J(x) is the throughput of the system and hence the problem is to determine an allocation
x that maximizes J(x) subject to the constraint on the total number of kanban. In this case, SIO
can be applied directly with the only modification that x0 = [1, · · · , 1] rather than [0, · · · , 0].
Figure 6.4 shows the evolution of the algorithm for a system with five stages in series (N = 4) when
the available kanban K = 9 (K′ = 13), and therefore there are 220 possible allocations. Furthermore,
the arrival process is Poisson with rate λ = 1.0 and the service processes are all exponential with
rates µ0 = 2.0, µ1 = 1.5, µ2 = 1.3, µ3 = 1.2, and µ4 = 1.1.
In this figure, the horizontal axis is given in terms of steps, where a “step” represents the interval between the allocation of an additional resource through (4.11) and (4.12). Initially, we obtain
performance estimates every 100 departures (f(l) = 100 departures) and every time we reset x̂_{k,l}
we increase the observation interval by another 100 departures. Through exhaustive search, it was
found that the optimal allocation is [1, 3, 4, 5] with approximate throughput 0.9033. As seen in Figure 6.4, the SIO algorithm yields near-optimal allocations within the first four or five iterations²
(SIO performance curve). It is also worth reporting some additional results not evident from Figure 6.4. Specifically, the algorithm delivered allocations which were among the top 10% of designs even
at the very first iteration, when the observation interval was limited to only 100 departures. After
the first 10 iterations (observation intervals greater than 1000 departures), the allocations obtained
were among the top 1% of designs, and after the 20th iteration the SIO algorithm consistently picked
the top design [1, 3, 4, 5]. Finally, notice the saw-tooth shape of the evolution of the SIO algorithm
curve, which reflects the incremental nature of the algorithm and the resetting of (4.12).
² An iteration corresponds to K steps.
Figure 6.4: Evolution of the SIO algorithm — throughput (vertical axis) versus step (horizontal axis), showing the SIO evolution and performance curves against the optimal solution.
6.3.2. Application of SIO on a Network
Next, we consider the manufacturing process shown in Figure 6.2. For this case, the objective
function J(x) is the throughput of the system and hence the problem is to determine an allocation
x that maximizes J(x) subject to the constraint on the total number of kanban. Again we apply
(SIO) directly using x0 = [1, · · · , 1] rather than [0, · · · , 0].
Figure 6.5 shows the evolution of the algorithm for the network when the available kanban K = 9
(K′ = 15), and therefore there are 2,002 possible allocations. The arrival process is Poisson with
rate λ = 1.3 and the service processes are all exponential with rates µ0 = 3.0, µ1 = µ2 = 1.0,
µ3 = µ4 = µ5 = 1.5, and µ6 = 3.0.
Initially, we obtain performance estimates every 100 departures (f(l) = 100 departures) and
every time we reset x̂_{k,l} we increase the observation interval by another 100 departures. Through
exhaustive search (simulating every allocation for 10^6 departures), it was found that the optimal
allocation is [2, 0, 0, 3, 2, 2] with approximate throughput 1.304. As seen in Figure 6.5, initially the
performance estimates are very noisy since they are taken over very short observation intervals.
As the observation interval f(k) grows larger, the variance of the estimates is reduced but not
eliminated. Note that even with the noisy estimates, SIO is able to pick allocations with relatively
good performance.
Next, we used the results from the exhaustive search and ranked every possible allocation according to their performance. Subsequently, we used this ranking to check the rank of the allocations
picked by SIO; the results are shown in the histogram of Figure 6.6. This figure shows that all of
the picked allocations are among the top 40% of all possible allocations, but only 13% are among the
top 5%. This suggests that either SIO does not work very well, or there were several allocations
with optimal or near-optimal performance and hence the order we obtained through the exhaustive
simulation does not represent the “true” ranking. In this case, we believe that the latter is true, since
more than 400 allocations (i.e., 20% of all allocations) exhibited performance within 0.1% of the
optimum.

Figure 6.5: Evolution of the SIO algorithm — throughput versus step, showing the SIO evolution and performance curves against the optimal solution.
6.3.3. Application of S̄IO on a Network
Finally, we used the system considered in the previous section, but rather than maximizing throughput
we minimize the mean delay. For this system, through exhaustive search, we found that the optimal
allocation is [2, 3, 2, 0, 2, 0] with mean delay 3.975. As seen in Figure 6.7, the S̄IO algorithm yields
near-optimal allocations even within the first iteration.
As before, we used the results from the exhaustive search and ranked every possible allocation
according to their performance. Based on that order we checked the rank of the allocations picked by S̄IO;
the results are shown in the histogram of Figure 6.8. This figure shows that 58% of the picked
allocations are among the top 1% of all possible allocations, while none of the picked allocations is
worse than the top 30% of all possible designs. This is considerably better than the results presented
in the previous section; the reason is that there are considerably fewer allocations that exhibit
near-optimal performance.
6.4. Summary
In this chapter we considered the problem of allocating kanban to the various stages of a manufacturing
system so as to optimize a performance measure such as the throughput or mean delay. First, we
showed that the systems under consideration satisfy conditions (S) and (S̄). Subsequently, we used
(SIO) and (S̄IO) to determine the optimal allocation and showed that these algorithms can yield
good solutions with a limited amount of effort.
Figure 6.6: Ranking of the allocations picked by SIO — histogram of the percentage of picked allocations (vertical axis) falling in each percentile of top designs (horizontal axis, 5%-50%).
Figure 6.7: Evolution of the (S̄IO) algorithm — system time versus step, showing the S̄IO evolution and performance curves against the optimal solution.
Figure 6.8: Ranking of the allocations picked by S̄IO — histogram of the percentage of picked allocations falling in each percentile of top designs (1%-30%).
Chapter 7
CHANNEL ALLOCATION IN
CELLULAR TELEPHONE
NETWORKS
This chapter presents another application of resource allocation methodologies on mobile cellular
telecommunication networks. In this context, channels are the discrete resources while mobile phones
are the users. Using principles from the preceding chapters we are able to develop channel allocation
algorithms that can reduce the probability that a new call will not find an available channel while
using the smallest number of reconfigurations.
7.1. Introduction
In recent years, mobile communications have experienced tremendous growth. In order to cope with the increased demand, the service area is divided into cells where frequency channels may be reused as long as there is sufficient distance separation to prevent interference [51]. In addition, frequency allocation algorithms may further increase the system capacity, at least for systems that use either frequency or time division multiple access schemes (FDMA or TDMA, respectively).
The simplest frequency allocation scheme is fixed channel allocation (FCA) where each cell is
pre-assigned a fixed set of channels for its exclusive use, making sure that none of the surrounding
cells can use those channels. Apart from its simplicity, FCA can reduce interference by increasing the
frequency separation among the channels assigned to the same cell and it has superior performance
under heavy traffic. On the other hand, FCA cannot adapt to changing traffic conditions. Dynamic
channel allocation (DCA) schemes [21, 61] overcome this problem by allowing all cells to use any
of the available channels as long as interference is kept below a certain level. DCA schemes increase
the flexibility and traffic adaptability of the system, and consequently increase the system capacity
under light and medium traffic conditions. However, for heavy traffic conditions, DCA may lead
to inefficient allocations making FCA a better choice. To combine the benefits of the two, hybrid
channel allocation (HCA) schemes [42] have been developed where all available channels are divided
into fixed and dynamic sets. For a comprehensive study of the channel allocation schemes the reader
is referred to [45].
In all of the channel allocation schemes described above, a mobile phone is always connected
to the base station with the highest signal to noise ratio, which is usually the base station physically located closest to the mobile user. In practical systems, in order to achieve complete
area coverage it is inevitable that some mobile phones, depending on their actual location, may be
able to communicate with two or more base stations, i.e. they receive a signal with sufficiently high
signal to noise ratio from multiple base stations. Since such phones can be connected to two or
more base stations, it may be possible to increase the network capacity by connecting new mobile
phones to the “least” congested base station. Towards this end, two algorithms have been developed,
namely, directed retry (DR) and directed handoff (DH) [26, 44, 25]. DR directs a new call to the
base station with the greatest number of available channels, while DH may redirect an existing call
from one cell to a neighboring one to further increase the system capacity (for comparison purposes
both algorithms are described in Section 7.3).
Both DR and DH perform the “phone allocation” to base stations based on some state information, i.e., the number of available channels. In effect, the two algorithms try to balance the number
of available channels over all base stations. However, using Lagrangian relaxation, it is easy to show
that in order to solve a convex constrained optimization problem, it is important to balance the
partial derivatives of the objective function with respect to the control variables [27]. This idea has
motivated the development of the optimization algorithms in the preceding chapters and it is also a motivating factor for the algorithms derived in this chapter, namely, Simple Neighborhood (SN) and Extended Neighborhood (EN). These algorithms use derivative-like information (in the form of a finite difference) to determine the channel allocation. As a result, these algorithms enhance the performance of DR and DH by decreasing the call loss probability while requiring fewer reconfigurations. Moreover, we demonstrate that using overlapping cell structures leads to increased system capacity that can be better than DCA. Consequently, one can expect such structures to perform even better in hierarchical models due to the larger overlap between micro and macro cells [].
7.2. Overlapping Cells and Modeling Assumptions
In this section we describe a cellular system model where the cells are allowed to overlap as seen in
Figure 7.1. In the model we assume that the service area is flat and each cell is represented by a
hexagon inscribed in a circle of radius r = 1. Each cell Ci is serviced by base station Bi which is
located in the center of the cell. The coverage area of each base station is represented by a circle
around Bi , where, in order to achieve full coverage its radius must be greater than or equal to
1. Note that a model with overlapping cells is realistic since even under the idealized conditions
described above, it is necessary that at least 21% of the cell’s area overlaps with neighboring cells. In
practical systems, this overlap is usually much larger, and it can be further increased by increasing
the transmitted power of some transmitters and by applying principles of power control [33, 79].
Next, we define some sets that will be useful when describing the various algorithms that will be
presented in the remainder of this chapter.
A(Bi ) is the set of all base stations located in the cells adjacent to cell Ci , which is serviced by base station Bi .
IN (c) Immediate Neighborhood is the set of all base stations that mobile phone c can be directly
connected to.
EN (c) Extended Neighborhood is the set of all base stations that are adjacent to the cells that are in IN (c), i.e., EN (c) = ∪_{j∈IN (c)} A(j).
M (Bi ) is the set of all mobile phones that are connected to base station Bi , and
H(Bi ) is the set of all mobiles that are not connected to base station Bi but which can be connected
to Bi , i.e., these are calls connected to a base station in A(Bi ) and are located in the area that
overlaps with Bi .
For example, in Figure 7.1, A(B1 ) = {B2 , B3 , B4 , B5 , B11 , B12 }, IN (c) = {B1 , B2 , B3} and EN (c) =
{B1 , · · · , B12 }.
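To make these set definitions concrete, the extended neighborhood can be computed directly from the adjacency sets. The following sketch uses a small hypothetical adjacency map (not taken from the dissertation) with integer base station labels:

```python
def extended_neighborhood(IN_c, A):
    """EN(c): union of the adjacency sets A(j) over all base stations j in IN(c)."""
    en = set()
    for j in IN_c:
        en |= A[j]
    return en

# Hypothetical 5-station layout: A[i] is the set of stations adjacent to station i.
A = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 5}}
print(extended_neighborhood({1, 2}, A))
```

With symmetric adjacency, EN(c) typically contains the members of IN(c) themselves, as in the Figure 7.1 example.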
To complete the notational definitions we assume that base station Bi is assigned a fixed number
of channels Ki . Furthermore, we use mi to denote the number of channels that are currently available
in Bi (mi = Ki − |M (Bi )|). Finally, by B ∗ (c) we denote the base station that is located closest to mobile c or, equivalently, has the highest signal to noise ratio.
Figure 7.1: Overlapping Cell Structure
In addition, the results of this chapter follow the basic assumptions made in [26], that is:
1. Call arrivals are described by a Poisson process with rate λ while the call duration is exponentially distributed with mean 1/µ.
2. Blocked calls are cleared and do not return.
3. Fading and co-channel interference are not accounted for. Propagation and interference considerations are simply represented by the constraint that, if the channel is used in a given cell,
it cannot be reused in a ring of R cells around that cell, R = 1, 2, · · ·.
4. Mobile phones are assumed stationary, i.e. in this model we do not account for mobile users
that roam from one cell to another.
5. Certain mobiles may be connected to multiple base stations, i.e. mobiles that are located in
the intersection of the coverage areas of two or more base stations as shown in Figure 7.1.
7.3. DR and DH Schemes
In this section we describe variations of the two algorithms that have appeared in the literature and
use the overlapping cell model, namely “directed retry” and “directed handoff”.
In DR when a new call is initiated, it is connected to the closest base station if the number
of available channels in that base station is greater than a threshold T . If the number of available
channels is less than or equal to T , then the new call is assigned to the base station that has the most
available channels from all base stations in the immediate neighborhood of the new call (IN (c)).
Directed Retry (DR)
When a new call c arrives:
1. If mi∗ > T , carry call c at Bi∗ , where Bi∗ = B ∗ (c).
2. If mi∗ ≤ T , carry call c at Bj ∗ , where j ∗ = arg maxj∈IN (c) {mj }.
3. If mj = 0 for all j ∈ IN (c), then call c is blocked.
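The DR rule above can be sketched in a few lines. In this illustration (function and variable names are ours, not from the text) `B_star` maps a call to its closest base station, `IN` gives its immediate neighborhood, and `m[i]` is the number of free channels at Bi:

```python
def directed_retry(c, B_star, IN, m, T):
    """Directed Retry sketch: returns the chosen base station, or None if blocked."""
    i_star = B_star(c)
    if m[i_star] > T:                           # step 1: closest BS has enough free channels
        return i_star
    j_star = max(IN(c), key=lambda j: m[j])     # step 2: most free channels in IN(c)
    if m[j_star] == 0:                          # step 3: every candidate is full
        return None
    return j_star
```

Note that when T equals the total number of channels, step 1 never fires and DR always picks the least loaded neighbor.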
DH is an enhancement to DR where an existing call may be redirected to a neighboring base station in order to accommodate more new calls. Specifically, this scheme works as shown below.
Directed Handoff (DH)
When a new call c arrives:
1. Find Bi∗ = B ∗ (c) and define Q = {Bi∗ } ∪ A(Bi∗ ).
2. Let j ∗ = arg maxj∈Q {mj }.
3. If j ∗ = i∗ , carry c at Bi∗ and go to 7.
4. If there exists call e ∈ Mi∗ ∩ Hj ∗ , then handoff e to Bj ∗ , and carry the new call c at Bi∗ . Go to 7.
5. If Mi∗ ∩ Hj ∗ = ∅, Q := Q − {Bj ∗ }.
6. If Q ≠ ∅ go to 2, else the call is blocked.
7. END.
DH can improve the performance of DR because it can redistribute calls over seven base stations (the base station located closest to the new call plus its six adjacent cells), while DR can redistribute over at most three base stations. The trade-off is an induced handoff on an existing call, which is forced to switch its carrying base station.
7.4. Performance Enhancements
As mentioned in the introduction, DR and DH perform the call allocation to the base stations
based on the current state information, i.e., the number of available channels in each cell, whereas,
it would be desirable to use derivative-like information to perform the optimization. To derive
such information, we need an objective function and given that we are interested in minimizing the
number of lost calls, it is natural to consider the steady state loss probability as given by the Erlang
B formula.
L^i_P = (ρ_i^{K_i} / K_i !) [ Σ_{j=0}^{K_i} ρ_i^j / j! ]^{−1}    (7.1)
where Ki is the number of channels assigned to base station Bi and ρi = λi /µi is the traffic intensity
in cell Ci .
Note however, that for the type of algorithms we are interested in, the number of channels Ki is fixed for all cells, hence, any derivative-like function of the steady state loss probability with respect to any parameter other than Ki will always be equal to zero. This suggests that rather than using
a steady state measure, it may be preferable to use a transient one. So, rather than directly trying
to minimize the steady state loss probability, one can try to minimize the loss probability over the
next τ time units and hope that such actions will also minimize the steady state loss probability.
To derive a transient objective function, recall that any base station can be modeled by an M/M/m/m queueing system (see [48, 12]). Such a system generates a birth-death Markov chain with a probability mass function π^i(t) which is the solution of the differential equation

dπ^i(t)/dt = π^i(t) Q_i    (7.2)
where π^i(t) = [π^i_0(t), · · · , π^i_{K_i}(t)] and π^i_j(t) is the probability that at time t there will be j, j = 0, · · · , K_i , active calls in cell Ci . Furthermore, Qi is the transition rate matrix, which for the M/M/m/m queueing system is given by
Q_i =
[ −λ_i       λ_i              0                  0           · · ·      0
   µ_i     −(λ_i + µ_i)       λ_i                0           · · ·      0
   0         2µ_i           −(λ_i + 2µ_i)        λ_i         · · ·      0
   ·           ·                 ·                 ·                     ·
   0        · · ·        (K_i − 1)µ_i     −(λ_i + (K_i − 1)µ_i)        λ_i
   0        · · ·             0                 K_i µ_i             −K_i µ_i ]    (7.3)
What we are after is the probability that a call is lost within the next τ time units, i.e., π^i_{K_i}(τ), which for small τ is clearly going to be a function of the initial conditions. Hence, for every cell Ci
we define the following objective function:

L_i(m_i) = π^i_{K_i}(τ), s.t. π^i(0) = e_{K_i − m_i}    (7.4)
where ej is a (Ki + 1)-dimensional vector with all of its elements equal to zero except the jth one
which is equal to 1 and mi is the number of the base station’s free channels at the decision time, i.e.,
at t = 0. Based on this objective function, we then define the following finite difference
∆L_i(m_i) = L_i(m_i − 1) − L_i(m_i),  m_i = 1, · · · , K_i    (7.5)

with a boundary condition L_i(0) = ∞. Next, we are ready to describe our optimization algorithms.
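The quantities above can be computed numerically. The sketch below (our own illustration; the parameter values and the plain Euler integrator are arbitrary choices, not from the text) builds the generator (7.3), integrates (7.2) from the initial condition in (7.4), and forms the finite difference (7.5):

```python
def generator(lam, mu, K):
    """Transition rate matrix Q of the M/M/K/K birth-death chain, eq. (7.3)."""
    Q = [[0.0] * (K + 1) for _ in range(K + 1)]
    for n in range(K + 1):
        up = lam if n < K else 0.0      # arrival rate out of state n
        down = n * mu                   # call-completion rate out of state n
        if n < K:
            Q[n][n + 1] = up
        if n > 0:
            Q[n][n - 1] = down
        Q[n][n] = -(up + down)
    return Q

def transient_loss(lam, mu, K, m, tau, steps=2000):
    """L(m) = pi_K(tau) when the cell starts in state K - m (m free channels),
    integrating d pi/dt = pi Q with explicit Euler steps."""
    Q = generator(lam, mu, K)
    pi = [0.0] * (K + 1)
    pi[K - m] = 1.0
    h = tau / steps
    for _ in range(steps):
        pi = [pi[j] + h * sum(pi[i] * Q[i][j] for i in range(K + 1))
              for j in range(K + 1)]
    return pi[K]

def delta_L(lam, mu, K, m, tau):
    """Finite difference (7.5): Delta L(m) = L(m-1) - L(m), with L(0) = infinity."""
    if m == 1:
        return float("inf")             # boundary condition L(0) = infinity
    return transient_loss(lam, mu, K, m - 1, tau) - transient_loss(lam, mu, K, m, tau)
```

A production implementation would use a matrix exponential or a stiff ODE solver instead of fixed-step Euler, but the structure is the same: one transient solve per initial condition.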
7.4.1. Simple Neighborhood (SN)
The SN algorithm is very similar to DR. Their only difference is that SN assigns the new call to the
“least sensitive” base station with respect to the number of available channels, while DR assigns the
new call to the base station with the largest number of available channels. Note that for systems
with uniform traffic over the entire coverage area, and when the threshold T of DR is set equal to
the number of available channels, then the two algorithms behave in exactly the same way, since
the least sensitive base station will always be the one with the most available channels. This is also
observed in some of the simulation results that we present in the next section. More specifically, the
algorithm works as follows:
Simple Neighborhood (SN)
When a new call c arrives:
1. Let i∗ = arg mini∈IN (c) {∆Li (mi )}.
2. Assign c to base station Bi∗ .
3. END.
7.4.2. Extended Neighborhood (EN)
Rather than looking for the least sensitive base station among the cells in the immediate neighborhood, the Extended Neighborhood algorithm searches the entire extended neighborhood. If the least sensitive base station (say i∗ ) is within IN (c), then it assigns the call to i∗ as in SN. If the least sensitive base station is not in IN (c), then this scheme looks for an existing call that is connected to one of the IN (c) base stations and is located in the intersection of the least sensitive base station's coverage area with any of the cells in IN (c). For example, in Figure 7.1, call e is connected to B1 and is located in the intersection with the coverage area of B4 (the least sensitive base station among EN (c)). Then, the algorithm induces a handoff, connecting e to B4 while assigning c to B1 . Note that if there is no call in the intersection of the least sensitive base station with any of the cells in IN (c), then the scheme looks for the next best option. Next, we formally describe the EN algorithm.
Extended Neighborhood (EN)
When a new call c arrives:
1. Define Q = EN (c).
2. Let i∗ = arg mini∈Q {∆Li (mi )}.
3. If Bi∗ ∈ IN (c), assign c to Bi∗ and go to 7.
4. If there exists e ∈ Hi∗ ∩ ( ∪_{j∈IN (c)} Mj ), go to 5, else go to 6.
5. Handoff call e to Bi∗ and assign c to the base station that e belonged to. Then go to 7.
6. Q := Q − {Bi∗ }. If |Q| = 0 the call is blocked; otherwise, go to 2.
7. END.
The EN algorithm is also similar to the DH algorithm. Their differences lie in step 2, where EN tries to minimize the sensitivities with respect to the number of available channels while DH tries to find the base station with the most available channels. Another difference is that EN searches for the least sensitive base station among all base stations in EN (c) while DH searches for the base station with the most available channels in A(B ∗ (c)) ⊆ EN (c).
7.4.3. On-Line Implementation of SN and EN
Under the Poisson arrival and exponential call duration assumptions, one can monitor the system over an interval of length T and obtain estimates of the actual parameters λi and µi for all cells Ci , i = 1, 2, · · ·. Based on these estimates, we solve the differential equation (7.2) Ki + 1 times,
once for each initial condition, and use the results to determine the finite differences ∆Li (j) for all
j = 0, · · · , Ki which are then saved and used by the controller over the next interval. Therefore, this
algorithm does not impose any significant computational burden.
An interesting question arises when the underlying distributions are not exponential. In this case, the differential equation (7.2) is not valid and therefore neither are the finite differences (7.5). Instead, we can directly obtain estimates of the finite differences over the interval T and use those to drive the SN and EN algorithms. A simple algorithm for obtaining such estimates is the following: every τ time units observe the current state of each cell, i.e., the number of active channels. Count the number of times Nl that each state l was observed, for l = 0, · · · , Ki . Also count the number of times Nlm that the state at t0 is l and at t0 + τ it becomes m, for all l, m = 0, · · · , Ki . Finally, form the appropriate ratios to get the required estimates.
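A sketch of this counting estimator follows. It is our own illustration: we assume the observations come as pairs (state at t0, state at t0 + τ) and that the relevant ratio is N_{l,K}/N_l, i.e., the empirical probability that a cell observed in state l is full τ time units later:

```python
from collections import Counter

def estimate_deltas(pairs, K):
    """Estimate Delta L(m) from observed pairs (state at t0, state at t0 + tau).
    L(m) is estimated by N_{l,K} / N_l with l = K - m: the fraction of intervals
    that started with m free channels and ended with the cell full (state K)."""
    N, Nlm = Counter(), Counter()
    for l, nxt in pairs:
        N[l] += 1
        Nlm[(l, nxt)] += 1

    def L(m):
        l = K - m
        return Nlm[(l, K)] / N[l] if N[l] else None   # None: state l never observed

    deltas = {}
    for m in range(1, K + 1):
        lo, hi = L(m - 1), L(m)
        if lo is not None and hi is not None:
            deltas[m] = lo - hi          # empirical counterpart of (7.5)
    return deltas
```

States that were never visited during the monitoring interval simply yield no estimate; a controller would fall back on the previous interval's values for those.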
7.5. Simulation Results
In this section we present simulation results for the call loss probability of SN and EN and compare
them with the corresponding results of DR and DH. In addition, we use the results derived in [19, 20]
to compare the performance of these algorithms with the performance of Dynamic Channel Allocation
(DCA) algorithms. Specifically, we reproduced the lower bounds on two DCA algorithms: (a) The
“Timid DCA” scheme which allows a new mobile phone to connect via any channel that is not used
in any of the cells that are located in the R consecutive rings surrounding the closest base station.
(b) The “Aggressive DCA” scheme which allows a new mobile to get any channel, even if it is used in an adjacent cell, and forces existing calls to search for a new interference-free channel in their area.¹ These bounds are the result of an ad hoc Erlang-B model which uses the Erlang B formula (7.1) and substitutes the traffic intensity ρ → N ρ and the total number of available channels K → δK, where N is the reuse factor² [51] and δ is the normalized channel utilization.³
For the results presented next, we assume that there is a total of 70 channels. Any channel
assigned to base station Bi cannot be reused in any base station in the ring of radius R = 2 cells
around Bi . In this case, the reuse factor N = 7 and hence each base station is assigned 10 channels.
In addition, we assume that the service area consists of 64 cells each of which is represented by a
hexagon inscribed in a circle of radius r = 1, arranged in an 8 × 8 grid with a base station located in
the center. Note that in order to achieve full area coverage, the coverage radius of each base station
must be at least equal to 1. Finally, in order to reduce the effect of cells being located at the edges we
have used the model in [26]: a cell on an edge is assumed to be adjacent to the cells on the opposite
edge.
Figure 7.2 shows how the probability of a mobile being able to hear multiple base stations
changes as a function of the coverage radius of each base station assuming that demand is uniformly
distributed over the entire area. Note that even at the minimum coverage radius at least 20% of the
mobile phones are covered by two base stations.
Next we compare the call loss probabilities for various channel allocation and “mobile allocation”
algorithms. Figure 7.3 compares the call loss probability of the five channel allocations schemes
namely FCA, DR, SN, DH and EN when the traffic is uniform over the entire service area and the
coverage radius is 1.14. In this case, about 50% of all calls can hear a single base station, 43%
can hear two base stations and 7% can hear three base stations. For DR, we simulated the system with two different thresholds, T = 5 and T = 10. As T increases, the loss probability decreases and
at the limit, i.e., when T is equal to the number of the available channels, the performance of DR
is identical with the performance of SN as indicated in Section 7.4.1. Furthermore, note that DH
and EN exhibit superior performance which is considerably better than the Timid DCA while for
intensities ρ ≥ 7.5, EN outperforms even the aggressive DCA.
A similar picture is presented in Figure 7.4, which compares the loss probabilities of the seven
algorithms when the coverage radius is equal to 1.4, i.e. when 15% of the calls can hear one base
station, 33% can hear two base stations and 52% can hear three base stations. Note how the call
loss probability has been dramatically reduced and note that both, EN and DH outperform even the
aggressive DCA. However, increasing the coverage radius by that much will increase the co-channel
interference between adjacent base stations and it is possible that such a configuration may not be
feasible due to noise. On the other hand, we point out that such overlapping probabilities may be feasible for non-uniform traffic or in hierarchical cell structures. For example, when planning a system, the base stations may be placed in a way such that the overlapping areas lie over high-usage regions.
EN and DH improve the system performance at the expense of intracell handoffs. When a new
call arrives, they may redirect an existing call from its present base station to a neighboring one to
¹ Note that in [19] it is stated that no practical algorithm exists that can implement the aggressive DCA, and it is conjectured that this bound may not be attainable.
² N = i² + ij + j² where i, j are integers. For R odd, i = j = (R + 1)/2 and for R even, i = R/2 and j = R/2 + 1.
³ For a Timid DCA scheme δ(R = 1) = 0.693, δ(R = 2) = 0.658, δ(R = 3) = 0.627, while for the aggressive DCA scheme δ = 1 [19].
Figure 7.2: Cell overlapping as a function of the cell radius
accommodate the new call. As shown in Figure 7.5, the use of derivative-like information allows EN to achieve lower call loss probability than DH while requiring significantly fewer reconfigurations (fewer induced intracell handoffs).
Figure 7.6 shows how the call loss probability will change as the cell radius changes when the
traffic intensity in each cell is ρ = 8 Erlangs. Note that when DH and EN are used, a small increase
in the base stations’ coverage area may dramatically decrease the call loss probability.
Next, we investigate the effect of the parameter τ on the performance of the SN and EN algorithms. This is shown in Figure 7.7, which indicates that the algorithms perform well under small values of τ ; however, as τ gets larger, their performance degrades. This is reasonable because algorithms like SN and EN behave like D-DOP and S-DOP presented in Chapter 3 and hence require some convexity assumptions like A3.2. Note that the transient performance measure we used (7.4) is convex with respect to the initial conditions for τ = 0, but not strictly convex. For values of 0 < τ < τ0 it becomes strictly convex, but for values of τ > τ0 it becomes non-convex, where τ0 is a
constant that depends on λ and µ. The objective of SN and EN is to direct new calls to the “least
sensitive” base station. If the objective function is convex, then the minimum finite difference (7.5)
correctly identifies that base station. On the other hand, this is not guaranteed when the function is
not convex. Hence, the performance of the SN and EN algorithms degrades as the objective function
becomes non-convex.
As mentioned previously, under uniform traffic, the performance of DR is almost identical to SN
when the threshold T is equal to the total number of channels. For non-uniform traffic however, SN
has an advantage due to the use of the sensitivity information. Figure 7.8 shows the overall call loss
probability when the coverage radius is 1.14 and the traffic is such that for every three neighboring
Figure 7.3: Call loss probabilities as a function of the traffic intensity ρ when the cell radius is 1.14
cells, one has a fixed intensity of 8 Erlangs, another has an intensity of 2 Erlangs, and the intensity of the third varies as shown on the horizontal axis of the figure. For this case, SN exhibits slightly lower loss
probability than DR while again EN exhibits the best performance.
7.6. Conclusions and Future Direction
In this chapter we presented two algorithms (SN and EN) which use sensitivity-like information to
improve the performance of DR and DH schemes in the context of overlapping cellular networks.
SN and EN exhibit lower call loss probability than DR and DH respectively while EN also reduces
the number of intracell handoffs. Furthermore, for instances where a high enough percentage of calls
can be connected to multiple base stations, these algorithms can achieve lower call loss probabilities
than many DCA schemes.
It should be interesting to investigate the effect of such algorithms in systems with a hierarchical cell structure, where overlapping areas are potentially much bigger. Furthermore, for this work we assumed that the mobile users are “stationary”, i.e., they do not cross the boundaries of the cell where they were initiated. Clearly, it would be interesting to look at how such algorithms affect the number of dropped calls when users are allowed to roam. Finally, another interesting aspect of such systems is the issue of fairness as described in [49] (i.e., mobile users, depending on their location, can perceive different quality of service because they can hear multiple base stations).
Figure 7.4: Call loss probabilities as a function of the traffic intensity ρ when the cell radius is 1.4
Figure 7.5: Average number of induced handoffs for EN and DH.
Figure 7.6: Call loss probabilities as a function of the cell radius
Figure 7.7: Call loss probabilities as a function of the parameter τ
Figure 7.8: Call loss probabilities for non-uniform traffic
Chapter 8
GROUND-HOLDING PROBLEM IN
AIR TRAFFIC CONTROL
This chapter presents our final application of resource allocation methodologies. Specifically, we consider the ground-holding problem as it arises in air traffic control. In this context, a runway is considered as the resource while the airplanes using it to land or take off are the users. First, we treat the runway as a discrete resource by dividing the interval during which it can be used into time slots and assigning each time slot to an aircraft. Subsequently, we relax the time discretization and view the runway as a continuous resource.
8.1. Introduction
In recent years, air traffic has increased dramatically while airport capacity has remained stagnant.
This has resulted in congestion problems which degrade the performance of the air traffic control
system, raise safety concerns, and cause excessive costs (of the order of several billions of US dollars).
Adding to the problem is the important fact that the capacity of an airport is sensitive to changes in
weather conditions (e.g., visibility, wind). Thus, even if a maximum airport capacity were adequate
to meet the scheduled demand, it is not unusual for this capacity to drop by half or even more due
to bad weather conditions, resulting in serious congestion problems that typically propagate to other
airports as well.
Solutions to this problem vary according to the planning horizon. Long-term considerations
involve building new airports and additional runways. Medium-term approaches focus on ways
that disperse traffic to less utilized airports through regulation, incentives, etc. Finally, short-term
solutions aim at minimizing the unavoidable delay costs under the current capacity and demand.
This chapter proposes and analyzes control schemes that belong to the latter category.
The most important class of solutions to the short-term congestion problem is through Ground-Holding Policies (GHP), which are based on the premise that ground delays are less expensive (less fuel) and safer than airborne delays. The objective of any GHP is to trade off airborne delays for
fuel) and safer than airborne delays. The objective of any GHP is to trade off airborne delays for
ground delays. Thus, when a flight is scheduled to arrive at some destination airport at a time when
high congestion is expected, it is instructed to delay its departure from the origin airport until air
traffic subsides. The fundamental issue in any GHP is to determine which flights should be delayed
and by how long. Several authors have considered this problem for a single destination airport.
Andreatta and Romanin-Jacur [5] have studied the single-period GHP problem and used dynamic
programming (DP) to obtain a GHP that minimizes the delay cost. Terrab and Odoni [72] extended
these results to the multi-period GHP, while Richetta and Odoni [62, 63] also addressed the same
problem formulated as a stochastic linear program which they solved to obtain the optimal solution.
However, it is well-known that the DP approach suffers from the “curse of dimensionality”; for a
problem of realistic size, the resulting state space explodes, making a solution practically intractable
and necessitating the use of heuristics. For the GHP problem in a network of several destination
airports, Vranas et al. [75] developed three models under slightly different assumptions, which are
based on a zero-one integer program formulation. Bertsimas and Patterson [8] enriched the model
by introducing en-route capacity constraints, and also addressed issues that arise due to banks of flights in the hub-and-spoke system, as well as rerouting of aircraft. A comparative study of three
different approaches of solving the GHP problem for multi-airport systems has been conducted by
Andreatta and Brunetta [4].
The GHP problem is generally viewed as having a stochastic and a dynamic component. First,
airport capacity is stochastic since it is weather dependent and cannot be predicted accurately,
especially for long periods of time. A dynamic component is present, since, as time progresses,
better weather estimates become available which may repeatedly require changes in the scheduling
policy. In addition, sudden changes in operating conditions (e.g., emergencies) also require rapid
rescheduling capabilities. Note that the aforementioned methodologies reported in the literature
are based on some type of a Linear Program (LP) to solve the GHP problem, which for typical-size problems contains several hundreds of thousands of variables and constraints. An additional difficulty is present in some of the techniques which also involve integer programming: since there is no guarantee that the LP relaxation will yield an integral solution, techniques such as branch-and-bound may also be necessary, increasing the time needed to solve a single instance of the problem.
Due to the dynamic component of the GHP problem, it is essential to solve several instances of it
over the period of a day, as one tracks changes in operating conditions and hence airport capacities.
However, due to the size of the problem, such approaches may easily become impractical.
In this chapter we address the GHP problem for a single destination airport and introduce two new
solution approaches suited for this problem. Our first approach is motivated by the kanban control
policies introduced in the previous chapter; more specifically it follows the Kanban Smoothing (KS)
flow control policy, first proposed in [56]. Its main advantage is that it is inherently dynamic and
at the same time very simple to implement. By controlling certain parameters, one can trade off
ground and airborne delays. This approach, however, is not aimed at completely eliminating airborne
delays. This has motivated our second approach, in which we study the sample paths generated by
scheduled flight departures and arrivals with the explicit aim of assigning each flight a GH delay
that minimizes a specific cost function. By invoking Finite Perturbation Analysis (PA) techniques,
we develop an efficient algorithm which eliminates airborne delays.
The advantages of (KS) include: (a) It is very easy to implement and inherently dynamic. As
soon as new estimates of the airport capacity become available, they can be immediately entered
into the controller by just changing the number of kanban in the relevant stages; thus, there is no
a priori need to limit airport capacity to a small number of profiles so as to keep computational
effort to manageable levels (as was done in [62]). (b) It automatically addresses uncertainties in
the departure time or travel time of each flight through its kanban release mechanism. (c) It is
distributed over airports and scalable, since every destination airport can have its own controller
and a new airport can be added by simply adding a new controller. (d) It facilitates en-route speed
control; for example, an airplane that enters a stage without an available kanban can be instructed
to slow down and later speed up as kanban become available, and (e) It easily addresses en-route
capacities by assigning a similar controller to every sector and requiring that every flight must obtain
a kanban from each sector in its path before it is allowed to take off.
The second approach bypasses a limitation of the proposed (KS) policy, as well as all LP policies
proposed in the literature, in which a day is divided into small intervals. The smaller the intervals,
the higher the effectiveness of the resulting policy; however, the size of the problem increases and
this requires more computational effort to obtain a solution. As an example, if a day is divided into
15-minute intervals and the airport capacity is 5 landings per interval, then 5 flights assigned to any
such interval may arrive at any point during the interval; therefore, they are not guaranteed zero
airborne waiting time. On the other hand, if a day is divided into 3-minute intervals, then only 1
landing per interval will be allowed, reducing, but not eliminating, the airborne delay cost. This
however, will increase the number of variables and constraints required, hence the complexity of the
problem will be increased. Depending on the amount of actual delay assigned by such policies, they
may become more “conservative” or more “liberal” in the sense that they may introduce excessive GH
delays while the runway at the destination airport is idle, or allow more than the minimum possible
airborne delays. This problem has motivated our second control scheme, which aims at forcing
airborne waiting times to zero, while avoiding excessive ground holding delays. The cornerstone of
this approach is to replace a time-driven model in which one considers flights within intervals of
fixed length by an event-driven model in which one studies sample paths of the air traffic system
generated by departure and arrival events under a specific policy. Towards this end, we have employed
an FPA scheme similar to the one described in an earlier chapter (Section 5.4). In the GHP case,
the “specified change” is the introduction of any new flight arrival, and the FPA scheme we develop
evaluates the impact of such an event on the delay cost for all possible GH delay values. This allows
us to determine the GH delay which minimizes the cost that this new flight will induce to the entire
system.
8.2. System Model
For the purposes of this chapter we consider a network with M departure (source) airports (S1 , · · · , SM )
and a single destination airport D as shown in Figure 8.1. Over the period (0, T ), there are K flights
(f1 , · · · , fK ) scheduled to arrive at D, at times T1 , · · · , TK respectively. Any flight fk that departs
from Si , i = 1, · · · , M , at time tk arrives at the destination airport at time Tk = tk + di , where the travel times di are deterministic and known in advance. Delays, therefore, are only
due to congestion.
[Figure 8.1 shows the destination airport queueing model: flights from the source airports S1 , · · · , SM travel for d1 , · · · , dM time units, join an airborne wait queue at the destination airport D, and are served by the runway before proceeding to the main area.]

Figure 8.1: Destination airport queueing model
At the time of an airplane arrival at D, if the runway of D is not occupied by a preceding
airplane, it immediately proceeds to the runway in order to land, otherwise it delays its landing until
all preceding airplanes clear the runway. Once an airplane has landed, it proceeds to the airport’s
gates making the runway available for the next airplane. Note that for safety reasons there is a
minimum time separation, Zt , between any two consecutive airplane landings which depends on the
weather conditions. Under “good weather”, Zt obtains its minimum value and therefore the airport
capacity CtD , defined as the number of landings per unit of time, assumes its maximum value. As
weather conditions deteriorate, Zt increases and so CtD decreases.
In the context of queueing networks, a runway corresponds to a server. Therefore, the system
can be represented by a single queue served by multiple servers, one for each available runway. For
simplicity, in our analysis we assume a single server but extensions are straightforward.
8.3. Kanban-Smoothing (KS) Control Policy
According to the KS policy, the entire air traffic network is divided into N stages based on the
distance of any point from the destination airport D which is defined as stage 0. Thus, at any given
time every airplane belongs to one of the stages depending on its distance from D. An airplane that
is either landing or waiting to land (i.e., it is in the airborne wait queue) is in stage 0. Any other
airplane is in the ith stage if it is at a distance d, with (i − 1)σ < d ≤ iσ, away from D, where σ
is the stage range or stage duration and is a parameter set by the controller. Note that rather than
using actual distance units (e.g., Km or miles) to describe each stage, it is preferable to use time
units in order to accommodate airplanes with different speeds. In this case, d corresponds to the
expected time needed by the airplane to reach D and σ has time units (e.g., minutes). In Figure 8.2
flight fd is expected to arrive at D after d time units with 2σ < d ≤ 3σ, so it is assigned to stage 3. Similarly,
fe is assigned to stage N − 1.
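As a minimal sketch of this assignment rule (the function name is hypothetical, assuming d and σ are expressed in the same time units), the stage index is simply ⌈d/σ⌉:

```python
import math

def stage_of(d, sigma):
    """Stage index i such that (i - 1)*sigma < d <= i*sigma; stage 0 holds d <= 0
    (airplanes landing or waiting to land)."""
    if d <= 0:
        return 0
    return math.ceil(d / sigma)

# A flight expected to reach D after 2.4*sigma time units falls in stage 3,
# while one exactly sigma*2 away falls in stage 2 (the upper boundary is inclusive).
print(stage_of(2.4 * 10.0, 10.0))
print(stage_of(2.0 * 10.0, 10.0))
```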
Every stage i is assigned ki kanban (tokens or permits). Every time an airplane enters stage i
it releases the kanban from the previous stage, i + 1, and receives a kanban from the new stage i.
Unlike manufacturing systems, however, the stage boundaries in the air traffic network cannot be
rigidly set. In a serial production line, if a part finds no kanban available in the next stage it is
simply forced to wait at its current stage until a kanban becomes available. In our case, however,
a traveling airplane cannot be forced to wait in the midst of its trip until a kanban is freed. Thus,
our controller allows for a relaxed exchange of kanban as follows: If an airplane that already holds
a kanban from a previous stage enters a new stage that does not have a free kanban, the airplane
simply does not release the kanban of the previous stage. In other words, it only releases a kanban
of an upstream stage when it can get a kanban of some downstream stage.
The control policy operates on airplanes that have recently become ready to take off as follows:
The airplane informs the controller of its departure time from a source airport and its expected
arrival time at D. Using this information, the controller determines the airplane’s original stage, say
i. If that stage has an available kanban, it assigns it to the airplane which is then allowed to take
off. On the other hand, if no kanban is available, the controller searches all upstream stages for a
free kanban. If it finds one, say m > i, it assigns it to the new airplane, but instructs it to delay its
departure by an amount of time which corresponds to the time difference between the stage that the
airplane is currently in and the stage that the available kanban was found, i.e., by (m − i)σ.
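The takeoff rule above can be sketched as follows; the function name and data layout are illustrative assumptions, not part of the dissertation's model:

```python
def ks_admit(free_kanban, i, sigma):
    """Search stages i..N-1 for a free kanban; return (stage, ground_delay),
    or None if all upstream stages are exhausted (flight must be held or cancelled).
    free_kanban[j] is the number of unassigned kanban in stage j."""
    N = len(free_kanban)
    for m in range(i, N):
        if free_kanban[m] > 0:
            free_kanban[m] -= 1          # assign a kanban of stage m to the flight
            return m, (m - i) * sigma    # delay departure by (m - i) stage durations
    return None

free = [1, 0, 0, 1, 1]                   # stages 0..4
print(ks_admit(free, 1, 5.0))            # first free kanban upstream of stage 1 is at stage 3
```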
[Figure 8.2 depicts stages N − 1 through 0 arranged along the time axis toward the destination airport D, with flights fa , fb , fc , fd , fe placed in their stages and kanban flowing in the direction opposite to the flights.]

Figure 8.2: Stage representation for the KS control policy
8.3.1. Representation of (KS) as a Timed State Automaton
To formalize this process, we shall model it through a timed state automaton (E, X , Γ, f, x0 , V), where
E is the event set, X is the state space, Γ contains the feasible event sets for all x ∈ X , f is the state
transition function, x0 is the initial state and V is the clock structure associated with the events in
E (for the dynamics of the automaton see Chapter 5 and for more details see [12]). In this context,
the event set is given by
E = {α1 , · · · , αN −1 , β1 , · · · , βN −1 , γ}
where αi denotes the event that an airplane which is located in stage i becomes ready to take off, βi
denotes the event that an airplane has moved from stage i to i − 1, and γ denotes the event that an
airplane has landed at D. The state space of the system is described by two N -dimensional integer
vectors (x, a) such that
x = [x0 , · · · , xN −1 ] and a = [a0 , · · · , aN −1 ].
(8.1)
In x, element x0 is the number of airplanes physically present in stage 0 (i.e., airplanes that are either
landing or waiting to land). In a, element a0 is the number of free kanban of stage 0. Similarly, xi
is the number of airplanes that are physically in the ith stage, that is, they are at a distance d away
from D such that (i − 1)σ < d ≤ iσ. Lastly, ai is the number of free kanban at the ith stage. Note
that 0 ≤ xi ≤ Σ_{j=i}^{N−1} kj and 0 ≤ ai ≤ ki , where kj is the number of kanban assigned to stage j.
For every state (x, a), the feasible event set is given by

Γ(x) = ⋃_{j=1}^{N−1} {αj } ∪ ⋃_{j=1}^{N−1} {βj : xj > 0} ∪ {γ : x0 > 0}    (8.2)
that is, events αj are always feasible, while an event βj or γ is only feasible if the corresponding
stage contains at least one airplane.
To specify the state transition functions, we first define the following auxiliary variables:
pi = N if ai = 0, · · · , aN−1 = 0, and pi = min{m : m ≥ i, am > 0} otherwise,

for all i = 1, · · · , N − 1, and

qi = min{ m : m ≥ i, Σ_{j=i}^{m} xj ≤ Σ_{j=i}^{m} kj }
for all i = 0, · · · , N − 1. The variable pi is used to determine the first upstream stage (i.e., pi ≥ i)
that has an available kanban and if none exists we set pi = N . The variable qi determines the first
upstream stage that does not have any airplane with kanban assigned from a further upstream stage.
The criterion for such a stage (say m) is that the total number of airplanes in stages i through m must not exceed the total number of kanban assigned to these stages.
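A direct transcription of these two definitions (helper names are hypothetical; a is the free-kanban vector, x the occupancy vector, k the kanban assignment):

```python
def p(i, a):
    """First stage m >= i with a free kanban; N if none exists.
    a[m] is the number of free kanban in stage m."""
    N = len(a)
    for m in range(i, N):
        if a[m] > 0:
            return m
    return N

def q(i, x, k):
    """First stage m >= i such that the airplanes in stages i..m do not exceed
    the kanban assigned to those stages (x[j] = airplanes in stage j)."""
    total_x = total_k = 0
    for m in range(i, len(x)):
        total_x += x[m]
        total_k += k[m]
        if total_x <= total_k:
            return m
    raise ValueError("infeasible state: more airplanes than kanban overall")

a = [0, 0, 1, 1]; x = [1, 2, 0, 1]; k = [1, 1, 1, 1]
print(p(1, a))      # first free kanban at or above stage 1
print(q(1, x, k))   # stage 1 holds an extra airplane, so the cumulative test passes at stage 2
```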
We now specify the state transition functions as follows (a prime ′ is used to indicate the next state following an event occurrence):
• If event αi occurs and pi < N , set: x′pi = xpi + 1 and a′pi = api − 1
• If event βi occurs, set: x′i = xi − 1 and x′i−1 = xi−1 + 1; in addition, if ai−1 > 0, set a′i−1 = ai−1 − 1 and a′qi = aqi + 1
• If event γ occurs, set: x′0 = x0 − 1 and a′q0 = aq0 + 1
When an αi event occurs, the state of stage pi is updated. If none of the feasible stages i, · · · , N − 1
has an available kanban, i.e., pi = N , then the corresponding flight is cancelled (or must be held
and scheduled only when some kanban becomes available). When a βi event occurs, if the new stage
(i − 1) has an available kanban (i.e., ai−1 > 0), then the airplane involved will release the kanban of
stage i and get a kanban form i − 1. However, it is possible that in stage i there is an airplane that
still has a kanban from stage i + 1 because when it arrived at i there was no kanban available. In
this case, this airplane will get the newly freed kanban and release the kanban of i + 1. In general,
there may be several airplanes that are physically in stage s ≥ i but have kanban from another stage
(say z > s). Those airplanes will get a kanban from z − 1 and release the kanban from z. In effect,
a newly released kanban will propagate upstream until it finds the first stage that does not have
any airplane with kanban from a further upstream stage, determined through qi . A similar upstream
propagation of kanban may occur when a γ event occurs as well, in which case q0 is used to determine
the ultimate stage that will receive a newly released kanban. For example, in Figure 8.2, when fa
enters stage 0, it gets a kanban from stage 0 and releases the kanban of stage 1. Subsequently, fb
gets the kanban of stage 1 and releases the kanban of stage 2, which is then taken by fc . Finally, fc
releases the kanban of stage q = 3. So even though fa released a kanban of stage 1, in the end the
released kanban is the one from stage 3.
The initial state x0 can be any feasible state, but for the sake of simplicity we assume that the
system starts out empty, so x = [0, · · · , 0] and a = [k0 , · · · , kN −1 ]. Note that most airports have no
scheduled arrivals between midnight and six o'clock in the morning, which gives the system enough
time to empty, making this a reasonable assumption. Finally, the clock structure V
describes the lifetime sequence of every event in E through some probability distribution.
8.3.2. Evaluation of GHD Under (KS)
In the event that there is no kanban available in the current stage for a flight that has become ready
to take off, it is immediately apparent that this flight will experience some ground holding delay
(GHD). The determination of the exact delay can result in a “conservative” or “liberal” approach
with respect to the airborne delay allowed as illustrated next.
[Figure 8.3 plots distance from D against time for two flights f1 and f2 departing δt apart, showing the ground-holding delay wg , the airborne delay wa in case (a), the runway idle period I in case (b), and the minimum landing separation Z.]

Figure 8.3: Assignment of Ground-Holding Delay (GHD) under KS. (a) GHD until the beginning of
the next stage. (b) GHD until the end of the next stage. (c) GHD until the previous airplane clears
the runway.
For simplicity, we assume that there is a single source airport S (i.e. M = 1) and it is located
d time units away from the destination airport D. Furthermore, assume that each stage is assigned
a single kanban (i.e., ki = 1 for all i = 0, · · · , N − 1) and that the system starts out empty. At t = t1 ,
flight f1 is ready to depart from S and since (i − 1)σ ≤ d ≤ iσ, f1 is assigned to stage i as shown
in Figure 8.3. After δt, at t = t2 = t1 + δt, f2 is ready to take off from S. If f1 is still in stage i,
then f2 will be assigned to stage i + 1 and will be given a kanban from the same stage. Now, any
airplane in stage i + 1 will arrive at D after an interval s such that iσ ≤ s ≤ (i + 1)σ. In order
for f2 to arrive within this range, it must delay its takeoff by an interval wg which can take any
value in the range iσ − d ≤ wg ≤ (i + 1)σ − d. In the more liberal approach, wg = iσ − d. Then, it
is possible that f2 will arrive at D before f1 clears the runway, therefore it will experience some
airborne delay wa as shown in case (a) of Figure 8.3. On the other hand, in the more conservative
approach, wg = (i + 1)σ − d. In this scenario the runway of D will be idle for a period I while f2 is
experiencing unnecessary ground delays as shown in case (b). Notice that in the best case scenario,
f2 will arrive at D immediately after f1 has cleared the runway as shown in case (c). Under this
scenario, the runway does not remain idle, while the airborne delay wa is zero. Clearly, the total
delay (wg + wa ) in cases (a) and (c) is the same, however, since we assume that there is a higher cost
associated with airborne delays, case (c) is preferred.
The discussion above raises two issues. First, the performance of KS depends on the controller
parameters, i.e., the duration of each stage σ and the number of kanban assigned to each stage ki ,
i = 0, · · · , N − 1. Thus, it is necessary to find ways for determining these parameters, and we address
this issue in Section 8.5. Second, it should be apparent that the time division in intervals of length σ
prevents us from applying a control policy that can completely eliminate airborne delays, unless an
additional mechanism is developed (thus complicating the KS scheme). This motivates our second
approach, which is based on using Perturbation Analysis (PA) techniques to analyze a sample path
of the system and aim at minimizing the airborne delays without creating unnecessary idle periods.
This approach is described in the next section.
8.4. Airplane Scheduling Using Finite Perturbation Analysis
At any given time, every airport maintains a list of the airplanes scheduled to arrive during the day.
Each new airplane requesting takeoff from a source airport represents an addition to this schedule
and incurs an additional cost. Our approach in this section is based on a derivation of the incremental
cost associated with the arrival of an extra airplane as a function of its ground-holding delay and
the use of Finite Perturbation Analysis (FPA) [12, 35] to minimize this cost.
Since the destination airport corresponds to a single server queue, the dynamics associated with
it are given by the standard Lindley recursion (e.g., see [48]):
Lk (d) = max{Ak (d), Lk−1 } + Zk
(8.3)
where Ak (d) is the time until airplane k will arrive at D when it is assigned a ground holding time d,
Zk is the time that it occupies the runway, and Lk (d) is the time until it lands. Note that applying
some ground-holding delay to k implies that its arrival time Ak will increase which in turn will
affect its landing time through the max operation in (8.3). Therefore, any cost function expressed
in terms of Ak , Lk will be non-differentiable, making it difficult to solve any associated minimization
problem. To overcome this problem, we recognize that points of non-differentiability correspond to
ground-holding delays that result in event-order changes (i.e., ground-holding delays that result in
two airplanes arriving at exactly the same time). Based on the arrival times of the already expected
airplanes, we break all possible ground-holding delays into smaller intervals [Aj , Aj+1 ) for all j such
that Aj > Ak (0) and optimize the cost function in each interval. Then, we determine the ground-holding delay that corresponds to the minimum cost among all these intervals.
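The recursion (8.3) is straightforward to compute along a sample path; a minimal sketch with a constant runway occupancy Z (function name assumed):

```python
def landing_times(arrivals, Z):
    """Lindley recursion L_k = max(A_k, L_{k-1}) + Z for a single-runway queue.
    arrivals: sorted expected arrival times A_k; Z: runway occupancy per landing."""
    L, prev = [], float("-inf")
    for A in arrivals:
        prev = max(A, prev) + Z
        L.append(prev)
    return L

A = [0.0, 1.0, 5.0]
print(landing_times(A, 2.0))
# The airborne wait of flight k is max(0, L_{k-1} - A_k): the second flight
# arrives at 1.0 but the runway is busy until 2.0, so it waits 1.0 time unit.
```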
Before we describe the details of our optimization procedure, it is essential that we define all
timing intervals that we will use in studying a sample path. At any point in time, the controller of
[Figure 8.4 shows the time axis at D with the expected arrivals Aa−1 , Ak (0), Aa , · · · , Am−1 , Ak (sm + τ ), Am , marking the two components sm and τ of the ground-holding delay.]

Figure 8.4: Timing diagram for ground-holding delay
the destination airport has a list of all airplanes that are expected to arrive with their corresponding
arrival times {A1 , · · · , Am , · · ·} as shown in Figure 8.4. When the kth airplane is ready to take off
from some airport Sj , it informs the controller about its expected arrival time, which is the earliest
possible arrival time for k when the ground delay d is zero, that is, Ak (0). Based on this value, the
controller can identify the airplanes that are expected right before and right after k, denoted by a − 1
and a respectively with a = min{i : Ak (0) < Ai }. If the kth airplane is assigned a ground-holding
delay d ≥ 0, its new expected arrival time is delayed by d, that is,
Ak (d) = Ak (0) + d.    (8.4)
In general, this will place the arrival time in some interval [Am−1 , Am ), where m ≥ a, as illustrated
in Figure 8.4. For reasons that will be clear in the sequel, we break the ground-holding delay into
two parts
d = sm + τ,
(8.5)
where, sm is the ground holding delay which forces k to arrive at exactly the same time as Am−1 ,
and τ is any additional delay within the interval [Am−1 , Am ). Specifically, define
sm = max{Am−1 − Ak (0), 0}, for any m ≥ a.
(8.6)
Notice that for the first possible interval [Aa−1 , Aa ), we have sa = 0, thus the need for the max
operator. In addition, since Am−1 ≤ Ak (0) + sm + τ < Am , the “residual” delay τ is constrained as
follows:
0 ≤ τ ≤ Am − Ak (0) if sm = 0
0 ≤ τ ≤ Am − Am−1 otherwise.
Finally, we point out that given the ground-holding delay d of k, airplane k will arrive after m − 1,
therefore its arrival will not affect the airborne waiting time of a − 1 or any other airplane j < m − 1.
On the other hand, the arrival of k may increase the airborne delay time of any airplane that is
expected after k, that is, airplanes m, m + 1, · · ·.
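The split (8.5)-(8.6) and the resulting bound on the residual delay τ can be sketched as follows (hypothetical helper; A is the sorted list of already-expected arrival times, indexed so that A[m] is the arrival of airplane m):

```python
def split_delay(A, Ak0, m):
    """s_m per (8.6) and the upper bound on the residual delay tau, for placing
    the new arrival in the interval [A[m-1], A[m]); Ak0 = A_k(0)."""
    s_m = max(A[m - 1] - Ak0, 0.0)
    tau_max = (A[m] - Ak0) if s_m == 0.0 else (A[m] - A[m - 1])
    return s_m, tau_max

A = [10.0, 14.0, 20.0]          # expected arrivals of already-scheduled airplanes
print(split_delay(A, 12.0, 1))  # interval [10, 14): s_m = 0, tau bounded by A_m - A_k(0)
print(split_delay(A, 12.0, 2))  # interval [14, 20): s_m = 2, tau bounded by A_m - A_{m-1}
```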
Next, define

∆Lj (d) = L̃j (d) − Lj ≥ 0,  j = 1, 2, · · ·    (8.7)

where Lj is the expected landing time of airplane j before airplane k is considered, and L̃j (d) is the expected landing time of j if k is assigned a ground holding time d. Using Perturbation Analysis (PA) nomenclature, Lj is the landing time in the nominal sample path of this system. The addition of the new airplane k would result in a perturbed sample path. The values of d ≥ 0 define a family of such perturbed sample paths, and L̃j (d) is the landing time of j in a perturbed sample path corresponding to some d. As already pointed out, for all j such that Aj < Ak (d), ∆Lj (d) = 0, while for j such that Aj ≥ Ak (d), ∆Lj (d) ≥ 0.
Now we are ready to express the additional cost due to k as a function of the ground-holding delay
d = sm + τ . Letting cg and ca be the costs per unit time of ground and airborne delays respectively,
we have
Ck (sm , τ ) = cg (sm + τ ) + ca max{0, Lm−1 − Ak (sm + τ )} + ca Σ_{j: Aj > Ak (sm )} ∆Lj (τ )    (8.8)
The first term is the cost due to ground-holding of airplane k, and the second term is its airborne
delay cost, which is positive only if Lm−1 > Ak (sm + τ ). The last term is the cost incurred to all
airplanes that are expected after k. Note that since ∆Lj (d) = 0 for all j such that Aj < Ak (sm + τ ),
those terms are left out of the summation, which is why ∆Lj becomes a function of τ only. Finally,
throughout the remainder of the chapter we will make the obvious assumption that ca ≥ cg .
Our objective then is to determine τ = τ ∗ such that Ck (sm , τ ∗ ) ≤ Ck (sm , τ ) for all possible sm
and then find the value of sm with the minimum cost. That is, we seek
min_{sm ,τ} Ck (sm , τ ),  k = 1, · · · , K    (8.9)
where K is the number of expected arrivals in a time period of interest, typically a day.
Next, we concentrate on deriving the optimal point in any interval [Am−1 , Am ), m = a, a + 1, · · ·.
In evaluating Ck (sm , τ ) we need to evaluate all perturbations ∆Lj (τ ) for j ≥ m. We will first
consider the case j > m, and then the case j = m.
For all j > m (with m fixed) we can easily evaluate the perturbation ∆Lj (τ ) using (8.3) and
(8.7) as follows:
∆Lj (τ ) = max{Aj , L̃j−1 (τ )} + Zj − max{Aj , Lj−1 } − Zj

         = { max{0, ∆Lj−1 − Ij },  if Aj > Lj−1
             ∆Lj−1 ,               otherwise

         = [ ∆Lj−1 (τ ) − [Ij ]+ ]+ ,   j > m    (8.10)
where [x]+ = max{0, x} and Ij = Aj − Lj−1 is the idle period preceding the arrival of airplane j
which is present if Aj > Lj−1 . Equation (8.10) indicates that a perturbation is generated only at
the landing of airplane m. The perturbation of any airplane j > m is just due to perturbation
propagation. Also, note that since the perturbed sample path contains a new arrival, it is impossible
to generate a new idle period.
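Equation (8.10) says the perturbation generated at airplane m is shrunk by each intervening idle period and never becomes negative; a small sketch (names assumed):

```python
def propagate(delta_m, idle):
    """Propagate a perturbation delta_m generated at airplane m through the
    following airplanes, per (8.10). idle[j] = I_j = A_j - L_{j-1}, which is
    positive exactly when airplane j begins a new busy period."""
    deltas = [delta_m]
    d = delta_m
    for I in idle:
        d = max(0.0, d - max(0.0, I))  # [delta - [I]^+]^+
        deltas.append(d)
    return deltas

# A perturbation of 3 survives within a busy period (I <= 0) and is absorbed
# by idle gaps of 1 and 2.5 time units.
print(propagate(3.0, [0.0, 1.0, -0.5, 2.5, 0.0]))
```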
We now express the perturbation in the landing time ∆Lj (·) for all j > m as a function of
∆Lm (τ ) for the given m. Let us group all airplanes j ≥ m in busy periods starting from m. Note
that (8.10) implies that for any airplane z that belongs to the same busy period as m, we have
∆Lz (τ ) = ∆Lm (τ ) since [Iz ]+ = 0. Furthermore, for any airplane n that starts a new busy period
we have In > 0, hence ∆Ln (τ ) = [∆Ln−1 (τ ) − In ]+ . Let Bn be the number of airplanes in the busy
period that started with n. Then, clearly, for all z = n + 1, · · · , n + Bn , we have ∆Lz (τ ) = ∆Ln (τ ).
With these observations, we can express the incremental cost Ck (sm , τ ) in (8.8) as a function of the
generated perturbation ∆Lm (τ ):
Ck (sm , τ ) = cg (sm + τ ) + ca max{0, Lm−1 − Ak (sm ) − τ } + ca Σ_{b=1}^{B} Bb [ ∆Lm (τ ) − Σ_{i=1}^{b} IiB ]+    (8.11)
where B is the number of busy periods after the arrival of k, B1 is the number of airplanes that
follow m and are in the same busy period as m, and Bb , b = 2, · · · , B is the number of airplanes in
the bth busy period. Finally, IiB is the idle period preceding busy period i = 1, · · · , b.
Next, we make the following simplifying assumption:
A8.1. The minimum time between any two consecutive airplane landings is constant and equal to
Z, i.e., Zk = Z for all k.
This assumption is reasonable since Z depends on the weather conditions which do not change
dramatically in a small period of time (e.g., a few minutes), so that Zk−1 ≈ Zk ≈ Zk+1 . This
assumption maintains a First Come First Serve (FCFS) scheduling discipline at D; without this
assumption, an optimization algorithm aiming at minimizing the delay will give precedence to the
airplanes with the smallest landing time. Further, we assume that Z does not depend on the type
of airplane, but only on the weather so it is the same for all airplanes.
We now consider the case j = m and determine the generated perturbation ∆Lm (τ ). This
provides an initial condition for the recursive relationship in (8.10). Recalling (8.3), we have Lm = max{Am , Lm−1 } + Z and L̃m (τ ) = max{Am , L̃k (τ )} + Z, therefore,

∆Lm (τ ) = max{Am , L̃k (τ )} − max{Am , Lm−1 }    (8.12)

where L̃k (τ ) ≥ Lm−1 . It follows that

∆Lm (τ ) = { L̃k (τ ) − Lm−1 ,          if Am ≤ Lm−1 (i.e., m does not start a new busy period)
             max{0, L̃k (τ ) − Am },   if Am > Lm−1 (i.e., m starts a new busy period)

Let Wk be the airborne waiting time of airplane k, given by

Wk = [Lm−1 − Ak (sm ) − τ ]+

Since L̃k (τ ) = Ak (sm ) + τ + Wk + Z, we get

∆Lm (τ ) = { Ak (sm ) + τ + Wk + Z − Lm−1 ,           if Am ≤ Lm−1
             max{0, Ak (sm ) + τ + Wk + Z − Am },     if Am > Lm−1

Expanding Wk = max{0, Lm−1 − Ak (sm ) − τ }, we get

∆Lm (τ ) = { max{0, Ak (sm ) + τ − Lm−1 } + Z,                   if Am ≤ Lm−1
             max{0, Ak (sm ) + τ + Z − Am , Lm−1 + Z − Am },     if Am > Lm−1    (8.13)
This expression provides the perturbation generated when airplane m lands resulting from the
addition of airplane k such that Am−1 = Ak (sm ) ≤ Ak (sm ) + τ ≤ Am . Thus, the control variable τ
is constrained by
0 ≤ τ ≤ Am − Ak (sm )
Using the perturbation expression in (8.13) and the cost expression in (8.11), we can determine
the ground holding delay τ that minimizes the additional cost Ck (sm , τ ) (under fixed sm ). This is
accomplished in the following theorem, the proof of which is found in Appendix E.
Theorem 8.4.1 Let T1 = Lm−1 −Ak (sm ) and T2 = Am −Ak (sm ). Then, in any interval [Am−1 , Am ),
the additional cost Ck (sm , τ ) is minimized by
τ ∗ = T1 + T2 − max{T1 , T2 } − min{0, T1 }
(8.14)
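The minimizer (8.14) reduces to min{T1 , T2 } − min{0, T1 }: hold just long enough for airplane m − 1 to clear the runway, clipped to the interval end, and hold not at all if the runway is already free. A quick check (function name assumed):

```python
def tau_star(T1, T2):
    """Optimal residual delay within [A_{m-1}, A_m), per (8.14), with
    T1 = L_{m-1} - A_k(s_m) and T2 = A_m - A_k(s_m)."""
    return T1 + T2 - max(T1, T2) - min(0.0, T1)

print(tau_star(-1.0, 5.0))  # runway already free at arrival: no residual delay
print(tau_star(3.0, 5.0))   # wait exactly until m-1 lands
print(tau_star(6.0, 5.0))   # landing of m-1 lies past A_m: capped by the interval
```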
So far we have identified the minimum cost in each of the intervals [Am−1 , Am ) for all m =
a, a + 1, · · ·. Next we summarize the results in the form of an algorithm that gives the solution to
(8.9).
8.4.1. FPA-Based Control Algorithm
The algorithm we present here determines the GHD that minimizes the incremental cost in (8.9)
based on the assumption that the controller’s information consists of a list with all expected airplanes
j = 1, · · · , k − 1 that have already been scheduled prior to airplane k requesting permission to arrive
at D. The list includes the expected arrival times (Aj ), calculated landing times (Lj ), and idle
period lengths (Ij ≥ 0) that precede each scheduled airplane. Note that this algorithm solves a cost-minimization problem for each airplane k; as such, this is a local optimization problem, in the sense
that it is driven only by an individual takeoff request and does not take into account all expected
arrival information in the system. This justifies the term Local qualifying the FPA-based control
algorithm (L-FPA). In the next section, we will show that the local nature of this algorithm may
lead to sub-optimal solutions and then examine how it may be used to achieve global optimality.
The L-FPA algorithm starts by identifying the airplanes that will precede and follow the new
airplane (denoted by k) when its ground-holding time is zero. That is, we identify the airplanes
indexed by a − 1 and a respectively. Using the information in the expected airplane list (Aa−1 , Aa )
together with Theorem 8.4.1, the controller determines the ground-holding delay that would minimize
Ck (0, τ ) in the interval [Ak (0), Aa ). Next it asks what would happen if airplane k arrives between the
next two airplanes; in other words, what would happen if m − 1 = a and m = a + 1. The algorithm
continues traversing backwards checking whether increasing the ground holding delay may reduce
the cost. The question that arises is when the algorithm should stop the backward search. Note
that any GHP is trading off airborne delay for ground delay based on the assumption that airborne
delay is more expensive. Therefore, when the cost due to ground-holding delay becomes greater than
the airborne cost under zero ground-delays, increasing the ground holding delay any further will not
reduce the cost, hence defining a stopping condition. In other words, the stopping condition is
sm cg ≥ Ck (0, 0)
where cg is the ground-holding cost per unit time, sm = max{0, Am−1 −Ak (0)}, and Ck (·, ·) is defined
in (8.11).
Local FPA-based Control Algorithm (L-FPA)
When k requests permission to arrive at D,
Step 1. INITIALIZE: m := min{i : Ak (0) < Ai }; sm := 0; Cmin := Ck (0, 0); GHD := 0.
Step 2. IF cg sm ≥ Ck (0, 0) GOTO Step 6.
Step 3. Determine τ ∗ using equation (8.14).
Step 4. IF Ck (sm , τ ∗ ) < Cmin THEN Cmin := Ck (sm , τ ∗ ) and GHD := sm + τ ∗ .
Step 5. Set m := m + 1, sm := Am−1 − Ak (0) and GOTO Step 2.
Step 6. END.
To complete the specification of the algorithm we include the cases where m−1 or m does not exist
by setting Lm−1 = 0 and Am = ∞ respectively. Upon termination of the algorithm, GHD holds the
value of the ground-holding delay that minimizes the cost function Ck (·), i.e., d∗ = s∗m + τ ∗ = GHD.
The following lemma provides some insight to the operation of the L-FPA algorithm and will
prove useful in our analysis of global optimality. Lemma 8.4.1 asserts that when an airplane k is
expected to arrive at D, it will be assigned a GHD such that its own airborne delay is zero. In other
words, when k is expected to arrive during a busy period of the system, it is assigned a GHD which
is at least long enough to allow the last airplane of the busy period to clear the runway.
In the sequel we shall use dk to denote the GHD of the kth airplane as determined by L-FPA.
Lemma 8.4.1 Suppose that Ak (0) ≤ Aa ≤ La−1 , where a := min{i : Ak (0) < Ai }. Then, the GHD
assigned to k by the L-FPA controller is such that
dk ≥ Ll (dl ) − Ak (0)
where l is the index of the last airplane of the busy period that a belongs to.
The proof of the lemma is included in the Appendix E.
An obvious corollary of Lemma 8.4.1 is that within any busy period, the additional cost Ck (·, ·)
is monotonically decreasing as dk increases.
From a practical standpoint, the L-FPA algorithm is well-suited for dynamic control: It is triggered by the kth airplane ready to take off, which requests permission from the controller of the
destination airport by sending its earliest possible arrival time Ak (0). The controller then determines the GHD that minimizes the incremental cost based on all information available up to that
time instant, i.e., all expected arrival times Aj , where j = 1, · · · , k − 1 are airplanes already scheduled prior to k. Note that Ak (0) will account for possible congestion at the source airport as well
as any other known factors that may delay its departure. In addition, Ak (0) will include the best
current estimate of the travel time of k until the destination is reached. Whether this approach can
also achieve global optimality in the sense of completely eliminating all airborne delays is an issue
addressed in the next section.
8.4.2. Global Optimality of the FPA Approach
As previously mentioned, the L-FPA algorithm addresses a local optimization problem pertaining
to an individual airplane k and it does not take into account all expected arrival information in
the system; this includes future airplanes expected to request permission to take off from a source
airport and fly to D. This raises the question of whether the algorithm can nevertheless achieve
global optimality as well. In this section, we shall first show how the L-FPA algorithm may lead to
sub-optimal solutions and then examine how it may be modified to achieve global optimality.
The fact that the L-FPA controller does not take into account all expected arrivals may lead to
sub-optimal policies as shown in Figure 8.5. Suppose that flight fk is expected to arrive at D after
Ak (sm ) time units. Under the FPA scheme, assuming appropriate ground and airborne costs, cg and
ca respectively, fk will be assigned a ground delay τ ∗ , as indicated in Figure 8.5 (a). In addition,
assume that flight fm , originating from an airport located farther away than the source airport of
k, had already requested permission to take off and is scheduled to arrive at Am as shown in the
same figure. When the L-FPA control policy is implemented, fk will induce an airborne delay wm^a
on flight fm . One can easily notice that the performance of the L-FPA scheme can be improved
by introducing a ground delay τm^∗ to flight fm , thus eliminating all airborne delays as shown in
Figure 8.5 (b). However, under the L-FPA control policy the latter cannot materialize since at the
time the information on fk becomes available, fm is already en-route to D; therefore, introducing
ground delays is infeasible and the only option at this point is to reduce the speed of fm .
Figure 8.5: Global Optimality: (a) L-FPA result (b) Global Optimum.
Therefore, if we consider a “global optimum” policy to be one through which airborne delay is
entirely traded off for ground-holding delay, then the absence of future information, i.e., information
on airplanes that are expected to take off after k, prevents the L-FPA control scheme from achieving
this goal. The question that arises is whether this scheme, equipped with additional information
on all expected flights, can converge to a globally optimal policy. For instance, every morning the
airport controller has information on the expected arrival times of all flights that are scheduled for
the day, as well as predictions on the airport capacity. Is it possible to use this information together
with the FPA approach to derive a policy such that airborne delay is reduced to zero? It turns out
that this is possible as shown in Theorem 8.4.2.
Before further considering optimality properties, let us completely characterize a “globally optimal” policy. To do so, assume that during a particular day, under a zero ground-holding policy
all airplanes will experience a total of Da0 time units of airborne delay and, of course, zero ground
delay Dg0 = 0. In this case, under any ground-holding policy, it is unavoidable that all airplanes will
experience a total of at least Da0 time units of delay, either in the air or on the ground. Assuming
that airborne delay is more expensive than ground delay, then the globally optimal policy is one
where
Da∗ = 0 and Dg∗ = Da0   (8.15)
That is, all of the airborne delay is traded off for exactly the same amount of ground delay. We will
show in Theorem 8.4.2 that the L-FPA algorithm can achieve (8.15) under an appropriate information
structure. To do so, we will make use of the following lemma, the proof of which is in Appendix E.
Lemma 8.4.2 Suppose that all flights are ordered based on their arrival time such that A1 (0) ≤
A2 (0) ≤ · · · ≤ Ak (0) ≤ · · · ≤ AK (0), and are used to drive the L-FPA algorithm in that order.
Then, the L-FPA algorithm will assign GHDs dj , j = 1, · · · , K, such that A1 (d1 ) ≤ A2 (d2 ) ≤ · · · ≤
Ak (dk ) ≤ · · · ≤ AK (dK ), where dj = sj + τ j and sj , τ j are the solutions to (8.9).
The proof of Lemma 8.4.2 provides additional insights to the operation of the L-FPA algorithm
when each airplane is considered according to its arrival order. The following corollary allows us to
immediately determine the GHD of k + 1 given the schedule up to k.
Corollary 8.4.1 Suppose that all flights are ordered based on their arrival time such that A1 (0) ≤
A2 (0) ≤ · · · ≤ Ak (0) ≤ · · · ≤ AK (0), and are used to drive the L-FPA algorithm in that order. Then,
the GHD assigned by L-FPA to each airplane is given by
dk = sk + τ k = max{0, Lk−1 (dk−1 ) − Ak (0)}   (8.16)
for all k = 1, · · · , K.
The proof of the corollary follows from the proof of Lemma 8.4.1. When k − 1 is the last scheduled
airplane, H = 0 and hence Ak (dk ) = Lk−1 (dk−1 ), therefore, dk = Lk−1 (dk−1 ) − Ak (0). The max
operator is necessary for the case when Ak (0) > Lk−1 (dk−1 ).
This suggests that if the kth airplane is scheduled to arrive before airplane k −1 clears the runway
(Ak (0) < Lk−1 (d∗k−1 )), it should delay its departure by Lk−1 (d∗k−1 ) − Ak (0) which is the expected
airborne delay that would be experienced by airplane k. On the other hand, if the kth airplane is
expected after k − 1 lands (Ak (0) > Lk−1 (d∗k−1 )), then it should depart immediately. In this case,
the loop of steps 2-5 of the L-FPA algorithm in Section 8.4.1 is eliminated.
Theorem 8.4.2 Suppose that all flights are ordered based on their arrival time such that A1 (0) ≤
A2 (0) ≤ · · · ≤ Ak (0) ≤ · · · ≤ AK (0), and are used to drive the L-FPA algorithm in that order. Then,
the L-FPA algorithm yields a globally optimal policy.
The proof of the theorem is included in Appendix E.
Theorem 8.4.2 suggests an alternative approach to using the FPA algorithm, which is referred to
as “Global FPA” (G-FPA) because it results in a globally optimal schedule. In this case, the FPA
algorithm is triggered at any desired time instant and is given the list of all expected airplane arrivals
in ascending order (i.e., Aj (0) ≤ Aj+1 (0), for j = 1, · · · , K − 1). The output of FPA is also a list
with the optimal GHDs of all airplanes dj , j = 1, · · · , K.
Global FPA Algorithm (G-FPA)
At any point in time:
Step 1. Order airplanes according to their arrival times A1 (0) ≤ A2 (0) ≤ · · · ≤ AK (0)
Step 2. FOR k = 1 TO K:
            dk = max{0, Lk−1 (dk−1 ) − Ak (0)}
Step 3. END.
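To make the recursion concrete, the G-FPA loop can be sketched in a few lines of Python. The landing-time model used here, Lk (dk ) = Ak (0) + dk + Z with Z the minimum landing separation, is our own illustrative assumption, as are the function and variable names:

```python
def g_fpa(arrivals, Z):
    """G-FPA sketch: assign ground-holding delays (GHDs) via Eq. (8.16).

    arrivals : earliest possible arrival times A_k(0), in any order
    Z        : minimum separation between consecutive landings
    Assumes the landing-time model L_k(d_k) = A_k(0) + d_k + Z, i.e.,
    airplane k clears the runway Z time units after it lands (an
    illustrative simplification, not fixed by the text).
    Returns the GHDs d_k in ascending-arrival order.
    """
    d, L_prev = [], float("-inf")
    for Ak in sorted(arrivals):        # Step 1: order by A_k(0)
        dk = max(0.0, L_prev - Ak)     # Step 2: Eq. (8.16)
        d.append(dk)
        L_prev = Ak + dk + Z           # L_k(d_k) under the assumed model
    return d

# Four flights with Z = 1.5 min: each GHD absorbs exactly the airborne
# delay the flight would otherwise have experienced.
ghd = g_fpa([0.0, 1.0, 1.5, 4.0], 1.5)   # [0.0, 0.5, 1.5, 0.5]
```

Because the uncontrolled airborne delays satisfy the same recursion, the total GHD produced by this sketch equals Da0 while the airborne delay is zero, in agreement with (8.15).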
Figure 8.6 shows two different ways of implementing the underlying FPA algorithm developed
in Section 8.4 which result in different control structures. First, the L-FPA controller is shown in
Figure 8.6 (a). In this case, the underlying FPA algorithm is triggered by every airplane k which is
ready to depart and informs the controller of its expected arrival time Ak (0). The FPA algorithm
determines a GHD and adds k to the schedule. Note that when this is done, L-FPA does not change
the GHD of any airplane that has already been scheduled. Furthermore, this controller does not take
into consideration any airplanes that will depart after k and, for this reason, it cannot guarantee a
globally optimal solution in the sense of (8.15), as indicated at the beginning of this section.
The G-FPA controller is shown in Figure 8.6 (b). This controller is invoked every time there is a
change in some of the current estimates: the airport capacity (i.e., changes in Z) or the estimates of
airplane expected arrival times, Aj (0), j = 1, · · · , K. Every time that the FPA algorithm is triggered
it considers future information and can change the GHD of any airplane, leading to a global optimum
as proven in Theorem 8.4.2. As shown next, even though this algorithm needs to be executed several
times, it is efficient enough not to pose any computational problems.
Figure 8.6: FPA-based algorithms. (a) Local FPA (L-FPA) Controller triggered by airplane departures, (b) Global FPA (G-FPA) Controller triggered at any time
8.4.3. Algorithm Complexity
The efficiency of the L-FPA algorithm depends on several parameters, including the airport capacity.
For example, when the airport capacity is maximum, then the optimal GHD for all airplanes is zero,
assuming that the original flight scheduling was done optimally. In this case, the loop of steps 2, 3, and 4 of L-FPA is implemented only once. On the other hand, when the airport capacity is
close to zero, then every airplane will experience long GHDs and it is possible that the loop will be
implemented close to k times.
In the case of G-FPA, the input is the entire list of all expected arrivals applied in ascending order
(i.e., Aj (0) ≤ Aj+1 (0), for j = 1, · · · , K − 1). Using Corollary 8.4.1, the GHD of any airplane involves
just the evaluation of dk through (8.16) which in turn involves a single addition and comparison.
Therefore, the total number of operations required in order to determine the optimal schedule is just
2K where K is the total number of flights expected. Since the G-FPA algorithm is computationally
efficient, it is reasonable to have it executed several times during a day for every significant change
in weather conditions or in the expected arrival time of a flight.
8.5. Numerical Results
This section describes some numerical results that were obtained through simulation of the KS and
FPA schemes. We used the model of Figure 8.1 with M = 20 source airports located at distances of
30, 45, · · · , 285 minutes away from D. Furthermore, based on the distance from D, each flight was
assigned a scheduled take off time so that the arrival pattern at D is the one shown in Figure 8.7.
This figure shows the number of scheduled arrivals at Boston’s Logan International airport for every
hour of a typical day and was also used in the studies described in [62]. For the purposes of our
experiments, we assumed that the airplanes expected within a given hour were equally spaced over
the interval.
[Bar chart: landings per hour versus time of day.]
Figure 8.7: Hourly landings at airport D
8.5.1. Performance of KS
To test the performance of the KS algorithm, we simulated the system with and without the KS
controller and compared their corresponding results. First we assumed that the landing capacity
of D is fixed at 40 landings per hour, i.e., the minimum separation between any two consecutive
airplanes is Z = 1.5 minutes. Furthermore, every time that a flight was given a kanban from a
higher stage, the delay assigned to it was the average between the maximum and minimum possible
delays (see Section 8.3.2). In other words, delayed airplanes were placed in the middle of the stage.
Finally, we fixed the number of kanban per stage to three (3) and observed the performance of the
KS scheme for various values of the landing stage duration σ, which is a parameter of the controller.
As indicated in Figure 8.8, increasing the landing stage duration decreases the airborne delay while
increasing the ground holding time. This is expected, since increasing the interval σ decreases the
instantaneous arrival rate at D (3/σ).
[Plot: average delay (min.) versus stage duration (min.), with curves for the no-control airborne delay and the KS ground, airborne, and total delays.]
Figure 8.8: Trade-off between airborne and ground delays for the KS controller
Figure 8.9 shows the percentage improvement of the delay cost for various ca /cg ratios. Note that
the improvement is maximized when the stage duration is equal to 4.5 minutes which is also equal
to the number of kanban times the minimum separation between any two consecutive arrivals. This
observation can be generalized through “flow equilibrium” to provide a rule for setting the number
of kanban at every stage i, that is
ki = ⌈σ/Zi ⌉   (8.17)
where Zi is the predicted minimum separation between any two consecutive airplanes at time t = iσ,
and ⌈x⌉ is the smallest integer greater than or equal to x.
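The flow-equilibrium rule is straightforward to apply; the sketch below is only an illustration, and the function name and inputs are ours:

```python
from math import ceil

def kanban_per_stage(sigma, separations):
    """Flow-equilibrium rule (8.17): k_i = ceil(sigma / Z_i), where Z_i is
    the predicted minimum landing separation during stage i (hypothetical
    helper for illustration)."""
    return [ceil(sigma / Z) for Z in separations]

# With sigma = 4.5 min and Z = 1.5 min (40 landings per hour), each stage
# holds 3 kanban, the setting at which the KS improvement peaks above.
stages = kanban_per_stage(4.5, [1.5, 2.0, 1.5])   # [3, 3, 3]
```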
[Plot: % delay cost improvement versus stage duration (min.), for ca /cg = 1.25, 1.5, 1.75, and 2.]
Figure 8.9: Overall cost improvement under the KS scheme
8.5.2. Performance of L-FPA
As in the KS case, to demonstrate the effectiveness of the L-FPA scheme we compare the performance
under the L-FPA controller with the uncontrolled case. In Figure 8.10 we set the cost parameters
ca = 1.2cg and obtain the expected ground and airborne delays as a function of the airport capacity.
Note that the total delay in the uncontrolled case consists of only the airborne delay. On the other
hand, L-FPA forces the airborne delay down to almost zero at the expense of longer ground delays.
Also note that the total delay (ground plus airborne) is the same for both cases. Notice that in this
case L-FPA performed very well even though, as we have seen, it cannot guarantee a globally optimal
schedule. Had we used the G-FPA controller that uses future information as well, then the airborne
delay would have been reduced to zero, while the ground delay would have been exactly equal to the
airborne delay of the uncontrolled case.
Figure 8.11 shows the cost benefits of L-FPA under various ca /cg ratios. As indicated in the
figure, the higher the airborne cost, the higher the benefit of the L-FPA scheme. Also note that
in the case where ca /cg = 1 there is no benefit in using a ground holding policy as indicated in
Section 8.3.2.
Finally, one can see the benefit of using L-FPA instead of KS when observing the maximum
improvements achieved by the two algorithms. Suppose that the ratio ca /cg = 2. Then, as indicated
in Figure 8.9, KS can achieve an improvement of about 38%. On the other hand, when the capacity
is 40 landings per hour, L-FPA can achieve an improvement of about 46% as indicated in Figure 8.11.
[Plot: average ground/airborne delays (min.) versus destination airport capacity, with curves for the no-control airborne delay and the FPA ground, airborne, and total delays.]
Figure 8.10: Trade-off between airborne and ground delays for the L-FPA controller
8.6. Conclusions and Future Directions
In this chapter we present two new approaches for solving the ground-holding problem in air traffic
control. The first approach is a heuristic and is based on the kanban control policy. The second
approach stems from ideas in perturbation analysis of discrete-event systems and is proven to generate
an optimal policy. Both approaches are very easy to implement, inherently dynamic, and scalable,
while they can accommodate various aspects of the problem, such as limited sector capacities.
In the future it would be interesting to investigate extensions that will enable the analysis of
networks of airports. Specifically, we need to address multiple destination flights and banks of flights
in the hub-and-spoke system. In multiple destination flights a GHD at any intermediate airport may
generate additional costs in several downstream destination airports, thus the monotonicity result of
Lemma 8.4.1 may no longer hold. In the hub-and-spoke system, a GHD of an incoming flight at the
hub may delay several outgoing flights, or it may cause passengers to miss their connecting flights.
In these cases, it is still possible to utilize both of the proposed approaches, but communication
between airport controllers may be necessary.
[Plot: % cost improvement versus destination airport capacity, for ca /cg = 1, 1.25, 1.5, 1.75, and 2.]
Figure 8.11: Overall cost improvement under the L-FPA scheme
Chapter 9
EPILOGUE
In conclusion, we summarize the achievements of the dissertation and outline some possible directions
for future research.
9.1. Summary
The main focus of this dissertation was the dynamic allocation of discrete resources in stochastic
environments. Such problems arise frequently in communication networks, manufacturing systems,
transportation systems, etc. To solve such problems, we developed two algorithms. The first one is
a descent algorithm: at every iteration it moves to an allocation with a lower cost, and it is most
suitable for problems with a separable convex structure. At every step, this algorithm takes a resource
from the “most sensitive” user and reallocates it to the “least sensitive” user; thus it always visits
feasible allocations, which makes it appropriate for on-line use.
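The descent step just described can be sketched as follows for a separable convex cost. This is a deterministic simplification: the estimation budget f (k) and the candidate-set bookkeeping of the actual algorithm are omitted, ∆Li (n) is taken as Li (n) − Li (n − 1), and the function names and the quadratic example cost are ours:

```python
def descent_step(dL, s):
    """One simplified descent step for separable convex costs.

    dL : dL(i, n) = L_i(n) - L_i(n - 1), assumed increasing in n
    s  : current feasible allocation [n_1, ..., n_N] (fixed total K)
    Moves one resource from the "most sensitive" user i* to the "least
    sensitive" user j* whenever this lowers the total cost, so every
    iterate is itself a feasible allocation.
    """
    N = len(s)
    i_star = max((i for i in range(N) if s[i] > 0), key=lambda i: dL(i, s[i]))
    j_star = min(range(N), key=lambda j: dL(j, s[j] + 1))
    if j_star != i_star and dL(j_star, s[j_star] + 1) < dL(i_star, s[i_star]):
        t = list(s)
        t[i_star] -= 1
        t[j_star] += 1
        return t, True                 # strictly lower total cost
    return list(s), False              # no improving exchange remains

# Hypothetical example: minimize 3a^2 + b^2 + 2c^2 subject to a + b + c = 6,
# so dL(i, n) = w_i * (2n - 1).
w = [3.0, 1.0, 2.0]
dL = lambda i, n: w[i] * (2 * n - 1)
s, moved = [4, 0, 2], True
while moved:
    s, moved = descent_step(dL, s)     # converges to s = [1, 3, 2]
```

Each accepted exchange strictly decreases the cost over a finite feasible set, so the loop terminates at an allocation where no single exchange helps.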
The second algorithm is incremental in the sense that it starts with zero resources and at every
step allocates an additional resource; it is suited for systems that satisfy the smoothness or
complementary smoothness conditions. In a deterministic environment, both algorithms were proven
to converge to the globally optimal allocation in a finite number of steps that grows linearly with
the number of resources to be allocated. Furthermore, in a stochastic environment, they
were proven to converge in probability, while, under some additional mild assumptions, they converge
with probability one. Moreover, because they are driven by ordinal comparisons they are robust with
respect to estimation noise; thus their convergence is accelerated, since they can use estimates taken
over shorter observation intervals.
In addition, we used perturbation analysis to improve the performance of the optimization algorithms and to enable on-line implementation. Towards this end, we developed two techniques for
predicting the system performance under several parameters while observing a single sample path
under a single parameter. The first technique, Concurrent Estimation, can be directly applied to
general DES, while for the second one, FPA, we demonstrated a general procedure for deriving such
algorithms from the system dynamics. Furthermore, we point out that both procedures can be used for
systems with general event lifetime distributions.
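Stripped down to a single FIFO queue, the idea behind both techniques, namely reusing the event lifetimes of one observed sample path to construct the path the system would follow under a different parameter, can be illustrated with a Lindley-type recursion. The queueing model and all names are our simplification, not the general DES procedures themselves:

```python
def departure_times(arrivals, services):
    """Departure epochs of a FIFO single-server queue (Lindley-style
    recursion) driven by the given arrival times and service lifetimes."""
    d, out = 0.0, []
    for a, s in zip(arrivals, services):
        d = max(a, d) + s        # service starts after arrival and after
        out.append(d)            # the previous customer's departure
    return out

# Observe one sample path, then re-use its event lifetimes to construct
# the path the system would have followed with a 20% slower server.
arr = [0.0, 1.0, 2.5, 3.0]
svc = [1.0, 0.5, 2.0, 1.0]
nominal = departure_times(arr, svc)                        # observed path
perturbed = departure_times(arr, [1.2 * s for s in svc])   # predicted path
```

The perturbed path is constructed entirely from the lifetimes of the nominal one; no second simulation run is required.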
Subsequently, we applied principles from the derived optimization techniques on three different
problems. The first problem is from the area of manufacturing systems. Specifically, we addressed
the problem of allocating a fixed number of kanban to the various production stages so as to optimize
a performance measure (e.g., throughput, mean delay) while maintaining a low work-in-process
inventory. We showed that such systems satisfy the smoothness and complementary smoothness
conditions and hence we used the “incremental” algorithm to perform the resource allocation.
The second problem we considered is from the area of mobile telecommunication networks, where
we addressed the problem of channel allocation so as to minimize the number of calls lost due to the
unavailability of free channels. This problem can be modeled as a separable convex problem, and
hence we used a variation of the descent algorithm to solve it. Through simulation we showed that the
proposed algorithms performed better than existing algorithms and require fewer reconfigurations.
The last application is from the area of air traffic control, where we addressed the problem of
determining the ground holding delay so as to minimize congestion over busy airports. To solve the
problem we developed two new approaches. The first one is a heuristic and is based on the kanban
control policy. The second approach stems from the ideas in FPA and is proven to generate an
optimal policy. Both approaches are very easy to implement, inherently dynamic, and scalable, while
they can accommodate various aspects of the problem, such as limited sector capacities.
9.2. Future Directions
There are several directions in which one can extend the results presented in this dissertation. First, in
Section 3.5 we started looking at ways of developing a descent-like algorithm for more general systems
and proposed a modified version of the algorithm. One possibility is to investigate the conditions
under which that modified algorithm converges to the optimal allocations, in both deterministic
and stochastic environments.
Furthermore, another possibility worth exploring is whether the performance of such algorithms can be improved by redefining the quantities that involve the finite differences ∆Li (·) and
δk . These quantities, as defined in equations (3.2), (3.9) and (3.22), describe what would
happen if some user gets an extra resource independently of what would happen if another user
gives up a resource. This, however, works only for separable problems. For the general case, it may
be necessary to define these quantities in such a way that they reflect the combined effect. For
example, one may define
∆Lij (s) = L(s) − L(s + ei − ej )
for all i, j = 1, 2, · · · , N , where again ei is a vector with all of its elements equal to 0 except the ith
one, which is equal to 1. Of course, the drawback is that more such differences might be necessary
at every step.
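A brute-force sketch of these combined-effect differences is given below; the non-separable cost and all names are hypothetical illustrations, not part of the proposed algorithms:

```python
def pairwise_differences(L, s):
    """Evaluate Delta L_ij(s) = L(s) - L(s + e_i - e_j) for all i != j,
    keeping s + e_i - e_j feasible (no component driven below zero).
    Requires O(N^2) cost evaluations per step, which is the drawback
    noted above for the general (non-separable) case."""
    N, base = len(s), L(s)
    diffs = {}
    for i in range(N):
        for j in range(N):
            if i != j and s[j] > 0:
                t = list(s)
                t[i] += 1
                t[j] -= 1
                diffs[(i, j)] = base - L(t)
    return diffs

# Hypothetical non-separable cost with a cross term:
cost = lambda s: (s[0] - 2) ** 2 + (s[1] - 1) ** 2 + s[0] * s[1]
d = pairwise_differences(cost, [3, 1])   # {(0, 1): -1, (1, 0): -1}
```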
The channel allocation problem also has some directions worth exploring. First, it would be
interesting to investigate the effect of the proposed algorithms (SN and EN) for systems with hierarchical cell structure, where overlapping areas are potentially much bigger. Furthermore, the results
presented in Chapter 7 assume that the mobile users are “stationary”, i.e., they do not cross the
boundaries of the cell where they were initiated. Clearly, it would be interesting to look at how such
algorithms affect the number of dropped calls when users are allowed to roam. Finally, note that
users in the overlapping areas experience a higher quality of service compared to users located in
areas that are covered by a single base station. Another interesting issue is to see how SN and EN
affect the fairness in the perceived quality of service among users.
Finally, there are several unresolved issues relating to the proposed approaches for solving the
ground holding problem in air traffic control. For this problem it would be interesting to investigate
extensions that will enable the analysis of networks of airports. Specifically, one should address multiple destination flights and banks of flights in the hub-and-spoke system. In multiple destination flights a
GHD at any intermediate airport may generate additional costs in several downstream destination
airports, thus the monotonicity result of Lemma 8.4.1 may no longer hold. In the hub-and-spoke
system, a GHD of an incoming flight at the hub may delay several outgoing flights, or it may cause
passengers to miss their connecting flights. In these cases, it is still possible to utilize both of the
proposed approaches, but communication between airport controllers may be necessary.
Appendix A
SELECTED ALGORITHMS
A.1
S-DOP Pseudo Code
1.0 Initialize:
    s(0) = [n1 (0), · · · , nN (0)]; C (0) = {1, · · · , N }; k = 0; initialize f (k).
1.1 Evaluate D̂k (n1 (k), · · · , nN (k)) ≡ [∆L̂1^{f(k)} (n1 (k)), · · · , ∆L̂N^{f(k)} (nN (k))]
2.1 Set i∗ = arg max_{i∈C(k)} [D̂k (n1 (k), · · · , nN (k))]
2.2 Set j∗ = arg min_{i∈C(k)} [D̂k (n1 (k), · · · , nN (k))]
2.3 Increase f (k) and evaluate D̂k (n1 (k), · · · , ni∗ (k) − 1, · · · , nj∗ (k) + 1, · · · , nN (k))
2.4 If ∆L̂j∗^{f(k)} (nj∗ (k) + 1) < ∆L̂i∗^{f(k)} (ni∗ (k)) Goto 3.1 ELSE Goto 3.2
3.1 Update allocation:
    ni∗ (k + 1) = ni∗ (k) − 1; nj∗ (k + 1) = nj∗ (k) + 1; nm (k + 1) = nm (k) for all m ∈ C (k) and m ≠ i∗ , j∗ ;
    Set k ← k + 1 and go to 2.1
3.2 Replace C (k) by C (k) − {j∗ };
    If |C (k)| = 1, Reset C (k) = {1, · · · , N };
    Go to 2.2
A.2
Time Warping Algorithm (TWA)
1. INITIALIZE:
   n := 0, k := 0, tn := 0, t̃k := 0, xn := x0 , x̃k = x̃0 ,
   yi (n) = vi (1) for all i ∈ Γ(xn ), si^n = 0, s̃i^k = 0 for all i ∈ E,
   M (0, 0) := Γ(x̃0 ), A(0, 0) := ∅
2. WHEN EVENT en IS OBSERVED:
   2.1 Use (5.3)-(5.8) to determine en+1 , xn+1 , tn+1 , yi (n + 1) for all i ∈ Γ(xn+1 ), and si^{n+1} for all i ∈ E.
   2.2 Add the en+1 event lifetime to Vi (n + 1, k):
       Vi (n + 1, k) = Vi (n, k) + vi (si^n + 1) if i = en+1 , and Vi (n + 1, k) = Vi (n, k) otherwise.
   2.3 Update the available event set: A(n + 1, k) = A(n, k) ∪ {en+1 }
   2.4 Update the missing event set: M (n + 1, k) = M (n, k)
   2.5 IF M (n + 1, k) ⊆ A(n + 1, k) then Goto 3. ELSE set n ← n + 1 and Goto 2.1.
3. TIME WARPING OPERATION:
   3.1 Obtain all missing event lifetimes to resume sample path construction at state x̃k :
       ỹi (k) = vi (s̃i^k + 1) for i ∈ M (n + 1, k), and ỹi (k) = ỹi (k − 1) otherwise.
   3.2 Use (5.3)-(5.8) to determine ẽk+1 , x̃k+1 , t̃k+1 , ỹi (k + 1) for all i ∈ Γ(x̃k+1 ) ∩ (Γ(x̃k ) − {ẽk+1 }), and s̃i^{k+1} for all i ∈ E.
   3.3 Discard all used event lifetimes:
       Vi (n + 1, k + 1) = Vi (n + 1, k) − vi (s̃i^k + 1) for all i ∈ M (n + 1, k)
   3.4 Update the available event set:
       A(n + 1, k + 1) = A(n + 1, k) − {i : i ∈ M (n + 1, k), s̃i^{k+1} = si^{n+1}}
   3.5 Update the missing event set:
       M (n + 1, k + 1) = Γ(x̃k+1 ) − (Γ(x̃k ) − {ẽk+1 })
   3.6 IF M (n + 1, k + 1) ⊆ A(n + 1, k + 1) then k ← k + 1 and Goto 3.1. ELSE k ← k + 1, n ← n + 1 and Goto 2.1.
A.3
Finite Perturbation Algorithm for Serial Queueing Systems
1. Initialize: ∆Dk^n = 0 for all k, n ≤ 0.
2. At the departure of (k, n):
(a) If (k, n) did NOT wait and was NOT blocked (Wk^n ≤ 0, Bk^n ≤ 0), then
    ∆Dk^n = ∆Dk^{n−1} − max{ 0, ∆_{(k−1,n)}^{(k,n−1)} − Ik^n , ∆_{(k−x_{n+1}−1[n+1], n+1)}^{(k,n−1)} + Bk^n − Q_{k−x_{n+1}}^{n+1} · 1[n + 1] }
(b) If (k, n) waited but was NOT blocked (Wk^n > 0, Bk^n ≤ 0), then
    ∆Dk^n = ∆D_{k−1}^n − max{ 0, ∆_{(k,n−1)}^{(k−1,n)} − Wk^n , ∆_{(k−x_{n+1}−1[n+1], n+1)}^{(k−1,n)} + Bk^n − Q_{k−x_{n+1}}^{n+1} · 1[n + 1] }
(c) If (k, n) was blocked (Bk^n > 0), then
    ∆Dk^n = ∆D_{k−x_{n+1}}^{n+1} − max{ ∆_{(k,n−1)}^{(k−x_{n+1}, n+1)} − Bk^n − [Ik^n ]^+ , ∆_{(k−1,n)}^{(k−x_{n+1}, n+1)} − Bk^n − [Wk^n ]^+ , ( ∆_{(k−x_{n+1}−1, n+1)}^{(k−x_{n+1}, n+1)} − Q_{k−x_{n+1}}^{n+1} ) · 1[n + 1] }
Appendix B
PROOFS FROM CHAPTER 3
B.1
Proof of Theorem 3.2.1
First, define the set
B(s) = {s′ : s′ = [n1 , · · · , ni + 1, · · · , nj − 1, · · · , nN ] for some i ≠ j}
which includes all feasible neighboring points to s = [n1 , · · · , nN ], i.e., vectors which differ from s by
+1 and −1 in two distinct components (recall that n1 + · · · + nN = K). To prove that (3.3) is a
necessary condition, assume that s̄ is a global optimum, that is, L(s̄) ≤ L(s′ ) for all s′ ∈ B(s̄). From
this we can write:
∑_{i=1}^{N} Li (n̄i ) ≤ L1 (n̄1 ) + · · · + Li (n̄i + 1) + · · · + Lj (n̄j − 1) + · · · + LN (n̄N )
or
Li (n̄i ) + Lj (n̄j ) ≤ Li (n̄i + 1) + Lj (n̄j − 1)
and, therefore,
∆Li (n̄i + 1) ≥ ∆Lj (n̄j )   for any i, j   (B.1)
To prove the sufficiency of (3.3), let s̄ = [n̄1 , · · · , n̄N ] be an allocation that satisfies (3.3), and let
s∗ = [n∗1 , · · · , n∗N ] be a global optimum. Therefore, [n∗1 , · · · , n∗N ] satisfies (B.1), i.e.,
∆Li (n∗i + 1) ≥ ∆Lj (n∗j )   for any i, j   (B.2)
Let n∗i = n̄i + di for all i = 1, · · · , N , where di ∈ {−K, · · · , −1, 0, 1, · · · , K} and subject to the constraint
∑_{j=1}^{N} dj = 0
which follows from the constraint n1 + · · · + nN = K. Then, define the set A = {i : di = 0}. There
are now two cases depending on the cardinality |A| of this set:
Case 1: |A| = N . In this case we have n̄i = n∗i for all i, so that, trivially, s̄ ≡ s∗ .
Case 2: |A| ≠ N . This implies that there exist indices i, j such that di > 0 and dj < 0. Therefore,
we can write the following ordering:
∆Lj (n̄j + dj + 1) ≥ ∆Li (n̄i + di ) ≥ ∆Li (n̄i + 1) ≥ ∆Lj (n̄j )   (B.3)
where the first inequality is due to (B.2), the second is due to A3.2, and the third is due to
our assumption that s̄ satisfies (3.3). However, for dj ≤ −2, using A3.2, we have ∆Lj (n̄j ) >
∆Lj (n̄j + dj + 1), which contradicts (B.3). It follows that for an allocation to satisfy (3.3) only
dj = −1 is possible, which in turn implies that (B.3) holds with equality, i.e.,
∆Lj (n̄j ) = ∆Li (n̄i + di ) = ∆Li (n̄i + 1)   (B.4)
Using A3.2, this implies that di = 1.
This argument holds for any (i, j) pair, therefore we conclude that the only possible candidate
allocations s̄ satisfying (3.3) are such that
∆Li (n̄i + 1) = ∆Lj (n̄j )   for all i, j ∉ A, di = 1, dj = −1   (B.5)
Let the difference in cost corresponding to s̄ and s∗ be ∆(s̄, s∗ ). This is given by
∆(s̄, s∗ ) = ∑_{i=1}^{N} [Li (n̄i ) − Li (n∗i )] = ∑_{i∉A} [Li (n̄i ) − Li (n̄i + di )]
= ∑_{i∉A, di =−1} ∆Li (n̄i ) − ∑_{i∉A, di =1} ∆Li (n̄i + 1) = 0
where in the last step we use (B.5). This establishes that if s̄ = [n̄1 , · · · , n̄N ] satisfies (3.3), then
either s̄ ≡ s∗ as in Case 1 or it belongs to a set of equivalent optimal allocations, and hence the
theorem is proved.
B.2
Proof of Theorem 3.2.2
Suppose that s̄ is a global optimum, and consider an allocation s = [n1 , · · · , nN ] such that
s ≠ s̄. We can then express ni (0 ≤ ni ≤ K) as:
ni = n̄i + di for all i = 1, · · · , N
where di ∈ {−K, · · · , −1, 0, 1, · · · , K} and subject to
∑_{j=1}^{N} dj = 0   (B.6)
which follows from the fact that [n̄1 , · · · , n̄N ] is a feasible allocation. Let i∗ = arg max_{i=1,···,N} {∆Li (n̄i )}.
If s ≠ s̄, it follows from (B.6) that there exists some j such that dj > 0 and two cases arise:
Case 1: If j = i∗ , then
max_{i=1,···,N} {∆Li (ni )} ≥ ∆Lj (nj ) = ∆Li∗ (n̄i∗ + di∗ ) > ∆Li∗ (n̄i∗ )
where the last step is due to A3.2 since di∗ > 0.
Case 2: If j ≠ i∗ , then first apply Theorem 3.2.1 to the optimal allocation s̄ to get
∆Lj (n̄j + 1) ≥ ∆Li∗ (n̄i∗ )   (B.7)
Then, we can write the following:
max_{i=1,···,N} {∆Li (ni )} ≥ ∆Lj (nj ) = ∆Lj (n̄j + dj ) ≥ ∆Lj (n̄j + 1) ≥ ∆Li∗ (n̄i∗ )
where the second inequality is due to A3.2 and the fact that dj ≥ 1, and the last inequality
is due to (B.7). Hence, (3.4) is established.
Next, we show that if an allocation s̄ satisfies (3.4) and A3.3 holds, it also satisfies (3.3),
from which, by Theorem 3.2.1, we conclude that the allocation is a global optimum. Let
i∗ = arg max_{i=1,···,N} {∆Li (n̄i )} and suppose that (3.3) does not hold. Then, there exists a
j ≠ i∗ such that:
∆Lj (n̄j + 1) < ∆Li∗ (n̄i∗ ) = max_{i=1,···,N} {∆Li (n̄i )}   (B.8)
Note that if no such j were to be found, we would have ∆Lj (n̄j + 1) ≥ ∆Li∗ (n̄i∗ ) > ∆Lk (n̄k )
for all j, k (because of A3.3) and we would not be able to violate (3.3) as assumed above.
Now, without loss of generality, let i∗ = 1 and j = N (j satisfying (B.8)). Then, using A3.2,
A3.3, and (B.8), the feasible allocation [n̄1 − 1, n̄2 , · · · , n̄N−1 , n̄N + 1] is such that:
∆L1 (n̄1 ) = max{∆L1 (n̄1 ), · · · , ∆LN (n̄N )} > max{∆L1 (n̄1 − 1), ∆L2 (n̄2 ), · · · , ∆LN (n̄N + 1)}
which contradicts (3.4) for the feasible allocation [n̄1 − 1, n̄2 , · · · , n̄N−1 , n̄N + 1], and the theorem
is proved.
B.3
Proof of Lemma 3.3.1
To prove P1, first note that if δk ≤ 0, then, from (3.5), ∆Li∗k+1 (ni∗k+1 ,k+1 ) = ∆Li∗k (ni∗k ,k ). On
the other hand, if δk > 0, then there are two cases:
Case 1: If i∗k = i∗k+1 then, from A3.2,
∆Li∗k (ni∗k ,k ) > ∆Li∗k (ni∗k ,k − 1) = ∆Li∗k+1 (ni∗k+1 ,k+1 )
Case 2: If i∗k ≠ i∗k+1 = p, then there are two possible subcases:
Case 2.1: If p = jk∗ , since δk > 0, we have
∆Li∗k (ni∗k ,k ) > ∆Lp (np,k + 1) = ∆Li∗k+1 (ni∗k+1 ,k+1 )
Case 2.2: If p ≠ jk∗ , then by the definition of i∗k and the fact that np,k = np,k+1 ,
∆Li∗k (ni∗k ,k ) ≥ ∆Lp (np,k ) = ∆Li∗k+1 (ni∗k+1 ,k+1 )
The proof of P2 is similar to that of P1 and is omitted.
Next, we prove property P3. First, note that when p = i∗k we must have δk > 0. Otherwise,
from (3.5), we get ni,k+1 = ni,k for all i = 1, · · · , N . From (3.10), this implies that i∗k+1 = i∗k = p,
which violates our assumption that p ≠ i∗l for k < l < m. Therefore, with δk > 0, (3.5) implies
that np,k+1 = np,k − 1. In addition, np,m = np,k+1 = np,k − 1, since p ≠ i∗l for all l such that
k < l < m, and p ∈ Cm . We then have:
δm = ∆Li∗m (ni∗m ,m ) − ∆Lp (np,m + 1)
= ∆Li∗m (ni∗m ,m ) − ∆Lp (np,k )
= ∆Li∗m (ni∗m ,m ) − ∆Li∗k (ni∗k ,k ) ≤ 0
where the last inequality is due to P1. Therefore, (3.14) immediately follows from (3.6).
To prove P4, first note that when p = jk∗ we must have δk > 0. If δk ≤ 0, then from (3.6),
p is removed from Ck , in which case p ∉ Cm for any m > k and it is not possible to have
p = i∗m as assumed. Therefore, with δk > 0, we get np,k+1 = np,k + 1 from (3.5). Moreover,
np,m = np,k+1 = np,k + 1, since p ≠ jl∗ for all l such that k < l < m, and p ∈ Cm . We now
consider two possible cases:
Case 1: If δm > 0, then np,m+1 = np,m − 1 = np,k . The following subcases are now possible:
Case 1.1: If there is at least one j ∈ Cm+1 such that ∆Lj(nj,m+1) > ∆Lp(np,m+1), then we are assured that i∗m+1 ≠ p. If jm+1∗ = arg min_{i∈Cm+1}{∆Li(ni,m+1)} is unique, then, since jk∗ = p and np,m+1 = np,k, it follows from P2 that jm+1∗ = p. Now consider δm+1 and observe that

δm+1 = ∆Li∗m+1(ni∗m+1,m+1) − ∆Lp(np,m+1 + 1)
     = ∆Li∗m+1(ni∗m+1,m+1) − ∆Lp(np,m)
     = ∆Li∗m+1(ni∗m+1,m+1) − ∆Li∗m(ni∗m,m) ≤ 0

where the last inequality is due to P1. Therefore, from (3.6), Cm+2 = Cm+1 − {p} and (3.15) holds for q = 1. If, on the other hand, jm+1∗ is not unique, then it is possible that jm+1∗ ≠ p since we have assumed that ties are arbitrarily broken. In this case, there are at most q ≤ N − 1 steps before jm+q∗ = p. This is because at step m + 1 either δm+1 ≤ 0 and jm+1∗ is removed from Cm+1, or δm+1 > 0 and, from (3.5), njm+2∗,m+2 = njm+1∗,m+1 + 1, in which case ∆Ljm+2∗(njm+2∗,m+2) > ∆Ljm+1∗(njm+1∗,m+1) from A3.2. The same is true for any of the q steps after m. Then at step m + q + 1, we get δm+q+1 ≤ 0 by arguing exactly as in the case where jm+1∗ is unique, with m + 1 replaced by m + q + 1, and again (3.15) holds.
Case 1.2: If ∆Lj(nj,m+1) is the same for all j ∈ Cm+1, then it is possible that i∗m+1 = p. In this case, δl < 0 for all l > m + 1 due to A3.2. Therefore, jm+1∗ will be removed from Cm+1 through (3.6). Moreover, since i∗m+1 = p by (3.10), this process repeats itself for at most q ≤ N − 1 steps resulting in Cm+q+1 = {p}.
Case 2: If δm ≤ 0 and jm∗ = r where r ≠ p, then Cm+1 = Cm − {r}. In this case, note that i∗m+1 = i∗m = p and depending on the sign of δm+1 we either go to Case 1 or we repeat the process of removing one additional user index from the Cm+1 set. In the event that δl ≤ 0 for all l > m, all jl∗ will be removed from the Cl set. The only remaining element in this set is p, which reduces to Case 1.2 above.
Property P5 follows from P3 by observing in (3.5) that the only way to get np,m > np,k is if
jl∗ = p and δl > 0 for some k < l < m . However, P3 asserts that this is not possible, since p would
be removed from Cl .
Property P6 follows from P4 by a similar argument. The only way to get np,m < np,k is if i∗l = p
and δl > 0 for some k < l < m . However, it is clear from the proof of P4 that p would either be
removed from Cl , possibly after a finite number of steps, or simply remain in this set until it is the
last element in it.
B.4 Proof of Lemma 3.3.2
We begin by establishing that the process terminates in a finite number of steps bounded by N(K + 1). This is easily seen as follows. At any step k, the process determines some i∗k (say p)
with two possibilities: (i) Either user p gives one resource to some other user through (3.5), or (ii)
One user index is removed from Ck through (3.6), in which case i∗k+1 = p and we have the exact same
situation as in step k (if case (ii) persists, clearly |Cl | = 1 for some l ≤ k + N − 1). Under case (i),
because of property P5, p cannot receive any resources from other users, therefore in the worst case
p will give away all of its initial resources to other users and will subsequently not be able to either
give or receive resources from other users. Since np,k ≤ K for any k, it follows that p can be involved
in a number of steps that is bounded by K + 1, where 1 is the extra step when p is removed from
Ck at some k. Finally, since there are N users undergoing this series of steps, in the worst case the
process terminates in N (K + 1) steps.
This simple upper bound serves to establish the fact that the process always terminates in a finite
number of steps. We will use this fact together with some of the properties in Lemma 3.3.1 to find a
tighter upper bound. Let the initial allocation be s0 . Since the process always terminates in a finite
number of steps, there exists some final allocation s̄ = [n̄1, · · · , n̄N] which, given s0, is unique since the algorithm is deterministic. An allocation ni,k at the kth step can be written as follows:

ni,k = n̄i + di,k

where di,k ∈ {−K, · · · , −1, 0, 1, · · · , K} and ∑_{i=1}^{N} di,k = 0 for all k = 0, 1, · · ·, since all allocations are feasible. Now define the following three sets:

Ak = {i : di,k > 0},  Bk = {i : di,k = 0},  Ck = {i : di,k < 0}
and note that at the final state di = 0 for all i = 1, · · · , N . Due to P3, at every step we have i∗k ∈ Ak
(recall that once a user is selected as i∗k it can only give away resources to other users). Similarly,
due to P4, jk∗ ∈ Bk ∪ Ck .
At every step of the process, there are only two possibilities:
1. If δk > 0, let p = i∗k ∈ Ak and q = jk∗ ∈ Bk ∪ Ck . Then, at the next step, (3.5) implies that
dp,k+1 = dp,k − 1 and dq,k+1 = dq,k + 1.
2. If δk ≤ 0, then a user index from Bk is removed from the set Ck .
Moreover, from the definitions of the three sets above, we have

∑_{i∈Ak} di,k + ∑_{i∈Ck} di,k = 0

and, therefore, we can write

Pk = ∑_{i∈Ak} di,k = − ∑_{i∈Ck} di,k

where 0 ≤ Pk ≤ K for all k = 0, 1, · · ·, since 0 ≤ ni,k ≤ K.
Now let P0 be the initial value of Pk and let |A0 | be the initial cardinality of the set Ak . We
separate the number of steps required to reach the final allocation into three categories:
(i) Clearly, P0 steps (not necessarily contiguous) are required to make Pk = 0 for some k ≥ 0 by removing one resource at each such step from users i ∈ Al, l ≤ k. During any such step, we have δl > 0 as in case 1 above.
(ii) These P0 steps would suffice to empty the set Ak if it were impossible for user indices to be
added to it from the set Bl , l ≤ k. However, from property P4 it is possible for a user j
such that j ∈ Bk and j 6∈ Al for all l < k to receive at most one resource, in which case we
have j ∈ Ak . There are at most N − |A0 | users with such an opportunity, and hence N − |A0 |
additional steps are possible. During any such step, as in (i), we have δl > 0 as in case 1 above.
(iii) Finally, we consider steps covered by case 2 above. Clearly, N − 1 steps are required to reach
|Ck | = 1 for some k.
Therefore, the number of steps L required to reach the final allocation is such that L ≤ P0 + N − |A0| + N − 1. Observing that P0 ≤ K and |A0| ≥ 1, we get L ≤ K + 2(N − 1). Note that |A0| = 0 implies that s0 = s̄, and in this case only N − 1 steps (see (iii) above) are required to reach the final state. Thus, N − 1 is the lower bound on the required number of steps.
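The exchange process analyzed in this lemma can be illustrated with a small simulation. The sketch below is an assumption-laden reconstruction, not the dissertation's pseudocode: the step rule mimics the roles of (3.5) (resource transfer) and (3.6) (removal from Ck), and the quadratic cost functions are illustrative placeholders chosen to satisfy the convexity assumption A3.2.

```python
# Illustrative sketch of the exchange process analyzed above (assumed step
# rule, cf. (3.5)-(3.6); quadratic costs are placeholders satisfying A3.2).
def run_exchange(weights, n0):
    N, K = len(weights), sum(n0)
    L = lambda i, n: weights[i] * (K - n) ** 2       # separable convex cost
    dL = lambda i, n: L(i, n) - L(i, n - 1)          # marginal cost ∆L_i(n)
    n, C, steps = list(n0), set(range(N)), 0
    while len(C) > 1 and steps < N * (K + 1):        # crude bound from the lemma
        steps += 1
        # i*: holder of the resource with the largest marginal cost
        i_star = max((i for i in range(N) if n[i] >= 1), key=lambda i: dL(i, n[i]))
        # j*: user in C_k whose current marginal cost is smallest
        j_star = min(C, key=lambda j: dL(j, n[j]))
        delta = dL(i_star, n[i_star]) - dL(j_star, n[j_star] + 1)
        if delta > 0 and i_star != j_star:           # transfer one resource (3.5)
            n[i_star] -= 1
            n[j_star] += 1
        else:                                        # drop j* from C_k (3.6)
            C.discard(j_star)
    return n, steps
```

For example, with weights [3.0, 2.0, 1.0] and initial allocation [6, 0, 0] the process settles after a handful of steps, well within the K + 2(N − 1) bound derived above.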
B.5 Proof of Theorem 3.3.1
First, by Lemma 3.3.2, a final allocation exists. We will next show that this allocation satisfies

∆Li(n̄i + 1) ≥ ∆Lj(n̄j) for any i, j    (B.9)

We establish this by contradiction. Suppose there exist p ≠ q such that (B.9) is violated, i.e. ∆Lp(n̄p + 1) < ∆Lq(n̄q), and suppose that p, q were removed from Ck and Cl respectively (i.e., at steps k, l respectively). Then, two cases are possible:
Case 1: k < l. For p to be removed from Ck in (3.6), the following should be true: jk∗ = p and δk ≤ 0.
However,
∆Lp(np,k + 1) ≥ ∆Li∗k(ni∗k,k) ≥ ∆Li∗l(ni∗l,l) ≥ ∆Lq(nq,l)

where the first inequality is due to δk ≤ 0, the second is due to property P1 in (3.12), and the last is due to the definition of i∗l. Therefore, our assumption is contradicted.
Case 2: k > l. Now q is removed from Cl first, therefore:
∆Lq (nq,l ) = ∆Ljl∗ (njl∗ ,l ) ≤ ∆Ljk∗ (njk∗ ,k ) = ∆Lp (np,k ) < ∆Lp (np,k + 1)
where the two equalities are due to (3.6) and the fact that q, p were removed from Cl and Ck
respectively. In addition, the first inequality is due to P2 in (3.13), and the last inequality is
due to A3.2. Again, our assumption is contradicted.
Therefore, (B.9) holds. We can now invoke Theorem 3.2.1, from which it follows that (B.9) implies
global optimality.
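The optimality condition (B.9) can also be checked by brute force on a small instance. The sketch below uses illustrative quadratic costs (an assumption, not taken from the dissertation) and verifies that the allocations satisfying the condition coincide with the exhaustive-search optimum:

```python
from itertools import product

# Brute-force check that an allocation satisfying ∆L_i(n_i + 1) >= ∆L_j(n_j)
# for all i, j (cf. (B.9)) attains the minimum total cost; quadratic costs
# are illustrative placeholders satisfying A3.2.
w, K = [3.0, 2.0, 1.0], 6
L = lambda i, n: w[i] * (K - n) ** 2                 # separable convex cost
dL = lambda i, n: L(i, n) - L(i, n - 1)              # marginal cost ∆L_i(n)
total = lambda a: sum(L(i, n) for i, n in enumerate(a))

feasible = [a for a in product(range(K + 1), repeat=len(w)) if sum(a) == K]
best = min(feasible, key=total)

def satisfies_B9(a):
    # boundary guards skip pairs where a marginal cost would be undefined
    return all(dL(i, a[i] + 1) >= dL(j, a[j])
               for i in range(len(w)) for j in range(len(w))
               if a[i] < K and a[j] > 0)

candidates = [a for a in feasible if satisfies_B9(a)]
assert best in candidates                            # the optimum satisfies (B.9)
assert all(total(a) == total(best) for a in candidates)  # (B.9) implies optimality here
```

On this instance the unique allocation satisfying (B.9) is exactly the exhaustive-search optimum, as Theorem 3.2.1 asserts.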
B.6 Proof of Lemma 3.4.1
Suppose δk (i, j) satisfies Assumption A3.4. Given ˆsk = s, Cˆk = C, consider the event L(ˆsk+1 ) > L(ˆsk ).
According to the process (3.18)-(3.22) and (3.11):
If δ̂k(î∗k, ĵk∗) > 0, then L(ŝk+1) − L(ŝk) = −δk(î∗k, ĵk∗), and
if δ̂k(î∗k, ĵk∗) ≤ 0, then L(ŝk+1) = L(ŝk).
Therefore, L(ˆsk+1 ) > L(ˆsk ) occurs if and only if
δˆk (ˆi∗k , ˆjk∗ ) > 0 and δk (ˆi∗k , ˆjk∗ ) < 0.
Then,
Pr[L(ŝk+1) > L(ŝk) | (ŝk, Ĉk) = (s, C)] = Pr[δ̂k(î∗k, ĵk∗) > 0 and δk(î∗k, ĵk∗) < 0]
= ∑_{{(i,j)|δk(i,j)<0}} Pr[∆L̂i^{f(k)}(ni) − ∆L̂j^{f(k)}(nj + 1) > 0 and (î∗k, ĵk∗) = (i, j)]
≤ ∑_{{(i,j)|δk(i,j)<0}} Pr[∆L̂i^{f(k)}(ni) − ∆L̂j^{f(k)}(nj + 1) > 0].    (B.10)
For each pair (i, j) satisfying δk(i, j) < 0, we know from Lemma 2.2.1 that

lim_{k→∞} Pr[∆L̂i^{f(k)}(ni) − ∆L̂j^{f(k)}(nj + 1) > 0] = 0.
Taking this limit in (B.10), and also noticing the finiteness of the set {(i, j) | δk (i, j) < 0} for any
pair (ˆsk , Cˆk ) = (s, C), we obtain
lim_{k→∞} dk(s, C) = lim_{k→∞} Pr[L(ŝk+1) > L(ŝk) | (ŝk, Ĉk) = (s, C)] = 0
and the proof of (3.24) is complete.
The definition (3.25) immediately implies that dk is monotone decreasing and that dk ≥ dk (s, C).
The limit (3.26) then follows from (3.24).
B.7 Proof of Lemma 3.4.3
Before we prove this lemma, we first need to formalize the definition of the αk sequence as well as an auxiliary result, stated as Lemma B.7.1. Hence, we define a sequence of integers {αk} satisfying

lim_{k→∞} αk = ∞,  lim_{k→∞} αk(ak + bk) = 0,  lim_{k→∞} (1 − d⌊k/2⌋)^{αk} = 1,    (B.11)

where, for any x, ⌊x⌋ = max{n | n ≤ x, n integer} is the greatest integer not exceeding x. Such a sequence {αk} exists. For example, any αk ≤ ⌊(max{d⌊k/2⌋, ak + bk})^{−1/2}⌋ with lim_{k→∞} αk = ∞ satisfies (B.11) (without loss of generality, we assume that dk, ak, bk ≠ 0; otherwise αk can take any arbitrary value). The choice of {αk} is rather technical. Its necessity will be clear from the following proofs of Lemmas B.7.1 and 3.4.3. Furthermore, observe that if {αk} satisfies (B.11), we also have

lim_{k→∞} (1 − dk)^{αk} = 1,    (B.12)

since dk ≤ d⌊k/2⌋.
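The construction in (B.11) can be sanity-checked numerically. In the sketch below the rates dk = ak + bk = 2/(k + 1)² are illustrative placeholders (the actual sequences come from the estimation process), and αk is taken equal to the bound ⌊(max{d⌊k/2⌋, ak + bk})^{−1/2}⌋:

```python
import math

# Numerical illustration of (B.11): with placeholder decreasing rates for d_k
# and a_k + b_k, the choice alpha_k = floor(max(d_{k//2}, a_k + b_k) ** -0.5)
# diverges while alpha_k * (a_k + b_k) -> 0 and (1 - d_{k//2}) ** alpha_k -> 1.
d = lambda k: 2.0 / (k + 1) ** 2          # stand-in for d_k (monotone decreasing)
ab = lambda k: 2.0 / (k + 1) ** 2         # stand-in for a_k + b_k
alpha = lambda k: math.floor(max(d(k // 2), ab(k)) ** -0.5)

for k in (10, 100, 1000, 10000):
    assert alpha(k) > alpha(k // 2)                      # alpha_k grows without bound
assert alpha(10000) * ab(10000) < 1e-3                   # alpha_k (a_k + b_k) -> 0
assert (1.0 - d(10000 // 2)) ** alpha(10000) > 0.999     # (1 - d_{k/2})^alpha_k -> 1
```

The three assertions mirror the three limits required by (B.11) along this particular schedule.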
Next, we state a property, needed to establish Lemma 3.4.3, as the following lemma. The property states that if an allocation is not optimal at step k, then the probability that this allocation remains unchanged over αk steps is asymptotically zero.
Lemma B.7.1 Suppose that A2.1 and A3.4 hold and let {αk } satisfy (B.11). Consider an allocation
s = [n1, · · · , nN] ≠ s∗ and any set C. Then

lim_{k→∞} Pr[L(ŝh+1) = L(ŝh), h = k, ..., k + αk − 1 | (ŝk, Ĉk) = (s, C)] = 0    (B.13)
Proof: Given ˆsk = s, Cˆk = C, consider the event L(ˆsk+1 ) = L(ˆsk ). According to the process
(3.18)-(3.22) and (3.11):
If δˆk (ˆi∗k , ˆjk∗ ) > 0, then L(ˆsk+1 ) − L(ˆsk ) = −δk (ˆi∗k , ˆjk∗ ),
and if δˆk (ˆi∗k , ˆjk∗ ) ≤ 0, then L(ˆsk+1 ) = L(ˆsk ).
Therefore, L(ˆsk+1 ) = L(ˆsk ) occurs if and only if
either {δˆk (ˆi∗k , ˆjk∗ ) > 0, δk (ˆi∗k , ˆjk∗ ) = 0} or {δˆk (ˆi∗k , ˆjk∗ ) ≤ 0}.
(B.14)
for every k.
For notational convenience, consider, for any k, the events

A⁺k = {δ̂k(î∗k, ĵk∗) > 0, δk(î∗k, ĵk∗) = 0} and A⁻k = {δ̂k(î∗k, ĵk∗) ≤ 0}

Next, for any i ≥ 1, define the following subset of {k, · · · , k + i − 1}:

R̂(i) = {h : δ̂h(î∗h, ĵh∗) ≤ 0, h ∈ {k, · · · , k + i − 1}}

and let Îk be the cardinality of the set R̂(αk). In addition, for any given integer I, let R̂I(αk) denote such a set with exactly I elements. Then, define the set

Q̂I(αk) = {k, · · · , k + αk − 1} − R̂I(αk)

containing all indices h ∈ {k, · · · , k + αk − 1} which do not satisfy δ̂h(î∗h, ĵh∗) ≤ 0. Finally, define

A⁺I(αk) = {A⁺h, for all h ∈ Q̂I(αk)} and A⁻I(αk) = {A⁻h, for all h ∈ R̂I(αk)}
Depending on the value of Iˆk defined above, we can write
Pr[L(ˆsh+1 ) = L(ˆsh ), h = k, ..., k + αk − 1 | (ˆsk , Cˆk ) = (s, C)] =
= Pr[L(ˆsh+1 ) = L(ˆsh ), h = k, ..., k + αk − 1, Iˆk ≤ |C| + N | (ˆsk , Cˆk ) = (s, C)] +
+ Pr[L(ˆsh+1 ) = L(ˆsh ), h = k, ..., k + αk − 1, Iˆk > |C| + N | (ˆsk , Cˆk ) = (s, C)].
(B.15)
We will now consider each of the two terms in (B.15) separately.
The first term in (B.15) can be rewritten as

Pr[L(ŝh+1) = L(ŝh), h = k, ..., k + αk − 1, Îk ≤ |C| + N | (ŝk, Ĉk) = (s, C)]
= ∑_{I=0}^{|C|+N} Pr[L(ŝh+1) = L(ŝh), h = k, ..., k + αk − 1, Îk = I | (ŝk, Ĉk) = (s, C)]    (B.16)

Using the notation we have introduced, observe that

Pr[L(ŝh+1) = L(ŝh), h = k, ..., k + αk − 1, Îk = I | (ŝk, Ĉk) = (s, C)]
= ∑_{R̂I(αk)} Pr[A⁻I(αk), A⁺I(αk) | (ŝk, Ĉk) = (s, C)]
= ∑_{R̂I(αk)} Pr[A⁺I(αk) | A⁻I(αk), (ŝk, Ĉk) = (s, C)] Pr[A⁻I(αk) | (ŝk, Ĉk) = (s, C)]    (B.17)
Set h′ = k + αk − 1 and, without loss of generality, assume that h′ ∉ R̂I(αk) (otherwise, if h′ ∈ R̂I(αk) there must exist some M such that k + αk − M ∉ R̂I(αk), and the same argument may be used with h′ = k + αk − M). Then
Pr[A⁺I(αk) | A⁻I(αk), (ŝk, Ĉk) = (s, C)]
= Pr[A⁺h′, A⁺I(αk − 1) | A⁻I(αk), (ŝk, Ĉk) = (s, C)]
= ∑_{(s′,C′)} Pr[A⁺I(αk − 1), A⁺h′, (ŝh′, Ĉh′) = (s′, C′) | A⁻I(αk), (ŝk, Ĉk) = (s, C)]    (B.18)

Recalling the definition of A⁺h′, we can write

Pr[A⁺I(αk − 1), A⁺h′, (ŝh′, Ĉh′) = (s′, C′) | A⁻I(αk), (ŝk, Ĉk) = (s, C)]
= Pr[δ̂h′(î∗h′, ĵh′∗) > 0 | δh′(î∗h′, ĵh′∗) = 0, (ŝh′, Ĉh′) = (s′, C′), A⁺I(αk − 1), A⁻I(αk), (ŝk, Ĉk) = (s, C)] ×
  Pr[δh′(î∗h′, ĵh′∗) = 0, (ŝh′, Ĉh′) = (s′, C′), A⁺I(αk − 1) | A⁻I(αk), (ŝk, Ĉk) = (s, C)]

Then, the Markov property of the process (3.18)-(3.22) implies that

Pr[A⁺I(αk − 1), A⁺h′, (ŝh′, Ĉh′) = (s′, C′) | A⁻I(αk), (ŝk, Ĉk) = (s, C)]
= Pr[δ̂h′(î∗h′, ĵh′∗) > 0 | δh′(î∗h′, ĵh′∗) = 0, (ŝh′, Ĉh′) = (s′, C′)] ×
  Pr[δh′(î∗h′, ĵh′∗) = 0, (ŝh′, Ĉh′) = (s′, C′), A⁺I(αk − 1) | A⁻I(αk), (ŝk, Ĉk) = (s, C)]    (B.19)
However, by Assumption A3.4,

Pr[δ̂h′(î∗h′, ĵh′∗) > 0 | δh′(î∗h′, ĵh′∗) = 0, (ŝh′, Ĉh′) = (s′, C′)] ≤ 1 − p0

Thus, (B.19) becomes

Pr[A⁺I(αk − 1), A⁺h′, (ŝh′, Ĉh′) = (s′, C′) | A⁻I(αk), (ŝk, Ĉk) = (s, C)]
≤ (1 − p0) Pr[δh′(î∗h′, ĵh′∗) = 0, (ŝh′, Ĉh′) = (s′, C′), A⁺I(αk − 1) | A⁻I(αk), (ŝk, Ĉk) = (s, C)]
≤ (1 − p0) Pr[(ŝh′, Ĉh′) = (s′, C′), A⁺I(αk − 1) | A⁻I(αk), (ŝk, Ĉk) = (s, C)].
Using this inequality in (B.18), we obtain

Pr[A⁺I(αk) | A⁻I(αk), (ŝk, Ĉk) = (s, C)]
≤ (1 − p0) Pr[A⁺I(αk − 1) | A⁻I(αk), (ŝk, Ĉk) = (s, C)].

Continuing this recursive procedure, we finally arrive at

Pr[A⁺I(αk) | A⁻I(αk), (ŝk, Ĉk) = (s, C)] ≤ (1 − p0)^{αk−I}
which allows us to obtain the following inequality from (B.16) and (B.17):

Pr[L(ŝh+1) = L(ŝh), h = k, ..., k + αk − 1, Îk ≤ |C| + N | (ŝk, Ĉk) = (s, C)]
≤ ∑_{I=0}^{|C|+N} ∑_{R̂I(αk)} (1 − p0)^{αk−I} Pr[A⁻I(αk) | (ŝk, Ĉk) = (s, C)]
≤ ∑_{I=0}^{|C|+N} (1 − p0)^{αk−I} ≤ p0^{−1}(1 − p0)^{αk−(|C|+N)}.    (B.20)
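The final step of (B.20) is the standard geometric-series bound; spelling it out with the substitution m = |C| + N − I:

```latex
\sum_{I=0}^{|C|+N} (1-p_0)^{\alpha_k - I}
  = (1-p_0)^{\alpha_k - (|C|+N)} \sum_{m=0}^{|C|+N} (1-p_0)^{m}
  \le (1-p_0)^{\alpha_k - (|C|+N)} \sum_{m=0}^{\infty} (1-p_0)^{m}
  = p_0^{-1}\,(1-p_0)^{\alpha_k - (|C|+N)} ,
```

which is valid since p0 > 0 by Assumption A3.4.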
Since 0 ≤ 1 − p0 < 1 by Assumption A3.4, and since limk→∞ αk = ∞ according to (B.11), the
preceding inequality implies that the first term in (B.15) is such that
lim_{k→∞} Pr[L(ŝh+1) = L(ŝh), h = k, ..., k + αk − 1, Îk ≤ |C| + N | (ŝk, Ĉk) = (s, C)] = 0.    (B.21)
Next we consider the second term in (B.15). Let Ĵk be the first step after k at which either î∗h ∉ Ah^max or ĵh∗ ∉ Ah^min, h = k, k + 1, ..., k + αk − 1. Clearly, k ≤ Ĵk ≤ k + αk. We also use (without confusion) Ĵk = k + αk to mean that î∗h ∈ Ah^max and ĵh∗ ∈ Ah^min for all h = k, k + 1, ..., k + αk − 1.
Then the second term in (B.15) can be written as

Pr[L(ŝh+1) = L(ŝh), h = k, ..., k + αk − 1, Îk > |C| + N | (ŝk, Ĉk) = (s, C)]
= ∑_{J=k}^{k+αk−1} Pr[L(ŝh+1) = L(ŝh), h = k, ..., k + αk − 1, Îk > |C| + N, Ĵk = J | (ŝk, Ĉk) = (s, C)]
  + Pr[L(ŝh+1) = L(ŝh), î∗h ∈ Ah^max, ĵh∗ ∈ Ah^min, h = k, ..., k + αk − 1, Îk > |C| + N | (ŝk, Ĉk) = (s, C)]    (B.22)
We shall now consider each of the two terms in (B.22) separately. In the first term, for any J, k ≤ J < k + αk, we have

Pr[L(ŝh+1) = L(ŝh), h = k, ..., k + αk − 1, Îk > |C| + N, Ĵk = J | (ŝk, Ĉk) = (s, C)]
≤ Pr[Ĵk = J | (ŝk, Ĉk) = (s, C)]
= ∑_{{(s′,C′)}} Pr[Ĵk = J | (ŝJ, ĈJ) = (s′, C′)] Pr[(ŝJ, ĈJ) = (s′, C′) | (ŝk, Ĉk) = (s, C)]    (B.23)
where the second step above follows from the Markov property of (3.18)-(3.22). Moreover,
Pr[Ĵk = J | (ŝJ, ĈJ) = (s′, C′)]
≤ Pr[{î∗J ∉ AJ^max} ∪ {ĵJ∗ ∉ AJ^min} | (ŝJ, ĈJ) = (s′, C′)]
≤ Pr[î∗J ∉ AJ^max | (ŝJ, ĈJ) = (s′, C′)] + Pr[ĵJ∗ ∉ AJ^min | (ŝJ, ĈJ) = (s′, C′)]
≤ aJ + bJ ≤ ak + bk
where we have used (3.29), (3.30), (3.32), and the monotonicity of ak and bk . This inequality, together
with (B.23), implies that

∑_{J=k}^{k+αk−1} Pr[L(ŝh+1) = L(ŝh), h = k, ..., k + αk − 1, Îk > |C| + N, Ĵk = J | (ŝk, Ĉk) = (s, C)] ≤ (αk − 1)(ak + bk).

By Lemma 3.4.2 and (B.11) it follows that

lim_{k→∞} ∑_{J=k}^{k+αk−1} Pr[L(ŝh+1) = L(ŝh), h = k, ..., k + αk − 1, Îk > |C| + N, Ĵk = J | (ŝk, Ĉk) = (s, C)] = 0    (B.24)
As for the second term in (B.22), we note the following facts.
(a) Given that ŝk ≠ s∗, either there is a j ∈ Ĉk such that δk(i∗k, j) > 0 for any i∗k ∈ Ak^max, or the set Ĉh first decreases to |Ĉh| = 1 according to (3.19) and is then reset to C0, in which case there is a j ∈ C0 such that δk(i∗k, j) > 0 (otherwise ŝk would be the optimum according to Theorem 3.2.1). Therefore, without loss of generality, we assume that there is a j ∈ Ĉk such that δk(i∗k, j) > 0 for any i∗k ∈ Ak^max.

(b) As long as (B.14) holds and î∗h ∈ Ah^max, ĵh∗ ∈ Ah^min,

max_{j∈Ĉh}{∆Lj(n̂j,h)} = max_{j∈Ĉk}{∆Lj(n̂j,k)}, h = k, k + 1, ..., k + αk − 1.
(c) One user index is deleted from the set Ĉh every time δ̂h(î∗h, ĵh∗) ≤ 0.
The previous facts (a) – (c) imply that, when

L(ŝh+1) = L(ŝh), î∗h ∈ Ah^max, ĵh∗ ∈ Ah^min, h = k, · · · , k + αk − 1, Îk > |C| + N,

with probability one there exists an M̂k, k ≤ M̂k ≤ k + αk − 1, such that

δ̂M̂k(î∗M̂k, ĵM̂k∗) ≤ 0,  δM̂k(î∗M̂k, ĵM̂k∗) > 0.
Then, the second term in (B.22) becomes

Pr[L(ŝh+1) = L(ŝh), î∗h ∈ Ah^max, ĵh∗ ∈ Ah^min, h = k, ..., k + αk − 1, Îk > |C| + N | (ŝk, Ĉk) = (s, C)]
≤ Pr[δ̂M̂k(î∗M̂k, ĵM̂k∗) ≤ 0, δM̂k(î∗M̂k, ĵM̂k∗) > 0 | (ŝk, Ĉk) = (s, C)]
= ∑_{M=k}^{k+αk−1} Pr[δ̂M(î∗M, ĵM∗) ≤ 0, δM(î∗M, ĵM∗) > 0 | M̂k = M, (ŝk, Ĉk) = (s, C)] × Pr[M̂k = M | (ŝk, Ĉk) = (s, C)]    (B.25)
Using Lemma 2.2.1, we know that

Pr[δ̂M(î∗M, ĵM∗) ≤ 0, δM(î∗M, ĵM∗) > 0 | M̂k = M, (ŝk, Ĉk) = (s, C)]
≤ ∑_{{(i,j)∈ĈM, δM(i,j)>0}} Pr[δ̂M(i, j) ≤ 0 | M̂k = M, (ŝk, Ĉk) = (s, C)] → 0 as k → ∞.
Therefore, we get from (B.25):

lim_{k→∞} Pr[L(ŝh+1) = L(ŝh), î∗h ∈ Ah^max, ĵh∗ ∈ Ah^min, h = k, ..., k + αk − 1, Îk > |C| + N | (ŝk, Ĉk) = (s, C)] = 0
The combination of this fact with (B.24) and (B.21) yields the conclusion of the lemma.
Proof of Lemma 3.4.3
First, given (ŝk, Ĉk) = (s, C) and some αk defined as in (B.11), consider sample paths such that L(ŝi+1) ≤ L(ŝi) for all i = k, k + 1, · · · , k + αk − 1. Observe that any such sample path can be decomposed into a set on which L(ŝh+1) < L(ŝh) for some k ≤ h ≤ k + αk − 1 and a set on which L(ŝi+1) = L(ŝi) for all i = k, k + 1, · · · , k + αk − 1. Thus, we can write

{L(ŝi+1) ≤ L(ŝi), i = k, · · · , k + αk − 1}
= {∃ k ≤ h ≤ k + αk − 1 s.t. L(ŝh+1) < L(ŝh), and L(ŝi+1) ≤ L(ŝi), i = k, · · · , k + αk − 1, i ≠ h}
  ∪ {L(ŝi+1) = L(ŝi), i = k, · · · , k + αk − 1}.    (B.26)
Therefore,
Pr[L(ˆsi+1 ) ≤ L(ˆsi ), i = k, ..., k + αk − 1 | (ˆsk , Cˆk ) = (s, C)]
= Pr[∃k ≤ h < k+αk s.t. L(ˆsh+1 ) < L(ˆsh ),
and L(ˆsi+1 ) ≤ L(ˆsi ), i = k, · · · , k+αk−1, i 6= h | (ˆsk , Cˆk ) = (s, C)]
+ Pr[L(ˆsi+1 ) = L(ˆsi ), i = k, · · · , k + αk − 1 | (ˆsk , Cˆk ) = (s, C)]
≤ Pr[L(ˆsk+αk ) < L(ˆsk ) | (ˆsk , Cˆk ) = (s, C)]
+ Pr[L(ˆsi+1 ) = L(ˆsi ), i = k, · · · , k + αk − 1 | (ˆsk , Cˆk ) = (s, C)].
(B.27)
Using Lemma B.7.1, the second term on the right-hand-side above vanishes as k → ∞, and (B.27)
yields
lim_{k→∞} Pr[L(ŝi+1) ≤ L(ŝi), i = k, · · · , k + αk − 1 | (ŝk, Ĉk) = (s, C)]
≤ lim_{k→∞} Pr[L(ŝk+αk) < L(ŝk) | (ŝk, Ĉk) = (s, C)]
On the other hand, we can write
Pr[L(ŝi+1) ≤ L(ŝi), i = k, · · · , k + αk − 1 | (ŝk, Ĉk) = (s, C)]
= ∑_{{(si,Ci): L(si)≤L(si−1), i=k+1,···,k+αk}} Pr[(ŝi, Ĉi) = (si, Ci), i = k + 1, · · · , k + αk | (ŝk, Ĉk) = (s, C)].    (B.28)
The Markov property of {(ˆsk , Cˆk )} implies that
Pr[(ŝi, Ĉi) = (si, Ci), i = k + 1, ..., k + αk | (ŝk, Ĉk) = (s, C)]
= ∏_{i=k+1}^{k+αk} Pr[(ŝi, Ĉi) = (si, Ci) | (ŝi−1, Ĉi−1) = (si−1, Ci−1)].
Thus

Pr[L(ŝi+1) ≤ L(ŝi), i = k, ..., k + αk − 1 | (ŝk, Ĉk) = (s, C)]
= ∑_{{(si,Ci): L(si)≤L(si−1), i=k+1,···,k+αk}} ∏_{i=k+1}^{k+αk} Pr[(ŝi, Ĉi) = (si, Ci) | (ŝi−1, Ĉi−1) = (si−1, Ci−1)]
= ∑_{{(si,Ci): L(si)≤L(si−1), i=k+1,···,k+αk−1}} ( ∏_{i=k+1}^{k+αk−1} Pr[(ŝi, Ĉi) = (si, Ci) | (ŝi−1, Ĉi−1) = (si−1, Ci−1)]
  × ∑_{{(sj,Cj): L(sj)≤L(sj−1), j=k+αk}} Pr[(ŝj, Ĉj) = (sj, Cj) | (ŝj−1, Ĉj−1) = (sj−1, Cj−1)] )
= ∑_{{(si,Ci): L(si)≤L(si−1), i=k+1,···,k+αk−1}} ( ∏_{i=k+1}^{k+αk−1} Pr[(ŝi, Ĉi) = (si, Ci) | (ŝi−1, Ĉi−1) = (si−1, Ci−1)]
  × Pr[L(ŝk+αk) ≤ L(sk+αk−1) | (ŝk+αk−1, Ĉk+αk−1) = (sk+αk−1, Ck+αk−1)] )
Now, recalling the definition of dk (s, C) in (3.23), observe that the last term in the product
above is precisely [1 − dk+αk −1 (ˆsk+αk −1 , Cˆk+αk −1 )]. Moreover, by Lemma 3.4.1 we have dk+αk −1 ≥
dk+αk −1 (ˆsk+αk −1 , Cˆk+αk −1 ). Therefore, we get
Pr[L(ŝi+1) ≤ L(ŝi), i = k, ..., k + αk − 1 | (ŝk, Ĉk) = (s, C)]
≥ (1 − dk+αk−1) ∑_{{(si,Ci): L(si)≤L(si−1), i=k+1,···,k+αk−1}} ∏_{i=k+1}^{k+αk−1} Pr[(ŝi, Ĉi) = (si, Ci) | (ŝi−1, Ĉi−1) = (si−1, Ci−1)]
≥ · · · ≥ ∏_{i=k}^{k+αk−1} (1 − di)
≥ (1 − dk)^{αk}    (B.29)
where the last inequality follows from Lemma 3.4.1 where it was shown that di is monotone decreasing
in i. Hence, since αk satisfies (B.12), we get
lim_{k→∞} Pr[L(ŝi+1) ≤ L(ŝi), i = k, · · · , k + αk − 1 | (ŝk, Ĉk) = (s, C)] = 1
Finally, using this limit and the inequality (B.28), and recalling the definition of ek (s, C) in (3.34),
we readily conclude that (3.35) holds. Moreover, the definition (3.36) immediately implies that ek is
monotone decreasing and ek ≥ ek (s, C). The limit (3.37) then follows from (3.35).
B.8 Proof of Theorem 3.4.1
We begin by defining three auxiliary quantities we shall use in the proof. First, let us choose some ε > 0 such that

ε < min_{{s,s′}} {L(s) − L(s′) | L(s) − L(s′) > 0}.    (B.30)

Note that such ε > 0 exists because of the discrete nature of the cost function and the finiteness of the number of feasible allocations. Observe that ε is a real number strictly lower than the smallest cost difference in the allocation process.
Second, for any s, set

qs = ⌊(L(s) − L(s∗))/ε⌋.

Then

L(s) − L(s∗) ≥ qs ε,  L(s) − L(s∗) < (qs + 1)ε.    (B.31)
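The two inequalities in (B.31) are just the defining property of the floor function applied to qs:

```latex
q_s \le \frac{L(s) - L(s^*)}{\epsilon} < q_s + 1
\quad\Longrightarrow\quad
q_s\,\epsilon \le L(s) - L(s^*) < (q_s + 1)\,\epsilon .
```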
(B.31)
Finally, we shall define a convenient sequence {αk} that satisfies (B.11). To do so, let q = max_{s∈S} qs and, for any k, choose

αk = ⌊(1/2q) min{k, (max{d⌊k/2⌋, ak + bk})^{−1/2}}⌋ ≤ (1/2q) min{k, (max{d⌊k/2⌋, ak + bk})^{−1/2}}.

Since the sequences {dk} and {ak + bk} are monotone decreasing by their definitions, the sequence {αk} is monotone increasing and it is easy to verify that it satisfies (B.11).
The next step in the proof is to define a particular subsequence of {(ˆsi , Cˆi )} as follows. First, set
x = k − qαk and observe that
x = k − qαk ≥ k/2.
(B.32)
Then, define a sequence of indices {yi } with i = 0, · · · , qs through
y0 = x,
yi = yi−1 + αyi−1 , i = 1, 2, ..., qs .
For sufficiently large k such that αx ≥ 1, it is easy to verify by induction that
x = y0 < y1 < ... < yqs ≤ k.
(B.33)
Now, for any k and x defined above, consider a subsequence of {(ŝi, Ĉi), i = x, ..., k}, denoted by ψ = {(ŝyi, Ĉyi), i = 0, 1, ..., qs}, starting at ŝy0 = ŝx = s, and such that either there is an i, 0 ≤ i ≤ qs − 1, such that

L(ŝyj+1) − L(ŝyj) ≤ −ε for all j = 0, · · · , i − 1, and ŝyj = s∗ for all j = i, · · · , qs,

or

L(ŝyi+1) − L(ŝyi) ≤ −ε and ŝyi ≠ s∗ for all i = 0, · · · , qs − 1
In other words, any such subsequence is “embedded” into the original process {(ˆsi , Cˆi )} so as to give
strictly decreasing costs and if it reaches the optimum it stays there afterwards.
The subsequence defined above has the additional property that
ˆsyqs = s∗ .
(B.34)
This is obvious in the case where ŝyj = s∗ for some j = 0, 1, · · · , qs. On the other hand, if ŝyi ≠ s∗ for all i = 0, 1, ..., qs − 1, we must have

L(ŝyi) − L(ŝyi−1) ≤ −ε, i = 1, ..., qs.    (B.35)

Adding the qs inequalities above yields

L(ŝyqs) − L(ŝx) ≤ −qs ε

or, since ŝx = s,

L(ŝyqs) − L(s∗) ≤ L(s) − L(s∗) − qs ε.

This inequality, together with (B.31), implies that

L(ŝyqs) − L(s∗) ≤ ε.

Since ε satisfies (B.30), we must have L(ŝyqs) = L(s∗) for all paths satisfying (B.35), which in turn implies ŝyqs = s∗ since the optimum s∗ is assumed unique. Therefore, for every subsequence
ψ = {(ˆsyi , Cˆyi ), i = 0, 1, ..., qs } considered, (B.34) holds.
Before proceeding with the main part of the proof, let us also define, for notational convenience,
a set Ψ to contain all subsequences of the form ψ as specified above, or any part of any such
subsequence, i.e., any {(ˆsyn , Cˆyn ), · · · , (ˆsym , Cˆym )} with n ≤ m and n, m ∈ {0, 1, · · · , qs }.
Then, for any s ≠ s∗ and any C, all sample paths restricted to include some ψ ∈ Ψ form a subset of all sample paths that lead to a state such that ŝyqs = s∗, i.e.,

Pr[ŝyqs = s∗ | (ŝx, Ĉx) = (s, C)]
≥ ∑_{{(si,Ci), i=1,...,qs}∈Ψ} Pr[(ŝyi, Ĉyi) = (si, Ci), i = 1, ..., qs | (ŝx, Ĉx) = (s, C)]    (B.36)
Because {(ˆsk , Cˆk )} is a Markov process, setting (s0 , C0 ) = (s, C), the previous inequality can be
rewritten as
Pr[ŝyqs = s∗ | (ŝx, Ĉx) = (s, C)]
≥ ∑_{{(si,Ci), i=1,...,qs}∈Ψ} ∏_{i=1}^{qs} Pr[(ŝyi, Ĉyi) = (si, Ci) | (ŝyi−1, Ĉyi−1) = (si−1, Ci−1)]
In addition, let us decompose any subsequence ψ into its first (qs − 1) elements and the remaining
element (sqs , Cqs ). Thus, for any subsequence whose (qs − 1)th element is (ˆsyqs −1 , Cˆyqs −1 ), there is a
set of final states such that (sqs, Cqs) ∈ Ψ, so that we may write

∑_{{(si,Ci), i=1,...,qs}∈Ψ} ∏_{i=1}^{qs} Pr[(ŝyi, Ĉyi) = (si, Ci) | (ŝyi−1, Ĉyi−1) = (si−1, Ci−1)]
= ∑_{{(si,Ci), i=1,...,qs−1}∈Ψ} ( ∏_{i=1}^{qs−1} Pr[(ŝyi, Ĉyi) = (si, Ci) | (ŝyi−1, Ĉyi−1) = (si−1, Ci−1)]
  × ∑_{(sqs,Cqs)∈Ψ} Pr[(ŝyqs, Ĉyqs) = (sqs, Cqs) | (ŝyqs−1, Ĉyqs−1) = (sqs−1, Cqs−1)] )    (B.37)
Let us now consider two possible cases regarding the value of sqs −1 .
Case 1: If sqs −1 = s∗ , then, aggregating over all Cqs and recalling (B.34), we can write, for any Cqs −1
in some subsequence of Ψ,
∑_{(sqs,Cqs)∈Ψ} Pr[(ŝyqs, Ĉyqs) = (sqs, Cqs) | (ŝyqs−1, Ĉyqs−1) = (s∗, Cqs−1)]
= Pr[ŝyqs = s∗ | (ŝyqs−1, Ĉyqs−1) = (s∗, Cqs−1)]
Now let us consider a subsequence {(ˆsi , Cˆi )} with i = yqs −1 , · · · , yqs and ˆsyqs −1 = ˆsyqs = s∗ .
Observing that all subsequences {(ˆsi , Cˆi )} restricted to ˆsi = s∗ for all i = yqs −1 , · · · , yqs form
a subset of all the subsequences above, and exploiting once again the Markov property of the
process {(ˆsk , Cˆk )}, we can clearly write
Pr[ŝyqs = s∗ | (ŝyqs−1, Ĉyqs−1) = (s∗, Cqs−1)]
≥ ∑_{{C′i, i=yqs−1,...,yqs}} ∏_{i=yqs−1+1}^{yqs} Pr[(ŝi, Ĉi) = (s∗, C′i) | (ŝi−1, Ĉi−1) = (s∗, C′i−1)]

where C′yqs−1 = Cqs−1. Using the definition of dk(s, C) in (3.23) and noticing that, given ŝk = s∗,
L(ˆsk+1 ) ≤ L(ˆsk ) is equivalent to ˆsk+1 = s∗ when the optimum is unique, each term in the
product above can be replaced by [1 − di−1 (ˆsi−1 , Cˆi−1 )], i = yqs −1 + 1, · · · , yqs . In addition, from
Lemma 3.4.1, we have dk ≥ dk (s, C). Therefore,
Pr[ŝyqs = s∗ | (ŝyqs−1, Ĉyqs−1) = (s∗, Cqs−1)]
≥ ∏_{i=yqs−1+1}^{yqs} (1 − di) ≥ (1 − dyqs−1)^{yqs−yqs−1} ≥ (1 − dx)^{yqs−yqs−1}    (B.38)
where the last two inequalities follow from the fact that dk is monotone decreasing and the fact
that yi ≥ x.
Case 2: If sqs−1 ≠ s∗, then by the definition of any subsequence ψ ∈ Ψ, we must have a strict cost decrease, i.e., L(ŝyqs) − L(ŝyqs−1) ≤ −ε. Therefore, for any Cqs−1 in some subsequence of Ψ, we can now write

∑_{(sqs,Cqs)∈Ψ} Pr[(ŝyqs, Ĉyqs) = (sqs, Cqs) | (ŝyqs−1, Ĉyqs−1) = (sqs−1, Cqs−1)]
= Pr[L(ŝyqs) − L(ŝyqs−1) ≤ −ε | (ŝyqs−1, Ĉyqs−1) = (sqs−1, Cqs−1)]
= Pr[L(ŝyqs−1+αyqs−1) < L(ŝyqs−1) | (ŝyqs−1, Ĉyqs−1) = (sqs−1, Cqs−1)]    (B.39)

recalling the choice of ε in (B.30).
We can now make use of the definition of ek (s, C) in (3.34) and write
1 − eyqs −1 = Pr[L(ˆsyqs −1 +αyqs −1 ) < L(ˆsyqs −1 )|(ˆsyqs −1 , Cˆyqs −1 ) = (sqs −1 , Cqs −1 )]
Then, making use of the monotonicity of {ek } established in Lemma 3.4.3 and the fact that
yi ≥ x for all i = 1, · · · , qs , we get
∑_{(sqs,Cqs)∈Ψ} Pr[(ŝyqs, Ĉyqs) = (sqs, Cqs) | (ŝyqs−1, Ĉyqs−1) = (sqs−1, Cqs−1)]
≥ 1 − eyqs−1 ≥ 1 − ex    (B.40)
Therefore, combining both cases, i.e., inequalities (B.38) and (B.40), we obtain the inequality
∑_{(sqs,Cqs)∈Ψ} Pr[(ŝyqs, Ĉyqs) = (sqs, Cqs) | (ŝyqs−1, Ĉyqs−1) = (sqs−1, Cqs−1)]
≥ min{(1 − dx)^{yqs−yqs−1}, 1 − ex} ≥ (1 − dx)^{yqs−yqs−1}(1 − ex).

Returning to (B.36) and using the inequality above, we obtain

Pr[ŝyqs = s∗ | (ŝx, Ĉx) = (s, C)]
≥ (1 − dx)^{yqs−yqs−1}(1 − ex) × ∑_{{(si,Ci), i=1,...,qs−1}∈Ψ} ∏_{i=1}^{qs−1} Pr[(ŝyi, Ĉyi) = (si, Ci) | (ŝyi−1, Ĉyi−1) = (si−1, Ci−1)]
This procedure can now be repeated by decomposing a subsequence ψ with (qs − 1) elements into
its first (qs − 2) elements and the remaining element (sqs −1 , Cqs −1 ) and so on. Note that in this case
the value of the last state at each step of this procedure, sqs −i , i = 1, · · · , qs , is not necessarily s∗ .
However, if sqs −i−1 = sqs −i = s∗ , then Case 1 considered earlier applies; if sqs −i−1 6= s∗ , then Case 2
applies.
Thus, after qs such steps, we arrive at:

Pr[ŝyqs = s∗ | (ŝx, Ĉx) = (s, C)] ≥ (1 − dx)^{yqs−y0}(1 − ex)^{qs}.

Since x ≥ k/2 ≥ ⌊k/2⌋ according to (B.32) and since dk and ek are monotone decreasing according to (3.25) and (3.36) respectively, we have

dx ≤ d⌊k/2⌋ and ex ≤ e⌊k/2⌋.

Thus

Pr[ŝyqs = s∗ | (ŝx, Ĉx) = (s, C)] ≥ (1 − d⌊k/2⌋)^{yqs−y0}(1 − e⌊k/2⌋)^q.    (B.41)
On the other hand, noting that yqs ≤ k according to (B.33), consider {(ˆsk , Cˆk )} starting from
(ˆsx , Cˆx ). Then,
Pr[ŝk = s∗ | (ŝx, Ĉx) = (s, C)]
≥ ∑_{{Ci, i=yqs,...,k}} Pr[(ŝyqs, Ĉyqs) = (s∗, Cyqs), (ŝi, Ĉi) = (s∗, Ci), i = yqs + 1, ..., k | (ŝx, Ĉx) = (s, C)]
where we have used the fact that ˆsyqs = s∗ . Using, once again, the Markov property and the same
argument as in Case 1 earlier to introduce dk , we get
∑_{{Ci, i=yqs,...,k}} Pr[(ŝyqs, Ĉyqs) = (s∗, Cyqs), (ŝi, Ĉi) = (s∗, Ci), i = yqs + 1, ..., k | (ŝx, Ĉx) = (s, C)]
= ∑_{{Ci, i=yqs,...,k}} Pr[(ŝyqs, Ĉyqs) = (s∗, Cyqs) | (ŝx, Ĉx) = (s, C)] × ∏_{h=yqs}^{k−1} Pr[(ŝh+1, Ĉh+1) = (s∗, Ch+1) | (ŝh, Ĉh) = (s∗, Ch)]
≥ (1 − d⌊k/2⌋)^{yqs−y0}(1 − e⌊k/2⌋)^q ∏_{h=yqs}^{k−1} (1 − dh)
≥ (1 − e⌊k/2⌋)^q (1 − d⌊k/2⌋)^{k−x}
= (1 − e⌊k/2⌋)^q (1 − d⌊k/2⌋)^{qαk}.
Consequently

Pr[ŝk = s∗] = E[Pr[ŝk = s∗ | (ŝx, Ĉx) = (s, C)]]
≥ (1 − e⌊k/2⌋)^q (1 − d⌊k/2⌋)^{qαk}
= (1 − e⌊k/2⌋)^q [(1 − d⌊k/2⌋)^{αk}]^q → 1 as k → ∞    (B.42)
where the limit follows from (3.37) in Lemma 3.4.3 and the choice of αk satisfying (B.12). This
proves that {ˆsk } converges to s∗ in probability.
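The convergence mechanism just proved, descent driven by noisy ordinal comparisons whose noise vanishes as the observation length f(k) grows, can be illustrated with a toy simulation. Everything below (the cost model, the noise law, and the schedule) is an assumption made for illustration; it is not the dissertation's exact scheme:

```python
import random

# Toy illustration of Theorem 3.4.1: the exchange step is driven by noisy
# ordinal comparisons of the marginal costs, with estimation noise shrinking
# as the iteration count grows (playing the role of a growing observation
# length f(k)), and C is reset to the full user set whenever it empties out.
random.seed(0)
w, K, N = [3.0, 2.0, 1.0], 6, 3
dL = lambda i, n: w[i] * (2 * n - 2 * K - 1)   # true ∆L_i(n) for L_i = w_i (K − n_i)^2
est = lambda i, n, k: dL(i, n) + random.gauss(0.0, 10.0 / (k + 1))  # noisy estimate

n, C = [K, 0, 0], set(range(N))
for k in range(400):
    i_s = max((i for i in range(N) if n[i] >= 1), key=lambda i: est(i, n[i], k))
    j_s = min(C, key=lambda j: est(j, n[j], k))
    if i_s != j_s and est(i_s, n[i_s], k) - est(j_s, n[j_s] + 1, k) > 0:
        n[i_s] -= 1                            # transfer one resource, cf. (3.5)
        n[j_s] += 1
    else:
        C.discard(j_s)                         # shrink the candidate set, cf. (3.6)
    if len(C) <= 1:
        C = set(range(N))                      # reset, so the search keeps probing
```

Despite the noisy early comparisons, the allocation settles at the optimum of this instance with high probability, illustrating the convergence-in-probability statement.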
B.9 Proof of Theorem 3.4.2
First, let us derive two relations that will prove useful in the proof of the theorem. From the definitions of the quantities ak(s, C) and bk(s, C), given by (3.29) and (3.30) respectively, we get
h
i
ak (s, C) = Pr ˆi∗k ∈
6 Amax
| (ˆsk , Cˆk ) = (s, C)
k
"
≤ Pr
≤
≤
#
ˆ f (k) (ˆ
ˆ f (k) (ˆ
max
{∆L
nj,k )} ≥ max
{∆L
ni,k )} | (ˆsk , Cˆk ) = (s, C)
j
i
max
max
j6∈Ak
min Pr
i∈Amax
k
min
i∈Ak
"
ˆ f (k) (ˆ
max {∆L
nj,k )}
j
j6∈Amax
k

 X

i∈Amax
k
h
j6∈Amax
k
f (k)
ˆ
Pr ∆L
j
#
≥
ˆ f (k) (ˆ
∆L
ni,k )
i
| (ˆsk , Cˆk ) = (s, C)

i
ˆ f (k) (ˆ
(ˆ
nj,k ) ≥ ∆L
ni,k ) | (ˆsk , Cˆk ) = (s, C)
i

.
(B.43)
Note that ∆Lj (ˆ
nj,k ) < ∆Li (ˆ
ni,k ) for all j 6∈ Amax
and i ∈ Amax
k
k . Similarly, we get
\[
b_k(s,C) \le \min_{i \in A^{\min}_k} \Big\{ \sum_{j \notin A^{\min}_k} \Pr\big[\Delta\hat{L}^{f(k)}_j(\hat{n}_{j,k}) \le \Delta\hat{L}^{f(k)}_i(\hat{n}_{i,k}) \mid (\hat{s}_k, \hat{C}_k) = (s, C)\big] \Big\} \tag{B.44}
\]
and $\Delta L_j(\hat{n}_{j,k}) > \Delta L_i(\hat{n}_{i,k})$ for all $j \notin A^{\min}_k$ and $i \in A^{\min}_k$.
Next, for any $\alpha_k$, consider the event $[L(\hat{s}_{i+1}) \le L(\hat{s}_i),\ i = k, \ldots, k+\alpha_k-1]$ and observe that
\[
\begin{aligned}
&\Pr[L(\hat{s}_{i+1}) \le L(\hat{s}_i),\ i = k, \ldots, k+\alpha_k-1 \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad= \Pr[\exists h,\ k \le h < k+\alpha_k \text{ s.t. } L(\hat{s}_{h+1}) < L(\hat{s}_h), \text{ and } L(\hat{s}_{i+1}) \le L(\hat{s}_i),\ i = k, \ldots, k+\alpha_k-1,\ i \ne h \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\qquad+ \Pr[L(\hat{s}_{i+1}) = L(\hat{s}_i),\ i = k, \ldots, k+\alpha_k-1 \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad\le \Pr[L(\hat{s}_{k+\alpha_k}) < L(\hat{s}_k) \mid (\hat{s}_k, \hat{C}_k) = (s, C)] + \Pr[L(\hat{s}_{i+1}) = L(\hat{s}_i),\ i = k, \ldots, k+\alpha_k-1 \mid (\hat{s}_k, \hat{C}_k) = (s, C)]. \tag{B.45}
\end{aligned}
\]
In addition, it follows from Lemma B.7.1 (specifically equations (B.15), (B.20), (B.22) and (B.25)) that
\[
\begin{aligned}
&\Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ h = k, \ldots, k+\alpha_k-1 \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad\le p_0^{-1} (1-p_0)^{\alpha_k-(|C|+N)} + (\alpha_k-1)(a_k+b_k) + \sum_{M=k}^{k+\alpha_k-1} \Pr[\hat{\delta}_M(\hat{i}^*_M, \hat{j}^*_M) \le 0,\ \delta_M(\hat{i}^*_M, \hat{j}^*_M) > 0 \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad\le p_0^{-1} (1-p_0)^{\alpha_k-(|C|+N)} + (\alpha_k-1)(a_k+b_k) + \sum_{M=k}^{k+\alpha_k-1} \sum_{\{(i,j) \in \hat{C}_M,\ \delta_M(i,j) > 0\}} \Pr[\hat{\delta}_M(i,j) \le 0 \mid (\hat{s}_k, \hat{C}_k) = (s, C)]. \tag{B.46}
\end{aligned}
\]
We can now combine (B.45), (B.46) with (B.29) to establish the following inequality for any $(s, C)$:
\[
e_k(s,C) \le [1-(1-d_k)^{\alpha_k}] + p_0^{-1}(1-p_0)^{\alpha_k-(|C|+N)} + (\alpha_k-1)(a_k+b_k) + \sum_{M=k}^{k+\alpha_k-1} \sum_{\{(i,j) \in \hat{C}_M,\ \delta_M(i,j) > 0\}} \Pr[\hat{\delta}_M(i,j) \le 0 \mid (\hat{s}_k, \hat{C}_k) = (s, C)]. \tag{B.47}
\]
Now we are ready to proceed to the proof of the theorem. If $f(k) \ge k^{1+c}$ for some $c > 0$ and if the assumptions of Lemma 3.4.4 are satisfied, we know from Lemma 3.4.4 and the definitions in (3.32) and (B.10) that
\[
d_k = O\left(\frac{1}{f(k)}\right) = O\left(\frac{1}{k^{1+c}}\right).
\]
Furthermore, since the space of $(s, C)$ is finite, Lemma 3.4.4, the definition in (3.32), and inequalities (B.43) and (B.44) imply that
\[
a_k = O\left(\frac{1}{f(k)}\right) = O\left(\frac{1}{k^{1+c}}\right), \qquad b_k = O\left(\frac{1}{f(k)}\right) = O\left(\frac{1}{k^{1+c}}\right).
\]
Next, choose
\[
\alpha_k = \frac{1+c}{-\ln(1-p_0)} \ln(k), \quad k = 1, 2, \ldots
\]
and observe that $\{\alpha_k\}$ above satisfies (B.11) and that $(1-p_0)^{\alpha_k} = \frac{1}{k^{1+c}}$. Then, (B.47) gives
\[
e_k = O\big(1-(1-d_k)^{\alpha_k}\big) + O\big((1-p_0)^{\alpha_k}\big) + O\big((\alpha_k-1)(a_k+b_k)\big) + O\left(\frac{1}{f(k)}\right)
= O\left(\frac{\ln(k)}{k^{1+c}}\right) + O\big((1-p_0)^{\alpha_k}\big) = O\left(\frac{\ln(k)}{k^{1+c}}\right).
\]
Finally, from (B.42) we get
\[
\Pr[\hat{s}_k = s^*] = 1 - O\big(1-(1-e_{\lfloor k/2 \rfloor})^{q}(1-d_{\lfloor k/2 \rfloor})^{q\alpha_k}\big) = 1 - O\big(e_{\lfloor k/2 \rfloor} + \alpha_k d_{\lfloor k/2 \rfloor}\big) = 1 - O\left(\frac{\ln(k)}{k^{1+c}}\right).
\]
Since $\hat{s}_k$ can take only a finite set of values, the previous equation can be rewritten as
\[
\Pr[|\hat{s}_k - s^*| \ge \epsilon] = O\left(\frac{\ln(k)}{k^{1+c}}\right) \tag{B.48}
\]
for any sufficiently small $\epsilon > 0$. Since $\sum_k \frac{\ln(k)}{k^{1+c}} < \infty$, we know from the Borel-Cantelli Lemma ([69], pp. 255-256) that $\{\hat{s}_k\}$ converges almost surely to the optimum allocation $s^*$.
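The almost-sure convergence conclusion rests on the summability of the tail bound in (B.48). As an illustrative numerical sanity check (not part of the proof), the partial sums of $\ln(k)/k^{1+c}$ can be seen to stabilize; the value $c = 0.5$ used below is an arbitrary assumption.

```python
import math

def tail_bound_partial_sum(c: float, terms: int) -> float:
    """Partial sum of ln(k)/k^(1+c), the tail bound in (B.48), for k = 2..terms."""
    return sum(math.log(k) / k ** (1.0 + c) for k in range(2, terms + 1))

# With c = 0.5 the partial sums converge: adding ten times as many terms
# changes the sum only marginally, consistent with the Borel-Cantelli premise.
s_small = tail_bound_partial_sum(0.5, 10_000)
s_large = tail_bound_partial_sum(0.5, 100_000)
print(s_small, s_large, s_large - s_small)
```

The increment from the extra 90,000 terms is a small fraction of the total, which is what the convergence of the series requires.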
Appendix C
PROOFS FROM CHAPTER 4
C.1
Proof of Theorem 4.2.1
We use induction on $k = 0, 1, \ldots$ and establish the result for any number of kanban $k$ to be allocated over $N$ stages. First, define the following vectors: $x_k$ is the allocation reached at the $k$th step in (4.5); $x^*_k$ is the solution of (RA3) over $x \in A_k$; and finally $y_k$ is any allocation in $A_k$.

For $k = 0$, (4.6) gives $i^*_0 := \arg\max_{i=1,\ldots,N} \{\Delta J_i(x_0)\}$. Then, from the definition of $\Delta J_i(x)$ and condition (U), it follows that
\[
J(x_0 + e_{i^*_0}) > J(x_0 + e_{i_0}) \tag{C.1}
\]
for all $i_0 = 1, \ldots, N$, $i_0 \ne i^*_0$, which implies that $x^*_1 = x_0 + e_{i^*_0}$. Note that this is true because (4.6) is obtained from an exhaustive search over the entire space $A_1$, which includes only $N$ allocations, $x_0 + e_i$ for $i = 1, \ldots, N$. Since equation (4.5) gives $x_1 = x_0 + e_{i^*_0}$, it follows that $x_1 = x^*_1$, that is, $x_1$ is the solution of (RA3) over $A_1$.
Now suppose that for some $k \ge 1$ the vector $x_k$ obtained from (4.5)-(4.6) yields the optimal allocation, that is,
\[
J(x_k) = J(x^*_k) \ge J(y_k) \quad \text{for all } y_k \in A_k.
\]
From (4.6), again $i^*_k = \arg\max_{i=1,\ldots,N} \{\Delta J_i(x_k)\}$ (a unique index under (U)). It then follows from the definition of $\Delta J_i(x)$ that
\[
J(x_k + e_{i^*_k}) = J(x_k) + \Delta J_{i^*_k}(x_k) \ge J(x_k) + \Delta J_{i_k}(x_k) = J(x_k + e_{i_k}),
\]
for any $i_k = 1, \ldots, N$. Therefore,
\[
J(x_k + e_{i^*_k}) = \max_{i=1,\ldots,N} \{J(x_k + e_i)\} \ge \max_{i=1,\ldots,N} \{J(y_k + e_i)\}
\]
where the inequality is due to the smoothness condition (S). Hence, $x^*_{k+1} = x_k + e_{i^*_k}$. Finally, note that (4.5) also gives $x_{k+1} = x_k + e_{i^*_k}$, and therefore $x_{k+1} = x^*_{k+1}$, i.e., $x_{k+1}$ is the solution of (RA3) over $A_{k+1}$.
Conversely, suppose that the algorithm yields the optimal solution for any $K = 1, 2, \ldots$, but that conditions (S) and (U) are not satisfied for some $k < K$. This implies that there exists an allocation $x^*_k \in A_k$ such that $J(x^*_k) \ge J(y_k)$ for all $y_k \in A_k$ and $\max_{i=1,\ldots,N} \{J(x^*_k + e_i)\} < \max_{i=1,\ldots,N} \{J(y_k + e_i)\}$. This implies that the algorithm does not yield an optimal allocation over $A_{k+1}$, which is a contradiction.
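The incremental scheme (4.5)-(4.6) that Theorem 4.2.1 analyzes can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the objective `J` below is a hypothetical separable concave reward chosen so that the smoothness condition (S) holds, and the values of `N`, `K`, and the weights are arbitrary assumptions.

```python
# Sketch of the incremental allocation: start from the zero allocation and,
# at each step, give one more resource (kanban) to the stage with the largest
# marginal gain Delta J_i(x) = J(x + e_i) - J(x).
import math

def add_unit(x, i):
    y = list(x)
    y[i] += 1
    return y

def incremental_allocate(J, N, K):
    x = [0] * N
    for _ in range(K):
        # i* = argmax_i Delta J_i(x); under condition (U) this index is unique.
        i_star = max(range(N), key=lambda i: J(add_unit(x, i)) - J(x))
        x = add_unit(x, i_star)
    return x

# Hypothetical separable concave objective (diminishing marginal returns,
# so (S) holds); the weights are assumed values for illustration only.
weights = [3.0, 2.0, 1.0]
def J(x):
    return sum(w * math.log(1 + n) for w, n in zip(weights, x))

print(incremental_allocate(J, N=3, K=4))
```

Because each step performs an exhaustive search over only $N$ single-unit extensions, the cost per iteration is linear in $N$, which is what makes the scheme suitable for on-line use.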
C.2
Proof of Theorem 4.3.1
Let $y_k$, $k = 1, \ldots, K$ denote the allocations that the DIO process in (4.5) would visit if $J(x)$ were known exactly. Clearly, $y_K$ is the optimal allocation due to Theorem 4.2.1. We proceed by determining the probability that (4.11)-(4.13) will yield $y_K$ for some $l$:
\[
\Pr[\hat{x}_{K,l} = y_K] = \Pr[\hat{i}^*_{K-1,l} = i^*_{K-1} \mid \hat{x}_{K-1,l} = y_{K-1}] \Pr[\hat{x}_{K-1,l} = y_{K-1}]
\]
where $\hat{i}^*_{K-1,l}$ and $i^*_{K-1}$ are defined in (4.13) and (4.6) respectively. Further conditioning, we get
\[
\Pr[\hat{x}_{K,l} = y_K] = \Big\{ \prod_{k=1}^{K-1} \Pr[\hat{i}^*_{k,l} = i^*_k \mid \hat{x}_{k,l} = y_k] \Big\} \Pr[\hat{x}_{0,l} = y_0]. \tag{C.2}
\]
Next, take any term of the product:
\[
\begin{aligned}
\Pr\big[\hat{i}^*_{k,l} = i^*_k \mid \hat{x}_{k,l} = y_k\big]
&= \Pr\Big[\Delta\hat{J}^{f(l)}_{i^*_{k,l}}(x) > \max_{\substack{j=1,\ldots,N \\ j \ne i^*_{k,l}}} \{\Delta\hat{J}^{f(l)}_j(x)\} \mid \hat{x}_{k,l} = y_k\Big] \\
&= 1 - \Pr\Big[\Delta\hat{J}^{f(l)}_{i^*_{k,l}}(x) \le \max_{\substack{j=1,\ldots,N \\ j \ne i^*_{k,l}}} \{\Delta\hat{J}^{f(l)}_j(x)\} \mid \hat{x}_{k,l} = y_k\Big] \\
&= 1 - \Pr\Big[\bigcup_{\substack{j=1 \\ j \ne i^*_{k,l}}}^{N} \big\{\Delta\hat{J}^{f(l)}_{i^*_{k,l}}(x) \le \Delta\hat{J}^{f(l)}_j(x)\big\} \mid \hat{x}_{k,l} = y_k\Big] \\
&\ge 1 - \sum_{\substack{j=1 \\ j \ne i^*_{k,l}}}^{N} \Pr\Big[\Delta\hat{J}^{f(l)}_{i^*_{k,l}}(x) \le \Delta\hat{J}^{f(l)}_j(x) \mid \hat{x}_{k,l} = y_k\Big] \tag{C.3}
\end{aligned}
\]
Since $\lim_{l\to\infty} f(l) = \infty$, and since $\Delta J_{i^*_{k,l}}(x) > \Delta J_j(x)$, $j \ne i^*_{k,l}$, all terms in the summation go to 0 due to Lemma 2.2.1, and therefore all terms $\Pr[\hat{i}^*_{k,l} = i^*_k \mid \hat{x}_{k,l} = y_k]$ approach 1 as $l \to \infty$. Moreover, by (4.12) we have $\Pr[\hat{x}_{0,l} = y_0] = 1$, where $y_0 = [0, \ldots, 0]$. It follows that $\lim_{l\to\infty} \Pr[\hat{x}_{K,l} = y_K] = 1$ and the theorem is proved.
C.3
Proof of Theorem 4.3.2
From Lemma 3.4.4 we get that
\[
\Pr\Big[\Delta\hat{J}^{f(l)}_{i^*_{k,l}}(x) \le \Delta\hat{J}^{f(l)}_j(x) \mid \hat{x}_{k,l} = y_k\Big] = O\left(\frac{1}{f(l)}\right) = O\left(\frac{1}{l^{1+c}}\right).
\]
Also, from (C.3) we get that
\[
\Pr[\hat{x}_{k,l} \notin X^*_k \mid \hat{x}_{k,l} = y_k] = O\left(\frac{1}{l^{1+c}}\right) \tag{C.4}
\]
where $X^*_k$ is the set of all allocations that exhibit optimal performance from the set $A_k$. Clearly,
\[
\sum_{l=1}^{\infty} \frac{1}{l^{1+c}} < \infty. \tag{C.5}
\]
Hence, using the Borel-Cantelli Lemma (see pp. 255-256 of [69]) we conclude that $\hat{x}_{k,l}$ converges to the optimal allocation almost surely.
Appendix D
PROOFS FROM CHAPTER 5
D.1
Proof of Theorem 5.4.1
In order to prove this theorem, we first need to derive the perturbation of $(k, n)$, which is defined in (5.19). Given the definition of $D_k^n$ from equation (5.21), there are 9 distinct cases possible.

Case 1. Nominal Sample Path: $(k, n)$ starts a new busy period and is not blocked, i.e., $W_k^n \le 0$, $B_k^n \le 0$.
Perturbed Sample Path: $(k, n)$ starts a new busy period and is not blocked, i.e., $\widetilde{W}_k^n \le 0$ and $\widetilde{B}_k^n \le 0$.
Applying (5.22) for both nominal and perturbed sample paths, we get:
\[
\Delta D_k^n = D_k^{n-1} + Z_k^n - (\widetilde{D}_k^{n-1} + Z_k^n) = D_k^{n-1} - \widetilde{D}_k^{n-1} = \Delta D_k^{n-1} \tag{D.1}
\]
Case 2. Nominal Sample Path: $(k, n)$ starts a new busy period and is not blocked, i.e., $W_k^n \le 0$ and $B_k^n \le 0$.
Perturbed Sample Path: $(k, n)$ waits and is not blocked, i.e., $\widetilde{W}_k^n > 0$ and $\widetilde{B}_k^n \le 0$.
Applying (5.22) for the nominal sample path and (5.23) for the perturbed sample path, we get:
\[
\begin{aligned}
\Delta D_k^n &= D_k^{n-1} + Z_k^n - (\widetilde{D}_{k-1}^n + Z_k^n) \\
&= D_k^{n-1} + D_{k-1}^n - \widetilde{D}_{k-1}^n - D_{k-1}^n \\
&= \Delta D_{k-1}^n + D_k^{n-1} - D_{k-1}^n \\
&= \Delta D_{k-1}^n + I_k^n
\end{aligned}
\]
where (5.16) was used in the last step. Note that $I_k^n \ge 0$ since $W_k^n \le 0$ by assumption. Adding and subtracting $\Delta D_k^{n-1}$ and using (5.20) allows us to rewrite this equation in the following form (which will prove more convenient later on):
\[
\Delta D_k^n = \Delta D_k^{n-1} - \big[\Delta^{(k,n-1)}_{(k-1,n)} - I_k^n\big] \tag{D.2}
\]
Case 3. Nominal Sample Path: $(k, n)$ starts a new busy period and is not blocked, i.e., $W_k^n \le 0$ and $B_k^n \le 0$.
Perturbed Sample Path: $(k, n)$ is blocked, i.e., $\widetilde{B}_k^n > 0$.
Using (5.24) for the perturbed path and the definition of $\Delta D_k^n$,
\[
\begin{aligned}
\Delta D_k^n &= D_k^n - \widetilde{D}^{n+1}_{k-x_{n+1}-1[n+1]} \\
&= D_k^n + D^{n+1}_{k-x_{n+1}-1[n+1]} - \widetilde{D}^{n+1}_{k-x_{n+1}-1[n+1]} - D^{n+1}_{k-x_{n+1}-1[n+1]} \\
&= \Delta D^{n+1}_{k-x_{n+1}-1[n+1]} + D_k^n - D^{n+1}_{k-x_{n+1}-1[n+1]}
\end{aligned}
\]
Using (5.17), (5.18) and the fact that $D_k^n = C_k^n$ we get, if $1[n+1] = 0$,
\[
\Delta D_k^n = \Delta D^{n+1}_{k-x_{n+1}} - B_k^n
\]
and, if $1[n+1] = 1$,
\[
\begin{aligned}
\Delta D_k^n &= \Delta D^{n+1}_{k-x_{n+1}-1} + D_k^n - D^{n+1}_{k-x_{n+1}} + D^{n+1}_{k-x_{n+1}} - D^{n+1}_{k-x_{n+1}-1} \\
&= \Delta D^{n+1}_{k-x_{n+1}-1} - B_k^n + Q^{n+1}_{k-x_{n+1}}
\end{aligned}
\]
Again add and subtract $\Delta D_k^{n-1}$ to obtain
\[
\Delta D_k^n = \Delta D_k^{n-1} - \big[\Delta^{(k,n-1)}_{(k-x_{n+1}-1[n+1],n+1)} + B_k^n - Q^{n+1}_{k-x_{n+1}} \cdot 1[n+1]\big] \tag{D.3}
\]
For the other six remaining cases, expressions for $\Delta D_k^n$ can be derived in a similar way. We omit the details and provide only the final equations:

Case 4. Nominal Sample Path: $(k, n)$ waits and is not blocked, i.e., $W_k^n > 0$ and $B_k^n \le 0$.
Perturbed Sample Path: $(k, n)$ starts a new busy period and is not blocked, i.e., $\widetilde{W}_k^n \le 0$ and $\widetilde{B}_k^n \le 0$.
\[
\Delta D_k^n = \Delta D_{k-1}^n - \big[\Delta^{(k-1,n)}_{(k,n-1)} - W_k^n\big] \tag{D.4}
\]

Case 5. Nominal Sample Path: $(k, n)$ waits and is not blocked, i.e., $W_k^n > 0$ and $B_k^n \le 0$.
Perturbed Sample Path: $(k, n)$ waits and is not blocked, i.e., $\widetilde{W}_k^n > 0$ and $\widetilde{B}_k^n \le 0$.
\[
\Delta D_k^n = \Delta D_{k-1}^n \tag{D.5}
\]

Case 6. Nominal Sample Path: $(k, n)$ waits and is not blocked, i.e., $W_k^n > 0$ and $B_k^n \le 0$.
Perturbed Sample Path: $(k, n)$ is blocked, i.e., $\widetilde{B}_k^n > 0$.
\[
\Delta D_k^n = \Delta D_{k-1}^n - \big[\Delta^{(k-1,n)}_{(k-x_{n+1}-1[n+1],n+1)} + B_k^n - Q^{n+1}_{k-x_{n+1}} \cdot 1[n+1]\big] \tag{D.6}
\]
Case 7. Nominal Sample Path: $(k, n)$ is blocked, i.e., $B_k^n > 0$.
Perturbed Sample Path: $(k, n)$ starts a new busy period and is not blocked, i.e., $\widetilde{W}_k^n \le 0$ and $\widetilde{B}_k^n \le 0$.
\[
\Delta D_k^n = \Delta D^{n+1}_{k-x_{n+1}} - \big[\Delta^{(k-x_{n+1},n+1)}_{(k,n-1)} - B_k^n - [W_k^n]^+\big] \tag{D.7}
\]

Case 8. Nominal Sample Path: $(k, n)$ is blocked, i.e., $B_k^n > 0$.
Perturbed Sample Path: $(k, n)$ waits and is not blocked, i.e., $\widetilde{W}_k^n > 0$ and $\widetilde{B}_k^n \le 0$.
\[
\Delta D_k^n = \Delta D^{n+1}_{k-x_{n+1}} - \big[\Delta^{(k-x_{n+1},n+1)}_{(k-1,n)} - B_k^n - [I_k^n]^+\big] \tag{D.8}
\]

Case 9. Nominal Sample Path: $(k, n)$ is blocked, i.e., $B_k^n > 0$.
Perturbed Sample Path: $(k, n)$ is blocked, i.e., $\widetilde{B}_k^n > 0$.
\[
\Delta D_k^n = \Delta D^{n+1}_{k-x_{n+1}} - \big[\Delta^{(k-x_{n+1},n+1)}_{(k-x_{n+1}-1,n+1)} - Q^{n+1}_{k-x_{n+1}} \cdot 1[n+1]\big] \tag{D.9}
\]
Using these 9 cases we can prove the theorem as follows. First, we show that the last two terms in the max bracket of equation (5.25) can be expressed in terms of $\widetilde{W}_k^n$ and $\widetilde{B}_k^n$ alone:
\[
\begin{aligned}
\Delta^{(k,n-1)}_{(k-1,n)} - I_k^n &= \Delta D_k^{n-1} - \Delta D_{k-1}^n - I_k^n \\
&= D_k^{n-1} - \widetilde{D}_k^{n-1} - D_{k-1}^n + \widetilde{D}_{k-1}^n - D_k^{n-1} + D_{k-1}^n \\
&= \widetilde{D}_{k-1}^n - \widetilde{D}_k^{n-1} \\
&= \widetilde{W}_k^n \tag{D.10}
\end{aligned}
\]
\[
\begin{aligned}
\Delta^{(k,n-1)}_{(k-x_{n+1}-1[n+1],n+1)} + B_k^n - Q^{n+1}_{k-x_{n+1}} \cdot 1[n+1]
&= \Delta D_k^{n-1} - \Delta D^{n+1}_{k-x_{n+1}-1[n+1]} + B_k^n - Q^{n+1}_{k-x_{n+1}} \cdot 1[n+1] \\
&= D_k^{n-1} - \widetilde{D}_k^{n-1} - D^{n+1}_{k-x_{n+1}-1[n+1]} + \widetilde{D}^{n+1}_{k-x_{n+1}-1[n+1]} + D^{n+1}_{k-x_{n+1}} - D_k^n - \big(D^{n+1}_{k-x_{n+1}} - D^{n+1}_{k-x_{n+1}-1}\big) \cdot 1[n+1] \\
&= \widetilde{D}^{n+1}_{k-x_{n+1}-1[n+1]} - \widetilde{D}_k^{n-1} - Z_k^n \\
&= \widetilde{D}^{n+1}_{k-x_{n+1}-1[n+1]} - \big(\widetilde{C}_k^n - [\widetilde{W}_k^n]^+\big) \\
&= \widetilde{B}_k^n + [\widetilde{W}_k^n]^+ \tag{D.11}
\end{aligned}
\]
Therefore, equation (5.25) is equivalent to:
\[
\Delta D_k^n = \Delta D_k^{n-1} - \max\big\{0,\ \widetilde{W}_k^n,\ \widetilde{B}_k^n + [\widetilde{W}_k^n]^+\big\} \tag{D.12}
\]
We can then consider the following three possible cases:

1. If $\widetilde{W}_k^n \le 0$ and $\widetilde{B}_k^n \le 0$, then Case 1 examined earlier applies and equation (D.1) gives $\Delta D_k^n = \Delta D_k^{n-1}$, which is precisely (D.12) since $\max\{0, \widetilde{W}_k^n, \widetilde{B}_k^n + [\widetilde{W}_k^n]^+\} = 0$.

2. If $\widetilde{W}_k^n > 0$ and $\widetilde{B}_k^n \le 0$, then Case 2 applies and equation (D.2) holds, which is again (D.12) since $\max\{0, \widetilde{W}_k^n, \widetilde{B}_k^n + [\widetilde{W}_k^n]^+\} = \widetilde{W}_k^n = \Delta^{(k,n-1)}_{(k-1,n)} - I_k^n$.

3. If $\widetilde{B}_k^n > 0$, then Case 3 applies and (D.3) holds, which is the same as (D.12) since $\max\{0, \widetilde{W}_k^n, \widetilde{B}_k^n + [\widetilde{W}_k^n]^+\} = \widetilde{B}_k^n + [\widetilde{W}_k^n]^+ = \Delta^{(k,n-1)}_{(k-x_{n+1}-1[n+1],n+1)} + B_k^n - Q^{n+1}_{k-x_{n+1}} \cdot 1[n+1]$.
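The perturbation identities above are built on departure-time recursions of the form (5.22)-(5.24) for a serial line with finite buffers. As a rough illustration of that dynamic — a generic "manufacturing blocking" recursion, not the dissertation's exact equations, with buffer sizes and service times chosen arbitrarily — departure times can be computed as:

```python
# Departure-time recursion for a serial production line with finite
# intermediate buffers. D[k][n] is the time part k leaves stage n: the part
# must finish service, and the downstream buffer must have room, i.e. part
# k - buf[n] - 1 must already have left stage n+1 ("manufacturing blocking").
def departures(Z, buf):
    K, N = len(Z), len(Z[0])
    D = [[0.0] * N for _ in range(K)]
    for k in range(K):
        for n in range(N):
            prev_stage = D[k][n - 1] if n > 0 else 0.0   # arrival to stage n
            prev_part = D[k - 1][n] if k > 0 else 0.0    # server becomes free
            finish = max(prev_stage, prev_part) + Z[k][n]  # service completes
            if n < N - 1:
                j = k - buf[n] - 1                       # part whose exit frees space
                if j >= 0:
                    finish = max(finish, D[j][n + 1])    # blocked until there is room
            D[k][n] = finish
    return D

# Two stages, zero buffer between them; constant service times (assumed values).
Z = [[1.0, 3.0] for _ in range(4)]
D = departures(Z, buf=[0])
print([row[-1] for row in D])  # departures from the last stage
```

With these assumed numbers the slow second stage paces the line, and the fast first stage is repeatedly blocked — exactly the kind of coupling between $W_k^n$, $B_k^n$, and the busy-period structure that the nine cases above enumerate.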
Appendix E
PROOFS FROM CHAPTER 8
E.1
Proof of Theorem 8.4.1
In order to prove the theorem we identify the following four cases in (8.13).

Case I: $A_k(s_m) \le A_m \le L_{m-1}$, i.e., $m$ does not start a new busy period.

From (8.13), even when $\tau$ obtains its maximum value, $\tau = A_m - A_k(s_m)$, we have $\max\{0, A_k(s_m) + A_m - A_k(s_m) - L_{m-1}\} = 0$ since $L_{m-1} \ge A_m$. Therefore $\Delta L_m(\tau) = Z$. Substituting $\Delta L_m(\tau)$ in the cost function (8.11) we get
\[
C_k(s_m, \tau) = (c_g - c_a)\tau + c_g s_m + c_a (L_{m-1} - A_k(s_m)) + c_a \sum_{b=1}^{B} B_b \Big[Z - \sum_{i=1}^{b} I_i^B\Big]^+ = (c_g - c_a)\tau + H \tag{E.1}
\]
where $H$ is a constant independent of $\tau$. Since $c_a \ge c_g$, $C_k(s_m, \tau)$ obtains its minimum value when $\tau$ obtains its maximum value, that is,
\[
\tau^* = A_m - A_k(s_m) \tag{E.2}
\]
which is shown in Figure E.1. Note that in this case, from the definition of $T_1$ and $T_2$ and since $A_k(s_m) \le A_m \le L_{m-1}$, $\max\{T_1, T_2\} = T_1$ and $\min\{0, T_1\} = 0$. Therefore, (8.14) gives $\tau^* = T_2 = A_m - A_k(s_m)$, which is the result in (E.2).
Case II: $A_k(s_m) < L_{m-1} < A_m$ and $Z - I_m \le 0$ (in this case, $m$ starts a new busy period).

Under these conditions, the second case in (8.13) applies and gives
\[
\Delta L_m(\tau) = \begin{cases} 0, & \text{if } 0 \le \tau \le A_m - A_k(s_m) - Z \\ A_k(s_m) + \tau + Z - A_m, & \text{if } A_m - A_k(s_m) - Z < \tau \le A_m - A_k(s_m) \end{cases} \tag{E.3}
\]
To simplify the notation, we define $P = A_m - A_k(s_m) - Z$; therefore, (E.3) is rewritten as
\[
\Delta L_m(\tau) = \begin{cases} 0, & \text{if } 0 \le \tau \le P \\ \tau - P, & \text{if } P < \tau \le A_m - A_k(s_m) \end{cases} \tag{E.4}
\]
Figure E.1: $\tau^*$ for Case I
Substituting $\Delta L_m(\tau)$ in the cost function (8.11):
\[
C_k(s_m, \tau) = \begin{cases} c_g \tau + c_g s_m + c_a \max\{0, L_{m-1} - A_k(s_m) - \tau\}, & \text{if } 0 \le \tau \le P \\[4pt] c_g \tau + c_g s_m + c_a \max\{0, L_{m-1} - A_k(s_m) - \tau\} + c_a \displaystyle\sum_{b=1}^{B} B_b \Big[-P + \tau - \sum_{i=1}^{b} I_i^B\Big]^+, & \text{if } P \le \tau \le A_m - A_k(s_m) \end{cases} \tag{E.5}
\]
Since, by assumption, $L_{m-1} > A_k(s_m)$,
\[
\max\{0, L_{m-1} - A_k(s_m) - \tau\} = \begin{cases} L_{m-1} - A_k(s_m) - \tau, & \text{if } 0 \le \tau \le L_{m-1} - A_k(s_m) \\ 0, & \text{if } L_{m-1} - A_k(s_m) \le \tau \le A_m - A_k(s_m) \end{cases} \tag{E.6}
\]
Substituting in the cost function (E.5), we get:
\[
C_k(s_m, \tau) = \begin{cases} (c_g - c_a)\tau + c_g s_m + c_a (L_{m-1} - A_k(s_m)), & \text{if } 0 \le \tau \le L_{m-1} - A_k(s_m) \le P \\ c_g \tau + c_g s_m, & \text{if } L_{m-1} - A_k(s_m) \le \tau \le P \\ c_g \tau + c_g s_m + c_a \displaystyle\sum_{b=1}^{B} B_b \Big[-P + \tau - \sum_{i=1}^{b} I_i^B\Big]^+, & \text{if } P \le \tau \le A_m - A_k(s_m) \end{cases} \tag{E.7}
\]
Note that (E.6) breaks the first case of (E.5) into two subcases. Further, it simplifies the second case of (E.5) since for the range $L_{m-1} - A_k(s_m) \le P \le \tau \le A_m - A_k(s_m)$, $\max\{\cdot\} = 0$, leading to the third case of (E.7). Next we check each of the three possible cases of (E.7) to find the value of $\tau$ that minimizes the cost function. In the first case, the corresponding expression is minimized when $\tau$ obtains its maximum value, i.e., $\tau = L_{m-1} - A_k(s_m)$. In the second case, the expression is minimized when $\tau$ obtains its minimum value, which again is $\tau = L_{m-1} - A_k(s_m)$. Finally, the third expression is always greater than or equal to the second one, since the summation term is always non-negative. Therefore, the cost is minimized when
\[
\tau^* = L_{m-1} - A_k(s_m) \tag{E.8}
\]
as shown in Figure E.2. Note that in this case $A_k(s_m) \le L_{m-1} \le A_m$; therefore, $\max\{T_1, T_2\} = T_2$ and $\min\{0, T_1\} = 0$. Hence, (8.14) gives $\tau^* = T_1 = L_{m-1} - A_k(s_m)$, which is the result of (E.8).
Case III: $A_k(s_m) < L_{m-1} < A_m$ and $Z - I_m > 0$ (as in Case II, $m$ again starts a new busy period).

Since $Z - I_m > 0$, (8.13) reduces to
\[
\Delta L_m(\tau) = \max\{\tau - P,\ Z - I_m\} = \begin{cases} Z - I_m, & \text{if } 0 \le \tau \le L_{m-1} - A_k(s_m) \\ \tau - P, & \text{if } L_{m-1} - A_k(s_m) < \tau \le A_m - A_k(s_m) \end{cases} \tag{E.9}
\]
Figure E.2: $\tau^*$ for Case II
where, as before, $P = A_m - A_k(s_m) - Z$. Therefore, the additional cost due to $k$ becomes
\[
C_k(s_m, \tau) = \begin{cases} (c_g - c_a)\tau + c_g s_m + c_a (L_{m-1} - A_k(s_m)) + c_a \displaystyle\sum_{b=1}^{B} B_b \Big[Z - I_m - \sum_{i=2}^{b} I_i^B\Big]^+, & \text{if } 0 \le \tau \le L_{m-1} - A_k(s_m) \\[4pt] c_g \tau + c_g s_m + c_a \displaystyle\sum_{b=1}^{B} B_b \Big[-P + \tau - \sum_{i=2}^{b} I_i^B\Big]^+, & \text{if } L_{m-1} - A_k(s_m) \le \tau \le A_m - A_k(s_m) \end{cases} \tag{E.10}
\]
The first term is minimized when $\tau$ obtains its maximum value, i.e., $\tau = L_{m-1} - A_k(s_m)$. The second term is minimized when $\tau$ obtains its minimum value, i.e., $\tau = L_{m-1} - A_k(s_m)$. Therefore, the value of $\tau$ that minimizes the cost is given by
\[
\tau^* = L_{m-1} - A_k(s_m) \tag{E.11}
\]
which is the same as (E.8) and is illustrated in Figure E.3. Note that, as in Case II, $A_k(s_m) \le L_{m-1} \le A_m$ implies that $\max\{T_1, T_2\} = T_2$ and $\min\{0, T_1\} = 0$. Hence, (8.14) gives $\tau^* = T_1 = L_{m-1} - A_k(s_m)$, which is the result of (E.11).
Figure E.3: $\tau^*$ for Case III
Case IV: $L_{m-1} < A_k(s_m) < A_m$ (as in Cases II, III, $m$ again starts a new busy period).

In this case, with $P$ as defined earlier,
\[
\tau - P = \tau - A_m + A_k(s_m) + Z = Z - (A_m - L_{m-1}) + \tau + A_k(s_m) - L_{m-1} = Z - I_m + \tau + A_k(s_m) - L_{m-1} > Z - I_m
\]
since $A_k(s_m) > L_{m-1}$. Therefore, (8.13) reduces to
\[
\Delta L_m(\tau) = \max\{0,\ \tau - P\} = \begin{cases} 0, & \text{if } 0 \le \tau \le P \\ \tau - P, & \text{if } P < \tau \le A_m - A_k(s_m) \end{cases} \tag{E.12}
\]
and based on the sign of $P = A_m - A_k(s_m) - Z$ we identify two subcases, which are also presented in Figure E.4:

(a) $P \ge 0$. The cost function is then given by
\[
C_k(s_m, \tau) = \begin{cases} c_g \tau + c_g s_m, & \text{if } 0 \le \tau \le P \\ c_g \tau + c_g s_m + c_a \displaystyle\sum_{b=1}^{B} B_b \Big[-P + \tau - \sum_{i=1}^{b} I_i^B\Big]^+, & \text{if } P \le \tau \le A_m - A_k(s_m) \end{cases} \tag{E.13}
\]
Clearly, the minimum value is obtained when $\tau^* = 0$.

(b) $P < 0$. Since $\tau \ge 0$, only $\Delta L_m(\tau) = \tau - P$ is possible in (E.12), for all $0 \le \tau \le A_m - A_k(s_m)$. Therefore, the cost function becomes
\[
C_k(s_m, \tau) = c_g \tau + c_g s_m + c_a \sum_{b=1}^{B} B_b \Big[-P + \tau - \sum_{i=1}^{b} I_i^B\Big]^+ \tag{E.14}
\]
which obtains its minimum value when $\tau^* = 0$.

Therefore, under the conditions of Case IV, the value of $\tau$ that minimizes the cost is given by
\[
\tau^* = 0. \tag{E.15}
\]
In this case, $L_{m-1} \le A_k(s_m) \le A_m$, hence $\max\{T_1, T_2\} = T_2$ and $\min\{0, T_1\} = T_1$. Therefore, (8.14) gives $\tau^* = 0$, which is the result in (E.15).
Figure E.4: Case IV subcases: (a) $P > 0$, (b) $P < 0$
Note that no other case is possible since, by assumption, $A_k(s_m) \le A_m$, and the proof is complete.
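Collecting the four cases, the optimal offset $\tau^*$ reduces to a comparison of $A_k(s_m)$ with $L_{m-1}$ and $A_m$. The sketch below is only an illustrative summary of that case analysis (the symbols are the proof's, but the single clamp expression checked at the end is my own restatement, consistent with Cases I-IV):

```python
def optimal_tau(A_k: float, L_prev: float, A_m: float) -> float:
    """tau* of Theorem 8.4.1: A_k = A_k(s_m), L_prev = L_{m-1}, A_m = A_m."""
    if A_k <= A_m <= L_prev:      # Case I: m does not start a new busy period
        return A_m - A_k
    if A_k < L_prev < A_m:        # Cases II and III
        return L_prev - A_k
    return 0.0                    # Case IV: L_prev <= A_k(s_m)

# All branches agree with the single clamp max(0, min(L_prev - A_k, A_m - A_k)).
for a, L, m in [(1.0, 5.0, 3.0), (1.0, 2.0, 4.0), (3.0, 2.0, 4.0)]:
    assert optimal_tau(a, L, m) == max(0.0, min(L - a, m - a))
print("cases consistent")
```

The clamp form makes the intuition visible: delay the arrival as far as possible (up to $A_m$), but never past the point $L_{m-1}$ where ground holding would start creating airborne delay, and not at all if the airplane already lands in an idle period.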
E.2
Proof of Lemma 8.4.1
In the nominal path (before airplane $k$ is considered), airplane $a$ is expected to experience an airborne waiting time $W_a = L_{a-1} - A_a$. In the perturbed path (when $k$ is considered) under a zero GHD we get $W_k = L_{a-1} - A_k(0)$ and $W_a = L_{a-1} - A_a + Z$, since the presence of $k$ prior to $a$ imposes an additional airborne delay $Z$ on airplane $a$. Clearly, this additional airborne delay propagates to all $a+1, \ldots, l$. Therefore,
\[
C_k(0, 0) = c_a [L_{a-1} - A_k(0) + (l - a + 1)Z] + H \tag{E.16}
\]
where $H \ge 0$ accounts for the delay that $k$ will induce on airplanes scheduled to arrive after $l$. Note that for $A_k(0) + d_k \ge L_l(d_l)$ the statement of the lemma holds trivially, while for $A_k(0) + d_k < L_l(d_l)$, $H$ is a constant independent of the GHD $d_k$, since in this case the departure of $l$ in the perturbed path is fixed at $L_l(d_l) + Z$. Moreover, note that $H = 0$ if $l$ is the last scheduled airplane, or if the idle period that precedes $l+1$ is such that $I_{l+1} \ge Z$.
Now, let us invoke the L-FPA algorithm. In the first iteration, $m^{(1)} = a$ and $s_m^{(1)} = s_a = \max\{0, A_{a-1} - A_k(0)\} = 0$. By assumption, $A_k(s_m^{(1)}) = A_k(0) \le A_a \le L_{a-1}$, which corresponds to Case I of Theorem 8.4.1; therefore, $\tau^{*(1)} = A_a - A_k(0)$. Then, the new additional cost is
\[
\begin{aligned}
C_k\big(0, \tau^{*(1)}\big) &= c_g \tau^{*(1)} + c_a \max\big\{0,\ L_{a-1} - A_k\big(s_m^{(1)} + \tau^{*(1)}\big)\big\} + c_a (l - a + 1)Z + H \\
&= c_g [A_a - A_k(0)] + c_a \max\{0,\ L_{a-1} - A_k(0) - A_a + A_k(0)\} + c_a (l - a + 1)Z + H \\
&= c_g [A_a - A_k(0)] + c_a [L_{a-1} - A_a] + c_a (l - a + 1)Z + H \\
&= C_k(0, 0) - (c_a - c_g)(A_a - A_k(0)) \\
&\le C_k(0, 0)
\end{aligned}
\]
where the max evaluates to $L_{a-1} - A_a$ since $L_{a-1} - A_a \ge 0$, and the last inequality is due to $c_a > c_g$. Therefore, L-FPA will assign a GHD $d_k = \tau^{*(1)}$ (L-FPA step 4).
In the next iteration, $m^{(2)} = a+1$ and $s_m^{(2)} = s_{a+1} = A_a - A_k(0)$. In this case, $A_k(s_m^{(2)}) = A_k(0) + s_m^{(2)} = A_a \le A_{a+1} \le L_a$ since $a+1$ also belongs in the same busy period. Therefore, again Case I of Theorem 8.4.1 holds and $\tau^{*(2)} = A_{a+1} - A_k(s_m^{(2)}) = A_{a+1} - A_a$. Hence, the new additional cost is
\[
\begin{aligned}
C_k\big(s_m^{(2)}, \tau^{*(2)}\big) &= c_g \big(s_m^{(2)} + \tau^{*(2)}\big) + c_a \max\big\{0,\ L_a - A_k\big(s_m^{(2)} + \tau^{*(2)}\big)\big\} + c_a (l - a)Z + H \\
&= c_g (A_a - A_k(0) + A_{a+1} - A_a) + c_a \max\{0,\ L_a - A_{a+1}\} + c_a (l - a)Z + H \\
&= c_g (A_{a+1} - A_k(0)) + c_a (L_{a-1} + Z - A_{a+1}) + c_a (l - a)Z + H \\
&= c_g (A_{a+1} - A_a) + c_g (A_a - A_k(0)) + c_a (L_{a-1} - A_a) - c_a (A_{a+1} - A_a) + c_a (l - a + 1)Z + H \\
&= C_k\big(0, \tau^{*(1)}\big) - (c_a - c_g)(A_{a+1} - A_a) \\
&\le C_k\big(0, \tau^{*(1)}\big). \tag{E.17}
\end{aligned}
\]
Therefore, L-FPA will again increase the ground-holding delay to $d_k = s_m^{(2)} + \tau^{*(2)}$ (L-FPA step 4). Equation (E.17) indicates that increasing the GHD such that $A_k(d_k)$ is delayed from interval $[A_k(0), A_a)$ to $[A_a, A_{a+1})$ reduces the additional cost by an amount proportional to the length of this interval, i.e., by $(c_a - c_g)(A_{a+1} - A_a)$.
By proceeding in exactly the same way, in every iteration $j$, $1 < j \le l-a+1$, the additional cost is reduced by $(c_a - c_g)(A_{a+j-1} - A_{a+j-2})$, and therefore the ground-holding delay is
\[
d_k = s_m^{(l-a+1)} + \tau^{*(l-a+1)} = A_{l-1} - A_k(0) + A_l - A_{l-1} = A_l - A_k(0).
\]
This in turn implies that
\[
A_k(d_k) = A_k(0) + d_k = A_k(0) + A_l - A_k(0) = A_l,
\]
that is, the earliest that $k$ will arrive is exactly at the same time as the arrival of $l$. In this case, the additional cost is given by
\[
\begin{aligned}
C_k\big(s_m^{(l-a+1)}, \tau^{*(l-a+1)}\big) &= C_k\big(s_m^{(l-a)}, \tau^{*(l-a)}\big) - (c_a - c_g)(A_l - A_{l-1}) \\
&= C_k\big(s_m^{(l-a-1)}, \tau^{*(l-a-1)}\big) - (c_a - c_g)(A_l - A_{l-2}) \\
&= \cdots \\
&= C_k(0, 0) - (c_a - c_g)[A_l - A_k(0)] \\
&= c_a [L_{a-1} + (l - a + 1)Z] - c_a A_k(0) - c_a A_l + c_a A_k(0) + c_g [A_l - A_k(0)] + H \\
&= c_a [L_l - A_l] + c_g [A_l - A_k(0)] + H \\
&= c_g [L_l - A_k(0)] + (c_a - c_g)[L_l - A_l] + H \tag{E.18}
\end{aligned}
\]
where we have used (E.16). Finally, in the next iteration, $m^{(l-a+2)} - 1 = l$, $m^{(l-a+2)} = l+1$, and $s_m^{(l-a+2)} = A_l - A_k(0)$. Since $l+1$ starts a new busy period, $A_{l+1}(s_{l+1}) = A_{l+1}(0) > L_l(d_l)$. In this case, $A_k(s_m^{(l-a+2)}) < L_{m^{(l-a+2)}-1} < A_{m^{(l-a+2)}} = A_{l+1}(0)$; hence, either Case II or III of Theorem 8.4.1 holds, thus $\tau^{*(l-a+2)} = L_{m^{(l-a+2)}-1} - A_k(s_m^{(l-a+2)}) = L_l - A_k(0) - s_m^{(l-a+2)}$. In this case, the additional cost consists only of the GHD assigned to $k$, that is,
\[
\begin{aligned}
C_k\big(s_m^{(l-a+2)}, \tau^{*(l-a+2)}\big) &= c_g\big[s_m^{(l-a+2)} + \tau^{*(l-a+2)}\big] + H \\
&= c_g\big[s_m^{(l-a+2)} + L_l - A_k(0) - s_m^{(l-a+2)}\big] + H \\
&= c_g [L_l - A_k(0)] + H \\
&\le C_k\big(s_m^{(l-a+1)}, \tau^{*(l-a+1)}\big). \tag{E.19}
\end{aligned}
\]
Hence, L-FPA will assign a new GHD so that
\[
d_k = s_m^{(l-a+2)} + \tau^{*(l-a+2)} = L_l - A_k(0).
\]
If $H$ is large enough, it is possible that following iterations may further increase the ground-holding delay, so $d_k \ge L_l - A_k(0)$, which is the statement of the lemma.
E.3
Proof of Lemma 8.4.2
The proof is by induction over the added airplanes $k = 1, 2, \ldots$. First, we need to show that $A_2(d_2) \ge A_1(d_1)$. To show this, consider the scheduling done by L-FPA for airplane $k = 1$. In this case, since L-FPA starts with an empty list, airplanes $m-1$ and $m$ do not exist; therefore, $L_{m-1} = 0$ and $A_m = \infty$. Also, $s_m$ can only take a single value, $s_m = \max\{0, A_{m-1} - A_k(0)\} = 0$, hence $s^1 = 0$. Thus, since $L_{m-1} < A_k(0) = A_1(0) < A_m$, Case IV of Theorem 8.4.1 holds and as a result $\tau^* = \tau^1 = 0$. Hence, $d_1 = s^1 + \tau^1 = 0$; therefore, $A_1(d_1) = A_1(0) \le A_2(0) \le A_2(d_2)$ since $d_2 \ge 0$.

Next, suppose that order is preserved up to $k$ and check whether $A_{k+1}(d_{k+1}) \ge A_k(d_k)$. Since order is preserved up to $k$, $k$ is the last arrival in the last scheduled busy period since $A_k(0) \ge A_j(0)$ for all $j = 1, \ldots, k-1$. We can now identify two cases:

Case 1: $k$ starts the last busy period. In this case, $A_k(0) > L_{k-1}(d_{k-1})$; hence, Case IV of Theorem 8.4.1 holds while $s^k$ can only take the value 0, and thus $\tau^* = \tau^k = 0$. Thus, $d_k = s^k + \tau^k = 0$. Then $A_{k+1}(d_{k+1}) \ge A_{k+1}(0) \ge A_k(0) = A_k(d_k)$; therefore, order is preserved.

Case 2: $k$ does not start the last busy period. In this case, if $A_{k+1}(0) > A_k(d_k)$, order is preserved trivially since $d_{k+1} \ge 0$. On the other hand, if $A_{k+1}(0) < A_k(d_k)$, then by Lemma 8.4.1, $k+1$ will be assigned a ground delay $d_{k+1}$ such that $A_{k+1}(d_{k+1}) \ge L_k(d_k) > A_k(d_k)$; therefore, order will be preserved and the proof is complete.
E.4
Proof of Theorem 8.4.2
First, note that the total cost of the system with $k$ flights under the L-FPA control policy is given by
\[
C_T(k) = \sum_{j=1}^{k} C_j(s^j, \tau^j) \tag{E.20}
\]
where $C_j(s^j, \tau^j)$ is given by (8.11), and $s^j$, $\tau^j$ are the solutions to (8.9).

Next, we proceed to a proof by induction over $k = 1, 2, \ldots$. First, when $k = 1$, since we start with an empty list, airplanes $m-1$ and $m$ do not exist; therefore, $L_{m-1} = 0$ and $A_m = \infty$. Also, $s_m$ can only take a single value, $s_m = 0$, hence $s^1 = 0$. Then, since $L_{m-1} < A_k(0) = A_1(0) < A_m$, Case IV of Theorem 8.4.1 holds and as a result $\tau^* = \tau^1 = 0$. Hence, $C_1(s^1, \tau^1) = C_1(0, 0) = 0$. Then, the cost in (E.20) is $C_T(1) = 0$, which is minimum since the cost is non-negative.

Now suppose that for airplane $k$, L-FPA yields a GHD $d_k = s^k + \tau^k$ that minimizes the delay cost, that is, $C_T(k) = C_T^*(k)$. Note that since the delay cost is minimized, the airborne delay of the $j$th airplane must be zero, and therefore $L_j(s^j + \tau^j) = A_j(s^j + \tau^j) + Z$ for all $j = 1, \ldots, k$. Next, consider what happens to airplane $k+1$. From Corollary 8.4.1 we can identify two cases:

Case 1: If $d_{k+1} = 0$, then $C_{k+1}(0, 0) = 0$. As a result,
\[
C_T(k+1) = \sum_{j=1}^{k+1} C_j(s^j, \tau^j) = C_T^*(k) + C_{k+1}(0, 0) = C_T^*(k)
\]
which implies that $C_T(k+1)$ is minimized since the cost is a non-decreasing function of $k$.

Case 2: If $d_{k+1} = L_k(d_k) - A_{k+1}(0)$, observe the following: if $k+1$ were assigned a zero GHD, then its airborne waiting time would be $L_k(d_k) - A_{k+1}(0) = d_{k+1}$. Hence, under L-FPA, $k+1$ has traded off all of its airborne waiting time for exactly the same amount of ground-holding time, i.e., it satisfies (8.15). Further, note that $k+1$ is added at the end of the list due to Lemma 8.4.2; therefore, $k+1$ induces no delays on any airplane $j = 1, \ldots, k$. Hence, the cost remains minimum and the proof is complete.
Bibliography
[1] E. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines, John Wiley & Sons,
1989.
[2] B. T. Allen, Managerial Economics, Harper Collins, 1994.
[3] S. Andradóttir, A global search method for discrete stochastic optimization, SIAM Journal on Optimization, 6 (1996), pp. 513–530.
[4] G. Andreatta and L. Brunetta, Multi-airport ground holding problem: A computational
evaluation of exact algorithms, Operations Research, 46 (1998), pp. 57–64.
[5] G. Andreatta and G. Romanin-Jacur, Aircraft flow management under congestion, Transportation Science, 21 (1987), pp. 249–253.
[6] M. Asawa and D. Teneketzis, Multi-armed bandits with switching penalties, IEEE Transactions on Automatic Control, 41 (1996), pp. 328–348.
[7] A. Ashburn, Toyota's famous Ohno system, American Machinist, in Applying Just in Time: The American/Japanese Experience, Y. Monden, ed., IIE Press, 1986.
[8] D. Bertsimas and S. S. Patterson, The air traffic flow management problem with enroute
capacities, Operations Research, 46 (1998), pp. 406–422.
[9] S. Brooks, A discussion of random methods for seeking maxima, Operations Research, 6 (1958).
[10] C. Cassandras and S. Strickland, Observable augmented systems for sensitivity analysis
of Markov and semi-Markov processes, IEEE Transactions on Automatic Control, 34 (1989),
pp. 1026–1037.
[11] C. Cassandras and S. Strickland, On-line sensitivity analysis of Markov chains, IEEE Transactions on Automatic Control, 34 (1989), pp. 76–86.
[12] C. G. Cassandras, Discrete Event Systems, Modeling and Performance Analysis, IRWIN,
1993.
[13] C. G. Cassandras, L. Dai, and C. G. Panayiotou, Ordinal optimization for a class of
deterministic and stochastic discrete resource allocation problems, IEEE Transactions on Automatic Control, 43 (1998), pp. 881–900.
[14] C. G. Cassandras and V. Julka, Descent algorithms for discrete resource allocation problems, in Proceedings of the 33rd Conference on Decision and Control, Dec 1994, pp. 2639–2644.
[15] C. G. Cassandras and V. Julka, Scheduling policies using marked/phantom slot algorithms, Queueing Systems: Theory and Applications, 20 (1995), pp. 207–254.
[16] C. G. Cassandras and C. G. Panayiotou, Concurrent sample path analysis of discrete
event systems, Accepted in Journal of Discrete Event Dynamic Systems, (1999).
[17] C. G. Cassandras and W. Shi, Perturbation analysis of multiclass multiobjective queueing
systems with ‘quality-of-service’ guarantees, in Proceedings of the 35th Conference on Decision
and Control, Dec 1996, pp. 3322–3327.
[18] C. Chen and Y. Ho, An approximation approach of the standard clock method for general
discrete event simulation, IEEE Transactions on Control Applications, 3 (1995), pp. 309–317.
[19] L. Cimini, G. Foschini, C.-L. I, and Z. Miljanic, Call blocking performance of distributed
algorithms for dynamic channel allocation in microcells, IEEE Transactions on Communications,
42 (1994), pp. 2600–7.
[20] L. Cimini, G. Foschini, and L. Shepp, Single-channel user-capacity calculations for selforganizing cellular systems, IEEE Transactions on Communications, 42 (1994), pp. 3137–3143.
[21] D. C. Cox and D. O. Reudink, Increasing channel occupancy in large-scale mobile radio systems: Dynamic channel reassignment, IEEE Transactions on Vehicular Technology, 22 (1973).
[22] L. Dai, Convergence properties of ordinal comparison in the simulation of discrete event dynamic systems, Journal of Optimization Theory and Applications, 91 (1996), pp. 363–388.
[23] L. Dai, C. G. Cassandras, and C. G. Panayiotou, On the convergence rate of ordinal
optimization for stochastic discrete resource allocation problems, To appear in IEEE Transactions
on Automatic Control, 44 (1999).
[24] M. Di Mascolo, Y. Frein, Y. Dallery, and R. David, A unified modeling of kanban
systems using petri nets, Intl. Journal of Flexible Manufacturing Systems, 3 (1991), pp. 275–
307.
[25] B. Eklundh, Channel utilization and blocking probability in a cellular mobile telephone system
with directed retry, IEEE Transactions on Communications, 34 (1986), pp. 329–337.
[26] D. Everitt, Traffic capacity of cellular mobile communication systems, Computer Networks
ISDN Systems, 20 (1990), pp. 447–54.
[27] R. Galager, A minimum delay routing algorithm using distributed computation, IEEE Transactions on Communications, 25 (1977), pp. 73–85.
[28] J. Gittins, Multi-Armed Bandit Allocation Indices, Wiley, New York, 1989.
[29] J. Gittins and D. Jones, A dynamic allocation index for the sequential design of experiments,
in Progress in Statistics, European Meeting of Statisticians, J. Gani, K. Sarkadi, and I. Vincze, eds.,
Amsterdam: North Holland, 1974, pp. 241–266.
[30] P. Glasserman, Gradient Estimation via Perturbation Analysis, Kluwer, Boston, 1991.
[31] W.-B. Gong, Y. Ho, and W. Zhai, Stochastic comparison algorithm for discrete optimization
with estimation, in Proceedings of 31st IEEE Conference on Decision and Control, Dec 1992,
pp. 795–802.
[32] W.-B. Gong, Y. Ho, and W. Zhai, Stochastic comparison algorithm for discrete optimization with estimation, Journal of Discrete Event Dynamic Systems: Theory and Applications, (1995).
[33] S. Grandhi, R. Vijayan, D. Goodman, and J. Zander, Centralized power control in cellular radio systems, IEEE Transactions on Vehicular Technology, 42 (1993), pp. 466–468.
[34] Y. Gupta and M. Gupta, A system dynamics model for a multistage multiline dual-card
JIT-kanban system, Intl. Journal of Production Research, 27 (1989), pp. 309–352.
[35] Y. Ho and X. Cao, Perturbation Analysis of Discrete Event Systems, Kluwer, Boston, 1991.
[36] Y. Ho, M. Eyler, and T. Chien, A gradient technique for general buffer storage design in
production line, Intl. Journal of Production Research, 17 (1979), pp. 557–580.
[37] Y. Ho, R. Sreenivas, and P. Vakili, Ordinal optimization in DEDS, Journal of Discrete
Event Dynamic Systems: Theory and Applications, 2 (1992), pp. 61–88.
[38] Y. C. Ho, Heuristics, rule of thumb, and the 80/20 proposition, IEEE Transactions on Automatic Control, 39 (1994), pp. 1025–1027.
[39] J. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, 1975.
[40] P. Huang, L. Rees, and B. W. Taylor, A simulation analysis of the Japanese Just-In-Time technique (with kanbans) for a multiline multistage production system, Decision Sciences, 14 (1983), pp. 326–344.
[41] T. Ibaraki and N. Katoh, Resource Allocation Problems, Algorithmic Approaches, MIT
Press, 1988.
[42] T. J. Kahwa and N. D. Georganas, A hybrid channel assignment scheme in large-scale
cellular-structured mobile communication systems, IEEE Transactions on Communications, 26
(1978).
[43] V. Kalashnikov, Topics on Regenerative Processes, CRC Press, Boca Raton, Florida, 1994.
[44] J. Karlsson and B. Eklundh, A cellular mobile telephone system with load sharing - an enhancement of directed retry, IEEE Transactions on Communications, 37 (1989), pp. 530–535.
[45] I. Katzela and M. Naghshineh, Channel assignment schemes for cellular mobile telecommunication systems: A comprehensive survey, IEEE Personal Communications, 3 (1996), pp. 10–31.
[46] J. Kiefer and J. Wolfowitz, Stochastic estimation of the maximum of a regression function,
Annals of Mathematical Statistics, 23 (1952), pp. 462–466.
[47] O. Kimura and H. Terada, Design and analysis of pull system, a method of multi-stage
production control, Intl. Journal of Production Research, 19 (1981), pp. 241–253.
[48] L. Kleinrock, Queueing Systems. Volume I: Theory, Wiley, 1975.
[49] X. Lagrange and B. Jabbari, Fairness in wireless microcellular networks, IEEE Transactions on Vehicular Technology, 47 (1998), p. 472.
[50] M. Lulu and J. Black, Effect of process unreliability on integrated manufacturing/production
systems, Journal of Manufacturing Systems, 6 (1987), pp. 15–22.
[51] V. H. MacDonald, The cellular concept, Bell System Technical Journal, 58 (1978), pp. 15–41.
[52] D. Mitra and I. Mitrani, Analysis of a novel discipline for cell coordination in production
lines, tech. report, AT&T Laboratories, 1988.
[53] V. I. Norkin, Y. M. Ermoliev, and A. Ruszczyński, On optimal allocation of indivisibles under uncertainty, Operations Research, 46 (1998), pp. 381–395.
[54] C. G. Panayiotou and C. G. Cassandras, Dynamic resource allocation in discrete event
systems, in Proceedings of IEEE Mediterranean Conference on Control and Systems, Jul 1997.
[55] C. G. Panayiotou and C. G. Cassandras, Dynamic transmission scheduling for packet radio networks, in Proceedings of IEEE Symposium on Computers and Communications, Jun 1998, pp. 69–73.
[56] C. G. Panayiotou and C. G. Cassandras, Flow control for a class of transportation systems, in Proceedings of IEEE Intl. Conference on Control Applications, Sep 1998, pp. 771–775.
[57] C. G. Panayiotou and C. G. Cassandras, Optimization of kanban-based manufacturing systems, Accepted for publication in Automatica, (1999).
[58] C. G. Panayiotou and C. G. Cassandras, A sample path approach for solving the ground-holding policy problem in air traffic control, Submitted to IEEE Transactions on Control Systems Technology, (1999).
[59] R. Parker and R. Rardin, Discrete Optimization, Academic Press, Inc, Boston, 1988.
[60] P. Philipoom, L. Rees, B. Taylor, and P. Huang, An investigation of the factors influencing the number of kanbans required in the implementation of the JIT technique with kanbans,
Intl. Journal of Production Research, 25 (1987), pp. 457–472.
[61] P. A. Raymond, Performance analysis of cellular networks, IEEE Transactions on Communications, 39 (1991), pp. 1787–1793.
[62] O. Richetta and A. R. Odoni, Solving optimally the static ground-holding policy problem in
air traffic control, Transportation Science, 27 (1993), pp. 228–238.
[63] O. Richetta and A. R. Odoni, Dynamic solution to the ground-holding problem in air traffic control, Transportation Research, 28A (1994), pp. 167–185.
[64] H. Robbins and S. Monro, A stochastic approximation method, Annals of Mathematical
Statistics, 22 (1951), pp. 400–407.
[65] B. Schroer, J. Black, and S. Zhang, Just-In-Time (JIT), with kanban, manufacturing
system simulation on a microcomputer, Simulation, 45 (1985), pp. 62–70.
[66] L. Shi and S. Ólafsson, Convergence rate of nested partitions method for stochastic optimization, Submitted to Management Science, (1997).
[67] L. Shi and S. Ólafsson, Stopping rules for the stochastic nested partition method, paper in progress, (1998).
[68] L. Shi and S. Ólafsson, Nested partitions method for global optimization, To appear in Operations Research, (1999).
[69] A. Shiryayev, Probability, Springer-Verlag, New York, 1979.
[70] K. C. So and S. C. Pinault, Allocating buffer storage in a pull system, Intl. Journal of
Production Research, 26 (1988), pp. 1959–1980.
[71] Y. Sugimori, K. Kusunoki, F. Cho, and S. Uchikawa, Toyota production system and kanban system: materialization of Just-In-Time and respect-for-human systems, Intl. Journal of Production Research, 15 (1977), pp. 553–564.
[72] M. Terrab and A. R. Odoni, Strategic flow management for air traffic control, Operations
Research, 41 (1993), pp. 138–152.
[73] R. Uzsoy and L. A. Martin-Vega, Modeling kanban-based demand-pull systems: a survey
and critique, Manufacturing Review, 3 (1990), pp. 155–160.
[74] P. Vakili, A standard clock technique for efficient simulation, Operations Research Letters, 10
(1991), pp. 445–452.
[75] P. Vranas, D. Bertsimas, and A. R. Odoni, The multi-airport ground-holding problem in
air traffic control, Operations Research, 42 (1994), pp. 249–261.
[76] J. Wieselthier, C. Barnhart, and A. Ephremides, Optimal admission control in circuit-switched multihop radio networks, in Proceedings of 31st IEEE Conference on Decision and Control, Dec 1992, pp. 1011–1013.
[77] D. Yan and H. Mukai, Stochastic discrete optimization, SIAM Journal on Control and Optimization, 30 (1992).
[78] H. Yan, X. Zhou, and G. Yin, Finding optimal number of kanbans in a manufacturing system
via stochastic approximation and perturbation analysis, in Proceedings of 11th Intl. Conference
on Analysis and Optimization of Systems, 1994, pp. 572–578.
[79] J. Zander, Distributed co-channel interference control in cellular radio systems, IEEE Transactions on Vehicular Technology, 41 (1992).
[80] H. Zhu and V. S. Frost, In-service monitoring for cell loss quality of service violations in
ATM networks, IEEE/ACM Transactions on Networking, 4 (1996), pp. 240–248.