On the Shortest Queue Policy for the tandem parallel queue
Arie Hordijk and Ger Koole
Department of Mathematics and Computer Science
University of Leiden
and
Department of Operations Research
University of North Carolina at Chapel Hill
Probability in the Engineering and Informational Sciences 6:63–79, 1992
1. Introduction
We consider two nodes in tandem. At each node or service centre there are two servers
present with the same service rate µ and each with its own queue. Customers arrive at
the first node according to a Poisson process with arrival rate λ. At their arrival, they
have to be assigned to one of the servers, so they are routed to one of the queues at node
1. Customers leaving centre 1 enter node 2 and are routed to one of the queues at node 2
(see figure 1).
[Figure 1: the tandem parallel queue. Poisson(λ) arrivals are routed to queue 1 or queue 2 of centre 1; customers leaving centre 1 are routed to queue 1 or queue 2 of centre 2. Each of the four queues has its own server with rate µ.]
Figure 1.
The goal of our analysis is to find a policy which stochastically minimizes the number
of customers in the system at any time point. In the communication model which motivated
this research, the information on which the routing of an arriving customer is based
consists of the queue lengths at that node only. So at any node there is no information
available about the queue lengths at the other node.
In this paper we consider two cases. One is the model described above, which we call
the tandem parallel queue with Partial Information (PI). The other is the case where the
routing decisions in a node may also depend on the queue lengths at the other node. We
call this the Full Information (FI) case.
It can be shown that the Shortest Queue Policy (SQP) in node 2 dominates any other
policy in that node including policies which use information of node 1. So in the PI as
well as in the FI case an optimal policy can use the SQP rule in node 2. A similar result
does not hold for the first node as we show in section 5 through numerical calculations.
For finite buffers in both centres we computed the Optimal Policy (OP) for the FI case under
the total expected discounted reward criterion, with the departure rate of customers from
the system as the immediate reward rate. Table 1 gives the relative difference between the
value vectors of the OP and of the policy which uses SQP in both nodes, for various values
of the arrival rate λ and the discount factor α. The table shows that the policy with SQP
in Both Nodes (SQBNP) is not discounted optimal. Hence, it does not stochastically minimize
the number of customers in the system at any time point. By using sufficiently large buffer
sizes we found that a similar result also holds for infinite buffer sizes. Although the SQBNP
is not optimal, the numerical calculations show that it is nearly optimal. Indeed, even when
the number of states where the OP differs from the SQBNP is large, the relative differences
remain small. More insight into why this is so may be gathered from the counterexample in
section 4.
For the PI case our conjecture is that the SQBNP is optimal. This conjecture is
supported by extensive numerical evidence. For finding the optimal policy in the PI case
the standard stochastic dynamic programming algorithms are not applicable. Therefore we
used the recently developed algorithm of Kulkarni and Serin, which worked well. Policies
based on partial information are what they call implementable policies. Table 3 in
section 6 reports the various buffer sizes for which the optimal policy has been computed
for the PI case. In all instances the SQBNP was locally optimal.
In section 2 we show that in the PI case the SQBNP is optimal in a large class of
policies. Indeed, the SQBNP dominates any policy which uses a static policy in the second
node. In order to prove this result we use two theorems. In theorem 2.1 we consider one
node with a general arrival process. We compare the SQP with an arbitrary policy R. The
theorem says that the departure processes for SQP and R can be coupled such that with
probability one the kth departure under the SQP occurs no later than the kth departure
under R for any k ∈ IN. This theorem is the topic of section 3. It strengthens the standard
result that the number of departures at any time point is stochastically maximized by SQP.
As pointed out in remark 3.5, the basic lemma 3.2 extends to the case of finite buffers.
Hence the SQP also gives a pathwise earlier departure process for finite buffer sizes.
Interestingly enough, it also maximizes the number of accepted customers.
In theorem 2.2 we again consider one node. For a given policy we compare the
departure processes for two coupled arrival processes. It is shown for a static policy that
a pathwise earlier arrival process gives a pathwise earlier departure process.
Counterexample 2.4 shows that theorem 2.2 is not valid for the SQP. This is the idea
behind the counterexample of section 4, where we find that it is better to postpone arrivals
in states with queue lengths which are equal and not small. This counterexample explains
why the SQBNP is not an optimal policy in the FI case when the traffic is light. It turns
out that the states of node 2 for which delaying customers in the first node pays off have
a low probability of occurring. This is the reason why the relative differences between the
OP and the SQBNP are small.
There is an extensive literature on the routing of customers to parallel queues. For
an introduction we refer to Walrand [19]. An extensive overview is given in Hariharan et
al. [9].
Winston [22] shows that the SQP stochastically minimizes the expected discounted
cost over any time-horizon for single exponential servers. In Davis [5] it is shown that the
SQP is optimal for discounted waiting costs and an arrival process of GI-type. He considers
the more general case where customers may be rejected. Weber [20] derives the optimality
of the policy which selects among the shortest queues the one for which the given service
is longest. He allows general arrival processes and service times with increasing failure
rate. Whitt [21] shows that the shortest delay policy, which routes customers to the queue
with the shortest expected waiting time, is not optimal if the service time distribution has a
U-shaped failure rate. Ephremides et al. [6] prove that the SQP minimizes the total time
for the completion of service of all customers which arrive before time t. They also show
that the cyclic assignment policy is optimal when the queue lengths are not observed. In
Lehtonen [15] Winston’s result is strengthened. He shows that for the SQP the departure
process is pathwise earliest. In Hordijk & Koole [11] we prove that the SQP stochastically
maximizes the number of customers served at any time t when the queues have finite
buffers. The buffers may have different capacities. Using lemma 3.2 of this paper we can
show that the SQP also has the earliest departure process in case of finite buffers. In
Towsley et al. [18] finite buffers are also considered. They show that the “join the shortest
non-full queue” policy is optimal with respect to weak majorization. They also consider the
case where buffering is available at the controller.
All the models above assume a single server at any of the queues. In Johri [13]
state-dependent service rates are considered. For Poisson arrivals the SQP stochastically
minimizes the number of customers at any time point when the service rates are nondecreasing concave. In Menich and Serfozo [17] the service and arrival rates are functions of
all queue lengths. They show that SQP is optimal with respect to weak submajorization
if the arrival and service rates are families of interchangeable functions. Their conditions
include the case in which each service station has s identical exponential servers. Their
assumptions do not allow finite buffers.
With the exception of finite and unequal buffers all models mentioned so far are
symmetric in all queues. In Abdel-Gawad [1] the results of Davis [5] are extended to
parallel queues with different service rates. A general two-station control model is analysed
in Hajek [7]. This model includes as special case the routing problem for two non-symmetric
single-server nodes, each with its own queue. In Hariharan et al. [10] two parallel queues
are considered which are not symmetric since the holding cost functions are different. The
optimal routing and admission control problem for infinite-server exponential queues with
a common rate is analysed. For a Poisson arrival process and nondecreasing convex holding
cost functions, monotonicity of the discounted optimal admission and routing policy is
shown.
Many papers address the problem of obtaining the stationary distribution under the
SQP. Using a compensation method Adan et al. [2] have recently shown that it equals
an infinite linear combination of product forms with explicit relations for the parameters.
Bounds and approximations can be found in Halfin [8], Conolly [4] and Houck [12]. In
Houck [12] two queues with unequal numbers of servers are considered. It turns out that
the shortest expected delay policy is nearly optimal. This result concurs with our findings
on the SQP that, although not optimal in the full information case, its performance is very
close to the optimal policy.
None of the papers mentioned studies the case of two nodes in tandem with parallel
queues in each node. In this paper we consider the full and partial information case
for this model. In the last case the control in the first node may not depend on the
number of customers in the second node. In Beutler & Teneketzis [3] partial information
or decentralized control is studied for a tandem queue. The transfer rate for customers
from node 1 to node 2 is controlled and the information on the number of customers is
through a probability distribution. It is shown that the optimal control is a threshold
policy.
2. Partial Information: SQP is faster than static policies.
In this section we prove that in the PI case the SQP in both nodes gives an earlier departure
process than the two node policy which uses a static policy in the second node. In order
to do so we need two theorems. The first states that the SQP gives a pathwise earlier
departure process. The second theorem says that for a static policy an earlier arrival
process gives an earlier departure process. Combining these theorems indeed gives the
desired result. We see an arrival process as a sequence of arrival times. That is, the arrival
process $V = \{V_n, n \in \mathbb{N}\}$ has $V_n$ as the time of the nth arrival. For arrival processes
$V = \{V_n, n \in \mathbb{N}\}$ and $W = \{W_n, n \in \mathbb{N}\}$ we say that V is pathwise earlier than W, and
we write $V \le_p W$, if there are arrival processes $V^*$ and $W^*$ with $V^* \stackrel{d}{=} V$ and $W^* \stackrel{d}{=} W$
such that $V^*_n(\omega) \le W^*_n(\omega)$ for all ω and $n \in \mathbb{N}$. We use a similar definition and notation
for departure processes. As is standard, $\stackrel{d}{=}$ means that the processes on either side have the
same distribution. The result of section 3 can then be stated as follows.
Theorem 2.1. Consider one centre with two parallel queues, arrival process U and policy
SQP resp. an arbitrary policy R. If V resp. $\tilde V$ are the departure processes, then $V \le_p \tilde V$.
A policy R is called a static policy if it is defined by a sequence of random variables
$\{\Pi_n, n \in \mathbb{N}\}$, where $\Pi_n = j$ corresponds to routing the nth arriving customer to queue
j. The routing probabilities are stochastically independent of the queue lengths and the
arrival times.
We get the Bernoulli Policy if all $\Pi_n$ are independent and $\mathbb{P}(\Pi_n = 1) = \frac{1}{2}$ for all n. The
static policy is the Cyclic Assignment Policy if $\mathbb{P}(\Pi_{n+1} = j + 1 \pmod 2 \mid \Pi_n = j) = 1$
for all n, i.e. consecutive customers are routed alternately to the two queues.
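As an illustration (our own, not from the paper), the two static policies can be written as routing sequences that never look at queue lengths or arrival times. The generator names `bernoulli_policy` and `cyclic_policy` are hypothetical; queues are labelled 1 and 2 as in the text.

```python
import random
from itertools import islice


def bernoulli_policy(rng):
    """Bernoulli Policy: Pi_n i.i.d. with P(Pi_n = 1) = P(Pi_n = 2) = 1/2."""
    while True:
        yield rng.choice((1, 2))


def cyclic_policy(first=1):
    """Cyclic Assignment Policy: consecutive customers alternate between queues 1 and 2."""
    j = first
    while True:
        yield j
        j = 2 if j == 1 else 1  # "j + 1 (mod 2)" with the labels 1 and 2


# Neither generator receives queue lengths or arrival times, so both are static.
print(list(islice(cyclic_policy(), 5)))  # [1, 2, 1, 2, 1]
print(list(islice(bernoulli_policy(random.Random(0)), 5)))
```

Because the routing sequence is generated independently of the system, the same sequence can be reused for two systems with different arrival processes, which is exactly the coupling used in the proof of theorem 2.2.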
Theorem 2.2. Consider one centre with two parallel queues and a static policy R. For
arrival processes T and $\tilde T$ the departure processes are denoted by V resp. $\tilde V$. Then
$$T \le_p \tilde T \;\Rightarrow\; V \le_p \tilde V.$$
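Proof. Because $T \le_p \tilde T$ there are arrival processes $T^*$ and $\tilde T^*$ with $T^* \stackrel{d}{=} T$ and $\tilde T^* \stackrel{d}{=} \tilde T$ such that
$T^*_n(\omega) \le \tilde T^*_n(\omega)$ for all n and all ω. Fix $\omega \in \Omega$. We use the following notation: $T^*_n(\omega) = t_n$ and
$\tilde T^*_n(\omega) = \tilde t_n$. Let $S_n$ ($\tilde S_n$) resp. $U_n$ ($\tilde U_n$) be the service time of, resp. the queue to which, the nth
customer is routed. Of course $S_n \stackrel{d}{=} \tilde S_n$. Because R is static we also have $U_n \stackrel{d}{=} \tilde U_n$. Hence by coupling
arguments we may assume that $S_n = \tilde S_n$ and $U_n = \tilde U_n$ for all n. Denote an arbitrary realization of
$S_n, U_n, n > 0$ by $s_n, u_n, n > 0$. We omit the superscript $*$. Let $\xi^T(t)$ resp. $\xi^V(t)$ (and $\tilde\xi^T(t)$ resp. $\tilde\xi^V(t)$) be
the number of arrived resp. served customers at time t. A subscript j denotes a specific
queue. Then (with $I\{\cdots\}$ the indicator function):
$$
\begin{aligned}
\xi^V_j(t) &= \sum_{n=1}^{\xi^T(t)} I\{u_n = j\}\, I\Big\{t_k + \sum_{l=k}^{n} I\{u_l = j\}\, s_l \le t,\ k = 1, \dots, n\Big\} \\
&\ge \sum_{n=1}^{\tilde\xi^T(t)} I\{u_n = j\}\, I\Big\{t_k + \sum_{l=k}^{n} I\{u_l = j\}\, s_l \le t,\ k = 1, \dots, n\Big\} \\
&\ge \sum_{n=1}^{\tilde\xi^T(t)} I\{u_n = j\}\, I\Big\{\tilde t_k + \sum_{l=k}^{n} I\{u_l = j\}\, s_l \le t,\ k = 1, \dots, n\Big\} = \tilde\xi^V_j(t), \qquad j = 1, 2.
\end{aligned}
$$
Thus $\xi^V(t) = \xi^V_1(t) + \xi^V_2(t) \ge \tilde\xi^V(t)$ for all t, and $t^V_n \le \tilde t^V_n$ for all n, where $t^V_n$ resp. $\tilde t^V_n$ is the nth departure time.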
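The indicator in the proof says that the nth customer, served FIFO at queue $j = u_n$, has left by time t exactly when $t_k + \sum_{l=k}^{n} I\{u_l = j\} s_l \le t$ for every $k \le n$. A minimal sketch, assuming sorted arrival times and one server per queue, checks this against the usual FIFO recursion and then illustrates theorem 2.2 with shared (coupled) routing and service times. All function names are hypothetical.

```python
import random


def departures_recursion(t, u, s):
    """FIFO departure time of each customer; t sorted, u in {1, 2}, s service times."""
    last = {1: float("-inf"), 2: float("-inf")}  # last departure at each queue
    out = []
    for tn, un, sn in zip(t, u, s):
        d = max(tn, last[un]) + sn
        last[un] = d
        out.append(d)
    return out


def departures_formula(t, u, s):
    """Same quantity via the proof: max over k of t_k + sum_{l=k..n} I{u_l = u_n} s_l."""
    out = []
    for n in range(len(t)):
        j = u[n]
        out.append(max(t[k] + sum(s[l] for l in range(k, n + 1) if u[l] == j)
                       for k in range(n + 1)))
    return out


def departure_process(t, u, s):
    """The departure process as the ordered sequence of departure times."""
    return sorted(departures_recursion(t, u, s))


rng = random.Random(1)
n = 60
u = [rng.choice((1, 2)) for _ in range(n)]      # static (Bernoulli) routing, shared
s = [rng.expovariate(1.0) for _ in range(n)]    # coupled service times, shared
t = sorted(rng.uniform(0.0, 50.0) for _ in range(n))
t_late = sorted(x + rng.uniform(0.0, 3.0) for x in t)  # T <= T~ componentwise

early, late = departure_process(t, u, s), departure_process(t_late, u, s)
print(all(a <= b for a, b in zip(early, late)))  # True, as theorem 2.2 predicts
```

Each departure time is built from `max` and `+`, both monotone in the arrival times, which is the pathwise argument of the proof in executable form.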
Remark 2.3. This theorem is also true for general service times. Unfortunately, it does
not hold for the SQP as the following counterexample shows.
Counterexample 2.4. Take $T_1 = T_2 = T_3 = \tilde T_1 = \tilde T_2 = 0$, $\tilde T_3 = h$ and $T_n = \tilde T_n > 1 + h$
for all $n \ge 4$. Thus $T \le_p \tilde T$. Compare the probabilities that 2 customers have left at $t = 1 + h$.
Condition on the number of departures in $[0, h]$. If no departures occur in $[0, h]$, the two
systems behave the same.
On the other hand, if exactly one departure occurs in $[0, h]$, the time until the next
departure in the T-model has with probability 1/2 an Erlang(µ) and with probability 1/2
an Erlang(2µ) distribution (here Erlang(µ) is the exponential distribution with rate µ).
Indeed, the customer departing in $[0, h]$ leaves the queue with one customer with
probability 1/2 and the queue with two customers with probability 1/2 as well. In the
$\tilde T$-model the customer arriving at h chooses the empty queue, therefore the time until the
next departure will be Erlang(2µ) distributed. The difference between the probabilities that
this next departure occurs in time in the two models, say c, does not depend on h, but only
on µ. The probability that one customer leaves in $[0, h]$ is equal to $2\mu h + o(h)$.
The probability that two customers leave in $[0, h]$ is $o(h)$. Now we have
$$\mathbb{P}_{\tilde T}(\text{2 customers leave in } [0, 1+h]) - \mathbb{P}_T(\text{2 customers leave in } [0, 1+h]) = 2c\mu h + o(h) > 0$$
if h is small enough. Hence the earlier arrival process T does not give a pathwise earlier departure process.
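The counterexample can be checked numerically. The closed forms below are our own derivation under the exponential service assumption of the text (they are not given in the paper), and the function names are hypothetical: in the T-model the state at time 0 is (2, 1); in the $\tilde T$-model it is (1, 1) until the third customer arrives at h and joins the shorter queue.

```python
import math


def p_T(tau, mu=1.0):
    """T-model, state (2, 1) at time 0: P(at least 2 departures by time tau)."""
    # First departure ~ Exp(2 mu). W.p. 1/2 the state becomes (1, 1) and the next
    # departure ~ Exp(2 mu); w.p. 1/2 it becomes (2, 0) and the next ~ Exp(mu).
    return 1.0 - math.exp(-mu * tau) - mu * tau * math.exp(-2.0 * mu * tau)


def p_T_tilde(h, mu=1.0):
    """T~-model: P(at least 2 departures by time 1 + h)."""
    q = math.exp(-mu * h)          # a given server has not finished by time h
    none = q * q                   # no departure in [0, h]: state (2, 1) at h
    one = 2.0 * q * (1.0 - q)      # one departure: the arrival refills, state (1, 1)
    two = (1.0 - q) ** 2           # both initial customers already gone
    return none * p_T(1.0, mu) + one * (1.0 - math.exp(-2.0 * mu)) + two


def c(mu=1.0):
    """P(Exp(2 mu) <= 1) minus the 1/2-1/2 mixture of Exp(mu) and Exp(2 mu)."""
    return 0.5 * (math.exp(-mu) - math.exp(-2.0 * mu))


h, mu = 0.01, 1.0
diff = p_T_tilde(h, mu) - p_T(1.0 + h, mu)
print(diff, 2 * c(mu) * mu * h)  # diff > 0 and close to the first-order term 2 c mu h
```

The difference is positive, so the later arrival process yields more departures by time $1 + h$, and its size agrees with $2c\mu h$ up to the $o(h)$ correction.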
Combining theorems 2.1 and 2.2 gives the following result for the two centres in
tandem.
Theorem 2.5. Let $R = (R_1, R_2)$ be the two node policy with static policy $R_2$ in centre
2. Denote by $R^* = (\mathrm{SQP}, \mathrm{SQP})$ the two node policy which uses the SQP in both centres.
For a general arrival process T let W resp. $\hat W$ be the departure processes of the second
node under R resp. $R^*$. Then
$$\hat W \le_p W.$$
Proof. The proof easily follows from theorems 2.1 and 2.2. As depicted in figure
2, let V resp. $\tilde V$ denote the departure processes of the first node under policy
R resp. $R^*$ ($\tilde V$ is also the first-node departure process under $(\mathrm{SQP}, R_2)$). The departure
process of the second node for the policy $(\mathrm{SQP}, R_2)$ is denoted by $\tilde W$.
[Figure 2: the three two-node systems compared in the proof, all fed by the same arrival process T: policy $(R_1, R_2)$ with first-node departures V and second-node departures W; policy $(\mathrm{SQP}, R_2)$ with $\tilde V$ and $\tilde W$; policy $(\mathrm{SQP}, \mathrm{SQP})$ with $\tilde V$ and $\hat W$.]
Figure 2.
From theorem 2.1 we have
$$\tilde V \le_p V.$$
Hence by theorem 2.2
$$\tilde W \le_p W.$$
Theorem 2.1 also gives
$$\hat W \le_p \tilde W.$$
Combining the last two inequalities (the relation $\le_p$ is transitive) yields
$$\hat W \le_p W.$$
Remark 2.6. It is straightforward to generalize the result of theorem 2.5 to a network
of centres in tandem. The proof goes by induction on the number of nodes. Suppose it
is true for k nodes. Assume V resp. $\tilde V$ are the departure processes of the kth node when
using $(R_1, \dots, R_k)$ with $R_i$ static for $1 \le i \le k$, resp. SQP in each node. Then by the
induction hypothesis $\tilde V \le_p V$, and we can again use the same arguments as in the proof of
theorem 2.5.
3. One node: Pathwise optimality of the SQP
Consider the model consisting of 1 centre. We prove that the sample paths under the SQP
and an arbitrary policy R can be coupled such that with probability one the number of
customers present is not larger under SQP at any time point. In Hordijk & Koole [11] we
used the following general arrival process.
Definition 3.1. Arrival process. Let Λ be the (possibly countable) state space of a
Markov process with transition rates $\lambda_{xy}$, $x, y \in \Lambda$. When this process moves from x to y
a customer arrives with probability $q_{xy}$, where $\sum_{y \in \Lambda} q_{xy} \le 1$ for all $x \in \Lambda$.
It can be shown that any arrival process can be approximated arbitrarily closely by
this type of Markovian process. In this approximation we may assume without loss of
generality that $\sum_y \lambda_{xy} \le M$ for all x.
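A minimal sketch of definition 3.1, assuming a finite state space stored as nested dictionaries (`rates[x][y]` for $\lambda_{xy}$, `q[x][y]` for $q_{xy}$); the function name and data layout are hypothetical. A Poisson(λ) process is the one-state case with a self-transition at rate λ and $q = 1$; an interrupted Poisson process switches between an "on" and an "off" state.

```python
import random


def simulate_arrivals(rates, q, x0, horizon, rng):
    """Arrival epochs in [0, horizon] for the arrival process of definition 3.1.

    rates[x][y]: transition rate lambda_xy of the modulating Markov process.
    q[x][y]:     probability that a customer arrives at an x -> y transition.
    """
    t, x, arrivals = 0.0, x0, []
    while True:
        out = rates.get(x, {})
        total = sum(out.values())
        if total == 0:
            return arrivals                  # absorbing state: no more events
        t += rng.expovariate(total)          # exponential holding time in state x
        if t > horizon:
            return arrivals
        r, acc = rng.random() * total, 0.0   # pick y with probability lambda_xy / total
        for y, rate in out.items():
            acc += rate
            if r < acc:
                break
        if rng.random() < q[x].get(y, 0.0):  # customer arrives with probability q_xy
            arrivals.append(t)
        x = y


# Poisson(2): one state with a self-transition at rate 2 that always brings a customer.
poisson = simulate_arrivals({0: {0: 2.0}}, {0: {0: 1.0}}, 0, 1000.0, random.Random(3))
# Interrupted Poisson: arrivals at rate 4 while "on", none while "off".
ipp = simulate_arrivals({"on": {"on": 4.0, "off": 0.5}, "off": {"on": 0.5}},
                        {"on": {"on": 1.0}, "off": {}}, "on", 1000.0, random.Random(3))
```

Here the bound M of the text is the largest total outgoing rate, 4.5 for the interrupted Poisson example.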
Now consider 2 Markov processes X(t) and $\hat X(t)$, corresponding to the models with
policies SQP and R respectively. We assume that they have the same arrival process,
which satisfies definition 3.1. The state of these processes has three components: x for
the state of the arrival process, and $i_1$ resp. $i_2$ for the number of customers in queue 1 resp.
queue 2. Due to the boundedness of the transition rates we can uniformize the Markov
processes. Doing so, we can analyse the Markov processes through the embedded Markov
chains, since all time intervals between transitions, which may be dummy transitions, are
now exponentially distributed with mean $1/(M + 2\mu)$.
In the sequel we refer to the model with policy R resp. policy SQP by using resp.
not using the symbol ˆ. With a superscript (∧), as in $r^{(\wedge)}$, we indicate that the relation holds for both models.
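Also the Markov chains have state space $(x, i_1, i_2)$, with x the state of the arrival
process and $i_1, i_2$ the numbers in queue 1 resp. 2. The transition probabilities are:
$$
\begin{aligned}
r^{(\wedge)}_{(x,i_1,i_2)(x,i_1-1,i_2)} &= p && \text{if } i_1 > 0,\\
r^{(\wedge)}_{(x,i_1,i_2)(x,i_1,i_2-1)} &= p && \text{if } i_2 > 0,\\
r^{(\wedge)}_{(x,i_1,i_2)(y,i_1+1,i_2)} &= p_{xy} q_{xy} && \text{if SQP resp. R assigns to queue 1 in state } (i_1, i_2),\\
r^{(\wedge)}_{(x,i_1,i_2)(y,i_1,i_2+1)} &= p_{xy} q_{xy} && \text{if SQP resp. R assigns to queue 2 in state } (i_1, i_2),\\
r^{(\wedge)}_{(x,i_1,i_2)(y,i_1,i_2)} &= p_{xy}(1 - q_{xy}),\\
r^{(\wedge)}_{(x,i_1,i_2)(x,i_1,i_2)} &= 1 - 2p - \textstyle\sum_y p_{xy},
\end{aligned}
$$
where $p = \mu/(M + 2\mu)$ and $p_{xy} = \lambda_{xy}/(M + 2\mu)$. A service completion at an empty queue is a
dummy transition, so its probability p is added to that of remaining in $(x, i_1, i_2)$.
All other transitions have probability 0.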
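The transition probabilities of the uniformized chain can be tabulated directly. This is a minimal sketch under the assumptions of the text (arrival data as in definition 3.1, $\sum_y \lambda_{xy} \le M$); the function `transition_probs` and the policy helper `sqp` (ties to queue 1) are hypothetical names, and service completions at an empty queue are folded into the self-transition.

```python
def transition_probs(state, lam, q, mu, M, route):
    """One-step probabilities r_{(x,i1,i2) -> .} of the uniformized chain.

    lam[x][y], q[x][y]: arrival process data (definition 3.1), sum_y lam[x][y] <= M.
    route(i1, i2): queue (1 or 2) an arriving customer joins (SQP or policy R).
    """
    x, i1, i2 = state
    den = M + 2 * mu
    p = mu / den
    probs = {}

    def add(target, pr):
        probs[target] = probs.get(target, 0.0) + pr

    add((x, i1 - 1, i2) if i1 > 0 else state, p)   # service at queue 1 (dummy if empty)
    add((x, i1, i2 - 1) if i2 > 0 else state, p)   # service at queue 2 (dummy if empty)
    for y, rate in lam[x].items():
        pxy, qxy = rate / den, q[x].get(y, 0.0)
        joined = (y, i1 + 1, i2) if route(i1, i2) == 1 else (y, i1, i2 + 1)
        add(joined, pxy * qxy)                     # x -> y with an arrival
        add((y, i1, i2), pxy * (1 - qxy))          # x -> y without an arrival
    add(state, 1 - 2 * p - sum(lam[x].values()) / den)  # uniformization slack
    return probs


def sqp(i1, i2):
    return 1 if i1 <= i2 else 2


# Poisson(1) arrivals (one state, q = 1), mu = 1, M = 1, starting empty:
print(transition_probs((0, 0, 0), {0: {0: 1.0}}, {0: {0: 1.0}}, 1.0, 1.0, sqp))
```

Each row sums to one by construction, which is a quick sanity check on the formulas above.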
Now we define 2 Markov processes Y(t) resp. $\hat Y(t)$ corresponding to the policies SQP
resp. R, with state space equal to that of $X^{(\wedge)}(t)$. Let $S_n$, $\hat S_n$ be the times between the
transitions. All $S_n$ are independently exponentially distributed with parameter $M + 2\mu$,
and similarly for $\hat S_n$. The transitions are generated through $U_n$, $\hat U_n$, which are independent
random variables with uniform distribution on [0, 1]. When at the nth transition time
of the Y(t) process the state is $(x, i_1, i_2)$, then a customer leaves queue 1 resp. 2 when
$U_n \in (0, p]$ resp. $(p, 2p]$. If that queue is empty a dummy transition occurs. An arriving
customer is assigned to a queue according to the SQP. The arrival process changes from x
to y and a customer arrives if $U_n \in \big(2p + \sum_{z<y} p_{xz},\ 2p + \sum_{z<y} p_{xz} + p_{xy} q_{xy}\big]$; the arrival
process changes from x to y and no customer arrives if $U_n \in \big(2p + \sum_{z<y} p_{xz} + p_{xy} q_{xy},\ 2p + \sum_{z \le y} p_{xz}\big]$.
A dummy transition occurs if $U_n \notin \big(0, 2p + \sum_z p_{xz}\big]$. The process under policy
R is defined in a similar way with the $\hat U_n$ variables. It is easy to see that the transition
probabilities generated by the $U^{(\wedge)}_n$ are equal to the $r^{(\wedge)}$ defined above. Consequently, the
stochastic processes $X(t), t \ge 0$ and $Y(t), t \ge 0$ have the same distribution, and similarly
for $\hat X(t), t \ge 0$ and $\hat Y(t), t \ge 0$.
The random variables $S^{(\wedge)}_n, U^{(\wedge)}_n$ completely describe the evolution of the processes. Now we couple
both processes by relating $S_n$ to $\hat S_n$ and $U_n$ to $\hat U_n$. We take $S_n = \hat S_n$, which means that
events in both models take place at the same time epochs. Thus we can concentrate on the
embedded Markov chains governed by the $U_n$. The relation between $U_n$ and $\hat U_n$ depends
on the numbers in the queues, i.e. on $(i^{(\wedge)}_1, i^{(\wedge)}_2)$. We take $\hat U_n = U_n$ if $\hat\imath_1 \le \hat\imath_2$ and $i_1 \le i_2$, or
$\hat\imath_1 \ge \hat\imath_2$ and $i_1 \ge i_2$. If the longest queue in the SQP-model does not have the same index as the
longest queue in the R-model, thus if $\hat\imath_1 < \hat\imath_2$ and $i_1 > i_2$, or $\hat\imath_1 > \hat\imath_2$ and $i_1 < i_2$, we take $\hat U_n = U_n$
if $U_n \in (2p, 1]$, $\hat U_n = U_n + p$ if $U_n \in (0, p]$ and $\hat U_n = U_n - p$ if $U_n \in (p, 2p]$. Note that
the coupling of $U_n$ to $\hat U_n$ and of $S_n$ to $\hat S_n$ does not invalidate our assumptions about their
distributions.
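The coupling of $U_n$ to $\hat U_n$ is a piecewise translation that only swaps the two service intervals $(0, p]$ and $(p, 2p]$, so $\hat U_n$ is again uniform on [0, 1]. A minimal sketch (the name `coupled_uniform` is hypothetical); `i` is the SQP state $(i_1, i_2)$ and `i_hat` the R state $(\hat\imath_1, \hat\imath_2)$.

```python
def coupled_uniform(u, p, i, i_hat):
    """Map U_n (driving the SQP chain) to U^_n (driving the chain under policy R)."""
    same_longest = ((i_hat[0] <= i_hat[1] and i[0] <= i[1]) or
                    (i_hat[0] >= i_hat[1] and i[0] >= i[1]))
    if same_longest or u > 2 * p:
        return u            # same event in both chains
    if u <= p:
        return u + p        # service at queue 1 (SQP) <-> service at queue 2 (R)
    return u - p            # service at queue 2 (SQP) <-> service at queue 1 (R)


# SQP has queue 2 longest, R has queue 1 longest: the service events are swapped.
print(coupled_uniform(0.125, 0.25, (0, 2), (2, 0)))  # 0.375
```

In words: whenever the longer queues carry different labels in the two models, a service at the shorter queue of one model is paired with a service at the shorter queue of the other.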
In our coupling, the transitions in the arrival processes of Y (t) and Yˆ (t) are equal.
If in both models the same queue is longest then service completions occur in the same
queue. Otherwise, the service of the first queue in the Y (t) process is coupled to the service
of the second queue in the Yˆ (t) process.
Let $N^{(\wedge)}(n)$ denote the embedded Markov chains of $Y^{(\wedge)}(t)$. We will compare the sample
paths of N(n) and $\hat N(n)$. Let $\omega = (u_1, u_2, \dots)$ be a sample path of $(U_n, n > 0)$. Then
$N(n)(\omega)$ and $\hat N(n)(\omega)$ give the sample paths of the embedded Markov chains we compare.
We write $N^{(\wedge)}(n)(\omega) = (x^{(\wedge)}_n, i^{(\wedge)}_{1,n}, i^{(\wedge)}_{2,n})$.
Lemma 3.2. If
(ia) $i_{1,n} + i_{2,n} = \hat\imath_{1,n} + \hat\imath_{2,n}$,
(ib) $|i_{1,n} - i_{2,n}| \le |\hat\imath_{1,n} - \hat\imath_{2,n}|$,
(ic) $x_n = \hat x_n$
or
(iia) $i_{1,n} + i_{2,n} < \hat\imath_{1,n} + \hat\imath_{2,n}$,
(iib) $|i_{1,n} - i_{2,n}| - 1 \le |\hat\imath_{1,n} - \hat\imath_{2,n}|$,
(iic) $x_n = \hat x_n$
holds for n = 0, then for any n > 0 one of these sets of relations is true.
The relations (ia) and (ib) say that, if the numbers of customers under the SQP and R
are equal, then under the SQP the customers are more equally divided over both queues.
The relations (iia) and (iib) are concerned with the case that there are fewer
customers under the SQP. In this case there is a similar expression on the balance of
customers, except that now the SQP may be slightly more unbalanced. The logic behind
this can be seen when regarding states with $i_{1,n} + i_{2,n} = \hat\imath_{1,n} + \hat\imath_{2,n} - 1$ and $\hat\imath_{1,n} = \hat\imath_{2,n}$.
Now (ib) cannot hold, but (iib) can. Below we prove that these relations are indeed valid.
Proof of lemma 3.2. By induction. Suppose the lemma holds for 1, . . . , n. We split
the proof for n + 1 and consider the different cases depending on the value of $u_{n+1}$
and on whether (i) or (ii) holds for n. From the relation between $u_n$ and $\hat u_n$ it is clear that
$x_{n+1} = \hat x_{n+1}$. Therefore we can concentrate on the numbers of customers.

$2p < u_{n+1} \le 1$. Because a customer does or does not arrive in both models, it is sufficient
to show (ib) for n + 1 if (i) holds for n, and similarly (iib) in the case that (ii) holds for n.
(i) holds for n. If $|i_{1,n} - i_{2,n}| = 0$, then $|i_{1,n+1} - i_{2,n+1}| = 1$. At n + 1 the total number
of customers is odd, thus $|\hat\imath_{1,n+1} - \hat\imath_{2,n+1}| \ge 1$. If $|i_{1,n} - i_{2,n}| > 0$, then $|i_{1,n+1} - i_{2,n+1}| = |i_{1,n} - i_{2,n}| - 1$
because of the SQP. $|\hat\imath_{1,n} - \hat\imath_{2,n}|$ cannot decrease by more than 1 when
1 customer arrives, thus (ib) remains valid.
(ii) holds for n. If $|i_{1,n} - i_{2,n}| = 0$, then $|i_{1,n+1} - i_{2,n+1}| - 1 = 0$. Thus (iib) is valid.
If $|i_{1,n} - i_{2,n}| > 0$, (iib) is valid by the same reasoning as given above.

$0 < u_{n+1} \le p$. Choose j and $\hat\jmath$ such that queue j and queue $\hat\jmath$ are the shorter queues. This
means that queue j and queue $\hat\jmath$ are served at the same time. Note that service may be
given to an empty queue. If $i_{1,n} = i_{2,n}$ or $\hat\imath_{1,n} = \hat\imath_{2,n}$ holds, choose j or $\hat\jmath$ such that queues
j and $\hat\jmath$ are served at the same time. For queue j (resp. $\hat\jmath$) we denote the other queue by $j \pm 1$ (resp. $\hat\jmath \pm 1$).
(i) holds for n. Consider the case where $u_{n+1}$ is such that the queues j and $\hat\jmath$ are served:
$\hat\imath_{\hat\jmath,n} = 0,\ i_{j,n} = 0 \Rightarrow N^{(\wedge)}(n+1)(\omega) = N^{(\wedge)}(n)(\omega)$.
$\hat\imath_{\hat\jmath,n} = 0,\ i_{j,n} > 0 \Rightarrow i_{1,n+1} + i_{2,n+1} = \hat\imath_{1,n+1} + \hat\imath_{2,n+1} - 1$. The unbalance $|i_{1,n} - i_{2,n}|$
cannot increase by more than 1, thus (iib) also holds for n + 1.
$\hat\imath_{\hat\jmath,n} > 0$. Since queues j and $\hat\jmath$ are the shorter queues, we have from (i) that $\hat\imath_{\hat\jmath,n} \le i_{j,n} \le i_{j\pm1,n} \le \hat\imath_{\hat\jmath\pm1,n}$.
Thus, because $i_{j,n} > 0$, (ia) holds for n + 1. Furthermore we have
$|i^{(\wedge)}_{1,n+1} - i^{(\wedge)}_{2,n+1}| = 1 + |i^{(\wedge)}_{1,n} - i^{(\wedge)}_{2,n}|$, thus (ib) also holds.
Let $u_{n+1}$ be such that the longer queues, i.e. $j \pm 1$ and $\hat\jmath \pm 1$, are served:
$\hat\imath_{\hat\jmath\pm1,n} = 0$. Since $\hat\imath_{\hat\jmath,n} \le i_{j,n} \le i_{j\pm1,n} \le \hat\imath_{\hat\jmath\pm1,n}$, we have $i_{1,n} = i_{2,n} = 0 \Rightarrow N^{(\wedge)}(n+1)(\omega) = N^{(\wedge)}(n)(\omega)$.
$\hat\imath_{\hat\jmath\pm1,n} > 0,\ i_{j\pm1,n} = 0 \Rightarrow i_{j,n} = 0 \Rightarrow i_{1,n+1} = i_{2,n+1} = 0$, thus (i) or (ii) holds for n + 1.
$\hat\imath_{\hat\jmath\pm1,n} > 0,\ i_{j\pm1,n} > 0$. (ia) holds for n + 1. The unbalance decreases by 1 in both
models, thus (ib) also holds.
(ii) holds for n. Consider again the case where $u_{n+1}$ is such that queues j and $\hat\jmath$ are served:
$\hat\imath_{\hat\jmath,n} = 0,\ i_{j,n} = 0 \Rightarrow N^{(\wedge)}(n+1)(\omega) = N^{(\wedge)}(n)(\omega)$.
$\hat\imath_{\hat\jmath,n} = 0,\ i_{j,n} > 0$, (iia) $\Rightarrow i_{j,n} + i_{j\pm1,n} + 1 \le \hat\imath_{\hat\jmath\pm1,n} \Rightarrow i_{j\pm1,n} + 2 \le \hat\imath_{\hat\jmath\pm1,n}$. The
numbers of customers in the queues $j \pm 1$ and $\hat\jmath \pm 1$ remain unchanged, therefore
$|\hat\imath_{\hat\jmath,n+1} - \hat\imath_{\hat\jmath\pm1,n+1}| = \hat\imath_{\hat\jmath\pm1,n} \ge i_{j\pm1,n} + 2 > |i_{j,n+1} - i_{j\pm1,n+1}|$, and (iib) clearly
holds. It is easy to see that (iia) also holds for n + 1.
$\hat\imath_{\hat\jmath,n} > 0 \Rightarrow i_{j,n} > 0$; use similar arguments as for case (i). Thus (iia) holds for
n + 1. The unbalance increases by 1 for both models, so (iib) also holds.
If $u_{n+1}$ is such that service is given to the longer queues, we get:
$\hat\imath_{\hat\jmath\pm1,n} = 0 \Rightarrow \hat\imath_{\hat\jmath,n} = 0$, so that (iia) cannot hold.
$\hat\imath_{\hat\jmath\pm1,n} > 0,\ i_{j\pm1,n} = 0 \Rightarrow i_{1,n+1} = i_{2,n+1} = 0$, thus (i) or (ii) holds for n + 1.
$\hat\imath_{\hat\jmath\pm1,n} > 0,\ i_{j\pm1,n} > 0$. Because in both processes a customer leaves, (iia) remains
valid. If $|i_{j,n} - i_{j\pm1,n}| > 0$, then $|i_{j,n+1} - i_{j\pm1,n+1}| = |i_{j,n} - i_{j\pm1,n}| - 1$, which
means that (iib) holds. On the other hand, if $|i_{j,n} - i_{j\pm1,n}| = 0$, then $|i_{j,n+1} - i_{j\pm1,n+1}| - 1 = 0$,
which establishes (iib) also.

$p < u_{n+1} \le 2p$. Analogous to $0 < u_{n+1} \le p$.
Lemma 3.2 leads to the main result of this section.
Theorem 3.3. Consider two parallel servers with the same service rate. There are versions
$\hat K(t)$ and $K(t)$ of the stochastic processes of total numbers of customers under policy R
resp. SQP such that
$$K(t) \le \hat K(t) \quad \text{for all } t.$$
Proof. If t is between the nth and (n + 1)th transition epoch then
$$K^{(\wedge)}(t) = i^{(\wedge)}_{1,n} + i^{(\wedge)}_{2,n}.$$
Hence the proof follows from the relations (ia) and (iia) of lemma 3.2.
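Lemma 3.2 and theorem 3.3 can be exercised on simulated coupled sample paths. A minimal sketch, assuming Poisson(λ) arrivals (the one-state case of definition 3.1 with $M = \lambda$, so there are no dummy transitions other than services at empty queues) and two hypothetical policies R: "always join queue 1" and "join a queue uniformly at random". All names are ours. At every step we check that (ia)-(ib) or (iia)-(iib) holds, which in particular gives $K \le \hat K$.

```python
import random


def sqp(i1, i2, rng):
    """Join the shorter queue; ties are broken at random."""
    if i1 != i2:
        return 1 if i1 < i2 else 2
    return rng.choice((1, 2))


def always_queue_1(i1, i2, rng):   # hypothetical policy R
    return 1


def uniform_random(i1, i2, rng):   # hypothetical policy R
    return rng.choice((1, 2))


def coupled_uniform(u, p, i, i_hat):
    same = ((i_hat[0] <= i_hat[1] and i[0] <= i[1]) or
            (i_hat[0] >= i_hat[1] and i[0] >= i[1]))
    if same or u > 2 * p:
        return u
    return u + p if u <= p else u - p


def step(state, u, p, policy, rng):
    """One transition of the embedded chain (Poisson arrivals, M = lambda)."""
    i1, i2 = state
    if u <= p:
        return (i1 - 1, i2) if i1 > 0 else state   # dummy if queue 1 is empty
    if u <= 2 * p:
        return (i1, i2 - 1) if i2 > 0 else state   # dummy if queue 2 is empty
    return (i1 + 1, i2) if policy(i1, i2, rng) == 1 else (i1, i2 + 1)


def lemma_holds(i, i_hat):
    """(ia)-(ib) or (iia)-(iib); (ic)/(iic) hold trivially with one arrival state."""
    d, d_hat = abs(i[0] - i[1]), abs(i_hat[0] - i_hat[1])
    if sum(i) == sum(i_hat):
        return d <= d_hat
    return sum(i) < sum(i_hat) and d - 1 <= d_hat


def run(policy_R, lam=1.5, mu=1.0, steps=20000, seed=0):
    rng = random.Random(seed)
    p = mu / (lam + 2 * mu)
    i, i_hat = (0, 0), (0, 0)                      # both systems start empty
    for _ in range(steps):
        u = rng.random()
        u_hat = coupled_uniform(u, p, i, i_hat)    # uses the states before the step
        i, i_hat = step(i, u, p, sqp, rng), step(i_hat, u_hat, p, policy_R, rng)
        if not lemma_holds(i, i_hat):
            return False
    return True


print(run(always_queue_1), run(uniform_random))  # True True
```

A simulation of course only spot-checks the lemma; the point is to make the coupling concrete.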
Remark 3.4. Since the arrivals for the K(t) and $\hat K(t)$ processes occur at the same time
instants, the departures under the SQP must be pathwise earlier than under policy R.
This proves theorem 2.1.
Remark 3.5. It is straightforward to check that lemma 3.2 remains true in case each
queue has a finite buffer size, say B1 and B2. In this more general case only policies are
allowed which accept a customer unless both buffers are full. This leads to the assertion
that the SQP (note that in Hordijk & Koole [11] we named this policy generalized SQP)
admits more customers to the system than other policies. Again using the (ia) and
(iia) relations of the extended lemma 3.2 gives that theorem 2.1 is also true for finite buffers.
4. Full information: SQP is not optimal
We consider again the tandem parallel queue as shown in figure 1 and we denote by $i_1$
and $i_2$ the numbers of customers in queue 1 and 2 of centre 1. Let $j_1, j_2$ be the numbers
of customers in the queues of centre 2. As initial state we take $i^0 = (1, 0)$, $j^0 = (5, 5)$.
Suppose we have to decide whether to route an arriving customer at the first centre to queue
2 or to queue 1. Assume that there are no future arrivals. Take µ = 1. We calculate,
under both actions, the expected time until 12 customers have left, which is the time until
the system becomes empty. If we start with $(i_1, i_2)$ resp. $(j_1, j_2)$ in the first resp. second
node, we denote this expected time by $t\left[\begin{smallmatrix} i_1 & j_1 \\ i_2 & j_2 \end{smallmatrix}\right]$. These numbers can be calculated with the
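recursive formulae
$$t\left[\begin{smallmatrix} 0 & 0 \\ 0 & 0 \end{smallmatrix}\right] = 0,$$
$$t\left[\begin{smallmatrix} i_1 & j_1 \\ i_2 & j_2 \end{smallmatrix}\right] = \frac{1 + \delta(i_1)\, t\left[\begin{smallmatrix} i_1-1 & j_1+1 \\ i_2 & j_2 \end{smallmatrix}\right] + \delta(i_2)\, t\left[\begin{smallmatrix} i_1 & j_1+1 \\ i_2-1 & j_2 \end{smallmatrix}\right] + \delta(j_1)\, t\left[\begin{smallmatrix} i_1 & j_1-1 \\ i_2 & j_2 \end{smallmatrix}\right] + \delta(j_2)\, t\left[\begin{smallmatrix} i_1 & j_1 \\ i_2 & j_2-1 \end{smallmatrix}\right]}{\delta(i_1) + \delta(i_2) + \delta(j_1) + \delta(j_2)}, \quad j_1 \le j_2,$$
$$t\left[\begin{smallmatrix} i_1 & j_1 \\ i_2 & j_2 \end{smallmatrix}\right] = \frac{1 + \delta(i_1)\, t\left[\begin{smallmatrix} i_1-1 & j_1 \\ i_2 & j_2+1 \end{smallmatrix}\right] + \delta(i_2)\, t\left[\begin{smallmatrix} i_1 & j_1 \\ i_2-1 & j_2+1 \end{smallmatrix}\right] + \delta(j_1)\, t\left[\begin{smallmatrix} i_1 & j_1-1 \\ i_2 & j_2 \end{smallmatrix}\right] + \delta(j_2)\, t\left[\begin{smallmatrix} i_1 & j_1 \\ i_2 & j_2-1 \end{smallmatrix}\right]}{\delta(i_1) + \delta(i_2) + \delta(j_1) + \delta(j_2)}, \quad j_1 > j_2,$$
where $\delta(k) = 1$ if $k > 0$ and $\delta(k) = 0$ otherwise.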
We found that
$$t\left[\begin{smallmatrix} 2 & 5 \\ 0 & 5 \end{smallmatrix}\right] = \frac{659164549}{90699264} \approx 7.26758 < 7.27133 \approx \frac{37518487069}{5159780352} = t\left[\begin{smallmatrix} 1 & 5 \\ 1 & 5 \end{smallmatrix}\right],$$
and this means that the SQP does not stochastically maximize the number of departures:
if it did, the time until 12 customers have departed would be stochastically smaller
under the SQP, and so would its expectation.
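The recursion can be evaluated exactly with rational arithmetic. A minimal sketch, assuming µ = 1 and, as in the formulae, that a customer leaving centre 1 joins queue 1 of centre 2 when $j_1 \le j_2$ and queue 2 otherwise; the function name `t` mirrors the text.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def t(i1, i2, j1, j2):
    """Expected time until the tandem system is empty (mu = 1, no further arrivals)."""
    busy = (i1 > 0) + (i2 > 0) + (j1 > 0) + (j2 > 0)  # number of busy servers
    if busy == 0:
        return Fraction(0)
    # A customer leaving centre 1 joins the shorter queue of centre 2 (queue 1 on ties).
    nj = (j1 + 1, j2) if j1 <= j2 else (j1, j2 + 1)
    acc = Fraction(1)
    if i1 > 0:
        acc += t(i1 - 1, i2, *nj)
    if i2 > 0:
        acc += t(i1, i2 - 1, *nj)
    if j1 > 0:
        acc += t(i1, i2, j1 - 1, j2)
    if j2 > 0:
        acc += t(i1, i2, j1, j2 - 1)
    return acc / busy


longer, shorter = t(2, 0, 5, 5), t(1, 1, 5, 5)
print(float(longer), float(shorter))  # routing to the longer queue empties the system sooner
```

Exact fractions avoid any doubt that the small gap between the two expectations is a rounding effect.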
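This result also holds for Poisson arrivals with λ small enough. To see this, let $T\left[\begin{smallmatrix} i_1 & j_1 \\ i_2 & j_2 \end{smallmatrix}\right]$
be the time until $i_1 + i_2 + j_1 + j_2$ customers have left when there are no arrivals, and denote
its distribution by $F\left[\begin{smallmatrix} i_1 & j_1 \\ i_2 & j_2 \end{smallmatrix}\right]$. $T_A\left[\begin{smallmatrix} i_1 & j_1 \\ i_2 & j_2 \end{smallmatrix}\right]$ is similarly defined but includes Poisson(λ) arrivals. Let
$\tilde T$, the time of the first arrival, be exponentially distributed with mean $1/\lambda$. Because $T\left[\begin{smallmatrix} 1 & 5 \\ 1 & 5 \end{smallmatrix}\right]$ is a continuous r.v. with finite
expectation, we can take $\tilde t$ such that $0.0001 = \int_{\tilde t}^{\infty} (t - \tilde t)\, dF\left[\begin{smallmatrix} 1 & 5 \\ 1 & 5 \end{smallmatrix}\right](t)$. Let $\lambda_0$ be such that
$\mathbb{P}(\tilde T \le \tilde t) \le 0.0001$ when $\lambda \le \lambda_0$. Define $\hat T$, independent of the system, by $\mathbb{P}(\hat T = 0) = 1 - \mathbb{P}(\hat T = \tilde t) = 0.0001$. Then
$$
\begin{aligned}
\mathbb{E}\, T_A\left[\begin{smallmatrix} 2 & 5 \\ 0 & 5 \end{smallmatrix}\right]
&\le \mathbb{E}\, T\left[\begin{smallmatrix} 2 & 5 \\ 0 & 5 \end{smallmatrix}\right]
< \mathbb{E}\, T\left[\begin{smallmatrix} 1 & 5 \\ 1 & 5 \end{smallmatrix}\right] - 0.003
< 7.27134 - 0.003 = 7.26834 \\
&< (1 - 0.0001)(7.27132 - 0.0001)
< (1 - 0.0001)\left(\mathbb{E}\, T\left[\begin{smallmatrix} 1 & 5 \\ 1 & 5 \end{smallmatrix}\right] - 0.0001\right) \\
&= (1 - 0.0001)\left(\int_0^{\tilde t} t\, dF\left[\begin{smallmatrix} 1 & 5 \\ 1 & 5 \end{smallmatrix}\right](t) + \int_{\tilde t}^{\infty} \tilde t\, dF\left[\begin{smallmatrix} 1 & 5 \\ 1 & 5 \end{smallmatrix}\right](t)\right)
= (1 - 0.0001)\, \mathbb{E} \min\left(T\left[\begin{smallmatrix} 1 & 5 \\ 1 & 5 \end{smallmatrix}\right], \tilde t\right) \\
&= \mathbb{E} \min\left(T\left[\begin{smallmatrix} 1 & 5 \\ 1 & 5 \end{smallmatrix}\right], \hat T\right)
\le \mathbb{E} \min\left(T\left[\begin{smallmatrix} 1 & 5 \\ 1 & 5 \end{smallmatrix}\right], \tilde T\right)
\le \mathbb{E}\, T_A\left[\begin{smallmatrix} 1 & 5 \\ 1 & 5 \end{smallmatrix}\right].
\end{aligned}
$$
From this we conclude that for $\lambda \le \lambda_0$ the SQP does not minimize the expected time
until 12 customers have departed. Hence it cannot stochastically minimize the departure
times.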
5. Full information: On the optimal policy
To study the optimal policy (OP) for more realistic values of λ than those considered in the last
section, we did various numerical calculations on the 2 centre model. We fixed the service
rate µ to 1 and varied the arrival rate λ and the buffer size B of the queues. We computed
the OP for the discounted and average reward criterion. Our results are summarized in
the 2 tables below. First we consider discounted rewards.
In each state we took as reward rate the departure rate of the system. We found
that the SQP is not optimal for values of λ lower than 1 and the discount factor α bigger
than 0.25. Taking bigger buffer sizes does not change the OP on existing states. But the
number of states in which the OP deviates from the SQP increases with the buffer size (as
the number of states increases with the buffer size).
The results with B = 20 are shown in table 1. For each combination of α and λ
the table contains the maximum relative difference between the OP and the SQP. These
numbers are calculated with the formula
$$\max_i \frac{v_i^\alpha - v_i^\alpha(\mathrm{SQP})}{v_i^\alpha},$$
where $v_i^\alpha$ resp. $v_i^\alpha(\mathrm{SQP})$ is the reward under the OP resp. SQP in state i and the maximization
is taken over all possible states. With ‘-’ as entry we mean that the OP is equal to the SQP.
Note that the SQP is nearly optimal. The number of states in which the SQP deviates
from the OP can be large, e.g. 742 in the model with λ = 1 and α = 0.5, but in comparison
with the total number of states, which is equal to $21^4$, it is small.
λ \ α    .1             .25           .5            .75
.1       4.93·10^−12    2.35·10^−8    3.15·10^−6    4.48·10^−5
.25      5.35·10^−12    3.08·10^−8    4.94·10^−6    7.08·10^−5
.5       1.60·10^−12    1.72·10^−8    3.69·10^−6    5.43·10^−5
1        1.09·10^−14    1.74·10^−8    3.83·10^−7    5.66·10^−6
2        -              -             -             -
3        -              -             -             -
Table 1. Discounted rewards
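As a rough illustration of how entries like those in Table 1 can be obtained, the Python sketch below computes discounted values under the OP and under the SQP by value iteration on a uniformized version of the two-centre model, and then evaluates the maximum relative difference. It is not the authors' program. The following are all assumptions: a per-step discount factor `beta` (not the paper's α), loss of any customer routed to a full queue at either centre, ties broken towards queue 1, and a small buffer B = 2 so that it runs quickly. All names are hypothetical.

```python
from itertools import product


def value_iteration(lam, mu, B, beta, policy=None, tol=1e-10):
    """Discounted expected number of departures from each state (hypothetical model).

    State s = (i1, i2, j1, j2); each queue holds at most B customers and a
    customer routed to a full queue is lost. policy=None gives the OP (best
    routing in every state); otherwise policy(s, node) returns 0 or 1, the
    queue at that node (node 0 = centre 1, node 1 = centre 2).
    """
    states = list(product(range(B + 1), repeat=4))
    unif = lam + 4 * mu  # uniformization rate: arrivals plus four servers

    def join(s, q):  # add a customer to position q of the state if there is room
        t = list(s)
        if t[q] < B:
            t[q] += 1
        return tuple(t)

    def leave(s, q):  # remove a customer from position q
        t = list(s)
        t[q] -= 1
        return tuple(t)

    def route(V, s, node, succ):  # succ[a]: next state if the customer joins queue a
        if policy is None:
            return max(V[succ[0]], V[succ[1]])
        return V[succ[policy(s, node)]]

    V = {s: 0.0 for s in states}
    while True:
        new_V = {}
        for s in states:
            total = lam * route(V, s, 0, [join(s, 0), join(s, 1)])
            for k in (0, 1):
                if s[k] > 0:  # centre-1 server k finishes; route to a centre-2 queue
                    rest = leave(s, k)
                    total += mu * route(V, s, 1, [join(rest, 2), join(rest, 3)])
                else:
                    total += mu * V[s]  # fictitious transition of an idle server
                if s[2 + k] > 0:  # centre-2 server k finishes; the customer departs
                    total += mu * V[leave(s, 2 + k)]
                else:
                    total += mu * V[s]
            departures = mu * ((s[2] > 0) + (s[3] > 0)) / unif  # expected per step
            new_V[s] = departures + beta * total / unif
        change = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if change < tol:
            return V


def sqp(s, node):
    """Shortest queue at the given centre, ties to the first queue (assumption)."""
    a, b = (s[0], s[1]) if node == 0 else (s[2], s[3])
    return 0 if a <= b else 1


def max_relative_difference(v_op, v_sqp):
    """max over states i of (v_i - v_i(SQP)) / v_i."""
    return max((v_op[s] - v_sqp[s]) / v_op[s] for s in v_op)


v_op = value_iteration(lam=0.5, mu=1.0, B=2, beta=0.9)
v_sqp = value_iteration(lam=0.5, mu=1.0, B=2, beta=0.9, policy=sqp)
print(max_relative_difference(v_op, v_sqp))
```

Since the OP maximizes over routing actions in every state, its value can never fall below that of the SQP, so the printed difference is non-negative up to the convergence tolerance.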
In the discounted case there is a time preference: the earlier a customer leaves, the larger its discounted reward. Under the average reward criterion there is no time preference; the average expected reward is the average number of customers leaving the system per unit time. Since in steady state every accepted customer eventually leaves, this equals λ minus the average number of blocked customers per unit time, so the optimal policy is the one which minimizes the average number of blocked customers per unit time. This policy appears to be the SQP. A more discriminating cost function is the number of customers in the system. By Little's theorem, as long as the number of accepted customers per unit time stays constant, the policy which minimizes the average number of customers in the system also minimizes the average time a customer spends in the system. Again we compared the OP and the SQP for B = 20; the relative difference between their average costs is given in Table 2. Once again the SQP is nearly optimal, but the number of states in which the OP does not follow the SQP can be large.
λ                      .1           .25          .5           1            2            3
relative difference    2.38·10^−7   1.84·10^−7   5.34·10^−8   3.15·10^−8   4.45·10^−3   3.87·10^−5
Table 2. Average rewards
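The Little's theorem argument in the average reward case comes down to a few lines of arithmetic. In the Python sketch below the throughput is λ minus the blocking rate and W = L / throughput. The numbers for the two policies are hypothetical and only show that, at equal throughput, ordering policies by the average number in the system L also orders them by the average time in the system W.

```python
def accepted_rate(arrival_rate: float, blocking_rate: float) -> float:
    """In steady state every accepted customer eventually departs, so the
    departure (reward) rate equals the arrival rate minus the blocking rate."""
    return arrival_rate - blocking_rate


def mean_time_in_system(mean_number: float, throughput: float) -> float:
    """Little's theorem L = throughput * W, solved for W."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return mean_number / throughput


# Hypothetical policies A and B with the same blocking rate but different L.
lam = 1.0
thr = accepted_rate(lam, 0.01)
w_a = mean_time_in_system(3.0, thr)
w_b = mean_time_in_system(3.2, thr)
print(w_a < w_b)  # True: smaller L gives smaller W at equal throughput
```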
6. Partial Information: Numerical evidence on the optimality of the SQP
To describe the state of the exponential tandem parallel queue we need the vectors (i1, i2) and (j1, j2) of the numbers of customers in the queues at the first and the second node, respectively. Stationary policies in the corresponding Markov decision problem are then functions of both vectors. In the partial information model we only allow policies R = (R1, R2), where Ri is the policy at node i (i = 1, 2), R1 depends only on (i1, i2) and R2 depends only on (j1, j2). Standard algorithms such as successive approximations, policy improvement and linear programming cannot be used to solve this problem, since they choose the action in each full state (i1, i2, j1, j2) separately, whereas here the same action must be taken in all states with the same local queue lengths.
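The restriction R = (R1, R2) can be made concrete as a check on a pair of routing rules. The Python sketch below, with hypothetical names and a small buffer bound B, tests whether the node-1 rule depends only on (i1, i2) and the node-2 rule only on (j1, j2). The SQP at each node (ties to queue 1, an assumption) passes; a hypothetical full-information rule that routes on the combined length i_k + j_k does not.

```python
from itertools import product
from typing import Callable, Tuple

State = Tuple[int, int, int, int]  # (i1, i2, j1, j2)
Rule = Callable[[State], int]      # returns queue 1 or 2


def depends_only_on(rule: Rule, key: Callable[[State], tuple], states) -> bool:
    """True if rule(s) is the same for all states s with the same key(s)."""
    seen = {}
    for s in states:
        action = rule(s)
        if seen.setdefault(key(s), action) != action:
            return False
    return True


def is_partial_information(r1: Rule, r2: Rule, B: int) -> bool:
    """Check R = (R1, R2): R1 may use only (i1, i2), R2 only (j1, j2)."""
    states = list(product(range(B + 1), repeat=4))
    return (depends_only_on(r1, lambda s: (s[0], s[1]), states)
            and depends_only_on(r2, lambda s: (s[2], s[3]), states))


# SQP at each node uses only the local queue lengths.
sqp1: Rule = lambda s: 1 if s[0] <= s[1] else 2
sqp2: Rule = lambda s: 1 if s[2] <= s[3] else 2
# Hypothetical node-1 rule that also looks at the queue lengths at node 2.
full1: Rule = lambda s: 1 if s[0] + s[2] <= s[1] + s[3] else 2

print(is_partial_information(sqp1, sqp2, B=3))   # True
print(is_partial_information(full1, sqp2, B=3))  # False
```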
In Kulkarni & Serin [14] an algorithm is derived which finds a local optimum or a
saddle point in a restricted class of policies, which they call implementable policies.
Loeve & Pols [16] applied this algorithm to our problem. They solved problems with buffer sizes in both nodes of up to 30 and approximately 25 · 10^4 states. Two types of Sun workstations were used. Clearly a lot of swapping between main memory and disk had to be done; however, the system time, which includes the swap time, never exceeded 1.77% of the total computing time. Depending on the starting policy, the computing time itself could be long. Table 3, taken from Loeve & Pols [16], summarizes our experience. In our opinion it is encouraging that problems of this size can nowadays be solved on desktop workstations.
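The state count quoted for buffer size 30 can be checked with a little arithmetic. The Python sketch below compares two readings of (NA, NB) in Table 3, both hypothetical: NA and NB bound the total number of customers at each node, or they bound each individual queue. Only the first gives a number close to 25 · 10^4, which suggests, but does not prove, that reading.

```python
def pairs_with_total_at_most(n: int) -> int:
    """Number of pairs (x1, x2) >= 0 with x1 + x2 <= n, i.e. (n + 1)(n + 2) / 2."""
    return (n + 1) * (n + 2) // 2


def states_if_node_buffer(n_a: int, n_b: int) -> int:
    # Hypothesis 1: N_A and N_B bound the total number of customers at each node.
    return pairs_with_total_at_most(n_a) * pairs_with_total_at_most(n_b)


def states_if_queue_buffer(n_a: int, n_b: int) -> int:
    # Hypothesis 2: every single queue holds at most N_A resp. N_B customers.
    return (n_a + 1) ** 2 * (n_b + 1) ** 2


print(states_if_node_buffer(30, 30))   # 246016
print(states_if_queue_buffer(30, 30))  # 923521
```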
In all problem instances we found that the SQP was a locally optimal policy. For a subclass of instances we used a search technique to check whether this local optimum was also a global optimum; in all cases it turned out to be so. The problems analysed have arrival rate λ = 4, service rate µ = 5 and continuous discount rate 12. Although we did not vary these parameters, we believe that the results give substantial evidence for the optimality of the SQP.
starting policy    (NA, NB)   computer time   system time (%)
SQP                (30,30)    18 h            1.77
SQP                (25,25)    7 h             0.83
SQP                (20,20)    3 h             0.64
SQP                (15,15)    1.5 h           0.03
SQP                (10,10)    11 min          0.83
SQP                (8,8)      7 min           0.95
all to queue 1     (25,25)    102 h*          1.41
all to queue 1     (20,20)    138.5 h         0.03
all to queue 1     (15,15)    43 h            0.15
all to queue 1     (10,10)    3 h             0.03
all to queue 1     (8,8)      1 h             0.50
Bernoulli policy   (20,20)    56 h            0.05
Bernoulli policy   (15,15)    84.5 h*         1.18
Bernoulli policy   (10,10)    6 h             0.04
Bernoulli policy   (8,8)      1.75 h          0.49
random policy      (10,10)    19.5 h          0.44
random policy      (8,8)      9 h             0.21
random policy      (6,4)      1.75 h          0.02
* calculated on a faster computer
Table 3. Partial Information
Acknowledgement
This paper was written while the first author was on sabbatical leave at the Department
of Operations Research of the University of North Carolina at Chapel Hill. The hospitality of and stimulating discussions with V.G. Kulkarni and S. Stidham, Jr. are gratefully acknowledged.
The authors thank Anneke Loeve and Mandy Pols for the use of their analysis of the
SQP under partial information.
References
[1] E.F. Abdel-Gawad (1984). Optimal control of arrivals and routing in a network of
queues. Ph.D. thesis, N.C. State University, Raleigh.
[2] I.J.B.F. Adan, J. Wessels & W.H.M. Zijm (1989). Analysis of the symmetric shortest
queue problem. Stochastic Models 6: 691–713.
[3] F.J. Beutler & D. Teneketzis (1989). Routing in queueing networks under imperfect
information: stochastic dominance and thresholds. Stochastics and Stochastic Reports
26: 81–100.
[4] B.W. Conolly (1984). The autostrada queueing problem. Journal of Applied Probability 21: 394–403.
[5] E. Davis (1977). Optimal control of arrivals to a two-server queueing system with
separate queues. Ph.D. thesis, N.C. State University, Raleigh.
[6] A. Ephremides, P. Varaiya & J. Walrand (1980). A simple dynamic routing problem.
IEEE Transactions on Automatic Control 25: 690–693.
[7] B. Hajek (1984). Optimal control of two interacting service stations. IEEE Transactions on Automatic Control 29: 491–499.
[8] S. Halfin (1985). The shortest queue problem. Journal of Applied Probability 22:
865–878.
[9] R. Hariharan, V.G. Kulkarni & S. Stidham, Jr. (1990). A survey of research relevant to
virtual-circuit routing in telecommunication networks. Technical report UNC/OR/TR90-13, University of N.C. at Chapel Hill.
[10] R. Hariharan, V.G. Kulkarni & S. Stidham, Jr. (1990). Optimal control of two parallel infinite-server queues. Technical report UNC/OR/TR90-19, University of N.C. at
Chapel Hill.
[11] A. Hordijk & G. Koole (1990). On the optimality of the generalized shortest queue
policy. Probability in the Engineering and Informational Sciences 4: 477–487.
[12] D.J. Houck (1987). Comparison of policies for routing customers to parallel queueing
systems. Operations Research 35: 306–310.
[13] P.K. Johri (1989). Optimality of the shortest line discipline with state-dependent
service times. European Journal of Operational Research 41: 157–161.
[14] V.G. Kulkarni & Y. Serin (1990). Optimal implementable policies: discounted cost
case. Working paper.
[16] A. Loeve & M. Pols (1990). Optimale toewijzing van klanten in wachtrijmodellen met twee bedieningscentra [Optimal assignment of customers in queueing models with two service centres]. Master's thesis, University of Leiden.
[17] R. Menich & R.F. Serfozo (1991). Monotonicity and optimality of symmetric parallel
processing systems. Queueing Systems 9: 403–418.
[18] D. Towsley, P.D. Sparaggis & C.G. Cassandras (1992). Optimal routing and buffer
allocation for a class of finite capacity queueing systems. IEEE Transactions on Automatic Control 37: 1446–1451.
[19] J. Walrand (1988). An Introduction to Queueing Networks. Prentice-Hall, Englewood
Cliffs.
[22] W. Winston (1977). Optimality of the shortest line discipline. Journal of Applied
Probability 14: 181–189.