On the Shortest Queue Policy for the tandem parallel queue

Arie Hordijk and Ger Koole
Department of Mathematics and Computer Science, University of Leiden
and
Department of Operations Research, University of North Carolina at Chapel Hill

Probability in the Engineering and Informational Sciences 6:63–79, 1992

1. Introduction

We consider two nodes in tandem. At each node or service centre there are two servers present with the same service rate µ and each with its own queue. Customers arrive at the first node according to a Poisson process with arrival rate λ. On arrival they have to be assigned to one of the servers, so they are routed to one of the queues at node 1. Customers leaving centre 1 enter node 2 and are routed to one of the queues at node 2 (see figure 1).

[Figure 1. Poisson(λ) arrivals are routed to queue 1 or queue 2 of centre 1; customers leaving centre 1 are routed to queue 1 or queue 2 of centre 2. Each of the four queues has its own server with rate µ.]
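The model just described can be made concrete with a small event-driven simulation. The following is only a minimal sketch and is not taken from the paper: it assumes infinite buffers, uses the shortest queue rule in both centres (ties go to the first queue), and the names `sqp` and `simulate` are hypothetical helpers introduced for this illustration.

```python
import random


def sqp(queues):
    """Index of the shorter of two queues (ties go to index 0)."""
    return 0 if queues[0] <= queues[1] else 1


def simulate(lam, mu, horizon, seed=0):
    """Simulate the tandem parallel queue with the SQP in both centres.

    Returns (arrivals, departures, centre-1 queue lengths, centre-2 queue lengths)
    at the end of the time horizon.
    """
    rng = random.Random(seed)
    node1, node2 = [0, 0], [0, 0]
    arrivals = departures = 0
    t = 0.0
    while True:
        # Competing exponential clocks: the arrival stream and every busy server.
        events = [("arrival", None, lam)]
        events += [("node1", q, mu) for q in (0, 1) if node1[q] > 0]
        events += [("node2", q, mu) for q in (0, 1) if node2[q] > 0]
        rates = [rate for _, _, rate in events]
        t += rng.expovariate(sum(rates))
        if t > horizon:
            break
        kind, q, _ = rng.choices(events, weights=rates)[0]
        if kind == "arrival":
            node1[sqp(node1)] += 1
            arrivals += 1
        elif kind == "node1":
            # A customer leaving centre 1 is routed within centre 2.
            node1[q] -= 1
            node2[sqp(node2)] += 1
        else:
            node2[q] -= 1
            departures += 1
    return arrivals, departures, node1, node2
```

Because each routing decision here looks only at the queue lengths of the centre in which it is taken, the sketch corresponds to routing on local information; a rule that also inspects the other centre would need both length vectors as arguments.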
The goal of our analysis is to find a policy which stochastically minimizes the number of customers in the system at any time point. In the communication model which motivated this research, the information on the basis of which an arriving customer is routed to one of the queues consists of the lengths of the queues in that node. So at any node there is no information available about the queue lengths of the other node. In this paper we consider two cases. One is the model described above, which we call the tandem parallel queue with Partial Information (PI). The other is the case where the routing decisions in a node may also depend on the queue lengths at the other node. We call this the Full Information (FI) case.

It can be shown that the Shortest Queue Policy (SQP) in node 2 dominates any other policy in that node, including policies which use information of node 1. So in the PI as well as in the FI case an optimal policy can use the SQP rule in node 2. A similar result does not hold for the first node, as we show in section 5 through numerical calculations. For finite buffers in both centres the Optimal Policy (OP) is computed for the FI case and the total expected discounted reward criterion, with the departure rate of customers from the system as immediate reward rate. In table 1 the relative difference of the value vectors of the OP and the policy which uses SQP in both nodes is given for various values of the arrival rate λ and the discount factor α. It can be seen in that table that the policy with SQP in Both Nodes (SQBNP) is not discounted optimal. Hence, it does not stochastically minimize the number of customers in the system at any time point. By using sufficiently large buffer sizes we found that a similar result is also true for infinite buffer sizes. Although the SQBNP is not optimal, the numerical calculations show that it is nearly optimal.
Indeed, even though the number of states in which the OP differs from the SQBNP is in some cases large, the relative differences remain small. More insight into why this is so may be gained from the counterexample in section 4.

For the PI case our conjecture is that the SQBNP is optimal. This conjecture is supported by extensive numerical evidence. For finding the optimal policy in the PI case the standard stochastic dynamic programming algorithms are not applicable. Therefore we used the recently developed algorithm of Kulkarni and Serin, which worked well. Policies based on partial information are what they call implementable policies. Table 3 in section 6 reports on the various values of the buffer sizes for which the optimal policy has been computed for the PI case. In all instances the SQBNP was locally optimal.

In section 2 we show that in the PI case the SQBNP is optimal in a large class of policies. Indeed, the SQBNP dominates any policy which uses a static policy in the second node. In order to prove this result we use two theorems. In theorem 2.1 we consider one node with a general arrival process. We compare the SQP with an arbitrary policy R. The theorem says that the departure processes for SQP and R can be coupled such that with probability one the kth departure under the SQP is before the kth departure under R for any k ∈ ℕ. This theorem is the topic of section 3. It strengthens the standard result that the number of departures at any time point is stochastically maximized by SQP. As pointed out in remark 3.5 the basic lemma 3.2 extends to the case of finite buffers. Hence the SQP also gives a pathwise earlier departure process for finite buffer sizes. Interestingly enough it also maximizes the number of accepted customers.

In theorem 2.2 we again consider one node. For a given policy we compare the departure processes for two coupled arrival processes.
It is shown for a static policy that a pathwise earlier arrival process gives a pathwise earlier departure process. Counterexample 2.4 shows that theorem 2.2 is not valid for the SQP. This is the idea behind the counterexample of section 4, where we find that it is better to postpone arrivals in states with queue lengths which are equal and not small. This counterexample explains why the SQBNP is not an optimal policy in the FI case when the traffic is light. It turns out that the states of node 2 for which delaying customers in the first node pays off, have a low probability of occurring. This is the reason why the relative differences between the OP and the SQBNP are small. There is an extensive literature on the routing of customers to parallel queues. For an introduction we refer to Walrand [19]. An extensive overview is given in Hariharan et al. [9]. Winston [22] shows that the SQP stochastically minimizes the expected discounted cost over any time-horizon for single exponential servers. In Davis [5] it is shown that the SQP is optimal for discounted waiting costs and an arrival process of GI-type. He considers the more general case where customers may be rejected. Weber [20] derives the optimality of the policy which selects among the shortest queues the one for which the given service is longest. He allows general arrival processes and service times with increasing failure rate. Whitt [21] shows that the shortest delay policy which routes customers to the queue with shortest expected waiting time, is not optimal if the service time distribution has a U-shaped failure rate. Ephremides et al. [6] prove that the SQP minimizes the total time for the completion of service of all customers which arrive before time t. They also show that the cyclic assignment policy is optimal when the queue lengths are not observed. In Lehtonen [15] Winston’s result is strengthened. He shows that for the SQP the departure process is pathwise earliest. 
In Hordijk & Koole [11] we prove that the SQP maximizes stochastically the number of customers served at any time t when the queues have finite buffers. The buffers may have different capacities. Using lemma 3.2 of this paper we can show that the SQP also has the earliest departure process in case of finite buffers. In Towsley et al. [18] finite buffers are also considered. They show that the "join the shortest non-full queue" policy is optimal with respect to weak majorization. They also consider the case where buffering is available at the controller.

All the models above assume a single server at any of the queues. In Johri [13] state-dependent service rates are considered. For Poisson arrivals the SQP stochastically minimizes the number of customers at any time point when the service rates are nondecreasing concave. In Menich and Serfozo [17] the service and arrival rates are functions of all queue lengths. They show that SQP is optimal with respect to weak submajorization if the arrival and service rates are families of interchangeable functions. Their conditions include the case in which each service station has s identical exponential servers. Their assumptions do not allow finite buffers.

With the exception of finite and unequal buffers all models mentioned so far are symmetric in all queues. In Abdel-Gawad [1] the results of Davis [5] are extended to parallel queues with different service rates. A general two-station control model is analysed in Hajek [7]. This model includes as a special case the routing problem for two non-symmetric single-server nodes, each with its own queue. In Hariharan et al. [10] two parallel queues are considered which are not symmetric since the holding cost functions are different. The optimal routing and admission control problem for infinite-server exponential queues with a common rate is analysed.
For a Poisson arrival process and nondecreasing convex holding cost function, admission and routing monotonicity for the discounted optimal policy are shown.

Many papers address the problem of obtaining the stationary distribution under the SQP. Using a compensation method Adan et al. [2] have recently shown that it equals an infinite linear combination of product forms with explicit relations for the parameters. Bounds and approximations can be found in Halfin [8], Conolly [4] and Houck [12]. In Houck [12] two queues with unequal numbers of servers are considered. It turns out that the shortest expected delay policy is nearly optimal. This result concurs with our findings on the SQP that, although not optimal in the full information case, its performance is very close to that of the optimal policy.

None of the papers mentioned studies the case of two nodes in tandem with parallel queues in each node. In this paper we consider the full and partial information case for this model. In the latter case the control in the first node may not depend on the number of customers in the second node. In Beutler & Teneketzis [3] partial information or decentralized control is studied for a tandem queue. The transfer rate for customers from node 1 to node 2 is controlled and the information on the number of customers is through a probability distribution. It is shown that the optimal control is a threshold policy.

2. Partial Information: SQP is faster than static policies.

In this section we prove that in the PI case the SQP in both nodes gives an earlier departure process than the two node policy which uses a static policy in the second node. In order to do so we need two theorems. The first states that the SQP gives a pathwise earlier departure process. The second theorem says that for a static policy an earlier arrival process gives an earlier departure process. Combining these theorems indeed gives the desired result.

We see an arrival process as a sequence of arrival times.
That is, the arrival process $V = \{V_n, n \in \mathbb{N}\}$ has $V_n$ as the time of the nth arrival. For arrival processes $V = \{V_n, n \in \mathbb{N}\}$ and $W = \{W_n, n \in \mathbb{N}\}$ we say that V is pathwise earlier than W, and we write $V \le_p W$, if there are arrival processes $V^*$ and $W^*$ with $V^* \stackrel{d}{=} V$ and $W^* \stackrel{d}{=} W$ such that $V_n^*(\omega) \le W_n^*(\omega)$ for all ω and $n \in \mathbb{N}$. We use a similar definition and notation for departure processes. As is standard, $\stackrel{d}{=}$ means that the processes on either side have the same distribution. Then the result of section 3 can be stated as

Theorem 2.1. Consider one centre with two parallel queues, arrival process U and policy SQP resp. an arbitrary policy R. If $V$ resp. $\tilde V$ are the departure processes, then $V \le_p \tilde V$.

A policy R is called a static policy if it is defined by a sequence of random variables $\{\Pi_n, n \in \mathbb{N}\}$, where $\Pi_n = j$ corresponds to routing the nth arriving customer to queue j. The routing variables $\Pi_n$ are stochastically independent of the queue lengths and the arrival times. We get the Bernoulli Policy if all $\Pi_n$ are independent and $\mathbb{P}(\Pi_n = 1) = \frac{1}{2}$ for all n. The static policy is the Cyclic Assignment Policy if $\mathbb{P}(\Pi_{n+1} = j + 1 \ (\mathrm{mod}\ 2) \mid \Pi_n = j) = 1$ for all n.

Theorem 2.2. Consider one centre with two parallel queues and a static policy R. For arrival processes $T$ and $\tilde T$ the departure processes are denoted by $V$ resp. $\tilde V$. Then $T \le_p \tilde T \Rightarrow V \le_p \tilde V$.

Proof. Because $T \le_p \tilde T$ there are arrival processes $T^*$ and $\tilde T^*$ with $T \stackrel{d}{=} T^*$ and $\tilde T \stackrel{d}{=} \tilde T^*$ such that $T_n^*(\omega) \le \tilde T_n^*(\omega)$ ∀n ∀ω. Fix ω ∈ Ω. We use the following notation: $T_n^*(\omega) = t_n$ and $\tilde T_n^*(\omega) = \tilde t_n$. Let $S_n$ ($\tilde S_n$) resp. $U_n$ ($\tilde U_n$) be the service time of resp. the queue to which the nth customer is routed. Of course $S_n \stackrel{d}{=} \tilde S_n$. Because R is static we also have $U_n \stackrel{d}{=} \tilde U_n$. Hence by coupling arguments we may assume that $S_n = \tilde S_n$ and $U_n = \tilde U_n$ for all n. Denote an arbitrary realization of $S_n, U_n$, n > 0 by $s_n, u_n$, n > 0. We omit the superscript ∗. Let $\xi^T(t)$ resp. $\xi^V(t)$ be the number of arrived resp. served customers at time t, and define $\tilde\xi^T(t)$ and $\tilde\xi^V(t)$ similarly for the model with arrival process $\tilde T$.
A subscript j denotes a specific queue. Then (with I{···} the indicator function):

\[
\begin{aligned}
\xi_j^V(t) &= \sum_{n=1}^{\xi^T(t)} I\{u_n = j\}\, I\Big\{t_k + \sum_{l=k}^{n} I\{u_l = j\}\, s_l \le t,\ k = 1, \dots, n\Big\} \\
&\ge \sum_{n=1}^{\tilde\xi^T(t)} I\{u_n = j\}\, I\Big\{t_k + \sum_{l=k}^{n} I\{u_l = j\}\, s_l \le t,\ k = 1, \dots, n\Big\} \\
&\ge \sum_{n=1}^{\tilde\xi^T(t)} I\{u_n = j\}\, I\Big\{\tilde t_k + \sum_{l=k}^{n} I\{u_l = j\}\, s_l \le t,\ k = 1, \dots, n\Big\} \\
&= \tilde\xi_j^V(t), \qquad j = 1, 2.
\end{aligned}
\]

The first inequality holds because $\xi^T(t) \ge \tilde\xi^T(t)$, the second because $t_k \le \tilde t_k$ for all k. Thus $\xi^V(t) = \xi_1^V(t) + \xi_2^V(t) \ge \tilde\xi^V(t)$ ∀t and $t_n^V \le \tilde t_n^V$ ∀n.

Remark 2.3. This theorem is also true for general service times. Unfortunately, it does not hold for the SQP as the following counterexample shows.

Counterexample 2.4. Take $T_1 = T_2 = T_3 = \tilde T_1 = \tilde T_2 = 0$; $\tilde T_3 = h$; $T_n = \tilde T_n > 1 + h$ ∀n ≥ 4. Thus $T \le_p \tilde T$. Compare the probabilities that 2 customers have left at t = 1 + h. Condition on the number of departures in [0, h]. If no departures occur in [0, h], the two systems behave the same. On the other hand, if exactly one departure occurs in [0, h], the time until the next departure in the T-model will have with probability 1/2 an Erlang(µ) and with probability 1/2 an Erlang(2µ) distribution. Indeed, the customer departing in [0, h] leaves the queue with one customer with probability 1/2 and the queue with two customers with probability 1/2 as well. In the $\tilde T$-model the customer arriving at h chooses the empty queue, therefore the time until the next departure will be Erlang(2µ) distributed. The difference between these two probabilities, say c, does not depend on h, but only on µ. The probability that one customer leaves in [0, h] is equal to 2µh + o(h). The probability that two customers leave in [0, h] is o(h). Now we have:

\[
\mathbb{P}_{\tilde T}(\text{2 customers have left at } 1 + h) - \mathbb{P}_{T}(\text{2 customers have left at } 1 + h) = 2\mu h\, c + o(h) > 0,
\]

if h is small enough.

Combining the theorems 2.1 and 2.2 gives the following result for the two centres in tandem.

Theorem 2.5. Let R = (R1, R2) be the two node policy with static policy R2 in centre 2. Denote by R∗ = (SQP, SQP) the two node policy which uses the SQP in both centres.
For a general arrival process T let $W$ resp. $\hat W$ be the departure processes of the second node under R resp. R∗. Then $\hat W \le_p W$.

Proof. The proof easily follows from theorems 2.1 and 2.2. As depicted in figure 2, let $V$ resp. $\tilde V$ denote the departure processes of the first node under policy R resp. R∗. The departure process of the second node under the policy (SQP, R2) is denoted by $\tilde W$.

[Figure 2. The three two-node systems compared in the proof, all with arrival process T: policy (R1, R2) with departure processes V and W; policy (SQP, R2) with departure processes Ṽ and W̃; policy (SQP, SQP) with departure processes Ṽ and Ŵ.]

From theorem 2.1 we have $\tilde V \le_p V$. Hence by theorem 2.2 $\tilde W \le_p W$. Theorem 2.1 also gives $\hat W \le_p \tilde W$. Combining the last two inequalities yields $\hat W \le_p W$.

Remark 2.6. It is straightforward to generalize the result of theorem 2.5 to a network of centres in tandem. The proof goes by induction on the number of nodes. Suppose it is true for k nodes. Assume $V$ resp. $\tilde V$ are the departure processes of the kth node when using (R1, . . . , Rk) with Ri static for 1 ≤ i ≤ k, resp.
SQP in each node. Then by the induction hypothesis $\tilde V \le_p V$ and we can again use the same arguments as in the proof of theorem 2.5.

3. One node: Pathwise optimality of the SQP

Consider the model consisting of one centre. We prove that the sample paths under the SQP and an arbitrary policy R can be coupled such that with probability one the number of customers present is smaller under SQP at any time point. In Hordijk & Koole [11] we used the following general arrival process.

Definition 3.1. Arrival process. Let Λ be the (possibly countable) state space of a Markov process with transition rates $\lambda_{xy}$, x, y ∈ Λ. When this process moves from x to y a customer arrives with probability $q_{xy}$, $\sum_{y \in \Lambda} q_{xy} \le 1$ ∀x ∈ Λ.

It can be shown that any arrival process can be approximated arbitrarily closely by this type of Markovian process. In this approximation we may assume without loss of generality that $\sum_y \lambda_{xy} \le M$ ∀x.

Now consider two Markov processes $X(t)$ and $\hat X(t)$, corresponding to the models with policies SQP and R respectively. We assume that they have the same arrival process, which satisfies definition 3.1. The state of these processes has three components: x for the state of the arrival process, and i1 resp. i2 for the number of customers in queue 1 resp. queue 2. Due to the boundedness of the transition rates we can uniformize the Markov processes. Doing so we can analyse the Markov processes through the embedded Markov chains, since all time intervals between transitions, which may be dummy transitions, are now exponentially distributed with mean $\frac{1}{M + 2\mu}$. In the sequel we refer to the model with policy R resp. policy SQP by using resp. not using the symbol ˆ. With (ˆ) we indicate that the relation holds for both models. The Markov chains also have state space (x, i1, i2), with x the state of the arrival process and i1, i2 the numbers in queue 1 resp. 2.
The transition probabilities are:

\[
\begin{aligned}
\overset{(\wedge)}{r}_{(x,i_1,i_2)(x,i_1-1,i_2)} &= p && \text{if } i_1 > 0, \\
\overset{(\wedge)}{r}_{(x,i_1,i_2)(x,i_1,i_2-1)} &= p && \text{if } i_2 > 0, \\
\overset{(\wedge)}{r}_{(x,i_1,i_2)(y,i_1+1,i_2)} &= p_{xy}\, q_{xy} && \text{if SQP, R assigns to queue 1 in state } (i_1, i_2), \\
\overset{(\wedge)}{r}_{(x,i_1,i_2)(y,i_1,i_2+1)} &= p_{xy}\, q_{xy} && \text{if SQP, R assigns to queue 2 in state } (i_1, i_2), \\
\overset{(\wedge)}{r}_{(x,i_1,i_2)(y,i_1,i_2)} &= p_{xy}\,(1 - q_{xy}), \\
\overset{(\wedge)}{r}_{(x,i_1,i_2)(x,i_1,i_2)} &= 1 - 2p - \sum_{y} p_{xy},
\end{aligned}
\]

where $p = \frac{\mu}{M + 2\mu}$ and $p_{xy} = \frac{\lambda_{xy}}{M + 2\mu}$. All other transitions have probability 0.

Now we define two Markov processes $Y(t)$ resp. $\hat Y(t)$ corresponding to the policies SQP resp. R, with state space equal to that of $X(t)$ resp. $\hat X(t)$. Let $S_n$ ($\hat S_n$) be the times between the transitions. All $S_n$ are independently exponentially distributed with parameter M + 2µ, and similarly for $\hat S_n$. The transitions are generated through $U_n$ ($\hat U_n$), which are independent random variables with uniform distribution on [0, 1]. When at the nth transition time of the $Y(t)$ process the state is (x, i1, i2), then a customer leaves queue 1 resp. 2 when $U_n \in (0, p]$ resp. $(p, 2p]$. If that queue is empty a dummy transition occurs. An arriving customer is assigned to a queue according to the SQP. The arrival process changes from x to y and a customer arrives if

\[
U_n \in \Big(2p + \sum_{z<y} p_{xz},\ 2p + \sum_{z<y} p_{xz} + p_{xy} q_{xy}\Big],
\]

and the arrival process changes from x to y and no customer arrives if

\[
U_n \in \Big(2p + \sum_{z<y} p_{xz} + p_{xy} q_{xy},\ 2p + \sum_{z \le y} p_{xz}\Big].
\]

A dummy transition occurs if $U_n \notin (0, 2p + \sum_z p_{xz}]$. The process under policy R is defined in a similar way with the $\hat U_n$ variables. It is easy to see that the transition probabilities generated by the $U_n$ ($\hat U_n$) are equal to the $r$ ($\hat r$) defined above. Consequently, the stochastic processes $X(t)$, t ≥ 0 and $Y(t)$, t ≥ 0 have the same distribution, and similarly for $\hat X(t)$, t ≥ 0 and $\hat Y(t)$, t ≥ 0.

The r.v. $S_n$ ($\hat S_n$) and $U_n$ ($\hat U_n$) completely describe the evolution of the processes. Now we couple both processes by relating $S_n$ to $\hat S_n$ and $U_n$ to $\hat U_n$. We take $S_n = \hat S_n$, which means that events in both models take place at the same time epochs. Thus we can concentrate on the embedded Markov chains governed by the $U_n$. The relation between $U_n$ and $\hat U_n$ depends on the numbers in the queues, i.e. on $(i_1, i_2)$ and $(\hat\imath_1, \hat\imath_2)$. We take $\hat U_n = U_n$ if $\hat\imath_1 \le \hat\imath_2$ and $i_1 \le i_2$, or $\hat\imath_1 \ge \hat\imath_2$ and $i_1 \ge i_2$. If the longest queue in the SQP-model does not have the same index as the longest in the R-model, thus if $\hat\imath_1 < \hat\imath_2$ and $i_1 > i_2$, or $\hat\imath_1 > \hat\imath_2$ and $i_1 < i_2$, we take $\hat U_n = U_n$ if $U_n \in (2p, 1]$, $\hat U_n = U_n + p$ if $U_n \in (0, p]$ and $\hat U_n = U_n - p$ if $U_n \in (p, 2p]$. Note that the coupling of $U_n$ to $\hat U_n$ and $S_n$ to $\hat S_n$ does not invalidate our assumptions about their distributions. In our coupling, the transitions in the arrival processes of $Y(t)$ and $\hat Y(t)$ are equal. If in both models the same queue is longest then service completions occur in the same queue. Otherwise, the service of the first queue in the $Y(t)$ process is coupled to the service of the second queue in the $\hat Y(t)$ process.

Let $N(n)$ ($\hat N(n)$) denote the embedded Markov chain of $Y(t)$ ($\hat Y(t)$). We will compare the sample paths of $N(n)$ and $\hat N(n)$. Let ω = (u1, u2, . . .) be a sample path of ($U_n$, n > 0). Then $N(n)(\omega)$ and $\hat N(n)(\omega)$ give the sample paths of the embedded Markov chains we compare. We write $N(n)(\omega) = (x_n, i_{1,n}, i_{2,n})$ and $\hat N(n)(\omega) = (\hat x_n, \hat\imath_{1,n}, \hat\imath_{2,n})$.

Lemma 3.2. If

  (ia)  $i_{1,n} + i_{2,n} = \hat\imath_{1,n} + \hat\imath_{2,n}$
  (ib)  $|i_{1,n} - i_{2,n}| \le |\hat\imath_{1,n} - \hat\imath_{2,n}|$
  (ic)  $x_n = \hat x_n$

or

  (iia)  $i_{1,n} + i_{2,n} < \hat\imath_{1,n} + \hat\imath_{2,n}$
  (iib)  $|i_{1,n} - i_{2,n}| - 1 \le |\hat\imath_{1,n} - \hat\imath_{2,n}|$
  (iic)  $x_n = \hat x_n$

holds for n = 0, then for any n > 0 one of these sets of relations is true.

The relations (ia) and (ib) say that under the SQP the numbers of customers are more equally divided over both queues if under the SQP and R the numbers of customers are equal. The relations (iia) and (iib) are concerned with the case that there are fewer customers under the SQP.
In this case there is a similar expression on the balance of customers, except that now the SQP may be slightly more unbalanced. The logic behind this can be seen when regarding states with $i_{1,n} + i_{2,n} = \hat\imath_{1,n} + \hat\imath_{2,n} - 1$ and $\hat\imath_{1,n} = \hat\imath_{2,n}$. Now (ib) cannot hold, but (iib) can. Below we prove that these relations are indeed valid.

Proof of lemma 3.2. By induction. Suppose the lemma holds for 1, . . . , n. We split the proof for n + 1 and we consider the different cases depending on the value of $u_{n+1}$ and whether (i) or (ii) holds for n. From the relation between $u_n$ and $\hat u_n$ it is clear that $x_{n+1} = \hat x_{n+1}$. Therefore we can concentrate on the numbers of customers.

Case $2p < u_{n+1} \le 1$. Because a customer does or does not arrive in both models, it is sufficient to show (ib) for n + 1 if (i) holds for n. Similarly in the case that (ii) holds for n.

  (i) holds for n. If $|i_{1,n} - i_{2,n}| = 0$, then $|i_{1,n+1} - i_{2,n+1}| = 1$. At n + 1 the total number of customers is odd, thus $|\hat\imath_{1,n+1} - \hat\imath_{2,n+1}| \ge 1$. If $|i_{1,n} - i_{2,n}| > 0$, then $|i_{1,n+1} - i_{2,n+1}| = |i_{1,n} - i_{2,n}| - 1$ because of the SQP. $|\hat\imath_{1,n} - \hat\imath_{2,n}|$ cannot decrease by more than 1 when 1 customer arrives, thus (ib) remains valid.

  (ii) holds for n. If $|i_{1,n} - i_{2,n}| = 0$, then $|i_{1,n+1} - i_{2,n+1}| - 1 = 0$. Thus (iib) is valid. If $|i_{1,n} - i_{2,n}| > 0$, (iib) is valid by the same reasoning as given above.

Case $0 < u_{n+1} \le p$. Choose $j$ and $\hat\jmath$ such that queue $j$ and queue $\hat\jmath$ are the shorter queues. This means that queue $j$ and queue $\hat\jmath$ are served at the same time. Note that service may be given to an empty queue. If $i_{1,n} = i_{2,n}$ or $\hat\imath_{1,n} = \hat\imath_{2,n}$ holds, choose $j$ or $\hat\jmath$ such that queues $j$ and $\hat\jmath$ are served at the same time. For queue $j$ ($\hat\jmath$) we denote the other queue by $j \pm 1$ ($\hat\jmath \pm 1$).

  (i) holds for n. Consider the case where $u_{n+1}$ is such that the queues $j$ and $\hat\jmath$ are served:

    $\hat\imath_{\hat\jmath,n} = 0$, $i_{j,n} = 0$ ⇒ $N(n+1)(\omega) = N(n)(\omega)$ and $\hat N(n+1)(\omega) = \hat N(n)(\omega)$.

    $\hat\imath_{\hat\jmath,n} = 0$, $i_{j,n} > 0$ ⇒ $i_{1,n+1} + i_{2,n+1} = \hat\imath_{1,n+1} + \hat\imath_{2,n+1} - 1$.
The unbalance $|i_{1,n} - i_{2,n}|$ cannot increase by more than 1, thus (iib) holds also for n + 1.

    $\hat\imath_{\hat\jmath,n} > 0$. Since queue $\hat\jmath$ is the emptier queue we have from (i) that $\hat\imath_{\hat\jmath,n} \le i_{j,n} \le i_{j\pm1,n} \le \hat\imath_{\hat\jmath\pm1,n}$. Thus because $i_{j,n} > 0$ (ia) holds for n + 1. Furthermore we have in both models that the unbalance increases by 1, i.e. $|i_{1,n+1} - i_{2,n+1}| = 1 + |i_{1,n} - i_{2,n}|$ and $|\hat\imath_{1,n+1} - \hat\imath_{2,n+1}| = 1 + |\hat\imath_{1,n} - \hat\imath_{2,n}|$, thus (ib) also holds.

  Let $u_{n+1}$ be such that the longer queues, i.e. $j \pm 1$ and $\hat\jmath \pm 1$, are served.

    $\hat\imath_{\hat\jmath\pm1,n} = 0$. Since $\hat\imath_{\hat\jmath,n} \le i_{j,n} \le i_{j\pm1,n} \le \hat\imath_{\hat\jmath\pm1,n}$, we have $i_{1,n} = i_{2,n} = 0$ ⇒ $N(n+1)(\omega) = N(n)(\omega)$ and $\hat N(n+1)(\omega) = \hat N(n)(\omega)$.

    $\hat\imath_{\hat\jmath\pm1,n} > 0$, $i_{j\pm1,n} = 0$ ⇒ $i_{j,n} = 0$ ⇒ $i_{1,n+1} = i_{2,n+1} = 0$, thus (i) or (ii) holds for n + 1.

    $\hat\imath_{\hat\jmath\pm1,n} > 0$, $i_{j\pm1,n} > 0$. (ia) holds for n + 1. The unbalance decreases by 1 in both models, thus (ib) holds also.

  (ii) holds for n. Consider again the case where $u_{n+1}$ is such that queue $j$ and $\hat\jmath$ are served:

    $\hat\imath_{\hat\jmath,n} = 0$, $i_{j,n} = 0$ ⇒ $N(n+1)(\omega) = N(n)(\omega)$ and $\hat N(n+1)(\omega) = \hat N(n)(\omega)$.

    $\hat\imath_{\hat\jmath,n} = 0$, $i_{j,n} > 0$, (iia) ⇒ $i_{j,n} + i_{j\pm1,n} + 1 \le \hat\imath_{\hat\jmath\pm1,n}$ ⇒ $i_{j\pm1,n} + 2 \le \hat\imath_{\hat\jmath\pm1,n}$. The numbers of customers in the queues $j \pm 1$ and $\hat\jmath \pm 1$ remain unchanged, therefore $|\hat\imath_{\hat\jmath,n+1} - \hat\imath_{\hat\jmath\pm1,n+1}| = \hat\imath_{\hat\jmath\pm1,n} \ge i_{j\pm1,n} + 2 > |i_{j,n+1} - i_{j\pm1,n+1}|$, and (iib) clearly holds. It is easy to see that also (iia) holds for n + 1.

    $\hat\imath_{\hat\jmath,n} > 0$ ⇒ $i_{j,n} > 0$; use similar arguments as for case (i). Thus (iia) holds for n + 1. The unbalance increases by 1 for both models, so (iib) holds also.

  If $u_{n+1}$ is such that service is given to the longer queues, we get:

    $\hat\imath_{\hat\jmath\pm1,n} = 0$ ⇒ $\hat\imath_{\hat\jmath,n} = 0$, so that (iia) cannot hold.

    $\hat\imath_{\hat\jmath\pm1,n} > 0$, $i_{j\pm1,n} = 0$ ⇒ $i_{1,n+1} = i_{2,n+1} = 0$, thus (i) or (ii) holds for n + 1.

    $\hat\imath_{\hat\jmath\pm1,n} > 0$, $i_{j\pm1,n} > 0$. Because in both processes a customer leaves (iia) remains valid. If $|i_{j,n} - i_{j\pm1,n}| > 0$, then $|i_{j,n+1} - i_{j\pm1,n+1}| = |i_{j,n} - i_{j\pm1,n}| - 1$, which means that (iib) holds. On the other hand, if $|i_{j,n} - i_{j\pm1,n}| = 0$, then $|i_{j,n+1} - i_{j\pm1,n+1}| - 1 = 0$, which establishes (iib) also.

Case $p < u_{n+1} \le 2p$. Analogously to $0 < u_{n+1} \le p$.

Lemma 3.2 leads to the main result of this section.

Theorem 3.3.
Consider two parallel servers with the same service rate. There are versions $\hat K(t)$ and $K(t)$ of the stochastic processes of total numbers of customers under policy R resp. SQP such that $K(t) \le \hat K(t)$ for all t.

Proof. If t is between the nth and (n + 1)th transition epoch then $K(t) = i_{1,n} + i_{2,n}$ and $\hat K(t) = \hat\imath_{1,n} + \hat\imath_{2,n}$. Hence the proof follows from the relations (ia) and (iia) of lemma 3.2.

Remark 3.4. Since the arrivals for the $K(t)$ and $\hat K(t)$ processes are at the same time instants, the departures under the SQP must be pathwise earlier than under policy R. This proves theorem 2.1.

Remark 3.5. It is straightforward to check that lemma 3.2 remains true in case each queue has a finite buffer size, say B1 and B2. In this more general case only policies are allowed which accept customers if not both buffers are full. This leads to the assertion that the SQP (note that in Hordijk & Koole [11] we named this policy generalized SQP) admits more customers to the system than other policies. Again using the (ia) and (iia) relations of the extended lemma 3.2 gives that theorem 2.1 is also true for finite buffers.

4. Full information: SQP is not optimal

We consider again the tandem parallel queue as shown in figure 1 and we denote by i1 and i2 the numbers of customers in queue 1 and 2 of centre 1. Let j1, j2 be the numbers of customers in the queues of centre 2. As initial state we take $i^0 = (1, 0)$, $j^0 = (5, 5)$. Suppose we have to decide whether to route an arriving customer at the first centre to queue 2, or to queue 1. Assume that there are no future arrivals. Take µ = 1. We calculate, under both actions, the expected time until 12 customers have left, which is the time until the system becomes empty. Let us denote this expected time, if we start with (i1, i2) resp. (j1, j2) in the first resp. second node, by $t\begin{pmatrix} i_1 & j_1 \\ i_2 & j_2 \end{pmatrix}$.
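A minimal sketch of how these expected emptying times might be computed, using exact rational arithmetic. It rests on assumptions stated here rather than on code from the paper: µ = 1, no further arrivals, and first-step conditioning on the next service completion (with k busy servers the next completion takes expected time 1/k and is equally likely at each busy server). Customers leaving centre 1 join the shorter queue of centre 2, with ties going to queue 1. The function name `expected_emptying_time` and its argument order are choices made for this illustration only.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def expected_emptying_time(i1, i2, j1, j2):
    """Expected time until the system is empty, starting with (i1, i2)
    customers in centre 1 and (j1, j2) in centre 2 (mu = 1, no arrivals)."""
    busy = sum(1 for n in (i1, i2, j1, j2) if n > 0)
    if busy == 0:
        return Fraction(0)
    # State of centre 2 after a customer from centre 1 joins its shorter queue.
    if j1 <= j2:
        joined = (j1 + 1, j2)
    else:
        joined = (j1, j2 + 1)
    total = Fraction(1)
    if i1 > 0:
        total += expected_emptying_time(i1 - 1, i2, *joined)
    if i2 > 0:
        total += expected_emptying_time(i1, i2 - 1, *joined)
    if j1 > 0:
        total += expected_emptying_time(i1, i2, j1 - 1, j2)
    if j2 > 0:
        total += expected_emptying_time(i1, i2, j1, j2 - 1)
    return total / busy
```

Calling `expected_emptying_time(2, 0, 5, 5)` and `expected_emptying_time(1, 1, 5, 5)` evaluates the two routing options compared in this section: sending the arriving customer to the occupied queue 1 or to the empty queue 2 of centre 1. Small cases can be checked by hand; for instance, a single customer in centre 1 needs two exponential services of mean 1, so its expected emptying time is 2.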
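These numbers can be calculated with the recursive formulae

\[
t\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} = 0,
\]

\[
t\begin{pmatrix} i_1 & j_1 \\ i_2 & j_2 \end{pmatrix} =
\frac{1 + \delta(i_1)\, t\begin{pmatrix} i_1 - 1 & j_1 + 1 \\ i_2 & j_2 \end{pmatrix}
+ \delta(i_2)\, t\begin{pmatrix} i_1 & j_1 + 1 \\ i_2 - 1 & j_2 \end{pmatrix}
+ \delta(j_1)\, t\begin{pmatrix} i_1 & j_1 - 1 \\ i_2 & j_2 \end{pmatrix}
+ \delta(j_2)\, t\begin{pmatrix} i_1 & j_1 \\ i_2 & j_2 - 1 \end{pmatrix}}
{\delta(i_1) + \delta(i_2) + \delta(j_1) + \delta(j_2)},
\qquad j_1 \le j_2,
\]

\[
t\begin{pmatrix} i_1 & j_1 \\ i_2 & j_2 \end{pmatrix} =
\frac{1 + \delta(i_1)\, t\begin{pmatrix} i_1 - 1 & j_1 \\ i_2 & j_2 + 1 \end{pmatrix}
+ \delta(i_2)\, t\begin{pmatrix} i_1 & j_1 \\ i_2 - 1 & j_2 + 1 \end{pmatrix}
+ \delta(j_1)\, t\begin{pmatrix} i_1 & j_1 - 1 \\ i_2 & j_2 \end{pmatrix}
+ \delta(j_2)\, t\begin{pmatrix} i_1 & j_1 \\ i_2 & j_2 - 1 \end{pmatrix}}
{\delta(i_1) + \delta(i_2) + \delta(j_1) + \delta(j_2)},
\qquad j_1 > j_2,
\]

where δ(i) equals 1 if i > 0 and 0 otherwise. We found that

\[
t\begin{pmatrix} 2 & 5 \\ 0 & 5 \end{pmatrix} = \frac{659164549}{90699264} \approx 7.26758 < 7.27133 \approx \frac{37518487069}{5159780352} = t\begin{pmatrix} 1 & 5 \\ 1 & 5 \end{pmatrix}
\]

and this means that the SQP does not stochastically maximize the number of departures. For if it did, the time until 12 customers have departed would be stochastically smaller under the SQP, and so would its expectation.

This result also holds for Poisson arrivals with λ small enough. To see this, let $T\begin{pmatrix} i_1 & j_1 \\ i_2 & j_2 \end{pmatrix}$ be the time until i1 + i2 + j1 + j2 customers have left when there are no arrivals and denote its distribution by $F\begin{pmatrix} i_1 & j_1 \\ i_2 & j_2 \end{pmatrix}$. $T^A\begin{pmatrix} i_1 & j_1 \\ i_2 & j_2 \end{pmatrix}$ is similarly defined but includes Poisson(λ) arrivals. Let $\tilde T$ be exponentially distributed with mean $\frac{1}{\lambda}$. To lighten the notation write $T_{\mathrm{SQ}} = T\begin{pmatrix} 1 & 5 \\ 1 & 5 \end{pmatrix}$ with distribution $F_{\mathrm{SQ}}$, and $T_{\mathrm{LQ}} = T\begin{pmatrix} 2 & 5 \\ 0 & 5 \end{pmatrix}$, with $T^A_{\mathrm{SQ}}$ and $T^A_{\mathrm{LQ}}$ the corresponding times with arrivals. Because $T_{\mathrm{SQ}}$ is a continuous r.v. with finite expectation we can take $\tilde t$ such that $0.0001 = \int_{\tilde t}^{\infty} (t - \tilde t)\, dF_{\mathrm{SQ}}(t)$. Let λ0 be such that $\mathbb{P}(\tilde T \le \tilde t) = 0.0001$ when λ ≤ λ0. Define $\hat T$: $\mathbb{P}(\hat T = 0) = 1 - \mathbb{P}(\hat T = \tilde t) = 0.0001$. Then

\[
\begin{aligned}
\mathbb{E}\, T^A_{\mathrm{LQ}} \le \mathbb{E}\, T_{\mathrm{LQ}} &< \mathbb{E}\, T_{\mathrm{SQ}} - 0.003 < 7.27134 - 0.003 = 7.26834 \\
&< (1 - 0.0001)(7.27132 - 0.0001) < (1 - 0.0001)(\mathbb{E}\, T_{\mathrm{SQ}} - 0.0001) \\
&= (1 - 0.0001)\Big(\int_0^{\tilde t} t\, dF_{\mathrm{SQ}}(t) + \int_{\tilde t}^{\infty} \tilde t\, dF_{\mathrm{SQ}}(t)\Big) = (1 - 0.0001)\, \mathbb{E} \min(T_{\mathrm{SQ}}, \tilde t) \\
&= \mathbb{E} \min(T_{\mathrm{SQ}}, \hat T) \le \mathbb{E} \min(T_{\mathrm{SQ}}, \tilde T) \le \mathbb{E}\, T^A_{\mathrm{SQ}}.
\end{aligned}
\]

From this we conclude that for λ ≤ λ0 the SQP does not minimize the expected time until 12 customers have departed. Hence it cannot stochastically minimize the departure times.

5.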
Full information: On the optimal policy

To study the optimal policy (OP) for more realistic values of λ than considered in the last section we did various numerical calculations on the two centre model. We fixed the service rate µ to 1 and varied the arrival rate λ and the buffer size B of the queues. We computed the OP for the discounted and the average reward criterion. Our results are summarized in the two tables below.

First we consider discounted rewards. In each state we took as reward rate the departure rate of the system. We found that the SQP is not optimal for values of λ lower than 1 and the discount factor α bigger than 0.25. Taking bigger buffer sizes does not change the OP on existing states. But the number of states in which the OP deviates from the SQP increases with the buffer size (as the number of states increases with the buffer size). The results with B = 20 are shown in table 1. For each combination of α and λ the table contains the maximum relative difference between the OP and the SQP. These numbers are calculated with the formula

\[
\max_i \frac{v_i^{\alpha} - v_i^{\alpha}(\mathrm{SQP})}{v_i^{\alpha}},
\]

where $v_i^{\alpha}$ resp. $v_i^{\alpha}(\mathrm{SQP})$ is the reward under the OP resp. SQP and the maximization is taken over all possible states. An entry '-' means that the OP is equal to the SQP. Note that the SQP is nearly optimal. Moreover, although the number of states in which the SQP deviates from the OP can be large in absolute terms (e.g. 742 in the model with λ = 1 and α = 0.5), it is small in comparison with the total number of states, which is equal to 21^4.

    λ       α = .1            α = .25          α = .5           α = .75
    .1      4.93 · 10^−12     2.35 · 10^−8     3.15 · 10^−6     4.48 · 10^−5
    .25     5.35 · 10^−12     3.08 · 10^−8     4.94 · 10^−6     7.08 · 10^−5
    .5      1.60 · 10^−12     1.72 · 10^−8     3.69 · 10^−6     5.43 · 10^−5
    1       1.09 · 10^−14     1.74 · 10^−8     3.83 · 10^−7     5.66 · 10^−6
    2       -
    3       -

Table 1. Discounted rewards

In the discounted case, there is a time preference. If a customer leaves earlier, its discounted reward is larger. When considering average rewards, there is no time preference.
The average number of customers leaving the system per unit time is the average expected reward. This means that the optimal policy is the one which minimizes the average number of blocked customers per unit of time. This policy appears to be the SQP. A more selective cost function is the number of customers in the system. By Little's theorem, the policy which minimizes the average number of customers is also the policy which minimizes the average time that a customer spends in the system, provided that the number of accepted customers per unit time stays constant. Again we compared, for B = 20, the OP and the SQP. The relative difference between their average costs can be found in table 2. Once again the SQP is nearly optimal, but the number of states in which the OP does not follow the SQP can be large.

λ                     .1               .25              .5               1                2                3
relative difference   2.38 · 10^{-7}   1.84 · 10^{-7}   5.34 · 10^{-8}   3.15 · 10^{-8}   4.45 · 10^{-3}   3.87 · 10^{-5}

Table 2. Average rewards

6. Partial Information: Numerical evidence on the optimality of the SQP

To describe the state of the exponential tandem parallel queue we need the vectors $(i_1, i_2)$ and $(j_1, j_2)$, the numbers of customers in the queues of the first and of the second node, respectively. Stationary policies in the corresponding Markov decision problem are then functions of both vectors. In the partial information model we only allow policies $R = (R_1, R_2)$, where $R_1$ is the policy in node 1 and depends only on $(i_1, i_2)$, and $R_2$ is the policy in node 2 and depends only on $(j_1, j_2)$. Standard algorithms such as successive approximations, policy improvement and linear programming cannot be used to solve this problem. In Kulkarni & Serin [14] an algorithm is derived which finds a local optimum or a saddle point in a restricted class of policies, which they call implementable policies. Loeve & Pols [16] applied their algorithm to our problem. They solved problems with buffer sizes in both nodes going up to 30 and approximately $25 \cdot 10^4$ states. Two types of Sun workstations were used.
Clearly a lot of swapping between main memory and disk had to be done. However, the system time, which includes the swap time, never exceeded 1.77% of the total computing time. Depending on the starting policy this computing time could be long. Table 3, taken from Loeve & Pols [16], summarizes our experience. In our opinion it is encouraging that problems of these sizes can nowadays be solved on desktop workstations. In all problem instances we found that the SQP was a locally optimal policy. For a subclass we used a search technique to check whether the local optimum was also a global optimum. In all cases it turned out to be so. The problems analysed have arrival rate λ = 4, service rate µ = 5 and continuous discount rate equal to 12. Although we did not vary these parameters, we believe that the results give substantial evidence for the optimality of the SQP.

starting policy     (N_A, N_B)   computer time   system time (% of computer time)
SQP                 (30,30)      18 h.           1.77
SQP                 (25,25)      7 h.            0.83
SQP                 (20,20)      3 h.            0.64
SQP                 (15,15)      1.5 h.          0.03
SQP                 (10,10)      11 min.         0.83
SQP                 (8,8)        7 min.          0.95
all to queue 1      (25,25)      102 h.*         1.41
all to queue 1      (20,20)      138.5 h.        0.03
all to queue 1      (15,15)      43 h.           0.15
all to queue 1      (10,10)      3 h.            0.03
all to queue 1      (8,8)        1 h.            0.50
Bernoulli policy    (20,20)      56 h.           0.05
Bernoulli policy    (15,15)      84.5 h.*        1.18
Bernoulli policy    (10,10)      6 h.            0.04
Bernoulli policy    (8,8)        1.75 h.         0.49
random policy       (10,10)      19.5 h.         0.44
random policy       (8,8)        9 h.            0.21
random policy       (6,4)        1.75 h.         0.02

* calculated on a faster computer

Table 3. Partial Information

Acknowledgement

This paper was written while the first author was on sabbatical leave at the Department of Operations Research of the University of North Carolina at Chapel Hill. The hospitality of, and the stimulating discussions with, V.G. Kulkarni and S. Stidham, Jr. are gratefully acknowledged. The authors thank Anneke Loeve and Mandy Pols for the use of their analysis of the SQP under partial information.

References

[1] E.F. Abdel-Gawad (1984). Optimal control of arrivals and routing in a network of queues. Ph.D. thesis, N.C. State University, Raleigh.
[2] I.J.B.F. Adan, J.
Wessels & W.H.M. Zijm (1989). Analysis of the symmetric shortest queue problem. Stochastic Models 6: 691–713.
[3] F.J. Beutler & D. Teneketzis (1989). Routing in queueing networks under imperfect information: stochastic dominance and thresholds. Stochastics and Stochastic Reports 26: 81–100.
[4] B.W. Conolly (1984). The autostrada queueing problem. Journal of Applied Probability 21: 394–403.
[5] E. Davis (1977). Optimal control of arrivals to a two-server queueing system with separate queues. Ph.D. thesis, N.C. State University, Raleigh.
[6] A. Ephremides, P. Varaiya & J. Walrand (1980). A simple dynamic routing problem. IEEE Transactions on Automatic Control 25: 690–693.
[7] B. Hajek (1984). Optimal control of two interacting service stations. IEEE Transactions on Automatic Control 29: 491–499.
[8] S. Halfin (1985). The shortest queue problem. Journal of Applied Probability 22: 865–878.
[9] R. Hariharan, V.G. Kulkarni & S. Stidham, Jr. (1990). A survey of research relevant to virtual-circuit routing in telecommunication networks. Technical report UNC/OR/TR90-13, University of N.C. at Chapel Hill.
[10] R. Hariharan, V.G. Kulkarni & S. Stidham, Jr. (1990). Optimal control of two parallel infinite-server queues. Technical report UNC/OR/TR90-19, University of N.C. at Chapel Hill.
[11] A. Hordijk & G. Koole (1990). On the optimality of the generalized shortest queue policy. Probability in the Engineering and Informational Sciences 4: 477–487.
[12] D.J. Houck (1987). Comparison of policies for routing customers to parallel queueing systems. Operations Research 35: 306–310.
[13] P.K. Johri (1989). Optimality of the shortest line discipline with state-dependent service times. European Journal of Operations Research 41: 157–161.
[14] V.G. Kulkarni & Y. Serin (1990). Optimal implementable policies: discounted cost case. Working paper.
[16] A. Loeve & M. Pols (1990). Optimale toewijzing van klanten in wachtrijmodellen met twee bedieningscentra.
Master's thesis, University of Leiden.
[17] R. Menich & R.F. Serfozo (1991). Monotonicity and optimality of symmetric parallel processing systems. Queueing Systems 9: 403–418.
[18] D. Towsley, P.D. Sparaggis & C.G. Cassandras (1992). Optimal routing and buffer allocation for a class of finite capacity queueing systems. IEEE Transactions on Automatic Control 37: 1446–1451.
[19] J. Walrand (1988). An Introduction to Queueing Networks. Prentice-Hall, Englewood Cliffs.
[22] W. Winston (1977). Optimality of the shortest line discipline. Journal of Applied Probability 14: 181–189.