Albert-Ludwigs-Universität, Inst. für Informatik Prof. Dr. Fabian Kuhn

Transcription

Albert-Ludwigs-Universität, Inst. für Informatik Prof. Dr. Fabian Kuhn
Albert-Ludwigs-Universität, Inst. für Informatik
Prof. Dr. Fabian Kuhn
S. Daum, A. Ghodselahi, O. Saukh
December 3, 2013
Algorithm Theory, Winter Term 2013/14
Sample Solution 3
Exercise 1: Sub-Sequences
(a) Consider a sequence A = a1 , a2 , . . . , am of characters. A sub-sequence S of A is a sequence
that can be obtained by deleting some characters in A. Hence, for example, a, r, e, w, a, a
is a sub-sequence of b, a, a, r, e, z, w, a, b, a. Given two sequences A of length m and B of
length n, describe an O(m · n) time dynamic programming algorithm to compute the length
of a longest common sub-sequence of A and B! To describe the algorithm, specify the subproblems you need to solve (and store in a table), the initialization, as well as the recursive
rule to compute the sub-problems.
(b) Apply your algorithm to the inputs A = a, e, a, a, g, s, s, t, and B = b, a, e, b, s, a, g, t by
setting up and filling out the table you need. Explain how you get the length of the longest
common sub-sequence from your table. Also show how to compute a longest common subsequence of A and B.
(c) For two character sequences A and B, a super sequence C is a character sequence such that
both A and B are sub-sequences of C. Give an efficient algorithm for finding the shortest
common super-sequence of two strings A and B.
Hint: Try to find a connection between the longest common sub-sequence of A and B and
the shortest common super-sequence of A and B.
Solution:
(a) Assume that the sequences A and B are in arrays A[1, . . . , m] and B[1, . . . , n], respectively.
Further, for integers i ∈ {1, . . . , m} and j ∈ {1, . . . , n}, let A[1, . . . , i] and B[1, . . . , j] be
the substrings formed by the first i characters of A and the first j characters of B, respectively.
We introduce LCS [i, j] to denote the length of a longest common sub-sequence of A[1, . . . , i]
and B[1, . . . , j]. Hence, LCS [i, j] is the length of the longest common sub-sequence of
A and B. We can define LCS [i, j] recursively as follows. For i = 0 and j = 0 (if we
consider two empty strings), LCS [i, j] = LCS [0, 0] is 0. We clearly more generally have
LCS [i, 0] = LCS [0, j] = 0 for all i and j. Otherwise, we need to consider the following
cases:
(1) Both A[1, . . . , i] and B[1, . . . , j] end in the same character (i.e., A[i] = B[j]). In that
case, the longest common sub-sequence of A[1, . . . , i] and B[1, . . . , j] ends in A[i] =
B[j] and therefore,
∀i, j > 0 : A[i] = B[j] ⇒ LCS[i, j] = LCS[i − 1, j − 1] + 1.
1
(2) If A[i] 6= B[j], a common sub-sequence can end in at most one of the two (different)
characters A[i] and B[j] and thus,
∀i, j > 0 : A[i] 6= B[j] ⇒ LCS[i, j] = max{LCS[i, j − 1], LCS[i − 1, j]}.
To compute LCS [m, n], we set up an (n + 1) × (m + 1)-matrix, where LCS [i, j] is stored
in column i and row j. For each matrix element, we only need to consider at most 2 other
elements and we can therefore fill the whole matrix in time O(mn). The details are given in
Algorithm 1.
Input: Two sequences A[1, · · · , n] and B[1, · · · , m]
Output: The matrix LCS with m + 1 rows and n + 1 columns such that LCS[m, n]
returns the length of the longest common sub-sequence of A and B
for i ← 1 to n do
LCS[0, i] = 0;
end
for i ← 1 to m do
LCS[i, 0] = 0;
end
for i ← 1 to m do
for j ← 1 to n do
if A[j] = B[i] then
LCS[i, j] = LCS[i − 1, j − 1] + 1;
else
LCS[i, j] = max{LCS[i − 1, j], LCS[i, j − 1]};
end
end
end
return LCS[m, n];
Algorithm 1: Algorithm for the length of a longest common sub-sequence for A and B
(b) As shown in Figure 1, the first column and the first row are filled with zeroes, because
when an empty sequence is compared with a non-empty sequence, the length of the longest
common sub-sequence is zero.
Generally for all 1 ≤ i, j ≤ 8, if A[i] = B[j], LCS[i, j] is computed as LCS[i − 1, j − 1] + 1.
In Figure 1 this is indicated by a diagonal arrow &. If A[i] 6= B[j] and if LCS[i − 1, j] >
LCS[i, j − 1] (resp. LCS[i − 1, j] < LCS[i, j − 1]), the value LCS[i, j] is determined as
LCS[i−1, j] (resp. LCS[i, j−1]), which is depicted by a vertical arrow ↓ (resp. by a horizontal
arrow →). If A[i] 6= B[j] and LCS[i − 1, j] = LCS[i, j − 1], both values LCS[i − 1, j] and
LCS[i, j − 1] could be used to determine LCS[i, j] and therefore both arrows are added in
Figure 1. The length of the longest common sub-sequence of the two strings is LCS[8, 8] = 5.
To find the longest common sub-sequence of A and B, we just traverse the arrows backwards
through matrix starting in cell LCS[8, 8]. If there is a diagonal arrow pointing to the current
cell (the green dashed arrows in Figure 1), one move diagonally the the upper left neighbor
and returns the corresponding character. Otherwise, there is a vertical or horizontal arrow
(or even both) which point to the current cell. We just move to a cell from which there is
an arrow to the current cell and continue from there. The procedure outputs the characters
2
Ø
Ø 0
a
0
e
0
a
0
a
0
g
0
s
0
s
0
t
0
b 0
0
0
0
0
0
0
0
0
a 0
1
1
1
1
1
1
1
1
e 0
1
2
2
2
2
2
2
2
b 0
1
2
2
2
2
2
2
2
s 0
1
2
2
2
2
3
3
3
a 0
1
2
3
3
3
3
3
3
g 0
1
2
3
3
4
4
4
4
t 0
1
2
3
3
4
4
4
5
Figure 1: LCS Example.
of the LCS in reverse order. In Figure 1, the green arrows indicate these backward moves to
find the longest common sub-sequence.
(c) The shortest common super-sequence (SCS) problem of two sequences A and B is essentially equivalent to the LCS problem of A and B. From a common super-sequence C of
length ` ≥ max {m, n}, we get a common sub-sequence S of length at least m + n − `
as follows. Both sequences A and B appear as sub-sequences of C. Because C has only
length `, the two sub-sequences intersect in at least m + n − ` places and we therefore have
a common sub-sequence of that length. On the other hand, from a common sub-sequence of
length h, we get a common super-sequence of length m + n − h in an obvious way. To find
the shortest common super-sequence (SCS) of A and B, we execute the following steps:
(1) Compute a LCS of A and B by using the algorithm described in Parts (a) and (b).
(2) Move backwards along the (green) arrows as in (b). Whenever, we move on a horizontal
arrow, we add the corresponding character of A, whenever we move on a vertical arrow,
we add the corresponding symbol of B. If we move on a diagonal arrow, A and B have
the same character at the current position (the position from which we move). We then
add this symbol to the common super-sequence. As before, we get the SCS in reverse
order.
Exercise 2: Sharing an Inheritance
Assume that Alice and Bob are two siblings who inherit some goods from their deceased parents.
In order to share the inherited goods in a fair way, Alice and Bob decide about a value vi for each
inherited item i and they would like to share the goods so that both of them get items of the same
total value.
Assume that there are items 1, . . . , n and that each item i has an integer value vi > 0. Give a
dynamic programming algorithm that determines whether the n items can be partitioned into two
3
parts of equal total value. If the items can be partitioned like this, the algorithm should also allow
to compute such a partition. What is the time complexity of your algorithm? What assumptions
do you need to get an algorithm that runs in time polynomial in the total number of items n?
Solution: This problem can be solved by reduction to the knapsack problem. First, some initializations are needed so that the problem can be formulated as a knapsack instance. For each item
1 ≤ i ≤ n, we P
set the weight of item i equal to the corresponding value, i.e. wi := vi . We also
define V := 12 ni=1 wi . Note that to get a fair sharing, V has to be an integer (the total value
of all items has to be even). Otherwise, such a fair sharing is not possible. Then, we solve the
knapsack problem for n items and capacity V . If the total value of the optimal knapsack solution
is less than V , there is no fair way to share the items. Otherwise, the optimal knapsack solution
has value V , it corresponds to one side of the partition into two equal parts.
Input: A group of items with their values
Output: The possibility of sharing the items into two sets with equal total values
for i ← 1 to n do
wi := vi ;
end
P
V := 12 ni=1 wi ;
if V ∈
/ N then
return No
end
if Knapsack(n,V) < V then
return No
else
return Yes
end
Algorithm 2: Sharing an Inheritance
All the computations take O(n) time except for solving the knapsack instance which takes time
O(nV ) as we saw in the lecture. To get a running time that is polynomial in the number of items
n, we have to assume that all values are integers that are polynomially bounded in n.
Exercise 3: Stacking Boxes
We are given a set of n rectangular 3-D boxes, where the ith box has length `i , width wi , and
height hi , and where all three dimensions are positive integers. We further assume that for each
box i, `i ≥ wi . The objective is to create a stack of boxes which has a total volume that is as
large as possible, but one is only allowed to stack a box on top of another box if the dimensions
of the 2-D base of the lower box are each strictly larger than those of the 2-D base of the higher
box. Hence, it is only possible to put box j on top of box i if `i > `j and wi > wj . See Figure 2
as an illustration.
(a) As a first attempt, we might try to use a greedy algorithm. Because when stacking boxes on
top of each other, the lengths have to be strictly decreasing, a natural strategy might be to
first sort the boxes by decreasing length such that `1 ≥ `2 ≥ · · · ≥ `n . Then, we go through
the boxes in that order and add a box on the existing stack whenever possible. Show that the
behavior of this greedy algorithm can be arbitrarily bad!
4
(b) Instead of sorting the boxes by length, we could alternatively sort by decreasing area of the
2-D base. That is, we sort the boxes such that `1 w1 ≥ `2 w2 ≥ · · · ≥ `n wn . Show that
now, box j can only be placed on top of box i if j > i and therefore `j wj ≤ `i wi . We
again consider the greedy algorithm, but this time we go through the boxes according to the
new ordering where the boxes are sorted by decreasing base area. We further simplify and
assume that all boxes have height hi = 1. Show that if there is a stack of volume V , the
greedy algorithm finds a stack of volume at least Ω(V 2/3 ).
(c) As a greedy algorithm does not seem to lead to a solution of the problem, let us finally try
another approach. We again consider the general case where the heights of the boxes are
arbitrary integers. Give an efficient algorithm that solves the box stacking problem by using
dynamic programming. (The algorithm should return a maximum volume stack of boxes.)
What is the running time of your algorithm? (as a function of n)
Hint: It might still make sense to go through the boxes in order of decreasing base area.
Remark: This was an exam question in Fall 2013.
𝑤
ℓ𝓁 ≥ 𝑤
ℎ
ℎ
ℓ𝓁 ≥ 𝑤
𝑤
ℎ
𝑤
ℎ
ℓ𝓁 ≥ 𝑤
ℎ
ℓ𝓁 ≥ 𝑤
𝑤
𝑤
ℓ𝓁 ≥ 𝑤
Figure 2: A single box of length `i , width wi , and height hi , as well as a feasible stack of boxes.
Solution:
(a) Here is a simple bad example to show that the suggested greedy algorithm can behave arbitrarily bad. Assume that we have only two boxes, one has length `, width 1, and height 1.
The second box has length ` − 1, width ` − 1, and height h 1. The greedy algorithm only
takes the first box, resulting in a total volume of `. An optimal algorithm takes the second
box and gets a volume of (` − 1)2 · h. By choosing h sufficiently large, the ratio between the
two solutions can be made arbitrarily large.
(b) According to the assumptions, generally for any two boxes i and j, box j can only be placed
on top of box i if `i > `j and wi > wj . Because the boxes are sorted by base area, if j < i,
`j wj ≥ `i wi and therefore either `j ≥ `i or wj ≥ wi . Box j can thus not be place on top of
box i.
Let us now come to the suggested greedy algorithm and recall that the boxes are sorted
by decreasing base area such that `1 w1 ≥ · · · ≥ `n wn . We define Ai := `i wi to be the base
area of box i. Note that from hi = 1, it follows that the volume of box i is also equal to Ai .
The greedy algorithm takes at least box 1 and therefore obtains a stack of volume at least A1 .
Consider an optimal solution and assume that box i is part of that solution. The base area
5
of box i is Ai and thus its volume is Ai ≤ A1 . Further,
√ because all boxes have base area at
. Because widths are assumed to be
most A1 and because wi ≤ `i , for each box i, wi ≤ A1√
integers, any feasible stack therefore has height at most A1 (to stack boxes on top of each
other, the lengths and the widths have to be strictly decreasing). The total volume V of any
√
3/2
feasible stack is therefore at most V ≤ A1 · A1 = A1 . The volume of the greedy solution
therefore is at least A1 ≥ V 2/3 .
(c) To get an efficient dynamic programming algorithm we need to find a recursive definition of
the problem. We follow the hint and assume that the boxes are sorted in non-increasing order
of their base areas (this preparation step takes O(n log n) time). We define V (i) to be the
maximum volume of a feasible stack of boxes where box i is the box on top. We can write
V (i) recursively as follows,
V (i) =
max
j<i:`j >`i ,wj >wi
{V (j)} + `i wi hi .
(1)
The optimal subproblems considered in our computation therefore are the optimal stacks of
boxes if a specific box is required to be on top. Note that in an optimal solution, potentially
every box can be the one at the top. At the end, the optimal solution to the problem can be
determined by taking the maximum over all the values V (i) for i ∈ {1, . . . , n}:
max
{V (i)}.
i∈{1,2,··· ,n}
Overall, The algorithm requires time O(n2 ) because to determine V (i), we potentially have
to check the values of i − 1 ≤ n other values V (j) (for j < i). An optimal stack (rather than
just the volume of an optimal stack) can be obtained by remembering for each box i, on top
of which other box it has to be placed to get an optimal stack with box i as the top box. That
is, for each V (i), we store, which value j < i maximizes the recursive rule in (1).
6