How to allocate review tasks for robust ranking ∗ Dorit S. Hochbaum
How to allocate review tasks for robust ranking∗ Dorit S. Hochbaum† Asaf Levin‡ June 21, 2010 Abstract In the process of reviewing and ranking projects by a group of reviewers, the allocation of the subset of projects to each reviewer has major impact on the robustness of the outcome ranking. We address here this problem where each reviewer is assigned, out of the list of all projects, a subset of up to k projects. Each individual reviewer then ranks and compares all pairs of k projects. The k-allocation problem is to determine an allocation of up to k projects to each reviewer, that lie within the expertise set of the reviewer, so that the resulting union of reviewed projects has certain desirable properties. The k-complete problem is a k-allocation with the property that all pairs of projects have been compared by at least one reviewer. A k-complete allocation is desirable as otherwise there may be projects that were not compared by any reviewer, leading to possible adverse properties in the outcome ranking. When a k-complete allocation cannot be achieved, one might settle for other properties. One basic requirement is that each pair of projects is comparable via a ranking path which is a sequence of pairwise rankings of projects implying a comparison of all pairs on the path. A k-allocation with a ranking path between each pair is the connectivity-k-aloc. Since the robustness of relative comparisons deteriorates with increased length of the ranking path, another goal is that between each pair of projects there will be at least one ranking path that has at most two hops or q hops for fixed values of q. An alternative means for increasing the robustness of the ranking is to use a k-allocation with at least p disjoint ranking paths between each pair. We model all these problems as graph problems. 
We demonstrate that the CONNECTIVITY-k-ALOC problem is polynomially solvable, using matroid intersection; we prove that the k-complete problem is NP-hard unless k = 2; and we provide approximation algorithms for a related optimization problem. All other variants are shown to be NP-complete for all values of k ≥ 2.
Keywords: Approximation algorithms, allocation problem, maximum coverage problem.
∗ An extended abstract version of this paper appeared with the title “The k-allocation problem and its variants” in Proceedings of the Fourth Workshop on Approximation and Online Algorithms (WAOA 2006).
† Research supported in part by NSF award No. DMI-0620677 and CBET-0736232. Department of Industrial Engineering and Operations Research and Walter A. Haas School of Business, University of California, Berkeley. email: [email protected]
‡ Chaya fellow. Faculty of Industrial Engineering and Management, The Technion, 32000 Haifa, Israel. email: [email protected]
1 Introduction
The k-allocation problem arises in the context of a committee of reviewers that is to evaluate and rank a set of projects. The task of evaluating all the projects is an excessive workload for a single reviewer. Each reviewer has a maximum workload of, say, k projects that they can evaluate. Additionally, the reviewers can only evaluate those projects that fall within their area of expertise. The allocation of up to k projects per reviewer within their expertise set is said to be a k-allocation. An example of a scenario where such an allocation takes place is a National Science Foundation panel that is to select the best few among the submitted proposals to fund. The director of the panel usually focuses on the task of ensuring that each proposal is reviewed by at least a minimum number of reviewers (say, at least 3) and that the allocation to each reviewer lies within their area of expertise.
One pitfall of the allocation is that in the review process it is possible, and indeed has happened on occasions, that the two leading ranked projects are not comparable, as no reviewer reviewed both of them. It is thus conceivable that there is a partition of the projects into two disjoint sets, and while both these projects are best in their respective sets, one project can actually lie below the worst ranked project in the set where the other one is best. One way of addressing this pitfall is to allocate the projects so that each pair is also assigned to at least one reviewer. This goal however is not always achievable for a given configuration of reviewers and their expertise sets, as it adds to the total workload. We are then considering here this goal, as well as alternative and weaker goals that still provide a form of comparison between any pair of projects, albeit an indirect comparison. In a general group decision making scenario, the major challenge is to come up with an aggregate ranking that reflects the opinions of all reviewers and is fair and representative. There is a large body of literature that addresses this challenge. This literature is reviewed in Kemeny and Snell [23], Brans and Vincke [4], Bartholdi, Tovey and Trick [3], Keener [22], Fuller and Carlsson [14], and Fernandez and Olemdo [12], and [19]. As explained above, the aggregate ranking may be affected by the assignment of projects to individual reviewers. Still, the allocation of projects and its impact on the quality of the resulting aggregate ranking is an aspect of group decision that is often overlooked. Cook et al. [8] are the only researchers that have explicitly addressed this issue. A convenient formalism for the problems and the associated models is as graph representation. 
The input to the problem is an undirected complete graph $G = (V, E)$ defined on the set of projects $V$ and all possible pairs (edges) $E$, an integer k, and a collection of node subsets representing the expertise set of each of L reviewers, $S_1, \ldots, S_L \subseteq V$. In order to maintain a reasonable and balanced workload, each reviewer j is assigned at most k projects out of the set of possible projects, $S_j$. So a feasible allocation consists, for each $j \in \{1, 2, \ldots, L\}$, of a subset $V_j \subseteq S_j$ such that $|V_j| \le \min\{|S_j|, k\}$. Each reviewer is able to make a direct comparison between each pair of projects he/she reviews. The set of pairs compared by reviewer j forms a clique (a complete subgraph) of size $|V_j|$, denoted $C_{V_j}$ (i.e., $C_{V_j}$ is the edge set of a clique over the node set $V_j$). For a given feasible allocation, the set of covered projects is $\bigcup_{j=1}^{L} V_j$, and the set of compared project pairs is $\bigcup_{j=1}^{L} C_{V_j}$. The properties of the graph of the edges covered by this union of cliques are closely related to the quality of the ranking decision that can be achieved. Let the review graph of covered projects and compared project pairs be $G^R = (V^R, E^R)$ where $V^R = \bigcup_{j=1}^{L} V_j$ and $E^R = \bigcup_{j=1}^{L} C_{V_j}$. The graph $G^R$ is a multigraph; that is, there could be multiple edges between some pairs of nodes (because more than one reviewer reviews this pair). A pair of projects $i, j \in V^R$ is said to be directly compared if edge $[i, j] \in E^R$. For a directly compared pair there is input from at least one reviewer on the extent of preference of one project to the other. This relative rank comparison is typically expressed in an additive form or in a multiplicative form. A detailed discussion on intensity of preferences and the additive versus the multiplicative forms of preferences is provided in [19]. We will use throughout the additive form in which $p_{ij}$ expresses by how much the rank of i exceeds the rank of j.
So $p_{ji} = -p_{ij}$ and the magnitude of $p_{ij}$ is the intensity of the preference of i to j. Each (undirected) edge $[i, j]$ in the graph $G^R$ is formed of a pair of (directed) arcs $(i, j)$, $(j, i)$ with the associated values $p_{ij}$ and $p_{ji}$.
Although some pairs may not be directly compared by a given allocation, we can deduce a relative ranking of two projects i and j if there exists a sequence of directly compared pairs: $[i, i_1], [i_1, i_2], \ldots, [i_{p-1}, j] \in E^R$. Such a sequence corresponds to a path in the graph $G^R$ and the implied ranking of this path is $p_{ij} = \sum_{q=0}^{p-1} p_{i_q, i_{q+1}}$ where $i = i_0$, $j = i_p$. We call this path a ranking path of length p. A direct comparison is then a ranking path of length 1. Since the process of evaluating projects and comparing them is not accurate, an implied ranking by a long path may be impacted by cumulative errors in the comparisons. This effect can be mitigated by the presence of multiple ranking paths between given pairs or by having the ranking paths of bounded length. The presence of multiple ranking paths between all pairs corresponds to increased edge, or node, connectivity of the review graph.
As an illustration of these concepts consider the graph $G^R$ in Figure 1. In this graph we take k = 2, and therefore each reviewer reviews only a pair of projects. The endpoints of each edge form the allocation to one reviewer. The intensities of the preferences are given as $p_{ij}$ for i < j. In this graph the pair of projects 1 and 2 is reviewed by two different reviewers. There are four implied ranking paths between projects 1 and 5 of intensities 2, 4, 0.5 and −2. Among those the value 0.5 is the intensity of a direct comparison.
Figure 1: An illustration review graph $G^R$. The numbers along the edges are the intensity $p_{ij}$ for i < j.
Preliminaries and notations. For a graph H we denote by $n_H$ and $m_H$ the number of nodes and edges, respectively, in H. For an integer ℓ, an ℓ-subset is a subset of ℓ nodes.
A polynomial time algorithm A for a minimization problem (maximization problem) is a ρ-approximation algorithm if it always returns a feasible solution whose objective value is at most (at least) ρ times the optimum.
List of goals and paper outline. The problem of allocating evaluation tasks to reviewers can be cast as the H-graph k-clique cover problem defined as follows. Given a set V, L sets $S_1, \ldots, S_L \subseteq V$, and an integer k, find subsets $V_1, \ldots, V_L$, with $V_i \subseteq S_i$ and $|V_i| \le k$, so that the review graph $G^R = (\bigcup_{i=1}^{L} V_i, \bigcup_{i=1}^{L} C_{V_i})$ has property H. We are interested in the following properties H:
• $G^R$ is connected (or contains a spanning tree), discussed in Section 2.
• $G^R$ is a complete graph, discussed in Section 3.
• $G^R$ is highly connected, where we consider several alternative definitions of this term. This concept and related optimization problems are investigated in Section 4 and Section 5.
• $G^R$ has a path consisting of at most q hops between each pair of nodes, considered in Section 6.
Related Research. The Perron-Frobenius Theorem states the algebraic conditions that guarantee a positive unique solution eigenvector r to the system Ar = λr. This subject is related to aggregate ranking when the matrix A represents the pairwise comparisons between all pairs of objects. Further, each column of the matrix can be viewed as the ranking provided by one reviewer. The eigenvector r, if it exists, is a principal eigenvector, corresponding to the largest eigenvalue. This theorem and the generated principal eigenvector r have been used for decades to generate an aggregate ranking from a matrix of rankings that can be viewed as provided by different reviewers. The condition for the existence of such an eigenvector is that the matrix A is irreducible. (For a statement of the theorem see e.g. [22].) The irreducibility of the matrix is equivalent to the connectivity of $G^R$.
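To make the graph model concrete, the following minimal Python sketch builds the review graph of a given allocation and tests feasibility, connectivity and completeness. The function names and the data layout (a list of Python sets, one per reviewer) are illustrative assumptions, not notation from the paper, and parallel edges are merged, so the code works with the simple graph underlying the multigraph $G^R$.

```python
from itertools import combinations


def is_feasible(allocation, expertise, k):
    """Check that V_j is a subset of S_j and |V_j| <= k for every reviewer j."""
    return len(allocation) == len(expertise) and all(
        v <= s and len(v) <= k for v, s in zip(allocation, expertise)
    )


def review_graph(allocation):
    """Return (covered projects, compared pairs) for a list of sets V_j."""
    nodes = set().union(*allocation) if allocation else set()
    edges = set()
    for v_j in allocation:
        # Each reviewer contributes the clique C_{V_j} on the projects reviewed.
        edges.update(frozenset(pair) for pair in combinations(v_j, 2))
    return nodes, edges


def is_connected(projects, edges):
    """True if every pair of projects is joined by a ranking path."""
    if not projects:
        return True
    adj = {v: set() for v in projects}
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    start = next(iter(projects))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == set(projects)


def is_complete(projects, edges):
    """True if every pair of projects is directly compared."""
    n = len(projects)
    return len(edges) == n * (n - 1) // 2
```

Passing the full project set V to `is_connected` (rather than only the covered projects) means that an uncovered project counts as an isolated node, which matches the requirement that every pair of projects be comparable.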
2 The CONNECTIVITY-k-ALOC problem
A basic requirement is the ability to compare each pair of projects once the review process is done. To that end we require that every pair of projects is comparable via a ranking path. In terms of the review graph $G^R$, the CONNECTIVITY-k-ALOC problem is to find a feasible allocation so that $G^R$ is connected. In this section we consider the CONNECTIVITY-k-ALOC problem, and show that it is polynomially solvable as we can recast it as an instance of an intersection of two matroids.
Theorem 2.1 The CONNECTIVITY-k-ALOC problem is solvable in polynomial time $O((|V| + L)^3)$. If the number of reviewers is fixed then the problem is solvable in $O((|V| + L) \log(|V| + L))$ time.
Proof: We rephrase our problem as follows: We are given a bipartite graph $B = (P, R, E)$ with the two sets of nodes P and R corresponding to the project set and the reviewer set, respectively, and the set of edges E, with edge $[p, r] \in E$ if and only if reviewer r can review project p (i.e., p belongs to the expertise set of reviewer r). The goal is to find a spanning tree in B such that the degree of each node in R is at most k (if such a spanning tree exists). We show that the CONNECTIVITY-k-ALOC problem is equivalent to this last problem. In the CONNECTIVITY-k-ALOC problem we are actually interested in a tree that is not necessarily spanning, as not all reviewer nodes have to be included in the tree, but this tree must include the entire project set P within a common connected component. W.l.o.g. we can connect each reviewer node to (at least one) project node and hence convert the tree into a spanning tree. The degree bound restriction on the nodes $r \in R$ is equivalent to the constraint that r can review at most k projects. Hence, our problem is a minimum spanning tree in a bipartite graph with degree bounds on the nodes of one side R of the bipartite graph.
This problem is solvable in polynomial time by any polynomial time algorithm for the intersection of two matroids. The first matroid is the graphic matroid (and hence our solution will be a spanning tree), and the second matroid is a partition matroid defined over E where a subset $E'$ of the edges is independent in the partition matroid if the degree of each node $r \in R$ in the graph $(P \cup R, E')$ is at most k. We note that the independence testing oracle of each of the above matroids can be implemented in linear time (using Breadth-First-Search for the graphic matroid and simple counting for the partition matroid). Therefore, any polynomial time (unweighted) matroid intersection algorithm can be applied to solve the CONNECTIVITY-k-ALOC problem. We can use either the algorithm of [5] or the algorithm of [6] for matroid intersection. The latter algorithm of Bresovac et al. 1986, [6], is specialized for matroid intersection when one of the matroids is a partition matroid, as is the case here. The running time of that algorithm in our case is $O(n_B^3)$ where $n_B = |P| + |R|$.
We note that if the number of reviewers is a fixed constant, then the degree bounds apply only to a fixed number of nodes. Therefore, the more efficient algorithm of Frederickson and Srinivas 1989, [13], for this matroid intersection problem can be applied. In this case the time complexity of the resulting algorithm is $O(n_B \log n_B)$. Hence, the claim follows.
3 The k-COMPLETE and the MAX k-COMPLETE COVERAGE problems
If an appropriate allocation exists, then it is desirable that all pairs of projects should be directly comparable. Cook et al. [8] recently studied the problem of maximizing the number of directly comparable project pairs. The problem is therefore to determine the subsets $V_j$ so that the size of the union of the edge sets $C_{V_j}$ of the complete graphs (or cliques) induced on the $V_j$, namely $|\bigcup_{j=1}^{L} C_{V_j}|$, is maximum. We call this problem the MAX k-COMPLETE COVERAGE problem.
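The objective just defined can be evaluated directly, and on very small instances it can be maximized by exhaustive search. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the underlying graph is taken to be complete, and the search is exponential, so it serves as a reference for checking other methods rather than as a practical algorithm.

```python
from itertools import combinations, product


def covered_pairs(allocation):
    """Set of project pairs compared by at least one reviewer."""
    pairs = set()
    for v_j in allocation:
        pairs.update(frozenset(p) for p in combinations(v_j, 2))
    return pairs


def brute_force_max_coverage(expertise, k):
    """Exhaustively search all allocations (tiny instances only).

    It suffices to try subsets of size min(k, |S_j|), because enlarging a
    reviewer's subset never removes a compared pair.
    """
    choices = [
        [set(c) for c in combinations(sorted(s), min(k, len(s)))]
        for s in expertise
    ]
    best_value, best_alloc = -1, None
    for alloc in product(*choices):
        value = len(covered_pairs(alloc))
        if value > best_value:
            best_value, best_alloc = value, list(alloc)
    return best_value, best_alloc
```

For four projects, an allocation is k-complete exactly when the returned value equals 6, the number of pairs.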
The k-COMPLETE problem is the problem of deciding if, for a given complete graph G, the optimal value of the MAX k-COMPLETE COVERAGE problem equals the number of edges in G. The MAX k-COMPLETE COVERAGE problem is a more general problem than the k-COMPLETE problem, as it is defined on an undirected graph $G = (V, E)$ which is not necessarily complete. The goal is to select sets $V_j \subseteq S_j$ of size at most k that maximize the number of covered edges, $\left|\left(\bigcup_{j=1}^{L} C_{V_j}\right) \cap E\right|$. The k-COMPLETE problem and the MAX k-COMPLETE COVERAGE problem were recently studied by Cook et al. [8], who gave integer programming formulations and a branch-and-bound (exponential time) algorithm for solving these problems as well as a heuristic algorithm. In this section we study the MAX k-COMPLETE COVERAGE problem and the k-COMPLETE problem. We first classify the complexity status of the MAX k-COMPLETE COVERAGE problem as a function of k, showing that for k = 2 it is polynomially solvable, whereas for all k ≥ 3 the k-COMPLETE problem is NP-complete, which implies that the MAX k-COMPLETE COVERAGE problem is NP-hard for all fixed values of k such that k ≥ 3. We then turn our attention to approximation algorithms.
3.1 Complexity classification
In this section we prove that the MAX 2-COMPLETE COVERAGE problem is polynomially solvable whereas the k-COMPLETE problem for each fixed value of k (k ≥ 3) is NP-complete.
The k = 2 case:
Proposition 3.1 The MAX 2-COMPLETE COVERAGE problem is solvable in time $O(\min\{m_G L^{1.5}, L^{2.5} / \log(m_G + L)\})$.
Proof: To solve the MAX 2-COMPLETE COVERAGE problem we construct the following bipartite graph $B'$. The left side of the graph has L nodes corresponding to the sets $S_j$ and the right side has $m_G$ nodes corresponding to the edges of G. There is an edge in $B'$ between $S_j$ and $e \in E$ if and only if the set $S_j$ contains both endpoints of the edge e.
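The construction of $B'$ just described, together with a maximum matching computed on it, can be sketched as follows. This is an assumption-laden illustration: the function name is hypothetical, and the matching is found with the simple augmenting-path method, which is slower than the algorithm behind the bound stated in Proposition 3.1.

```python
def max_2_complete_coverage(expertise, graph_edges):
    """Solve MAX 2-COMPLETE COVERAGE via maximum bipartite matching.

    expertise: list of sets S_j; graph_edges: iterable of 2-element tuples.
    Returns (value, allocation), where allocation[j] is the pair V_j.
    """
    edges = [frozenset(e) for e in graph_edges]
    # Adjacency of B': reviewer j is joined to edge e iff both endpoints of e
    # lie in S_j.
    adj = [[i for i, e in enumerate(edges) if e <= s] for s in expertise]
    match_of_edge = [None] * len(edges)  # edge index -> matched reviewer

    def augment(j, visited):
        """Try to match reviewer j, re-routing earlier matches if needed."""
        for i in adj[j]:
            if i in visited:
                continue
            visited.add(i)
            if match_of_edge[i] is None or augment(match_of_edge[i], visited):
                match_of_edge[i] = j
                return True
        return False

    value = sum(1 for j in range(len(expertise)) if augment(j, set()))
    allocation = [None] * len(expertise)
    for i, j in enumerate(match_of_edge):
        if j is not None:
            allocation[j] = set(edges[i])
    for j, s in enumerate(expertise):
        if allocation[j] is None:
            # Unmatched reviewer: any subset of up to two projects will do.
            allocation[j] = set(sorted(s)[:2])
    return value, allocation
```

An unmatched reviewer cannot cover a new edge with any pair, since otherwise the matching could be extended; this is why the matching size equals the coverage value.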
Figure 2 illustrates an example of such a graph for G a complete graph on 4 nodes, where the two reviewers’ expertise sets are {1, 2, 3} and {1, 2, 4}. In this bipartite graph we find a maximum matching. For each edge $[S_j, e]$ that belongs to the matching we define the set $V_j$ to be the pair of end-nodes of e. For each unmatched set $S_j$ we define $V_j$ to be an arbitrary subset of $S_j$ with two nodes. Any feasible matching in $B'$ corresponds to a MAX 2-COMPLETE COVERAGE solution with the same objective value and vice versa. Therefore, the MAX 2-COMPLETE COVERAGE problem is solvable by the above procedure. The complexity of the most efficient bipartite matching algorithm is dominated by the expression $O(\min\{m_B \sqrt{M}, M^{2.5} / \log(n_B)\})$, [18], where $M \le L$ is the size of the maximum matching in $B'$. Note that $n_B = L + m_G$ and $m_B \le L m_G$.
Figure 2: The bipartite graph resulting for sets {1, 2, 3} and {1, 2, 4} for the MAX 2-COMPLETE COVERAGE problem on a complete four node graph.
The k ≥ 3 case: We now show that the k-COMPLETE problem is NP-hard for each fixed value of k such that k ≥ 3. Our proof is based on a reduction from the following problem: An H-decomposition of a graph $G = (V, E)$ is a partition of E into subgraphs isomorphic to H. Given a fixed graph H, the H-DECOMPOSITION PROBLEM is to determine whether an input graph G admits an H-decomposition. Holyer [20] proved that H-decomposition is NP-complete even for H that is a complete graph over at least three nodes. Since then a stronger result was proved by Dor and Tarsi [9], demonstrating that if H is connected with at least three edges then H-decomposition is NP-complete. Our proof of NP-hardness of k-COMPLETE is based on a reduction from the H-decomposition problem for H that is a complete graph over k nodes.
Proposition 3.2 k-COMPLETE is NP-hard for all fixed values of k such that k ≥ 3.
Proof: We describe a reduction from H-decomposition where H is the complete graph over k nodes. Let $G = (V, E)$ be an input graph for the H-decomposition problem. If there is an H-decomposition of the graph, then the number of H-isomorphic subgraphs in the H-decomposition is $|E|/p$ where $p = \binom{k}{2}$. Let $\bar{E}$ be the set of edges missing in G (to make it a complete graph). For each $[u, v] \in \bar{E}$, let $v^1_{[u,v]}, v^2_{[u,v]}, \ldots, v^{k-2}_{[u,v]}$ be a set of k − 2 new nodes (that do not belong to V) corresponding to $[u, v]$. Let $V'$ be the set of new nodes for all the edges in $\bar{E}$. The instance of the k-COMPLETE problem is defined as follows: The graph is the complete graph over $V \cup V'$. We define $|E|/p$ sets $S_1, S_2, \ldots, S_{|E|/p}$, each of which equals V. These sets are called type 1 sets. We define another $|\bar{E}|$ sets, each corresponding to an edge in $\bar{E}$. The set corresponding to $[u, v] \in \bar{E}$ is $S_{[u,v]} = \{u, v, v^1_{[u,v]}, v^2_{[u,v]}, \ldots, v^{k-2}_{[u,v]}\}$. These are called type 2 sets. A type 2 set has exactly k elements, and therefore w.l.o.g. there is an optimal solution for the resulting k-COMPLETE instance with $V_{[u,v]} = S_{[u,v]}$. These type 2 sets span disjoint subsets of edges, each of which contains exactly p edges. By a counting argument the value of an optimal solution of the k-COMPLETE instance is p times the number of subsets (i.e., $p \cdot \left[\frac{|E|}{p} + |\bar{E}|\right]$) if and only if G has an H-decomposition. Therefore, k-COMPLETE is NP-hard for each fixed value of k such that k ≥ 3.
Naturally, the proof of the above proposition demonstrates that the MAX k-COMPLETE COVERAGE problem is NP-hard also when G is not restricted to be a complete graph, since this problem generalizes the problem on complete graphs that is NP-hard.
3.2 Approximation algorithms
Motivated by the NP-hardness of the MAX k-COMPLETE COVERAGE problem we turn our attention to approximation algorithms. In this section we describe three distinct approximation algorithms.
The first one is a trivial one and provides for even values of k at least $\frac{1}{k-1}$ times the value of an optimal solution, and for odd values of k at least $\frac{1}{k}$ times the value of an optimal solution. The other two algorithms relate to two auxiliary problems, namely the MAXIMUM COVERAGE PROBLEM and the DENSEST k-SUBGRAPH PROBLEM. The MAXIMUM COVERAGE PROBLEM is defined on a given collection of elements and a collection F of subsets of the element set. The objective is to select up to L members of F that cover a maximum number of elements. The MAXIMUM COVERAGE WITH CARDINALITY CONSTRAINTS PROBLEM (MCCC) is a variant of the maximum coverage problem where F is partitioned into L sub-collections $F^1, F^2, \ldots, F^L$, and the constraint restricting the choice to at most L subsets from F is replaced by a set of constraints enforcing for each j the choice of a single member of $F^j$. The DENSEST k-SUBGRAPH PROBLEM is defined on an undirected graph $G = (V, E)$ and an integer k. The goal is to find a subset $V' \subseteq V$ of at most k nodes so as to maximize the number of edges in the induced subgraph of G over $V'$. This problem is known to be NP-hard and the current best known approximation algorithm for this problem has an approximation ratio of $O(n^{-1/3+\delta})$ for a positive fixed number δ [11]. Moreover, there is an $O(\frac{k}{n})$-approximation algorithm for this problem (see for example [2]). The second approximation algorithm, for non-fixed values of k, is based on the greedy algorithm that was analyzed by Chekuri and Kumar [7]. We show that if there is a ρ-approximation algorithm for the MAX k-COMPLETE COVERAGE problem then there is a ρ-approximation algorithm for the densest k-subgraph problem, and if there is a ρ-approximation algorithm for the densest k-subgraph problem then there is a $\frac{1}{2}\rho$-approximation algorithm for the MAX k-COMPLETE COVERAGE problem.
This latter result shows that the two problems for non-fixed values of k are almost equivalent as far as approximation algorithms are concerned. The third algorithm that we show uses Ageev and Sviridenko’s [1] approximation algorithm for MCCC to derive a $\left(1 - \left(1 - \frac{1}{r}\right)^r\right)$-approximation algorithm, for r being the maximum number of k-subsets that contain a given edge (the maximum is over all edges). So r can be as large as $O(n^{k-2})$. This algorithm is based on solving a linear programming relaxation of the problem with the number of variables as large as the number of all possible k-subsets $V_j$. We note that for a fixed value of k, using the last result in order to approximate the densest k-subgraph problem is superfluous, because using a similar time complexity we can enumerate all the k-subsets and pick the densest k-subgraph.
3.2.1 The trivial algorithm
The so-called trivial algorithm is a generalization of the matching procedure used to solve the MAX 2-COMPLETE COVERAGE problem. Given an instance of MAX k-COMPLETE COVERAGE we construct a bipartite graph $B = (A_1; A_2, E_B)$ as before. For each subset $S_j$ there is a corresponding node in $A_1$ denoted by $v_{S_j}$, and for each edge e of G there is a corresponding node in $A_2$ denoted by $u_e$. There is an edge $(v_{S_j}, u_e) \in E_B$ if both endpoints of e belong to $S_j$.
A b-matching in the bipartite graph B is a set of edges M that has up to $b_i$ edges adjacent to node i. For each node i in $A_1$, $b_i = \lfloor \frac{k}{2} \rfloor$, and for each node j in $A_2$, $b_j = 1$. An optimal b-matching has a maximum number of edges among all b-matchings. From the optimal b-matching, M, we generate a feasible solution to the MAX k-COMPLETE COVERAGE problem by setting, for each $S_j$, the subset $V_j$ to consist of the endpoints of its matched edges $\{e_q : (v_{S_j}, u_{e_q}) \in M\}$ in the b-matching.
Theorem 3.1 The trivial approximation algorithm is a $\frac{1}{k-1}$-approximation algorithm for even values of k and a $\frac{1}{k}$-approximation algorithm for odd values of k. The complexity of the algorithm is $O(nmL \log n)$.
Proof: The solution delivered by the algorithm is feasible: Each $V_j$ consists of the endpoints of at most $\lfloor \frac{k}{2} \rfloor$ edges and thus has at most k nodes and is contained in $S_j$. The algorithm is clearly a polynomial-time algorithm since b-matching is a polynomial problem. Using the algorithm for b-matching in bipartite graphs of Gabow and Tarjan [15, 16] the running time is $O(nmL \log n)$ (for more information on this problem and related results see Chapter 21 in Schrijver [26]). It remains to prove the approximation ratio of the algorithm. Consider an optimal solution for the MAX k-COMPLETE COVERAGE instance. For each $S_j$ the optimal solution has a subset $V_j^*$ of at most k nodes. We construct an equivalent disjoint collection of sets $E_j^*$ by assigning each edge e of G with both endpoints in a common $V_j^*$ to the least index set $V_j^*$ that contains both endpoints of e. For each $E_j^*$, if there are at least $\lfloor \frac{k}{2} \rfloor$ assigned edges, then an optimal b-matching has a set of $\lfloor \frac{k}{2} \rfloor$ assigned edges to $S_j$. For sets $E_j^*$ with fewer edges than $\lfloor \frac{k}{2} \rfloor$, the b-matching can assign the entire set of edges $E_j^*$. Since $|E_j^*| \le \binom{k}{2}$, the approximation ratio of the trivial approximation algorithm is at least $\lfloor \frac{k}{2} \rfloor / \binom{k}{2}$. For even values of k this is $\frac{k/2}{k(k-1)/2} = \frac{1}{k-1}$, whereas for odd values of k this is $\frac{(k-1)/2}{k(k-1)/2} = \frac{1}{k}$.
3.2.2 Applying the greedy algorithm when k is not fixed
We denote by ρ the approximation factor of an approximation algorithm for the densest k-subgraph problem (so $\rho = O(n^{-1/3+\delta}) < 1$, or $\rho = O(\frac{k}{n})$). The greedy algorithm iteratively picks subsets that cover, each in turn, the maximum number of uncovered elements. In each step the subset picked is only a subset of $S_j$ such that the algorithm did not select earlier another subset of $S_j$. Chekuri and Kumar [7] proved that this algorithm is a $\frac{1}{2}$-approximation algorithm.
If in each step of the greedy algorithm, instead of picking the subset that covers the maximum number of uncovered elements (among the subsets of $S_j$ such that the algorithm did not select earlier another subset of $S_j$), the algorithm picks a subset that covers at least β times the maximum number of uncovered elements (note that β ≤ 1), then Chekuri and Kumar showed that the resulting approximation ratio is $\frac{1}{2}\beta$.
Theorem 3.2 When k is not a fixed constant, if there is a ρ-approximation algorithm for the densest k-subgraph problem, then there is a $\frac{1}{2}\rho$-approximation algorithm for the MAX k-COMPLETE COVERAGE problem. Hence there is an $O(\min\{n^{-1/3+\delta}, \frac{k}{n}\})$-approximation algorithm (with ratio less than 1) for the MAX k-COMPLETE COVERAGE problem.
Proof: The proof follows because if the algorithm picks a subset that covers at least β times the maximum number of uncovered elements (note that β ≤ 1), then Chekuri and Kumar showed that the resulting approximation ratio is $\frac{1}{2}\beta$. We next show that when k is not fixed, we can design such a greedy algorithm that in each step picks a subset covering at least β = ρ times the maximum number of uncovered elements. To see this, we apply the following procedure for each j such that the algorithm did not select earlier another subset of $S_j$: We construct an auxiliary graph $G_j = (S_j, E_j)$ where $E_j$ is the set of edges from G that connect two nodes of $S_j$ and that the algorithm did not cover earlier. In this auxiliary graph we find an approximate densest k-subgraph using the algorithm of [11], and denote its node set by $V_j$. We pick the subset $V_j$ that covers the most elements among the different values of j. Since in each step we use a ρ-approximation algorithm for computing $V_j$, we conclude that the subset that we return covers at least ρ times the maximum number of elements that can be covered using one subset at this step.
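The greedy procedure used in this proof can be sketched in Python as follows. This is a minimal illustration under explicit assumptions: the function names are hypothetical, and the default oracle is an exhaustive (exponential) densest k-subset search that plays the role of β = 1; the theorem itself concerns plugging in a ρ-approximate oracle such as the algorithm of [11], which is not implemented here.

```python
from itertools import combinations


def densest_k_subset_exact(nodes, edges, k):
    """Exhaustive stand-in for a densest k-subgraph oracle (tiny inputs only)."""
    nodes = sorted(nodes)
    best, best_count = set(nodes[:k]), -1
    for cand in combinations(nodes, min(k, len(nodes))):
        cset = set(cand)
        count = sum(1 for e in edges if e <= cset)
        if count > best_count:
            best, best_count = cset, count
    return best, best_count


def greedy_coverage(expertise, graph_edges, k, oracle=densest_k_subset_exact):
    """Greedy allocation: one subset per reviewer, best marginal gain first."""
    uncovered = {frozenset(e) for e in graph_edges}
    allocation = [None] * len(expertise)
    remaining = set(range(len(expertise)))
    total = 0
    while remaining:
        best = None
        for j in sorted(remaining):
            s_j = expertise[j]
            # Auxiliary graph G_j: uncovered edges with both endpoints in S_j.
            e_j = {e for e in uncovered if e <= s_j}
            v_j, gain = oracle(s_j, e_j, k)
            if best is None or gain > best[2]:
                best = (j, v_j, gain)
        j, v_j, gain = best
        allocation[j] = v_j
        remaining.remove(j)
        uncovered -= {e for e in uncovered if e <= v_j}
        total += gain
    return total, allocation
```

Passing a different `oracle` with the same signature (node set, edge set, k) is the hook through which an approximate densest k-subgraph routine would enter.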
Proposition 3.3 If there is a ρ′-approximation algorithm for the MAX k-COMPLETE COVERAGE problem, then there is also a ρ′-approximation algorithm for the densest k-subgraph problem.
Proof: Note that one can set one reviewer with S = V, and the resulting MAX k-COMPLETE COVERAGE instance is exactly the instance of the densest k-subgraph problem on the same graph. Therefore, we conclude the claim.
3.2.3 Transforming the MAX k-COMPLETE COVERAGE problem into MCCC when k is fixed
For each $S_j$ we write down the list of all subsets of $S_j$ that have exactly k elements. Denote by $F^j$ the resulting family of k-subsets of $S_j$, and if $|S_j| < k$ we let $F^j = \{S_j\}$. Denote $F = \bigcup_j F^j$. The MAX k-COMPLETE COVERAGE problem is to choose one set from each $F^j$ such that the number of covered edges with endpoints in a common set is maximized. The resulting problem is an instance of the MAXIMUM COVERAGE WITH CARDINALITY CONSTRAINTS PROBLEM. The size of this instance is polynomial if we assume that k is fixed.
Theorem 3.3 When k is a fixed constant, there is a polynomial time $(1 - \frac{1}{e})$-approximation algorithm for the MAX k-COMPLETE COVERAGE problem.
Proof: The algorithm of Ageev and Sviridenko [1] is based on solving a linear programming relaxation of the problem, and then rounding the resulting solution. The formulation solved is the linear programming relaxation of the following integer program:
$$\begin{array}{lll} \max & \sum_j z_j & \\ \text{s.t.} & \sum_{i \in J_j} x_i \ge z_j & \forall j \\ & \sum_{i \in I_q} x_i = 1 & \forall q = 1, 2, \ldots, L \\ & x_i, z_j \in \{0, 1\} & \forall i, j \end{array}$$
In this integer program $x_i$ is an indicator variable for each k-subset that is contained in some $S_j$. Note that the same k-subset, if contained in more than one $S_j$, will have several variables, one for each subset it is contained in. This variable is set to 1 if the corresponding k-subset contained in some particular subset $S_j$ is selected for the solution. There is a binary variable $z_j$ for each edge, which takes the value 1 if the edge is covered by the solution.
The index set $J_q$ is the set of all k-subsets that contain the edge $e_q$. The index set $I_q$ is the set of all k-subsets corresponding to $S_q$. So the constraints say that if an edge is covered then at least one k-subset that contains both its endpoints is picked, and that we choose exactly one k-subset for each $S_q$.
The approximation ratio of the algorithm is $1 - \left(1 - \frac{1}{r}\right)^r$ where r is the largest number of 1’s in a row in the above mathematical programming formulation. That is, r is the maximum size of a family of k-subsets each containing a given edge (the maximum is over all edges) such that each such edge is contained in some $S_j$. Clearly, this number can be approximately as large as $n^{k-2}$. Therefore, the bound is approximately $1 - \frac{1}{e}$ (and never below it, since $\left(1 - \frac{1}{r}\right)^r < \frac{1}{e}$ for every r ≥ 1), and we establish the claim.
4 The (k, p)-RCon, the (k, p)-NCon and the (k, p)-ECon problems
The standard definitions of edge-connectivity and node-connectivity are as follows: A connected graph $H = (U, F)$ is p-edge connected if the removal of up to p − 1 edges from F results in a connected graph. A connected graph $H = (U, F)$ is p-node connected if the removal of up to p − 1 nodes from U results in a connected graph. We also define a new connectivity measure of the review graph that we call reviewer-connectivity, defined as follows: A review graph $G^R$ is p-reviewer connected if the removal of all the edges corresponding to at most p − 1 reviewers from $G^R$ results in a connected graph. We say that a pair of projects has p reviewer disjoint ranking paths between them if the removal of at most p − 1 reviewers from the review graph keeps the two projects in the same connected component of the resulting graph.
In order to increase the reliability of the implied rankings it is desirable that there will be more than a single ranking path between each pair. To that end we require that there are at least p edge disjoint ranking paths in $G^R$ between each pair of projects.
That is, the removal of at most p − 1 pairwise comparisons by a given reviewer results in a connected review graph. In the example in Figure 1 there are four ranking paths between 1 and 5 with only three of them edge-disjoint. In this graph nodes 3 and 4 have only two edge-disjoint paths between them. So this allocation is a solution to the p = 2 edge-disjoint requirement. The second review graph shown in Figure 3 is 3-edge connected. The associated problem is to maximize the number of pairs of projects that have at least p edge-disjoint ranking paths between them. This associated problem is called the (k, p)-EDGE CONNECTIVITY ALLOCATION problem, denoted as (k, p)-ECon.

Similarly to the p-edge connectivity problem, in order to increase the reliability of the implied rankings, we ask that the review graph be p-reviewer connected. That is, the removal of at most p − 1 reviewers results in a connected review graph. The motivation for studying this problem is the fact that the implied rankings of some pairs depend on very few reviewers and this situation can skew the results. The associated problem is to maximize the number of pairs of projects that have at least p reviewer-disjoint ranking paths between them. This associated problem is called the (k, p)-REVIEWER CONNECTIVITY ALLOCATION problem, denoted as (k, p)-RCon. We note that when k = 2 the (2, p)-ECon and the (2, p)-RCon problems are equivalent problems. However, for larger values of k, the notion of reviewer connectivity is different from that of edge connectivity.

Similarly to the p-edge connectivity problem and the (k, p)-reviewer connectivity allocation problem, in order to increase the reliability of the implied rankings, we ask that there are at least p node-disjoint ranking paths in GR between each pair of projects. That is, the removal of at most p − 1 projects results in a connected review graph.
The motivation for this is that low node connectivity indicates that the implied rankings of some pairs depend on very few projects and can skew the results. In the example in Figure 1 the review graph is only 1-node connected, as there are no two node-disjoint paths between node 1 and node 6. Therefore, the implied ranking of nodes 1 and 6 depends only on the relative strength of project 5. When project 5 is particularly strong, then the implied ranking of projects 1 and 6 may not be meaningful as the extent of the differentiation between them is dominated by the strength of project 5. This example can be magnified in a star review graph such as the one shown in Figure 3. The review graph in this example is 1-node connected.

Figure 3: A review graph GR with star topology.

The associated problem is to maximize the number of pairs of projects that have at least p node-disjoint ranking paths between them. This associated problem is called the (k, p)-NODE CONNECTIVITY ALLOCATION problem, denoted as (k, p)-NCon.

Theorem 4.1 The (k, 2)-RCon problem is NP-complete for all fixed values of k such that k ≥ 2.

Proof: To show that the (k, 2)-RCon problem is in NP we demonstrate that, given a candidate solution to the problem, its feasibility can be tested efficiently as follows. We test that each reviewer is allocated at most k projects from its set, and to check the 2-reviewer connectivity, we construct the review graph GR, and we test if it is 2-reviewer connected. This can be done efficiently by testing for each reviewer if the removal of the edge set that corresponds to his/her direct comparisons results in a connected graph.

To show that the problem is NP-complete, we present a reduction from Hamiltonian cycle in bipartite graphs, which is NP-complete (see problem [GT37] in [17]). Given an instance of the Hamiltonian cycle problem, that is, a bipartite graph B = (C, D, E) with sides C and D, we construct an instance of the (k, 2)-RCon problem as follows.
The reviewer set is $\{r, r' : r \in D\}$. I.e., for each node r in D we will have a pair of reviewers associated with this node. For each r ∈ D, we construct a set of k − 1 new projects $P_r = \{p_r^j \mid 1 \le j \le k - 1\}$. Our project set will be $C \cup \bigcup_{r \in D} P_r$. Reviewers $r$ and $r'$ can review the projects in the set $P_r \cup \{i \in C \mid (i, r) \in E\}$. So in order to obtain a 2-reviewer connected solution, for each project in $P_r$ there must be (at least) two reviewers that review the project. Therefore, each of the reviewers $r$ and $r'$ must review all the projects in $P_r$, and another project from the set C (that is adjacent to r in B).

We now claim that B is Hamiltonian if and only if the (k, 2)-RCon instance is feasible. Assume that B is Hamiltonian. Then if $[i_1, r], [i_2, r]$ are edges in the Hamiltonian cycle, we assign reviewer $r$ the project $i_1$ and assign reviewer $r'$ the project $i_2$. We also assign both $r$ and $r'$ the project set $P_r$. Then, the resulting graph of direct comparisons among projects is 2-reviewer connected (as there is a cycle over the nodes in C and the other nodes are connected to this cycle via a pair of edges). Therefore, in this case the (k, 2)-RCon instance is feasible.

Assume that the (k, 2)-RCon instance is feasible. Then, as stated above, reviewer $r$ and also reviewer $r'$ must review the project set $P_r$ and another project from C. We consider the induced subgraph of GR over the node set resulting from VR by deleting from VR, for all r ∈ D, all nodes in $P_r$ except one representative. I.e., the subgraph induced by $C \cup \bigcup_{r \in D} \{p_r^1\}$. In this induced subgraph each node has degree two. Moreover, since GR is 2-reviewer connected, it is also 2-edge connected, and therefore this induced subgraph is also 2-edge connected. By a counting argument this induced subgraph is a Hamiltonian cycle.
Now for each pair of nodes u, v ∈ C such that along this Hamiltonian cycle there is a path between u and v consisting of two edges, there is a node r ∈ D such that along this Hamiltonian cycle $p_r^1$ is adjacent to both u and v. Therefore, we can place r between these two project nodes to obtain a Hamiltonian cycle of the bipartite graph B. Therefore, B is Hamiltonian.

Proposition 4.1 The (k, 2)-NCon problem is NP-complete for all fixed values of k such that k ≥ 2.

Proof: We argue that the same proof of Theorem 4.1 shows the NP-completeness of the (k, 2)-NCon problem as well, where we look for an allocation so the resulting review graph is 2-node connected. To see this, first note that the (k, 2)-NCon problem is in NP because we can guess the solution, construct the review graph and check that the review graph is 2-node connected. Next, we use the same reduction from the Hamiltonian cycle problem in bipartite graphs as devised in the proof of Theorem 4.1. In this reduction, if GR is 2-node connected it is also 2-reviewer connected, and therefore the bipartite graph is Hamiltonian. On the other hand, if the bipartite graph B is Hamiltonian, then the resulting solution as stated for the (k, 2)-RCon problem is also feasible for the (k, 2)-NCon problem.

Theorem 4.2 The (k, p)-NCon problem is NP-complete for all fixed values of p and k such that p ≥ 2 and k ≥ 2.

Proof: The (k, p)-NCon problem is in NP because given an instance of this problem we can guess the resulting solution, then we can compute the review graph and verify that it is p-node connected by computing the global min-cut in a node weighted graph (where the cut has to separate two non-empty node sets). In order to show that the problem is NP-complete, we will show how to modify the reduction from Hamiltonian cycle in bipartite graphs described in the proof of Theorem 4.1. Let the bipartite graph be B = (C, D, E) as before. The reviewer set is $\{r_1, r_2, \ldots, r_p : r \in D\} \cup A$.
I.e., for each node r in D we will have a set of p reviewers associated with this node. We will have an additional set of auxiliary reviewers denoted as A. For each r ∈ D, we construct a set of k − 1 new projects $P_r = \{p_r^j \mid 1 \le j \le k - 1\}$. We will have another p − 2 projects, called auxiliary projects, denoted as $P_A = \{a_1, \ldots, a_{p-2}\}$. Our project set is then $C \cup P_A \cup \bigcup_{r \in D} P_r$. Reviewers $r_1$ and $r_2$ can review the projects in the set $P_r \cup \{i \in C \mid (i, r) \in E\}$, and reviewer $r_{i+2}$ can review the projects $P_r \cup \{a_i\}$. So in order to obtain a p-node connected solution, for each project in $P_r$ there must be (at least) p reviewers that review the project. Therefore, each of the reviewers $r_1$ and $r_2$ must review all the projects in $P_r$, and another project from the set C (that is adjacent to r in B). Moreover, each reviewer $r_{i+2}$ (for i ≥ 1) must review all projects in $P_r \cup \{a_i\}$. For each project $q \in C \cup P_A$ and for each i = 1, 2, . . . , p − 2 such that $q \ne a_i$, we have a reviewer in A that can review only the pair of projects $\{a_i, q\}$.

Similarly to the proof of Theorem 4.1, we now claim that B is Hamiltonian if and only if the (k, p)-NCon instance is feasible. Assume that B is Hamiltonian. Then for $[i_1, r], [i_2, r]$ edges in the Hamiltonian cycle we assign reviewer $r_1$ the project $i_1$ and assign reviewer $r_2$ the project $i_2$. We now argue that the resulting graph of direct comparisons among projects is p-node connected. To see this, note that between any pair of projects $q, q'$ there are p node-disjoint paths: p − 2 of these paths are through a single node from $P_A$ and the other 2 node-disjoint paths use the Hamiltonian cycle. Therefore, in this case the (k, p)-NCon instance is feasible.

Assume that the (k, p)-NCon instance is feasible. Then, as stated above, reviewer $r_1$ and also reviewer $r_2$ must review the project set $P_r$ and another project from C. We consider the subgraph of GR induced by $C \cup \bigcup_{r \in D} \{p_r^1\}$. In this induced subgraph each node has degree two.
Moreover, since GR is p-node connected, this induced subgraph is 2-node connected. This is so because we remove $P_A$, which has p − 2 nodes, and hence decrease the size of any node cut by at most p − 2, and this results in a 2-node connected subgraph. The removal of the other nodes does not reduce the node connectivity any further, similarly to the proof of Theorem 4.1. By a counting argument this induced subgraph is a Hamiltonian cycle. Now for each pair of nodes u, v ∈ C such that along this Hamiltonian cycle there is a path between u and v consisting of two edges, there is a node r ∈ D such that along this Hamiltonian cycle $p_r^1$ is adjacent to both u and v. Therefore, we can place r between these two project nodes to obtain a Hamiltonian cycle of the bipartite graph B. Therefore, B is Hamiltonian.

It remains to consider the complexity status of the (k, p)-ECon problem. We first note that if each reviewer has an expertise set that contains at least k projects and p ≤ k − 1, then the review graph is p-edge connected if and only if it is 1-edge connected (i.e., if it is a feasible solution to the CONNECTIVITY-k-ALOC problem). This is so because w.l.o.g. each set $V_j$ in the solution to the CONNECTIVITY-k-ALOC problem has exactly k nodes (and not less than k nodes). Therefore, the removal of at most p − 1 edges from the review graph keeps each node set of a given reviewer in the same connected component and hence it remains a connected graph. We next consider the case where p ≥ k, and we prove that for 2k − 1 ≥ p ≥ k the problem is NP-complete.

Proposition 4.2 The (k, p)-ECon problem is NP-complete for all values of k ≥ 2 such that 2k − 2 ≥ p ≥ k.

Proof: The (k, p)-ECon problem is in NP since we can guess the solution, construct the corresponding review graph, and then check that the review graph is p-edge connected (by computing the global min-cut in the review graph). In order to show that it is NP-complete, we modify the reduction in the proof of Theorem 4.1.
Given an input bipartite graph B for the Hamiltonian cycle problem, we construct an input to the (k, p)-ECon problem that is the (k, 2)-RCon instance constructed in the proof of Theorem 4.1. If the (k, 2)-RCon instance is a YES instance, then by removing at most p − 1 edges from the review graph we can disconnect the node set that corresponds to at most one reviewer (note that the expertise set of each reviewer in the reduction has at least k projects), and therefore after removing at most p − 1 edges the review graph remains connected. Therefore, in this case the (k, p)-ECon instance is a YES instance as well. By the reduction of Theorem 4.1, it means that if B is Hamiltonian, then the (k, p)-ECon instance is a YES instance.

If the (k, p)-ECon instance is a YES instance, then note that in the solution that shows it is a YES instance, each project is reviewed by at least two reviewers (as otherwise we can disconnect it from the rest of the projects by removing k − 1 edges). Therefore, by a counting argument, each project is reviewed by exactly two reviewers. Then, we can apply the same argument as in the proof of Theorem 4.1 to show that in this case B is Hamiltonian.

5 The 2-REVIEWER CONNECTIVITY AUGMENTATION problem

The 2-REVIEWER CONNECTIVITY AUGMENTATION problem is the associated optimization problem defined as follows. The input is a feasible solution to the CONNECTIVITY-k-ALOC problem, and integer numbers q and k ≥ 2. The task is to augment the solution to the CONNECTIVITY-k-ALOC problem using at most q additional reviewers, where each of these has an expertise set that equals the set of all projects and we can assign at most k projects to each additional reviewer. The goal is to maximize the number of pairs of projects such that the resulting review graph (constructed by adding the q additional reviewers to the solution of the CONNECTIVITY-k-ALOC problem) has two reviewer-disjoint paths between them.
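To make the objective concrete, the following Python sketch counts the pairs of projects that have two reviewer-disjoint ranking paths, using the definition from Section 4: a pair qualifies if the two projects stay in the same connected component after the removal of any single reviewer. The function names and the input representation (a dictionary mapping each reviewer to the set of projects assigned to him/her) are assumptions made for this illustration. It is a brute-force evaluation of the objective for a given allocation, not the augmentation algorithm itself.

```python
from itertools import combinations


def components(projects, allocation, skip=None):
    """Label each project with its component in the review graph,
    ignoring the comparisons of reviewer `skip` (hypothetical helper)."""
    parent = {p: p for p in projects}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for reviewer, assigned in allocation.items():
        if reviewer == skip:
            continue
        assigned = list(assigned)
        # A reviewer compares all pairs of his/her projects, so they
        # all end up in one component.
        for other in assigned[1:]:
            parent[find(other)] = find(assigned[0])
    return {p: find(p) for p in projects}


def count_2_reviewer_connected_pairs(projects, allocation):
    """Number of project pairs that remain connected after the removal
    of any single reviewer (and are connected to begin with)."""
    labelings = [components(projects, allocation)]
    labelings += [components(projects, allocation, skip=r) for r in allocation]
    return sum(
        1
        for a, b in combinations(projects, 2)
        if all(lab[a] == lab[b] for lab in labelings)
    )
```

For instance, with three projects and reviewers assigned {1, 2}, {2, 3} and {1, 3}, every pair survives the removal of any one reviewer, whereas dropping the third reviewer leaves no such pair.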
The motivation for studying this problem is the fact that in some cases there are a few reviewers that can review the whole set of projects (these reviewers might be the panel members) though their expertise level in each subject is lower than that of the regular reviewers. Therefore, we would like to have a feasible solution to the CONNECTIVITY-k-ALOC problem using a high level expert in each of the projects, and to use the panel members to increase the robustness of the resulting ranking, by providing two reviewer-disjoint paths between some pairs of projects.

The problem of finding a minimum set of edges that must be added to a given subgraph so that the resulting graph is 2-edge connected has been studied previously. Eswaran and Tarjan [10] provided necessary and sufficient conditions, and a linear time algorithm to construct an optimal solution is given in [21, 25]. Therefore, if k = 2 and the number of edges that can be added to GR is large enough so that it is possible to make the whole review graph a 2-edge connected graph (or 2-reviewer connected graph), then such a feasible solution can be found in linear time.

In this section we show how to solve in polynomial time the 2-REVIEWER CONNECTIVITY AUGMENTATION problem. We define a block to be a non-trivial (i.e., with at least two nodes) maximal node set such that between each pair of nodes in this set there are at least two reviewer-disjoint paths. The next lemma simplifies the structure of an optimal solution to the 2-REVIEWER CONNECTIVITY AUGMENTATION problem.

Lemma 5.1 Given an optimal solution such that there is a pair of additional reviewers $r, r'$ where $r$ reviews projects $p_1$ and $p_2$, and $r'$ reviews $p_3$ and $p_4$ (among perhaps other projects), then $p_1$, $p_2$, $p_3$, and $p_4$ belong to a common block.

Proof: First note that $p_1$ and $p_2$ belong to a common block (and similarly $p_3$ and $p_4$ belong to a common block).
This is so as between $p_1$ and $p_2$ there is one path that uses the comparisons of the solution to the CONNECTIVITY-k-ALOC problem, and another path using the additional reviewer $r$ that reviews both of them. To conclude the claim, assume that $p_1$ and $p_3$ do not belong to a common block. We next consider the ranking path between $p_1$ and $p_3$, and the path between $p_2$ and $p_4$ in the solution to the CONNECTIVITY-k-ALOC problem (see Figure 4). These two paths share a common reviewer $\tilde{r}$, as otherwise $p_1$ and $p_3$ are in the same block. This is so since $p_1$ and $p_2$ are in the same block, and therefore we can extend the path from $p_2$ to $p_4$ by the two additional reviewers to obtain a path from $p_1$ to $p_3$ that is reviewer-disjoint from the path between these projects in the solution to the CONNECTIVITY-k-ALOC problem. We next replace the allocated sets of projects of $r$ and $r'$, by assigning project $p_1$ to $r'$ and not to $r$, and assigning the project $p_3$ to $r$ instead of $r'$. Then, the project set assigned to $\tilde{r}$ belongs to the block of $p_1$ and $p_3$ in the resulting new solution. Moreover, the same project set is also in the block of $p_2$ and $p_4$. Therefore, the set of projects assigned to either $r$ or $r'$ belongs to a common block. Since other blocks are not separated by the above change (in the solution), the resulting solution is no worse than the initial one. Moreover, since we merge at least two blocks into one block, the number of pairs in a common block increases, and therefore this contradicts the assumption that the initial solution is an optimal one. Therefore, in an optimal solution the claim holds.

Figure 4: The solution to the CONNECTIVITY-k-ALOC problem in the proof of Lemma 5.1

By Lemma 5.1, the entire set of projects that is reviewed by at least one additional reviewer is in one block, which we denote as the main block.
Assume that in the main block there are $\ell$ projects and there are t pairs of projects from this block that have been 2-reviewer connected already in the CONNECTIVITY-k-ALOC solution. Then, by using the q additional reviewers we 2-reviewer connect another $\binom{\ell}{2} - t$ pairs, and we would like to maximize this amount.

Next, we describe Algorithm Augment DP for solving the 2-REVIEWER CONNECTIVITY AUGMENTATION problem. Our algorithm is summarized in Figure 5. Given a feasible solution to the CONNECTIVITY-k-ALOC problem, we construct a tree over the bipartite graph B = (P, R, E) whose sides are the project set P and the reviewer set R and each edge in E connects a reviewer to a project that lies within his/her expertise set. In this bipartite graph we contract each 2-reviewer connected component (of the review graph) with any reviewer such that his/her entire set of projects lies in a common 2-reviewer connected component of GR. The resulting graph is still not necessarily a tree, but we consider a BFS tree constructed from an arbitrary node. Note that in this tree we have to select a set of kq leaves that will be reviewed by the additional reviewers. These leaves and the paths in the tree between pairs of these leaves will be the main block in the solution. So the goal is to find a set of kq leaves that maximizes the amount $\binom{\ell}{2} - t$ defined above. Our algorithm is based on a dynamic programming procedure, and we prove the following theorem:

Theorem 5.1 Algorithm Augment DP solves the 2-REVIEWER CONNECTIVITY AUGMENTATION problem in $O(n^5 k^2 q^2)$ time.

Proof: In the resulting tree we assign a non-negative integer weight w(v) to each node v. If v ∈ R then w(v) = 0. Otherwise, if v ∈ P and no node was contracted into v then w(v) = 1, and if v is a result of contracting nodes into it then w(v) is the total number of projects that were contracted to this node. We root the resulting tree at an arbitrary node denoted as root.
Next, we apply a preprocessing step on the tree so that each node in the tree has at most two children. To achieve this goal, we replace a node v with ∆ children by a binary tree with ∆ leaves, where the root of this binary tree has weight w(v), each internal node of the binary tree has zero weight, and the ∆ leaves correspond to the ∆ children of v before the change occurs. Hence, we can assume that the input graph is a binary tree and each node has an integral weight smaller than n.

Given the binary tree T rooted at root where each node v has a weight w(v) ∈ {0, 1, . . . , n}, we apply the following dynamic programming algorithm. We first assume that the root will be a member of the main block. In this case at least one leaf of the selected kq leaves will be a descendant of the right child of root and at least one selected leaf will be a descendant of the left child of root. We denote by $F_{i,j,t}$ the maximum total weight of nodes that can be covered using at most j paths, each of them starting at i and going down the tree until reaching a leaf, and such that these j paths cover exactly t pairs of projects that are already 2-reviewer connected in the CONNECTIVITY-k-ALOC initial solution. We would like to compute $\max_{t = 0, 1, \ldots, \binom{n}{2}} F_{root,kq,t} - t$. The computation is carried out bottom-up. During the computation we set the value of F to be −∞ for infeasible values of the triple i, j, t. For a leaf node v,

\[
F_{v,j,t} =
\begin{cases}
w(v) & \text{if } j \ge 1,\ t = \binom{w(v)}{2} \\
0 & \text{if } j = 0,\ t = 0 \\
-\infty & \text{otherwise.}
\end{cases}
\]

Next, consider the case where the current node v has only one child denoted as u. Then,

\[
F_{v,j,t} =
\begin{cases}
w(v) + F_{u,j,t - \binom{w(v)}{2}} & \text{if } j \ge 1 \\
0 & \text{otherwise}
\end{cases}
\]

where we assume that $F_{u,j,t} = -\infty$ for all negative values of t. Finally, we assume that v has two children denoted as $u$ and $u'$. If j ≥ 1 then

\[
F_{v,j,t} = w(v) + \max_{\substack{j' = 0, 1, \ldots, j \\ t' = 0, 1, \ldots, t - \binom{w(v)}{2}}} \left\{ F_{u,j',t'} + F_{u',j-j',t-t'-\binom{w(v)}{2}} \right\},
\]

and otherwise $F_{v,0,t} = 0$, where we set $F_{v,j,t}$ to equal −∞ if either j or t is negative.
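The preprocessing step at the start of this proof, which replaces a node with ∆ > 2 children by a binary tree whose added internal nodes have zero weight, can be sketched in Python as follows. The tree representation (a dictionary of child lists plus a dictionary of weights) and the naming of the auxiliary nodes are assumptions made for this illustration. The sketch builds a path-shaped binary tree below each high-degree node, which is one valid choice among many, since the description above does not prescribe the shape.

```python
def binarize(children, weight, root):
    """Return (children, weight) of a tree in which every node has at
    most two children. Original nodes keep their weights; auxiliary
    nodes (hypothetical names ("aux", n)) get weight zero."""
    new_children = {}
    new_weight = dict(weight)
    counter = 0
    stack = [root]
    while stack:
        v = stack.pop()
        kids = list(children.get(v, []))
        stack.extend(kids)
        node = v
        # Peel off one original child at a time and hang the remaining
        # children below a fresh zero-weight auxiliary node.
        while len(kids) > 2:
            counter += 1
            aux = ("aux", counter)
            new_weight[aux] = 0
            new_children[node] = [kids[0], aux]
            kids = kids[1:]
            node = aux
        new_children[node] = kids
    return new_children, new_weight
```

A node with ∆ children receives ∆ − 2 auxiliary nodes, so the total node weight and the set of leaves are unchanged and the tree grows by at most a constant factor.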
Since the weight of a node in the tree is at most n, the total time complexity to solve the entire problem is $O(n^5 k^2 q^2)$, as there are $O(n^3 kq)$ values to compute and each of them takes $O(n^2 kq)$ time.

This dynamic programming solves the 2-reviewer connectivity augmentation problem assuming that root belongs to the main block. We can check all possibilities of selecting the root node and obtain a polynomial time algorithm that solves the problem (by paying an extra factor of n in the time complexity). However, we next show how to modify the dynamic programming so that we will not assume that root belongs to the main block. This assumption is used in the fact that we search for a collection of kq paths going down from root to leaves of the tree and that we assume that these paths also cover the root. The next modification removes this assumption. In the case where the current node v has only one child, denoted as u, we set

\[
F_{v,j,t} =
\begin{cases}
w(v) + F_{u,j,t - \binom{w(v)}{2}} & \text{if } kq > j \ge 1 \\
F_{u,j,t} & \text{if } j = kq \\
0 & \text{otherwise}
\end{cases}
\]

where we assume that $F_{u,j,t} = -\infty$ for all negative values of t. In the case where v has two children, denoted as $u$ and $u'$, we set

\[
F_{v,j,t} =
\begin{cases}
w(v) + \max\limits_{\substack{j' = 0, 1, \ldots, j \\ t' = 0, 1, \ldots, t - \binom{w(v)}{2}}} \left\{ F_{u,j',t'} + F_{u',j-j',t-t'-\binom{w(v)}{2}} \right\} & \text{if } kq > j \ge 1 \\
\max \left\{ F_{u,j,t},\ F_{u',j,t},\ w(v) + \max\limits_{\substack{j' = 1, 2, \ldots, j-1 \\ t' = 0, 1, \ldots, t - \binom{w(v)}{2}}} \left\{ F_{u,j',t'} + F_{u',j-j',t-t'-\binom{w(v)}{2}} \right\} \right\} & \text{if } j = kq \\
0 & \text{if } j = 0
\end{cases}
\]

where we set $F_{v,j,t}$ to equal −∞ if either j or t is negative. We denote the resulting algorithm by Augment DP, and we conclude the claim.

6 The (k, q)-HOP problem

As the length of the ranking path increases, the robustness and reliability of the implied ranking decreases. It is therefore desirable to limit the length of the ranking paths. For each pair of projects we require the existence of at least one ranking path that has length of at most q hops. That means that the ranking path has at most q edges.
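As an illustration of the q-hop requirement, the following Python sketch counts, for a given allocation, the pairs of projects joined by a ranking path of at most q edges, using a breadth-first search truncated at depth q from every project. The function names and the input format (a dictionary mapping each reviewer to his/her assigned projects) are assumptions made for this example, and projects assigned to no reviewer are simply absent from the graph. It only evaluates an allocation and does not search for a good one.

```python
from collections import deque
from itertools import combinations


def review_graph(allocation):
    """Adjacency sets of the review graph: two projects are adjacent
    if some reviewer is assigned both of them."""
    adj = {}
    for assigned in allocation.values():
        for p in assigned:
            adj.setdefault(p, set())
        for a, b in combinations(assigned, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def count_q_hop_pairs(allocation, q):
    """Number of unordered project pairs with a ranking path of at
    most q edges between them."""
    adj = review_graph(allocation)
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == q:
                continue  # do not extend paths beyond q hops
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += len(dist) - 1  # projects reached, excluding the source
    return total // 2  # each pair was counted from both endpoints
```

On a star-shaped review graph with one center and four leaves, only the four center-leaf pairs are counted for q = 1, while all ten pairs are counted for q = 2.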
When all projects are directly comparable the allocation provides a 1-hop review graph. The graph in Figure 1 is a 3-hop review graph and the graph in Figure 3 is a 2-hop review graph. The associated problem, called the (k, q)-HOP problem, is to maximize the number of pairs of projects that have at least one q-hop ranking path between them.

The (k, q)-HOP problem was addressed recently by Park and Newman [24] in ranking college football teams. They include in a graph a directed arc from i to j if (football) team i wins against team j. They conclude an implied win if there is a directed path with q hops, where the weight of this implied win decreases exponentially with q. The algorithm they developed is based on the diminished importance of the implied paths as a function of the number of hops. This model is different from ours in that the graph is determined as an outcome of the evaluation process - the process of playing the games, whereas in the intensity of preferences models the graph topology is determined by the k-allocation and only the intensities are determined by the evaluations.

In this section we show that the (k, q)-HOP problem is NP-complete for all fixed values of q and k such that q ≥ 2 and k ≥ 2. We note that the case of q = 1 is exactly the MAX k-COMPLETE COVERAGE problem, which is polynomially solvable for k = 2 and NP-complete for all values of k such that k ≥ 3. In order to show that the (k, q)-HOP problem is NP-complete for all values of q we first show this claim for q = 2, and then show how to extend it to other values of q.

Theorem 6.1 The (k, 2)-HOP problem is NP-complete for all fixed values of k such that k ≥ 2.

Proof: The problem is clearly in NP, as given a solution it is easy to test that each reviewer reviews at most k projects, and that for each pair of projects there exists a path of at most two hops that compares this pair (by testing all such paths).
Now consider the following reduction from SAT (see problem [LO1] in [17]): Assume that we are given an instance of the SAT problem consisting of clauses $C_1, C_2, \ldots, C_m$ over the variables $v_1, v_2, \ldots, v_n$. W.l.o.g. we assume that for each i we have a clause consisting of the pair of literals $v_i$ and $\bar{v}_i$ (where $\bar{v}_i$ is the negation of $v_i$). We construct an instance of the (k, 2)-HOP problem as follows.

Our project set is defined in the following way. For each variable $v_i$, we have k projects: the first of these is associated with the variable, the second with its negation (these first two projects are called literal projects), and the remaining k − 2 projects are denoted as $Q_i^1, \ldots, Q_i^{k-2}$. For each clause $C_j$, we have a set of k − 1 projects $P_i^j$ for i = 1, 2, . . . , k − 1. To conclude our list of projects we have another truth project T and an auxiliary project A.

Our reviewer set is defined as follows. For each pair of literals $l_i, l_i'$ we have a reviewer with set that equals $\{l_i, l_i'\}$ (and hence w.l.o.g. this set is selected for this reviewer). For each project p such that $p \ne T, A$ we will have a reviewer with project set that equals $\{p, A\}$ (and hence w.l.o.g. this set is selected for this reviewer). For each clause $C_j$ we will have a clause reviewer with a set of projects that equals $C_j \cup \{P_i^j : i = 1, 2, \ldots, k - 1\}$, and for each variable $v_i$ we will have a truth-assignment reviewer with a set of projects that equals $\{T, Q_i^1, \ldots, Q_i^{k-2}, v_i, \bar{v}_i\}$.

We first note that each pair of literal nodes is compared using the corresponding reviewer (the one with project set that equals $\{l_i, l_i'\}$). We also note that in order for the solution to be connected each clause reviewer with project set $C_j \cup \{P_i^j : i = 1, 2, \ldots, k - 1\}$ reviews exactly one project from $C_j$ (and the other k − 1 projects are $P_i^j$ for all i), and each truth-assignment reviewer that corresponds to $v_i$ must review the projects $Q_i^1, \ldots$
, $Q_i^{k-2}$ and two other projects from $\{T, v_i, \bar{v}_i\}$ (as otherwise, if he/she does not review $Q_i^j$, then there is no 2-hop path from T to $Q_i^j$). To conclude the proof we will show the following claim.

Claim 6.1 The following three conditions are equivalent.
1. The (k, 2)-HOP instance is feasible.
2. There is a 2-hop path from T to all the projects.
3. There is a truth assignment that satisfies the SAT formula.

Proof: Assume that the (k, 2)-HOP instance is feasible. Then clearly there is a 2-hop path from T to all the projects. For each variable $v_i$ such that the i-th truth-assignment reviewer reviews T, we note that this reviewer reviews exactly one of the projects $v_i$ and $\bar{v}_i$, and we let the literal of the project that he/she reviews have value TRUE whereas its negation is FALSE. We let the truth assignment of the other (yet undefined) variables be arbitrary. We next claim that each clause $C_j$ has a TRUE literal. To see this claim, note that there is a 2-hop path from $P_1^j$ to T. This 2-hop path must traverse a middle project that is one of the literals that belongs to $C_j$. Since there is a reviewer that reviews both this literal and T, we conclude that this reviewer is a truth-assignment reviewer, and therefore we assign this literal a TRUE value. Therefore, the SAT formula is satisfied.

Assume that the SAT formula is satisfied by a truth assignment φ. We let the j-th clause reviewer review the set $P_i^j$ for i = 1, 2, . . . , k − 1 and also review one of the literals that belongs to $C_j$ and is assigned a TRUE value in φ. We let the truth-assignment reviewer review the set $\{T, Q_i^1, \ldots, Q_i^{k-2}\}$ and one extra project that is the TRUE literal among $v_i$ and $\bar{v}_i$. Then, clearly each project is adjacent to one of the literal projects and therefore has a 2-hop path to all the literal projects. Moreover, each project $P_i^j$ is compared to A, and therefore there is a 2-hop path between each pair of such projects. We next note that there is a 2-hop path from A to T.
This is so because, assuming that clause $C_j$ is satisfied by literal l (and during the solution we picked l as the TRUE literal of $C_j$), both T and A are compared to l (T is compared by the truth-assignment reviewer and A by the reviewer with project set $\{l, A\}$). It remains to show that there is a 2-hop path from T to $P_i^j$. This 2-hop path is established because there is an $l \in C_j$ that is a TRUE literal such that the clause reviewer compares l and $P_i^j$ (l is the unique literal project that this clause reviewer reviews), and the truth-assignment reviewer compares l and T. This provides a 2-hop path from T to $P_i^j$. This concludes the proof of Theorem 6.1.

Theorem 6.2 The (k, q)-HOP problem is NP-complete for all fixed values of q and k such that q ≥ 2 and k ≥ 2.

Proof: To extend the proof of Theorem 6.1 to larger values of q, we change the reduction so that we will have another q − 2 new projects denoted as $a_1, a_2, \ldots, a_{q-2}$ and q − 2 new reviewers where the i-th new reviewer is able to review only $a_i$ and $a_{i+1}$ (where $a_{q-1} = T$ is the truth project). We also remove the project A from the project set and remove all the reviewers that contain it in their expertise set. The resulting instance of the (k, q)-HOP problem is feasible if and only if there is a 2-hop path from T to all the projects besides $a_1, a_2, \ldots, a_{q-2}$. This is so because for each pair of projects that are associated with either a variable or a clause, there is a 3-hop path connecting these projects. But as shown in Claim 6.1, there is a 2-hop path from T to all the projects besides $a_1, a_2, \ldots, a_{q-2}$ if and only if there is a truth assignment that satisfies all clauses. Therefore, we conclude the claim.

7 Concluding remarks

In this paper we study optimization and decision problems relating to the allocation of reviewing tasks during the process of evaluating a large number of projects.
Whereas several problems are shown to be polynomially solvable, most studied problems are shown to be NP-complete. We believe that finding polynomially solvable special cases that are of relevance to some applications is an interesting and important open question that is left for future research.

References

[1] A. A. Ageev and M. I. Sviridenko, "Pipage rounding: a new method of constructing algorithms with proven performance guarantee," Journal of Combinatorial Optimization, 8, 307–328, 2004.
[2] Y. Asahiro, K. Iwama, H. Tamaki and T. Tokuyama, "Greedily finding a dense subgraph," J. Algorithms, 34, 203–221, 2000.
[3] J. J. Bartholdi, C. A. Tovey and M. A. Trick, "The computational difficulty of manipulating an election," Social Choice and Welfare, 6, 227–241, 1989.
[4] J. P. Brans and Ph. Vincke, "A preference ranking organization method," Management Science, 31, 647–656, 1985.
[5] C. Brezovec, G. Cornuejols and F. Glover, "Two algorithms for weighted matroid intersection," Mathematical Programming, 36, 39–53, 1986.
[6] C. Brezovec, G. Cornuejols and F. Glover, "A matroid algorithm and its application to the efficient solution of two optimization problems on graphs," Mathematical Programming, 42, 471–487, 1988.
[7] C. Chekuri and A. Kumar, "Maximum coverage problem with group budget constraints and applications," in Proceedings of APPROX 2004, 72–83, 2004.
[8] W. D. Cook, B. Golany, M. Kress, M. Penn and T. Raviv, "Optimal allocation of proposals to reviewers to facilitate effective ranking," Management Science, 51, 655–661, 2005.
[9] D. Dor and M. Tarsi, "Graph decomposition is NP-complete: a complete proof of Holyer's conjecture," SIAM Journal on Computing, 26, 1166–1187, 1997.
[10] K. P. Eswaran and R. E. Tarjan, "Augmentation problems," SIAM Journal on Computing, 5, 653–665, 1976.
[11] U. Feige, G. Kortsarz and D. Peleg, "The Dense k-Subgraph Problem," Algorithmica, 29, 410–421, 2001.
[12] E. Fernandez and R.
Olmedo, “An agent model based on ideas of concordance and discordance for group ranking problems,” Decision Support Systems, 39, 429–443, 2005.
[13] G. N. Frederickson and M. A. Srinivas, “Algorithms and data structures for an expanded family of matroid intersection problems,” SIAM Journal on Computing, 18, 112–138, 1989.
[14] R. Fuller and Ch. Carlsson, “Fuzzy multiple criteria decision making: recent developments,” Fuzzy Sets and Systems, 78, 139–153, 1996.
[15] H. N. Gabow and R. E. Tarjan, “Almost optimum speed-ups of algorithms for bipartite matching and related problems,” in Proceedings of STOC 1988, 514–527, 1988.
[16] H. N. Gabow and R. E. Tarjan, “Faster scaling algorithms for network problems,” SIAM Journal on Computing, 18, 1013–1036, 1989.
[17] M. R. Garey and D. S. Johnson, Computers and Intractability, W.H. Freeman and Co., New York, 1979.
[18] D. S. Hochbaum and B. Chandran, “Further below the flow decomposition barrier of maximum flow for bipartite matching and maximum closure,” Manuscript, UC Berkeley, April 2004.
[19] D. S. Hochbaum and A. Levin, “Methodologies for the group rankings decision,” Management Science, 52, 1394–1408, 2006.
[20] I. Holyer, “The NP-completeness of some edge-partition problems,” SIAM Journal on Computing, 10, 713–717, 1981.
[21] T. Hsu and V. Ramachandran, “On finding smallest augmentation to biconnect a graph,” SIAM Journal on Computing, 22, 889–912, 1993.
[22] J. P. Keener, “The Perron-Frobenius theorem and the rating of football teams,” SIAM Review, 35, 80–93, 1993.
[23] J. G. Kemeny and J. L. Snell, “Preference ranking: An axiomatic approach,” in Mathematical Models in the Social Sciences, Boston, Ginn, 9–23, 1962.
[24] J. Park and M. E. J. Newman, “A network-based ranking system for US college football,” Journal of Statistical Mechanics: Theory and Experiment, (Oct. 31, 2005). Abstract available at http://www.iop.org/EJ/abstract/1742-5468/2005/10/P10014.
[25] A. Rosenthal and A.
Goldner, “Smallest augmentations to biconnect a graph,” SIAM Journal on Computing, 6, 55–66, 1977.
[26] A. Schrijver, Combinatorial Optimization: Polyhedra and Efficiency, Springer-Verlag, Berlin, 2003.

Input. A feasible solution to the CONNECTIVITY-k-ALOC problem.

Preprocessing phase.
1. Construct the bipartite graph $B = (P, R, E)$, and contract each 2-reviewer connected component (of $G_R$) with any reviewer such that his entire set of projects lies in a common 2-reviewer connected component of $G_R$.
2. Consider a BFS tree constructed from an arbitrary node, and root the tree at an arbitrary node root.
3. Assign weights to nodes: If $v \in R$ then $w(v) = 0$. Otherwise, if $v \in P$ and no node was contracted into $v$ then $w(v) = 1$, and if $v$ is a result of contracting nodes into it then $w(v)$ is the total number of projects that were contracted to this node.
4. Replace the tree with a binary tree $T$, by replacing a node $v$ with $\Delta > 2$ children by a binary tree with $\Delta$ leaves where all internal nodes have zero weight.

The dynamic programming procedure. $F_{i,j,t}$ is the maximum total weight of nodes that can be covered using at most $j$ paths, each of them starting at $i$ and going down the tree until reaching a leaf, and such that these $j$ paths cover exactly $t$ pairs of projects that are already 2-reviewer connected in the CONNECTIVITY-k-ALOC initial solution.

Goal of the procedure. To compute $\max\{F_{\mathrm{root},kq,t} - t\}$ such that $0 \le t \le \binom{n}{2}$.

Apply the following computation in a bottom-up fashion. (Where we set $F_{v,j,t}$ to equal $-\infty$ if either $j$ or $t$ are negative.)
1. If $v$ is a leaf of the tree, then we set $F_{v,j,t} = w(v)$ if $kq \ge j \ge 1$ and $t = 0$, and otherwise $F_{v,j,t} = 0$.
2. If $v$ has only one child denoted as $u$, then we set $F_{v,j,t}$ to be $w(v) + F_{u,j,t-\binom{w(v)}{2}}$ if $kq > j \ge 1$, $F_{u,j,t}$ if $j = kq$, and $0$ otherwise.
3. If $v$ has two children denoted as $u$ and $u'$.
(a) If $kq > j \ge 1$, then
$$F_{v,j,t} = w(v) + \max_{\substack{j' = 0,1,\ldots,j \\ t' = 0,1,\ldots,t-\binom{w(v)}{2}}} \left\{ F_{u,j',t'} + F_{u',\,j-j',\,t-t'-\binom{w(v)}{2}} \right\}.$$
(b) If $j = kq$, then
$$F_{v,j,t} = \max\left\{ F_{u,j,t},\; F_{u',j,t},\; w(v) + \max_{\substack{j' = 1,2,\ldots,j-1 \\ t' = 0,1,\ldots,t-\binom{w(v)}{2}}} \left( F_{u,j',t'} + F_{u',\,j-j',\,t-t'-\binom{w(v)}{2}} \right) \right\}.$$
(c) If $j = 0$, then $F_{v,j,t} = 0$.

Figure 5: Algorithm Augment DP.
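
Step 4 of the preprocessing phase can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names Node, binarize, max_fanout, total_weight and count_leaves are hypothetical, and the sketch assumes that the replaced node v keeps its own weight at the root of the new binary tree while every newly added internal node gets weight zero, so that the total weight of the tree is unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A rooted-tree node; weight plays the role of w(v) in Figure 5."""
    weight: int
    children: list = field(default_factory=list)
    label: str = ""


def binarize(node):
    """Return a copy of the tree in which every node has at most two children.

    Assumption: the original node keeps its weight, and the helper nodes
    introduced to reduce the fan-out all have weight zero.
    """
    kids = [binarize(child) for child in node.children]
    # Repeatedly pair up adjacent children under zero-weight helper nodes
    # until at most two children remain.
    while len(kids) > 2:
        paired = []
        for i in range(0, len(kids) - 1, 2):
            paired.append(Node(0, [kids[i], kids[i + 1]], "helper"))
        if len(kids) % 2 == 1:
            paired.append(kids[-1])  # odd child out is carried to the next round
        kids = paired
    return Node(node.weight, kids, node.label)


def max_fanout(node):
    """Largest number of children of any node in the tree."""
    return max([len(node.children)] + [max_fanout(c) for c in node.children])


def total_weight(node):
    """Sum of the weights of all nodes in the tree."""
    return node.weight + sum(total_weight(c) for c in node.children)


def count_leaves(node):
    """Number of leaves of the tree."""
    if not node.children:
        return 1
    return sum(count_leaves(c) for c in node.children)
```

For a node with five children, binarize inserts three zero-weight helper nodes; the five original children remain the leaves of the new subtree, so the weights seen by the dynamic programming procedure are the same as in the original tree.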