Travelling through Theoretical Computer Science

Zihan Tan
Abstract

Through four years at the Institute for Interdisciplinary Information Sciences, I received a thorough education in many branches of Theoretical Computer Science (TCS). While enjoying a variety of excellent courses, I also worked on many research problems, which kindled my interest in the design and analysis of algorithms, computational complexity, combinatorial optimization, learning theory, network theory, and more. Through research, seminars, and coursework, I developed a good taste for, and a broad view of, TCS. This manuscript collects my main research notes in various fields, vividly depicting a four-year journey through the landscape of TCS.

• An object has projections. What about the converse?
• Is a given network suitable for computing a given function?
• How hard is the Bridge card game?
• How long until two random walks meet?
• Doing better on Influence Maximization!
• Hypothesis testing is impossible without the knowledge that the optimum is unique!
• Doing better on PageRank!
• How should you make online decisions?
• A linear propagation model for compressive sensing.
• Evolutionary theory is related to game theory.
• Mutual exclusion is impossible with only one stack.
• Information-theoretic differential privacy is pessimistic. What about computational variants?
• Random Serial Dictatorship is bad for Facility Location.
• Conway's Game of Life.

1 On the Inequalities of Projected Volumes and the Constructible Region

Zihan Tan, Liwei Zeng, Jian Li

Abstract

We study the following geometric problem: given a (2^n − 1)-dimensional vector π = {π_S}_{S⊆[n], S≠∅}, is there an object T ⊆ R^n such that log(vol(T_S)) = π_S for all S ⊆ [n], where T_S is the projection of T onto the subspace spanned by the axes in S? If π does correspond to an object in R^n, we say that π is constructible. We use Ψ_n to denote the constructible region, i.e., the set of all constructible vectors in R^{2^n−1}. In 1995, Bollobás and Thomason showed that Ψ_n is contained in a polyhedral cone defined by a class of so-called uniform cover inequalities. We propose a new set of natural inequalities, called nonuniform-cover inequalities, which generalize the BT inequalities. We show that any linear inequality satisfied by all points in Ψ_n must be a nonuniform-cover inequality. Based on this result and an example by Bollobás and Thomason, we show that the constructible region Ψ_n is not even convex, and thus cannot be fully characterized by linear inequalities. We further show, by various combinatorial constructions, that some subclasses of the nonuniform-cover inequalities are incorrect, which refutes a previous conjecture about Ψ_n. Finally, we conclude with an interesting conjecture regarding the convex hull of Ψ_n.

1 Introduction

We use the notation introduced in [3]. Let T be an object in R^n_+ and let {v_1, …, v_n} be the standard basis of R^n. By an object, we mean a bounded compact subset of R^n_+. We let Span(S) denote the subspace spanned by {v_i | i ∈ S}. Given an index set S ⊆ [n] = {1, 2, …, n} with |S| = d, we denote by T_S the orthogonal projection of T onto Span(S), and by |T_S| its d-dimensional volume. We use |T| to denote the n-dimensional volume of T. Given an n-dimensional object T, define π(T) to be the log-projection vector of T: a (2^n − 1)-dimensional vector with entries indexed by the nonempty subsets of [n] and π(T)_S = log |T_S| for all S ⊆ [n] (we use the convention log 0 = −∞). Whenever we refer to a (2^n − 1)-dimensional vector π, we assume that its entries are indexed by the subsets of [n] (i.e., π_S is the entry indexed by S ⊆ [n]). We say that a (2^n − 1)-dimensional vector π is constructible if π is the log-projection vector of some object T in R^n. We define the constructible region Ψ_n, the central subject studied in this paper, to be the set of all constructible vectors:

Ψ_n = {π ∈ R^{2^n−1} | π is constructible}.

With the above definitions, it is natural to ask the following questions: 1.
Given a (2^n − 1)-dimensional vector π, is there an algorithm to decide whether π is in Ψ_n? 2. What does Ψ_n look like? What properties does Ψ_n have?

In 1995, Bollobás and Thomason [3] proposed a class of inequalities relating the projected volumes. Their result reads as follows. Let A be a family of subsets of [n]. We say A is a k-cover of [n] if each element of [n] appears exactly k times in the multiset induced by A. For example, {{1, 2}, {2, 3}, {1, 3}} is a 2-cover of {1, 2, 3}.

Theorem 1 (Bollobás–Thomason (BT) uniform-cover inequalities). Suppose T is an object in R^n and A is a k-cover of [n]. Then |T|^k ≤ ∏_{A∈A} |T_A|.

With the above notation, we define the polyhedral cone

BT_n = {π ∈ R^{2^n−1} | k·π_S ≤ Σ_{A∈A} π_A, for all k and all A that form a k-cover of S, for all S ⊆ [n]}.

The BT inequalities essentially assert that every constructible vector is in BT_n, or equivalently Ψ_n ⊆ BT_n. In the very same paper [3], they also presented a non-constructible point in BT_4, which immediately implies that Ψ_n ⊊ BT_n. However, this result does not rule out the possibility that Ψ_n is convex, or even that it can be characterized by a finite set of linear inequalities.

1.1 Our Results

Apart from the results mentioned above, very little is known about Ψ_n, and the main goal of this paper is to deepen our understanding of its structure. First, we propose a new class of natural inequalities, called nonuniform-cover inequalities, which generalize the BT uniform-cover inequalities. We need a few notations first.

Let A = {A_i}_{i=1}^k and B = {B_j}_{j=1}^m be two families of subsets of [n] (a subset of [n] may appear more than once in A or in B). We say A covers B if the following properties hold:

1. The disjoint union of {A_i}_{i=1}^k is the same as the disjoint union of {B_j}_{j=1}^m. In other words, for every element e ∈ [n], |{i | e ∈ A_i}| = |{j | e ∈ B_j}|.

2.
Let Σ = {(A_i, t) | t ∈ A_i} and Γ = {(B_j, s) | s ∈ B_j}. There is a one-to-one mapping f between Σ and Γ such that for any (A_i, t) ∈ Σ with (B_j, s) = f(A_i, t), we have t = s and A_i ⊆ B_j.

Definition 1 (Nonuniform-Cover (NC) inequalities). Let x be a (2^n − 1)-dimensional vector and suppose A covers B. A nonuniform-cover inequality is of the following form:

∏_{A_i∈A} x_{A_i} ≥ ∏_{B_j∈B} x_{B_j}.

Example 1. Let A = {{1, 2}, {2, 3}, {3, 4}} and B = {{1, 2, 3}, {2, 3, 4}}. We can see that A covers B. The corresponding NC inequality is

x_{1,2} · x_{2,3} · x_{3,4} ≥ x_{1,2,3} · x_{2,3,4}.

Here is another example:

x_{1} · x_{1,2} · x_{2,3} · x_{3,4} · x_{2,4} ≥ x_{1,2,3} · x_{2,3,4} · x_{1,2,4}.

When the context is clear, we refer to a linear inequality of the form Σ_{B_j∈B} π_{B_j} ≤ Σ_{A_i∈A} π_{A_i} as an NC inequality as well. It is easy to see that every BT inequality is an NC inequality, but the converse need not hold; for example, x_{1,2} · x_{2,3} · x_{3,4} ≥ x_{1,2,3} · x_{2,3,4}. (We alert the reader that we do not claim such inequalities are always true; we discuss this in detail in Section 4.)

Similar to BT_n, we define NC_n to be the set of all points satisfying all NC inequalities. Formally, it is the following polyhedral cone:

NC_n = {π ∈ R^{2^n−1} | Σ_{B_j∈B} π_{B_j} ≤ Σ_{A_i∈A} π_{A_i}, for all A, B such that A covers B}.

Our first result states that all correct linear inequalities must be in this class.

Theorem 2. If all points in Ψ_n satisfy a certain linear inequality Σ_{S⊆[n]} α_S π_S ≤ 0, then the inequality must be an NC inequality, or a positive combination of NC inequalities.

In order to prove the above theorem, we introduce a class of objects called rectangular flowers. We let RF_n denote the set of all log-projection vectors that can be generated by rectangular flowers (see the definition in Section 2).
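For intuition (our remark, not a claim from the paper): on an axis-aligned box, |T_S| = ∏_{i∈S} b_i, so the two sides of an NC inequality coincide whenever condition 1 (equal disjoint unions) holds; the content of an NC inequality lies entirely in non-box objects. A minimal sketch, with hypothetical edge lengths:

```python
from math import prod

# Edge lengths of a (hypothetical) axis-aligned box in R^4. For a box,
# |T_S| = prod_{i in S} b_i, so projected volumes factor coordinate-wise.
b = {1: 2.0, 2: 3.0, 3: 5.0, 4: 7.0}

def vol(S):
    """|T_S| for the box: product of the edge lengths of the axes in S."""
    return prod(b[i] for i in S)

# Example 1: A = {{1,2},{2,3},{3,4}} covers B = {{1,2,3},{2,3,4}}.
lhs = vol({1, 2}) * vol({2, 3}) * vol({3, 4})
rhs = vol({1, 2, 3}) * vol({2, 3, 4})
# Each coordinate appears equally often on both sides, so the NC
# inequality holds with equality on any box.
print(lhs == rhs)  # → True
```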
We show that for any linear inequality that is not an NC inequality, we can construct a rectangular flower that violates it. It is simple to show that the log-projection vector of a rectangular flower in R^n satisfies all nonuniform-cover inequalities (i.e., it lies in NC_n). Moreover, we show that for every point π ∈ NC_n, there is a rectangular flower in R^n whose log-projection vector is π. Therefore, we can prove the following theorem.

Theorem 3. For all n ≥ 1, NC_n = RF_n ⊆ Ψ_n.

Given Theorem 3, it is natural to ask whether NC_n = Ψ_n. If the answer were yes, Ψ_n would have a compact description, and deciding whether a point lies in Ψ_n could be done using linear programming (see Section 2 for the details). However, the answer is not that simple. In fact, using Theorem 2, we can show our next result, which states that Ψ_n is not even convex for n ≥ 4. We note that for n = 1, 2, 3, Ψ_n = BT_n and is thus convex; for completeness, we provide a proof in Appendix A.

Theorem 4 (Non-convexity of Ψ_n). For n ≥ 4, Ψ_n is not convex.

Theorem 4 implies that there exist constructible vectors in R^{2^n−1} that violate some NC constraint. In other words, NC_n ⊊ Ψ_n. Thus it would be interesting to know which NC inequalities are true and which are false (we already know the BT inequalities are true). In Section 4, we provide several methods for constructing counterexamples for different subclasses of NC inequalities. However, we have not been able to disprove all NC inequalities that are not BT inequalities, nor to prove any of them. This leads us to conjecture the following.

Conjecture 1. If all points in Ψ_n satisfy a certain linear inequality Σ_j β_j π_{B_j} ≤ Σ_i α_i π_{A_i}, then the inequality must be a BT inequality or a positive combination of several BT inequalities. Moreover, BT_n = Conv(Ψ_n), the convex hull of Ψ_n.

At the end of the introduction, we summarize our results in the following chain:

RF_n = NC_n ⊊ Ψ_n ⊊ Conv(Ψ_n) ⊆ BT_n,

and we conjecture that Conv(Ψ_n) = BT_n.
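Projected volumes of discrete objects (unions of unit cubes, as used for the counterexamples in Section 4) are easy to compute directly; the following sketch, with a hypothetical L-shaped object of four unit cubes, checks one BT 2-cover inequality numerically:

```python
def proj_count(cubes, S):
    """|T_S|: number of distinct projections of the unit cubes onto the axes in S."""
    return len({tuple(c[i] for i in S) for c in cubes})

# A sample object in R^3: an L-shaped union of four unit cubes
# (a hypothetical example, identified by their corner coordinates).
cubes = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

# BT uniform-cover inequality for the 2-cover {{1,2},{2,3},{1,3}} of [3]:
# |T|^2 <= |T_{12}| * |T_{23}| * |T_{13}|  (axes are 0-indexed in code).
vol = len(cubes)
lhs = vol ** 2
rhs = (proj_count(cubes, (0, 1)) *
       proj_count(cubes, (1, 2)) *
       proj_count(cubes, (0, 2)))
print(lhs, rhs, lhs <= rhs)  # → 16 27 True
```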
1.2 A Motivating Problem from Databases

Our problem is closely related to the data generation problem [1] studied in the area of databases, which is in fact our initial motivation for studying it. Generating synthetic relations under various constraints is a key problem in testing data management systems. A relation R(A_1, …, A_n) is essentially a table, where each row is a record about some entity and each column A_i is an attribute. One of the most important operations in relational databases is the projection onto a subset of attributes. One can think of the projection onto a subset S of attributes, denoted π_S(R), as the table R first restricted to the columns in S, and then with duplicates removed.

To see the connection between the database problem and geometry, we can think of a relation R(A_1, …, A_n) with n attributes as an n-dimensional object T in R^n: a tuple (i.e., a row) (t_1, t_2, …, t_n) can be viewed as the unit cube [t_1 − 1, t_1] × … × [t_n − 1, t_n]. Then T_S, the projection of T onto Span(S), corresponds exactly to the projected relation π_S(R).

Example 2. The following table records course registrations; its 5 rows correspond to 5 unit squares in the coordinate system. In this way, a table is represented by an object in Euclidean space.

Rank | Name  | Course
1    | Alice | Math
2    | Alice | Physics
3    | Alice | Biology
4    | Bob   | Math
5    | Bob   | Physics

[Figure: the five records plotted as unit squares, with students (Alice, Bob) on one axis and courses (Math, Physics, Biology) on the other.]

In the data generation problem with projection constraints, we are given the cardinalities |π_S(R)| for a set of subsets S ⊆ [n]. The goal is to construct a relation R that is consistent with the given cardinalities. This is a discrete version of our geometry problem. Moreover, if the given cardinalities (after taking logarithms) are not in Ψ_n, or violate any projection inequality, there is obviously no solution to the data generation problem.
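The projection operation π_S(R) (restrict columns, drop duplicates) can be sketched directly; the relation below is the Example 2 table restricted to the Name and Course attributes (indexed 0 and 1):

```python
# The course-registration relation from Example 2 (Name, Course).
R = [("Alice", "Math"), ("Alice", "Physics"), ("Alice", "Biology"),
     ("Bob", "Math"), ("Bob", "Physics")]

def project(R, S):
    """pi_S(R): restrict each row to the attribute indices in S, drop duplicates."""
    return {tuple(row[i] for i in S) for row in R}

print(len(project(R, (0,))))    # |pi_{Name}(R)|   → 2 distinct students
print(len(project(R, (1,))))    # |pi_{Course}(R)| → 3 distinct courses
print(len(project(R, (0, 1))))  # full relation    → 5 rows
```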
Therefore, a good understanding of our geometry problem is central to solving the data generation problem.

1.3 Other Related Work

Loomis and Whitney [7] proved a class of projection inequalities that allow one to upper bound the volume of a d-dimensional object by the volumes of its (d − 1)-dimensional projections. Their inequalities are special cases of the BT inequalities. BT inequalities and their generalizations also play an essential role in the worst-case optimal join problem in databases (one can upper bound the size of a relation R knowing the cardinalities of its projections); see, e.g., [8] for some recent results on this problem.

There is a large body of literature on the constructible region Γ_n for the joint entropy function over n random variables X_1, …, X_n. More specifically, each joint distribution over X_1, …, X_n gives a point in Γ_n: a (2^n − 1)-dimensional vector whose entry indexed by S ⊆ [n] is H({X_i}_{i∈S}). Characterizing Γ_n is a major problem in information theory and has been studied extensively. Many entropy inequalities are known, including Shannon-type inequalities and several non-Shannon-type inequalities. For a comprehensive treatment of this topic, we refer interested readers to the book [9]. There are close connections between entropy inequalities and projection inequalities [2–5]. In particular, the BT inequalities can be easily derived from the well-known Shearer entropy inequality [4] (many even regard them as the same).

2 Proof of Theorem 2 and Theorem 3

In this section, we prove Theorem 2 and Theorem 3. We first introduce a class of special geometric objects that are crucial to our proofs. We say an n-dimensional object F ⊆ R^n_+ is cornered if x ∈ F implies y ∈ F for all y ≤ x (i.e., y_i ≤ x_i for all i ∈ [n]). An object R ⊆ R^n_+ is said to be an open rectangle if R = (0, a_1] × (0, a_2] × … × (0, a_n], and a closed rectangle if R = [0, a_1] × [0, a_2] × … × [0, a_n].
Definition 2. We say F ⊆ R^n_+ is a rectangular flower if

1. F is cornered, and
2. F ∩ (0, ∞)^S is an open rectangle in (0, ∞)^S for every S ⊆ [n].

Figure 1: (i) A 3-dimensional rectangular flower. (ii) The network flow instance N(A, B); the dashed line represents the minimum s-t cut.

See Figure 1 for an example. It is easy to see that a rectangular flower F ⊆ R^n_+ is a union of 2^n − 1 closed rectangles, F = ∪_{S⊆[n], S≠∅} F_S, each F_S being a closed rectangle in Span(S). Moreover, if S ⊆ S′, then for any i ∈ S the edge length of F_S along axis i is no shorter than that of F_{S′} (since F is cornered).

We also need to introduce a new class of inequalities, called fractional nonuniform-cover inequalities, which can be seen as the fractional generalization of NC inequalities. We need some notation first. Let A = {(A_i, α_i)}_{i=1}^k and B = {(B_j, β_j)}_{j=1}^m be two families of weighted subsets of [n], where the A_i's and B_j's are subsets of [n] and α_i > 0 (resp. β_j > 0) is the weight associated with A_i (resp. B_j). Construct a network flow instance N(A, B) as follows. Let Σ = {(A_i, x) | x ∈ A_i, A_i ∈ A} and Λ = {(B_j, y) | y ∈ B_j, B_j ∈ B} be sets of nodes. Let node s be the source and node t be the sink. There is an arc from s to each node (A_i, x) ∈ Σ with capacity α_i, and an arc from each node (B_j, y) ∈ Λ to t with capacity β_j. For each pair (A_i, x) and (B_j, y), there is an arc with capacity +∞ from (A_i, x) to (B_j, y) if A_i ⊆ B_j and x = y. We say A saturates B if the following properties hold:

C1. For any x ∈ [n], Σ_{i=1}^k α_i·1(x ∈ A_i) = Σ_{j=1}^m β_j·1(x ∈ B_j).

C2. The maximum s-t flow (or equivalently, the minimum s-t cut) of N(A, B) is Σ_{(B_j,y)∈Λ} β_j, i.e., all arcs into t are saturated.

Definition 3 (Fractional-Nonuniform-Cover (FNC) inequalities). Suppose T is an object in R^n and A saturates B. A fractional-nonuniform-cover inequality is of the following form:

∏_{(A_i,α_i)∈A} |T_{A_i}|^{α_i} ≥ ∏_{(B_j,β_j)∈B} |T_{B_j}|^{β_j}.

When the context is clear, we also refer to linear inequalities of the form Σ_{B_j∈B} β_j π_{B_j} ≤ Σ_{A_i∈A} α_i π_{A_i} as FNC inequalities.

Lemma 1. The set of FNC inequalities (in linear form) is exactly the set of all nonnegative linear combinations of NC inequalities.

Proof. It is trivial to see that a nonnegative linear combination of NC inequalities is an FNC inequality. We now show the other direction. Fix the dimension to be n and consider an arbitrary FNC inequality cx ≤ 0. If all entries of c are rational, the FNC inequality itself is an NC inequality after scaling all coefficients by some integer factor (this is because if all capacities of the network are integral, there is an integral maximum flow). So we only need to handle the case where some entries of c are irrational. We show that every point in NC_n satisfies cx ≤ 0. Suppose to the contrary that there is a point y ∈ NC_n with cy = ε > 0. We claim that there is a sequence of FNC inequalities with rational coefficients {c^(i) x ≤ 0}_i such that lim_i c^(i) = c. Hence, for some sufficiently large i, c^(i) y ≥ ε/2 > 0, which yields a contradiction. We now briefly argue why the claimed sequence exists. It is not hard to see that the set of coefficient vectors c corresponding to FNC inequalities is a rational polyhedral cone, defined by the linear constraints C1 and the flow constraint C2, which again can be captured by linear constraints (using auxiliary flow variables). So there is a set V of rational generating vectors, and c can be written as a nonnegative combination of these vectors. Suppose c = Vγ with γ ≥ 0 (each column of V is a generating vector). Pick an arbitrary rational nonnegative sequence of vectors {γ^(j)}_j approaching γ; then {Vγ^(j)} is the desired sequence.

The rest can be seen from Farkas' Lemma: let Ax ≤ 0 be a feasible system of inequalities and let cx ≤ 0 be an inequality satisfied by all x with Ax ≤ 0.
Then, by Farkas' Lemma, cx ≤ 0 is a nonnegative linear combination of the inequalities in Ax ≤ 0 (see, e.g., [6]).

Proof of Theorem 2. We only need to show that every non-FNC inequality is violated by some object. Consider an arbitrary non-FNC inequality

∏_{(A_i,α_i)∈A} |F_{A_i}|^{α_i} ≥ ∏_{(B_j,β_j)∈B} |F_{B_j}|^{β_j},   (1)

where A does not saturate B. We show how to construct a rectangular flower F for which this inequality does not hold. Consider the network flow instance N(A, B).

Suppose first that C1 does not hold: for some x ∈ [n], Σ_{i=1}^k α_i·1(x ∈ A_i) ≠ Σ_{j=1}^m β_j·1(x ∈ B_j). If Σ_{i=1}^k α_i|A_i| ≠ Σ_{j=1}^m β_j|B_j|, we can easily see that (1) is false by considering F = [0, 2]^n, for which log₂ LHS = Σ_{i=1}^k α_i|A_i| and log₂ RHS = Σ_{j=1}^m β_j|B_j| (if the left sum is the larger one, take F = [0, 1/2]^n instead). Now suppose Σ_{i=1}^k α_i|A_i| = Σ_{j=1}^m β_j|B_j| but Σ_i α_i·1(x ∈ A_i) ≠ Σ_j β_j·1(x ∈ B_j) for some x; w.l.o.g. assume x = 1. Let F = [0, 2] × [0, 1] × … × [0, 1]. Again, it is easy to see that (1) fails, since log₂ LHS = Σ_i α_i·1(1 ∈ A_i) and log₂ RHS = Σ_j β_j·1(1 ∈ B_j) (again, replace the edge length 2 by 1/2 if necessary).

Now suppose C2 is false, i.e., the value of the minimum s-t cut of N(A, B) is less than Σ_{(B_j,y)∈Λ} β_j. Let the minimum s-t cut define the partition (S, T) of the vertices with s ∈ S and t ∈ T. Let Σ and Λ be defined as above, and let Σ_S = Σ ∩ S, Σ_T = Σ ∩ T, Λ_S = Λ ∩ S, Λ_T = Λ ∩ T. Clearly, there is no arc from Σ_S to Λ_T, since otherwise the value of the cut would be infinite; in other words, Λ_S absorbs all arcs leaving Σ_S (see Figure 1(ii)). The value of the min-cut is Σ_{(A_i,x)∈Σ_T} α_i + Σ_{(B_j,y)∈Λ_S} β_j. Since this value is less than Σ_{(B_j,y)∈Λ} β_j, we have Σ_{(A_i,x)∈Σ_T} α_i < Σ_{(B_j,y)∈Λ_T} β_j; in particular, Λ_T ≠ ∅. Now we construct the rectangular flower F. Write F = ∪_{S⊆[n],S≠∅} F_S and let F_{S,x} denote the edge length of the closed rectangle F_S along axis x ∈ S. For a parameter t > 1, we set

F_{S,x} = t if S ⊆ B_j for some (B_j, x) ∈ Λ_T, and F_{S,x} = 1 otherwise.

(Note that this is monotone: if S ⊆ S′ and F_{S′,x} = t, then F_{S,x} = t, so F is indeed a rectangular flower.) Now we verify that F violates the given non-FNC inequality. If F_{A_i,x} = t for some (A_i, x) ∈ Σ, then A_i ⊆ B_j for some (B_j, x) ∈ Λ_T, so there is an arc from (A_i, x) to (B_j, x); since no arc crosses from S to T, this forces (A_i, x) ∈ Σ_T. Hence

log ∏_{A} |F_{A_i}|^{α_i} ≤ log t · Σ_{(A_i,x)∈Σ_T} α_i.

On the other hand, F_{B_j,y} = t for every (B_j, y) ∈ Λ_T, so

log ∏_{B} |F_{B_j}|^{β_j} ≥ log t · Σ_{(B_j,y)∈Λ_T} β_j.

Since Σ_{(A_i,x)∈Σ_T} α_i < Σ_{(B_j,y)∈Λ_T} β_j, taking t large enough makes the given inequality false. This proves Theorem 2.

We denote the set of log-projection vectors generated by rectangular flowers by

RF_n = {π ∈ R^{2^n−1} | π is the log-projection vector of some rectangular flower F}.

Now we prove Theorem 3.

Proof of Theorem 3. Clearly, RF_n ⊆ Ψ_n; we only need to show that RF_n = NC_n. A given vector π is the log-projection vector of some rectangular flower in R^n if and only if the following linear program, denoted LP(π), is feasible (treating the f_{S,i} as variables):

Σ_{i∈S} f_{S,i} = π_S, for all S ⊆ [n],
f_{S,i} ≥ f_{S′,i}, for all i ∈ S ⊆ S′ ⊆ [n].

Hence RF_n = {π ∈ R^{2^n−1} | LP(π) is feasible}. It is easy to check that RF_n is a convex cone (i.e., if π_1, π_2 ∈ RF_n, then aπ_1 + bπ_2 ∈ RF_n for any a, b > 0). In fact, RF_n is a polyhedral cone, which can be seen as follows. We can write LP(π) in the standard matrix form {Ax = (π, 0), x ≥ 0}. Clearly, {(y_1, y_2) | Ax = (y_1, y_2) for some x ≥ 0} is a finitely generated cone (generated by the columns of A), and thus a polyhedral cone. RF_n is the intersection of this cone with the subspace {(y_1, y_2) | y_2 = 0}, which is again a polyhedral cone.

It is straightforward to verify that each point in RF_n satisfies all NC inequalities (we leave the verification to the reader), so RF_n ⊆ NC_n. Suppose for contradiction that there is a point π ∈ NC_n but π ∉ RF_n. Then there is a hyperplane Σ_{S⊆[n]} α_S x_S = 0 separating RF_n from π, with Σ_{S⊆[n]} α_S π_S > 0 and Σ_{S⊆[n]} α_S x_S ≤ 0 for all x ∈ RF_n. So Σ_{S⊆[n]} α_S x_S ≤ 0 is not an FNC inequality (since π ∈ NC_n satisfies all FNC inequalities).
From the proof of Theorem 2, for any non-FNC inequality we can construct a rectangular flower that violates it. This contradicts the fact that Σ_{S⊆[n]} α_S x_S ≤ 0 for all x ∈ RF_n. Hence NC_n ⊆ RF_n, which concludes the proof of the theorem.

At the end of this section, we briefly discuss projection inequalities with nonzero constant terms (Σ_S α_S x_S ≤ β for β ≠ 0). If β < 0, no such inequality is true, as can be seen by considering the unit hypercube (whose log-projection vector is 0). Obviously, if Σ_S α_S x_S ≤ 0 is true for all x ∈ Ψ_n, then Σ_S α_S x_S ≤ β is also true for any β > 0. Moreover, if Σ_S α_S x_S ≤ 0 is not an FNC inequality, then Σ_S α_S x_S ≤ β cannot be true for any β > 0, since we can make t large enough in the proof of Theorem 2. Conversely, if Σ_S α_S x_S ≤ β for some β > 0 is true for all x ∈ Ψ_n, it must hold that Σ_S α_S x_S ≤ 0 for all x ∈ Ψ_n; this is because if x ∈ Ψ_n, then ax ∈ Ψ_n for any a > 0. Therefore, it suffices to consider only those inequalities with zero constant term.

3 Proof of Theorem 4: Non-Convexity of Ψ_n

In this section, we prove Theorem 4: the non-convexity of the constructible region Ψ_n for n ≥ 4. Suppose for contradiction that Ψ_n is convex. First, observe that Ψ_n must then be a convex cone (because if x ∈ Ψ_n, then αx ∈ Ψ_n for α > 0). Hence each supporting hyperplane of Ψ_n must correspond to an FNC inequality. Consider

Π₀ = {(π(T)_{1,2}, π(T)_{1,3}, π(T)_{2,3}, π(T)_{2,4}, π(T)_{3,4}, π(T)_{1,2,3}, π(T)_{2,3,4}) | T is an object in R^n_+},

the projection of Ψ_n onto the subspace spanned by {v_{1,2}, v_{1,3}, v_{2,3}, v_{2,4}, v_{3,4}, v_{1,2,3}, v_{2,3,4}}, where v_S is the axis indexed by S ⊆ [n]. Since Ψ_n is a convex cone, Π₀ must also be a convex cone. Hence each linear inequality that defines Π₀ must be some FNC inequality involving only the terms T_{1,2}, T_{1,3}, T_{2,3}, T_{2,4}, T_{3,4}, T_{1,2,3}, T_{2,3,4}.
Now we prove that any FNC inequality involving only the above terms is a nonnegative linear combination of the following two BT inequalities:

|T_{12}| · |T_{13}| · |T_{23}| ≥ |T_{123}|²,   (2)
|T_{23}| · |T_{24}| · |T_{34}| ≥ |T_{234}|².   (3)

By Lemma 1, it suffices to consider only NC inequalities. In fact, according to the definition of NC inequalities, the right-hand side can only contain the terms T_{123} and T_{234}. Applying Corollary 2 of the Exact Single Cover Theorem (discussed in the next section) to such an inequality, we see that it must be a combination of (2) and (3). In other words, Π₀ is defined by (2) and (3).

Now consider the vector φ₀^(t), for t > 0, t ≠ 1:

φ₀^(t) = (0, 2 ln t, 0, 2 ln t, 0, ln t, ln t).

This example is essentially adapted from the example in [3]. It is easy to see that φ₀^(t) satisfies (2) and (3). We now briefly show that φ₀^(t) ∉ Π₀. Suppose there were an object T whose log-projection vector is consistent with φ₀^(t); in other words, |T_{12}| = |T_{23}| = |T_{34}| = 1, |T_{13}| = |T_{24}| = t², and |T_{123}| = |T_{234}| = t. Note that |T_{12}| · |T_{13}| · |T_{23}| = |T_{123}|². From Theorem 4 in [3], this equality forces T_{23} to be the rectangle B(1/t, t). However, since |T_{23}| · |T_{24}| · |T_{34}| = |T_{234}|², T_{23} must likewise be the rectangle B(t, 1/t). Since t ≠ 1, the two rectangles are not the same, and we arrive at a contradiction. This shows that Ψ_n is not convex and thus completes the proof of Theorem 4.

4 Counterexample Constructions for NC\BT Inequalities

We have shown that the constructible region cannot be fully characterized by a set of linear inequalities, as it is not convex. However, it is still interesting to determine which linear inequalities are correct; equivalently, we want to determine the set of linear inequalities that define Conv(Ψ_n), the convex hull of Ψ_n. In this section, we construct counterexamples for several NC but non-BT (denoted NC\BT) inequalities.
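As a quick numeric sanity check (ours, not from [3]): the point φ₀^(t) satisfies the two BT inequalities (2) and (3) with equality, which is exactly what forces the two contradictory rectangle shapes in the argument above. A small sketch with t = 2:

```python
from math import log

t = 2.0
# Entries of phi0^(t), indexed by the subsets used in Section 3.
pi = {
    (1, 2): 0.0, (1, 3): 2 * log(t), (2, 3): 0.0,
    (2, 4): 2 * log(t), (3, 4): 0.0,
    (1, 2, 3): log(t), (2, 3, 4): log(t),
}
# BT inequality (2) in log form: pi_{12} + pi_{13} + pi_{23} >= 2 * pi_{123}
print(pi[(1, 2)] + pi[(1, 3)] + pi[(2, 3)] >= 2 * pi[(1, 2, 3)])  # → True
# BT inequality (3) in log form: pi_{23} + pi_{24} + pi_{34} >= 2 * pi_{234}
print(pi[(2, 3)] + pi[(2, 4)] + pi[(3, 4)] >= 2 * pi[(2, 3, 4)])  # → True
```

Both hold with equality, so φ₀^(t) lies on the boundary of the cone cut out by (2) and (3), yet (as shown above) it is not in Π₀.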
Since a compact object can be approximated by a union of small cubes, our counterexamples are also unions of cubes.

4.1 Skeletons

In this subsection, we use an n-tuple (t_1, t_2, …, t_n) of nonnegative integers to represent the n-dimensional unit hypercube {(x_1, …, x_n) | t_i ≤ x_i ≤ t_i + 1 for all i}, i.e., ∏_{i=1}^n [t_i, t_i + 1]. The sum of two sets denotes their Minkowski sum, A + B = {a + b | a ∈ A, b ∈ B}. We need the notion of a skeleton, which is important for our constructions.

Definition 4 (Connection graph). In R^n, consider an FNC inequality ∏_{i=1}^k |T_{A_i}|^{α_i} ≥ ∏_{j=1}^m |T_{B_j}|^{β_j} (α_i, β_j > 0). The connection graph G_C for this inequality is the undirected graph G_C = (V, E) with V = {v_1, …, v_n}, representing the n dimensions, where (v_a, v_b) ∈ E if and only if a and b appear together in some B_j but never together in any A_i.

Definition 5. Let C_1, C_2, …, C_s be all cliques (complete subgraphs) of G_C, and let M be a large positive integer. For every C_r, we define

SK_{C_r}(M) = {t | 0 ≤ t_i ≤ M − 1 for all i ∈ C_r; t_i = 0 for all i ∉ C_r}.

The skeleton for the given NC inequality is defined as SK_{G_C}(M) = ∪_{r=1}^s SK_{C_r}(M).

See Figure 2 for an example.

Figure 2: (i) A skeleton. (ii) The corresponding connection graph.

In the figure, the connection graph is on the right and the corresponding skeleton is the object on the left. For S ⊆ [n], let ∆(S) be the size of the maximum clique in G_C[S], the subgraph induced by the vertices in S. For sufficiently large M, we have the following asymptotic estimates (note that no two vertices of any A_i are adjacent in G_C, so ∆(A_i) = 1):

∏_{i=1}^k |T_{A_i}|^{α_i} ≈ M^{Σ_{i=1}^k α_i},
∏_{j=1}^m |T_{B_j}|^{β_j} ≈ M^{Σ_{j=1}^m β_j ∆(B_j)}.

The following lemma is a direct consequence of these estimates.

Lemma 2. If the NC inequality ∏_{i=1}^k |T_{A_i}|^{α_i} ≥ ∏_{j=1}^m |T_{B_j}|^{β_j} satisfies

Σ_{i=1}^k α_i < Σ_{j=1}^m β_j ∆(B_j),

then it is incorrect, i.e., there exists a counterexample to it.

Example 3. Consider the NC inequality |T_{12}| · |T_{23}| · |T_{34}| ≥ |T_{123}| · |T_{234}|.
The connection graph G_C contains two edges, (1, 3) and (2, 4). We have Σ_i α_i = 3 and Σ_j β_j ∆(B_j) = 4. Hence the inequality is not true in general.

4.2 Unions of Boxes

By a box we mean a rectangle B(b) = {x | 0 ≤ x_i ≤ b_i} or a translate B(b) + v of it for some positive vector v (the sum being the Minkowski sum). The examples in this subsection are disjoint unions of two boxes B_1 and B_2. Here we require not only that B_1 and B_2 are disjoint in R^n_+, but also that their projections onto the subspace R^S are disjoint for every S ⊆ [n]. In particular, we use the following two boxes:

B_1 = B(1);  B_2 = B(M^{t_1}, M^{t_2}, …, M^{t_n}) + 1.

As before, for M sufficiently large and exponents t_i to be determined later, the following asymptotic estimates hold:

∏_{i=1}^k |T_{A_i}|^{α_i} ≈ M^{Σ_{i=1}^k α_i max{0, Σ_{s∈A_i} t_s}},
∏_{j=1}^m |T_{B_j}|^{β_j} ≈ M^{Σ_{j=1}^m β_j max{0, Σ_{s∈B_j} t_s}}.

Noting that the maximum can be written via the absolute value, max{0, a} = (a + |a|)/2, we obtain the following lemma.

Lemma 3. If there exists t such that

Σ_{i=1}^k α_i |Σ_{s∈A_i} t_s| < Σ_{j=1}^m β_j |Σ_{s∈B_j} t_s|,

then the corresponding NC inequality is incorrect.

Proof. Our counterexample is the union of the two boxes B_1 = B(1) and B_2 = B(M^{t_1}, M^{t_2}, …, M^{t_n}) + 1, where t is a witness for the absolute-value inequality. By the asymptotics above and Σ_{i=1}^k α_i Σ_{s∈A_i} t_s = Σ_{j=1}^m β_j Σ_{s∈B_j} t_s (which follows from condition C1), we conclude that

Σ_{i=1}^k α_i max{0, Σ_{s∈A_i} t_s} < Σ_{j=1}^m β_j max{0, Σ_{s∈B_j} t_s}.

Hence the object is a counterexample.

Example 4. Again consider the NC inequality from Example 3. Let t = (1, −1, 1, −1). The condition of Lemma 3 is met, so the inequality is incorrect.

Example 5. Consider the NC inequality |T_{13}| · |T_{23}| · |T_{124}| ≥ |T_{123}| · |T_{1234}| and t = (−1, −1, 1, 2). So this inequality is also incorrect.

4.3 The Exact Single Cover Theorem

Using the union-of-boxes method, we can also obtain the following theorem, which gives a necessary condition for an inequality to be true.
Let a_i be the 0/1 indicator vector of the set A_i, and b_j that of B_j; i.e., a_{i,ℓ} = 1 if and only if ℓ ∈ A_i.

Theorem 5 (Exact Single Cover Theorem). If the FNC inequality Σ_{i=1}^k α_i x_{A_i} ≥ Σ_{j=1}^m β_j x_{B_j} holds for every x ∈ Ψ_n, then for every j ∈ [m] there exist nonnegative c_1, c_2, …, c_k with c_i ≤ α_i for all i such that

Σ_{i=1}^k c_i a_i = β_j b_j.

Proof. Let K = {x | x = Σ_{i=1}^k c_i a_i, 0 ≤ c_i ≤ α_i, i = 1, 2, …, k}. It is immediate that K is a convex subset of R^n. If K does not contain β_j b_j, then by the separating hyperplane theorem there exist a vector t = (t_1, t_2, …, t_n) and a real number a such that t · x < a for all x ∈ K, but β_j t · b_j > a. We again use a union of two boxes as the counterexample: B_1 = B(1), B_2 = B(M^{t_1}, M^{t_2}, …, M^{t_n}) + 1. It can be seen that

∏_{j′=1}^m |T_{B_{j′}}|^{β_{j′}} ≥ M^{β_j t · b_j} > M^a.

Now it suffices to show that ∏_{i=1}^k |T_{A_i}|^{α_i} < M^a. By the asymptotics shown before,

∏_{i=1}^k |T_{A_i}|^{α_i} ≈ M^{Σ_i α_i max{0, a_i · t}} = M^{(Σ_{i: a_i·t ≥ 0} α_i a_i) · t} < M^a.

The last inequality holds since Σ_{i: a_i·t ≥ 0} α_i a_i is in K. This completes the proof.

Now we show two simple corollaries.

Corollary 1. Suppose the FNC inequality Σ_{i=1}^k α_i x_{A_i} ≥ Σ_{j=1}^m β_j x_{B_j} holds for all x ∈ Ψ_n, and the set indicator vectors a_i are linearly independent. Then this inequality can be written as a nonnegative combination of m BT inequalities.

Proof. Let A (resp. B) be the matrix with a_i as its ith column (resp. b_j as its jth column). Let α = (α_1, …, α_k)ᵀ and β = (β_1, …, β_m)ᵀ. By the definition of FNC, we know that Aα = Bβ. For each j, we know β_j b_j = Σ_{i=1}^k c_{ji} a_i for some 0 ≤ c_{ji} ≤ α_i. So Aα = A(Σ_j c_j), where c_j = (c_{j1}, …, c_{jk}). Since A has full column rank, it must be the case that α = Σ_j c_j.

Corollary 2. Suppose the FNC inequality Σ_{i=1}^k α_i x_{A_i} ≥ Σ_{j=1}^m β_j x_{B_j} holds for all x ∈ Ψ_n, and m = 1 or 2. Then this inequality can be written as a nonnegative combination of m BT inequalities.

Proof.
We only need to consider the case m = 2. From Theorem 5, β_1 b_1 = Σ_{i=1}^k c_i a_i for some 0 ≤ c_i ≤ α_i. Since Σ_{i=1}^k α_i a_i = β_1 b_1 + β_2 b_2, we have β_2 b_2 = Σ_{i=1}^k (α_i − c_i) a_i.

Example 6. Consider the NC inequality |T_{12}| · |T_{23}| · |T_{34}| ≥ |T_{123}| · |T_{234}| from Example 3. By either of the above corollaries, if it were true, it could be decomposed into two BT inequalities. However, it is clear that no such decomposition exists, so it is not true in general. Similarly, we can see that the inequality in Example 5 is not true.

4.4 A Hybrid Approach

In fact, neither of the above methods alone suffices to disprove all NC\BT inequalities. In this subsection, we demonstrate an application that combines the two approaches.

Example 7. One interesting example is the following NC inequality:

|T_1| · |T_{12}| · |T_{23}| · |T_{34}| · |T_{24}| ≥ |T_{123}| · |T_{234}| · |T_{124}|.

This example satisfies the necessary condition of Theorem 5; nevertheless, we can show that it, too, is incorrect. Our counterexample uses a combination of the skeleton and union-of-boxes methods. Observe that the given inequality is a combination of

|T_{12}| · |T_{23}| · |T_{34}| ≥ |T_{123}| · |T_{234}|  and  |T_1| · |T_{24}| ≥ |T_{124}|.

We already have a skeleton counterexample for the former. Our idea is to take the union of that skeleton and a disjoint box B so that the values of |T_{12}|, |T_{23}|, |T_{34}|, |T_{123}|, |T_{234}| remain (approximately) the same, but |T_1| · |T_{24}| ≈ |T_{124}|. Since the skeleton construction makes the right-hand side of the former inequality arbitrarily larger than its left-hand side, we can see that Example 7 is also incorrect. We can take B = B(R³, R⁻⁴, R⁻⁶, R⁵) with R > 0 large enough (larger than the constant M in the skeleton construction). Then |T_1| · |T_{24}| ≈ |T_{124}| ≈ R⁴, while |T_{12}| ≈ M + R⁻¹, |T_{23}| ≈ M + R⁻¹⁰, |T_{34}| ≈ M + R⁻¹, |T_{123}| ≈ M² + R⁻⁷, |T_{234}| ≈ M² + R⁻⁵.

We have thus shown that some NC\BT inequalities are not correct. It remains to ask whether any NC\BT inequality is correct.
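Lemma 3 reduces refuting an NC inequality to finding an integer certificate vector t, which is easy to search for by brute force. A minimal sketch (our illustrative code, with unit weights; the function name is ours):

```python
from itertools import product

def lemma3_certificate(A, B, rng=range(-2, 3)):
    """Search for an integer vector t with
       sum_i |sum_{s in A_i} t_s| < sum_j |sum_{s in B_j} t_s|,
       which by Lemma 3 refutes the NC inequality prod |T_Ai| >= prod |T_Bj|.
       A, B are lists of 0-indexed index sets, all weights taken to be 1."""
    n = max(max(s) for s in A + B) + 1
    for t in product(rng, repeat=n):
        lhs = sum(abs(sum(t[s] for s in Ai)) for Ai in A)
        rhs = sum(abs(sum(t[s] for s in Bj)) for Bj in B)
        if lhs < rhs:
            return t
    return None

# Example 4: |T12||T23||T34| >= |T123||T234| is refuted, e.g., by t = (1,-1,1,-1);
# the search below returns the first certificate it encounters.
A = [{0, 1}, {1, 2}, {2, 3}]
B = [{0, 1, 2}, {1, 2, 3}]
print(lemma3_certificate(A, B))
```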
We have been unable to discover such an inequality. We have checked, in an exhaustive manner, all inequalities in R^3 and R^4, and found that all NC\BT inequalities are false. Hence, we propose Conjecture 1, mentioned in the introduction.

5 Final Remarks and Acknowledgements

All of our counterexamples in Section 4 are essentially combinatorial, and the constructions allow one side of the inequality to be arbitrarily larger than the other side. We suspect that all incorrect projection inequalities can be refuted in a similar fashion. In other words, we may not need to construct very delicate, twisted geometric objects; instead, a union of a small number of boxes (the number depending on n) should suffice to refute any incorrect linear projection inequality.

We have developed a few other techniques to disprove some NC inequalities. For example, the fitting-boxes model is a combination of the two models we introduced: it consists of many boxes, each constructed according to the connection graph. The fitting-boxes model can be used to handle all 4-dimensional inequalities. However, it is hard to analyze and to generalize to higher dimensions, so we do not introduce it here.

In 2010, the third author (JL) proposed the notion of rectangular flowers and suspected that RF_n = Ψ_n, which, if true, would be a natural extension of the box theorem in [3]. (The box theorem states that for any body K in R^m there is a rectangular box B with vol(B) = vol(K) and vol(π_S(B)) ≤ vol(π_S(K)) for every S ⊆ [m].) In fact, JL "verified" the above claim empirically using hundreds of thousands of datasets (synthetically generated from different distributions with different dimensions and parameters). Now, we know that RF_n ⊊ Ψ_n. But it is still an interesting fact that all NC inequalities hold for many "random-like" data, and there may be good mathematical reasons for it. Moreover, our counterexamples, which appear to
be quite simple in retrospect, may not have been obvious at all without realizing the equivalence between rectangular flowers and the NC inequalities.

We would like to thank Yuval Peres for introducing the BT and Shearer inequalities to us, and Elad Verbin and Raymond Yeung for discussions on non-Shannon-type inequalities. In particular, we would like to thank Jeff Kahn for several discussions, and for casting doubt from the very beginning on whether RF_n = Ψ_n, and even on the convexity of Ψ_n for n ≥ 4, despite the "empirical evidence" we showed him. We also thank Dan Suciu, Uri Zwick, Gil Kalai, Ely Porat, Zizhuo Wang, Chunwei Song, Yuan Yao, Andrew Thomason and Jacob Fox for useful discussions.

References

[1] Arvind Arasu, Raghav Kaushik, and Jian Li. Data generation using declarative constraints. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, pages 685–696. ACM, 2011.
[2] Paul Balister and Béla Bollobás. Projections, entropy and sumsets. Combinatorica, 32(2):125–141, 2012.
[3] Béla Bollobás and Andrew Thomason. Projections of bodies and hereditary properties of hypergraphs. Bulletin of the London Mathematical Society, 27(5):417–424, 1995.
[4] Fan R. K. Chung, Ronald L. Graham, Peter Frankl, and James B. Shearer. Some intersection theorems for ordered sets and graphs. Journal of Combinatorial Theory, Series A, 43(1):23–37, 1986.
[5] Ehud Friedgut. Hypergraphs, entropy, and inequalities. The American Mathematical Monthly, 111(9):749–760, 2004.
[6] B. Korte and J. Vygen. Combinatorial Optimization: Theory and Algorithms, volume 21. Springer, 2012.
[7] L. H. Loomis and H. Whitney. An inequality related to the isoperimetric inequality. Bulletin of the American Mathematical Society, 55:961–962, 1949.
[8] Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra. Worst-case optimal join algorithms (extended abstract). In Proceedings of the 31st Symposium on Principles of Database Systems, pages 37–48. ACM, 2012.
[9] Raymond W. Yeung. Information Theory and Network Coding. Springer, 2008.
A Appendix 1 (BT_n = Ψ_n for n ≤ 3)

In this section, we prove that BT_3 = Ψ_3. This appears to be a folklore result, and we provide a proof for completeness. Since BT inequalities hold for every constructible projection vector, it suffices to prove that for any vector π satisfying all BT inequalities in dimension 3, there exists an object T such that π(T) = π. In fact, we show that BT_3 = NC_3. Since Ψ_3 is sandwiched between them, all three sets are the same. Hence, it suffices to show that any NC inequality in R^3 is a combination of BT inequalities.

Suppose an NC inequality in R^3 has the following form:

    Π_{S⊆[3]} |T_S|^{α_S} ≥ 1,

where α_S ∈ Z. According to the definition of NC inequalities, it is not hard to verify that α_S ≥ 0 for all |S| = 1. If α_S ≥ 0 for all |S| = 2, the inequality is indeed a BT inequality. Now, suppose α_S < 0 for some |S| = 2. Without loss of generality, we assume that α_{1,2} < 0. By the definition of NC, we obtain the inequalities α_{1} ≥ −α_{1,2} and α_{2} ≥ −α_{1,2}, together with

    Π_{S⊆[3]} |T_S|^{α_S} · ( |T_{1}| · |T_{2}| / |T_{1,2}| )^{α_{1,2}} ≥ 1,

which is still an NC inequality, now without the term |T_{1,2}|^{α_{1,2}}. Rewrite it as Π_{S⊆[3]} |T_S|^{α'_S} ≥ 1 with α'_{1,2} = 0. If this inequality is still not a BT inequality, then there exists some |S'| = 2 such that α'_{S'} < 0; without loss of generality, we may assume that α'_{1,3} < 0. Repeating the operation used for α_{1,2}, we can eliminate the term |T_{1,3}|^{α'_{1,3}}, as well as the term on |T_{2,3}| (if necessary), in the same way. The remaining inequality must be a BT inequality, since the only remaining negative exponent can be α_{1,2,3}. Thus, we prove that BT_3 = NC_3 = Ψ_3.
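The elimination step in the proof above is mechanical, and a short script can carry it out. In the sketch below (the function name and the sample exponent vector are ours, chosen only to exercise the step), an inequality is a dict from subsets of [3] to integer exponents, and each negative exponent on a 2-subset {i, j} is absorbed using the BT inequality |T_i| · |T_j| ≥ |T_{i,j}|; the validity conditions α_i ≥ −α_{i,j} are assumed, not checked.

```python
from itertools import combinations

def eliminate_pair_terms(alpha):
    """Apply the appendix's elimination step to an exponent vector alpha
    (dict: frozenset -> int) of a candidate NC inequality in R^3.
    Returns the new exponents and the BT multipliers that were used."""
    alpha = dict(alpha)
    used = []
    for pair in map(frozenset, combinations(range(1, 4), 2)):
        a = alpha.get(pair, 0)
        if a < 0:
            i, j = sorted(pair)
            # multiply both sides by (|T_i| * |T_j| / |T_ij|)^a, a < 0:
            # the |T_ij| exponent becomes 0, the singleton exponents drop by -a
            alpha[frozenset({i})] = alpha.get(frozenset({i}), 0) + a
            alpha[frozenset({j})] = alpha.get(frozenset({j}), 0) + a
            alpha[pair] = 0
            used.append((pair, -a))
    return alpha, used

# An arbitrary exponent vector, purely to show the mechanics:
# |T1|^2 |T2| |T3| |T23|  vs  |T12| |T123|
a = {frozenset({1}): 2, frozenset({2}): 1, frozenset({3}): 1,
     frozenset({2, 3}): 1, frozenset({1, 2}): -1, frozenset({1, 2, 3}): -1}
new, used = eliminate_pair_terms(a)
print(new[frozenset({1, 2})], used)
```

After the loop, every 2-subset exponent is nonnegative, so the only possibly negative exponent left is the one on |T_{123}|, exactly the shape of a BT inequality.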
Upper Bound on Function Computation in Directed Acyclic Networks

Cupjin Huang, Zihan Tan and Shenghao Yang
Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China

Abstract—Function computation in directed acyclic networks is considered, where a sink node wants to compute a target function with the inputs generated at multiple source nodes. The network links are error-free but capacity-limited, and the intermediate network nodes perform network coding. The target function is required to be computed with zero error. The computing rate of a network code is measured by the average number of times that the target function can be computed for one use of the network. We propose a cut-set bound on the computing rate using an equivalence relation associated with the inputs of the target function. Our bound holds for general target functions and network topologies. We also show that our bound is tight for some special cases where the computing capacity can be characterized.

I. INTRODUCTION

We consider function computation in a directed acyclic network, where a target function f is intended to be calculated at a sink node, and the input symbols of the target function are generated at multiple source nodes. As a special case, network communication is just the computation of the identity function.¹ Network function computation naturally arises in sensor networks [1] and the Internet of Things, and may find applications in big data processing. Various models and special cases of this problem have been studied in the literature (see the summaries in [2]–[4]). We are interested in the following network coding model for function computation. Specifically, we assume that the network links have limited (unit) capacity and are error-free.
Each source node generates multiple input symbols, and the network codes perform vector network coding by using the network multiple times.² An intermediate network node can transmit the output of a certain fixed function of the symbols it receives. All the intermediate nodes are assumed to have unbounded computing ability. The target function is required to be computed correctly for all possible inputs. We are interested in the computing rate of a network code that computes the target function, i.e., the average number of times that the target function can be computed for one use of the network. The maximum achievable computing rate is called the computing capacity.

When computing the identity function, the problem becomes the extensively studied network coding [5], [6], and it is known that in general linear network codes are sufficient to achieve the multicast capacity [6], [7]. For linear target functions over a finite field, a complete characterization of the computing capacity is not available for networks with one sink node. Certain necessary and sufficient conditions have been obtained such that linear network codes are sufficient to calculate a linear target function [4], [8]. But in general, linear network codes are not sufficient to achieve the computing capacity of linear target functions [9].

Networks with a single sink node are discussed in this paper, while both the target function and the network code can be non-linear. In this scenario, the computing capacity is known when the network is a multi-edge tree [2] or when the target function is the identity function. For the general case, various cut-set bounds on the computing capacity have been studied [2], [3]. We find, however, that the upper bounds claimed in the previous works do not generally hold.

¹ A function f : A → A is the identity if f(x) = x for all x ∈ A.
² One use of a network means the use of each link in the network at most once.
For an example that we will evaluate, the computing capacity is strictly larger than the two upper bounds claimed in [2] and [3] respectively, where the issue is related to the classification of the inputs of the target function that are in a certain sense equivalent for each cut. Towards a general upper bound, we define an equivalence relation associated with the inputs of the target function (which does not depend on the network topology) and propose a cut-set bound on the computing capacity using this equivalence relation. Our bound holds for general target functions and general network topologies in the network coding model. We also show that our bound is tight when the network is a multi-edge tree or when the target function is the identity function.

In the remainder of this paper, Section II formally introduces the network computing model. The upper bound on the computing rate is given in Theorem 3 and proved in Section IV. Section III compares it with previous results and discusses the tightness of our upper bound. Omitted proofs can be found in [10].

II. MAIN RESULTS

In this section, we first introduce the network computing model. Then we define cut sets and discuss some special cases of the function computation problem. Finally, we present the main theorem on the cut-set bound for function computation.

A. Function-Computing Network Codes

Let G = (V, E) be a directed acyclic graph (DAG) with a finite vertex set V and an edge set E, where multi-edges between a pair of nodes are allowed. A network over G is denoted as N = (G, S, ρ), where S ⊂ V is the set of source nodes and ρ ∈ V \ S is the sink node. Let s = |S|, and without loss of generality (WLOG), let S = {1, 2, ..., s}. For an edge e = (u, v), we call u the tail of e (denoted by tail(e)) and v the head of e (denoted by head(e)).
Moreover, for each node u ∈ V, let E_i(u) = {e ∈ E : head(e) = u} and E_o(u) = {e ∈ E : tail(e) = u} be the set of incoming edges and the set of outgoing edges of u, respectively. Fix a topological order of the vertex set V. This order naturally induces an order on the edge set E, where e > e' if either tail(e) > tail(e'), or tail(e) = tail(e') and head(e) > head(e'). WLOG, we assume that E_i(j) = ∅ for all source nodes j ∈ S, and E_o(ρ) = ∅. We will illustrate in Section III-C how to apply our results to a network with E_i(j) ≠ ∅ for certain j ∈ S.

Denote by A and O two finite alphabets. Let f : A^s → O be the target function, which is the function to be computed via the network and whose ith input is generated at the ith source node. We may use the network to compute the function multiple times. Suppose that the jth source node consecutively generates k symbols in A, denoted x_{1j}, x_{2j}, ..., x_{kj}, so that the symbols generated by all the source nodes form a matrix x = (x_{ij})_{k×s}. We denote by x_j the jth column of x and by x^i the ith row of x. In other words, x_j is the vector of the symbols generated at the jth source node, and x^i is the input vector of the ith computation of the function f. Define, for x ∈ A^{k×s},

    f^{(k)}(x) = (f(x^1), f(x^2), ..., f(x^k))^T.

B. Cut Sets and Special Cases

The network defined above is used to compute a function, where multiple inputs are generated at the source nodes and the output of the function is demanded by the sink node. Computation units with unbounded computing ability are allocated at all the network nodes; the computing capability of the network is thus bounded by its transmission capability. Denote by B a finite alphabet. We assume that each edge can transmit a symbol in B reliably for each use.
For convenience, we denote by x_J the submatrix of x formed by the columns indexed by J ⊂ S, and by x^I the submatrix of x formed by the rows indexed by I ⊂ {1, 2, ..., k}. We identify A^{1×s} with A^s in this paper.

For two positive integers n and k, an (n, k) (function-computing) network code over the network N with target function f is defined as follows. Let x ∈ A^{k×s} be the matrix formed by the symbols generated at the source nodes. The purpose of the code is to compute f^{(k)}(x) by transmitting at most n symbols of B on each edge in E. Denote the symbols transmitted on edge e by g^e(x) ∈ B^n. For a set of edges E' ⊂ E we define g^{E'}(x) = (g^e(x) | e ∈ E'), where g^{e1}(x) comes before g^{e2}(x) whenever e1 < e2. The (n, k) network code contains an encoding function h^e for each edge e:

    h^e : A^k → B^n,                           if tail(e) ∈ S;
    h^e : Π_{e'∈E_i(tail(e))} B^n → B^n,       otherwise.

The functions h^e, e ∈ E, determine the symbols transmitted on the edges. Specifically, if e is an outgoing edge of the ith source node, then g^e(x) = h^e(x_i); and if e is an outgoing edge of a node u ∈ V \ (S ∪ {ρ}), then g^e(x) = h^e(g^{E_i(u)}(x)). The (n, k) network code also contains a decoding function

    φ : Π_{e'∈E_i(ρ)} B^n → O^k.

Define ψ(x) = φ(g^{E_i(ρ)}(x)). If the network code computes f, i.e., ψ(x) = f^{(k)}(x) for all x ∈ A^{k×s}, we call (k/n) log_{|B|} |A| an achievable computing rate, where the factor log_{|B|} |A| normalizes the computing rate for target functions with different input alphabets. The computing capacity of the network N with respect to a target function f is defined as

    C(N, f) = sup { (k/n) log_{|B|} |A| : (k/n) log_{|B|} |A| is achievable }.

For two nodes u and v in V, we write u → v if there exists a directed path from u to v in G. If there is no directed path from u to v, we say u is separated from v. Given a set of edges C ⊆ E, I_C is defined to be the set of source nodes which are separated from the sink node ρ after C is deleted from E. The set C is called a cut set if I_C ≠ ∅, and the family of all cut sets in the network N is denoted by Λ(N).
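For a concrete network, I_C is easy to compute by reverse reachability. The sketch below rebuilds the network N1 of Fig. 1 from the description in Section III (the edge numbering e1–e7 is our reading of the figure, an assumption) and recovers I_{C0} = {3} for the cut set C0 = {e6, e7} used there.

```python
def separated_sources(edges, sources, sink, cut):
    """I_C: the sources with no directed path to the sink once the cut
    edges are removed.  `edges` is a list of (tail, head) pairs; `cut`
    holds edge indices into that list."""
    remaining = [e for i, e in enumerate(edges) if i not in cut]
    reach = {sink}                     # nodes that can still reach the sink
    changed = True
    while changed:
        changed = False
        for u, v in remaining:
            if v in reach and u not in reach:
                reach.add(u)
                changed = True
    return {s for s in sources if s not in reach}

# Network N1 from Fig. 1 (edge numbering reconstructed from the text):
# e1..e3: source i -> v,  e4..e6: source i -> sink,  e7: v -> sink.
edges = [(1, "v"), (2, "v"), (3, "v"),
         (1, "rho"), (2, "rho"), (3, "rho"), ("v", "rho")]
print(separated_sources(edges, {1, 2, 3}, "rho", {5, 6}))  # C0 = {e6, e7}
```

With C0 = {e6, e7} removed, source 3 loses both of its routes to ρ while sources 1 and 2 keep their direct edges, so the output is {3}, matching I_{C0} in Section III.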
Additionally, we define the set K_C as

    K_C = {i ∈ S | ∃ v, t ∈ V, i → v, (v, t) ∈ C}.

It is reasonable to assume that u → ρ for all u ∈ V. Then one can easily see that K_C is the set of source nodes from which there exists a path to the sink node through C. Define J_C = K_C \ I_C.

The problem becomes simple when s = 1.

Proposition 1. For a network N with a single source node and any target function f : A → O,

    C(N, f) = min_{C∈Λ(N)} |C| / log_{|A|} |f[A]|,

where f[A] is the image of f in O.

C. Upper Bounds

In this paper, we are interested in general upper bounds on C(N, f). The first upper bound is induced by Proposition 1.

Proposition 2. For a network N with target function f,

    C(N, f) ≤ min_{C∈Λ(N): I_C=S} |C| / log_{|A|} |f[A^s]|.

Proof: Build a network N' by joining all the source nodes of N into a single "super" source node. Since a code for network N can be naturally converted to a code for network N' (where the super source node performs the operations of all the source nodes in N), we have C(N, f) ≤ C(N', f). The proof is completed by applying Proposition 1 to N' and noting that Λ(N') = {C ∈ Λ(N) : I_C = S}.

The above upper bound only uses the image of the function f. We propose an enhanced upper bound by investigating an equivalence relation on the input vectors of f. We will compare this equivalence relation with similar definitions proposed in [2], [3] in the next section.

Definition 1 (Equivalence Class). For any function f : A^s → O, any two disjoint index sets I, J ⊆ S, and any a, b ∈ A^{|I|}, c ∈ A^{|J|}, we say a ≡ b |_{I,J}^{(c)} if for every x, y ∈ A^{1×s} we have f(x) = f(y) whenever x_I = a, y_I = b, x_J = y_J = c and x_{S\(I∪J)} = y_{S\(I∪J)}. Two vectors a and b satisfying a ≡ b |_{I,J}^{(c)} are said to be (I, J, c)-equivalent.

When J = ∅ in the above definition, we use the convention that c is an empty matrix. Note that the definition of equivalence does not require a previously given network. However, it will soon be clear that, given a network, the division into equivalence classes naturally leads to a cut-set upper bound on the network function-computing capacity.

For every f, I, J and c ∈ A^{|J|}, let W^{(c)}_{I,J,f} denote the total number of equivalence classes induced by ≡|_{I,J}^{(c)}. Given a network N and a cut set C, let W_{C,f} = max_{c∈A^{|J_C|}} W^{(c)}_{I_C,J_C,f}. Our main result is stated as follows; its proof is presented in Section IV.

Theorem 3. If N is a network and f is a target function, then

    C(N, f) ≤ min_{C∈Λ(N)} |C| / log_{|A|} W_{C,f} =: min-cut(N, f).

III. DISCUSSION OF THE UPPER BOUND

In this section, we first give an example to illustrate the upper bound. We then compare our result with the existing ones, and proceed with a discussion of the tightness of the bound.

A. An Illustration of the Bound

First we give an example to illustrate our result. Consider the network N1 in Fig. 1 with the target function f(x1, x2, x3) = x1 x2 + x3, where A = B = O = {0, 1}.

[Fig. 1. Network N1 has three source nodes, 1, 2 and 3, and one sink node ρ that computes the nonlinear function f(x1, x2, x3) = x1 x2 + x3, where A = B = O = {0, 1}. Edges e1, e2, e3 connect the sources to an intermediate node v; edges e4, e5, e6 connect the sources directly to ρ; edge e7 connects v to ρ.]

Let us first compare the upper bounds in Theorem 3 and Proposition 2. Let C0 = {e6, e7}. Here we have
• |C0| = 2, I_{C0} = {3}, J_{C0} = {1, 2}; and
• for any given inputs of nodes 1 and 2, different inputs from node 3 generate different outputs of f. Therefore W^{(c)}_{I_{C0},J_{C0},f} = 2 for any c ∈ A^2, and hence W_{C0,f} = 2.

By Theorem 3, we have

    C(N1, f) ≤ min-cut(N1, f) ≤ |C0| / log_{|A|} W_{C0,f} = 2.

Meanwhile, Proposition 2 only gives

    C(N1, f) ≤ min_{C∈Λ(N1): I_C=S} |C| / log_{|A|} |f[A^s]| = min_{C∈Λ(N1): I_C=S} |C| = 4,

where the first equality follows from f[A^s] = {0, 1}, and the second equality follows from min_{C∈Λ(N1): I_C=S} |C| = |{e4, e5, e6, e7}| = 4. Therefore, Theorem 3 gives a strictly better upper bound than Proposition 2. The upper bound in Theorem 3 is actually tight.
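Definition 1 can be checked by brute force for this example. The sketch below (the helper name `num_equiv_classes` is ours) enumerates, for each a ∈ A^{|I|}, the restriction of f obtained by fixing x_J = c, and counts the distinct restrictions; for f(x1, x2, x3) = x1 x2 + x3 over the binary field, it reproduces W_{C0,f} = 2.

```python
from itertools import product

def num_equiv_classes(f, s, I, J, c, A=(0, 1)):
    """Count the equivalence classes of A^|I| under the (I, J, c)-relation
    of Definition 1: a ~ b iff f agrees on every pair of inputs that equal
    a resp. b on I, equal c on J, and agree on the remaining coordinates."""
    rest = [i for i in range(s) if i not in I and i not in J]

    def profile(a):
        # f's restriction, viewed as a function of the free coordinates
        out = []
        for r in product(A, repeat=len(rest)):
            x = [None] * s
            for i, v in zip(I, a):
                x[i] = v
            for j, v in zip(J, c):
                x[j] = v
            for i, v in zip(rest, r):
                x[i] = v
            out.append(f(x))
        return tuple(out)

    return len({profile(a) for a in product(A, repeat=len(I))})

f = lambda x: (x[0] * x[1] + x[2]) % 2   # the target function of Fig. 1

# Cut C0 = {e6, e7}: I_C0 = {3}, J_C0 = {1, 2}  (0-indexed: I = [2], J = [0, 1])
W = max(num_equiv_classes(f, 3, [2], [0, 1], c) for c in product((0, 1), repeat=2))
print(W)  # -> 2, matching W_{C0,f} = 2 above
```

Since f is affine in x3 for any fixed (x1, x2), the two values of x3 always give distinct outputs, so every choice of c yields exactly two classes.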
We claim that there exists a (1, 2) network code that computes f in N1. Consider an input matrix x = (x_{ij})_{2×3}. Node i sends x_{1i} to node v and sends x_{2i} to node ρ for i = 1, 2, 3, i.e., for i = 1, 2, 3,

    g^{e_i} = x_{1i},   g^{e_{i+3}} = x_{2i}.

Node v then computes f(x^1) = x_{11} x_{12} + x_{13} and sends it to node ρ via edge e7. Node ρ receives f(x^1) from e7 and computes f(x^2) = x_{21} x_{22} + x_{23} using the symbols received from edges e4, e5 and e6.

B. Comparison with Previous Works

Upper bounds on the computing capacity have been studied in [2], [3] based on a special case of the equivalence class defined in Definition 1. However, we will demonstrate that the bounds therein do not hold for the example we studied in the last subsection.

In Definition 1, when J = ∅, we say a ≡ b|_I, or a and b are I-equivalent. That is, a ≡ b|_I if for every x, y ∈ A^{1×s} with x_I = a, y_I = b and x_{S\I} = y_{S\I}, we have f(x) = f(y). For a target function f and I ⊂ S, denote by R_{I,f} the total number of equivalence classes induced by ≡|_I. For a cut C ∈ Λ(N), let R_{C,f} = R_{I_C,f}. Then we have the following lemma:

Lemma 1. Fix a network N and a function f. Then, i) for any C ∈ Λ(N), we have R_{C,f} ≥ W_{C,f}; ii) for any C, C' ∈ Λ(N) with C' ⊂ C and I_{C'} = I_C, we have W_{C',f} ≥ W_{C,f}.

Define

    min-cut_A(N, f) = min_{C∈Λ(N)} |C| / log_{|A|} R_{C,f}.

By Lemma 1, we have min-cut(N, f) ≥ min-cut_A(N, f). It is claimed in [2, Theorem II.1] that min-cut_A(N, f) is an upper bound on C(N, f). We find, however, that min-cut_A(N, f) is not universally an upper bound on the computing capacity. Consider the example in Fig. 1. For the cut set C1 = {e4, e6, e7}, we have I_{C1} = {1, 3}. On the other hand, it can be proved that R_{C1,f} = 4, since i) f is an affine function of x2 given that x1 and x3 are fixed, and ii) it takes 2 bits to represent this affine function over the binary field. Hence

    min-cut_A(N1, f) ≤ |C1| / log_{|A|} R_{C1,f} = 3/2 < 2 = C(N1, f).

For a network N as defined in Section II-A, we say a subset of nodes U ⊂ V is a cut if |U ∩ S| > 0 and ρ ∉ U. For a cut U, denote by E(U) the cut set determined by U, i.e.,

    E(U) = {e ∈ E : tail(e) ∈ U, head(e) ∈ V \ U}.

Let Λ*(N) = {E(U) : U is a cut in N}. Define

    min-cut_K(N, f) = min_{C∈Λ*(N)} |C| / log_{|A|} R_{C,f}.

Since Λ*(N) ⊂ Λ(N), min-cut_K(N, f) ≥ min-cut_A(N, f). It is implied by [3, Lemma 3] that min-cut_K(N, f) is an upper bound on C(N, f). However, min-cut_K(N, f) is also not universally an upper bound on the computing capacity. Consider the example in Fig. 1. For the cut U1 = {1, 3, v}, the corresponding cut set is E(U1) = C1 = {e4, e6, e7}. Hence

    min-cut_K(N1, f) ≤ |C1| / log_{|A|} R_{C1,f} = 3/2 < 2 = C(N1, f).

C. Tightness

The upper bound in Theorem 3 is tight when the network is a multi-edge tree.

Theorem 4. If G is a multi-edge tree, then for the network N = (G, S, ρ) and any target function f, C(N, f) = min-cut(N, f).

The upper bound in Theorem 3 is not tight in certain cases. Consider the network N2 in Fig. 2(a), provided in [2]. Note that in N2, source nodes 1 and 2 have incoming edges. To match our model described in Section II-A, we modify N2 to N2' shown in Fig. 2(b), where the number of edges from node i to node i' is infinite, i = 1, 2.

[Fig. 2. Networks N2 and N2' have three binary sources, {1, 2, 3}, and one sink ρ that computes the arithmetic sum of the source messages, where A = B = {0, 1}. In N2', the number of edges from node i to node i' is infinite, i = 1, 2.]

Every network code in N2' naturally induces a network code in N2, and vice versa. Hence, we have C(N2, f) = C(N2', f). We then evaluate min-cut(N2', f). Note that |C| / log_{|A|} W_{C,f} < ∞ holds only if |C| < ∞, so we can consider only finite cut sets. For a finite cut set C, denote C' = C ∩ {e1, ..., e4}. We have |C'| ≤ |C| and J_{C'} ⊆ J_C, and we claim that I_{C'} = I_C. Note that I_{C'} ⊆ I_C. Suppose that there exists i ∈ I_C \ I_{C'}. Then there exists a path from i to ρ which is disjoint from C' but shares a subset D of edges with C. Then D ⊂ E_o(i) and hence |D| = 1. We simply replace the edge in D by an arbitrary edge in E_o(i) \ C and form a new path from i to ρ. This is always possible, since C ∩ E_o(i) is finite while E_o(i) is not. The newly formed path is disjoint from C, and then we have i ∉ I_C, a contradiction.

According to Lemma 1, we have W_{C',f} ≥ W_{C,f} and hence |C'| / log_{|A|} W_{C',f} ≤ |C| / log_{|A|} W_{C,f}. Therefore we can consider only cut sets C' ⊆ {e1, e2, e3, e4}. We then have min-cut(N2', f) = 1, where the minimum is attained by the cut set {e2, e4}. For the network N2, it has been proved in [2] that C(N2, f) = log_6 4 < 1. Hence

    min-cut(N2', f) = 1 > C(N2', f).

IV. PROOF OF MAIN THEOREM

To prove Theorem 3, we first give the definition of F-extension and two lemmas.

Definition 2 (F-Extension). Given a network N and a cut set C ∈ Λ(N), define D(C) ⊆ E as

    D(C) = ∪_{i ∉ I_C} E_o(i).

Then the F-extension of C is defined as F(C) = C ∪ D(C).

Lemma 2. For every cut set C ∈ Λ(N), F(C) is a global cut set, i.e., I_{F(C)} = S.

Proof: Clearly I_C ⊆ I_{F(C)}, so it suffices to show that for all i ∉ I_C we have i ∈ I_{F(C)}. This is true, since E_o(i) ⊆ F(C) and i ∈ I_{E_o(i)} imply i ∈ I_{F(C)}.

Lemma 3. Consider an (n, k) network code in N = (G, S, ρ). For any global cut set C, ψ(x) is a function of g^C(x), i.e., ψ(x) = ψ^C(g^C(x)) for a certain function ψ^C.

Proof: Fix a global cut set C of N. Let G_C be the subgraph of G formed by the (largest) connected component of G including ρ after removing C from E. Let S_C be the set of nodes in G_C that do not have incoming edges. Since G_C is also a DAG, S_C is not empty.
For each node u ∈ S_C, we have i) u is not a source node in N, since otherwise C would not be a global cut set, and ii) all the incoming edges of u in G are in C, since otherwise G_C could be larger. For each node u in G_C but not in S_C, the incoming edges of u are either in G_C or in C, since otherwise the cut set C would not be global. If we can show that for any edge e in G_C, g^e(x) is a function of g^C(x), then ψ(x) = φ(g^{E_i(ρ)}(x)) is a function of g^C(x).

Suppose that G_C has K nodes. Consider the topological order on the set of nodes in G_C, and number these nodes as u_1 < ... < u_K, where u_K = ρ. Denote by E_o(u|G_C) the set of outgoing edges of u in G_C. We claim that g^{E_o(u_i|G_C)}(x) is a function of g^C(x) for i = 1, ..., K, which implies that for any edge e in G_C, g^e(x) is a function of g^C(x). We prove this inductively. First, g^{E_o(u_1|G_C)}(x) is a function of g^C(x), since u_1 ∈ S_C and hence all the incoming edges of u_1 in G are in C. Assume that the claim holds for the first k nodes in G_C, k ≥ 1. For u_{k+1} we have two cases. If u_{k+1} ∈ S_C, the claim holds since all the incoming edges of u_{k+1} in G are in C. If u_{k+1} ∉ S_C, we know that E_i(u_{k+1}) ⊂ ∪_{i=1}^k E_o(u_i|G_C) ∪ C. By the induction hypothesis, g^{E_o(u_{k+1}|G_C)}(x) is a function of g^C(x). The proof is completed.

In the following proof of Theorem 3, it will be handy to extend the equivalence relation to a block of function inputs. For disjoint sets I, J ⊆ S and c ∈ A^{1×|J|}, we say a, b ∈ A^{k×|I|} are (I, J, c)-equivalent if for any x, y ∈ A^{k×s} with x_I = a, y_I = b, x_J = y_J = (c^T, c^T, ..., c^T)^T and x_{S\(I∪J)} = y_{S\(I∪J)}, we have f^{(k)}(x) = f^{(k)}(y). Then, for the set A^{k×|I|}, the number of equivalence classes induced by this equivalence relation is (W^{(c)}_{I,J,f})^k.

Proof of Theorem 3: Suppose that we have an (n, k) code with

    (k/n) log_{|B|} |A| > min-cut(N, f).    (1)

We show that this code cannot compute f^{(k)}(x) correctly for all x ∈ A^{k×s}. Denote

    C* = arg min_{C∈Λ(N)} |C| / log_{|A|} W_{C,f}    (2)

and

    c* = arg max_{c∈A^{|J_{C*}|}} W^{(c)}_{I_{C*},J_{C*},f}.    (3)

By (1)–(3), we have

    (k/n) log_{|B|} |A| > |C*| / log_{|A|} W^{(c*)}_{I_{C*},J_{C*},f},

which leads to

    |B|^{|C*| n} < (W^{(c*)}_{I_{C*},J_{C*},f})^k.    (4)

Note that g^{C*}(x) only depends on x_{K_{C*}}. By (4) and the pigeonhole principle, there exist a, b ∈ A^{k×|I_{C*}|} such that i) a and b are not (I_{C*}, J_{C*}, c*)-equivalent, and ii) g^{C*}(x) = g^{C*}(y) for any x, y ∈ A^{k×s} with

    x_{I_{C*}} = a, y_{I_{C*}} = b, x_{J_{C*}} = y_{J_{C*}} = (c*^T, c*^T, ..., c*^T)^T, x_{S\K_{C*}} = y_{S\K_{C*}}.    (5)

Fix x, y ∈ A^{k×s} satisfying (5) and f^{(k)}(x) ≠ f^{(k)}(y); the existence of such x and y is due to i). Since C* and D(C*) are disjoint (see Definition 2) and x_i = y_i for any i ∉ I_{C*}, together with ii) we have

    g^{F(C*)}(x) = g^{F(C*)}(y).

Thus, applying Lemma 3, we have ψ(x) = ψ(y). Therefore, the code cannot compute both f^{(k)}(x) and f^{(k)}(y) correctly. The proof is completed.

V. CONCLUDING REMARKS

We propose a new definition of equivalence classes associated with the inputs of a function, which enables us to obtain a general upper bound on the network function-computing capacity. We show that the upper bound is tight when the network is a multi-edge tree.

REFERENCES

[1] A. Giridhar and P. Kumar, "Computing and communicating functions over sensor networks," IEEE Journal on Selected Areas in Communications, vol. 23, no. 4, pp. 755–764, Apr. 2005.
[2] R. Appuswamy, M. Franceschetti, N. Karamchandani, and K. Zeger, "Network coding for computing: Cut-set bounds," IEEE Transactions on Information Theory, vol. 57, no. 2, pp. 1015–1030, Feb. 2011.
[3] H. Kowshik and P. Kumar, "Optimal function computation in directed and undirected graphs," IEEE Transactions on Information Theory, vol. 58, no. 6, pp. 3407–3418, Jun. 2012.
[4] A. Ramamoorthy and M. Langberg, "Communicating the sum of sources over a network," IEEE Journal on Selected Areas in Communications, vol. 31, no. 4, pp. 655–665, Apr. 2013.
[5] R. Ahlswede, N.
Cai, S.-Y. R. Li, and R. W. Yeung, "Network information flow," IEEE Trans. Inform. Theory, vol. 46, no. 4, pp. 1204–1216, Jul. 2000.
[6] S.-Y. R. Li, R. W. Yeung, and N. Cai, "Linear network coding," IEEE Trans. Inform. Theory, vol. 49, no. 2, pp. 371–381, Feb. 2003.
[7] R. Koetter and M. Medard, "An algebraic approach to network coding," IEEE/ACM Trans. Networking, vol. 11, no. 5, pp. 782–795, Oct. 2003.
[8] R. Appuswamy and M. Franceschetti, "Computing linear functions by linear coding over networks," IEEE Transactions on Information Theory, vol. 60, no. 1, pp. 422–431, Jan. 2014.
[9] B. Rai and B. Dey, "On network coding for sum-networks," IEEE Transactions on Information Theory, vol. 58, no. 1, pp. 50–63, Jan. 2012.
[10] C. Huang, Z. Tan, and S. Yang, "Upper bound on function computation in directed acyclic networks," 2014, submitted to ITW '15. [Online]. Available: http://iiis.tsinghua.edu.cn/~shenghao/pub/huang14.pdf

J Comb Optim
DOI 10.1007/s10878-014-9725-1

On the computational complexity of bridgecard

Zihan Tan

© Springer Science+Business Media New York 2014

Abstract Bridgecard is a classical trick-taking game using a standard 52-card deck, in which four players in two competing partnerships attempt to "win" each round, i.e., each trick. Existing theories and analyses have mainly addressed practical aspects of the game, in particular the "Bidding" phase; this paper is the first to initiate a theoretical study of the game by formulating it as an optimization problem. We provide an analysis of the computational complexity of the problem, and propose exact as well as approximation algorithms.

Keywords Bridgecard · Computational complexity · Approximation algorithm

1 Introduction

Bridgecard is a game in which the two partnerships progress through two main phases: Bidding and Playing.
In the Bidding phase, each partnership works jointly to secure a 'contract', that is, to determine a specified goal for the number of tricks in the declared denomination. Once the Bidding is completed, the game enters the Playing phase. In this phase, one player from the partnership that established the contract, known as the 'declarer', plays a card from his hand. His partner, known as the 'dummy', supports him, and the other partnership, known as the 'defenders', must prevent them from completing the contract. In each round, each player plays one card, and the player with the highest-ranking card wins the round, i.e., takes the trick. Should the declarer and dummy fulfill their contract, i.e., win the number of tricks agreed upon, they win the game.

Z. Tan, Tsinghua University, Beijing, China. e-mail: [email protected]

The intricacy of this game lies in the fact that the players cannot coordinate with their partners even during the Bidding phase. Each must make his moves based on different partial information while still attempting to fulfill his individual role.

1.1 Relevant results

Previous research on bridgecard mainly concerns its technical side, such as how to design a reasonable bidding system and how to use defensive signals correctly. The book "Killing Defence" (Kelsey 1994) developed a wealth of techniques for defending against a contract, while another book, "The Expert Game" (Reese 1994), introduced all kinds of methods for the declarer. Analyses of the strategy and computational complexity of similar games also appeal to computer scientists and mathematicians. Such research includes Chess, Poker, Minesweeper, and so on; this body of work forms a new direction of theoretical development. Demaine has written "Algorithmic Combinatorial Game Theory" [3] on these problems, bringing the analysis of games closer to theoretical computer science.
Another famous card game, UNO, was investigated from mathematical and computational perspectives by Demaine (2010). He formulated UNO in mathematical terms and analyzed several versions of it: the single-player version was proved NP-hard, and the uncooperative 2-player version was proved PSPACE-complete. Similar research has been done on games including Tetris, Amazons, Othello and others, which also helped establish a new research direction that stems from classical game theory but incorporates techniques from algorithms and the theory of computation.

1.2 Motivation

Previous research of a technical nature considered the real bridgecard game, with a total of 52 cards, which severely restricts our understanding of the game's complexity. Bridgecard seems more difficult than other card games, but this has not been rigorously established from a mathematical or computer-science standpoint. Thus, a proper generalization of the game is necessary: only by dropping the limit on the number of cards and making it a parameter of the game can we understand the complexity of bridgecard accurately. On the other hand, the popularity of the game, and the fact that some similar card games have been well studied theoretically, make bridgecard deserve further analysis from a theoretical perspective, which motivates our study. In this paper we investigate bridgecard by formulating it into optimization problems, giving exact as well as approximation algorithms and analyzing their computational complexity.

2 Preliminaries

Games are usually formulated into different versions of mathematical problems with respect to different settings. Perfect information or imperfect information? Single-player or multi-player? In a bridgecard game we have 4 players and 4 suits of cards (spades, hearts, diamonds and clubs).
The number of players and the number of suits are the defining features of bridgecard. For simplicity we consider the case where the game is played with only one suit of cards. The 2-player and 4-player cases will be discussed in turn.

2.1 Game setting

In a real bridgecard game, players North (N), South (S), East (E) and West (W) are each dealt an equal number (13) of cards, where each card has a suit and a rank. The game proceeds clockwise. North and South are partners, and East and West are partners. An auction then chooses one of the four players to be the declarer. Every bid in the auction specifies a number (representing how many tricks the bidder aims to win) and a trump suit (normally the longest suit in his hand). The pair that sets the contract (i.e. makes the last bid in the auction) will try to win at least a certain number of tricks, with the specified suit as trumps. Each bid actually represents the number of tricks in excess of six that the partnership undertakes to win; for example, a bid of "two hearts" represents a contract to win at least 8 tricks (8 = 6 + 2) with hearts as trumps.

The play proceeds in rounds. In every round each player plays one card, and the player who wins the trick plays first in the next round. The player clockwise next to the declarer leads the first card of the first round. For example, if North is the declarer, then East leads, and the suit of the card he leads becomes the suit of that round. Immediately after this opening lead, the cards of the declarer's partner are exposed to all the other players. Play proceeds clockwise. Each player must (if possible) play a card of the suit of the round; a player without cards of that suit may play any other card. A player wins the trick if he plays the largest rank of the four, and he then plays first in the following round. The player who exposed his hand is called the dummy; he takes no active part in the play of the hand.
Whenever it is dummy's turn to play, the declarer plays one of dummy's cards. Dummy is not permitted to offer any advice or comment.

2.2 Definitions and models

First, we formalize an important problem in bridgecard as an optimization problem: the suit-play problem. A single-suit play problem is both essential and difficult, because it asks the player to design an order of playing the cards of one suit so as to maximize the probability of taking a certain number of tricks. In terms of bridgecard, a general single-suit play question may take the following form: does there exist a strategy such that we can win at least 4 tricks by using it? While a real game has 4 suits, for simplicity's sake, and since it does not reduce the difficulty of the problem, our formulation only considers the case of a single suit.

2.2.1 Formulation into two general games

Game 1 We have 4 players: North, East, South, and West (N, E, S, W for short). 4n different cards, labeled 1, 2, ..., 4n, are dealt to them so that each player holds n cards. One player is set to lead the first round. Play follows the rules below.

(1) Every round has a leader who plays first; then each player plays one of his cards, proceeding clockwise from the leader. A card that has been played may not be taken back.
(2) In each round, the player who plays the largest number of the 4 wins the trick (wins the round).
(3) In the next round, the player who won the previous trick plays first.

Goal of the game: N and S want to maximize the number of tricks won by either N or S, while E and W want to maximize the number of tricks won by either E or W.

This formulation loses some generality for the single-suit play problem: in real bridgecard every player holds 13 cards in total, but players need not hold the same number of cards of any one suit. So a 2-suit version is given below.
However, this version is given only to indicate the generality that such small changes of setting can provide.

Game 2 We have 4 players: N, E, S, W. There are n different cards in the "trump" suit (red), labeled with the distinct integers 1 to n, and 3n different cards in another suit (black), labeled with the distinct integers 1 to 3n. The 4n cards are dealt so that every player holds n of them. Play follows the rules below.

(1) In every round each player plays one card, proceeding clockwise from the leader. A card that has been played may not be taken back.
(2) In each round, players must play a card of the suit of the leader's card unless they have none.
(3) In each round, if the 4 cards played are of the same suit, the player who plays the largest number of the 4 wins the trick (wins the round); if the 4 cards are of different suits, the player who plays the largest number of the trump suit wins the trick.
(4) In the next round, the player who won the previous trick plays first.

Goal of the game: N and S want to maximize the number of tricks won by either N or S, while E and W want to maximize the number of tricks won by either E or W.

Note that there are 13 cards in every suit in bridgecard, and we generalize this to a variable n, because we are going to formulate the game into an optimization problem and analyze it from the perspective of computational complexity. Asymptotic questions can then be asked as n grows: is the complexity polynomial in n, or something larger? This cannot be done in the real bridgecard game, because the number 13 is fixed and all computation is of constant size (the constant is very large, but it does not grow, so we cannot see the exact form of the complexity). Consequently, a variable n rather than 13 better captures the generality of the game, and our formulation is an extended version of the real bridgecard game.
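Rule (3) of Game 2 can be sketched as a small function. This is a minimal illustration (not code from the paper), assuming a card is a (suit, rank) pair with "red" as the trump suit; with only two suits, any round containing mixed suits necessarily contains at least one trump.

```python
# A sketch of the trick-winning rule of Game 2; names are illustrative
# assumptions, not notation from the paper.

def trick_winner(cards):
    """cards: list of 4 (suit, rank) pairs in play order (leader first).
    Returns the index (0..3) of the player who wins the trick."""
    suits = {suit for suit, _ in cards}
    if len(suits) == 1:
        # Rule (3), first case: all four cards share a suit, highest rank wins.
        return max(range(4), key=lambda i: cards[i][1])
    # Rule (3), second case: suits differ, so the highest trump ("red") wins.
    # With two suits, a mixed round always contains at least one trump.
    trumps = [i for i, (suit, _) in enumerate(cards) if suit == "red"]
    return max(trumps, key=lambda i: cards[i][1])

# Example: black 7 led, the second player discards red 2 (a trump) and wins.
print(trick_winner([("black", 7), ("red", 2), ("black", 9), ("black", 3)]))  # -> 1
```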
2.2.2 Optimization problem of Game 1

In the following discussion, we use the term "distribution of cards" for the full information about each player's cards, namely which cards each player holds.

On one hand, from the full-information perspective, when the distribution of cards is given to all players (i.e. each player knows his own cards as well as everyone else's), a natural question is how many tricks N and S can win if all four players play optimally. In formal matches, such computation is done by computers to help players and coaches better analyze their performance.

Problem 1
Input: the distribution of cards φ and an integer k (the target number of tricks).
Output: whether there is a strategy (in fact, an algorithm) for N and S to win at least k tricks under all possible defensive plays by E and W. If the answer is yes, output the strategy.

On the other hand, a real game involves partial information. The cards are played sequentially, and before the first card is played every player can see only the cards in his own hand. Once the first card has been played by W, N lays down his hand to the three other players and follows the instructions of S. Therefore declarer S can see only two hands when making his plan. So another question arises: what is the "best" play the declarer can make with only partial information? Because he cannot see E's and W's hands, there is much that he does not know; what he can do is refine his plan so as to increase the probability of winning enough tricks. The following definitions make this precise.

Definition 1 (Distribution) A distribution φ of cards is a set of four hands N, E, S, W, each hand being a set of n cards, denoted φ = {N, E, S, W}, where N, E, S, and W are sets of n distinct numbers representing the cards held by players N, E, S, and W respectively.
Definition 2 (Whole Play) For a given distribution φ = {N, E, S, W}, a specific complete run of the game is called a "whole play", meaning the sequence of cards played under the rules of Game 1, denoted by a family WP_φ of 4-element sets:

WP_φ = {R_1, R_2, ..., R_n},  R_i = {n_i, e_i, s_i, w_i},

where n_i, e_i, s_i and w_i are the cards played by N, E, S, W respectively in the ith round.

Definition 3 (Partial Distribution) For a given distribution φ = {N, E, S, W}, after several rounds of play the distribution becomes φ' = {N', E', S', W'}, which is called a "partial distribution", where N', E', S', W' are subsets of N, E, S, W respectively. The game corresponding to this partial distribution is called a "partial game". A specific complete run of the partial game is called a "partial play", meaning the sequence of cards played starting from the partial distribution φ', denoted by

PP_φ' = {R_i, ..., R_n},

where R_j is defined as before and n − i + 1 is the number of cards held by every player.

Definition 4 (Trick Function) Given a partial play PP_φ', it is easy to determine which player wins the jth trick just by looking at R_j. We use T(PP_φ') to denote the number of tricks N or S wins in the partial play PP_φ'.

Definition 5 (Maximal Trick Function) Given the distribution of a partial game φ', N and S want to maximize the number of tricks they win, while E and W want to maximize the number of tricks they win; thus both sides use strategies. For every strategy α of N and S (in fact a strategy α is a decision tree, which will be discussed later), no matter what strategy β E and W use, N and S can always secure a certain number of tricks. The number of tricks that N and S win is determined by the two strategies α and β, denoted T_φ'(α, β). The number that α secures is called the "guaranteed trick" of strategy α, denoted GT(φ', α):
GT(φ', α) = min_β T_φ'(α, β).

Considering all strategies of N and S, the strategy with maximal GT(φ', α) attains the maximal guaranteed trick of φ' for N and S, denoted MT(φ'): the best N and S can do is guarantee MT(φ') tricks, under the condition that E and W always find the toughest play against N and S's strategy.

MT(φ') = max_α GT(φ', α).

Among this series of definitions, the maximal trick function is the most important. To understand it better, a "recurrence" version of the definition is introduced. Consider a partial distribution φ = {N, E, S, W}, let n, e, s, w be the cards played by N, E, S, W in the first round respectively, let N' = N − {n}, define E', S' and W' similarly, and let φ' = {N', E', S', W'}. Supposing the first card is played by N,

MT(φ) = max_{n∈N} min_{e∈E} max_{s∈S} min_{w∈W} MT(φ').

Our first main problem is to compute the value MT(φ). A decision version of this question is: given φ and a positive integer k, is MT(φ) ≥ k?

After the MT function it is necessary to make another definition: a card that N plays, following a strategy that guarantees MT(φ_i) tricks, is called a "best choice" for the partial game. Note that a "best choice" for φ_i may be more than one card, because N may face a situation in which any of several cards gives him MT(φ_i) tricks.

Our second problem asks what is the best N and S can do when given only partial information. As we only analyze the first problem in detail, we give a description rather than a formal definition of the second problem. Now N can see only 2 hands (N's and S's). For him, every distribution of E's and W's cards is equally likely. The best he can do is choose a strategy that maximizes the expected number of tricks he can guarantee, or a strategy that maximizes the probability of guaranteeing k tricks for a given number k.
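The GT and MT definitions above can be illustrated with a toy payoff matrix. This is a small sketch (not from the paper), assuming finitely many explicit strategies: T_φ(α, β) is a matrix entry, GT(φ, α) is a row minimum, and MT(φ) is the maximum of the row minima.

```python
# Toy illustration of GT and MT over an assumed payoff matrix: rows are
# strategies alpha of N/S, columns are strategies beta of E/W, entries are
# the number of tricks N/S win.
T = [[2, 1, 2],
     [3, 0, 1],
     [2, 2, 1]]

GT = [min(row) for row in T]   # GT(phi, alpha) = min over beta of T(alpha, beta)
MT = max(GT)                   # MT(phi) = max over alpha of GT(phi, alpha)
print(GT, MT)  # -> [1, 0, 1] 1
```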
Definition 6 (Probability for the MT function) Seeing only two hands, the probability of guaranteeing k tricks is defined to be Pr_φ[MT(φ) ≥ k], where φ is chosen uniformly at random from all distributions consistent with the visible hands of N and S.

Problem 2 is closer to a real game, because the declarer can see only two hands instead of four. However, in this case we are more concerned with the strategy the declarer chooses, namely the card he plays given only partial information. We roughly define the problem as follows.

Problem 2 When the declarer can see only his hand and the dummy's hand,
(1) which card should he play in order to maximize the probability of winning k tricks, for a given integer k?
(2) which card should he play in order to maximize the expected number of tricks he wins?

In fact Problem 2 asks for a "best probabilistic play". It should be pointed out that in Problem 2 the Guaranteed Trick is not so important, nor is the Maximal Trick, because we are placed in a completely probabilistic setting, where formulas with max and min notions are less relevant.

3 Analysis of the optimization problems

The two problems are tractable, but solving them takes a large amount of time. A deeper understanding is needed from the following aspects:
• the dynamic programming (DP) perspective;
• intuition and conjectures about their difficulty;
• their special properties.

The first aspect gives an algorithm that solves the problem but takes exponential time. The second analyzes its computational complexity. The third helps us understand the optimization problems better and gives intuition for approximation algorithm design.

3.1 Computing the MT function

3.1.1 Exact algorithm (dynamic programming) for computing the MT function

Recall that MT(φ) = MT({N, E, S, W}) is the maximal guaranteed number of tricks of the partial game φ, where N, E, S, W are the card sets of players N, E, S, W respectively.
It can be seen that once MT(φ) has been computed, the best first card for N can also be found; we call this card n ∈ N the "best choice" of the partial game φ. For convenience, the following definitions are introduced.

Definition 7 (Trick Judging Function) Let t(R_i) = t(n_i, e_i, s_i, w_i) be the trick-judging function for round i, where n_i, e_i, s_i, w_i are the cards played by N, E, S, and W respectively: t(R_i) = 1 if N/S wins this trick, and t(R_i) = 0 if E/W wins it.

Definition 8 (Leader Judging Function) The player who won the last trick leads the next one. Let L_i = L(R_{i−1}) denote the first player of round R_i; it is determined by R_{i−1} and its range is {N, E, S, W}.

Let N', E', S', W' be the card sets remaining to N, E, S and W after one round. The following recurrence holds:

MT({N, E, S, W}) = max_{n∈N} min_{e∈E} max_{s∈S} min_{w∈W} (MT({N', E', S', W'}) + t(n, e, s, w)),

and the best choice is

n* = argmax_{n∈N} { min_{e∈E} max_{s∈S} min_{w∈W} (MT({N', E', S', W'}) + t(n, e, s, w)) },

where N' = N − {n}, with similar definitions for e, s, w and E', S', W'.

Using dynamic programming we compute MT(φ') for all cases (considering the different partial games after different numbers of rounds). The number of cases is Σ_{i=1}^{n} (C(n, i))^4 = O(2^{4n}/n) (estimated by Stirling's formula). For each case, following the min and max expression above, O(n^4) steps are needed, so the total running time of this algorithm is O(16^n n^3).

3.1.2 Properties of the optimization problem

We make some definitions and claims, and prove them, to better understand the properties of the game.

Definition 9 (Completely Better) A hand A = {a_1 < a_2 < ... < a_n} is "completely better" than another hand B = {b_1 < b_2 < ... < b_n} if a_i ≥ b_i for every i ∈ {1, 2, ..., n}; "completely worse" is defined similarly.
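The recurrence above can be sketched as a memoized minimax search. This is an assumed implementation (not the paper's code) for tiny instances of Game 1: players 0, 1, 2, 3 stand for N, E, S, W, the even players maximize the number of tricks N/S win while the odd players minimize it, the largest card wins each trick, and its owner leads the next round.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def mt(hands, leader):
    """hands: 4-tuple of frozensets of ints (N, E, S, W); leader in 0..3.
    Returns the number of tricks N/S can guarantee, i.e. MT of the partial game."""
    if not hands[leader]:                 # no cards left: game over
        return 0

    def best(pos, played):
        if pos == 4:                      # round finished
            winner = max(played, key=played.get)   # largest card wins the trick
            rest = tuple(hands[p] - {played[p]} for p in range(4))
            return (1 if winner % 2 == 0 else 0) + mt(rest, winner)
        player = (leader + pos) % 4
        values = [best(pos + 1, {**played, player: c}) for c in hands[player]]
        # N and S (even indices) maximize; E and W (odd indices) minimize.
        return max(values) if player % 2 == 0 else min(values)

    return best(0, {})

# Tiny example (n = 1): N holds the highest card and leads, so N/S win the trick.
hands = (frozenset({4}), frozenset({2}), frozenset({1}), frozenset({3}))
print(mt(hands, 0))  # -> 1
```

The memoization over (hands, leader) mirrors the DP table over partial distributions; the depth-first recursion is exactly the polynomial-space evaluation order used later in the PSPACE argument.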
If a player wants to increase the number of tricks he wins in a 2-player game, he would like a completely better hand than his current one. For the 4-player case, we have Claim 3 below to capture the completely-better property: a completely better partial game is obtained by exchanging some cards between two hands, not by simply comparing numbers between hands, since the latter is not well defined in the 4-player case.

We extend the definition of the maximal trick function here. The number of tricks that N can guarantee depends on who leads the first round; N may benefit from being the last to play in the first round, because he can decide after E and W have played. Thus we use the notation φ_N for a partial game φ whose first round is led by N, and let MT(φ_N) be the maximal number of tricks that N can guarantee in it. Similarly, φ_E, φ_S and φ_W denote the partial game φ with the first round led by E, S and W respectively, and MT of each is the maximal number of tricks that N can guarantee.

Claim 1 (1) MT(φ_E) ≥ MT(φ_N); (2) MT(φ_W) ≥ MT(φ_N); (3) MT(φ_E) ≥ MT(φ_S); (4) MT(φ_W) ≥ MT(φ_S).

Claim 2 (1) MT(φ_E) ≤ MT(φ_N) + 1; (2) MT(φ_W) ≤ MT(φ_N) + 1; (3) MT(φ_E) ≤ MT(φ_S) + 1; (4) MT(φ_W) ≤ MT(φ_S) + 1.

Claim 3 (Completely Better Property) If N is replaced by a completely worse hand N′ and W is accordingly replaced by a completely better hand W′, i.e. N and W exchange subsets of their cards N_e and W_e such that |N_e| = |W_e| and W_e is completely better than N_e, with no change to the other two hands or to the player leading the first round, then the MT function does not increase:

MT({N, E, S, W}_i) ≥ MT({N′, E, S, W′}_i)  for i ∈ {N, E, S, W}.
Claim 4 (Best Choice) If N is the last to play in a round, his best choice follows this rule: if he plans to win the trick, he plays the minimum card able to win it; if he does not plan to win, he plays the minimum card in his hand.

Proof We prove the four claims together by induction on n (the number of cards in every player's hand). When n = 1 the conclusions hold immediately. Supposing the conclusions hold for n ≤ k, consider the case n = k + 1.

For Claim 1, it suffices to prove the following fact: if W plays first, then after one round S has a strategy that leaves him a remaining (partial) game at least as good as in the case where S plays first.

Consider the case where S plays first in the 1st round (case 1). His best strategy could be to play s_1 and then, according to the card played by W, decide what to play from N's hand: if W plays w_1 he plays n_1, if W plays w_2 he plays n_2, and so on.

Now consider the case where W plays first in the 1st round (case 2). We design a strategy for S as follows: the card played by W determines what S plays from N's hand according to the best strategy of case 1 (if W plays w_1 he plays n_1, if W plays w_2 he plays n_2, etc.); then, no matter what E plays, S plays s_1 from his own hand. This strategy clearly brings S to a partial game at least as good as the one in case 1. But this strategy is one of the strategies available to S in case 2, so by the definition of MT the strategy that S actually uses guarantees him at least as good a result. In other words, the number of tricks he can guarantee in case 2 is at least that of case 1.

For Claim 4, we continue this line of reasoning.
Consider a partial game in which N is the last to play in the first round, after the cards e_k, s_l, w_i have been played. Suppose N can win this round with n_j, but hopes instead to win it with n_t, where n_j > n_t. Comparing the resulting partial game ψ with φ', by the induction hypothesis of Claim 3 for n = k, N can guarantee at least as many tricks by playing the smaller winning card n_t. A symmetric analysis handles the case in which he plans not to win the trick, and Claim 4 is proved.

For Claim 2, it suffices to consider the case where N loses the first trick but plans to win it with n_t instead of n_j; clearly n_t > n_j. In the resulting partial game N has a completely worse hand than in φ', so by the induction hypothesis of Claim 3 he cannot guarantee more tricks; hence the two maximal numbers of guaranteed tricks differ by at most 1.

For Claim 3, we set up a correspondence between two partial plays: a hand {n_1, ..., n_{k+1}} (in increasing order) that is completely worse than another hand {n′_1, ..., n′_{k+1}} (in increasing order). If in the first round N plays n_x and wins the trick, consider the new case where N plays n′_x: clearly n′_x ≥ n_x, so he can still win the trick, and in the resulting partial game the hand {n′_1, ..., n′_{x−1}, n′_{x+1}, ..., n′_{k+1}} is still completely better than {n_1, ..., n_{x−1}, n_{x+1}, ..., n_{k+1}}. By the induction hypothesis of Claim 3, the conclusion holds for n = k + 1. If in the first round N plays n_x and loses the trick, consider the new case where N plays n′_x; again n′_x ≥ n_x. If he still loses the trick, the remaining hand {n′_1, ..., n′_{x−1}, n′_{x+1}, ..., n′_{k+1}} is still completely better than {n_1, ..., n_{x−1}, n_{x+1}, ..., n_{k+1}}, and by the induction hypothesis the conclusion holds for n = k + 1.
But if he wins the trick in the new case, then Claims 1 and 2 for n = k imply Claim 3 here.

Claim 4 is essential to understanding the "best choice". It is also an intuitive property: when we do not plan to win a trick, there is no need to waste a big number; when we do plan to win it, the minimum winning number suffices. We will see below that this property also helps in designing approximation algorithms for the optimization problem.

3.1.3 Computational complexity

The dynamic programming algorithm solves the problem, but takes exponential time. So what is the computational complexity of the problem? The answer is not readily apparent, but analyzing the problem from different angles we found evidence that it is PSPACE-complete. We use "MT" to denote the decision version of Problem 1. First we have the following theorem.

Theorem 1 MT is in PSPACE.

Proof We prove the theorem by giving an algorithm that uses polynomial space. In fact the dynamic programming (DP) algorithm is what we want: although it runs in exponential time, it uses polynomial space. We can regard the DP algorithm as evaluating a tree, where deciding every node needs polynomially many resources from its child nodes. Evaluating this DP tree in depth-first order, every node requires only polynomial space; thus MT is in PSPACE.

This theorem gives an "upper bound" on the computational complexity of the optimization problem. To approach a lower bound, we consider a simpler problem:

Game 3 We have 2 players, North and South. There are 2n different cards, labeled 1 to 2n, and each player holds n of them. One player is set to lead the first round. Play follows the rules below.

(1) Every round has a leader who plays first; then the other player plays one of his cards.
A card that has been played may not be taken back.
(2) In each round, the player who plays the larger of the 2 numbers wins the trick (wins the round).
(3) In the next round, the player who won the previous trick plays first.

Goal of the game: N wants to maximize the number of tricks won by N, while S wants to maximize the number of tricks won by S.

The only difference between Game 3 and Game 1 is the number of players, so our earlier definitions carry over, with the only change being the number of players, and we make the following claim.

Claim 5 The decision version of the MT problem for Game 3 is polynomial-time reducible to the decision version of the MT problem for Game 1.

Proof Take a partial game of Game 3 with N = {n_1 < n_2 < ... < n_k} and S = {s_1 < s_2 < ... < s_k}, where the cards come from {1, ..., 2n}. We construct a corresponding partial game of Game 1 in which the union of the card sets of S and E is {1, ..., 2n} (the exact split between them does not matter), N = {n_1 + 2n < n_2 + 2n < ... < n_k + 2n} and W = {w_1 = s_1 + 2n < w_2 = s_2 + 2n < ... < w_k = s_k + 2n}, so that every card of N and W beats every card of S and E. It can be seen that S and E cannot win any trick, so the partial game of Game 1 is just the competition between N and W and is exactly the partial game of Game 3. Hence if we can solve the partial game of Game 1, we can also solve the partial game of Game 3.

This claim also means that the properties (Claims 1-4) of the four-player game can be applied to the two-player game. But do the optimization problems for Game 1 and Game 3 have the same computational complexity? We tend to believe that the answer is no, because there is an essential difference between the two games. In the problem for Game 1, each trick is decided by the following interaction: one of N/S first decides what to play, then one of E/W decides, next the other of N/S chooses, and finally the remaining player of E/W chooses his card for the round. There is interaction of decisions within every round.
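The reduction of Claim 5 can be sketched in a few lines. This is an assumed implementation, not code from the paper; it shifts the two Game 3 hands up by 2n (so that every card of the resulting N and W beats every low card) and hands the low cards 1..2n to S and E in an arbitrary split.

```python
# Sketch of the Claim 5 reduction: a 2-player Game 3 instance over cards
# 1..2n becomes a 4-player Game 1 instance in which S and E can never win.

def reduce_game3_to_game1(n_hand, s_hand):
    n = len(n_hand)                        # cards per hand in Game 3
    g1_N = {c + 2 * n for c in n_hand}     # Game 3's N becomes Game 1's N
    g1_W = {c + 2 * n for c in s_hand}     # Game 3's S becomes Game 1's W
    low = list(range(1, 2 * n + 1))
    g1_S, g1_E = set(low[:n]), set(low[n:])  # any split of 1..2n works
    return g1_N, g1_E, g1_S, g1_W

N1, E1, S1, W1 = reduce_game3_to_game1({4, 1}, {3, 2})
print(sorted(N1), sorted(W1))  # -> [5, 8] [6, 7]
```

Every card held by N or W exceeds every card held by S or E, so the Game 1 instance is decided purely by the N-versus-W competition, which replays the original Game 3 instance.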
But this is not the case in Game 3: N and S each decide only once per trick, so it is not easy to find a polynomial reduction from Game 1 to Game 3. Now we consider the same optimization problem for Game 3, with the distribution given to both N and S.

Problem 3 Is there a strategy for N to guarantee at least k tricks (under all possible defensive plays by S)?

When k is a fixed constant, the problem is in P. Let MT-k denote the problem in the case that k is a constant rather than part of the input. Our conclusion is the following theorem.

Theorem 2 MT-k is in P.

Proof We prove by induction on k, giving for every k a recursive algorithm solving MT-k.

When k = 1, we need to consider two cases: the leader of the first round is N or S. If it is S, then by Claim 4 the problem can be solved as follows. Whatever S plays in each round, if N is able to win the current trick, he wins it; if he cannot, he plays the smallest number in his hand. Thus it suffices to check whether every card of S is larger than every card of N; if not, then N is able to win one trick in this game by following the strategy dictated by Claim 4.

If the leader is N, he checks how many cards larger than n he holds. If he has none, he cannot win any trick. If he has more than one, he simply plays the largest card in his hand and waits for his second-largest card to win a trick. The case is slightly more complex when he has exactly one card larger than n: if that card is not n + 1, he plays the smallest number in his hand and waits for the largest card in his hand to win a trick; if it is exactly n + 1 and he also holds n, he plays n + 1 and n will win a trick for him; if it is exactly n + 1 and he does not hold n, he cannot win any trick. Consequently, it takes O(1) time to decide whether N is able to win one trick.
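The k = 1 case analysis above can be sketched as a small test. This is an assumed implementation of the proof's case analysis (names are illustrative), for Game 3 with cards 1..2n; it returns True iff N can guarantee at least one trick against every defence by S.

```python
# Sketch of the k = 1 decision from the proof of Theorem 2.

def n_wins_one_trick(n_hand, s_hand, leader):
    n = len(n_hand)
    if leader == "S":
        # S must eventually lead every card, so N wins a trick unless every
        # card of S beats every card of N.
        return max(n_hand) > min(s_hand)
    big = sorted(c for c in n_hand if c > n)   # N's cards above n
    if not big:
        return False                           # no card above n: no trick
    if len(big) >= 2:
        return True                            # lead the top card, keep the next
    if big[0] != n + 1:
        return True                            # play small, save the big card
    return n in n_hand                         # holds exactly n+1: needs n too

# n = 2, cards 1..4: N = {4, 1} can always win a trick when leading.
print(n_wins_one_trick({4, 1}, {3, 2}, "N"))  # -> True
```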
We claim that it takes O(n^{2k−2}) time to decide whether N can win k tricks, regardless of who leads the first round. Suppose the claim holds for k − 1; for the case where N needs to win k tricks we again split into two cases.

If the leader of the first round is S, then no matter what he plays from his hand, by Claim 4 the best N can do is one of just 2 choices: win the trick with the smallest possible card, or play the smallest card in his hand (if he cannot win, or can only win, there is only one choice). For each possible lead from S (at most n choices), if N can win it, he computes MT(φ'_N, k − 1) in O(n^{2k−4}) time for the remaining partial game φ' after both N and S have played one card.

Consider the case in which there exists some s ∈ S such that if S leads s and N covers it, then MT(φ'_N, k − 1) = 0, i.e. N cannot guarantee k − 1 tricks in the partial game, and thus cannot guarantee k tricks by covering s. Such an s need not be unique, and by Claim 3 we choose the smallest one. In this case N should not cover s but play small, and the problem becomes computing MT(φ'_S, k), where the remaining partial game φ'_S has only n − 1 rounds. This transformation reduces the number of rounds, so it can happen at most n times; after at most n transformations we reduce the target number of tricks from k to k − 1. Since each transformation takes at most n · O(n^{2k−4}) time, the optimization problem can be solved in O(n^{2k−2}) time.

If the leader of the first round is N, then he reasons similarly from S's perspective: is there a card n ∈ N such that, when he plays it, S cannot prevent him from winning at least k tricks in the remaining partial game? A similar transformation and argument solve the problem, again within O(n^{2k−2}) time.

When k is part of the input, further analysis of the game's properties is needed.
Several definitions serve this purpose.

Definition 10 (Consecutive Cards) Two or more cards are called consecutive if their labels are consecutive integers.

Definition 11 (0-1 Representation) A 0/1 string a = a_1 a_2 ... a_{2n} with exactly n 1s and n 0s is the representation of N's hand, with a_i = 1 if and only if N holds the card 2n + 1 − i.

Definition 12 (Completely Better, for Representations) A representation a = a_1 a_2 ... a_{2n} is "completely better" than a representation b = b_1 b_2 ... b_{2n} if, for all i ∈ [2n], the number of 1s among the first i digits of a is at least that of b.

Definition 13 (Completely Better, for Choices) Suppose N holds a = a_1 a_2 ... a_{2n} and is going to lead the round. A choice a_i is completely better than a_j if, whether or not his opponent chooses to win, the remaining partial game after playing a_i is completely better than that after playing a_j.

Claim 6 If the leader of the current round holds no consecutive cards, then his best choice is to play the smallest number in his hand.

Proof We prove the claim by analyzing the representation of the partial game. Suppose the current representation is a^(i) with a_i = a_j = 1, i < j and a_{i−1} = a_{i+1} = a_{j−1} = a_{j+1} = 0. If N plays card a_i and S chooses to win it, the remaining representation is a' = a_1 a_2 ... a_{i−2} a_{i+1} ... a_{2n}; if N plays card a_j and S chooses to win it, the remaining representation is a'' = a_1 a_2 ... a_{j−2} a_{j+1} ... a_{2n}. It is easy to see that a'' is completely better than a' when there are no consecutive cards. If N plays card a_i and S chooses not to win it, the remaining representation a' is obtained by deleting a_i and the last 0 from the original representation a^(i).
If N plays the card at position j and S chooses not to win, the remaining representation is obtained by deleting a_j and the last 0, and it is easy to see that this representation is completely better than the one obtained from playing a_i, when there are no consecutive cards. From Claim 1, Claim 2 and their proofs we obtain the fact that the better the representation, the more tricks we can win. Thus, if the leader holds no consecutive cards, his best choice is to play small.

This claim gives the intuition that if the hand is somewhat "scattered" (no consecutive cards), it is easy to handle: the best choice is to play small. If the hand is "compact", meaning it contains many consecutive cards, the case is also easy to handle: among consecutive cards all choices are equivalent (playing 5 or playing 6 makes no difference when you hold both), and fewer distinct choices mean less difficulty. Therefore, the main difficulty in the optimization problem of Game 3 lies in hands that strike a balance between "scattered" and "compact". We analyze this below.

Consider an instance of Game 3. To claim that N can win at least k tricks, we must exhibit a strategy that guarantees that many tricks. A strategy can be formalized as a tree, as follows:

Definition 14 (Strategy) We denote the strategy for N in distribution φ to get k tricks by S(φ, k); the size of the strategy is the number of nodes in S(φ, k).

This model is similar to a game tree, in which every non-leaf node is labeled with the logic operation "and" or "or", and every leaf node is labeled 0 or 1. But the tree is larger than a game tree with the same input n, and is therefore difficult to verify. We now give an example that balances "compact" and "scattered", and consider the difficulty of the whole problem.
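The prefix-dominance test of Definition 12 is straightforward to code (a minimal sketch; the function name is ours):

```python
def completely_better(a, b):
    """Return True iff representation a is 'completely better' than b
    (Definition 12): every prefix of a contains at least as many 1s
    as the corresponding prefix of b."""
    ones_a = ones_b = 0
    for x, y in zip(a, b):
        ones_a += (x == '1')
        ones_b += (y == '1')
        if ones_a < ones_b:
            return False
    return True
```

For example, `completely_better("1100", "1010")` holds while the reverse comparison fails; the relation is a partial order, and incomparable pairs are exactly what forces the strategy tree to branch.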
Let n = 6k (if n is not a multiple of 6 we take k = ⌊n/6⌋ and the complexity is almost the same), let N's hand be N = {6k, 6k − 1, 6k − 2, 6(k − 1), 6(k − 1) − 1, 6(k − 1) − 2, …, 6, 5, 4}, and let the target number of tricks be k_0.

Claim 7 The size of S(φ, k_0) is O((cn)^{dn}), where c and d are constants.

Proof First, the number of layers is 6k. Note that if, in some partial game, one choice leads to a worse remaining partial game than another choice, then the worse choice does not open a branch, i.e., it should not appear in the strategy tree. When does the strategy tree split into many branches at some layer? The only case is that a player holds many sets of consecutive cards, and they lead to different partial games, none of which is completely better than another. (Here we say a choice is completely better than another if it leads to a completely better partial game.) We claim there is always some layer at which S splits into k − 1 branches, i.e., he has k − 1 choices, none completely better than another. This is immediate: if N plays the top three honors first, then no matter what he plays next, S can win the trick and lead the next round, so S has k − 1 different choices, and the strategy tree must cover all of them. This is why the strategy tree necessarily splits into k − 1 branches at some layer. Indeed, if a player's hand contains k sets of consecutive cards and it is his turn to lead, then among these k choices none is completely better than another; thus the number of child nodes equals the number of sets of consecutive cards. We claim next that after the first split, every branch splits again into at least k − 3 sub-branches at some lower layer; this follows by similar reasoning. Consequently, we obtain our conclusion: the size of S(φ, k_0) is at least (k − 1)!!, which can be estimated using Stirling's formula.
The size is (cn)^{dn}, far too large for a verifier to check. Recall the definition of PCP(r(n), q(n)): the set of decision problems that have probabilistically checkable proofs verifiable in polynomial time using at most r(n) random bits and by reading at most q(n) bits of the proof. When there is a lower bound on the number of queries for some problem from the PCP perspective, the problem is quite likely to be PSPACE-complete.

Conclusion 1 PCP[poly(n), O(1)] = PCP[poly(n), poly(n)] = NEXP (MIP = NEXP) (Babai et al. 1991).

The complexity class NEXP contains PSPACE (and is generally believed to contain it strictly). It is also generally believed that for proofs of size 2^{cn}, where c is a constant, one can use little randomness and few queries to make the verifier correct with constant probability. But here the proof size is (cn)^{dn}, and this becomes less plausible, probably impossible. Although we are not able to reduce some PSPACE-complete problem to this bridgecard optimization problem, considering both the difficulty of proving the equivalence between the 2-player and 4-player cases and the PCP-based difficulty of placing the problem in a lower complexity class, it is reasonable to believe that the problem is PSPACE-complete.

Conjecture 1 MT is PSPACE-complete.

3.2 Algorithm for the best probabilistic play problem

3.2.1 Exact algorithm (DP) for computing the best probabilistic play

First, we identify a local property of the problem. When one round is finished, 4(n − 1) cards remain. We can renumber those cards: each player can see exactly how large each of his cards is among the remaining 4(n − 1) cards. More precisely, he can construct a new 0-1 representation with respect to his remaining hand, and then regard the partial game as a fresh whole game with a total of 4(n − 1) cards.
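The renumbering just described amounts to relabeling the surviving cards while preserving their relative order (a sketch; the helper name is ours, and sorting makes this O(n log n) rather than the O(n) achievable by counting over the fixed card range):

```python
def renormalize(hands):
    """Relabel all remaining cards as 1..m, preserving relative order,
    so that a partial game becomes a fresh whole game on m cards."""
    cards = sorted(c for hand in hands for c in hand)
    rank = {c: i + 1 for i, c in enumerate(cards)}
    return [{rank[c] for c in hand} for hand in hands]
```

For example, after a round that removed the cards 1, 3, 4 and 8 from an 8-card game, `renormalize([{9, 2}, {7, 5}])` would instead relabel the survivors of hands {9, 2} and {7, 5} as {4, 1} and {3, 2}.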
This "partial game transform" can be done in O(n) time. In finding the "best probabilistic play", the declarer always seeks the strategy with the highest probability of taking a given number of tricks, updating as information is revealed, while his opponents E/W can see the full card distribution. They watch the declarer make his choices, use the "oracle" that computes the best choice for the partial game (explained in Problem 1) to decide what to play, and wait for the declarer's next choice. Consequently, when the distribution is given to E/W but not to N, even if N always finds the best probabilistic play for the partial game, the total number of tricks cannot exceed the maximum number of guaranteed tricks in the initial whole game: N is not always making the best choice with respect to the actual distribution, while his opponents are always doing their best to prevent him from taking more tricks. Conversely, we can define and consider the "best probabilistic defensive play" of a defender in another setting: a defender E sees only his own hand and the dummy's hand, while the declarer sees the whole distribution and always makes the best choice for the partial game; over all possible distributions, the choice that is most often best is the "best probabilistic play" for the defender.

We design a dynamic-programming algorithm for the Best Probabilistic Play problem. Before that, some definitions are made to formalize the setting.

Definition 15 (Partial Game with Incomplete Information) Let φ_j^i = {N, E, S, W} be the partial game with incomplete information for declarer N, in which the first round is led by j ∈ {N, S}, and all the declarer can see are the two hands N = {n_1, …, n_n} and S = {s_1, …, s_n}.
He makes his choices based on these two hands and the cards sequentially revealed by E and W.

Definition 16 (Strategy) A strategy α of declarer N is a decision tree of the same form as the tree above Definition 14.

Definition 17 (Best Defence) Defenders E and W can see the whole distribution of the four hands. When E/W is about to play a card, he runs the Best Play algorithm on the remaining partial game to find the card that minimizes the number of tricks N can guarantee. This algorithm for the defenders is called Best Defence.

Definition 18 (Probability Function for a Strategy) Let P(φ_j^i, α, k) be the probability that declarer N wins k tricks using strategy α against the best defence from E and W. The probability is taken over the uniformly chosen distribution of the two unseen hands.

Definition 19 (Probability Function for a Partial Game with Incomplete Information) Let P(φ_j^i, k) = max_α P(φ_j^i, α, k) be the probability that declarer N wins k tricks in the partial game, i.e., the largest probability of getting k tricks over all strategies, and let α(φ_j^i, k) denote the strategy that achieves it, also called the Best Strategy.

Let R denote the 2n cards held by E and W, namely R = [4n] − N − S, and let the trick-judging function t(R_i) and the leader-judging function L(R_i) be defined as in Definition 7 and Definition 8. The following recurrence holds:

P(φ_N^i, k) = Σ_{E,W} Pr[E, W] · max_{n∈N} min_{e∈E} max_{s∈S} min_{w∈W} P(φ_{L(R_1)}^i, k − t(R_1)),

where the probability is taken uniformly over all possible distributions of the card sets E and W, Pr[E, W] = 1/C(2n, n) = n!n!/(2n)!, R_1 = {n, e, s, w}, and φ^i represents the remaining partial game after the first round. Similar recurrences hold for P(φ_S^i, k), P(φ_E^i, k) and P(φ_W^i, k), and a dynamic-programming algorithm is therefore set up. We need (2n)!/(n!n!)
= O(2^{2n}) computations in every case, and Σ_{i=1}^{n−1} C(n, i)^2 = O(n · 2^{2n}) cases are to be computed. Thus, our algorithm runs in O(n · 2^{3n}) time.

3.2.2 Computational complexity and properties of the main problems

We have not determined the computational complexity of this problem, but we compared the complexity of the two problems "Best Play" (best choice) and "Best Probabilistic Play". From the DP perspective, exponentially many calls to an oracle for "Best Play" suffice to solve "Best Probabilistic Play", but we do not know whether "Best Probabilistic Play" can be solved with polynomially many calls to "Best Play"; if it can, "Best Probabilistic Play" is no harder than "Best Play". Since the definition of "Best Probabilistic Play" contains the elements of the definition of "Best Play", we tend to believe that "Best Probabilistic Play" is not easier than "Best Play". On the other hand, we know that a declarer following the "Best Probabilistic Play" strategy still cannot take more tricks than with the "Best Play" strategy; in other words, "Best Play" does the better job. This suggests that "Best Play" is the harder problem, since in most cases obtaining a better solution requires more computation. These two observations lead us to the following conjecture.

Conjecture 2 Best Play and Best Probabilistic Play have the same computational complexity: Best Play ≤_p Best Probabilistic Play and Best Probabilistic Play ≤_p Best Play.

This conjecture says that when a declarer wants the best strategy to make his contract regardless of mistakes by his opponents, it is as difficult as finding the best play from a given distribution. The latter problem, however, tends to be delegated to a computer because it is very complex.
The same holds for a defender: when a defender wants the best strategy to defeat the declarer's contract regardless of the declarer's mistakes, it is as difficult as finding the best play from a given distribution, and the latter again tends to be delegated to a computer. In short, it is almost impossible for a bridgecard player to make no mistakes in even one board, and this is exactly the case even in top matches around the world. On the other hand, from the description of "best probabilistic play" we know that, given an oracle for this problem, at our turn in any partial game we can use the oracle to compute the winning probability of every option, always choose the highest, and play by following the corresponding strategy.

Another interesting fact: in many experiments comparing "best play" and "best probabilistic play", we find that in most cases the "best probabilistic play" yields the strategy of "best play", and deviations occur only in the following cases. Sometimes the mistakes are unavoidable, because computation tells us nothing about which play is best: there are two completely different strategies with exactly the same probability of success. We call such cases "undirected cases". With plenty of evidence, we believe the statement that only when there is more than one choice with exactly the same winning probability does the best probabilistic play deviate from the best play. This guides us to the following conjecture, which is reasonable and likely correct.

Conjecture 3 Best probabilistic play yields best play if we decide correctly on every "undirected case".

In any case, this conjecture shows that sometimes we should not be blamed for those mistakes; we are just unlucky.
3.3 Approximation algorithm for Problem 1

We design an approximation algorithm for the "best play" problem in the 2-player case and prove its approximation ratio. We always assume the opponents play optimally (they choose the card that best limits the declarer's number of tricks). Let N be the set of North's cards, and similarly S for South. Let o(t) be the number of N's cards among the largest t cards; the following notion is essential in the proof.

Definition 20 (Potential Tricks) Let PT(φ_j^i) be the number of potential tricks of the partial game with incomplete information φ_j^i, formally defined as PT(φ_j^i) = max_{1≤i≤2n} {2o(i) − i}.

Approximation Algorithm 1
(1) When you lead the round, play the largest card in your hand.
(2) When you play second in the round, check whether you can cover the card led by E: if so, play the smallest card that covers it; if not, play the smallest card in your hand.

Definition 21 (Upper Region) When a total of 2n cards remain in the partial game, the range [n + 1, 2n] is called the upper region.

Theorem 3 Approximation Algorithm 1 is a 4-approximation.

Lemma 1 If N has p cards in the upper region of the original game and uses Approximation Algorithm 1, then he takes at least p/2 tricks, whether he or his opponent leads the first round.

Lemma 2 If N has p (p ≤ n/2) cards in the upper region of the original game, then if his opponent uses Approximation Algorithm 1, N takes at most 2p tricks.

Proof We first prove Lemma 1, using the representation of the game defined above, by induction on n. The cases n = 1 and n = 2 are easy to check. Supposing the lemma holds for n ≤ k − 1, consider the case n = k.
If your opponent leads the largest card in his hand and you cannot cover it, you play the smallest card in your hand; the number of your cards in the upper region is unchanged, and we get a (k − 1)-case in which your opponent leads. If your opponent leads a card and you cover it, the number of your cards in the upper region decreases by 1, you win a trick, and we get a (k − 1)-case in which you lead. If you lead and your largest card is covered by the opponent, the number of your cards in the upper region decreases by 1, you do not win the trick, and we get a (k − 1)-case in which your opponent leads. If you lead and your largest card is not covered, the number of your cards in the upper region decreases by 1, you win a trick, and we get a (k − 1)-case in which you lead. Applying the induction hypothesis in these four cases shows that the lemma holds for n = k.

We then prove Lemma 2. Since N has p cards in the upper region, his opponent S has n − p cards in the upper region. One of S's cards fails to win a trick for S if and only if it is covered by one of N's cards in the upper region. Thus S has at least n − p − p = n − 2p cards in the upper region that are guaranteed to win a trick for him. Consequently, N can take at most n − (n − 2p) = 2p tricks.

We now return to the proof of the theorem. On one hand, if the number of N's cards in the upper region exceeds n/2, the theorem holds: N can win at most n tricks in total, while our approximation algorithm guarantees him at least n/4 tricks. On the other hand, if the number p of N's cards in the upper region is at most n/2, the theorem holds because, by Lemma 2, N can win at most 2p tricks, while by Lemma 1 our approximation algorithm guarantees him at least p/2 tricks.
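Approximation Algorithm 1 and the potential-tricks quantity of Definition 20 are simple to state in code (a sketch; the function names are ours):

```python
def greedy_play(hand, card_led=None):
    """Approximation Algorithm 1: when leading, play the largest card;
    when following, play the smallest card that covers the led card,
    or the smallest card in hand if no cover exists."""
    if card_led is None:
        return max(hand)
    covers = [c for c in hand if c > card_led]
    return min(covers) if covers else min(hand)

def potential_tricks(hand, total):
    """PT = max over t of 2*o(t) - t, where o(t) counts the hand's
    cards among the largest t of the cards 1..total."""
    best, o = 0, 0
    for t, card in enumerate(range(total, 0, -1), start=1):
        o += card in hand
        best = max(best, 2 * o - t)
    return best
```

For example, holding {5, 2} against a led 3, the algorithm covers as cheaply as possible with the 5; holding {2, 1} it discards the 1. For the hand {4, 3} out of four cards, `potential_tricks` evaluates to 2.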
However, this approximation algorithm can still be improved: only a little bridgecard logic has been built into it, which is far from enough to reach the limit, although our 4-approximation seems a good result. Consider the following instance in the 4-player case (n = 2). It represents a class of techniques in real bridgecard play called the "finesse": here S needs to lead small toward the 6, so if the 7 is in W's hand then S takes both tricks, but if the 7 is in E's hand then S takes only 1 trick, and in fact there is no way for S to take 2 tricks. Implementing the finesse technique in an algorithm seems genuinely difficult, perhaps involving some knowledge of pattern recognition; and for a real bridgecard artificial-intelligence system, the single technique "finesse" is far from enough. There are also cases where we should not play honors in the first round, though our approximation algorithm does so (n = 5): here S should play the 4 to the 1 to guarantee 3 tricks. Whichever opponent wins this trick with a larger card is "endplayed" into leading the next round, and in that round the declarer can play small in second position: if the third player plays a card larger than 16, win with the honor; if the third player plays a card smaller than 15, win with the 15 or 16.

4 Conclusions and future development

Our research initiates and analyzes the problem of the bridgecard game, formulating the game as two general optimization problems: best play and best probabilistic play. The 2-player versions of these problems are concrete and simple, and thus suitable for analyzing computational complexity. In the formulation we generalize the number of cards per player from 13 to a variable n, which allows asymptotic arguments in the style of theoretical computer science.
We conjectured that best play is PSPACE-complete, based on analysis from the PCP perspective. We then conjectured that best probabilistic play has the same computational complexity as best play. Dynamic programming was employed to give algorithms for both problems. Drawing on some basic techniques of real bridgecard play, we then designed an approximation algorithm and proved a constant approximation ratio. However, the approximation algorithm and its analysis are elementary, and certain flaws were pointed out in the last section. Future research on better approximation algorithms and deeper analysis is therefore expected.

Acknowledgments First and foremost, I would like to show my deepest gratitude to our first professor, Andrew Chichih Yao, who has provided me with good resources for learning and discussion. Without his help I could not have had the opportunity to walk deeply into the theory of computer science. I extend my thanks to Prof Amy Wang for her guidance and her encouragement to carry out deeper analysis in this project. I would also like to thank all my teachers who helped me develop fundamental and essential academic competence: Prof Jian Li, Prof Iddo Tzameret and senior student Hongyu Liang. Without their help I could not have gained a deeper understanding of the problem. I also thank Donna Dong for helping me with language. She managed to read the whole paper even though she was not familiar with the background knowledge, and under her guidance I learned a great deal about standard English usage and analytical writing. Last but not least, I would like to thank all my friends, especially my three lovely roommates, for their encouragement and support.

References
Kelsey (1994) Killing defence.
Reese (1994) The expert game.
Demaine ED (2001) Algorithmic combinatorial game theory.
Demaine ED (2010) The complexity of UNO. http://arxiv.org/pdf/1003.2851.pdf
Babai L, Fortnow L, Lund C (1991) Non-deterministic exponential time has two-prover interactive protocols.
Comput Complex 1(1):3–40

On the Meeting Time for Two Random Walks on a Regular Graph

Yizhen Zhang†, Zihan Tan† and Bhaskar Krishnamachari‡
† Tsinghua University, Beijing, China
‡ University of Southern California, Los Angeles, USA
{yz-zhang11, zh-tan11}@mails.tsinghua.edu.cn, [email protected]

October 31, 2014

Abstract

We provide an analysis of the expected meeting time of two independent random walks on a regular graph. For 1-D circle and 2-D torus graphs, we show that the expected meeting time can be expressed as the sum of the inverses of the non-zero eigenvalues of a suitably defined Laplacian matrix. We also conjecture, based on empirical evidence, that this result holds more generally for simple random walks on arbitrary regular graphs. Further, we show that the expected meeting time for the 1-D circle of size N is Θ(N²), and for a 2-D N × N torus it is Θ(N² log N).

1 Introduction

Consider a system of two discrete-time random walkers on a graph G(V, E). At each time step, they each independently move to a neighboring vertex or stay still with given probabilities. Denote the transition matrix of a single walker by P, where P(i, j) is the probability that a walker moves from v_i to v_j in one time slot. The process is assumed to start at the steady state (i.e., the uniform distribution) for each walker, and terminates when the walkers meet at the same vertex. We denote this meeting time by τ, a random variable with expectation E[τ]. Our objective is to analyze this quantity on d-regular graphs.

Figure 1: 4 walkers on a 3-regular graph

It is instructive to consider the problem on the one-dimensional circle first. We study a circle with N nodes, denoted V = {0, 1, 2, …, N − 1}. The two walkers start from arbitrary positions according to the initial distribution.
At every step, the walker at i moves to i − 1, moves to i + 1, or stays at i, with probabilities p₁, p₂, p₃ respectively (for simplicity of notation, indices are taken mod N: if i = N − 1 then i + 1 = 0, and if i = 0 then i − 1 = N − 1).

Figure 2: 1-D circle

Since we are only concerned with the meeting time, the relative position of the two walkers suffices to describe that random variable, so we fix one walker at 0. In this new equivalent model, the transition matrix of the other walker before the encounter is M = P · Pᵀ.

A similar equivalent model can be defined for an N × N torus. Let V = {(x, y) | x, y = 1, 2, …, N}. At every step, the walker at (x, y) moves to (x ± 1, y) or (x, y ± 1) (coordinates mod N) or stays at (x, y), with given probabilities. Defining the index of (x, y) as Ind(x, y) = (x − 1)N + y, we obtain a matrix P of order N². Letting i, j denote the indices of two vertices (x_i, y_i), (x_j, y_j), P(i, j) is the probability that a walker moves from (x_i, y_i) to (x_j, y_j) in one step. P is a "block-circulant matrix" as defined in Section 2.1.2. As in the 1-D case, we fix one walker at the lower-right cell; the transition matrix of the other walker before the encounter is again M = P · Pᵀ, which is symmetric.

Figure 3: 2-D Torus

Our main result is as follows: by suitably defining a Laplacian matrix L, the expected meeting time E[τ] of the two walkers (i.e., the expectation of the first time they occupy the same cell, starting from the steady-state uniform distribution) on a ring or torus can be expressed explicitly as the sum of the reciprocals of the non-zero eigenvalues of L. We further conjecture, based on empirical evidence, that the result holds more generally for simple random walks (i.e., with equal transition probabilities) on arbitrary regular graphs.
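The main result is easy to check numerically for the circle from the equivalent model just described (a sketch in numpy; N = 7 and the equal probabilities 1/3 are arbitrary choices): build the circulant P, form the relative-position chain M = PPᵀ, solve the linear system for the first-meeting times, and compare with the non-zero eigenvalues of L = I − M.

```python
import numpy as np

N = 7
p1 = p2 = p3 = 1 / 3           # move left, move right, stay
P = np.zeros((N, N))
for i in range(N):
    P[i, (i - 1) % N] += p1
    P[i, (i + 1) % N] += p2
    P[i, i] += p3

M = P @ P.T                     # relative-position chain, one walker pinned at 0
L = np.eye(N) - M

# T_i = expected meeting time from relative position i; T_0 = 0 and
# T_i = 1 + sum_j M[i, j] T_j for i != 0, i.e. L[1:, 1:] @ T[1:] = 1.
T = np.zeros(N)
T[1:] = np.linalg.solve(L[1:, 1:], np.ones(N - 1))
mean_T = T.mean()               # E[tau] under the uniform initial distribution

eig = np.linalg.eigvalsh(L)
eig_sum = sum(1 / v for v in eig if v > 1e-9)
print(mean_T, eig_sum)
```

Under Theorem 1 below, the two printed values agree.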
2 Method and Key Results

2.1 Preliminary

Recall the standard definition of a circulant matrix:

Definition 1 (Circulant Matrix) A circulant matrix is a matrix in which each row vector is rotated one element to the right relative to the preceding row vector. A circulant matrix A is fully specified by one vector, a, which appears as the first row of A.

2.1.1 Properties of Circulant Matrices

For an arbitrary real circulant matrix A of order n generated by {a₀, a₁, …, a_{n−1}}, we can find its eigenvalues in a general way following the approach indicated in [1]. First define the vector ξ_i whose jth component is

ξ_i(j) = (1/√n) w^{ij},  where w = e^{2π i / n} is the nth root of unity.  (1)

We can prove the following properties:
(a) ⟨ξ_i, ξ_j⟩ = δ_{ij};
(b) Aξ_i = λ_i ξ_i, for i = 0, 1, …, n − 1.

Property (a) shows that {ξ_i} are orthogonal eigenvectors of A. The eigenvalue λ_i can be calculated from

(Aξ_i)(j) = Σ_{k=1}^{n} A(j, k) ξ_i(k)  (2)
         = (a₀/√n) w^{ij} + (a₁/√n) w^{i(j+1)} + ··· + (a_{n−1}/√n) w^{i(j+n−1)}  (3)
         = ξ_i(j) · Σ_{k=0}^{n−1} a_k w^{ik}.  (4)

Letting λ_i = Σ_{k=0}^{n−1} a_k w^{ik}, we obtain property (b).

Definition 2 (Block-Circulant Matrix) Let A be an n²-order partitioned circulant matrix generated by blocks A₀, A₁, …, A_{n−1}, where each A_i is an n-order circulant matrix generated by {a_{i,0}, a_{i,1}, …, a_{i,n−1}} (see the illustration below for a 9-order block-circulant matrix). Then A is called a block-circulant matrix.

A = [A₀ A₁ A₂; A₂ A₀ A₁; A₁ A₂ A₀],  where A_i = [a_{i,0} a_{i,1} a_{i,2}; a_{i,2} a_{i,0} a_{i,1}; a_{i,1} a_{i,2} a_{i,0}] for i = 0, 1, 2.

2.1.2 Properties of Block-Circulant Matrices

Given an index i, its coordinates are x_i = quotient(i − 1, n), y_i = remainder(i − 1, n). We modify the definition of ξ_i to

ξ_i(j) = (1/n) w^{x_i x_j + y_i y_j},  where w = e^{2π i / n} is the nth root of unity.

The properties given above in Section 2.1.1 still hold, and

λ_i = Σ_{l=0}^{n−1} Σ_{k=0}^{n−1} a_{l,k} w^{x_i l + y_i k}  (5)

is the ith eigenvalue of A.
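The eigenvalue formula λ_i = Σ_k a_k w^{ik} and the eigenvectors ξ_i of a circulant matrix can be verified numerically (a sketch in numpy; the generating vector is an arbitrary example of ours):

```python
import numpy as np

a = np.array([0.5, 0.2, 0.1, 0.0, 0.2])      # first row: a_0 .. a_{n-1}
n = len(a)
# circulant matrix: each row is the previous one rotated right by one
C = np.array([[a[(j - i) % n] for j in range(n)] for i in range(n)])

w = np.exp(2j * np.pi / n)                    # primitive n-th root of unity
for i in range(n):
    lam_i = sum(a[k] * w**(i * k) for k in range(n))
    xi = np.array([w**(i * j) for j in range(n)]) / np.sqrt(n)
    assert np.allclose(C @ xi, lam_i * xi)    # A xi_i = lambda_i xi_i
```

The loop checks property (b) for every i; property (a), orthonormality of the ξ_i, could be checked the same way with `np.vdot`.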
2.2 Results on the Circle

2.2.1 The Expected Meeting Time

Let us first discuss the problem on the simplest graph, a 1-D circle.

Theorem 1 If two particles perform independent random walks on a circle with a uniform initial distribution, then the expected meeting time is Σ_{λ_i ≠ 0} λ_i^{−1}, where λ_i is the ith eigenvalue of L = I − PPᵀ and P is the transition matrix of a single walker.

Put the transition probabilities of M as the weights of the edges. Then we get the Laplacian matrix

L = I − M = I − PPᵀ,  (6)

which is a circulant matrix generated by {1 − q₀, −q₁, −q₂, 0, …, 0, −q₂, −q₁}, where q₀ = p₁² + p₂² + p₃², q₁ = p₃(p₁ + p₂), q₂ = p₁p₂. Let T_i, the ith component of the vector T, denote the expected meeting time when the relative starting vertex is i. Obviously T₀ = 0. With initial distribution π, E[τ] = πᵀT. We can obtain a set of equations by recurrence:

T_i = q₂T_{i−2} + q₁T_{i−1} + q₀T_i + q₁T_{i+1} + q₂T_{i+2} + 1,  i ≠ 0.  (7)

Notice that the coefficients sum to one: q₀ + 2q₁ + 2q₂ = p₁² + p₂² + p₃² + 2p₃(p₁ + p₂) + 2p₁p₂ = (p₁ + p₂ + p₃)² = 1. Summing the above equations, we have

T₀ = q₂T_{N−2} + q₁T_{N−1} + q₀T₀ + q₁T₁ + q₂T₂ − (N − 1).  (8)

Thus the Laplacian matrix L is the coefficient matrix of (7), (8):

LT = Δt,  where Δt = (1, 1, 1, …, 1, −(N − 1))ᵀ.  (9)

Since L is a real circulant matrix, we can use the conclusions of Section 2.1.1. Taking the inner product of both sides of (9) with ξ_i, and using the symmetry of L:

⟨LT, ξ_i⟩ = ⟨T, Lξ_i⟩ = ⟨T, λ_iξ_i⟩ = λ_i ( Σ_{k=1}^{N−1} T_k w^{ik}/√N + T₀ ),  (10)

⟨Δt, ξ_i⟩ = (1/√N) ( Σ_{k=1}^{N−1} w^{ik} − (N − 1) ) = { −√N if i ≠ 0; 0 if i = 0 }.  (11)

Notice that Σ_{k=1}^{N−1} w^{ik} = −1 for i ≠ 0.
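The identity used here is just the fact that the full geometric sum of the powers w^{ik} over k = 0, …, N − 1 vanishes for i ≠ 0; a quick numerical check (N = 8 is an arbitrary choice of ours):

```python
import numpy as np

N = 8
w = np.exp(2j * np.pi / N)                     # primitive N-th root of unity
for i in range(1, N):
    s = sum(w**(i * k) for k in range(1, N))   # k = 1 .. N-1
    assert abs(s - (-1)) < 1e-9                # full sum is 0, so this is -1
```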
Combined with (9), for i ≠ 0,

Σ_{k=1}^{N−1} (T_k/√N) w^{ik} = −√N λ_i^{−1}.  (12)

Summing over i, we have

Σ_{i=1}^{N−1} Σ_{k=1}^{N−1} (T_k/√N) w^{ik} = −√N Σ_{i=1}^{N−1} λ_i^{−1},  (13)

Σ_{i=1}^{N−1} Σ_{k=1}^{N−1} T_k w^{ik} = −N Σ_{i=1}^{N−1} λ_i^{−1}.  (14)

Changing the order of summation,

Σ_{k=1}^{N−1} T_k Σ_{i=1}^{N−1} w^{ik} = −N Σ_{i=1}^{N−1} λ_i^{−1},  (15)

and since Σ_{i=1}^{N−1} w^{ik} = −1 for k ≠ 0,

(1/N) Σ_{k=1}^{N−1} T_k = Σ_{i=1}^{N−1} λ_i^{−1}.  (16)

We take the steady-state distribution as the initial distribution; for any regular graph this is the uniform distribution. The expected meeting time is then given by

E[τ] = πᵀT = (1/N) Σ_{k=1}^{N−1} T_k = Σ_{i=1}^{N−1} λ_i^{−1}.  (17)

Note that this is the sum of the reciprocals of the non-zero eigenvalues of L.

2.2.2 The Order Estimation of E[τ]

For simplicity, we estimate the order of E[τ] for the simple random walk (i.e., p₁ = p₂ = p₃ = 1/3):

E[τ] = Σ_{i=1}^{N−1} λ_i^{−1} = Σ_{i=1}^{N−1} [ (2/9)(3 − 2cos(2πi/N) − cos(4πi/N)) ]^{−1} = (9/4) Σ_{i=1}^{N−1} 1/((2 + t_i)(1 − t_i)),  (18)

where t_i = cos(2πi/N). Thus (2 + t_i)^{−1} ∈ [1/3, 1] is bounded by constants. From [2], the summation Σ_{i=1}^{N−1} (1 − t_i)^{−1} is O(N²); thus E[τ] is O(N²). On the other side, for i = 1, applying Taylor's theorem we have

1/(1 − t₁) = 1/(1 − cos(2π/N)) = 1/Θ(1/N²) = Θ(N²).  (19)

Thus E[τ] is also Ω(N²), yielding that in fact for the 1-D circle, E[τ] grows with the size of the graph as Θ(N²).

2.3 Results on the Torus

2.3.1 The Expected Meeting Time

Theorem 2 If two particles perform independent random walks on a torus with a uniform initial distribution, then the expected meeting time is Σ_{λ_i ≠ 0} λ_i^{−1}, where λ_i is the ith eigenvalue of L = I − PPᵀ and P is the transition matrix of a single walker.

Similarly, put the transition probabilities of M as the weights of the edges to obtain the Laplacian matrix

L = I − M = I − PPᵀ.  (20)

Let T_i denote the expected meeting time from the starting point with index i, the ith component of the vector T. Obviously T_{N²} = 0. If the initial distribution is π, then E[τ] = πᵀT.
We can obtain a set of equations by recurrence (for a more readable notation we write T_{x,y} = T_{Ind(x,y)}). For ease of exposition, we illustrate the recurrence for a simple random walk, i.e., the walker in the original model moves to each neighbour or stays still with the same probability 1/5:

T_{x,y} = (1/25) T_{x±2,y} + (1/25) T_{x,y±2} + (2/25) T_{x±1,y} + (2/25) T_{x,y±1} + (2/25) T_{x±1,y±1} + (1/5) T_{x,y} + 1,  (x, y) ≠ (N, N),  (21)

where each ±-term stands for the sum over the corresponding sign choices, with the stated coefficient on each. Such a recurrence could also be written for any random walk that moves to neighboring nodes with different probabilities. We also have

LT = Δt,  where Δt = (1, 1, 1, …, 1, −(N² − 1))ᵀ.  (22)

With the same approach as in Section 2.2, we have

⟨LT, ξ_i⟩ = ⟨T, Lξ_i⟩ = ⟨T, λ_iξ_i⟩ = λ_i Σ_{k=1}^{N} Σ_{l=1}^{N} (T_{k,l}/N) w^{x_i k + y_i l},  (23)

⟨Δt, ξ_i⟩ = (1/N) ( Σ_{(k,l)≠(N,N)} w^{x_i k + y_i l} − (N² − 1) ) = { −N if i ≠ 0; 0 if i = 0 }.  (24)

Combined with (22) and T_{N,N} = 0, summing over i ≠ 0 we have

Σ_{i=1}^{N²−1} Σ_{(k,l)≠(N,N)} (T_{k,l}/N) w^{x_i k + y_i l} = −N Σ_{i=1}^{N²−1} λ_i^{−1},  (25)

Σ_{i=1}^{N²−1} Σ_{(k,l)≠(N,N)} T_{k,l} w^{x_i k + y_i l} = −N² Σ_{i=1}^{N²−1} λ_i^{−1}.  (26)

Changing the order of summation, finally we have

(1/N²) Σ_{(k,l)≠(N,N)} T_{k,l} = Σ_{i=1}^{N²−1} λ_i^{−1}.  (27)

Note that we obtain exactly the same expression as for the 1-D circle. Given the uniform initial distribution, the expected meeting time E[τ] is the sum of the reciprocals of the non-zero eigenvalues of L.

2.3.2 The Order Estimation of E[τ]

Applying (5) to (27), we have

E[τ] = Σ_{(i,j)≠(0,0)} [ (1/25)(20 − 2(cos(4πi/N) + cos(4πj/N)) − 4(cos(2πi/N) + cos(2πj/N)) − 8 cos(2πi/N) cos(2πj/N)) ]^{−1},  (28)

which, using cos(2πi/N) + cos(2πj/N) = 2 t_{ij} s_{ij}, can be rewritten as

E[τ] = (25/8) Σ_{(i,j)≠(0,0)} 1/((3 + 2 t_{ij} s_{ij})(1 − t_{ij} s_{ij})),  (29)

where t_{ij} = cos(π(i+j)/N) and s_{ij} = cos(π(i−j)/N). By applying the following lemma (proved in Appendix A):

Lemma 1 If θ₁, θ₂ ∈ [0, π/4], then

(1 − cos θ₁ cos θ₂) / (1 − cos 2θ₁ cos 2θ₂) ≥ 1/4,

we can separate the summation into Θ(log N) parts and prove that each part is Θ(N²).
Thus we finally obtain that $E[\tau]$ is $\Theta(N^2\log N)$. The complete proof is given in Appendix A.

3 Discussion

We have proved that on the circle and the torus, the sum of the reciprocals of the non-zero eigenvalues of $L = I - PP^T$ is the expected meeting time of two walkers. In fact, whenever the graph has a symmetry property strong enough to guarantee that $M = PP^T$ and that $L$ is (block-)circulant, the proof still holds. The simulation results shown in Figure 4 match the conclusion of Section 2.3.

Figure 4: Simulation Results on 2-D Torus

Moreover, we find empirically that the expression even works for simple random walks on arbitrary regular graphs. This is not a trivial observation, since vertex symmetry does not hold for an arbitrary regular graph; see the examples of 4-regular graphs in Figure 5. In this case, the equivalent-model approach of fixing one of the walkers at a particular location and defining the transition matrix of the other walker does not work.

Figure 5: Special Cases for 4-regular Graphs

Conjecture 1 (Expected Meeting Time on Regular Graphs). If two particles make independent simple random walks on a connected $d$-regular graph, and the initial distribution is uniform, then the expected meeting time is $E[\tau] = \sum_{\lambda_i\ne0}\lambda_i^{-1}$, where $\lambda_i$ is the $i$-th eigenvalue of $L = I - PP^T$ and $P$ is the transition matrix for a single walker.

Our conjecture is supported by the empirical evidence presented here. Figure 6 shows simulation results as well as the relevant numerical calculations for simple random walks on arbitrary regular graphs. The left figure shows the results on 10-regular graphs, the right one on graphs with 30 vertices. For each horizontal point, a single random graph is generated and fixed, and we average over multiple random initial conditions drawn from a uniform distribution. Each blue mark indicates the average meeting time over 500 independent runs, and each green mark over 10000 runs.
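The quantities compared in Figure 6 are easy to reproduce. The sketch below (our own; the test graphs, a complete graph and a cycle, are instances where the eigenvalue formula is provable, not the random graphs of Figure 6) computes the exact $E[\tau]$ from the two-walker absorbing chain and the eigenvalue sum for $P = (I+A)/(d+1)$; the same `exact_meeting_time` routine can be pointed at any regular graph to probe Conjecture 1:

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting (dense, pure Python)."""
    n = len(b)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

def exact_meeting_time(n, d, edges):
    """E[tau] for two independent lazy walkers (P = (I+A)/(d+1)) on a
    d-regular graph, averaged over a uniform start on vertex pairs."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    P = [[(1.0 if (j == i or j in adj[i]) else 0.0) / (d + 1)
          for j in range(n)] for i in range(n)]
    states = [(i, j) for i in range(n) for j in range(n) if i != j]
    idx = {s: t for t, s in enumerate(states)}
    m = len(states)
    M = [[0.0] * m for _ in range(m)]
    rhs = [1.0] * m
    for (i, j), t in idx.items():
        M[t][t] += 1.0
        for (i2, j2), t2 in idx.items():
            M[t][t2] -= P[i][i2] * P[j][j2]   # product-chain transition
    T = solve(M, rhs)
    return sum(T) / (n * n)   # diagonal (already-met) states contribute 0

# Complete graph K5 (4-regular) and cycle C6 (2-regular).
k5 = [(i, j) for i in range(5) for j in range(i + 1, 5)]
c6 = [(i, (i + 1) % 6) for i in range(6)]

def eig_sum_c6():
    """Sum of 1/lambda_i over non-zero eigenvalues of L for the 6-cycle."""
    total = 0.0
    for i in range(1, 6):
        beta = (1 + 2 * math.cos(2 * math.pi * i / 6)) / 3
        total += 1 / (1 - beta * beta)
    return total
```

For $K_n$ the eigenvalue sum is $n-1$ (here 4), and for the cycle the agreement is exactly the circulant case proved above.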
The red marks indicate the conjectured value of the expected meeting time (i.e., the sum of the reciprocals of the non-zero eigenvalues of $L$). The black marks indicate the exact value of $E[\tau]$, which can be calculated from the definition of expectation once the transition probabilities are given (see Appendix B). In each case we see that the conjecture is valid.

Figure 6: Simulation Results on General Regular Graphs

One way to prove the conjecture may be to use the method of Section 2.3, but for this approach we would need an additional conjecture.

Conjecture 2. If $A$ is the adjacency matrix of a connected $d$-regular graph $G$ with $n$ vertices, then $A$ has a set of orthogonal eigenvectors $\{\xi_1, \xi_2, \cdots, \xi_n\}$ satisfying
(a) $\xi_n = (1, 1, \cdots, 1)^T$;
(b) $\xi_i(n) = 1$ for all $i$;
(c) $\sum_{j=1}^{n}\xi_i(j) = 0$ for all $i \ne n$;
(d) $\sum_{i=1}^{n}\xi_i(j) = 0$ for all $j \ne n$;
(e) $\langle \xi_i, \xi_j\rangle = n\delta_{ij}$.

Proposition 1. Conjecture 2 is a sufficient condition for Conjecture 1.

Proof. Suppose $\mu_1, \ldots, \mu_n$ are the eigenvalues of $A$. We define a matrix $\tilde L$ as follows:

$$\tilde L = I - P\otimes P, \tag{30}$$

where $P\otimes P$ is the Kronecker product of $P$ with itself. From $P = (I+A)/(d+1)$, the eigenvalues of $P$ are $\beta_i = (\mu_i+1)/(d+1)$. Thus, by the properties of the Kronecker product, the eigenvalues and eigenvectors of $\tilde L$ are $\lambda_{i,j} = 1 - \beta_i\beta_j$ and $\xi_{i,j} = \xi_i\otimes\xi_j$.

We can similarly construct a recurrence for $T_{i,j}$, the expected meeting time when the walkers start at vertices $i$ and $j$. Obviously $T_{i,i} = 0$. We can prove that $\tilde L T = \Delta t$, where $\Delta t_{i,j} = 1$ if $i\ne j$, and $\Delta t_{i,i} = -(n-1)$.
Then

$$\langle \tilde L T, \xi_{i,j}\rangle = \langle T, \tilde L\xi_{i,j}\rangle = \langle T, \lambda_{i,j}\xi_{i,j}\rangle = \lambda_{i,j}\sum_{(k,l)=(1,1)}^{(n,n)} T_{k,l}\,\xi_i(k)\xi_j(l). \tag{31}$$

Combined with (c) and (e) in Conjecture 2, for $i, j \ne n$ we have

$$\langle \Delta t, \xi_{i,j}\rangle = \sum_{k=1}^{n}\Big(-(n-1)\xi_i(k)\xi_j(k) + \xi_i(k)\sum_{l\ne k}\xi_j(l)\Big) = \sum_{k=1}^{n}\big(-(n-1)\xi_i(k)\xi_j(k) - \xi_i(k)\xi_j(k)\big) = -n\langle\xi_i,\xi_j\rangle = -n^2\delta_{ij}, \tag{32}$$

where the second equality uses (c): $\sum_{l\ne k}\xi_j(l) = -\xi_j(k)$. Thus

$$\sum_{(k,l)=(1,1)}^{(n,n)} T_{k,l}\,\xi_i(k)\xi_j(l) = \frac{1}{\lambda_{i,j}}\langle \Delta t, \xi_{i,j}\rangle = \frac{-n^2\delta_{ij}}{\lambda_{i,j}}. \tag{33}$$

Summing (33) over $(i,j)\ne(n,n)$, noting that by (b) and (d) the sum $\sum_{i,j}\xi_i(k)\xi_j(l)$ over all pairs vanishes unless $(k,l) = (n,n)$, and that $T_{n,n} = 0$, we finally get

$$E[\tau] = \frac{1}{n^2}\sum_{(i,j)=(1,1)}^{(n,n)} T_{i,j} = \sum_{i\ne n}\frac{1}{\lambda_{i,i}}. \tag{34}$$

Notice that the $\lambda_{i,i} = 1 - \beta_i^2$ are exactly the eigenvalues of $L = I - PP^T$ in our original definition of $L$. Thus we have proved that if Conjecture 2 holds, then Conjecture 1 is true. $\Box$

Remark 1. If we let $\xi_n$ be the eigenvector with eigenvalue $\mu = d$, then (a) holds.

Remark 2. Since $\sum_{j=1}^{n}\xi_i(j) = (1,1,\cdots,1)^T\xi_i$, multiplying $L\xi_i = \lambda_i\xi_i$ on the left by $(1,1,\cdots,1)^T$ gives, for $\lambda_i \ne 0$,

$$\sum_{j=1}^{n}\xi_i(j) = \frac{1}{\lambda_i}(1,1,\cdots,1)^T L\,\xi_i = 0, \tag{35}$$

since every row (and, by symmetry, column) sum of $L$ equals 0. Thus we have (c).

References

[1] Robert Kleinberg. Lecture notes for Computer Science 6822, Advanced Topics in Theory of Computing: Flows, Cuts, and Sparsifiers. Fall 2011. Online at www.cs.cornell.edu/courses/CS6822/2011fa/scribenotes/lec 2.pdf

[2] Elliott W. Montroll. "Random walks on lattices III: Calculation of first-passage times with application to exciton trapping on photosynthetic units". J. Math. Phys., 10(4), pp. 753-765, April 1969.

Appendix A: The Proof of $E[\tau] = \Theta(N^2\log N)$ on the 2-D Torus

Recall Lemma 1: if $\theta_1, \theta_2 \in [0, \frac\pi4]$, then

$$\frac14 \le \frac{1-\cos\theta_1\cos\theta_2}{1-\cos2\theta_1\cos2\theta_2}.$$

Proof. Let $s = \cos\theta_1$ and $t = \cos\theta_2$; then $\cos2\theta_1 = 2s^2-1$ and $\cos2\theta_2 = 2t^2-1$. The inequality in the lemma is equivalent to

$$4 - 4st \ge 1 - (2s^2-1)(2t^2-1), \tag{36}$$

i.e., $f(t,s) = 4ts - 4t^2s^2 + 2t^2 + 2s^2 \le 4$. If $ts = c \le 1$ is fixed, $f$ attains its maximum at $t = 1$, $s = c$.
Thus it remains to show $f(1,s) \le 4$, which reduces to $-s^2 + 2s - 1 \le 0$, i.e., $-(s-1)^2 \le 0$. This inequality is correct, and the proof is complete. $\Box$

Recall the expression for $E[\tau]$ obtained in Section 2.3.2 by trigonometric identities:

$$E[\tau] = \frac{25}{8}\sum_{\substack{i,j=0\\(i,j)\ne(0,0)}}^{N-1}\frac{1}{(2t_{ij}s_{ij}+3)(1-t_{ij}s_{ij})}.$$

Since $1 \le 2ts+3 \le 5$ for all $i,j$, we have $\frac15 \le \frac{1}{2ts+3} \le 1$, which is bounded. Then we only need to estimate

$$\sum_{\substack{i,j=0\\(i,j)\ne(0,0)}}^{N-1}\Big(1 - \cos\frac{\pi(i+j)}{N}\cos\frac{\pi(i-j)}{N}\Big)^{-1}. \tag{37}$$

As $(i,j)$ ranges uniformly over the grid $[0,N]\times[0,N]$ (except the origin), $(i+j, i-j)$ ranges uniformly over a diamond in $[0,2N]\times[-N,N]$; by the symmetry of the cosine function, and omitting a constant coefficient, it is equivalent to estimate

$$\sum_{\substack{p,q=0\\(p,q)\ne(0,0)}}^{N-1}\Big(1 - \cos\frac{p\pi}{2N}\cos\frac{q\pi}{2N}\Big)^{-1}. \tag{38}$$

When we set $p = 0$ (or $q = 0$), the summation becomes

$$\sum_{q=1}^{N-1}\Big(1 - \cos\frac{q\pi}{2N}\Big)^{-1}, \tag{39}$$

which is $O(N^2)$ by [2]. Thus it remains to prove that the following summation is $\Theta(N^2\log N)$:

$$\sum_{p,q=1}^{N-1}\Big(1 - \cos\frac{p\pi}{2N}\cos\frac{q\pi}{2N}\Big)^{-1}. \tag{40}$$

Now we partition the region into $\Theta(\log N)$ parts: let $A_k = D_k\setminus D_{k-1}$, where

$$D_k = \{(p,q) \mid 1 \le p, q \le 2^k\}, \quad k = 0, 1, 2, \cdots, \log N. \tag{41}$$

For all $k \ge 1$, $|A_{k+1}| = 4|A_k|$, and every term $(p,q)$ in $A_k$ corresponds to the four terms $(2p,2q)$, $(2p-1,2q)$, $(2p,2q-1)$, $(2p-1,2q-1)$ in $A_{k+1}$. Applying Lemma 1, together with the fact that the cosine function is non-negative and monotone decreasing on $[0,\pi/2]$, we can prove that

$$S_k = \sum_{(p,q)\in A_k}\Big(1 - \cos\frac{p\pi}{2N}\cos\frac{q\pi}{2N}\Big)^{-1} \le \sum_{(p,q)\in A_{k+1}}\Big(1 - \cos\frac{p\pi}{2N}\cos\frac{q\pi}{2N}\Big)^{-1} = S_{k+1} \tag{42}$$

for all $k \ge 1$; for $k = 0$, $S_0 \le S_1$ also holds by a simple calculation. Since $1 - \cos\frac{\pi}{2N} = \Theta(\frac{1}{N^2})$, we have $S_1 = \Theta(N^2)$. The terms in $S_{\log N}$ are bounded above by the constant $(1 - \cos^2\frac\pi4)^{-1} = 2$ and below by 1, and $|A_{\log N}| = \Theta(N^2)$, so $S_{\log N}$ is also $\Theta(N^2)$. By monotonicity, every $S_k$ is therefore $\Theta(N^2)$.
Thus we have

$$E[\tau] = \Theta\Big(\sum_{k=0}^{\log N} S_k\Big) = \Theta(N^2\log N). \tag{43}$$

Appendix B: Calculating the Exact Value of $E[\tau]$

The exact value of the expected meeting time can be calculated as follows. Suppose there are two walkers $a$ and $b$. Denote by $S_{(i,j)}$ the state in which $a$ is at vertex $i$ and $b$ is at vertex $j$, with index $(i-1)N + j$. If the transition matrix for a single walker is $P$, then the transition matrix for the joint states of the two walkers is $Q = P\otimes P$, except for the $((i-1)N+i)$-th rows (the absorbing states), which are all zeros except for a one in the $((i-1)N+i)$-th component. Let $\Lambda = \{(i-1)N+i \mid i = 1, 2, \cdots, N\}$, and let $S_\Lambda$ be the set of absorbing states. $S(\tau)$ denotes the state at time $\tau$. Recalling the definition of expectation, we have

$$E[\tau] = \sum_{\tau=0}^{\infty}\tau\,\Pr[S(\tau)\in S_\Lambda,\; S(\tau-1)\notin S_\Lambda], \tag{44}$$

which equals

$$\begin{aligned} E[\tau] &= \sum_{\tau=0}^{\infty}\tau\sum_{k\in S_\Lambda}\sum_{l\notin S_\Lambda}\Pr[S(\tau)=k,\, S(\tau-1)=l]\\ &= \sum_{\tau=0}^{\infty}\tau\sum_{k\in S_\Lambda}\sum_{l\notin S_\Lambda}\Pr[S(\tau)=k\mid S(\tau-1)=l]\,\Pr[S(\tau-1)=l]\\ &= \sum_{\tau=0}^{\infty}\tau\sum_{k\in S_\Lambda}\sum_{l\notin S_\Lambda}\Pr[S(\tau)=k\mid S(\tau-1)=l]\; p_0\, Q^{\tau-1} e_l\\ &= p_0\Big(\sum_{\tau=0}^{\infty}\tau\, Q^{\tau-1}\Big) b, \end{aligned} \tag{45}$$

where $b$ is a column vector with $N^2$ components, $b(l) = \sum_{k\in\Lambda} Q(l,k)$ if $l\notin\Lambda$, and $b(l) = 0$ if $l\in\Lambda$. Then, applying the matrix power-series identity $\sum_{\tau\ge1}\tau B^{\tau-1} = (I-B)^{-2}$, we finally have

$$E[\tau] = \tilde p_0\,(I-B)^{-2}\,\tilde b, \tag{46}$$

where $B$ is the submatrix of $Q$ obtained by deleting the rows and columns with indices in $\Lambda$, and $\tilde b$ (resp. $\tilde p_0$) is the subvector of $b$ (resp. $p_0$) obtained by deleting the components with indices in $\Lambda$.

Robust Influence Maximization
Zihan Tan
November 1, 2014

1 Introduction

The last decade has witnessed a tremendous increase in research on social networks, as the Internet, together with the large amount of data on it, has become more accessible. The study of influence propagation thus plays an important role in many applications, including viral marketing, epidemic spreading and network advertising.
Influence maximization serves as a central problem in influence propagation, and according to Chen et al. [1], among the mathematical models previously proposed for studying it, the independent cascade model and the linear threshold model have turned out to be the most successful.

Approximation algorithms and hardness results for influence maximization in the independent cascade model were proposed in the previous literature. Kempe et al. [2] proved that finding the optimal seed set for influence spread, given the parameters of the network, is generally NP-hard, and proposed an approximation algorithm with ratio $1-\frac1e$ using techniques from submodular optimization, while the optimality of this approximation ratio was proved by Feige [3].

In the independent cascade model, the probability on each edge is given as an exact real number in $[0,1]$. In realistic settings, however, this is not the case: we must learn the probabilities from previous records, and our knowledge of a probability is usually represented by an interval $[\hat p - \epsilon, \hat p + \epsilon]$, where $\hat p$ is a sample mean or some other estimate and $\epsilon$ is the radius of a confidence interval. Thus, if we seek a seed set that performs well on influence maximization under all possible parameters of the network, we are considering the robust version of influence maximization.

Krause et al. [4] proposed a saturation algorithm for general robust submodular optimization with a problem-dependent approximation ratio. However, to carry it over to our problem, certain difficulties must be overcome. Specifically, their robust optimization considers only finitely many submodular functions taking integer values, while robust influence maximization concerns a minimum over infinitely many submodular functions taking real values.

1.1 Notations and Definitions

We specify some notation for the robust independent cascade model.
Let $\mathcal N = (V,E)$ be a network, with $V$ denoting the set of vertices and $E$ the set of edges. For every edge $e$ we have a probability interval $[l_e, r_e]$ ($0 \le l_e \le r_e \le 1$) indicating the range of the latent probability $p_e$ on this edge, which is unknown to us; the latent probability is the probability that $e$ is a live edge in an outcome graph generated by the independent cascade model. As a whole, let $\Theta = \prod_{e\in E}[l_e, r_e]$ be the parameter space of the network $\mathcal N$, and let $\theta = (p_e)_{e\in E}$ be an instance of the parameters, where $p_e\in[l_e,r_e]$ for every edge $e$. In particular, let $\theta^- = (l_e)_{e\in E}$ and $\theta^+ = (r_e)_{e\in E}$ be the minimum and maximum parameter instances, respectively.

For a parameter $\theta$ on the network, define the influence spread function $\sigma_\theta : 2^V \to \mathbb R^+$ as follows: for a subset $S$ of nodes, called the source nodes, $\sigma_\theta(S)$ is the expected number of nodes activated, where the randomness is over the appearance of every edge according to $\theta$.

1.2 Three Versions of Influence Maximization

In this section we briefly summarize three versions of the influence maximization problem: Influence Maximization, Robust Influence Maximization, and Stochastic Influence Maximization. The second is the robust version of the original problem, and admits two interesting perspectives.

Problem 1 (Influence Maximization). Given a graph $G = (V,E)$, probabilities $\theta = \{p_e\}_{e\in E}$ on the edges, and a fixed budget $k$, find a set of $k$ vertices $S\subset V$, $|S| = k$, maximizing the influence spread $\sigma_\theta(S)$.

It was proved that the general Influence Maximization problem is NP-hard [2], and since the objective function $\sigma_\theta(S)$ is submodular, the standard greedy algorithm gives a $1-\frac1e$ approximation. On the other hand, Feige [3] proved that this constant cannot be improved.
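Under the live-edge interpretation above, $\sigma_\theta(S)$ can be estimated by Monte Carlo sampling over outcome graphs. The following sketch (our own illustration; the tiny graph and probabilities are made up) estimates $\sigma_\theta(S)$ by sampling and, since the instance is small, checks it against exact enumeration over all $2^{|E|}$ edge outcomes:

```python
import random

def reachable(n, live_edges, seeds):
    """Number of nodes reachable from `seeds` via undirected live edges."""
    adj = [[] for _ in range(n)]
    for u, v in live_edges:
        adj[u].append(v); adj[v].append(u)
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v); stack.append(v)
    return len(seen)

def sigma_mc(n, edges, probs, seeds, samples=20000, seed=0):
    """Monte Carlo estimate of the influence spread sigma_theta(S)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        live = [e for e, p in zip(edges, probs) if rng.random() < p]
        total += reachable(n, live, seeds)
    return total / samples

def sigma_exact(n, edges, probs, seeds):
    """Exact sigma_theta(S): enumerate all 2^|E| live-edge outcomes."""
    m = len(edges)
    total = 0.0
    for mask in range(1 << m):
        pr = 1.0
        live = []
        for i in range(m):
            if mask >> i & 1:
                pr *= probs[i]; live.append(edges[i])
            else:
                pr *= 1 - probs[i]
        total += pr * reachable(n, live, seeds)
    return total

# A toy 4-node path with hypothetical edge probabilities.
edges = [(0, 1), (1, 2), (2, 3)]
probs = [0.9, 0.5, 0.2]
exact = sigma_exact(4, edges, probs, [0])   # 1 + .9 + .45 + .09 = 2.44
approx = sigma_mc(4, edges, probs, [0])
```

For larger graphs only the sampling estimator is feasible; standard concentration bounds give the "guaranteed accuracy" mentioned for Problem 3 below.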
Problem 2 (Robust Influence Maximization). Given a graph $G = (V,E)$, a fixed budget $k$, and the parameter space $\Theta = \prod_{e\in E}[l_e,r_e]$, the product of intervals indicating the ranges of the true probability on every edge, find a set of $k$ vertices $S\subset V$, $|S| = k$, such that for all possible $\theta\in\Theta$ the influence spread of $S$ is comparable to that of the optimal solution; i.e., the algorithm should output $S$ maximizing

$$g(S) = \min_{\theta\in\Theta}\frac{\sigma_\theta(S)}{\sigma_\theta(S_\theta^\star)},$$

where $S_\theta^\star$ is the optimal solution when the probabilities on the edges are given by $\theta$.

We state two perspectives on the Robust Influence Maximization problem, which are of independent interest.

Perspective 1: Performance Directed. The performance directed perspective concerns how large the optimal $g(S)$ can be, i.e., how good the performance of our output can be. To some extent it is the information-theoretic side of the problem: it explores the quantity, which we call the "optimal performance",

$$\max_{|S|=k} g(S) = \max_{|S|=k}\min_{\theta\in\Theta}\frac{\sigma_\theta(S)}{\sigma_\theta(S_\theta^\star)}.$$

This is worst-case analysis: it optimizes the ratio between the performance of the output and that of the optimal solution in the worst case. If this value is good, we can claim that $S$ is a "universally" good approximate solution to the Influence Maximization problem. Indeed, all we know about the edge probabilities is $\Theta$, and the true $\theta$ may take an arbitrary value in $\Theta$; if $g(S)$ is good (e.g., a constant), then under every possible instance of $\theta$, $S$ gives a good approximation to the Influence Maximization problem. However, if the optimal $g(S)$ is bad (e.g., polynomially small in $n$), then even if our algorithm finds the best $S$, in the worst case its performance $\sigma_\theta(S)$ is poor compared with the optimal $\sigma_\theta(S_\theta^\star)$.

Perspective 2: Non-Performance Directed. The non-performance directed perspective concerns how to find the optimal $S$, even if the performance of the optimal $S$ is bad.
The argument is that how large the optimal $g(S)$ is depends on the graph structure and the input parameter space, something we cannot change by designing an algorithm; what we can do is design an algorithm that finds the optimal $S$, or an approximation to it. Observe that the non-performance directed version of Robust Influence Maximization is in general NP-hard, since when all $l_e = r_e$ it degenerates to the Influence Maximization problem, which is NP-hard.

Problem 3 (Stochastic Influence Maximization). Given a graph $G = (V,E)$, a fixed budget $k$, and distributions $\Phi = \{\phi_e\}_{e\in E}$ of the probability on every edge, find a set of $k$ vertices $S\subset V$, $|S| = k$, maximizing $h(S) = \mathbb E_{\theta\sim\Phi}[\sigma_\theta(S)]$.

Observe that the objective of Problem 3 is an integral combination of submodular functions and hence is still submodular. Thus it can be approximated via the standard greedy algorithm. Note that the algorithm needs fast evaluation of $h(S)$, which can be done by the Monte Carlo method with guaranteed accuracy.

2 Results on the Performance Perspective

In this section we present our results on the performance directed perspective of the Robust Influence Maximization problem. As the analysis below shows, constraints on $\Theta$ are needed.

2.1 Sensitivity of the Influence Spread Function

In this subsection we analyze the sensitivity of the influence spread function $\sigma_\theta(S)$: we let $\delta$ be small and examine how $\sigma_\theta(S)$ changes under a $\delta$-perturbation of its parameter $\theta$. First we recall the following lemma, proved in [1]:

Lemma 1 (Sensitivity of the Influence Spread Function). Given a graph $G$ and parameter space $\Theta$, for all $S\subseteq V$ and all $\theta_1,\theta_2\in\Theta$ with $\|\theta_1-\theta_2\|_\infty \le \delta$,

$$|\sigma_{\theta_1}(S) - \sigma_{\theta_2}(S)| \le f(\delta),$$

where $f(\delta) = |V|\cdot|E|\cdot\delta$, and $\|\theta_1-\theta_2\|_\infty \le \delta$ means $|p_e^{\theta_1} - p_e^{\theta_2}| \le \delta$ for every $e$.
We then propose the following example, showing that this bound cannot be improved when only the numbers of edges and nodes are taken into account; by "not improvable" we mean that the order in $n$ and $m$ is tight, while the constant could be better.

Consider the graph $G = (V,E)$ with $V = A\cup B = \{a_1,\ldots,a_{n/2}, b_1,\ldots,b_{n/2}\}$ and an edge between every pair of vertices. $\Theta$ is defined as follows: for all $i,j\in[n/2]$, $[l_{(a_i,b_j)}, r_{(a_i,b_j)}] = [0,\delta]$, and for all $i\ne j\in[n/2]$, $[l_{(a_i,a_j)}, r_{(a_i,a_j)}] = [l_{(b_i,b_j)}, r_{(b_i,b_j)}] = [1-\delta, 1]$. We are required to choose $k = 1$ vertex as the only source node. Assume without loss of generality that an algorithm chooses $S = \{a_1\}$; we calculate the sensitivity of the influence spread. On the one hand, under $\theta^-$ the source reaches, with high probability, all vertices in $A$ and no vertex in $B$, so

$$\sigma_{\theta^-}(S) \approx \frac n2.$$

On the other hand, under $\theta^+$, if at least one of the $\frac{n^2}{4}$ edges between the two $\frac n2$-node cliques is live then all vertices can be reached, and otherwise only $\frac n2$ nodes are reached. Thus

$$\sigma_{\theta^+}(S) = \frac n2(1-\delta)^{\frac{n^2}{4}} + n\Big[1 - (1-\delta)^{\frac{n^2}{4}}\Big] \approx \frac n2 + \frac n2\cdot\frac{n^2}{4}\cdot\delta,$$

so that

$$\sigma_{\theta^+}(S) - \sigma_{\theta^-}(S) \approx \frac{n^3}{8}\,\delta.$$

This argument implies that a term of order $n^3\delta$ cannot be avoided under a $\delta$-perturbation of the parameters when there is no additional constraint on them. With the following small modification we can show that the sensitivity bound $mn\cdot\delta$ is also not improvable. Let $G = (V,E)$ with $V = A\cup B = \{a_1,\ldots,a_{n/2}, b_1,\ldots,b_{n/2}\}$ and $E = \{(a_i,a_{i+1}) \mid i\in[\frac n2-1]\}\cup\{(b_i,b_{i+1}) \mid i\in[\frac n2-1]\}\cup E_{\mathrm{int}}$, where $E_{\mathrm{int}}$ is an arbitrary set of $m-n+2$ edges with one endpoint in $A$ and the other in $B$. $\Theta$ is defined as follows: for all $i,j\in[n/2]$, $[l_{(a_i,b_j)}, r_{(a_i,b_j)}] = [0,\delta]$, and for all $i\in[\frac n2-1]$, $[l_{(a_i,a_{i+1})}, r_{(a_i,a_{i+1})}] = [l_{(b_i,b_{i+1})}, r_{(b_i,b_{i+1})}] = [1-\delta, 1]$.
We are required to choose $k = 1$ vertex as the only source node. It is not hard to verify that as long as $m > 2n$, this construction shows the tightness of Lemma 1.

2.2 Main Results with Constraints on δ

In this subsection we use the propositions above to analyze the performance of Robust Influence Maximization under different constraints on $\Theta$.

Proposition 1. If there is no constraint on the input parameter space $\Theta$, then $\max_{|S|=k}\min_{\theta\in\Theta}\frac{\sigma_\theta(S)}{\sigma_\theta(S_\theta^\star)}$ is at most $O(\frac kn)$.

Proof. Let $G$ be an $n$-clique and for every $e\in E$ let $l_e = 0$ and $r_e = 1$. Suppose $S = \{v_1,\cdots,v_k\}$ is the output of the algorithm. Let $p_e = 0$ for all $e\in E_S = \{e = (u,v) \mid u\in S \text{ or } v\in S\}$ and $p_e = 1$ for all $e\notin E_S$. Then $\sigma_\theta(S) = k$ while $\sigma_\theta(S_\theta^\star) \ge n-k$, so the ratio is $O(\frac kn)$. $\Box$

Since we may regard $\Theta$ as our knowledge of the transmission probability on each edge, we can assume that we have some meaningful knowledge; in other words, we should put constraints on $\Theta$ so that the worst-case performance is not as poor as $O(\frac kn)$. The most instructive constraint we consider is the "uniform length constraint with constant $\delta$": for every $e\in E$, $r_e - l_e \le \delta$.

Recall we have the following two results on the performance under non-trivial uniform-length constraints on $\Theta$:

Proposition 2. When $\delta = O(\frac1n)$, every deterministic algorithm has optimal performance $r = O(\frac{\log n}{n})$, where $n$ is the number of nodes in the network.

Proof. Consider the graph $G = (V,E)$ with $V = A\cup B$, $|A| = |B| = \frac n2$, and $E = \{(u,v) \mid u,v\in A \text{ or } u,v\in B\}$; let $E(A)$ be the set of edges with both endpoints in $A$, and define $E(B)$ similarly. The problem is to find a single vertex ($k = 1$) maximizing the influence spread. Let $p = \frac2n$, and let the input instance be $l_e = p - \epsilon$ and $r_e = p + \epsilon$ for every edge $e$, so that $[l_e, r_e]$ covers the critical interval of the Erdős–Rényi graph on $\frac n2$ nodes.
Now, since every node looks the same to any algorithm, suppose the algorithm chooses a vertex $u\in A$, and consider the worst case $\theta$ in which $p_e = l_e$ for every $e\in E(A)$ and $p_e = r_e$ for every $e\in E(B)$. One can see that the optimal solution is an arbitrary vertex $v\in B$. Since $\sigma(\{u\}) = O(\log n)$ and $\sigma(\{v\}) = \Theta(n)$, the ratio is $r = O(\frac{\log n}{n})$. $\Box$

If we allow the algorithm to be randomized, i.e., the output seed set $\tilde S$ is a random variable, the definition of optimal performance becomes

$$r = \max_{\tilde S : |\tilde S|\le k}\;\min_{\theta\in\Theta}\;\mathbb E_{\tilde S}\Big[\frac{\sigma_\theta(\tilde S)}{\sigma_\theta(S_\theta^\ast)}\Big]. \tag{1}$$

Proposition 3. When $\delta = O(\frac1n)$, every randomized algorithm has optimal performance $r = O(\frac{\log n}{\sqrt n})$, where $n$ is the number of nodes in the network.

Proof. Consider the graph $G = (V,E)$ with $V = \bigcup_{1\le i\le\sqrt n} A_i$, $|A_i| = \sqrt n$, and $E = \{(u,v) \mid u,v\in A_i \text{ for some } i\}$; let $E(A_i)$ be the set of edges with both endpoints in $A_i$. The problem is to find a single vertex ($k = 1$) maximizing the influence spread. Let $p = \frac{1}{\sqrt n}$, and let the input instance be $l_e = p - \epsilon$ and $r_e = p + \epsilon$ for every edge $e$, so that $[l_e, r_e]$ covers the critical interval of the Erdős–Rényi graph on $\sqrt n$ nodes.

Since every node appears the same to any algorithm, suppose the algorithm outputs a distribution on $[\sqrt n]$, i.e., $p_1 + p_2 + \cdots + p_{\sqrt n} = 1$, and assume without loss of generality that $p_1$ is the smallest. Consider the worst case $\theta$ in which $p_e = r_e$ for every $e\in E(A_1)$ and $p_e = l_e$ for every $e\in E(A_i)$, $i\ge2$. One can see that the optimal solution is an arbitrary vertex $v\in A_1$. Since

$$\sigma(\mathrm{Alg}_\Theta) \le \frac{1}{\sqrt n}\,O(\sqrt n) + \Big(1-\frac{1}{\sqrt n}\Big)O(\log n) = O(\log n)$$

and $\sigma(\mathrm{OPT}) = \Theta(\sqrt n)$, the performance is

$$r = O\Big(\frac{\log n}{\sqrt n}\Big). \qquad \Box$$

When adding tighter constraints on $\Theta$, we can expect better performance from the information-theoretic solution. Specifically, using Lemma 1, we can prove the following proposition.

Proposition 4.
For any graph $G$, if $r_e - l_e \le \delta$ for all $e\in E$, then for all $\theta_0\in\Theta$, letting

$$\bar\theta = \arg\min_\theta\frac{\sigma_\theta(S_{\theta_0}^\ast)}{\sigma_\theta(S_\theta^\ast)},$$

we have

$$\frac{\sigma_{\bar\theta}(S_{\theta_0}^\ast)}{\sigma_{\bar\theta}(S_{\bar\theta}^\ast)} \ge 1 - \frac{2nm\delta}{\sigma_{\bar\theta}(S_{\bar\theta}^\ast)}.$$

Proof. By Lemma 1, $\sigma_{\bar\theta}(S_{\theta_0}^\ast) \ge \sigma_{\theta_0}(S_{\theta_0}^\ast) - nm\delta \ge \sigma_{\theta_0}(S_{\bar\theta}^\ast) - nm\delta \ge \sigma_{\bar\theta}(S_{\bar\theta}^\ast) - 2nm\delta$; dividing by $\sigma_{\bar\theta}(S_{\bar\theta}^\ast)$ gives the claim. $\Box$

If we impose a tighter constraint on $\Theta$, we know each $p_e$ more accurately; since the standard greedy algorithm obtains a near-constant approximation to Influence Maximization, it can then be applied here to get a similar constant performance. Thus we obtain the following informal result table (in the middle row, the two entries refer to deterministic and randomized algorithms respectively):

δ | Information Theoretic | Polynomial Algorithm
1 | $O(\frac1n)$ | $O(\frac1n)$
$\Delta_1 = O(\frac1n)$ | $O(\frac{\log n}{n})$; $O(\frac{\log n}{\sqrt n})$ | $O(\frac{\log n}{n})$; $O(\frac{\log n}{\sqrt n})$
$\Delta_2 = O(\frac{1}{mn})$ | $1-\epsilon$ | $1-\frac1e-\epsilon$

One thing we need to mention is that each value in the table is an upper or a lower bound on the true value (that is also why the table is informal). To be specific, we have propositions of the following type: "when the input parameter space $\Theta$ satisfies the uniform length constraint with constant $\Delta_1$, the optimal performance of any deterministic algorithm is no larger than $O(\frac{\log n}{n})$", or "when $\Theta$ satisfies the uniform length constraint with constant $\Delta_2$, there exists an algorithm achieving performance at least $1-\frac1e-\epsilon$".

2.3 Main Results with Constraints on Eigenvalues

2.3.1 The Largest Eigenvalue of an Undirected Graph

Results from research on the spread of the SIS model inspire us to use the spectrum of a graph to study information spread in our model; we try to use the largest eigenvalue to detect the "critical threshold" of a graph. The following discussion presents our attempts using graph spectra.

We consider only undirected weighted graphs, where the weight of edge $e$ is given by the attached probability $\theta_e$. Let $G$ be the undirected graph and $\theta$ the influence parameter, and let $P_\theta$ be the adjacency matrix induced by $(G,\theta)$; obviously $P_\theta$ is symmetric. Let $\lambda_\theta$ be its largest eigenvalue; then $\lambda_\theta \ge 0$ by elementary linear algebra.
Then we define the average degree $d_\theta(v)$ of a vertex $v$ as

$$d_\theta(v) := \sum_{u\in N(v)}\theta_{(v,u)}.$$

With these notations, we have the following result, which will be helpful in our discussion.

Lemma 2.

$$\frac{\sum_{v\in V} d_\theta(v)}{n} \le \lambda_\theta \le \max_{v\in V} d_\theta(v).$$

Proof. First notice that $P_\theta$ is symmetric, so all its eigenvalues $\lambda_i$ are real, and

$$\sum_{i=1}^n\lambda_i = \mathrm{tr}[P_\theta] = 0 \;\Rightarrow\; \lambda_\theta \ge 0.$$

For the lower bound, consider $v = (1,1,\cdots,1)^T$. Since $P_\theta$ is non-negative, its spectral radius equals $\lambda_\theta$, so

$$\lambda_\theta \ge \frac{\|P_\theta v\|_2}{\|v\|_2} = \sqrt{\frac{\sum_{v\in V} d_\theta(v)^2}{n}} \ge \frac{\sum_{v\in V} d_\theta(v)}{n}.$$

For the upper bound, let $u = (u_1, u_2, \cdots, u_n)^T$ be an eigenvector for $\lambda_\theta$, and choose $i$ with $|u_i| \ge |u_j|$ for all $j$. Then

$$\lambda_\theta = \frac{\sum_{j=1}^n\theta_{ij}u_j}{u_i} \le \frac{\sum_{j=1}^n\theta_{ij}|u_j|}{|u_i|} \le d_\theta(i) \le \max_{v\in V} d_\theta(v). \qquad\Box$$

Therefore we have an easy way to estimate the largest eigenvalue. Moreover, this lemma suggests a way to detect the "critical threshold" of $(G,\Theta)$. $\theta^-$ and $\theta^+$ are defined as before; for brevity, let $\lambda^- = \lambda_{\theta^-}$ and $\lambda^+ = \lambda_{\theta^+}$.

Theorem 1. Consider only seed sets of size 1. Let $\|\theta^+ - \theta^-\|_\infty \le \delta$ and let $c\in(0,1)$ be a constant. Then:
1. when $\lambda^- \ge n^c$ and $\delta$ is unconstrained, we can only get an $O(\frac{1}{n^{1-c}})$-approximation;
2. when $\lambda^- \ge n^c$ and $\delta = \omega(\frac{1}{n^{1+c}})$, we can only get an $O(\frac{1}{n^{1-c}})$-approximation;
3. when $\lambda^- \ge n^c$, if we want a $(1+\varepsilon)$-approximation, then $\delta = o(\frac{1}{n^2})$ is necessary;
4. when $\lambda^+ \le \alpha$, we can get a $\gamma$-approximation (to be formulated and proved).

The fourth claim is only a template for what we want to find and prove in the next step. Here we prove the first three claims.

Proof. For a fixed graph $G$ and influence uncertainty $\Theta$, define

$$\rho(S,k) = \min_{\theta\in\Theta}\frac{\sigma_\theta(S)}{\max_{|S^\ast|\le k}\sigma_\theta(S^\ast)}.$$

For claims 1 and 2, we first prove that we can get an $\Omega(\frac{1}{n^{1-c}})$-approximation. We simply choose the vertex $v$ with the maximal average degree under the influence $\theta^-$ as our seed set.
Then for any $\theta\in\Theta$,

$$\sigma_\theta(\{v\}) \ge d_\theta(v) \ge d_{\theta^-}(v) \ge \lambda^- \ge n^c,$$

so

$$\max_{|S|=1}\rho(S,1) \ge \rho(\{v\},1) \ge \frac{n^c}{n} = \frac{1}{n^{1-c}},$$

and we can get an $\Omega(\frac{1}{n^{1-c}})$-approximation. Now we show there exists a pair $(G,\Theta)$ for which this approximation is optimal. Consider a complete graph $G$, divided into $n^{1-c-\varepsilon}$ cliques of size $n^{c+\varepsilon}$. Consider an edge $e = (u,v)$: if $u,v$ belong to the same clique, $\theta_e\in[1-\delta,1]$; if they belong to different cliques, $\theta_e\in[0,\delta]$. Note that $\lambda^- \ge (1-\delta)n^{c+\varepsilon} \ge n^c$ for large enough $n$. Assume we choose $v$ as our seed, and consider the following $\theta$: let $K$ be the clique containing $v$; for $e = (u,w)$, if $u\in V(K)$ and $w\notin V(K)$, then $\theta_e = \theta_e^- = 0$; otherwise $\theta_e = \theta_e^+$. Then

$$\sigma_\theta(\{v\}) = n^{c+\varepsilon}.$$

However, suppose we choose a node $u\notin V(K)$ and consider only $G\setminus K$. Treating every clique as a super-node, the probability that two cliques $K_i, K_j$ are connected is

$$p_{(K_i,K_j)} = 1 - (1-\delta)^{n^{2c+2\varepsilon}} \le n^{2c+2\varepsilon}\delta,$$

and for $\varepsilon$ small enough and $n$ large enough we may take $p := p_{(K_i,K_j)} \ge n^{2c+\varepsilon}\delta$. We have thus effectively formed a random graph $G(n^{1-c-\varepsilon}, p)$. If $\delta = \omega(\frac{1}{n^{1+c}})$, then $p = \omega(\frac{1}{n^{1-c-\varepsilon}})$, and we get

$$\sigma_\theta(\{u\}) = \Theta(n).$$

Therefore

$$\rho(\{v\},1) \le O\Big(\frac{1}{n^{1-c-\varepsilon}}\Big).$$

Since $\varepsilon$ can be an arbitrarily small positive real number, this shows that the $O(\frac{1}{n^{1-c}})$-approximation is optimal.

For claim 3, assume $k = 1$ and the algorithm selects a single vertex $v$ in $K_1$; we use the previous construction of $\theta$, letting all edges incident to $K_1$ take probability $l_e$ and all other edges $r_e$. One observes that

$$\frac{\sigma_\theta(\{v\})}{\sigma_\theta(S^\ast)} \approx \frac{n^c}{n^c + 2n^c\cdot n^2\delta},$$

because if there exists a live edge linking two different cliques, the optimal solution reaches at least $2n^c$ nodes, and the probability that such an edge exists is about $n^2\delta$ when $\delta$ is small enough. Thus, to get a $(1+\varepsilon)$-approximation, it is necessary that $\delta = o(\frac{1}{n^2})$. $\Box$
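Lemma 2's bounds are easy to verify numerically. The sketch below (our own, with a made-up random weighted graph standing in for $P_\theta$) estimates $\lambda_\theta$ by power iteration, which converges to the spectral radius of a non-negative symmetric matrix by Perron–Frobenius, and checks it against the average- and maximum-degree bounds:

```python
import random

def power_iteration(M, iters=2000):
    """Largest eigenvalue of a symmetric non-negative matrix via power
    iteration (the Perron eigenvalue equals the spectral radius)."""
    n = len(M)
    v = [1.0] * n          # start inside the non-negative cone
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)
        if lam == 0:
            return 0.0
        v = [x / lam for x in w]
    return lam

# Random weighted graph: theta_e drawn uniformly from [0, 1] (hypothetical).
rng = random.Random(42)
n = 8
P = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        w = rng.random()
        P[i][j] = P[j][i] = w

deg = [sum(row) for row in P]      # d_theta(v) for each vertex
lam = power_iteration(P)           # estimate of lambda_theta
```

As Lemma 2 states, `sum(deg)/n <= lam <= max(deg)` holds for any such instance, which is what makes $\lambda_\theta$ cheap to bracket without an eigensolver.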
3 Critical Detector

3.1 What is a Critical Threshold?

We try to formalize the "critical threshold" of a graph. When we fix the size of the influence space $\Theta$, the critical threshold is determined by the position of the intervals, in other words by the value of $\theta^-$; we therefore use $\theta^-$ to detect the critical threshold.

First, let us look at the influence spread function $\sigma(\cdot)$ (omitting $\theta$). We can treat the spread process as follows: first we pick up edges at random according to the attached probabilities $\theta$, then we count the vertices connected to the seed set. In this way we treat $G$ as a random variable and use a configuration $C$ to represent a possible outcome of $G$. Then

$$\sigma(S) = \sum_C \Pr[G = C]\,|C(S)|,$$

where $C(S)$ is the set of vertices connected to the seed set in the graph $C$. Picking an edge $e$, we can rewrite the formula as

$$\sigma(S) = \theta_e\sum_{C : e\in E(C)}\Pr[G = C \mid e\in E(C)]\,|C(S)| + (1-\theta_e)\sum_{C : e\notin E(C)}\Pr[G = C \mid e\notin E(C)]\,|C(S)|.$$

Notice that if we fix $\theta_{e'}$ for all $e'\ne e$ and vary $\theta_e$, then $\sigma(S)$ is linear in $\theta_e$, so its extreme values over $\theta_e$ are attained at $\theta_e\in\{0,1\}$. In fact, $\sigma(S)$ attains its maximal value at $\theta = \theta^+$.

Now consider $\sigma_{\theta^+}(S) - \sigma_{\theta^-}(S)$ with $\delta$ fixed but the position of $\theta^-$ free. Expanding both $\sigma_{\theta^+}(S)$ and $\sigma_{\theta^-}(S)$ on a single edge $e$ as above, and using $\theta_e^+ = \theta_e^- + \delta$, the difference, viewed as a function of $\theta_e^-$ with all other coordinates fixed, is affine in $\theta_e^-$. In the same way as before, it therefore attains its maximal value at an endpoint, $\theta_e^- = 0$ or $\theta_e^- = 1-\delta$; i.e., the maximum is attained at the ends of the interval. But here, unlike for $\sigma_\theta(S)$ itself, we cannot easily tell which endpoint attains it.

We can conclude that if this maximal value is very small, then there is no critical threshold in our influence space. If we call a condition under which a threshold may occur a critical section, then a critical section should be some kind of 0-1 assignment.
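The per-edge linearity of $\sigma(S)$ used above can be checked directly by exact enumeration on a tiny graph (our own toy example, with hypothetical probabilities): the value at the midpoint of an interval for $\theta_e$ must be the average of the values at its endpoints.

```python
def sigma_exact(n, edges, probs, seeds):
    """Exact influence spread: enumerate all live-edge configurations C
    and average |C(S)| weighted by Pr[G = C]."""
    m = len(edges)
    total = 0.0
    for mask in range(1 << m):
        pr = 1.0
        adj = [[] for _ in range(n)]
        for i, (u, v) in enumerate(edges):
            if mask >> i & 1:
                pr *= probs[i]
                adj[u].append(v); adj[v].append(u)
            else:
                pr *= 1 - probs[i]
        seen, stack = set(seeds), list(seeds)
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y); stack.append(y)
        total += pr * len(seen)
    return total

# Vary theta on edge 0 only; the other probabilities are fixed (made up).
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
fixed = [None, 0.4, 0.7, 0.3]

def sigma_at(t):
    return sigma_exact(4, edges, [t] + fixed[1:], seeds=[0])

lo, hi = sigma_at(0.0), sigma_at(1.0)
mid = sigma_at(0.5)
```

Linearity means `mid == (lo + hi) / 2` up to floating point, and monotonicity in $\theta_e$ gives `hi >= lo`, consistent with $\sigma$ being maximal at $\theta^+$.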
3.2 Critical Detector

Following the discussion of critical thresholds, we want an easy way to detect whether a threshold can occur with non-zero probability for a given input $(G,\Theta)$. We therefore want to design an algorithm to detect this condition, which we call a critical detector. We have the following conjecture, which would help us develop an easy-to-use detector.

Conjecture 1. Given input $(G,\Theta)$, define

$$\rho(S,k) = \min_{\theta\in\Theta}\frac{\sigma_\theta(S)}{\max_{|S^\ast|\le k}\sigma_\theta(S^\ast)}.$$

Then for every integer $k\in[1,n]$, we have

$$\max_{v\in V}\rho(\{v\},1) \le \max_{S:|S|\le k}\rho(S,k).$$

If this conjecture is right, we need only consider $k = 1$ to detect the critical threshold: if $\max_{v\in V}\rho(\{v\},1)$ is large enough, then $\max_{S:|S|\le k}\rho(S,k)$ is large enough as well. Even for $k = 1$, however, $\max_{v\in V}\rho(\{v\},1)$ is still difficult to compute exactly, so we may use the following property to estimate it.

Property 1. Given input $(G,\Theta)$,

$$\max_{v\in V}\rho(\{v\},1) \ge \frac{\max_{u\in V}\sigma_{\theta^-}(\{u\})}{\max_{v\in V}\sigma_{\theta^+}(\{v\})}.$$

Proof. Let $u^\ast$ be the vertex maximizing $\sigma_{\theta^-}(\{u\})$ and let $\theta^\ast$ be the minimizing influence in $\rho(\{u^\ast\},1)$. Then

$$\max_{v\in V}\rho(\{v\},1) \ge \rho(\{u^\ast\},1) = \frac{\sigma_{\theta^\ast}(\{u^\ast\})}{\max_{|S^\ast|\le1}\sigma_{\theta^\ast}(S^\ast)} \ge \frac{\sigma_{\theta^-}(\{u^\ast\})}{\max_{v\in V}\sigma_{\theta^+}(\{v\})} = \frac{\max_{u\in V}\sigma_{\theta^-}(\{u\})}{\max_{v\in V}\sigma_{\theta^+}(\{v\})}. \qquad\Box$$

Note that the value of $\frac{\max_{u\in V}\sigma_{\theta^-}(\{u\})}{\max_{v\in V}\sigma_{\theta^+}(\{v\})}$ can be calculated, so we can put it into our detector algorithm.

4 Approximation Algorithm for the Non-Performance Perspective

In this section we apply the Saturation Algorithm provided in the paper "Robust Submodular Observation Selection" [4] to our problem. There are obstacles to applying it directly: finitely many submodular functions are considered in [4], while infinitely many functions are considered here in Robust Influence Maximization. We made an effort to overcome this, and our main result is shown in the following theorem.

Theorem 2.
For the Robust Influence Maximization problem, let
    S* = argmax_{|S|≤k} min_{θ∈Θ} σ_θ(S) / σ_θ(S*_θ).
For any small constant ε > 0, there exists an algorithm that outputs a seed set S^g in time O(mn² log m) such that
    (1) |S^g| ≤ α · |S*|;
    (2) min_{θ∈Θ} σ_θ(S^g) / σ_θ(S*_θ) ≥ β · max_{A: |A|≤k} min_{θ∈Θ} σ_θ(A) / σ_θ(S*_θ),
where α = log( mn / min_{j,t} Π_{e=(j,t)} (1 − r_e) ) + 1, β = ((1−ε)/(1+ε))(1 − 1/e), and m = n³k/ε. In particular, if r_e ≤ 1 − n^{−p} for all e ∈ E and some constant p, then α = O(log n).

The algorithm follows the Submodular Saturation Algorithm and contains the following steps:

1. Let m = n³k/ε. Define parameters θ_1, ..., θ_m with
       θ_1 = θ⁻,   θ_m = θ⁺,   θ_{m,e} − θ_{m−1,e} = θ_{m−1,e} − θ_{m−2,e} = ... = θ_{2,e} − θ_{1,e} for all e ∈ E.   (3)
2. For each parameter θ_i, use the basic greedy algorithm to obtain a seed set S^g_{θ_i}.
3. Initialize the search interval: c_min = 0, c_max = 1.
4. In each iteration, pick the midpoint c = (c_min + c_max)/2.
5. Obtain an approximate solution Ŝ using algorithm GPC with parameter c.
6. If |Ŝ| > αk, let c_max = c.
7. If |Ŝ| ≤ αk, let c_min = c and S^g = Ŝ.
8. Repeat until |c_max − c_min| ≤ 1/m.

4.1 The New GPC Algorithm

For a fixed c, let
    F_{θ_i,c}(S) = min{ σ_{θ_i}(S) / σ_{θ_i}(S^g_{θ_i}), c },    F̄_c(S) = (1/m) Σ_{i=1}^m F_{θ_i,c}(S).   (4)
The GPC algorithm uses a greedy strategy to find an approximate solution of the following Submodular Covering Problem:
    min_A |A|   subject to   F̄_c(A) = F̄_c(V).

Algorithm 1  GPC algorithm for a fixed c
    Input: graph G, parameter value set Θ, {θ_i}_{i=1}^m, c.
    S ← ∅
    while c − F̄_c(S) > min_j Π_{e=(j,*)} (1 − r_e) / (mn) do
        v_0 ← argmax_{v∈V\S} F̄_c(S ∪ {v})
        S ← S ∪ {v_0}
    end while
    output S

4.2 Performance of S^g

Assume
    S_1 = argmin_S |S|  s.t.  σ_{θ_i}(S) / σ_{θ_i}(S^g_{θ_i}) ≥ c for all i ∈ [m],
    S_2 = argmin_S |S|  s.t.  min_{θ∈Θ} σ_θ(S) / σ_θ(S*_θ) ≥ c.   (5)
Lemmas 3 and 5 guarantee the good performance of S^g claimed in Theorem 2. The analysis is the same as the one in "Robust Submodular Observation Selection", so we omit it and focus on the two lemmas. Lemma 3 shows that the size of S^g is not large.

Lemma 3.
    |S^g| ≤ α · |S_2|,   (6)
where
    α = log( mn / min_j Π_{e=(j,*)} (1 − r_e) ) + const.   (7)

Proof. According to the result in Wolsey's paper,
    |S^g| ≤ β · |S_1|,   (8)
where
    β = 1 + log( max_{j∈V} F̄_c({j}) / (the minimum nonzero increase of F̄_c during the GPC algorithm) ).   (9)
Notice that any set S with min_{θ∈Θ} σ_θ(S)/σ_θ(S*_θ) ≥ c also satisfies, for every θ_i,
    σ_{θ_i}(S) / σ_{θ_i}(S^g_{θ_i}) ≥ σ_{θ_i}(S) / σ_{θ_i}(S*_{θ_i}) ≥ c.   (10)
Thus |S_1| ≤ |S_2|. Together with the following lemma:

Lemma 4. With the definitions above, β ≤ α,

we have
    |S^g| ≤ β|S_1| ≤ β|S_2| ≤ α|S_2|,   (11)
which completes the proof. The proof of Lemma 4 is given in the next subsection.

Lemma 5 shows that the objective value of S^g is at least a constant fraction of the best objective value.

Lemma 5.
    min_{θ∈Θ} σ_θ(S^g) / σ_θ(S*_θ) ≥ β · max_{A: |A|≤k} min_{θ∈Θ} σ_θ(A) / σ_θ(S*_θ),   (12)
where
    β = ((1−ε)/(1+ε))(1 − 1/e).   (13)

Proof. Consider the last time S^g is assigned in Step 7 (its value does not change afterwards), and let c_0 be the feasible value of c at that time. By the definition of the sequence {θ_j}_{j=1}^m in Step 1, for every θ ∈ Θ there exists i ∈ [m] such that
    ||θ_i − θ||_∞ ≤ 1/(2m).   (14)
With Lemma 1 this implies
    |σ_θ(S) − σ_{θ_i}(S)| ≤ |E||V| / (2m) ≤ ε σ_θ(S) / k.   (15)
Thus for every θ ∈ Θ we have
    (1+ε) σ_θ(S^g) ≥ σ_{θ_i}(S^g) ≥ c · σ_{θ_i}(S^g_{θ_i}) ≥ (1 − 1/e) c · σ_{θ_i}(S*_{θ_i}) ≥ (1 − 1/e) c · σ_{θ_i}(S*_θ) ≥ (1 − 1/e)(1 − ε) c · σ_θ(S*_θ).   (16)
Then
    min_{θ∈Θ} σ_θ(S^g) / σ_θ(S*_θ) ≥ ((1−ε)/(1+ε))(1 − 1/e) · c ≥ ((1−ε)/(1+ε))(1 − 1/e) · min_θ σ_θ(S*) / σ_θ(S*_θ).   (17)

4.3 Upper Bound of α

To control the size of the seed set we find, we need β to be relatively small compared to n.
(Proof of Lemma 4.)
    β = 1 + log( max_{j∈V} F̄_c({j}) / (the minimum nonzero increase of F̄_c during the GPC algorithm) )
      ≤ 1 + log( max_{j∈V} F̄_c({j}) / min_{j∈V, S⊆V, F̄_c(S∪{j})≠F̄_c(S)} ( F̄_c(S∪{j}) − F̄_c(S) ) ).   (18)
Firstly, for all j ∈ V,
    F̄_c({j}) = (1/m) Σ_{i=1}^m F_{θ_i,c}({j}) ≤ (1/m) Σ_{i=1}^m σ_{θ_i}({j}) / σ_{θ_i}(S^g_{θ_i}) ≤ (1/(1 − 1/e)) · (1/m) Σ_{i=1}^m σ_{θ_i}({j}) / σ_{θ_i}(S*_{θ_i}) ≤ 1/(1 − 1/e).   (19)
Secondly, we consider the value min_{j∈V, S⊆V, F̄_c(S∪{j})≠F̄_c(S)} ( F̄_c(S∪{j}) − F̄_c(S) ). Actually we have
    (1/m) Σ_{i=1}^m ( σ_{θ_i}(S∪{j}) − σ_{θ_i}(S) ) / σ_{θ_i}(S^g_{θ_i}) ≥ (1/(mn)) min_{j,S} ( σ_{θ_i}(S∪{j}) − σ_{θ_i}(S) ) ≥ min_j Π_{e=(j,*)} (1 − r_e) / (mn).   (20)
According to Algorithm 1, before termination c − F̄_c(S) ≥ min_j Π_{e=(j,*)} (1 − r_e) / (mn), so
    min_{j∈V, S⊆V, F̄_c(S∪{j})≠F̄_c(S)} ( F̄_c(S∪{j}) − F̄_c(S) ) ≥ (1 − 1/e) min_j Π_{e=(j,*)} (1 − r_e) / (mn).   (21)
Therefore
    β ≤ 1 + log( mn / min_j Π_{e=(j,*)} (1 − r_e) ).   (22)

The algorithm provides a different kind of "approximation" for the Robust Influence Maximization problem: it enlarges the constraint size while maintaining the performance. The algorithm can deal with all graphs and most circumstances, unless r_e = 1 for some e ∈ E. However, we cannot use a randomized algorithm to obtain a seed set S^g_1 satisfying
    |S^g_1| ≤ |S*|,    min_{θ∈Θ} σ_θ(S^g_1) / σ_θ(S*_θ) ≥ γ · max_{A: |A|≤k} min_{θ∈Θ} σ_θ(A) / σ_θ(S*_θ).   (23)
The reason is that the objective function is not submodular, which restricts the use of the algorithm. Tian Lin suggests that deleting nodes from S^g might be helpful, but a solution has not yet been found.

References

[1] W. Chen et al., Information and Influence Propagation in Social Networks, Morgan & Claypool, 2013.
[2] D. Kempe et al., Maximizing the Spread of Influence through a Social Network, Proc. 9th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2003.
[3] U. Feige, A Threshold of ln n for Approximating Set Cover, J. ACM, 45(4), 1998.
[4] A. Krause et al., Robust Submodular Observation Selection, Journal of Machine Learning Research 9 (2008), 2761-2801.

CPEMAB Research Report 1014
Zihan Tan
November 1, 2014

Problem 1. Given two biased coins indexed by 1 and 2, with unknown latent probabilities p_1 and p_2 (these two values are called the latent structure, and they may be equal), we are required to judge which coin has the larger latent probability by taking samples. An algorithm is said to be (ε, δ)-correct if for every latent structure, with probability ≥ 1 − δ, it gives a correct output (if p_1 = p_2 then both 1 and 2 are correct outputs).

Proposition 1. There is no algorithm satisfying both of the following properties:
(1) there exists a horizon time T such that the algorithm always halts within T steps;
(2) it is (0, δ)-correct.

Proof. We prove this by contradiction, using the following result proved in "Sample Complexity" (Theorem 1 there).

Lemma 1. There exist positive constants c_1, c_2, ε_0, and δ_0 such that for every n ≥ 2, ε ∈ (0, ε_0), and δ ∈ (0, δ_0), and for every (ε, δ)-correct policy, there exists some p ∈ [0, 1]² such that
    E_p[T] ≥ c_1 (n/ε²) log(c_2/δ).

Suppose such an algorithm A exists. Choose ε > 0 sufficiently small that c_1 (n/ε²) log(c_2/δ) > T. Since A is (0, δ)-correct, it is also (ε, δ)-correct. But then its expected running time is larger than T, a contradiction.

Before proving the next proposition, some notations are in order to make the statements clear.

Definition 1. (realization; finite realization) A pair of infinite 0-1 sequences τ = (a_1 a_2 ··· a_n ···, b_1 b_2 ··· b_n ···) is called a realization of the two coins, i.e., when the algorithm makes the i-th sample of coin 1, its outcome is a_i, and when it makes the j-th sample of coin 2, its outcome is b_j. A pair of finite 0-1 sequences φ = (a_1 a_2 ··· a_n, b_1 b_2 ··· b_n) is called a finite realization of the two coins, i.e.
when the algorithm makes the i-th sample of coin 1, its outcome is a_i, and when it makes the j-th sample of coin 2, its outcome is b_j. Let n = |φ| denote its size.

Since the output of the algorithm relies only on the outcomes of the samples, on any realization the algorithm either never halts, or halts on some finite realization (a finite prefix of the realization). For a latent structure (p_1, p_2), define a σ-algebra and a measure P_{p_1,p_2}(·) on it as follows.

Definition 2. Let Ω be the set of all realizations. For a finite realization φ, let S_φ = {τ | φ is a prefix of τ}, and let A = {S_φ | φ is a finite realization} ∪ {∅} ∪ {Ω}; the collection A generates a σ-algebra on Ω. Let P_{p_1,p_2}: A → [0, 1] be such that P_{p_1,p_2}(∅) = 0, P_{p_1,p_2}(Ω) = 1, and
    P_{p_1,p_2}(S_φ) = Π_{1≤i≤n} (1 − p_1 + a_i(2p_1 − 1)) (1 − p_2 + b_i(2p_2 − 1)).
It is not hard to check that P_{p_1,p_2} extends to a probability measure on (Ω, σ(A)).

Definition 3. (halting with probability 1) Let A be an algorithm and let I_A(τ) be the indicator of halting, i.e., I_A(τ) = 1 if and only if A halts on realization τ, meaning that A halts after observing some finite prefix φ of τ. We say an algorithm halts with probability 1 if for all (p_1, p_2),
    E_{P_{p_1,p_2}}[I_A] = 1.

Proposition 2. For any δ < 1/2, there is no algorithm satisfying both of the following properties:
(1) it halts with probability 1;
(2) it is (0, δ)-correct.

Proof. We prove this by contradiction; assume such an algorithm exists. First, without loss of generality, we assume that the algorithm always takes an equal number of samples of both coins when halting (if an algorithm halts after taking r samples of coin 1 and t samples of coin 2 with r > t, we can let it take r samples of both coins and use only the first t records of coin 2).
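As a quick numerical aside on Definition 2: the product formula does define a probability measure, since the factor 1 − p + x(2p − 1) equals p when x = 1 and 1 − p when x = 0, so the weights of all 2^{2n} finite realizations of a fixed length n sum to 1. A small sketch checking this (function name is ours):

```python
from itertools import product

def prefix_prob(a, b, p1, p2):
    """P_{p1,p2}(S_phi) for a finite realization phi = (a, b): each
    a_i ~ Bernoulli(p1) and b_i ~ Bernoulli(p2), all independent, so the
    weight is a product of per-coordinate Bernoulli factors."""
    w = 1.0
    for ai, bi in zip(a, b):
        w *= (1 - p1 + ai * (2 * p1 - 1)) * (1 - p2 + bi * (2 * p2 - 1))
    return w

# weights over all length-3 finite realizations sum to 1
n, p1, p2 = 3, 0.7, 0.4
total = sum(prefix_prob(a, b, p1, p2)
            for a in product([0, 1], repeat=n)
            for b in product([0, 1], repeat=n))
```

The same routine gives the quantities P_{p_1,p_2}(S_φ) compared across latent structures in the proof below.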
Now, suppose that when the algorithm observes a particular finite realization, it either stops or continues to take another sample. (Some algorithms randomly choose whether to stop or continue; we exclude them from the following proof, although with small modifications they could be included.)

Definition 4. (terminating finite realization) A finite realization φ is a terminating finite realization for algorithm A if the algorithm halts when observing some prefix of φ as the outcome of its samples.

According to the definition, for all (p_1, p_2), let
    P_n(A) = Σ_{|φ|=n; φ terminating for A} P_{p_1,p_2}(S_φ).
Then we have
    lim_{n→∞} P_n(A) = 1.

Choose ε > 0 such that ε < 1/2 − δ. Let p_1 = p_2 = 1/2; then there exists N(1/2) ∈ ℕ such that for all n ≥ N(1/2), P_n(A) ≥ 1 − ε/2. Now choose ε_1 > 0 small enough that
    Σ_{|φ|=N(1/2)} |P_{1/2,1/2}(S_φ) − P_{1/2+ε_1, 1/2−ε_1}(S_φ)| ≤ ε/2
and
    Σ_{|φ|=N(1/2)} |P_{1/2,1/2}(S_φ) − P_{1/2−ε_1, 1/2+ε_1}(S_φ)| ≤ ε/2.
This is possible since g(x) = Σ_{|φ|=N(1/2)} |P_{1/2,1/2}(S_φ) − P_{1/2−x, 1/2+x}(S_φ)| is continuous and g(0) = 0.

We now set up two latent structures. In the first structure p_1 = 1/2 + ε_1, p_2 = 1/2 − ε_1, while in the second structure p_1 = 1/2 − ε_1, p_2 = 1/2 + ε_1. It is immediate that outputting 1 is correct in the first structure and outputting 2 is correct in the second structure.

Consider the output of the algorithm on these terminating finite realizations. For both latent structures, the probability that the algorithm stops within N(1/2) steps is larger than 1 − ε. Let s_1 be the probability of outputting 1 and s_2 the probability of outputting 2 in the first latent structure; let t_1 be the probability of outputting 1 and t_2 the probability of outputting 2 in the second latent structure. Since the output of the algorithm relies only on the realization it observes,
    |s_1 − t_1| + |s_2 − t_2| ≤ ε.
Therefore |s_1 − t_1| ≤ ε and |s_2 − t_2| ≤ ε.
Since s_1 + s_2 ≤ 1 and the algorithm is correct with probability at least 1 − δ, we obtain s_1 ≥ 1 − δ, hence s_2 ≤ δ and t_2 ≤ δ + ε, and therefore t_1 ≥ 1 − δ − ε. Thus the probability that the algorithm outputs correctly on the second latent structure is at most δ + 2ε < 1 − δ, a contradiction. This finishes the proof.

On the Generalized Pagerank Model
Zihan Tan; Yang Song; Yuchen Yang
October 31, 2014

Abstract

Although various centrality measures have been investigated across different networks, no specialized, mature centrality measure exists for analyzing the influence of theoretical researchers. Among previous methods, Bonacich's model is the most outstanding one; however, in his main equation the relationship matrix is simply the adjacency matrix. Our main idea is that one should build a particular weighted relationship matrix based on reasonable hypotheses that characterize the real problem. In this paper we first build the Erdos Network and analyze its properties. We then propose a new model, called the "Pairwise Evaluation Model", to measure influence; it constructs a nontrivial weighted relationship matrix by taking both real influence (co-authorship) and virtual influence (fame) into account. This model helps us obtain an objective relationship matrix when certain data are lacking. Following this idea, we then deal with two additional problems: we analyze the influence of actors in a film network and the influence of a paper in the citation network. Small variations of the "Pairwise Evaluation Model", made according to reasonable hypotheses, give us better results on these two additional problems.

Contents

1 Introduction and Background
  1.1 Previous Results
  1.2 Our Results
2 Erdos Network and Property Analysis

3 Pairwise Evaluation Model and Pagerank Model
  3.1 Explicit Expression of Models
  3.2 Interpretation of Models
    3.2.1 Essence of Pagerank Model
    3.2.2 Essence of Our Pairwise Evaluation Model
    3.2.3 Comparison and Our Improvement
  3.3 Mathematical Remarks
    3.3.1 Convergence of Series and Main Equation
    3.3.2 Numerical Method for Obtaining an Approximate Solution for V
    3.3.3 Interpretation of Constraints

4 Results
  4.1 Influence Analysis in Erdos Network
    4.1.1 Algorithm and Results
    4.1.2 Remarks and Comments
  4.2 Influence Analysis in Film Actor Network
    4.2.1 Algorithm and Arguments
    4.2.2 Results and Comments
  4.3 Influence Analysis of Fundamental Papers
    4.3.1 Algorithm and Arguments
    4.3.2 Results

5 Essence and Understanding of Modeling Influence
  5.1 Science and understanding of modeling influence within a network
  5.2 Analysis on Individual Strategy
    5.2.1 Experiment and Results
    5.2.2 Analysis and Comments
6 Sensitivity Analysis, Strength and Weakness
  6.1 Sensitivity Analysis
    6.1.1 Experiment and Results
    6.1.2 Analysis and Comments
  6.2 Strength and Weakness

1 Introduction and Background

In the academic field, people tend to use tools such as SCI, the H-index, the impact factor, or Google Scholar to evaluate peers. These are all network-based analysis tools. Long before the emergence of network science, there was already such a measurement in the mathematics community: the Erdos number. Paul Erdos published over 1400 papers and co-authored with more than 500 researchers. Because of his significant influence in the mathematics community, the Erdos number was created to measure proximity to him in bibliographical terms.

In this paper we investigate the influence of researchers in the network Erdos1, which contains all researchers who have directly co-operated with Erdos. We develop a new model, called the "Pairwise Evaluation Model", to measure influence, taking both real influence (co-authorship) and virtual influence (fame) into account; it is discussed in Section 3. The main idea is that, among researchers, different people have different views of a specific researcher, which differs from the idea in previous papers (a uniform influence coefficient). We then analyze the performance of the model on the movie actor network.
For analyzing the influence of a paper or a journal, a small variation is made in the "Pairwise Evaluation Model" (combined with the Pagerank model) to obtain better performance.

1.1 Previous Results

Intuitively, the node with the highest degree should be the most important in the graph, but there are cases where a node has many edges yet is isolated from the major population. Centrality measurements are the classic tools for evaluating importance, and previous work has done a lot in this area. Bonacich [1] proposed a family of centrality measures in 1987. Apart from centrality measurements such as betweenness and closeness, another parameter β is included to take into account the global property of the network: for a given node, if β is large, the centrality of the other nodes connected to it takes more weight in the evaluation of this node's importance. We develop our algorithm based on the same idea. However, this measurement can only take into account network-internal information, making assumptions such as that all nodes in the graph are equal. Under the framework of the model, the measurement c(α, β) can be interpreted as the number of paths activated by sending a signal from a particular node; by comparing c_i(α, β) we can evaluate the importance of node i. We give further explanation and exploration in the model section.

Newman investigated the structure and properties of scientific collaboration networks. In his papers [2][3], some local and global statistics, and their differences across bibliographic databases, are investigated. Table 1 lists the statistics included in his work.

Watts and Strogatz first proposed the idea of the small-world model in 1998. Before that, a network was considered either completely random or totally regular, but this is not the case for many real networks, whose topology lies in between. The phenomenon is measured by a comparably large clustering coefficient together with an unusually small characteristic path length, compared with a regular graph.
The indication is that, akin to a regular graph, a small-world network is highly clustered, but the shortest distance between any pair of nodes is much smaller than in a regular lattice. This is caused by the existence of shortcuts that connect far-away vertices. Many social networks, such as the co-author network and the co-stardom network, exhibit this small-world phenomenon (we calculate L and C for our graphs to illustrate the small-world property of our network).

Table 1: statistics for the scientific collaboration networks

  Statistics               | Explanation                                                      | Results
  Collaborators per author | significant people are more likely to have more collaborators   | power law
  Size of giant component  |                                                                  | 80-90% of the total nodes
  Clustering coefficient   | local communities, e.g., belonging to the same affiliation in the scientific community | strong clustering effect
  Betweenness              | a measurement of significance                                    | funneling, very clear winner
  Average distances        | average of the shortest path between each pair of nodes; information can circulate fast | small world effect

1.2 Our Results

We constructed the Erdos1 network and analyzed its properties from a classical perspective, computing various centralities and giving the corresponding distributions. For analyzing the influence of individuals in this co-authorship network, we propose a new model called "Pairwise Evaluation", an improved version of the Pagerank model. The new model takes both "real influence" and "virtual influence" into consideration, which differs from the method used in the Pagerank model (the Pagerank model simply uses the adjacency matrix as the relationship matrix). This improvement is indeed necessary in a researcher network, because individual emotion matters. Let R denote the real influence matrix and V denote the virtual influence matrix.
By defining a certain random process to simulate the spread of information, we arrive at the main equation of our model:
    α_2 V² + (α_1 R − I)V + I = 0.
The main idea behind our improvement is the following: although the Pagerank model gives us an outstanding way of measuring influence, when investigating a particular network we still need to find its main features and characterize them mathematically. In short, we should move from the general layer down to the specific layer when faced with a particular problem. Following this idea, we then investigate influence in the paper-citation network and in the film-actor network. We designed another two improved versions of the Pagerank model, based on the observation that, generally speaking, the influence of a movie or a paper decreases along the timeline, i.e., the earlier it was published, the less influential it is now.

Finally, we give our own understanding of modeling influence within a network. Based on what we have done, we think the main idea of modeling influence is twofold: make clear which particular property of influence you really care about, and simulate or assign it mathematically on the graph (using a random process) to obtain a measurement. The argument above is rather intuitive, but it can be practical and essential, as we suggest, in dealing with certain problems. Besides, random processes, and perhaps some related concentration inequalities, can be a powerful tool in this field. At the end of this paper, a sensitivity analysis is carried out, strengths and weaknesses are pointed out, and we provide insights for future work.

2 Erdos Network and Property Analysis

Figure 1: Erdos1 network

Figure 1 shows part of the Erdos network; we deleted all researchers with degree less than 5 to obtain this simplified version. We then compute the classical centralities and the degree distribution of this network; see Table 2, Table 3 and Table 4.
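The basic statistics quoted in the next paragraph (511 researchers, 1639 collaborations, density 0.013, average degree 6.415) follow directly from the vertex and edge counts; a quick check:

```python
# Sanity check of the reported Erdos1 statistics. For an undirected simple
# graph, density = 2m / (n(n-1)) and average degree = 2m / n.
n_vertices = 511
n_edges = 1639

density = 2 * n_edges / (n_vertices * (n_vertices - 1))
avg_degree = 2 * n_edges / n_vertices

print(round(density, 3))     # 0.013
print(round(avg_degree, 3))  # 6.415
```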
Generally speaking, the graph is not dense: 511 researchers share only 1639 collaborations, with density 0.013 and average degree 6.415. The number of weakly connected components also illustrates its sparseness: there are 42 weakly connected components in this graph. Another illustration is given by the clustering coefficient, whose value is 0.343.

Table 2: Result in betweenness. (a) betweenness distribution (plot); (b) top rank in betweenness:
  1. HARARY, FRANK   2. SOS, VERA TURAN   3. RUBEL, LEE ALBERT   4. STRAUS, ERNST GABOR

Table 3: Result in closeness. (a) closeness distribution (plot); (b) top rank in closeness:
  1. HENRIKSEN, MELVIN   2. GILLMAN, LEONARD   3. BOES, DUANE CHARLES   4. GAAL, STEVEN A. (GAL, ISTVAN SANDOR)

Table 4: Result in degree. (a) degree distribution (plot); (b) top rank in degree:
  1. ALON, NOGA M.   2. HARARY, FRANK   3. GRAHAM, RONALD LEWIS   4. BOLLOBAS, BELA

However, the edges are not uniformly distributed. There are 1639 edges in the network; fewer than 25 percent of them (389) are not related to the top 100 researchers, and fewer than half of them (791) are not related to the top 50 researchers. (Here the top researchers are the nodes with the highest degree.) We can conclude that there is a central group of researchers who dominate most of the collaboration.

3 Pairwise Evaluation Model and Pagerank Model

In the following discussion we denote the adjacency matrix of the co-authorship network by A, i.e., A_ij = A_ji = 1 if researchers i and j have co-authored some paper, and A_ij = A_ji = 0 if not. Let d_i be the degree of researcher i, namely the degree of vertex i in the graph.

3.1 Explicit Expression of Models

Phillip Bonacich [1] investigated a fundamental family of influence measures:
    c(α, β) = α(I − βA)^{-1} A 1

In this formula, c(α, β) ∈ Rⁿ is the influence vector, whose i-th coordinate is a real number in [0, 1] representing the degree of influence of researcher i; α and β are the parameters of the model, and they are flexible for different problems; 1 represents the n-dimensional vector with all entries equal to 1.

Bonacich's model was later generalized into the Pagerank algorithm, a fairly influential and well-performing model in the research and application of search engines. The basic idea is the following: in a graph, a vertex is important and influential if the following two conditions are satisfied.

• Its degree is large.
• Most neighboring nodes are important and influential.

We propose a model taking both real influence and virtual influence into account. Our main idea is the following: if researcher i has collaborated with j, then they have "real influence" on each other. Besides, every researcher also has "virtual influence" on other researchers. This is the case when i has not co-operated with j but has heard of j or has read j's papers: although they have not co-authored a paper, they know each other to some degree. From our perspective, the virtual influence between different pairs of researchers should not be the same. The virtual influence of s on t is denoted by a real number v_st ∈ [0, 1] in the following discussion.

Thus, we separate the influence in the network into two parts, real influence and virtual influence, and denote them by two matrices, R and V. Given a network, these two matrices are determined as follows. R is simply a normalized version of the adjacency matrix A, namely
    R = diag{d_1^{-1}, d_2^{-1}, ..., d_n^{-1}} A.
V is then determined by the formula
    α_2 V² + (α_1 R − I)V + I = 0,
where α_1 and α_2 are flexible parameters of the model, which should satisfy α_1 + α_2 ≤ 1.
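Bonacich's c(α, β) above can be evaluated without a matrix inverse through the series α(A1 + βA²1 + β²A³1 + ···), which converges when β is smaller than the reciprocal of the spectral radius of A. A minimal pure-Python sketch of that series (function and variable names are ours):

```python
def bonacich(A, alpha, beta, terms=200):
    """Truncated series for c(alpha, beta) = alpha * (I - beta*A)^(-1) * A * 1,
    computed as alpha * (A*1 + beta*A^2*1 + beta^2*A^3*1 + ...)."""
    n = len(A)
    def matvec(M, x):
        return [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
    v = matvec(A, [1.0] * n)   # A 1
    c = v[:]
    coef = 1.0
    for _ in range(terms - 1):
        v = matvec(A, v)       # next power of A applied to 1
        coef *= beta
        c = [ci + coef * vi for ci, vi in zip(c, v)]
    return [alpha * ci for ci in c]
```

For the single edge A = [[0,1],[1,0]] with α = 1 and β = 0.5, the closed form gives c = (2, 2), and the truncated series agrees to high precision.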
Besides, these two parameters should not be too large, so that the following normalization properties of V are satisfied.

• Restricted Anonymity: for all i, Σ_{j=1}^n V_ij ≤ 1.
• Restricted Input: for all j, Σ_{i=1}^n V_ij ≤ 1.

3.2 Interpretation of Models

3.2.1 Essence of Pagerank Model

The essence and intuition of Bonacich's model (we call it the "Pagerank model" in the following) is this: influence is the ability to spread a message. Bonacich defined a random process to simulate the spreading of a message from each node in the network. Every researcher sends a message to all his neighbors; then, with probability β, a communication, once sent, is transmitted by any receiving individual to any of his contacts. Thus the expected total number of communications caused by an individual is proportional to that individual's ability to spread a message, which can be written as
    c(α, β) = α Σ_{k=1}^∞ β^{k−1} R^k 1 = α(R1 + βR²1 + β²R³1 + ···).

3.2.2 Essence of Our Pairwise Evaluation Model

Following the main idea and manner of the Pagerank model, we also define a random process to simulate the spreading of a message. Unlike the Pagerank model, we do not measure the expected number of total paths; we measure the expected total amount of information, defined as follows: if a researcher receives the same message from k of his neighbors, then the amount of information he gets is k.

The random process proceeds in the following manner. First, researcher s receives a message, which he then spreads according to the rule below. What we care about is the expected amount of information that t receives in the process. When any person i receives the message, he randomly sends it to others.
Specifically, he performs the following two "random-spread" actions independently, with certain probabilities:

• Local Spread (with probability α_1): he sends the message independently and uniformly at random to his neighbors (not including himself), i.e., with probability R_ij (independently) he passes the message to his neighbor j. This step is well-defined thanks to the normalization property of R.
• Global Spread (with probability α_2): he sends the message independently to all researchers (including himself) according to the virtual influence coefficients, i.e., researcher j gets the message from person i with probability v_ij, independently.

Following the random process, for a pair (s, t) we compute the expected total amount of information received by t when the message starts from s, which is by definition the ability of s to spread a message to t, namely v_st. The above argument holds for every pair (s, t), so we can write down the formula
    V = Σ_{k=0}^∞ (α_1 R + α_2 V)^k.
By proving convergence, we obtain the main equation
    α_2 V² + (α_1 R − I)V + I = 0.
Instead of solving it analytically, we use a numerical method to obtain an approximate solution for V (described in the next subsection; it is also guaranteed that all entries of V are non-negative).

However, we are required to output the degree of influence of every researcher in the network. Let M = βR + (1 − β)V be the relation influence matrix of the co-authorship network, where β is a flexible parameter of the model. We then output an influence vector u from the influence matrix M: using the method of eigenvector centrality, we take the unique non-negative eigenvector, whose existence is guaranteed by the Perron-Frobenius Theorem, as the influence vector:
    M u = λ u,
so that u_i represents the influence coefficient of researcher i.

3.2.3 Comparison and Our Improvement

Our model differs from the Pagerank model mainly in two features. First, the uniformity of the probability of transmitting a message is dropped.
In the Pagerank model, every researcher transmits the message to any of his neighbors with the same probability β. However, this may not be the case in a real co-author network: a researcher is more likely to send a message to someone who has more influence on him (for example, someone he admires) than to a person he is not familiar with, even one he has co-operated with. The degree of closeness is not a fundamental factor in the Pagerank model, but in our model it is captured by the virtual influence matrix V.

Second, the Pagerank model only allows individuals to send the message to their neighbors, which may also not be the case in a real network. Consider a researcher who has come up with an interesting problem: he may directly communicate with experts in that field rather than first telling his friends or the individuals he has co-operated with. In our model, the Global Spread phase allows this to happen.

3.3 Mathematical Remarks

3.3.1 Convergence of Series and Main Equation

We are left with the following matrix equation:
    V = Σ_{k=0}^∞ (α_1 R + α_2 V)^k.
The convergence of the right-hand side is guaranteed by a simple mathematical argument, which we omit for lack of space. Thus we obtain our main equation:
    α_2 V² + (α_1 R − I)V + I = 0.

3.3.2 Numerical Method for Obtaining an Approximate Solution for V

We use an iterative method to get an approximate solution to this equation instead of solving it analytically:

• Let f(V) = α_2 V² + α_1 RV + I; the equation asks for a fixed point of the matrix function f.
• f is a contractive mapping on a suitable set of matrices V, so a recurrence method can be employed to find a solution.
• Let V_0 be the uniform matrix, in which every entry is 1/n, and let V_{k+1} = f(V_k) for every k.
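The recurrence above, together with the eigenvector-centrality step M u = λu from Section 3.2.2, can be sketched in a few lines (function names are ours; there are no convergence safeguards, and the parameters are assumed small enough for the iteration to contract):

```python
def solve_V(R, a1, a2, iters=200):
    """Fixed-point iteration for the main equation
    a2*V^2 + (a1*R - I)*V + I = 0, rewritten as V = f(V) with
    f(V) = a2*V^2 + a1*R*V + I, started from the uniform matrix V0 = (1/n)."""
    n = len(R)
    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    V = [[1.0 / n] * n for _ in range(n)]
    for _ in range(iters):
        V2, RV = matmul(V, V), matmul(R, V)
        V = [[a2 * V2[i][j] + a1 * RV[i][j] + (1.0 if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    return V

def influence_vector(M, iters=1000):
    """Power iteration for M u = lambda*u: repeatedly apply M and
    renormalize; for a nonnegative matrix the iterates approach the
    Perron eigenvector, which serves as the influence vector."""
    n = len(M)
    u = [1.0 / n] * n
    for _ in range(iters):
        w = [sum(M[i][j] * u[j] for j in range(n)) for i in range(n)]
        s = sum(w)
        u = [wi / s for wi in w]
    return u
```

In the scalar case R = [[1]] with α_1 = 0.1 and α_2 = 0.05, the iterate converges to the smaller root of 0.05v² − 0.9v + 1 = 0, and the residual of the main equation vanishes.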
3.3.3 Interpretation of Constraints

Several constraints are imposed in our model; we explain them here:

• Restricted Anonymity and Restricted Input are required to keep the operator norm bounded, so that the series converges.
• α1 + α2 ≤ 1 is necessary for the convergence of the series. This constraint also corresponds to the case where every researcher is required to randomly choose one of the two spread actions.

4 Results

4.1 Influence Analysis in the Erdos Network

4.1.1 Algorithm and Results

Algorithm 1 Erdos Network Analysis
1: Construct adjacency matrix A
2: for i = 1 → n do
3:   for j = i → n do
4:     if both i and j have collaborated with Erdos more than twice, and i and j have ever collaborated then
5:       Aij = 2
6:   end for
7: end for
8: Use Bonacich's model to compute the influence vector with β = 0.01 and R = A

We list the top 20 mathematicians in Table 5.

Rank  Name                          Score     Rank  Name                            Score
1     RODL, VOJTECH                 4.84428   11    LOVASZ, LASZLO                  3.50772
2     GRAHAM, RONALD LEWIS          4.82455   12    SZEMEREDI, ENDRE                3.45698
3     FUREDI, ZOLTAN                4.54946   13    CHUNG, FAN RONG KING (GRAHAM)   3.45627
4     BOLLOBAS, BELA                4.54344   14    PACH, JANOS                     3.39278
5     TUZA, ZSOLT                   4.25501   15    HAJNAL, ANDRAS                  3.13349
6     SPENCER, JOEL HAROLD          3.93599   16    NESETRIL, JAROSLAV              3.06854
7     SOS, VERA TURAN               3.71115   17    SCHELP, RICHARD H.              2.98698
8     HARARY, FRANK*                3.67796   18    SIMONOVITS, MIKLOS              2.95897
9     GYARFAS, ANDRAS               3.64732   19    KOSTOCHKA, ALEXANDR V.          2.85383
10    FAUDREE, RALPH JASPER, JR.    3.5356    20    BABAI, LASZLO                   2.84392

Table 5: Rank of mathematicians in the Erdos1 network

4.1.2 Remarks and Comments

Some arguments are in order to better explain our algorithm.

• Why choose β = 0.01? By computation we find that the spectral radius of R is 36. Following the sensitivity analysis in the next section, to make all entries of the influence vector positive it is necessary that β ≤ 1/36.
The relationships in a researchers' network are positive, so we take β = 0.01 (by the sensitivity analysis in the next section we know this is reasonable). This value makes the series convergent. To argue that β is not so small that it has little impact in the equation, consider the Jordan normal form of R:

R = P⁻¹JP

Then the formula can be written as

C(α, β) = αP⁻¹(J + βJ² + · · ·)P

Although β is small, the ratio of norms between consecutive terms is approximately 0.36, which is not negligible.

• The Construction of the Relationship Matrix. Since Erdos is the real centre of this network, i.e., all researchers have collaborated with Erdos, we make the following plausible hypotheses:
  • A pair of researchers who have collaborated more than once are familiar with each other.
  • A pair of researchers who have both collaborated with Erdos more than once, and who have ever collaborated with each other, are familiar with each other.
These hypotheses guide us to break the uniformity of the relationship weights in the following manner: we give such a pair a doubled weight in their relationship, namely Aij = Aji = 2.

• The Main Idea Extracted from Our Model. Our main idea is that, when faced with a particular network, one should characterize its features, construct a properly weighted relationship matrix, and then apply Bonacich's model to compute influence. It is our belief that a well-characterized model will produce good performance. However, in this problem we are only given the adjacency matrix and the number of collaborations of each researcher with Erdos; further description of the relationships is lacking. Our model in Section 3 gives a method to explore the relationships more deeply without any other information. We did not use it to analyze this problem, for two reasons.
First, the approximate computation was not successful enough; second, we are already given some data about the relationships, namely the number of collaborations with Erdos.

4.2 Influence Analysis in the Film Actor Network

4.2.1 Algorithm and Arguments

We choose the field of movie actors to implement our algorithm. Our data is downloaded directly from a website and is extracted from IMDB (Internet Movie Database). To collect the data, we first maintain a list of 500 famous actors (the influence is computed according to the 2006 and 2007 movies the actor appeared in, using the following method). We then search all collaborations between these 500 actors to determine the relationship matrix. Our main idea in designing the algorithm is to change the trivial relationship matrix (R = A) used in the pagerank model into a non-trivial one. For the co-authorship network we did this by taking "pairwise evaluation" into account to get the influence matrix, since "pairwise evaluation" is the main characteristic of a co-authorship network. In the movie-actor field, we investigate another important characteristic in the following way. Recall that the influence coefficients (the entries of M) measure the degree of influence between researchers; here the coefficients should measure the degree to which two actors know each other. Thus, we need to investigate the rule that determines the degree of familiarity. We make the following two reasonable hypotheses:

• The familiarity between a pair of actors in a movie decreases as the total number of actors in the movie increases.
• Among the actors in a movie, the familiarity between every pair is nearly the same.

Seemingly, the second hypothesis is reasonable but still not convincing enough. In fact, we lack data to measure the familiarity of every pair, which is just the same situation as in problem (2): we lack data to measure the influence between a pair of researchers who have co-authored; all we know is whether or not they have ever co-operated.
Following the argument above, our algorithm is given below. The set of selected actors is denoted by A = {a1, · · · , an}. Movies are represented by F1, · · · , Fk, where Fr includes all the main actors (not necessarily among the selected actors) of that movie, and y(Fr) is the year when movie Fr was released. The familiarity matrix is the symmetric matrix M, with mij = mji ∈ R+ being the degree of familiarity between actors ai and aj.

Algorithm 2 Actor-Rank
1: for i = 1 → n do
2:   for j = 1 → n do
3:     mij = 0
4:   end for
5: end for
6: for r = 1 → k do
7:   for all pairs (s, t) in Fr, s ≠ t do
8:     mst += 1/((2008 − y(Fr)) · |Fr|);  mts += 1/((2008 − y(Fr)) · |Fr|)
9:   end for
10: end for
11: Use the pagerank formula with R = M to get the ranking vector

4.2.2 Results and Comments

Rank  Name                  Influence Factor   Rank  Name                   Influence Factor
1     McKeown, Denis        2.66057            11    Voronina, Irina        2.2352
2     Stone, Sharon         2.64451            12    Scott, Codie (I)       2.22132
3     Lowe, Crystal         2.56269            13    Tatasciore, Fred       2.18808
4     Baumel, Shane         2.54767            14    Campbell, Adam (IV)    2.15699
5     Cage, Nicolas         2.52578            15    Halse, Jody            2.14141
6     Sykes, Wanda          2.33047            16    Summers, Stewart       2.03297
7     Castro, Mary          2.31565            17    Kebbel, Arielle        2.00451
8     Koechner, David       2.31457            18    Wood, Elijah           2.00262
9     Lang, Michelle (V)    2.27214            19    Caudle, Dr. Melissa    1.97262
10    Mann, Danny (I)       2.2532             20    Gyllenhaal, Maggie     1.92831

Table 6: Top 20 movie stars

We summarize the results for the top 20 movie stars in Table 6. Some analysis is in order based on our results. On one hand, our result is convincing in light of the following facts: Stone, Sharon was nominated for an Academy Award for Best Actress and won a Golden Globe Award for Best Actress in a Motion Picture Drama for her performance in Casino; Cage, Nicolas received an Academy Award, a Golden Globe, and a Screen Actors Guild Award; Lowe, Crystal is known for her scream queen roles such as Ashlyn Halperin in Final Destination 3.
On the other hand, parts of our results are surprising. A vivid example is that Baumel, Shane is in the top 5 even though he is just a child. This may be because he took part in 7 movies in 2006, more than most other actors. Moreover, some famous actors, including Leonardo DiCaprio, are ranked 100+, which does not seem reasonable. One reason for these surprises is that every actor in the same movie receives exactly the same influence from it. This hypothesis leads us to overestimate the influence of many minor actors: it makes the persons who appear in more movies more influential, without accounting for the importance of their roles. Another reason is that we give all movies the same ability to lift the influence of actors, which is not convincing in real life. Take Leonardo DiCaprio as an example: the number of movies he starred in is not that large, which causes this famous actor to be ranked low, yet most of his movies are well-appreciated and influential. Unluckily, the quality of a movie is not measured in our model. Compared with our work on the Erdos network, our model performs better on the researcher network than on the actor network. We believe the main reason is that we do not know who the major actors of a film are, whereas in theoretical research an author of a paper must have a deep understanding of its question. Thus, co-authorship implies a strong connection between researchers, while co-appearance in a movie does not imply a concrete connection between actors.

4.3 Influence Analysis of Fundamental Papers

4.3.1 Algorithm and Arguments

Since papers do not have "individual emotion", the pairwise-evaluation model is not suitable here. However, the standard pagerank algorithm seems to give good performance, because it is designed to evaluate the importance of nodes in a network, which has a lot in common with the influence of papers.
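For reference, the standard pagerank iteration mentioned above can be sketched in a few lines (a sketch; the damping factor 0.85 and the toy citation graph are our own illustrative choices):

```python
import numpy as np

def pagerank(adj, damping=0.85, iters=100):
    """Power iteration for pagerank scores on a directed graph.
    adj[i, j] = 1 means node i links to (here: cites) node j."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    out_deg[out_deg == 0] = 1.0            # avoid dividing by zero for sinks
    P = adj / out_deg                      # row-stochastic transition matrix
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - damping) / n + damping * (r @ P)
    return r

# toy citation graph: papers 0 and 1 both cite paper 2
adj = np.array([[0, 0, 1],
                [0, 0, 1],
                [0, 0, 0]], dtype=float)
r = pagerank(adj)
print(r)  # the cited paper (node 2) gets the largest score
```

The iteration converges quickly here because the graph is tiny; on real citation networks the same power iteration is run to a fixed tolerance instead of a fixed iteration count.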
We make small changes to pagerank to get our algorithm below. Initially we have 16 papers on the list, numbered from 1 to 16 according to their order in the list. For a paper s, the age of s (namely the number of years from publication to 2014) is denoted by y(s). Let c(s) be the average number of citations per year of paper s, namely the number of times paper s is cited by others divided by its age. Let m(t) be the number of papers among the 16 selected papers that cite paper t. Finally, w(t) is the weight (namely, influence) of a paper; we rank all papers by weight, and a paper with more weight is considered more influential.

Algorithm 3 Adjusted Paper-Rank
1: for t = 1 → 16 do
2:   Construct the set CITt = {s | s cites t}
3: end for
4: for all leaf nodes s do
5:   w(s) = c(s)
6: end for
7: for all internal nodes t do
8:   w(t) = (1/2)c(t)
9:   for all child nodes s of t do
10:    w(t) += w(s)/(2m(t))
11:  end for
12: end for

Some arguments are in order to explain the changes we made to the original pagerank model.

• Citations are necessary. Lacking the capacity to draw a large amount of data, our network is relatively small; it is also a DAG. If we did not add citations to the weights, the weight of every leaf node would be 0, which is unreasonable and also bad for the further analysis of the original papers; in that case all analysis would become meaningless. Thus, some "basic" weight must be added to the leaf nodes, and the citation count is without doubt the most natural and general choice.

• Taking the average of weights is necessary. To be specific, when we compute the weight of a node v, the formula is:

w(v) = (1/2)(c(v) + (1/m(v)) ∑_{i=1}^{m(v)} w(ui))

where the ui are the selected papers that cite paper v. We argue that taking the average of the weights is necessary: directly taking the sum would make the weight of a child node unable to exceed that of its parent, which is not the case in real life.
On the other hand, taking the average reduces the instability brought about by selecting such a small group of papers.

• A timeline is necessary. Some papers were published recently, while others were published thirty years ago. Supposing that the number of citations increases roughly uniformly over time, it is better to divide the citation count by the paper's age (the number of years since publication).

4.3.2 Results

Our ranking of the fundamental papers is summarized in Table 7.

Rank  Article Name                                                Score
1     Statistical mechanics of complex networks                   1034.63
2     Collective dynamics of 'small-world' networks               985.702
3     Emergence of scaling in random networks                     864.223
4     The structure of scientific collaboration networks          623.007
5     Scientific collaboration networks: II                       575.891
6     On Random Graphs                                            558.505
7     The structure and function of complex networks              531.117
8     On properties of a well-known graph                         482.774
9     Navigation in a small world                                 400.973
10    Identity and search in social networks                      300.35
11    Power and Centrality: A family of measures                  210.765
12    Networks, influence, and public opinion formation           97.1429
13    Models of core/periphery structures                         39.5
14    Identifying sets of key players in a network                38
15    Social network thresholds in the diffusion of innovations   27.6667
16    Statistical models for social networks                      11

Table 7: Rank of 16 papers

5 Essence and Understanding of Modeling Influence

5.1 Science and understanding of modeling influence within a network

We use a random-process model to evaluate influence within a co-authorship network, a film-actor network and a citation network. Some understanding is in order regarding general methods for modeling influence within a network. Based on what we have done, we think the main idea of modeling influence is twofold: make clear which particular property of influence you really care about, and simulate or assign it mathematically on the graph (using a random process) to obtain a measurement.
First we should know what influence is. According to the definition in (5), influence is the ability to alter or sway an individual's or a group's thoughts, beliefs, or actions. However, these abilities are hard to measure from a general point of view. It is known that in particular social networks influence can be measured in practice (6). But from a theoretical perspective, (1) tells us that we may not be able to explicitly model the process of persuading others to change their behavior, especially when we do not have all of the necessary data in one place. What should we do then? Simplifying the definition of influence from a mathematical point of view might be a good choice. The problem setting is rather clear: a graph with simple edges. Then how should we make the definition? It depends on what information we want to extract from the graph. Following this idea, classical definitions are made in a deterministic manner. For example, if we think a vertex is influential when it is nearer to most other vertices, closeness centrality is a good mathematical measurement; if we agree that a vertex is influential when most paths linking other pairs of nodes pass through it, betweenness centrality is a good theoretical tool. However, this is not the case for a co-authorship network. We insist that influence within a co-authorship network is the capacity to spread information, i.e., a researcher is influential if, whenever he comes up with an idea or a problem, most other researchers will know it and even follow it. To measure the capacity of spreading information, Bonacich [1] proposed an outstanding model. He defined a random process and claimed that the expected size of the region that receives the message is a good measurement of the influence of individuals. This is intuitively appealing and of great importance: it guides us to simulate mathematically, on the graph, exactly what we care about.
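To illustrate the deterministic measurements mentioned above, closeness centrality can be computed directly from BFS distances (a sketch on a toy unweighted star graph; the function name is ours):

```python
from collections import deque

def closeness_centrality(adj):
    """Closeness of v = (n - 1) / (sum of shortest-path distances from v),
    for a connected undirected graph given as an adjacency list."""
    n = len(adj)
    scores = []
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:                       # breadth-first search from s
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        scores.append((n - 1) / sum(dist.values()))
    return scores

# star graph: the centre (node 0) is closest to everyone else
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(closeness_centrality(star))  # centre: 1.0, leaves: 0.6
```

As the text argues, such deterministic scores capture nearness rather than spreading ability, which is why the co-authorship model above turns to a random process instead.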
Random processes are the essential tool of this methodology, and of our work as well. What is original in our work is that we established a new framework to compute the relationship matrix, different from Bonacich's method, which just used the trivial adjacency matrix. Our work is to some extent better because we take "pairwise evaluation" into account, which characterizes the key feature of a co-authorship network. The methodology of random processes is fundamental and effective; this can also be observed from the fact that the pagerank algorithm has ruled search engines for such a long time. People even say that the main formula of pagerank is the wealthiest formula of all. Thus, making clear which property of the graph you really care about makes your argument meaningful, and simulating it on the graph (through a random process or otherwise) makes your analysis mathematically sound. From our perspective, these are the two key points in modeling influence within a network.

5.2 Analysis of Individual Strategy

5.2.1 Experiment and Results

Knowing the most influential persons in a network, one can adopt strategies to boost one's own influence rapidly. We come up with some strategies and design experiments to check their performance. Here are strategies that may benefit a newcomer:

• Strategy 1. Collaborate with some of the most influential researchers.
• Strategy 2. Collaborate with one of the most influential researchers and close collaborators of his.

In a network where the most influential persons are in different connected components or weakly connected components, the two strategies above are completely different. In the Erdos network, however, they are nearly the same. We design the following experiment to compare their performance. We first add a new node s to the original Erdos network, representing a new researcher who has just entered the network.
Due to limits of time and resources, this researcher is allowed to collaborate with T other researchers. Different strategies for adding edges to other nodes yield different graphs, and we compute the influence coefficient of the new node in each. It is also necessary to include a trivial baseline in which the newcomer simply chooses his collaborators at random. To be specific, the three strategies are:

• Strategy 1. The new node links to the T most influential nodes of the original graph.
• Strategy 2. The new node links to the most influential node and to its (T − 1) most influential neighbors in the original graph.
• Strategy 3. The new node chooses T nodes of the original graph uniformly at random as its neighbors.

Since strategy 3 is non-deterministic, we repeat the experiment 100 times and take the mean influence measurement. The results for different T are given below; each entry is the average ranking of the newcomer.

T    Strategy 1   Strategy 2   Random Strategy
6    133.0000     133.0000     234.5000
9    92.0000      92.0000      169.0200
12   70.0000      71.0000      129.1100
18   47.0000      48.0000      86.2000

Table 8: Average ranking of the newcomer under the three strategies

5.2.2 Analysis and Comments

It can be observed that strategies 1 and 2 do not give the newcomer far larger influence than the random strategy. In the Erdos network the most influential nodes are adjacent, and thus strategies 1 and 2 have approximately the same performance; as the number of new edges increases, strategy 1 begins to take the lead. We can conclude from the experiment that it is beneficial to choose collaborators according to how influential they are. There is, however, a flaw in our reasoning: we use an algorithm based on a certain influence measure to compare the performance of strategies 1 and 2, while the main idea of strategies 1 and 2 is to choose influential collaborators according to the same influence measure. It is to some degree circular.
However, this is reasonable as long as the influence measure is good, which has been argued in the previous sections. As indicated by this experiment, it is indeed reasonable and beneficial to use network analysis to lift one's influence: collaborating with the most influential researchers in a particular field is helpful.

6 Sensitivity Analysis, Strength and Weakness

6.1 Sensitivity Analysis

6.1.1 Experiment and Results

Our sensitivity analysis consists of three parts. In each part we add a perturbation to a parameter or to the structure, and observe the change in the influence vector. It turns out that our model is stable, and thus robust.

• Sensitivity to an Extra Vertex. In the first part we add a new vertex to the Erdos1 network, together with 6 new edges whose other endpoints are chosen uniformly at random from the original 511 nodes (6 is the average number of edges per vertex in the original graph). For the new graph G we compute the influence using our model and algorithm, and rank the researchers by their influence coefficients. We then measure the difference between this ranking and the original one. To be specific, some notation is in order. Let t1, · · · , tn be the previous ranking of the Erdos network, i.e., ti is the researcher ranked i-th. In the perturbed network, let q(t) be the new rank of researcher t. A measurement of the difference between the two rankings is:

d(m) = (1/m) ∑_{i=1}^{m} |q(ti) − i|

This measurement mainly captures the difference among the top m researchers, which is what matters most to an observer.

m    E[d(m)]   Max d(m)   Min d(m)
30   0.0200    0.2000     0.0000
100  0.0485    0.2000     0.0000
511  1.2638    1.8885     0.6732

Table 9: Sensitivity to an Extra Vertex

m    E[d(m)]   Max d(m)   Min d(m)
30   3.5268    4.7515     2.3875
100  3.5470    4.9393     2.3209
511  3.5267    4.8611     2.1331

Table 10: Sensitivity to Extra Edges
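The measure d(m) defined above is straightforward to compute from the two rankings (a sketch; the function and variable names are ours, with `q` mapping a researcher to his rank in the perturbed network):

```python
def ranking_difference(old_order, q, m):
    """d(m) = (1/m) * sum_{i=1}^{m} |q(t_i) - i|, where t_i is the researcher
    ranked i-th originally and q(t) is t's rank after the perturbation."""
    return sum(abs(q[t] - i) for i, t in enumerate(old_order[:m], start=1)) / m

# toy example: three researchers; after the perturbation b and c swap places
old_order = ["a", "b", "c"]
q = {"a": 1, "b": 3, "c": 2}
print(ranking_difference(old_order, q, 3))  # (|1-1| + |3-2| + |2-3|) / 3 = 2/3
```

Averaging d(m) over repeated random perturbations then yields the E[d(m)], Max and Min columns reported in the tables.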
We repeat the experiment 100 times (in each experiment we randomly draw 6 researchers out of the 511 to be the neighbors of the new node) and compute the mean, maximum and minimum of d(m) for m = 30, 100, 511. The results are listed in Table 9.

• Sensitivity to Extra Edges. In the second part of the sensitivity analysis we do not change the vertex set of G. Instead we randomly add edges to the network, with both endpoints of every new edge chosen uniformly at random from the 511 nodes. If there is already an edge between a chosen pair, we simply increase its weight by 1 unit (Aij += 1). We add a total of 30 edges (approximately 2 percent of the original number of edges) and then measure the difference of the influence vector using the same methodology as in part 1. After repeating the experiment 100 times, the results are given in Table 10.

• Sensitivity to Perturbation of the Parameter. In the third part we simply perturb the value of the parameter β. The experiment is done in the following cases.

m    E[d(m)]   Var[d(m)]        m    E[d(m)]   Var[d(m)]
30   2.7667    9.7023           30   1.4667    2.5333
100  7.8900    70.3413          100  13.9693   3.5200
511  17.7691   337.8211         511  7.9726    78.3679
(a) β: 0.01 → 0.015             (b) β: 0.01 → 0.02

Table 11: Sensitivity in β

m    E[d(m)]   Var[d(m)]        m    E[d(m)]   Var[d(m)]
30   343.7667  18137            30   355.2000  22611
100  276.2700  21175            100  279.7000  24773
511  175.3933  20284            511  179.6830  18864
(a) β: 0.30 → 0.31              (b) β: 0.40 → 0.41

Table 12: Sensitivity in β

6.1.2 Analysis and Comments

From the experiments in the previous section we draw further analysis and comments on the sensitivity results.

• When β does not change, small changes to the structure of the graph cannot cause a large difference in the influence measurement, i.e., our model is not sensitive to perturbations of the network structure.
• When β does not change, the smaller m is, the smaller d(m) is; that is, the smaller top group is more stable under perturbations of G's structure.
• Sensitivity to the parameter β is a little more complex.
If the variation of β stays within the reasonable region (all β in this region keep the series ∑_{k=0}^{∞} β^k R^{k+1} convergent), a small perturbation of β brings only a small change in the influence measurement. However, when the variation of β is dangerous, i.e., some value of β makes the series ∑_{k=0}^{∞} β^k R^{k+1} divergent, the model gives very unstable output, as indicated by our experimental results. Perhaps Bonacich should add one more condition to his methodology: β should be less than the reciprocal of the spectral radius of R, since a large β destroys the stability of the model.

6.2 Strength and Weakness

The strengths of our model have been discussed several times in the previous sections. We summarize them here:

• Originality. We propose a model that has not appeared before. The most influential previous works on modeling collaboration networks are Bonacich's influence measures and the pagerank algorithm, which follows Bonacich's model. We modify this model, changing the trivial relationship matrix R into a more specific matrix M that takes both real influence and virtual influence into account. This change is based on reasonable hypotheses and on the main feature of a researchers' network.

• Characterizing the Features of Particular Networks. Two main changes are made to the pagerank model, based on the following crucial observations about the network. First, the uniformity of the probability of transmitting a message is dropped: we replace the relationship matrix by a weighted influence matrix, computed from a matrix equation arising from a well-defined random process. Second, the pagerank model only allows individuals to send the message to their neighbors; our model removes this restriction, since the Global Spread phase allows messages to reach non-neighbors. In the analyses of film actors and fundamental papers, crucial observations likewise lead to improved models and thus better performance.
• Mathematical Simulation. Random processes are a powerful tool for analyzing abstract definitions on graphs. Specifically, when we want to measure some kind of ability, we can define a certain "amount of information" and use a random process to carry out the computation. Random processes are also well studied in theory; methods such as Brownian motion and Markov processes have been applied successfully in industry.

Our model also has some weaknesses, analyzed below.

• Time Complexity. Recall that our main equation is:

α2 V² + (α1 R − I)V + I = 0

Given the adjacency matrix A we can find R immediately. However, solving for V is hard, since no efficient analytical techniques are known for matrix equations of high degree; from the perspective of theoretical computer science the problem is NP-hard.

• No Guarantee of a Good Approximate Solution for V. For data analysis we can only use approximation methods to obtain a pseudo-solution, via the fixed-point map

f(V) = α2 V² + α1 RV + I

and the recurrence method specified in the model section. However, f is not a contractive mapping for all V. To obtain convergence of the recurrence, α1 and α2 need to be small enough, which reduces the flexibility of the model; and even for small α1 and α2 the recurrence can still produce a divergent sequence.

These two mathematical flaws are important. Further work on solving matrix equations analytically, or on obtaining approximate solutions, is needed.

References

[1] Phillip Bonacich. Power and centrality: A family of measures. American Journal of Sociology, pages 1170–1182, 1987.
[2] Mark E. J. Newman. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical Review E, 64(1):016132, 2001.
[3] Mark E. J. Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98(2):404–409, 2001.
Variants of Prophet Inequalities

Zihan Tan; Fu Li

1 Introduction

The classic prophet inequality problem is the following: given a class C of sequences of random variables X = (X1, X2, · · · ), find universal inequalities, valid for all X in C, that compare the expected supremum of the sequence with its optimal stopping value. To be specific, let M denote the expected supremum:

M = M(X) = E[sup_n Xn]

and let V denote the optimal stopping value (over the set T = T(X) of stopping rules for X):

V = V(X) = sup_{t∈T} E[Xt]

If the random variables are independent and take only non-negative values, there is a celebrated result:

V ≤ M ≤ 2V

and this bound is known to be tight. In this research report, we study two variants of this problem (mentioned in the survey [1]), with time-average payoff and time-discounted payoff respectively.

Problem 1. If Y1, · · · , Yn are i.i.d. r.v.'s taking values in [0, 1], and Xj = (1/j) ∑_{i=1}^{j} Yi, then what is the advantage of M over V? In other words, find the minimal value k such that V ≤ M ≤ kV.

Problem 2. If Y1, · · · , Yn are i.i.d. r.v.'s taking values in [0, 1], 0 < α < 1 is a discount factor, and Xj = ∑_{i=1}^{j} α^{j−i} Yi, then what is the advantage of M over V? In other words, find the minimal value k such that V ≤ M ≤ kV.

In the following sections we present our results for these two problems. For each problem, we first analyze the case n = 2 and find the optimal value of k; we also explain why it is hard to find the value of k for n ≥ 3, since the distribution can be arbitrary, and with some additional argument we conjecture that the optimal advantage k for n = 2 is exactly the optimal advantage for arbitrary n. We then focus on the best stopping rule when the distribution of the Yi's is given. Finally, we do some computation for the uniform case, i.e., when the Yi's are uniformly distributed in [0, 1].
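For the classical setting above, both M and V are easy to estimate by simulation (a sketch for n = 2 i.i.d. uniform variables, our illustrative choice; the optimal stopping rule here is the threshold E[X2] = 1/2):

```python
import random

random.seed(0)
N = 200_000
m_sum = v_sum = 0.0
for _ in range(N):
    x1, x2 = random.random(), random.random()
    m_sum += max(x1, x2)              # prophet: takes the supremum
    v_sum += x1 if x1 >= 0.5 else x2  # gambler: stop at X1 iff X1 >= E[X2]
M, V = m_sum / N, v_sum / N
print(M, V)  # ≈ 2/3 and ≈ 5/8, so V <= M <= 2V holds with room to spare
```

The uniform distribution is far from the worst case; the ratio M/V approaches 2 only for extremal (near two-point) distributions.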
2 Time-Average Payoff

2.1 Analysis for 2 Random Variables

Assume n = 2 and let p(·) be the probability distribution of Y1 and Y2. The best stopping rule is then a threshold strategy. To be specific, we have the following proposition.

Proposition 1 (Threshold Strategy). The best stopping rule has the following form: probe Y1; if it is larger than a value T (fixed in advance with respect to p(·); in fact T = E[Y]), stop at Y1, otherwise continue to probe Y2.

Proof. Any stopping rule can be written as a (possibly randomized) map f : [0, 1] → {0, 1}, whose argument is the observed value of Y1: after observing Y1, the rule decides whether to continue to probe Y2. Here f(x) = 0 indicates stopping at Y1, while f(x) = 1 indicates probing Y2. A randomized map can be written as a distribution over deterministic maps, so it suffices to show that the threshold strategy is the best among all deterministic strategies. In the following, let f be a deterministic map and let E = E[Y]. Then

E[payoff of f] = ∫_{y: f(y)=0} y p(y) dy + ∫_{y: f(y)=1} (1/2)(y + E) p(y) dy

Thus, if y ≥ (1/2)(y + E), i.e., y ≥ E, we should set f(y) = 0, and otherwise f(y) = 1. This completes the proof.

We state our result for the case n = 2 as the following theorem.

Theorem 1. If Y1, Y2 are i.i.d. r.v.'s taking values in [0, 1], and Xj = (1/j) ∑_{i=1}^{j} Yi for j = 1, 2, then

V ≤ M ≤ (6 − 2√6)V

Proof. From the proposition on the best stopping rule, we obtain the following expression for V:

V = ∫_E^1 y p(y) dy + ∫_0^E (1/2)(y + E) p(y) dy = (1/2)E + (1/2)∫_E^1 y p(y) dy + (1/2)E ∫_0^E p(y) dy

For the stopping rule with complete foresight, it is clear that it should stop at Y1 if y1 ≥ y2, and continue to probe Y2 otherwise.
Since the probability of y1 ≥ y2 is 1/2 (this holds for every continuous p(·), and also, with some additional argument, when the distribution is discrete), we have the following expression for M:

M = (1/2)E + (1/2)E[max{y1, y2}]

From [1] we have the following dilation lemma.

Lemma 1 (Dilation Lemma). Let X be any integrable r.v. and −∞ < a < b < +∞. Let (X)_a^b be the r.v. satisfying (X)_a^b = X if X ∉ [a, b], (X)_a^b = a with probability (b − a)⁻¹ ∫_{X∈[a,b]} (b − X) dP(x), and (X)_a^b = b with probability (b − a)⁻¹ ∫_{X∈[a,b]} (X − a) dP(x). This (X)_a^b is called the dilation of X on the interval [a, b], and the following two properties hold:
(1) E[X] = E[(X)_a^b].
(2) If Y is any r.v. independent of both X and (X)_a^b, then E[max{X, Y}] ≤ E[max{(X)_a^b, Y}].

We want to find the maximal advantage ratio sup_{p(·)} M(Y1, Y2)/V(Y1, Y2), and V depends only on the three values E, ∫_E^1 y p(y) dy and ∫_0^E p(y) dy; hence we may restrict attention to distributions p(·) that maximize the ratio with these three values fixed. By the dilation lemma, such a near-optimal p(·) can be taken to be discrete and of the following form (ϵ below is a sufficiently small positive value, introduced only so that the resulting distribution has the same three values as before; it is omitted in the computation):

p(0) = a; p(E − ϵ) = b; p(E + ϵ) = c; p(1) = d; a + b + c + d = 1; (b + c)E + d = E

It remains to express the maximal advantage ratio as a function of a, b, c, d. Since a + b + c + d = 1 and (b + c)E + d = E, we have E = d/(a + d). Then

E[max{Y1, Y2}] = [1 − (1 − d)²] · 1 + [2(b + c)(1 − d) − (b + c)²] · E = (2 − d)d + (1 − a − d)(1 − d + a)E

V = (1/2)E + (1/2)(cE + d) + (1/2)E(a + b) = (1/2)E(2 − d) + (1/2)d

ratio = M/V = [(1/2)E + (1/2)((2 − d)d + (1 − a − d)(1 − d + a)E)] / [(1/2)E(2 − d) + (1/2)d] = 1 + a(1 − a − d)/(2 + a)

It remains to maximize a(1 − a − d)/(2 + a) subject to a, d ≥ 0 and a + d ≤ 1.
It is clear that we should set d = 0. Letting t = a + 2,

    a(1 − a − d)/(2 + a) = (−t^2 + 5t − 6)/t = 5 − (t + 6/t) ≤ 5 − 2√6,

and it can be seen that a, b, c, d can be chosen so that the ratio is arbitrarily close to 6 − 2√6. Thus k = 6 − 2√6, which completes the proof.

Remark 1. This methodology does not work for the case n ≥ 3, since the expression for M is not clean and is therefore hard to analyze when the distribution p(·) is arbitrary.

By the weak law of large numbers, as n goes to infinity X_n is highly concentrated around E, and the speed of convergence is given by Chernoff's bound. Thus, if a strategy may stop at a large index with certain probability, its expected payoff should be close to E. For example, the following strategy gives expected payoff approximately E: if X_i ≤ E, continue to probe Y_{i+1}; if X_i > E, stop at Y_i.

Let k_n be the maximal ratio when the number of Y's is n. It can be observed that the maximal ratio should not be sensitive to n when n is sufficiently large, and it should converge to some value. On the other hand, it is intuitive that the more random variables we have, the less advantage the prophet has: the payoff is the average of the first t values, and an additional random variable gives the prophet less advantage when n is large than when n is small. Therefore the maximum of the sequence {k_n}_{n=1}^{+∞} should be attained among its first few terms, and the sequence should converge. Based on this observation, we propose the following conjecture.

Conjecture 1. If Y1, Y2, · · · are i.i.d. random variables taking values in [0, 1] and X_j = (1/j) ∑_{i=1}^j Y_i, then V ≤ M ≤ (6 − 2√6)V, i.e. k_n ≤ 6 − 2√6 for all n ≥ 3.

2.2 Best Strategy

Since it is hard to determine the optimal advantage ratio when n ≥ 3, we turn to the question of what the best stopping rule should be.
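The maximization step above can be confirmed numerically (our own check): with d = 0, the quantity a(1 − a)/(2 + a) attains its maximum 5 − 2√6 at a = √6 − 2, so the advantage ratio approaches 6 − 2√6.

```python
import math

# Grid-search max_{0 <= a <= 1} a(1-a)/(2+a), the d = 0 case from the text.
best = max(a * (1 - a) / (2 + a)
           for a in (i / 10**6 for i in range(10**6 + 1)))
assert abs(best - (5 - 2 * math.sqrt(6))) < 1e-9

# The maximizer a* = sqrt(6) - 2 gives ratio exactly 6 - 2*sqrt(6).
a_star = math.sqrt(6) - 2
assert abs(1 + a_star * (1 - a_star) / (2 + a_star) - (6 - 2 * math.sqrt(6))) < 1e-12
```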
In the case n = 2 we proved that the best strategy is a threshold strategy with threshold equal to E[Y]; in this section we prove similar results for the case n ≥ 3.

First we give a formal definition of a strategy. A strategy is a (possibly randomized) map f : [0, 1]^n → [n]: f(y1, · · · , yn) = t means that upon receiving the realized sequence y1, · · · , yn, the strategy stops at y_t. Since the strategy has no foresight, if it is deterministic then the following property must be satisfied:

    if f(y1, · · · , yn) = t, then for all y'_{t+1}, · · · , y'_n ∈ [0, 1], f(y1, · · · , yt, y'_{t+1}, · · · , y'_n) = t,

which means that whether f(Y) takes value t depends only on the first t coordinates of Y. Thus we can define functions g_t : [0, 1]^t → {0, 1}, where g_t(y1, · · · , yt) = 0 means the strategy stops at Y_t and g_t(y1, · · · , yt) = 1 means it goes on to probe Y_{t+1}. Note that g_t is defined only on the sequences (y1, · · · , yt) with g_i(y1, · · · , yi) = 1 for all 0 < i < t. It is clear that f is equivalent to the family {g_i}_{i=1}^n.

Since a randomized map is a probability distribution over deterministic maps, it suffices to determine the best deterministic stopping rule. We have the following propositions for the best strategy. In the following we suppose the distribution of Y_i is continuous; the propositions also hold, with additional specification, when the distribution is discrete.

Proposition 2. (Threshold Strategy) In every round, f should be a threshold strategy. To be specific, if (y1, · · · , yt) satisfies g_i(y1, · · · , yi) = 1 for all 0 < i < t, then there exists a threshold T_t such that g_t(y1, · · · , yt) = 1_{y_t ≥ T_t}, where 1_{y_t ≥ T_t} is the indicator of the event y_t ≥ T_t.

The proof is straightforward and similar to that of Proposition 1.

Proposition 3.
(Uniform Threshold Strategy) f should be a uniform threshold strategy. To be specific, there exist T1, · · · , T_{n−1} such that for all (y1, · · · , yt) satisfying g_i(y1, · · · , yi) = 1 for all 0 < i < t,

    g_t(y1, · · · , yt) = 1_{y_t ≥ T_t − ∑_{i=1}^{t−1} y_i},

where 1_E is the indicator of the event E. In other words, the strategy stops at Y_t exactly when the running sum ∑_{i=1}^t y_i reaches the threshold T_t.

The proof is straightforward and similar to that of Proposition 1.

Proposition 4. (Decreasing Thresholds) The uniform thresholds T1, · · · , T_{n−1} satisfy the following decreasing property:

    T1 ≥ T2/2 ≥ T3/3 ≥ · · · ≥ T_{n−1}/(n − 1) = E[Y].

2.3 Analysis for Uniform Distribution

To obtain more properties of the problem, we continue our exploration by restricting the distribution of every random variable Y_i to be the uniform distribution on [0, 1]. Concerning n random variables Y1, · · · , Yn, let f : [0, 1]^n → [n] be a strategy without foresight. If Y1, · · · , Yn take the values y1, · · · , yn, denote by U_f(y1, . . . , yn) the value at which the strategy f stops, namely

    U_f(y1, . . . , yn) = (∑_{i=1}^{f(y1,...,yn)} y_i) / f(y1, . . . , yn).

We call U_f(y1, . . . , yn) the utility of the strategy f on y1, . . . , yn. In the following we calculate max_f E_{Y1,··· ,Yn} U_f(y1, . . . , yn), the maximum utility that a strategy without foresight can reach. By the previous subsection, the best strategy without foresight is described by thresholds T1, · · · , Tn. Given these thresholds, to decide whether the strategy should stop at Y_i it is enough to know whether the sum of the values of Y1, · · · , Y_i is beyond the threshold. Therefore, after Y1, · · · , Y_{i−1} have taken their values, the maximum utility of the strategy depends only on the sum of the value of Y_i and the already-fixed values of Y1, · · · , Y_{i−1}. Hence, for i ∈ [n], let U_i be the function computing the maximum utility when the values of Y1, · · · , Y_i are already given.
Then U_i can be expressed in terms of the single variable t_i representing the sum of Y1, · · · , Y_i. So U_i : [0, i] → [0, 1] is defined as follows: for 1 ≤ i < n,

    U_i(t_i) = t_i/i                                                          if t_i ≥ T_i;
    U_i(t_i) = E_{Y_{i+1}}[U_{i+1}(t_i + Y_{i+1})] = ∫_{t_i}^{t_i+1} U_{i+1}(y) dy    if t_i < T_i,

and for i = n, U_n(t_n) = t_n/n. We can prove the following.

Lemma 2. max_f E_{Y1,··· ,Yn} U_f(y1, . . . , yn) = E_{t_1} U_1(t_1).

Proof. Similar to Proposition 1.

Now U_n, U_{n−1}, · · · , U_1 are all fixed and can be computed sequentially, but it is too complex to write the precise representation for general n. Therefore we only discuss n = 3 and n = 4 to begin with.

When n = 3,

    U_3(t_3) = t_3/3;
    U_2(t_2) = t_2/2 if t_2 ≥ 1;  U_2(t_2) = ∫_{t_2}^{t_2+1} (y/3) dy = (2t_2 + 1)/6 if t_2 < 1,

and T_2 = 1. Further,

    U_1(t_1) = t_1 if t_1 ≥ T_1;
    U_1(t_1) = ∫_{t_1}^{t_1+1} U_2(y) dy = ∫_{t_1}^{1} (2y + 1)/6 dy + ∫_{1}^{1+t_1} (y/2) dy = (1/12)t_1^2 + (1/3)t_1 + 1/3 if t_1 < T_1.

Thus T_1 is the root of t_1 = (1/12)t_1^2 + (1/3)t_1 + 1/3, namely T_1 = 4 − 2√3 ≈ 0.5359.

When n = 4,

    U_4(t_4) = t_4/4;
    U_3(t_3) = t_3/3 if t_3 ≥ 1.5;  U_3(t_3) = ∫_{t_3}^{t_3+1} U_4(y) dy = (2t_3 + 1)/8 if t_3 < 1.5;

    U_2(t_2) = t_2/2                                                                     if t_2 ≥ T_2;
    U_2(t_2) = ∫_{t_2}^{1.5} (2y + 1)/8 dy + ∫_{1.5}^{t_2+1} (y/3) dy = t_2^2/24 + 5t_2/24 + 25/96   if 0.5 ≤ t_2 < T_2;
    U_2(t_2) = ∫_{t_2}^{t_2+1} (2y + 1)/8 dy = (t_2 + 1)/4                               if t_2 < 0.5.

Thus T_2 = 3.5 − √6 ≈ 1.0505.

[Figure 1: The green line is y/2 and the blue line is the quadratic; their intersection point is 3.5 − √6 ≈ 1.05. See the figure for the graph of the function above.]

Then

    U_1(t_1) = t_1 if t_1 ≥ T_1;  U_1(t_1) = ∫_{t_1}^{t_1+1} U_2(y) dy = F_{U2}(t_1 + 1) − F_{U2}(t_1) if t_1 < T_1,

where F_{U2} = ∫ U_2(y) dy is the piecewise antiderivative

    F_{U2}(y) = 0.125y^2 + 0.25y                                            for 0 < y ≤ 0.5;
    F_{U2}(y) = 0.0138889y^3 + 0.104167y^2 + 0.260417y − 0.00173611         for 0.5 < y ≤ 1.05051;
    F_{U2}(y) = 0.25y^2 + 0.126998                                          for 1.05051 < y ≤ 2.

Thus T_1 is the root of t_1 = F_{U2}(t_1 + 1) − F_{U2}(t_1), and numerically we obtain T_1 = 0.553772.
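The n = 3 and n = 4 thresholds above can be cross-checked by carrying out the backward induction numerically. This is a sketch of our own; the grid resolution and the function name `thresholds` are ours:

```python
import math

def thresholds(n, spu=4000):
    """Backward induction for the time-average uniform case:
    U_i(t) = t/i if t >= T_i, else integral_t^{t+1} U_{i+1}(y) dy,
    with T_i the first point where stopping beats continuing."""
    h = 1.0 / spu
    m = n * spu                          # grid over [0, n]
    grid = [j * h for j in range(m + 1)]
    U = [t / n for t in grid]            # U_n(t) = t/n
    T = {}
    for i in range(n - 1, 0, -1):
        F = [0.0] * (m + 1)              # prefix trapezoidal integral of U
        for j in range(1, m + 1):
            F[j] = F[j - 1] + 0.5 * h * (U[j - 1] + U[j])
        cont = lambda j: F[j + spu] - F[j]   # integral over [t_j, t_j + 1]
        Ti = next(grid[j] for j in range(m - spu + 1) if grid[j] / i >= cont(j))
        T[i] = Ti
        U = [grid[j] / i if grid[j] >= Ti else cont(j) for j in range(m + 1)]
    return T

T3 = thresholds(3)
assert abs(T3[2] - 1.0) < 1e-3
assert abs(T3[1] - (4 - 2 * math.sqrt(3))) < 1e-3

T4 = thresholds(4)
assert abs(T4[2] - (3.5 - math.sqrt(6))) < 1e-3
assert abs(T4[1] - 0.553772) < 1e-3
```

The script reproduces T_2 = 1, T_1 = 4 − 2√3 for n = 3 and T_2 = 3.5 − √6, T_1 ≈ 0.553772 for n = 4 up to grid resolution.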
3 Time-Discount Payoff

In this section, we study another payoff function for this problem (mentioned in the survey [1]).

Problem 3. If Y1, · · · , Yn are i.i.d. random variables taking values in [0, 1], 0 < α < 1 is the discount factor, and X_j = ∑_{i=1}^j α^{j−i} Y_i, what is the advantage of M over V ? In other words, find the minimal value k such that

    V ≤ M ≤ kV.

In the following subsections we show our results for this problem. In subsection 3.1 we analyze the case n = 2 and find the optimal value of k; we also show that it is hard to find the value of k for n ≥ 3 since the distribution can be arbitrary. In subsection 3.2 we analyze the infinite-stage game for this payoff function, and we also consider the best stopping rule when the distribution of the Y_i's is given. In subsection 3.3 we carry out computations for the uniform-distribution case, i.e. when the Y_i's are uniformly distributed on [0, 1].

3.1 Analysis for 2 Random Variables

Assume n = 2 and let p(·) be the probability density of Y1 and Y2. Then the best stopping rule is a threshold strategy. To be specific, we have the following proposition.

Proposition 5. (Threshold Strategy) The best stopping rule is of the following form: let E be the expectation of Y_i. Probe Y1; if it is larger than E/(1 − α), stop at Y1, and otherwise continue to probe Y2.

Proof. Since the expectation of Y2 is E, the expected payoff of continuing to probe Y2 is αY1 + E. Thus it is better to continue probing Y2 if and only if αY1 + E ≥ Y1, which is equivalent to Y1 ≤ E/(1 − α).

We formally state our result for the n = 2 case as the following theorem.

Theorem 2. If Y1, Y2 are i.i.d. random variables taking values in [0, 1] and X_j = ∑_{i=1}^j α^{j−i} Y_i for j = 1, 2, then

    V ≤ M ≤ (2/(1 + α)) V.

Proof. From the proposition above for the best stopping rule, we obtain the following expression for V.
V = ∫_{E/(1−α)}^1 y p(y) dy + ∫_0^{E/(1−α)} (αy + E) p(y) dy = α ∫_0^{E/(1−α)} y p(y) dy + E ∫_0^{E/(1−α)} p(y) dy + ∫_{E/(1−α)}^1 y p(y) dy.

For the stopping rule with complete foresight, it clearly stops at Y1 if y1 ≥ αy1 + y2 and continues to probe Y2 otherwise. Thus we have the following expression for M:

    M = E[max{αY1 + Y2, Y1}] = E[αY1] + E[max{Y2, (1 − α)Y1}].

We want to find the maximal advantage ratio sup_{p(·)} M(Y1, Y2)/V(Y1, Y2), and V depends only on the three values E, ∫_{E/(1−α)}^1 y p(y) dy and ∫_0^{E/(1−α)} p(y) dy. Hence we need only focus on the distributions p(·) that maximize the ratio when these three values are fixed. By the dilation lemma, such a near-optimal p(·) is discrete and of the following form. If E/(1 − α) < 1, then

    p(0) = a;  p(E/(1 − α) − ϵ) = b;  p(E/(1 − α) + ϵ) = c;  p(1) = d;
    a + b + c + d = 1;  (b + c) E/(1 − α) + d = E.

If E/(1 − α) ≥ 1, then p(0) = a, p(1) = 1 − a. Since a + b + c + d = 1 and (b + c)E/(1 − α) + d = E, we have E = d(1 − α)/(a + d − α). It is also necessary to compare (1 − α) with E/(1 − α), since we are taking the maximum of two random variables. Thus it remains to determine the maximal advantage ratio as a function of a, b, c, d in each of the following cases.

Case 1: E/(1 − α) ≥ 1. The best strategy is to always probe both Y1 and Y2, and E = 1 − a. Then

    V = ∫_0^1 (αy + E) p(y) dy = (1 + α)E,
    M = a^2 · 0 + 2(1 − a)a · 1 + (1 − a)^2 · (1 + α) = (1 − a)(1 + α)E + 2aE,
    M/V = ((1 − α)a + (1 + α)) / (1 + α).

Letting a → 1, we find M/V → 2/(1 + α).

Case 2: (1 − α) < E/(1 − α) < 1. With the atoms as above,

    V = αE + (a + b)E + (1 − α)(c · E/(1 − α) + d) = (1 − α)d (1 + a)/(a + d − α),

and a > α. Also

    M = αE + E[max{Y2, (1 − α)Y1}] = αE + d + (1 − a − d) E/(1 − α) + a(1 − α)E = (1 + α + (1 − α)a)(1 − α)d/(a + d − α),

so

    M/V = 1 − α + 2α/(1 + a).

Letting a → α, the ratio goes to 1 − α + 2α/(1 + α).
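The Case 2 formulas can be sanity-checked numerically. This is our own check; the masses below are one admissible choice satisfying a > α and 1 − α < E/(1 − α) < 1:

```python
from itertools import product

alpha = 0.3
# Case 2 atoms (eps -> 0): p(0)=a, p(E/(1-alpha))=b+c, p(1)=d,
# where E = d(1-alpha)/(a+d-alpha) enforces the mean constraint.
a, b, c, d = 0.35, 0.10, 0.05, 0.50
E = d * (1 - alpha) / (a + d - alpha)
T = E / (1 - alpha)
assert alpha < a and 1 - alpha < T < 1

atoms = [(0.0, a), (T, b + c), (1.0, d)]

# Prophet: M = E[max{alpha*Y1 + Y2, Y1}], brute-forced over atom pairs.
M = sum(p1 * p2 * max(alpha * y1 + y2, y1)
        for (y1, p1), (y2, p2) in product(atoms, atoms))

# Gambler: stop iff Y1 > T; the atom T-eps (mass b) continues and the atom
# T+eps (mass c) stops (both give value T, since T is the indifference point).
V = a * (alpha * 0.0 + E) + b * (alpha * T + E) + c * T + d * 1.0

assert abs(V - (1 - alpha) * d * (1 + a) / (a + d - alpha)) < 1e-12
assert abs(M / V - (1 - alpha + 2 * alpha / (1 + a))) < 1e-12
```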
Case 3: E/(1 − α) ≤ 1 − α. Here

    V = αE + (a + b)E + (1 − α)(c · E/(1 − α) + d) = (1 − α)d (1 + a)/(a + d − α),
    M = αE + E[max{Y2, (1 − α)Y1}] = αE + d + (1 − a − d)((E/(1 − α))(1 − d) + d(1 − α)) + a(1 − α)E.

Through tedious computation we found that this case does not yield a better bound than the previous two cases. Since 2/(1 + α) > 1 − α + 2α/(1 + α), the best bound for the advantage ratio is 2/(1 + α).

Remark 2. This methodology does not work for the case n ≥ 3, since the expression for V is not clean (we would need to discuss 2^n different cases, each with a completely different formula) and is therefore hard to analyze when the distribution p(·) is arbitrary.

3.2 Infinite-Stage Game and Best Strategy

3.2.1 Infinite-Stage Game

Consider the infinite-stage version of the problem: we are given infinitely many independent identically distributed random variables and are allowed to stop whenever we want. We have the following proposition for the best strategy.

Proposition 6. The best strategy is a uniform threshold strategy: whenever your current payoff is higher than a previously fixed T∞, you should stop. Moreover, T∞ > E/(1 − α).

Proof. It is easy to see that the best strategy is a threshold strategy. To show that it is uniform, it suffices to note that in every round, if your current payoff is X, then after another probe your payoff would be αX + Y, and there are infinitely many probes ahead; this is independent of the number of probes already made. Thus the threshold is uniform. To see that T∞ is strictly larger than E/(1 − α), note that the payoff of the strategy "do not stop now" is at least αX + E; if this exceeds X, then of course you should not stop here.

3.2.2 Best Strategy for Finite-Stage Game

Since it is hard to determine the optimal advantage ratio when n ≥ 3, we turn to the question of what the best stopping rule should be. In the case n = 2 we proved that the best strategy is a threshold strategy with threshold E/(1 − α); in this section we prove similar results for the case n ≥ 3.
Since a randomized map is a probability distribution over deterministic maps, it suffices to determine the best deterministic stopping rule. We have the following propositions for the best strategy. In the following we suppose the distribution of Y_i is continuous; the propositions also hold, with additional specification, when the distribution is discrete.

Proposition 7. (Threshold Strategy) In every round, the player should play a threshold strategy.

The proof is straightforward and similar to that of Proposition 1. We also state the following proposition without proof, since it is easy to verify.

Proposition 8. (Decreasing Thresholds) The thresholds T1, · · · , T_{n−1} satisfy the following decreasing property:

    T∞ > T1 > T2 > T3 > · · · > T_{n−1} = E/(1 − α).

3.2.3 Generality of Threshold Strategies

For this kind of stage-style probing game, it is very likely that the best strategy is a threshold strategy. More specifically, we have the following natural proposition.

Proposition 9. In every round let y_i be your probed value, and suppose you stop at y_n. If the payoff function f(y1, · · · , yn) is increasing in each of its coordinates, then the best strategy is a threshold strategy.

3.3 Analysis for Uniform Distribution

In this situation we calculate the thresholds precisely. Let f : [0, 1]^n → [n] be a strategy. By the previous subsection, f can be described by n thresholds T1, . . . , Tn. Define n piecewise functions U1, . . . , Un, where each U_i represents the expected payoff of playing according to the threshold T_i on the given input Y1, . . . , Y_i. By the discussion above, the best strategy f is determined by the optimal thresholds T1, . . . , Tn, and each optimal threshold T_i is also locally optimal over the possible sequences Y1, . . . , Y_i. Therefore T_i should make U_i reach its maximum, namely T_i = arg max_{T_i} U_i. Note that once T_i is fixed, we can represent U_i exactly.
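This threshold computation (made precise as Algorithm 1 below) can be prototyped for uniform Y_i. The discretization, the bisection, and the name `discounted_thresholds` are our own choices; for α = 0.3 the sketch reproduces T_2 = 1/(2(1 − α)) and the closed form for T_1 derived below:

```python
import math

def discounted_thresholds(n, alpha, steps=20000):
    """Backward induction for uniform Y_i on [0,1]:
    U(X) <- X if X >= T_i, else integral_{alpha X}^{alpha X + 1} U(y) dy,
    with T_i the root of X = integral_{alpha X}^{alpha X + 1} U(y) dy."""
    xmax = 1.0 / (1.0 - alpha)          # payoffs never exceed 1/(1-alpha)
    h = xmax / steps
    grid = [j * h for j in range(steps + 1)]
    U = list(grid)                      # U_n(X) = X
    T = {}
    for i in range(n - 1, 0, -1):
        F = [0.0] * (steps + 1)         # prefix trapezoidal integral of U
        for j in range(1, steps + 1):
            F[j] = F[j - 1] + 0.5 * h * (U[j - 1] + U[j])
        def Fat(x):                     # linear interpolation of F
            t = min(max(x / h, 0.0), steps - 1e-9)
            k = int(t)
            return F[k] + (t - k) * (F[k + 1] - F[k])
        cont = lambda x: Fat(alpha * x + 1.0) - Fat(alpha * x)
        lo, hi = 0.0, 1.0               # bisection for the root of x = cont(x)
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if mid < cont(mid) else (lo, mid)
        T[i] = 0.5 * (lo + hi)
        U = [x if x >= T[i] else cont(x) for x in grid]
    return T

alpha = 0.3
T = discounted_thresholds(3, alpha)
assert abs(T[2] - 1.0 / (2 * (1 - alpha))) < 1e-3

rad = alpha**5 - 3*alpha**4 + 2*alpha**3 + 2*alpha**2 - 3*alpha + 1
T1_closed = ((2 - 3*alpha + alpha**2 - 2*math.sqrt(rad))
             / (2 * (alpha**4 - 2*alpha**3 + alpha**2)))
assert abs(T[1] - T1_closed) < 1e-3
```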
Thus we can calculate E_{y∼[0,1]} U_i(αX_i + y) and then find T_i = arg max_{T_i} U_i. That is, based on the fact that U_n(Y1, . . . , Yn) = ∑_{j=1}^n α^{j−n} Y_j, we can calculate T_{n−1}, T_{n−2}, . . . , T1 inductively.

Focusing on each U_i, we can simplify. Since we only want the T_i maximizing U_i, the threshold should satisfy T_i ≤ X_i whenever X_i ≥ E_{y∼[0,1]} U_i(αX_i + y), and T_i ≥ X_i on the contrary. By the previous discussion, X_i and E_{y∼[0,1]} U_i(αX_i + y) are both monotone and have one intersection. Therefore T_i is exactly that intersection, namely the root in [0, ∑_{j=0}^i α^j] of the equation

    X_i = E_{y∼[0,1]} U_i(αX_i + y) = ∫_{αX_i}^{αX_i+1} U_i(y) dy.

We formulate the above discussion as Algorithm 1.

Algorithm 1: Computing the best thresholds T1, . . . , T_{n−1} for n variables Y1, . . . , Yn
1  Initially, let T_i = 0 for all i ∈ [n − 1], and let U(X) = X;
2  for i from n − 1 down to 1 do
3      let T_i be the root in [0, 1] of X = ∫_{αX}^{αX+1} U(y) dy; if no root exists, let T_i = 1;
4      update U(X) = X if X ≥ T_i, and U(X) = ∫_{αX}^{αX+1} U(y) dy if X < T_i;
5  return T1, . . . , T_{n−1};

Furthermore, we give an example run of the algorithm for n = 3. When n = 3, U_3(X) = X, and

    X = ∫_{αX}^{αX+1} y dy = αX + 1/2,

so T_2 = 1/(2(1 − α)) and

    U(X) = X if X ≥ 1/(2(1 − α));  U(X) = αX + 1/2 if X < 1/(2(1 − α)).

In the next iteration,

    ∫_{αX}^{αX+1} U(y) dy = αX + 1/2                                              if X > 1/(2α(1 − α));
    ∫_{αX}^{αX+1} U(y) dy = (1/2)((α^2 − α^3)X^2 + αX + (5 − 4α)/(4 − 4α))        if 1/(2α(1 − α)) − 1/α ≤ X ≤ 1/(2α(1 − α));
    ∫_{αX}^{αX+1} U(y) dy = α^2 X + (1 + α)/2                                     if X < 1/(2α(1 − α)) − 1/α.

When 0 < α < 1/2 we have 1/(2α(1 − α)) − 1/α ≤ 0 < 1 < 1/(2α(1 − α)), so we only need to consider the equation

    X = (1/2)((α^2 − α^3)X^2 + αX + (5 − 4α)/(4 − 4α)).

Thus

    T_1 = (2 − 3α + α^2 − 2√(α^5 − 3α^4 + 2α^3 + 2α^2 − 3α + 1)) / (2(α^4 − 2α^3 + α^2)).

The behaviour of the thresholds for α in the interval (0, 1/2) is shown in Figure 2.

[Figure 2: The red line is T_2 = 1/(2(1 − α)), and the blue line is T_1.]

References

[1] Theodore P. Hill and Robert P. Kertz.
A survey of prophet inequalities in optimal stopping theory. Contemporary Mathematics, 125(1):191, 1992.

On the Compressed Sensing of Microblog
Zihan Tan

1 Abstract

Microblogging is nowadays very popular among people in different regions. Every day a large amount of information flows through the microblog network, which is of great value for both theoretical research and economic analysis. Such networks also have a rather important property, the power law, which is common in a great many large-scale networks. We formalize the microblog network as a graph model and analyze it from the perspective of compressed sensing. We design a model for operating the whole network and give an algorithm that recovers specific values from a large amount of input using only a small amount of space.

Keywords: compressed sensing, power law, networks, microblog

2 Introduction

In the past decades, research on networks has grown rapidly. The common method consists of two parts: modeling the problem as a network, and analyzing the properties of the network. This research shares a common feature: it deals with large amounts of data. When it comes to data, the problem mainly separates into two components: data gathering and data analysis. Compressed sensing is a standard method related to both components.

Research on compressed sensing mainly consists of a theoretical part and an application part. The theoretical part is advanced in numerical analysis; in the survey by Yuan Yao (2009), it is related to homogeneous spaces and Fourier analysis. The goal is to find a set of vectors to serve as the sensing basis and obtain a sparse sensing vector; wavelet analysis is also a good tool for this. However, more research is on the application part, because compressed-sensing methods yield good data-extraction results and accurate analysis of large-scale data, which in turn lets us better understand the structure and properties of specific classes of networks.
Victor Hugo's network (1993), with data courtesy of Knuth, included different types of relationships. The coauthorship network between scientists was studied by Newman (2006), and later by Jon Kleinberg. The important part of application research on compressed sensing consists of data gathering and data analysis, for which it suffices to design efficient, good algorithms. Xiaoye Jiang, Yuan Yao, Han Liu and Leonidas Guibas (2011) set up a new framework for modeling and connected two seemingly different areas: network data analysis and compressed sensing. Fundamental work on data gathering was done by Chong Luo, Feng Wu, Jun Sun and Chang Wei Chen (2009).

Our research concerns the network of microblogs and analyzes the forwarding of microblogs. The network obeys a power law; we both propose the data-gathering method and design the algorithm for sensing and recovery, and then give informal proofs of the correctness of our model.

3 Model Setting

Consider a microblog network as a general directed graph with n nodes. Every node represents a specific client, and there is an edge from node u to node v iff client u friended, or followed, client v. Every node u is accompanied by a vector p_u representing his personality. A microblog x is also accompanied by a vector m_x of the same shape as the personality vectors, representing the features of this microblog; this vector changes as the microblog is forwarded. A diffusion φ_x of a microblog x is a set of 3-tuples (i, u, m^x_{u,i}), meaning that client u is the ith person to forward the original microblog x, and after his forwarding the vector of x is changed to m^x_{u,i}. There is also a family of matrices called transition matrices, {T^i}_{i=1}^n, encoding how the vector of a microblog changes; each T^i is an n × n matrix with rows and columns indexed by the clients.
The transition formula is the following: when the ith person u sees the forwarded microblog x from the (i − 1)th person v, with its vector already changed to m^x_{v,i−1}, then after he forwards the microblog, the vector of x changes into

    m^x_{u,i} = (1/(1 + T^i_{uv})) m^x_{v,i−1} + (T^i_{uv}/(1 + T^i_{uv})) p_v.    (i.e. · · · )

4 Algorithm

The input is a large amount of information: all personality vectors and many diffusions. We are allowed to use only a small amount of space (i.e., space o(n^2)), and we need to output all the vectors using a compressed-sensing method. Our algorithm is divided into three parts. First, we memorize the transition matrices. Then we use the transition matrices to obtain the sensing information. Finally, we use a specific compressed-sensing algorithm to recover all the vectors.

4.1 Memorizing the Transition Matrices

We memorize the transition matrices using the standard power-iteration method, with some techniques. Given a matrix T^i that is not symmetric, we first split it into two symmetric components and memorize them separately. Let L^i and R^i be the symmetric matrices built from the lower and upper triangles of T^i respectively:

    L^i = ( T^i_{11}  T^i_{21}  · · ·  T^i_{n1} )
          ( T^i_{21}  T^i_{22}  · · ·  T^i_{n2} )
          (   ...       ...     ...      ...    )
          ( T^i_{n1}  T^i_{n2}  · · ·  T^i_{nn} )        (1)

    R^i = ( T^i_{11}  T^i_{12}  · · ·  T^i_{1n} )
          ( T^i_{12}  T^i_{22}  · · ·  T^i_{2n} )
          (   ...       ...     ...      ...    )
          ( T^i_{1n}  T^i_{2n}  · · ·  T^i_{nn} )        (2)

We memorize L^i and R^i in the following form: store the top log n eigenvalues and their corresponding eigenvectors, namely the log n pairs (λ_j, t_j). Finally we concatenate the two parts to obtain the approximated matrix T'^i. (i.e. · · · )

4.2 Sensing Information

We combine the first entries of all personality vectors into one large vector g^1 with g^1_u = p_{u,1}. We choose a microblog that has been forwarded k log n times and first try to establish the family of transition (row) vectors t^x_u, computed as follows. A transition vector t^x_u is an n-dimensional row vector.
If u is the ith person to forward this microblog x and the previous i − 1 persons are v1, v2, · · · , v_{i−1}, then we use a computing sequence to find t^x_u:

    (t^x_u)_1 = (1, 0, 0, · · · , 0),
    (t^x_u)_{j+1} = (1/(1 + T^j_{v_j v_{j+1}})) (t^x_u)_j + (T^j_{v_j v_{j+1}}/(1 + T^j_{v_j v_{j+1}})) (0, 0, · · · , p^{v_{j+1}}_1, 0, 0, · · · , 0)    (i.e. · · · )

for all 0 < j < i, where the unique nonzero entry of the last vector is its (j + 1)th entry. Our sensing matrix is then

    S_x = ( t^x_u )
          ( t^x_v )
          (  ...  )
          ( t^x_w )

We use this sensing matrix to sense the vector g^1 and obtain the sensing-information equality

    S_x g^1 = I^1_x,    (3)

where I^1_x = (m^x_{v_1}, m^x_{v_2}, · · · , m^x_{v_{i−1}}, m^x_u, 0, 0, · · · , 0)^T.

4.3 Recovering

We use the standard linear-programming method to recover g^1 from the sensing information S_x g^1 = I^1_x. We then try log n more microblogs and take the average of the recovered vectors as our final estimate of g^1. Likewise we can recover g^2, g^3, · · · , and it remains only to rearrange them to obtain all the vectors.

Note on Satisfiability and Evolution
2014.08.13

The evolution model is the following. Suppose f, the fitness function of a genotype, is an n-variable function defined on {−1, 1}^n taking values in [1, 1 + ε], where ε is a small enough constant (this is due to weak selection). The frequency of genotypes follows a product distribution determined by an n-dimensional vector u = (u1, · · · , un) (called the feature vector), u_i ∈ [−1, 1]. Let P_u be the probability distribution on the cube {−1, 1}^n such that

    P_u(x) = P_u(x1, · · · , xn) = ∏_{1≤i≤n} (1 + u_i x_i)/2.

The frequencies of genotypes evolve over generations according to Nagylaki's theorem. Mathematically, let u^t be the feature vector in the tth generation; the recurrence is, for all i,

    u_i^{t+1} = E_X[X_i f(X)] / E_X[f(X)],

where X follows the distribution P_{u^t}(·). We further explore this evolving behaviour in the following sections.
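The recurrence can be simulated exactly on small cubes. The following sketch uses a toy fitness function of our own (f = 1 + ε iff x1 = x2, with ε = 0.1), iterates the update, and checks that the mean fitness E[f(X)] is non-decreasing along the trajectory, anticipating the monotonicity discussion below:

```python
from itertools import product

eps = 0.1
n = 3

def fitness(x):                       # toy fitness: high iff x1 = x2
    return 1 + eps if x[0] == x[1] else 1.0

def prob(u, x):                       # product distribution P_u(x)
    p = 1.0
    for ui, xi in zip(u, x):
        p *= (1 + ui * xi) / 2
    return p

def step(u):
    """One exact Nagylaki update u'_i = E[X_i f(X)] / E[f(X)]."""
    cube = list(product([-1, 1], repeat=n))
    Ef = sum(prob(u, x) * fitness(x) for x in cube)
    return [sum(prob(u, x) * x[i] * fitness(x) for x in cube) / Ef
            for i in range(n)], Ef

u = [0.2, -0.5, 0.1]
mean_fitnesses = []
for _ in range(300):
    u, Ef = step(u)
    mean_fitnesses.append(Ef)

# mean fitness is monotone along the trajectory, and the dynamics converge
# to a stable point (here the satisfying corner x1 = x2 = -1)
assert all(b >= a - 1e-12 for a, b in zip(mean_fitnesses, mean_fitnesses[1:]))
assert abs(u[0] + 1) < 1e-3 and abs(u[1] + 1) < 1e-3
assert abs(u[2] - 0.1) < 1e-6     # f ignores x3, so u3 stays fixed
```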
We first present the evolution model from different perspectives, which helps connect this novel problem to some other classical fields. We then show some calculations and experimental results and extract several basic facts about this evolution model.

1 Understanding From Different Perspectives

1.1 Center of Mass

In this section we consider the case where f only takes values in {1, 1 + ε}. All vertices of the cube are then divided into two sets: the high set (all vectors whose fitness is 1 + ε) and the low set (all vectors whose fitness is 1). The evolution of genotypes can be characterized from the perspective of a center of mass in the following way. Each u^t determines a product distribution on the vertices of the cube, and the probabilities can be considered as weights on those vertices. We multiply the weight at every high point by 1 + ε and leave the weights at low points unchanged. We then compute the center of mass of this system, regarding the probability as the mass at each vertex, and let this point be u^{t+1}.

To see things more clearly, each time u^t determines a product distribution on the vertices of the cube, we can compute two centers, the "high center" h^t and the "low center" l^t, where the high center is the center of mass among all high points and the low center is the center of mass among all low points. It is immediate that the high center, the low center, u^t and u^{t+1} lie on the same line, and it can be observed that

    u^{t+1} = u^t + α(h^t − l^t),

where α is a small number related to ε. We say u^t is stable if u^{t+1} = u^t; equivalently, u^t is stable if and only if h^t = l^t. However, if the pattern of f is not regular enough (the values 1 and 1 + ε are distributed in a messy way), it is hard to compute h^t and l^t in each round, so this understanding alone does not help us.

1.2 Coordination Game

We can also regard the evolution rule as a coordination game.
We have n players, and each player has two actions, −1 and 1. One generation corresponds to one round of the game, and in each round player i plays the mixed strategy ((1 + u_i^t)/2, (1 − u_i^t)/2). Each player updates his strategy in the following way: player i computes the expected payoff of each pure action in the last round, E[payoff | x_i = 1] and E[payoff | x_i = −1], where the expectation is over the other players' randomness, and he updates his distribution for the next round so that

    (1 + u_i^{t+1}) / (1 − u_i^{t+1}) = ((1 + u_i^t) E[payoff | x_i = 1]) / ((1 − u_i^t) E[payoff | x_i = −1]).

This is a multiplicative-weight updating rule for the strategies. Not much is known about the monotonicity of the payoff when all n players update their strategies simultaneously; however, after extensive numerical experiments no counterexample has been found, so it is plausible that the payoff is increasing under this updating rule. For a stable point, we claim that it must be a mixed Nash equilibrium of the game: at a stable point, no player can increase his payoff by changing his current mixed strategy.

2 Monotonicity

By monotonicity we mean that f̃(u^{t+1}) ≥ f̃(u^t) for all u^t, where f̃ is the multilinear extension of f. We consider two special cases and analyze monotonicity in each, and we conjecture that monotonicity holds in general, with no limit on either the number of genes or the number of alleles. However, it is hard to prove or disprove monotonicity in the general case.

Case 1. Two players, two actions, relaxed values of f. By relaxed values of f we mean that f may take any value in [1, 1 + ε] on the vertices. In this case we assume the payoff table is

    ( a11  a12 )
    ( a21  a22 )        (1)

By exactly computing the payoffs in rounds t and t + 1, we can prove that monotonicity holds.

Case 2.
N players, two actions, constrained values of f. In this case we have not proved monotonicity, but we can understand it in another way, by putting the evolution in continuous form. The frequencies of genotypes evolve in generations, like a game played in rounds, but in reality evolution is a continuous procedure. We can regard the feature vector as a function of time t, and the evolution rule becomes a differential equation. By the earlier center-of-mass picture, the direction in which u(t) moves is the same as the vector h(t) − l(t). By calculation we obtain the following formulas:

    ∂f̃/∂u_i = (2ε/((1 + u_i)(1 − u_i))) · det( H_i^+  H_i^− ; L_i^+  L_i^− ),

where H_i^+ = Pr{x | x_i = 1, f(x) = 1 + ε}, H_i^− = Pr{x | x_i = −1, f(x) = 1 + ε}, and L_i^± are defined similarly. Note that H_i^+ carries no index t, but it is defined as the corresponding probability at time t. Also

    (h(t) − l(t))_i = (2/((H_i^+ + H_i^−)(L_i^+ + L_i^−))) · det( H_i^+  H_i^− ; L_i^+  L_i^− ).

Thus the angle between the gradient of f̃ at u and the direction in which u changes is acute, which means that if we change the evolution rule into a continuous one, we can prove monotonicity. This guides us to believe that monotonicity holds in this case as well. In the general case, we believe monotonicity holds since extensive numerical experiments have produced no counterexample.

3 Convergence

We analyze the convergence of the evolution rule from two aspects: the point to which an initial u converges, and the convergence path it follows. Unfortunately, no theoretical result has been proved, and we rely on numerical experiments to form conjectures.

3.1 Endpoint of Convergence

When f takes only the constrained values at the vertices, it is believed that almost all initial points in [−1, 1]^n converge to a point of highest fitness (also known as satisfiability); here "almost all" means that the Lebesgue measure of the set of such points is 2^n, i.e. full measure in [−1, 1]^n. The convergence diagrams for all f with n = 2 illustrate this behaviour.
It can be observed that some initial points do not converge to a point of highest fitness, i.e. there exist stable points with low fitness. When f takes relaxed values at the vertices, it is possible that a set of initial points of non-zero Lebesgue measure converges to a point of low fitness; two 2-dimensional convergence diagrams serve as examples.

3.2 Path of Convergence

Two further diagrams show the convergence paths when n = 2.

Impossibility for Stack Mutual Exclusion

1 Problem and Conclusion

Consider the problem in which 2 processes p0, p1 want to implement mutual exclusion with only a shared stack. Their legal operations on the stack are pop() and push(). When a process performs pop(), the top entry of the stack is removed and its content is the return value. When a process pushes an entry, the entry is added to the stack as the new top entry; push has no return value. Given this setting, we show that there is no algorithm satisfying both the mutual exclusion property and the no-starvation property. To be specific, we prove the following theorem.

Theorem 1. For any algorithm that has the mutual exclusion property and any initial state of the stack, there exists an infinite schedule s = (s1, s2, · · · ) in which every process performs infinitely many operations, that is, with I[s_j = 0] the indicator of the event s_j = 0,

    lim_{i→∞} ∑_{j=1}^i I[s_j = 0] = +∞,    lim_{i→∞} ∑_{j=1}^i I[s_j = 1] = +∞,

such that the schedule causes starvation, i.e. some process enters the critical section only finitely many times.

2 Proof

Let the stack alphabet be Γ, and let ST(s) be the content of the stack after executing a finite schedule s (ST(s) is an array). Assume every process runs a Turing machine and acts according to its transition function.
Assume the progress property is satisfied, i.e., some process eventually enters the critical section (otherwise the theorem is already proved). Without loss of generality there is a schedule s_1 such that after executing s_1, p_0 enters the critical section for the first time.

Let w = |ST(s_1)| and let minstack_1(n) = inf_{k≥n} |ST(s_1 ∥ 1^k)|. The sequence {minstack_1(n)}_{n=1}^{∞} is clearly non-decreasing, and there are three cases:

(1) lim_{n→∞} minstack_1(n) = ∞
(2) lim_{n→∞} minstack_1(n) = a > 0
(3) lim_{n→∞} minstack_1(n) = 0

In each case we construct an infinite schedule in which each process performs infinitely many operations, yet starvation happens to at least one process.

Case 1. lim_{n→∞} minstack_1(n) = ∞.

For every i ≥ w, let r_i be the smallest r ≥ 0 with minstack_1(r) = i. Consider the following infinite schedule:

    s_1 ∥ 1^{r_{w+1}} 0 ∥ 1^{r_{w+2}} 0 ∥ ···

Clearly every process performs infinitely many operations in this schedule. We claim that p_1 is starved, i.e., it can never enter the critical section. We compare the constructed schedule with the schedule s_1 ∥ 1^∞ and prove that every return value of p_1's operations is exactly the same in both. Indeed, in the constructed schedule, after the execution of s_1 ∥ 1^{r_i}, p_1 never again visits the entries currently in the stack: by the definition of r_i it will not pop any of them out, and the single interleaved operation of p_0 does not prevent p_1 from "living in its own world". The next operation of p_1 is necessarily a push, and its later pops only return what it has pushed from then on. Thus we claim that in the constructed schedule p_1 is not able to enter the critical section. This is because in the schedule s_1 ∥ 1^∞, p_1 cannot enter the critical section, otherwise the mutual exclusion property would be violated.
Since the return values are the same, p_1 behaves in the same way, which proves the claim.

Case 2. lim_{n→∞} minstack_1(n) = a > 0.

This case is quite similar to Case 1. Since there are infinitely many r such that |ST(s_1 ∥ 1^r)| = a, let (r_i) be the increasing sequence of such indices, i.e., r_{i+1} > r_i for all i. As in the previous case, we construct the following infinite schedule and claim that p_1 is starved forever:

    s_1 ∥ 1^{r_1} 0 ∥ 1^{r_2} 0 ∥ ···

The proof is exactly the same as in Case 1 and is omitted.

Case 3. lim_{n→∞} minstack_1(n) = 0.

Since there are infinitely many r such that |ST(s_1 ∥ 1^r)| = 0, let (r_i) be the increasing sequence of such indices. Let p_0 leave the critical section after the execution of s_1 ∥ 1^{r_1}, i.e., consider the schedule s_1 ∥ 1^{r_1} ∥ 0^∞.

• Suppose p_0 cannot enter the critical section in this infinite schedule. Consider the analogous sequence {minstack_0(n)}_{n=1}^{∞} for p_0: let w_0 = |ST(s_1 ∥ 1^{r_1})| and minstack_0(n) = inf_{k≥n} |ST(s_1 ∥ 1^{r_1} ∥ 0^k)|. The same three cases arise:

(1) lim_{n→∞} minstack_0(n) = ∞. The following schedule starves p_0 (for every i ≥ w_0, let r'_i be the smallest r ≥ 0 with minstack_0(r) = i):

    s_1 ∥ 1^{r_1} ∥ 0^{r'_{w_0+1}} 1 ∥ 0^{r'_{w_0+2}} 1 ∥ ···

(2) lim_{n→∞} minstack_0(n) = b > 0. The following schedule starves p_0 (since there are infinitely many r with |ST(s_1 ∥ 1^{r_1} ∥ 0^r)| = b, let (r'_i) be the increasing sequence of such indices, i.e., r'_{i+1} > r'_i for all i):

    s_1 ∥ 1^{r_1} ∥ 0^{r'_1} 1 ∥ 0^{r'_2} 1 ∥ ···

(3) lim_{n→∞} minstack_0(n) = 0. The following schedule starves both p_0 and p_1 (since there are infinitely many r with |ST(s_1 ∥ 1^{r_1} ∥ 0^r)| = 0, let (r'_i) be the increasing sequence of such indices):
    s_1 ∥ 1^{r_1} ∥ 0^{r'_1} ∥ 1^{r_2} ∥ 0^{r'_2} ∥ ···

• Suppose instead that p_0 can enter the critical section in this infinite schedule, i.e., there exists t_1 such that after the execution of s_1 ∥ 1^{r_1} ∥ 0^{t_1}, p_0 enters the critical section again. Now let p_1 run from this point on; the analogous quantity minstack_1(n) is defined from here, and its limit must be 0 (otherwise we reduce to Case 1 or Case 2, which are already proved). Thus there exists r_2 with |ST(s_1 ∥ 1^{r_1} ∥ 0^{t_1} ∥ 1^{r_2})| = 0. We let p_0 leave the critical section again and run on its own, and ask whether it can re-enter the critical section. If not, we reduce to the previous sub-case; if yes, we let p_1 run again, and so on. We obtain the following schedule, which starves p_1:

    s_1 ∥ 1^{r_1} ∥ 0^{t_1} ∥ 1^{r_2} ∥ 0^{t_2} ∥ ···

where after the execution of s_1 ∥ 1^{r_1} ∥ 0^{t_1} ∥ ··· ∥ 1^{r_i} ∥ 0^{t_i}, p_0 enters the critical section, and after the execution of s_1 ∥ 1^{r_1} ∥ 0^{t_1} ∥ ··· ∥ 1^{r_i}, the stack is empty.

Combining the analysis of all cases completes the proof of the theorem.

Notes on New Definitions for Differential Privacy
Zihan Tan
2014.06.10

1 Motivation

Differential privacy has been the prevailing definition of privacy in recent years and has been studied through many mechanisms. However, the curse of dimensionality often prevents a differentially private mechanism from achieving good performance. We observe that differential privacy is a demanding definition, probably because it is an information-theoretic property. In this note we attempt to develop a new computational definition of privacy based on statistical distance. The material is immature and several proofs are missing.

2 Statistical Distance

Definition 1.
(Statistical Distance) Let p and q be probability density functions on R. Their statistical distance is defined as

    s(p, q) := (1/2) ∫_{−∞}^{∞} |p(x) − q(x)| dx.

If the support of the distributions is not all of R, the integration is simply taken over the support.

The meaning of the statistical distance between two distributions is quite clear. Suppose you are given a real number t ∈ R and told that t is a sample from one of the two distributions p and q, and you are asked to guess which one it was sampled from. Your optimal strategy is to compare p(t) and q(t): if p(t) is larger, guess p; otherwise guess q. It can be computed that the probability your guess is right is exactly 1/2 + s(p, q)/2. Intuitively speaking, statistical distance represents the ability of an adversary to distinguish two distributions given one-shot access. It can thus be used as a definition of privacy. First we give the following definitions and propositions, which are not hard to prove and are essential for understanding the new privacy definition.

Definition 2. (Product Distribution) Let p, q be two distributions on R. Their product distribution p·q is the distribution on R² with density p·q(x, y) = p(x)q(y).

Proposition 1. Let p, q be distributions on R. Then

    s(p^{m+n}, q^{m+n}) ≤ s(p^m, q^m) + s(p^n, q^n).

Proposition 2. Let p, q, r be distributions on R. Then

    s(p, q) ≤ s(p, r) + s(q, r).

3 Statistical Differential Privacy

We now give the new definition of privacy based on statistical distance.

Definition 3. (Statistical Privacy) A mechanism M is (t, p)-statistically private if for every pair of neighboring databases D, D′, letting M(D) and M(D′) be the output distributions on inputs D and D′ respectively, no polynomial-time adversary passes the following experiment with probability larger than 1/2 + p.
Experiment: The adversary is given t output samples (all drawn from M(D) or all from M(D′)) and asked to guess which distribution the data were sampled from. It passes the experiment if and only if it guesses right.

In this new definition we essentially require that, for any pair of neighboring databases, the ability of a polynomial-time adversary to distinguish the output distributions is bounded by p. Some propositions and remarks are in order to better understand the new definition.

Remark 1. When t = 1, the probability that an adversary guesses right (a polynomial-time adversary, since it only needs to query M(D) and M(D′) once each) is exactly 1/2 + s(M(D), M(D′))/2. This means we need to compute the largest statistical distance between output distributions over all pairs of neighboring databases. When t > 1, we instead care about the statistical distance between M(D)^t and M(D′)^t, which are product distributions.

Proposition 3. ε-differential privacy implies (1, ½(e^ε − 1))-statistical privacy; but (1, p)-statistical privacy does not imply ε-differential privacy for any ε. The intuition is that good statistical privacy allows p(x) > 0 while q(x) = 0 for some x ∈ R; this is forbidden in differential privacy.

Proposition 4. (ε, δ)-differential privacy implies (1, δ + ½(e^ε − 1))-statistical privacy, and (1, δ + ½(e^ε − 1))-statistical privacy implies (ε, 2δ + e^ε + e^{−ε} − 2)-differential privacy.

This proposition shows the equivalence of (ε, δ)-differential privacy and statistical privacy in some sense. However, the orders of the parameters differ in flavor: it is often the case that ε = O(1) and δ = O(1/n²), but then e^ε + e^{−ε} − 2 = O(1), so the equivalence is weak.

Proposition 5. (1, p)-statistical privacy implies (t, tp)-statistical privacy. This is a direct corollary of Proposition 1.

Here are some negative results about statistical privacy.

Remark 2.
(Computational-Theoretic or not) The new definition appears to apply to any polynomial-time adversary. However, we in fact obtain a characterization of statistical privacy purely in terms of statistical distance, which means the polynomial-time constraint on the adversary plays no role here. Thus our new definition is essentially still an information-theoretic one, and certain mechanisms cannot perform well when the extrinsic dimension of the data is large.

Remark 3. (Flavor of Parameters) In the definition of (ε, δ)-differential privacy one usually requires ε = O(1) and δ = o(1/n), and from Proposition 4 we can only deduce a very weak statistical privacy property. Moreover, consider a mechanism that takes D = {x_1, ..., x_n} as input and outputs one x_i uniformly at random. This is a rather silly mechanism, since with constant probability it releases the entire private record of some client, yet it satisfies (1, 1/n)-privacy; so when t = 1, p = 1/n is not a good enough guarantee. On the other hand, if p = o(1/n), for example p = 1/n², we restrict the mechanism so much that it must give very similar outputs on all inputs, which in turn leads to poor performance. Specifically, for two completely different databases D_1, D_2, by Proposition 2 their output distributions are within statistical distance at most 1/n, which is generally bad for the accuracy of the mechanism.

4 Transportation Distance

Another well-known distance between probability distributions is the transportation distance (a.k.a. earth-mover distance, or the Wasserstein metric). It could conceivably also be transplanted into a privacy definition. However, it lacks an operational interpretation as clean as that of statistical distance. It is worth mentioning that transportation distance is defined via a coupling, an idea that could perhaps also be exploited in a privacy definition.
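To make Definition 1 and the one-sample guessing experiment concrete, here is a small numeric sketch (ours; the two discrete distributions are arbitrary illustrative choices). The empirical success rate of the maximum-likelihood guesser comes out close to 1/2 + s(p, q)/2:

```python
import random

# Sketch of Definition 1 and the guessing experiment for two discrete
# distributions (values and weights are arbitrary choices).
p = {0: 0.5, 1: 0.3, 2: 0.2}
q = {0: 0.2, 1: 0.3, 2: 0.5}

# statistical distance s(p,q) = (1/2) * sum_x |p(x) - q(x)|
s = 0.5 * sum(abs(p[x] - q[x]) for x in p)

def draw(dist):
    r, acc = random.random(), 0.0
    for x, w in dist.items():
        acc += w
        if r < acc:
            return x
    return x

random.seed(0)
trials, correct = 200_000, 0
for _ in range(trials):
    truth = random.choice(["p", "q"])
    sample = draw(p if truth == "p" else q)
    guess = "p" if p[sample] >= q[sample] else "q"   # maximum-likelihood guess
    correct += (guess == truth)

print(s, correct / trials)   # s = 0.3; success rate is close to 1/2 + s/2 = 0.65
```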
Notes on Approximability of Random Priority
2014.06.09

1 Problem and Algorithm

1.1 Facility Location

We have n clients and l facilities located in a d-dimensional space, and we would like to assign clients to facilities so that they get the service they want. Facility i has a capacity c_i, the largest number of clients that may be assigned to it (we assume Σ_{1≤i≤l} c_i = n). However, the locations of the clients are unknown to us and will be reported by the clients after we set up the assignment mechanism. Every client wants to minimize the distance from its true location to the facility it is assigned to. We want to design a truthful mechanism M that approximately minimizes the social cost, defined as

    Social Cost = Σ_{1≤i≤n} d(i, M(i)),

where M(i) is the facility that client i is matched to under the mechanism M, and d(·, ·) is the distance in the d-dimensional space.

1.2 Random Priority

We state the following mechanism, called Random Priority (also known as Random Serial Dictatorship), without proving its truthfulness. We choose a uniformly random permutation and let the clients report in that order. When we receive a reported location, we assign to it the nearest facility that still has available capacity. The mechanism terminates when every client is matched.

2 An Example

In this note we give a lower bound on the approximation ratio of the Random Priority mechanism. Specifically, we prove the following theorem.

Theorem 1. The approximation ratio of the Random Priority mechanism is Ω(n^{0.29}).

Proof. Consider the following example. Suppose there are n = 3^l clients and a set of facilities located on a line (mostly at integer points, as will become clear). Let c_i denote the number of clients located at point i and f_i the total capacity of the facilities located at point i:

    c_1 = 1, c_2 = 3 − 1, c_{2²} = 3² − 3, ..., c_{2^l} = 3^l − 3^{l−1};
    f_{−ε} = 1, f_2 = 3 − 1, f_{2²} = 3² − 3, ..., f_{2^l} = 3^l − 3^{l−1}.

For all other i, c_i = f_i = 0, and ε is a sufficiently small positive number.
The optimal assignment for this example is straightforward: for t ≥ 2, assign the clients at t to the facility at t, and assign the client at 1 to the facility at −ε. The total cost is therefore 1 + ε.

We now compute the expected cost of the Random Priority mechanism. Among the c_1 + c_2 = 3 clients at points 1 and 2, the probability that the client at 1 comes last is 1/3. Thus with probability 1 − 1/3 it is assigned to the facility at 2, in which case one of the clients at 2 must be placed somewhere else.

A similar analysis works for every i ≥ 1: consider the c_1 + c_2 + ··· + c_{2^i} clients at points 1, 2, ..., 2^i and let s_i = Σ_{j=0}^{i} c_{2^j}. The probability that some client is displaced to 2^{i+1} is

    Π_{k=1}^{i} (1 − s_{k−1}/s_k) = (2/3)^i.

By linearity of expectation, the expected cost of Random Priority is

    Σ_{i=0}^{l} 2^i · (2/3)^{i+1}.

This is of order (4/3)^l, while n is of order 3^l. From the fact that

    (4/3)³ = 64/27 ≤ 3 ≤ 256/81 = (4/3)⁴,

we know that the cost is between Ω(n^{1/4}) and O(n^{1/3}). In the next section we optimize the constants in the example to obtain the inapproximability ratio in the theorem.

3 Optimization of the Example, and Remarks

We use the notation of the previous section. First we optimize the example as follows. Let r = s_i/s_{i−1} for all i ≥ 1. Then n = O(r^l), and the expected cost of Random Priority is of order (2(1 − 1/r))^l. We would like to find r maximizing

    f(r) = log_r 2(1 − 1/r),

in which case the example with parameter r gives a lower bound of Ω(n^{f(r)}) on the approximation ratio. Since f is transcendental, there is no closed-form expression for the maximizer or the maximum value of f(r). From the graph of the function, the maximizer is approximately 4.4 and the optimal value approximately 0.29.

We then give some informal remarks suggesting that, in some sense, this example is optimal.
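The claimed constants can be checked with a simple grid search (a sketch of ours; the grid range and step are arbitrary choices):

```python
import math

# Grid-search maximization of f(r) = log_r(2(1 - 1/r)); no closed form exists.
def f(r):
    return math.log(2 * (1 - 1 / r)) / math.log(r)

best_r = max((k / 1000 for k in range(2001, 20001)), key=f)
print(best_r, f(best_r))   # maximizer is close to 4.4, maximum close to 0.29
```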
The remarks are not rigorous, but they may give some intuition for proving the upper bound, or indicate where a breakthrough might be found in constructing new examples.

Remark 1. (Line) The example seems optimal in its line structure: in any metric space, the only nontrivial constraint on distances is the triangle inequality, and equality holds exactly when three or more points lie on a line. Thus the "extreme" case for an example is likely to have line structure.

Remark 2. (Parameters for Client Assignment) Recall our expression for the cost:

    cost = (1 − s_0/s_1)·2^0 + (1 − s_0/s_1)(1 − s_1/s_2)·2^1 + ··· + (1 − s_0/s_1)(1 − s_1/s_2)···(1 − s_{l−1}/s_l)·2^l.

First, we need this series to diverge: a convergent series gives a constant cost, which is small compared with n and thus cannot yield a good lower bound on the approximation ratio. Second, when the series diverges, it is often the last term that dominates, i.e., it is of the same order as the whole sum. Considering the term (1 − s_0/s_1)(1 − s_1/s_2)···(1 − s_{l−1}/s_l)·2^l and the observation that (1 − a)(1 − b) ≤ (1 − √(ab))², we tend to make the ratios between consecutive s_i all about the same, since s_0 = 1 and s_l = n are fixed. Following these two observations, we arrive at the example of Section 2. Although n^{0.29} may not be the optimal lower bound for all truthful mechanisms, it appears to be the best lower bound obtainable from Random Priority.

Machine Learning: Technical Report of Conway's Inverse Game of Life
Instructed by Liwei Wang
Due on Jan 14th, 2013
H. Jiachen, T. Zihan, H. Heping, Z. Zeyu

1 Formulation of the Problem

The Game of Life is a cellular automaton created by mathematician John Conway in 1970. The game consists of a board of cells that are either on or off. One creates an initial configuration of these on/off states and observes how it evolves.
There are four simple rules determining the next state of the board from the current state:

1. Any live cell with fewer than two live neighbors dies, as if caused by under-population.
2. Any live cell with two or three live neighbors lives on to the next generation.
3. Any live cell with more than three live neighbors dies, as if by overcrowding.
4. Any dead cell with exactly three live neighbors becomes a live cell, as if by reproduction.

2 Observations about Conway's Reverse Game of Life

We made three important observations after some experiments. These features guide the design of better learning algorithms.

• Locality. The state of a cell depends mostly on the original configuration of its neighborhood, since the rules determine a cell's next-round state from the states of the 9 surrounding cells (the cell and its 8 neighbors) in the current round. This locality guides us to make decisions based on neighboring configurations.

• Clustering. After some experiments with the game, an important observation emerged: the more rounds elapse, the more clustered the final configuration is. This also mirrors real life: individuals tend to live together locally rather than separately. This feature to some extent characterizes Conway's Game of Life.

• Non-Uniqueness. Given an origin state, the forward evolution is deterministic; but given a resulting state, the initial state is not unique. Thus, finding just any feasible solution is meaningless. Instead, we should look for origin configurations with high likelihood.

3 Ideas and Framework of the Algorithm

3.1 Basic Notions

Some definitions are needed first to make the following algorithms and arguments precise. Our data is generated in the following way: a randomly sampled configuration is evolved 5 rounds, the resulting configuration is taken as the "base" configuration, and our input is the configuration obtained from this "base" configuration after another 5 rounds.
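The four rules above can be coded directly. A minimal sketch of one forward step (our own assumptions: a 2-D list of 0/1 cells and a non-wrapping board, which may differ from the competition's board):

```python
def life_step(board):
    """One Game of Life step; cells outside the board are treated as dead."""
    n, m = len(board), len(board[0])
    def live_neighbors(r, c):
        return sum(board[i][j]
                   for i in range(max(0, r - 1), min(n, r + 2))
                   for j in range(max(0, c - 1), min(m, c + 2))
                   if (i, j) != (r, c))
    # birth on exactly 3 neighbors; survival on 2 or 3
    return [[1 if live_neighbors(r, c) == 3
                  or (board[r][c] and live_neighbors(r, c) == 2) else 0
             for c in range(m)] for r in range(n)]

# A "blinker" oscillates with period 2:
blinker = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
once = life_step(blinker)
assert once == [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
assert life_step(once) == blinker
```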
The following diagram clarifies our names for the configurations and cell states:

    Initial Configuration → Origin Configuration → Resulting Configuration
    Initial State → Origin State → Resulting State

3.2 Stream of Algorithms

In this section we introduce every algorithm we came up with in this project. Some have certain flaws, and new ideas and constructions were invented to overcome them.

• Local Perfect Trace Back. After some experiments, the clustering observation came to mind first. Since the origin state is the resulting state of a randomly sampled initial configuration after 5 rounds, it is already stable in some measure, i.e., with good probability the origin state is well clustered. Thus it helps to find a local perfect trace back. The algorithm is as follows: first cluster the resulting state; then, for every cluster, run an algorithm that finds a feasible local origin configuration for it; finally, combine these local configurations into our guess of the origin configuration.

Drawback: Sometimes the clustering performs poorly because clusters become so large that finding the local perfect trace back is rather inefficient. Moreover, the Non-Uniqueness observation tells us that finding a perfect trace back is not really necessary, and it can have large deviation. Even if we obtain a perfect trace back for the whole configuration by combining partial ones, mutations occur at the boundaries where clusters meet, so the resulting configuration of our guessed state is not always good.

• Local Likelihood for Single Guess. Following the Non-Uniqueness observation, we compute likelihoods of neighboring configurations to help decide the state of each cell in the origin configuration. Given the resulting configuration, we focus on one cell at a time and decide its origin state based on the neighboring configuration.

What is local?
An explicit neighborhood size must be chosen for the algorithm to work, and there is a trade-off between two considerations:

(1) A small size limits the accuracy of our likelihood estimates and therefore hurts the probability of guessing right.
(2) A large size requires a lot of storage, and given a resulting state our algorithm becomes inefficient at producing an output, since it must search deep into the data tree (otherwise the deep training is meaningless).

After some experiments we set t = 5, i.e., we decide the origin state of a cell based on the 5 × 5 neighborhood in the resulting configuration. Setting t = 5 also makes our knowledge symmetric in every direction.

Drawback: On one hand, the coverage of 5 × 5 configurations is poor, where coverage is the fraction of configurations seen in training among all 2^25 possible ones; this makes our guess unavailable on some input resulting configurations. This drawback can be overcome by replacing the direct search over 5 × 5 local configurations with a BFS-style data tree. On the other hand, the following problem occurs so often that overall performance suffers:

(1) At some 2 × 2 local configurations, the likelihood of being alive is approximately 40 percent for each cell:

    ( 0.4  0.4 )
    ( 0.4  0.4 )

By maximum likelihood we should guess "dead" for each cell, but the true origin local configuration is usually one of the following two:

    ( 1  0 )      ( 0  1 )
    ( 0  1 )      ( 1  0 )

(2) This local error has a huge impact on performance, because we guess each single cell separately rather than a small local configuration, so correlations between neighboring cells are not taken into account.
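The Local Likelihood for Single Guess idea can be sketched end-to-end in a few lines. This is our own toy version, not the report's implementation: a 3 × 3 neighborhood instead of 5 × 5, a tiny board and training set, and a plain dictionary instead of a tree:

```python
import random
from collections import defaultdict

def life_step(b):
    """One forward Game of Life step (non-wrapping board)."""
    n, m = len(b), len(b[0])
    def nb(r, c):
        return sum(b[i][j] for i in range(max(0, r - 1), min(n, r + 2))
                   for j in range(max(0, c - 1), min(m, c + 2)) if (i, j) != (r, c))
    return [[1 if nb(r, c) == 3 or (b[r][c] and nb(r, c) == 2) else 0
             for c in range(m)] for r in range(n)]

def patch(b, r, c, t=3):
    """The t x t neighborhood of (r, c); out-of-board cells read as dead."""
    h, n, m = t // 2, len(b), len(b[0])
    return tuple(b[r + dr][c + dc] if 0 <= r + dr < n and 0 <= c + dc < m else 0
                 for dr in range(-h, h + 1) for dc in range(-h, h + 1))

random.seed(0)
counts = defaultdict(lambda: [0, 0])   # patch -> [times seen, center alive in origin]
for _ in range(200):                   # tiny training run
    origin = [[int(random.random() < 0.3) for _ in range(10)] for _ in range(10)]
    result = origin
    for _ in range(5):
        result = life_step(result)
    for r in range(10):
        for c in range(10):
            k = patch(result, r, c)
            counts[k][0] += 1
            counts[k][1] += origin[r][c]

# Maximum-likelihood single guess for the all-dead 3 x 3 neighborhood:
seen, alive = counts[(0,) * 9]
ratio = alive / seen if seen else 0.0
print(seen, ratio)   # estimated P(origin center alive | all-dead resulting patch)
```

The full method in the report uses far more training data, a 5 × 5 (and later adaptive-depth) neighborhood, and the tree of Section 3.3 in place of this dictionary.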
• Simulated Annealing for Local Likelihood for Single Guess. Our first change to address the missing correlations in the Single Guess algorithm is simulated annealing. We run the standard Local Likelihood for Single Guess to obtain a guessed origin configuration, evolve it 5 rounds to get a resulting state, compare it with the input resulting state, add certain live cells according to the likelihood, use the new resulting state as the algorithm's input, and run Local Likelihood for Single Guess again.

• Local Likelihood for Local Guess. Following the Clustering observation and the drawback of the Single Guess algorithm, it helps to decide several local cells simultaneously rather than a single cell: maximum likelihood often outputs four dead cells even when each cell's likelihood of being alive is about 40 percent, in which case it is highly probable that two of them are alive.

We first build a database containing, for each configuration, the likelihood that the central cell is alive. These likelihoods are stored in a tree. Maximum likelihood is used to make the final decisions, and the likelihoods also serve to stop the growth of the data tree within the storage limit. The details are as follows:

(1) Considering the storage limit and the efficiency requirement, we decide each 2 × 2 local configuration based on our knowledge of the surrounding 4 × 4 local configuration in the input resulting configuration.

(2) We compute the likelihood of every possible 2 × 2 origin local configuration, conditioned on the corresponding 4 × 4 local configuration of the input. Conditional probabilities are used in guessing the origin local configuration.

3.3 Training

The data tree is developed in BFS style. Each round generates a new layer, which means a wider neighborhood comes into consideration.
When the likelihood at some leaf node goes beyond the threshold probability, that leaf stops developing.

Example parameters: w = 6, t = 2, D = 1,000,000 maps, p = 0.42, f = 400.

Algorithm 1: Train
Input: search width w, train width t, training data D, probability threshold p, frequency threshold f
Output: binary search tree
 1  TreeHead.EndSearch ← false
 2  for TreeDepth d ← 1 to w² − 1 do
 3      for all the w × w rectangles r_i in the end maps of D do
 4          Node ← TreeHead
 5          c ← the center cell of r_i
 6          while Node.HaveSon and Node.Depth < d do
 7              if c is dead then
 8                  Node ← Node.Left
 9              else
10                  Node ← Node.Right
11              c ← next c (spiral outward from the center of r_i)
12          if Node.Depth = d and Node.EndSearch = false then
13              Node ← Node.Left/Right (depending on c; if there is no such son, create one)
14              r'_i ← the corresponding t × t rectangle in the start map (at the center of r_i)
15              Node.Config(r'_i).Freq++
16              Node.Freq++
17      for every leaf node Node with Node.Depth = d + 1 do
18          Node.EndSearch ← true
19          for r'_i ∈ {0, 1}^{t²} do
20              if Node.Freq > f and |Node.Config(r'_i).Freq / Node.Freq − 1/2| < p then
21                  Node.EndSearch ← false
22  return TreeHead

3.4 Explicit Algorithm Box

Maximum likelihood is employed to make decisions on local cells; the data for the computation is extracted from our training tree.
Algorithm 2: Run
Input: end map E, search width w, TreeHead (with train width t), TreeHead′ (with train width 1), probability threshold p′
Output: start map S
 1  for each cell c in E do
 2      c′ ← c
 3      Node ← TreeHead′
 4      while Node.HaveSon do
 5          if c′ is dead then
 6              Node ← Node.Left
 7          else
 8              Node ← Node.Right
 9          c′ ← next c′ (spiral outward from the center c)
10      if |Node.Config(1).Freq / Node.Freq − 1/2| > p′ then
11          S.Position(c) ← sgn(Node.Config(1).Freq / Node.Freq − 1/2)
12      else
13          find all the t × t rectangles r_i covering c
14          use the method of lines 2–10 (but with TreeHead and c′ ← the center of r_i) to find the probabilities of all t × t configurations in each r_i
15          calculate the most likely configuration of the (2t − 1) × (2t − 1) rectangle centered at c
16          S.Position(c) ← the answer
17  return S

4 Experiment and Results

We have tried three mechanisms so far; the results follow. We first generated the tree described in the previous section. Because the tree's growth speed is limited, we began with a small training set (100 · 1024 configurations per layer). The resulting error rate was 12.35, which placed us seventh at the time. Afterwards we tried determining cells in block style, i.e., considering the overall probability of a 2 × 2 grid. Unfortunately, this beats the naive tree-generating mechanism only in the one-step case.

    Mechanism \ Steps    1      2      3      4      5      Avg.
    Tree-Gen(1)          9.99   12.21  12.96  13.24  13.36  12.35
    Blockwise            9.68   12.24  13.04  13.37  13.48  12.36
    Tree-Gen(2)          9.51   11.79  12.66  13.10  13.29  12.07

    Table 1: Performance of the three mechanisms we tried.

More effort went into the simulated-annealing intuition and the use of conditional probability. Unfortunately, due to limited time we did not produce a version that works well, so we decided to construct a bigger tree using the first tree-generation idea.
This time we used 100 · 16384 configurations per layer to construct the BST. Although the tree became considerably large, performance improved: the error rate turned out to be 12.07 locally and 12.1038 on Kaggle, putting us 9th on the leaderboard. A bigger tree might bring another leap in performance, but we do not have enough computational resources to afford it. Moreover, we think there is still a chance to significantly improve the performance of the block-style estimation.