DAVA-tree - Computer Science at Virginia Tech
DAVA: Distributing Vaccines over Networks under Prior Information
Yao Zhang, B. Aditya Prakash
Department of Computer Science, Virginia Tech
SDM, Philadelphia, April 24, 2014

Motivation: Epidemiology
• Virus spreads over contact networks
• SIR model [Anderson+ 1991]
  • Susceptible-Infectious-Recovered
  • Weights pij: propagation prob. from i to j
  • Recovery prob. δ for each node
  • (models mumps-like infections)
Zhang and Prakash, SDM2014

Motivation: Social Media
• Meme/rumor spreads over friendship networks
  • E.g.: Twitter following network
• Independent cascade model (IC) [Kempe+ KDD2003]
  • Each node has only one chance to infect its neighbors
  • Special case of SIR model

Immunization
• Centers for Disease Control (CDC) cares about containing epidemic diseases
  • E.g.: ~400 million dollars used for vaccines for children in 2013
• Twitter tries to stop rumor spread
  • E.g.: rumors of victims after the Boston Marathon bombing in 2013
How to choose the best nodes to vaccinate (remove)?

Immunization
Pre-emptive immunization (choose nodes before the epidemic starts)
• Acquaintance strategy [Cohen+ 2003]
  • Pick a random person, immunize one of its neighbors at random
• Netshield [Tong+ 2010]
  • Minimize the epidemic threshold (the point at which the virus takes off)
Good for baseline strategies

In reality
Pre-emptive immunization (choose nodes before the epidemic starts)
• Acquaintance strategy [Cohen+ 2003]
• Netshield [Tong+ 2010]
Typically the epidemic has already started!
• More realistic intervention
• Which nodes to vaccinate now?
• We call it Data-Aware Immunization (this paper)
Outline
• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
• Experiments
• Conclusion

Data-Aware Vaccination Problem
Problem: Given a set of infected nodes and a contact graph, how to distribute k vaccines (node removal) to minimize the expected number of infected nodes at the end of the epidemic?
[Figure: example contact graph with healthy nodes A, B, C, D, E, F; pij = 1 for all edges; the best solution is marked]
Which node is the best choice for 1 vaccine?
• Remove A, save {A, D}
• Remove B, save {B}
• Remove C, save {C}

Complexity of DAV (see paper for details)
• NP-hard
  • Reduction from the Maximum K-Intersection Problem (MaxKI: maximizing the intersection of k subsets)
  • MaxKI is NP-Complete [Vinterbo 2004]
• Approximation algorithm?
  • Not submodular
  • Actually, DAV is hard to approximate within an absolute error!

Our Proposed Methods
• Assume IC model and undirected graph

1: Simplify - Merging infected nodes
• Idea: merge all the infected nodes into a single 'super infected' node I
• The merged graph is equivalent to the original graph
• Logical-OR of parallel edges: if B has infected neighbors with probabilities pX and pY, then pB = 1 - (1 - pX)(1 - pY)
[Figure: original graph with edges pA, pX, pY, pC from infected nodes to A, B, C; merged graph with super node I and edges pA, pB, pC]

2: DAVA-Tree Algorithm: Idea
• Select nodes with the largest "benefit"
• Quantity of interest: the expected number of saved nodes after removing set S from graph G
• Benefit of adding an additional node j into S: the additional number of saved nodes when adding node j into S
[Figure: tree rooted at the merged infected node with three branches of benefit 4, 5, and 2; pij = 1 for all edges]

DAVA-Tree Alg.: Optimal on Trees
For any set S:
• Fact 1: the chosen nodes in the optimal set must be neighbors of infected node I
• Fact 2: the benefit of each such node is independent of the rest of the set S
DAVA-tree algorithm: select the top-k nodes with the max. benefit from I's neighbors
• Linear time
[Figure: tree rooted at the merged infected node with branches of benefit 2, 4, and 5; pij = 1 for all edges]

3: General Case - Arbitrary Graphs
• Idea
  • We have the optimal algorithm for a tree
  • Extract a spanning tree, then run DAVA-tree
• What kind of tree?
  • Minimum spanning tree
[Figure: the optimal solution compared with the solution that is optimal on the MST by DAVA-tree; pij = 1 for all edges]

3: General Case - Arbitrary Graphs (continued)
• What kind of tree?
  • Instead of the minimum spanning tree, we propose to use the dominator tree (from software engineering)
  • u dominates v: every path from I to v contains u
[Figure: example graph in which node 4 dominates nodes 8, 9, 10, 11; pij = 1 for all edges]

Dominator Tree
• u is the immediate dominator of v: u dominates v AND every other dominator of v dominates u
• Dominator tree: add an edge between every such u and v
• Linear time [Buchsbaum, Tarjan 1998]
• Fact 1: for any arbitrary graph, the optimal solution should be among the children of root I in the dominator tree
• Fact 2: (for the special case k = 1, p = 1) running DAVA-tree on the dominator tree gives the optimal solution
[Figure: merged graph and its dominator tree; the optimal solution coincides with the DAVA-tree solution; pij = 1 for all edges]

Weighting the dominator tree
• Weighting the dominator tree: #P-complete
• Our solution: maximum propagation path probability between nodes I and v (using Dijkstra's algorithm)
[Figure: merged graph with edge probabilities p1, p3, p6 and dominator tree with weights w1, w3, w6]

DAVA algorithm
Steps:
1. T = Build a dominator tree
2. v = Run DAVA-tree on T with budget = 1
3. Remove v from G (remove the selected node)
4. Go to Step 1 until |S| = k
• Running time: O(k(|E| + |V| log |V|)). Too slow for large networks!
[Figure: merged graph and dominator tree at iterations 1 and 2, ending with |S| = 2; pij = 1 for all edges]

DAVA-fast: a faster algorithm
Steps:
1. T = Build a dominator tree
2. S = Run DAVA-tree on T with budget = k
• In practice, the performance of DAVA-fast is very close to DAVA
• Time complexity: subquadratic!
  • DAVA-fast: O(|V| log |V| + |E|)
[Figure: merged graph and dominator tree, |S| = 2]

Extending to SIR model
• See the paper

Experiments
• Virus Propagation Model
  • IC and SIR
• Settings (see more settings in the paper)
  • Initial infected nodes chosen uniformly at random
• Baseline Algorithms
  • RANDOM: healthy nodes chosen uniformly at random
  • DEGREE: choose nodes with top weighted degrees
  • PAGERANK: choose nodes with top PageRank scores
  • NETSHIELD: state-of-the-art pre-emptive immunization algorithm to minimize the epidemic threshold of the graph [Tong+ ICDM 2010]; assumes no data is given before the epidemic starts

Experiments: datasets
Datasets are chosen from different domains
• Social media (IC model)
  • OREGON: AS router graph
  • STANFORD: hyperlink network
  • GNUTELLA: peer-to-peer network
  • BRIGHTKITE: friendship network
• Epidemiology (SIR model)
  • PORTLAND and MIAMI: large urban social-contact graphs used in national smallpox modeling studies [Eubank+, 2004]

Dataset       |V|            |E|
OREGON        633            2,172
STANFORD      8,929          53,829
GNUTELLA      10,876         39,994
BRIGHTKITE    58,228         214,078
PORTLAND      0.5 million    1.6 million
MIAMI         0.6 million    2.1 million

Experiments: Quality
[Figure: two panels, GNUTELLA (IC model) and PORTLAND (SIR model); higher is better]
DAVA consistently outperforms the baseline algorithms. Further, DAVA-fast performs almost as well as DAVA.
(See more results in the paper)

Experiments: Scalability
[Figure: running time (sec.) of each algorithm; lower is better; some runs are marked "did not finish within 10 hours"]

Conclusion
Data-Aware Vaccination problem
• Given: graph and infected nodes
• Find: 'best' nodes for immunization
• Complexity
  • NP-hard
  • Hard to approximate within an absolute error
• DAVA-tree
  • Optimal solution on the tree
• DAVA and DAVA-fast
  • Merging infected nodes
  • Build a dominator tree, and run DAVA-tree
  • Running time: subquadratic
    • DAVA: O(k(|E| + |V| log |V|))
    • DAVA-fast: O(|E| + |V| log |V|)
[Figure: graph with infected nodes, merged graph, dominator tree]

Any Questions?
Code at: http://people.cs.vt.edu/~yaozhang
Yao Zhang, B. Aditya Prakash
Thanks for the support of NSF (Grant No. IIS1353346).
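
Illustration: merging infected nodes and DAVA-tree
The two simplest steps from the talk, merging the infected nodes into a super node I with the logical-OR rule and then picking the top-k neighbors of I on a tree, can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the authors' released code: the function names merge_infected and dava_tree, the (u, v, p) edge-list input, and the example graph are hypothetical, and the merged graph is assumed to already be a tree. The benefit of a neighbor j of I is computed as the sum, over the nodes in j's subtree, of the product of edge probabilities on the path from I, which is the expected number of infections in that subtree under the IC model on a tree.

```python
from collections import defaultdict


def merge_infected(edges, infected, root="I"):
    """Merge all infected nodes into one super node `root`.

    edges: iterable of (u, v, p) for an undirected graph, p = propagation prob.
    Returns an adjacency dict {u: {v: p}} of the merged graph.
    """
    adj = defaultdict(dict)
    miss = defaultdict(lambda: 1.0)  # prob. that no infected neighbor transmits
    for u, v, p in edges:
        u_inf, v_inf = u in infected, v in infected
        if u_inf and v_inf:
            continue  # edges between infected nodes vanish inside the super node
        if u_inf or v_inf:
            healthy = v if u_inf else u
            miss[healthy] *= 1.0 - p  # accumulates (1 - pX)(1 - pY)...
        else:
            adj[u][v] = adj[v][u] = p
    for node, m in miss.items():
        adj[root][node] = adj[node][root] = 1.0 - m  # logical-OR of merged edges
    return dict(adj)


def dava_tree(adj, k, root="I"):
    """Pick the k neighbors of `root` with the largest benefit.

    Assumes `adj` describes a tree. Returns (chosen nodes, benefit per neighbor).
    """
    benefit = {}
    for child, p in adj.get(root, {}).items():
        total = 0.0
        stack = [(child, root, p)]  # (node, parent, prob. of infection from root)
        while stack:
            node, parent, prob = stack.pop()
            total += prob
            for nxt, q in adj[node].items():
                if nxt != parent:
                    stack.append((nxt, node, prob * q))
        benefit[child] = total
    chosen = sorted(benefit, key=benefit.get, reverse=True)[:k]
    return chosen, benefit


# Hypothetical example: X and Y are infected; A is exposed to both of them.
edges = [("X", "A", 0.5), ("Y", "A", 0.5), ("A", "D", 1.0),
         ("X", "B", 0.5), ("Y", "C", 0.25), ("X", "Y", 1.0)]
merged = merge_infected(edges, infected={"X", "Y"})
chosen, benefit = dava_tree(merged, k=1)
print(chosen, benefit)
```

In this example the merged edge I-A gets probability 1 - (1 - 0.5)(1 - 0.5) = 0.75, so removing A saves 0.75 (A itself) plus 0.75 (D behind it) expected infections, more than B or C. For arbitrary graphs the talk builds a weighted dominator tree first; that step is not covered by this sketch.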