Infinite Edge Partition Models for Overlapping Community

Infinite Edge Partition Models for Overlapping Community Detection and Link Prediction
Mingyuan Zhou
Department of Information, Risk, and Operations Management
The University of Texas at Austin, Austin, TX, USA
2015

Introduction

A hierarchical gamma process infinite edge partition model is proposed to factorize the binary adjacency matrix of an unweighted undirected relational network under a Bernoulli-Poisson link:

- The Bernoulli-Poisson link connects each edge to a latent count that is further partitioned. Each node is assigned to one or multiple latent communities depending on how its edges are partitioned.
- The model describes both homophily and stochastic equivalence, and is scalable to big sparse networks by focusing its computation on pairs of linked nodes.
- The number of communities is automatically inferred in a nonparametric Bayesian manner, and efficient inference via Gibbs sampling is derived using novel data augmentation techniques.
- It can not only discover overlapping communities and inter-community interactions, but also predict missing edges.

Model and Inference

Modeling Components [equations not transcribed]
- Poisson factor analysis
- Modeling assortativity
- Modeling both assortativity and dissortativity
- Bernoulli-Poisson link: linking binary to count, marginal distribution, conditional posterior
- Overlapping community structure: the count represents how often nodes i and j interact due to their affiliations with communities k1 and k2, respectively.

Hierarchical Gamma Process

Gamma Process Edge Partition Model

Hierarchical Gamma Process Edge Partition Model

The community-affiliation graph model of Yang & Leskovec (2012) can be considered as a special case if we restrict [equation not transcribed].

Gibbs Sampling via Data Augmentation and Marginalization

Using inference techniques developed for the Bernoulli-Poisson link, and the Poisson, multinomial, and negative binomial distributions.

Scalability for Big Sparse Networks

Computation is mainly spent on pairs of linked nodes: if b_ij = 0, then all the corresponding latent counts are equal to zero almost surely. The cost is O(dN) instead of O(N^2), where d is the average node degree.

Example Results

- Protein-protein interaction network
- NIPS234 coauthor network
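
The Bernoulli-Poisson link and the scalability argument can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes the standard form of the link, b_ij = 1(m_ij >= 1) with m_ij ~ Poisson(rate_ij), so that P(b_ij = 1) = 1 - exp(-rate_ij) and the latent count of a linked pair follows a zero-truncated Poisson. The form of the rate in `pair_rate` (a sum over community pairs k1, k2) is likewise an assumption, since the poster's equations are not in the transcription, and all function and variable names are hypothetical.

```python
import math
import random


def pair_rate(phi_i, phi_j, lam):
    """Assumed rate for a node pair: sum over (k1, k2) of phi_i[k1] * lam[k1][k2] * phi_j[k2]."""
    return sum(
        phi_i[k1] * lam[k1][k2] * phi_j[k2]
        for k1 in range(len(phi_i))
        for k2 in range(len(phi_j))
    )


def link_probability(rate):
    """P(b = 1) when b = 1(m >= 1) and m ~ Poisson(rate)."""
    return -math.expm1(-rate)  # 1 - exp(-rate), accurate for small rates


def sample_truncated_poisson(rate, rng, max_count=10_000):
    """Draw m ~ Poisson(rate) conditioned on m >= 1, by inverting the CDF."""
    # Scale the uniform draw to the total mass of {m >= 1}, then walk the pmf upward.
    u = rng.random() * link_probability(rate)
    m = 1
    pmf = rate * math.exp(-rate)  # P(m = 1) under the untruncated Poisson
    cdf = pmf
    while u > cdf and m < max_count:
        m += 1
        pmf *= rate / m
        cdf += pmf
    return m


def sample_latent_counts(edges, rates, rng):
    """Sample latent counts for linked pairs only; non-linked pairs have count 0."""
    return {pair: sample_truncated_poisson(rates[pair], rng) for pair in edges}


# Hypothetical toy network: 3 nodes, 2 communities, 2 observed edges.
phi = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
lam = [[2.0, 0.1], [0.1, 2.0]]
edges = [(0, 1), (1, 2)]
rates = {(i, j): pair_rate(phi[i], phi[j], lam) for (i, j) in edges}
counts = sample_latent_counts(edges, rates, random.Random(0))
```

Because the loop in `sample_latent_counts` runs over the edge list rather than over all node pairs, its cost grows with the number of observed edges, which is the source of the O(dN) versus O(N^2) contrast stated in the poster.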