Bayesian estimation of complex networks and dynamic choice in the music industry

Stefano Nasini, Víctor Martínez-de-Albéniz
Dept. of Production, Technology and Operations Management, IESE Business School, University of Navarra, Barcelona, Spain
ENBIS-Spring-meeting-2015

Outline

1 Data sets from the music broadcasting industry
2 Multidimensional panel data
3 An exponential random model
  - Multidimensional Gaussian reduction
  - The exponential family of distributions
4 Estimation method
  - Numerical results
  - Goodness of fit

Artist goods: the music broadcasting industry

Artist goods have life cycles that resemble clothing fashion trends: there is a time window in which their popularity increases shortly after their premiere and then decreases. This is due to network externalities in individual preferences and opinions.

A data set of songs played on TV channels and radio stations

Country   Broadcasting companies   Artists   Songs   Time periods
Germany   41                       13860     48785   163 weeks
UK        51                       16169     65531   163 weeks

A song's popularity increases after its premiere and then decreases

[Figure, four panels: (a) B. Mars, Just the way you are, in Germany. (b) B. Mars, Locked Out Of Heaven, in Germany. (c) B. Mars, Just the way you are, in the UK. (d) B. Mars, Locked Out Of Heaven, in the UK.]

Correlated choices from different broadcasting companies

                      BBC 1 Xtra  Capital FM  Kiss 100 FM  Metro Radio  Radio City  Smooth Radio London
BBC 1 Xtra             1.000
Capital FM             0.729       1.000
Kiss 100 FM            0.668       0.814       1.000
Metro Radio            0.686       0.830       0.829        1.000
Radio City             0.010      -0.135      -0.142        0.078        1.000
Smooth Radio London    –           –           –            –            –           1.000

Table: Spearman's correlations among the dynamic plays of Locked Out Of Heaven.

                      BBC 1 Xtra  Capital FM  Kiss 100 FM  Metro Radio  Radio City  Smooth Radio London
BBC 1 Xtra             1.000
Capital FM             0.508       1.000
Kiss 100 FM            –           –           1.000
Metro Radio            0.329       0.417       –            1.000
Radio City             0.001      -0.128       –           -0.268        1.000
Smooth Radio London   -0.076      -0.091       –           -0.222        0.495       1.000

Table: Spearman's correlations among the dynamic plays of Just the way you are.
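
As a rough illustration of how correlation tables of this kind can be produced, the sketch below computes Spearman's rank correlation between the weekly play counts of one song on several stations. The station names and play counts are hypothetical, not taken from the data set, and the choice to report "–" when a series is constant (so that the coefficient is undefined) is an assumption about what the dashes in the tables mean.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Returns None when either series is constant, since rho is then undefined.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None
    return cov / (vx * vy) ** 0.5


# Hypothetical weekly play counts of one song on three stations.
plays = {
    "station_a": [0, 3, 9, 14, 11, 6, 2, 1],
    "station_b": [0, 1, 5, 12, 13, 7, 4, 1],
    "station_c": [0, 0, 0, 0, 0, 0, 0, 0],  # never played the song
}

names = list(plays)
for i, h in enumerate(names):
    for k in names[:i]:
        rho = spearman(plays[h], plays[k])
        print(h, k, "–" if rho is None else f"{rho:.3f}")
```

Rank-based correlation is a natural fit here because play counts are skewed and heavily tied at zero; only the ordering of weeks by popularity matters.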

Our goal is a joint model that allows:
- predicting the common life cycle of song diffusion within the music broadcasting industry;
- detecting the structure of imitation and spillover between radio stations and TV channels, based on the observed correlations;
- deciding where in the broadcasting industry a song should be launched in order to maximize its future number of plays.

Multidimensional panel data as two-mode network

Notation
- $R$ := set of individuals (primary layer);
- $S$ := set of items (secondary layer);
- $T$ := set of time periods;
- $x_{st} = [x_{s1t}\; x_{s2t}\; \dots\; x_{s|R|t}]^T \in \chi$ is the $|R|$-dimensional connection profile of the $s$th item at time $t$;
- $E \subseteq R \times R$ := a set of connections between broadcasting companies.

Spillover measurements to internalize cross-section dependency in the panel

\[
\text{(i)}\quad G_{hk}(x_{st}; x_{s,t-1}, \dots, x_{s,t-\tau}) = \frac{1}{|E|\tau} \sum_{\ell=0}^{\tau} d_\ell \left( (x_{sht})^{u_h} (x_{sk(t-\ell)})^{u_k} \right)^{1/p};
\]
\[
\text{(ii)}\quad G_{hk}(x_{st}; x_{s,t-1}, \dots, x_{s,t-\tau}) = -\frac{1}{|E|\tau} \sum_{\ell=0}^{\tau} d_\ell \left( \frac{x_{sht}}{u_h} - \frac{x_{sk(t-\ell)}}{u_k} \right)^{2/p}.
\]

An exponential random model

\[
P(x_{st} \mid x_{s,t-1}, \dots, x_{s,t-\tau}) \propto h(x_{st}) \exp\left( \alpha_{st} S_s + \sum_{r \in R} \beta_r R_r + \sum_{(h,k) \in E} \gamma_{hk} G_{hk} \right)
\]

- $S_s$ accounts for the size effect of each item in the secondary layer, for $s \in S$;
- $R_r$ accounts for the size effect of each individual in the primary layer, for $r \in R$;
- $G_{hk}$ internalizes the one-mode projection into the primary layer, for $(h, k) \in E$.

Underlying measure: either
\[
h(x_{st}) = \prod_{r \in R} \frac{1}{x_{srt}!} \quad \text{or} \quad h(x_{st}) = (2\pi)^{-\frac{(\tau+1)|R|}{2}}.
\]

The spillover measurement $G_{hk}$ plays an important role.

\[
P(x_{srt} \mid x_{sr't'} \text{ such that } r' \neq r,\ t' < t) \propto \frac{1}{x_{srt}!} \exp\left( (\alpha_{st} + \beta_r)\, x_{srt} + \eta^T C(x_{srt}) \right),
\]
where, for (i),
\[
\eta = \frac{1}{\tau |E|} \sum_{k \in R} \gamma_{rk} \sum_{\ell=1}^{\tau} (x_{sk(t-\ell)})^{1/p} \quad \text{and} \quad C(x_{srt}) = (x_{srt})^{1/p},
\]
and, for (ii),
\[
\eta = \frac{1}{\tau |E|} \begin{bmatrix} \gamma_{r1} \\ \vdots \\ \gamma_{rn} \end{bmatrix} \quad \text{and} \quad C(x_{srt}) = \begin{bmatrix} \sum_{\ell=1}^{\tau} d_\ell \left( \frac{x_{srt}}{u_r} - \frac{x_{s1(t-\ell)}}{u_1} \right)^{2/p} \\ \vdots \\ \sum_{\ell=1}^{\tau} d_\ell \left( \frac{x_{srt}}{u_r} - \frac{x_{sn(t-\ell)}}{u_n} \right)^{2/p} \end{bmatrix}.
\]

[Figure: bivariate densities proportional to $\frac{1}{x!\,y!} \exp(\alpha(x + y) + \gamma (xy)^{1/2})$ and $\frac{1}{x!\,y!} \exp(\alpha(x + y) + \gamma |x - y|)$, one row per spillover measurement, each shown for ($\alpha = 1$, $\gamma = 1$) and ($\alpha = -1$, $\gamma = 1$).]

Multidimensional Gaussian reduction

Under special conditions, the model
\[
P(x_{st} \mid x_{s,t-1}, \dots, x_{s,t-\tau}) \propto h(x_{st}) \exp\left( \alpha_{st} S_s + \sum_{r \in R} \beta_r R_r + \sum_{(h,k) \in E} \gamma_{hk} G_{hk} \right)
\]
with
- $G_{hk}(x_{st}; x_{s,t-1}, \dots, x_{s,t-\tau}) = \sum_{\ell=0}^{\tau} d_\ell\, x_{sht}\, x_{sk(t-\ell)}$;
- $h(x_{st}) = (2\pi)^{-\frac{(\tau+1)|R|}{2}}$;

reduces to a multivariate Gaussian:
\[
\begin{bmatrix} X_{st} \\ \vdots \\ X_{s,t-\tau} \end{bmatrix} \sim N(\mu, \Sigma), \quad \text{where} \quad \mu = \Sigma \begin{bmatrix} \alpha_{st} e + \beta \\ \vdots \\ \alpha_{s,t-\tau} e + \beta \end{bmatrix} \quad \text{and} \quad \Sigma = -\frac{1}{2} \begin{bmatrix} d_0 \Gamma & \dots & d_\tau \Gamma \\ \vdots & \ddots & \vdots \\ d_\tau \Gamma & \dots & d_0 \Gamma \end{bmatrix}^{-1}.
\]

Why is our model an extension of the ERGM?

Exponential family
Whenever the density of a random variable may be written
\[
f(x) \propto h(x) \exp\{\theta^T C(x)\},
\]
the family of all such random variables (for all possible $\theta$) is called an exponential family.

Exponential Random Graph Model (ERGM)
\[
P_\theta(X = x) = \frac{\exp\{\theta^T C(x)\}}{Z(\theta)},
\]
where
- $X$ is a random network on $n$ nodes (a matrix of 0's and 1's);
- $\theta$ is a vector of parameters;
- $C(x)$ is a known vector of graph statistics on $x$.

Why it is difficult to find the MLE

The log-likelihood function
- The model: $P(X = x^{(0)} \mid \theta) = \dfrac{\exp\{\theta^T C(x^{(0)})\}}{Z(\theta)}$, where $x^{(0)}$ is the observed data set.
- The log-likelihood function is
\[
\ell(\theta) = \theta^T C(x^{(0)}) - \log Z(\theta) = \theta^T C(x^{(0)}) - \log\left( \sum_{\text{all possible } x} \exp\{\theta^T C(x)\} \right).
\]
- Even in the simplest case of undirected graphs without self-edges, the number of graphs in the sum is very large: $2^{n(n-1)/2}$ for $n$ nodes.

Maximum pseudo-likelihood

Let $x_w$ be a single component of $x$ and $x_{-w}$ the vector of all the remaining components.
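
To make the roles of $x_w$, $x_{-w}$ and $Z(\theta)$ concrete, here is a minimal sketch for a binary ERGM on a handful of nodes. The choice of statistics (edge and triangle counts), the parameter values and the example graph are illustrative assumptions, not part of the talk. It shows that the conditional probability of one dyad given the rest needs only two unnormalized weights, while $Z(\theta)$ requires a sum over every graph.

```python
import itertools
import math


def graph_stats(edges, n):
    """C(x) = (number of edges, number of triangles) of an undirected graph."""
    triangles = sum(
        1
        for a, b, c in itertools.combinations(range(n), 3)
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edges
    )
    return (len(edges), triangles)


def weight(edges, n, theta):
    """Unnormalized probability exp{theta^T C(x)}."""
    return math.exp(sum(t * c for t, c in zip(theta, graph_stats(edges, n))))


def conditional_edge_prob(edges, n, dyad, theta):
    """P(x_w = 1 | x_-w; theta) for one dyad w; Z(theta) cancels in the ratio."""
    w_on = weight(edges | {dyad}, n, theta)
    w_off = weight(edges - {dyad}, n, theta)
    return w_on / (w_on + w_off)


def brute_force_z(n, theta):
    """Z(theta) by enumerating every undirected graph on n nodes."""
    dyads = [frozenset(p) for p in itertools.combinations(range(n), 2)]
    total, count = 0.0, 0
    for r in range(len(dyads) + 1):
        for subset in itertools.combinations(dyads, r):
            total += weight(set(subset), n, theta)
            count += 1
    return total, count


# Illustrative values (assumptions): 4 nodes, theta = (edge, triangle) effects.
n, theta = 4, (-0.5, 0.3)
x = {frozenset((0, 1)), frozenset((1, 2)), frozenset((0, 2))}  # a triangle
w = frozenset((0, 1))

z, n_graphs = brute_force_z(n, theta)
p_cond = conditional_edge_prob(x, n, w, theta)

# The same conditional, obtained from the fully normalized joint probabilities.
p_on = weight(x | {w}, n, theta) / z
p_off = weight(x - {w}, n, theta) / z
p_joint = p_on / (p_on + p_off)

print(n_graphs, p_cond, p_joint)
```

The enumeration already visits $2^{6}$ graphs for 4 nodes and doubles with every extra dyad, whereas the conditional costs two evaluations of $C(x)$ regardless of $n$.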
The pseudo-likelihood function
Approximate the marginal $P(x_w \mid \theta)$ by the conditional $P(x_w \mid x_{-w}; \theta)$. Then
\[
\tilde{\ell}(\theta) = \prod_w P(x_w \mid x_{-w}; \theta).
\]
Result: the maximum pseudo-likelihood (MPL) estimate, obtained by maximizing $\tilde{\ell}(\theta)$. Unfortunately, little is known about the quality of MPL estimates.

Pseudo-likelihood for ERGM

Notation: for a network $x$ and a pair $(i, j)$ of nodes, $x_{ij}$ is the corresponding entry and $x_{-ij}$ collects all the remaining entries.
\[
\tilde{\ell}(\theta) = \prod_w P(x_w \mid x_{-w}; \theta)
= \prod_{(i,j)} \frac{\exp\{\theta^T C(x^{(0)})\}}{\exp\{\theta^T C(x_{ij} = 1, x_{-ij})\} + \exp\{\theta^T C(x_{ij} = 0, x_{-ij})\}}
\]
\[
= \frac{\exp\{n(n-1)\, \theta^T C(x^{(0)})\}}{\prod_{(i,j)} \left( \exp\{\theta^T C(x_{ij} = 1, x_{-ij})\} + \exp\{\theta^T C(x_{ij} = 0, x_{-ij})\} \right)}
\]

Pseudo-likelihood for our model

\[
\tilde{\ell}(\theta) = \prod_{(r,t)} P(x_{srt} \mid x_{sr't'} \text{ such that } r' \neq r,\ t' < t)
\propto \prod_{(r,t)} \frac{1}{x_{srt}!} \exp\left( (\alpha_{st} + \beta_r)\, x_{srt} + \eta^T C(x_{srt}) \right)
\]
What is the normalizing constant for the full conditional?
\[
Z(\alpha_{st}, \beta_r, \eta) = \sum_{x_{srt} \geq 0} \frac{1}{x_{srt}!} \exp\left( (\alpha_{st} + \beta_r)\, x_{srt} + \eta^T C(x_{srt}) \right)
\]
This infinite sum has no closed form in general, so even the pseudo-likelihood is hard to define for our model.

Bayesian posterior

Let $\theta = [\alpha_{1t}, \dots, \alpha_{|S|t}, \beta_1, \dots, \beta_{|R|}, \gamma_{11}, \dots, \gamma_{|R|,|R|}]^T$ be the vector of natural parameters, $\pi(\theta)$ a prior distribution and $x^{(0)}$ the observed data set. By applying Bayes' rule we have:
\[
P(\theta \mid x^{(0)}) = \frac{P(x^{(0)} \mid \theta)\, \pi(\theta)}{\int_\theta P(x^{(0)} \mid \theta)\, \pi(\theta)\, d\theta}
= \frac{P(x_1, \dots, x_\tau; \theta) \prod_{t=\tau+1}^{w} P(x_t \mid x_{t-1}, \dots, x_{t-\tau}; \theta)\, \pi(\theta)}{\int_\theta P(x_1, \dots, x_\tau; \theta) \prod_{t=\tau+1}^{w} P(x_t \mid x_{t-1}, \dots, x_{t-\tau}; \theta)\, \pi(\theta)\, d\theta}
\]
\[
= \frac{P(x_1, \dots, x_\tau; \theta)\, \dfrac{\pi(\theta)}{Z(\theta)} \prod_{t=\tau+1}^{w} \prod_{s=1}^{m} q_{s,t,\theta}(x_{st})}{\int_\theta P(x_1, \dots, x_\tau; \theta)\, \dfrac{\pi(\theta)}{Z(\theta)} \prod_{t=\tau+1}^{w} \prod_{s=1}^{m} q_{s,t,\theta}(x_{st})\, d\theta}
\]

Metropolis-Hastings

Since both $P(x^{(0)} \mid \theta)$ and $P(\theta \mid x^{(0)})$ can only be specified up to proportionality, almost all known valid MCMC algorithms for $\theta$ cannot be applied. Consider for instance the Metropolis-Hastings acceptance probability:
\[
\pi_{\text{accept}}(\theta, \theta') = \min\left\{ 1,\ \frac{P(x^{(0)} \mid \theta')\, \pi(\theta')}{P(x^{(0)} \mid \theta)\, \pi(\theta)} \times \frac{Q(\theta \mid \theta')}{Q(\theta' \mid \theta)} \right\}
\]
\[
= \min\left\{ 1,\ \frac{P(x_1, \dots, x_\tau; \theta') \prod_{t=\tau+1}^{w} \prod_{s=1}^{m} q_{s,t,\theta'}(x_{st})\, \pi(\theta')}{P(x_1, \dots, x_\tau; \theta) \prod_{t=\tau+1}^{w} \prod_{s=1}^{m} q_{s,t,\theta}(x_{st})\, \pi(\theta)} \times \frac{Z(\theta)\, Q(\theta \mid \theta')}{Z(\theta')\, Q(\theta' \mid \theta)} \right\},
\]
where $Q(\theta' \mid \theta)$ is the proposal distribution. The ratio $Z(\theta)/Z(\theta')$ cannot be evaluated, which is why the standard algorithm fails.

Specialized MCMC for doubly intractable distributions

Murray proposed an MCMC approach that overcomes this drawback to a large extent. It is based on simulating the joint distribution of the parameter and sample spaces, conditioned on the observed data set $x^{(0)}$, that is to say $P(x, \theta \mid x^{(0)})$.

Algorithm 1 Exchange algorithm of Murray.
1: Initialize $\theta$
2: repeat
3:   Draw $\theta'$ from an arbitrary proposal distribution;
4:   Draw $x'$ from $P(\cdot \mid \theta')$
5:   Accept $\theta'$ with probability $\min\left\{ 1,\ \dfrac{P(x' \mid \theta)\, P(x^{(0)} \mid \theta')\, \pi(\theta')}{P(x^{(0)} \mid \theta)\, P(x' \mid \theta')\, \pi(\theta)} \right\}$
6:   Update $\theta$
7: until Convergence

In step 5 the normalizing constants $Z(\theta)$ and $Z(\theta')$ each appear once in the numerator and once in the denominator, so they cancel. As written, the ratio assumes a symmetric proposal; otherwise it is multiplied by $Q(\theta \mid \theta') / Q(\theta' \mid \theta)$.

Goodness of fit: graphical illustration

[Figure: total number of plays over time for the top-30 songs. (a) Full model. (b) Null model ($\gamma = 0$).]

[Figure: total number of plays over time for the top-30 songs. (a) Total plays over time. (b) Market share.]

Reducing the dimensionality of the parameter space

Model specification based on structural properties of the music industry

The parameter space is the whole $(|T| \times |S| + |R| + |E|)$-dimensional Euclidean space, while the sample space has dimension $(|T| \times |S| \times |R|)$.
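
For a sense of scale, the short calculation below plugs the UK panel sizes from the data-set table into these two dimension formulas. The size of $E$ is not reported in the slides, so the sketch assumes, as an upper bound, that $E$ contains every ordered pair of distinct companies.

```python
# Counts for the UK panel, taken from the data-set table.
n_weeks, n_songs, n_companies = 163, 65531, 51

# Assumption: E holds every ordered pair of distinct companies (an upper bound).
n_edges = n_companies * (n_companies - 1)

n_alpha = n_weeks * n_songs                    # one alpha_st per song and week
n_params = n_alpha + n_companies + n_edges     # |T|x|S| + |R| + |E|
n_obs = n_weeks * n_songs * n_companies        # |T|x|S|x|R|

print(f"alpha: {n_alpha:,}  beta: {n_companies}  gamma: {n_edges:,}")
print(f"parameters: {n_params:,}  observations: {n_obs:,}")
print(f"share of parameters that are alpha: {n_alpha / n_params:.4f}")
```

Under this assumption nearly all parameters are the song-by-week effects $\alpha_{st}$, with roughly one parameter per 51 observations, which suggests why a parsimonious form for $\alpha$ matters at least as much as pruning $\gamma$.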
We use two strategies to reduce the dimensionality of the parameter space:
A. Define communities of broadcasting companies, so that only within-group spillover effects $\gamma$ are modeled individually;
B. Define a functional form for the effect of the song life cycle $\alpha$.

A. Reducing the $|E|$ effects $\gamma$
- Pairwise spillover effects $\gamma_{kh}$ between individual companies $h$ and $k$ with the same radio format.
- A common spillover effect $\gamma_{kh}$ between different radio formats, if $h$ and $k$ have different formats.

B. Reducing the $|T| \times |S|$ effects $\alpha$
The broadcasting pattern of songs exhibits a time window in which their popularity quickly increases shortly after their premiere and then decreases.

Groups of broadcasting companies: within format vs. between formats

We introduce individually only the effects $\gamma$ that are associated with TV channels and radio stations of the same format.
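
One way to picture strategy A is as a parameter-tying map from pairs of companies to shared $\gamma$ parameters. The sketch below is only an illustration under stated assumptions: the company names and their formats are hypothetical, pairs are treated as unordered, and cross-format pairs are assumed to share one parameter per pair of formats.

```python
from itertools import combinations


def tie_spillover_parameters(formats):
    """Map each unordered pair of companies to the key of its gamma parameter.

    Same-format pairs keep their own pairwise effect; pairs with different
    formats share a single effect per pair of formats (an assumption).
    """
    keys = {}
    for h, k in combinations(sorted(formats), 2):
        if formats[h] == formats[k]:
            keys[(h, k)] = ("within", h, k)
        else:
            pair = tuple(sorted((formats[h], formats[k])))
            keys[(h, k)] = ("between",) + pair
    return keys


# Hypothetical companies and formats, for illustration only.
formats = {
    "a1": "Top-40", "a2": "Top-40", "a3": "Top-40",
    "b1": "Rock", "b2": "Rock",
    "c1": "News",
}

keys = tie_spillover_parameters(formats)
n_pairs = len(keys)
n_free = len(set(keys.values()))
print(f"{n_pairs} pairs share {n_free} free spillover parameters")
```

With many companies per format, the number of free parameters grows with the within-format pairs and the number of format pairs rather than with all of $|E|$.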
[Figure: groups of radio stations by format, including Contemporary and Easy listening, Top 40 and Urban, and Rock music.]

The estimated spillover effects

Formats: Contemporary, Rock, News, Sport, Top-40, World-Music, TV channels
Contemporary: (−0.035, −0.021) (−0.023, 0.047) (−0.009, 0.076) (−0.070, 0.001) (−0.017, 0.014)
Rock: (−0.089, 0.004)
News: (0.012, 0.021) (−0.049, 0.037) (−0.072, −0.010) (−0.036, −0.001) (−0.068, 0.030) (−0.083, 0.022) (−0.052, 0.000) (−0.029, 0.036) (−0.022, 0.005) (−0.038, 0.022) (−0.017, 0.011)
Top-40: (−0.164, 0.012) (−0.032, 0.001) (−0.005, −0.024) (−0.015, 0.013)
World-Music: (−0.030, 0.019) (−0.015, −0.021) (0.009, 0.030) (−0.029, 0.001) (−0.025, 0.019)
TV channels: (−0.186, −0.068) (−0.014, 0.024) (−0.291, −0.038)

Companies: BBC 1 Xtra, Capital FM, Kiss 100 FM, Metro Radio, Radio City, Smooth R. London
Sport: (−0.028, 0.014) (−0.018, 0.001) (−0.035, 0.008) (−0.015, 0.051) (−0.020, 0.124) (−0.008, 0.094) (−0.019, 0.110) (−0.033, 0.011)
Capital FM: (−0.009, 0.060) (−0.028, 0.025) (−0.009, 0.012) (−0.040, 0.012) (−0.021, 0.014)
Kiss 100 FM: (−0.104, 0.057) (−0.060, 0.001) (−0.027, 0.026) (−0.015, 0.022) (−0.022, 0.016)
Metro Radio: (−0.015, 0.012) (−0.009, 0.025) (−0.009, 0.025) (−0.021, 0.009) (−0.032, 0.023)
Radio City: (0.005, 0.024) (0.000, 0.025) (−0.032, 0.021) (−0.014, 0.037) (−0.022, 0.001)
Smooth R. London: (−0.015, 0.012) (−0.013, 0.019) (0.001, 0.029) (0.000, 0.055) (0.010, 0.033)

Songs' dynamics

Define a functional form for the effect of song dynamics

The attractiveness trajectory of the $s$th song can be specified by letting $t_0$ be the week when the song is launched and then using a gamma kernel to shape its time dynamics:
\[
\alpha_{st} = \begin{cases} \delta_s^0 + \delta_s^1 (t - t_0) + \delta_s^2 \log(t - t_0) & \text{if } t > t_0, \\ -\infty & \text{otherwise.} \end{cases}
\]
Exponentiating gives $\exp(\alpha_{st}) = e^{\delta_s^0} (t - t_0)^{\delta_s^2} e^{\delta_s^1 (t - t_0)}$ for $t > t_0$, which is the gamma kernel: with $\delta_s^1 < 0$ and $\delta_s^2 > 0$ it rises, peaks at $t - t_0 = -\delta_s^2 / \delta_s^1$ and then decays.

Songs' life cycle

[Figure: common life cycle of the top-30 songs.]

Propagation of the broadcasting decision after the premiere week $t_0$

\[
\max\ \frac{1}{|S|} \sum_{s \in S} \sum_{t'=1}^{T} E\left[ x_{s,\bullet,t+t'} \mid x_{srt} = z_r \text{ for all } r \in R \right]
\]
subject to
\[
\sum_{r \in R} y_r = 1, \qquad z_r \leq \min\{M y_r, \phi\},\ r \in R, \qquad y_r \in \{0, 1\},\ z_r \geq 0,\ r \in R, \qquad F \geq \phi \geq 0.
\]

Format        Eigenvector
Contemporary  0.098
Rock          0.121
News          0.098
Sport         0.177
Top-40        0.097
World Music   0.187
TV-channels   0.101

Expected plays in $t_0 + 1$, for $\phi = 10$ and $\phi = 100$: 265.795 267.647 265.209 261.803 265.609 264.058 260.301 263.021 264.272 265.318 267.345 266.350 264.165 263.425

Expected plays in $t_0 + 2$, for $\phi = 10$ and $\phi = 100$: 265.949 267.720 265.687 261.381 265.995 263.211 260.875 263.055 264.879 265.098 266.858 266.603 264.171 263.438

Discussion

What are the real achievements of this work?
- We considered a large multidimensional panel of songs broadcast weekly on radio stations and TV channels, and detected a pattern of cross-section dependencies based on pairwise imitations.
- An exponential random model has been proposed to internalize, in a single probabilistic framework, both the songs' life cycle and the complex correlation structure.
- A specialized MCMC method has been implemented to estimate the model parameters.
- The out-of-sample goodness of fit has been analyzed, assessing the model adequacy for the observed data set.

THANK YOU FOR YOUR ATTENTION

Acknowledgements
The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement n. 283300.