Data sets from the music broadcasting industry
Multidimensional panel data
An exponential random model
Estimation method
Bayesian estimation of complex networks and
dynamic choice in the music industry
Stefano Nasini
Víctor Martínez-de-Albéniz
Dept. of Production, Technology and Operations Management,
IESE Business School, University of Navarra,
Barcelona, Spain
Stefano Nasini, Víctor Martínez-de-Albéniz
ENBIS-Spring-meeting-2015
Outline
1 Data sets from the music broadcasting industry
2 Multidimensional panel data
3 An exponential random model
  - Multidimensional Gaussian reduction
  - The exponential family of distributions
4 Estimation method
  - Numerical results
  - Goodness of fit
Artist goods: the music broadcasting industry
Artist goods
Their life cycles resemble clothing fashion trends: there is a time window in which their popularity increases shortly after their premiere and then decreases.
This is due to network externalities in individual preferences and opinions.
Artist goods: the music broadcasting industry
A data set of songs played on TV channels and radio stations

          Broadcasting companies   Artists   Songs   Time periods
Germany   41                       13860     48785   163 weeks
UK        51                       16169     65531   163 weeks
Artist goods: the music broadcasting industry
A song’s popularity increases after its premiere and then decreases
[Figure panels: (a) B. Mars, Just the way you are in Germany. (b) B. Mars, Locked Out Of Heaven in Germany. (c) B. Mars, Just the way you are in the UK. (d) B. Mars, Locked Out Of Heaven in the UK.]
Artist goods: the music broadcasting industry
Correlated choices from different broadcasting companies
                      BBC 1 Xtra   Capital FM   Kiss 100 FM   Metro Radio   Radio City   Smooth Radio London
BBC 1 Xtra             1.000
Capital FM             0.729        1.000
Kiss 100 FM            0.668        0.814        1.000
Metro Radio            0.686        0.830        0.829         1.000
Radio City             0.010       -0.135       -0.142         0.078         1.000
Smooth Radio London    –            –            –             –             –            1.000
Table: Spearman’s correlations among the dynamic plays of Locked Out Of Heaven.
                      BBC 1 Xtra   Capital FM   Kiss 100 FM   Metro Radio   Radio City   Smooth Radio London
BBC 1 Xtra             1.000
Capital FM             0.508        1.000
Kiss 100 FM            –            –            1.000
Metro Radio            0.329        0.417        –             1.000
Radio City             0.001       -0.128        –            -0.268         1.000
Smooth Radio London   -0.076       -0.091        –            -0.222         0.495        1.000
Table: Spearman’s correlations among the dynamic plays of Just the way you are.
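The correlations in the two tables are rank correlations between weekly play series. As a rough illustration only, the sketch below computes Spearman's coefficient as the Pearson correlation of average ranks. The play counts for the two stations are hypothetical, not taken from the data set, and the helper names are assumptions made for this example.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    start = 0
    while start < len(values):
        end = start
        while end + 1 < len(values) and sorted_vals[end + 1] == sorted_vals[start]:
            end += 1
        # 0-based positions start..end share the mean of ranks start+1..end+1
        ranks[order[start:end + 1]] = 0.5 * (start + end) + 1.0
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# Hypothetical weekly play counts of one song on two stations.
station_a = [0, 2, 5, 9, 14, 12, 8, 4, 1]
station_b = [0, 1, 4, 10, 11, 13, 6, 3, 2]
rho = spearman(station_a, station_b)
```

A series that is constant over the whole window has no defined rank correlation, which is one possible reason for a missing entry in a table of this kind.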
Artist goods: the music broadcasting industry
Our goal is a joint model that allows:
- predicting the common life cycle of song diffusion within the music broadcasting industry;
- detecting the structure of imitation and spillover between radio stations and TV channels, based on the observed correlations;
- deciding which broadcasting company is the best place to launch a song in order to maximize its future number of plays.
Multidimensional panel data as two-mode network
Notation
- $R$ := set of individuals (primary layer);
- $S$ := set of items (secondary layer);
- $T$ := set of time periods;
- $x_{st} = [x_{s1t}\ x_{s2t}\ \dots\ x_{s|R|t}]^T \in \chi$ is the $|R|$-dimensional connection profile of the $s$th item at time $t$;
- $E \subseteq R \times R$ := set of connections between broadcasting companies.
Multidimensional panel data as two-mode network
Spillover measurements to internalize cross-section dependency in the panel
i. $G_{hk}(x_{st}; x_{s,t-1}, \dots, x_{s,t-\tau}) = \frac{1}{|E|\tau} \sum_{\ell=0}^{\tau} d_\ell \left[ (x_{sht})^{u_h} (x_{sk(t-\ell)})^{u_k} \right]^{1/p}$;
ii. $G_{hk}(x_{st}; x_{s,t-1}, \dots, x_{s,t-\tau}) = -\frac{1}{|E|\tau} \sum_{\ell=0}^{\tau} d_\ell \left| \frac{x_{sht}}{u_h} - \frac{x_{sk(t-\ell)}}{u_k} \right|^{2/p}$.
An exponential random model
$$P(x_{st} \mid x_{s,t-1}, \dots, x_{s,t-\tau}) \propto h(x_{st}) \exp\left( \alpha_{st} S_s + \sum_{r \in R} \beta_r R_r + \sum_{(h,k) \in E} \gamma_{hk} G_{hk} \right)$$
- $S_s$ accounts for the size effect of each item in the secondary layer, for $s \in S$;
- $R_r$ accounts for the size effect of each individual in the primary layer, for $r \in R$;
- $G_{hk}$ internalizes the one-mode projection into the primary layer, for $(h,k) \in E$.
Underlying measure:
either $h(x_{st}) = \prod_{r \in R} \frac{1}{x_{srt}!}$ or $h(x_{st}) = (2\pi)^{-\frac{(\tau+1)|R|}{2}}$
An exponential random model
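The spillover measurement $G_{hk}$ plays an important role.
$$P(x_{srt} \mid x_{sr't'} \text{ such that } r' \neq r,\ t' < t) \propto \frac{1}{x_{srt}!} \exp\left( \begin{bmatrix} \alpha_{st} + \beta_r \\ \eta \end{bmatrix}^T \begin{bmatrix} x_{srt} \\ C(x_{srt}) \end{bmatrix} \right),$$
where
$$\eta = \frac{1}{\tau|E|} \sum_{k \in R} \gamma_{rk} \sum_{\ell=1}^{\tau} d_\ell \, (x_{sk(t-\ell)})^{1/p} \quad \text{and} \quad C(x_{srt}) = (x_{srt})^{1/p}, \quad \text{for i},$$
$$\eta = \frac{1}{\tau|E|} \begin{bmatrix} \gamma_{r1} \\ \vdots \\ \gamma_{rn} \end{bmatrix} \quad \text{and} \quad C(x_{srt}) = \begin{bmatrix} \sum_{\ell=1}^{\tau} d_\ell \left| \frac{x_{srt}}{u_r} - \frac{x_{s1(t-\ell)}}{u_1} \right|^{2/p} \\ \vdots \\ \sum_{\ell=1}^{\tau} d_\ell \left| \frac{x_{srt}}{u_r} - \frac{x_{sn(t-\ell)}}{u_n} \right|^{2/p} \end{bmatrix}, \quad \text{for ii}.$$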
An exponential random model
Spillover measurement
- $\frac{1}{x!\,y!} \exp\left( \alpha(x+y) + \gamma (xy)^{1/2} \right)$
- $\frac{1}{x!\,y!} \exp\left( \alpha(x+y) + \gamma |x-y| \right)$
[Figure panels: α = 1 and γ = 1; α = −1 and γ = 1.]
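The two bivariate expressions on this slide are unnormalized, so a quick way to look at them numerically is to evaluate them on a finite grid and normalize. The sketch below does that under an assumption not made in the slides: the support is truncated at `n_max`, which is harmless only because the factorials make the tail negligible for moderate parameters. The function names and the `kind` labels are hypothetical.

```python
import math
import numpy as np

def log_weight(x, y, alpha, gamma, kind):
    """Log of (1 / (x! y!)) * exp(alpha * (x + y) + gamma * G(x, y))."""
    if kind == "product":
        g = math.sqrt(x * y)       # G(x, y) = (x y)^(1/2)
    elif kind == "distance":
        g = abs(x - y)             # G(x, y) = |x - y|
    else:
        raise ValueError(f"unknown spillover measurement: {kind}")
    return alpha * (x + y) + gamma * g - math.lgamma(x + 1) - math.lgamma(y + 1)

def truncated_pmf(alpha, gamma, kind, n_max=60):
    """Normalize the weights over the grid {0..n_max}^2 (truncation is an assumption)."""
    grid = np.array([[log_weight(x, y, alpha, gamma, kind)
                      for y in range(n_max + 1)]
                     for x in range(n_max + 1)])
    weights = np.exp(grid - grid.max())   # subtract the max for numerical stability
    return weights / weights.sum()

pmf_product = truncated_pmf(1.0, 1.0, "product")
pmf_distance = truncated_pmf(1.0, 1.0, "distance")
```

With γ = 0 both variants reduce to two independent Poisson counts with rate exp(α), which gives a simple sanity check on the normalization.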
Multidimensional Gaussian reduction
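Under special conditions:
$$P(x_{st} \mid x_{s,t-1}, \dots, x_{s,t-\tau}) \propto h(x_{st}) \exp\left( \alpha_{st} S_s + \sum_{r \in R} \beta_r R_r + \sum_{(h,k) \in E} \gamma_{hk} G_{hk} \right)$$
- $G_{hk}(x_{st}; x_{s,t-1}, \dots, x_{s,t-\tau}) = \sum_{\ell=0}^{\tau} d_\ell \, x_{sht} \, x_{sk(t-\ell)}$;
- $h(x_{st}) = (2\pi)^{-\frac{(\tau+1)|R|}{2}}$;
$$\begin{bmatrix} X_{st} \\ \vdots \\ X_{s,t-\tau} \end{bmatrix} \sim N(\mu, \Sigma), \quad \text{where} \quad \mu = \Sigma \begin{bmatrix} \alpha_{st} e + \beta \\ \vdots \\ \alpha_{s,t-\tau} e + \beta \end{bmatrix} \quad \text{and} \quad \Sigma = -\frac{1}{2} \begin{bmatrix} d_0 \Gamma & \dots & d_\tau \Gamma \\ \vdots & \ddots & \vdots \\ d_\tau \Gamma & \dots & d_0 \Gamma \end{bmatrix}^{-1}$$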
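The Gaussian reduction can be checked numerically on a toy case. The sketch below assembles the block matrix, inverts it, and returns µ and Σ as on the slide. Two things are assumptions of this example rather than statements from the slides: the interior blocks are taken to be $d_{|i-j|}\Gamma$ (only the corner blocks are visible), and Γ is chosen symmetric negative definite so that Σ is a valid covariance matrix. All numerical values are hypothetical.

```python
import numpy as np

def gaussian_reduction(d, Gamma, alphas, beta):
    """Return (mu, Sigma) with Sigma = -1/2 * A^{-1} and mu = Sigma @ theta.

    A is assumed block-Toeplitz with block (i, j) equal to d[|i - j|] * Gamma.
    alphas holds alpha_{s,t}, ..., alpha_{s,t-tau}; beta is the |R|-vector of beta_r.
    """
    tau = len(d) - 1
    n = Gamma.shape[0]
    A = np.block([[d[abs(i - j)] * Gamma for j in range(tau + 1)]
                  for i in range(tau + 1)])
    Sigma = -0.5 * np.linalg.inv(A)
    # Stack alpha * e + beta for each lag, with e the all-ones vector.
    theta = np.concatenate([a * np.ones(n) + beta for a in alphas])
    mu = Sigma @ theta
    return mu, Sigma

# Hypothetical toy values: |R| = 2 broadcasters, tau = 1.
Gamma = -np.array([[2.0, 0.5],
                   [0.5, 2.0]])          # symmetric negative definite
d = [1.0, 0.2]
mu, Sigma = gaussian_reduction(d, Gamma, alphas=[0.3, 0.1], beta=np.array([0.5, -0.2]))
```

If the block matrix is not negative definite, the resulting Σ is not a covariance matrix, so the reduction only applies on the part of the parameter space where this condition holds.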
Why is our model an extension of the ERGM?
Exponential Family
Whenever the density of a random variable may be written $f(x) \propto h(x) \exp\{\theta^T C(x)\}$, the family of all such random variables (for all possible $\theta$) is called an exponential family.
Exponential Random Graph Model (ERGM)
$$P_\theta(X = x) = \frac{\exp\{\theta^T C(x)\}}{Z(\theta)},$$
where
- $X$ is a random network on $n$ nodes (a matrix of 0’s and 1’s);
- $\theta$ is a vector of parameters;
- $C(x)$ is a known vector of graph statistics on $x$.
Why it is difficult to find the MLE
The log-likelihood function
- The model: $P(X = x^{(0)} \mid \theta) = \frac{\exp\{\theta^T C(x^{(0)})\}}{Z(\theta)}$, where $x^{(0)}$ is the observed data set.
- The log-likelihood function is
$$\ell(\theta) = \theta^T C(x^{(0)}) - \log Z(\theta) = \theta^T C(x^{(0)}) - \log\left( \sum_{\text{all possible } x} \exp\{\theta^T C(x)\} \right)$$
- Even in the simplest case of undirected graphs without self-edges, the number of graphs in the sum is very large: there are $2^{n(n-1)/2}$ such graphs on $n$ nodes.
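To make the size of the sum concrete, the following sketch evaluates $\log Z(\theta)$ by brute-force enumeration for a tiny undirected graph. The choice of statistics (edge count and triangle count) and the function names are assumptions made for illustration; the slides do not fix a particular $C(x)$.

```python
import itertools
import math
import numpy as np

def graph_statistics(edges_on, pairs, n):
    """C(x) = (number of edges, number of triangles) of an undirected graph."""
    adj = np.zeros((n, n), dtype=int)
    for on, (i, j) in zip(edges_on, pairs):
        adj[i, j] = adj[j, i] = on
    n_edges = int(adj.sum()) // 2
    n_triangles = int(np.trace(adj @ adj @ adj)) // 6
    return np.array([n_edges, n_triangles], dtype=float)

def log_partition(theta, n):
    """Return (log Z(theta), number of graphs) by enumerating every graph on n nodes."""
    theta = np.asarray(theta, dtype=float)
    pairs = list(itertools.combinations(range(n), 2))
    terms = [float(theta @ graph_statistics(x, pairs, n))
             for x in itertools.product((0, 1), repeat=len(pairs))]
    m = max(terms)
    # log-sum-exp over all 2^(n(n-1)/2) graphs
    return m + math.log(sum(math.exp(t - m) for t in terms)), len(terms)

log_z, n_graphs = log_partition([0.5, -0.2], n=4)   # 64 graphs: still feasible
```

The enumeration has 64 terms for 4 nodes and 1024 for 5, but already about 3.5 × 10^13 for 10 nodes, which is why the exact likelihood is out of reach for networks of realistic size.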
Maximum Pseudo-likelihood
Let $x_w$ be a single component of $x$ and $x_{-w}$ the vector of all the remaining components.
The pseudo-likelihood function
Let’s approximate the marginal $P(x_w \mid \theta)$ by the conditional $P(x_w \mid x_{-w}; \theta)$.
Then $\tilde{\ell}(\theta) = \prod_w P(x_w \mid x_{-w}; \theta)$.
Result: the maximum pseudo-likelihood (MPL) estimate.
Unfortunately, little is known about the quality of MPL estimates.
Pseudo-likelihood for ERGM
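Notation: for a network $x$ and a pair $(i,j)$ of nodes, $x_{ij}$ is the corresponding entry of $x$ and $x_{-ij}$ collects all the remaining entries.
$$\tilde{\ell}(\theta) = \prod_w P(x_w \mid x_{-w}; \theta)$$
$$= \prod_{(i,j)} \frac{\exp\{\theta^T C(x^{(0)})\}}{\exp\{\theta^T C(x_{ij} = 1, x_{-ij})\} + \exp\{\theta^T C(x_{ij} = 0, x_{-ij})\}}$$
$$= \frac{\exp\{n(n-1)\,\theta^T C(x^{(0)})\}}{\prod_{(i,j)} \left[ \exp\{\theta^T C(x_{ij} = 1, x_{-ij})\} + \exp\{\theta^T C(x_{ij} = 0, x_{-ij})\} \right]}$$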
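The pseudo-likelihood only needs the statistics of the observed graph with one entry switched on or off, so it avoids $Z(\theta)$ entirely. Here is a minimal sketch of the log pseudo-likelihood. It assumes an undirected graph (so the product runs over the $n(n-1)/2$ unordered pairs rather than the $n(n-1)$ ordered pairs of the slide) and uses edge and triangle counts as a hypothetical choice of $C(x)$.

```python
import itertools
import numpy as np

def stats(adj):
    """C(x) = (number of edges, number of triangles) of an undirected graph."""
    n_edges = adj.sum() / 2.0
    n_triangles = np.trace(adj @ adj @ adj) / 6.0
    return np.array([n_edges, n_triangles])

def log_pseudo_likelihood(theta, adj):
    """Sum over pairs of log P(x_ij | x_-ij; theta) for the observed graph adj."""
    theta = np.asarray(theta, dtype=float)
    total = 0.0
    for i, j in itertools.combinations(range(adj.shape[0]), 2):
        with_edge, without_edge = adj.copy(), adj.copy()
        with_edge[i, j] = with_edge[j, i] = 1
        without_edge[i, j] = without_edge[j, i] = 0
        s1 = theta @ stats(with_edge)       # theta^T C(x_ij = 1, x_-ij)
        s0 = theta @ stats(without_edge)    # theta^T C(x_ij = 0, x_-ij)
        s_obs = s1 if adj[i, j] == 1 else s0
        total += s_obs - np.logaddexp(s1, s0)
    return float(total)

# Hypothetical observed graph on 4 nodes: a triangle 0-1-2 plus the edge 2-3.
x_obs = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]])
lpl = log_pseudo_likelihood([0.3, -0.1], x_obs)
```

When only the edge count enters $C(x)$ the edges are independent and the pseudo-likelihood coincides with the true likelihood; the approximation matters once dependence terms such as triangles are included.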
Pseudo-likelihood for our model
$$\tilde{\ell}(\theta) = \prod_{(r,t)} P(x_{srt} \mid x_{sr't'} \text{ such that } r' \neq r,\ t' < t) \propto \prod_{(r,t)} \frac{1}{x_{srt}!} \exp\left( \begin{bmatrix} \alpha_{st} + \beta_r \\ \eta \end{bmatrix}^T \begin{bmatrix} x_{srt} \\ C(x_{srt}) \end{bmatrix} \right),$$
What is the normalizing constant for the full conditional?
$$Z(\alpha_{st}, \beta_r, \eta) = \sum_{x_{srt} \geq 0} \frac{1}{x_{srt}!} \exp\left( \begin{bmatrix} \alpha_{st} + \beta_r \\ \eta \end{bmatrix}^T \begin{bmatrix} x_{srt} \\ C(x_{srt}) \end{bmatrix} \right)$$
Even the pseudo-likelihood is hard to define for our model
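The normalizing constant of the full conditional is an infinite sum with no closed form in general, but for a scalar $\eta$ it can be approximated by truncation. The sketch below does this for the case $C(x) = x^{1/p}$. The stopping rule, the tolerance and the function name are assumptions of this example, and the rule is only safe when the log-terms are unimodal (for instance when $\eta \geq 0$ and $p \geq 1$).

```python
import math

def log_normalizer(a, eta, p=2.0, log_tol=-40.0, x_max=100_000):
    """Approximate log sum_{x >= 0} exp(a * x + eta * x**(1/p)) / x!.

    a stands for alpha_st + beta_r. Terms are accumulated until they fall
    below exp(log_tol) times the largest term seen so far.
    """
    log_terms = []
    best = -math.inf
    for x in range(x_max + 1):
        lt = a * x + eta * x ** (1.0 / p) - math.lgamma(x + 1)
        log_terms.append(lt)
        if lt > best:
            best = lt
        elif lt < best + log_tol:
            break   # past the mode and negligible: stop (assumes unimodal terms)
    # log-sum-exp of the retained terms
    return best + math.log(sum(math.exp(lt - best) for lt in log_terms))

log_z = log_normalizer(a=1.0, eta=0.5)
```

For $\eta = 0$ the sum is the Poisson normalizer, so the result should equal $e^{a}$; this is a convenient check before using the function with a spillover term.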
Bayesian posterior
Let $\theta = [\alpha_{1t}, \dots, \alpha_{|S|t}, \beta_1, \dots, \beta_{|R|}, \gamma_{11}, \dots, \gamma_{|R|,|R|}]^T$ be the vector of natural parameters, $\pi(\theta)$ a prior distribution and $x^{(0)}$ the observed data set. By applying Bayes’ rule we have:
$$P(\theta \mid x^{(0)}) = \frac{P(x^{(0)} \mid \theta)\,\pi(\theta)}{\int_\theta P(x^{(0)} \mid \theta)\,\pi(\theta)\,d\theta}$$
$$= \frac{P(x_1, \dots, x_\tau; \theta) \prod_{t=\tau+1}^{w} P(x_t \mid x_{t-1}, \dots, x_{t-\tau}; \theta)\,\pi(\theta)}{\int_\theta P(x_1, \dots, x_\tau; \theta) \prod_{t=\tau+1}^{w} P(x_t \mid x_{t-1}, \dots, x_{t-\tau}; \theta)\,\pi(\theta)\,d\theta}$$
$$= \frac{P(x_1, \dots, x_\tau; \theta)\,\frac{\pi(\theta)}{Z(\theta)} \prod_{t=\tau+1}^{w} \prod_{s=1}^{m} q_{s,t,\theta}(x_{st})}{\int_\theta P(x_1, \dots, x_\tau; \theta)\,\frac{\pi(\theta)}{Z(\theta)} \prod_{t=\tau+1}^{w} \prod_{s=1}^{m} q_{s,t,\theta}(x_{st})\,d\theta}$$
Metropolis-Hastings
Since both $P(x^{(0)} \mid \theta)$ and $P(\theta \mid x^{(0)})$ are known only up to a normalizing constant, most standard MCMC algorithms for $\theta$ cannot be applied directly.
Consider for instance the Metropolis-Hastings acceptance probability:
$$\pi_{accept}(\theta, \theta') = \min\left\{ 1,\ \frac{P(x^{(0)} \mid \theta')\,\pi(\theta')}{P(x^{(0)} \mid \theta)\,\pi(\theta)} \times \frac{Q(\theta \mid \theta')}{Q(\theta' \mid \theta)} \right\}$$
$$= \min\left\{ 1,\ \frac{P(x_1, \dots, x_\tau; \theta') \prod_{t=\tau+1}^{w} \prod_{s=1}^{m} q_{s,t,\theta'}(x_{st})\,\pi(\theta')}{P(x_1, \dots, x_\tau; \theta) \prod_{t=\tau+1}^{w} \prod_{s=1}^{m} q_{s,t,\theta}(x_{st})\,\pi(\theta)} \times \frac{Z(\theta)\,Q(\theta \mid \theta')}{Z(\theta')\,Q(\theta' \mid \theta)} \right\}$$
where $Q(\theta' \mid \theta)$ is the proposal distribution. The ratio $Z(\theta)/Z(\theta')$ of intractable normalizing constants does not cancel.
Specialized MCMC for doubly intractable distributions
Murray proposed an MCMC approach that largely overcomes this drawback. It simulates from the joint distribution over the parameter and sample spaces, conditioned on the observed data set $x^{(0)}$, that is to say $P(x, \theta \mid x^{(0)})$.
Algorithm 1 Exchange algorithm of Murray.
1: Initialize $\theta$
2: repeat
3:   Draw $\theta'$ from an arbitrary proposal distribution;
4:   Draw $x'$ from $P(\cdot \mid \theta')$
5:   Accept $\theta'$ with probability $\min\left\{ 1,\ \frac{P(x' \mid \theta)\,P(x^{(0)} \mid \theta')\,\pi(\theta')}{P(x^{(0)} \mid \theta)\,P(x' \mid \theta')\,\pi(\theta)} \right\}$
6:   Update $\theta$
7: until Convergence
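The exchange step can be tried on a toy model where the answer is known. The sketch below is not the authors' implementation: it assumes a Poisson model written in unnormalized form, $q_\theta(x) = \exp(\theta x)/x!$, a Gaussian prior, and a symmetric random-walk proposal (so no proposal ratio appears, as in Algorithm 1). The data and all function names are hypothetical. Exact sampling from $P(\cdot \mid \theta')$ is trivial here, which is what the algorithm requires.

```python
import math
import numpy as np

def log_q(x, theta):
    """Unnormalized log-likelihood: sum_i [theta * x_i - log(x_i!)]; Z(theta) is never used."""
    return theta * float(np.sum(x)) - sum(math.lgamma(v + 1.0) for v in x)

def log_prior(theta, sd=10.0):
    """Gaussian prior N(0, sd^2), up to an additive constant."""
    return -0.5 * (theta / sd) ** 2

def exchange_sampler(x_obs, n_iter=3000, step=0.05, theta0=0.0, seed=0):
    rng = np.random.default_rng(seed)
    theta, chain = theta0, []
    for _ in range(n_iter):
        theta_new = theta + step * rng.standard_normal()            # step 3: propose
        x_aux = rng.poisson(math.exp(theta_new), size=len(x_obs))   # step 4: x' ~ P(. | theta')
        # step 5: the normalizing constants cancel in this ratio
        log_ratio = (log_q(x_aux, theta) + log_q(x_obs, theta_new) + log_prior(theta_new)
                     - log_q(x_obs, theta) - log_q(x_aux, theta_new) - log_prior(theta))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = theta_new                                       # step 6: update
        chain.append(theta)
    return np.array(chain)

# Hypothetical data: 100 counts with true rate 5, i.e. theta = log 5.
x_obs = np.random.default_rng(1).poisson(5.0, size=100)
chain = exchange_sampler(x_obs)
posterior_mean = chain[1000:].mean()   # discard the first 1000 draws as burn-in
```

In this toy case the posterior mean of θ should sit close to the log of the sample mean, which gives a direct check that the auxiliary draw replaces the intractable ratio correctly. For the model in the slides the difficult part is step 4, since drawing exactly from the likelihood is itself nontrivial.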
Goodness of fit: graphical illustration
Total number of plays over time for the top-30 songs
[Figure panels: (a) Full model. (b) Null model (γ = 0).]
Goodness of fit: graphical illustration
Total number of plays over time for the top-30 songs
[Figure panels: (a) Total plays over time. (b) Market share.]
Reducing the dimensionality of the parameter space
Model specification based on structural
properties of the music industry
The parameter space is the whole (|T | × |S| + |R| + |E|)-dimensional
Euclidean space, while the sample space has dimension (|T | × |S| × |R|).
We use two strategies to reduce the dimensionality of the parameter space:
A. Define communities of broadcasting companies to consider only within-group
spillover effects γ;
B. Define a functional form for the effect of the song life cycle α.
Reducing the dimensionality of the parameter space
A. Reducing the $|E|$ effects γ
- Pairwise spillover effects $\gamma_{kh}$ between individual companies $h$ and $k$ with the same radio format.
- Common spillover effect $\gamma_{kh}$ between different radio formats, if $h$ and $k$ have different formats.
B. Reducing the $|T| \times |S|$ effects α
The broadcasting patterns of songs exhibit a time window in which their popularity quickly increases shortly after their premiere and then decreases.
Groups of broadcasting companies
Let’s introduce only the effects γ associated with TV channels and radio stations of the same format.
[Diagram: within format vs. between formats. Labels: TV channels; Radio stations; Contemporary and Easy listening; Top 40 and Urban; Rock music.]
The estimated spillover effects
Contemporary
Contemporary
Rock
News
Sport
Top-40
World-Music
(−0.035, −0.021)
(−0.023, 0.047)
(−0.009, 0.076)
(−0.070, 0.001)
(−0.017, 0.014)
Rock
(−0.089, 0.004)
News
(0.012, 0.021)
(−0.049, 0.037)
(−0.072, −0.010)
(−0.036, −0.001) (−0.068, 0.030)
(−0.083, 0.022) (−0.052, 0.000)
(−0.029, 0.036) (−0.022, 0.005)
(−0.038, 0.022)
(−0.017, 0.011)
Top-40
(−0.164, 0.012)
(−0.032, 0.001)
(−0.005, −0.024)
(−0.015, 0.013)
World-Music
(−0.030, 0.019)
(−0.015, −0.021)
(0.009, 0.030)
(−0.029, 0.001)
(−0.025, 0.019)
TV channels
(−0.186, −0.068)
(−0.014, 0.024)
(−0.291, −0.038)
TV channels
BBC 1 Xtra
BBC 1 Xtra
Capital FM
Kiss 100 FM
Metro Radio
Radio City
Smooth R. London
Sport
(−0.028, 0.014)
(−0.018, 0.001)
(−0.035, 0.008)
(−0.015, 0.051)
(−0.020, 0.124)
(−0.008, 0.094)
(−0.019, 0.110)
(−0.033, 0.011)
Capital FM
(−0.009, 0.060)
(−0.028, 0.025)
(−0.009, 0.012)
(−0.040, 0.012)
(−0.021, 0.014)
Kiss 100 FM
(−0.104, 0.057)
(−0.060, 0.001)
(−0.027, 0.026)
(−0.015, 0.022)
(−0.022, 0.016)
Metro Radio
(−0.015, 0.012)
(−0.009, 0.025)
(−0.009, 0.025)
(−0.021, 0.009)
(−0.032, 0.023)
Radio City
(0.005, 0.024)
(0.000, 0.025)
(−0.032, 0.021)
(−0.014, 0.037)
(−0.022, 0.001)
Smooth R. London
(−0.015, 0.012)
(−0.013, 0.019)
(0.001, 0.029)
(0.000, 0.055)
(0.010, 0.033)
Songs’ dynamics
Define a functional form for the effect of song dynamics
The attractiveness trajectory of the $s$th song can be specified by letting $t_0$ be the week when the song is launched and then using a gamma kernel to shape its time dynamics:
$$\alpha_{st} = \begin{cases} \delta_{s0} + \delta_{s1}(t - t_0) + \delta_{s2} \log(t - t_0) & \text{if } t > t_0 \\ -\infty & \text{otherwise} \end{cases}$$
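Exponentiating this expression gives $e^{\delta_0}(t-t_0)^{\delta_2}e^{\delta_1(t-t_0)}$, the gamma-shaped curve that rises and then decays. A minimal sketch of the kernel follows; the coefficient values are hypothetical and the peak formula assumes $\delta_1 < 0 < \delta_2$, which is not stated on the slide.

```python
import math

def alpha(t, t0, delta0, delta1, delta2):
    """Song effect at week t: gamma-kernel log-attractiveness, -inf up to the launch week."""
    if t <= t0:
        return -math.inf
    u = t - t0
    return delta0 + delta1 * u + delta2 * math.log(u)

def peak_week(t0, delta1, delta2):
    """Week maximizing alpha when delta1 < 0 < delta2: solve delta1 + delta2 / u = 0."""
    return t0 - delta2 / delta1

# Hypothetical coefficients: launch in week 10, peak 8 weeks later.
t0, d0, d1, d2 = 10, 0.0, -0.5, 4.0
trajectory = [alpha(t, t0, d0, d1, d2) for t in range(t0, t0 + 30)]
best_week = peak_week(t0, d1, d2)
```

With only three coefficients per song, this replaces the $|T|$ free values of $\alpha_{st}$ for each song, which is the dimensionality reduction described under strategy B.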
Songs life cycle
Common life cycle of the top-30 songs
Propagation of the broadcasting decision after the premiere week $t_0$.
$$\max \ \frac{1}{S} \sum_{s \in S} \sum_{t'=1}^{T} E\left[ x_{s,\bullet,t+t'} \mid x_{srt} = z_r : \text{for all } r \in R \right]$$
subject to
$$\sum_{r \in R} y_r = 1,$$
$$z_r \leq \min\{M y_r, \phi\}, \quad r \in R,$$
$$y_r \in \{0, 1\},\ z_r \geq 0,\ F \geq \phi \geq 0, \quad r \in R.$$

Format         Eigenvector
Contemporary   0.098
Rock           0.121
News           0.098
Sport          0.177
Top-40         0.097
World Music    0.187
TV-channels    0.101

Expected plays in t0 + 1 (φ = 10, φ = 100): 265.795, 267.647, 265.209, 261.803, 265.609, 264.058, 260.301, 263.021, 264.272, 265.318, 267.345, 266.350, 264.165, 263.425
Expected plays in t0 + 2 (φ = 10, φ = 100): 265.949, 267.720, 265.687, 261.381, 265.995, 263.211, 260.875, 263.055, 264.879, 265.098, 266.858, 266.603, 264.171, 263.438
Discussion
What are the real achievements of this work?
- We considered a large multidimensional panel of songs broadcast weekly on radio stations and TV channels and detected a pattern of cross-section dependencies, based on pairwise imitations.
- An exponential random model has been proposed to internalize, in a single probabilistic framework, both the songs’ life cycle and the complex correlation structure.
- A specialized MCMC method has been implemented to estimate the model parameters.
- The out-of-sample goodness of fit has been analyzed, assessing the model adequacy for the observed data set.
THANK YOU FOR YOUR ATTENTION
Acknowledgements: The research leading to these results has
received funding from the European Research Council under the
European Union’s Seventh Framework Programme (FP/2007-2013) /
ERC Grant Agreement n. 283300.