Dynamic Estimation of Media Slant

Dynamic Estimation of Media Slant∗
Jong Hee Park
Department of Political Science and International Relations
Seoul National University
http://jhp.snu.ac.kr
[email protected]
January 4, 2016
Abstract
Existing methods for media slant estimation focus on how to map observed
text data to a low-dimensional vector of fixed quantities. In doing so, these
methods ignore the sequence of news and fail to consider the possibility of changes
in media slant. In this paper, we highlight the chronological character of news
reports and develop Bayesian statistical methods that allow joint estimation of
time-varying public opinion trends and media slant subject to multiple discrete
changes. Automated text analysis methods are used to extract the relative frequency
of partisan words from news reports. We apply this method to news reports covering
the Sewol ferry disaster by nineteen daily newspapers of South Korea.
Keywords: media slant, dynamic linear model, hidden Markov model, automated
text analysis, South Korea
∗ Prepared for the Asian Political Methodology Meeting at Tsinghua University in January 2016.
1 Introduction
In a modern democratic society, mass media reflect and shape public opinion. As strategic
actors and professionals pursuing independence, journalists and editors wish to be more than
“a conveyor belt” for what has happened and what politicians have said (Zaller, 1999, 1-2).
Mass media play a critical role in framing, disseminating, and constructing public issues. On
the other hand, mass media cannot report an “unrepresentative sample” of public opinion
consistently because of market pressure. Unresponsive media, if disfavored by consumers, will
quickly lose market share. As the number and types of media outlets increase, the behavior of
mass media has become increasingly constrained by market competition. Considering
the constant and complex interactions between mass media and public opinion, it is essential
to explicate the public opinion-media nexus to properly understand the role of mass media
in a democracy.
Among the many important roles of mass media in a democracy, “media slant”¹ (or “media bias”)
has recently received increasing theoretical and empirical attention
from social scientists (Groseclose and Milyo, 2005; Baron, 2006; Gentzkow, 2006;
Bernhardt, Krasa and Polborn, 2008; Gerber and Bergan, 2009; Gentzkow and Shapiro,
2010; Larcinese, Puglisi and Snyder, 2007; Stone, 2011; Gans and Leigh, 2012;
Strömberg, 2015; Agirdas, 2015). This paper contributes to the literature by developing a
new method for dynamic estimation of media slant.
In our dynamic modeling framework, the news content generating process is assumed to
be a stochastic process where public opinion and mass media constantly interact with each
other. We consider media slant – or any quantity we derive from time-ordered news contents
– as time-indexed and subject to temporal changes. Such characterization of the news
generating process as stochastic marks an important departure from the existing literature.
Most existing methods for media slant estimation focus on how to map observed text data
to a low-dimensional vector of fixed quantities, such as sentiments, slants, or ideologies
(Groseclose and Milyo, 2005; Gentzkow and Shapiro, 2010; Gans and Leigh, 2012; Taddy,
2013). In doing so, these methods treat each document as “an exchangeable collection
of phrase tokens” (Taddy, 2013, 755) and fail to consider the temporal order in the news
content generating process. For example, Gentzkow and Shapiro (2010) treat their key input
measures, phrase frequencies in newspapers and phrase frequencies in the 2005 Congressional
Record, as samples randomly drawn from a population of all phrases.
There are reasons to consider the possibility of changes in media slant beyond
the mere fact that news contents are time-ordered. Epochal changes in a society fundamentally
reshape both the positions of mass media and the direction of public opinion. If we fail
to account for those massive and long-lasting changes in the society, our measurement of
media slant will be biased in an unknown direction and inconsistent in a statistical sense.
Moreover, even though politicians’ use of partisan phrases shows a consistent pattern, movements of public opinion and media reports as reflections of public opinion at each time tend
to show nonstationary patterns even for a short period of time. Public opinion movements
have periods of a persistent trend and periods of irregular changes. For example, Gentzkow
and Shapiro (2010) assume that each media outlet chooses “estate tax” over “death tax,”
or vice versa, in a consistent manner throughout the entire year of 2005. The assumption
of time homogeneity needs to be tested because observed phrase frequencies are sensitive to
public opinion changes or effects of events such as the enactment of the Death Tax Repeal
Permanency Act of 2005.²

¹ We define media slant as relative ideological leanings of media outlets in news coverage.
In this paper, we propose a new method to jointly estimate public opinion trends and
media slant that aims to properly account for the time-ordered nature of the news content
generating process and public opinion movements. The first step is to extract a relative
frequency of partisan phrases from media reports using automated text analysis methods.
This can be done in various ways depending on the availability of other information that
helps us identify ideological dimensions of news content data. Then, we decompose the
time series cross-sectional data of the relative partisan word frequency into time-varying
movements of public opinion and hidden Markov transitions of media slant using a dynamic
linear multilevel model with parametric breaks in the second-level parameter. Instead of
simply imposing dynamics and Markov transitions on the data, we test the modeling assumption
of time-varying movements of public opinion and hidden Markov transitions of media slant
using two simpler models as benchmarks: the multilevel model of media slant and the dynamic
linear multilevel model. For model diagnostics and changepoint detection, we utilize the
Widely Applicable Information Criterion (WAIC), an intuitive and effective Bayesian model
diagnostic tool (Watanabe, 2010).
2 The Proposed Method
Our method for dynamic estimation of media slant consists of three steps. First, we identify
partisan phrases and quantify their relative frequency in media reports. Second, we model a
media outlet’s word selection on an issue as a function of public opinion toward the issue
at each time and the political ideology of the outlet. We discuss different types of dynamic
models for the slant estimation. Third, we run model diagnostics on the competing dynamic
models in order to avoid overfitting.
2.1 Relative Frequency of Partisan Phrases
Many different methods have been proposed to identify partisan phrases from news contents,
mostly in the U.S. context. For example, Groseclose and Milyo (2005) consider the frequency
of citations to think-tank materials by legislators and journalists to indicate a partisan
leaning of the medium. Gentzkow and Shapiro (2010) use a hand-coded list of bigram or
trigram phrases that are used more frequently by members of one party than by members of
the other; the relative frequency of partisan phrases is then regressed on a congressperson’s
ideology to obtain a partisan score for the newspaper.
Generally speaking, however, the definition of partisan phrases varies by time and space.
It is hard to generalize the identification method of partisan phrases in the literature to
non-US cases because congressional voting and speech data are not always readily available.
Also, in some countries, like South Korea and the United Kingdom, party-line voting is
quite dominant. Therefore, there is a need to develop a creative way to identify ideological
dimensions of news content data using other types of information.³ In the case of South
Korea (discussed in Section 3), political parties issue press releases almost every day. Such
press releases can be used to uncover ideological dimensions of the news content data.

² The bill was introduced by Rep. Hulshof, Kenny C. (R-MO) on Feb. 17, 2005 (https://www.congress.gov/bill/109th-congress/house-bill/8).
2.2 Models
Once we extract the relative frequency of partisan phrases, the next step is to develop an
empirical model of the news content generating process in which media slant is one of the
key random variables. In this section, we discuss three modeling options in the order of
model complexity: (1) the multilevel model (MLM), (2) the linear dynamic multilevel model
(DMLM), and (3) the hidden Markov linear dynamic multilevel model (HMDMLM). The
discussion of how to choose the “right” model using a Bayesian model diagnostic tool follows.
2.2.1 Model 1: Multilevel Model (MLM) of Media Slant
Using the relative frequency of partisan phrases for newspaper i at t, we model partisan
biases of each newspaper as random samples from an unknown normal distribution:
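\[
\begin{aligned}
y_{it} &= \alpha_i + \beta + \sigma_y \epsilon_{it}, \qquad \epsilon_{it} \sim N(0, 1) \\
\alpha_i &\sim N(\mu_\alpha, \sigma_\alpha^2) \\
\beta &\sim N(b_0, B_0) \\
\sigma_y &\sim IG(c_0, d_0) \\
\sigma_\alpha &\sim IG(e_0, f_0)
\end{aligned}
\tag{1}
\]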
This is a baseline multilevel model upon which we will build additional complexity. Although this model is naive in that it ignores the time-varying nature of the news content
generation, there are three major advantages of modeling media slant within a multilevel
model.
First, the multilevel setup provides consistent estimates of group-level parameters from
time series cross-sectional data. Media slant is information at a group level that needs to
be gleaned across individual observations. When a group-level parameter is a quantity of
our substantive interest, we cannot use the fixed-effects method that removes group-level
constants for the consistent estimation of individual level slopes.
Second, the hierarchical structure of the multilevel model allows us to combine complex
dynamic models in the context of time series cross-sectional data analysis. As will be
discussed shortly, we can add various time series models, such as an autoregressive model, a
linear dynamic model, and a hidden Markov model, to the multilevel model.
Last, and most importantly, the multilevel model summarizes variations across heterogeneous
groups in a way that balances within-group variations with inter-group differences.
This characteristic, known as “partial pooling” in the multilevel literature, helps us
avoid overfitting the data. Overfitting usually happens in the estimation of media slant
when we estimate the slant of newspaper i using only data observed from i. In that case,
i’s slant estimate captures only i’s realized information without learning from other
similarly generated data; thus, the resulting estimates tend to be overly confident about the
out-of-sample predictive accuracy of i’s slant estimate.

³ Alternatively, one can develop a sophisticated theoretical model that explains the news
content generating process and estimate theoretical parameters from the news content data.
Recently, a group of scholars has been working on deriving ideological measures from a
theoretical model of (slanted) word choice in text documents (Baron, 2006; Bernhardt, Krasa
and Polborn, 2008; Stone, 2011; Kim, Londregan and Ratkovic, 2015).
The posterior samples of slant estimates (αi ) can be decomposed into the sum of two
parts:
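\[
\alpha_i = \underbrace{
\frac{T_i/\sigma_y^2}{T_i/\sigma_y^2 + 1/\sigma_\alpha^2}
\overbrace{(\bar{y}_i - \beta)}^{\text{No pooling}}
+ \frac{1/\sigma_\alpha^2}{T_i/\sigma_y^2 + 1/\sigma_\alpha^2}
\overbrace{\mu_\alpha}^{\text{Complete pooling}}
}_{\text{Partial pooling}}
\tag{2}
\]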
where $T_i$ is the number of newspaper i’s observations and $\bar{y}_i$ is the group mean of
newspaper i’s relative partisan phrase frequency. Here, i’s slant is a weighted sum of i’s
average distance from the global mean ($\bar{y}_i - \beta$) and the mean of all media slants ($\mu_\alpha$).
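As a small numerical illustration of Equation (2), the sketch below computes the precision-weighted combination for one newspaper. The function name and all numbers are hypothetical and chosen only to show how the weight on a newspaper’s own data grows with the number of its observations.

```python
import numpy as np

def partial_pooling_mean(y_i, beta, mu_alpha, sigma_y2, sigma_alpha2):
    """Precision-weighted combination in Equation (2) for one newspaper."""
    T_i = len(y_i)
    prec_data = T_i / sigma_y2        # precision of the no-pooling part
    prec_prior = 1.0 / sigma_alpha2   # precision of the complete-pooling part
    w = prec_data / (prec_data + prec_prior)
    return w * (np.mean(y_i) - beta) + (1.0 - w) * mu_alpha

# Hypothetical data: same group mean (0.2), different numbers of observations.
y_few = np.array([0.30, 0.10])
y_many = np.tile(y_few, 50)
slant_few = partial_pooling_mean(y_few, 0.0, 0.0, 0.04, 0.01)
slant_many = partial_pooling_mean(y_many, 0.0, 0.0, 0.04, 0.01)
```

With only two observations the estimate is pulled strongly toward $\mu_\alpha$; with one hundred observations it stays close to the newspaper’s own mean distance.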
The Markov chain Monte Carlo (MCMC) sampling algorithms of the MLM are well
known. Several Bayesian statistical packages in R provide functions to fit the MLM, and
other Bayesian software, such as the BUGS (Bayesian inference Using Gibbs Sampling) project,
JAGS (Just Another Gibbs Sampler), and Stan (http://mc-stan.org), is also available.
However, it is not straightforward to add dynamic components within these packages. Thus, we
proceed to discuss the MCMC algorithms.
For efficient sampling of group level parameters, we decompose the posterior density into
three blocks as suggested by Algorithm 2 of Chib and Carlin (1999):
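\[
p(\beta, \sigma_y^2, \sigma_\alpha^2 \mid y) = \int p(\beta, \alpha_i \mid y)\,
p(\sigma_\alpha^2 \mid y, \beta, \alpha_i)\,
p(\sigma_y^2 \mid y, \beta, \alpha_i, \sigma_\alpha^2)\, d\alpha_i .
\]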
The joint density $p(\beta, \alpha_i \mid y)$ is factored into $p(\beta \mid y)\, p(\alpha_i \mid y, \beta)$ so that the
sampling of $\beta$ does not depend directly on $\alpha_i$. With conjugate priors, all the sampling
steps can be done by the Gibbs sampler.
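The blocking described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper, and it rests on several assumptions that are not in the text: $\mu_\alpha$ is fixed at zero, the inverse gamma priors are placed on the variances with a (shape, scale) parameterization, and the hyperparameter defaults and the function name `gibbs_mlm` are hypothetical.

```python
import numpy as np

def gibbs_mlm(y, group, n_iter=2000, mu_alpha=0.0, b0=0.0, B0=100.0,
              c0=2.0, d0=1.0, e0=2.0, f0=1.0, seed=0):
    """Blocked Gibbs sketch for y_it = alpha_i + beta + noise (assumed setup)."""
    rng = np.random.default_rng(seed)
    n_groups = int(group.max()) + 1
    T = np.bincount(group, minlength=n_groups)             # observations per group
    ybar = np.bincount(group, weights=y, minlength=n_groups) / T
    s2y, s2a = 1.0, 1.0
    draws = np.empty((n_iter, 3))
    for it in range(n_iter):
        # beta | y, variances, with alpha_i integrated out
        v = s2a + s2y / T                                   # variance of each group mean
        prec_b = np.sum(1.0 / v) + 1.0 / B0
        mean_b = (np.sum((ybar - mu_alpha) / v) + b0 / B0) / prec_b
        beta = rng.normal(mean_b, 1.0 / np.sqrt(prec_b))
        # alpha_i | y, beta: the precision-weighted form of Equation (2)
        prec_a = T / s2y + 1.0 / s2a
        mean_a = (T * (ybar - beta) / s2y + mu_alpha / s2a) / prec_a
        alpha = rng.normal(mean_a, 1.0 / np.sqrt(prec_a))
        # variances: inverse gamma draws as reciprocals of gamma draws
        dev = alpha - mu_alpha
        s2a = 1.0 / rng.gamma(e0 + n_groups / 2.0, 1.0 / (f0 + 0.5 * dev @ dev))
        resid = y - beta - alpha[group]
        s2y = 1.0 / rng.gamma(c0 + y.size / 2.0, 1.0 / (d0 + 0.5 * resid @ resid))
        draws[it] = (beta, s2y, s2a)
    return draws

# Simulated example: 19 groups, 30 observations each (hypothetical values).
rng = np.random.default_rng(1)
group = np.repeat(np.arange(19), 30)
true_alpha = rng.normal(0.0, 0.5, 19)
y = 1.0 + true_alpha[group] + rng.normal(0.0, 0.3, group.size)
draws = gibbs_mlm(y, group)
beta_hat = draws[500:, 0].mean()
```

Drawing $\beta$ before $\alpha_i$ from its marginal conditional is what keeps the two from being sampled one at a time, which typically mixes poorly when each group has many observations.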
2.2.2 Model 2: Linear Dynamic Multilevel Model (DMLM) of Media Slant
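Now, we add one layer of dynamics onto the MLM. The global mean of the relative partisan
phrase frequency follows a random walk where the current global mean is determined by the
previous global mean and a random shock. The size of the random shock ($\sigma_\beta$) is estimated
from the data. The resulting model takes the following form:
\[
\begin{aligned}
y_{it} &= \alpha_i + \beta_t + \sigma_y \epsilon_{it}, \qquad \epsilon_{it} \sim N(0, 1) \\
\alpha_i &\sim N(\mu_\alpha, \sigma_\alpha^2) \\
\beta_t &= \beta_{t-1} + \sigma_\beta \epsilon_t \quad \text{for } t > 1 \\
\beta_1 &\sim N(b_0, B_0) \\
\sigma_y &\sim IG(c_0, d_0) \\
\sigma_\alpha &\sim IG(e_0, f_0) \\
\sigma_\beta &\sim IG(g_0, h_0).
\end{aligned}
\tag{3}
\]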
The above model was first introduced by Jackman (2005) who used it to estimate house
effects in time series cross-sectional polling data.
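To make the data generating process in Equation (3) concrete, the following sketch simulates a balanced panel from the DMLM. The function name, the panel dimensions, and every parameter value are hypothetical; for simplicity $\beta_1$ is fixed rather than drawn from its prior.

```python
import numpy as np

def simulate_dmlm(n_papers=19, n_days=100, sigma_y=0.05, sigma_alpha=0.1,
                  sigma_beta=0.02, mu_alpha=0.0, beta1=0.0, seed=0):
    """Simulate y_it = alpha_i + beta_t + noise with a random-walk beta_t."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(mu_alpha, sigma_alpha, n_papers)   # constant slants
    shocks = rng.normal(0.0, sigma_beta, n_days)
    shocks[0] = 0.0                                       # beta_1 is the starting value
    beta = beta1 + np.cumsum(shocks)                      # random-walk public opinion
    noise = rng.normal(0.0, sigma_y, (n_papers, n_days))
    y = alpha[:, None] + beta[None, :] + noise
    return y, alpha, beta

y, alpha, beta = simulate_dmlm()
```

Simulated panels of this kind are a convenient way to check that a sampler recovers known values of $\alpha_i$ and $\beta_t$ before it is applied to real data.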
A natural question arising from this model setup is what the time-varying global mean of
the relative partisan phrase frequency ($\beta_t$) substantively means. Technically speaking, $\beta_t$ is
the average level of partisan slant reflected in news reports measured at t. Because $\beta_t$ follows
a random walk, the current $\beta_t$ is highly predictable from $\beta_{t-1}$, and, given $\beta_{t-1}$, the
earlier values $(\beta_{t-2}, \ldots, \beta_1)$ add no further information.
We interpret $\beta \equiv (\beta_1, \ldots, \beta_T)$ as movements of public opinion showing the average partisan
slant among the public observed by the mass media in the sample. Public opinion changes
smoothly over time, mostly reflecting what people thought right before. Yet, it sometimes
changes dramatically in one way or the other in response to external shocks. Such irregular
movements are modeled as a random walk process in our model, and the size of irregularity
is estimated by the transition variance ($\sigma_\beta^2$).
Interpreting β as public opinion movements makes it easy to substantively understand
our estimates of media slant. The posterior samples of slant estimates (αi ) from the DMLM
consist of the sum of two parts:
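\[
\alpha_i =
\frac{T_i/(\sigma_y^2 + \sigma_\beta^2)}{T_i/(\sigma_y^2 + \sigma_\beta^2) + 1/\sigma_\alpha^2}
\overbrace{\sum_{t \in \mathcal{T}_i} (y_{i,t} - z_{it}\beta)}^{\text{$i$'s average distance from public opinion}}
+ \frac{1/\sigma_\alpha^2}{T_i/(\sigma_y^2 + \sigma_\beta^2) + 1/\sigma_\alpha^2}\, \mu_\alpha
\tag{4}
\]
where $\mathcal{T}_i$ is the collection of time indices realized in i’s data, $T_i$ is the number of those
indices, and $z_{it}$ is the t-th row of $z_i$, a $T_i \times T$ matrix that identifies the specific time
indices realized in i’s data.⁴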
As noted by the brace in Equation (4), the DMLM constructs newspaper i’s slant as a
weighted sum of i’s average distance from public opinion ($\sum_{t \in \mathcal{T}_i} (y_{i,t} - z_{it}\beta)$) and the mean of
all media slants ($\mu_\alpha$). This formula allows us to account for the time-varying nature of public
opinion and to anchor our slant measure accordingly, which makes more sense than anchoring
media slant to the constant global mean ($\beta$) in Equation (1).

⁴ For example, if newspaper i has three observations at t = 1, 3, 5 and the maximum range of time series
in the data set is t = (1, 2, 3, 4, 5), then
\[
z_i = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix},
\qquad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5 \end{pmatrix}.
\]
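The role of $z_i$ can be illustrated with the three-observation example from the footnote. The sketch below builds the selection matrix and computes the distances $y_{i,t} - z_{it}\beta$ that enter Equation (4); the helper name and the numerical values of $\beta$ and $y_i$ are hypothetical.

```python
import numpy as np

def selection_matrix(observed_times, T):
    """Rows pick out the (1-based) time points at which newspaper i is observed."""
    observed_times = np.asarray(observed_times)
    z = np.zeros((len(observed_times), T))
    z[np.arange(len(observed_times)), observed_times - 1] = 1.0
    return z

z_i = selection_matrix([1, 3, 5], T=5)
beta = np.array([0.10, 0.12, 0.15, 0.11, 0.09])   # hypothetical public opinion path
y_i = np.array([0.20, 0.30, 0.05])                # hypothetical observations of i
distance = y_i - z_i @ beta                       # distance from public opinion
```

Multiplying by $z_i$ simply aligns the full public opinion series with the dates on which newspaper i actually published, so unbalanced panels need no special treatment.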
[Figure 1 plot not recoverable. Panel title: “Measuring Slant from the Public Opinion”; vertical axis: Relative Frequency; horizontal axis: dates from 2014-06-05 to 2014-06-10; braces labeled “Distance from the public opinion.”]
Figure 1: Media Slant as Average Distance from Public Opinion: Data from 19 South Korean Newspapers,
April 2014 - April 2015. Red dots are relative partisan phrase frequencies of Munhwa Ilbo and the blue line
is estimated public opinion from 19 newspapers.
Figure 1 visualizes the idea of estimating media slant as an average distance from public
opinion. The data set used here is the relative partisan phrase frequency data from nineteen
South Korean newspapers covering the Sewol ferry disaster. The blue line is estimated public
opinion (βt ) and red dots are relative partisan phrase frequencies of a newspaper called
Munhwa Ilbo. The braces between the red dots and the blue line indicate partisan slants of
Munhwa Ilbo at each time point. It is clear that Munhwa Ilbo’s partisan slants are located
above the public opinion except on June 9, 2014, indicating the possibility of Munhwa Ilbo’s
conservative bias.
The MCMC sampling algorithm of the DMLM involves two additional steps for β and σβ2 .
As the movement of β is modeled as a linear dynamic model, all the MCMC sampling steps
can be done by the Gibbs sampler. The posterior density can be decomposed as follows:
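\[
p(\beta, \sigma_\beta^2, \sigma_y^2, \sigma_\alpha^2 \mid y) = \int
\overbrace{p(\beta \mid y)}^{\text{forward filtering backward sampling}}
p(\sigma_\beta^2 \mid y, \beta)\, p(\alpha_i \mid y, \beta, \sigma_\beta^2)\,
p(\sigma_\alpha^2 \mid y, \beta, \sigma_\beta^2, \alpha_i)\,
p(\sigma_y^2 \mid y, \beta, \sigma_\beta^2, \sigma_\alpha^2, \alpha_i)\, d\alpha_i .
\]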
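The forward filtering backward sampling step for $\beta$ in this decomposition can be sketched for a simplified case. The code below assumes a scalar local level model with one observation per period (for instance, a cross-sectional average of $y_{it} - \alpha_i$), which is a simplification of the panel setting in the paper; the function name and the defaults for the initial mean and variance are hypothetical.

```python
import numpy as np

def ffbs_local_level(y, V, W, m0=0.0, C0=10.0, rng=None):
    """One joint draw of beta_1..beta_T for y_t = beta_t + v_t, beta_t = beta_{t-1} + w_t.

    V is the observation variance, W the random-walk variance, and
    beta_1 ~ N(m0, C0) is the assumed initial distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    T = len(y)
    m, C = np.empty(T), np.empty(T)
    m_prev, C_prev = m0, C0
    # Forward pass: Kalman filter means and variances.
    for t in range(T):
        R = C_prev + (W if t > 0 else 0.0)   # prior variance of beta_t
        K = R / (R + V)                      # Kalman gain
        m[t] = m_prev + K * (y[t] - m_prev)
        C[t] = K * V                         # equals (1 - K) * R
        m_prev, C_prev = m[t], C[t]
    # Backward pass: sample beta_T, then beta_t given beta_{t+1}.
    beta = np.empty(T)
    beta[-1] = rng.normal(m[-1], np.sqrt(C[-1]))
    for t in range(T - 2, -1, -1):
        B = C[t] / (C[t] + W)
        mean = m[t] + B * (beta[t + 1] - m[t])
        var = C[t] * (1.0 - B)
        beta[t] = rng.normal(mean, np.sqrt(var))
    return beta

# Usage on a simulated series (hypothetical values).
rng = np.random.default_rng(0)
y_sim = np.cumsum(rng.normal(0.0, 0.1, 50))
beta_draw = ffbs_local_level(y_sim, V=0.05 ** 2, W=0.1 ** 2, rng=rng)
```

Because the whole path is drawn jointly, successive MCMC draws of $\beta$ are far less autocorrelated than they would be if each $\beta_t$ were updated one at a time.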
The sampling of $\beta$ is done by the forward filtering backward sampling (FFBS) algorithm
developed by Carter and Kohn (1994) and Frühwirth-Schnatter (1994). The sampling of $\sigma_\beta^2$
is from the following inverse gamma distribution:
\[
\sigma_\beta^2 \mid y, \beta, \sigma_\alpha^2, \alpha_i \sim
IG\left( \frac{g_0 + T}{2},\; \frac{h_0 + \sum_{t=1}^{T} (\beta_t - \beta_{t-1})^2}{2} \right).
\]

2.2.3 Model 3: Hidden Markov Linear Dynamic Multilevel Model (HMDMLM) of Media Slant
The DMLM makes an important contribution to the literature by allowing the joint estimation
of public opinion and constant parameters at the group level. However, it is logically
incoherent and theoretically unsatisfying to assume that only public opinion changes over
time while media slant remains constant. However strong the partisan predispositions of mass
media might be, mass media cannot ignore massive and dramatic changes in public opinion.
Important social events, such as disasters, international disputes, economic crises, political
scandals, critical elections, and social movements, affect how journalists think and write
about their society. Moreover, tides of public opinion reshuffle the popularity of mass media
in terms of their responsiveness. As a result, the assumption of constant media slant, however
convenient it may be for model estimation, must be tested with data rather
than taken for granted.
In order to let media slant change during the sample period, two assumptions are made.
First, changes of media slant are discrete rather than continuous. That is, media slant
changes not at every time point, but only occasionally. This is a reasonable assumption
that reduces the complexity of the model and the computational cost. Ideological positions of
mass media are usually consolidated over a long period of time by ownership, government
regulations, media market trends, recruitment, and promotion. They may change in response
to major changes in public opinion, but not always. Second, changes in media slant are
caused mainly by shifts in public opinion or in the positions of other media outlets. Changes in
media slant can be simply passive reflections of changes in public opinion. Or, they can be
a result of strategic repositioning by mass media in response to changes in the positions of other
media outlets.
To highlight the difference between the DMLM and the HMDMLM, Figure 2 and Figure 3
compare the modeling structure of the two dynamic models. The arrows show the direction
of statistical dependence in each model. Observed partisan frequencies of each media outlet ($y_{it}$)
are generated by two stochastic variables: $\beta_t$ and $\alpha_{i,s_t}$. Values of $\alpha_{i,s_t}$ are further dependent
upon latent states ($s_t$), the transition of which is governed by $\beta_t$. In plain words, public
opinion ($\beta_t$) is the main driving force of the news content generation process and of changes in
the ideological positions of mass media. We extract information about the transition of hidden
states by assuming that $\beta_t$ follows a local linear trend until unknown break points. That
is, the sampling of latent states is not dependent upon the response data in our model:
\[
p(s \mid y, \beta, \sigma_\beta^2, \sigma_\alpha^2, \alpha_i, \sigma_y^2) = p(s \mid \beta, \sigma_\beta^2, \sigma_\alpha^2, \alpha_i, \sigma_y^2).
\]
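[Figures 2 and 3: dependence graphs not recoverable. Nodes in Figure 2: $\beta_{t-1}, \beta_t, \beta_{t+1}$; $y_{i,t-1}, y_{i,t}, y_{i,t+1}$; $\alpha_i$. Nodes in Figure 3: $\beta_{t-1}, \beta_t, \beta_{t+1}$; $y_{i,t-1}, y_{i,t}, y_{i,t+1}$; $\alpha_{i,(s_{t-1})}, \alpha_{i,(s_t)}, \alpha_{i,(s_{t+1})}$; $s_{t-1}, s_t, s_{t+1}$.]
Figure 2: DMLM
Figure 3: HMDMLM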
Based on these two assumptions, we introduce the HMDMLM to account for dramatic
changes in media slant and time-varying changes in public opinion. Compared with the
DMLM, two new parameters are added to the HMDMLM. The first is the vector of latent state
variables ($s \equiv (s_1, \ldots, s_T)$), and the second is a transition matrix ($P$) which summarizes the
movement of the latent state variables. We adopt the non-ergodic design of Chib (1998), i.e., a
non-switching and forward-moving Markov chain, which efficiently identifies multiple changepoints
across various types of response data.
The resulting model can be written as follows:
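\[
\begin{aligned}
y_{it} &= \alpha_{i,s_t} + \beta_t + \sigma_y \epsilon_{it}, \qquad \epsilon_{it} \sim N(0, 1) \\
\beta_t &= \beta_{t-1} + \sigma_\beta \epsilon_t \quad \text{for } t > 1 \\
s_t \mid s_{t-1} &\sim \text{Markov}(P, \pi_0) \\
\alpha_{i,s_t} &\sim
\begin{cases}
N(\mu_{\alpha,1}, \sigma_{\alpha,1}^2) & \text{if } \tau_0 < t \le \tau_1 \\
\quad \vdots \\
N(\mu_{\alpha,M+1}, \sigma_{\alpha,M+1}^2) & \text{if } \tau_M < t \le \tau_{M+1}
\end{cases} \\
\beta_1 &\sim N(b_0, B_0) \\
\sigma_y &\sim IG(c_0, d_0) \\
\sigma_\alpha &\sim IG(e_0, f_0) \\
\sigma_\beta &\sim IG(g_0, h_0) \\
p_{ii} &\sim \text{Beta}(a, b) \quad \text{for } i = 1, \ldots, M
\end{aligned}
\]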
where $\tau_0 = 0$ and $\tau_{M+1} = T$. Note that the cutpoints ($\tau \equiv (\tau_0, \ldots, \tau_{M+1})$) are identified
by the sampled latent state variables ($s \equiv (s_1, \ldots, s_T)$) at each MCMC simulation step. $p_{ii}$ is the
probability of staying in the i-th state. Although each row of the transition matrix follows a
Dirichlet distribution, there are only two non-zero elements in each row except the last row.
The sum of these two non-zero elements is 1 by definition. Hence we model only the diagonal
elements ($p_{ii}$) of the transition matrix, each with a Beta distribution.
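The structure of the transition matrix described here can be written out directly. The sketch below builds a forward-only matrix from hypothetical staying probabilities and simulates one path of latent states from it; the function names and the values 0.98 and 0.95 are illustrative assumptions.

```python
import numpy as np

def changepoint_transition_matrix(p_stay):
    """Build the (M+1) x (M+1) forward-only matrix from M staying probabilities."""
    M = len(p_stay)
    P = np.zeros((M + 1, M + 1))
    for i, p in enumerate(p_stay):
        P[i, i] = p            # stay in state i
        P[i, i + 1] = 1.0 - p  # move forward to state i + 1
    P[M, M] = 1.0              # the last state is absorbing
    return P

def simulate_states(P, T, rng):
    """Simulate s_1..s_T starting in the first state (states coded 0..M)."""
    s = np.zeros(T, dtype=int)
    for t in range(1, T):
        s[t] = rng.choice(P.shape[0], p=P[s[t - 1]])
    return s

P = changepoint_transition_matrix([0.98, 0.95])   # hypothetical values, M = 2
states = simulate_states(P, T=200, rng=np.random.default_rng(0))
```

Because each row has at most two non-zero entries, a simulated path can only stay where it is or step forward by one state, which is what allows the cutpoints to be read off the sampled states.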
Slant estimates ($\alpha_i$) from the dynamic linear multilevel model with hidden Markov
transitions are state dependent. Suppose that the hidden state at t is m, and let $\mathcal{M}_i$ be the
collection of time indices pertaining to state m for media i. We interpret the state dependent
slant estimates ($\alpha_{i,m}$) as media i’s average distance from public opinion during state m.
\[
\alpha_{i, s_t = m} =
\frac{m_i/(\sigma_y^2 + \sigma_\beta^2)}{m_i/(\sigma_y^2 + \sigma_\beta^2) + 1/\sigma_{\alpha,m}^2}
\overbrace{\sum_{t \in \mathcal{M}_i} (y_{i,t} - z_{it}\beta)}^{\text{$i$'s average distance from public opinion during state $m$}}
+ \frac{1/\sigma_{\alpha,m}^2}{m_i/(\sigma_y^2 + \sigma_\beta^2) + 1/\sigma_{\alpha,m}^2}\, \mu_{\alpha,m}
\tag{5}
\]
where $m_i$ is the number of media i’s observations during state m.
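The MCMC sampling algorithm of the dynamic linear multilevel model with hidden
Markov transitions in the group-level parameters involves two additional steps, for $s$ and $P$,
compared with that of the DMLM. The posterior density can be decomposed as follows:
\[
\begin{aligned}
p(\beta, \sigma_\beta^2, \sigma_y^2, \sigma_\alpha^2, P \mid y) = \int\!\!\int\;
& p(\beta \mid y)\, p(\sigma_\beta^2 \mid y, \beta)\, p(\alpha_i \mid y, \beta, \sigma_\beta^2)\,
p(\sigma_\alpha^2 \mid y, \beta, \sigma_\beta^2, \alpha_i) \\
& p(\sigma_y^2 \mid y, \beta, \sigma_\beta^2, \sigma_\alpha^2, \alpha_i)\,
\overbrace{p(s \mid y, \beta, \sigma_\beta^2, \sigma_\alpha^2, \alpha_i, \sigma_y^2)}^{\text{multi-move sampling}} \\
& \overbrace{p(P \mid y, \beta, \sigma_\beta^2, \sigma_\alpha^2, \alpha_i, \sigma_y^2, s)}^{\text{row-wise sampling from a Beta distribution}}
\, d\alpha_i \, ds .
\end{aligned}
\]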
The sampling of $p_{ii}$ can be easily done once we have samples of the hidden states. Using the
Beta prior distribution,
\[
\begin{aligned}
\pi(p_{ii} \mid s) &\propto f(s \mid p_{ii})\, \text{Beta}(a, b) \\
&\propto p_{ii}^{n_{ii}} (1 - p_{ii})^{n_{ij}}\, p_{ii}^{a-1} (1 - p_{ii})^{b-1} \\
p_{ii} \mid s &\sim \text{Beta}(a + n_{ii},\, b + n_{ij})
\end{aligned}
\]
where $n_{ii}$ is the number of one-step transitions from state i to i and $n_{ij}$ is the number of
one-step transitions from state i to j.
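The conjugate update above amounts to counting transitions in the sampled state sequence. A minimal sketch, assuming states are coded 0, 1, ..., M and using a hypothetical Beta(1, 1) prior as the default:

```python
import numpy as np

def transition_counts(states, i):
    """Return (n_ii, n_ij): stays in state i and moves from i to i + 1."""
    states = np.asarray(states)
    prev, curr = states[:-1], states[1:]
    n_ii = int(np.sum((prev == i) & (curr == i)))
    n_ij = int(np.sum((prev == i) & (curr == i + 1)))
    return n_ii, n_ij

def draw_staying_probs(states, n_states, a=1.0, b=1.0, rng=None):
    """Draw p_ii ~ Beta(a + n_ii, b + n_ij) for every non-terminal state."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.empty(n_states - 1)
    for i in range(n_states - 1):
        n_ii, n_ij = transition_counts(states, i)
        p[i] = rng.beta(a + n_ii, b + n_ij)
    return p

states = [0, 0, 0, 1, 1, 2, 2, 2]   # hypothetical sampled state path
counts_0 = transition_counts(states, 0)
counts_1 = transition_counts(states, 1)
p_draw = draw_staying_probs(states, n_states=3, rng=np.random.default_rng(0))
```

In the forward-only design each non-terminal state is left exactly once, so $n_{ij}$ is at most one and the posterior is dominated by how long the chain stays in each state.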
The sampling of hidden state variables is done by the multi-move sampling algorithm
proposed by Chib (1996) and Chib (1998), and explained in detail by Frühwirth-Schnatter
(2006, 342-346). Note that we do not sample hidden states from response data (y) but from
β because, as discussed above, we identify the source of discrete changes in media slant from
dramatic changes in public opinion or position changes of other media outlets.
2.3 Model Validity Check
So far, we have presented statistical methods to jointly estimate the dynamics of public opinion
and discrete changes in media slant. However, one important question we need to address
when developing complex models is: “is it really necessary?” That is, the models discussed
above will return estimates of dynamic public opinion and/or discrete changes in media
slant simply due to the structure of the model. Then, how do we know that the estimated
dynamics and changes are real? Cross-validation tests and Monte Carlo studies can be useful
and suggestive, but they do not provide a definitive answer to the question of model validity.
In this section, we discuss how to check the validity of the above models against data
(model diagnostics) and against alternative models (model comparison).
Discussions of the Bayesian model diagnostics and model comparison have been extensively covered in the statistics literature (Gelfand and Smith, 1990; Raftery, 1995; Kass and
Raftery, 1995; Green, 1995; Chib, 1995; Gelman et al., 2004; Vehtari and Lampinen, 2002;
Vehtari and Ojanen, 2012). The goal of the Bayesian model diagnostics and model comparison is to assess the posterior probability of the model. Assuming a uniform prior probability
for each model, the posterior probability of model i is:
\[
p(M_i \mid y) = \frac{p(y \mid M_i)}{\sum_{j=1}^{J} p(y \mid M_j)} .
\]
An important quantity that needs to be computed from each model to find the posterior
model probability is the marginal density of data given each model:
\[
p(y \mid M_i) = \int p(y \mid \Theta_i, M_i)\, p(\Theta_i \mid M_i)\, d\Theta_i
\]
where Θi denotes the parameter vector for model i.
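If log marginal densities were available for each model, the posterior model probabilities under a uniform model prior would follow from a normalized exponential. The sketch below shows that arithmetic on the log scale; the function name and the three log marginal density values are hypothetical and are not results from the paper.

```python
import numpy as np

def posterior_model_probs(log_marginals):
    """Posterior model probabilities under a uniform prior over models."""
    log_m = np.asarray(log_marginals, dtype=float)
    w = np.exp(log_m - log_m.max())   # subtract the maximum to avoid underflow
    return w / w.sum()

# Hypothetical log marginal densities for three competing models.
probs = posterior_model_probs([-1050.2, -1043.7, -1041.9])
```

Working with differences of log marginal densities matters in practice because the densities themselves are usually far too small to represent as floating point numbers.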
However, it is very difficult to get a stable estimate of the marginal density of data except
in the case of simple models. Our dynamic linear multilevel models have many parameters,
most of which are latent. Thus, even if we manage to compute the marginal density of data
using an approximate method, the likely accuracy of the obtained values can be problematic.
An alternative way to check model validity without computing the marginal density
of data is to use the Watanabe-Akaike information criterion, or widely applicable information
criterion (WAIC), proposed by Watanabe (2010). Gelman, Hwang and Vehtari (2014) provide
an introduction to the WAIC. The WAIC has several advantages in Bayesian model
diagnostics. First, the WAIC approximates leave-one-out cross validation (LOO-CV) and
hence can serve as a metric for the out-of-sample predictive accuracy of a model. We believe
that the predictive accuracy of a model is a critical criterion for choosing the “right” model.
Second, the WAIC is relatively easy to compute. The computation of the WAIC can be
done at the end of MCMC sampling and does not involve additional MCMC runs. Third,
unlike the deviance information criterion (DIC) (Spiegelhalter et al., 2002, 2014), the WAIC
is fully Bayesian and applicable to both nonsingular and singular models.⁵ Last, estimates
of the WAIC are numerically stable.
⁵ Singular models are statistical models with singularity in their Fisher information matrix. When
statistical models are singular, the large-sample approximation assumed in the Akaike information criterion
(AIC) or the Bayesian information criterion (BIC) does not hold. Mixture models and hidden Markov models
are examples of singular models.

We use the formula of Gelman, Hwang and Vehtari (2014) to compute the WAIC. Let $\Theta$ denote
all the parameters in a model. Then,
\[
\begin{aligned}
\text{log pointwise predictive density} &= \sum_{i=1}^{N} \sum_{t \in T_i} \log \int p(y_{it} \mid \Theta)\, p_{\text{post}}(\Theta)\, d\Theta \\
p_{\text{WAIC}} &= 2 \sum_{i=1}^{N} \sum_{t \in T_i} \left( \log\left(E_{\text{post}}\, p(y_{it} \mid \Theta)\right) - E_{\text{post}}\left(\log p(y_{it} \mid \Theta)\right) \right) \\
\text{WAIC} &= -2\left(\text{log pointwise predictive density} - p_{\text{WAIC}}\right).
\end{aligned}
\]
The expectation and integration in the formula are easily computable using simulated MCMC
samples.
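The computation from MCMC output can be sketched as follows. The code assumes that the pointwise log-likelihoods $\log p(y_{it} \mid \Theta^{(s)})$ have already been stored in an array with one row per posterior draw and one column per observation; the function name and the array layout are assumptions made for illustration.

```python
import numpy as np

def waic(log_lik):
    """WAIC from an (S, n) array of pointwise log-likelihoods over S posterior draws."""
    log_lik = np.asarray(log_lik, dtype=float)
    S = log_lik.shape[0]
    # log of the posterior mean likelihood, computed stably per observation
    mx = log_lik.max(axis=0)
    log_mean_lik = mx + np.log(np.exp(log_lik - mx).sum(axis=0)) - np.log(S)
    lppd = log_mean_lik.sum()
    # 2 * (log E[p] - E[log p]), summed over observations
    p_waic = 2.0 * (log_mean_lik - log_lik.mean(axis=0)).sum()
    return -2.0 * (lppd - p_waic), lppd, p_waic

# Usage with hypothetical log-likelihood draws.
rng = np.random.default_rng(0)
waic_value, lppd_value, p_value = waic(rng.normal(-1.0, 0.3, (200, 10)))
```

Both expectations are replaced by averages over the stored draws, so no additional MCMC runs are needed.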
3 Analysis of Korean Media Slant: News Coverage of the Sewol Ferry Disaster
3.1 Aftermath of the Sewol Ferry Disaster
On April 16, 2014, a South Korean ferry capsized in the Yellow Sea on its way from Incheon
to Jeju island. Of the 476 people on the ferry, including passengers and crew, 295
died; 9 are still missing at the time of writing this paper. Most of the victims were high
school students and teachers on a field trip. The Sewol disaster was considered the worst
peacetime disaster in South Korean history, leaving a tremendous impact on the collective
memory of the public.
Immediately after the disaster, the reactions of South Korean people were homogeneous, without
any political divide. People on the left and right were equally enraged by the factors that caused
the disaster. For example, the ferry company, Cheonghaejin Marine, ignored basic safety
protocols; government regulators blindly approved the condition of the freight before the
departure; the captain and the crew, who were not regular employees of Cheonghaejin Marine
but those of a contractor, did not issue an evacuation order even at the moment they
left the ferry; government rescue agencies and the Ministry of Security and Public
Administration wasted critical moments for early rescue; and the president did not comprehend
the seriousness of the disaster until several hours had passed after the incident. South
Korean people were devastated by pictures and clips taken by the high school students in their last
minutes: they were faithfully following the last order to stay calmly inside, given by crew members who
had already deserted the ferry.
However, with the passage of time, media reports on the disaster started to diverge
along ideological lines. One group of media, usually labeled “progressive” or
“left-leaning,” considered the disaster as epitomizing systematic failure and government
incompetence. They extensively reported demands by the families of the victims, civil
organizations, and the opposition party (Saejungchi). Another group of media, usually considered
“conservative” or “right-leaning,” focused on the investigation of the ferry company and the
crew. One of the most sensitive issues was whether the Congress should establish its own
independent investigation agency and, if so, how much authority and power such an agency should
have. What made this issue so sensitive was whether the president, who did not take any action
for seven hours after the disaster, should be investigated. The president’s party (Saenuri)
harshly denounced the demands for the investigation as pointless and ill-intentioned political
attacks. The two elections (the nationwide gubernatorial election on June 4, 2014 and
the by-election to fill fifteen vacancies in Congress on July 30, 2014) further escalated the
politicization of the issue.
In this section, we estimate ideological leanings of nineteen South Korean newspapers
based on their Sewol-related news reports from the date of the incident, April 16, 2014, to
April 11, 2015. Our questions of interest are (1) whether South Korean newspapers showed
significant ideological leanings in their reports related to the Sewol ferry disaster and (2)
whether these ideological leanings have changed in response to changes in public opinion.
3.2 Data
For the analysis, we downloaded newspaper reports containing the term “Sewol-ho” from
April 16, 2014 to April 11, 2015 on the NAVER News stand (http://news.naver.com).⁶
Nineteen major newspapers in South Korea are the targets of analysis. The total number
of downloaded newspaper reports containing “Sewol-ho” is 122,317. On average, each
newspaper published 6,438 reports on various issues related to the ferry disaster over 360 days.
This is equivalent to 18 reports per day for each newspaper. We also downloaded the two
major political parties’ press releases containing “Sewol-ho” for the same period in order to
identify partisan phrases in the issues related to the disaster. The government party,
Saenuri, published 334 press releases while the opposition party, Saejungchi, published 577
press releases containing the term “Sewol-ho.” We decompose each newspaper report and
press release by 20 morphemes using KoNLPy, a Python package for natural language
processing of the Korean language (Park and Cho, 2014). We discard meaningless morphemes
in the Korean language.
Then, we search for bigram or trigram phrases that were used by only one political party (one-party phrases) regarding the Sewol ferry disaster. This sets our method apart from that of
Gentzkow and Shapiro (2010). Gentzkow and Shapiro (2010) choose partisan phrases by
identifying a list of phrases that predict individual legislators’
party identification. This “feature selection” method is not feasible in our case because we
do not have incident-related speech data at the individual legislator level. However, our
method of choosing partisan phrases has several advantages. First, our method of one-party
phrase selection is highly intuitive. If we select partisan phrases from a pool of phrases used
by both parties, we need to set an arbitrary threshold of asymmetric usage
to classify partisan words. By contrast, bigram or trigram phrases that are
used by only one political party and never used by the other party clearly satisfy the definition
of “partisan phrases.” Second, political parties choose phrases in their press releases very
cautiously and selectively. Unlike individual legislators’ speech, press releases rarely contain
cheap talk or slips of the tongue. They are well planned and carefully calculated
signals to outside audiences.
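The one-party phrase selection described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the paper: the function names (`ngrams`, `phrase_counts`, `one_party_phrases`) and the toy English token lists are assumptions, and the input is assumed to be already tokenized into morphemes.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) found in one token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_counts(documents, sizes=(2, 3)):
    """Count bigram and trigram phrases across a list of tokenized documents."""
    counts = Counter()
    for tokens in documents:
        for n in sizes:
            counts.update(ngrams(tokens, n))
    return counts


def one_party_phrases(own_docs, other_docs):
    """Keep phrases used by one party and never used by the other party."""
    own = phrase_counts(own_docs)
    other = phrase_counts(other_docs)
    return {phrase: count for phrase, count in own.items() if phrase not in other}


# Hypothetical, already-tokenized press releases (English stand-ins for morphemes).
government_docs = [["sewol", "special", "law", "normalize", "congress"]]
opposition_docs = [["sewol", "special", "law", "congressional", "investigation"]]

gov_only = one_party_phrases(government_docs, opposition_docs)
opp_only = one_party_phrases(opposition_docs, government_docs)
```

In this toy input, phrases built only from the shared opening ("sewol special law") are dropped for both parties, while phrases such as ("normalize", "congress") survive as government-only phrases. No usage threshold is needed, which is the point made in the text.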
6 NAVER is the most popular South Korean web search portal, and the NAVER News stand is NAVER’s
online news content service.
After removing some meaningless phrases from a pool of one-party phrases, we have
ninety partisan phrases for the opposition party and thirty for the government party. Figure 4
shows partisan phrases of the opposition party. Some of the main phrases utilized to criticize
the government for mishandling the disaster are: congressional investigation, irresponsible
government, obstinacy, and deregulation. On the other hand, Figure 5 shows partisan phrases
of the government party in defense of the president. The government party accused the
opposition party of inciting divisive conflicts; it emphasized the normalization
of Congress and a bipartisan focus on the economy.
Figure 4: Partisan Words by the Opposition Party ( Saejungchi): Word sizes are adjusted by relative
frequency. Frequent words are located at the center.
Using the chosen partisan phrases, we compute the relative frequency of partisan phrases
for each media outlet. Let $O$ and $G$ be the total counts of partisan phrases for the opposition party
and the government party, respectively. Let $f^{\text{Opposition}}_{i,o,t}$ be the frequency of the $o$th opposition
party phrase within reports by media outlet $i$ at $t$. $r^{\text{Opposition}}_{o}$ is the weight of the $o$th opposition
party phrase, measured by the relative frequency of the $o$th opposition party phrase in the
opposition party’s press releases. We compute the relative frequency as the ratio of each phrase
frequency to the maximum frequency among each party’s phrases. We define quantities for the
government party similarly. Then, the relative frequency of partisan phrases for newspaper
$i$ at $t$ is defined as

$$y_{it} = \sum_{g=1}^{G} f^{\text{Government}}_{i,g,t} \times r^{\text{Government}}_{g} - \sum_{o=1}^{O} f^{\text{Opposition}}_{i,o,t} \times r^{\text{Opposition}}_{o}. \qquad (6)$$
Figure 5: Partisan Words by the Government Party ( Saenuri): Word sizes are adjusted by relative frequency. Frequent words are located at the center.
We subtract frequencies of the opposition party phrases from those of the government
party phrases so that a negative sign of $y_{it}$ indicates the liberal direction and a positive
sign indicates the conservative direction. Since the list of opposition party phrases is longer
than the list of government party phrases (ninety versus thirty), the mean of $y_{it}$ is not necessarily close to zero.
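As a worked illustration of Equation (6), the following Python sketch computes the phrase weights and the score for a single newspaper-day. The phrase counts are invented for illustration only, and the helper names (`relative_weights`, `slant_score`) are assumptions rather than names from the paper.

```python
def relative_weights(press_release_counts):
    """Weight each phrase by its count divided by the party's maximum phrase count."""
    top = max(press_release_counts.values())
    return {phrase: count / top for phrase, count in press_release_counts.items()}


def slant_score(report_counts, gov_weights, opp_weights):
    """Weighted government-phrase total minus weighted opposition-phrase total."""
    gov = sum(report_counts.get(p, 0) * w for p, w in gov_weights.items())
    opp = sum(report_counts.get(p, 0) * w for p, w in opp_weights.items())
    return gov - opp


# Hypothetical phrase counts in each party's press releases.
gov_release_counts = {"normalization of congress": 10, "focus on economy": 5}
opp_release_counts = {"congressional investigation": 20, "irresponsible government": 5}

gov_w = relative_weights(gov_release_counts)  # weights 1.0 and 0.5
opp_w = relative_weights(opp_release_counts)  # weights 1.0 and 0.25

# Hypothetical phrase counts in one newspaper's reports on one day.
report_counts = {
    "normalization of congress": 1,
    "congressional investigation": 2,
    "irresponsible government": 4,
}

# Government part: 1 * 1.0 = 1.0; opposition part: 2 * 1.0 + 4 * 0.25 = 3.0.
y = slant_score(report_counts, gov_w, opp_w)  # 1.0 - 3.0 = -2.0
```

The negative value marks this hypothetical newspaper-day as leaning in the liberal direction under the sign convention stated above.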
Figure 6 shows $y_{it}$ (colored dots) for the nineteen South Korean newspapers over the sample period. The thick solid line in the middle indicates daily averages of $y_{it}$. Positive signs
indicate conservative media reports, and negative signs indicate liberal media reports. Overall, daily averages of the relative partisan phrase frequency are below zero; the dispersion
of $y_{it}$ is quite large.
Figure 7 takes a closer look at $y_{it}$ by displaying data for only two newspapers: Munhwa
Ilbo and Hangyereh Shinmun. Generally speaking, Munhwa Ilbo is considered one of the most
conservative newspapers, while Hangyereh Shinmun is one of the most liberal newspapers in
South Korea. Reflecting this conventional belief, the relative partisan phrase frequencies of
Hangyereh Shinmun are almost always located below those of Munhwa Ilbo. The average
relative partisan phrase frequency of Hangyereh Shinmun is -3.40 (standard deviation =
5.45) and that of Munhwa Ilbo is 0.01 (standard deviation = 2.07).
Figure 6: Relative Partisan Phrase Frequencies of Nineteen Newspaper Companies: Colored dots are relative frequencies of partisan phrases for each newspaper and the red thick line is daily averages of relative
frequencies of partisan phrases.
[Plot "Partisan Word Relative Frequency": x-axis: date, 2014-04-16 to 2015-04-11; y-axis: Relative Frequency.]
Figure 7: Relative Partisan Phrase Frequencies of Two Newspaper Companies: Munhwa Ilbo and Hangyereh Shinmun
[Plot "Partisan Word Relative Frequency": x-axis: date, 2014-04-16 to 2015-04-11; y-axis: Relative Frequency.]
4 Results
4.1 Model Selection Result
Table 1 reports the results of model diagnostics for the nine models. The MLM has the largest
WAIC, which indicates that the static multilevel model has the weakest predictive power
and fits the data poorly. All the dynamic models show smaller WAIC
values, and the HMDMLMs have better predictive power than the DMLM. Among them, the
five-break model (HMDMLM (5)) shows the smallest WAIC; it is analyzed in detail in the
following.
Table 1: Model Diagnostics: A linear dynamic multilevel model with 5 parametric breaks has the smallest
WAIC.
Model     Break Number   WAIC        log pointwise predictive density   pwaic
MLM       0              32965.996   -16442.941                         40.057
DMLM      0              32801.005   -16353.122                         47.381
HMDMLM    1              31358.685   -15642.791                         36.551
HMDMLM    2              31375.490   -15639.193                         48.553
HMDMLM    3              31360.990   -15628.587                         51.907
HMDMLM    4              31372.260   -15624.030                         62.100
HMDMLM    5              31321.365   -15620.046                         40.636
HMDMLM    6              31332.379   -15621.829                         44.360
HMDMLM    7              31363.821   -15618.744                         63.166
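The three numeric columns of Table 1 are linked by a simple identity. Assuming the deviance-scale definition WAIC = -2 (lppd - p_WAIC), as in Gelman, Hwang, and Vehtari (2014), the reported WAIC values can be recomputed from the other two columns and the preferred model picked out. The sketch below only re-does this arithmetic with the numbers copied from Table 1; the variable names are illustrative.

```python
# (model, number of breaks, log pointwise predictive density, p_waic), copied from Table 1.
TABLE_1 = [
    ("MLM", 0, -16442.941, 40.057),
    ("DMLM", 0, -16353.122, 47.381),
    ("HMDMLM", 1, -15642.791, 36.551),
    ("HMDMLM", 2, -15639.193, 48.553),
    ("HMDMLM", 3, -15628.587, 51.907),
    ("HMDMLM", 4, -15624.030, 62.100),
    ("HMDMLM", 5, -15620.046, 40.636),
    ("HMDMLM", 6, -15621.829, 44.360),
    ("HMDMLM", 7, -15618.744, 63.166),
]


def waic(lppd, p_waic):
    """Deviance-scale WAIC: smaller values indicate better expected predictive fit."""
    return -2.0 * (lppd - p_waic)


# Recompute WAIC for every model and select the model with the smallest value.
scores = {(model, breaks): waic(lppd, p) for model, breaks, lppd, p in TABLE_1}
best = min(scores, key=scores.get)
```

The recomputed values agree with the WAIC column up to rounding in the third decimal place. The seven-break model has the highest log pointwise predictive density, but its larger penalty term leaves the five-break model with the smallest WAIC.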
Figure 8: Public Opinion Trend and 5 Breaks: Expected break points are 2014-05-26, 2014-08-01, 2014-09-19, 2014-12-08, 2015-02-18.
[Plot with panels "Public Opinion Trend" and "Break Timing": x-axis: date, 2014-04-16 to 2015-04-11; y-axis: Slant; legend: Trend (no break), Trend (5 breaks).]
Figure 8 shows the estimated public opinion trend and timings of the five breaks. The
first regime from April 16, 2014 to May 26, 2014 is distinguished by strongly negative values
of the relative frequency, reflecting the immediate impact of the Sewol ferry disaster on
public opinion. The second regime, which is also distinguished by negative values of the
relative frequency, lasts from May 26, 2014 to August 1, 2014. Note that these two regimes
are identified by the gubernatorial election of June 4 and the by-election of July 30, 2014.
The two elections were significant political events that intensified partisan accusations over
who was to blame. To the surprise of many, the government party managed to win eight out of
seventeen gubernatorial offices in the nationwide gubernatorial election in June and to sweep
eleven out of fifteen seats in the by-election in July. After the two elections, especially the
July one, public opinion quickly moved in the conservative direction.
The fourth regime started on September 19, 2014 and ended on December 8, 2014.
The beginning of the fourth regime is likely to be marked by the driver beating incident;
on September 17th, a member of the victims’ families and a member of the opposition party
got involved in beating a driver from a driving escort company, intensifying the critical
voice against the victims’ families and the opposition party. In November 2014, the victims’
families agreed to accept the bipartisan proposal for the special Sewol law that included
plans for the congressional investigation and compensation. The fifth regime from December
8, 2014 to February 18, 2015 is very similar to the fourth regime except that the
volatility in the relative frequency is larger in the fifth regime than in the fourth regime.
The relatively stable public opinion during the fourth and fifth regimes is largely due to the
successful bipartisan agreement on the special Sewol law in the National Assembly.
The stable public opinion continued until March 27, 2015, after which public opinion began to move sharply in the liberal direction. On March 27, the government announced
the enforcement ordinance of the special Sewol law under which the government can freely
appoint its own bureaucrats to important positions within the special investigation committee. The opposition party and the victims’ families harshly criticized the ordinance; even
some members of the ruling party considered the ordinance a usurpation of legislative
power. Reflecting these concerns, public opinion turned away from the government and the
ruling party at the end of March 2015. Unfortunately, this sharp drop in public opinion
is not detected as another break due to the lack of information at the end of the sample
period.
4.2 Slant Estimates
Before discussing time-varying slant estimates from the HMDMLM, we first examine time-constant slant estimates from the DMLM for the sake of comparison. Figure 9 shows time-constant measures of media slant from the DMLM. The dots are posterior means, and
horizontal bars are 95% credible intervals. South Korean newspapers are quite selective in
reporting partisan phrases of the two parties. We can roughly classify the nineteen newspapers into four groups: strongly liberal, moderately liberal, moderately conservative, and
strongly conservative. Hangyereh Shinmun and Kyunghang Shinmun distinguish themselves
as the most liberal. Asia Economy, Hanguk Ilbo, Seoul Shinmun, Money Today, Kukmin Ilbo and
Segye Ilbo can be considered moderately liberal. Herald Economy, Donga Ilbo, Hanguk
Economy, Seoul Economy and Joongang Ilbo are moderately conservative. Lastly, Chosun
Ilbo, Mail Economy, Financial News, Digital Times, Junja Shinmun and Munhwa Ilbo are
the most conservative in their choice of partisan phrases regarding the Sewol ferry disaster.
Figure 9: Time-Constant Media Slants After Controlling for Public Opinion
[Dot plot "No Break": x-axis: Slant; y-axis: Media, listed from most conservative to most liberal: Munhwa, Junja, Digital Times, Financial News, Mail Economy, Chosun, Joongang, Seoul Economy, Hanguk Economy, Donga, Herald Economy, Segye, Kukmin, Money Today, Seoul, Hanguk, Asia Economy, Kyunghang, Hangyereh.]
This classification is quite consistent with previous studies on the ideological identification of the South Korean newspapers. For example, Lee and Koh measure political ideologies
of five major newspapers in South Korea (Joongang Ilbo, Chosun Ilbo, Donga Ilbo, Hangyereh Shinmun, Kyunghang Shinmun) by examining “the valences of information given by
various sources appearing in US beef imports articles” (Lee and Koh, 2009, 458). They concluded that Joongang Ilbo, Chosun Ilbo and Donga Ilbo (moderately or strongly conservative
newspapers according to our measure) are more conservative than Hangyereh Shinmun and
Kyunghang Shinmun (strongly liberal newspapers according to our measure) in their selection of news sources. Park (N.d.) also echoed this line of ideological division among the
South Korean newspapers by examining news reports regarding the part-time employment
problem.
The next question is whether the positions of the newspapers have changed as public
opinion shifted in one way or the other during the sample period. Our model diagnostic test
illustrated in Table 1 strongly suggests that the positions of the newspapers have changed as
the linear trend of public opinion shifted. The details of the change are reported in Figure
10 and Figure 11.
Figure 10 shows regime-specific estimates of media slant.7 Two changes are notable
in Figure 10. First, the rank order of the slant estimates changes over time. While
Hangyereh Shinmun and Kyunghang Shinmun are consistently located at the far left corner,
as we saw in Figure 9, Hanguk Ilbo and Seoul Shinmun are very close to these two liberal
newspapers during the first regime. In the third regime, Hanguk Ilbo is located near the far
left corner with Hangyereh Shinmun and Kyunghang Shinmun. These two regimes (Regime
1 and Regime 3) are periods in which liberal newspapers are clearly distinguished from the other
newspapers.
7 Note that Chosun Ilbo did not provide its news reports to the NAVER News stand from April 16 to
August 29, 2014. Thus, they are missing in our data. Despite the missingness, the HMDMLM provides
slant estimates for Chosun Ilbo from April 16 to August 29, 2014 by borrowing information from other
newspapers. Figure 10 reports regime-specific slant estimates; the slant estimates for Chosun Ilbo during the
missing periods have large variances due to the lack of direct information.
Figure 10: Regime Specific Media Slants
[Six dot-plot panels, Regime 1 through Regime 6: x-axis: Slant (−5.0 to 5.0); y-axis: Media, ordered by estimated slant within each regime.]
The second notable finding from Figure 10 is the change in the variance of media slants
over time, which is drawn as line graphs in Figure 11 for easier interpretation. Regime 1
and Regime 3 stand out in terms of large variances in media slants. The four ideological
groupings of the newspapers (strongly liberal, moderately liberal, moderately conservative,
and strongly conservative) clearly emerged right after the ferry disaster. The ideological
groups became weaker during June and July 2014. Then, they reemerged after the July 30
by-election. A multitude of factors likely contributed to the turn of public
opinion in the third regime: the surprising victory of the ruling party in the by-election, the
discovery of the death of the ferry company owner, Byung Eon You, and the uncompromising
attitude of the victims’ families.
5 Conclusion
Public opinion changes over time. Mass media try to maximize their market share by closely
reflecting shifts in public opinion. At the same time, journalists, editors, and other participants in news production try to influence public opinion with their own views and agendas.
The constant interaction of public opinion and mass media is one of the essential characteristics
of modern democracy.
Figure 11: Changes in Media Slants Over Time: Colors are scaled based on the rank-order of constant
media slant. Bright colors indicate conservative slants and dark colors indicate liberal slants. Dim vertical
lines in the background indicate timings of break in public opinion.
[Line plot "Slant Changes Over Time": x-axis: date, 2014-04-16 to 2015-04-11; y-axis: Slant; one labeled line per newspaper.]
Estimation of media slant is an important empirical endeavor for understanding the strategic and political choices of mass media in their markets. These choices are fundamentally shaped
by public opinion. However, the existing methods of media slant estimation have focused on how to
map observed text data onto a low dimensional vector without paying enough attention to
the fact that news reports are time-ordered data and that media slant may change in response to exogenous shocks or public opinion shifts. This paper has sought to fill this gap
by presenting a method for dynamic estimation of media slant.
We build our method upon a simple multilevel model that parameterizes media slant
as group-level varying intercepts when we observe frequencies of partisan phrases at the
media-time level. Then, we let the global mean of the observed partisan phrase frequency
vary smoothly over time using the dynamic linear model from the Bayesian time series literature. In
the model, the smoothly moving average partisan phrase frequency captures public opinion
changes. In order to check the possibility of changes in media slant, we allow group-level
varying intercepts to follow hidden Markov transitions identified by shifts in the linear trend
of public opinion. We suggest the WAIC as a measure of model diagnostics to choose the
most reasonable model out of a pool of competing models, either static or dynamic.
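To make the structure summarized above concrete, the following Python sketch simulates data with the same three ingredients: a smoothly moving common trend, outlet-specific intercepts, and intercepts that switch between regimes. It is a simplified data-generating illustration under stated assumptions, not the paper's estimation procedure: the trend is a plain random walk, the break points are fixed in advance rather than inferred through hidden Markov transitions, and the function name `simulate` and all parameter values are hypothetical.

```python
import random


def simulate(n_media=3, n_days=60, breaks=(20, 40), seed=0):
    """Simulate y[i][t] = trend[t] + slant[i][regime[t]] + noise."""
    rng = random.Random(seed)
    n_regimes = len(breaks) + 1

    # Regime-specific slant (varying intercept) for each media outlet.
    slant = [[rng.gauss(0.0, 1.0) for _ in range(n_regimes)] for _ in range(n_media)]

    # Common public opinion trend: a random walk standing in for the dynamic linear model.
    trend, level = [], 0.0
    for _ in range(n_days):
        level += rng.gauss(0.0, 0.3)
        trend.append(level)

    # Regime index per day: the number of break points already passed.
    regime = [sum(t >= b for b in breaks) for t in range(n_days)]

    # Observed partisan phrase frequency for each outlet and day.
    y = [
        [trend[t] + slant[i][regime[t]] + rng.gauss(0.0, 0.5) for t in range(n_days)]
        for i in range(n_media)
    ]
    return y, trend, regime, slant


y, trend, regime, slant = simulate()
```

In such simulated data, a static model with one intercept per outlet would mix the movement of `trend` with the regime changes in `slant`, which is the confounding the dynamic model is designed to separate.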
The paper has illustrated the application of this dynamic method by analyzing the ideological positions of nineteen South Korean newspapers in their news reports related to
the Sewol disaster for the period from April 16, 2014 to April 11, 2015. We have uncovered
dramatic changes in public opinion and ideological positions of the newspapers. Reflecting
the fluctuations in public opinion, the variance of media slant has changed over time. We
found that the by-election of July 30 played an important role in shifting public opinion
in favor of the ruling party and the government. Moreover, some newspapers transformed
their ideological positions greatly in the middle of the sample period. Static models of media
slant would have failed not only to distinguish public opinion changes from changes in media
slant, but also to detect short-term changes in ideological positions of mass media.
References
Agirdas, Cagdas. 2015. “What Drives Media Bias? New Evidence From Recent Newspaper
Closures.” Journal of Media Economics 28:123–141.
Baron, David P. 2006. “Persistent Media Bias.” Journal of Public Economics 90:1–36.
Bernhardt, Dan, Stefan Krasa, and Mattias Polborn. 2008. “Political polarization and the
electoral effects of media bias.” Journal of Public Economics 92:1092 – 1104.
Carter, C. K., and R. Kohn. 1994. “On Gibbs Sampling for State Space Models.” Biometrika
81:541–553.
Chib, Siddhartha. 1995. “Marginal Likelihood From the Gibbs Output.” Journal of the
American Statistical Association 90 (December):1313–1321.
Chib, Siddhartha. 1996. “Calculating Posterior Distributions and Modal Estimates in
Markov Mixture Models.” Journal of Econometrics 75:79–98.
Chib, Siddhartha. 1998. “Estimation and Comparison of Multiple Change-Point Models.”
Journal of Econometrics 86 (June):221–241.
Chib, Siddhartha, and Bradley P. Carlin. 1999. “On MCMC Sampling in Hierarchical Longitudinal Models.” Statistics and Computing 9:17–26.
Frühwirth-Schnatter, Sylvia. 1994. “Data Augmentation and Dynamic Linear Models.” Journal of Time Series Analysis 15:183–202.
Frühwirth-Schnatter, Sylvia. 2006. Finite Mixture and Markov Switching Models. Heidelberg:
Springer Verlag.
Gans, Joshua S., and Andrew Leigh. 2012. “How Partisan is the Press? Multiple Measures
of Media Slant.” The Economic Record 88:127–147.
Gelfand, Alan E., and Adrian F. M. Smith. 1990. “Sampling-Based Approaches to Calculating Marginal Densities.” Journal of the American Statistical Association 85 (June):398–409.
Gelman, Andrew, Jessica Hwang, and Aki Vehtari. 2014. “Understanding predictive information criteria for Bayesian models.” Statistics and Computing 24:997–1016.
Gelman, Andrew, John B. Carlin, Hal S. Stern, and Donald B. Rubin. 2004. Bayesian Data
Analysis. 2nd ed. New York: Chapman and Hall.
Gentzkow, Matthew, and Jesse M. Shapiro. 2010. “What Drives Media Slant? Evidence
From U.S. Daily Newspapers.” Econometrica 78:35–71.
Gerber, Alan S., Dean Karlan, and Daniel Bergan. 2009. “Does the Media Matter? A
Field Experiment Measuring the Effect of Newspapers on Voting Behavior and Political
Opinions.” American Economic Journal: Applied Economics 1:35–52.
Green, Peter J. 1995. “Reversible Jump Markov Chain Monte Carlo Computation and
Bayesian Model Determination.” Biometrika 82:711–732.
Groseclose, Tim, and Jeffrey Milyo. 2005. “A Measure of Media Bias.” Quarterly Journal of
Economics CXX:1191–1237.
Jackman, Simon. 2005. “Pooling the Polls Over an Election Campaign.” Australian Journal
of Political Science 40:499 – 517.
Kass, Robert E., and Adrian E. Raftery. 1995. “Bayes Factors.” Journal of the American
Statistical Association 90:773–795.
Kim, In Song, John Londregan, and Marc Ratkovic. 2015. “Voting, Speechmaking, and the
Dimensions of Conflict in the US Senate.” Presented at the 2015 Asian Political Methodology Meeting.
Larcinese, Valentino, Riccardo Puglisi, and James M. Snyder, Jr. 2007. “Partisan Bias in
Economic News: Evidence on the Agenda-Setting Behavior of U.S. Newspapers.” Journal
of Public Economics 95:1178–1189.
Lee, Gunho, and Heungseok Koh. 2009. “Korean Newspaper’s Political Orientation Featuring
in US Beef Imports Articles.” Korean Journal of Journalism & Communication Studies
53:347–369.
Gentzkow, Matthew, and Jesse M. Shapiro. 2006. “Media Bias and Reputation.” Journal of
Political Economy 114:280–316.
Park, Eunjeong, and Sungzoon Cho. 2014. KoNLPy: Korean natural language processing in
Python. In Proceedings of the 26th Annual Conference on Human & Cognitive Language
Technology.
Park, Jae-Young. N.d. “Ideological Dimension of South Korean News Media (Hanguk Eolonsadulye Jungpasung Jihyung).” Seminar on the Journalism Practice Committee.
Raftery, Adrian E. 1995. “Bayesian Model Selection in Social Research.” Sociological Methodology 25:111–163.
Spiegelhalter, David J., Nicola G. Best, Bradley P. Carlin, and Angelika van der Linde. 2002.
“Bayesian measures of model complexity and fit.” Journal of the Royal Statistical Society:
Series B (Statistical Methodology) 64:583–639.
Spiegelhalter, David J., Nicola G. Best, Bradley P. Carlin, and Angelika van der Linde. 2014.
“The deviance information criterion: 12 years on.” Journal of the Royal Statistical Society:
Series B (Statistical Methodology) 76:485–493.
Stone, Daniel F. 2011. “Ideological media bias.” Journal of Economic Behavior & Organization
78:256–271.
Strömberg, David. 2015. “Media and Politics.” Annual Review of Economics 7:173–205.
Taddy, Matt. 2013. “Multinomial Inverse Regression for Text Analysis.” Journal of the
American Statistical Association 108:755–770.
Vehtari, Aki, and Janne Ojanen. 2012. “A survey of Bayesian predictive methods for model
assessment, selection and comparison.” Statistics Surveys 6:142–228.
Vehtari, Aki, and Jouko Lampinen. 2002. “Bayesian Model Assessment and Comparison
Using Cross-Validation Predictive Densities.” Neural Computation 14:2439–2468.
Watanabe, Sumio. 2010. “Asymptotic equivalence of Bayes cross validation and widely
applicable information criterion in singular learning theory.” Journal of Machine Learning
Research 11:3571–3594.
Zaller, John. 1999. A Theory of Media Politics. Unpublished.