Dynamic Estimation of Media Slant
Jong Hee Park
Department of Political Science and International Relations
Seoul National University
http://jhp.snu.ac.kr
[email protected]
January 4, 2016

Prepared for the Asian Political Methodology Meeting at Tsinghua University in January 2016.

Abstract

Existing methods for media slant estimation focus on how to map observed text data to a low-dimensional vector of fixed quantities. In doing so, these methods ignore the sequence of news and fail to consider the possibility of changes in media slant. In this paper, we highlight the chronological characteristic of news reports and develop Bayesian statistical methods that allow a joint estimation of time-varying public opinion trends and media slant subject to multiple discrete changes. Automated text analysis methods are used to extract the relative frequency of partisan words from news reports. We apply this method to news reports covering the Sewol ferry disaster by nineteen daily newspapers of South Korea.

Keywords: media slant, dynamic linear model, hidden Markov model, automated text analysis, South Korea

1 Introduction

In a modern democratic society, mass media reflect and shape public opinion. As strategic actors and professionals pursuing independence, journalists and editors wish to be more than “a conveyor belt” for what has happened and what politicians have said (Zaller, 1999, 1-2). Mass media play a critical role in framing, disseminating, and constructing public issues. On the other hand, mass media cannot consistently report an “unrepresentative sample” of public opinion because of market pressure. Unresponsive media, if disfavored by consumers, will quickly lose market share. As the number and types of media outlets increase, the behavior of mass media has become increasingly constrained by market competition.
Considering the constant and complex interactions between mass media and public opinion, it is essential to explicate the public opinion-media nexus to properly understand the role of mass media in a democracy. Among the many important roles of mass media in a democracy, “media slant”¹ (or “media bias”) has recently received increasing theoretical and empirical attention from social scientists (Groseclose and Milyo, 2005; Baron, 2006; Matthew Gentzkow, 2006; Bernhardt, Krasa and Polborn, 2008; Gerber and Bergan, 2009; Gentzkow and Shapiro, 2010; Larcinese, Puglisi and James M. Snyder, 2007; Stone, 2011; Gans and Leigh, 2012; Strömberg, 2015; Agirdas, 2015).

This paper contributes to the literature by developing a new method for dynamic estimation of media slant. In our dynamic modeling framework, the news content generating process is assumed to be a stochastic process in which public opinion and mass media constantly interact with each other. We consider media slant, or any quantity we derive from time-ordered news contents, as time-indexed and subject to temporal changes.

Characterizing the news generating process as stochastic marks an important departure from the existing literature. Most existing methods for media slant estimation focus on how to map observed text data to a low-dimensional vector of fixed quantities, such as sentiments, slants, or ideologies (Groseclose and Milyo, 2005; Gentzkow and Shapiro, 2010; Gans and Leigh, 2012; Taddy, 2013). In doing so, these methods treat each document as “an exchangeable collection of phrase tokens” (Taddy, 2013, 755) and fail to consider the temporal order in the news content generating process. For example, Gentzkow and Shapiro (2010) treat their key input measures, phrase frequencies in newspapers and phrase frequencies in the 2005 Congressional Record, as samples randomly drawn from a population of all phrases.
Beyond the mere fact that news contents are time-ordered, there are other reasons to consider the possibility of changes in media slant. Epochal changes in a society fundamentally reshape both the positions of mass media and the direction of public opinion. If we fail to account for those massive and long-lasting changes in the society, our measurement of media slant will be biased in an unknown direction and inconsistent in a statistical sense. Moreover, even though politicians’ use of partisan phrases shows a consistent pattern, movements of public opinion, and media reports as reflections of public opinion at each time, tend to show nonstationary patterns even over a short period of time. Public opinion movements have periods of a persistent trend and periods of irregular changes. For example, Gentzkow and Shapiro (2010) assume that each media outlet chooses “estate tax” over “death tax,” or vice versa, in a consistent manner throughout the entire year of 2005. The assumption of time homogeneity needs to be tested because observed phrase frequencies are sensitive to public opinion changes or to the effects of events such as the enactment of the Death Tax Repeal Permanency Act of 2005.²

¹ We define media slant as relative ideological leanings of media outlets in news coverage.

In this paper, we propose a new method to jointly estimate public opinion trends and media slant that aims to properly account for the time-ordered nature of the news content generating process and public opinion movements. The first step is to extract a relative frequency of partisan phrases from media reports using automated text analysis methods. This can be done in various ways depending on the availability of other information that helps us identify ideological dimensions of news content data.
Then, we decompose the time series cross-sectional data of the relative partisan word frequency into time-varying movements of public opinion and hidden Markov transitions of media slant, using a dynamic linear multilevel model with parametric breaks in the second-level parameter. Instead of imposing dynamics and Markov transitions on the data, we test the modeling assumption of time-varying movements of public opinion and hidden Markov transitions of media slant using two simpler models as benchmarks: the multilevel model of media slant and the dynamic linear multilevel model. For model diagnostics and changepoint detection, we utilize the Widely Applicable Information Criterion (WAIC), an intuitive and effective Bayesian model diagnostic tool (Watanabe, 2010).

2 The Proposed Method

Our method for a dynamic estimation of media slant consists of three steps. First, we identify partisan phrases and quantify their relative frequency in media reports. Second, the word selection on an issue by media is modeled as a function of public opinion toward the issue at each time and the political ideology of the media. We discuss different types of dynamic models for the slant estimation. The last step is to run model diagnostics on the competing dynamic models in order to avoid overfitting.

2.1 Relative Frequency of Partisan Phrases

Many different methods have been proposed to identify partisan phrases from news contents, mostly in the U.S. context. For example, Groseclose and Milyo (2005) consider the frequency of citations to think-tank materials by legislators and journalists to indicate a partisan leaning of the medium. Gentzkow and Shapiro (2010) use a hand-coded list of bigram or trigram phrases that are used more frequently by members of one party than by members of the other; the relative frequency of partisan phrases is then regressed on a congressperson’s ideology to obtain a partisan score for the newspaper.
Generally speaking, however, the definition of partisan phrases varies by time and space. It is hard to generalize the identification method of partisan phrases in the literature to non-US cases because congressional voting and speech data are not always readily available. Also, in some countries, like South Korea and the United Kingdom, party-line voting is quite dominant. Therefore, there is a need to develop a creative way to identify ideological dimensions of news content data using other types of information.³ In the case of South Korea, discussed in Section 3, political parties issue press releases almost every day. Such press releases can be used to uncover ideological dimensions of the news content data.

² The bill was introduced by Rep. Hulshof, Kenny C. (R-MO) on Feb. 17, 2005 (https://www.congress.gov/bill/109th-congress/house-bill/8).

2.2 Models

Once we extract the relative frequency of partisan phrases, the next step is to develop an empirical model of the news content generating process in which media slant is one of the key random variables. In this section, we discuss three modeling options in the order of model complexity: (1) the multilevel model (MLM), (2) the linear dynamic multilevel model (DMLM), and (3) the hidden Markov linear dynamic multilevel model (HMDMLM). The discussion of how to choose the “right” model using a Bayesian model diagnostic tool follows.

2.2.1 Model 1: Multilevel Model (MLM) of Media Slant

Using the relative frequency of partisan phrases for newspaper $i$ at $t$, we model partisan biases of each newspaper as random samples from an unknown normal distribution:

\[
\begin{aligned}
y_{it} &= \alpha_i + \beta + \sigma_y \epsilon_{it}, \qquad \epsilon_{it} \sim N(0, 1) \\
\alpha_i &\sim N(\mu_\alpha, \sigma_\alpha^2) \\
\beta &\sim N(b_0, B_0) \\
\sigma_y &\sim IG(c_0, d_0) \\
\sigma_\alpha &\sim IG(e_0, f_0)
\end{aligned}
\tag{1}
\]

This is a baseline multilevel model upon which we will build additional complexity.
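As an illustration of the data generating process in Equation (1), the following is a minimal simulation sketch, not the estimation code used in the paper. The function name `simulate_mlm` and all default parameter values are hypothetical choices made for this example; the hyperpriors on $\beta$, $\sigma_y$, and $\sigma_\alpha$ are skipped, and those quantities are simply fixed.

```python
import numpy as np


def simulate_mlm(n_papers=19, n_days=30, beta=-1.0, mu_alpha=0.0,
                 sigma_alpha=1.0, sigma_y=0.5, seed=0):
    """Draw y_it = alpha_i + beta + sigma_y * eps_it, as in Equation (1).

    All default values are illustrative assumptions, not estimates.
    """
    rng = np.random.default_rng(seed)
    # Newspaper-level slants: alpha_i ~ N(mu_alpha, sigma_alpha^2)
    alpha = rng.normal(mu_alpha, sigma_alpha, size=n_papers)
    # Observation-level noise: eps_it ~ N(0, 1)
    eps = rng.standard_normal(size=(n_papers, n_days))
    # Rows index newspapers, columns index time points
    y = alpha[:, None] + beta + sigma_y * eps
    return y, alpha


y, alpha = simulate_mlm()
```

Each row of `y` is one newspaper's series of relative partisan phrase frequencies around its own level `alpha[i] + beta`.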
Although this model is naive in that it ignores the time-varying nature of the news content generation, there are three major advantages of modeling media slant within a multilevel model. First, the multilevel setup provides consistent estimates of group-level parameters from time series cross-sectional data. Media slant is information at a group level that needs to be gleaned across individual observations. When a group-level parameter is a quantity of our substantive interest, we cannot use the fixed-effects method, which removes group-level constants for the consistent estimation of individual-level slopes. Second, the hierarchical structure of the multilevel model allows us to combine complex dynamic models in the context of time series cross-sectional data analysis. As will be discussed shortly, we can add various time series models, such as an autoregressive model, a linear dynamic model, and a hidden Markov model, to the multilevel model. Last, and most importantly, the multilevel model summarizes variations across heterogeneous groups in a way that balances within-group variations with inter-group differences. This characteristic, known as “partial pooling” in the multilevel literature, helps us avoid overfitting the data. Overfitting usually happens in the estimation of media slant when we estimate the media slant of newspaper $i$ using only data observed from $i$.

³ Alternatively, one can develop a sophisticated theoretical model that explains the news content generating process and estimate theoretical parameters from the news content data. Recently, a group of scholars has been working on deriving ideological measures from a theoretical model of (slanted) word choice in text documents (Baron, 2006; Bernhardt, Krasa and Polborn, 2008; Stone, 2011; Kim, Londregan and Ratkovic, 2015).
In that case, $i$’s slant estimate captures only the information realized in $i$’s own data without learning from other similarly generated data; thus, the resulting estimates tend to be overly confident about the out-of-sample predictive accuracy of $i$’s slant estimate. The posterior samples of slant estimates ($\alpha_i$) can be decomposed into the sum of two parts:

\[
\alpha_i = \underbrace{\frac{\frac{T_i}{\sigma_y^2}}{\frac{T_i}{\sigma_y^2} + \frac{1}{\sigma_\alpha^2}} \overbrace{(\bar{y}_i - \beta)}^{\text{No pooling}} + \frac{\frac{1}{\sigma_\alpha^2}}{\frac{T_i}{\sigma_y^2} + \frac{1}{\sigma_\alpha^2}} \overbrace{\mu_\alpha}^{\text{Complete pooling}}}_{\text{Partial pooling}}
\tag{2}
\]

where $T_i$ is the number of newspaper $i$’s observations and $\bar{y}_i$ is the group mean of newspaper $i$’s relative partisan phrase frequency. Here, $i$’s slant is a weighted sum of $i$’s average distance from the global mean ($\bar{y}_i - \beta$) and the mean of all media slants ($\mu_\alpha$).

The Markov chain Monte Carlo (MCMC) sampling algorithms of the MLM are well known. Several Bayesian statistical packages in R provide functions to fit the MLM, and other Bayesian software such as the BUGS (Bayesian inference Using Gibbs Sampling) project, JAGS (Just Another Gibbs Sampler), and Stan (http://mc-stan.org) is also available. However, it is not straightforward to add dynamic components to these packages. Thus, we proceed to discuss the MCMC algorithms. For efficient sampling of group-level parameters, we decompose the posterior density into three blocks as suggested by Algorithm 2 of Chib and Carlin (1999):

\[
p(\beta, \sigma_y^2, \sigma_\alpha^2 \mid y) = \int p(\beta, \alpha_i \mid y)\, p(\sigma_\alpha^2 \mid y, \beta, \alpha_i)\, p(\sigma_y^2 \mid y, \beta, \alpha_i, \sigma_\alpha^2)\, d\alpha_i .
\]

The full conditional distribution $p(\beta, \alpha_i \mid y)$ is decomposed into $p(\beta \mid y)\, p(\alpha_i \mid y, \beta)$ so that the sampling of $\beta$ is not directly dependent upon $\alpha_i$. Using conjugate priors, all the sampling steps can be done by the Gibbs sampler.

2.2.2 Model 2: Linear Dynamic Multilevel Model (DMLM) of Media Slant

Now, we add one layer of dynamics onto the MLM. The global mean of the relative partisan phrase frequency follows a random walk where the current global mean is determined by the previous global mean and a random shock.
The size of a random shock ($\sigma_\beta$) is estimated from the data. The resulting model takes the following form:

\[
\begin{aligned}
y_{it} &= \alpha_i + \beta_t + \sigma_y \epsilon_{it}, \qquad \epsilon_{it} \sim N(0, 1) \\
\alpha_i &\sim N(\mu_\alpha, \sigma_\alpha^2) \\
\beta_t &= \beta_{t-1} + \sigma_\beta \epsilon_t \quad \text{for } t > 1 \\
\beta_1 &\sim N(b_0, B_0) \\
\sigma_y &\sim IG(c_0, d_0) \\
\sigma_\alpha &\sim IG(e_0, f_0) \\
\sigma_\beta &\sim IG(g_0, h_0).
\end{aligned}
\tag{3}
\]

The above model was first introduced by Jackman (2005), who used it to estimate house effects in time series cross-sectional polling data. A natural question arising from this model setup is what the time-varying global mean of the relative partisan phrase frequency ($\beta_t$) substantively means. Technically speaking, $\beta_t$ is the average level of partisan slant reflected in news reports measured at $t$. Because $\beta_t$ follows a random walk, the current $\beta_t$ is highly predictable from $\beta_{t-1}$, and the earlier values $(\beta_{t-2}, \ldots, \beta_1)$ add no further information once $\beta_{t-1}$ is known. We interpret $\beta \equiv (\beta_1, \ldots, \beta_T)$ as movements of public opinion, showing the average partisan slant among the public as observed by the mass media in the sample. Public opinion changes smoothly over time, mostly reflecting what people thought right before. Yet, it sometimes changes dramatically in one way or the other in response to external shocks. Such irregular movements are modeled as a random walk process in our model, and the size of the irregularity is estimated by the transition variance ($\sigma_\beta^2$). Interpreting $\beta$ as public opinion movements makes it easy to substantively understand our estimates of media slant.
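To make the interpretation of $\beta$ as a moving anchor concrete, here is a minimal simulation sketch of the data generating process in Equation (3). It is only an illustration under stated assumptions: the function name `simulate_dmlm`, the parameter values, and the choice to fix $\beta_1$ at a constant rather than drawing it from $N(b_0, B_0)$ are all hypothetical. With the true $\beta_t$ known, averaging each newspaper's distance from $\beta_t$ approximately recovers its slant.

```python
import numpy as np


def simulate_dmlm(alpha, n_days=200, beta1=0.0, sigma_beta=0.3,
                  sigma_y=0.5, seed=1):
    """Simulate y_it = alpha_i + beta_t + sigma_y * eps_it with a
    random-walk beta_t, as in Equation (3). beta_1 is fixed at
    `beta1` here for simplicity (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    # Random-walk shocks; no shock is applied at t = 1
    shocks = sigma_beta * rng.standard_normal(n_days)
    shocks[0] = 0.0
    beta = beta1 + np.cumsum(shocks)
    noise = sigma_y * rng.standard_normal((len(alpha), n_days))
    y = alpha[:, None] + beta[None, :] + noise
    return y, beta


alpha = np.array([-1.0, 0.0, 1.0])   # hypothetical slants of three outlets
y, beta = simulate_dmlm(alpha)

# Average distance of each outlet from the (here known) public opinion path
distance = (y - beta[None, :]).mean(axis=1)
```

In the actual model $\beta$ is latent and must be sampled, so this sketch only shows why measuring distance from a time-varying $\beta_t$ is a sensible way to read $\alpha_i$.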
The posterior samples of slant estimates ($\alpha_i$) from the DMLM consist of the sum of two parts:

\[
\alpha_i = \frac{\frac{T_i}{\sigma_y^2 + \sigma_\beta^2}}{\frac{T_i}{\sigma_y^2 + \sigma_\beta^2} + \frac{1}{\sigma_\alpha^2}} \overbrace{\sum_{t \in \mathcal{T}_i} (y_{i,t} - z_{it}\beta)}^{i\text{'s average distance from public opinion}} + \frac{\frac{1}{\sigma_\alpha^2}}{\frac{T_i}{\sigma_y^2 + \sigma_\beta^2} + \frac{1}{\sigma_\alpha^2}} \mu_\alpha
\tag{4}
\]

where $\mathcal{T}_i$ is the collection of time indices realized in $i$’s data, $T_i$ is the number of those indices, and $z_{it}$ is the $t$th row of $z_i$, which is a $T_i \times T$ matrix that identifies the specific time indices realized in $i$’s data.⁴ As noted by the brace in Equation (4), the DMLM constructs newspaper $i$’s slant as a weighted sum of $i$’s average distance from public opinion ($\sum_{t \in \mathcal{T}_i} (y_{i,t} - z_{it}\beta)$) and the mean of all media slants ($\mu_\alpha$). This formula allows us to consider the time-varying nature of public opinion and anchor our slant measure accordingly, which makes more sense than anchoring media slant to the constant global mean ($\beta$) in Equation (1).

⁴ For example, if newspaper $i$ has three observations at $t = 1, 3, 5$ and the maximum range of time series in the data set is $t = (1, 2, 3, 4, 5)$, then

\[
z_i = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \qquad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5 \end{pmatrix}.
\]

[Figure 1 plot: “Measuring Slant from the Public Opinion”. Vertical axis: Relative Frequency; horizontal axis: dates from 2014-06-05 to 2014-06-10; braces labeled “Distance from the public opinion”.]

Figure 1: Media Slant as Average Distance from Public Opinion: Data from 19 South Korean Newspapers, April 2014 - April 2015. Red dots are relative partisan phrase frequencies of Munhwa Ilbo and the blue line is estimated public opinion from 19 newspapers.

Figure 1 visualizes the idea of estimating media slant as an average distance from public opinion. The data set used here is the relative partisan phrase frequency data from nineteen South Korean newspapers covering the Sewol ferry disaster.
The blue line is estimated public opinion ($\beta_t$) and the red dots are relative partisan phrase frequencies of a newspaper called Munhwa Ilbo. The braces between the red dots and the blue line indicate the partisan slants of Munhwa Ilbo at each time point. It is clear that Munhwa Ilbo’s partisan slants are located above public opinion except on June 9, 2014, indicating the possibility of a conservative bias on the part of Munhwa Ilbo.

The MCMC sampling algorithm of the DMLM involves two additional steps for $\beta$ and $\sigma_\beta^2$. As the movement of $\beta$ is modeled as a linear dynamic model, all the MCMC sampling steps can be done by the Gibbs sampler. The posterior density can be decomposed as follows:

\[
p(\beta, \sigma_\beta^2, \sigma_y^2, \sigma_\alpha^2 \mid y) = \int \overbrace{p(\beta \mid y)}^{\text{forward filtering backward sampling}} p(\sigma_\beta^2 \mid y, \beta)\, p(\alpha_i \mid y, \beta, \sigma_\beta^2)\, p(\sigma_\alpha^2 \mid y, \beta, \sigma_\beta^2, \alpha_i)\, p(\sigma_y^2 \mid y, \beta, \sigma_\beta^2, \sigma_\alpha^2, \alpha_i)\, d\alpha_i .
\]

The sampling of $\beta$ is done by the forward filtering backward sampling (FFBS) algorithm developed by Carter and Kohn (1994) and Frühwirth-Schnatter (1994). The sampling of $\sigma_\beta^2$ is from the following inverse gamma distribution:

\[
\sigma_\beta^2 \mid y, \beta, \sigma_\alpha^2, \alpha_i \sim IG\left( \frac{g_0 + T}{2},\; \frac{h_0 + \sum_{t=1}^{T} (\beta_t - \beta_{t-1})^2}{2} \right).
\]

2.2.3 Model 3: Hidden Markov Linear Dynamic Multilevel Model (HMDMLM) of Media Slant

The DMLM makes an important contribution to the literature by allowing the joint estimation of public opinion and constant parameters at the group level. However, it is logically incoherent and theoretically unsatisfying to assume that only public opinion changes over time while media slant remains constant. However strong the partisan predispositions of mass media might be, mass media cannot ignore massive and dramatic changes in public opinion. Important social events, such as disasters, international disputes, economic crises, political scandals, critical elections, and social movements, affect how journalists think and write about their society. Moreover, tides of public opinion reshuffle the popularity of mass media in terms of their responsiveness.
As a result, the assumption of constant media slant, however convenient it may be in terms of model estimation, must be tested with data rather than taken for granted.

In order to let media slant change during the sample period, two assumptions are made. First, changes of media slant are discrete rather than continuous. That is, media slant changes take place not at every time point, but only once in a while. This is a reasonable assumption that reduces the complexity of the model and the computational cost. Ideological positions of mass media are usually consolidated over a long period of time by ownership, government regulations, media market trends, recruitment, and promotion. They may change in response to major changes in public opinion, but not always. Second, changes in media slant are caused mainly by shifts in public opinion or in the positions of other media outlets. Changes in media slant can be simply passive reflections of changes in public opinion. Or, they can be a result of strategic repositioning by mass media in response to changes in the positions of other media outlets.

To highlight the difference between the DMLM and the HMDMLM, Figure 2 and Figure 3 compare the modeling structure of the two dynamic models. The arrows show the direction of statistical dependence in each model. Observed partisan frequencies of each media outlet ($y_{it}$) are generated by two stochastic variables: $\beta_t$ and $\alpha_{i,s_t}$. Values of $\alpha_{i,s_t}$ are further dependent upon latent states ($s_t$), the transition of which is governed by $\beta_t$. In plain words, public opinion ($\beta_t$) is the main driving force of the news content generation process and of changes in the ideological positions of mass media. We extract information about the transition of hidden states by assuming that $\beta_t$ follows a local linear trend until unknown break points. That is, the sampling of latent states is not dependent upon the response data in our model:

\[
p(s \mid y, \beta, \sigma_\beta^2, \sigma_\alpha^2, \alpha_i, \sigma_y^2) = p(s \mid \beta, \sigma_\beta^2, \sigma_\alpha^2, \alpha_i, \sigma_y^2).
\]
[Figure 2: DMLM. Dependence graph over $\beta_{t-1}, \beta_t, \beta_{t+1}$; $y_{i,t-1}, y_{i,t}, y_{i,t+1}$; and $\alpha_i$.]

[Figure 3: HMDMLM. Dependence graph over $\beta_{t-1}, \beta_t, \beta_{t+1}$; $y_{i,t-1}, y_{i,t}, y_{i,t+1}$; $\alpha_{i,(s_{t-1})}, \alpha_{i,(s_t)}, \alpha_{i,(s_{t+1})}$; and $s_{t-1}, s_t, s_{t+1}$.]

Based on these two assumptions, we introduce the HMDMLM to account for dramatic changes in media slant and time-varying changes in public opinion. Compared with the DMLM, two new parameters are added to the HMDMLM. The first is the vector of latent state variables ($s \equiv (s_1, \ldots, s_T)$), and the second is a transition matrix ($P$) which summarizes the movement of the latent state variables. We adopt Chib (1998)’s non-ergodic (i.e., non-switching and forward-moving Markov chain) design, which efficiently identifies multiple changepoints across various types of response data. The resulting model can be written as follows:

\[
\begin{aligned}
y_{it} &= \alpha_{i,s_t} + \beta_t + \sigma_y \epsilon_{it}, \qquad \epsilon_{it} \sim N(0, 1) \\
\beta_t &= \beta_{t-1} + \sigma_\beta \epsilon_t \quad \text{for } t > 1 \\
s_t \mid s_{t-1} &\sim \text{Markov}(P, \pi_0) \\
\alpha_{i,s_t} &\sim
\begin{cases}
N(\mu_{\alpha,1}, \sigma_{\alpha,1}^2) & \text{if } \tau_0 < t \le \tau_1 \\
\quad \vdots \\
N(\mu_{\alpha,M+1}, \sigma_{\alpha,M+1}^2) & \text{if } \tau_M < t \le \tau_{M+1}
\end{cases} \\
\beta_1 &\sim N(b_0, B_0) \\
\sigma_y &\sim IG(c_0, d_0) \\
\sigma_\alpha &\sim IG(e_0, f_0) \\
\sigma_\beta &\sim IG(g_0, h_0) \\
p_{ii} &\sim \text{Beta}(a, b) \quad \text{for } i = (1, \ldots, M)
\end{aligned}
\]

where $\tau_0 = 0$ and $\tau_{M+1} = T$. Note that the cutpoints ($\tau \equiv (\tau_0, \ldots, \tau_{M+1})$) are identified by the sampled latent state variables ($s \equiv (s_1, \ldots, s_T)$) at each MCMC simulation step. $p_{ii}$ is the probability of staying in the $i$th state. Although each row of the transition matrix follows a Dirichlet distribution, there are only two non-zero elements in each row except the last row. The sum of these two non-zero elements is 1 by definition. Hence we model only the diagonal elements ($p_{ii}$) of the transition matrix, each with a Beta distribution.

Slant estimates ($\alpha_i$) from the dynamic linear multilevel model with hidden Markov transitions are state dependent. Suppose that the hidden state at $t$ is $m$ and let $\mathcal{M}_i$ be the collection of time indices pertaining to state $m$ for media $i$. We interpret the state-dependent slant estimates ($\alpha_{i,m}$) as media $i$’s average distance from public opinion during state $m$.
\[
\alpha_{i,s_t = m} = \frac{\frac{m_i}{\sigma_y^2 + \sigma_\beta^2}}{\frac{m_i}{\sigma_y^2 + \sigma_\beta^2} + \frac{1}{\sigma_{\alpha,m}^2}} \overbrace{\sum_{t \in \mathcal{M}_i} (y_{i,t} - z_{it}\beta)}^{i\text{'s average distance from public opinion during state } m} + \frac{\frac{1}{\sigma_{\alpha,m}^2}}{\frac{m_i}{\sigma_y^2 + \sigma_\beta^2} + \frac{1}{\sigma_{\alpha,m}^2}} \mu_{\alpha,m}
\tag{5}
\]

where $m_i$ is the number of media $i$’s observations during state $m$.

The MCMC sampling algorithm of the dynamic linear multilevel model with hidden Markov transitions in the group-level parameters involves two additional steps for $s$ and $P$ compared with that of the DMLM. The posterior density can be decomposed as follows:

\[
\begin{aligned}
p(\beta, \sigma_\beta^2, \sigma_y^2, \sigma_\alpha^2, P \mid y) = \int\!\!\int\; & p(\beta \mid y)\, p(\sigma_\beta^2 \mid y, \beta)\, p(\alpha_i \mid y, \beta, \sigma_\beta^2)\, p(\sigma_\alpha^2 \mid y, \beta, \sigma_\beta^2, \alpha_i) \\
& p(\sigma_y^2 \mid y, \beta, \sigma_\beta^2, \sigma_\alpha^2, \alpha_i)\, \overbrace{p(s \mid y, \beta, \sigma_\beta^2, \sigma_\alpha^2, \alpha_i, \sigma_y^2)}^{\text{multi-move sampling}} \\
& \overbrace{p(P \mid y, \beta, \sigma_\beta^2, \sigma_\alpha^2, \alpha_i, \sigma_y^2, s)}^{\text{row-wise sampling from a Beta distribution}}\, d\alpha_i\, ds .
\end{aligned}
\]

The sampling of $p_{ii}$ can be easily done once we have samples of hidden states. Using the Beta prior distribution,

\[
\begin{aligned}
\pi(p_{ii} \mid s) &\propto f(s \mid p_{ii})\, \text{Beta}(a, b) \\
&\propto p_{ii}^{n_{ii}} (1 - p_{ii})^{n_{ij}}\, p_{ii}^{a-1} (1 - p_{ii})^{b-1} \\
p_{ii} \mid s &\sim \text{Beta}(a + n_{ii},\, b + n_{ij})
\end{aligned}
\]

where $n_{ii}$ is the number of one-step transitions from state $i$ to $i$ and $n_{ij}$ is the number of one-step transitions from state $i$ to $j$. The sampling of hidden state variables is done by the multi-move sampling algorithm proposed by Chib (1996) and Chib (1998), and explained in detail by Frühwirth-Schnatter (2006, 342-346). Note that we do not sample hidden states from the response data ($y$) but from $\beta$ because, as discussed above, we identify the source of discrete changes in media slant as dramatic changes in public opinion or position changes of other media outlets.

2.3 Model Validity Check

So far, the paper has presented statistical methods to jointly estimate the dynamics of public opinion and discrete changes in media slant.
However, one important question we need to address when developing complex models is: “is it really necessary?” That is, the models discussed above will return estimates of dynamic public opinion and/or discrete changes in media slant simply due to the structure of the model. Then, how do we know that the estimated dynamics and changes are real? Cross-validation tests and Monte Carlo studies can be useful and suggestive, but they do not provide a definitive answer to the question of model validity.

In this section, we discuss how to check the validity of the above models against data (model diagnostics) and against other alternative models (model comparison). Bayesian model diagnostics and model comparison have been extensively covered in the statistics literature (Gelfand and Smith, 1990; Raftery, 1995; Kass and Raftery, 1995; Green, 1995; Chib, 1995; Gelman et al., 2004; Vehtari and Lampinen, 2002; Vehtari and Ojanen, 2012). The goal of Bayesian model diagnostics and model comparison is to assess the posterior probability of the model. Assuming a uniform prior probability for each model, the posterior probability of model $i$ is:

\[
p(M_i \mid y) = \frac{p(y \mid M_i)}{\sum_{j=1}^{J} p(y \mid M_j)} .
\]

An important quantity that needs to be computed from each model to find the posterior model probability is the marginal density of data given each model:

\[
p(y \mid M_i) = \int p(y \mid \Theta_i, M_i)\, p(\Theta_i \mid M_i)\, d\Theta_i
\]

where $\Theta_i$ denotes the parameter vector for model $i$. However, it is very difficult to get a stable estimate of the marginal density of data except in the case of simple models. Our dynamic linear multilevel models have many parameters, most of which are latent. Thus, even if we manage to compute the marginal density of data using an approximate method, the accuracy of the obtained values can be problematic.
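For reference, the posterior model probability formula above can be evaluated as follows if log marginal densities were available. This is a minimal sketch, not part of the proposed method: the function name `posterior_model_probs` is hypothetical, and the inputs are assumed to be $\log p(y \mid M_j)$ values supplied by the user. Working on the log scale and subtracting the maximum avoids numerical underflow, which matters because marginal densities of long series are extremely small.

```python
import numpy as np


def posterior_model_probs(log_marginals):
    """Posterior model probabilities under a uniform model prior.

    `log_marginals` holds log p(y | M_j) for j = 1, ..., J.
    """
    log_m = np.asarray(log_marginals, dtype=float)
    # Subtract the maximum before exponentiating for numerical stability
    weights = np.exp(log_m - log_m.max())
    return weights / weights.sum()


# Illustrative (made-up) log marginal densities for three models
probs = posterior_model_probs([-1000.0, -1001.0, -1005.0])
```

The difficulty noted in the text is not this final normalization but obtaining reliable values of $\log p(y \mid M_j)$ in the first place.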
An alternative way to check the model validity without computing the marginal density of data is to use the Watanabe-Akaike information criterion, or widely applicable information criterion (WAIC), proposed by Watanabe (2010). Gelman, Hwang and Vehtari (2014) provide an introduction to the WAIC. The WAIC has several advantages for Bayesian model diagnostics. First, the WAIC approximates leave-one-out cross validation (LOO-CV) and hence can serve as a metric for the out-of-sample predictive accuracy of a model. We believe that the predictive accuracy of a model is a critical criterion for choosing the “right” model. Second, the WAIC is relatively easy to compute. The computation of the WAIC can be done at the end of MCMC sampling and does not involve additional MCMC runs. Third, unlike the deviance information criterion (DIC) (Spiegelhalter et al., 2002, 2014), the WAIC is fully Bayesian and applicable to both nonsingular and singular models.⁵ Last, estimates of the WAIC are numerically stable.

⁵ Singular models are statistical models with singularity in their Fisher information matrix. When statistical models are singular, the assumption of large sample approximation in the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) does not hold. Mixture models and hidden Markov models are examples of singular models.

We use Gelman, Hwang and Vehtari (2014)’s formula to compute the WAIC. Let $\Theta$ denote all the parameters in a model. Then,

\[
\begin{aligned}
\text{log pointwise predictive density} &= \sum_{i=1}^{N} \sum_{t \in \mathcal{T}_i} \log \int p(y_{it} \mid \Theta)\, p_{\text{post}}(\Theta)\, d\Theta \\
p_{\text{WAIC}} &= 2 \sum_{i=1}^{N} \sum_{t \in \mathcal{T}_i} \Big( \log\big(E_{\text{post}}\, p(y_{it} \mid \Theta)\big) - E_{\text{post}}\big(\log p(y_{it} \mid \Theta)\big) \Big) \\
\text{WAIC} &= -2\,(\text{log pointwise predictive density} - p_{\text{WAIC}}).
\end{aligned}
\]

The expectation and integration in the formula are easily computable using simulated MCMC samples.

3 Analysis of Korean Media Slant: News Coverage of the Sewol Ferry Disaster

3.1 Aftermath of the Sewol Ferry Disaster

On April 16, 2014, a South Korean ferry capsized in the Yellow Sea on its way from Incheon to Jeju island.
Out of a total of 476 people on the ferry, including passengers and the crew, 295 died; 9 are still missing at the time of writing this paper. Most of the victims were high school students and teachers on a field trip. The Sewol disaster was considered the worst peacetime disaster in South Korean history, leaving a tremendous impact on the collective memory of the public.

Immediately after the disaster, the reactions of South Korean people were homogeneous, without any political divide. People on the left and right were both enraged by the factors that caused the disaster. For example, the ferry company, Cheonghaejin Marine, ignored basic safety protocols; government regulators blindly approved the condition of the freight before the departure; the captain and the crew, who were not regular employees of Cheonghaejin Marine but those of a contractor, did not issue an evacuation order even at the moment they left the ferry; government rescue agencies and the Ministry of Security and Public Administration wasted critical moments for early rescue; and the president did not comprehend the seriousness of the disaster until several hours had passed after the incident. South Korean people were torn by pictures and clips taken by the high school students in their last minutes: they were faithfully following the last order, to stay calmly inside, given by the crew who had already deserted the ferry.

However, with the passage of time, media reports on the disaster started to diverge along the ideological division. One group of media, usually labeled “progressive” or “left-leaning,” considered the disaster as epitomizing systematic failure and government incompetence. They extensively reported demands by the families of the victims, civil organizations, and the opposition party (Saejungchi). Another group of media, usually considered “conservative” or “right-leaning,” focused on the investigation of the ferry company and the crew.
One of the most sensitive issues was whether the Congress should establish its own independent investigation agency and, if so, how much authority and power such an agency should have. What made this issue so sensitive was the question of whether the president, who did not take any action for seven hours after the disaster, should be investigated. The president’s party (Saenuri) harshly denounced the demands for the investigation as pointless and ill-intentioned political attacks. Two elections, the nationwide gubernatorial election on June 4, 2014 and the by-election to fill fifteen vacancies in Congress on July 30, 2014, further escalated the politicization of the issue.

In this section, we estimate the ideological leanings of nineteen South Korean newspapers based on their Sewol-related news reports from the date of the incident, April 16, 2014, to April 11, 2015. Our questions of interest are (1) whether South Korean newspapers showed significant ideological leanings in their reports related to the Sewol ferry disaster and (2) whether these ideological leanings have changed in response to changes in public opinion.

3.2 Data

For the analysis, we downloaded newspaper reports containing the term “Sewol-ho” from April 16, 2014 to April 11, 2015 on the NAVER News stand (http://news.naver.com).⁶ Nineteen major newspapers in South Korea are the targets of analysis. The total number of downloaded newspaper reports containing “Sewol-ho” is 122,317. On average, each newspaper published 6,438 reports on various issues related to the ferry disaster over 360 days. This is equivalent to 18 reports per day for each newspaper. We also downloaded the two major political parties’ press releases containing “Sewol-ho” for the same period in order to identify partisan phrases in the issues related to the disaster.
The government party, Saenuri, published 334 press releases while the opposition party, Saejungchi, published 577 press releases containing the term “Sewol-ho.” We decompose each newspaper report and press release by 20 morphemes using KoNLPy, a Python package for natural language processing of the Korean language (Park and Cho, 2014). We discard meaningless morphemes in the Korean language. Then, we search for bigram or trigram phrases that were used by only one political party (one-party phrases) regarding the Sewol ferry disaster.

This sets the method apart from that of Gentzkow and Shapiro (2010). Gentzkow and Shapiro (2010) choose partisan phrases by identifying a list of phrases that have predictive power for individual legislators’ party identification. This “feature selection” method is not feasible in our case because we do not have incident-related speech data at the individual legislator level. However, our method of choosing partisan phrases has several advantages. First, our method of one-party phrase selection is highly intuitive. If we find partisan phrases from a pool of phrases used by both parties, we need to choose an arbitrary threshold of asymmetric usage to classify partisan words. Yet, there is no doubt that bigram or trigram phrases that are used by only one political party and never used by the other party satisfy the definition of “partisan phrases.” Second, political parties choose phrases in their press releases very cautiously and selectively. Unlike individual legislators’ speech, press releases rarely contain cheap talk or a slip of the tongue. They are well-planned and carefully calculated signals to an outside audience.

⁶ NAVER is the most popular South Korean web search portal and the NAVER News stand is NAVER’s online news content service.
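The one-party phrase selection described above can be sketched in a few lines. This is a minimal illustration, not the code used for the analysis: it assumes each press release has already been tokenized into a list of morphemes (for instance with KoNLPy, which is not called here), and the function names and the toy English tokens are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_counts(docs, sizes=(2, 3)):
    """Count bigram and trigram phrases over a list of tokenized documents."""
    counts = Counter()
    for tokens in docs:
        for n in sizes:
            counts.update(ngrams(tokens, n))
    return counts


def one_party_phrases(gov_docs, opp_docs):
    """Keep only phrases used by one party and never by the other."""
    gov = phrase_counts(gov_docs)
    opp = phrase_counts(opp_docs)
    gov_only = {p: c for p, c in gov.items() if p not in opp}
    opp_only = {p: c for p, c in opp.items() if p not in gov}
    return gov_only, opp_only


# Toy, already-tokenized press releases (illustrative only)
gov_docs = [["normalize", "congress", "now"], ["focus", "on", "economy"]]
opp_docs = [["irresponsible", "government", "now"],
            ["focus", "on", "investigation"]]
gov_only, opp_only = one_party_phrases(gov_docs, opp_docs)
```

In this toy input the bigram `("focus", "on")` appears in both parties' releases and is therefore dropped from both lists; the manual removal of meaningless phrases is a separate step not shown here.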
After removing some meaningless phrases from the pool of one-party phrases, we have ninety partisan phrases for the opposition party and thirty for the government party. Figure 4 shows the partisan phrases of the opposition party. Some of the main phrases used to criticize the government for mishandling the disaster are: congressional investigation, irresponsible government, obstinacy, and deregulation. Figure 5, on the other hand, shows the partisan phrases used by the government party in defense of the president. The government party accused the opposition party of inciting divisive conflicts; it emphasized the normalization of Congress and a bipartisan focus on the economy.

Figure 4: Partisan Words by the Opposition Party (Saejungchi): Word sizes are adjusted by relative frequency. Frequent words are located at the center.

Using the chosen partisan phrases, we compute the relative frequency of partisan phrases for each media outlet. Let $O$ and $G$ be the total counts of partisan phrases for the opposition party and the government party, respectively. Let $f^{\mathrm{Opposition}}_{i,o,t}$ be the frequency of the $o$th opposition party phrase within reports by media $i$ at $t$, and let $r^{\mathrm{Opposition}}_{o}$ be the weight of the $o$th opposition party phrase, measured by the relative frequency of that phrase in the opposition party’s press releases. We compute the relative frequency as the ratio of each phrase’s frequency to the maximum frequency among each party’s phrases. We define quantities for the government party similarly. Then, the relative frequency of partisan phrases for newspaper $i$ at $t$ is defined as

$$
y_{it} = \sum_{g=1}^{G} f^{\mathrm{Government}}_{i,g,t} \times r^{\mathrm{Government}}_{g} \; - \; \sum_{o=1}^{O} f^{\mathrm{Opposition}}_{i,o,t} \times r^{\mathrm{Opposition}}_{o}. \tag{6}
$$

Figure 5: Partisan Words by the Government Party (Saenuri): Word sizes are adjusted by relative frequency. Frequent words are located at the center.
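The definition in equation (6) can be made concrete with a small Python sketch. The phrase strings, the counts, and the function names below are hypothetical; only the weighting rule (each phrase's press-release frequency divided by the party's maximum phrase frequency) and the government-minus-opposition difference follow the text.

```python
def relative_weights(press_counts):
    """Weight r: press-release count of a phrase over the party's maximum count."""
    top = max(press_counts.values())
    return {phrase: count / top for phrase, count in press_counts.items()}


def partisan_score(report_counts, gov_weights, opp_weights):
    """y_it: weighted government phrase frequency minus weighted opposition frequency."""
    gov = sum(report_counts.get(p, 0) * w for p, w in gov_weights.items())
    opp = sum(report_counts.get(p, 0) * w for p, w in opp_weights.items())
    return gov - opp


# Hypothetical phrase counts in each party's press releases.
gov_weights = relative_weights({"normalize congress": 10, "focus economy": 5})
opp_weights = relative_weights(
    {"congressional investigation": 20, "irresponsible government": 10, "deregulation": 5}
)

# Hypothetical phrase counts in one newspaper's reports on one day.
report_counts = {"normalize congress": 1, "congressional investigation": 3, "deregulation": 2}

y_it = partisan_score(report_counts, gov_weights, opp_weights)
# Government part: 1 * 1.0 = 1.0; opposition part: 3 * 1.0 + 2 * 0.25 = 3.5
```

Here the score is 1.0 - 3.5 = -2.5, a negative value, which under the sign convention of equation (6) points in the liberal direction.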
We subtract the frequencies of the opposition party phrases from those of the government party phrases so that a negative sign of $y_{it}$ indicates the liberal direction and a positive sign indicates the conservative direction. Since opposition party phrases are more verbose than government party phrases, the mean of $y_{it}$ is not necessarily close to zero.

Figure 6 shows $y_{it}$ (colored dots) for the nineteen South Korean newspapers over the sample period. The thick solid line in the middle indicates the daily averages of $y_{it}$. Positive values indicate conservative media reports, and negative values indicate liberal media reports. Overall, the daily averages of the relative partisan phrase frequency are below zero, and the dispersion of $y_{it}$ is quite large.

Figure 7 takes a closer look at $y_{it}$ by displaying data for only two newspapers: Munhwa Ilbo and Hangyereh Shinmun. Generally speaking, Munhwa Ilbo is considered one of the most conservative newspapers while Hangyereh Shinmun is one of the most liberal newspapers in South Korea. Reflecting this conventional belief, the relative partisan phrase frequencies of Hangyereh Shinmun are almost always located below those of Munhwa Ilbo. The average relative partisan phrase frequency of Hangyereh Shinmun is -3.40 (standard deviation = 5.45) and that of Munhwa Ilbo is 0.01 (standard deviation = 2.07).
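The two kinds of summaries used in this description, daily averages across newspapers and a mean and standard deviation per newspaper, can be computed as in the following sketch. The records and outlet names are made up for illustration, and the use of the sample standard deviation is an assumption, since the text does not state which estimator is reported.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (date, outlet, y_it) records.
scores = [
    ("2014-04-16", "Outlet A", -4.0),
    ("2014-04-16", "Outlet B", 1.0),
    ("2014-04-17", "Outlet A", -2.0),
    ("2014-04-17", "Outlet B", 0.0),
]

by_day = defaultdict(list)
by_outlet = defaultdict(list)
for day, outlet, y in scores:
    by_day[day].append(y)
    by_outlet[outlet].append(y)

# Daily average across outlets (the thick line in a plot like Figure 6).
daily_average = {day: mean(values) for day, values in by_day.items()}

# Per-outlet mean and sample standard deviation over the period.
outlet_summary = {
    outlet: (mean(values), stdev(values)) for outlet, values in by_outlet.items()
}
```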
Figure 6: Relative Partisan Phrase Frequencies of Nineteen Newspaper Companies: Colored dots are relative frequencies of partisan phrases for each newspaper and the red thick line is daily averages of relative frequencies of partisan phrases. [Horizontal axis: date, 2014-04-16 to 2015-04-11; vertical axis: Partisan Word Relative Frequency.]

Figure 7: Relative Partisan Phrase Frequencies of Two Newspaper Companies: Munhwa Ilbo and Hangyereh Shinmun. [Horizontal axis: date, 2014-04-16 to 2015-04-11; vertical axis: Partisan Word Relative Frequency.]

4 Results

4.1 Model Selection Result

Table 1 reports the model diagnostics for the nine models. The static multilevel model (MLM) has the largest WAIC, which indicates that it has the weakest predictive power and fits the data poorly. All the dynamic models show smaller WAIC values than the static model, and the HMDMLMs have better predictive power than the DMLM. Among them, the five-break model (HMDMLM (5)) shows the smallest WAIC; we analyze it in detail below.

Table 1: Model Diagnostics: A linear dynamic multilevel model with 5 parametric breaks has the smallest WAIC.
Model    Break Number    WAIC         log pointwise predictive density    p_WAIC
MLM      0               32965.996    -16442.941                          40.057
DMLM     0               32801.005    -16353.122                          47.381
HMDMLM   1               31358.685    -15642.791                          36.551
HMDMLM   2               31375.490    -15639.193                          48.553
HMDMLM   3               31360.990    -15628.587                          51.907
HMDMLM   4               31372.260    -15624.030                          62.100
HMDMLM   5               31321.365    -15620.046                          40.636
HMDMLM   6               31332.379    -15621.829                          44.360
HMDMLM   7               31363.821    -15618.744                          63.166

Figure 8: Public Opinion Trend and 5 Breaks: Expected break points are 2014-05-26, 2014-08-01, 2014-09-19, 2014-12-08, 2015-02-18. [Legend: Trend (no break), Trend (5 breaks), Break Timing; horizontal axis: date; vertical axis: Slant.]

Figure 8 shows the estimated public opinion trend and the timings of the five breaks. The first regime, from April 16, 2014 to May 26, 2014, is distinguished by strongly negative values of the relative frequency, reflecting the immediate impact of the Sewol ferry disaster on public opinion. The second regime, which is also distinguished by negative values of the relative frequency, lasts from May 26, 2014 to August 1, 2014. Note that the break points ending these two regimes fall close to the gubernatorial election of June 4 and the by-election of July 30, 2014. The two elections were significant political events that intensified partisan accusations over who was to blame. To the surprise of many, the government party managed to win eight out of seventeen gubernatorial offices in the nationwide gubernatorial election in June and to sweep eleven out of fifteen seats in the by-election in July.
After the two elections, especially the July one, public opinion quickly moved in the conservative direction. The fourth regime started on September 19, 2014 and ended on December 8, 2014. The beginning of the fourth regime is likely to be marked by the driver beating incident: on September 17, a member of the victims’ families and a member of the opposition party were involved in beating a driver from a driving escort company, which intensified criticism of the victims’ families and the opposition party. In November 2014, the victims’ families agreed to accept the bipartisan proposal for the special Sewol law, which included plans for the congressional investigation and compensation. The fifth regime, from December 8, 2014 to February 18, 2015, is very similar to the fourth regime except that the volatility of the relative frequency is larger. The relatively stable public opinion during the fourth and fifth regimes is largely due to the successful bipartisan agreement on the special Sewol law in the National Assembly.

Public opinion remained stable until March 27, 2015, after which it began to move sharply in the liberal direction. On March 27, the government announced the enforcement ordinance of the special Sewol law, under which the government can freely appoint its own bureaucrats to important positions within the special investigation committee. The opposition party and the victims’ families harshly criticized the ordinance; even some members of the ruling party regarded the ordinance as a usurpation of legislative power. Reflecting these concerns, public opinion turned away from the government and the ruling party at the end of March 2015. Unfortunately, this sharp drop in public opinion is not detected as another break because of the lack of information at the end of the sample period.
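As a check on the model comparison in Table 1, the reported columns are consistent with the usual relationship WAIC = -2 (lppd - p_WAIC), where lppd is the log pointwise predictive density. The sketch below computes these quantities from a matrix of pointwise log-likelihood draws. It is an illustration only: it assumes the variance-based penalty described in Gelman, Hwang, and Vehtari (2014), the function names and the toy draws are hypothetical, and it is not the code behind Table 1.

```python
import math
from statistics import variance


def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def waic(log_lik):
    """log_lik[s][i] is log p(y_i | theta_s) for posterior draw s and observation i."""
    n_obs = len(log_lik[0])
    columns = [[draw[i] for draw in log_lik] for i in range(n_obs)]
    lppd = sum(log_mean_exp(col) for col in columns)
    # Assumed penalty: posterior variance of the log-likelihood, summed over observations.
    p_waic = sum(variance(col) for col in columns)
    return -2.0 * (lppd - p_waic), lppd, p_waic


# Toy example: three posterior draws, two observations (made-up numbers).
toy_waic, toy_lppd, toy_p = waic([[-1.0, -2.0], [-1.2, -1.8], [-0.9, -2.1]])

# Arithmetic check against the HMDMLM (5) row of Table 1.
hmdmlm5_waic = -2.0 * (-15620.046 - 40.636)
```

The last line gives approximately 31321.364, which matches the tabulated 31321.365 up to rounding of the reported inputs.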
4.2 Slant Estimates

Before discussing time-varying slant estimates from the HMDMLM, we first examine time-constant slant estimates from the DMLM for the sake of comparison. Figure 9 shows time-constant measures of media slant from the DMLM. The dots are posterior means, and the horizontal bars are 95% credible intervals. South Korean newspapers are quite selective in reporting the partisan phrases of the two parties. We can roughly classify the nineteen newspapers into four groups: strongly liberal, moderately liberal, moderately conservative, and strongly conservative. Hangyereh Shinmun and Kyunghang Shinmun distinguish themselves as the most liberal. Asia Economy, Hanguk Ilbo, Seoul Shinmun, Money Today, Kukmin Ilbo and Segye Ilbo can be considered moderately liberal. Herald Economy, Donga Ilbo, Hanguk Economy, Seoul Economy and Joongang Ilbo are moderately conservative. Lastly, Chosun Ilbo, Mail Economy, Financial News, Digital Times, Junja Shinmun and Munhwa Ilbo are the most conservative in their choice of partisan phrases regarding the Sewol ferry disaster.

Figure 9: Time-Constant Media Slants After Controlling for Public Opinion (No Break). [Vertical axis: Media; horizontal axis: Slant.]

This classification is quite consistent with previous studies on the ideological identification of South Korean newspapers. For example, Lee and Koh measure the political ideologies of five major newspapers in South Korea (Joongang Ilbo, Chosun Ilbo, Donga Ilbo, Hangyereh Shinmun, Kyunghang Shinmun) by examining “the valences of information given by various sources appearing in US beef imports articles” (Lee and Koh, 2009, 458).
They concluded that Joongang Ilbo, Chosun Ilbo and Donga Ilbo (moderately or strongly conservative newspapers according to our measure) are more conservative than Hangyereh Shinmun and Kyunghang Shinmun (strongly liberal newspapers according to our measure) in their selection of news sources. Park (N.d.) also echoed this line of ideological division among South Korean newspapers by examining news reports regarding the part-time employment problem.

The next question is whether the positions of the newspapers have changed as public opinion shifted in one way or the other during the sample period. The model diagnostics reported in Table 1 strongly suggest that the positions of the newspapers have changed as the linear trend of public opinion shifted. The details of the change are reported in Figure 10 and Figure 11.

7 Note that Chosun Ilbo did not provide its news reports to the NAVER News stand from April 16 to August 29, 2014. Thus, these reports are missing from our data. Despite the missingness, the HMDMLM provides slant estimates for Chosun Ilbo from April 16 to August 29, 2014 by borrowing information from other newspapers. The regime-specific slant estimates for Chosun Ilbo during the missing period, reported in Figure 10, have large variances due to the lack of direct information.

Figure 10: Regime Specific Media Slants. [Six panels, Regime 1 through Regime 6; vertical axis: Media; horizontal axis: Slant.]

Figure 10 shows regime-specific estimates of media slant.7 Two changes are notable from Figure 10. First, the rank order of the slant estimates changes over time. While Hangyereh Shinmun and Kyunghang Shinmun are consistently located at the far left corner, as we saw in Figure 9, Hanguk Ilbo and Seoul Shinmun are very close to these two liberal newspapers during the first regime.
In the third regime, Hanguk Ilbo is located near the far left corner with Hangyereh Shinmun and Kyunghang Shinmun. These two regimes, Regime 1 and Regime 3, are periods in which liberal newspapers are clearly distinguished from the other newspapers.

The second notable finding from Figure 10 is the change in the variance of media slants over time, which is drawn as line graphs in Figure 11 for easier interpretation. Regime 1 and Regime 3 stand out in terms of large variances in media slants. The four ideological groupings of the newspapers (strongly liberal, moderately liberal, moderately conservative, and strongly conservative) clearly emerged right after the ferry disaster. The ideological groups became weaker during June and July 2014. Then, they reemerged after the July 30 by-election. A multitude of factors likely contributed to the turn of public opinion in the third regime: the surprising victory of the ruling party in the by-election, the discovery of the death of the ferry company owner, Byung Eon You, and the uncompromising attitude of the victims’ families.

5 Conclusion

Public opinion changes over time. Mass media try to maximize their market share by closely reflecting shifts in public opinion. At the same time, journalists, editors, and other participants in news production try to influence public opinion with their own views and agendas. The constant interaction of public opinion and mass media is one of the essential characteristics of modern democracy.

Figure 11: Changes in Media Slants Over Time: Colors are scaled based on the rank-order of constant media slant. Bright colors indicate conservative slants and dark colors indicate liberal slants. Dim vertical lines in the background indicate the timings of breaks in public opinion.
[Figure 11 panel title: Slant Changes Over Time; vertical axis: Slant; horizontal axis: date; lines are labeled by newspaper.]

Estimation of media slant has become an important empirical endeavor for understanding the strategic and political choices of mass media in their markets. These choices are fundamentally shaped by public opinion. However, the existing methods of media slant estimation have focused on how to map observed text data onto a low-dimensional vector without paying enough attention to the fact that news reports are time-ordered data and that media slant may change in response to exogenous shocks or public opinion shifts. This paper has sought to fill this gap by presenting a method for dynamic estimation of media slant. We build our method upon a simple multilevel model that parameterizes media slant as group-level varying intercepts when we observe frequencies of partisan phrases at the media-time level. Then, we let the global mean of the observed partisan phrase frequency vary smoothly over time using the dynamic linear model from the Bayesian time series literature. In the model, the smoothly moving average partisan phrase frequency captures public opinion changes. In order to check the possibility of changes in media slant, we allow the group-level varying intercepts to follow hidden Markov transitions identified by shifts in the linear trend of public opinion. We suggest the WAIC as a model diagnostic for choosing the most reasonable model out of a pool of competing models, either static or dynamic.

The paper has illustrated the application of this dynamic method by analyzing the ideological positions of nineteen South Korean newspapers in their news reports related to the Sewol disaster for the period from April 16, 2014 to April 11, 2015. We have uncovered dramatic changes in public opinion and in the ideological positions of the newspapers.
Reflecting the fluctuations in public opinion, the variance of media slant has changed over time. We found that the by-election of July 30 played an important role in shifting public opinion in favor of the ruling party and the government. Moreover, some newspapers changed their ideological positions considerably in the middle of the sample period. Static models of media slant would have failed not only to distinguish public opinion changes from changes in media slant, but also to detect short-term changes in the ideological positions of mass media.

References

Agirdas, Cagdas. 2015. “What Drives Media Bias? New Evidence From Recent Newspaper Closures.” Journal of Media Economics 28:123–141.
Baron, David P. 2006. “Persistent Media Bias.” Journal of Public Economics 90:1–36.
Bernhardt, Dan, Stefan Krasa, and Mattias Polborn. 2008. “Political polarization and the electoral effects of media bias.” Journal of Public Economics 92:1092–1104.
Carter, C. K., and R. Kohn. 1994. “On Gibbs Sampling for State Space Models.” Biometrika 81:541–553.
Chib, Siddhartha. 1995. “Marginal Likelihood From the Gibbs Output.” Journal of the American Statistical Association 90 (December):1313–1321.
Chib, Siddhartha. 1996. “Calculating Posterior Distributions and Modal Estimates in Markov Mixture Models.” Journal of Econometrics 75:79–98.
Chib, Siddhartha. 1998. “Estimation and Comparison of Multiple Change-Point Models.” Journal of Econometrics 86 (June):221–241.
Chib, Siddhartha, and Bradley P. Carlin. 1999. “On MCMC Sampling in Hierarchical Longitudinal Models.” Statistics and Computing 9:17–26.
Frühwirth-Schnatter, Sylvia. 1994. “Data Augmentation and Dynamic Linear Models.” Journal of Time Series Analysis 15:183–202.
Frühwirth-Schnatter, Sylvia. 2006. Finite Mixture and Markov Switching Models. Heidelberg: Springer Verlag.
Gans, Joshua S., and Andrew Leigh. 2012. “How Partisan is the Press? Multiple Measures of Media Slant.” The Economic Record 88:127–147.
Gelfand, Alan E., and Adrian F. M. Smith. 1990. “Sampling-Based Approaches to Calculating Marginal Densities.” Journal of the American Statistical Association 85 (June):398–409.
Gelman, Andrew, Jessica Hwang, and Aki Vehtari. 2014. “Understanding predictive information criteria for Bayesian models.” Statistics and Computing 24:997–1016.
Gelman, Andrew, John B. Carlin, Hal S. Stern, and Donald B. Rubin. 2004. Bayesian Data Analysis. 2nd ed. New York: Chapman and Hall.
Gentzkow, Matthew, and Jesse M. Shapiro. 2010. “What Drives Media Slant? Evidence From U.S. Daily Newspapers.” Econometrica 78:35–71.
Gerber, Alan S., Dean Karlan, and Daniel Bergan. 2009. “Does the Media Matter? A Field Experiment Measuring the Effect of Newspapers on Voting Behavior and Political Opinions.” American Economic Journal: Applied Economics 1:35–52.
Green, Peter J. 1995. “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination.” Biometrika 82:711–732.
Groseclose, Tim, and Jeffrey Milyo. 2005. “A Measure of Media Bias.” Quarterly Journal of Economics CXX:1191–1237.
Jackman, Simon. 2005. “Pooling the Polls Over an Election Campaign.” Australian Journal of Political Science 40:499–517.
Kass, Robert E., and Adrian E. Raftery. 1995. “Bayes Factors.” Journal of the American Statistical Association 90:773–795.
Kim, In Song, John Londregan, and Marc Ratkovic. 2015. “Voting, Speechmaking, and the Dimensions of Conflict in the US Senate.” Presented at the 2015 Asian Political Methodology Meeting.
Larcinese, Valentino, Riccardo Puglisi, and James M. Snyder, Jr. 2007. “Partisan Bias in Economic News: Evidence on the Agenda-Setting Behavior of U.S. Newspapers.” Journal of Public Economics 95:1178–1189.
Lee, Gunho, and Heungseok Koh. 2009. “Korean Newspaper’s Political Orientation Featuring in US Beef Imports Articles.” Korean Journal of Journalism & Communication Studies 53:347–369.
Gentzkow, Matthew, and Jesse M. Shapiro. 2006. “Media Bias and Reputation.” Journal of Political Economy 114:280–316.
Park, Eunjeong, and Sungzoon Cho. 2014. KoNLPy: Korean natural language processing in Python. In Proceedings of the 26th Annual Conference on Human & Cognitive Language Technology.
Park, Jae-Young. N.d. “Ideological Dimension of South Korean News Media (Hanguk Eolonsadulye Jungpasung Jihyung).” Seminar on the Journalism Practice Committee.
Raftery, Adrian E. 1995. “Bayesian Model Selection in Social Research.” Sociological Methodology 25:111–163.
Spiegelhalter, David J., Nicola G. Best, Bradley P. Carlin, and Angelika van der Linde. 2002. “Bayesian measures of model complexity and fit.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64:583–639.
Spiegelhalter, David J., Nicola G. Best, Bradley P. Carlin, and Angelika van der Linde. 2014. “The deviance information criterion: 12 years on.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76:485–493.
Stone, Daniel F. 2011. “Ideological media bias.” Journal of Economic Behavior & Organization 78:256–271.
Strömberg, David. 2015. “Media and Politics.” Annual Review of Economics 7:173–205.
Taddy, Matt. 2013. “Multinomial Inverse Regression for Text Analysis.” Journal of the American Statistical Association 108:755–770.
Vehtari, Aki, and Janne Ojanen. 2012. “A survey of Bayesian predictive methods for model assessment, selection and comparison.” Statistics Surveys 6:142–228.
Vehtari, Aki, and Jouko Lampinen. 2002. “Bayesian Model Assessment and Comparison Using Cross-Validation Predictive Densities.” Neural Computation 14:2439–2468.
Watanabe, Sumio. 2010. “Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory.” Journal of Machine Learning Research 11:3571–3594.
Zaller, John. 1999. A Theory of Media Politics. Unpublished.