Preliminary and Incomplete

A Framework for Eliciting, Incorporating, and Disciplining Identification Beliefs in Linear Models

Francis J. DiTraglia and Camilo Garcia-Jimeno
University of Pennsylvania

March 20, 2015

Abstract

We consider the problem of estimating a causal effect from observational data in a simple linear model that may be subject to classical measurement error, an endogenous regressor, and an invalid instrument. After characterizing the identified set for this problem, we propose a Bayesian tool for inference that is more general and informative than the usual frequentist approach to partial identification and show how it can be used to help applied researchers reason coherently about their identification beliefs. We conclude with two simple examples illustrating the usefulness of our method.

1 Introduction

To identify causal effects from observational data, even the staunchest frequentist econometrician must augment the data with her beliefs. In an instrumental variables (IV) regression, for example, the exclusion restriction represents the belief that the instrument has no direct effect on the outcome of interest after controlling for the regressors. While this is a strong belief, it is explicit, and its meaning is well understood. Although the exclusion restriction can never be directly tested, applied researchers know how to think about it and how to debate it. Indeed, in specific problems we often have a reasonable idea of the kinds of factors that make up the regression error term: in a wage regression, for example, the key unobservable is ability. This allows us to consider whether the assumption that these factors are uncorrelated with the instrument is truly plausible. The exclusion restriction is what we call a "formal identification belief": something that is stated directly and whose role in achieving identification is clear.

In addition to imposing formal beliefs to achieve identification, researchers often state a number of other "informal beliefs" in applied work. We use this term to refer to beliefs that are not imposed in estimation, but which may be used, among other things, to interpret results or to reconcile conflicting estimates from different specifications. For example, papers that report the results of IV regressions almost invariably state the authors' belief about the sign of the correlation between the endogenous regressor and the error term, but fail to exploit this information. Referring to the more than 60 papers published in the top three empirical journals between 2002 and 2005 that reported the results of IV regressions, for example, Moon and Schorfheide (2009) pointed out that "in almost all of the papers the authors explicitly stated their beliefs about the sign of the correlation between the endogenous regressor and the error term; yet none of the authors exploited the resulting inequality moment condition in their estimation." Another common informal belief involves measurement error. When empirical researchers uncover an OLS estimate that is substantially smaller than, but has the same sign as, its IV counterpart, classical measurement error, with its attendant "least squares attenuation bias," often takes the blame.

While measurement error, endogenous regressors and invalid instruments have all generated voluminous literatures, we know of no paper that considers the effects of all three problems at once.
In a certain sense this is unsurprising: a partial identification analysis based on a model that suffers from so many serious problems seems unlikely to produce particularly informative bounds. Nevertheless, applied researchers have beliefs about all three of these quantities, and at present they lack a tool for testing whether these beliefs cohere and, if they do, for imposing them in estimation.

In this paper we consider a simple linear model in which the goal is to estimate the causal effect β of a regressor x that may be measured with error and is potentially endogenous. Although an instrumental variable z is available, it may not satisfy the exclusion restriction. For the moment we abstract from covariates and limit our attention to classical measurement error: extensions to address both of these shortcomings are currently in progress. After characterizing the identified set for this model, we propose a Bayesian tool to allow applied researchers, who may not be Bayesians themselves, to reason coherently about their identification beliefs. Specifically, we propose a procedure for sampling uniformly from the identified set for the non-identified parameters of the model conditional on the identified parameters. By imposing sign and interval restrictions, we can add prior information to the problem while remaining uniform on the regions of the identified set that remain. In some cases the result can be quite informative even if the beliefs imposed are somewhat weak. Unlike the usual frequentist partial identification analysis, we take seriously the possibility that more points in the identified set may be compatible with one value of β than with another. While the uniformity of the prior on the identified set need not be taken literally, it provides a good starting point for moving beyond the more common analysis based on "worst-case" bounds.

This paper relates to a vast literature on partial identification, measurement error, and invalid instruments. Two recent papers with a similar flavor to this one are Conley et al. (2012), who propose a Bayesian procedure for examining the importance of violations of the exclusion restriction in IV regressions, and Nevo and Rosen (2012), who derive bounds in a setting where the endogenous regressor is allowed to be "more endogenous" than the imperfect instrument used for it. Our paper also relates to the literature on the Bayesian analysis of non-identified models, particularly Poirier (1998) and Moon and Schorfheide (2012).

The remainder of this paper is organized as follows. Section 2 presents the model, and Section 3 explains our preferred parameterization of the identified set. Section 4 solves for the identified set, and Section 5 describes our inferential procedure and how we sample from the identified set. We conclude in Section 6 with two examples illustrating the usefulness of our method: one that examines the effect of institutions on development and another that revisits the returns to schooling.

2 The Model

We observe x, y and z from the following linear structural model

y = \beta x^* + u    (1)

x^* = \pi z + v    (2)

x = x^* + w    (3)

where we assume, without loss of generality, that all random variables in the system are mean zero or have been de-meaned. Our goal is to learn the parameter β, the causal effect of x*. Unfortunately x* is unobserved: we only observe a noisy measure x that has been polluted by classical measurement error w. We call (u, v, w, z) the "primitives" of the system.
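To fix ideas, the following sketch (ours, not part of the paper; all parameter values are purely illustrative) simulates the system in Equations 1-3 for one assumed configuration of the primitives and computes the naive OLS and IV slopes from the observables alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Illustrative (assumed) structural parameters: beta is the causal effect,
# pi the first stage; the covariances below govern the three problems.
beta, pi = 1.0, 0.5
sigma_uv, sigma_uz = 0.3, 0.1                       # regressor and instrument endogeneity
s2_u, s2_v, s2_w, s2_z = 1.0, 1.0, 0.5, 1.0

# Draw the primitives (u, v, w, z); the classical measurement error w is
# uncorrelated with everything else.
Omega = np.array([[s2_u,     sigma_uv, 0.0,  sigma_uz],
                  [sigma_uv, s2_v,     0.0,  0.0],
                  [0.0,      0.0,      s2_w, 0.0],
                  [sigma_uz, 0.0,      0.0,  s2_z]])
u, v, w, z = rng.multivariate_normal(np.zeros(4), Omega, size=n).T

x_star = pi * z + v          # latent regressor
x = x_star + w               # observed, error-ridden regressor
y = beta * x_star + u        # outcome

# Only (y, x, z) would be observed in practice. Naive slopes based on them:
S = np.cov(np.vstack([y, x, z]))
beta_ols = S[0, 1] / S[1, 1]     # Cov(x, y) / Var(x)
beta_iv = S[0, 2] / S[1, 2]      # Cov(z, y) / Cov(z, x)
print(beta_ols, beta_iv)         # both differ from the true beta = 1.0
```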
The covariance matrix of the primitives (u, v, w, z) is

\Omega = \operatorname{Var}\begin{pmatrix} u \\ v \\ w \\ z \end{pmatrix} = \begin{pmatrix} \sigma_u^2 & \sigma_{uv} & 0 & \sigma_{uz} \\ \sigma_{uv} & \sigma_v^2 & 0 & 0 \\ 0 & 0 & \sigma_w^2 & 0 \\ \sigma_{uz} & 0 & 0 & \sigma_z^2 \end{pmatrix}    (4)

Because w represents classical measurement error, it is uncorrelated with u, v, and z, as well as with x*. The parameter σuz controls the endogeneity of the instrument z: unless σuz = 0, z is an invalid instrument. Both σuv and σuz are sources of endogeneity for the unobserved regressor x*. In particular,

\sigma_{x^* u} = \sigma_{uv} + \pi \sigma_{uz}    (5)

which we can derive, along with the rest of the covariance matrix of (y, x, x*, z), from

\begin{pmatrix} y \\ x \\ x^* \\ z \end{pmatrix} = \begin{pmatrix} 1 & \beta & 0 & \beta\pi \\ 0 & 1 & 1 & \pi \\ 0 & 1 & 0 & \pi \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} u \\ v \\ w \\ z \end{pmatrix}    (6)

together with the assumptions underlying the covariance matrix Ω of (u, v, w, z).

The system we have just described is not identified: without further restrictions we cannot learn the value of β from any amount of data. In particular, neither the OLS nor the IV estimator converges in probability to β; instead they approach

\beta_{OLS} = \frac{\sigma_{xy}}{\sigma_x^2} = \beta \, \frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_w^2} + \frac{\sigma_{x^* u}}{\sigma_x^2}    (7)

and

\beta_{IV} = \frac{\sigma_{zy}}{\sigma_{xz}} = \beta + \frac{\sigma_{uz}}{\sigma_{xz}}    (8)

where σ²x* denotes the variance of the unobserved true regressor x*, which equals σx² − σw².

Some quantities in the system, however, are identified. Since we observe (x, y, z), we can learn the entries of the covariance matrix Σ of the observables, defined as

\Sigma = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} & \sigma_{xz} \\ \sigma_{xy} & \sigma_y^2 & \sigma_{yz} \\ \sigma_{xz} & \sigma_{yz} & \sigma_z^2 \end{pmatrix}    (9)

and, as a consequence, the value of the first-stage coefficient π, since

\pi = \frac{\sigma_{x^* z}}{\sigma_z^2} = \frac{\sigma_{xz}}{\sigma_z^2}    (10)

where the fact that σx*z = σxz follows from Equations 4 and 6. Although β is unidentified, the observable covariance matrix Σ, along with constraints on the unobserved covariance matrix Ω of the primitives, does impose restrictions on the unobservables. Combined with even relatively weak subject-specific prior knowledge, these restrictions can sometimes prove surprisingly informative, as we show below. Before we can do this, however, we need to derive the identified set. To aid in this derivation, we first provide a re-parameterization of the problem that not only simplifies the expressions for the identified set, but expresses it in terms of quantities that are empirically meaningful and thus practical for eliciting beliefs.

3 A Convenient Parameterization

The model introduced in the preceding section contains five non-identified parameters: β, σuv, σuz, σv², and σw². In spite of this, as we show below, there are only two degrees of freedom: knowledge of any two of the five is sufficient to identify the remaining three. As such, we have a choice of how to represent the identified set. Because our ultimate goal is to elicit and incorporate researchers' beliefs, we adopt three criteria for choosing a parameterization:

1. The parameters should be scale-free.
2. The parameter space should be compact.
3. The parameters should be meaningful in real applications.

Based on these considerations, we define the identified set in terms of the following quantities:

\rho_{zu} = \operatorname{Cor}(z, u)    (11)

\rho_{x^* u} = \operatorname{Cor}(x^*, u)    (12)

\kappa = \frac{\sigma_{x^*}^2}{\sigma_x^2} = \frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_w^2}    (13)

Note that these parameters are not independent of one another. For example, ρx*u depends on both κ and ρzu. This is precisely the point of our analysis: these three quantities are bound together by the assumptions of the model, which allows us to derive the identified set.

The first quantity, ρzu, is the correlation between the instrument and the main equation error term u. This measures the endogeneity of the instrument: the exclusion restriction in IV estimation, for example, corresponds to the belief that ρzu = 0. When critiquing an instrument, researchers often state a belief about the likely sign of this quantity.
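As a quick check on these definitions, the short calculation below (ours and purely illustrative, using the same assumed parameter values as the simulation sketch above) computes κ, ρx*u and ρzu directly from a chosen (β, π, Ω) and evaluates the probability limits in Equations 7 and 8.

```python
import numpy as np

# Illustrative structural values (assumptions, not estimates from any data set).
beta, pi = 1.0, 0.5
s_uv, s_uz = 0.3, 0.1
s2_u, s2_v, s2_w, s2_z = 1.0, 1.0, 0.5, 1.0

s2_xstar = pi**2 * s2_z + s2_v            # Var(x*), since Cov(z, v) = 0
s2_x = s2_xstar + s2_w                    # Var(x) adds the measurement error
s_xstar_u = s_uv + pi * s_uz              # Equation 5

kappa = s2_xstar / s2_x                                 # Equation 13
rho_xstar_u = s_xstar_u / np.sqrt(s2_xstar * s2_u)      # Equation 12
rho_zu = s_uz / np.sqrt(s2_z * s2_u)                    # Equation 11

# Probability limits of the naive estimators (Equations 7 and 8):
plim_ols = beta * kappa + s_xstar_u / s2_x
plim_iv = beta + s_uz / (pi * s2_z)
print(kappa, rho_xstar_u, rho_zu, plim_ols, plim_iv)
```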
The second quantity, ρx*u, is the correlation between the unobserved regressor x* and the main equation error term. This measures the overall endogeneity of x*, taking into account the effects of both σuv and σuz. As pointed out by Moon and Schorfheide (2009), researchers almost invariably state their belief about the sign of this quantity before undertaking an IV estimation exercise.

The third quantity, κ, is somewhat less familiar. In the simple setting we consider here, with no covariates, κ measures the degree of attenuation bias present in the OLS estimator: if ρx*u = 0, the OLS estimator converges in probability to κβ rather than β. Equivalently, since σx*y = σxy,

\kappa = \frac{\sigma_{x^*}^2}{\sigma_x^2} = \frac{\sigma_{yx}^2 / (\sigma_x^2 \sigma_y^2)}{\sigma_{yx^*}^2 / (\sigma_{x^*}^2 \sigma_y^2)} = \frac{\rho_{yx}^2}{\rho_{yx^*}^2}    (14)

so another way to interpret κ is as the ratio of the observed R² of the main equation to the unobserved R² that we would obtain if our regressor had not been polluted by measurement error. A third and more general way to think about κ is in terms of signal and noise. If κ = 1/2, for example, half of the variation in the observed regressor x is "signal," x*, and the remainder is noise, w. While the other two interpretations we have provided are specific to the case of no covariates, this third interpretation is not.

There are several advantages to parameterizing measurement error in terms of κ rather than the measurement error variance σw². First, κ has compact support: it takes a value in (0, 1]. When κ = 1, σw² = 0, so there is no measurement error; the limit as κ approaches zero corresponds to taking σw² to infinity. Second, writing expressions in terms of κ greatly simplifies our calculations. Indeed, as we will see in the next section, the sample data provide simple and informative bounds for κ. Third, and most importantly, we consider it much easier to elicit beliefs about κ than about σw². We consider this point in some detail in the empirical examples presented below.

In the section that follows we solve for ρzu in terms of ρx*u, κ and the observable covariance matrix Σ. First, however, we derive bounds on these three quantities.

4 The Identified Set

4.1 Bounds on the Non-Identified Parameters

Our compact parameterization from the preceding section gives us several obvious bounds: ρx*u, ρzu ∈ [−1, 1] and κ ∈ (0, 1]. Yet there are other, less obvious bounds that come from the two covariance matrices Σ and Ω. To state these additional bounds, we need an expression for σv², the variation in x* not attributable to the instrument z, in terms of κ and observables only. To this end, note that the R² of the IV first stage, ρ²xz, can be expressed as

\rho_{zx}^2 = \frac{(\pi \sigma_z)^2}{\sigma_x^2 \sigma_z^2} = \frac{\pi^2 \sigma_z^2}{\sigma_x^2}

Combining this with the fact that σx² = σv² + σw² + π²σz², we have

1 = \frac{\sigma_v^2 + \sigma_w^2}{\sigma_x^2} + \rho_{xz}^2

Rearranging and simplifying, we find that ρ²xz = κ − σv²/σx² and hence

\sigma_v^2 = \sigma_x^2 \left( \kappa - \rho_{xz}^2 \right)    (15)

We now construct an additional bound for κ in terms of the elements of Σ. To begin, since we can express κ as ρ²xy/ρ²x*y and squared correlations are necessarily less than or equal to one, it follows that κ > ρ²xy. Although typically stated somewhat differently, this bound is well known: in fact it corresponds to the familiar "reverse regression" bound for β, which goes back at least to Frisch (1934).

[Footnote 1] To see this, suppose that ρx*u = 0 and, without loss of generality, that β is positive. Then Equation 7 gives βOLS = κβ < β. Multiplying both sides of κ > ρ²xy by β and rearranging gives β < βOLS/ρ²xy, and hence βOLS < β < βOLS/ρ²xy.
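The sketch below (hypothetical numbers only, not estimates from any data set) shows how a belief about κ translates into an implied causal effect under pure attenuation, and how the reverse-regression bound in the footnote follows from κ > ρ²xy.

```python
# Hypothetical values, for illustration only (not taken from any data set).
beta_ols = 0.50      # estimated OLS slope
rho2_xy = 0.36       # observed R^2 of the simple regression of y on x
kappa = 0.60         # belief: 40% of the variation in x is noise

# Under pure classical measurement error (rho_x*u = 0), plim(beta_ols) = kappa * beta,
# so a belief about kappa translates directly into an implied causal effect ...
beta_implied = beta_ols / kappa

# ... and kappa > rho_xy^2 yields the reverse-regression bound from the footnote:
beta_lower, beta_upper = beta_ols, beta_ols / rho2_xy
print(beta_implied, beta_lower, beta_upper)
```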
As it happens, however, Σ provides an additional bound that may be tighter than κ > ρ²xy. Since σv² and σx² are both strictly positive, Equation 15 immediately implies that κ ≥ ρ²xz. In other words, the R² of the IV first stage provides an upper bound on the maximum possible amount of measurement error. Given its simplicity, we doubt that we are the first to notice this additional bound; nevertheless, to the best of our knowledge, it has not appeared in the literature. Taking the better of the two bounds, we have

\max\{ \rho_{zx}^2, \rho_{yx}^2 \} \leq \kappa \leq 1    (16)

Recall that κ is inversely related to the measurement error variance σw²: larger values of κ correspond to less measurement error. We see from the bound in Equation 16 that larger values of either the first-stage or the OLS R-squared leave less room for measurement error. This is important because applied econometricians often argue that their data are subject to large measurement error in order to explain a large discrepancy between OLS and IV estimates, but we are unaware of any cases in which this belief is confronted with these restrictions.

Before proceeding to solve for the identified set, we derive one further bound from the requirement that Ω, the covariance matrix of the model primitives (u, v, w, z), be positive definite. At first glance it might appear that this restriction merely ensures that variances are positive and correlations are bounded above by one in absolute value. Recall, however, that Equation 4 imposes a considerable degree of structure on Ω: in particular, many of its elements are assumed to equal zero. Consider the restriction |Ω| > 0. This implies

\sigma_w^2 \left[ \sigma_z^2 \left( \sigma_u^2 \sigma_v^2 - \sigma_{uv}^2 \right) - \sigma_v^2 \sigma_{uz}^2 \right] > 0

but since σw² > 0, this is equivalent to

\sigma_z^2 \left( \sigma_u^2 \sigma_v^2 - \sigma_{uv}^2 \right) > \sigma_v^2 \sigma_{uz}^2

Dividing both sides through by σu²σz²σv² and rearranging, we find that

\rho_{uv}^2 + \rho_{uz}^2 < 1    (17)

In other words, (ρuz, ρuv) must lie within the unit circle: if one of the correlations is very large in absolute value, the other cannot be. To understand the intuition behind this constraint, recall that v is the residual from the projection of x* onto z, so it is uncorrelated with z by construction. Now suppose that ρuz were equal to one. If ρuv were also equal to one, we would have a contradiction: z and v would be perfectly correlated. The constraint in Inequality 17 rules this out.

As explained above, we will characterize the identified set in terms of ρx*u, ρzu and κ, eliminating ρuv from the system. Thus, we need to restate Inequality 17 so that it no longer involves ρuv. To accomplish this, first write

\rho_{x^* u} = \frac{\sigma_v}{\sigma_{x^*}} \rho_{uv} + \frac{\pi \sigma_z}{\sigma_{x^*}} \rho_{uz}

and then note that σv/σx* = sqrt(1 − ρ²xz/κ) and πσz/σx* = sqrt(ρ²xz/κ), using Equation 15 and the definition of κ. Combining,

\rho_{x^* u} = \sqrt{1 - \rho_{xz}^2/\kappa}\; \rho_{uv} + \sqrt{\rho_{xz}^2/\kappa}\; \rho_{uz}    (18)

and solving for ρuv,

\rho_{uv} = \frac{\rho_{x^* u} \sqrt{\kappa} - \rho_{uz} \rho_{xz}}{\sqrt{\kappa - \rho_{xz}^2}}    (19)

so we can re-express the constraint from Inequality 17 as

\left( \frac{\rho_{x^* u} \sqrt{\kappa} - \rho_{uz} \rho_{xz}}{\sqrt{\kappa - \rho_{xz}^2}} \right)^2 + \rho_{uz}^2 < 1    (20)
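A small helper along the following lines (our own sketch; the correlations are placeholders) makes these bounds operational: it computes the lower bound for κ from Equation 16 and checks whether a candidate triple (ρx*u, ρzu, κ) satisfies Inequality 20, recovering ρuv via Equation 19.

```python
import numpy as np

# Hypothetical reduced-form correlations (placeholders, not real estimates).
rho_xy, rho_xz = 0.45, 0.55

# Bound (16): the OLS and first-stage R-squareds jointly bound kappa from below.
kappa_lower = max(rho_xz**2, rho_xy**2)
kappa_upper = 1.0

def feasible(rho_xstar_u, rho_zu, kappa, rho_xz=rho_xz):
    """Check Inequality 20 at a candidate point, using Equation 19 to recover
    rho_uv from (rho_x*u, rho_zu, kappa). The strict lower bound on kappa also
    avoids dividing by zero when kappa equals rho_xz squared."""
    if not (kappa_lower < kappa <= kappa_upper):
        return False
    rho_uv = (rho_xstar_u * np.sqrt(kappa) - rho_zu * rho_xz) / np.sqrt(kappa - rho_xz**2)
    return rho_uv**2 + rho_zu**2 < 1.0        # Inequality 17

print(kappa_lower, feasible(0.3, -0.2, 0.8))
```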
4.2 Solving for the Identified Set

We now provide a characterization of the identified set by solving for ρuz in terms of ρx*u, κ and the observables contained in Σ. Rewriting Equation 8, we have

\beta = \frac{\sigma_{zy} - \sigma_{uz}}{\sigma_{zx}}    (21)

and proceeding similarly for Equation 7,

\beta = \frac{\sigma_{xy} - \sigma_{x^* u}}{\kappa \sigma_x^2}    (22)

Combining Equations 21 and 22, we have

\frac{\sigma_{zy} - \sigma_{uz}}{\sigma_{zx}} = \frac{\sigma_{xy} - \sigma_{x^* u}}{\kappa \sigma_x^2}    (23)

Now, using Equations 4 and 6, the variance of y can be expressed as

\sigma_y^2 = \sigma_u^2 + 2 \beta \sigma_{x^* u} + \beta \left( \beta \kappa \sigma_x^2 \right)

Substituting Equation 21 for β and Equation 22 for βκσx², and rearranging,

\sigma_u^2 - \sigma_y^2 + \frac{\sigma_{zy} - \sigma_{uz}}{\sigma_{zx}} \left( \sigma_{x^* u} + \sigma_{xy} \right) = 0    (24)

The next step is to eliminate σu from our system of equations. First we substitute

\sigma_{x^* u} = \sqrt{\kappa}\, \sigma_x \sigma_u \rho_{x^* u}, \qquad \sigma_{uz} = \sigma_u \sigma_z \rho_{uz}

into Equations 24 and 23, yielding

\sigma_u^2 - \sigma_y^2 + \frac{\sigma_{zy} - \sigma_u \sigma_z \rho_{uz}}{\sigma_{zx}} \left( \sqrt{\kappa}\, \sigma_u \sigma_x \rho_{x^* u} + \sigma_{xy} \right) = 0    (25)

and

\frac{\sigma_{zy} - \sigma_u \sigma_z \rho_{uz}}{\sigma_{zx}} = \frac{\sigma_{xy} - \sqrt{\kappa}\, \sigma_x \sigma_u \rho_{x^* u}}{\kappa \sigma_x^2}    (26)

Rearranging Equation 26 and solving for σu, we find that

\sigma_u = \frac{\sigma_{zx} \sigma_{xy} - \kappa \sigma_x^2 \sigma_{zy}}{\sqrt{\kappa}\, \sigma_x \sigma_{xz} \rho_{x^* u} - \kappa \sigma_x^2 \sigma_z \rho_{uz}}    (27)

Since we have stated the problem in terms of scale-free structural parameters, namely (ρzu, ρx*u, κ), we may assume without loss of generality that σx = σy = σz. Even if the raw data do not satisfy this assumption, the identified set for the structural parameters is unchanged. Imposing this normalization, the equation for the identified set becomes

\tilde{\sigma}_u^2 - 1 + \frac{\rho_{zy} - \tilde{\sigma}_u \rho_{uz}}{\rho_{zx}} \left( \sqrt{\kappa}\, \tilde{\sigma}_u \rho_{x^* u} + \rho_{xy} \right) = 0    (28)

where

\tilde{\sigma}_u = \frac{\rho_{xz} \rho_{xy} - \kappa \rho_{zy}}{\sqrt{\kappa}\, \rho_{xz} \rho_{x^* u} - \kappa \rho_{uz}}    (29)

We use the notation σ̃u to indicate that the normalization changes the scale of σu: specifically, σ̃u = σu/σy. This does not introduce any complications because we eliminate σ̃u from the system by substituting Equation 29 into Equation 28. Note, however, that Equation 27 has a singularity when √κ ρuz = ρx*u ρxz.

After eliminating σ̃u, Equation 28 becomes a quadratic in ρzu whose coefficients depend on the structural parameters (ρx*u, κ) and the reduced-form correlations (ρxy, ρxz, ρzy). Solving, we find that

\left( \rho_{uz}^{+}, \rho_{uz}^{-} \right) = \frac{\rho_{x^* u} \rho_{xz}}{\sqrt{\kappa}} \pm \left( \rho_{xy} \rho_{xz} - \kappa \rho_{zy} \right) \sqrt{\frac{1 - \rho_{x^* u}^2}{\kappa \left( \kappa - \rho_{xy}^2 \right)}}    (30)

Notice that the fraction under the square root is always positive, so both solutions are always real. This follows because ρ²x*u must lie between zero and one and, as we showed above, κ > ρ²xy. Although the preceding expression always yields two real solutions, one of them is extraneous, as it implies a negative value for σ̃u. To see why this is the case, substitute each solution into the reciprocal of Equation 29:

\tilde{\sigma}_u^{-1} = \frac{\sqrt{\kappa}\, \rho_{xz} \rho_{x^* u} - \kappa \rho_{uz}^{\pm}}{\rho_{xz} \rho_{xy} - \kappa \rho_{zy}} = \mp \sqrt{\frac{\kappa \left( 1 - \rho_{x^* u}^2 \right)}{\kappa - \rho_{xy}^2}}

Since the quantity inside the square root is necessarily positive given the constraints on the correlations and κ, we see that ρ⁺uz is always extraneous. Thus, the only admissible solution is

\rho_{uz} = \frac{\rho_{x^* u} \rho_{xz}}{\sqrt{\kappa}} - \left( \rho_{xy} \rho_{xz} - \kappa \rho_{zy} \right) \sqrt{\frac{1 - \rho_{x^* u}^2}{\kappa \left( \kappa - \rho_{xy}^2 \right)}}    (31)

Along with Inequalities 16 and 20, and the requirement that correlations be less than one in absolute value, Equation 31 gives a complete characterization of the identified set. Given a triple (ρzu, ρx*u, κ) and values for the elements (σx, σy, σz, ρxy, ρxz, ρyz) of the observable covariance matrix Σ, we can solve for the implied value of β using Equation 21. Specifically,

\beta = \frac{\sigma_y}{\sigma_x} \cdot \frac{\rho_{yz} - \rho_{zu} \tilde{\sigma}_u}{\rho_{xz}}    (32)

using the fact that σ̃u = σu/σy, where σ̃u is the standard deviation of the main equation error term in the normalized system, as given in Equation 29, and σu is the standard deviation of the main equation error term in the original system, as given in Equation 27.
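The identified set is now straightforward to evaluate numerically. The sketch below is our own illustration (the correlations and the candidate point are placeholders): it implements Equation 31 for ρzu and Equations 29 and 32 for the implied β.

```python
import numpy as np

def rho_uz(rho_xstar_u, kappa, rho_xy, rho_xz, rho_zy):
    """Admissible solution for rho_zu, Equation 31."""
    root = np.sqrt((1.0 - rho_xstar_u**2) / (kappa * (kappa - rho_xy**2)))
    return rho_xstar_u * rho_xz / np.sqrt(kappa) - (rho_xy * rho_xz - kappa * rho_zy) * root

def beta_implied(rho_xstar_u, kappa, rho_xy, rho_xz, rho_zy, s_x=1.0, s_y=1.0):
    """Causal effect implied by a point on the identified set (Equations 29, 31, 32)."""
    r_uz = rho_uz(rho_xstar_u, kappa, rho_xy, rho_xz, rho_zy)
    sigma_u_tilde = (rho_xz * rho_xy - kappa * rho_zy) / (
        np.sqrt(kappa) * rho_xz * rho_xstar_u - kappa * r_uz)       # Equation 29
    return (s_y / s_x) * (rho_zy - r_uz * sigma_u_tilde) / rho_xz   # Equation 32

# Hypothetical reduced-form correlations and a candidate (rho_x*u, kappa):
print(beta_implied(rho_xstar_u=0.2, kappa=0.8, rho_xy=0.45, rho_xz=0.55, rho_zy=0.30))
```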
Notice that ρx*u and κ enter Equation 32 through σ̃u. This fact highlights the central point of our analysis: even though exact knowledge of σuz alone would be sufficient to correct the IV estimator, yielding a consistent estimator of β, stating beliefs about this quantity alone does not provide a satisfactory solution to the identification problem. For one thing, because σuz depends on the scaling of both z and u, it may be difficult to elicit beliefs about it: although we can learn σz from the data, σu can only be estimated once the identification problem has been resolved. In contrast, ρzu, our preferred parameterization, is scale-free. More importantly, the form of the identified set makes it clear that our beliefs about ρuz are constrained by any beliefs we may have about ρx*u and κ. This observation has two important consequences. First, it gives us the opportunity to incorporate our beliefs about measurement error and the endogeneity of the regressor to improve our estimates; failing to use this information is like leaving money on the table. Second, it disciplines our beliefs, preventing us from reasoning our way to a contradiction. Without knowledge of the form of the identified set, applied researchers could easily state beliefs that are mutually incompatible without realizing it. Our analysis provides a tool that allows them to recognize this and adjust their beliefs accordingly. While we have thus far discussed only beliefs about ρzu, ρx*u and κ, one could also work backwards from beliefs about β to see how they constrain the identified set. We explore this possibility in one of our examples below.

5 Bayesian Inference for the Identified Set

Having characterized the identified set, the usual frequentist approach would be to use it to derive bounds for β, possibly after imposing sign or interval restrictions on ρzu, ρx*u and κ. In its broad strokes, we agree with this approach: it makes sense to report the full range of possible values for β, and the prior beliefs that researchers commonly state often take the form of sign restrictions. But bounds on β tell only part of the story. The identified set is a two-dimensional surface, of which the usual partial identification bounds consider only the two worst-case points. While it may well be difficult to specify an informative prior over (ρzu, ρx*u, κ) on the identified set, it is surely relevant to consider what fraction of the points in this set lead to a particular value of β. Given that the partial identification bounds for β could easily map back to extremely atypical values of (ρzu, ρx*u, κ), it would seem odd not to find some way of averaging over the information contained in the entire identified set.

Accordingly, we adopt a suggestion from Moon and Schorfheide (2012) and place a uniform prior on the identified set conditional on the observable covariance matrix Σ. Choosing a prior to represent "ignorance" is always somewhat contentious, as a prior that is flat in one parameterization can be highly informative in another. As explained above, we believe that there are compelling reasons to parameterize the problem in terms of ρzu, ρx*u and κ: they are scale-free, empirically meaningful quantities about which researchers are naturally inclined to state beliefs. In most situations, however, these beliefs will be fairly vague, and specifying an informative prior on the identified set may be challenging.
An advantage of our proposed conditionally uniform prior is that it remains uniform after imposing interval or sign restrictions that "cut off" sections of the identified set. In this way, we allow researchers to impose beliefs on the problem without the need to specify a density supported on a complicated two-dimensional region embedded in three-dimensional space. Moreover, there is no need to take the uniform prior literally in this context. Instead, one can view it as a starting point: for example, one can ask what kind of deviation from uniformity would be necessary to encode particular beliefs about β. We consider this possibility in one of our empirical examples below.

The analysis of the preceding section took Σ as known, but in practice it must be estimated from sample data. As such, there is not a single identified set but an identified set for each possible Σ. Thus, having stated a conditional prior for (ρzu, ρx*u, κ), it remains to decide how to incorporate sampling uncertainty in the observable covariance matrix Σ into the problem. As our aim is to appeal to applied researchers who may not typically rely on Bayesian methods, the ideal would be a minimally informative, default prior that closely approximates the usual frequentist inference for the identified parameters. We are currently exploring various possibilities to achieve this goal. In the interim, and for the purposes of this draft, we specify a multivariate normal likelihood for (x, y, z) and a Jeffreys prior for Σ. Specifically, for i = 1, ..., n we suppose

\begin{pmatrix} x_i \\ y_i \\ z_i \end{pmatrix} \overset{iid}{\sim} N_3(\mu, \Sigma)    (33)

\pi(\mu, \Sigma) \propto |\Sigma|^{-2}    (34)

leading to the marginal posterior

\Sigma \mid x, y, z \sim \text{Inverse-Wishart}(n - 1, S)    (35)

where

S = \sum_{i=1}^{n} \begin{pmatrix} x_i - \bar{x} \\ y_i - \bar{y} \\ z_i - \bar{z} \end{pmatrix} \begin{pmatrix} x_i - \bar{x} & y_i - \bar{y} & z_i - \bar{z} \end{pmatrix}    (36)

To generate uniform draws on the identified set conditional on a given posterior draw Σ(ℓ), we employ a two-stage accept-reject algorithm. In the first stage we draw κ(j) ~ Uniform(κ_L, κ_U) independently of ρ(j)x*u ~ Uniform(ρ_L x*u, ρ_U x*u). Absent any prior restrictions that further restrict the support of κ or ρx*u, we take κ_L = max{(ρ²zx)(j), (ρ²xy)(j)}, κ_U = 1, ρ_L x*u = −1 and ρ_U x*u = 1. We then solve for ρ(j)zu via Equation 31 and check whether it lies in the interval [ρ_L zu, ρ_U zu]; absent any prior restrictions on ρzu, we take this interval to be [−1, 1]. If ρ(j)zu lies in this region and the triple (ρzu, ρx*u, κ) satisfies Inequality 20, we accept draw j; otherwise we reject it. We repeat this process until we have J draws on the identified set. While these draws are uniform when projected onto the (κ, ρx*u) plane, they are not uniform on the identified set itself. To make them uniform, we re-weight each draw according to the local surface area of the identified set at that point. By "local surface area" we mean the quantity

M(\rho_{x^* u}, \kappa) = \sqrt{1 + \left( \frac{\partial \rho_{uz}}{\partial \rho_{x^* u}} \right)^2 + \left( \frac{\partial \rho_{uz}}{\partial \kappa} \right)^2}    (37)

which Apostol (1969) calls the "local magnification factor" of a parametric surface. The derivatives required to evaluate the function M are

\frac{\partial \rho_{uz}}{\partial \rho_{x^* u}} = \frac{\rho_{xz}}{\sqrt{\kappa}} + \frac{\rho_{x^* u} \left( \rho_{xy} \rho_{xz} - \kappa \rho_{zy} \right)}{\sqrt{\kappa \left( \kappa - \rho_{xy}^2 \right) \left( 1 - \rho_{x^* u}^2 \right)}}    (38)

and

\frac{\partial \rho_{uz}}{\partial \kappa} = - \frac{\rho_{x^* u} \rho_{xz}}{2 \kappa^{3/2}} + \sqrt{\frac{1 - \rho_{x^* u}^2}{\kappa \left( \kappa - \rho_{xy}^2 \right)}} \left[ \rho_{zy} + \frac{1}{2} \left( \rho_{xy} \rho_{xz} - \kappa \rho_{zy} \right) \left( \frac{1}{\kappa} + \frac{1}{\kappa - \rho_{xy}^2} \right) \right]    (39)

To carry out the re-weighting, we first evaluate M(j) = M(ρ(j)x*u, κ(j)) at each draw j accepted in the first stage. We then calculate M_max = max_{j=1,...,J} M(j) and resample the draws (ρ(j)zu, ρ(j)x*u, κ(j)) with probability p_j = M(j)/M_max.
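The following sketch is our own illustration of the algorithm just described, not the authors' code; the function names, defaults and the use of scipy's inverse-Wishart sampler are our choices. It draws one Σ from the posterior in Equation 35 and then runs the two-stage accept-reject scheme, including the magnification-factor re-weighting of Equations 37-39. Sign or interval restrictions enter simply as tighter bounds on (κ, ρx*u, ρzu).

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(1)

def rho_uz(r_xu, kappa, r_xy, r_xz, r_zy):
    """Equation 31."""
    root = np.sqrt((1 - r_xu**2) / (kappa * (kappa - r_xy**2)))
    return r_xu * r_xz / np.sqrt(kappa) - (r_xy * r_xz - kappa * r_zy) * root

def magnification(r_xu, kappa, r_xy, r_xz, r_zy):
    """Local magnification factor M, Equations 37-39."""
    d_rxu = r_xz / np.sqrt(kappa) + r_xu * (r_xy * r_xz - kappa * r_zy) / np.sqrt(
        kappa * (kappa - r_xy**2) * (1 - r_xu**2))
    d_kappa = (-r_xu * r_xz / (2 * kappa**1.5)
               + np.sqrt((1 - r_xu**2) / (kappa * (kappa - r_xy**2)))
               * (r_zy + 0.5 * (r_xy * r_xz - kappa * r_zy)
                  * (1 / kappa + 1 / (kappa - r_xy**2))))
    return np.sqrt(1 + d_rxu**2 + d_kappa**2)

def sample_identified_set(S, n, J=5000, kappa_bounds=(None, 1.0),
                          r_xu_bounds=(-1.0, 1.0), r_zu_bounds=(-1.0, 1.0)):
    """Approximately uniform draws from the identified set for one posterior draw of Sigma.
    S is the 3x3 matrix of centered cross products of (x, y, z); n is the sample size.
    The bound arguments encode any sign or interval restrictions."""
    Sigma = invwishart.rvs(df=n - 1, scale=S, random_state=rng)     # Equation 35
    sd = np.sqrt(np.diag(Sigma))
    r_xy = Sigma[0, 1] / (sd[0] * sd[1])
    r_xz = Sigma[0, 2] / (sd[0] * sd[2])
    r_zy = Sigma[1, 2] / (sd[1] * sd[2])
    kL = max(r_xz**2, r_xy**2, kappa_bounds[0] or 0.0)              # bound (16) plus any restriction
    draws = []
    while len(draws) < J:                                           # first stage: accept-reject
        kappa = rng.uniform(kL, kappa_bounds[1])
        r_xu = rng.uniform(*r_xu_bounds)
        r_zu = rho_uz(r_xu, kappa, r_xy, r_xz, r_zy)
        r_uv = (r_xu * np.sqrt(kappa) - r_zu * r_xz) / np.sqrt(kappa - r_xz**2)   # Eq. 19
        if r_zu_bounds[0] <= r_zu <= r_zu_bounds[1] and r_uv**2 + r_zu**2 < 1:    # Ineq. 20
            draws.append((r_zu, r_xu, kappa))
    draws = np.array(draws)
    # Second stage: re-weight so the draws are uniform on the surface itself,
    # not merely on its projection onto the (kappa, rho_x*u) plane.
    M = magnification(draws[:, 1], draws[:, 2], r_xy, r_xz, r_zy)
    keep = rng.uniform(size=len(M)) < M / M.max()
    return draws[keep], (r_xy, r_xz, r_zy)
```

Mapping each retained triple (ρzu, ρx*u, κ) to β via Equations 29 and 32, and repeating over many posterior draws of Σ, then yields draws from the posterior for β of the kind reported in the examples below.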
6 Empirical Examples

We now consider two simple empirical examples illustrating the methods proposed above: the first considers the effect of institutions on income per capita, and the second considers the returns to schooling.

6.1 The Colonial Origins of Comparative Development

We begin by considering the main specification of Acemoglu et al. (2001), who use early settler mortality as an instrument to study the effect of institutions on GDP per capita based on cross-country data for a sample of 64 countries. The main equation is

log GDP/capita = constant + β (Institutions) + u

and the first stage is

Institutions = constant + π (log Settler Mortality) + v

This specification yields an OLS estimate of β̂_OLS = 0.52 and an IV estimate that is nearly twice as large (β̂_IV = 0.94), a difference which the authors attribute to measurement error:

    This estimate is highly significant . . . and in fact larger than the OLS estimates reported in Table 2. This suggests that measurement error in the institutions variables that creates attenuation bias is likely to be more important than reverse causality and omitted variables biases.

But can measurement error really explain this disparity, or is something else to blame? Figure 1 presents two views of the identified set evaluated at the maximum likelihood estimate of Σ, imposing no prior information on the problem. Figure 2 depicts each two-dimensional projection of the same set; the points in red correspond to values of κ greater than 0.6. Without prior restrictions the identified set is not particularly informative, although it does rule out especially large amounts of measurement error: the minimum value of κ consistent with the data (at the MLE) is around 0.5. Figure 3 maps the points on the identified set to the corresponding values of β. The panel at left is based on the identified set at the MLE, while the panel at right averages over 1000 identified sets corresponding to the Inverse-Wishart draws depicted in Figure 4. The posterior mean of β in this case is quite close to the IV estimate and far above the OLS estimate. Moreover, the posterior is heavily concentrated on positive values of β. Even a reader who does not believe our uniform conditional prior would find it difficult to construct a posterior that assigns substantial probability to negative values of β in this case.

In their paper, however, Acemoglu et al. (2001) state a number of beliefs that are relevant for this exercise. First, they claim that there is likely a positive correlation between "true" institutions and the main equation error term u. Second, by way of a footnote that uses a second measure of institutions as an instrument for the first, they argue that measurement error could be substantial enough to yield a value of κ as small as 0.6, which would correspond to 40% of the variation in the observed measure of institutions being noise. Accordingly, Figures 5 and 6 restrict the identified set to impose these constraints. Even after imposing these relatively weak beliefs, the picture changes dramatically. From the rightmost panel of Figure 5, we see that Settler Mortality cannot be a valid instrument: if we believe that ρx*u is positive and that κ is at least 0.6, then ρzu must be negative. Turning to Figure 6, the posterior for β is now concentrated around the OLS estimate. Indeed, the IV estimate is at the edge of being infeasible given the data; at the very least it is likely to be a substantial overestimate.
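The way such beliefs discipline ρzu can be traced directly from Equation 31 by scanning the restricted region, as in the snippet below. It is purely illustrative: the correlations are placeholders, not the estimates from the Acemoglu et al. (2001) data.

```python
import numpy as np

def rho_uz(r_xu, kappa, r_xy, r_xz, r_zy):
    """Equation 31."""
    root = np.sqrt((1 - r_xu**2) / (kappa * (kappa - r_xy**2)))
    return r_xu * r_xz / np.sqrt(kappa) - (r_xy * r_xz - kappa * r_zy) * root

# Placeholder correlations, NOT the estimates from the Acemoglu et al. data.
r_xy, r_xz, r_zy = 0.7, -0.6, -0.55

# Scan the restricted region (kappa >= 0.6, rho_x*u >= 0) and record the sign
# of the implied instrument endogeneity rho_zu at each point.
kappas = np.linspace(max(0.6, r_xy**2, r_xz**2), 1.0, 50)
r_xus = np.linspace(0.0, 0.99, 50)
signs = np.sign([rho_uz(r, k, r_xy, r_xz, r_zy) for k in kappas for r in r_xus])
print(np.unique(signs))   # a single sign means the data plus beliefs pin down the sign of rho_zu
```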
Nevertheless, the main result of Acemoglu et al. (2001) continues to hold: in spite of the fact that Settler Mortality is negatively correlated with u, it appears from this exercise that the effect of institutions on income per capita is almost certainly positive.

6.2 The Returns to Schooling

Our second example uses a subset of the data from Blackburn and Neumark (1992) to study the returns to schooling based on a sample of 935 US males. The main equation is

log Wage = constant + β (Education) + u

and the first stage is

Education = constant + π (Siblings) + v

The variable Education measures an individual's years of schooling, and Siblings measures the number of brothers and sisters that he has. The estimated first-stage coefficient in this example is π̂ = −0.23, while the OLS and IV estimates are β̂_OLS = 0.06 and β̂_IV = 0.12. As in the Colonial Origins example, the IV estimate is much larger than the OLS estimate: a 12% increase in wages per additional year of schooling compared to a 6% increase. Could measurement error be to blame?

Figure 7 presents two views of the identified set, evaluated at the MLE for Σ, imposing only the requirement that κ > 0.1 to avoid numerical problems. (Since this lower bound corresponds to 90% of the observed variation in years of schooling being noise, it may be considered a fairly innocuous restriction.) Note how different the identified set appears in this example compared to the Colonial Origins example. Here the data do not rule out any values of κ, and the restriction κ > 0.1 binds. Figure 8 gives the corresponding posterior for β: the panel at left ignores sampling variability, considering the identified set at the MLE for Σ, while the panel at right averages over the 1000 Inverse-Wishart draws depicted in Figure 9. With nearly 1000 observations and only six quantities to estimate, sampling variability has no appreciable impact in this example, unlike in the Colonial Origins example above. More importantly, the identified set in this example is almost completely uninformative: wage changes of anywhere from −300% to +300% per additional year of schooling appear to be consistent with the data. Indeed, on this scale the difference between the OLS and IV estimates is trivial.

Perhaps imposing prior beliefs can help. The key unobservable that makes up u is almost certainly ability, which we would expect to be positively correlated with years of schooling. Because of mis-reporting, it is likely that years of schooling is measured with error, but it seems extreme to entertain a value of κ below 0.5, as this would correspond to more than half of the observed variation in years of schooling being noise. But what about the instrument, Siblings? There is certainly reason to suspect that it could be correlated with ability, u. For example, parents with more children likely have less time to spend with each of them, which could induce a negative correlation between Siblings and u. Alternatively, one could imagine that older siblings supplement parental attention and thereby increase the ability of their younger siblings; this story would result in a positive correlation between Siblings and ability. Based on this reasoning, we now impose the restrictions κ > 0.5 and ρx*u > 0, leaving ρzu unconstrained because it is unclear what sign to expect for it. Figure 10 gives the posterior for β after restricting the identified set in this way.
Surprisingly, the effect of this restriction is not to rule out extremely large negative effects of schooling on wages, but to rule out nearly all positive effects: wage declines of 100 or even 200% still appear to be consistent with the data. Surely something must be amiss: we have a very strong prior belief that the returns to education should not be negative. To understand what is happening, we plot both the restricted and unrestricted identified sets, using red to denote points that map into positive values of β: Figure 11 presents the three-dimensional version, while Figure 12 presents the two-dimensional projections. From these figures we see that, while a majority of the unrestricted identified set maps into positive values of β, nearly all of these points correspond to extremely small values of κ and negative values of ρx*u. After imposing the restrictions κ > 0.5 and ρx*u > 0, hardly any of the red points remain.

In this example the results are essentially negative: we do not learn anything meaningful about the returns to education. Nevertheless, we still uncover something valuable: a contradiction in our beliefs. The belief that ρx*u is positive and that κ is not too small is effectively incompatible with the belief that the returns to education are positive in this example. Something is clearly wrong, either with our beliefs or with our maintained assumptions (for example, the model specification and the assumption that the measurement error is classical), but this was not obvious until we examined the identified set and the posterior for β.

References

Acemoglu, D., Johnson, S., Robinson, J. A., 2001. The colonial origins of comparative development: An empirical investigation. The American Economic Review 91 (5), 1369-1401.

Apostol, T. M., 1969. Calculus, 2nd Edition. Vol. II. John Wiley and Sons, New York.

Blackburn, M., Neumark, D., 1992. Unobserved ability, efficiency wages, and inter-industry wage differentials. The Quarterly Journal of Economics 107 (4), 1421-1436.

Conley, T. G., Hansen, C. B., Rossi, P. E., 2012. Plausibly exogenous. The Review of Economics and Statistics 94 (1), 260-272.

Moon, H. R., Schorfheide, F., 2009. Estimation with overidentifying inequality moment conditions. Journal of Econometrics 153, 136-154.

Moon, H. R., Schorfheide, F., 2012. Bayesian and frequentist inference in partially identified models. Econometrica 80 (2), 755-782.

Nevo, A., Rosen, A. M., 2012. Identification with imperfect instruments. The Review of Economics and Statistics 94 (3), 659-671.

Poirier, D. J., 1998. Revising beliefs in nonidentified models. Econometric Theory 14, 483-509.

Figure 1: Identified set for the Colonial Origins example: no prior constraints on ρzu, ρx*u, κ.

Figure 2: Identified set for the Colonial Origins example: no prior constraints on ρzu, ρx*u, κ; values of κ less than 0.6 in red.

Figure 3: Posterior draws for β in the Colonial Origins example: no prior constraints on ρzu, ρx*u, κ. The left panel ("Conditional Prior (at MLE)") ignores uncertainty in Σ and evaluates the identified set at the MLE Σ̂. The right panel ("Full Posterior") averages over 1000 identified sets corresponding to the posterior draws for Σ illustrated in Figure 4.
Figure 4: Posterior draws for Σ in the Colonial Origins example, based on 1000 posterior draws; panels show Var(x), Cov(y,x), Cov(z,x), Var(y), Cov(z,y) and Var(z), with a red vertical line marking the MLE in each.

Figure 5: Identified set for the Colonial Origins example: κ constrained to be greater than 0.6 and ρx*u constrained to be positive.

Figure 6: Posterior draws for β in the Colonial Origins example: κ constrained to be greater than 0.6 and ρx*u constrained to be positive. The left panel ignores uncertainty in Σ and evaluates the identified set at the MLE Σ̂. The right panel averages over 1000 identified sets corresponding to the posterior draws for Σ illustrated in Figure 4.

Figure 7: Identified set for the Returns to Schooling example: no prior constraints on ρzu, ρx*u, κ.

Figure 8: Posterior draws for β in the Returns to Schooling example: no prior constraints on ρzu, ρx*u, κ. The left panel ignores uncertainty in Σ and evaluates the identified set at the MLE Σ̂. The right panel averages over 1000 identified sets corresponding to the posterior draws for Σ illustrated in Figure 9.

Figure 9: Posterior draws for Σ in the Returns to Schooling example, based on 1000 posterior draws; panels show Var(x), Cov(y,x), Cov(z,x), Var(y), Cov(z,y) and Var(z), with a red vertical line marking the MLE in each.

Figure 10: Posterior draws for β in the Returns to Schooling example: κ constrained to be greater than 0.5 and ρx*u constrained to be positive. The left panel ignores uncertainty in Σ and evaluates the identified set at the MLE Σ̂. The right panel averages over 1000 identified sets corresponding to the posterior draws for Σ illustrated in Figure 9.

Figure 11: Identified set for the Returns to Schooling example: red points correspond to positive values of β. The top panel imposes no constraints on κ, ρzu, ρx*u, while the bottom panel constrains κ to be greater than 0.5 and ρx*u to be positive.

Figure 12: Identified set for the Returns to Schooling example (two-dimensional projections): red points correspond to positive values of β. The top panel imposes no constraints on κ, ρzu, ρx*u, while the bottom panel constrains κ to be greater than 0.5 and ρx*u to be positive.