Ignorability and stability assumptions in neighborhood effects research
Transcription
Ignorability and stability assumptions in neighborhood effects research
Ignorability and stability assumptions in neighborhood e¤ects research Tyler J. VanderWeele Department of Health Studies, University of Chicago 5841 South Maryland Avenue, MC 2007, Chicago, IL 60637, USA [email protected] Abstract Two central assumptions concerning causal inference in the potential outcomes framework for estimating neighborhood e¤ects are examined. The stable unit treatment value assumption in the context of neighborhood e¤ects requires that an individual’s outcome does not depend on the treatment assigned to neighborhoods other than the individual’s own neighborhood. The assumption is important in that it makes estimation feasible; although some progress can be made even when the assumption is relaxed. Some discussion is given concerning the contexts in which the neighborhood-level stable unit treatment value assumption is likely to hold. The ignorability assumption allows the researcher to move from conclusions about association to conclusions about causation. In the context of neighborhood-wide interventions, the ignorability assumption for the individual-level potential outcomes framework can be easily adapted for neighborhood e¤ects. Keywords: Causal inference; ignorability; multilevel models; neighborhood e¤ects; potential outcomes. 1 1. Introduction Several di¤erent types of inquiry arise when considering questions of "neighborhood e¤ects." We might be interested in how health or income outcomes would di¤er if individual i were moved from neighborhood j to neighborhood k. Alternatively we might be interested in how individual or population outcomes would likely change under a particular neighborhood-level intervention. We could consider, for example, the e¤ect of building a new hospital, or of increasing the police force in a particular area or of …xing local roads and sidewalks. In some studies, researchers are able to randomize interventions to neighborhoods (e.g. [1]) or sometimes are even able to randomize individuals to a change in neighborhood (e.g. [2]). Studies of the former type are generally referred to as randomized community trials; such studies are well understood and have been successfully implemented [3]. With the Moving to Opportunity (MTO) study sponsored by the U.S. Department of Housing and Urban Development, there has been more recent interest in the interpretation of studies which randomize individuals to a change in neighborhood [2, 4, 5]. In many studies, however, in which the aim is to estimate neighborhood e¤ects, observational data is used [6-9]. This raises questions about the assumptions that are required to move from associational to causal conclusions in such observational settings. The potential outcomes framework has been routinely employed to articulate these assumptions in the context of non-clustered cohorts [10, 11]. However, the manner in which the concepts from the potential outcomes framework extend to the study of neighborhood e¤ects in which individuals are clustered within neighborhoods is not well documented or understood. In this paper our focus will be the e¤ects of a neighborhood-level intervention rather than the e¤ects of moving an individual from one neighborhood to another. We will consider both randomized community trials and observational data settings and we will examine the technical assumptions needed to draw causal conclusions about neighborhood e¤ects from such data. The studies we consider in this paper are those in which treatment is administered at the neighborhood level. Such studies need to be distinguished from studies in which treatment occurs at the individual-level (so that not all individuals in a particular neighborhood receive treatment) but in which only aggregate level data is available. Within the epidemiologic literature these studies are known as "ecological studies." In such cases, neighborhood-level outcomes are modeled as functions of neighborhood-level characteristics. It has long been recognized [12, 13] that in ecological studies, associations at the neighborhood-level could di¤er considerably, or even be in the opposite direction, from the corresponding associations at the individual-level. Greenland and Morgenstern [14] documented how even in the 2 absence of bias within groups due to confounding, selection, misclassi…cation, etc., neighborhood associations may not re‡ect individual association if either (i) the outcome in the untreated group varies between groups or (ii) the treatment e¤ect is not additive between groups [cf. 15]. Furthermore, variables may be confounders even if they are unassociated with treatment in every group, especially if they are associated with treatment across groups [14, 16]. Moreover, control for such variables may not reduce the bias produced by these confounding variables [16, 17]. Greenland [18] has also noted that if both a neighborhood characteristic and its individual-level equivalent have an e¤ect on the outcomes then even if neither conditions (i) or (ii) above apply, estimates from ecologic studies are not purely neighborhood e¤ects but represent a combination of neighborhood and individual-level e¤ects. Interpretation becomes even more di¢ cult in the case of non-linear models. Haneuse and Wake…eld [19, 20] have done some work on inference in ecological studies in which the ecologic data are supplemented with a sample of individual-level case control data. Our focus in this paper, however, will be studies in which treatment is in fact administered at the neighborhood level. The remainder of this paper is organized as follows. Sections 2 and 3 review the potential outcomes framework and multilevel models respectively. Section 4 discusses the technical assumption needed to draw causal inferences in studies in which treatment is administered at the neighborhood level. Section 5 provides some concluding discussion. 2. Potential Outcomes Before we begin our discussion of neighborhood e¤ects we give a brief overview of the potential outcomes framework [10, 11] for non-clustered individuals. We will a use the notation A BjC to denote that A is independent of B given C. We will let Ti denote the treatment actually received by individual i. In the case of binary treatments, we will let Ti = 1 denote a particular treatment and Ti = 0 denote the absence of that treatment. Let Yi denote some post-treatment outcome for individual i. We let Yi (t) denote the counterfactual value of Y for individual i if treatment T were set, possibly contrary to fact, to the value t. Each individual i has a counterfactual Yi (t) for each possible value of t; these counterfactual variables Yi (t) are often referred to as "potential outcomes." In causal inference we are generally interested in expressions such as E[Y (1)] E[Y (0)] i.e. what the average outcome would have been if the entire population had received treatment as compared to what the average outcome in the population would have been if the entire population had not received treatment. The potential outcomes framework also imposes a "consistency" assumption or axiom that Yi (Ti ) = Yi i.e. that the value of Y which 3 would have been observed if T had been set to what it in fact was for individual i is equal to the value of Y which was in fact observed. Thus the only potential outcome for individual i that is observed is the potential outcome Yi (Ti ), the value of Y which would have been observed if T were set to what it in fact was. Treatment assignment T is said to be ignorable (or weakly ignorable) given coa variates X if Y (t) T jX for all t and if 0 < P (T = tjX = x) < 1 for all t and x. This condition is sometimes also referred to as the assumption of no unmeasured confounders. It is easily veri…ed thatX if treatment assignment T is ignorable given covariates X then for all t, E(Yt ) = E(Y jT = t; X = x)P (X = x). The right x hand side of the equation, unlike the left hand size, is given entirely in observable quantities. Ignorable treatment assignment given covariates X means that within strata of X, treatment gives no information on the distribution of counterfactual outcomes. In practice, a researcher trying to draw causal inferences from observational data attempts to collect a su¢ ciently large set of variables so that any pretreatment di¤erence between the di¤erent treatment groups are attributable to factors in the set of measured variables X so that within strata of X, the groups with di¤erent levels of treatment variable T are comparable except with regard to the treatment that they received. Under the assumption of conditionally ignorable treatment assignment, the researcher can use the observed outcomes within strata of X, weighted by the probability of X, as a valid estimate of average potential outcomes. In this paper we will generalize this potential outcomes framework to settings in which outcomes are recorded at the individual level but in which treatment is administered at the neighborhood level. Before moving on we note that some authors have developed methodology for causal inference without employing the idea of a counterfactual [21, 22]. 3. Multilevel Models and Neighborhood E¤ects Multilevel modeling (also called hierarchical modeling) is often employed in the estimation of the causal e¤ects of neighborhood-level interventions or characteristics. Let Yik denote the outcome for individual i in neighborhood k. Let Xik denote individual-level characteristics for for individual i in neighborhood k. Let Zk denote neighborhood-level characteristics for neighborhood k. Let Tk denote the neighborhood wide treatment or intervention under consideration. The typical model employed in neighborhood e¤ects research concerning a neighborhood-level intervention will have the following form. Yik = + 1 Xik + 2 Zk 4 + Tk + u k + ik (1) where denotes an intercept, uk denotes a neighborhood-level random error and ik denotes an individual-level random error. The uk and ik are usually assumed to be independent and identically distributed with mean zero. That the ik are independent of one another implies that the correlation between deviations from the mean that arises from subjects within the same neighborhood is wholly attributable to the neighborhood random term uk . Model (1) is also often written in hierarchical form with level 1 given by the equation: Yik = k + 1 Xik + 2 Zk + Tk + u k : ik (2) and level 2 given by the equation: k = + (3) Substituting equation (3) into equation (2) gives equation (1). Equation (1) could of course also be generalized to include interaction terms. Estimation of the coe¢ cients of (1) or equivalently of (2) and (3) is straightforward using standard statistical software. Several text books cover such estimation procedures; one such textbook [23] provides some discussion of causal inference with the multilevel model. For purposes of illustration we will consider the e¤ect of the construction of new hospitals, a neighborhood-level intervention, on health outcomes. Let Tk denote the construction of a new hospital in neighborhood k in year r. Let Yik denote a continuous measure of health status for individual i in neighborhood k in year r + 1. Let Xik denote a vector of individual-level characteristics for individual i in neighborhood k including age, sex, race, socioeconomic status, and prior health status, all measured at the beginning of year r. Let Zk denote neighborhood-level characteristics for neighborhood k including number of existing hospitals, neighborhood safety and neighborhood mean income, also measured at the beginning of year r. Before proceeding we note that not all neighborhood-level interventions or randomized community trials can be classi…ed as "neighborhood e¤ects research." Consider, for example, a randomized community trial in which all households in the neighborhoods selected for "treatment" are mailed a nine month supply of vitamin supplements. Such a study may not be considered "neighborhood research" because the intervention is such that the neighborhood itself is not in any way integrally altered. The line between what types of interventions do and do not constitute a "neighborhood e¤ect" may not always be entirely clear - one might imagine a scenario in which the vitamin supplement mailings lead to greater community discussion of nutrition and consequently a more "health-conscious community." Nevertheless, 5 the vitamin supplement mailings are considerably less concerned with altering the neighborhood than an intervention that intentionally tries to create a more healthconscious community by mailing health information, promoting community discussion, making use of billboards and local newspapers, etc. 4. Ignorability and Stability Assumptions In this section we consider the assumptions necessary to provide a causal interpretation of the coe¢ cient . We …rst consider the neighborhood-level stable unit treatment value assumption (NL-SUTVA). The neighborhood-level stable unit treatment value assumption requires that an individual’s outcome does not depend on the treatment assigned to neighborhoods other than the individual’s own neighborhood. Let Yik (t) denote the outcome individual i in neighborhood k would have obtained if the neighborhood-level treatment Tk in neighborhood k were set to t. The potential outcome Yik (t) is only well de…ned under NL-SUTVA. If NL-SUTVA does not hold then the individual’s outcome may depend on the treatments assigned to neighborhoods other than the individual’s own neighborhood and so the quantity we wish to de…ne by Yik (t) may take on many di¤erent values depending upon the treatment status of neighborhoods other than neighborhood k. Under NL-SUTVA, Yik (t) is uniquely de…ned. Note that NL-SUTVA does not require that there be no treatment interaction between two individuals in the same neighborhood; rather NL-SUTVA requires that there be no treatment interaction between individuals in di¤erent neighborhoods, or once again more generally, that the treatment received by one neighborhood does not a¤ect outcomes for individuals in other neighborhoods. The condition may be violated either because of spill-over e¤ects or because of interaction. An example of a spill-over e¤ect leading to a violation of NL-SUTVA would be easier access to health care from the construction of a new hospital that is not in one’s own neighborhood but is in a nearby neighborhood. An example of interaction leading to a violation of NL-SUTVA would be a context in which when two adjacent neighborhoods both have hospitals then each of the hospitals is able to attract more competent physicians. More competent physicians in turn would lead to better individual level health outcomes. Thus if a study were conducted in which adjacent neighborhoods were included in the study, NL-SUTVA would not be satis…ed because the health outcomes in one neighborhood would depend on whether a hospital had been built in adjoining neighborhoods. Clearly then, in certain contexts, NL-SUTVA is likely not to hold. However, clarifying the stable unit treatment value assumption for the neighborhood e¤ects context may have important implications for study design. Although 6 the researcher has no control over treatment assignment in observational studies, in both observational studies and randomized trials the researcher will likely have considerable choice in which experimental units to include in the study. As discussed above, NL-SUTVA will be violated if the outcome of individual i in neighborhood k is a¤ected by treatment in neighborhood j 6= k. Thus, if an individual’s health outcomes are a¤ected by easier access to health care from the construction of a new hospital in a nearby neighborhood, NL-SUTVA will be violated. A researcher designing a randomized trial might thus choose the experimental units, the neighborhoods, in such a way that there would be considerable distance between the neighborhoods so as to prevent the possibility of the construction of a new hospital in neighborhood j from a¤ecting individuals in any other neighborhood k 6= j i.e. by ensuring that neighborhoods near neighborhood j are not in the study (cf. [24]). If in an observational context NL-SUTVA is violated then it will be necessary to use techniques which relax this assumption and allow for the modeling of spillover e¤ects [25, 26]; see [5] for the relaxation of SUTVA conditions in a setting in which individuals are randomized to a change of neighborhoods. This will of course be a necessity if spillover e¤ects (i.e. between-group interactions) are in fact of primary interest. We next turn to the second central assumption needed to draw causal conclusions from statistical models like that given in equation (1). The ignorability assumption makes certain conditional independence claims about the counterfactual outcomes and the treatment and thereby allows the researcher to move from conclusions about association to conclusions about causation. Intuitively, the ignorability assumption is often understood as indicating that within strata of certain measured, potentially confounding variables, the treatment is essentially as though it were randomized. More formally, we will say that neighborhood treatment assignment is ignorable given the covariates Xik and Zk if for t = 0; 1 a Yik (t) Tk jXik ; Zk (4) and if 0 < P (Tk = tjXik = x; Zk = z) < 1 for all t, x and z. The ignorability assumption states that conditional on the covariates Xik ; Zk the potential outcomes Yik (0) and Yik (1) are independent of treatment Tk . The ignorability assumption makes statements about the distributions of the potential outcomes Yik (0) and Yik (1) and since some of these potential outcomes are unobserved the ignorability assumption cannot be tested with data; background knowledge must be used to assess whether the assumption is reasonable. The ignorability condition would hold if treatment is randomized within strata of Xik and Zk . Condition (4) is in fact somewhat weaker than treatment randomization within strata of Xik and Zk . Treatment randomiza7 tion within strata of Xik and Zk implies that treatment Tk is conditionally independent not only of the potential outcomes Yik (0) and Yik (1) given Xik and Zk but also of all other variables. Condition (4) requires only that treatment Tk be conditionally independent of the potential outcomes given Xik and Zk . Intuitively, the ignorability assumption (4) is often interpreted as requiring that for every neighborhood k and for each individual i the covariates Xik ; Zk include all relevant variables that a¤ect both the neighborhood-level treatment Tk and the potential outcomes Yik (0) and Yik (1) i.e. that Xik and Zk contain all the individual and neighborhood level confounding variables. Thus the covariate vector Zk would likely need to include the number of previously existing hospitals in the neighborhood, the presence of hospitals in adjacent neighborhoods along with perhaps the mean income for the neighborhood and any other neighborhood-level characteristics that a¤ect both the construction of a new hospital and individual health outcomes. The situation with the individuallevel covariates is more subtle. An individual-level covariate needs to be included in the individual-level covariate vector Xik only if it a¤ects neighborhood-level treatment Tk in some way di¤erent than the neighborhood-level covariate equivalent in Zk . For example, if the income distribution of the neighborhood were used in making decisions about the construction of a new hospital and if an individual’s income a¤ected health outcomes then the covariate vector Xik would have to include the income for individual i in neighborhood k because each individual’s income a¤ects both the individual’s health outcomes and the overall income distribution which affects decisions about hospital construction. However, if only mean income were used in making decisions about the construction of a new hospital, rather than the income distribution, then to ensure ignorability condition (4) it would su¢ ce to include mean income in the covariate vector Zk ; no individual-level income would need to be included among the covariates. Nevertheless, the individual-level income data may be included in the statistical model in order to improve e¢ ciency though this inclusion can also sometimes induce bias, as discussed below. In practice, …nding covariates Xik and Zk such that the treatment ignorability condition (4) will hold may be challenging. Many variables that may a¤ect both treatment and the outcome, such as for example political resourcefulness, trust and community solidarity, will often be very di¢ cult to measure. Condition (4) will at best hold only approximately and a central challenge of the study’s design will be collecting su¢ ciently rich data so that the approximation will be a reasonable one. Of course, as indicated above, in a community randomized trial the ignorability condition holds by design. We need one …nal assumption or axiom, the "consistency assumption" for neigh8 borhood level treatments. The consistency assumption states that when treatment Tk = t then the counterfactual outcome intervening to set treatment Tk to t is equal to the observed outcome i.e. Tk = t =) Yik (t) = Yik . The assumption is necessary mathematically but is intuitively obvious i.e. that the outcome that would have been observed if treatment were set to what it in fact was is equal to the outcome that was in fact observed. With NL-SUTVA and the consistency assumption, the neighborhood treatment ignorability assumption allows us to derive estimates of the average treatment e¤ect from observational data as stated in the following proposition. Proposition. If the neighborhood treatment assignment is ignorable given Xik and Zk then X E[Yik (t)] = E[Yik jTk = t; Xik = x; Zk = z]P (Xik = x; Zk = z) (5) x;z where the expectation is taken over all individuals and neighborhoods. Furthermore under model (1) we have that E[Yik (1)] E[Yik (0)] = . Proof. By the law of iterated expectations we have that, X E[Yik (t)] = E[Yik (t)jXik = x; Zk = z]P (Xik = x; Zk = z) = X x;z E[Yik (t)jTk = t; Xik = x; Zk = z]P (Xik = x; Zk = z) by ignorability x;z = X x;z E[Yik jTk = t; Xik = x; Zk = z]P (Xik = x; Zk = z) by consistency. Furthermore, if model (1) holds then, E[Yik (1)] E[Yik (0)] X = fE[Yik jTk = 1; Xik = x; Zk = z] x;z = X E[Yik jTk = 0; Xik = x; Zk = z]gP (Xik = x; Zk = z) P (Xik = x; Zk = z) = : x;z Thus, in our example, provided there are no interactions in the statistical model, represents the average causal e¤ect of building a new hospital on the health of individuals in the neighborhood in which the hospital is built. Even in the presence of interactions, equation (5) can be used to estimate causal e¤ects. The left hand 9 side of equation (5) is a causal quantity. The expression E[Yik (t)] denotes the expected value of the potential outcome for an individual living in a neighborhood in which there is an intervention to set treatment Tk to t. The right hand side of equation (5) can be estimated from the data. The expression E[Yik jTk = t; Xik = x; Zk = z] is in fact the conditional expectation that is estimated in the multi-level model in equation (1). Of course, for the estimates obtained from the multi-level model to be valid, the functional form of the model must be correctly speci…ed. Thus quadratic terms, interaction terms and cross-level interaction terms between covariates in Xik and covariates in Zk may need to be included in the model. If there is an interaction between Tk and Zk in the model then the e¤ect of treatment will vary by neighborhood. For example, the average treatment e¤ect of building a new hospital on health outcomes may vary by the quality and connectivity of the roads in a neighborhood since the e¢ cacy of emergency medical will in part depend upon how quickly an individual can reach the hospital. Even with treatment-covariate interactions, the average treatment e¤ect overall will still be equal to the weighted average given in (5). Appropriate multilevel modeling techniques must be used to estimate the variance of the E[Yik (t)jXik = x; Zk = z] estimates. If the e¤ect of treatment on the study sample is of interest then P (Xik = x; Zk = z) can be taken to be …xed weights corresponding to the relative frequency of the covariates Xik and Zk in the study population. If the study sample consists of a simple random sample of subjects from a speci…c study population then the variances of the estimates of the e¤ect of treatment on the study population can be computed by bootstrapping techniques. If the researcher is interested in the e¤ect of treatment on the treated, i.e. on the sub-population of individuals in neighborhoods that actually received treatment then the weights P (Xik = x; Zk = z) may be replaced by P (Xik = x; Zk = zjTk = 1). 5. Discussion In this …nal section we provide some discussion concerning the estimation and causal interpretation of multilevel model coe¢ cients in neighborhood e¤ects research along with some discussion on the limitations of and possible extensions to these methods. As noted above, for the coe¢ cient estimate to be interpreted causally the covariate vectors Xik and Zk must be chosen so that the ignorability condition (4) holds i.e. Xik and Zk must include all variables that a¤ect both the treatment and the outcome. If some variable in Xik or Zk a¤ects only the outcome and not the treatment then including the variable in model (1) may increase the e¢ ciency of the coe¢ cient estimates. Provided such variables are measured prior to treatment the ignorability condition (4) will in general be una¤ected by their inclusion in the 10 covariates vectors Xik and Zk . Exceptions can, however, occur, for example, when the pretreatment covariate is a common e¤ect of two other variables, one of which is a cause of the outcome and the other of which is a cause of treatment and neither of these other two variables are controlled for [27, 28]. Graphical models can be useful in assessing the conditional independence assumption required by treatment ignorability [27, 29, 30]. Glymour provides some discussion speci…cally on the use of graphical models to assess common problems in social epidemiology [31] which may be of interest to those doing neighborhood e¤ects research. Some comments are also in order concerning the modeling assumptions in the estimation of the multilevel model given in equation (1). Equation (1) assumes a linear relationship between the covariates in the model and the outcome. This assumption will no doubt at best be an approximation to the actual functional relationship between the outcome and the covariates. However, in order to assess whether such a linearity assumption is reasonable and in order to genuinely separate the treatment e¤ect from the e¤ect of the covariates, it is important that there is considerable overlap between the distribution of the individual and neighborhoodlevel covariate values in the neighborhoods receiving treatment (Tk = 1) and the neighborhoods which are controls (Tk = 0). Otherwise, signi…cant and potentially scienti…cally meaningful estimates of the treatment e¤ect may in fact be an artifact of the linearity assumption. Propensity score methods can be of considerable use in assessing these issues concerning the comparability of the treatment and control groups [32-34]. Oakes [35] has pointed out how these comparability assumptions can be particularly problematic in neighborhood e¤ects research which attempts to estimate the e¤ects of neighborhood-level socioeconomic status on say health status. This is because the e¤ect of neighborhood-level socioeconomic status is confounded by individual-level socioeconomic status and although individual-level income data can be controlled for in a model there will often be insu¢ cient overlap in the distribution of individual-level income between rich and poor neighborhoods. That is to say, the wealthy tend to live in wealthy neighborhoods and the poor in poor neighborhoods and so it is di¢ cult to separate the e¤ects of individual socioeconomic status from neighborhood-level socioeconomic status. Such considerations can sometimes be partially remedied by appropriate study design in which neighborhoods of only modest di¤erence in socioeconomic status are compared. The discussion in this paper is easily generalized to non-continuous outcomes through the use of generalized linear multilevel models and also to cases with categorical or continuous, rather than binary, treatments. The stable unit treatment value assumption discussed in Section 2 remains the same under these more gen11 eral settings. With continuous treatment, the ignorability assumption given in (4) must be modi…ed so as to apply not only to t = 0; 1 but to any value of t in the support of Tk . For generalized linear multilevel models, equation (1) or equivalently equations (2) and (3) must be modi…ed and appropriate multilevel estimation techniques employed but the causal assumptions required for the valid estimation of neighborhood e¤ects remain essentially the same. In estimating average causal e¤ects from generalized linear models, equation (5) can be used but care must be taken to transform estimates using the inverse link function so that the expression E[Yik jTk = t; Xik = x; Zk = z], rather than its transform, is weighted. The odds ratio, for example, is not collapsible [36] and thus if a logistic multi-level model is used, the inverse logit transformation must be applied before standardization takes place. In our discussion we have seen that multilevel methods do hold some promise for neighborhood e¤ects research. The conclusions drawn from observational data are always tentative and some of the assumptions required may be di¢ cult to meet in practice; however, a …rm understanding of the assumptions required for a causal interpretation of estimates from statistical models allows the researcher to critically assess the likely validity of study conclusions. References [1] Wagenaar AC, Murray DM, Wolfson M, Forster JL, Finnegan JR. Communities mobilizing for change on alcohol: Design of a randomized community trial. Journal of Community Psychology CSAP, Special Issue, 1994; 79-101. [2] Del Conte A, Kling J. A synthesis of MTO research on self-su¢ ciency, safety and health, and behavior and delinquency," Poverty Research News, 2001; 5:3-6. [3] Hannan PJ. Experimental social epidemiology - Controlled community trials. In: J.M. Oakes and J.S. Kaufman (eds.), Methods in Social Epidemiology. Jossey-Bass: San Francisco, CA, 2006. [4] Kling JR, Liebman JB, Katz LF. Experimental analysis of neighborhood e¤ects. Econometrica, in press. [5] Sobel ME. What do randomized studies of housing mobility demonstrate?: Causal inference in the face of interference. Journal of the American Statistical Association, 2006; 101:1398-1407. 12 [6] Diez Rouz A, Nieto F, Muntaner C. Neighborhood environments and coronary heart disease: A multilevel analysis. American Journal of Epidemiology, 1997; 146:48-63. [7] Sampson RJ, Raudenbush SW, Earls F. Neighborhoods and violent crime: a multilevel study of collective e¢ cacy. Science, 1997; 227:918-923. [8] Subramanian SV, Kim DJ, Kawachi I. Social trust and self-rated health in US communities: a multilevel analysis. Journal of Urban Health. 2002; 79:S21-34. [9] Browning CR, Wallace D, Feinberg S, Cagney KA. Neighborhood social processes and disaster-related mortality: The case of the 1995 Chicago heat wave. American Sociological Review, 2006; 71:665-682. [10] Rubin DB. Estimating causal e¤ects of treatments in randomized and nonrandomized studies. Journal Educational Psychology, 1974; 66:688-701. [11] Rubin DB. Bayesian inference for causal e¤ects: The role of randomization. Annals of Statistics, 1978; 6:34-58. [12] Robinson WS. Ecological correlations and the behavior of individuals. American Sociological Review 1950, 15:351-357. [13] Duncan OD, Cuzzort RP, Duncan B. Statistical geography: problems in analyzing areal data. Greenwood Press: Westport, CT, 1961. [14] Greenland S, Morgenstern H. Ecological bias, confounding, and e¤ect modi…cation. American Journal of Epidemiology, 1989; 18:269-274. [15] Rothman KJ, Greenland S. Modern Epidemiology. Lippincott-Raven: Philadelphia, PA, 1998. [16] Greenland S. Divergent biases in ecologic and individual-level studies. Statistics in Medicine, 1992; 11:1209-1223. [17] Greenland S, Robins JM. Invited commentary: ecologic studies - biases, misconceptions, and counterexamples. American Journal of Epidemiology, 1994; 139:747760. [18] Greenland S. A review of multilevel theory for ecologic analyses. Statistics in Medicine, 2002; 21:389-395. 13 [19] Haneuse S, Wake…eld J. Hierarchical models for combining ecological and casecontrol data. Biometrics, 1992; 63:128-136. [20] Haneuse S, Wake…eld J. The combination of ecological and case-control data. Berkeley Electronic Press, UW Biostatistics Working Paper Series. Working Paper 292. http://www.bepress.com/uwbiostat/paper292. [21] Spirtes, P., Glymour, C. and Scheines, R. Causation, Prediction and Search. New York: Springer-Verlag, 1993. [22] Dawid, A. P. (2000). Causal inference without counterfactuals. Journal of the American Statistical Association, 2000; 95:407-424. [23] Gelman A, Hill J. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press: Cambridge, 2007. [24] Murray D. Design and analysis of group-randomized trials. Oxford University Press: New York, 1998. [25] Hong G, Raudenbush SW. Evaluating kindergarten retention policy: A case study of causal inference for multilevel observational data. Journal of the American Statistical Association, 2006; 101:901-910. [26] Verbitsky N, Raudenbush SW. Causal inference in spatial settings: A case study of community policing program in Chicago. Working paper. [27] Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology, 1999; 10:37-48. [28] Greenland S. Quantifying biases in causal models: classical confounding vs collider-strati…cation bias. Epidemiology, 2003; 14:300-306. [29] Pearl J. Causal diagrams for empirical research. Biometrika, 1995; 82:669-688. [30] Pearl J. Causality: Models, Reasoning, and Inference. Cambridge University Press: Cambridge, 2000. [31] Glymour MM. Using causal diagrams to understand common problems in social epidemiology. In: J.M. Oakes and J.S. Kaufman (eds.), Methods in Social Epidemiology. Jossey-Bass: San Francisco, CA, 2006. 14 [32] Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal e¤ects. Biometrika, 1983; 70:41-55. [33] Rubin DB. Estimating causal e¤ects from large data sets using propensity scores, Annals of Internal Medicine, 1997; 127:757-763. [34] Oakes JM, Johnson PJ. Propensity score matching for social epidemiology. In: J.M. Oakes and J.S. Kaufman (eds.), Methods in Social Epidemiology. Jossey-Bass: San Francisco, CA, 2006. [35] Oakes JM. The (mis)estimation of neighborhood e¤ects: causal inference for a practicable social epidemiology. Social Science & Medicine, 2004; 58:1929-1952. [36] Greenland S, Robins JM, Pearl J. Confounding and collapsibility in causal inference. Statistical Science, 1999; 14:29-46. 15