Robust Inference in Sample Selection Models Mikhail Zhelonkin , Marc G. Genton

Transcription

Robust Inference in Sample Selection Models Mikhail Zhelonkin , Marc G. Genton
Robust Inference in Sample Selection Models
Mikhail Zhelonkin∗, Marc G. Genton†, and Elvezio Ronchetti∗
November 29, 2012
Abstract
The problem of non-random sample selectivity often occurs in practice in many different fields. The classical estimators introduced by Heckman (1979) are the backbone
of the standard statistical analysis of these models. However, these estimators are very
sensitive to small deviations from the distributional assumptions which are often not satisfied in practice. In this paper we develop a general framework to study the robustness
properties of estimators and tests in sample selection models. We derive the influence
function and the change-of-variance function of the Heckman’s two-stage estimator, and
demonstrate the non-robustness of this estimator and its estimated variance to small
deviations from the assumed model. We propose a procedure for robustifying the estimator, prove its asymptotic normality and give its asymptotic variance. This allows us
to construct a simple robust alternative to the sample selection bias test. We illustrate
the use of our new methodology in an analysis of ambulatory expenditures and compare
the performance of the classical and robust methods in a Monte Carlo simulation study.
Keywords: Change-of-variance function; Heckman model; Influence function; M-estimator;
Robust estimation; Robust inference; Sample selection; Two-stage estimator.
1
Introduction
A sample selectivity problem occurs when an investigator observes a non-random sample of
a population, that is, when the observations are present according to some selection rule.
Consider, for instance, the analysis of consumer expenditures, where typically the spending
amount is related to the decision to spend. If a regression model is estimated using the
entire sample (including zero-expenditures) or a sub-sample with non-zero expenditures, the
∗
Research Center for Statistics and Department of Economics, University of Geneva, Blv. Pont-d’Arve 40,
CH-1211 Geneva 4, Switzerland. E-mail: {Mikhail.Zhelonkin, Elvezio.Ronchetti}@unige.ch.
†
CEMSE Division, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia.
E-mail: [email protected]
1
corresponding estimators of the regression coefficients will be biased. This type of problems
arise in many research fields, including economics, sociology, political science, finance, and
many others.
A sample selection model can be represented by the following regression system
∗
y1i
= xT1i β1 + e1i ,
(1)
∗
y2i
= xT2i β2 + e2i ,
(2)
∗
∗
where the responses y1i
and y2i
are unobserved latent variables, xji is a vector of explanatory
variables, βj is a pj × 1 vector of parameters, j = 1, 2, and the error terms follow a bivariate
normal distribution with variances σ12 = 1, σ22 , and correlation ρ. Notice that the variance
parameter σ12 is set to be equal to 1 to ensure identifiability. Here (1) is the selection equation,
defining the observability rule, and (2) is the equation of interest. The observed variables are
defined by
∗
y1i = I(y1i
> 0),
(3)
∗
∗
y2i = y2i
I(y1i
> 0),
(4)
where I is the indicator function. This model is classified by Amemiya (1984) as a “Tobit type-2
model” or “Heckman model” originally discussed by Heckman (1979) in his seminal paper.
If all the data were available, i.e. the regressor matrix was of full rank, or the data were
missing at random, i.e. no selection mechanism was involved, we could estimate the model by
Ordinary Least Squares (OLS). But in general these conditions are not satisfied and the OLS
estimator is biased and inconsistent.
Heckman (1979) proposed two estimation procedures for this model. The first one is a
Maximum Likelihood Estimator (MLE), based on the assumption of bivariate normality of the
error terms. The second one is a two-stage procedure. Consider the conditional expectation
of y2i given x2i and the selection rule
∗
E(y2i |x2,i , y1i
> 0) = xT2i β2 + E(e2i |e1i > −xT1i β1 ).
2
Then the conditional expectation of the error term is in general different from zero, which leads
to the following modified regression
y2i = xT2i β2 + βλ λi (xT1i β1 ) + vi ,
(5)
where βλ = ρσ2 , λi = φ(xT1i β1 )/Φ(xT1i β1 ) is the inverse Mills ratio, vi is the error term with zero
expectation, and φ(·) denotes the density and Φ(·) the cumulative distribution function of the
standard normal distribution, respectively. Heckman (1979) proposed then in the first stage
to estimate β1 by probit MLE and to compute estimated values of λ, and in a second stage to
use OLS in (5), where the additional variable corrects for the sample selection bias.
Both estimators have advantages and drawbacks, studied extensively in the literature; see
e.g. Stolzenberg and Relles (1997), Puhani (2000), Toomet and Henningsen (2008), and the
general reviews Winship and Mare (1992), Vella (1998) and references therein. An important
criticism is their sensitivity to the normality assumption, which is often violated. Several Monte
Carlo studies have investigated the behavior of the estimators under completely different distributional assumptions; see Paarsch (1984) and Zuehlke and Zeman (1991). Various methods
aiming at relaxing the distributional assumptions have been proposed. They include several
semiparametric (Ahn and Powell 1993, Newey 2009) and nonparametric (Das et al. 2003)
methods. Recently, Marchenko and Genton (2012) derived estimators for a sample selection
model based on the t distribution of the error terms.
To illustrate the sensitivity of the classical procedures in this model, consider the behavior
of the sample selection bias (SSB) test (Heckman 1979) when we move a single observation
x11 from 0 (the mean value under the normal model) to −6 in a sample of size 200. The
data generating process is described in Section 4.1. The boxplots of the test statistic, the
log10 (p-values) and the empirical power of the test are shown in Figure 1. As x11 moves away
from 0, the test statistic decreases and eventually becomes smaller than the threshold of 1.96
reversing the test decision. As a corollary, we see a substantial increase of the p-values and a
drastic reduction of the empirical power of the test. Therefore, a single observation out of 200
3
can change completely the decision about the presence of sample selection bias.
In this paper we focus on the robustness analysis of Heckman’s two-stage procedure. Although a robustness investigation of the MLE for this model could be carried out applying
standard tools of robust statistics (Salazar 2008), we feel that it is more important to perform such an analysis for Heckman’s two-stage procedure. In fact the two-stage estimator is
structurally simpler, has a straightforward interpretation, and is much less computationally intensive than the MLE which explains its success in applied economic analysis. Moreover, there
are numerous extensions of the classical Heckman’s model, including switching regressions,
simultaneous equations with selectivity, and models with self-selectivity, to mention a few (see
Section 5), where the construction of the joint likelihood becomes difficult and cumbersome
whereas Heckman’s estimator can be easily computed.
In the past decades, robust estimators and tests have been developed for large classes of
models both in the statistical and econometric literature; see for instance, the books by Huber
(1981, 2nd edition by Huber and Ronchetti 2009), Hampel et al. (1986), Maronna et al. (2006)
in the statistical literature and Peracchi (1990, 1991), Ronchetti and Trojani (2001), and
Koenker (2005) in the econometric literature. However, except for a few specific contributions
mentioned above, sample selection models have not received a lot of attention and no general
robustness theory is available.
In this paper we fill this gap by providing the following contributions to the literature. In
Section 2 we investigate the robustness properties of Heckman’s estimator and its estimated
variance by deriving the influence function and the change-of-variance function; see Hampel
et al. (1986). These functions are used to quantify the bias of the estimator and its estimated
variance respectively, when deviations from the assumed bivariate normal model are present
and the true data generating process lies in a neighborhood of the assumed bivariate normal
distribution. It turns out that both functions are unbounded and this implies that a small
deviation can have a large impact on the bias of the estimator and on its variance. The latter
4
in turn has a large effect on the corresponding confidence intervals. Moreover, by means of
these functions we provide a von Mises expansion of the test statistic for the test on SSB which
confirms the results obtained in the simple illustration mentioned above.
Since the classical estimation and testing procedures are not robust with respect to deviations from the assumed underlying stochastic model, we propose in Section 3 new robust
estimators and a new robust test for SSB, which are the natural robust counterparts of the
classical Heckman’s two-stage estimator and test of SSB. Section 4 reports the finite sample
performance of the new estimators and test in a simulation setting and with an example on ambulatory expenditures data. In Section 5 we show the flexibility of our approach by discussing
the use of our results for two important extensions of the basic model, namely switching regression models and simultaneous equations models with selectivity. The technical derivations
and assumptions are provided in Appendix A, whereas Appendix B presents the output of the
classical and robust analysis of our example obtained by using the new function ssmrob in R
(R Core Team 2012), which is available from the authors.
2
Robustness Issues with Heckman’s Two-Stage Estimator
In this section we present the main results concerning the two-stage estimator. We derive its
influence function (IF) and its change-of-variance function (CVF) and discuss the robustness
properties of the classical estimator. Moreover, we explain the connection between its IF and
the asymptotic variance. Finally, we explore the robustness properties of the SSB test.
We consider a parametric sample selection model {Fθ }, where θ = (β1 , β2 ) lies in Θ, a
compact subset of Rp1 +p2 . Let FN be the empirical distribution function putting mass 1/N at
each observation zi = (z1i , z2i ), where zji = (xji , yji ), j = 1, 2, i = 1, . . . , N , and let F be the
distribution function of zi . The Heckman’s estimator is a particular case of general two-stage
M-estimators, with probit MLE in the first stage and OLS in the second stage. Define two
5
statistical functionals S and T corresponding to the estimators of the first and second stage,
respectively. The domain of S is a class of probability distributions on Rp1 and its range is a
vector in Rp1 . The domain of T is a class of probability distributions on Rp1 +p2 and its range
is a vector in Rp2 .
The two-stage estimator can be expressed as a solution of the system:
Z
Ψ1 {(x1 , y1 ); S(F )}dF = 0,
(6)
Ψ2 [(x2 , y2 ); λ{(x1 , y1 ); S(F )}, T (F )]dF = 0,
(7)
Z
where Ψ1 (·; ·) and Ψ2 (·; ·, ·) are the score functions of the first and second stage estimators,
respectively. In the classical case Ψ1 (·; ·) is given by (29), and Ψ2 (·; ·, ·) is given by (25). Here
λ{(x1 , y1 ); S(F )} denotes the dependence of λ on S(F ) = β1 , while T (F ) depends directly on
F and indirectly on F through S(F ).
2.1
Influence Function
For a given functional T (F ), the influence function (IF) is defined by Hampel (1974) as
IF (z; T, F ) = lim→0 [T {(1 − )F + ∆z } − T (F )]/, where ∆z is the probability measure
which puts mass 1 at the point z. In our case (1 − )F + ∆z is a contamination of the
joint distribution of zi , but marginal contaminations on the components of zi can also be considered; see the comments below. The IF describes the standardized asymptotic bias on the
estimator due to a small amount of contamination at the point z. An estimator is considered
to be locally robust if small departures from the assumed distribution have only small effects
on the estimator.
Assume that we have a contaminated distribution F = (1 − )F + G, where G is some
arbitrary distribution function. Using a von Mises (1947) expansion, we can approximate the
statistical functional T (F ) at the assumed distribution F as
Z
T (F ) = T (F ) + IF (z; T, F )dG + o()
6
(8)
and the maximum bias over the neighborhood described by F is approximately
sup kT (F ) − T (F )k ∼
= sup kIF (z; T, F )k.
z
G
Therefore, a condition for (local) robustness is a bounded IF with respect to z, which means
that if the IF (·; ·, ·) is unbounded then the bias of the estimator can become arbitrarily large.
Notice also that (8) can be used to approximate numerically the bias of the estimator T at
a given underlying distribution F for a given G. The next proposition gives the influence
function of Heckman’s two-stage estimator.
Proposition 1. For the model (1)-(4), the IF of the Heckman’s two-stage estimator is
−1 (
Z x2
x2 xT2 λx2
T
(y2 − x2 β2 − λβλ )
y1
y1 dF
IF (z; T, F ) =
λ
λ2
λxT2
)
Z 0
x2 β λ
+
y1 λ dF · IF (z; S, F ) ,
λβλ
(9)
where
Z IF (z; S, F ) =
−1
φ(xT1 β1 )2 x1 xT1
φ(xT1 β1 )x1
T
dF
{y1 − Φ(x1 β1 )}
.
Φ(xT1 β1 ){1 − Φ(xT1 β1 )}
Φ(xT1 β1 ){1 − Φ(xT1 β1 )}
(10)
Proofs and derivations of all results are given in Appendix A.
The first term of (9) is the score function of the second stage and it corresponds to the IF
of a standard OLS regression. The second term contains the IF of the first stage estimator.
Clearly, the first term is unbounded with respect to y2 , x2 and λ. Notice that the function
λ is unbounded from the left, and it tends to zero from the right (the solid line in Figure 2).
From (10) we can see that the second term is also unbounded, which means that there is a
second source of unboundedness arising from the selection stage. Therefore, the estimator fails
to be locally robust. A small amount of contamination is enough for the estimator to become
arbitrarily biased. In Section 3 we will present two ways to construct a two-stage estimator
with a bounded influence function.
7
2.2
Asymptotic Variance and Change-of-Variance Function
The expression of the asymptotic variance for the two-stage estimator has been derived by
Heckman (1979), and later corrected by Greene (1981). Duncan (1987) suggested another
approach to derive the asymptotic variance using the M-estimation framework. Using the
result in Hampel et al. (1986), the general expression of the asymptotic variance is given by
Z
V (T, F ) =
IF (z, T, F ) · IF (z, T, F )T dF (z).
Specifically, denote the components of the IF as follows:
xT2 β2
x2
λ
a(z) = (y2 −
,
− λβλ )
Z 0
x2 β λ
b(z) =
λ dF · IF (z; S, F ),
λβλ
Z x2 xT2 λx2
M (Ψ2 ) =
dF .
λxT2
λ2
(11)
Then the expression of the asymptotic variance of Heckman’s two-stage estimator is
V (T, F ) = M (Ψ2 )
−1
Z
a(z)a(z)T + a(z)b(z)T + b(z)a(z)T + b(z)b(z)T dF (z) M (Ψ2 )−1 .
After integration and some simplifications we obtain the asymptotic variance of the classical
estimator
V
β2
βλ
,F
T
−1
= (X X)
σ22 {X T (I
βλ2
2 T
T
− 2 ∆)X} + βλ X ∆X1 · Var(S, F ) · X1 ∆X (X T X)−1 ,
σ2
where ∆ is a diagonal matrix with elements δii =
∂λ(x1i β1 )
∂(x1i β1 )
and Var(S, F ) denotes the asymptotic
variance of the probit MLE.
Robustness issues are not limited to the bias of the estimator, but concern also the stability
of the asymptotic variance. Indeed, the latter is used to construct confidence intervals for the
parameters and we want the influence of small deviations from the underlying distribution on
their coverage probability and length to be bounded. Therefore, we investigate the behavior
of the asymptotic variance of the estimator under a contaminated distribution F and derive
8
the CVF, which reflects the influence of a small amount of contamination on the asymptotic
variance of the estimator. These results will be used in Section 2.3 to investigate the robustness
properties of the SSB test.
The CVF of an M-estimator T at a distribution F is defined by the matrix CV F (z; T, F ) =
(∂/∂)V {T, (1−)F +∆z }
, for all z where this expression exists; see Hampel et al. (1981)
=0
and Genton and Rousseeuw (1995). As in (8) a von Mises (1947) expansion of log V (T, F ) at
F gives
Z
CV
F
(z;
T,
F
)
dG .
V (T, F ) ∼
= V (T, F ) exp V (T, F )
(12)
If the CV F (z; T, F ) is unbounded then the variance can behave unpredictably (arbitrarily
large or small); see Hampel et al. (1986, p. 175). Similarly to the approximation of the bias,
using (12) one can obtain the numerical approximation of the variance of the estimator T at
a given underlying distribution (1 − )F + G for a given G.
Proposition 2. The CVF of the Heckman’s two-stage estimator is given by
−1
Z
x2 xT2 λx2
λ2
λxT2
y1 V
CV F (z; S, T, F ) = V − M (Ψ2 )
DH dF +
Z
−1
+ M (Ψ2 )
AH a(z)T + AH b(z)T + BH b(z)T dF M (Ψ2 )−1
Z
−1
T
+ M (Ψ2 )
a(z)ATH + b(z)ATH + b(z)BH
dF M (Ψ2 )−1
+ M (Ψ2 )−1 a(z)a(z)T + a(z)b(z)T + b(z)a(z)T + b(z)b(z)T M (Ψ2 )−1
Z
x2 xT2 λx2
−V
DH dF +
y1 M (Ψ2 )−1 ,
(13)
λxT2
λ2
where V denotes the asymptotic variance of the Heckman (1979) two-stage estimator, M (Ψ2 )
is defined by (11), AH =
∂
a(z),
∂
BH =
∂
b(z),
∂
and DH =
∂ ∂
Ψ (z; λ, θ).
∂ ∂θ 2
Explicit expressions
for these terms are given in Appendix A.
The CVF has several sources of unboundedness. The first line of (13) contains the derivative
of the score function Ψ2 (·; ·, ·) with respect to the parameter which is unbounded. The same
holds for the fifth line. Finally, in the fourth line there are two terms depending on the score
9
functions of two estimators which are unbounded. Clearly, the CVF is unbounded, which
means that the variance can become arbitrarily large. Taking into account that the two-stage
estimator by definition is not efficient, we can observe a combined effect of inefficiency with
non-robustness of the variance estimator. These problems can lead to misleading p-values and
incorrect confidence intervals. Second order effects in the von Mises expansion are discussed
in general in La Vecchia et al. (2012).
2.3
Sample Selection Bias Test
Heckman (1979) proposed to test for the selection bias using the standard t-test of the coefficient βλ . Melino (1982) showed that this test is equivalent to a Lagrange multiplier test and
has desirable asymptotic properties. Several other proposals are available in the literature, see
e.g. Vella (1992), but the simple Heckman’s test is the most widely used by applied researchers.
Here we investigate the effect of contamination on this test statistic
τn =
√
βˆλ
np
.
V (βλ , F )
Using the expressions for the IF and CVF of the estimator and its asymptotic variance, we
obtain the von Mises expansion of the test statistic:
"
#
T (F )
IF (z; T, F )
1
CV F (z, T, F )
T (F )
p
=p
+ p
+ T (F )
+ remainder,
2n
V (F )/n
V (F )/n
V (F )/n
{V (F )/n}5/2
which provides an approximation of the bias of the test statistic under contamination. It is
clear that the IF of the test depends on the IF and CVF of the estimator. Hence, the IF of the
test statistic is also unbounded. Since, according to Hampel et al. (1986, p. 199), the IFs of
the level and of the power of the test are proportional to the IF of the test statistic, the test is
not robust. Moreover, because Heckman’s two-stage estimator suffers from a lack of efficiency,
small deviations from the model can enhance this effect and increase the probability of type I
and type II errors of the SSB test.
Notice however that the term containing the CVF is of higher order, which means that
the influence of the contamination on the test statistic is mostly explained by the IF of the
10
corresponding estimator. Hence, for practical purposes we need to have at least a robust
estimator with a bounded IF with an additional bonus if the CVF is bounded as well.
3
Robust Estimation and Inference
In this section we suggest how to robustify the two-stage estimator and propose a simple robust
alternative to the SSB test.
3.1
Robust Two-Stage Estimator
From the expression of the IF in (9), it is natural to construct a robust two-stage estimator by
robustifying the estimators in both stages. The idea is to obtain an estimator with bounded
bias in the first stage, then compute λ, which will transfer potential leverage effects from the
first stage to the second, and use the robust estimator in the second stage, which will correct
for the remaining outliers.
Consider the two-stage M-estimation framework given by (6) and (7). We can obtain a
robust estimator by bounding both score functions. In the first stage, we construct a robust
probit estimator following the idea of Cantoni and Ronchetti (2001). We use a general class
of M-estimators of Mallows’ type, where the influence of deviations on y1 and x1 are bounded
separately. The estimator is defined by the following score function:
T
ΨR
1 {z1 ; S(F )} = ν(z1 ; µ)ω1 (x1 )µ − α(β1 ),
where α(β1 ) =
1
n
Pn
i=1
(14)
E{ν(z1i ; µi )}ω1 (x1i )µTi is a term to ensure the unbiasedness of the
estimating function with the expectation taken with respect to the conditional distribution of
y|x, ν(·|·), ω1 (x1 ) are weight functions defined below, and µi = µi (z1i , β1 ) = Φ(xT1i β1 ).
The weight functions are defined by
ν(z1i ; µi ) = ψc1 (ri )
where ri =
y1i −µi
V 1/2 (µi )
1
V
1/2 (µ
i)
,
are Pearson residuals and ψc1 is the Huber function defined by
r,
|r| ≤ c1 ,
ψc1 (r) =
c1 sign(r), |r| > c1 .
11
(15)
The tuning constant c1 is chosen to ensure a given level of asymptotic efficiency at the model.
√
A simple choice of the weight function ω1 (·) is ω1i = 1 − Hii , where Hii is the ith diagonal
element of the hat matrix H = X(X T X)−1 X T . More sophisticated choices for ω1 are available,
e.g. the inverse of the robust Mahalanobis distance based on high breakdown robust estimators
of location and scatter of the x1i . For the probit case we have that µi = Φ(xT1i β1 ), V (µi ) =
Φ(xT1i β1 ){1 − Φ(xT1i β1 )} and hence the quasi-likelihood estimating equations are
n X
ψc1 (ri )ω1 (x1i )
i=1
1
φ(xT1i β1 )x1i − α(β1 )
T
[Φ(x1i β1 ){1 − Φ(xT1i β − 1)}]1/2
= 0,
and E{ψc1 (ri )} in the α(β1 ) term is equal to
E ψc1
y1i − µi
V 1/2 (µi )
= ψc1
−µi
1/2
V (µi )
1 − Φ(xT1i β1 ) + ψc1
1 − µi
V 1/2 (µi )
Φ(xT1i β1 ).
This estimator has a bounded IF and ensures robustness of the first estimation stage.
To obtain a robust estimator for the equation of interest (second stage) we propose to use
an M-estimator of Mallows-type with the following Ψ-function:
T
ΨR
2 (z2 ; λ, T ) = Ψc2 (y2 − x2 β2 − λβλ )ω(x2 , λ)y1 ,
(16)
where Ψc2 (·) is the classical Huber function from (15), but with a different tuning constant
c2 , ω(·) is the weight function, which can also be based on the robust Mahalanobis distance
d(x2 , λ), e.g.
ω(x2 , λ) =
x2 ,
x2 c m
,
d(x2 ,λ)
if d(x2 , λ) < cm ,
if d(x2 , λ) ≥ cm .
We summarize the result in the following proposition.
Proposition 3. Under the assumptions stated in Appendix A, Heckman’s two-stage estimator
defined by (14) and (16) is robust, consistent, and asymptotically normal, with asymptotic
variance given by:
V (T, F ) = M
−1
ΨR
2
Z
{aR (z)aR (z)T + bR (z)bR (z)T }dF M ΨR
2
12
−1
,
(17)
R
where M ΨR
2 = −
∂
ΨR (z; λ, T )dF ,
∂β2 2
Z
bR (z) =
aR (z) = ΨR
2 (z; λ, T ), and
∂ R
0
Ψ2 (z; λ, T )λ dF
∂λ
Z
∂ R
Ψ (z; S)dF
∂β1 1
−1
ΨR
1 (z; S).
The asymptotic variance of the robust estimator has the same structure as that of the
classical Heckman’s estimator. Its computation can become complicated, depending on the
choice of the score function, but for simple cases, e.g. Huber function, it is relatively simple.
3.2
Robust Inverse Mills Ratio
Often the outliers appear only in one or a few components of the observation z = (x1 , y1 , x2 , y2 ).
In these cases the use of robust estimators in both stages is not necessary. If outliers are in y2
and/or x2 , then a robust estimator for the equation of interest is needed. If outliers are in x1
and y1 = 0, then a robust estimator of the probit model is needed. The most complicated case
is when a leverage point x1 with y1 = 1 in the selection equation is transferred to the equation
of interest through the exogenous variable λ, a nonlinear transformation of xT1 β1 .
In this case a natural solution is to bound the influence of the outliers that come into
the main equation when computing λ. Since λ is unbounded and approximately linear in the
predictor xT1 β1 from the left (see Figure 2), we can transform the linear predictor in such a
way that it becomes bounded, ensuring the boundedness of λ, and hence the boundedness of
the IF of the final estimator.
To achieve this we rewrite the linear predictor xT1 β1 = Φ−1 {y1 − rV 1/2 (µ)}, where r =
(y1 − µ)/V 1/2 (µ) and µ = Φ(xT1 β1 ). Then, using the bounded function ψc1 from (15), we
obtain the bounded linear predictor
η(xT1 β1 ) = Φ−1 y1 − ψc1 (r)V 1/2 (µ) ,
(18)
and (18) ensures the boundedness of the inverse Mills ratio. It also has the advantage to avoid
introducing an additional weight to bound directly λ, which would increase the complexity of
the estimator. Note that, if c1 → ∞, then η = xT1 β1 . The classical and robust versions of the
13
inverse Mills ratio are plotted in Figure 2. Depending on the tuning parameter c1 of the robust
probit, the influence of large linear predictors can be reduced, which ensures the robustness of
the estimator.
3.3
Robust Sample Selection Bias Test
To test SSB, i.e. H0 : βλ = 0 vs HA : βλ 6= 0, we simply propose to use a t-test based on
the robust estimator of βλ and the corresponding estimator of its standard error derived in
Section 3.1, where the latter is obtained by estimating (17).
R
−1
−1
aR (z)aR (z)T dF M (ΨR
The first term of (17), M (ΨR
2 ) , is similar to the asymptotic
2)
variance of standard linear regression, but with heteroscedasticity. Therefore, we use the Eicker
(1967) - Huber (1967) - White (1980) heteroscedasticity-consistent variance estimator, i.e. we
ˆ (ΨR )−1 1 P a
ˆ (ΨR ) and a
ˆ (ΨR )−1 , where M
estimate this first term by M
ˆR (z) are
ˆ(zi )ˆ
a(zi )T M
2
2
2
n
the sample versions of M and aR (z), respectively.
−1
The second term of the asymptotic variance, M (ΨR
2)
R
−1
bR (z)bR (z)T M (ΨR
2 ) , is the
asymptotic variance of the probit MLE pre- and post-multiplied by the constant matrix, which
depends on the form of the score function of the second stage. Thus, a consistent estimator is
ˆ (ΨR )−1 1
M
2
n
where
4
4.1
∂ΨR
2i
∂β1
X
=
ˆbR (zi )ˆbR (zi )T M
ˆ (ΨR )−1 = M
ˆ (ΨR )−1 1
2
2
n
X ∂ΨR
2i
∂β1
ˆ
Var(S,
F)
X
T
1
∂ΨR
2i
ˆ (ΨR )−1 ,
M
2
n
∂β1
∂Ψ2 (z2i ;λ,T (F )) ∂λ(z1i ;S(F ))
.
∂λ
∂β1
Numerical Examples
Simulation Study
Consider the model described in Section 1.1. We carry out a Monte Carlo simulation study
to illustrate the robustness issues in this model and compare different estimators. In our
experiment we generate x1 ∼ N (0, 1), x2 ∼ N (0, 1), and the errors e1 and e2 from a bivariate
normal distribution with expectation zero, σ1 = σ2 = 1, and ρ = 0.5, which gives βλ = 0.5.
The degree of censoring is controlled by the intercept in the selection equation, denoted by β10
14
and set to 0, which corresponds to 50% of censoring. Results (not shown here) are similar for
other censoring proportions such as 75% and 25%. The intercept in the equation of interest is
β20 = 0. The slope coefficients are β11 = β21 = 1. We find the estimates of β1 and β2 without
contamination and with two types of contamination. In the first scenario we contaminate x1
when the corresponding y1 = 0. We generate observations from the model described above
and replace them with probability = 0.01 by a point mass at (6, 0, 1, 1), corresponding to
(x1 , y1 , x2 , y2 ). In this case we study the effect of leverage outliers when they are not transferred
to the main equation. In the second scenario we contaminate x1 when the corresponding
y1 = 1. We use the same type of contamination as in the first scenario, but the point mass is
at (−6, 1, 1, 1). This is the most complicated case, because the outliers influence not only the
first estimation stage, but also the second stage through λ. The sample size is N = 200 and
we repeat the experiment 500 times. We do not consider here the cases when only the second
stage is contaminated because this is essentially the situation of standard linear regression
which has been studied extensively; see e.g. Hampel et al. (1986) and Maronna et al. (2006).
In Table 1 we present the results of the estimation of the first stage using a classical probit
MLE and a robust probit M-estimator. Under the model we see that the robust estimator is
less efficient, but in the presence of a small amount of contamination it remains stable, both for
bias and for variance. The classical estimator is clearly biased and becomes much less efficient.
In Table 2 we present the results of the two-stage procedure for four different estimators. We
compare the classical estimator, robust two-stage (robust 2S) from Section 3.1, robust inverse
Mills ratio (IMR) from Section 3.2, and the estimator using only the robust probit in the
first stage and OLS in the second. First of all, notice that all the estimators perform well
without contamination. The loss of efficiency for the robust versions is reasonable and the
bias is close to zero. Obviously, under contamination the classical estimator breaks down.
This effect can be seen in Figure 3. The boxplots correspond to the four estimators described
above. The classical estimator is denoted by (a), the robust probit with OLS by (b), the robust
15
IMR by (c), and the robust 2S by (d). In the case when the outlier is not transferred to the
equation of interest (Figure 3 top panel) it is enough to use a robust probit, but when the
outlier emerges in the equation of interest (Figure 3 bottom panel), a robust estimation of
the second stage is necessary. The robust IMR and the robust two-stage estimators are both
stable regardless of the presence or absence of outliers in λ or x1 . The robust IMR estimator
has smaller variance than the robust two-stage, but in the case of outliers in y2 or x2 it will
require a robust estimator in the second stage anyhow. So this estimator cannot be considered
as a self-contained estimator, but it is a useful tool in some situations.
4.2
Ambulatory Expenditures Data
To further illustrate the behavior of our new robust methodology, we consider the data on ambulatory expenditures from the 2001 Medical Expenditure Panel Survey analyzed by Cameron
and Trivedi (2009, p. 545). The data consist of 3,328 observations, where 526 (15.8%) correspond to zero expenditures. The distribution of the expenditures is skewed, so the log scale is
used. The selection equation includes such explanatory variables as age, gender (female), education status (educ), ethnicity (blhisp), number of chronic diseases (totchr ), and the insurance
status (ins). The outcome equation holds the same variables. The exclusion restriction could
be introduced by means of the income variable, but the use of this variable for this purpose is
arguable. All these variables are significant for the decision to spend, and all except education
status and insurance status are significant for the spending amount. The results of the estimation obtained using the R package selection are reported in the Appendix B. The p-value
of the SSB t-test is 0.0986 which is not significant at the 5% level. The possible conclusion
of no selection bias seems to be doubtful. The estimation of this model using the joint MLE
returns the p-value of the Wald test equal to 0.38. Such behavior of the classical MLE is not
surprising due to the fact that it uses a stronger distributional assumption than Heckman’s
estimator, and is even more sensitive to the presence of deviations from this assumption.
Using the robust two-stage estimator, we obtained results similar to Cameron and Trivedi’s
16
analysis using the classical Heckman’s estimator. The output is reported in the Appendix B.
For all the variables the differences are not dramatic, except for the inverse Mills ratio (IMR)
parameter. The robust estimator returns βˆRIM R = −0.6768, compared to the classical βˆIM R =
−0.4802. The SSB test is highly significant with a p-value p = 0.009, which leads to the
intuitive conclusion of the presence of SSB.
5
Discussion and Extensions
We introduced a framework for robust estimation and testing for sample selection models.
These methods allow to deal with data deviating from the assumed model and to carry out
reliable inference even in the presence of deviations from the assumed normality model. Monte
Carlo simulations demonstrated the good performance of the robust estimators under the
model and with different types of contamination. Although we focused on the basic sample
selection model, our methodology can be easily extended to more complex models. Here we
briefly explore two such extensions.
5.1
Switching Regression Model
A natural extension of Heckman’s model is the switching regression model or “Tobit type-5
model”. In this case we have two regimes, and the regime depends on the selection process.
This model consists of three equations:
∗
y1i
= xT1i β1 + e1i ,
∗
y21i
= xT21i β21 + e2i ,
∗
y22i
= xT22i β22 + e3i ,
where the error terms follow a multivariate normal distribution. The observed variables are
∗
y1i = I(y1i
> 0) and
y2i =
∗
y21i
,
∗
y22i ,
17
if y1i = 1,
if y1i = 0.
The model can be estimated by a two-stage procedure with the following switching regressions:
y2i =
xT21i β21 + βλ1 λi + v1i ,
˜ i + v2i ,
xT22i β22 − βλ2 λ
if y1i = 1,
if y1i = 0,
(19)
˜ i = λi (−xT β1 ). This system can be estimated as two independent
where λi = λi (xT1i β1 ) and λ
1i
linear models.
The IF for the first regression in (19) is exactly the same as that in the basic Heckman’s
model and is given by (9). For the second regression in (19), a slight modification is required.
˜ λ0 by
The IF for the second regression is given by (9), where λ is replaced by λ,
2
T
T
T
T
˜ 0 = −{1 − Φ(x1 β1 )}φ(x1 β1 )x1 β1 + φ(x1 β1 ) xT ,
λ
1
2
{1 − Φ(xT1 β1 )}
and y1 by 1 − y1 .
˜ is unbounded from the right and therefore the conclusion about the roObviously, the λ
bustness properties remains the same. A robust estimator can be easily obtained using the
procedure described in Section 3.1.
5.2
Simultaneous Equations Models With Selectivity
Lee et al. (1980) derived the asymptotic covariance matrices of two-stage probit and two-stage
Tobit methods for simultaneous equations models with selectivity:
Ii =xTi δ − ei ,
(20)
B1 Y1i + Γ1 Z1i =e1i if Ii > 0,
(21)
B2 Y2i + Γ2 Z2i =e2i if Ii ≤ 0,
(22)
where Y1i and Y2i are vectors of p1 and p2 endogenous variables, Z1i and Z2i are vectors of q1
and q2 exogenous variables, B1 and B2 are p1 × p1 and p2 × p2 matrices, Γ1 is a p1 × q1 matrix,
Γ2 is a p2 × q2 matrix, e1i is a p1 vector of residuals, e2i is a p2 vector of residuals, and x is an
m vector, consisting of some or all exogenous variables Zji and another additional exogenous
variables, and δ is an m vector of parameters.
18
The residuals e1 , e2 and e follow a multivariate normal distribution with mean 0 and
covariance matrix


Σ11 Σ12 Σ1e
Σ22 Σ2e  .
Σ=
1
The estimation procedure is explained in Lee et al. (1980). We only mention that in the
first stage the equation (20) is estimated using a probit analysis, and then the reduced forms
of equations (21) and (22) are estimated using OLS with additional variables based on the
inverse Mills ratio. In the third stage the structural parameters are estimated using the second
stage of two-stage least squares (2SLS).
In spite of the complexity of this model, it fits our framework. It is straightforward to
show that the IF of the two-stage estimator of this model is unbounded. A robust estimator
consists of a robust probit on the first stage with a robust 2SLS. The robust probit is given in
Section 3.1, and the robustness of 2SLS has been studied in Zhelonkin et al. (2012) and Kim
and Muller (2007).
A
Appendix: Technical Derivations
A.1
IF of the Heckman’s two-stage estimator
Proof of Proposition 1. The Heckman’s estimator belongs to the class of two-stage M-estimators
and satisfies the conditions required in Zhelonkin et al. (2012). The IF of the general Mestimator has the following form:
IF (z; T, F ) = M (Ψ2 )−1 Ψ2 [(x2 , y2 ); h{(x1 , y1 ); S(F )}, T (F )]
Z
+
where M (Ψ2 ) = −
R
!
∂
∂
Ψ2 {(x2 , y2 ); θ, T (F )} h{(x1 , y1 ); η}dF · IF (z; S, F ) ,
∂θ
∂η
∂
Ψ [(x2 , y2 ); h{(x1 , y1 ); S(F )}, ξ]dF
∂ξ 2
(23)
and IF (z; S, F ) denotes the IF of
the first stage. Note that in (23) T and S denote general M-estimators.
To find the IF of the Heckman’s estimator we need to identify the derivatives in (23). The
19
h(·; ·) function is the inverse Mills ratio and its derivative with respect to β1 is:
2
−Φ(xT1 β1 )φ(xT1 β1 )xT1 β1 − φ(xT1 β1 ) T
∂
x1 .
λ{(x1 , y1 ); η} =
λ =
2
∂η
Φ(xT1 β1 )
0
(24)
The score function of the second stage is
Ψ2 [(x2 , y2 ); λ{(x1 , y1 ); S(F )}, T (F )] = (y2 −
xT2 β2
− λβλ )
x2
λ
y1 .
(25)
The M (Ψ2 ) matrix is given by
Z Z ∂
x2 xT2 λx2
− Ψ2 [(x2 , y2 ); λ{(x1 , y1 ); S(F )}, ξ] dF =
−
y1 dF.
λ2
λxT2
∂ξ
(26)
The derivative with respect to the second parameter of Ψ2 is
∂
Ψ2 {(x2 , y2 ); θ, T (F )} =
∂θ
0
y2 − xT2 β2 − λβλ
y1 −
x2
λ
βλ y1 .
(27)
After taking the expectation, the first term of the righthand side of (27) becomes zero. The
IF (z; S, F ) = M (Ψ1 )−1 Ψ1 {(x1 , y1 ); S(F )} is the IF of the MLE estimator for probit regression,
where:
Z M (Ψ1 ) =
φ(xT1 β1 )2
x1 xT1 dF,
T
T
Φ(x1 β1 ){1 − Φ(x1 β1 )}
Ψ1 ((x1 , y1 ); S(F )) ={y1 − Φ(xT1 β1 )}
φ(xT1 β1 )
x1 .
Φ(xT1 β1 ){1 − Φ(xT1 β1 )}
(28)
(29)
Inserting the expressions in formulas (24)-(29) in (23) we obtain the IF of Heckman’s estimator.
20
A.2
CVF of the Heckman’s two-stage estimator
Proof of Proposition 2. The expression for the CVF of the general two-stage M-estimator (Zhelonkin et al. 2012) is given by
Z
∂
CV F (z; S, T, F ) = V (T, F ) − M (Ψ2 )
DdF (z) + Ψ2 [z2 ; h{z1 ; S(F )}, θ] V (T, F )
∂θ
Z
+ M (Ψ2 )−1
Aa(z)T + Ba(z)T + Ab(z)T + Bb(z)T dF (z)M (Ψ2 )−1
Z
−1
+ M (Ψ2 )
a(z)AT + b(z)AT + a(z)B T + b(z)B T dF (z)M (Ψ2 )−1
−1
+ M (Ψ2 )−1 a(z)a(z)T + a(z)b(z)T + b(z)a(z)T + b(z)b(z)T M (Ψ2 )−1
Z
∂
DdF (z) + Ψ2 [z2 ; h{z1 ; S(F )}, θ] M (Ψ2 )−1 ,
− V (T, F )
(30)
∂θ
where D is a matrix with elements
Dij =
∂ ∂Ψ2i (z2 ; h, θ)
∂h
∂θj
T
∂h(z1 ; s)
IF (z; S, F ) +
∂s
∂ ∂Ψ2i (z2 ; h, θ)
∂θ
∂θj
T
IF (z; T, F ).
Here S and T denote general M-estimators. The vector a(z) denotes the score function of the
second stage and its derivative is given by
A=
∂
∂
∂
Ψ2 {z2 ; h, T (F )} h(z1 ; s)IF (z; S, F ) + Ψ2 [z2 ; h{z1 ; S(F )}, θ] · IF (z; T, F ).
∂h
∂s
∂θ
The vector b(z) =
R
∂
∂
Ψ (z ; θ, T (F )) ∂η
h(z1 ; η)dF (z)
∂θ 2 2
· IF (z; S, F ), and its derivative B has
the following form
Z
Z
∂
∂
B= R
h(z1 ; s)dF IF (z; S, F ) +
Ψ2 {z2 ; h, T (F )}R(2) dF IF (z; S, F )
∂s
∂h
Z
Z
∂
∂
∂
−1
(1)
−
Ψ2 {z2 ; h, T (F )} h(z1 ; s)dF M (Ψ1 )
D dF + Ψ1 (z1 ; θ) IF (z; S, F )
∂h
∂s
∂θ
Z
∂
∂
∂
+
Ψ2 {z2 ; h, T (F )} h(z1 ; s)dF M (Ψ1 )−1 Ψ1 (z1 ; θ)IF (z; S, F )
∂h
∂s
∂θ
∂
∂
+
Ψ2 {z2 ; h, T (F )} h(z1 ; s) · IF (z; S, F ),
(31)
∂h
∂s
(1)
where R(1) is the matrix with elements
(1)
Rij
∂ ∂Ψ2i {z2 ; h, T (F )}
=
∂h
∂hj
T
∂
h(z1 ; s)IF (z; S, F ) +
∂s
21
∂ ∂Ψ2i (z2 ; h, θ)
∂θ
∂hj
T
IF (z; T, F ),
(2)
R(2) is the matrix with elements Rij =
n
∂ ∂hi (z1 ;s)
∂s
∂sj
oT
IF (z; S, F ), and D(1) is an analog of the
matrix D, but reflects only the first estimation stage, such that the elements of the matrix are
T
(1)
∂ ∂Ψ1i
Dij = ∂θ
IF (z; S, F ).
∂θj
Now we can proceed with the computation of the CVF for the Heckman’s estimator. Recall
that the function h{z1 ; S(F )} is a scalar function λ(xT1 β1 ) and its derivative with respect to β1
is given by (24). The IF’s of T (F ) and S(F ) are given by (9) and (10), respectively. The M (Ψ1 )
and M (Ψ2 ) matrices are given by (28) and (26), respectively and the scores Ψ1 {z; S(F )} and
Ψ2 {z; λ, T (F )} are given by (29) and (25). The derivative of the score function with respect
to λ is given in the formula (27).
The second term of the matrix D is equal to zero, hence the D matrix for Heckman’s
estimator takes the form
DH =
0 x2
xT2 2λ
0
λ y1 IF (z; S, F ),
where 0 is a (p2 − 1) × (p2 − 1) matrix of zeroes.
Denote the analog of the A matrix for the Heckman’s estimator as AH . we obtain
0
x2 xT2 x2 λ
−x2 βλ
y1 IF (z; T, F ).
y1 λ IF (z; S, F ) +
AH =
xT2 λ λ2
y2 − xT2 β2 − 2λβλ
(1)
(2)
Next, we need to get the expression of BH . Define RH and RH corresponding to the R(1)
and R(2) matrices :
(1)
RH
=
0
−2βλ
0
y1 λ IF (z; S, F ) +
0
−x2
−x2 −2λ
y1 IF (z; T, F ),
where 0 in the first term is a (p2 − 1) vector, and in the second term 0 is a (p2 − 1) × (p2 − 1)
(2)
matrix. Let us compute RH :
∂ ∂ ∂ 0 (2)
RH =
λ = λ ∂ ∂s ∂ =0
=0
#T
"
T
T
−φ(x1 β1 )x1 β1 + Φ(xT1 β1 )(xT1 β1 )2 − Φ(xT1 β1 ) + 2φ(xT1 β1 ) φ(xT1 β1 )
=
x1 xT1 IF (z; S, F )
Φ(xT1 β1 )2
"
#T
2φ(xT1 β1 )2 −Φ(xT1 β1 )xT1 β1 − φ(xT1 β1 )
−
x1 xT1 IF (z; S, F ) .
Φ(xT1 β1 )3
22
The expression above is a p1 vector. The D(1) matrix for the probit estimator is:
∂ ∂
Ψ1 (z; θ)
Dprobit =
∂ ∂θ
=0
T
−φ(x1 β1 )2 xT1 β1
T
x IF (z; S, F ) x1 xT1
=
Φ(xT1 β1 ){1 − Φ(xT1 β1 )} 1
φ(xT1 β1 )3 {1 − 2Φ(xT1 β1 )} T
x IF (z; S, F ) x1 xT1 ,
−
Φ(xT1 β1 )2 {1 − Φ(xT1 β1 )}2 1
where we first take the derivative with respect to θ, and when we take the derivative with
respect to , we evaluate the second derivative with respect to θ at θ = S(F ). We defined all
the ingredients of the BH matrix, and we can use them in (31).
In order to obtain the expression of the CVF we need to insert the expressions obtained
T
are equal to zero, we
above into (30). By noting that the integrals of BH a(z)T and a(z)BH
finally obtain (13).
A.3
Assumptions and Proof of Proposition 3
T
T
R
. Assume the following conditions, which have
Denote ΨR (z; θ) = ΨR
1 (z; β1 ) , Ψ2 (z; β1 , β2 )
been adapted from Duncan (1987):
(i) z1 , . . . , zN is a sequence of independent identically distributed random vectors with distribution F defined on a space Z;
(ii) Θ is a compact subset of Rp1 +p2 ;
(iii)
R
ΨR (z; θ)dF = 0 has a unique solution, θ0 , in the interior of Θ;
(iv) ΨR (z; θ) and
∂
ΨR (z; θ)
∂θ
are measurable for each θ in Θ, continuous for each z in Z,
and there exist F -integrable functions ξ1 and ξ2 such that for all θ ∈ Θ and z ∈ Z
∂
|ΨR (z; θ)ΨR (z; θ)T | ≤ ξ1 and | ∂θ
ΨR (z; θ)| ≤ ξ2 ;
(v)
R
ΨR (z; θ)ΨR (z; θ)T dF is non-singular for each θ ∈ Θ;
(vi)
R
∂
ΨR (z; θ0 )dF
∂θ
is finite and non-singular.
23
Proof of Proposition 3. Consistency and asymptotic normality follow directly from Theorems
1-4 in Duncan (1987). The asymptotic variance consists of two terms, because aR (z)bR (z)T
and bR (z)aR (z)T vanish after integration, due to the independence of the error terms.
B
Appendix: R output for the numerical results
B.1
Classical two-stage estimator
-------------------------------------------------------------2-step Heckman / heckit estimation
Probit selection equation:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.71771
0.19247
-3.729 0.000195 ***
age
0.09732
0.02702
3.602 0.000320 ***
female
0.64421
0.06015
educ
0.07017
0.01134
6.186 6.94e-10 ***
blhisp
-0.37449
0.06175
-6.064 1.48e-09 ***
totchr
0.79352
0.07112
11.158
ins
0.18124
0.06259
10.710
< 2e-16 ***
< 2e-16 ***
2.896 0.003809 **
Outcome equation:
Estimate Std. Error t value Pr(>|t|)
(Intercept)
5.30257
0.29414
18.028
< 2e-16 ***
age
0.20212
0.02430
8.319
< 2e-16 ***
female
0.28916
0.07369
3.924 8.89e-05 ***
educ
0.01199
0.01168
1.026
blhisp
-0.18106
0.06585
-2.749
0.006 **
totchr
0.49833
0.04947
10.073
< 2e-16 ***
-0.04740
0.05315
-0.892
ins
24
0.305
0.373
Error terms:
Estimate Std. Error t value Pr(>|t|)
invMillsRatio
sigma
rho
-0.4802
0.2907
-1.652
0.0986 .
1.2932
NA
NA
NA
-0.3713
NA
NA
NA
--Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
--------------------------------------------------------------
B.2
Robust two-stage estimator
-------------------------------------------------------------Robust 2-step Heckman / heckit M-estimation
Probit selection equation:
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
(Intercept) -0.74914
0.19507
-3.840 0.000123 ***
age
0.10541
0.02770
3.806 0.000141 ***
female
0.68741
0.06226
educ
0.07012
0.01147
6.116 9.62e-10 ***
blhisp
-0.39775
0.06265
-6.349 2.17e-10 ***
totchr
0.83284
0.08028
10.374
ins
0.18256
0.06371
11.041
< 2e-16 ***
< 2e-16 ***
2.865 0.004166 **
Outcome equation:
Value
Std. Error t value
(Intercept)
5.40154
0.27673
19.5192 ***
age
0.20062
0.02451
8.1859 ***
female
0.25501
0.06992
3.6467 ***
25
educ
0.01325
0.01162
1.1405
blhisp
-0.15508
0.06507
-2.3835 *
totchr
0.48116
0.03822
12.5861 ***
ins
-0.06707
0.05159
-1.2999
IMR
-0.67676
0.25928
-2.6102 **
--Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
--------------------------------------------------------------
References
Ahn, H. and Powell, J. L. (1993), “Semiparametric Estimation of Censored Selection Models
with a Nonparametric Selection Mechanism,” Journal of Econometrics, 58, 3–29.
Amemiya, T. (1984), “Tobit Models: a Survey,” Journal of Econometrics, 24, 3–61.
Cameron, C. A. and Trivedi, P. K. (2009), Microeconometrics Using Stata, College Station,
TX: Stata Press.
Cantoni, E. and Ronchetti, E. (2001), “Robust Inference for Generalized Linear Models,” Journal of the American Statistical Association, 96, 1022–1030.
Das, M., Newey, W. K., and Vella, F. (2003), “Nonparametric Estimation of Sample Selection
Models,” Review of Economic Studies, 70, 33–58.
Duncan, G. M. (1987), “A Simplified Approach to M-Estimation with Application to TwoStage Estimators,” Journal of Econometrics, 34, 373–389.
Eicker, F. (1967), “Limit Theorems for Regression with Unequal and Dependent Errors,” in
Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability,
L.M. LeCam, J. Neyman (Eds.), Berkeley: University of California Press, 59–82.
Genton, M. G. and Rousseeuw, P. J. (1995), “The Change-of-Variance Function of MEstimators of Scale under General Contamination,” Journal of Computational and Applied
Mathematics, 64, 69–80.
Greene, W. H. (1981), “Sample Selection Bias as a Specification Error: Comment,” Econometrica, 49, 795–798.
Hampel, F. (1974), “The Influence Curve and Its Role in Robust Estimation,” Journal of the
American Statistical Association, 69, 383–393.
Hampel, F., Ronchetti, E. M., Rousseeuw, P. J., and Stahel, W. A. (1986), Robust Statistics:
The Approach Based on Influence Functions, New York: John Wiley and Sons.
26
Hampel, F., Rousseeuw, P. J., and Ronchetti, E. (1981), “The Change-of-Variance Curve and
Optimal Redescending M-Estimators,” Journal of the American Statistical Association, 76,
643–648.
Heckman, J. (1979), “Sample Selection Bias as a Specification Error,” Econometrica, 47, 153–
161.
Huber, P. J. (1967), “The Behavior of Maximum Likelihood Estimates under Nonstandard
Conditions,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics
and Probability, L.M. LeCam, J. Neyman (Eds.), Berkeley: University of California Press,
221–233.
— (1981), Robust Statistics, New York: John Wiley and Sons.
Huber, P. J. and Ronchetti, E. (2009), Robust Statistics, New York: John Wiley and Sons, 2nd
ed.
Kim, T.-K. and Muller, C. (2007), “Two-Stage Huber Estimation,” Journal of Statistical Planning and Inference, 137, 405–418.
Koenker, R. (2005), Quantile Regression, New York: Cambridge University Press.
La Vecchia, D., Ronchetti, E., and Trojani, F. (2012), “Higher-Order Infinitesimal Robustness,”
Journal of the American Statistical Association, to appear.
Lee, L.-F., Maddala, G., and Trost, R. (1980), “Asymptotic Covariance Matrices of Two-Stage
Probit and Two-Stage Tobit Methods for Simultaneous Equations Models With Selectivity,”
Econometrica, 48, 491–503.
Marchenko, Y. V. and Genton, M. G. (2012), “A Heckman Selection-t Model,” Journal of the
American Statistical Association, 107, 304–317.
Maronna, R. A., Martin, G. R., and Yohai, V. J. (2006), Robust Statistics: Theory and Methods,
Chichester: John Wiley and Sons.
Melino, A. (1982), “Testing for Sample Selection Bias,” Review of Economic Studies, XLIX,
151–153.
Newey, W. K. (2009), “Two-Step Series Estimation of Sample Selection Models,” Econometrics
Journal, 12, S217–S229.
Paarsch, H. J. (1984), “A Monte Carlo Comparison of Estimators for Censored Regression
Models,” Journal of Econometrics, 24, 197–213.
Peracchi, F. (1990), “Bounded-Influence Estimators for the Tobit Model,” Journal of Econometrics, 44, 107–126.
— (1991), “Robust M-Tests,” Econometric Theory, 7, 69–84.
Puhani, P. A. (2000), “The Heckman Correction for Sample Selection and Its Critique,” Journal
of Economic Surveys, 14, 53–68.
27
R Core Team (2012), R: A Language and Environment for Statistical Computing, R Foundation
for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0.
Ronchetti, E. and Trojani, F. (2001), “Robust Inference with GMM Estimators,” Journal of
Econometrics, 101, 37–69.
Salazar, L. (2008), “A Robustness Study of Heckman’s Model,” Master’s Thesis, University of
Geneva, Switzerland.
Stolzenberg, R. M. and Relles, D. A. (1997), “Tools for Intuition about Sample Selection Bias
and Its Correction,” American Sociological Review, 62, 494–507.
Toomet, O. and Henningsen, A. (2008), “Sample Selection Models in R: Package SampleSelection,” Journal of Statistical Software, 27, 1–23.
Vella, F. (1992), “Simple Tests for Sample Selection Bias in Censored and Discrete Choice
Models,” Journal of Applied Econometrics, 7, 413–421.
— (1998), “Estimating Models with Sample Selection Bias: A Survey,” The Journal of Human
Resources, 33, 127–169.
von Mises, R. (1947), “On the Asymptotic Distribution of Differentiable Statistical Functions,”
The Annals of Mathematical Statistics, 18, 309–348.
White, H. (1980), “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct
Test for Heteroskedasticity,” Econometrica, 48, 817–838.
Winship, C. and Mare, R. D. (1992), “Models for Sample Selection Bias,” Annual Review of
Sociology, 18, 327–350.
Zhelonkin, M., Genton, M. G., and Ronchetti, E. (2012), “On the Robustness of Two-Stage
Estimators,” Statistics & Probability Letters, 82, 726–732.
Zuehlke, T. W. and Zeman, A. R. (1991), “A Comparison of Two-Stage Estimators of Censored
Regression Models,” The Review of Economics and Statistics, 73, 185–188.
28
Table 1: Bias, Variance and MSE of the classical and robust probit MLE at the model and
under two types of contamination
N = 200
Classical
β10
β11
Robust
β10
β11
Not contaminated
Bias
Var
MSE
x1 is contaminated, y1 = 1
Bias
Var
MSE
x1 is contaminated, y1 = 0
Bias
Var
MSE
0.0053
0.0092
0.0100
0.0188
0.0101
0.0189
0.0524
−0.4458
0.0084
0.0566
0.0112
0.2553
−0.0439
−0.4522
0.0084
0.0520
0.0103
0.2565
0.0062
0.0106
0.0110
0.0270
0.0111
0.0272
0.0062
0.0111
0.0111
0.0272
0.0112
0.0273
0.0061
0.0124
0.0113
0.0281
0.0113
0.0282
Table 2: Bias, Variance and MSE of the classical and robust two-stage estimators at the model
and under two types of contamination
N = 200
Classical
β20
β21
β2λ
Robust probit + OLS
β20
β21
β2λ
Robust IMR
β20
β21
β2λ
Robust 2S
β20
β21
β2λ
Not contaminated
Bias
Var
MSE
x1 is contaminated, y1 = 1
Bias
Var
MSE
x1 is contaminated, y1 = 0
Bias
Var
MSE
−0.0061
0.0053
−0.0044
0.0273
0.0092
0.0526
0.0274
0.0092
0.0526
0.1307
−0.0055
−0.2837
0.0252
0.0094
0.0353
0.0424
0.0094
0.1159
−0.3384
0.0052
0.3878
0.1812
0.0093
0.2923
0.2957
0.0093
0.4427
−0.0089
0.0053
−0.0004
0.0282
0.0092
0.0558
0.0283
0.0092
0.0558
0.1924
−0.0038
−0.3760
0.0197
0.0094
0.0295
0.0567
0.0094
0.1709
−0.0090
0.0052
−0.0012
0.0284
0.0093
0.0561
0.0285
0.0093
0.0561
−0.0107
0.0055
0.0088
0.0294
0.0092
0.0594
0.0295
0.0092
0.0595
−0.0101
0.0056
0.0100
0.0263
0.0088
0.0549
0.0264
0.0088
0.0550
−0.0108
0.0053
0.0084
0.0296
0.0093
0.0601
0.0297
0.0093
0.0602
−0.0142
0.0044
0.0035
0.0312
0.0106
0.0677
0.0314
0.0106
0.0677
0.0189
0.0029
−0.0462
0.0277
0.0105
0.0553
0.0280
0.0105
0.0586
−0.0143
0.0040
0.0033
0.0314
0.0107
0.0681
0.0316
0.0107
0.0681
Figure 1: Influence of a single observation x11 on the sample selection bias (SSB) test. The
data generating process is described in Section 4.1 with the only difference that here ρ = 0.75.
The sample size is 200 and we generate 1000 replicates. The top panel represents the SSB test
statistic, the middle panel its log10 (p-values), and the bottom panel represents its empirical
power, i.e. the fraction of rejections of the null hypothesis among the 1000 replicates. On the
horizontal axis, x11 varies between 0 and −6 and the corresponding y11 = 1. The dashed line
in the top panel corresponds to 1.96, and in the middle panel to log10 (0.05).
Figure 2: The solid line corresponds to the classical inverse Mills ratio. The dotted lines
correspond to the robust inverse Mills ratio from Section 3.2.
Figure 3: Comparison of classical and robust two-stage estimators with contamination of x1 .
Top panel corresponds to y1 = 0, and bottom panel corresponds to y1 = 1. Case (a) corresponds
to the classical estimator, (b) corresponds to robust probit with OLS on the second stage, (c)
corresponds to robust IMR, and (d) to robust two-stage. Horizontal lines mark the true values
of the parameters.