As an editor and reviewer, I’m seeing more selection models (e.g., Heckman) these days that suffer from weak exclusion restrictions (i.e., weak instruments). Weak instruments are a problem in any method dealing with endogeneity where an instrumental variable serves as a proxy for random selection. Heckman selection models share a similar problem, and it has to do with the exclusion restriction (Bushway et al., 2007). Researchers employ a Heckman selection model to address omitted variable bias stemming from a specific sample selection problem. In the classic example, a model predicting the relationship between wages and education would only include in the sample those educated individuals who chose to work. The self-selection into the sample, for reasons unknown to the researcher, creates a type of omitted variable problem manifesting as endogeneity (Certo et al., 2016).
In the Heckman correction, the researcher estimates a first-stage probit model predicting the likelihood of the entity selecting into the sampling condition. For example, in a study of the effect of corporate venturing on firm performance, the first stage equation would be a probit model[^1] predicting the probability of the firm engaging in corporate venturing activity, in the following form:

$$\Pr(y = 1 \mid Z) = \operatorname{cdf}(Z\beta)$$
Here we estimate the probability of y occurring given a set of observed predictors, Z, with effects Beta (β), where cdf is the cumulative distribution function of the standard normal distribution. Heckman’s insight was to recognize that a transformation of the predicted values in the first stage represents the selection hazard of appearing in the sample (Clougherty et al., 2016). Including this transformation, often called the inverse Mills ratio, as a regressor in a second stage ordinary least squares estimate of the focal model of interest yields an estimate of the effect of the selection hazard, typically denoted by lambda. In our example, lambda would capture the selection hazard of the firm engaging in corporate venturing activity. Evaluating the statistical significance of lambda serves as a test for the presence of a meaningful selection effect in the second stage model.
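To make the first stage concrete, here is a minimal sketch of the two quantities described above: the probit selection probability and the inverse Mills ratio computed from the same linear index. The function names, the predictors, and the coefficient values are hypothetical and chosen only for illustration; in practice the coefficients would come from a fitted probit model.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def linear_index(z, beta):
    """Z * Beta for one observation (z and beta are equal-length lists)."""
    return sum(z_k * b_k for z_k, b_k in zip(z, beta))


def selection_probability(z, beta):
    """First stage probit: Pr(y = 1 | Z) = cdf(Z * Beta)."""
    return STD_NORMAL.cdf(linear_index(z, beta))


def inverse_mills_ratio(z, beta):
    """Selection hazard: standard normal pdf over cdf at the probit index."""
    index = linear_index(z, beta)
    return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)


# Hypothetical first stage coefficients: intercept, firm size, R&D intensity.
beta_hat = [0.2, 0.5, -0.3]
# Hypothetical firm (the leading 1.0 multiplies the intercept).
firm = [1.0, 1.2, 0.4]

print("Pr(corporate venturing):", round(selection_probability(firm, beta_hat), 4))
print("inverse Mills ratio:   ", round(inverse_mills_ratio(firm, beta_hat), 4))
```

The inverse Mills ratio shrinks as the predicted probability of selection rises, so firms that were very likely to select in carry a small selection hazard into the second stage.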
Drawing the correct inference about selection effects in the second stage model depends, though, on two critical factors. The first factor is that while including the inverse Mills ratio in the second stage equation yields a consistent estimate of the effect of x (assuming all other assumptions are met), it also yields inconsistent standard errors for every estimated parameter (Clougherty et al., 2016). There are several methods to correct the standard errors in the second stage, including manual matrix manipulation, but most selection estimators (e.g., sampleSelection in R and heckman in Stata) make this correction automatically. The concern is whether the researcher used one of these estimators, or simply calculated the inverse Mills ratio by hand, included it as another regressor in a second model, and left the standard errors uncorrected.
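One way to see part of why the uncorrected standard errors go wrong is a standard textbook result for the bivariate normal case: among selected observations, the second stage error variance is sigma^2 * (1 - rho^2 * delta), where delta = lambda * (lambda + index) depends on each observation's own first stage index. The sketch below only illustrates that the variance differs across observations. The values of sigma and rho and the function names are assumptions for illustration, and this is not the full correction a selection estimator applies, which also accounts for the inverse Mills ratio being an estimated regressor.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def inverse_mills_ratio(index):
    """Standard normal pdf over cdf, evaluated at the first stage index."""
    return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)


def variance_shrinkage(index):
    """delta = lambda * (lambda + index); lies strictly between 0 and 1."""
    lam = inverse_mills_ratio(index)
    return lam * (lam + index)


def conditional_error_variance(index, sigma=1.0, rho=0.5):
    """Second stage error variance for a selected observation.

    sigma (error sd) and rho (correlation between the first and second
    stage errors) are hypothetical values chosen for illustration.
    """
    return sigma ** 2 * (1.0 - rho ** 2 * variance_shrinkage(index))


# The error variance changes with the first stage index, so the second
# stage errors are heteroskedastic whenever rho is not zero.
for index in (-1.0, 0.0, 1.0, 2.0):
    print(index, round(conditional_error_variance(index), 4))
```

Because the variance changes from one observation to the next, the usual OLS standard error formula, which assumes a constant error variance, does not apply to a second stage built by hand.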
The second factor is high collinearity between the inverse Mills ratio and the other predictors in the second stage equation. When the first and second stage equations share the same vector of predictors, the transformed predicted value from the first stage correlates strongly with the predictors in the second stage. As in any multiple regression model, high collinearity yields unstable and imprecise estimates. The solution is generally to include one or more additional predictors in the first stage that are then excluded from the second stage. Akin to instrumental variables, these predictors should influence selection into the sample (the first stage), but have no relationship to the ultimate disturbance term in the second stage (Certo et al., 2016). Failing to include these exclusion restriction variables, using weak exclusion variables, or using exclusion variables that are themselves endogenous will yield unreliable estimates in the second stage equation.
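The collinearity problem can be illustrated with a small simulation. This is a sketch under made-up assumptions: a single second stage predictor x, a hypothetical excluded variable w, and arbitrary first stage coefficients. It compares how strongly the inverse Mills ratio correlates with x when the first stage uses only x versus when it also includes w.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def inverse_mills_ratio(index):
    """Standard normal pdf over cdf, evaluated at the first stage index."""
    return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]  # predictor in both stages
w = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical excluded variable

# First stage index without an exclusion restriction (made-up coefficients).
imr_shared = [inverse_mills_ratio(0.5 + 1.0 * xi) for xi in x]
# First stage index that also includes the excluded variable w.
imr_excluded = [inverse_mills_ratio(0.5 + 1.0 * xi + 1.0 * wi)
                for xi, wi in zip(x, w)]

corr_shared = pearson(x, imr_shared)
corr_excluded = pearson(x, imr_excluded)
print("corr(x, IMR), same predictors in both stages:", round(corr_shared, 3))
print("corr(x, IMR), with an excluded variable:     ", round(corr_excluded, 3))
```

In this setup the excluded variable gives the inverse Mills ratio variation that x cannot explain, which is what lets the second stage separate the selection effect from the effect of x. A weak exclusion variable (a coefficient on w near zero) would leave the two correlations nearly identical.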
Given the difficulties inherent to properly specifying selection models, and given that the selection hazard parameter (lambda) only deals with endogeneity arising specifically from sample selection, many scholars (myself included) recommend using endogeneity correction approaches that deal with selection along with other omitted variable concerns simultaneously (e.g., 2SLS, regression discontinuity, and so forth). The bottom line, though, is that just as with any instrumental variable method, the quality of the second stage model is predicated on the quality of the first stage model.
[^1]: While researchers often use logit and probit interchangeably, the Heckman method is a case where the researcher must use a probit model in the first stage equation. The reason is the difference in distributional assumptions between the two models: the Heckman method depends on the assumption of bivariate normality of the first and second stage errors, which is consistent only with a probit first stage, where the error is normally distributed.