A rigorous way to test (and to think about) mediation

ENT5587B - Research Design & Theory Testing II

Brian S. Anderson, Ph.D.
Assistant Professor
Department of Global Entrepreneurship & Innovation

© 2017 Brian S. Anderson

  • How goes it?
  • IRB update
  • Paper reviews
  • Experimental design and mediation
  • Mediation exercise
  • Lab 23 March – Mediation assessment

Causal mediation model

“Once more into the breach, dear friends…”

As may be apparent from today’s reading, there is a substantial difference in the methodological requirements to estimate a causal mediation model from a simple indirect effect model.

My personal opinion—methodological awakening if you will—is that mediation models are enormously helpful and really help us to uncover just what is going on in a given \(x\) —> \(y\) relationship.

But such inference is only useful if we can eliminate alternate explanations for the indirect effect.

That means, effectively, that we must be able to derive a causal estimate of the indirect effect. This is, as you now know, exceptionally difficult.

But certainly doable. We—and by we, I mean management in general and entrepreneurship in particular—need to take causality seriously.

To date, cough cough cough, we haven’t done the best job at that. But, times are changing!

To understand causality in mediation designs, it’s helpful to understand the two most common ways to test (not to explain, that’s different) indirect effects.

  • Measurement-of-mediation
  • Manipulation-of-mediation

The measurement of mediation approach is by far the most common in psychology, management, etc.

Under this approach, the mediator is observed as an outcome variable of \(x\), and then modeled as a predictor of \(y\).

Remember, the three rules of causal inference still apply, regardless of whatever it is that we are modeling. For example, if the mediator is measured at the same time as \(y\), then we violate temporal precedence and hence, no causal inference.

Leave aside for a minute whether study participants were randomly assigned to \(x\), and just focus on \(m\).

The mediator is an observed variable in this approach, irrespective of whether it is temporally separated from \(y\). Given this reality, whether the study was done in a lab or not, what is the threat to causal inference?

Pesky omitted variables again!

This is where the manipulation of mediation approach comes is.

Under manipulation of mediation, study participants are randomly assigned to levels of \(m\). From the counterfactual framework, if we have randomization into the treatment condition, then we can estimate the average causal effect (ACE) of \(m\) on \(y\).

Sounds conceptually easy, right? In practice, not so much.

Lets get into some data and start figuring this out.

my.ds <- read_csv("http://a.web.umkc.edu/andersonbri/CausalMediation.csv")
my.df <- as.data.frame(my.ds)

In this data, we’ve got a dichotomous \(x\), indicating a treatment condition. We’ve got a continuous \(m\), and a continuous \(y\).

We’re going to make the assumption that we’ve conducted an experiment in which we’ve manipulated \(x\), but then measured \(m\) and \(y\)

\(m\) is our theoretical mechanism, but it’s an observed (non-manipulated) measure.

Lets revisit the counterfactual.

For any individual, \(i\), we can derive the causal effect of \(x\) on \(y\) as simply the difference in y between the treatment and control condition.

\(y_i(t) - y_i(c)\)

Except that we can’t do that. Why?

This is the essence of the counterfactual. A given individual can’t have experienced both treatment and control.

So, we estimate the average causal effect (ACE) based on a large group of people, some of whom received the treatment and some of whom didn’t. When we write the ACE, we use the \(E\) notation, for expectation, or the mean.

\(ACE = E(y(t) - y(c))\)

So far so good?

Now, here’s the kicker—although we won’t go through all of the details on this—the ability to draw causal inference from this formula depends on the assumption that assignment of \(i\) to one of the treatment conditions is statistically independent from the observed outcome.

In plain English, this means that there were no measured or unmeasured factors that influenced whether or not a given \(i\) received \(x\) = 1 or \(x\) = 0.

When we write this, we add an up tack symbol to our equation denoting that we are assuming random assignment into a treatment condition:

\(ACE = E(y(t) - y(c))\bot{T}\)

When this happens, a given \(i\) has the same probability of being in the \(x\) = 1 or \(x\) = 0 condition before the observation of \(y\) occurs.

If this probability is true, then we can determine the true causal effect of \(x\) on \(y\) by subtracting the difference between the mean value of \(y\) in the treatment group from the mean value of \(y\) in the control group:

\(ACE = E(y|T = 1) - E(y|T = 0)\)

Make sense?

Now lets go back to our mediation model:

Causal mediation model

We represent the classic mediation model with the following two equations:

Eq 1: \(m =\alpha+\beta{x}+\epsilon\)

Eq 2: \(y =\alpha+\beta{x}+\beta{m}+\epsilon\)

In a mediation model, we have two dependent variables, \(y\) and \(m\). That means we have two different causal paths to estimate: \(x\)—>\(m\) and \(m\)—>\(y\).

The same rules for causal inference apply. If we’ve randomly assigned \(i\)’s to treatment conditions for \(x\), then we can estimate the causal effect of \(x\) on \(m\). But if we don’t randomly assign \(i\)’s to treatment conditions for \(m\), then we violate the \(ACE = E(y(t) - y(c))\bot{T}\) assumption and we can’t infer a causal effect of \(m\) on \(y\).

Lack of randomization creates an omitted variable problem, and hence the assumption that \(m\) is endogenous to \(y\). Any indirect effect—\(ab\)—calculated from such a model would be incorrect.

Lets go back to our data. When I created this dataset, I put in an omitted variable, \(z\), between \(m\) and \(y\). There is no omitted variable between \(x\) and \(m\), because we randomly assigned \(i\)’s to \(x\).

Here’s the correct model:

mediation.model <- 'm ~ a * x  # a path
                    y ~ b * m  # b path

                    # Correct omitted variable paths
                    m ~ z
                    y ~ z
                    # Specified parameters
                    ab := a*b'
mediation.fit <- sem(mediation.model, data = my.df)

##    lhs op rhs label    est    se      z pvalue ci.lower ci.upper
## 1    m  ~   x     a  0.626 0.022 27.872      0    0.582    0.671
## 2    y  ~   m     b  0.512 0.009 60.101      0    0.495    0.529
## 3    m  ~   z        0.601 0.011 52.981      0    0.578    0.623
## 4    y  ~   z        0.581 0.011 51.608      0    0.559    0.603
## 5    m ~~   m        1.263 0.018 70.711      0    1.228    1.298
## 6    y ~~   y        0.988 0.014 70.711      0    0.961    1.016
## 7    x ~~   x        0.250 0.000     NA     NA    0.250    0.250
## 8    x ~~   z       -0.003 0.000     NA     NA   -0.003   -0.003
## 9    z ~~   z        0.983 0.000     NA     NA    0.983    0.983
## 10  ab := a*b    ab  0.321 0.013 25.285      0    0.296    0.346
fitmeasures(mediation.fit, c("chisq", "df", "pvalue"))
##  chisq     df pvalue 
##  0.086  1.000  0.769

The correct indirect effect of \(x\) on \(y\) through \(m\) is .321.

Now, lets estimate a model without \(z\)—the naive model (and the one that most researchers, including experimental psychologists typically use).

mediation.ov.model <- 'm ~ a * x  # a path
                       y ~ b * m  # b path
                       # Specified parameters
                       ab := a*b'
mediation.ov.fit <- sem(mediation.ov.model, data = my.df)

##   lhs op rhs label   est    se      z pvalue ci.lower ci.upper
## 1   m  ~   x     a 0.618 0.025 24.308      0    0.568    0.668
## 2   y  ~   m     b 0.712 0.009 83.239      0    0.695    0.728
## 3   m ~~   m       1.618 0.023 70.711      0    1.573    1.662
## 4   y ~~   y       1.252 0.018 70.711      0    1.217    1.286
## 5   x ~~   x       0.250 0.000     NA     NA    0.250    0.250
## 6  ab := a*b    ab 0.440 0.019 23.333      0    0.403    0.477
fitmeasures(mediation.ov.fit, c("chisq", "df", "pvalue"))
##  chisq     df pvalue 
## 39.652  1.000  0.000

Notice that our estimate of the \(a\) path is fairly similar between the two models (\(\beta\)=.626 v. \(\beta\)=.618), as we would expect. The estimate of the \(b\) path is substantially different (\(\beta\)=.512 v. \(\beta\)=.712). The resulting \(ab\) is also quite a bit different in this model (\(ab\)=.440).

Yes, this model is misspecified (\(\chi\)2=39.652, p<.001, df=1).

Unfortunately too many researchers ignore the \(\chi\)2. Which is really, really, really unfortunate.

Side note…

So we know that our naive model is wrong, but is it substantively wrong?

Here’s the big question—would we have drawn an incorrect (Type I or Type II) nomological conclusion based on failing to account for \(z\) in our model?

If not, what’s the big deal about trying to account for omitted variables (endogeneity)? Seems like a lot of trouble for not a lot of gain…

What’s the fundamental error with my logic?

Remember, with omitted variable bias, the parameter estimate will be inconsistent. We don’t know the way in which the inconsistency manifests—could be positive, could be negative.

Hence, we ALWAYS must account for endogeneity before drawing nomological conclusions, and NEVER EVER EVER EVER EVER after the fact. Ever. I mean it.

So the simplest approach to addressing omitted variable bias in the case where the mediator is biased is to instrument \(m\) and use a 2SLS approach. I happen to have two such instruments in the dataset.

mediation.iv.model <- 'm ~ a * x  # a path
                       y ~ b * m  # b path

                       # Instrument variable paths
                       m ~ I1 + I2

                       # Specify the error term covariance
                       m ~~ y

                       ab := a*b'
mediation.iv.fit <- sem(mediation.iv.model, data = my.df)

##    lhs op rhs label    est    se      z pvalue ci.lower ci.upper
## 1    m  ~   x     a  0.633 0.023 27.664      0    0.588    0.678
## 2    y  ~   m     b  0.512 0.020 26.027      0    0.473    0.550
## 3    m  ~  I1        0.348 0.012 29.752      0    0.325    0.371
## 4    m  ~  I2        0.355 0.011 31.172      0    0.333    0.378
## 5    m ~~   y        0.342 0.030 11.274      0    0.283    0.402
## 6    m ~~   m        1.372 0.019 70.711      0    1.334    1.410
## 7    y ~~   y        1.320 0.023 57.365      0    1.275    1.365
## 8    x ~~   x        0.250 0.000     NA     NA    0.250    0.250
## 9    x ~~  I1        0.000 0.000     NA     NA    0.000    0.000
## 10   x ~~  I2       -0.008 0.000     NA     NA   -0.008   -0.008
## 11  I1 ~~  I1        0.958 0.000     NA     NA    0.958    0.958
## 12  I1 ~~  I2        0.002 0.000     NA     NA    0.002    0.002
## 13  I2 ~~  I2        1.013 0.000     NA     NA    1.013    1.013
## 14  ab := a*b    ab  0.324 0.018 17.749      0    0.288    0.360
fitmeasures(mediation.iv.fit, c("chisq", "df", "pvalue"))
##  chisq     df pvalue 
##  2.793  2.000  0.247

Ah, much better. Our \(\chi\)2 statistic looks good (\(\chi\)2=2.793, p=.247, df=2). Again, our \(a\) path didn’t change much, but \(b\) is now back in line with the correct model (\(\beta\)=.512 v. \(\beta\)=.512). Importantly, our \(ab\) parameter is also very similar to the correct model (\(ab\)=.324).

All is right with the world. Long live 2SLS!

Ok, sometimes instruments aren’t feasible, available, or they just plain don’t work well (dooh!).

So, the Imai et al. (2010) paper that you read introduces the concept of sensitivity analysis, using a Bayesian (yeah!) Monte Carlo algorithm to probe just how big the omitted variable problem (specifically, the covariance between \(\zeta{m}\) and \(\zeta{y}\)) would need to be to yield a nomologically incorrect estimation of the indirect effect.

What’s really cool about the methodology is that the logic of the paper extends to basically any construction of \(x\), \(m\), or \(y\).

Lets revisit the estimation for the causal effect of \(x\) on \(y\).

\(ACE = E(y|T = 1) - E(y|T = 0)\)

In a mediation model though, we’re really interested in the indirect effect of \(x\) on \(y\) that passes through \(m\).

Lets just assume that the mediator can take on 1 of 2 possible values (say 0 and 1). For each value of \(x\)—the treatment—we can estimate the average causal indirect effect of \(x\) on \(y\) for each value of \(m\):

\(ACIE(T=1) = E(y(T(M=1)) - y(T(M=0)))\)


\(ACIE(T=0) = E(y(T(M=1)) - y(T(M=0)))\)

Again key to making causal inference from either of these conditions is the assumption that no confounding factors influenced assignment into \(x\) or \(m\). With a measured \(m\) though, we violate that assumption.

As we’ve said, instruments are the cleanest way to establish conditional ignorability for \(m\). In the paper, an alternate approach is sequential ignorability. This is basically the assumption that randomization into \(x\) AND the collection (and modeling) of pretreatment covariates are such that while we know we don’t get ignorability, we get close enough to establish causal inference.

The simulation methodology is a way to test the assumption of sequential ignorability.

Lets set it up…

# We first set up our two regression equations
mediator.model <- lm(m ~ x, data = my.df)
outcome.model <- lm(y ~ x + m, data = my.df)
# Now we specify the full model using the 'mediate' function
mediation.si.fit <- mediate(mediator.model, outcome.model, treat = "x", 
                            mediator = "m", robustSE = TRUE, sims = 1000)

Now lets take a look…

## Causal Mediation Analysis 
## Quasi-Bayesian Confidence Intervals
##                Estimate 95% CI Lower 95% CI Upper p-value
## ACME              0.448        0.409        0.487       0
## ADE              -0.144       -0.187       -0.099       0
## Total Effect      0.304        0.250        0.361       0
## Prop. Mediated    1.480        1.286        1.732       0
## Sample Size Used: 10000 
## Simulations: 1000

Ok, now we know from our correct model that the ab path isn’t accurate, but in the real world, we wouldn’t know this. We can also see that the ab estimate here is also very close to the naive SEM model (.448 v. .440).

We can also plot these results…


We’re really after testing the assumption of sequential ignorability. For that, we need to run some simulations using the medsens function.

sensitivity.fit <- medsens(mediation.si.fit, rho.by = 0.1, 
                           effect.type = "indirect", sims = 100)

Lets walk through the logic before we get to the output…

## Mediation Sensitivity Analysis for Average Causal Mediation Effect
## Rho at which ACME = 0: 0.6
## R^2_M*R^2_Y* at which ACME = 0: 0.36
## R^2_M~R^2_Y~ at which ACME = 0: 0.2

The basic logic is to simulate what the value of the correlation (\(\rho\)) between \(\zeta{m}\) and \(\zeta{y}\) would need to be in order to have observed no statistically significant indirect effect (ACME = 0).

In this model, \(\rho\) = .6. The higher the value, according to the paper, the higher the threshold if you will for an omitted variable to be materially influencing the results.

We can also take a look at this visually…

plot(sensitivity.fit, sens.par = "rho", main = "Mediator")

Now, I think this is a terrific approach, but it’s not better than 2SLS, ASSUMING you have individually and jointly valid instruments, the instruments are strong, and they are properly excluded.

Now, that said, I think sensitivity analysis is a very helpful technique, and certainly should be integrated into an instrumental variable framework.

In this model, the sensitivity analysis pointed to a pretty high value of \(\rho\), but we know from our correct model that the estimated indirect effect is pretty far off.

Again, we’re back to the substantively different versus numerically different distinction. You know where I stand :)

We’re not going to go through it today, but the mediate package can handle a wide range of parametric and non-parametric estimators, which makes it very handy.

Lets talk about manipulation of the mediator research designs.

Here, we want to randomize assignment into both \(x\) and \(m\). The logic is straightforward—if I randomize into both predictors, then I can estimate the causal indirect effect.

There are a few ways to do this, all of which have limitations, and some of which are downright hard.

The one I want to focus on today is the parallel design.

In the parallel design, you conduct two experiments in parallel. In experiment one, you use a measurement of mediator approach, randomly assigning \(i\)’s to treatment conditions for \(x\), but then measuring \(m\) and \(y\). Analysis of the data proceeds as before.

In the second experiment, you randomly assign \(i\)’s to treatment conditions of \(x\), and then a subset of each group is randomly assigned to a mediator condition.

The logic is a little tricky, and it carries some drawbacks…

Also, it’s not often that the researcher can actually manipulate the mediator. So one possibility is to use an encouragement to have a given \(i\) take on a particular value of the mediator.

Lets go back to our example from last week…

A popular argument today is that participating in incubators/accelerators can improve the probability of a nascent venture’s successful start. A key mechanism for this effect is the positive social and professional interactions with other founder teams that happen as a function of being in the incubator/accelerator.

In our perfect world, we would randomly assign startups to either be in an incubator or not (our \(x\) variable). But after that, we can’t feasibly stop people from engaging in any social and professional interactions (necessary for \(m\) to equal 0).

But we can encourage selection…

For those in the incubators, we could have three different incubator setups. For example, in the first one, we set up the space to minimize interactions, like having all closed door offices. We could also have no shared spaces (like break rooms or a conference room), and send a message that we really want the startup teams to work independently.

So we can’t stop the startup teams from socializing, but we have made it difficult to do so.

Similarly, in another group of incubators, we could have a completely open floor plan, lots of shared spaces, lots of social events, and send the message that we want a lot of interaction among startup teams. Of course, we can’t stop the private/anti-social folks, but we made it harder.

Then we’ve got a third group in the incubators who receive no encouragement—we try our best to be ‘neutral’ in our physical space setup and our message about social/professional interaction.

We could use a similar approach for startups not in incubators.

Now, you might be thinking that the social/professional encouragement could also be a moderator of the \(x\) —> \(y\) relationship.

And you would be right; it’s very likely. However, for an encouragement design to work (i.e., to be identified), we make the assumption that there is no interaction between \(x\) and \(m\). We’ll be talking more about this after Spring Break.

We can then estimate the causal indirect effect for both experiments and make a comparison about the validity of both inferences.

The mediation package has a set of built in tools to analyze parallel encouragement design data, and I encourage you to take a look. It is though beyond our scope today.

Ok, yes, this is hard to do. But, mediation is so valuable—and so often poorly done—that it’s in our (science) best interest to do this better.

So debrief…What’s still bugging you?


Lab 23 March – Mediation assessment

Seminar 3 April – Conditional indirect effect models