Mediation Basics

ENT5587B - Research Design & Theory Testing II

Brian S. Anderson, Ph.D.
Assistant Professor
Department of Global Entrepreneurship & Innovation

© 2017 Brian S. Anderson

  • How goes it?
  • IRB update
  • Paper meetings
  • The curse of Baron & Kenny
  • Mediation with observed variables
  • Bootstrap analysis
  • Lab 9 March – Mediation Assessment

Why Baron and Kenny (1986) did so much to help clarify mediation analysis, and how misuse of their method resulted in, well, some of the worst science in our field…

Traditional mediation model

This is the classical graphical depiction for mediation, drawn from Baron and Kenny (1986). Let's walk through the logic.

There is ample discussion of just what a mediator is, but there are generally two schools of thought. Both of them are really talking about the same thing, but theory construction is subtly different.

The first perspective is that a mediator is an intervening mechanism connecting X to Y. By mechanism, I mean the theoretical device that explains why Y changes as a function of a change in X.

For example, a firm’s entrepreneurial behaviors (Y) increase as a function of prior firm growth (X) because the firm’s knowledge base (M) expands as a result of positive growth. A larger knowledge base is the impetus for future opportunity exploitation through the firm’s entrepreneurial behaviors.

By the way, I don’t care for this perspective, even though I’ve done it. A lot.

Two challenges I have with the mechanism perspective…

  • The problem of the infinitesimal
  • Theoretical specificity (or lack thereof)

The second perspective is the causal chain approach. X causes a change in M, which in turn causes a change in Y.

We can use our earlier example and talk about the growth story in the same way.

Firm growth (X) causes the firm’s knowledge base (M) to expand. This expansion illuminates new opportunities for value creation using the firm’s existing and newly acquired resources. The presence of new value creating opportunities is a key antecedent to enacting entrepreneurial behaviors (Y).

Ok, while I like this perspective better, we still have challenges. Let's go back to my logic and lay out all of the steps necessary to specify this causal chain…

Oh, I’ve missed something really important in my logic. What is it?

Go ahead, model that well.

We also have a theoretical specificity issue with the causal chain perspective—what would it be?

Yep, that pesky eliminating alternate causes problem.

As we will explore, modeling mediation is frighteningly simple. Modeling mediation well is, well, pretty darn hard.

I think causal mediation models offer tremendous opportunity for knowledge creation in our field. I also think that while difficult to model well, the process for causal mediation modeling is generally easier than that for moderation, because endogeneity is easier to address in a mediation model.

Yes, I’m serious.

Let's go back and revisit Baron and Kenny…

Traditional mediation model

You can break down this picture into a series of equations…

\(m = \alpha_1 + \beta_1 x + \epsilon_1\)

\(y = \alpha_2 + \beta_2 x + \beta_3 m + \epsilon_2\)

There is technically another equation in the B&K method, but we’re going to ignore that because, well, they were wrong.

Why are they wrong you say? Take a look at this graphic again…

Traditional mediation model

If the path (called the \(c'\) path) from X to Y remains statistically significant in the presence of M, why would this model always be misspecified?

Infinitesimals. Annoying little things.

So depending on who you ask, the correct way to graphically depict a mediation model is like this…

Causal mediation model

Note though that we still use the same equations to specify our model. Why? Hint: what would happen if we dropped \(x\) from the second equation?

\(m = \alpha_1 + \beta_1 x + \epsilon_1\)

\(y = \alpha_2 + \beta_2 x + \beta_3 m + \epsilon_2\)

So here’s the kicker…

Our theory—whether mechanism or causal chain—specifies these equations as a system. The two equations do not exist independently of each other: \(m\) is the outcome in the first equation, and that same variable is a predictor in the second.

This means that to evaluate our model, we must use an estimator that allows for simultaneous equations.

Failing to take into consideration the joint dependence of the error terms (because the two equations share some of the same variables, their errors can be correlated) will yield biased coefficient estimates.

All the time.

This means that you can safely ignore mediation studies on observational data—and even some experimental designs—that test for mediation using separately specified equations. Why?

Now, layer in the concern with alternate causes, omitted variables, measurement error, selection effects, simultaneity, and all of the other threats to causal inference, and it’s easy to see that most mediation studies using observational data are, well, not so good.
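A quick simulation makes the point concrete. Everything below is made up for the illustration: we give M and Y an unmodeled common cause, estimate the second equation on its own, and watch the \(b\) estimate come out badly biased:

```r
# Illustrative simulation (made-up data): an unmodeled common cause of m and y
# biases the b path when the second equation is estimated on its own.
set.seed(42)
n <- 10000
u <- rnorm(n)                  # unmodeled common cause of m and y
x <- rnorm(n)
m <- 0.5 * x + u + rnorm(n)    # true a = 0.5
y <- 0.5 * m + u + rnorm(n)    # true b = 0.5; no direct x -> y path
coef(lm(y ~ x + m))["m"]       # comes out near 1.0 -- double the true value
```

No number of additional observations fixes this; the bias comes from \(u\), not from sampling error.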

So I’m not going to bother teaching you the B&K method, and I’m only going to briefly touch on the bootstrapping method that you read about today. While I like bootstrapping, I don’t think it buys you much over a 2SLS estimator with robust standard errors…

The reason is that I agree with Antonakis et al. (2010) that there is never a reason to specify a mediation model, particularly with observational data, that does not account for endogeneity.

Before we get to instruments though, let's start from the top.

Take another look at our model…

Causal mediation model

The X –> M path is usually termed \(a\). The M –> Y path is termed \(b\).

In a mediation model, we are interested in the indirect effect of X on Y that passes through M. Whether M is the mechanism or the causal chain, the ‘effect’ of X on Y must transfer through M.

You can’t get a change in Y as a function of a change in X without also having an intermediary change in M. Make sense?

We calculate the indirect effect by multiplying \(a\) by \(b\).

What’s so important about multiplying these values together?

Remember, a mediation model suggests that M is the critical piece connecting X and Y together.

So an easy way to think about the importance of the \(ab\) path is to consider effect sizes. Take a small effect of \(a\), say \(\beta\) = .2. Now imagine another small effect of \(b\), again \(\beta\) = .2. Both of these paths are statistically significant.

But when you look at the indirect effect of X on Y through M, you’re only talking about an effect of .04 (.2 * .2). That’s not very big, and may not be statistically significant.

How would we interpret this kind of result?

One way to think about it is that X –> M is important, and M –> Y is important, but X –> M –> Y isn’t all that important.

M isn’t the critical piece we thought it would be to connect X and Y together. M is still in the nomological network for X and Y, but it does not meaningfully connect the two.

A (brief) note about the Sobel test and the proportion of effect mediated…

Back in the day, you would run two (or three in the way back in the day) different regressions, grab the coefficient estimates for \(a\) and \(b\), multiply them together, and construct what’s called the Sobel test to evaluate the strength and statistical significance of the indirect effect.

The problem is that the test assumes the sampling distribution of \(ab\) is normal, which it almost never is (remember our normality discussion from moderation?). So Sobel, while still found in our literature—unfortunately—isn’t a valid approach.
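For completeness, the Sobel statistic is just \(ab\) divided by its estimated standard error, \(\sqrt{b^2 s_a^2 + a^2 s_b^2}\). A quick sketch with made-up coefficient estimates and standard errors:

```r
# Sobel test sketch; a, b, and their standard errors are made-up values
a <- 0.30; se_a <- 0.07
b <- 0.45; se_b <- 0.06
sobel_se <- sqrt(b^2 * se_a^2 + a^2 * se_b^2)
z <- (a * b) / sobel_se   # about 3.72 with these numbers
```

Note the denominator is just the delta-method standard error of \(ab\); it is the normality assumption on \(ab\) itself, not the arithmetic, that makes the test unreliable.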

Also back in the day, folks would calculate the ratio of the \(ab\) path to the ‘total’ effect. The total effect equals \(c' + ab\), with \(c'\) being the effect of X on Y controlling for the presence of M. You will sometimes see the total effect written as just \(c\).

The logic is that for a mediation effect to be meaningful, the indirect effect should account for proportionally more variance in Y than the effect of X on Y absent M.

But we don’t think mediation in those terms anymore, because of the fallacy of partial mediation.

Quick quiz…why do we not care AT ALL about \(c\) or \(c'\)?

Ok, that’s not entirely true. Observing a statistically significant \(c'\) path, if we included it, does tell us that we have an omitted variable problem—why?

So we can’t use independent regression equations, and we can’t use the Sobel method, what’s a researcher to do?

Well, the choice of modeling approach may vary depending on whether you have an observational design or an experimental design.

Today we’re going to tackle the observational design, and next week the experimental design.

Training time out.

What are the three necessary and sufficient conditions to establish causality?

What is the condition that we NEVER have in an observational design?

So what must we do then when modeling observational data?

So we need an estimator that can estimate simultaneous equations, AND incorporate instrumental variables into the model.

There are a couple of options, including Three Stage Least Squares, which you will sometimes see in our literature.
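As a sketch of what a 3SLS mediation specification could look like, here is the systemfit package; note that z1 and z2 are hypothetical instruments, not variables in the class dataset:

```r
# Hypothetical 3SLS sketch with the systemfit package.
# z1 and z2 are placeholder instruments -- you would need real exclusions.
library(systemfit)

eq.m <- Innovativeness ~ LRP
eq.y <- RiskTaking ~ LRP + Innovativeness
fit.3sls <- systemfit(list(innov = eq.m, risk = eq.y),
                      method = "3SLS",
                      inst = ~ LRP + z1 + z2,
                      data = my.df)
summary(fit.3sls)
```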

My preferred option though is SEM. Yes, structural equation modeling.

There are a lot of SEM tools in R. My go-to is the lavaan package, and it’s the one that I base my SEM course on (just FYI).

We’re just going to be working with observed variables today, but the logic extends to latent variables as well.

Let's start by getting some data…

library(readr)
library(dplyr)

my.ds <- read_csv("")
my.df <- my.ds %>%
         filter(SGR > -50)

The model that we’re going to work with today posits that innovativeness (M) is a causal mechanism connecting the firm’s long-term strategic orientation (X) to its strategic risk taking (Y).

In equation form…

\(Innovativeness = \alpha + {\beta}LongRangePlanning + \epsilon\)

\(RiskTaking = \alpha + {\beta}LongRangePlanning + {\beta}Innovativeness + \epsilon\)

Let's start with a simple mediation model using lavaan.

library(lavaan)

mediation.model <- 'Innovativeness ~ a * LRP           # a path
                    RiskTaking ~ b * Innovativeness    # b path
                    ab := a*b                          # Indirect effect'
 <- sem(mediation.model, data = my.df)

## lavaan (0.5-23.1097) converged normally after  13 iterations
##   Number of observations                           109
##   Estimator                                         ML
##   Minimum Function Test Statistic                1.247
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.264
## Parameter Estimates:
##   Information                                 Expected
##   Standard Errors                             Standard
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Innovativeness ~                                    
##     LRP        (a)    0.316    0.073    4.318    0.000
##   RiskTaking ~                                        
##     Innovtvnss (b)    0.460    0.065    7.063    0.000
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Innovativeness    1.805    0.245    7.382    0.000
##    .RiskTaking        0.978    0.132    7.382    0.000
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     ab                0.145    0.039    3.684    0.000

The estimate of the standard error for the indirect effect (\(ab\)) comes from the Delta method.
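As a sanity check, you can reproduce that delta-method standard error by hand from the estimates in the output above (ignoring the covariance between \(a\) and \(b\), which is essentially zero in this recursive model):

```r
# Delta-method SE of the product ab, using the estimates from the lavaan output
a <- 0.316; se_a <- 0.073
b <- 0.460; se_b <- 0.065
se_ab <- sqrt(b^2 * se_a^2 + a^2 * se_b^2)
round(se_ab, 3)   # 0.039, matching the SE lavaan reports for ab
```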

But, we’re not done.

Let's talk about bootstrapping.

This approach has become all the rage, and there is nothing wrong with it. In fact, it’s a good thing, and I recommend integrating it with your mediation analysis.

But…the non-parametric approach that you see often has largely been replaced by a Monte Carlo approach.

Yes, this is from the same Kris Preacher from the Preacher & Hayes (2004) paper that you read about today. While non-parametric methods are also quite good, the Monte Carlo method is generally more computationally efficient and easier to use.
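Under the hood the idea is simple: sample \(a\) and \(b\) from their estimated sampling distributions, multiply the draws, and take quantiles of the product. A minimal sketch using the estimates from the lavaan output above, and assuming independent normal draws (the full method can also use the parameter covariance matrix):

```r
# Monte Carlo sketch of the 95% CI for the indirect effect ab.
# Point estimates and SEs are taken from the lavaan output above;
# draws are assumed independent.
set.seed(123)
a_draws <- rnorm(10000, mean = 0.316, sd = 0.073)
b_draws <- rnorm(10000, mean = 0.460, sd = 0.065)
quantile(a_draws * b_draws, probs = c(0.025, 0.975))
```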

We’re going to make use of the semTools package for this, which is an add-on to lavaan.

indirect.effect <- 'a*b'
monteCarloMed(indirect.effect, object =, rep = 10000, CI = 95)
## $`Point Estimate`
## [1] 0.1454109
## $`95% Confidence Interval`
## LL 0.0744
## UL 0.2280

We interpret this result the same way as any other confidence interval, with the critical concern being whether the interval contains zero. If it did, how would we interpret this finding?

Here’s the thing though, particularly with observational designs. Because you can’t rule out omitted variable bias from either the \(a\) path or the \(b\) path, what’s the point of bootstrapping the standard errors for a biased indirect effect?

No matter how many iterations of the bootstrap you run, any inference drawn on the indirect effect will be incorrect. Still, assuming that you have an endogeneity correction, bootstrapping is a good idea, although it’s up in the air whether bootstrapping is superior to just robust standard errors.

So, we need to talk about endogeneity, but that’s for next week!


Lab 9 March – Mediation assessment

Seminar 13 March – Isolating causal estimates in mediation