Causal Inference In Mediation

ENT5587B - Research Design & Theory Testing II

Brian S. Anderson, Ph.D.
Assistant Professor
Department of Global Entrepreneurship & Innovation
andersonbri@umkc.edu

© 2017 Brian S. Anderson

• How goes it?
• IRB update – Lets Get Started!
• Paper reviews
• Experimental design brainstorm
• Observational design brainstorm
• What does it mean to have an identification problem?
• 2SLS and mediation
• Lab 16 March – Mediation Assessment

Lets walk through the logic again of the traditional mediation model.

Why do we prefer this specification instead?

Remember that the path between X –> M is termed $$a$$. The M – > Y path is termed $$b$$.

In a mediation model, we’re interested in what effect again?

To establish an indirect effect, what must we be able to show?

That’s right…eliminating alternate causes.

This is going to be our focus for today.

Before we get into the empirics, we’re going to spend some time focusing on research design for observational studies testing indirect effects.

We’ll cover experimental designs next week.

This is a framework used a lot in economics, and you can find what I think is the best discussion of this framework in Mostly Harmless Econometrics, which I highly recommend.

Lets walk through the steps (there are four).

Step 1

What is the causal relationship of interest?

Step 2

What would the ideal experiment be to test this relationship?

By ideal, we mean in a world free of budget—and IRB—restrictions, how would we evaluate the causal effect we are interested in.

This step is really important, because it’s a valuable tool to identify instruments and solve what’s called the identification problem.

Step 3

Speaking of the identification problem, what is your identification strategy?

When we talk about identifying an observational model, we’re talking about using some variable(s)— instruments—to mimic the ignorability feature of a randomized experiment.

To see how our friends in economics do it, here’s one example.

Here’s another solid example using the natural experiment approach.

Quick quiz—do we ever get ignorability with instruments?

Step 4

What is your estimation approach?

Ok, lets walk through this in practice, and head to the board.

A popular argument today is that participating in incubators/accelerators can improve the probability of a nascent venture’s successful start. A key mechanism for this effect is the positive social and professional interactions with other founder teams that happen as a function of being in the incubator/accelerator.

So how would we draw this picture?

Ok, before we can move past step 1, we need some definitions…

Step 2

What would be an ideal experiment to test our relationship. I want you to spend 10-15 minutes working on this this. How would you do it? What would be the manipulation? What would be the sample and sample characteristics?

Even more, what would be necessary to estimate the causal effect of our mechanism on our outcome?

Now, the hard part, and this will take some time. We know that we aren’t going to get our ideal experiment, because it would be unethical to do to new ventures what we’d like to do.

So, we have an identification problem.

The trick—and I use that term loosely—is to peel back, one layer at a time, the assumptions necessary to observe a causal effect in our model.

Lets start with the $$a$$ path in our model: The relationship between between in an incubator and having positive social/professional interactions with other founder teams.

Assume we have observational data. What assumptions would need to be met in order to satisfy a hypothetical ignorability condition to observe the causal effect of being in an incubator and yielding positive social/professional interactions?

So here’s the big question, and lets do this as a team. What variable(s) could we come up with that would satisfy those assumptions?

In other words, is there a variable that can ‘stand in’ for the randomization provided by an experimental design?

So after this exercise, you should have a healthy respect for our friends in economics who do this a lot.

You should also further appreciate the value of experiments (natural or controlled).

You should also be even more skeptical of mediation studies (or just main effect studies!!!) that do not address the identification problem.

Ok, so we’ve found our instruments—yeah!—and now lets talk about our estimation approach.

Remember, to avoid an interpretational confounding issue, we need an estimator that can evaluate simultaneous equations and integrate instrument variables.

3SLS is an option, so are some other GMM models. My preferred option is SEM because, well, it’s straightforward to model and to interpret.

SEM can also…

• Easily deal with missing data (listwise or FIML)
• Integrate robust standard errors or bootstrapped standard errors
• Account for measurement error
• Employ both latent and observed variables (in the same model)
• Employ continuous, nominal, and interval variables (in the same model)
• Estimate a range of LDV and other non-continuous DV models
• Easily estimate multiple DV models

Ok, so lets assume we’ve collected our data and our instruments. How do we do this thing?

We’re going to use a dataset that I’ve made up…

library(tidyverse)
my.df <- as.data.frame(my.ds)

Before we get going using SEM, lets use the classic four step Baron and Kenny methodology with the three equations. Yes, we’re setting up B&K for failure.

Step 1 – Establish an effect to be mediated

$$y = \alpha + {\beta}x + \epsilon$$

me.model <- lm(y ~ x, data = my.df)
summary(me.model)
# Set a variable equal to the coeffecient estimate for x
me <- coef(me.model)["x"]
##
## Call:
## lm(formula = y ~ x, data = my.df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.9367 -0.9925  0.0016  1.0062  5.8870
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.003412   0.014710   0.232    0.817
## x           0.321803   0.011803  27.264   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.471 on 9998 degrees of freedom
## Multiple R-squared:  0.0692, Adjusted R-squared:  0.06911
## F-statistic: 743.4 on 1 and 9998 DF,  p-value: < 2.2e-16
## Coefficient estimate for x --> y // B&K Step 1:  0.3218034

So according to B&K, we should proceed with a mediation model. Wait, why doesn’t that make sense again?

Oh yeah, keep an eye on that estimate. It’s completely bogus. How do I know? Well, because I made this data set up, with an omitted variable between x and m (z1), and between m and y (z2). The omitted variable could be another causal factor, measurement error, selection effects, etc…

Ok, moving on to Step 2 – Establish a relationship between x and the mediator (m)

$$m = \alpha + {\beta}x + \epsilon$$

apath.model <- lm(m ~ x, data = my.df)
a <- coef(apath.model)["x"]
cat("Coefficient estimate for x --> m // B&K Step 2: ", a)
## Coefficient estimate for x --> m // B&K Step 2:  0.6309059

Yes, it’s statistically significant. I’m omitting that part, but you can see it for yourself in your summary output if you want to.

Step 3 – Establish a relationship between m and y

$$y = \alpha + {\beta}x + {\beta}m + \epsilon$$

bpath.model <- lm(y ~ x + m, data = my.df)
b <- coef(bpath.model)["m"]
c_p <- coef(bpath.model)["x"]  # Store this for later
cat("Coefficient estimate for m --> y // B&K Step 3: ", b)
## Coefficient estimate for m --> y // B&K Step 3:  0.6920514

Step 4 – Evaluate the change in the coefficient for x ($$c'$$) in the model with m present

## Coefficient estimate for main effect // B&K Step 1:  0.3218034
## Coefficient estimate for c prime // B&K Step 4:  -0.1148159

So just in looking at it, yeah, I’d say that’s a change. Just on the surface this should be concerning because we completely flipped a sign from one model to the next. Possible? Sure. Likely? Well…

Lets go ahead and get the ‘total effect’ just for comparison…

ab <- a * b
c <- c_p + ab
cat("Indirect effect: ", ab, "  Total effect: ", c)
## Indirect effect:  0.4366193   Total effect:  0.3218034

We’re not going to bother with calculating proportion of effect mediated, bootstrapped standard errors, etc. Because, well, they would be wrong.

So, lets do this correctly.

We’re going to start with mimicking the B&K method using structural equation modeling, like we did last week. So, we’ll be including a path from x to y (the $$c'$$ path), along with calculating a parameter for the ‘total effect’ ($$c = c' + ab$$).

library(lavaan)
mediation.bk.model <- 'm ~ a * x  # a path
y ~ b * m  # b path
y ~ c_p * x  # c prime path

# Specified parameters
ab := a*b
c := c_p + (a * b)'
mediation.bk.fit <- sem(mediation.bk.model, data = my.df)

The default output for the model doesn’t fit well on a slide, so lets grab the key pieces…

parameterEstimates(mediation.bk.fit)
##   lhs op       rhs label    est    se       z pvalue ci.lower ci.upper
## 1   m  ~         x     a  0.631 0.011  58.145      0    0.610    0.652
## 2   y  ~         m     b  0.692 0.008  82.474      0    0.676    0.708
## 3   y  ~         x   c_p -0.115 0.011 -10.902      0   -0.135   -0.094
## 4   m ~~         m        1.829 0.026  70.711      0    1.778    1.879
## 5   y ~~         y        1.288 0.018  70.711      0    1.252    1.323
## 6   x ~~         x        1.553 0.000      NA     NA    1.553    1.553
## 7  ab :=       a*b    ab  0.437 0.009  47.522      0    0.419    0.455
## 8   c := c_p+(a*b)     c  0.322 0.012  27.267      0    0.299    0.345

Go back and take a look at our regression results, and take a look at the indirect effect estimate and the total effect estimate.

What’s the moral of our story at this point?

SEM is a very powerful tool that will help you be a better analyst.

But, you need to know what you’re doing, because despite using SEM you would have made the exact same mistake as we did using just OLS.

By the way, take a look again at the total effect, and walk through the algebra. Lets say that the $$c'$$ path is ‘correct’ and it’s negative. Our indirect effect though is positive. So we added a negative coefficient to a positive indirect effect. Is that concerning at all, and what would it mean?

Ok, lets go back to our model. This model is what we call ‘just identified’, so we don’t have any extra degrees of freedom to evaluate model fit (we’ll talk more about this when we get to our SEM class, but it’s an important concept).

fitmeasures(mediation.bk.fit, c("chisq", "df", "pvalue"))
##  chisq     df pvalue
##      0      0     NA

Oh, it’s the same term as ‘identification’ from our earlier discussion on instrument variables, but it has a different meaning in this context.

Ok, lets assume that this is observational data. We know then that there will be an omitted variable problem.

Likely this is caused by a selection effect, but it could also be because of an omitted common covariate (or cause).

In fact, this is the correct model…

Lets go ahead and model that…

mediation.ov.model <- 'm ~ a * x  # a path
y ~ b * m  # b path
y ~ c_p * x  # c prime path

# Correct omitted variable paths
x ~ z1
m ~ z1 + z2
y ~ z2

# Specified parameters
ab := a*b
c := c_p + (a * b)'
mediation.ov.fit <- sem(mediation.ov.model, data = my.df)
parameterEstimates(mediation.ov.fit)
##    lhs op       rhs label    est    se      z pvalue ci.lower ci.upper
## 1    m  ~         x     a  0.417 0.010 41.449   0.00    0.397    0.437
## 2    y  ~         m     b  0.502 0.008 60.740   0.00    0.486    0.518
## 3    y  ~         x   c_p  0.005 0.010  0.482   0.63   -0.014    0.023
## 4    x  ~        z1        0.592 0.011 53.117   0.00    0.570    0.614
## 5    m  ~        z1        0.574 0.013 45.163   0.00    0.549    0.599
## 6    m  ~        z2        0.601 0.011 53.754   0.00    0.579    0.623
## 7    y  ~        z2        0.592 0.011 52.460   0.00    0.570    0.614
## 8    m ~~         m        1.227 0.017 70.711   0.00    1.193    1.261
## 9    y ~~         y        1.010 0.014 70.711   0.00    0.982    1.038
## 10   x ~~         x        1.211 0.017 70.711   0.00    1.178    1.245
## 11  z1 ~~        z1        0.975 0.000     NA     NA    0.975    0.975
## 12  z1 ~~        z2       -0.005 0.000     NA     NA   -0.005   -0.005
## 13  z2 ~~        z2        0.981 0.000     NA     NA    0.981    0.981
## 14  ab :=       a*b    ab  0.209 0.006 34.237   0.00    0.197    0.221
## 15   c := c_p+(a*b)     c  0.214 0.010 22.124   0.00    0.195    0.233
fitmeasures(mediation.ov.fit, c("chisq", "df", "pvalue"))
##  chisq     df pvalue
##  0.935  2.000  0.627

Yeah, so that’s a bit different. Again, the results here are the ‘correct’ results because this was how I set up the model when I generated the data (including the omitted z variables).

By the way, this model shows no evidence of misspecification (non-significant $$\chi$$2)—this is going to be very important in a minute.

I had no idea what the Baron and Kenny method without the z variables was going to spit out for parameter estimates. The moral of our story?

Well, when you have an omitted variable—endogeneity—problem, life is like a box of chocolates.

The problem for researchers is, of course, while we know the z’s exist in real life, we didn’t observe them (otherwise we just would have modeled them).

So, we need some help, and that’s where the 2SLS estimator comes in.

So lets revisit the logic for using instruments…

To recover causal parameter estimates, we need an instrument(s) that ideally ‘stands in for’ random assignment to our treatment conditions. In this case, we have two treatment conditions, x, and m.

We’ll walk through more about causality in mediation designs next week. For now, lets just focus on model specification.

But with two variables that are not randomly assigned, we need instruments for x, and for m.

Under the conditional ignorability assumption, we can recover consistent (and hence causal, with an appropriate research design) parameter estimates if…

• Our instrument(s) are individually and jointly valid
• Our instrument(s) are properly excluded (not themselves endogenous)

I’ve got some instruments in the dataset that we’re going to use. I1 and I2 are instruments for x; I3 and I4 are instruments for m.

mediation.iv.model <- 'm ~ a * x  # a path
y ~ b * m  # b path
y ~ c_p * x  # c prime path

# Instrument variable paths
x ~ I1 + I2
m ~ I3 + I4

# Specify the error term covariances
x ~~ m
m ~~ y
x ~~ y

ab := a*b
c := c_p + (a * b)'
mediation.iv.fit <- sem(mediation.iv.model, data = my.df)
parameterEstimates(mediation.iv.fit)
##    lhs op       rhs label    est    se      z pvalue ci.lower ci.upper
## 1    m  ~         x     a  0.438 0.027 16.274  0.000    0.385    0.491
## 2    y  ~         m     b  0.506 0.024 21.009  0.000    0.458    0.553
## 3    y  ~         x   c_p  0.005 0.027  0.202  0.840   -0.047    0.058
## 4    x  ~        I1        0.333 0.011 29.379  0.000    0.311    0.356
## 5    x  ~        I2        0.341 0.011 29.864  0.000    0.319    0.364
## 6    m  ~        I3        0.336 0.013 26.717  0.000    0.311    0.360
## 7    m  ~        I4        0.349 0.013 27.809  0.000    0.325    0.374
## 8    m ~~         x        0.296 0.039  7.655  0.000    0.221    0.372
## 9    m ~~         y        0.340 0.043  7.894  0.000    0.256    0.425
## 10   y ~~         x       -0.004 0.035 -0.111  0.911   -0.074    0.066
## 11   m ~~         m        1.654 0.028 58.409  0.000    1.598    1.709
## 12   y ~~         y        1.351 0.025 53.682  0.000    1.302    1.401
## 13   x ~~         x        1.325 0.019 70.711  0.000    1.288    1.362
## 14  I1 ~~        I1        1.006 0.000     NA     NA    1.006    1.006
## 15  I1 ~~        I2        0.004 0.000     NA     NA    0.004    0.004
## 16  I1 ~~        I3        0.010 0.000     NA     NA    0.010    0.010
## 17  I1 ~~        I4        0.012 0.000     NA     NA    0.012    0.012
## 18  I2 ~~        I2        0.993 0.000     NA     NA    0.993    0.993
## 19  I2 ~~        I3        0.011 0.000     NA     NA    0.011    0.011
## 20  I2 ~~        I4        0.008 0.000     NA     NA    0.008    0.008
## 21  I3 ~~        I3        0.977 0.000     NA     NA    0.977    0.977
## 22  I3 ~~        I4        0.016 0.000     NA     NA    0.016    0.016
## 23  I4 ~~        I4        0.980 0.000     NA     NA    0.980    0.980
## 24  ab :=       a*b    ab  0.221 0.017 12.895  0.000    0.188    0.255
## 25   c := c_p+(a*b)     c  0.227 0.030  7.447  0.000    0.167    0.287
fitmeasures(mediation.iv.fit, c("chisq", "df", "pvalue"))
##  chisq     df pvalue
##  6.105  5.000  0.296

Ok, it’s tough to see all of the results. Lets put our models together…

Correct model – With Z’s Included:

##   Path Parameter      se pvalue
## 1   ab     0.209 0.00612      0
## 2    c     0.214 0.00968      0

B&K model – Z’s Omitted (Endogeneity):

##   Path Parameter      se pvalue
## 1   ab     0.437 0.00919      0
## 2    c     0.322 0.01180      0

2SLS model – Z’s Omitted (Instruments & Correlated Disturbances):

##   Path Parameter     se   pvalue
## 1   ab     0.221 0.0172 0.00e+00
## 2    c     0.227 0.0305 9.57e-14

Lets also take a look at the psi ($$\psi$$) matrix from our 2SLS model, which in lavaan is the matrix containing the covariances between the eta’s ($$\eta$$)—the dependent variables (x, m, and y in our model). These are actually the covariances between the error terms (called $$\zeta$$) that are necessary to properly estimate a 2SLS model.

inspect(mediation.iv.fit, what = "estimates")\$psi
##    m      y      x      I1     I2     I3     I4
## m   1.654
## y   0.340  1.351
## x   0.296 -0.004  1.325
## I1  0.000  0.000  0.000  1.006
## I2  0.000  0.000  0.000  0.004  0.993
## I3  0.000  0.000  0.000  0.010  0.011  0.977
## I4  0.000  0.000  0.000  0.012  0.008  0.016  0.980

By the way, in the classic LISREL notation, part of this matrix would be phi ($$\phi$$). We’ll talk about that more when we get to the SEM class, but knowing LISREL is helpful to understanding other SEM software packages.

So the large covariance between $$\zeta$$’s for x and m reflects that there is an omitted variable (z1, remember?) that is causing x to be endogenous to m. Using our two instruments to help us over identify the model, we can actually see what this value is (unlike using ivreg, etc.). Because it’s statistically significant, we know that we need to retain this path, along with our instruments.

The same logic applies to the covariance between the $$\zeta$$’s for m and y.

But how about the covariance between the $$\zeta$$’s for x and y? Do we need that one?

So we could drop that parameter and our model won’t be misspecified. We didn’t need to free that covariance, nor did we need to estimate a $$c'$$ path in the first place, because our instruments would account for any missing M’s that have gone unobserved.

Yet another reason why we don’t need—and it would actually be a mistake to—estimate equation 1 from B&K.

So our more parsimonious model looks like this…

mediation.ivclean.model <- 'm ~ a * x  # a path
y ~ b * m  # b path

# Instrument variable paths
x ~ I1 + I2
m ~ I3 + I4

# Specify the error term covariances
x ~~ m
m ~~ y

# Indirect effect
ab := a*b'
mediation.ivclean.fit <- sem(mediation.ivclean.model, data = my.df)
parameterEstimates(mediation.ivclean.fit)
##    lhs op rhs label   est    se      z pvalue ci.lower ci.upper
## 1    m  ~   x     a 0.437 0.026 16.639      0    0.386    0.489
## 2    y  ~   m     b 0.509 0.013 40.653      0    0.484    0.533
## 3    x  ~  I1       0.333 0.011 29.373      0    0.311    0.356
## 4    x  ~  I2       0.341 0.011 29.870      0    0.319    0.364
## 5    m  ~  I3       0.336 0.012 26.983      0    0.312    0.360
## 6    m  ~  I4       0.350 0.012 28.103      0    0.325    0.374
## 7    m ~~   x       0.298 0.038  7.884      0    0.224    0.372
## 8    m ~~   y       0.336 0.025 13.504      0    0.287    0.385
## 9    m ~~   m       1.654 0.028 58.891      0    1.599    1.709
## 10   y ~~   y       1.349 0.021 64.698      0    1.308    1.390
## 11   x ~~   x       1.325 0.019 70.711      0    1.288    1.362
## 12  I1 ~~  I1       1.006 0.000     NA     NA    1.006    1.006
## 13  I1 ~~  I2       0.004 0.000     NA     NA    0.004    0.004
## 14  I1 ~~  I3       0.010 0.000     NA     NA    0.010    0.010
## 15  I1 ~~  I4       0.012 0.000     NA     NA    0.012    0.012
## 16  I2 ~~  I2       0.993 0.000     NA     NA    0.993    0.993
## 17  I2 ~~  I3       0.011 0.000     NA     NA    0.011    0.011
## 18  I2 ~~  I4       0.008 0.000     NA     NA    0.008    0.008
## 19  I3 ~~  I3       0.977 0.000     NA     NA    0.977    0.977
## 20  I3 ~~  I4       0.016 0.000     NA     NA    0.016    0.016
## 21  I4 ~~  I4       0.980 0.000     NA     NA    0.980    0.980
## 22  ab := a*b    ab 0.222 0.015 15.126      0    0.193    0.251
fitmeasures(mediation.ivclean.fit, c("chisq", "df", "pvalue"))
##  chisq     df pvalue
##  6.149  7.000  0.522

See? No evidence of misspecification in our $$\chi$$2 statistic. We’re good to go.

Last thing…what happens when we keep our instruments, but DO NOT include the error term covariances?

mediation.ivbad.model <- 'm ~ a * x  # a path
y ~ b * m  # b path

# Instrument variable paths
x ~ I1 + I2
m ~ I3 + I4

# Indirect effect
ab := a*b'
##    lhs op rhs label   est    se      z pvalue ci.lower ci.upper
## 1    m  ~   x     a 0.629 0.010 62.017      0    0.609    0.649
## 2    y  ~   m     b 0.646 0.007 88.578      0    0.632    0.660
## 3    x  ~  I1       0.334 0.011 29.097      0    0.311    0.356
## 4    x  ~  I2       0.341 0.012 29.510      0    0.318    0.363
## 5    m  ~  I3       0.337 0.013 26.315      0    0.311    0.362
## 6    m  ~  I4       0.346 0.013 27.078      0    0.321    0.371
## 7    m ~~   m       1.597 0.023 70.711      0    1.553    1.641
## 8    y ~~   y       1.303 0.018 70.711      0    1.267    1.339
## 9    x ~~   x       1.325 0.019 70.711      0    1.288    1.362
## 10  I1 ~~  I1       1.006 0.000     NA     NA    1.006    1.006
## 11  I1 ~~  I2       0.004 0.000     NA     NA    0.004    0.004
## 12  I1 ~~  I3       0.010 0.000     NA     NA    0.010    0.010
## 13  I1 ~~  I4       0.012 0.000     NA     NA    0.012    0.012
## 14  I2 ~~  I2       0.993 0.000     NA     NA    0.993    0.993
## 15  I2 ~~  I3       0.011 0.000     NA     NA    0.011    0.011
## 16  I2 ~~  I4       0.008 0.000     NA     NA    0.008    0.008
## 17  I3 ~~  I3       0.977 0.000     NA     NA    0.977    0.977
## 18  I3 ~~  I4       0.016 0.000     NA     NA    0.016    0.016
## 19  I4 ~~  I4       0.980 0.000     NA     NA    0.980    0.980
## 20  ab := a*b    ab 0.406 0.008 50.803      0    0.391    0.422
fitmeasures(mediation.ivbad.fit, c("chisq", "df", "pvalue"))
##   chisq      df  pvalue
## 261.174   9.000   0.000

Quick quiz…there’s one crystal clear way to know that we’ve made a mistake. What is it?

By the way, this was just a simple mediation problem. Look at what was necessary to just properly specify a x –> m –> y model.

See why I am such a skeptic about multiple mediation models?

Oh…the bootstrap thing. So the lavaan package makes it very easy. We’ll use our ivclean model.

mediation.ivboot.fit <- sem(mediation.ivclean.model, data = my.df,
se = "bootstrap", bootstrap = 5000)
parameterEstimates(mediation.ivboot.fit)
##    lhs op rhs label   est    se      z pvalue ci.lower ci.upper
## 1    m  ~   x     a 0.437 0.026 16.694      0    0.384    0.487
## 2    y  ~   m     b 0.509 0.012 40.854      0    0.484    0.533
## 3    x  ~  I1       0.333 0.011 29.575      0    0.312    0.356
## 4    x  ~  I2       0.341 0.012 29.637      0    0.319    0.365
## 5    m  ~  I3       0.336 0.012 27.007      0    0.311    0.361
## 6    m  ~  I4       0.350 0.012 27.983      0    0.325    0.375
## 7    m ~~   x       0.298 0.038  7.859      0    0.225    0.375
## 8    m ~~   y       0.336 0.025 13.448      0    0.288    0.387
## 9    m ~~   m       1.654 0.028 58.900      0    1.600    1.712
## 10   y ~~   y       1.349 0.020 65.881      0    1.310    1.391
## 11   x ~~   x       1.325 0.019 71.348      0    1.289    1.361
## 12  I1 ~~  I1       1.006 0.000     NA     NA    1.006    1.006
## 13  I1 ~~  I2       0.004 0.000     NA     NA    0.004    0.004
## 14  I1 ~~  I3       0.010 0.000     NA     NA    0.010    0.010
## 15  I1 ~~  I4       0.012 0.000     NA     NA    0.012    0.012
## 16  I2 ~~  I2       0.993 0.000     NA     NA    0.993    0.993
## 17  I2 ~~  I3       0.011 0.000     NA     NA    0.011    0.011
## 18  I2 ~~  I4       0.008 0.000     NA     NA    0.008    0.008
## 19  I3 ~~  I3       0.977 0.000     NA     NA    0.977    0.977
## 20  I3 ~~  I4       0.016 0.000     NA     NA    0.016    0.016
## 21  I4 ~~  I4       0.980 0.000     NA     NA    0.980    0.980
## 22  ab := a*b    ab 0.222 0.015 15.277      0    0.193    0.251

Now with robust standard errors…

mediation.ivrobust.fit <- sem(mediation.ivclean.model, data = my.df, se = "robust.sem")
parameterEstimates(mediation.ivrobust.fit)
##    lhs op rhs label    est    se      z pvalue ci.lower ci.upper
## 1    m  ~   x     a  0.437 0.028 15.461  0.000    0.382    0.492
## 2    y  ~   m     b  0.509 0.012 41.033  0.000    0.484    0.533
## 3    x  ~  I1        0.333 0.013 25.951  0.000    0.308    0.359
## 4    x  ~  I2        0.341 0.013 26.410  0.000    0.316    0.367
## 5    m  ~  I3        0.336 0.014 24.010  0.000    0.309    0.364
## 6    m  ~  I4        0.350 0.014 25.248  0.000    0.322    0.377
## 7    m ~~   x        0.298 0.041  7.236  0.000    0.217    0.378
## 8    m ~~   y        0.336 0.025 13.603  0.000    0.288    0.385
## 9    m ~~   m        1.654 0.029 57.643  0.000    1.598    1.711
## 10   y ~~   y        1.349 0.021 65.703  0.000    1.309    1.389
## 11   x ~~   x        1.325 0.019 69.169  0.000    1.287    1.362
## 12  I1 ~~  I1        1.006 0.000     NA     NA    1.006    1.006
## 13  I1 ~~  I2        0.004 0.000     NA     NA    0.004    0.004
## 14  I1 ~~  I3        0.010 0.000     NA     NA    0.010    0.010
## 15  I1 ~~  I4        0.012 0.000     NA     NA    0.012    0.012
## 16  I2 ~~  I2        0.993 0.000     NA     NA    0.993    0.993
## 17  I2 ~~  I3        0.011 0.000     NA     NA    0.011    0.011
## 18  I2 ~~  I4        0.008 0.000     NA     NA    0.008    0.008
## 19  I3 ~~  I3        0.977 0.000     NA     NA    0.977    0.977
## 20  I3 ~~  I4        0.016 0.000     NA     NA    0.016    0.016
## 21  I4 ~~  I4        0.980 0.000     NA     NA    0.980    0.980
## 22   m ~1           -0.016 0.014 -1.177  0.239   -0.043    0.011
## 23   y ~1            0.012 0.012  1.067  0.286   -0.010    0.035
## 24   x ~1            0.003 0.012  0.250  0.802   -0.021    0.028
## 25  I1 ~1            0.006 0.000     NA     NA    0.006    0.006
## 26  I2 ~1            0.011 0.000     NA     NA    0.011    0.011
## 27  I3 ~1           -0.009 0.000     NA     NA   -0.009   -0.009
## 28  I4 ~1            0.010 0.000     NA     NA    0.010    0.010
## 29  ab := a*b    ab  0.222 0.016 14.267  0.000    0.192    0.253

Not much difference, right? Note that sometimes you can’t use robust standard errors (depending on the estimator), so it’s always good to have bootstrapping in your pocket.

So debrief…this is a tough topic. What’s still bugging you?

Wrap-up.

Lab 16 March – Mediation assessment

Seminar 20 March – A rigorous way to think about mediation