Causal Inference In Mediation

ENT5587B - Research Design & Theory Testing II

Brian S. Anderson, Ph.D.

Assistant Professor

Department of Global Entrepreneurship & Innovation

andersonbri@umkc.edu

Â© 2017 Brian S. Anderson

- How goes it?
- IRB update â€“ Lets Get Started!
- Paper reviews
- Experimental design brainstorm
- Observational design brainstorm
- What does it mean to have an identification problem?
- 2SLS and mediation
- Lab 16 March â€“ Mediation Assessment

Lets walk through the logic again of the traditional mediation model.

Why do we prefer this specification instead?

Remember that the path between X â€“> M is termed \(a\). The M â€“ > Y path is termed \(b\).

In a mediation model, weâ€™re interested in what effect again?

To establish an indirect effect, what **must** we be able to show?

Thatâ€™s rightâ€¦eliminating alternate causes.

This is going to be our focus for today.

Before we get into the empirics, weâ€™re going to spend some time focusing on research design for observational studies testing indirect effects.

Weâ€™ll cover experimental designs next week.

This is a framework used *a lot* in economics, and you can find what I think is the best discussion of this framework in *Mostly Harmless Econometrics*, which I highly recommend.

Lets walk through the steps (there are four).

**Step 1**

What is the causal relationship of interest?

**Step 2**

What would the *ideal* experiment be to test this relationship?

By ideal, we mean in a world free of budgetâ€”and IRBâ€”restrictions, how would we evaluate the causal effect we are interested in.

This step is really important, because itâ€™s a valuable tool to identify instruments and solve whatâ€™s called the identification problem.

**Step 3**

Speaking of the identification problem, what is your identification strategy?

When we talk about identifying an observational model, weâ€™re talking about using some variable(s)â€” instrumentsâ€”to mimic the ignorability feature of a randomized experiment.

To see how our friends in economics do it, hereâ€™s one example.

Hereâ€™s another solid example using the natural experiment approach.

Quick quizâ€”do we ever get ignorability with instruments?

**Step 4**

What is your estimation approach?

Ok, lets walk through this in practice, and head to the board.

A popular argument today is that participating in incubators/accelerators can improve the probability of a nascent ventureâ€™s successful start. A key mechanism for this effect is the positive social and professional interactions with other founder teams that happen as a function of being in the incubator/accelerator.

So how would we draw this picture?

Ok, before we can move past step 1, we need some definitionsâ€¦

**Step 2**

What would be an ideal experiment to test our relationship. I want you to spend 10-15 minutes working on this this. How would you do it? What would be the manipulation? What would be the sample and sample characteristics?

Even more, what would be necessary to estimate the *causal* effect of our mechanism on our outcome?

Now, the hard part, and this will take some time. We know that we arenâ€™t going to get our ideal experiment, because it would be unethical to do to new ventures what weâ€™d like to do.

So, we have an identification problem.

The trickâ€”and I use that term looselyâ€”is to peel back, one layer at a time, the assumptions necessary to observe a causal effect in our model.

Lets start with the \(a\) path in our model: The relationship between between in an incubator and having positive social/professional interactions with other founder teams.

Assume we have observational data. What assumptions would need to be met in order to satisfy a hypothetical ignorability condition to observe the causal effect of being in an incubator and yielding positive social/professional interactions?

So hereâ€™s the big question, and lets do this as a team. What variable(s) could we come up with that would satisfy those assumptions?

In other words, is there a variable that can â€˜stand inâ€™ for the randomization provided by an experimental design?

So after this exercise, you should have a healthy respect for our friends in economics who do this a lot.

You should also further appreciate the value of experiments (natural or controlled).

You should also be even more skeptical of mediation studies (or just main effect studies!!!) that do not address the identification problem.

Ok, so weâ€™ve found our instrumentsâ€”yeah!â€”and now lets talk about our estimation approach.

Remember, to avoid an interpretational confounding issue, we need an estimator that can evaluate simultaneous equations *and* integrate instrument variables.

3SLS is an option, so are some other GMM models. My preferred option is SEM because, well, itâ€™s straightforward to model and to interpret.

SEM can alsoâ€¦

- Easily deal with missing data (listwise or FIML)
- Integrate robust standard errors or bootstrapped standard errors
- Account for measurement error
- Employ both latent and observed variables (in the same model)
- Employ continuous, nominal, and interval variables (in the same model)
- Estimate a range of LDV and other non-continuous DV models
- Easily estimate multiple DV models

Ok, so lets assume weâ€™ve collected our data and our instruments. How do we do this thing?

Weâ€™re going to use a dataset that Iâ€™ve made upâ€¦

```
library(tidyverse)
my.ds <- read_csv("http://a.web.umkc.edu/andersonbri/Mediation.csv")
my.df <- as.data.frame(my.ds)
```

Before we get going using SEM, lets use the classic four step Baron and Kenny methodology with the three equations. Yes, weâ€™re setting up B&K for failure.

Step 1 â€“ Establish an effect to be mediated

\(y = \alpha + {\beta}x + \epsilon\)

```
me.model <- lm(y ~ x, data = my.df)
summary(me.model)
# Set a variable equal to the coeffecient estimate for x
me <- coef(me.model)["x"]
```

```
##
## Call:
## lm(formula = y ~ x, data = my.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.9367 -0.9925 0.0016 1.0062 5.8870
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.003412 0.014710 0.232 0.817
## x 0.321803 0.011803 27.264 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.471 on 9998 degrees of freedom
## Multiple R-squared: 0.0692, Adjusted R-squared: 0.06911
## F-statistic: 743.4 on 1 and 9998 DF, p-value: < 2.2e-16
```

`## Coefficient estimate for x --> y // B&K Step 1: 0.3218034`

So according to B&K, we should proceed with a mediation model. Wait, why doesnâ€™t that make sense again?

Oh yeah, keep an eye on that estimate. Itâ€™s completely bogus. How do I know? Well, because I made this data set up, with an omitted variable between x and m (z1), and between m and y (z2). The omitted variable could be another causal factor, measurement error, selection effects, etcâ€¦

Ok, moving on to Step 2 â€“ Establish a relationship between x and the mediator (m)

\(m = \alpha + {\beta}x + \epsilon\)

```
apath.model <- lm(m ~ x, data = my.df)
a <- coef(apath.model)["x"]
cat("Coefficient estimate for x --> m // B&K Step 2: ", a)
```

`## Coefficient estimate for x --> m // B&K Step 2: 0.6309059`

Yes, itâ€™s statistically significant. Iâ€™m omitting that part, but you can see it for yourself in your summary output if you want to.

Step 3 â€“ Establish a relationship between m and y

\(y = \alpha + {\beta}x + {\beta}m + \epsilon\)

```
bpath.model <- lm(y ~ x + m, data = my.df)
b <- coef(bpath.model)["m"]
c_p <- coef(bpath.model)["x"] # Store this for later
cat("Coefficient estimate for m --> y // B&K Step 3: ", b)
```

`## Coefficient estimate for m --> y // B&K Step 3: 0.6920514`

Step 4 â€“ Evaluate the change in the coefficient for x (\(c'\)) in the model with m present

`## Coefficient estimate for main effect // B&K Step 1: 0.3218034`

`## Coefficient estimate for c prime // B&K Step 4: -0.1148159`

So just in looking at it, yeah, Iâ€™d say thatâ€™s a change. Just on the surface this should be concerning because we completely flipped a sign from one model to the next. Possible? Sure. Likely? Wellâ€¦

Lets go ahead and get the â€˜total effectâ€™ just for comparisonâ€¦

```
ab <- a * b
c <- c_p + ab
cat("Indirect effect: ", ab, " Total effect: ", c)
```

`## Indirect effect: 0.4366193 Total effect: 0.3218034`

Weâ€™re not going to bother with calculating proportion of effect mediated, bootstrapped standard errors, etc. Because, well, they would be wrong.

So, lets do this correctly.

Weâ€™re going to start with mimicking the B&K method using structural equation modeling, like we did last week. So, weâ€™ll be including a path from x to y (the \(c'\) path), along with calculating a parameter for the â€˜total effectâ€™ (\(c = c' + ab\)).

```
library(lavaan)
mediation.bk.model <- 'm ~ a * x # a path
y ~ b * m # b path
y ~ c_p * x # c prime path
# Specified parameters
ab := a*b
c := c_p + (a * b)'
mediation.bk.fit <- sem(mediation.bk.model, data = my.df)
```

The default output for the model doesnâ€™t fit well on a slide, so lets grab the key piecesâ€¦

`parameterEstimates(mediation.bk.fit)`

```
## lhs op rhs label est se z pvalue ci.lower ci.upper
## 1 m ~ x a 0.631 0.011 58.145 0 0.610 0.652
## 2 y ~ m b 0.692 0.008 82.474 0 0.676 0.708
## 3 y ~ x c_p -0.115 0.011 -10.902 0 -0.135 -0.094
## 4 m ~~ m 1.829 0.026 70.711 0 1.778 1.879
## 5 y ~~ y 1.288 0.018 70.711 0 1.252 1.323
## 6 x ~~ x 1.553 0.000 NA NA 1.553 1.553
## 7 ab := a*b ab 0.437 0.009 47.522 0 0.419 0.455
## 8 c := c_p+(a*b) c 0.322 0.012 27.267 0 0.299 0.345
```

Go back and take a look at our regression results, and take a look at the indirect effect estimate and the total effect estimate.

Whatâ€™s the moral of our story at this point?

SEM is a **very** powerful tool that will help you be a better analyst.

But, you need to know what youâ€™re doing, because despite using SEM you would have made the exact same mistake as we did using just OLS.

By the way, take a look again at the total effect, and walk through the algebra. Lets say that the \(c'\) path is â€˜correctâ€™ and itâ€™s negative. Our indirect effect though is positive. So we added a negative coefficient to a positive indirect effect. Is that concerning at all, and what would it mean?

Ok, lets go back to our model. This model is what we call â€˜just identifiedâ€™, so we donâ€™t have any extra degrees of freedom to evaluate model fit (weâ€™ll talk more about this when we get to our SEM class, but itâ€™s an important concept).

`fitmeasures(mediation.bk.fit, c("chisq", "df", "pvalue"))`

```
## chisq df pvalue
## 0 0 NA
```

Oh, itâ€™s the same term as â€˜identificationâ€™ from our earlier discussion on instrument variables, but it has a different meaning in this context.

Ok, lets assume that this is observational data. We know then that there **will** be an omitted variable problem.

Likely this is caused by a selection effect, but it could also be because of an omitted common covariate (or cause).

In fact, this is the correct modelâ€¦

Lets go ahead and model thatâ€¦

```
mediation.ov.model <- 'm ~ a * x # a path
y ~ b * m # b path
y ~ c_p * x # c prime path
# Correct omitted variable paths
x ~ z1
m ~ z1 + z2
y ~ z2
# Specified parameters
ab := a*b
c := c_p + (a * b)'
mediation.ov.fit <- sem(mediation.ov.model, data = my.df)
```

`parameterEstimates(mediation.ov.fit)`

```
## lhs op rhs label est se z pvalue ci.lower ci.upper
## 1 m ~ x a 0.417 0.010 41.449 0.00 0.397 0.437
## 2 y ~ m b 0.502 0.008 60.740 0.00 0.486 0.518
## 3 y ~ x c_p 0.005 0.010 0.482 0.63 -0.014 0.023
## 4 x ~ z1 0.592 0.011 53.117 0.00 0.570 0.614
## 5 m ~ z1 0.574 0.013 45.163 0.00 0.549 0.599
## 6 m ~ z2 0.601 0.011 53.754 0.00 0.579 0.623
## 7 y ~ z2 0.592 0.011 52.460 0.00 0.570 0.614
## 8 m ~~ m 1.227 0.017 70.711 0.00 1.193 1.261
## 9 y ~~ y 1.010 0.014 70.711 0.00 0.982 1.038
## 10 x ~~ x 1.211 0.017 70.711 0.00 1.178 1.245
## 11 z1 ~~ z1 0.975 0.000 NA NA 0.975 0.975
## 12 z1 ~~ z2 -0.005 0.000 NA NA -0.005 -0.005
## 13 z2 ~~ z2 0.981 0.000 NA NA 0.981 0.981
## 14 ab := a*b ab 0.209 0.006 34.237 0.00 0.197 0.221
## 15 c := c_p+(a*b) c 0.214 0.010 22.124 0.00 0.195 0.233
```

`fitmeasures(mediation.ov.fit, c("chisq", "df", "pvalue"))`

```
## chisq df pvalue
## 0.935 2.000 0.627
```

Yeah, so thatâ€™s a bit different. Again, the results here are the â€˜correctâ€™ results because this was how I set up the model when I generated the data (including the omitted z variables).

By the way, this model shows no evidence of misspecification (non-significant \(\chi\)^{2})â€”this is going to be **very** important in a minute.

I had no idea what the Baron and Kenny method without the z variables was going to spit out for parameter estimates. The moral of our story?

Well, when you have an omitted variableâ€”**endogeneity**â€”problem, life is like a box of chocolates.

The problem for researchers is, of course, while we know the zâ€™s exist in real life, we didnâ€™t observe them (otherwise we just would have modeled them).

So, we need some help, and thatâ€™s where the 2SLS estimator comes in.

So lets revisit the logic for using instrumentsâ€¦

To recover causal parameter estimates, we need an instrument(s) that ideally â€˜stands in forâ€™ random assignment to our treatment conditions. In this case, we have two treatment conditions, x, and m.

Weâ€™ll walk through more about causality in mediation designs next week. For now, lets just focus on model specification.

But with two variables that are not randomly assigned, we need instruments for x, and for m.

Under the conditional ignorability assumption, we can recover consistent (and hence causal, with an appropriate research design) parameter estimates ifâ€¦

- Our instrument(s) are individually and jointly valid
- Our instrument(s) are properly excluded (not themselves endogenous)

Iâ€™ve got some instruments in the dataset that weâ€™re going to use. I1 and I2 are instruments for x; I3 and I4 are instruments for m.

Lets start with the full syntaxâ€¦

```
mediation.iv.model <- 'm ~ a * x # a path
y ~ b * m # b path
y ~ c_p * x # c prime path
# Instrument variable paths
x ~ I1 + I2
m ~ I3 + I4
# Specify the error term covariances
x ~~ m
m ~~ y
x ~~ y
ab := a*b
c := c_p + (a * b)'
mediation.iv.fit <- sem(mediation.iv.model, data = my.df)
```

`parameterEstimates(mediation.iv.fit)`

```
## lhs op rhs label est se z pvalue ci.lower ci.upper
## 1 m ~ x a 0.438 0.027 16.274 0.000 0.385 0.491
## 2 y ~ m b 0.506 0.024 21.009 0.000 0.458 0.553
## 3 y ~ x c_p 0.005 0.027 0.202 0.840 -0.047 0.058
## 4 x ~ I1 0.333 0.011 29.379 0.000 0.311 0.356
## 5 x ~ I2 0.341 0.011 29.864 0.000 0.319 0.364
## 6 m ~ I3 0.336 0.013 26.717 0.000 0.311 0.360
## 7 m ~ I4 0.349 0.013 27.809 0.000 0.325 0.374
## 8 m ~~ x 0.296 0.039 7.655 0.000 0.221 0.372
## 9 m ~~ y 0.340 0.043 7.894 0.000 0.256 0.425
## 10 y ~~ x -0.004 0.035 -0.111 0.911 -0.074 0.066
## 11 m ~~ m 1.654 0.028 58.409 0.000 1.598 1.709
## 12 y ~~ y 1.351 0.025 53.682 0.000 1.302 1.401
## 13 x ~~ x 1.325 0.019 70.711 0.000 1.288 1.362
## 14 I1 ~~ I1 1.006 0.000 NA NA 1.006 1.006
## 15 I1 ~~ I2 0.004 0.000 NA NA 0.004 0.004
## 16 I1 ~~ I3 0.010 0.000 NA NA 0.010 0.010
## 17 I1 ~~ I4 0.012 0.000 NA NA 0.012 0.012
## 18 I2 ~~ I2 0.993 0.000 NA NA 0.993 0.993
## 19 I2 ~~ I3 0.011 0.000 NA NA 0.011 0.011
## 20 I2 ~~ I4 0.008 0.000 NA NA 0.008 0.008
## 21 I3 ~~ I3 0.977 0.000 NA NA 0.977 0.977
## 22 I3 ~~ I4 0.016 0.000 NA NA 0.016 0.016
## 23 I4 ~~ I4 0.980 0.000 NA NA 0.980 0.980
## 24 ab := a*b ab 0.221 0.017 12.895 0.000 0.188 0.255
## 25 c := c_p+(a*b) c 0.227 0.030 7.447 0.000 0.167 0.287
```

`fitmeasures(mediation.iv.fit, c("chisq", "df", "pvalue"))`

```
## chisq df pvalue
## 6.105 5.000 0.296
```

Ok, itâ€™s tough to see all of the results. Lets put our models togetherâ€¦

Correct model â€“ With Zâ€™s Included:

```
## Path Parameter se pvalue
## 1 ab 0.209 0.00612 0
## 2 c 0.214 0.00968 0
```

B&K model â€“ Zâ€™s Omitted (Endogeneity):

```
## Path Parameter se pvalue
## 1 ab 0.437 0.00919 0
## 2 c 0.322 0.01180 0
```

2SLS model â€“ Zâ€™s Omitted (Instruments & Correlated Disturbances):

```
## Path Parameter se pvalue
## 1 ab 0.221 0.0172 0.00e+00
## 2 c 0.227 0.0305 9.57e-14
```

Lets also take a look at the *psi (\(\psi\))* matrix from our 2SLS model, which in `lavaan`

is the matrix containing the covariances between the *etaâ€™s (\(\eta\))*â€”the dependent variables (x, m, and y in our model). These are actually the covariances between the error terms (called \(\zeta\)) that are necessary to properly estimate a 2SLS model.

`inspect(mediation.iv.fit, what = "estimates")$psi`

```
## m y x I1 I2 I3 I4
## m 1.654
## y 0.340 1.351
## x 0.296 -0.004 1.325
## I1 0.000 0.000 0.000 1.006
## I2 0.000 0.000 0.000 0.004 0.993
## I3 0.000 0.000 0.000 0.010 0.011 0.977
## I4 0.000 0.000 0.000 0.012 0.008 0.016 0.980
```

By the way, in the classic LISREL notation, part of this matrix would be *phi (\(\phi\))*. Weâ€™ll talk about that more when we get to the SEM class, but knowing LISREL is helpful to understanding other SEM software packages.

So the large covariance between \(\zeta\)â€™s for x and m reflects that there is an omitted variable (z1, remember?) that is causing x to be endogenous to m. Using our two instruments to help us over identify the model, we can actually see what this value is (unlike using ivreg, etc.). Because itâ€™s statistically significant, we know that we need to retain this path, along with our instruments.

The same logic applies to the covariance between the \(\zeta\)â€™s for m and y.

But how about the covariance between the \(\zeta\)â€™s for x and y? Do we need that one?

So we could drop that parameter and our model wonâ€™t be misspecified. We didnâ€™t need to free that covariance, nor did we need to estimate a \(c'\) path in the first place, because our instruments would account for any missing Mâ€™s that have gone unobserved.

Yet another reason why we donâ€™t needâ€”and it would actually be a mistake toâ€”estimate equation 1 from B&K.

So our more parsimonious model looks like thisâ€¦

```
mediation.ivclean.model <- 'm ~ a * x # a path
y ~ b * m # b path
# Instrument variable paths
x ~ I1 + I2
m ~ I3 + I4
# Specify the error term covariances
x ~~ m
m ~~ y
# Indirect effect
ab := a*b'
mediation.ivclean.fit <- sem(mediation.ivclean.model, data = my.df)
```

`parameterEstimates(mediation.ivclean.fit)`

```
## lhs op rhs label est se z pvalue ci.lower ci.upper
## 1 m ~ x a 0.437 0.026 16.639 0 0.386 0.489
## 2 y ~ m b 0.509 0.013 40.653 0 0.484 0.533
## 3 x ~ I1 0.333 0.011 29.373 0 0.311 0.356
## 4 x ~ I2 0.341 0.011 29.870 0 0.319 0.364
## 5 m ~ I3 0.336 0.012 26.983 0 0.312 0.360
## 6 m ~ I4 0.350 0.012 28.103 0 0.325 0.374
## 7 m ~~ x 0.298 0.038 7.884 0 0.224 0.372
## 8 m ~~ y 0.336 0.025 13.504 0 0.287 0.385
## 9 m ~~ m 1.654 0.028 58.891 0 1.599 1.709
## 10 y ~~ y 1.349 0.021 64.698 0 1.308 1.390
## 11 x ~~ x 1.325 0.019 70.711 0 1.288 1.362
## 12 I1 ~~ I1 1.006 0.000 NA NA 1.006 1.006
## 13 I1 ~~ I2 0.004 0.000 NA NA 0.004 0.004
## 14 I1 ~~ I3 0.010 0.000 NA NA 0.010 0.010
## 15 I1 ~~ I4 0.012 0.000 NA NA 0.012 0.012
## 16 I2 ~~ I2 0.993 0.000 NA NA 0.993 0.993
## 17 I2 ~~ I3 0.011 0.000 NA NA 0.011 0.011
## 18 I2 ~~ I4 0.008 0.000 NA NA 0.008 0.008
## 19 I3 ~~ I3 0.977 0.000 NA NA 0.977 0.977
## 20 I3 ~~ I4 0.016 0.000 NA NA 0.016 0.016
## 21 I4 ~~ I4 0.980 0.000 NA NA 0.980 0.980
## 22 ab := a*b ab 0.222 0.015 15.126 0 0.193 0.251
```

`fitmeasures(mediation.ivclean.fit, c("chisq", "df", "pvalue"))`

```
## chisq df pvalue
## 6.149 7.000 0.522
```

See? No evidence of misspecification in our \(\chi\)^{2} statistic. Weâ€™re good to go.

Last thingâ€¦what happens when we keep our instruments, but *DO NOT* include the error term covariances?

```
mediation.ivbad.model <- 'm ~ a * x # a path
y ~ b * m # b path
# Instrument variable paths
x ~ I1 + I2
m ~ I3 + I4
# Indirect effect
ab := a*b'
mediation.ivbad.fit <- sem(mediation.ivbad.model, data = my.df)
```

`parameterEstimates(mediation.ivbad.fit)`

```
## lhs op rhs label est se z pvalue ci.lower ci.upper
## 1 m ~ x a 0.629 0.010 62.017 0 0.609 0.649
## 2 y ~ m b 0.646 0.007 88.578 0 0.632 0.660
## 3 x ~ I1 0.334 0.011 29.097 0 0.311 0.356
## 4 x ~ I2 0.341 0.012 29.510 0 0.318 0.363
## 5 m ~ I3 0.337 0.013 26.315 0 0.311 0.362
## 6 m ~ I4 0.346 0.013 27.078 0 0.321 0.371
## 7 m ~~ m 1.597 0.023 70.711 0 1.553 1.641
## 8 y ~~ y 1.303 0.018 70.711 0 1.267 1.339
## 9 x ~~ x 1.325 0.019 70.711 0 1.288 1.362
## 10 I1 ~~ I1 1.006 0.000 NA NA 1.006 1.006
## 11 I1 ~~ I2 0.004 0.000 NA NA 0.004 0.004
## 12 I1 ~~ I3 0.010 0.000 NA NA 0.010 0.010
## 13 I1 ~~ I4 0.012 0.000 NA NA 0.012 0.012
## 14 I2 ~~ I2 0.993 0.000 NA NA 0.993 0.993
## 15 I2 ~~ I3 0.011 0.000 NA NA 0.011 0.011
## 16 I2 ~~ I4 0.008 0.000 NA NA 0.008 0.008
## 17 I3 ~~ I3 0.977 0.000 NA NA 0.977 0.977
## 18 I3 ~~ I4 0.016 0.000 NA NA 0.016 0.016
## 19 I4 ~~ I4 0.980 0.000 NA NA 0.980 0.980
## 20 ab := a*b ab 0.406 0.008 50.803 0 0.391 0.422
```

`fitmeasures(mediation.ivbad.fit, c("chisq", "df", "pvalue"))`

```
## chisq df pvalue
## 261.174 9.000 0.000
```

Quick quizâ€¦thereâ€™s one crystal clear way to know that weâ€™ve made a mistake. What is it?

By the way, this was just a *simple* mediation problem. Look at what was necessary to just properly specify a x â€“> m â€“> y model.

See why I am such a skeptic about multiple mediation models?

Ohâ€¦the bootstrap thing. So the `lavaan`

package makes it very easy. Weâ€™ll use our `ivclean`

model.

```
mediation.ivboot.fit <- sem(mediation.ivclean.model, data = my.df,
se = "bootstrap", bootstrap = 5000)
parameterEstimates(mediation.ivboot.fit)
```

```
## lhs op rhs label est se z pvalue ci.lower ci.upper
## 1 m ~ x a 0.437 0.026 16.694 0 0.384 0.487
## 2 y ~ m b 0.509 0.012 40.854 0 0.484 0.533
## 3 x ~ I1 0.333 0.011 29.575 0 0.312 0.356
## 4 x ~ I2 0.341 0.012 29.637 0 0.319 0.365
## 5 m ~ I3 0.336 0.012 27.007 0 0.311 0.361
## 6 m ~ I4 0.350 0.012 27.983 0 0.325 0.375
## 7 m ~~ x 0.298 0.038 7.859 0 0.225 0.375
## 8 m ~~ y 0.336 0.025 13.448 0 0.288 0.387
## 9 m ~~ m 1.654 0.028 58.900 0 1.600 1.712
## 10 y ~~ y 1.349 0.020 65.881 0 1.310 1.391
## 11 x ~~ x 1.325 0.019 71.348 0 1.289 1.361
## 12 I1 ~~ I1 1.006 0.000 NA NA 1.006 1.006
## 13 I1 ~~ I2 0.004 0.000 NA NA 0.004 0.004
## 14 I1 ~~ I3 0.010 0.000 NA NA 0.010 0.010
## 15 I1 ~~ I4 0.012 0.000 NA NA 0.012 0.012
## 16 I2 ~~ I2 0.993 0.000 NA NA 0.993 0.993
## 17 I2 ~~ I3 0.011 0.000 NA NA 0.011 0.011
## 18 I2 ~~ I4 0.008 0.000 NA NA 0.008 0.008
## 19 I3 ~~ I3 0.977 0.000 NA NA 0.977 0.977
## 20 I3 ~~ I4 0.016 0.000 NA NA 0.016 0.016
## 21 I4 ~~ I4 0.980 0.000 NA NA 0.980 0.980
## 22 ab := a*b ab 0.222 0.015 15.277 0 0.193 0.251
```

Now with robust standard errorsâ€¦

```
mediation.ivrobust.fit <- sem(mediation.ivclean.model, data = my.df, se = "robust.sem")
parameterEstimates(mediation.ivrobust.fit)
```

```
## lhs op rhs label est se z pvalue ci.lower ci.upper
## 1 m ~ x a 0.437 0.028 15.461 0.000 0.382 0.492
## 2 y ~ m b 0.509 0.012 41.033 0.000 0.484 0.533
## 3 x ~ I1 0.333 0.013 25.951 0.000 0.308 0.359
## 4 x ~ I2 0.341 0.013 26.410 0.000 0.316 0.367
## 5 m ~ I3 0.336 0.014 24.010 0.000 0.309 0.364
## 6 m ~ I4 0.350 0.014 25.248 0.000 0.322 0.377
## 7 m ~~ x 0.298 0.041 7.236 0.000 0.217 0.378
## 8 m ~~ y 0.336 0.025 13.603 0.000 0.288 0.385
## 9 m ~~ m 1.654 0.029 57.643 0.000 1.598 1.711
## 10 y ~~ y 1.349 0.021 65.703 0.000 1.309 1.389
## 11 x ~~ x 1.325 0.019 69.169 0.000 1.287 1.362
## 12 I1 ~~ I1 1.006 0.000 NA NA 1.006 1.006
## 13 I1 ~~ I2 0.004 0.000 NA NA 0.004 0.004
## 14 I1 ~~ I3 0.010 0.000 NA NA 0.010 0.010
## 15 I1 ~~ I4 0.012 0.000 NA NA 0.012 0.012
## 16 I2 ~~ I2 0.993 0.000 NA NA 0.993 0.993
## 17 I2 ~~ I3 0.011 0.000 NA NA 0.011 0.011
## 18 I2 ~~ I4 0.008 0.000 NA NA 0.008 0.008
## 19 I3 ~~ I3 0.977 0.000 NA NA 0.977 0.977
## 20 I3 ~~ I4 0.016 0.000 NA NA 0.016 0.016
## 21 I4 ~~ I4 0.980 0.000 NA NA 0.980 0.980
## 22 m ~1 -0.016 0.014 -1.177 0.239 -0.043 0.011
## 23 y ~1 0.012 0.012 1.067 0.286 -0.010 0.035
## 24 x ~1 0.003 0.012 0.250 0.802 -0.021 0.028
## 25 I1 ~1 0.006 0.000 NA NA 0.006 0.006
## 26 I2 ~1 0.011 0.000 NA NA 0.011 0.011
## 27 I3 ~1 -0.009 0.000 NA NA -0.009 -0.009
## 28 I4 ~1 0.010 0.000 NA NA 0.010 0.010
## 29 ab := a*b ab 0.222 0.016 14.267 0.000 0.192 0.253
```

Not much difference, right? Note that sometimes you canâ€™t use robust standard errors (depending on the estimator), so itâ€™s always good to have bootstrapping in your pocket.

So debriefâ€¦this is a tough topic. Whatâ€™s still bugging you?

Wrap-up.

Lab 16 March â€“ Mediation assessment

Seminar 20 March â€“ A rigorous way to think about mediation