Visualizing and interpreting continuous-by-continuous interaction effects

· by Brian Anderson · Read in about 12 min · (2534 words) ·

In the entrepreneurship (and management) literature, we typically see a continuous-by-continuous interaction effect plotted like this:

I think this is a poor way to visualize interactions, and I think we need a new ‘default’ for how we plot continuous-by-continuous interaction effects: a side-by-side figure with a simple slopes visualization, including confidence intervals, on one side, and a marginal effects plot on the other.

Data

Let’s start by simulating some data…

library(tidyverse)
# We are going to simulate a relatively weak continuous by continuous interaction, 
#  which is far more likely in entrepreneurship, strategy, and management studies.

# We start with creating our X, M, and Y variables representing, our predictor (IV), 
#  moderator (also an IV), and outcome (DV) variables respectively, with a sample 
#  size of 150, also common in our literature.

set.seed(08022003)

n <- 150

# We set x with a mean of 0 and standard deviation of 1.75
x <- rnorm(n, 0, 1.75)

# We set m with a mean of 0 and standard deviation of .75
m <- rnorm(n, 0, .75)

# Now we create our outcome variable y as a function of x, m, and x*m, plus a 
#  random error term with mean zero and standard deviation of 2. I set the effect 
#  sizes to reflect a relatively weak interaction of x and m, and an intercept of 
#  .05.

y <- .05 + .2*x + .1*m + .3*x*m + rnorm(n, sd = 2)

# Now we bind these variables in a dataframe we will use in our models.
moderation.df <- data_frame(y, x, m)


Centering & Multicollinearity

Mean-centering, or standardizing, the lower-order variables has absolutely nothing to do with multicollinearity. For the mathematical proofs see here and here. Yes, the variance inflation factor (VIF), a common tool to evaluate multicollinearity, will be high in models with uncentered lower-order terms. But the high VIF is an artifact of the formula used to calculate the VIF and does not reflect instability in the coefficient estimates.

Consider the interaction coefficient in the original model and in a model with mean-centered predictors:

notcentered.model <- lm(y ~ x * m, data = moderation.df)
summary(notcentered.model)
## 
## Call:
## lm(formula = y ~ x * m, data = moderation.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7411 -1.4421 -0.0501  1.4020  3.9900 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.04198    0.15738  -0.267   0.7901  
## x            0.19746    0.10272   1.922   0.0565 .
## m            0.02593    0.22958   0.113   0.9102  
## x:m          0.32206    0.15307   2.104   0.0371 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.911 on 146 degrees of freedom
## Multiple R-squared:  0.0544, Adjusted R-squared:  0.03497 
## F-statistic:   2.8 on 3 and 146 DF,  p-value: 0.04216
# First we create mean centered values for our predictors
moderation.df <- moderation.df %>% 
  mutate(cen.x = x - mean(x),
         cen.m = m - mean(m))

# Now we estimate a new model
centered.model <- lm(y ~ cen.x * cen.m, data = moderation.df)
summary(centered.model)
## 
## Call:
## lm(formula = y ~ cen.x * cen.m, data = moderation.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7411 -1.4421 -0.0501  1.4020  3.9900 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.03741    0.15618  -0.240   0.8111  
## cen.x        0.17190    0.10353   1.660   0.0990 .
## cen.m        0.03835    0.22890   0.168   0.8672  
## cen.x:cen.m  0.32206    0.15307   2.104   0.0371 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.911 on 146 degrees of freedom
## Multiple R-squared:  0.0544, Adjusted R-squared:  0.03497 
## F-statistic:   2.8 on 3 and 146 DF,  p-value: 0.04216

Yep, the interaction coefficients are the same, although the lower-order terms (the intercept, \(\beta_x\), and \(\beta_m\)) are different. And yes, you can interpret the lower-order terms in an interaction model. Mean-centering makes this interpretation a lot easier, which is one of the reasons I recommend it. Standardizing is also fine, but I think it’s easier to express results in the original scale rather than in standard deviations.
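If you want to check the VIF claim yourself, here is a minimal sketch (no extra packages needed) that computes the VIF for the interaction term by hand (it is just \(1/(1 - R^2)\) from regressing the product term on the other predictors) and confirms that the interaction estimate and its standard error do not move. Note that because \(x\) and \(m\) were simulated with means of zero, centering barely changes anything in this particular dataset; with real variables whose means sit far from zero, the uncentered VIF is typically much larger.

# A minimal sketch: compute the VIF for the interaction term by hand as
#  1 / (1 - R^2), where R^2 comes from regressing the product term on the
#  other predictors in the model.
vif.interaction <- function(aux.model) 1 / (1 - summary(aux.model)$r.squared)

vif.interaction(lm(I(x * m) ~ x + m, data = moderation.df))
vif.interaction(lm(I(cen.x * cen.m) ~ cen.x + cen.m, data = moderation.df))

# Whatever the VIFs are, the interaction estimate and its standard error are
#  identical in the uncentered and centered models.
summary(notcentered.model)$coef["x:m", ]
summary(centered.model)$coef["cen.x:cen.m", ]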


Simple Slopes Plot

There are a lot of R packages that build continuous-by-continuous interaction plots. I like building them myself, because I have more control over the look and feel, and it helps me make sure I know what is going on!

Our first step is to specify values for \(x\) and for \(m\) that we will use to generate predicted values of \(y\), using our regression equation to test for moderation:

\[y = \alpha + \beta_{x}x + \beta_{m}m + \beta_{xm}xm + \epsilon\]

This is a linear relationship, and so strictly speaking all we would need are four sets of (\(x\),\(y\)) coordinates—one for each endpoint of the two lines.

I like to specify a range of values for \(x\), though, starting with the minimum and incrementing by some meaningful amount based on the data, all the way up to the maximum value.

For \(m\), the standard is two lines for the interaction, typically set at +/- 1 standard deviation from the mean: one line representing the effect of \(x\) on \(y\) when \(m\) is below the mean, and one representing the effect of \(x\) on \(y\) when \(m\) is above the mean.

Sometimes you see a third line at the mean of \(m\). There is nothing wrong with this, but I’m not sure this line adds much.
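Before writing the code, it helps to see exactly what these two lines are. Holding \(m\) fixed at \(m_{\pm} = \bar{m} \pm \text{SD}(m)\), the fitted equation collapses to a straight line in \(x\):

\[\hat{y} = \left(\hat{\alpha} + \hat{\beta}_{m}m_{\pm}\right) + \left(\hat{\beta}_{x} + \hat{\beta}_{xm}m_{\pm}\right)x\]

so the slope of each line is \(\hat{\beta}_{x} + \hat{\beta}_{xm}m_{\pm}\). With the centered estimates above (\(\hat{\beta}_{x} \approx 0.17\), \(\hat{\beta}_{xm} \approx 0.32\)) and a sample standard deviation of \(m\) of roughly 0.7, that works out to a slope of roughly \(-0.05\) for the low-\(m\) line and roughly \(0.39\) for the high-\(m\) line.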

# Create a range of meaningful values of x for the prediction. We don't need all of them, 
#  so I am going to increment by .5
x.seq <- seq(from = round(min(moderation.df$cen.x)), 
             to = round(max(moderation.df$cen.x)), by = .5)

# Now we get the low and high values of m based on one standard deviation 
#  above and below the mean
low.m <- mean(moderation.df$cen.m) - sd(moderation.df$cen.m)
high.m <- mean(moderation.df$cen.m) + sd(moderation.df$cen.m)

# Lastly we create two new dataframes based on our new values of x and our low/high values of m
low.df <- data_frame(cen.x = x.seq, cen.m = low.m)
high.df <- data_frame(cen.x = x.seq, cen.m = high.m)

Now we create our predicted values of \(y\) using our centered model but using dataframes with our range of \(x\)’s in the data and at low/high values of \(m\) respectively.

library(margins)
# This one is the low condition
predict.low <- cplot(centered.model, x = "cen.x", what = c("prediction"),
                     data = low.df, draw = FALSE)
##         xvals        yvals     upper      lower
## 1  -4.0000000  0.135029163 1.4425186 -1.1724603
## 2  -3.6666667  0.118460473 1.3298811 -1.0929602
## 3  -3.3333333  0.101891784 1.2183588 -1.0145753
## 4  -3.0000000  0.085323094 1.1082622 -0.9376161
## 5  -2.6666667  0.068754404 1.0000211 -0.8625123
## 6  -2.3333333  0.052185714 0.8942415 -0.7898700
## 7  -2.0000000  0.035617025 0.7917952 -0.7205611
## 8  -1.6666667  0.019048335 0.6939559 -0.6558592
## 9  -1.3333333  0.002479645 0.6025981 -0.5976388
## 10 -1.0000000 -0.014089045 0.5204494 -0.5486275
## 11 -0.6666667 -0.030657735 0.4512839 -0.5125994
## 12 -0.3333333 -0.047226424 0.3997091 -0.4941619
## 13  0.0000000 -0.063795114 0.3700046 -0.4975949
## 14  0.3333333 -0.080363804 0.3641137 -0.5248414
## 15  0.6666667 -0.096932494 0.3804411 -0.5743061
## 16  1.0000000 -0.113501183 0.4148530 -0.6418553
## 17  1.3333333 -0.130069873 0.4627015 -0.7228412
## 18  1.6666667 -0.146638563 0.5201034 -0.8133805
## 19  2.0000000 -0.163207253 0.5842277 -0.9106422
## 20  2.3333333 -0.179775943 0.6531229 -1.0126747
# Now I am going to add a marker variable to designate that this was the low m condition
predict.low <- predict.low %>% 
  mutate(m = -1)

Now we do the same thing for the +1 s.d. of \(m\) condition.

# This one is the high condition
predict.high <- cplot(centered.model, x = "cen.x", what = c("prediction"), 
                      data = high.df, draw = FALSE)
##         xvals       yvals       upper       lower
## 1  -4.0000000 -1.58503720 -0.38177336 -2.78830103
## 2  -3.6666667 -1.45386874 -0.33431996 -2.57341751
## 3  -3.3333333 -1.32270027 -0.28576760 -2.35963294
## 4  -3.0000000 -1.19153181 -0.23583126 -2.14723237
## 5  -2.6666667 -1.06036335 -0.18412592 -1.93660078
## 6  -2.3333333 -0.92919489 -0.13012367 -1.72826611
## 7  -2.0000000 -0.79802643 -0.07309064 -1.52296221
## 8  -1.6666667 -0.66685796 -0.01199671 -1.32171921
## 9  -1.3333333 -0.53568950  0.05460612 -1.12598512
## 10 -1.0000000 -0.40452104  0.12872269 -0.93776477
## 11 -0.6666667 -0.27335258  0.21300441 -0.75970957
## 12 -0.3333333 -0.14218412  0.31062010 -0.59498834
## 13  0.0000000 -0.01101566  0.42466140 -0.44669271
## 14  0.3333333  0.12015281  0.55706427 -0.31675866
## 15  0.6666667  0.25132127  0.70767973 -0.20503720
## 16  1.0000000  0.38248973  0.87435228 -0.10937282
## 17  1.3333333  0.51365819  1.05392551 -0.02660913
## 18  1.6666667  0.64482665  1.24327722  0.04637609
## 19  2.0000000  0.77599512  1.43984124  0.11214899
## 20  2.3333333  0.90716358  1.64169374  0.17263341
# Now I am going to add a marker variable to designate that this was the high m condition
predict.high <- predict.high %>% 
  mutate(m = 1)

Now we bind the two dataframes together to create our plot. We’ll also create a factor variable out of our \(m\) variable, which makes the plotting process a bit easier.

predict.model <- bind_rows(predict.low, predict.high) %>%
  mutate(m = factor(m, 
                    levels = c(-1, 1),
                    labels = c("Low M", "High M")))

Now let’s create our simple slopes plot. I’m a big fan of ggplot, so that’s what I’ll use.

predict.plot <- ggplot(data = predict.model, aes(x = xvals, group = m)) + 
  geom_line(aes(y = yvals, color = m)) + 
  geom_ribbon(alpha = .2, aes(ymin = lower, ymax = upper)) + 
  labs(title = "Effect of X on Y Moderated by M",
       subtitle = "Simple Slopes at +/- 1 Standard Deviation of Mean-Centered M",
       y = "Predicted Values of Y",
       x = "Mean Centered Value of X", 
       colour = "") +
  theme_bw() +
  theme(legend.position = c(.1, .85))

predict.plot

Now, compare the plot above with the standard simple slopes plot we see in the literature…

ggplot(data = predict.model, aes(x = xvals, group = m)) + 
  geom_line(aes(y = yvals, color = m)) + 
  labs(title = "Effect of X on Y Moderated by M",
       y = "Predicted Values of Y",
       x = "Mean Centered Value of X", 
       colour = "") +
  theme_bw() +
  theme(legend.position = c(.1, .85))

Why is the first plot better? By including the confidence intervals, we see the range of statistical significance and the uncertainty around the interaction effect. We also show the full range of predicted values for \(y\), which sets a more reasonable scale for the y axis.

Remember the model output?

summary(centered.model)$coef
##                Estimate Std. Error    t value   Pr(>|t|)
## (Intercept) -0.03740538  0.1561783 -0.2395044 0.81105040
## cen.x        0.17189966  0.1035313  1.6603644 0.09898738
## cen.m        0.03835214  0.2289024  0.1675480 0.86717073
## cen.x:cen.m  0.32205917  0.1530743  2.1039395 0.03709789

When we consider the coefficient estimate for the interaction effect, the plot with the confidence intervals and across the range of predicted values of \(y\) makes it clear that the interaction effect isn’t all that large.

Yes, the slopes are statistically different (which is what we are actually testing in a moderation model), but they do not seem materially different.


Marginal Effect Plot

In a continuous-by-continuous interaction, the model assumes that the effect of \(x\) on \(y\) varies as a function of \(m\). When \(m\) is a discrete variable, the slope of \(x\) simply shifts with each discrete change in \(m\). But when \(m\) is continuous, the effect of \(x\) on \(y\) changes continuously as \(m\) changes.

The marginal effect of \(x\) on \(y\) is itself constantly changing; \(\beta_{x}\) alone gives that effect only when \(m = 0\), which in the centered model is the mean of \(m\).
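This is the same quantity as the simple slope from before, just viewed as a function of \(m\). Differentiating the regression equation with respect to \(x\) gives

\[\frac{\partial y}{\partial x} = \beta_{x} + \beta_{xm}m\]

which, with our centered estimates, is roughly \(0.17 + 0.32m\): the effect of \(x\) on \(y\) changes by about 0.32 for every one-unit increase in \(m\).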

I’ve written about marginal effects before, and I think they are underused and under-appreciated!

So let’s calculate our marginal effects first, and then we will build the plot. I’m going to make use of the margins package again. The margins package for R mimics Stata’s margins, dydx(*) command. By default, margins calculates the marginal effect across the observed values of \(x\) and \(m\), so it provides a comprehensive look at how the effect of \(x\) on \(y\) is expected to change as \(m\) changes.

library(margins)

# We are going to calculate the marginal effects, but store the results in a dataframe,
#  so that we can create our plot with ggplot.

# NOTE!!! The 'x' listed in the function isn't the X variable in our model; it's the 
#  variable that goes along the x axis. For a marginal effect plot, that is the moderator,
#  which in our case is 'cen.m'. We want the predicted marginal effects (the dx = ) to be 
#  our X variable, which is 'cen.x'.
margins.model <- cplot(centered.model, dx = "cen.x", x = "cen.m",
                       what = "effect", draw = FALSE)

# Lets take a look at the dataframe...
head(margins.model, 10)
##    xvals   yvals  upper   lower factor
##  -1.4989 -0.3108 0.2051 -0.8268  cen.x
##  -1.3697 -0.2692 0.2113 -0.7498  cen.x
##  -1.2406 -0.2276 0.2180 -0.6733  cen.x
##  -1.1114 -0.1860 0.2254 -0.5975  cen.x
##  -0.9823 -0.1444 0.2337 -0.5226  cen.x
##  -0.8531 -0.1028 0.2431 -0.4488  cen.x
##  -0.7239 -0.0612 0.2540 -0.3765  cen.x
##  -0.5948 -0.0197 0.2668 -0.3061  cen.x
##  -0.4656  0.0219 0.2823 -0.2384  cen.x
##  -0.3364  0.0635 0.3012 -0.1741  cen.x

Here’s the key part of the output above—the ‘yvals’ column isn’t the predicted value of \(y\); it’s the predicted value of the marginal effect of \(x\), and as you can see, it covers a pretty wide range. While it won’t be exact, because cplot evaluates the effect over an evenly spaced grid of \(m\) values rather than the observed distribution, the average of the yvals above should be close to the estimate of \(\beta_{x}\) from our regression model, which is the average marginal effect of \(x\) on \(y\) across the range of values for \(m\).

margins.model %>% 
  summarise(mean(yvals))
##   mean(yvals)
## 1   0.1883366
summary(centered.model)$coef
##                Estimate Std. Error    t value   Pr(>|t|)
## (Intercept) -0.03740538  0.1561783 -0.2395044 0.81105040
## cen.x        0.17189966  0.1035313  1.6603644 0.09898738
## cen.m        0.03835214  0.2289024  0.1675480 0.86717073
## cen.x:cen.m  0.32205917  0.1530743  2.1039395 0.03709789

This is what makes marginal effect plots so important for continuous-by-continuous interaction effects—they show the range of \(m\) over which the effect of \(x\) on \(y\) is statistically significant. The average is helpful, but it tells only part of the story.
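If you want to see exactly where those ‘yvals’ and the confidence bounds come from, here is a minimal by-hand sketch: the marginal effect at any value of \(m\) is \(\hat{\beta}_{x} + \hat{\beta}_{xm}m\), and its standard error comes from the usual variance formula for a linear combination of two coefficients.

# A by-hand sketch of the marginal effect of cen.x across values of cen.m,
#  with an approximate 95% confidence interval.
b <- coef(centered.model)
V <- vcov(centered.model)

m.seq <- seq(from = min(moderation.df$cen.m), to = max(moderation.df$cen.m),
             length.out = 20)

# dy/dx = b_x + b_xm * m
dydx <- b["cen.x"] + b["cen.x:cen.m"] * m.seq

# Var(b_x + m * b_xm) = Var(b_x) + m^2 * Var(b_xm) + 2 * m * Cov(b_x, b_xm)
se <- sqrt(V["cen.x", "cen.x"] +
             m.seq^2 * V["cen.x:cen.m", "cen.x:cen.m"] +
             2 * m.seq * V["cen.x", "cen.x:cen.m"])

by.hand <- data.frame(m = m.seq, dydx = dydx,
                      lower = dydx - 1.96 * se, upper = dydx + 1.96 * se)
head(by.hand)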

Let’s make our plot to get a better handle on it…

margins.plot <- ggplot(data = margins.model, aes(x = xvals, y = yvals)) + 
  geom_line(color = "red") + 
  geom_ribbon(alpha = .2, aes(ymin = lower, ymax = upper)) + 
  geom_hline(yintercept = 0, linetype = "dashed") +  # I add a horizontal line at zero to help with interpretation
  labs(title = "Marginal Effect of X on Y as a Function of M",
       subtitle = "Marginal Effect of X Across Range of Mean-Centered M",
       y = "Estimated Effect of X on Y (BetaX)",
       x = "Mean Centered Values of M", 
       colour = "") +
  theme_bw()  

margins.plot

It’s a lot clearer now that we are talking about a pretty small interaction effect. There is a general increase in the effect of \(x\) on \(y\) as \(m\) increases, which is consistent with our simple slopes plot. But in the regions where the confidence interval crosses zero, the effect of \(x\) on \(y\) is not statistically distinguishable from zero for those values of \(m\).

So really, the effect of \(x\) on \(y\) is statistically significant only when \(m\) takes on values roughly above its mean. When we look at the simple slopes plot, the slope of \(x\) on \(y\) is pretty flat at -1 standard deviation of \(m\), which is consistent with the marginal effect plot showing that the effect of \(x\) is close to zero when \(m\) is low.


A New Default…

So how would we use these in a paper? Here’s my proposal—side by side.

library(cowplot)
plot_grid(predict.plot, margins.plot, ncol = 2, align = "h")


Entrepreneurship researchers are still warming to R, but the same default plot could easily be done in Stata :)

Here’s hoping the new default catches on!