Moderation: Continuous moderators

ENT5587B - Research Design & Theory Testing II

Brian S. Anderson, Ph.D.
Assistant Professor
Department of Global Entrepreneurship & Innovation
andersonbri@umkc.edu


© 2017 Brian S. Anderson

  • How goes it?
  • IRB Applications
  • Paper progress – Due Monday 20th at Noon
  • Interaction duplicity
  • Centering and standardizing
  • Plotting continuous by continuous interactions
  • Marginal effects
  • Lab 16 Feb: Moderation paper critique

\(y=\alpha+\beta_{1}x+\beta_{2}m+\beta_{3}xm+\varepsilon\)

Moderation

We’re going to dig deeper into moderation today, particularly moderation with continuous moderators.

First though, a little exercise.

Let’s start by getting some data…

library(tidyverse)
my.ds <- read_csv("http://a.web.umkc.edu/andersonbri/ENT5587.csv")
my.df <- as.data.frame(my.ds) %>%
         select(RiskTaking, Inn = Innovativeness, 
                RND = RNDIntensity) %>%
         na.omit()

First up, estimate the following model.

\(RiskTaking=\alpha+\beta_{1}Innovativeness+\beta_{2}RNDIntensity+\beta_{3}Innovativeness \times RNDIntensity+\varepsilon\)

innovativeness.model <- lm(RiskTaking ~ Inn*RND, data = my.df)
summary(innovativeness.model)
## 
## Call:
## lm(formula = RiskTaking ~ Inn * RND, data = my.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5355 -0.5570  0.1050  0.6536  2.3631 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.43164    0.38951   3.675 0.000367 ***
## Inn          0.63402    0.10248   6.187 1.05e-08 ***
## RND          0.17263    0.07444   2.319 0.022225 *  
## Inn:RND     -0.03807    0.01646  -2.313 0.022548 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.971 on 111 degrees of freedom
## Multiple R-squared:  0.3661, Adjusted R-squared:  0.349 
## F-statistic: 21.37 on 3 and 111 DF,  p-value: 5.349e-11

So there is a significant interaction effect between Innovativeness and R&D Intensity on Risk Taking. Note that both Innovativeness and R&D Intensity are continuous variables.

Take ten minutes and write a hypothesis (1 paragraph) for why the effect of Innovativeness on Risk Taking changes as a function of a change in the level of R&D Intensity.

Next up, estimate this model…

\(RiskTaking=\alpha+\beta_{1}RNDIntensity+\beta_{2}Innovativeness+\beta_{3}RNDIntensity \times Innovativeness+\varepsilon\)

rnd.model <- lm(RiskTaking ~ RND*Inn, data = my.df)
summary(rnd.model)
## 
## Call:
## lm(formula = RiskTaking ~ RND * Inn, data = my.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5355 -0.5570  0.1050  0.6536  2.3631 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.43164    0.38951   3.675 0.000367 ***
## RND          0.17263    0.07444   2.319 0.022225 *  
## Inn          0.63402    0.10248   6.187 1.05e-08 ***
## RND:Inn     -0.03807    0.01646  -2.313 0.022548 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.971 on 111 degrees of freedom
## Multiple R-squared:  0.3661, Adjusted R-squared:  0.349 
## F-statistic: 21.37 on 3 and 111 DF,  p-value: 5.349e-11

Hmmm…so now we have a significant interaction between R&D Intensity and Innovativeness.

Take ten minutes and write a hypothesis (1 paragraph) for why the effect of R&D Intensity on Risk Taking changes as a function of a change in the level of Innovativeness.

So just taking a step back, which of the two seems the more plausible hypothesis?

Why is the theoretical distinction between x and m important, despite their mathematical equivalency in the model?
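
If you want to convince yourself of the mathematical equivalence, here is a minimal sketch on simulated data. The variable names, sample size, and effect sizes below are made up for illustration and are not the course data set.

```r
# Simulated data; names and effect sizes are illustrative assumptions
set.seed(5587)
n <- 200
x <- rnorm(n, mean = 4, sd = 1)
m <- rnorm(n, mean = 5, sd = 2)
y <- 1 + 0.6 * x + 0.2 * m - 0.04 * x * m + rnorm(n)
sim.df <- data.frame(y, x, m)

fit.xm <- lm(y ~ x * m, data = sim.df)   # written as "x moderated by m"
fit.mx <- lm(y ~ m * x, data = sim.df)   # written as "m moderated by x"

# Same interaction estimate and same fitted values; only the label differs
coef(fit.xm)["x:m"]
coef(fit.mx)["m:x"]
all.equal(fitted(fit.xm), fitted(fit.mx))
```

The model cannot tell you which variable is the moderator. Only your theory can.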

Bonus points…anybody see the opportunity to engage in questionable research practices with this feature of moderation models?

So what’s the moral of our story?

Next up, let’s talk about centering and standardizing.

We’re going to stick with this model from now on…

\(RiskTaking=\alpha+\beta_{1}Innovativeness+\beta_{2}RNDIntensity+\beta_{3}Innovativeness \times RNDIntensity+\varepsilon\)

And let’s create mean-centered versions of Innovativeness and R&D Intensity…

my.df$Inn.center <- (my.df$Inn - mean(my.df$Inn))
my.df$RND.center <- (my.df$RND - mean(my.df$RND))

For today, you read papers on the myth that just won’t die: that centering deals with multicollinearity.

I’m not going to rehash these arguments, because I completely agree with them (and you should as well—it’s just basic math). I do though want to quickly show an illustration…

First up, an uncentered variable model.

uncentered.model <- lm(RiskTaking ~ Inn*RND, data = my.df)
library(visreg)
visreg(uncentered.model, "Inn", by="RND", 
       overlay=TRUE, partial=FALSE)

Now, a centered variable model.

centered.model <- lm(RiskTaking ~ Inn.center*RND.center, data = my.df)
visreg(centered.model, "Inn.center", by="RND.center", 
       overlay=TRUE, partial=FALSE)

No, your eyes aren’t playing tricks on you. What difference do you see between these two plots?

The best discussion in my opinion of uncentered and centered predictors is in Chapter 7 of Cohen et al. (2003).

It is a bit technical, but well worth your time.

Feel free to borrow my copy.

Let’s walk through what’s going on under the hood of these two models.

library(stargazer)
stargazer(uncentered.model, centered.model, type = "html",
          title = "Model comparisons",
          dep.var.labels.include = FALSE,
          dep.var.caption = "",
          column.labels = c("Uncentered", "Centered"),
          model.numbers = FALSE,
          single.row = TRUE,
          ci = FALSE,
          omit.stat = c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          star.char = c("*", "**", "***"),
          notes = "* p < .05; ** p < .01; *** p < .001",
          notes.align = c("r"),
          notes.label = "",
          notes.append = FALSE)

Model comparisons
                            Uncentered          Centered
Inn                         0.634*** (0.102)
RND                         0.173* (0.074)
Inn:RND                     -0.038* (0.016)
Inn.center                                      0.445*** (0.067)
RND.center                                      0.023 (0.027)
Inn.center:RND.center                           -0.038* (0.016)
Constant                    1.432*** (0.390)    4.042*** (0.098)
Observations                115                 115
R2                          0.366               0.366
Adjusted R2                 0.349               0.349
F Statistic (df = 3; 111)   21.368***           21.368***
* p < .05; ** p < .01; *** p < .001

Pay very careful attention to the coefficient estimate for the interaction xm term. What do you notice?

The short answer is that centering a variable has nothing to do with addressing multicollinearity in an interaction model. Whether you center or don’t center, the highest order coefficient in the model, xm, does not change. The estimates of the lower order terms do change because we’ve shifted the scale of measurement, but that has nothing to do with collinearity.
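
The algebra behind this is short. Substituting \(x=x_{c}+\bar{x}\) and \(m=m_{c}+\bar{m}\) into the uncentered model leaves \(\beta_{3}\) on the product term untouched, while the lower order terms become \(\beta_{1}+\beta_{3}\bar{m}\) and \(\beta_{2}+\beta_{3}\bar{x}\). Here is a minimal sketch that checks this on simulated data. The names and effect sizes are assumptions for illustration, not the course data.

```r
# Simulated data; names and effect sizes are illustrative assumptions
set.seed(5587)
n <- 200
x <- rnorm(n, mean = 4, sd = 1)
m <- rnorm(n, mean = 5, sd = 2)
y <- 1 + 0.6 * x + 0.2 * m - 0.04 * x * m + rnorm(n)
xc <- x - mean(x)
mc <- m - mean(m)

raw.fit <- lm(y ~ x * m)     # uncentered predictors
ctr.fit <- lm(y ~ xc * mc)   # mean-centered predictors
b <- coef(raw.fit)

# The interaction estimate is unchanged by centering
c(raw = unname(b["x:m"]), centered = unname(coef(ctr.fit)["xc:mc"]))

# The lower order terms are the raw estimates re-expressed at the means
c(implied = unname(b["x"] + b["x:m"] * mean(m)),
  centered = unname(coef(ctr.fit)["xc"]))
c(implied = unname(b["m"] + b["x:m"] * mean(x)),
  centered = unname(coef(ctr.fit)["mc"]))
```

Each pair should agree to rounding error. The centered model is the same model, just read off at a different point.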

We’ll talk about interpreting the coefficients in a minute, but as you have almost certainly read in papers (including mine, tragically), it is common to calculate variance inflation factors in a regression model.

Here’s the uncentered regression model…

library(car)
vif(uncentered.model)
##       Inn       RND   Inn:RND 
##  2.830703  9.748464 14.233548

And the centered…

vif(centered.model)
##            Inn.center            RND.center Inn.center:RND.center 
##              1.206177              1.243624              1.035093

Based on our rule of thumb, a VIF greater than 2.0 suggests that multicollinearity—significant correlations among the predictors—may be inflating the variance of our estimated coefficients in our uncentered model. On the surface, this seems to be a problem.

Remember though our formula for the VIF of the ith coefficient…

\({VIF}_{i}=\frac{1}{1-{R}_{i}^{2}}\)

You calculate the VIF by regressing each predictor on all of the other predictors; \({R}_{i}^{2}\) is the R2 from that auxiliary regression. The logic is that different variables may inadvertently be capturing similar conceptual domains, rendering imprecise, unstable coefficient estimates because of the high correlations among the predictors.
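
To make the formula concrete, here is a rough sketch that computes each VIF by hand from the auxiliary regressions, using base R only. The data are simulated and `vif.by.hand` is a hypothetical helper written for this illustration, not a function from car.

```r
# Simulated predictors; names and distributions are illustrative assumptions
set.seed(5587)
n <- 200
x <- rnorm(n, mean = 4, sd = 1)
m <- rnorm(n, mean = 5, sd = 2)
sim.df <- data.frame(x = x, m = m, xm = x * m)

# Hypothetical helper: VIF for one predictor from its auxiliary regression
vif.by.hand <- function(target, data) {
  others <- setdiff(names(data), target)
  aux.fit <- lm(reformulate(others, response = target), data = data)
  1 / (1 - summary(aux.fit)$r.squared)
}

vifs <- sapply(names(sim.df), vif.by.hand, data = sim.df)
vifs
```

With uncentered predictors, the product term is nearly a linear function of x and m, so its auxiliary R2 is high and its VIF is large.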

In an interaction model, the xm term is the product of two variables, x and m, and so not surprisingly, xm correlates strongly with its constituent elements. But this is essential collinearity: it has to be there and will always be there. Fortunately though, linear models are very robust to essential collinearity. It doesn’t bias the estimates, it doesn’t change the t statistic or p-value of the xm term, and it doesn’t change the R2 of the overall model. The lower order terms change under centering only because the point at which they are evaluated changes.

Bottom line, feel free to ignore.

Nonessential collinearity due to measurement error or random error is a different story, and centering does help address these issues, so it’s among the many reasons why you should center.

Let’s revisit our coefficient estimates…

Model comparisons
                            Uncentered          Centered
Inn                         0.634*** (0.102)
RND                         0.173* (0.074)
Inn:RND                     -0.038* (0.016)
Inn.center                                      0.445*** (0.067)
RND.center                                      0.023 (0.027)
Inn.center:RND.center                           -0.038* (0.016)
Constant                    1.432*** (0.390)    4.042*** (0.098)

In both models, the coefficient estimate for the xm term is the average change in the effect of x on y across all observed values of m. We’ll come back to this in a minute.

Also in both models…

  • The coefficient estimate for x is the estimated effect of x on y when m = 0.
  • The coefficient estimate for m is the estimated effect of m on y when x = 0.

Anybody see the issue with this?

Hint…think about how we interpret an intercept.

It’s always appropriate to interpret the lower order coefficients in an interaction model; it’s just not all that easy.

If there is no meaningful value of zero in either x or m, then the coefficient estimate is the effect of x (or m) on y in a hypothetical world in which m = 0 (or x = 0) exists.

So absent this context, many scholars simply choose to ignore them (which they shouldn’t!).

So the simple solution to improve interpretability is to give the data a meaningful zero value. We do this by shifting the scale of measurement such that the mean of the variable is zero. We don’t do anything to the variance of the variable, just move it over by subtracting the mean of the variable from each observation…

# We'll use our Innovativeness measure
# Overlay the density of the raw and the mean-centered variable
centering.plot <- ggplot(my.df, aes(Inn)) + 
                geom_density(aes(colour = "Uncentered")) +
                geom_density(aes(Inn.center, colour = "Centered")) +
                # Colours are assigned alphabetically: Centered, then Uncentered
                scale_colour_manual("", values = c("darkblue", "firebrick")) +
                labs(title = "Centered Versus Uncentered Innovativeness",
                     x = "Scale", 
                     y = "Distribution Density")
centering.plot + theme_minimal() + 
                 xlim(-10, 10)

It’s the same variance; we just moved the distribution to the left.
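
A quick numeric check of the same point, sketched on a simulated stand-in for a scale score (the mean and standard deviation below are arbitrary assumptions). Standardizing is included for contrast, since it changes the variance as well as the mean.

```r
# Simulated stand-in for a scale score; mean and sd are arbitrary assumptions
set.seed(5587)
inn <- rnorm(115, mean = 4, sd = 1.2)

inn.center <- inn - mean(inn)       # centering: shift only
inn.z <- inn.center / sd(inn)       # standardizing: shift and rescale

c(mean.raw = mean(inn), mean.centered = mean(inn.center))
c(var.raw = var(inn), var.centered = var(inn.center), var.z = var(inn.z))

# scale() with scale = FALSE does the same centering
all.equal(inn.center, as.numeric(scale(inn, center = TRUE, scale = FALSE)))
```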

So with our variables centered, it’s easier to make sense of our coefficient estimates…

Beta1 <- centered.model$coefficients["Inn.center"]
Beta2 <- centered.model$coefficients["RND.center"]
Beta3 <- centered.model$coefficients["Inn.center:RND.center"]

Predicted average effect of x on y when m = 0 (now the mean of m): 0.4449012

Predicted average effect of m on y when x = 0 (now the mean of x): 0.02266476

Predicted average change in the effect of x on y as m increases by one unit: -0.03807064

If only it was that simple…

The presence of the interaction term means that the effect of x (or of m for that matter) changes as a function of the level of m. In a continuous by continuous interaction however, m takes on, well, lots of different levels.
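
One way to see this: in the interaction model the effect of x is not a single number but a function of m, \(\partial y/\partial x=\beta_{1}+\beta_{3}m\). The sketch below evaluates that function, with a standard error, over a grid of moderator values. The data are simulated, and the names, effect sizes, and grid are assumptions chosen for illustration.

```r
# Simulated, mean-centered data; names and effect sizes are illustrative only
set.seed(5587)
n  <- 200
xc <- rnorm(n, mean = 0, sd = 1)
mc <- rnorm(n, mean = 0, sd = 4)
y  <- 4 + 0.45 * xc + 0.02 * mc - 0.04 * xc * mc + rnorm(n)
fit <- lm(y ~ xc * mc)

b <- coef(fit)
V <- vcov(fit)
m.values <- seq(-8, 8, by = 2)   # arbitrary grid of moderator values

# Conditional effect of x at each m: b1 + b3 * m
effect <- unname(b["xc"] + b["xc:mc"] * m.values)

# Standard error of that linear combination of coefficients
se <- sqrt(V["xc", "xc"] + m.values^2 * V["xc:mc", "xc:mc"] +
           2 * m.values * V["xc", "xc:mc"])

t.crit <- qt(0.975, df = df.residual(fit))
marginal.effects <- data.frame(m = m.values, effect = effect, se = se,
                               lower = effect - t.crit * se,
                               upper = effect + t.crit * se)
marginal.effects
```

At m = 0 the row reproduces the coefficient on x and its standard error. Every other row is an effect that the regression table never shows you directly.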

In a multiple regression model, x and m are additive. But in an interaction model with a continuous moderator, xm increases curvilinearly as x and m increase linearly (we’ll see a plot later that helps make sense of this).
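
A toy illustration of that curvature, using made-up values rather than our data: step x and m up one unit at a time and watch the product term.

```r
# Toy values: x and m each increase linearly by one unit per step
x <- 1:5
m <- 1:5
xm <- x * m   # 1, 4, 9, 16, 25

# The step-to-step change in xm keeps growing: 3, 5, 7, 9
data.frame(x, m, xm, step = c(NA, diff(xm)))
```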

The net result is that the coefficient estimate for xm is simply the average of those curvilinear changes, which makes the value of the coefficient estimate itself not all that useful (unlike with a dichotomous moderator).

What’s more, the way that we commonly view continuous by continuous interactions really doesn’t tell much of the story.

In fact, it can be downright misleading.

Let’s take a look again at the most common way to visualize a continuous by continuous interaction. Usually, researchers plot the effect of x on y at +/- 1 standard deviation (and sometimes at the mean) of x and m, and cite Aiken and West (1991) or Cohen et al. (2003) as justification.

We can do that with visreg by setting two breaks at -4 and 4, which roughly correspond to +/- 1 standard deviation of the moderator (RND.center) in our data.

visreg(centered.model, "Inn.center", by="RND.center", 
       overlay=TRUE, partial=FALSE, breaks=c(-4, 4))
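
Rather than eyeballing the breaks, you can compute them from the data. Here is a base R sketch of the same pick-a-point logic on simulated, mean-centered data. The names and effect sizes are assumptions for illustration, and `predict()` stands in for what the plot draws.

```r
# Simulated, mean-centered data; names and effect sizes are illustrative only
set.seed(5587)
n  <- 200
xc <- rnorm(n, mean = 0, sd = 1)
mc <- rnorm(n, mean = 0, sd = 4)
y  <- 4 + 0.45 * xc + 0.02 * mc - 0.04 * xc * mc + rnorm(n)
fit <- lm(y ~ xc * mc)

# Breaks at -1 and +1 standard deviation of the (centered) moderator
m.breaks <- c(-1, 1) * sd(mc)

# Predicted y at low/high x for each break: the four endpoints of the two lines
grid <- expand.grid(xc = c(-1, 1) * sd(xc), mc = m.breaks)
grid$y.hat <- predict(fit, newdata = grid)
grid

# Simple slope of x at each break: b1 + b3 * m
simple.slopes <- unname(coef(fit)["xc"] + coef(fit)["xc:mc"] * m.breaks)
simple.slopes
```

Two lines means two slopes, which is only two points on the continuous function relating m to the effect of x.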