As an editor, I’m seeing more papers modeling an interaction effect talking about marginal effects and average marginal effects. This is a **great** thing, because the simple slope plots that are the *de facto* standard only tell a limited—and sometimes misleading—story about the underlying nature of an interaction effect.

What sparked this post though is seeing a number of papers discussing marginal effects, and saying that they are plotting marginal effects using Stata’s `margins`

command, but in actuality are plotting simple slopes.^{1} I think the problem here is that, by default, Stata’s `margins`

command does not output marginal effects, but rather predicted values. In this post, I’ll cover plotting and interpreting simple slopes and marginal effects, which hopefully helps researchers using the `margins`

command.

### Basic moderation model

So here we’ve got a classic moderation model:

\(y=\alpha+\beta{x}+\beta{m}+\beta{xm}+\epsilon\)

I’m going to assume that both \(x\) and \(m\) are continuous variables, because that is the model I see most often.

In a moderation model, we are assuming that the effect of \(x\) on \(y\) varies as a function of the level of \(m\). In an interaction model, there is no ‘direct effect’ or ‘main effect’ of \(x\) on \(y\), because the nature of \(x\)’s relationship to \(y\) *depends* on the value of \(m\). You can’t tell a story about \(x\) on \(y\) without including \(m\)—you would be missing the ‘true’ relationship.

### Marginal effects and average marginal effects

In a linear regression model, say \(y=\alpha+\beta{x}+\epsilon\), the marginal effect is the expected change in \(y\) as a function of a one unit change in \(x\). If this definition sounds familiar, it should, because it’s the same explanation for \(\beta\). In fact, we can derive \(\beta\) just in those terms:

\(\beta{x}=\frac{\Delta{y}}{\Delta{x}}\)

In our simple regression model, the marginal effect of \(x\) on \(y\) is constant for every value of \(x\). No matter where in the range of values for \(x\) in our data, \(y\) will *always* change by the same amount as \(x\) increases by one unit.

In an interaction model, the marginal effect of \(x\) on \(y\) **isn’t constant**. It can’t be, if the hypothesized model represents the data. An interaction model assumes that there is a different marginal effect of \(x\) on \(y\) at different values of \(m\). We can write the interaction effect this way…

\(Interaction=\frac{\Delta{y}}{\Delta{x}*\Delta{m}}\)

In the case where \(m\) is dichotomous (0 or 1), The above equation is simply the difference in the expected change in \(y\) for a unit change in \(x\) when \(m\) equals 0 (sometimes called the baseline) versus when \(m\) equals 1.

Where it gets complicated is when \(m\) is continuous, and takes on a whole range of values. Here, the estimate of \(\beta{xm}\) becomes the *Average Marginal Effect (AME)*, which is the average expected change in \(y\) for an average one unit change in \(x\) for a given change in \(m\):

\(\beta{xm}=\frac{\Delta{y}}{\Delta{x}*\Delta{m}}\)

Not exactly intuitive.

### Our interaction model

Lets start by loading the `auto`

database which is included with Stata:

`sysuse auto`

Now we’re going to estimate the following interaction model:

\(Price=\alpha+\beta{mpg}+\beta{weight}+\beta{mpg*weight}+\epsilon\)

Our argument is that the relationship between a car’s price (in 1978) is a function of its gas mileage, but that this relationship changes at different levels of the car’s weight.

**Centering and standardizing variables in interaction models…**

So just a brief note here…

Centering (or standardizing) predictors in an interaction effect to address multicollinearity is an urban legend that just won’t die.

So please, don’t say you are centering/standardizing to deal with multicollinearity. Yes, I am guilty of this in the past myself. Now I know better.

That said, I think centering variables does help to make interpreting the coefficient estimates—and the marginal effects—a lot easier, because we set the mean of \(x\) and \(m\) to be equal to zero. So that’s a good thing, and that’s what I’ll do here. Note that the use of the ‘quietly’ prefix tells Stata to run the command but suppress the output.

```
quietly sum mpg
gen cen_mpg = mpg - r(mean)
quietly sum weight
gen cen_weight = weight - r(mean)
sum cen_mpg cen_weight
```

```
%
% . quietly sum mpg
%
% . gen cen_mpg = mpg - r(mean)
%
% . quietly sum weight
%
% . gen cen_weight = weight - r(mean)
%
% . sum cen_mpg cen_weight
%
% Variable | Obs Mean Std. Dev. Min Max
% -------------+---------------------------------------------------------
% cen_mpg | 74 -4.03e-08 5.785503 -9.297297 19.7027
% cen_weight | 74 -.0000111 777.1936 -1259.459 1820.541
%
% .
```

Now we specify our model. Note the shortcut of the double hash marks (##) and the `c.`

prefix for our variables. The `c.`

prefix tells Stata that our variables are continuous, and are necessary to make the most use of the `margins`

command. The hash mark shortcut tells Stata to multiply `cen.mpg`

and `cen_weight`

together, and also include the constituent terms in the model. In our model, we observe a statistically significant estimate for the interaction between mpg and weight:

`reg price c.cen_mpg##c.cen_weight`

```
%
% . reg price c.cen_mpg##c.cen_weight
%
% Source | SS df MS Number of obs = 74
% -------------+---------------------------------- F(3, 70) = 13.11
% Model | 228430465 3 76143488.5 Prob > F = 0.0000
% Residual | 406634931 70 5809070.44 R-squared = 0.3597
% -------------+---------------------------------- Adj R-squared = 0.3323
% Total | 635065396 73 8699525.97 Root MSE = 2410.2
%
% ------------------------------------------------------------------------------
% price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
% -------------+----------------------------------------------------------------
% cen_mpg | -181.9843 96.14216 -1.89 0.063 -373.7337 9.76524
% cen_weight | .9847513 .6768468 1.45 0.150 -.3651771 2.33468
% |
% c.cen_mpg#|
% c.cen_weight | -.1916796 .0711936 -2.69 0.009 -.3336706 -.0496885
% |
% _cons | 5478.971 378.7809 14.46 0.000 4723.517 6234.426
% ------------------------------------------------------------------------------
%
% .
```

### Using the margins command

So given that the interpretation of \(\beta{xm}\) doesn’t really tell us much about the nature of the interaction effect, it’s common to set \(m\) at some arbitrary value, often +/- 1 standard deviation, and then calculate the effect of \(x\) on \(y\) at the two levels of \(m\). Stata’s `margins`

command makes this really easy to do…

`margins, at(cen_weight=(-777 777)) vsquish`

```
%
% . margins, at(cen_weight=(-777 777)) vsquish
%
% Predictive margins Number of obs = 74
% Model VCE : OLS
%
% Expression : Linear prediction, predict()
% 1._at : cen_weight = -777
% 2._at : cen_weight = 777
%
% ------------------------------------------------------------------------------
% | Delta-method
% | Margin Std. Err. t P>|t| [95% Conf. Interval]
% -------------+----------------------------------------------------------------
% _at |
% 1 | 4713.819 554.9554 8.49 0.000 3606.996 5820.643
% 2 | 6244.123 729.4772 8.56 0.000 4789.226 7699.019
% ------------------------------------------------------------------------------
%
% .
```

By default, the `margins`

command outputs the predicted values of \(y\) at the mean value of \(x\) (in our model, cen_mpg), at whatever value of the moderator (cen_weight) we choose. Here, I specified that I wanted the predicted value of Price when cen_mpg is at its mean value (in our model, that is when cen_mpg is effectively zero), and when cen_weight is at +/- 1 standard deviation from its mean (which we can get from the `sum`

output above, and is roughly 777). (Note that the `vsquish`

option just removes blank lines from the output.)

**Yes, it says “margins”, but the default output isn’t marginal effects, it is predicted values of \(y\).**

To make it a little easier to see, I’ll just estimate a simple linear model regressing price on cen_mpg, and then calling the `margins`

command:

```
reg price cen_mpg
margins
```

```
%
% . reg price cen_mpg
%
% Source | SS df MS Number of obs = 74
% -------------+---------------------------------- F(1, 72) = 20.26
% Model | 139449476 1 139449476 Prob > F = 0.0000
% Residual | 495615920 72 6883554.45 R-squared = 0.2196
% -------------+---------------------------------- Adj R-squared = 0.2087
% Total | 635065396 73 8699525.97 Root MSE = 2623.7
%
% ------------------------------------------------------------------------------
% price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
% -------------+----------------------------------------------------------------
% cen_mpg | -238.8943 53.07669 -4.50 0.000 -344.7008 -133.0879
% _cons | 6165.257 304.9935 20.21 0.000 5557.263 6773.25
% ------------------------------------------------------------------------------
%
% . margins
%
% Predictive margins Number of obs = 74
% Model VCE : OLS
%
% Expression : Linear prediction, predict()
%
% ------------------------------------------------------------------------------
% | Delta-method
% | Margin Std. Err. t P>|t| [95% Conf. Interval]
% -------------+----------------------------------------------------------------
% _cons | 6165.257 304.9935 20.21 0.000 5557.263 6773.25
% ------------------------------------------------------------------------------
%
% .
```

Take a look at the constant (intercept) for the regression model. That’s the expected (predicted) value of Price (\(y\)) when cen_mpg (\(x\)) is equal to zero, which in our case, is the mean value of cen_mpg.

Now take a look at the output of the `margins`

command, under the column header “Margin”. See, it’s the same value of the constant—the predicted value of \(y\) when \(x\) is at its mean, which in our case, is zero. This is one reason why it is very helpful to center predictors.

### Visualizing interaction effects using `margins`

Now lets return to our interaction model. The preceding example estimated the effect of cen_mpg on Price at the mean value of cen_mpg, but at +/- 1 standard deviation of cen_weight.

When we visualize the interaction effect though, the most common plot is to also dichotomize \(x\), by setting it at +/- 1 standard deviation from its mean (roughly -6 and 6 in our data). The effect is to create a simple slope comparison, and the plot will be easily recognizable to management researchers.

```
quietly reg price c.cen_mpg##c.cen_weight
margins, at(cen_mpg=(-6 6) cen_weight=(-777 777)) vsquish
```

```
%
% . quietly reg price c.cen_mpg##c.cen_weight
%
% . margins, at(cen_mpg=(-6 6) cen_weight=(-777 777)) vsquish
%
% Adjusted predictions Number of obs = 74
% Model VCE : OLS
%
% Expression : Linear prediction, predict()
% 1._at : cen_mpg = -6
% cen_weight = -777
% 2._at : cen_mpg = -6
% cen_weight = 777
% 3._at : cen_mpg = 6
% cen_weight = -777
% 4._at : cen_mpg = 6
% cen_weight = 777
%
% ------------------------------------------------------------------------------
% | Delta-method
% | Margin Std. Err. t P>|t| [95% Conf. Interval]
% -------------+----------------------------------------------------------------
% _at |
% 1 | 4912.115 969.3942 5.07 0.000 2978.719 6845.511
% 2 | 8229.639 439.7096 18.72 0.000 7352.665 9106.612
% 3 | 4515.524 412.4696 10.95 0.000 3692.879 5338.169
% 4 | 4258.607 1465.846 2.91 0.005 1335.07 7182.144
% ------------------------------------------------------------------------------
%
% .
```

Now we use the `marginsplot`

function—which creates a distinctive looking plot that is easy to identify in submitted papers—to graph the interaction effect:

`marginsplot`

With our plot, it’s easy to see that we are talking about predicted values of \(y\)—it says so in the title and on the y-axis :)

### Getting the marginal effects

Ok, so how do we use the `margins`

command to actually get us the marginal effects? We need to use the `dydx`

option:

```
quietly reg price c.cen_mpg##c.cen_weight
margins, dydx(cen_mpg) at(cen_weight=(-777 777)) vsquish
```

```
%
% . quietly reg price c.cen_mpg##c.cen_weight
%
% . margins, dydx(cen_mpg) at(cen_weight=(-777 777)) vsquish
%
% Average marginal effects Number of obs = 74
% Model VCE : OLS
%
% Expression : Linear prediction, predict()
% dy/dx w.r.t. : cen_mpg
% 1._at : cen_weight = -777
% 2._at : cen_weight = 777
%
% ------------------------------------------------------------------------------
% | Delta-method
% | dy/dx Std. Err. t P>|t| [95% Conf. Interval]
% -------------+----------------------------------------------------------------
% cen_mpg |
% _at |
% 1 | -33.04924 82.82389 -0.40 0.691 -198.2363 132.1378
% 2 | -330.9193 133.2173 -2.48 0.015 -596.6129 -65.22565
% ------------------------------------------------------------------------------
%
% .
```

Here we are calculating the discrete change (marginal effect) in cen_mpg at +/- 1 standard deviation of cen_weight. That the two marginal effects are so different supports our earlier finding of a statistically significant interaction effect—the change in \(y\) for a given change in \(x\) varies as a function of \(m\).

Here’s the thing though…in reality, our moderator, cen_weight, is continuous, not dichotomous. In a model with a continuous moderator, the marginal effect of \(x\) on \(y\) is itself *continuously* changing. Now, it’s not really feasible to calculate the marginal effect of \(x\) on \(y\) at **all** possible values of \(m\), but we should establish a reasonable range of values. The choice of range delimiter is purely discretionary, and should be based on the scale of the variable. Given that our cen_weight variable has a standard deviation of 777, I’ll use 50 as the delimiter.

In the code below, I’ll have Stata calculate the marginal effect of cen_mpg on Price at values of cen_weight ranging from -777 to 777, incrementing each time by 50:

```
quietly reg price c.cen_mpg##c.cen_weight
margins, dydx(cen_mpg) at(cen_weight=(-777(50)777)) vsquish
```

```
%
% . quietly reg price c.cen_mpg##c.cen_weight
%
% . margins, dydx(cen_mpg) at(cen_weight=(-777(50)777)) vsquish
%
% Average marginal effects Number of obs = 74
% Model VCE : OLS
%
% Expression : Linear prediction, predict()
% dy/dx w.r.t. : cen_mpg
% 1._at : cen_weight = -777
% 2._at : cen_weight = -727
% 3._at : cen_weight = -677
% 4._at : cen_weight = -627
% 5._at : cen_weight = -577
% 6._at : cen_weight = -527
% 7._at : cen_weight = -477
% 8._at : cen_weight = -427
% 9._at : cen_weight = -377
% 10._at : cen_weight = -327
% 11._at : cen_weight = -277
% 12._at : cen_weight = -227
% 13._at : cen_weight = -177
% 14._at : cen_weight = -127
% 15._at : cen_weight = -77
% 16._at : cen_weight = -27
% 17._at : cen_weight = 23
% 18._at : cen_weight = 73
% 19._at : cen_weight = 123
% 20._at : cen_weight = 173
% 21._at : cen_weight = 223
% 22._at : cen_weight = 273
% 23._at : cen_weight = 323
% 24._at : cen_weight = 373
% 25._at : cen_weight = 423
% 26._at : cen_weight = 473
% 27._at : cen_weight = 523
% 28._at : cen_weight = 573
% 29._at : cen_weight = 623
% 30._at : cen_weight = 673
% 31._at : cen_weight = 723
% 32._at : cen_weight = 773
%
% ------------------------------------------------------------------------------
% | Delta-method
% | dy/dx Std. Err. t P>|t| [95% Conf. Interval]
% -------------+----------------------------------------------------------------
% cen_mpg |
% _at |
% 1 | -33.04924 82.82389 -0.40 0.691 -198.2363 132.1378
% 2 | -42.63322 82.63737 -0.52 0.608 -207.4483 122.1818
% 3 | -52.2172 82.60398 -0.63 0.529 -216.9656 112.5312
% 4 | -61.80117 82.72388 -0.75 0.458 -226.7888 103.1864
% 5 | -71.38515 82.99643 -0.86 0.393 -236.9163 94.14601
% 6 | -80.96913 83.42012 -0.97 0.335 -247.3453 85.40706
% 7 | -90.55311 83.99267 -1.08 0.285 -258.0712 76.96499
% 8 | -100.1371 84.71106 -1.18 0.241 -269.088 68.81379
% 9 | -109.7211 85.57161 -1.28 0.204 -280.3883 60.94613
% 10 | -119.305 86.57009 -1.38 0.173 -291.9636 53.35355
% 11 | -128.889 87.70178 -1.47 0.146 -303.8047 46.02667
% 12 | -138.473 88.9616 -1.56 0.124 -315.9013 38.95533
% 13 | -148.057 90.3442 -1.64 0.106 -328.2428 32.12885
% 14 | -157.6409 91.84402 -1.72 0.091 -340.8181 25.53618
% 15 | -167.2249 93.45543 -1.79 0.078 -353.6159 19.16605
% 16 | -176.8089 95.17275 -1.86 0.067 -366.625 13.00716
% 17 | -186.3929 96.99036 -1.92 0.059 -379.8341 7.048296
% 18 | -195.9769 98.90273 -1.98 0.051 -393.2321 1.278425
% 19 | -205.5608 100.9045 -2.04 0.045 -406.8085 -4.313198
% 20 | -215.1448 102.9904 -2.09 0.040 -420.5527 -9.736965
% 21 | -224.7288 105.1554 -2.14 0.036 -434.4547 -15.00286
% 22 | -234.3128 107.3949 -2.18 0.032 -448.5051 -20.12045
% 23 | -243.8967 109.7041 -2.22 0.029 -462.6947 -25.09879
% 24 | -253.4807 112.0789 -2.26 0.027 -477.0149 -29.9465
% 25 | -263.0647 114.515 -2.30 0.025 -491.4577 -34.67171
% 26 | -272.6487 117.0088 -2.33 0.023 -506.0153 -39.28207
% 27 | -282.2327 119.5565 -2.36 0.021 -520.6805 -43.78477
% 28 | -291.8166 122.1548 -2.39 0.020 -535.4467 -48.18653
% 29 | -301.4006 124.8006 -2.42 0.018 -550.3076 -52.49367
% 30 | -310.9846 127.4909 -2.44 0.017 -565.2571 -56.71207
% 31 | -320.5686 130.2229 -2.46 0.016 -580.2899 -60.84725
% 32 | -330.1525 132.994 -2.48 0.015 -595.4008 -64.90431
% ------------------------------------------------------------------------------
%
% .
```

Ok, that’s not the easiest output to make sense of (although you certainly can!). What’s more helpful is to call `marginsplot`

again and have it make us a plot of the calculated marginal effects:

`marginsplot`

### Making sense of the marginal effects plot

Lets take a look at the two visualizations side by side:

Simple Slopes | Marginal Effects |
---|---|

The y-axis of the marginal effects plot (the one on the right) is the marginal effect of cen_mpg (x) on Price. In other words, it’s the estimate of \(\beta{x}\) at that discrete value of cen_weight. We can actually see that range of values in the output table from the `margins, dydx(cen_mpg) at(cen_weight=(-777(50)777)) vsquish`

command. Each value in the `dy/dx`

column *is* a different estimate for \(\beta{x}\).

So what’s it telling us? Well, recall how we determine whether a simple effect of \(x\) on \(y\) is statistically significant:

\(t=\frac{{\beta}_{x}}{{se}_{{\beta}_{x}}}\)

Where \(t\) is our t-statistic that we consider statistically significant at the p < .05 level if its value is roughly greater than +/- 1.96.

Now go back and look at the marginal effects plot, which includes a 95% confidence interval around the point estimate for \(\beta{x}\). Every estimate where the confidence interval crosses zero we would interpret as not being statistically significant at the p < .05 level.

So what does this mean? It means that there is no material change in the change of the effect of \(x\) on \(y\) at that value of \(m\). In other words, no interaction effect at those values of \(m\).

So in our model, we really don’t see a significant interaction effect until cen_weight (our moderator) takes on values roughly larger than its mean value.

When we look at the simple slopes graph next to it, this interpretation makes sense. Take a look at the confidence intervals for the effect of cen_mpg on Price when cen_mpg is at one standard deviation above the mean (roughly 6). See how the confidence intervals substantially overlap? At higher levels of mpg, there is no discernible difference in the value of Price *regardless* of the differing values of weight. The interaction only matters when mpg is low.

### Making these plots better

So these are the basic plots. There are no “official” standards for plotting and visualizing continuous by continuous interaction effects. I think we’ve largely settled on the +/- 1 standard deviation for the values of \(x\) and \(m\) but I really think it’s worthwhile to expand the range. I’m not going to do that here for simplicity, but it’s worth thinking about. So, how would I graph these for publication? Well, here’s one suggestion and code…

#### Simple effects plot

```
quietly reg price c.cen_mpg##c.cen_weight
quietly margins, at(cen_mpg=(-6 6) cen_weight=(-777 777)) vsquish
marginsplot, xtitle("Miles Per Gallon") ytitle("Predicted Value of Price ($)") ///
legend(order(1 "Lighter Weight(lbs) [-1 sd]" 2 "Heavier Weight(lbs) [+1 sd]")) ///
title("Simple Slope Comparison of Miles per Gallon" "on Car Price among Lighter and Heavier Cars") ///
recastci(rarea) scheme(sj)
```

A big change is recasting the confidence intervals to be ribbons instead of bars—I just think it really makes it easy to see the range of significance of the interaction effect, and even better, just how noisy the interaction effect is (with N = 74, it’s pretty darn noisy!). I’m also partial to the *Stata Journal* theme to create plots, which is what the `scheme(sj)`

code does.

#### Marginal effects plot

For my increment of \(m\), I like to use one/tenth of one standard deviation. So in this model, I would set the increment at 77.

```
quietly reg price c.cen_mpg##c.cen_weight
quietly margins, dydx(cen_mpg) at(cen_weight=(-777(77)777)) vsquish
marginsplot, xtitle("Car Weight(lbs)") ytitle("Estimated Effect of Miles Per Gallon on Car Price") ///
title("Change in Marginal Effect of Miles Per Gallon" "on Car Price Across Different Car Weights") ///
xlabel(, alternate) recastci(rarea) yline(0) scheme(sj)
```

Again I recast the confidence intervals as ribbons—it’s just easier to interpret. I also had Stata include a highlighted line at 0 on the y-axis; it makes it easier to see when the marginal effect contains zero. I also alternated the x-axis tick marks to make them easier to read. Lastly, I think it’s important to be really clear on the axis labels to help people make sense of just what the plot is trying to say.

Now, another way to do it, given that the scale of weight is so large, is to make the cutoff points for both the simple slopes and marginal effects closer to a rounded number, for example +/- 750. What’s important though is to keep the same range of values in both plots.

#### Bottom line

The bottom line is that visualizing and interpreting interaction effects with continuous variables needs to go beyond simple slope plots. It also requires us to take a step back and make sure that we understand just what exactly our statistics software is producing for us. In a time where ‘point-and-click’ statistics makes it easy to generate output for our publications, we would all do well to make sure we know just what those clicks are doing :)

As if I wasn’t already a big enough nerd, I created this post about Stata using R :)↩