Trouble at the margins

· by Brian Anderson · Read in about 17 min · (3520 words) ·

As an editor, I’m seeing more papers that model an interaction effect discussing marginal effects and average marginal effects. This is a great thing, because the simple slope plots that are the de facto standard tell only a limited—and sometimes misleading—story about the underlying nature of an interaction effect.

What sparked this post though is seeing a number of papers discussing marginal effects, and saying that they are plotting marginal effects using Stata’s margins command, but in actuality are plotting simple slopes.1 I think the problem here is that, by default, Stata’s margins command does not output marginal effects, but rather predicted values. In this post, I’ll cover plotting and interpreting simple slopes and marginal effects, which hopefully helps researchers using the margins command.


Basic moderation model

So here we’ve got a classic moderation model:

\(y=\alpha+\beta{x}+\beta{m}+\beta{xm}+\epsilon\)

I’m going to assume that both \(x\) and \(m\) are continuous variables, because that is the model I see most often.

In a moderation model, we are assuming that the effect of \(x\) on \(y\) varies as a function of the level of \(m\). In an interaction model, there is no ‘direct effect’ or ‘main effect’ of \(x\) on \(y\), because the nature of \(x\)’s relationship to \(y\) depends on the value of \(m\). You can’t tell a story about \(x\) on \(y\) without including \(m\)—you would be missing the ‘true’ relationship.


Marginal effects and average marginal effects

In a linear regression model, say \(y=\alpha+\beta{x}+\epsilon\), the marginal effect is the expected change in \(y\) as a function of a one unit change in \(x\). If this definition sounds familiar, it should, because it’s the same explanation for \(\beta\). In fact, we can derive \(\beta\) just in those terms:

\(\beta{x}=\frac{\Delta{y}}{\Delta{x}}\)

In our simple regression model, the marginal effect of \(x\) on \(y\) is constant for every value of \(x\). No matter where we are in the range of \(x\) in our data, \(y\) will always change by the same amount when \(x\) increases by one unit.

In an interaction model, the marginal effect of \(x\) on \(y\) isn’t constant. It can’t be, if the hypothesized model represents the data. An interaction model assumes that there is a different marginal effect of \(x\) on \(y\) at different values of \(m\). We can write the interaction effect this way…

\(Interaction=\frac{\Delta{y}}{\Delta{x}*\Delta{m}}\)

In the case where \(m\) is dichotomous (0 or 1), the above equation is simply the difference in the expected change in \(y\) for a unit change in \(x\) when \(m\) equals 0 (sometimes called the baseline) versus when \(m\) equals 1.

Where it gets complicated is when \(m\) is continuous, and takes on a whole range of values. Here, the estimate of \(\beta{xm}\) becomes the Average Marginal Effect (AME), which is the average expected change in \(y\) for an average one unit change in \(x\) for a given change in \(m\):

\(\beta{xm}=\frac{\Delta{y}}{\Delta{x}*\Delta{m}}\)

Not exactly intuitive.
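One way to make it concrete: in the interaction model, the marginal effect of \(x\) is \(\beta{x}+\beta{xm}\cdot{m}\), so the slope on \(x\) shifts with the level of \(m\). Here is a quick sketch in Python (not Stata) with made-up coefficient values, purely for illustration:

```python
# Interaction model y = a + b_x*x + b_m*m + b_xm*x*m
# All coefficient values below are invented for illustration only.
def predict(x, m, a=1.0, b_x=2.0, b_m=0.5, b_xm=-0.25):
    return a + b_x * x + b_m * m + b_xm * x * m

def marginal_effect_x(m, b_x=2.0, b_xm=-0.25):
    # dy/dx = b_x + b_xm * m: the slope of x depends on the level of m
    return b_x + b_xm * m

# A one unit change in x changes y by exactly the marginal effect at that m:
m = 4.0
delta_y = predict(x=3.0, m=m) - predict(x=2.0, m=m)
print(delta_y, marginal_effect_x(m))  # both 1.0 when m = 4
```

The point is that with an interaction term there is no single effect of \(x\); there is a different slope at every value of \(m\).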


Our interaction model

Let’s start by loading the auto dataset that ships with Stata:

sysuse auto

Now we’re going to estimate the following interaction model:

\(Price=\alpha+\beta{mpg}+\beta{weight}+\beta{mpg*weight}+\epsilon\)

Our argument is that a car’s price (in 1978) is a function of its gas mileage, but that this relationship changes at different levels of the car’s weight.

Centering and standardizing variables in interaction models…

So just a brief note here…

Centering (or standardizing) predictors in an interaction effect to address multicollinearity is an urban legend that just won’t die.

So please, don’t say you are centering/standardizing to deal with multicollinearity. Yes, I am guilty of this in the past myself. Now I know better.

That said, I think centering variables does help to make interpreting the coefficient estimates—and the marginal effects—a lot easier, because we set the mean of \(x\) and \(m\) to be equal to zero. So that’s a good thing, and that’s what I’ll do here. Note that the use of the ‘quietly’ prefix tells Stata to run the command but suppress the output.

quietly sum mpg
gen cen_mpg = mpg - r(mean)
quietly sum weight
gen cen_weight = weight - r(mean)
sum cen_mpg cen_weight
% . quietly sum mpg
% 
% . gen cen_mpg = mpg - r(mean)
% 
% . quietly sum weight
% 
% . gen cen_weight = weight - r(mean)
% 
% . sum cen_mpg cen_weight
% 
%     Variable |        Obs        Mean    Std. Dev.       Min        Max
% -------------+---------------------------------------------------------
%      cen_mpg |         74   -4.03e-08    5.785503  -9.297297    19.7027
%   cen_weight |         74   -.0000111    777.1936  -1259.459   1820.541

Now we specify our model. Note the shortcut of the double hash marks (##) and the c. prefix for our variables. The c. prefix tells Stata that our variables are continuous, and is necessary to make the most use of the margins command. The hash mark shortcut tells Stata to multiply cen_mpg and cen_weight together, and also to include the constituent terms in the model. In our model, we observe a statistically significant estimate for the interaction between mpg and weight:

reg price c.cen_mpg##c.cen_weight
% . reg price c.cen_mpg##c.cen_weight
% 
%       Source |       SS           df       MS      Number of obs   =        74
% -------------+----------------------------------   F(3, 70)        =     13.11
%        Model |   228430465         3  76143488.5   Prob > F        =    0.0000
%     Residual |   406634931        70  5809070.44   R-squared       =    0.3597
% -------------+----------------------------------   Adj R-squared   =    0.3323
%        Total |   635065396        73  8699525.97   Root MSE        =    2410.2
% 
% ------------------------------------------------------------------------------
%        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
% -------------+----------------------------------------------------------------
%      cen_mpg |  -181.9843   96.14216    -1.89   0.063    -373.7337     9.76524
%   cen_weight |   .9847513   .6768468     1.45   0.150    -.3651771     2.33468
%              |
%    c.cen_mpg#|
% c.cen_weight |  -.1916796   .0711936    -2.69   0.009    -.3336706   -.0496885
%              |
%        _cons |   5478.971   378.7809    14.46   0.000     4723.517    6234.426
% ------------------------------------------------------------------------------

Using the margins command

So given that the interpretation of \(\beta{xm}\) doesn’t really tell us much about the nature of the interaction effect, it’s common to set \(m\) at some arbitrary value, often +/- 1 standard deviation, and then calculate the effect of \(x\) on \(y\) at the two levels of \(m\). Stata’s margins command makes this really easy to do…

margins, at(cen_weight=(-777 777)) vsquish
% . margins, at(cen_weight=(-777 777)) vsquish
% 
% Predictive margins                              Number of obs     =         74
% Model VCE    : OLS
% 
% Expression   : Linear prediction, predict()
% 1._at        : cen_weight      =        -777
% 2._at        : cen_weight      =         777
% 
% ------------------------------------------------------------------------------
%              |            Delta-method
%              |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
% -------------+----------------------------------------------------------------
%          _at |
%           1  |   4713.819   554.9554     8.49   0.000     3606.996    5820.643
%           2  |   6244.123   729.4772     8.56   0.000     4789.226    7699.019
% ------------------------------------------------------------------------------

By default, the margins command outputs the predicted values of \(y\) at the mean value of \(x\) (in our model, cen_mpg), at whatever value of the moderator (cen_weight) we choose. Here, I specified that I wanted the predicted value of Price when cen_mpg is at its mean value (in our model, that is when cen_mpg is effectively zero), and when cen_weight is at +/- 1 standard deviation from its mean (which we can get from the sum output above, and is roughly 777). (Note that the vsquish option just removes blank lines from the output.)

Yes, it says “margins”, but the default output isn’t marginal effects, it is predicted values of \(y\).
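We can see this by reproducing the two margins by hand. The sketch below is Python rather than Stata, plugging the coefficient estimates reported in the regression output above into the fitted equation:

```python
# Coefficient estimates copied from the regression output above
a, b_mpg, b_wt, b_int = 5478.971, -181.9843, 0.9847513, -0.1916796

def predicted_price(cen_mpg, cen_weight):
    # The fitted interaction equation
    return a + b_mpg * cen_mpg + b_wt * cen_weight + b_int * cen_mpg * cen_weight

# cen_mpg is centered, so its mean is (effectively) zero
print(predicted_price(0, -777))  # ~4713.82, matching 1._at
print(predicted_price(0, 777))   # ~6244.12, matching 2._at
```

These are predicted values of Price, not marginal effects of mpg.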

To make it a little easier to see, I’ll just estimate a simple linear model regressing price on cen_mpg, and then calling the margins command:

reg price cen_mpg
margins
% . reg price cen_mpg
% 
%       Source |       SS           df       MS      Number of obs   =        74
% -------------+----------------------------------   F(1, 72)        =     20.26
%        Model |   139449476         1   139449476   Prob > F        =    0.0000
%     Residual |   495615920        72  6883554.45   R-squared       =    0.2196
% -------------+----------------------------------   Adj R-squared   =    0.2087
%        Total |   635065396        73  8699525.97   Root MSE        =    2623.7
% 
% ------------------------------------------------------------------------------
%        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
% -------------+----------------------------------------------------------------
%      cen_mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
%        _cons |   6165.257   304.9935    20.21   0.000     5557.263     6773.25
% ------------------------------------------------------------------------------
% 
% . margins
% 
% Predictive margins                              Number of obs     =         74
% Model VCE    : OLS
% 
% Expression   : Linear prediction, predict()
% 
% ------------------------------------------------------------------------------
%              |            Delta-method
%              |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
% -------------+----------------------------------------------------------------
%        _cons |   6165.257   304.9935    20.21   0.000     5557.263     6773.25
% ------------------------------------------------------------------------------

Take a look at the constant (intercept) for the regression model. That’s the expected (predicted) value of Price (\(y\)) when cen_mpg (\(x\)) is equal to zero, which in our case, is the mean value of cen_mpg.

Now take a look at the output of the margins command, under the column header “Margin”. See, it’s the same value as the constant—the predicted value of \(y\) when \(x\) is at its mean, which in our case, is zero. This is one reason why it is very helpful to center predictors.


Visualizing interaction effects using margins

Now let’s return to our interaction model. The preceding margins example gave us the predicted value of Price at the mean value of cen_mpg, but at +/- 1 standard deviation of cen_weight.

When we visualize the interaction effect though, the most common plot is to also dichotomize \(x\), by setting it at +/- 1 standard deviation from its mean (roughly -6 and 6 in our data). The effect is to create a simple slope comparison, and the plot will be easily recognizable to management researchers.

quietly reg price c.cen_mpg##c.cen_weight
margins, at(cen_mpg=(-6 6) cen_weight=(-777 777)) vsquish
% . quietly reg price c.cen_mpg##c.cen_weight
% 
% . margins, at(cen_mpg=(-6 6) cen_weight=(-777 777)) vsquish
% 
% Adjusted predictions                            Number of obs     =         74
% Model VCE    : OLS
% 
% Expression   : Linear prediction, predict()
% 1._at        : cen_mpg         =          -6
%                cen_weight      =        -777
% 2._at        : cen_mpg         =          -6
%                cen_weight      =         777
% 3._at        : cen_mpg         =           6
%                cen_weight      =        -777
% 4._at        : cen_mpg         =           6
%                cen_weight      =         777
% 
% ------------------------------------------------------------------------------
%              |            Delta-method
%              |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
% -------------+----------------------------------------------------------------
%          _at |
%           1  |   4912.115   969.3942     5.07   0.000     2978.719    6845.511
%           2  |   8229.639   439.7096    18.72   0.000     7352.665    9106.612
%           3  |   4515.524   412.4696    10.95   0.000     3692.879    5338.169
%           4  |   4258.607   1465.846     2.91   0.005      1335.07    7182.144
% ------------------------------------------------------------------------------

Now we use the marginsplot command—which creates a distinctive looking plot that is easy to identify in submitted papers—to graph the interaction effect:

marginsplot


With our plot, it’s easy to see that we are talking about predicted values of \(y\)—it says so in the title and on the y-axis :)
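In fact, the four plotted points are nothing more than the fitted equation evaluated at the four combinations of cen_mpg and cen_weight. A quick check in Python (not Stata), using the coefficient estimates from the regression output above:

```python
# Coefficient estimates copied from the regression output above
a, b_mpg, b_wt, b_int = 5478.971, -181.9843, 0.9847513, -0.1916796

def predicted_price(cen_mpg, cen_weight):
    # The fitted interaction equation
    return a + b_mpg * cen_mpg + b_wt * cen_weight + b_int * cen_mpg * cen_weight

# The four _at combinations from the margins output
for mpg in (-6, 6):
    for wt in (-777, 777):
        print(mpg, wt, round(predicted_price(mpg, wt), 3))
```

Each printed value matches the corresponding row in the margins output (4912.115, 8229.639, 4515.524, 4258.607).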


Getting the marginal effects

Ok, so how do we use the margins command to actually get us the marginal effects? We need to use the dydx option:

quietly reg price c.cen_mpg##c.cen_weight
margins, dydx(cen_mpg) at(cen_weight=(-777 777)) vsquish
% . quietly reg price c.cen_mpg##c.cen_weight
% 
% . margins, dydx(cen_mpg) at(cen_weight=(-777 777)) vsquish
% 
% Average marginal effects                        Number of obs     =         74
% Model VCE    : OLS
% 
% Expression   : Linear prediction, predict()
% dy/dx w.r.t. : cen_mpg
% 1._at        : cen_weight      =        -777
% 2._at        : cen_weight      =         777
% 
% ------------------------------------------------------------------------------
%              |            Delta-method
%              |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
% -------------+----------------------------------------------------------------
% cen_mpg      |
%          _at |
%           1  |  -33.04924   82.82389    -0.40   0.691    -198.2363    132.1378
%           2  |  -330.9193   133.2173    -2.48   0.015    -596.6129   -65.22565
% ------------------------------------------------------------------------------

Here we are calculating the marginal effect of cen_mpg on Price at +/- 1 standard deviation of cen_weight. That the two marginal effects are so different supports our earlier finding of a statistically significant interaction effect—the change in \(y\) for a given change in \(x\) varies as a function of \(m\).
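These two dy/dx values also fall straight out of the fitted equation: the marginal effect of cen_mpg is the coefficient on cen_mpg plus the interaction coefficient times cen_weight. A quick check in Python (not Stata), using the estimates from the regression output above:

```python
# Coefficient estimates copied from the regression output above
b_mpg, b_int = -181.9843, -0.1916796

def me_mpg(cen_weight):
    # Marginal effect of cen_mpg on price: d(price)/d(cen_mpg) = b_mpg + b_int * cen_weight
    return b_mpg + b_int * cen_weight

print(me_mpg(-777))  # ~-33.05, matching 1._at
print(me_mpg(777))   # ~-330.92, matching 2._at
```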

Here’s the thing though…in reality, our moderator, cen_weight, is continuous, not dichotomous. In a model with a continuous moderator, the marginal effect of \(x\) on \(y\) is itself continuously changing. Now, it’s not really feasible to calculate the marginal effect of \(x\) on \(y\) at all possible values of \(m\), but we should establish a reasonable range of values. The choice of increment is purely discretionary, and should be based on the scale of the variable. Given that our cen_weight variable has a standard deviation of 777, I’ll use 50 as the increment.

In the code below, I’ll have Stata calculate the marginal effect of cen_mpg on Price at values of cen_weight ranging from -777 to 777, incrementing each time by 50:

quietly reg price c.cen_mpg##c.cen_weight
margins, dydx(cen_mpg) at(cen_weight=(-777(50)777)) vsquish
% . quietly reg price c.cen_mpg##c.cen_weight
% 
% . margins, dydx(cen_mpg) at(cen_weight=(-777(50)777)) vsquish
% 
% Average marginal effects                        Number of obs     =         74
% Model VCE    : OLS
% 
% Expression   : Linear prediction, predict()
% dy/dx w.r.t. : cen_mpg
% 1._at        : cen_weight      =        -777
% 2._at        : cen_weight      =        -727
% 3._at        : cen_weight      =        -677
% 4._at        : cen_weight      =        -627
% 5._at        : cen_weight      =        -577
% 6._at        : cen_weight      =        -527
% 7._at        : cen_weight      =        -477
% 8._at        : cen_weight      =        -427
% 9._at        : cen_weight      =        -377
% 10._at       : cen_weight      =        -327
% 11._at       : cen_weight      =        -277
% 12._at       : cen_weight      =        -227
% 13._at       : cen_weight      =        -177
% 14._at       : cen_weight      =        -127
% 15._at       : cen_weight      =         -77
% 16._at       : cen_weight      =         -27
% 17._at       : cen_weight      =          23
% 18._at       : cen_weight      =          73
% 19._at       : cen_weight      =         123
% 20._at       : cen_weight      =         173
% 21._at       : cen_weight      =         223
% 22._at       : cen_weight      =         273
% 23._at       : cen_weight      =         323
% 24._at       : cen_weight      =         373
% 25._at       : cen_weight      =         423
% 26._at       : cen_weight      =         473
% 27._at       : cen_weight      =         523
% 28._at       : cen_weight      =         573
% 29._at       : cen_weight      =         623
% 30._at       : cen_weight      =         673
% 31._at       : cen_weight      =         723
% 32._at       : cen_weight      =         773
% 
% ------------------------------------------------------------------------------
%              |            Delta-method
%              |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
% -------------+----------------------------------------------------------------
% cen_mpg      |
%          _at |
%           1  |  -33.04924   82.82389    -0.40   0.691    -198.2363    132.1378
%           2  |  -42.63322   82.63737    -0.52   0.608    -207.4483    122.1818
%           3  |   -52.2172   82.60398    -0.63   0.529    -216.9656    112.5312
%           4  |  -61.80117   82.72388    -0.75   0.458    -226.7888    103.1864
%           5  |  -71.38515   82.99643    -0.86   0.393    -236.9163    94.14601
%           6  |  -80.96913   83.42012    -0.97   0.335    -247.3453    85.40706
%           7  |  -90.55311   83.99267    -1.08   0.285    -258.0712    76.96499
%           8  |  -100.1371   84.71106    -1.18   0.241     -269.088    68.81379
%           9  |  -109.7211   85.57161    -1.28   0.204    -280.3883    60.94613
%          10  |   -119.305   86.57009    -1.38   0.173    -291.9636    53.35355
%          11  |   -128.889   87.70178    -1.47   0.146    -303.8047    46.02667
%          12  |   -138.473    88.9616    -1.56   0.124    -315.9013    38.95533
%          13  |   -148.057    90.3442    -1.64   0.106    -328.2428    32.12885
%          14  |  -157.6409   91.84402    -1.72   0.091    -340.8181    25.53618
%          15  |  -167.2249   93.45543    -1.79   0.078    -353.6159    19.16605
%          16  |  -176.8089   95.17275    -1.86   0.067     -366.625    13.00716
%          17  |  -186.3929   96.99036    -1.92   0.059    -379.8341    7.048296
%          18  |  -195.9769   98.90273    -1.98   0.051    -393.2321    1.278425
%          19  |  -205.5608   100.9045    -2.04   0.045    -406.8085   -4.313198
%          20  |  -215.1448   102.9904    -2.09   0.040    -420.5527   -9.736965
%          21  |  -224.7288   105.1554    -2.14   0.036    -434.4547   -15.00286
%          22  |  -234.3128   107.3949    -2.18   0.032    -448.5051   -20.12045
%          23  |  -243.8967   109.7041    -2.22   0.029    -462.6947   -25.09879
%          24  |  -253.4807   112.0789    -2.26   0.027    -477.0149    -29.9465
%          25  |  -263.0647    114.515    -2.30   0.025    -491.4577   -34.67171
%          26  |  -272.6487   117.0088    -2.33   0.023    -506.0153   -39.28207
%          27  |  -282.2327   119.5565    -2.36   0.021    -520.6805   -43.78477
%          28  |  -291.8166   122.1548    -2.39   0.020    -535.4467   -48.18653
%          29  |  -301.4006   124.8006    -2.42   0.018    -550.3076   -52.49367
%          30  |  -310.9846   127.4909    -2.44   0.017    -565.2571   -56.71207
%          31  |  -320.5686   130.2229    -2.46   0.016    -580.2899   -60.84725
%          32  |  -330.1525    132.994    -2.48   0.015    -595.4008   -64.90431
% ------------------------------------------------------------------------------

Ok, that’s not the easiest output to make sense of (although you certainly can!). What’s more helpful is to call marginsplot again and have it make us a plot of the calculated marginal effects:

marginsplot



Making sense of the marginal effects plot

Let’s take a look at the two visualizations side by side: the simple slopes plot on the left, and the marginal effects plot on the right.


The y-axis of the marginal effects plot (the one on the right) is the marginal effect of cen_mpg (x) on Price. In other words, it’s the estimate of \(\beta{x}\) at that discrete value of cen_weight. We can actually see that range of values in the output table from the margins, dydx(cen_mpg) at(cen_weight=(-777(50)777)) vsquish command. Each value in the dy/dx column is a different estimate for \(\beta{x}\).

So what’s it telling us? Well, recall how we determine whether a simple effect of \(x\) on \(y\) is statistically significant:

\(t=\frac{{\beta}_{x}}{{se}_{{\beta}_{x}}}\)

Where \(t\) is our t-statistic that we consider statistically significant at the p < .05 level if its value is roughly greater than +/- 1.96.
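We can check this formula against any row of the margins, dydx() output. Taking the first row (cen_weight = -777), a one-line Python check using the reported point estimate and standard error:

```python
# dy/dx and Delta-method standard error copied from 1._at in the margins output
dydx, se = -33.04924, 82.82389
t = dydx / se
print(round(t, 2))  # matches the t column for that row; |t| < 1.96, so not significant
```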

Now go back and look at the marginal effects plot, which includes a 95% confidence interval around the point estimate for \(\beta{x}\). Every estimate where the confidence interval crosses zero we would interpret as not being statistically significant at the p < .05 level.

So what does this mean? It means that at those values of \(m\), the marginal effect of \(x\) on \(y\) is not statistically distinguishable from zero. In other words, we cannot say that \(x\) has any effect on \(y\) at those values of \(m\).

So in our model, we really don’t see a statistically significant effect of mpg on price until cen_weight (our moderator) takes on values roughly larger than its mean value.

When we look at the simple slopes graph next to it, this interpretation makes sense. Take a look at the confidence intervals for the effect of cen_mpg on Price when cen_mpg is at one standard deviation above the mean (roughly 6). See how the confidence intervals substantially overlap? At higher levels of mpg, there is no discernible difference in the value of Price regardless of the differing values of weight. The interaction only matters when mpg is low.


Making these plots better

So these are the basic plots. There are no “official” standards for plotting and visualizing continuous by continuous interaction effects. I think we’ve largely settled on the +/- 1 standard deviation for the values of \(x\) and \(m\) but I really think it’s worthwhile to expand the range. I’m not going to do that here for simplicity, but it’s worth thinking about. So, how would I graph these for publication? Well, here’s one suggestion and code…


Simple effects plot

quietly reg price c.cen_mpg##c.cen_weight
quietly margins, at(cen_mpg=(-6 6) cen_weight=(-777 777)) vsquish

marginsplot, xtitle("Miles Per Gallon") ytitle("Predicted Value of Price ($)") ///
  legend(order(1 "Lighter Weight(lbs) [-1 sd]" 2 "Heavier Weight(lbs) [+1 sd]")) ///
  title("Simple Slope Comparison of Miles per Gallon" "on Car Price among Lighter and Heavier Cars") ///
  recastci(rarea) scheme(sj)


A big change is recasting the confidence intervals to be ribbons instead of bars—I just think it really makes it easy to see the range of significance of the interaction effect, and even better, just how noisy the interaction effect is (with N = 74, it’s pretty darn noisy!). I’m also partial to the Stata Journal theme to create plots, which is what the scheme(sj) code does.


Marginal effects plot

For my increment of \(m\), I like to use one-tenth of one standard deviation. So in this model, I would set the increment at 77.

quietly reg price c.cen_mpg##c.cen_weight
quietly margins, dydx(cen_mpg) at(cen_weight=(-777(77)777)) vsquish

marginsplot, xtitle("Car Weight(lbs)") ytitle("Estimated Effect of Miles Per Gallon on Car Price") ///
  title("Change in Marginal Effect of Miles Per Gallon" "on Car Price Across Different Car Weights") ///
  xlabel(, alternate) recastci(rarea) yline(0) scheme(sj)


Again I recast the confidence intervals as ribbons—it’s just easier to interpret. I also had Stata include a highlighted line at 0 on the y-axis; it makes it easier to see when the marginal effect contains zero. I also alternated the x-axis tick marks to make them easier to read. Lastly, I think it’s important to be really clear on the axis labels to help people make sense of just what the plot is trying to say.

Now, another way to do it, given that the scale of weight is so large, is to make the cutoff points for both the simple slopes and marginal effects closer to a rounded number, for example +/- 750. What’s important though is to keep the same range of values in both plots.


Bottom line

The bottom line is that visualizing and interpreting interaction effects with continuous variables needs to go beyond simple slope plots. It also requires us to take a step back and make sure that we understand just what exactly our statistics software is producing for us. In a time where ‘point-and-click’ statistics makes it easy to generate output for our publications, we would all do well to make sure we know just what those clicks are doing :)


  1. As if I wasn’t already a big enough nerd, I created this post about Stata using R :)