Multilevel Modeling

· by Brian Anderson · Read in about 30 min · (6211 words) ·

This is admittedly a long post.

I’m going to look at multilevel modeling from two perspectives, economics and psychology. Both perspectives rely on the same underlying mathematics to model multilevel data. To an extent, both interpret the results in a similar way, but they approach multilevel modeling from different traditions, and even different notation. The purpose of the post is to help entrepreneurship scholars reconcile the similarities—and differences—across multilevel models estimated from an economics perspective and a psychology perspective.

My starting research question (because we always need to start with a question) is…

Does being in a wealthier community improve the survivability of new businesses?


Data

I’m going to use two data sources: the Kauffman Foundation’s Main Street Entrepreneurship data for new business survival rates, and metropolitan-area per capita GDP data from the Bureau of Economic Analysis (BEA)…

I’ve already worked through how to tidy and join both datasets, so I’m just going to use a quick script…

# Load two datasets
library(tidyverse)
kauffman.ds <- read_csv("http://www.drbanderson.com/data/MainStreetEntrepreneurship.csv")
pcgdp.ds <- read_csv("http://www.drbanderson.com/data/gmpPCGDP.csv")

# Filter and Tidy the Kauffman Data we need
survive.df <- kauffman.ds %>% 
  select(Location, 'Index Year', Location_name, Survival_Rate) %>% 
  rename(Year = 'Index Year') %>% 
  filter(!is.na(Survival_Rate)) %>% 
  arrange(Location_name, Year)

# Tidy the GDP data from BEA
pcgdp.df <- pcgdp.ds %>% 
  filter(!is.na(GeoName)) %>% 
  select(GeoFIPS, GeoName, `2001`:`2016`) %>% 
  gather(Year, GDP_PC, `2001`:`2016`) %>% 
  filter(GeoFIPS != "00998") %>% 
  mutate(GeoName = str_remove(GeoName, ",.*")) %>% 
  rename(Location = GeoFIPS,
         Location_name = GeoName) %>% 
  mutate(Location = as.integer(Location),
         Year = as.integer(Year)) %>% 
  filter(Year >= 2001) %>% 
  filter(Location > 56) %>% 
  arrange(Location_name, Year)

# Join the data
survival.df <- inner_join(survive.df, pcgdp.df, by = c("Location", "Year"))

# Tidy the joined data
survival.df <- survival.df %>% 
  select(-Location_name.y) %>% 
  rename(MSA = Location_name.x) %>% 
  mutate(MSA = as.factor(MSA)) %>% 
  mutate(logGDP_PC = log(GDP_PC))

Our outcome will be the five-year rate of new business survival, Survival_Rate, and our predictor will be the log-transformed value of GDP per capita, logGDP_PC.
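
Before modeling, a quick sanity check on the joined panel is useful (a sketch; the plm output later confirms the same structure: 39 MSAs observed over 16 years, for 624 MSA-year observations)…

# Quick look at the panel structure: how many MSAs we have, and how
#  many years we observe for each one
survival.df %>% 
  count(MSA) %>% 
  summarise(MSAs = n(), years_per_MSA = mean(n))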


Economics Perspective

We start with some sort of nested or hierarchical data, where each observation occurs within a higher-order entity. Most often, when you are working in the economics perspective, you are modeling observations of an entity over multiple time points, which is called panel data.

We’re really interested in two possible research questions…

  1. Is there a meaningful within entity effect of \(x\) on \(y\)?
  2. Is there a meaningful between entity effect of \(x\) on \(y\)?

Model Specification

Let’s take a look at some assumptions about multilevel models from the economics perspective…

\[y_{it}=\alpha+\beta{x_{it}}+\mu_{i}+\epsilon_{it}\]

  • \(y\) = Value of the \(t^{th}\) observation of \(y\) for the \(i^{th}\) entity
  • \(\alpha\) = The value of \(y\) when \(x\) equals zero across all \(i, t's\)
  • \(\beta\) = The expected change in \(y\) for an average \(i\) across time \(t\). Note that this interpretation gets more complicated, quickly.
  • \(x\) = Value of the \(t^{th}\) observation of \(x\) for the \(i^{th}\) entity
  • \(\mu\) = The portion of the disturbance term unique to \(i\) and that is constant over time \(t\).
  • \(\epsilon\) = The variance in the \(t^{th}\) observation of \(y\) for the \(i^{th}\) entity that is not explained by the variance in the \(t^{th}\) observation of \(x\) for the \(i^{th}\) entity or \(\mu_{i}\).

With panel data, we are looking at two different effects of \(x\) on \(y\).

  • The within effect, which is the average effect of \(x\) on \(y\) for a given (average) \(i\) across time \(t\)
  • The between effect, which is the average effect of \(x\) on \(y\) across \(i's\) over time \(t\)

We can decompose \(x_{it}\) into the following:

\[x_{it}=\gamma_i+\tau_{it}\]

For each \(x_{it}\), there is going to be a between-component, \(\gamma\), that never changes over time for each \(i\) in the sample (the firm, for example). But there is also going to be a within-component, \(\tau\), that can change for each \(i\) over time (the firm’s sales, for example).
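
To make the decomposition concrete, here is a minimal sketch using our survival.df data. The names gamma and tau are just for illustration; they are not part of any package…

# Decompose logGDP_PC into a time-invariant between-component (the MSA
#  mean) and a time-varying within-component (the yearly deviation from
#  that mean). The two pieces sum back to logGDP_PC.
decomposition.df <- survival.df %>% 
  group_by(Location) %>% 
  mutate(gamma = mean(logGDP_PC),
         tau = logGDP_PC - gamma) %>% 
  ungroup()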

Pooled Model

OLS makes the assumption that each observation is i.i.d. to estimate a consistent parameter—\(\beta\)—for the change in \(y\) expected for each one unit change in \(x\).

With panel data, we violate that assumption. Given that each \(t\) occurs as a function of the \(i\) it nests under, we assume that there is some type of clustering of each \(t\) around its respective \(i\). If this is true, our OLS standard errors will be wrong, and if the \(i\) effects correlate with our predictors, the OLS estimator will be inconsistent as well.

With a pooled model, we are estimating a simple linear model that ignores the multilevel nature of the data…

\[y = \alpha + \beta{x} + \epsilon\]

summary(lm(Survival_Rate ~ logGDP_PC, data = survival.df))
## 
## Call:
## lm(formula = Survival_Rate ~ logGDP_PC, data = survival.df)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.120740 -0.022482 -0.000354  0.023188  0.112334 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.070933   0.073613  -0.964    0.336    
## logGDP_PC    0.048725   0.006761   7.206 1.67e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03516 on 622 degrees of freedom
## Multiple R-squared:  0.07706,    Adjusted R-squared:  0.07557 
## F-statistic: 51.93 on 1 and 622 DF,  p-value: 1.671e-12

And a plot…

ggplot(survival.df, aes(y = Survival_Rate, x = logGDP_PC)) + 
  geom_point() + 
  geom_smooth(method = "lm")

Random Effects Model

It is usually not the case, though, that \(\beta_{it}=\beta\) for all \(i,t\) in multilevel data. So what do we do?

Well, we need to add a term to our equation:

\[y_{it}=\alpha+\beta{x_{it}}+\mu_{i}+\epsilon_{it}\]

The \(\mu_{i}\) term is another predictor of \(y_{it}\). In this case, it is the unobserved portion of the combined disturbance term that represents the between-\(i\) differences for each \(t\). The kicker is that the model assumes that \(\mu_{i}\) has a mean of zero and constant variance.

This means that each MSA’s intercept may vary, but it does so randomly around a mean of zero—the differences between \(i's\) effectively wash each other out.

You can also think about it as: yes, each \(i\) has an effect on its \(t's\), so we need to account for it in our estimate of \(\beta_{x}\), but we really don’t care what the specific \(i\) effect is (just the average effect) because the differences occur at random.

In the random effects model, you are explicitly making the assumption that \(\mu_{i}\) is NOT correlated with the focal predictor(s) in the model—any \(i\) effect does not correlate with \(x_{it}\).

We can fit a random effect model using the plm package…

library(plm)

random.model <- plm(Survival_Rate ~ logGDP_PC, data = survival.df, 
                    index = c("Location", "Year"), model = "random")

summary(random.model)
## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = Survival_Rate ~ logGDP_PC, data = survival.df, 
##     model = "random", index = c("Location", "Year"))
## 
## Balanced Panel: n = 39, T = 16, N = 624
## 
## Effects:
##                     var   std.dev share
## idiosyncratic 0.0006902 0.0262715 0.563
## individual    0.0005363 0.0231591 0.437
## theta: 0.7272
## 
## Residuals:
##        Min.     1st Qu.      Median     3rd Qu.        Max. 
## -0.08916311 -0.01820744  0.00081681  0.01784548  0.10210116 
## 
## Coefficients:
##              Estimate Std. Error t-value  Pr(>|t|)    
## (Intercept) -0.603711   0.144343 -4.1825 3.298e-05 ***
## logGDP_PC    0.097670   0.013256  7.3681 5.527e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    0.47834
## Residual Sum of Squares: 0.43994
## R-Squared:      0.080275
## Adj. R-Squared: 0.078797
## F-statistic: 54.2895 on 1 and 622 DF, p-value: 5.5272e-13

The idiosyncratic effect in the output is the variance of \(\epsilon_{it}\). This is analogous to the disturbance term in OLS regression, and it accounts for about 56% of the variance in the combined (\(\mu\) and \(\epsilon\)) disturbance term.

The individual effect is the variance of \(\mu_{i}\). This is the estimated variance of the unobserved \(i\) effects. It accounts for about 44% of the variance in the combined (\(\mu\) and \(\epsilon\)) disturbance term.

One of the challenges though with interpreting the effect of \(x\) on \(y\) in a random effect model is that the estimate of \(\beta_{x}\) contains variance between \(i's\) and within \(i's\).

Think about it this way…Is the effect of wealth on survivability because of differences between MSAs, or because of differences within an MSA over time? In a random effect model, you are saying both differences matter, but all of the differences are included in the estimate of \(\beta\).
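
If you want to see the between effect in isolation, plm can estimate it directly with the between estimator, which regresses the MSA means of \(y\) on the MSA means of \(x\). A quick sketch (I won’t reproduce the output here)…

# The between estimator collapses the panel to one observation (the
#  time-average) per MSA and runs OLS on those means
between.model <- plm(Survival_Rate ~ logGDP_PC, data = survival.df, 
                     index = c("Location", "Year"), model = "between")
summary(between.model)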

Fixed Effect Model

It seems clear, though, that there are meaningful between and within differences. So one question is whether it is reasonable to assume that the differences between MSAs occur at random. If that assumption doesn’t hold, then our modeling options are limited to the within \(i\) effect only.

We do this with a fixed effect model at the MSA level. We need to stop \(\mu_{i}\) from being random, and hold its effect as fixed.

Let’s specify our model with an MSA fixed effect.

fixed.model <- plm(Survival_Rate ~ logGDP_PC, data = survival.df, 
                   index = c("Location","Year"), model = "within")

summary(fixed.model)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = Survival_Rate ~ logGDP_PC, data = survival.df, 
##     model = "within", index = c("Location", "Year"))
## 
## Balanced Panel: n = 39, T = 16, N = 624
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -0.0860454 -0.0165294  0.0033831  0.0171706  0.0922930 
## 
## Coefficients:
##           Estimate Std. Error t-value Pr(>|t|)    
## logGDP_PC 0.146612   0.017815  8.2295 1.23e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    0.44981
## Residual Sum of Squares: 0.40307
## R-Squared:      0.10392
## Adj. R-Squared: 0.044074
## F-statistic: 67.7241 on 1 and 584 DF, p-value: 1.2301e-15
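
As an aside, the within estimator is numerically identical to OLS with a dummy variable for each MSA—the least squares dummy variable (LSDV) estimator. A quick sketch: the logGDP_PC coefficient below should match the plm estimate of 0.146612, though the full lm() output would include 38 MSA dummies…

# LSDV: estimate the MSA fixed effects explicitly as dummy variables
lsdv.model <- lm(Survival_Rate ~ logGDP_PC + factor(Location), 
                 data = survival.df)
coef(lsdv.model)["logGDP_PC"]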

Theoretical considerations aside, a random effect model is more efficient, because it uses within and between variance. So ideally, we would like to use that one.

But if the \(i\) effects (\(\mu_{i}\)) correlate with \(x_{it}\), or if there is other unobserved heterogeneity, then wanting to use a random effect model is immaterial—the estimate of \(\beta\) will be inconsistent.

So we use a workhorse of economics, the Hausman test, to determine whether there is a systematic difference in \(\beta\) as a function of the unobserved variance in the \(i\) effect.

phtest(fixed.model, random.model)
## 
##  Hausman Test
## 
## data:  Survival_Rate ~ logGDP_PC
## chisq = 16.907, df = 1, p-value = 3.926e-05
## alternative hypothesis: one model is inconsistent

Our null is that there is no systematic difference between the two estimates, and we start from the assumption that our fixed effect model is consistent. We clearly reject that null, so it looks like we need to use a fixed effect model, which isn’t all that surprising given our plot earlier.

Role of Time

But what about time? Our data spans the 2008-2010 recession, so it seems reasonable that there may be an effect of time on business survival rates. We can test this assumption using the Breusch-Pagan Lagrange Multiplier Test, under the null hypothesis that \(\beta_{it}\) = \(\beta\) for all \(i,t\).

plmtest(fixed.model, c("time"), type=("bp"))
## 
##  Lagrange Multiplier Test - time effects (Breusch-Pagan) for
##  balanced panels
## 
## data:  Survival_Rate ~ logGDP_PC
## chisq = 933.68, df = 1, p-value < 2.2e-16
## alternative hypothesis: significant effects

Not surprisingly we reject the null, and should include a Year fixed effect in our model as well.

fixed.year.model <- plm(Survival_Rate ~ logGDP_PC, data = survival.df, 
                        index = c("Location","Year"), 
                        model = "within", effect = c("twoways"))

summary(fixed.year.model)
## Twoways effects Within Model
## 
## Call:
## plm(formula = Survival_Rate ~ logGDP_PC, data = survival.df, 
##     effect = c("twoways"), model = "within", index = c("Location", 
##         "Year"))
## 
## Balanced Panel: n = 39, T = 16, N = 624
## 
## Residuals:
##        Min.     1st Qu.      Median     3rd Qu.        Max. 
## -5.9978e-02 -9.7363e-03 -8.0923e-05  9.0096e-03  7.8115e-02 
## 
## Coefficients:
##           Estimate Std. Error t-value  Pr(>|t|)    
## logGDP_PC 0.137117   0.014846  9.2356 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    0.20998
## Residual Sum of Squares: 0.18261
## R-Squared:      0.13036
## Adj. R-Squared: 0.047833
## F-statistic: 85.2971 on 1 and 569 DF, p-value: < 2.22e-16

So that’s substantially different from our pooled model, and seems to suggest a pretty strong positive relationship between the wealth of an MSA and startup survivability. Subject, of course, to the limitations of our data.




Psychology Perspective

When I say psychology, I also mean most other social science fields (sociology, political science, education, and others) that approach multilevel modeling from this perspective. In the psychology perspective, you typically model one entity nested within an aggregate entity. For example, students nested within classrooms; employees nested within a manager; homes nested within counties or MSAs; and so forth.

We start with the same nested or hierarchical data structure, but we’re really interested in three possible research questions…

  1. Is there a meaningful lower level effect of \(x\) on \(y\)?
  2. Is there a meaningful higher level effect of \(X_j\) on \(y\)?
  3. Is there a meaningful cross-level effect of \(X_j*x\) on \(y\)?

Terms

We need to talk about some key terms, and how they relate to the economics perspective…

  • A fixed effect here is an estimate that is constant across higher level entities

  • A random effect is an estimate that varies across higher level entities

Model Specification

In what can be very frustrating for folks new to multilevel modeling, the economics and psychology perspectives use different notation. I’m going to use the classic notation for multilevel models.

We can split our model into a series of regression equations for Level 1 (the lower level) and for Level 2 (the higher level).

Level 1 \[y_{ij} = \beta_{0j} + \beta_{1j}x_{ij} + \epsilon_{ij}\]

  • \(y\) = Value of the \(i^{th}\) observation of \(y\) for the \(j^{th}\) entity
  • \(\beta_{0j}\) = The value of \(y\) (intercept) for the \(j^{th}\) entity
  • \(\beta_{1j}\) = The expected change in \(y\) (slope) for a one unit change in \(x\) for the \(j^{th}\) entity
  • \(x\) = Value of the \(i^{th}\) observation of \(x\) for the \(j^{th}\) entity
  • \(\epsilon_{ij}\) = The variance in the \(i^{th}\) observation of \(y\) for the \(j^{th}\) entity that is not explained by the Level 1 equation (sometimes called \(r_{ij}\))

Level 2

\[\beta_{0j} = \gamma_{00} + u_{0j}\]

  • \(\gamma_{00}\) = The overall intercept, also called the grand mean; the expected value of \(y\) for all \(i,j\) observations when all Level 1 and Level 2 predictors equal zero
  • \(u_{0j}\) = The deviation in the intercept for the \(j^{th}\) entity from the overall intercept (\(\gamma_{00}\))

\[\beta_{1j} = \gamma_{10} + u_{1j}\]

  • \(\gamma_{10}\) = The expected change (slope) in \(y\) for an average \(j\) across observations \(i\)
  • \(u_{1j}\) = The deviation in the slope for the \(j^{th}\) entity from the overall (average) slope (\(\gamma_{10}\))

Variances

In multilevel models, what we are really after is understanding the different types of variance occurring at different levels. We’re trying to quantify the uncertainty around the effect of \(x\) on \(y\) that is due to differences within and between \(j\) groups.

For our research question, we are going to probe whether the effect of GDP per capita (as a proxy for MSA wealth) on new business survivability varies over time within each MSA and across MSAs.

To do that though, we need a bit more notation…

  • \(\sigma^2\) = Estimated within-group variance (\(\epsilon_{ij}\))
  • \(\tau_{00}\) = Estimated between-group variance (\(u_{0j}\))
  • \(\tau_{11}\) = Estimated random-slope variance (\(u_{1j}\))
  • \(\tau_{01}\) = Estimated random-intercept / random-slope covariance
  • \(\rho_{01}\) = Estimated random-intercept / random-slope correlation

Model Construction

Multilevel model construction generally proceeds in four steps…

  1. Intercept only model
  2. Random intercept/fixed slopes
  3. Random intercept/random slopes
  4. Cross-level interactions (if specified)

Generally in papers with multilevel models, you will only see step 3 and, if the paper tests moderation, step 4. All of the steps are important though, and it’s a best practice to include all of them.

Now that said, drawing proper inference in steps 3 and 4 depends on the assumption that the observation of \(x_{ij}\) does not correlate with unobserved factors existing at the \(j\) level.

Intercept Only

For all of our models here, we’re going to use the lme4 package. Because the math is the same, we could estimate the same models using plm, but the way examples and online resources are presented varies based on tradition—plm comes from an economics tradition, and lme4 from a psychology tradition.

We are also going to use the sjstats and lmerTest libraries, which add some useful features for interpreting multilevel models estimated with lme4.

Let’s start with the intercept only model…

Level 1

\[y_{ij} = \beta_{0j} + \epsilon_{ij}\]

Level 2

\[\beta_{0j} = \gamma_{00} + u_{0j}\]

library(lme4)
library(lmerTest)
library(sjstats)

intercept.model <- lmer(Survival_Rate ~ (1 | Location), data = survival.df)
summary(intercept.model)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Survival_Rate ~ (1 | Location)
##    Data: survival.df
## 
## REML criterion at convergence: -2595
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3900 -0.6716  0.0585  0.7049  3.9235 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev.
##  Location (Intercept) 0.0005821 0.02413 
##  Residual             0.0007689 0.02773 
## Number of obs: 624, groups:  Location, 39
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  0.45944    0.00402 38.00000   114.3   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s take a look at the estimate for \(\beta_{0j}\). The Fixed effects: (Intercept) is the expected grand mean of \(y\) (Survival_Rate) across all \(j's\) (MSAs) when there are no predictors in the model. We could just as easily have run the following and gotten the same value…

survival.df %>%
  summarise(mean(Survival_Rate))
## # A tibble: 1 x 1
##   `mean(Survival_Rate)`
##                   <dbl>
## 1                 0.459

But remember, \(\beta_{0j}\) includes two components:

\[\beta_{0j} = \gamma_{00} + u_{0j}\]

\(\gamma_{00}\) is the overall (grand) intercept, while \(u_{0j}\) is the deviation in the intercept for the \(j^{th}\) entity from the overall intercept. The corresponding variance components appear in the Random effects portion of the summary output: the Location (Intercept) variance is \(\tau_{00}\), and the Residual variance is \(\sigma^2\).

We can also call the re_var function from sjstats to display these variance components in a tidy way…

re_var(intercept.model)
##       Within-group-variance:    0.001
##      Between-group-variance:    0.001 (Location)

It might seem like there is little within-group variance (\(\sigma^2\)) in the mean survival rate across MSAs over time, nor much deviation from the expected mean between MSAs (\(\tau_{00}\)).

This would be a misleading conclusion.

What we are really after is how much of the variance in \(\beta_{0j}\) is a function of these two components. For that, we can use the Intraclass Correlation Coefficient (ICC), which quantifies the amount of variance due to differences among the Level 2 entities (\(j\)).

\[ICC = \frac{\tau_{00}} {[\tau_{00} + \sigma^2]}\]

ICC is the ratio of the deviation of each \(j\) intercept from the overall intercept (\(\tau_{00}\)) to the total variance in \(\beta_{0j}\) (\(\tau_{00} + \sigma^2\)).

Let’s make a function to calculate the ICC value, because we are going to come back to it again.

# This function takes a fitted lmer model as input, retrieves the estimates
#  of tau_00 and sigma^2, and then calculates the model ICC value. The 
#  function uses the `get_re_var` function from the `sjstats` package.

getICC <- function(x) {
  tau <- get_re_var(x, "tau.00")
  sigma <- get_re_var(x, "sigma_2")
  tau / (tau + sigma)
}

Now we call our function and pass it our model results.

intercept.ICC <- getICC(intercept.model)
intercept.ICC
##  Location 
## 0.4308761

ICC values range from 0 to 1. Technically, any value over 0 suggests that there are meaningful differences in the intercepts between \(j's\). Our value of .43 suggests substantial differences, and that we should model those differences (as opposed to, say, using a pooled model).
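
As a quick check on our function, we can compute the same ratio by hand from the (rounded) variance components in the intercept-only model output…

# tau_00 / (tau_00 + sigma^2) from the Random effects table above;
#  this should come out to roughly 0.431, matching getICC up to rounding
0.0005821 / (0.0005821 + 0.0007689)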

Note that there is another interpretation of ICC, which is the correlation among the \(i\) observations within each \(j\) entity. This perspective on ICC is often used to justify aggregating Level 1 observations to model a Level 2 outcome variable.

Random Intercept/Fixed Slope

Our next model adds a Level 1 covariate to the model, but it’s not going to be our focal logGDP_PC variable. We need to talk about time really quick…

ggplot(survival.df, aes(y = Survival_Rate, x = Year)) + 
  geom_line(aes(group = MSA), alpha = 1/10) + 
  geom_smooth(method = "loess", se = FALSE)

As we can see, there is significant variation in survival rate over time and across MSAs, and we need to account for it. In our econometric model earlier, we included a Year fixed effect, which is what we are going to do here.

In a random intercept-fixed slope model, we are allowing the intercept for survival rate to vary across MSAs, but holding constant the effect of a Level 1 predictor between MSAs. Effectively, this means that there will be one estimated slope that exists across all MSAs for each year.

Level 1 \[y_{ij} = \beta_{0j} + \beta_{1j}x_{ij} + \epsilon_{ij}\]

Level 2

\[\beta_{0j} = \gamma_{00} + u_{0j}\] \[\beta_{1j} = \gamma_{10}\]

Notice that unlike our earlier equation, the estimate for \(\beta_{1j}\) does not include a \(j\)-level variance term (\(u_{1j}\)). \(\beta_{1j}\) is going to be the same for each \(j\).

# Notice that I'm asking R to treat `Year` as a factor variable, 
#  so it becomes a categorical variable in the model; we wouldn't 
#  want to treat it as a continuous variable!

summary(lmer(Survival_Rate ~ factor(Year) + (1 | Location), data = survival.df))
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Survival_Rate ~ factor(Year) + (1 | Location)
##    Data: survival.df
## 
## REML criterion at convergence: -2897.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4870 -0.5464 -0.0008  0.5647  4.6332 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev.
##  Location (Intercept) 0.0006072 0.02464 
##  Residual             0.0003684 0.01919 
## Number of obs: 624, groups:  Location, 39
## 
## Fixed effects:
##                    Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)       4.582e-01  5.001e-03  8.928e+01  91.612  < 2e-16 ***
## factor(Year)2002 -3.513e-04  4.346e-03  5.700e+02  -0.081   0.9356    
## factor(Year)2003 -4.387e-03  4.346e-03  5.700e+02  -1.009   0.3132    
## factor(Year)2004 -6.856e-03  4.346e-03  5.700e+02  -1.577   0.1152    
## factor(Year)2005  2.098e-02  4.346e-03  5.700e+02   4.826 1.79e-06 ***
## factor(Year)2006  2.103e-02  4.346e-03  5.700e+02   4.838 1.69e-06 ***
## factor(Year)2007  2.373e-02  4.346e-03  5.700e+02   5.459 7.15e-08 ***
## factor(Year)2008  2.668e-02  4.346e-03  5.700e+02   6.139 1.56e-09 ***
## factor(Year)2009  1.037e-02  4.346e-03  5.700e+02   2.387   0.0173 *  
## factor(Year)2010  6.474e-03  4.346e-03  5.700e+02   1.490   0.1369    
## factor(Year)2011 -1.765e-02  4.346e-03  5.700e+02  -4.062 5.56e-05 ***
## factor(Year)2012 -2.572e-02  4.346e-03  5.700e+02  -5.916 5.68e-09 ***
## factor(Year)2013 -3.018e-02  4.346e-03  5.700e+02  -6.945 1.04e-11 ***
## factor(Year)2014 -3.365e-02  4.346e-03  5.700e+02  -7.743 4.46e-14 ***
## factor(Year)2015  1.608e-03  4.346e-03  5.700e+02   0.370   0.7116    
## factor(Year)2016  2.797e-02  4.346e-03  5.700e+02   6.435 2.62e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that we get the same model from a random effects specification in plm, again adding a factor variable for Year.

summary(plm(Survival_Rate ~ factor(Year), data = survival.df, 
            index = c("Location", "Year"), model = "random"))
## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = Survival_Rate ~ factor(Year), data = survival.df, 
##     model = "random", index = c("Location", "Year"))
## 
## Balanced Panel: n = 39, T = 16, N = 624
## 
## Effects:
##                     var   std.dev share
## idiosyncratic 0.0003684 0.0191935 0.378
## individual    0.0006072 0.0246407 0.622
## theta: 0.8089
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -0.0665664 -0.0118261  0.0004099  0.0109217  0.0892901 
## 
## Coefficients:
##                     Estimate  Std. Error t-value  Pr(>|t|)    
## (Intercept)       0.45818974  0.00500143 91.6118 < 2.2e-16 ***
## factor(Year)2002 -0.00035128  0.00434648 -0.0808    0.9356    
## factor(Year)2003 -0.00438718  0.00434648 -1.0094    0.3132    
## factor(Year)2004 -0.00685641  0.00434648 -1.5775    0.1152    
## factor(Year)2005  0.02097692  0.00434648  4.8262 1.762e-06 ***
## factor(Year)2006  0.02102821  0.00434648  4.8380 1.664e-06 ***
## factor(Year)2007  0.02372821  0.00434648  5.4592 6.976e-08 ***
## factor(Year)2008  0.02668205  0.00434648  6.1388 1.501e-09 ***
## factor(Year)2009  0.01037436  0.00434648  2.3868    0.0173 *  
## factor(Year)2010  0.00647436  0.00434648  1.4896    0.1369    
## factor(Year)2011 -0.01765385  0.00434648 -4.0616 5.511e-05 ***
## factor(Year)2012 -0.02571538  0.00434648 -5.9164 5.497e-09 ***
## factor(Year)2013 -0.03018462  0.00434648 -6.9446 9.778e-12 ***
## factor(Year)2014 -0.03365385  0.00434648 -7.7428 4.081e-14 ***
## factor(Year)2015  0.00160769  0.00434648  0.3699    0.7116    
## factor(Year)2016  0.02796923  0.00434648  6.4349 2.506e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    0.46381
## Residual Sum of Squares: 0.22398
## R-Squared:      0.51708
## Adj. R-Squared: 0.50517
## F-statistic: 43.4014 on 15 and 608 DF, p-value: < 2.22e-16

Now let’s add our predictor, logGDP_PC. Here we again allow the intercept for MSAs to vary, but we’re holding the effect (slope) of logGDP_PC constant across all MSAs. Effectively we’re saying that there is no meaningful difference in the \(\beta_{1j}\) estimate for logGDP_PC between MSAs.

summary(lmer(Survival_Rate ~ logGDP_PC + factor(Year) + (1 | Location), data = survival.df))
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Survival_Rate ~ logGDP_PC + factor(Year) + (1 | Location)
##    Data: survival.df
## 
## REML criterion at convergence: -2959.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3979 -0.5513  0.0157  0.5181  4.5041 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev.
##  Location (Intercept) 0.0007308 0.02703 
##  Residual             0.0003231 0.01798 
## Number of obs: 624, groups:  Location, 39
## 
## Fixed effects:
##                    Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)      -6.957e-01  1.336e-01  2.633e+02  -5.208 3.85e-07 ***
## logGDP_PC         1.066e-01  1.233e-02  2.640e+02   8.644 5.31e-16 ***
## factor(Year)2002 -8.555e-04  4.071e-03  5.629e+02  -0.210  0.83364    
## factor(Year)2003 -7.063e-03  4.083e-03  5.652e+02  -1.730  0.08417 .  
## factor(Year)2004 -1.227e-02  4.119e-03  5.720e+02  -2.980  0.00300 ** 
## factor(Year)2005  1.271e-02  4.182e-03  5.823e+02   3.040  0.00247 ** 
## factor(Year)2006  1.099e-02  4.233e-03  5.893e+02   2.596  0.00967 ** 
## factor(Year)2007  1.332e-02  4.245e-03  5.908e+02   3.137  0.00179 ** 
## factor(Year)2008  1.806e-02  4.191e-03  5.837e+02   4.308 1.93e-05 ***
## factor(Year)2009  7.134e-03  4.088e-03  5.663e+02   1.745  0.08149 .  
## factor(Year)2010  2.518e-03  4.096e-03  5.679e+02   0.615  0.53901    
## factor(Year)2011 -2.273e-02  4.113e-03  5.710e+02  -5.526 4.99e-08 ***
## factor(Year)2012 -3.193e-02  4.134e-03  5.747e+02  -7.723 5.07e-14 ***
## factor(Year)2013 -3.770e-02  4.162e-03  5.794e+02  -9.056  < 2e-16 ***
## factor(Year)2014 -4.257e-02  4.200e-03  5.849e+02 -10.138  < 2e-16 ***
## factor(Year)2015 -9.445e-03  4.267e-03  5.932e+02  -2.214  0.02723 *  
## factor(Year)2016  1.572e-02  4.311e-03  5.974e+02   3.646  0.00029 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that we get effectively the same estimates from our plm random effects model…

summary(plm(Survival_Rate ~ logGDP_PC + factor(Year), data = survival.df, 
            index = c("Location", "Year"), model = "random"))
## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = Survival_Rate ~ logGDP_PC + factor(Year), data = survival.df, 
##     model = "random", index = c("Location", "Year"))
## 
## Balanced Panel: n = 39, T = 16, N = 624
## 
## Effects:
##                     var   std.dev share
## idiosyncratic 0.0003209 0.0179145 0.365
## individual    0.0005594 0.0236522 0.635
## theta: 0.814
## 
## Residuals:
##        Min.     1st Qu.      Median     3rd Qu.        Max. 
## -5.9787e-02 -1.1551e-02 -1.8953e-06  1.0095e-02  8.3994e-02 
## 
## Coefficients:
##                     Estimate  Std. Error t-value  Pr(>|t|)    
## (Intercept)      -0.63613280  0.12903969 -4.9297 1.064e-06 ***
## logGDP_PC         0.10110088  0.01191327  8.4864 < 2.2e-16 ***
## factor(Year)2002 -0.00082945  0.00410652 -0.2020 0.8399968    
## factor(Year)2003 -0.00692470  0.00411701 -1.6820 0.0930884 .  
## factor(Year)2004 -0.01199466  0.00415053 -2.8899 0.0039912 ** 
## factor(Year)2005  0.01313746  0.00420876  3.1215 0.0018851 ** 
## factor(Year)2006  0.01150688  0.00425666  2.7033 0.0070584 ** 
## factor(Year)2007  0.01385349  0.00426782  3.2460 0.0012347 ** 
## factor(Year)2008  0.01850148  0.00421777  4.3866 1.358e-05 ***
## factor(Year)2009  0.00730163  0.00412207  1.7714 0.0770041 .  
## factor(Year)2010  0.00272229  0.00412987  0.6592 0.5100358    
## factor(Year)2011 -0.02246503  0.00414509 -5.4197 8.623e-08 ***
## factor(Year)2012 -0.03160446  0.00416436 -7.5893 1.217e-13 ***
## factor(Year)2013 -0.03730796  0.00419105 -8.9018 < 2.2e-16 ***
## factor(Year)2014 -0.04211357  0.00422541 -9.9667 < 2.2e-16 ***
## factor(Year)2015 -0.00887458  0.00428789 -2.0697 0.0389045 *  
## factor(Year)2016  0.01634893  0.00432843  3.7771 0.0001743 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    0.46308
## Residual Sum of Squares: 0.19957
## R-Squared:      0.56904
## Adj. R-Squared: 0.55768
## F-statistic: 50.0927 on 16 and 607 DF, p-value: < 2.22e-16

Take a look at the intercept though…that seems a little weird, but it’s understandable because that is the expected survival rate when logGDP_PC equals zero. Except logGDP_PC never equals zero, and it would be a bad thing if it did!

So we’re going to make a zero value of logGDP_PC meaningful by group mean centering it. This means that the value of \(\beta_{0j}\) is the expected intercept for each \(j\) when logGDP_PC is at that MSA’s mean value (that is, when the centered predictor, cenGDP, equals zero).

survival.df <- survival.df %>% 
  group_by(Location) %>% 
    mutate(AveGDP = mean(logGDP_PC)) %>% 
  mutate(cenGDP = logGDP_PC - AveGDP)

survival.df
## # A tibble: 624 x 8
## # Groups:   Location [39]
##    Location  Year MSA       Survival_Rate GDP_PC logGDP_PC AveGDP   cenGDP
##       <int> <int> <fct>             <dbl>  <int>     <dbl>  <dbl>    <dbl>
##  1    12060  2001 Atlanta-…         0.445  57356      11.0   10.9  0.0504 
##  2    12060  2002 Atlanta-…         0.435  56423      10.9   10.9  0.0340 
##  3    12060  2003 Atlanta-…         0.428  56317      10.9   10.9  0.0321 
##  4    12060  2004 Atlanta-…         0.432  56654      10.9   10.9  0.0381 
##  5    12060  2005 Atlanta-…         0.465  57646      11.0   10.9  0.0554 
##  6    12060  2006 Atlanta-…         0.453  57101      11.0   10.9  0.0459 
##  7    12060  2007 Atlanta-…         0.464  57367      11.0   10.9  0.0506 
##  8    12060  2008 Atlanta-…         0.478  54850      10.9   10.9  0.00570
##  9    12060  2009 Atlanta-…         0.455  51437      10.8   10.9 -0.0585 
## 10    12060  2010 Atlanta-…         0.448  50894      10.8   10.9 -0.0692 
## # ... with 614 more rows

Now let’s call our model…

random_fixed.model <- lmer(Survival_Rate ~ cenGDP + factor(Year) + (1 | Location), 
                           data = survival.df)
summary(random_fixed.model)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Survival_Rate ~ cenGDP + factor(Year) + (1 | Location)
##    Data: survival.df
## 
## REML criterion at convergence: -2970.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3438 -0.5463  0.0142  0.4983  4.3646 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev.
##  Location (Intercept) 0.0006101 0.02470 
##  Residual             0.0003209 0.01791 
## Number of obs: 624, groups:  Location, 39
## 
## Fixed effects:
##                    Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)       4.666e-01  4.969e-03  8.713e+01  93.886  < 2e-16 ***
## cenGDP            1.371e-01  1.485e-02  5.690e+02   9.236  < 2e-16 ***
## factor(Year)2002 -9.998e-04  4.057e-03  5.690e+02  -0.246 0.805454    
## factor(Year)2003 -7.829e-03  4.074e-03  5.690e+02  -1.922 0.055148 .  
## factor(Year)2004 -1.383e-02  4.126e-03  5.690e+02  -3.350 0.000860 ***
## factor(Year)2005  1.034e-02  4.217e-03  5.690e+02   2.453 0.014462 *  
## factor(Year)2006  8.115e-03  4.291e-03  5.690e+02   1.891 0.059110 .  
## factor(Year)2007  1.034e-02  4.308e-03  5.690e+02   2.399 0.016757 *  
## factor(Year)2008  1.559e-02  4.231e-03  5.690e+02   3.684 0.000251 ***
## factor(Year)2009  6.207e-03  4.082e-03  5.690e+02   1.521 0.128907    
## factor(Year)2010  1.386e-03  4.094e-03  5.690e+02   0.338 0.735146    
## factor(Year)2011 -2.418e-02  4.118e-03  5.690e+02  -5.872 7.34e-09 ***
## factor(Year)2012 -3.370e-02  4.148e-03  5.690e+02  -8.125 2.80e-15 ***
## factor(Year)2013 -3.985e-02  4.189e-03  5.690e+02  -9.511  < 2e-16 ***
## factor(Year)2014 -4.513e-02  4.243e-03  5.690e+02 -10.636  < 2e-16 ***
## factor(Year)2015 -1.261e-02  4.339e-03  5.690e+02  -2.906 0.003805 ** 
## factor(Year)2016  1.221e-02  4.401e-03  5.690e+02   2.774 0.005716 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This model has an interesting feature. Compare the estimate for cenGDP above with our fixed effect model using plm and logGDP_PC

summary(fixed.year.model)
## Twoways effects Within Model
## 
## Call:
## plm(formula = Survival_Rate ~ logGDP_PC, data = survival.df, 
##     effect = c("twoways"), model = "within", index = c("Location", 
##         "Year"))
## 
## Balanced Panel: n = 39, T = 16, N = 624
## 
## Residuals:
##        Min.     1st Qu.      Median     3rd Qu.        Max. 
## -5.9978e-02 -9.7363e-03 -8.0923e-05  9.0096e-03  7.8115e-02 
## 
## Coefficients:
##           Estimate Std. Error t-value  Pr(>|t|)    
## logGDP_PC 0.137117   0.014846  9.2356 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    0.20998
## Residual Sum of Squares: 0.18261
## R-Squared:      0.13036
## Adj. R-Squared: 0.047833
## F-statistic: 85.2971 on 1 and 569 DF, p-value: < 2.22e-16

The multilevel model with the centered Level 1 predictor (cenGDP) is also called a within-effects model. The group mean centering process de-means the value of the Level 1 predictor by group, which is computationally the same method used in econometric “fixed effect” models to estimate the effect of \(x\) on \(y\).

In this case though, we are still constraining the slope to be equivalent across all \(j's\) (MSAs). But this has the advantage of yielding a consistent estimate for the effect of GDP on survivability, even if there is unobserved, time-invariant heterogeneity at the MSA level that correlates with GDP.

That said though, the \(\beta_{1j}\) estimate for cenGDP only represents the within-MSA effect over time. So the value of 0.137117 represents the expected change in survival rate for a one unit increase in logged GDP per capita within an average MSA over the time period in the sample.

The caveat though is that while the model yields a “fixed effect” estimate for GDP, it applies only to that variable. If we had other Level 1 predictors, we would need to apply the same group mean centering approach to each of them.
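
To see the computational equivalence for yourself, here is a minimal sketch: demean both Survival_Rate and logGDP_PC by MSA and run OLS with the Year dummies. The slope on the demeaned GDP variable should reproduce the twoways within estimate of 0.137117, although the standard errors will differ slightly because lm() does not adjust the degrees of freedom for the demeaning…

# The within transformation by hand: subtract each MSA's mean from y
#  and x, then regress the deviations (plus Year dummies, no intercept)
demeaned.df <- survival.df %>% 
  group_by(Location) %>% 
  mutate(demeanSurvival = Survival_Rate - mean(Survival_Rate),
         demeanGDP = logGDP_PC - mean(logGDP_PC)) %>% 
  ungroup()

summary(lm(demeanSurvival ~ 0 + demeanGDP + factor(Year), data = demeaned.df))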

Ok, last up, let’s check our ICC for the model. Note that the ICC is higher now because the Year effects soak up some of the within-MSA variance (\(\sigma^2\)), which increases the between-MSA share…

random_fixed.ICC <- getICC(random_fixed.model)
random_fixed.ICC
##  Location 
## 0.6553081

Random Intercept/Random Slope

And now our last model. Here we are going to estimate \(\beta_{1j}\) by including the average effect of GDP on survival while also allowing \(\beta\) to vary across MSAs…

Level 2

\[\beta_{1j} = \gamma_{10} + u_{1j}\]

Here is where things break down between the economics perspective and the psychology perspective…

In real life, it is not likely that the effect (slope) of GDP on survival rate is the same for each MSA. Just looking at the plot, we can see that GDP per capita varies substantially over time and across MSAs…

ggplot(survival.df, aes(y = logGDP_PC, x = Year)) + 
  geom_line(aes(group = MSA), alpha = 1/10) + 
  geom_smooth(method = "loess", se = FALSE)

The economists argue, well, so what? Even if the slopes do vary, if the differences are non-randomly distributed, then the estimate of \(\beta\) will be inconsistent, so better to use the big hammer of the fixed effect specification.

The psychology folks say that a little bit of inconsistency is a fair price for a “more realistic” estimate of the multilevel effect. Plus, with a fixed effect model, you eliminate any chance of estimating a cross-level interaction. And with group mean centering or hybrid models, you get all the benefits of multilevel modeling without having to use such a big hammer.
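
To preview what that looks like, here is a sketch of the hybrid (within-between) specification using variables we already built: cenGDP carries the within-MSA effect and AveGDP carries the between-MSA effect. I won’t report its output here…

# Hybrid model sketch: the group mean centered predictor gives the
#  within effect, and the group mean itself gives the between effect
hybrid.model <- lmer(Survival_Rate ~ cenGDP + AveGDP + factor(Year) + 
                       (1 | Location), data = survival.df)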

I’m going to talk more about hybrid models in a future post, but for right now, let’s go with the psychologists. Oh, and we’re going to use our group mean centered variable for GDP. More on that later.

random_random.model <- lmer(Survival_Rate ~ cenGDP + factor(Year) + 
                              (cenGDP | Location), 
                           data = survival.df)
summary(random_random.model)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Survival_Rate ~ cenGDP + factor(Year) + (cenGDP | Location)
##    Data: survival.df
## 
## REML criterion at convergence: -3033.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.1672 -0.5246 -0.0524  0.5181  3.0289 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr 
##  Location (Intercept) 0.0006137 0.02477       
##           cenGDP      0.0187683 0.13700  -0.15
##  Residual             0.0002632 0.01622       
## Number of obs: 624, groups:  Location, 39
## 
## Fixed effects:
##                    Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)        0.467180   0.004880  81.337871  95.724  < 2e-16 ***
## cenGDP             0.165264   0.029152  49.799457   5.669 7.20e-07 ***
## factor(Year)2002  -0.001082   0.003680 537.070132  -0.294 0.768877    
## factor(Year)2003  -0.008169   0.003715 540.896025  -2.199 0.028321 *  
## factor(Year)2004  -0.014887   0.003805 548.085061  -3.913 0.000103 ***
## factor(Year)2005   0.008707   0.003946 554.954153   2.207 0.027747 *  
## factor(Year)2006   0.006127   0.004043 557.753848   1.515 0.130271    
## factor(Year)2007   0.008015   0.004063 558.732026   1.973 0.049033 *  
## factor(Year)2008   0.014600   0.003967 558.572742   3.681 0.000255 ***
## factor(Year)2009   0.006539   0.003780 552.807445   1.730 0.084259 .  
## factor(Year)2010   0.001750   0.003810 556.482655   0.459 0.646261    
## factor(Year)2011  -0.023465   0.003846 558.180349  -6.101 1.97e-09 ***
## factor(Year)2012  -0.033477   0.003908 562.348528  -8.566  < 2e-16 ***
## factor(Year)2013  -0.039897   0.003964 563.541370 -10.064  < 2e-16 ***
## factor(Year)2014  -0.045707   0.004052 566.512189 -11.281  < 2e-16 ***
## factor(Year)2015  -0.013902   0.004177 567.946454  -3.328 0.000932 ***
## factor(Year)2016   0.011048   0.004261 568.607088   2.593 0.009758 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Ok, now we’ve got a series of variance parameters…

re_var(random_random.model)
##       Within-group-variance:    0.000
##      Between-group-variance:    0.001 (Location)
##       Random-slope-variance:    0.019 (Location.cenGDP)
##  Slope-Intercept-covariance:   -0.001 (Location.(Intercept))
## Slope-Intercept-correlation:   -0.150 (Location)

The Random-slope-variance is also \(\tau_{11}\). This is the estimated random-slope variance (\(u_{1j}\)), or the deviation in the slope for the \(j^{th}\) entity from the mean estimated \(\beta\).

But because we used a group mean centered variable, it is also intuitively interpreted as the average deviation from the mean effect of GDP on survival rate for an average \(j\). So there is some additional variance here, and we can see that in our new ICC value…

random_random.ICC <- getICC(random_random.model)
random_random.ICC
##  Location 
## 0.6998542

Let’s also take a closer look at \(\rho_{01}\), which is the correlation between the random intercepts (\(u_{0j}\)) and random slopes (\(u_{1j}\)).

get_re_var(random_random.model, "rho.01")
##   Location 
## -0.1502609

A value greater than zero suggests that MSAs with steeper slopes (larger \(\beta\) for GDP) will also have a higher expected mean level of survival rate. A value below zero suggests that MSAs with steeper slopes will have a lower expected mean level of survival rate.

The big question is whether the correlation is statistically different from zero. This is actually a non-trivial problem, and we’re going to punt on a full treatment for now.
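
If you do want a rough check, though, one option (a sketch, not a full treatment) is to refit the model with the intercept-slope correlation constrained to zero using lme4’s || syntax, and compare the two models with a likelihood-ratio test. The usual caveats about testing variance-covariance parameters apply, so treat the p-value as a rough guide…

# Refit both models with ML (REML = FALSE) so the likelihood-ratio
#  test is valid; `||` drops the intercept-slope correlation
uncorr.model <- lmer(Survival_Rate ~ cenGDP + factor(Year) + 
                       (cenGDP || Location), 
                     data = survival.df, REML = FALSE)
corr.model <- lmer(Survival_Rate ~ cenGDP + factor(Year) + 
                     (cenGDP | Location), 
                   data = survival.df, REML = FALSE)

# One degree of freedom difference: the correlation parameter
anova(uncorr.model, corr.model)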




Wrap-up

This post just scratches the surface on multilevel models, which are a favorite topic of mine. The hope, though, is that it provides a handy guide for reconciling the notation and modeling differences between economics and psychology.