Bayesian multilevel modeling with brms

· by Brian Anderson · Read in about 38 min · (7887 words) ·

This is a Bayesian perspective on my last post.

I’m going to use the same data, and ask the same question…

Does being in a wealthier community improve the survivability of new businesses?


Data

I’m using two data sources: the Kauffman Foundation’s Main Street Entrepreneurship data and per capita GDP data from the Bureau of Economic Analysis (BEA)…

I’ve already worked through how to tidy and join both datasets, so I’m just going to use a quick script…

# Load two datasets
library(tidyverse)
kauffman.ds <- read_csv("https://www.drbanderson.com/data/MainStreetEntrepreneurship.csv")
pcgdp.ds <- read_csv("https://www.drbanderson.com/data/gmpPCGDP.csv")

# Filter and Tidy the Kauffman Data we need
survive.df <- kauffman.ds %>% 
  select(Location, 'Index Year', Location_name, Survival_Rate) %>% 
  rename(Year = 'Index Year') %>% 
  filter(!is.na(Survival_Rate)) %>% 
  arrange(Location_name, Year)

# Tidy the GDP data from BEA
pcgdp.df <- pcgdp.ds %>% 
  filter(!is.na(GeoName)) %>% 
  select(GeoFIPS, GeoName, `2001`:`2016`) %>% 
  gather(Year, GDP_PC, `2001`:`2016`) %>% 
  filter(GeoFIPS != "00998") %>% 
  mutate(GeoName = str_remove(GeoName, ",.*")) %>% 
  rename(Location = GeoFIPS,
         Location_name = GeoName) %>% 
  mutate(Location = as.integer(Location),
         Year = as.integer(Year)) %>% 
  filter(Year >= 2001) %>% 
  filter(Location > 56) %>% 
  arrange(Location_name, Year)

# Join the data
survival.df <- inner_join(survive.df, pcgdp.df, by = c("Location", "Year"))

# Tidy the joined data
survival.df <- survival.df %>% 
  select(-Location_name.y) %>% 
  rename(MSA = Location_name.x) %>% 
  mutate(MSA = as.factor(MSA),
         Year = as.factor(Year)) %>% 
  mutate(logGDP_PC = log(GDP_PC)) %>% 
  
  # Create our group-mean centered GDP variable
  group_by(Location) %>% 
    mutate(AveGDP = mean(logGDP_PC)) %>% 
    ungroup() %>% 
  mutate(cenGDP = logGDP_PC - AveGDP)

Our outcome will be the five-year rate of new business survival, Survival_Rate, and our predictor will be the log-transformed value of GDP per capita, logGDP_PC (along with its group-mean-centered version, cenGDP, and the MSA mean, AveGDP).
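Before modeling, it’s worth a quick structural check on the joined panel; a sketch using functions already loaded with the tidyverse:

# Inspect the analysis data: one row per MSA-year
glimpse(survival.df)

# How many years of data do we have per MSA?
survival.df %>% 
  count(MSA) %>% 
  arrange(n)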


Packages

There are several packages for fitting Bayesian multilevel models in R. Both of my favorites, brms and rstanarm, use Stan for the back-end.

The brms package offers more flexibility in model fitting, distributional assumptions, and specifying more complicated models. One very handy feature of both packages is that they use lme4 syntax to specify multilevel models. I prefer brms, and so that’s what I’m going to use here.

I’m also going to use some visualizations from sjPlot that take brms model objects as inputs.

library(brms)
library(sjPlot)
library(sjstats)

Bayesian multilevel models

Just to recap, here are our two-level equations…

Level 1 \[y_{ij} = \beta_{0j} + \beta_{1j}x_{ij} + \epsilon_{ij}\]

  • \(y\) = Value of the \(i^{th}\) observation of \(y\) for the \(j^{th}\) entity
  • \(\beta_{0j}\) = The value of \(y\) (intercept) for the \(j^{th}\) entity
  • \(\beta_{1j}\) = The expected change in \(y\) (slope) for a one unit change in \(x\) for the \(j^{th}\) entity
  • \(x\) = Value of the \(i^{th}\) observation of \(x\) for the \(j^{th}\) entity
  • \(\epsilon_{ij}\) = The residual for the \(i^{th}\) observation of \(y\) for the \(j^{th}\) entity—the part not explained by the Level 1 equation (sometimes called \(r_{ij}\))

Level 2

\[\beta_{0j} = \gamma_{00} + u_{0j}\]

  • \(\gamma_{00}\) = The overall intercept, also called the grand mean; the expected value of \(y\) for all \(i,j\) observations when all Level 1 and Level 2 predictors equal zero
  • \(u_{0j}\) = The deviation in the intercept for the \(j^{th}\) entity from the overall intercept (\(\gamma_{00}\))

\[\beta_{1j} = \gamma_{10} + u_{1j}\]

  • \(\gamma_{10}\) = The expected change (slope) in \(y\) for an average \(j\) across observations \(i\)
  • \(u_{1j}\) = The deviation in the slope for the \(j^{th}\) entity from the overall (average) slope (\(\gamma_{10}\))

The Bayesian notation isn’t all that different…

Level 1

\[y_{ij} \sim \mathcal{N}(\beta_{0j} + \beta_{1j}x_{ij}, \sigma^2_{y})\]

Level 2

\[\beta_{0j} \sim \mathcal{N}(\gamma_{00} + u_{0j}, \sigma^2_{\beta})\]

\[\beta_{1j} \sim \mathcal{N}(\gamma_{10} + u_{1j}, \sigma^2_{\beta})\]


Priors

Let’s revisit the variance parameters in multilevel models…

  • \(\sigma^2\) = Estimated within-group variance (\(\epsilon_{ij}\))
  • \(\tau_{00}\) = Estimated between-group variance (\(u_{0j}\))
  • \(\tau_{11}\) = Estimated random-slope variance (\(u_{1j}\))
  • \(\tau_{01}\) = Estimated random-intercept / random-slope covariance
  • \(\rho_{01}\) = Estimated random-intercept / random-slope correlation

In addition to the various \(\beta's\), these are all of the parameters we can—and typically do—estimate, and they are the parameters of most interest in a multilevel model.

One of the things that makes Bayesian multilevel modeling so flexible is that we can apply a prior to each of these parameters.

By default, brms and rstanarm use weakly informative priors on the variance parameters. Weakly informative priors help with parameterization and model fitting, while incorporating uncertainty about the model into the hierarchical structure.

The brms package—and I’m not wild about this part—uses improper flat priors on each of the \(\beta\) parameters.

Now, ideally, we would specify a prior for each \(\beta\) based on past research, perhaps an earlier analysis, or so forth. I also think that rarely would it be appropriate to specify a model with an uninformed (flat) prior.
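If we did want to be more surgical, set_prior can target individual coefficients or other parameter classes. The values below are purely illustrative, a sketch of the syntax rather than priors I would defend:

# Illustrative only: priors targeted at different parameter classes
example.priors <- c(
  set_prior("normal(0, 5)", class = "b"),                  # all population-level betas
  set_prior("normal(0, 2)", class = "b", coef = "cenGDP"), # one specific coefficient
  set_prior("student_t(3, 0, 1)", class = "sd")            # group-level standard deviations
)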

In the interest of parsimony, I’m going to specify a single Gaussian (normal) prior on our \(\beta\) parameters, with a mean of zero and standard deviation of 5.

my.prior <- set_prior("normal(0,5)", class = "b")

In practice, this would look like…

set.seed(1)
dist <- tibble(beta = rnorm(1000, 0, 5))
ggplot(dist, aes(x = beta)) + 
  geom_density() + 
  theme_bw()

This prior has the advantage of giving roughly equal weight to values below zero (the mean) as to values above it, while generally penalizing extreme values of \(\beta\).
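We could go a step further with a prior predictive check. This is a sketch, not part of the original analysis: sample_prior = "only" tells brm() to draw from the priors alone, and pp_check() then shows the kind of data those priors imply.

# Sketch of a prior predictive check: ignore the likelihood and sample
# from the priors, then compare prior-implied outcomes to Survival_Rate
prior.check <- brm(Survival_Rate ~ cenGDP + (1 | Location),
                   data = survival.df, chains = 2,
                   prior = my.prior, sample_prior = "only")
pp_check(prior.check)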


Models

Let’s walk through specifying some models that we’ve seen before.

Intercept only model

Let’s start again with our intercept only model.

# The `brm` function uses a syntax nearly identical to `lme4`'s in most cases. 
#  I've constrained this model to just draw 2 MCMC chains.

# NOTE---In 'real life' you would want to stick to the default of 4 
#  MCMC chains for stability.

intercept.model <- brm(Survival_Rate ~ (1 | Location), 
                       data = survival.df, chains = 2)
## 
## SAMPLING FOR MODEL 'dfaae018950a0fbfcc07c481c492a814' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000168 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.68 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 2.46184 seconds (Warm-up)
## Chain 1:                0.498546 seconds (Sampling)
## Chain 1:                2.96039 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'dfaae018950a0fbfcc07c481c492a814' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 3.4e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.34 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 2.51857 seconds (Warm-up)
## Chain 2:                0.430611 seconds (Sampling)
## Chain 2:                2.94918 seconds (Total)
## Chain 2:

Our model gives us output very similar to lme4’s…

summary(intercept.model)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Survival_Rate ~ (1 | Location) 
##    Data: survival.df (Number of observations: 624) 
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 2000
## 
## Group-Level Effects: 
## ~Location (Number of levels: 39) 
##               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)     0.03      0.00     0.02     0.03        299 1.00
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept     0.46      0.00     0.45     0.47        217 1.02
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.03      0.00     0.03     0.03       2021 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

The Population-Level Effects intercept is the expected grand mean of \(y\) (Survival_Rate) when the predictors are zero, across all \(j's\) (MSAs).

But remember, in Bayesian modeling, parameter estimates follow a distribution, and we find the standard deviation of the intercept in our Group-Level Effects.
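We can see those posterior distributions directly, and eyeball convergence at the same time, with the default plot method on a brmsfit object:

# Density and trace plots; well-mixed chains should overlap
plot(intercept.model)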

Recall also that \(\beta_{0j}\) includes two components:

\[\beta_{0j} \sim \mathcal{N}(\gamma_{00} + u_{0j}, \sigma^2_{\beta})\]

One nice feature of brms and sjPlot is the ability to easily visualize \(u_{0j}\) for each \(j\): the deviation of each \(j\)’s expected posterior distribution of Survival_Rate from the overall intercept…

plot_model(intercept.model, type = "re") + 
  theme_bw()
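If you would rather have the numbers behind the plot, ranef() returns posterior summaries of each \(u_{0j}\):

# Posterior summaries of u_0j for each MSA (first few rows)
head(ranef(intercept.model)$Location[, , "Intercept"])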

There looks to be substantial variance in the posterior distribution for the intercepts across \(j's\). We can use the Bayesian version of the ICC to quantify that variance…

icc(intercept.model)
## 
## # Random Effect Variances and ICC
## 
## Family: gaussian (identity)
## 
## ## Location
##           ICC: 0.45  HDI 89%: [0.35 0.55]
## Between-group: 0.00  HDI 89%: [0.00 0.00]
## 
## ## Residuals
## Within-group: 0.00  HDI 89%: [0.00 0.00]
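Under the hood, this ICC is just the between-group variance over the total variance, computed draw by draw from the posterior. A minimal sketch, assuming the default brms parameter names:

# ICC = sd(Intercept)^2 / (sd(Intercept)^2 + sigma^2), per posterior draw
draws <- posterior_samples(intercept.model,
                           pars = c("sd_Location__Intercept", "sigma"))
icc.draws <- draws$sd_Location__Intercept^2 /
  (draws$sd_Location__Intercept^2 + draws$sigma^2)

median(icc.draws)

# Equal-tailed 89% interval (icc() reports an 89% HDI instead)
quantile(icc.draws, probs = c(0.055, 0.945))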

Random intercept/fixed slope

So what about time? As we did with our frequentist model, we should accommodate variation as a function of time, but there is little need to estimate time with varying slope distributions. So let’s specify a model with Year as a population-level parameter.

# Now we are going to incorporate our prior across all estimated betas
#  in the model, following a Gaussian distribution with mean 0 and 
#  standard deviation 5

random_year.model <- brm(Survival_Rate ~ Year + (1 | Location), 
                         data = survival.df, chains = 2, prior = my.prior)
## 
## SAMPLING FOR MODEL '963545120edd00c0205945b7aa94c932' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000216 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.16 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 8.35662 seconds (Warm-up)
## Chain 1:                1.58326 seconds (Sampling)
## Chain 1:                9.93988 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL '963545120edd00c0205945b7aa94c932' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 7.9e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.79 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 7.17441 seconds (Warm-up)
## Chain 2:                1.68068 seconds (Sampling)
## Chain 2:                8.85509 seconds (Total)
## Chain 2:

Now lets take a look at our results…

summary(random_year.model)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Survival_Rate ~ Year + (1 | Location) 
##    Data: survival.df (Number of observations: 624) 
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 2000
## 
## Group-Level Effects: 
## ~Location (Number of levels: 39) 
##               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)     0.03      0.00     0.02     0.03        147 1.03
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept     0.46      0.01     0.45     0.47        140 1.01
## Year2002     -0.00      0.00    -0.01     0.01        389 1.00
## Year2003     -0.00      0.00    -0.01     0.00        341 1.01
## Year2004     -0.01      0.00    -0.02     0.00        342 1.00
## Year2005      0.02      0.00     0.01     0.03        344 1.01
## Year2006      0.02      0.00     0.01     0.03        310 1.00
## Year2007      0.02      0.00     0.01     0.03        367 1.01
## Year2008      0.03      0.00     0.02     0.04        345 1.01
## Year2009      0.01      0.00     0.00     0.02        331 1.01
## Year2010      0.01      0.00    -0.00     0.02        369 1.00
## Year2011     -0.02      0.00    -0.03    -0.01        348 1.00
## Year2012     -0.03      0.00    -0.03    -0.02        318 1.00
## Year2013     -0.03      0.00    -0.04    -0.02        352 1.00
## Year2014     -0.03      0.00    -0.04    -0.02        329 1.01
## Year2015      0.00      0.00    -0.01     0.01        331 1.00
## Year2016      0.03      0.00     0.02     0.04        348 1.01
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.02      0.00     0.02     0.02        979 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

So what would visualizing these posterior distributions look like?

plot_model(random_year.model) + 
  theme_bw()

Remember though, we are treating Year as having a constant posterior distribution across all MSAs in the model, so there are no additional random effects specified beyond the intercept…

plot_model(random_year.model, type = "re")

Next, let’s jump right into including our centered version of GDP per capita. We already know that this parameter represents the within-effect estimate, and that its posterior distribution is orthogonal to the \(j\)-level disturbance term.

# Again we incorporate our prior across all estimated betas in the model, 
#  following a Gaussian distribution with mean 0 and standard deviation 5

random_fixed.model <- brm(Survival_Rate ~ cenGDP + Year + (1 | Location), 
                          data = survival.df, chains = 2, prior = my.prior)
## 
## SAMPLING FOR MODEL '963545120edd00c0205945b7aa94c932' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000138 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.38 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 6.95257 seconds (Warm-up)
## Chain 1:                2.05386 seconds (Sampling)
## Chain 1:                9.00643 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL '963545120edd00c0205945b7aa94c932' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0.000122 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 1.22 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 6.52695 seconds (Warm-up)
## Chain 2:                2.11944 seconds (Sampling)
## Chain 2:                8.64639 seconds (Total)
## Chain 2:

And the model results…

summary(random_fixed.model)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Survival_Rate ~ cenGDP + Year + (1 | Location) 
##    Data: survival.df (Number of observations: 624) 
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 2000
## 
## Group-Level Effects: 
## ~Location (Number of levels: 39) 
##               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)     0.03      0.00     0.02     0.03        109 1.01
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept     0.47      0.00     0.46     0.48         96 1.01
## cenGDP        0.14      0.01     0.11     0.16        948 1.00
## Year2002     -0.00      0.00    -0.01     0.01        684 1.00
## Year2003     -0.01      0.00    -0.02    -0.00        515 1.01
## Year2004     -0.01      0.00    -0.02    -0.01        486 1.01
## Year2005      0.01      0.00     0.00     0.02        605 1.00
## Year2006      0.01      0.00    -0.00     0.02        434 1.01
## Year2007      0.01      0.00     0.00     0.02        542 1.00
## Year2008      0.02      0.00     0.01     0.02        543 1.00
## Year2009      0.01      0.00    -0.00     0.01        502 1.01
## Year2010      0.00      0.00    -0.01     0.01        572 1.00
## Year2011     -0.02      0.00    -0.03    -0.02        515 1.01
## Year2012     -0.03      0.00    -0.04    -0.03        556 1.01
## Year2013     -0.04      0.00    -0.05    -0.03        536 1.00
## Year2014     -0.04      0.00    -0.05    -0.04        509 1.00
## Year2015     -0.01      0.00    -0.02    -0.00        485 1.01
## Year2016      0.01      0.00     0.00     0.02        567 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.02      0.00     0.02     0.02       1002 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

And a visualization of the posterior distribution of the estimates…

plot_model(random_fixed.model) + 
  theme_bw()

We can also look at a marginal effect plot, with predicted values of Survival_Rate across values of cenGDP and a 95% credible interval around each predicted value. This represents the expected posterior distribution for the effect of GDP per capita on survival rate for an average MSA over time.

plot_model(random_fixed.model, type = "pred", terms = c("cenGDP")) + 
  theme_bw()

Just for clarification, we can see the priors we set on all of our parameters…

prior_summary(random_fixed.model)
##                  prior     class      coef    group resp dpar nlpar bound
## 1          normal(0,5)         b                                         
## 2                              b    cenGDP                               
## 3                              b  Year2002                               
## 4                              b  Year2003                               
## 5                              b  Year2004                               
## 6                              b  Year2005                               
## 7                              b  Year2006                               
## 8                              b  Year2007                               
## 9                              b  Year2008                               
## 10                             b  Year2009                               
## 11                             b  Year2010                               
## 12                             b  Year2011                               
## 13                             b  Year2012                               
## 14                             b  Year2013                               
## 15                             b  Year2014                               
## 16                             b  Year2015                               
## 17                             b  Year2016                               
## 18 student_t(3, 0, 10) Intercept                                         
## 19 student_t(3, 0, 10)        sd                                         
## 20                            sd           Location                      
## 21                            sd Intercept Location                      
## 22 student_t(3, 0, 10)     sigma

Random intercept/random slope

OK, before we move on to the hybrid model, let’s go ahead and allow the posterior distribution around cenGDP to vary across MSAs. This is the equivalent of a random intercept/random slope model in the frequentist frame.

random_random.model <- brm(Survival_Rate ~ cenGDP + Year + AveGDP + 
                             (cenGDP | Location), 
                           data = survival.df, chains = 2, prior = my.prior)
## 
## SAMPLING FOR MODEL '47d60e8c0eaab0839b41942fab811d86' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000205 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.05 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 13.9518 seconds (Warm-up)
## Chain 1:                4.93475 seconds (Sampling)
## Chain 1:                18.8866 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL '47d60e8c0eaab0839b41942fab811d86' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0.00011 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 1.1 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 13.9203 seconds (Warm-up)
## Chain 2:                3.69118 seconds (Sampling)
## Chain 2:                17.6115 seconds (Total)
## Chain 2:

And our model results…

summary(random_random.model)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Survival_Rate ~ cenGDP + Year + AveGDP + (cenGDP | Location) 
##    Data: survival.df (Number of observations: 624) 
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 2000
## 
## Group-Level Effects: 
## ~Location (Number of levels: 39) 
##                       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)             0.02      0.00     0.02     0.03        419 1.00
## sd(cenGDP)                0.14      0.03     0.10     0.20        894 1.00
## cor(Intercept,cenGDP)    -0.07      0.19    -0.42     0.31        823 1.01
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept     0.06      0.25    -0.44     0.53        353 1.01
## cenGDP        0.17      0.03     0.11     0.23        706 1.00
## Year2002     -0.00      0.00    -0.01     0.01       1273 1.00
## Year2003     -0.01      0.00    -0.02    -0.00        972 1.00
## Year2004     -0.01      0.00    -0.02    -0.01       1100 1.00
## Year2005      0.01      0.00     0.00     0.02       1000 1.00
## Year2006      0.01      0.00    -0.00     0.01        918 1.00
## Year2007      0.01      0.00    -0.00     0.02       1032 1.00
## Year2008      0.01      0.00     0.01     0.02        978 1.00
## Year2009      0.01      0.00    -0.00     0.01       1104 1.00
## Year2010      0.00      0.00    -0.01     0.01        998 1.00
## Year2011     -0.02      0.00    -0.03    -0.02       1103 1.00
## Year2012     -0.03      0.00    -0.04    -0.03        953 1.00
## Year2013     -0.04      0.00    -0.05    -0.03        921 1.00
## Year2014     -0.05      0.00    -0.05    -0.04        895 1.00
## Year2015     -0.01      0.00    -0.02    -0.01        885 1.00
## Year2016      0.01      0.00     0.00     0.02        823 1.00
## AveGDP        0.04      0.02    -0.01     0.08        354 1.01
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.02      0.00     0.02     0.02       2049 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

We have another random effect parameter, sd(cenGDP), which is conceptually equivalent to \(\tau_{11}\). It quantifies the expected variance in the cenGDP slopes across MSAs. As the plot below makes clear, there is substantial variance…

plot_model(random_random.model, type = "re") + 
  theme_bw()

Now, there is a question of whether this variance is meaningful, and this is where \(\rho_{01}\) (cor(Intercept,cenGDP)) comes in…

summary(random_random.model)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Survival_Rate ~ cenGDP + Year + AveGDP + (cenGDP | Location) 
##    Data: survival.df (Number of observations: 624) 
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 2000
## 
## Group-Level Effects: 
## ~Location (Number of levels: 39) 
##                       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)             0.02      0.00     0.02     0.03        419 1.00
## sd(cenGDP)                0.14      0.03     0.10     0.20        894 1.00
## cor(Intercept,cenGDP)    -0.07      0.19    -0.42     0.31        823 1.01
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept     0.06      0.25    -0.44     0.53        353 1.01
## cenGDP        0.17      0.03     0.11     0.23        706 1.00
## Year2002     -0.00      0.00    -0.01     0.01       1273 1.00
## Year2003     -0.01      0.00    -0.02    -0.00        972 1.00
## Year2004     -0.01      0.00    -0.02    -0.01       1100 1.00
## Year2005      0.01      0.00     0.00     0.02       1000 1.00
## Year2006      0.01      0.00    -0.00     0.01        918 1.00
## Year2007      0.01      0.00    -0.00     0.02       1032 1.00
## Year2008      0.01      0.00     0.01     0.02        978 1.00
## Year2009      0.01      0.00    -0.00     0.01       1104 1.00
## Year2010      0.00      0.00    -0.01     0.01        998 1.00
## Year2011     -0.02      0.00    -0.03    -0.02       1103 1.00
## Year2012     -0.03      0.00    -0.04    -0.03        953 1.00
## Year2013     -0.04      0.00    -0.05    -0.03        921 1.00
## Year2014     -0.05      0.00    -0.05    -0.04        895 1.00
## Year2015     -0.01      0.00    -0.02    -0.01        885 1.00
## Year2016      0.01      0.00     0.00     0.02        823 1.00
## AveGDP        0.04      0.02    -0.01     0.08        354 1.01
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.02      0.00     0.02     0.02       2049 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

In a frequentist frame, it’s complicated to put a confidence interval around the estimate of how much the random-intercept term covaries with the random-slope term. From a Bayesian perspective, it’s a lot easier, although we see that the posterior standard deviation of \(\rho_{01}\) is quite high.

Take another look at the plot…

plot_model(random_random.model, type = "re") + 
  theme_bw()

The means are certainly quite different, but almost all of the posterior distributions cross zero. This suggests—and we will explore this more in a minute—that including a random slope parameter may not be necessary.


Hybrid model

Let’s focus on the hybrid method for multilevel models. The Certo et al. paper discusses the approach from a frequentist frame, but the logic is the same for Bayesian models.

The assumption for frequentist and Bayesian multilevel models is that the observation of \(x_{ij}\) does not correlate with the between-group (\(j\)) disturbance term. The idea is that there are no unexplained sources of variance systematically related to the \(j\) level that bias the estimate of \(\beta_{1j}x_{ij}\).

The econometric solution is a fixed effect model, where we either demean or first-difference the observation of \(x_{ij}\). Demeaning—group-mean-centering—renders the estimate of \(\beta_{1j}x_{ij}\) orthogonal to the \(j\)-level disturbance term.

The resulting estimate for \(\beta_{1j}x_{ij}\) is the within-effect—the change in \(y\) for a unit change in \(x\) over time (or similar \(i\)-level observations) for an average \(j\).
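As a quick sanity check, we can confirm that the demeaned variable we built in the data step is orthogonal to the MSA means by construction:

# cenGDP is uncorrelated with the MSA-level means...
cor(survival.df$cenGDP, survival.df$AveGDP)

# ...and the deviations sum to (numerically) zero within each MSA
survival.df %>% 
  group_by(Location) %>% 
  summarise(dev_sum = sum(cenGDP)) %>% 
  summarise(max_abs_dev = max(abs(dev_sum)))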

Within-effect estimates are helpful, but of limited utility. The demeaning process eliminates the \(\mu_j\) term in the econometric approach, which yields a consistent estimate of the within-effect, but at the expense of removing any between-effect variance in \(j\).

In most cases, we would like to understand both within- and between-effects. We want to know how \(x\) influences \(y\) over time (\(i\)) and as a function of the higher level entity \(j\).

The solution is a hybrid model. In the hybrid model, we use the demeaned value of \(x\) in our Level 1 equation, but then also include the mean value of \(x\) for each \(j\) in the Level 1 equation:

Level 1

\[y_{ij} \sim \mathcal{N}(\beta_{0j} + \beta_{1j}(x_{ij}-\bar{x_{j}}) + \beta_{2j}\bar{x_{j}}, \sigma^2_{y})\]

Level 2

\[\beta_{0j} \sim \mathcal{N}(\gamma_{00} + u_{0j}, \sigma^2_{\beta})\]

\[\beta_{1j} \sim \mathcal{N}(\gamma_{10} + u_{1j}, \sigma^2_{\beta})\]

The estimate for \(\beta_{1j}(x_{ij}-\bar{x_{j}})\) is the within-effect, is orthogonal to the \(j\)-level disturbance term, and may be interpreted causally. The estimate for \(\beta_{2j}\bar{x_{j}}\) is the between-effect, and generally cannot be interpreted causally.

Why the limitations on causal interpretation? Well, a correlation between \(x\) and the \(j\)-level term is not the only source of endogeneity. The COV[\(x\),\(u_{0j}\)] = 0 assumption deals with between-level endogeneity. It is still possible that an omitted variable biases the Level 1 estimate (with \(x_{ij}\) correlating with \(\epsilon_{ij}\)), or the Level 2 estimate (with the group-mean variable correlating with the \(j\)-level disturbance term).

There are IV methods, even for Bayesian models, to deal with this possibility.
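For example, brms can express an IV-style model as a multivariate formula with correlated residuals. This is only a sketch of the syntax: z below is a hypothetical instrument that we do not have in these data, so the fit itself is commented out.

# Hypothetical Bayesian IV sketch: jointly model the endogenous predictor
# (first stage) and the outcome, letting their residuals correlate.
# `z` is an assumed instrument, not a variable in survival.df.
iv.formula <- bf(cenGDP ~ z) +
  bf(Survival_Rate ~ cenGDP + Year + (1 | Location)) +
  set_rescor(TRUE)
# iv.model <- brm(iv.formula, data = survival.df, chains = 2)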

Another possibility is to simply adjust the prior for estimated parameters depending on the source of the bias and its anticipated effect on the posterior probability distributions.

Random intercept/fixed slope

We are going to start by specifying a random intercept/fixed slope model with our hybrid parameters. There are other options here, and we’ll get to them in a minute.

hybrid.rf.model <- brm(Survival_Rate ~ cenGDP + AveGDP + Year + (1 | Location), 
                       data = survival.df, chains = 2, prior = my.prior)
## 
## SAMPLING FOR MODEL '963545120edd00c0205945b7aa94c932' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000187 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.87 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 8.6919 seconds (Warm-up)
## Chain 1:                2.89683 seconds (Sampling)
## Chain 1:                11.5887 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL '963545120edd00c0205945b7aa94c932' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0.000139 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 1.39 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 8.57548 seconds (Warm-up)
## Chain 2:                2.7963 seconds (Sampling)
## Chain 2:                11.3718 seconds (Total)
## Chain 2:

And the model results…

summary(hybrid.rf.model)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Survival_Rate ~ cenGDP + AveGDP + Year + (1 | Location) 
##    Data: survival.df (Number of observations: 624) 
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 2000
## 
## Group-Level Effects: 
## ~Location (Number of levels: 39) 
##               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)     0.02      0.00     0.02     0.03        213 1.00
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept     0.03      0.21    -0.39     0.45        336 1.01
## cenGDP        0.14      0.01     0.11     0.17       1103 1.00
## AveGDP        0.04      0.02     0.00     0.08        339 1.01
## Year2002     -0.00      0.00    -0.01     0.01       1000 1.00
## Year2003     -0.01      0.00    -0.02     0.00        902 1.00
## Year2004     -0.01      0.00    -0.02    -0.01        691 1.00
## Year2005      0.01      0.00     0.00     0.02        663 1.00
## Year2006      0.01      0.00    -0.00     0.02        704 1.00
## Year2007      0.01      0.00     0.00     0.02        708 1.00
## Year2008      0.02      0.00     0.01     0.02        747 1.00
## Year2009      0.01      0.00    -0.00     0.01        715 1.00
## Year2010      0.00      0.00    -0.01     0.01        734 1.00
## Year2011     -0.02      0.00    -0.03    -0.02        748 1.00
## Year2012     -0.03      0.00    -0.04    -0.03        689 1.00
## Year2013     -0.04      0.00    -0.05    -0.03        731 1.00
## Year2014     -0.05      0.00    -0.05    -0.04        747 1.00
## Year2015     -0.01      0.00    -0.02    -0.00        632 1.00
## Year2016      0.01      0.00     0.00     0.02        632 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.02      0.00     0.02     0.02       1356 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

The posterior point estimate for cenGDP is the within-effect—the expected change in survival rate for an average MSA as its GDP changes over time.

The posterior point estimate for AveGDP is the between-effect—the expected difference in survival rate between MSAs with different average levels of GDP.

It looks like the two estimates are quite different, with the within-effect substantially larger than the between-effect. More on that in a minute.
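We can put a number on that difference with brms’s hypothesis() function, which evaluates a contrast directly on the posterior draws:

# Posterior evidence that the within-effect exceeds the between-effect
hypothesis(hybrid.rf.model, "cenGDP - AveGDP > 0")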

Let’s visualize the posterior distributions…

plot_model(hybrid.rf.model) + 
  theme_bw()

Random intercept/random slope

We can also include a random slope parameter on cenGDP…

hybrid.rr.model <- brm(Survival_Rate ~ cenGDP + AveGDP + Year + (cenGDP | Location), 
                       data = survival.df, chains = 2, prior = my.prior)
## 
## SAMPLING FOR MODEL '47d60e8c0eaab0839b41942fab811d86' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000201 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.01 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 13.3858 seconds (Warm-up)
## Chain 1:                6.51235 seconds (Sampling)
## Chain 1:                19.8981 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL '47d60e8c0eaab0839b41942fab811d86' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0.000117 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 1.17 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 15.8473 seconds (Warm-up)
## Chain 2:                6.30078 seconds (Sampling)
## Chain 2:                22.148 seconds (Total)
## Chain 2:

And the results of our random slope on cenGDP…

summary(hybrid.rr.model)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Survival_Rate ~ cenGDP + AveGDP + Year + (cenGDP | Location) 
##    Data: survival.df (Number of observations: 624) 
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 2000
## 
## Group-Level Effects: 
## ~Location (Number of levels: 39) 
##                       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)             0.02      0.00     0.02     0.03        347 1.01
## sd(cenGDP)                0.14      0.02     0.10     0.19        889 1.01
## cor(Intercept,cenGDP)    -0.08      0.19    -0.44     0.29        790 1.00
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept     0.06      0.22    -0.37     0.50        403 1.00
## cenGDP        0.17      0.03     0.11     0.23        784 1.00
## AveGDP        0.04      0.02    -0.00     0.08        406 1.00
## Year2002     -0.00      0.00    -0.01     0.01       1157 1.00
## Year2003     -0.01      0.00    -0.02    -0.00       1160 1.00
## Year2004     -0.01      0.00    -0.02    -0.01        885 1.00
## Year2005      0.01      0.00     0.00     0.02        842 1.00
## Year2006      0.01      0.00    -0.00     0.01        916 1.00
## Year2007      0.01      0.00     0.00     0.02        696 1.00
## Year2008      0.01      0.00     0.01     0.02        786 1.00
## Year2009      0.01      0.00    -0.00     0.01       1000 1.00
## Year2010      0.00      0.00    -0.01     0.01        941 1.00
## Year2011     -0.02      0.00    -0.03    -0.02        852 1.00
## Year2012     -0.03      0.00    -0.04    -0.03        999 1.00
## Year2013     -0.04      0.00    -0.05    -0.03        901 1.00
## Year2014     -0.05      0.00    -0.05    -0.04        757 1.00
## Year2015     -0.01      0.00    -0.02    -0.01        868 1.00
## Year2016      0.01      0.00     0.00     0.02        849 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.02      0.00     0.02     0.02       2212 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Model selection

Let’s talk about model selection… What we are trying to do is evaluate which model fits our data better.

Fit, by the way, is a subjective evaluation. If you want to get a group of statisticians to argue about something other than p-values, “fit” will be it.

The idea is pretty straightforward…which of my possible models better explains the variation I observe in my data?

The answer, though, is anything but straightforward, and it has to do with the fact that two wrongs do not make a right.

LOO

In the Bayesian world, an increasingly popular way of evaluating model fit is with (approximate) Leave-One-Out (LOO) cross-validation, and that’s what we are going to focus on here.

With LOO, you are evaluating how well the model predicts out-of-sample data. It is a way to determine the predictive validity of the model: each observation is left out in turn, the model is (approximately) refit on the remaining \(n-1\) observations, and the held-out observation is scored against the resulting posterior predictive distribution.

We can use LOO to evaluate the predictive efficacy of a single model, or to make informed comparisons between multiple models.

We specified two hybrid models:

  • A model with a fixed slope for the effect of GDP on survival for each MSA
  • A model with a random slope for the effect of GDP on survival for each MSA

If you remember right, though, the random slope model suggested that the varying slopes did not add much in the way of additional predictive value.

summary(hybrid.rr.model)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Survival_Rate ~ cenGDP + AveGDP + Year + (cenGDP | Location) 
##    Data: survival.df (Number of observations: 624) 
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 2000
## 
## Group-Level Effects: 
## ~Location (Number of levels: 39) 
##                       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)             0.02      0.00     0.02     0.03        347 1.01
## sd(cenGDP)                0.14      0.02     0.10     0.19        889 1.01
## cor(Intercept,cenGDP)    -0.08      0.19    -0.44     0.29        790 1.00
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept     0.06      0.22    -0.37     0.50        403 1.00
## cenGDP        0.17      0.03     0.11     0.23        784 1.00
## AveGDP        0.04      0.02    -0.00     0.08        406 1.00
## Year2002     -0.00      0.00    -0.01     0.01       1157 1.00
## Year2003     -0.01      0.00    -0.02    -0.00       1160 1.00
## Year2004     -0.01      0.00    -0.02    -0.01        885 1.00
## Year2005      0.01      0.00     0.00     0.02        842 1.00
## Year2006      0.01      0.00    -0.00     0.01        916 1.00
## Year2007      0.01      0.00     0.00     0.02        696 1.00
## Year2008      0.01      0.00     0.01     0.02        786 1.00
## Year2009      0.01      0.00    -0.00     0.01       1000 1.00
## Year2010      0.00      0.00    -0.01     0.01        941 1.00
## Year2011     -0.02      0.00    -0.03    -0.02        852 1.00
## Year2012     -0.03      0.00    -0.04    -0.03        999 1.00
## Year2013     -0.04      0.00    -0.05    -0.03        901 1.00
## Year2014     -0.05      0.00    -0.05    -0.04        757 1.00
## Year2015     -0.01      0.00    -0.02    -0.01        868 1.00
## Year2016      0.01      0.00     0.00     0.02        849 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.02      0.00     0.02     0.02       2212 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Another way to look at it is to compare the ICC values between the two models…

icc(hybrid.rf.model)
## 
## # Random Effect Variances and ICC
## 
## Family: gaussian (identity)
## 
## ## Location
##           ICC: 0.64  HDI 89%: [0.55 0.73]
## Between-group: 0.00  HDI 89%: [0.00 0.00]
## 
## ## Residuals
## Within-group: 0.00  HDI 89%: [0.00 0.00]
icc(hybrid.rr.model)
## 
## # Random Effect Variances and ICC
## 
## Family: gaussian (identity)
## 
## ## Location
##           ICC: 0.69  HDI 89%: [0.60 0.77]
## Between-group: 0.00  HDI 89%: [0.00 0.00]
## 
## ## Residuals
## Within-group: 0.00  HDI 89%: [0.00 0.00]
## 
## ## Random-slope-variance
## Location: 0.02  HDI 89%: [0.01 0.03]

A better way is to use LOO to make a distributional comparison…

loo(hybrid.rf.model, hybrid.rr.model)
##                                      LOOIC    SE
## hybrid.rf.model                   -3210.09 47.24
## hybrid.rr.model                   -3309.33 42.06
## hybrid.rf.model - hybrid.rr.model    99.24 31.77

LOOIC is on the deviance scale (like AIC), so lower values indicate better expected out-of-sample predictive accuracy.

The key is the LOOIC difference between the models relative to its standard error; a difference of at least 2 standard errors is a common guideline. In our comparison, the difference is over three standard errors, which on purely predictive grounds favors the random intercept/random slope model. Given, though, that the MSA-level slope posteriors almost all crossed zero, I’m going to stay with the simpler random intercept/fixed slope hybrid model.
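Note that a convenient alternative, in newer versions of loo and brms, is loo_compare(), which computes each model’s LOO once and reports the difference in elpd (expected log predictive density; higher is better); a sketch:

loo.rf <- loo(hybrid.rf.model)
loo.rr <- loo(hybrid.rr.model)

# elpd_diff is 0 for the best-scoring model and negative for the others
loo_compare(loo.rf, loo.rr)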

So assuming this is the model we want to stick with, we can also use LOO to examine some model diagnostics…

loo(hybrid.rf.model)
## 
## Computed from 2000 by 624 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   1605.0 23.6
## p_loo        43.1  4.2
## looic     -3210.1 47.2
## ------
## Monte Carlo SE of elpd_loo is 0.2.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     623   99.8%   232       
##  (0.5, 0.7]   (ok)         1    0.2%   660       
##    (0.7, 1]   (bad)        0    0.0%   <NA>      
##    (1, Inf)   (very bad)   0    0.0%   <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.

We are looking for observations with Pareto k diagnostic values over 0.7. Conceptually similar to outliers, these are observations that can skew the posterior predictive distributions.
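If any observations did cross that threshold, the loo package has helpers to pin them down; a sketch using pareto_k_ids() and pareto_k_values():

loo.rf <- loo(hybrid.rf.model)

# Which observations (if any) have Pareto k above 0.7? (none here)
pareto_k_ids(loo.rf, threshold = 0.7)

# Inspect the full distribution of k values directly
summary(pareto_k_values(loo.rf))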

We can look at these graphically by calling a plot function…

plot(loo(hybrid.rf.model))

So it looks like our random-intercept/fixed-slope model is pretty robust. There are other posterior checks, but I’m comfortable with this one.

And the interpretation of this multilevel model is substantially easier than that of our frequentist model…

summary(hybrid.rf.model)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Survival_Rate ~ cenGDP + AveGDP + Year + (1 | Location) 
##    Data: survival.df (Number of observations: 624) 
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 2000
## 
## Group-Level Effects: 
## ~Location (Number of levels: 39) 
##               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)     0.02      0.00     0.02     0.03        213 1.00
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept     0.03      0.21    -0.39     0.45        336 1.01
## cenGDP        0.14      0.01     0.11     0.17       1103 1.00
## AveGDP        0.04      0.02     0.00     0.08        339 1.01
## Year2002     -0.00      0.00    -0.01     0.01       1000 1.00
## Year2003     -0.01      0.00    -0.02     0.00        902 1.00
## Year2004     -0.01      0.00    -0.02    -0.01        691 1.00
## Year2005      0.01      0.00     0.00     0.02        663 1.00
## Year2006      0.01      0.00    -0.00     0.02        704 1.00
## Year2007      0.01      0.00     0.00     0.02        708 1.00
## Year2008      0.02      0.00     0.01     0.02        747 1.00
## Year2009      0.01      0.00    -0.00     0.01        715 1.00
## Year2010      0.00      0.00    -0.01     0.01        734 1.00
## Year2011     -0.02      0.00    -0.03    -0.02        748 1.00
## Year2012     -0.03      0.00    -0.04    -0.03        689 1.00
## Year2013     -0.04      0.00    -0.05    -0.03        731 1.00
## Year2014     -0.05      0.00    -0.05    -0.04        747 1.00
## Year2015     -0.01      0.00    -0.02    -0.00        632 1.00
## Year2016      0.01      0.00     0.00     0.02        632 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.02      0.00     0.02     0.02       1356 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).
plot_model(hybrid.rf.model)

plot_model(hybrid.rf.model, type = "re")

library(gridExtra)

grid.arrange(plot_model(hybrid.rf.model, type = "pred", terms = c("cenGDP")), 
             plot_model(hybrid.rf.model, type = "pred", terms = c("AveGDP")), ncol = 2)




Wrap-up

Like my last post, this example just scratches the surface on Bayesian multilevel modeling. The approach, including the hybrid method shown above, has become my go-to for multilevel models.