Assistant Professor | Bloch School of Management

Plan of the day

  • Quiz review
  • Team deliverable review
  • Homework questions
  • Pealing back the veil of causality

Well done on the quiz!

  • Median: 10

  • Mean: 9.2

Establishing causal relationships is hard.

Tragically that doesn't stop people from trying.

The more sex you have, the longer you live.

Eating healthy helps too, although it isn’t as fun.

In your teams, brainstorm all of the (many) threats to causal inference that you see in drawing substantive conclusions about the relationship between longevity and sexual activity, and about longevity and diet.

The Italy study was what we call an observational study because there was no manipulation of the independent variables by the researcher. It is generally considered to be a weak design, because of the difficulty in eliminating \(z\)'s.

You also tend to have greater random and systematic error in data from observational studies.

The gold standard to maximize causal inference is the randomized controlled experiment.

In an experiment, we’re trying to isolate the causal effect of \(x\) on \(y\) by…

  • Directly manipulating \(x\)

  • Randomizing participants to the manipulated conditions to parcel out unobservable confounds

  • Controlling the environment to eliminate exogenous factors influencing the experiment

Spoiler alert…

Setting up and executing well done experiments is really, really, really hard, and can be expensive.

There are also ethical/practical limitations that may make an experiment unfeasible.

Returning to our Italy example…

I could take roughly 2,000 octogenarians (from my power analysis) in a longitudinal design (about 20 years). Then, I’d randomly assign them to one of four conditions, and record when they die.

Conditions… Sex (Yes) Sex (No)
Diet (Good) Lots of sex, and a healthy diet (Yeah!) No sex, and a healthy diet
Diet (Bad) Lots of sex, and a bad diet No sex, and a bad diet (Boo!)

Assuming I could get 2,000 80 year olds to agree to let me do this to them, I would express my null hypothesis as a difference in the age at death as a function of the four experimental conditions.

I can write my null this way, where \(\mu_n\) is the mean observed value for age at death for each of the four experimental conditions…

\(H_0:\mu_1=\mu_2=\mu_3=\mu_4\)

Ok, so I’m not likely to recruit enough 80 year olds, and the control group would likely be pretty upset.

We can though use the logic and basic design of a experiment to help us answer a wide variety of business problems.

Almost always, a weaker experimental design will be better than a strong observational design, so you should try to conduct an experiment first.

Alternatively, you can do your observational study first as a pilot, but then design an experiment to rigorously test.

Lets roll…

library(tidyverse)
library(readr)
dsom.ds <- read_csv("http://www.drbanderson.com/data/DSOM5522Fall2017Experiment.csv")
experiment.df <- as.data.frame(dsom.ds) 
head(experiment.df, 5)
##   Happiness Creativity EntManager TaskSpecificity RespAge RespExper
## 1         1          5          0               1      39       0.2
## 2         1          6          1               1      39       3.5
## 3         0          5          0               0      33       3.5
## 4         1          3          1               1      44       5.2
## 5         1          5          1               0      40       3.1
##   RespGender Obs
## 1          0   1
## 2          1   2
## 3          1   3
## 4          0   4
## 5          0   5

Getting a feel for the data…

experiment.df %>%
  group_by(Happiness, RespGender) %>%
  summarise (n = n()) %>%
  mutate(freq = round(n / nrow(experiment.df),2))
## # A tibble: 4 x 4
## # Groups:   Happiness [2]
##   Happiness RespGender     n  freq
##       <int>      <int> <int> <dbl>
## 1         0          0    86  0.13
## 2         0          1    44  0.07
## 3         1          0   304  0.47
## 4         1          1   209  0.33

Our multiple regression model for today…

\(Creativity=\alpha+\beta{EntManager}+\beta{TaskSpecificity}+\epsilon\)


  • Creativity: 1-7 ordinal scale indicating the creativity in completing a task

  • EntManager: Categorical (dichotomous) variable with 1 indicating whether the participant had the Entrepreneurial Manager = YES condition

  • TaskSpecicity: Categorical (dichotomous) variable with 1 indicating whether the participant had the Task Specificity = YES condition

Before we get into multiple regression analysis, lets just talk briefly about regression and ANOVA.

creativity.reg.model <- lm(Creativity ~ EntManager, data = experiment.df)
summary(creativity.reg.model)
## 
## Call:
## lm(formula = Creativity ~ EntManager, data = experiment.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7588 -0.7588  0.2412  0.3976  3.2412 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.75884    0.05986   62.79   <2e-16 ***
## EntManager   0.84357    0.08331   10.12   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.056 on 641 degrees of freedom
## Multiple R-squared:  0.1379, Adjusted R-squared:  0.1365 
## F-statistic: 102.5 on 1 and 641 DF,  p-value: < 2.2e-16

creativity.anova.model <- aov(Creativity ~ EntManager, data = experiment.df)
summary(creativity.anova.model)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## EntManager    1  114.3  114.27   102.5 <2e-16 ***
## Residuals   641  714.4    1.11                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(creativity.reg.model)
## Analysis of Variance Table
## 
## Response: Creativity
##             Df Sum Sq Mean Sq F value    Pr(>F)    
## EntManager   1 114.27 114.269  102.52 < 2.2e-16 ***
## Residuals  641 714.43   1.115                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA is a workhouse in psychology, medicine, and lots of other fields. It was designed really to estimate experimental effects.

Even though they approach it from different perspectives, ANOVA and regression are two sides of the same coin (the general linear model).

So we’re just going to focus on analyzing our experimental data with regression.

In multiple regression, we’re usually interested in one (or some combination) of the following…

  • Isolating the effect of \(x\) on \(y\) while controlling for possible \(z\)'s

  • Estimating the effect of two (or more) meaningful \(x\)'s on \(y\)

  • Determining which \(x\) is most important

  • Determining whether the effect of \(x_1\) and \(x_n\) are materially different from each other

Lets get started with testing our model:

experiment.df$EntManager <- factor(experiment.df$EntManager)
experiment.df$TaskSpecificity <- factor(experiment.df$TaskSpecificity)
creativity.model <- lm(Creativity ~ EntManager + TaskSpecificity, data = experiment.df)
summary(creativity.model)
## 
## Call:
## lm(formula = Creativity ~ EntManager + TaskSpecificity, data = experiment.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.77909 -0.77909  0.02542  0.60430  3.02542 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3.59119    0.06907  51.995  < 2e-16 ***
## EntManager1       0.80452    0.08243   9.760  < 2e-16 ***
## TaskSpecificity1  0.38339    0.08240   4.653 3.98e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.039 on 640 degrees of freedom
## Multiple R-squared:  0.1661, Adjusted R-squared:  0.1635 
## F-statistic: 63.74 on 2 and 640 DF,  p-value: < 2.2e-16

F-Statistic

summary(creativity.model)$fstatistic
##     value     numdf     dendf 
##  63.73649   2.00000 640.00000

What the \(F\) statistic tells us is that the best fitting line to the model has a non-zero slope. In multiple regression, it just tells us that at least one of the coefficient estimates does not equal zero, but it doesn't tell us which one.

Multiple R-Squared (\(R^2\))

summary(creativity.model)$r.squared
## [1] 0.1660944

Proportion of explained variance in \(y\) across the average unique effect of all \(x\)'s.

Adjusted R-Squared (\(R^2\))

summary(creativity.model)$adj.r.squared
## [1] 0.1634885

Downward adjustment of \(R^2\) to account for the number of \(x\)'s in the model. We need this because \(R^2\) will go up solely on the basis of having more predictors, regardless of whether they meaningfully increase the explained variance in \(y\).

So the big question though…

When doing a business task, does having an entrepreneurial manager while doing the task and having task specificity (having at least basic instructions for doing the task) improve creativity on the task?

Coefficient Estimate (\(\beta\))

summary(creativity.model)$coefficients
##                   Estimate Std. Error   t value      Pr(>|t|)
## (Intercept)      3.5911862 0.06906851 51.994551 6.033867e-232
## EntManager1      0.8045160 0.08243023  9.759963  4.519612e-21
## TaskSpecificity1 0.3833905 0.08240310  4.652622  3.984516e-06

Because both of our predictors are dichotomous, the coefficient estimate is the average expected difference in \(y\) between the two values of \(x\).

In our case, we would read the result of our analysis of the relationship between \(EntManager\) and \(Creativity\) as:

We expect a .805 increase in creativity in the presence of an enterpreneurial manager, holding constant the effect of task specificity on creativity.

Ok, so how do we make sense visually of a multiple regression model? Well, it's a little on the tricky side, because we've "adjusted" each coefficient for the relationship between all of the other coefficients in the model…

creativity.simple.model <- lm(Creativity ~ EntManager, data = experiment.df)
summary(creativity.simple.model)$coefficients
##              Estimate Std. Error  t value      Pr(>|t|)
## (Intercept) 3.7588424 0.05986473 62.78893 5.208848e-276
## EntManager1 0.8435672 0.08331198 10.12540  1.872638e-22
summary(creativity.model)$coefficients
##                   Estimate Std. Error   t value      Pr(>|t|)
## (Intercept)      3.5911862 0.06906851 51.994551 6.033867e-232
## EntManager1      0.8045160 0.08243023  9.759963  4.519612e-21
## TaskSpecificity1 0.3833905 0.08240310  4.652622  3.984516e-06

What we can do is visualize the \(\beta\) coefficients as point estimates, if you remember from last week. But here we're going to put all of them together on one plot, and compare them to the intercept.

This works well for models with categorical variables, because there isn't a meaningful line of best fit; the estimate is simply a shift up (or down, if a negative coefficient) from the baseline (or reference) group.

library(broom)
creativity.model.tidy <- tidy(creativity.model, conf.int = TRUE)
creativity.model.tidy
##               term  estimate  std.error statistic       p.value  conf.low
## 1      (Intercept) 3.5911862 0.06906851 51.994551 6.033867e-232 3.4555579
## 2      EntManager1 0.8045160 0.08243023  9.759963  4.519612e-21 0.6426496
## 3 TaskSpecificity1 0.3833905 0.08240310  4.652622  3.984516e-06 0.2215774
##   conf.high
## 1 3.7268144
## 2 0.9663824
## 3 0.5452036

Now lets create a new dataframe that puts the coefficient estimates on the \(y\) (Creativity) scale, so that we can better compare the shifts…

baseline = creativity.model.tidy[1,2]
creativity.model.coef <- creativity.model.tidy %>%
  dplyr::select(term, estimate, conf.low, conf.high) %>%
  mutate(term.number = 1:n()) %>%
  mutate(estimate = ifelse((term.number > 1), estimate + baseline, estimate),
         conf.low = ifelse((term.number > 1), conf.low + baseline, conf.low),
         conf.high = ifelse((term.number > 1), conf.high + baseline, conf.high))
creativity.model.coef
##               term estimate conf.low conf.high term.number
## 1      (Intercept) 3.591186 3.455558  3.726814           1
## 2      EntManager1 4.395702 4.233836  4.557569           2
## 3 TaskSpecificity1 3.974577 3.812764  4.136390           3

creativity.plot <- ggplot(creativity.model.coef, aes(term, estimate, colour = term)) + 
  geom_point() +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high)) +
  scale_x_discrete(labels=c("Baseline Creativity", 
                            "Creativity With \n Entrepreneurial Manager", 
                            "Creativity With \n Task Specificity")) + 
  labs(title = "How Task Creativity Increases with an Entrepreneurial Manager",
       y = "Creativity (1 - 7)",
       x = "") + 
  scale_colour_discrete(name="Experimental \n Condition",
                        labels=c("Intercept", "Entrepreneurial \n Manager", "Task \n Specificity"))
creativity.plot

Take another look at the plot. You should notice that the point estimate for Creativity is going up by about .8 for those respondents that had an Entrepreneurial Manager, and the point estimate for Creativity is going up by about .4 for those that had Task Specificity.

Next question…

When doing a business task, which matters more to Creativity on the task: having an Entrepreneurial Manager while doing the task or having Task Specificity?

summary(creativity.model)$coefficients
##                   Estimate Std. Error   t value      Pr(>|t|)
## (Intercept)      3.5911862 0.06906851 51.994551 6.033867e-232
## EntManager1      0.8045160 0.08243023  9.759963  4.519612e-21
## TaskSpecificity1 0.3833905 0.08240310  4.652622  3.984516e-06


Our coefficient estimates look like they are different from each other, but what we are really after is whether they are statistically different from each other.

\(H_0:\mu_{EntManager}=\mu_{TaskSpecificity}\)


To answer this question, we need to conduct a contrast

install.packages("multcomp")
library(multcomp)
creativity.contrast <- glht(creativity.model, linfct = c("EntManager1 - TaskSpecificity1 = 0"))
summary(creativity.contrast)
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: lm(formula = Creativity ~ EntManager + TaskSpecificity, data = experiment.df)
## 
## Linear Hypotheses:
##                                     Estimate Std. Error t value Pr(>|t|)
## EntManager1 - TaskSpecificity1 == 0   0.4211     0.1223   3.442 0.000615
##                                        
## EntManager1 - TaskSpecificity1 == 0 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Bonus points…

Why is \(H_0:\mu_{EntManager}=\mu_{TaskSpecificity}\) the same thing as \(H_0:\mu_{EntManager}-\mu_{TaskSpecificity}=0\)?

Key Takeaways…

  • The gold standard to establish causality is the randomized controlled experiment. Even if the design isn’t ideal, the data and analyses are USUALLY better than an observational-only design.

  • The best experiments are the simplest experiments; depth is better than breadth!

  • Anything you can do with an ANOVA model, you can do with a regression model!

TRAINING TIME OUT

Wrap-up

What's next?

  • Next Class: Designing (real world) experiments
  • Deliverables: Homework 7 Due 9:00AM Wednesday