Moderation: Curvilinear effects // Endogeneity

ENT5587B - Research Design & Theory Testing II

Brian S. Anderson, Ph.D.

Assistant Professor

Department of Global Entrepreneurship & Innovation

andersonbri@umkc.edu

© 2017 Brian S. Anderson

- How goes it?
- IRB update
- Papers due!
- Curvilinear effects
- Dealing with endogeneity
- Lab this afternoon

Curves are deceptively fun to explore, but very difficult to model and interpret properly.

Our hypothesis today is that there is a ‘sweet spot’ in the relationship between a firm’s strategic planning activities and its exhibition of innovative strategic behaviors.

The logic is that some planning gives the firm confidence in pursuing new product/market innovations. Too much planning, however, depresses those behaviors, because the firm becomes ‘locked in’ to its current product mix and avoids the new.

So what kind of relationship am I talking about here?

As you read about today, there are a lot of special considerations associated with modeling curvilinear relationships.

Before we dive into the data though, we’re going to talk about theory construction around curvilinear relationships.

How to write a hypothesis in 30 minutes.

Yes, I’m serious.

We’re going to use a framework called Action -> Result -> Impact.

It’s a way of thinking about a hypothesis through a focused, causal lens.

First, there **must** be an action; something **has** to happen.

When the something happens, it has a result (if it didn’t have a result, then it’s simply a null effect).

The big question though is what is the impact? Is it a big impact? A small impact? Is the impact a linear function? Does the impact always occur in the same way and of the same magnitude, and so forth…

Remember, the hypothesis is really nothing more than a research question. In our case, it’s wondering whether there is a ‘sweet spot’ in the level of a firm’s strategic planning and its strategic innovation activities.

But the research question is just the starting point. It’s really nothing more than slightly informed navel gazing.

To make it science, we need to put some theory behind it.

Yes, I am a big believer in theory. Journalism, not so much. Good theory, which to me means falsifiable predictions of the nomological relationship between two or more phenomena, is invaluable.

We just don’t have much of what I’d call *good theory* using the above criteria, but that’s for another class.

Oh, we’re going to use this framework for a curvilinear relationship, but it works just as well for a linear model, LDV model, etc.

Oh again…

While curvilinear relationships can take many forms, we’re going to focus specifically on U (or inverse U) shaped relationships.

The reason is that I think *most* other curvilinear forms are p-hacked or just appear because of noise in the data. I can certainly be wrong, but that’s just my opinion.

So in a U-shaped relationship (I’m going to focus on an inverse U actually, but the logic applies to a regular U in, well, the inverse way), it’s easiest, I think, to split the theoretical development into two pieces.

On one side, you have a positive linear relationship between x and y until x reaches some point. At that turning point the relationship switches, and then you have a negative linear relationship between x and y.

Ok, let’s talk Action -> Result -> Impact.

First step for our hypothesis. **Why** does strategic planning increase innovativeness?

The trick for this is to time yourself, and to keep to the schedule.

Take five minutes and write down on the board all of the ways strategic planning can **change**. For example, it could be…

- The quality of the planning
- The type of planning
- Who was involved in the planning
- The output of the planning
- …

Related note: what’s so big about needing change?

Ok, put a new column for ‘Result’ to the side of your ‘Action’ list of potential changes.

Now, in five minutes, consider what the likely result of this action will be on innovativeness in terms of a simple dichotomy: it increases, or it decreases. Really, this is as simple as making a + or a -. It’s harder than it might seem though!

Oh, no, you can’t have it both ways, at least right now. Why would we put this constraint in by the way?

Now, pick one action and one result that you feel is likely to have the greatest potential impact.

Yes, this is subjective, but think about impact in terms of effect size (or probability). This action has this result, which leads to the biggest change in innovativeness.

Now take five minutes and create three bullet points for why the impact on y is greatest because of the result of x doing whatever action it did.

The why is the theory, but it doesn’t have to be complicated. This is an exercise in logical reasoning and deduction, and simple is generally better than complicated.

Ok, we’re about fifteen minutes into this. Now it’s time to write.

Take five minutes and write out one paragraph (try for 4-5 sentences) about what change is occurring in strategic planning. You are just expanding on what action strategic planning exhibited.

Next five is describing the result. What happens to innovativeness when strategic planning changes? Shoot for 3-4 sentences describing why innovativeness changes in the way that it does (goes up, goes down, appears, disappears, etc.)

For the last five minutes, convert your impact bullets to sentences and expand on them, if you so desire.

There you go, three paragraphs for your hypothesis, which is about right for a focused paper.

Ok, let’s pause for a discussion…

In order for our little exercise to **not** be HARKing, what must come first in our research design process, and why?

Believe me, using this logic to define your measures, as opposed to the other way around, will save you time in the long run.

The scientific method is, after all, what we are shooting for!

Ok, so how does this change for a curve?

Your readings today offer some great insight, but I’d also like to point you to a working paper here. I’m drawn to this one because of the simplicity in the theoretical formulation.

In the interest of full disclosure, I’m really intrigued by Uri’s empirical approach, but it needs some additional fleshing out. Right now though, I think the logic for splitting the low/high condition of x is very valuable for hypothesis construction.

A simple way to think about it is to consider a case where strategic planning equals 0. As planning increases (assuming a continuous variable) until it reaches 1, what is the result on innovativeness?

Now, assume a case where strategic planning equals 1. As planning increases further until it reaches 2, what is the result on innovativeness?

When you put those two independent relationships together, in the context of impact, we get our quasi-U shaped form.

Remember though, we’re looking at a non-linear effect, so we need to account for change in the linear form as well!

Ok, let’s move on to running some models.

First up is our data…

```
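library(tidyverse)

# Read the course dataset, keep the columns from Employees through
# RNDIntensity, keep rows with SGR above -50, and drop rows with missing values
my.ds <- read_csv("http://a.web.umkc.edu/andersonbri/ENT5587.csv")
my.df <- as.data.frame(my.ds) %>%
  dplyr::select(Employees:RNDIntensity) %>%
  filter(SGR > -50) %>%
  na.omit()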
```

Remember, a curvilinear relationship is just an interaction of a variable with itself. We could do this by hand if we wanted to (`Forecast_squared <- Forecast^2`), or we can use an R shortcut.

```
curvilinear.uncenteredmodel <- lm(Innovativeness ~ Forecast +
                                    I(Forecast^2), data = my.df)
```
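As a quick check that the shortcut and the by-hand approach are the same model, here is a minimal sketch on simulated data. The names `sim.df`, `x`, `y`, and `x_squared` are made up for illustration and are not in the course dataset.

```r
# Simulated data for illustration only; not the course dataset
set.seed(5587)
sim.df <- data.frame(x = runif(200, min = 1, max = 7))
sim.df$y <- 1.5 + 1.2 * sim.df$x - 0.14 * sim.df$x^2 + rnorm(200)

# By hand: create the squared term as its own column
sim.df$x_squared <- sim.df$x^2
byhand.model <- lm(y ~ x + x_squared, data = sim.df)

# Shortcut: I() makes the formula treat ^ as arithmetic.
# Without I(), x^2 is read as formula crossing and collapses to just x.
shortcut.model <- lm(y ~ x + I(x^2), data = sim.df)

# Same estimates either way (only the coefficient labels differ)
all.equal(unname(coef(byhand.model)), unname(coef(shortcut.model)))
```

The practical reason to prefer `I()` is that `predict()` can rebuild the squared term from a new `x` on its own, so you don't have to remember to recreate the extra column.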

`summary(curvilinear.uncenteredmodel)`

```
##
## Call:
## lm(formula = Innovativeness ~ Forecast + I(Forecast^2), data = my.df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.9242 -0.9531  0.0510  0.8134  3.9468
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)    1.64654    0.68772   2.394  0.01841 *
## Forecast       1.21115    0.39041   3.102  0.00246 **
## I(Forecast^2) -0.13779    0.05141  -2.680  0.00853 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.398 on 106 degrees of freedom
## Multiple R-squared: 0.1017, Adjusted R-squared: 0.08471
## F-statistic: 5.998 on 2 and 106 DF, p-value: 0.003407
```

Let’s run through these coefficients…
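One thing worth pulling out of these coefficients is where the fitted curve peaks. For a quadratic, the slope at any x is b1 + 2 * b2 * x, which equals zero at x = -b1 / (2 * b2). Here is a small sketch using the estimates printed in the summary above; the object names (`b1`, `b2`, `slope.at`, `turning.point`) are mine.

```r
# Estimates copied from the uncentered model summary
b1 <- 1.21115    # Forecast
b2 <- -0.13779   # I(Forecast^2)

# Slope of the fitted curve at a given value of x: b1 + 2 * b2 * x
slope.at <- function(x) b1 + 2 * b2 * x

# The curve peaks where the slope is zero
turning.point <- -b1 / (2 * b2)
round(turning.point, 2)   # about 4.39 on the raw Forecast scale

# Positive slope below the peak, negative slope above it
slope.at(2)
slope.at(6)
```

Note that this is a point estimate only; it says nothing yet about how precisely the peak is located.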

Now let’s talk about centering.

There is actually some debate over whether you should center in a curvilinear model. My personal opinion is that centering is beneficial because it places zero within the range of observed values of the predictor, and hence provides marginal benefit in interpreting the coefficient estimates.

So, let’s center…

`my.df$Forecast.center <- (my.df$Forecast - mean(my.df$Forecast))`

Now let’s specify our model again…

```
curvilinear.model <- lm(Innovativeness ~ Forecast.center +
                          I(Forecast.center^2), data = my.df)
```

And the results…

`summary(curvilinear.model)`

```
##
## Call:
## lm(formula = Innovativeness ~ Forecast.center + I(Forecast.center^2),
##     data = my.df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.9242 -0.9531  0.0510  0.8134  3.9468
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)           4.28230    0.18065  23.705  < 2e-16 ***
## Forecast.center       0.11894    0.09122   1.304  0.19509
## I(Forecast.center^2) -0.13779    0.05141  -2.680  0.00853 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.398 on 106 degrees of freedom
## Multiple R-squared: 0.1017, Adjusted R-squared: 0.08471
## F-statistic: 5.998 on 2 and 106 DF, p-value: 0.003407
```

Once again, does centering have anything to do with multicollinearity???
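If you want to convince yourself, here is a minimal sketch on simulated data (all names are hypothetical, and the data are not the course dataset). Centering changes the correlation between the linear and squared terms, but it leaves the fit and the squared-term estimate alone.

```r
# Simulated data for illustration only; not the course dataset
set.seed(5587)
x <- runif(200, min = 1, max = 7)
y <- 1.5 + 1.2 * x - 0.14 * x^2 + rnorm(200)
x.center <- x - mean(x)

raw.model      <- lm(y ~ x + I(x^2))
centered.model <- lm(y ~ x.center + I(x.center^2))

# Centering lowers the correlation between the linear and squared terms...
cor(x, x^2)
cor(x.center, x.center^2)

# ...but the fitted values are unchanged
all.equal(fitted(raw.model), fitted(centered.model))

# ...and so are the squared-term estimate and its standard error
raw.sq      <- summary(raw.model)$coefficients[3, 1:2]
centered.sq <- summary(centered.model)$coefficients[3, 1:2]
all.equal(unname(raw.sq), unname(centered.sq))
```

This is the same pattern as in the two summaries above: the squared term, residual standard error, and R-squared are identical, and only the intercept and linear term move.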

```
library(ggplot2)

# Grid of centered Forecast values spanning the observed range
myRange.df <- data.frame(Forecast.center = seq(from = range(my.df$Forecast.center)[1],
                                               to = range(my.df$Forecast.center)[2],
                                               by = .1))

# Predicted values and standard errors along the grid
myPredicted.model <- predict(curvilinear.model,
                             newdata = myRange.df, se.fit = TRUE)

# Approximate 95% confidence band around the fitted curve
myRange.df$lower.ci <- myPredicted.model$fit - 1.96 * myPredicted.model$se.fit
myRange.df$fit <- myPredicted.model$fit
myRange.df$upper.ci <- myPredicted.model$fit + 1.96 * myPredicted.model$se.fit

# Fitted curve with confidence ribbon, overlaid on the observed data
ggplot(myRange.df, aes(x = Forecast.center, y = fit)) +
  theme_minimal() +
  geom_line() +
  geom_ribbon(aes(ymin = lower.ci, ymax = upper.ci), alpha = .2) +
  geom_point(data = my.df, aes(x = Forecast.center, y = Innovativeness)) +
  labs(title = "Curvilinear Impact of Increasing Planning\nOn Firm Innovativeness",
       x = "Strategic Planning -- Forecasting",
       y = "Predicted Innovativeness")
```

So we see a general non-monotonic function, in this case, an inverse-U shape, between forecasting and innovativeness. Our plotted turning point is about 0.5 on Forecasting’s 1-7 **centered** Likert-type scale.

In the thinking about U paper that you read, the authors discuss a number of valid tests that you can perform to evaluate the presence of a U-shape.

Now, I’m fine with all of these. But I think that, assuming you have centered your x, a visualization of the relationship (with a scatter plot overlay) and a calculation of the marginal effect across the range of the values of x get you to the same place. Why?

Ok, so let’s get the marginal effect visualization, and we’ll make use of `interplot` again.

```
library(interplot)
interplot(m = curvilinear.model, var1 = "Forecast.center", var2 = "Forecast.center") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  theme_minimal() +
  xlab("Strategic Planning -- Forecasting") +
  ylab("Estimated Effect of Planning\nOn Firm Innovativeness") +
  ggtitle("Marginal Effect of Strategic Planning\nOn Firm Innovativeness")
```
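The quantity being plotted can also be worked out by hand, which helps with interpretation. This is a minimal sketch on simulated data, assuming a quadratic `lm()` fit. The names are hypothetical, and the interval uses the delta method, which is not necessarily how `interplot` builds its interval.

```r
# Simulated data for illustration only; not the course dataset
set.seed(5587)
sim.df <- data.frame(x = runif(200, min = -3, max = 3))
sim.df$y <- 4.3 + 0.12 * sim.df$x - 0.14 * sim.df$x^2 + rnorm(200)
sim.model <- lm(y ~ x + I(x^2), data = sim.df)

b <- coef(sim.model)   # (Intercept), x, I(x^2)
V <- vcov(sim.model)   # coefficient covariance matrix

# Marginal effect of x at each grid value: dy/dx = b1 + 2 * b2 * x
x.grid <- seq(from = -3, to = 3, by = 0.5)
effect <- b[2] + 2 * b[3] * x.grid

# Delta-method standard error of (b1 + 2 * b2 * x)
effect.se <- sqrt(V[2, 2] + 4 * x.grid^2 * V[3, 3] + 4 * x.grid * V[2, 3])

marginal.df <- data.frame(x = x.grid,
                          effect = effect,
                          lower.ci = effect - 1.96 * effect.se,
                          upper.ci = effect + 1.96 * effect.se)
marginal.df
```

With a centered predictor, the row at x = 0 is just the linear coefficient, and the point where the effect crosses zero is the turning point of the curve.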

Ok, how do we interpret this picture, and what does it mean for our visualization of the curvilinear effect?

Now the bigger question, do we find support for our hypothesis that there is a ‘sweet spot’ in the relationship between strategic planning and firm innovativeness?

Now the harder question… how do we offer theory around a curvilinear relationship without first observing a curvilinear relationship?

So how do we address our dilemma?

Now here is the even harder part. This is observational data, from the same respondent, with a non-random survey response, and a mean-scale scored latent variable.

What problem am I getting at?

Learn it, love it, live it.

Embrace the counterfactual and deal with endogeneity.

We’re going to start by looking at just a linear relationship between forecasting and innovativeness to show how to employ 2SLS in R.

Then we’ll deal with the forbidden regression.

We’ve talked a lot about endogeneity and its sources, so we’re just going to review how to evaluate the individual and joint reliability of instruments, evaluate the exclusion restriction, and do the Hausman test in R.

We’re going to make use of the `AER` package.

Spoiler alert…

In this dataset, there aren’t any valid instruments for our focal relationship. That’s pretty common.

So let’s talk about how to find instruments…

Ok, let’s estimate our 2SLS model.

```
library(AER)
instrument.model <- ivreg(Innovativeness ~ Forecast | AdPlan + LRP,
                          data = my.df)
summary(instrument.model, vcov = sandwich, diagnostics = TRUE)
```

```
##
## Call:
## ivreg(formula = Innovativeness ~ Forecast | AdPlan + LRP, data = my.df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.3669 -0.9072  0.1474  1.0928  4.4721
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   1.5997     0.6782   2.359 0.020151 *
## Forecast      0.5948     0.1623   3.666 0.000385 ***
##
## Diagnostic tests:
##                  df1 df2 statistic  p-value
## Weak instruments   2 106    30.442 3.58e-11 ***
## Wu-Hausman         1 106    16.692 8.57e-05 ***
## Sargan             1  NA     0.476     0.49
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.568 on 107 degrees of freedom
## Multiple R-Squared: -0.1411, Adjusted R-squared: -0.1517
## Wald test: 13.44 on 1 and 107 DF, p-value: 0.0003853
```

So just a quick review…

- The ‘Weak instruments’ statistic is the *F* test of the instruments in the first stage equation. What critical value are we looking for again?
- The ‘Sargan’ statistic is a test of the exclusion restriction under the null that the instruments are properly excluded from the second stage equation; i.e., the instruments are exogenous.
- The ‘Wu-Hausman’ statistic tests the consistency of the OLS estimates under the null that the endogenous regressor can be treated as exogenous.

So back to our output and let’s evaluate…

Why two instruments for our one endogenous regressor?

Don’t forget to get the individual significance for the instruments in the first stage equation.

How would you get these values?
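One way is to estimate the first stage yourself with `lm()`. The sketch below uses simulated data, where `z1` and `z2` are hypothetical stand-ins for the instruments and `x` for the endogenous regressor. With the course data the analogous call would presumably be `lm(Forecast ~ AdPlan + LRP, data = my.df)`.

```r
# Simulated data for illustration only; not the course dataset
set.seed(5587)
n  <- 500
z1 <- rnorm(n)                             # instrument 1
z2 <- rnorm(n)                             # instrument 2
u  <- rnorm(n)                             # unobserved confounder
x  <- 0.8 * z1 + 0.8 * z2 + u + rnorm(n)   # endogenous regressor
y  <- 1 + 0.5 * x + u + rnorm(n)

# First stage: regress the endogenous regressor on the instruments
first.stage <- lm(x ~ z1 + z2)

# Individual t tests for each instrument
summary(first.stage)$coefficients

# Joint F test of the instruments (null: both coefficients are zero)
joint.test <- anova(lm(x ~ 1), first.stage)
joint.F <- joint.test$F[2]
joint.F
```

Keep in mind this joint *F* uses the conventional covariance matrix, so it will not exactly match a ‘Weak instruments’ line that was computed with `vcov = sandwich`.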

Now on to the forbidden regression.

The basic logic is straightforward. If you have an endogenous regressor, any additional regressor that is a function of that endogenous regressor (its square, or its product with another variable) will also be endogenous.

The problem is, well, pretty bad.

Unfortunately, solving the problem can be rather complicated. In the case of our curvilinear model (which is an interaction effect), one solution that **may** work is to simply use the squares of the instruments along with specifying the linear and curvilinear term as endogenous.

```
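# Squared versions of the endogenous regressor and of each instrument
my.df$SqForecast <- my.df$Forecast^2
my.df$SqAdPlan <- my.df$AdPlan^2
my.df$SqLRP <- my.df$LRP^2

# Treat both Forecast and its square as endogenous, instrumented by
# the original instruments and their squares
forbidden.model <- ivreg(Innovativeness ~ Forecast + SqForecast |
                           AdPlan + SqAdPlan + LRP + SqLRP,
                         data = my.df)
summary(forbidden.model, vcov = sandwich, diagnostics = TRUE)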
```

Note that the same rules apply for evaluating the validity of the instruments!
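To see the mechanics, here is a rough sketch on simulated data with hypothetical names, using only base R. It contrasts giving each endogenous term its own first stage with the tempting shortcut of squaring the fitted values from a single first stage. It shows the point estimates only and is not a replacement for `ivreg()`.

```r
# Simulated data for illustration only; not the course dataset
set.seed(5587)
n  <- 1000
z1 <- rnorm(n); z2 <- rnorm(n)             # instruments
u  <- rnorm(n)                             # unobserved confounder
x  <- 0.8 * z1 + 0.8 * z2 + u + rnorm(n)   # endogenous regressor
y  <- 1 + 0.5 * x - 0.2 * x^2 + u + rnorm(n)
sim.df <- data.frame(y, x, z1, z2, x.sq = x^2, z1.sq = z1^2, z2.sq = z2^2)

# Separate first stage for EACH endogenous term, using all instruments
stage1.x   <- lm(x    ~ z1 + z1.sq + z2 + z2.sq, data = sim.df)
stage1.xsq <- lm(x.sq ~ z1 + z1.sq + z2 + z2.sq, data = sim.df)
sim.df$x.hat   <- fitted(stage1.x)
sim.df$xsq.hat <- fitted(stage1.xsq)

# Second stage point estimates (standard errors from this lm() are NOT valid)
manual.2sls <- lm(y ~ x.hat + xsq.hat, data = sim.df)

# The shortcut to avoid: squaring the fitted values from one first stage
forbidden <- lm(y ~ x.hat + I(x.hat^2), data = sim.df)

rbind(manual = coef(manual.2sls), forbidden = coef(forbidden))
```

The manual two-step version reproduces the 2SLS point estimates, but its standard errors are wrong because the second stage treats the fitted values as data. That is one more reason to let `ivreg()` do the work.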

So what about continuous by continuous interaction effects with endogenous regressors?

Wrap-up.

Lab This Afternoon – 1:00PM Moderation paper critique

Seminar 6 March – Likely time change because of AACSB visit