My five guidelines for evaluating a study

by Brian Anderson

I didn’t fully understand the added workload that comes with being a field (associate) editor. Don’t get me wrong, I’m loving the position, but in writing decision letters, I find that I’m often making similar observations about a study’s reported empirics, so I thought I’d crystallize my five primary guidelines for evaluating a study. These are in no particular order, they are equally weighted, and the list doesn’t exhaust the other important methodological considerations.

Oh, one important point. Nowhere on this list does ‘make a theoretical contribution’ appear. That’s on purpose, and the logic is simple. You can’t make a contribution with faulty science. Get the empirics tight, and then we can move on to framing and argumentation. John Antonakis summed it up best—“Research that is not rigorous simply cannot be relevant.”

1) How big was the sample?

Statistical power is certainly an issue, and small samples may simply be underpowered to detect small effects. That’s not really my concern, though. The vast majority of published studies report statistically significant results, so failing to detect an effect isn’t the problem. The problem, as others have described much better, is that a small effect found in a small sample is often more likely to reflect noise than a true effect. What’s worse, when such a study does cross p < .05, the estimate is likely to be inflated: the risk is not only a false positive (a Type I error) but also an exaggerated magnitude (sometimes called a Type M, or magnitude, error). Studies with a low probability of replication are always problematic, but these are particularly bad because they make the estimated effect look quite large when, in reality, if the effect exists at all, it’s likely to be small. These studies just add noise to an already noisy literature.
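To make the inflation point concrete, here is a minimal Python sketch. It assumes a normally distributed outcome with a known standard deviation, a hypothetical true effect of 0.1 SD, and a two-sided z-test at p < .05. For each sample size, it estimates power and an “exaggeration ratio”: the average size of the significant estimates divided by the true effect. The function name `power_and_exaggeration` and every parameter value are illustrative assumptions, not drawn from any particular study.

```python
import math
import random
import statistics


def power_and_exaggeration(true_effect, n, sigma=1.0, sims=20_000, z_crit=1.96, seed=1):
    """Return (power, exaggeration ratio) for a two-sided z-test of a mean.

    Exaggeration ratio = average |estimate| among significant replications,
    divided by the true effect (values above 1 mean inflated estimates).
    """
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)  # standard error of the sample mean, sigma known
    significant = []
    for _ in range(sims):
        # With normal data and known sigma, the sample mean is exactly
        # Normal(true_effect, se), so we draw it directly rather than n points.
        estimate = rng.gauss(true_effect, se)
        if abs(estimate / se) > z_crit:  # "p < .05", two-sided
            significant.append(abs(estimate))
    power = len(significant) / sims
    if not significant:
        return power, float("nan")
    return power, statistics.mean(significant) / true_effect


# Hypothetical true effect of 0.1 SD, a "small" effect.
for n in (20, 200, 2000):
    power, ratio = power_and_exaggeration(0.1, n)
    print(f"n={n:5d}  power={power:.2f}  exaggeration={ratio:.1f}x")
```

At n = 20, only a handful of replications reach significance, and the ones that do overstate the true effect several times over. At n = 2000, nearly every replication is significant and the average estimate sits close to the truth.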

2) How noisy are the measures?

Speaking of noise, measurement error is a particularly pernicious beast. In entrepreneurship, and in management research in general, we use a lot of proxy variables and latent constructs to capture phenomena of interest. There is nothing wrong with this, so long as there is an adequate discussion of construct (proxy) validity. What I’m concerned about specifically is measurement error and its impact on structural parameters. Measurement error in a predictor is effectively a form of omitted variable bias (endogeneity): the error ends up in the disturbance term and is correlated with the observed predictor, which renders parameter estimates inconsistent. No matter how big you make the sample, the estimates will not converge to the true values. This is particularly concerning in mediation and moderation models. Mediation generally assumes the mediator is error free. In moderation, when x and m are uncorrelated, the reliability of the interaction term (xm) is the product of the lower-order reliabilities, so it will be lower than the reliability of either constituent term. So if the measures were noisy to begin with, the interaction term will be even noisier.
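Both points can be checked with a minimal Python simulation. It assumes classical measurement error (independent normal noise added to standard-normal true scores), x and m uncorrelated, a hypothetical true slope of 0.5, and hypothetical reliabilities of .70 for x and .80 for m. Reliability here means var(true)/var(observed). Under these assumptions, the OLS slope on the noisy x should settle near beta × reliability (.5 × .7 = .35) rather than .5, and the squared correlation between the true and observed product terms should land near .70 × .80 = .56. The function names are hypothetical.

```python
import math
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate_measurement_error(n=100_000, beta=0.5, rel_x=0.7, rel_m=0.8, seed=7):
    """Return (OLS slope on noisy x, reliability of the observed x*m term)."""
    rng = random.Random(seed)
    # True scores have variance 1; choosing error variance (1 - rel) / rel
    # makes var(true) / var(observed) equal the target reliability.
    sd_ex = math.sqrt((1 - rel_x) / rel_x)
    sd_em = math.sqrt((1 - rel_m) / rel_m)

    x_true = [rng.gauss(0, 1) for _ in range(n)]
    m_true = [rng.gauss(0, 1) for _ in range(n)]  # independent of x_true
    x_obs = [x + rng.gauss(0, sd_ex) for x in x_true]
    m_obs = [m + rng.gauss(0, sd_em) for m in m_true]
    y = [beta * x + rng.gauss(0, 1) for x in x_true]  # y depends on the TRUE x

    # OLS slope of y on the observed (error-laden) x.
    slope = cov(x_obs, y) / cov(x_obs, x_obs)

    # Reliability of the product term: squared correlation between the
    # true product and the observed product.
    xm_true = [x * m for x, m in zip(x_true, m_true)]
    xm_obs = [x * m for x, m in zip(x_obs, m_obs)]
    r = cov(xm_true, xm_obs) / math.sqrt(cov(xm_true, xm_true) * cov(xm_obs, xm_obs))
    return slope, r ** 2


for n in (10_000, 100_000):
    slope, rel_xm = simulate_measurement_error(n=n)
    print(f"n={n:7d}  slope={slope:.3f} (true 0.5)  reliability(xm)={rel_xm:.3f}")
```

Both sample sizes give a slope near .35. The extra data make the wrong answer more precise, not less wrong, which is what inconsistency means in practice.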

3) Was there a manipulation of x?

As Paul Holland noted, there is no causation without manipulation. Experimental designs, natural or otherwise, are not as common in entrepreneurship and strategy research as they need to be. Given that we deal largely with observational data, we can never make the claim that selection effects and other omitted variables are not materially influencing a given model. That means in any paper without a manipulation, the bar is high for the author(s) to demonstrate that endogeneity has been thoroughly addressed. 2SLS, regression discontinuity, blocking, and other related designs are all fine from my perspective assuming they are well done, but something must be there to show that the author(s) are recovering consistent parameter estimates.
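To illustrate why the bar is high without a manipulation, here is a minimal Python sketch of one of the designs mentioned above. It assumes a hypothetical data-generating process in which an unobserved confounder u (think selection) drives both x and y, and a hypothetical instrument z shifts x but affects y only through x. With these illustrative coefficients, OLS converges to roughly 0.68 instead of the true 0.3. The just-identified IV estimator cov(z, y)/cov(z, x), which is what 2SLS reduces to with one instrument and one endogenous regressor, recovers the true slope.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate_confounding(n=100_000, beta=0.3, seed=11):
    """Return (OLS slope, IV slope) when an unobserved u drives both x and y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical instrument
    u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder (e.g., selection)
    x = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    # z enters y only through x (the exclusion restriction holds by construction).
    y = [beta * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

    ols = cov(x, y) / cov(x, x)  # biased: x is correlated with u, which sits in y's error
    iv = cov(z, y) / cov(z, x)   # just-identified IV (equals 2SLS with one instrument)
    return ols, iv


ols, iv = simulate_confounding()
print(f"OLS={ols:.3f}  IV={iv:.3f}  (true slope 0.3)")
```

The IV estimate is only as good as the exclusion restriction. Here it holds by construction. In real data, it is exactly what the author(s) must defend.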

4) Does the researcher have skin in the game (confirmation bias)?

I’m not necessarily talking about the influence of grant or funding providers, etc. I’m talking about confirmation bias: the extent to which a researcher seeks out information that conforms to his/her worldview. In my own area of strategic entrepreneurship, there is a strong consensus (worldview) that entrepreneurial firms outperform conservatively managed firms. There’s quite a bit of evidence to support that claim, but the consensus also makes someone who holds it less willing to accept evidence that entrepreneurial firms don’t outperform non-entrepreneurial firms. I’ve got skin in the pro-entrepreneurship game, so I’m subject to confirmation bias. The bigger problem, as I see it, is that researchers with a strong normative bias toward a given conclusion are less likely to critically analyze their results and, more concerning, may be more likely to use researcher degrees of freedom to ensure a p < .05 result. To be clear, this isn’t about accusing an author of confirmation bias. Rather, it’s a conditional probability: the probability of exercising researcher degrees of freedom is higher for a researcher with skin in the game on the study’s topic.
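Why researcher degrees of freedom matter can be shown with a small simulation. This is a minimal Python sketch assuming a true null effect and a hypothetical researcher who tries k specifications (for example, alternative outcome measures) and declares support if any of them reaches p < .05. The specifications are assumed independent, which makes this closer to a worst case; correlated specifications inflate the rate less. The function name `false_positive_rate` is hypothetical.

```python
import random


def false_positive_rate(k_specs, sims=20_000, z_crit=1.96, seed=3):
    """Share of null studies 'supported' when any of k specifications hits p < .05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        # Under a true null, each specification's z statistic is standard normal.
        z_stats = [rng.gauss(0, 1) for _ in range(k_specs)]
        if any(abs(z) > z_crit for z in z_stats):
            hits += 1
    return hits / sims


for k in (1, 3, 5, 10):
    print(f"specifications tried={k:2d}  false positive rate={false_positive_rate(k):.3f}")
```

With one pre-specified test, the rate sits near the nominal 5%. With five independent tries, it approaches 1 − 0.95⁵, or about 23%, even though there is no effect to find.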

5) What is the margin of error for the prediction?

As a field, we don’t pay close enough attention to standard errors. I’m not in the camp that we need to show that a given effect is practically significant; I think that standard actually encourages researcher degrees of freedom. The better standard is simply to be honest about a consistently estimated effect size, which in entrepreneurship and management is likely to be small. Better to be accurate than to be practically important. That said, particularly with small samples and noisy measures, the resulting confidence intervals become even more important. We generally just dichotomize hypotheses in entrepreneurship (there is a positive effect of x on y), but effect sizes and standard errors matter a lot for replications and meta-analyses. So the margin of error around the estimate is particularly important to me: the wider the interval, the less useful the prediction is for science.
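To see how the margin of error depends on sample size, here is a minimal Python sketch. It computes an approximate 95% confidence interval for a Pearson correlation using the Fisher z-transformation. The observed correlation of .10 is a hypothetical value, chosen to reflect the small effects expected in the field, and `correlation_ci` is an illustrative helper name.

```python
import math


def correlation_ci(r, n, z_crit=1.96):
    """Approximate CI for Pearson's r via Fisher's z-transformation."""
    z = math.atanh(r)                     # Fisher z-transform of r
    se = 1 / math.sqrt(n - 3)             # standard error on the z scale
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)   # back-transform to the r scale


for n in (50, 100, 500, 1000):
    lo, hi = correlation_ci(0.10, n)
    print(f"n={n:5d}  95% CI=({lo:+.3f}, {hi:+.3f})  width={hi - lo:.3f}")
```

The same point estimate is uninformative at n = 100, where the interval spans zero. At n = 1,000 the interval is reasonably tight and excludes zero, which is the kind of estimate a replication or meta-analysis can actually use.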

One of the things I’ve found from my own methodological awakening is how helpful these criteria are for evaluating my own research. They’re a check, if you will, on my own biases and my excitement over promising early-stage results, and I’m going to be using them more in my decision letters and reviews.