As a field, we seem to be gravitating toward a modus operandi where model complexity equates to making a theoretical contribution. The notion seems to be that the more variables we have, the more hypotheses we test in a single paper, and the more mediators and moderators we include, the greater the ‘contribution’ of the paper. Models end up with causal paths pointing in each cardinal direction, all under the assumption that including ‘more stuff’ equates to a richer understanding.

I think this is a bad trend, for three reasons.

1). Model misspecification: The more variables, hypotheses, and complexity in the model, the more likely it is that the entire model will be not just wrong, but really, really bad;

2). Perverse incentives: Tying a theoretical contribution to model complexity invites HARKing and p-hacking; and

3). Barriers to replication: The more a model depends on a particular set of variables from a particular dataset, each with its own complex construction, the harder it is for another researcher to replicate the study.
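To put rough numbers on the first reason, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that every path in a model has the same probability of being correctly specified and that the paths are independent of one another. The 0.9 figure and the function name `prob_all_paths_ok` are made up for the example, not estimates from any study. If anything, independence is probably a generous assumption, because in a simultaneously estimated model a problem in one path can spill over into the others.

```python
def prob_all_paths_ok(p_single: float, n_paths: int) -> float:
    """Probability that every path is correctly specified.

    ASSUMPTION (illustrative only): each path is correct with the same
    probability p_single, independently of the other paths.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_paths < 1:
        raise ValueError("n_paths must be at least 1")
    return p_single ** n_paths


if __name__ == "__main__":
    # Hypothetical 90% chance of getting any single path right.
    for k in (1, 5, 10, 20):
        print(f"{k:>2} paths: {prob_all_paths_ok(0.9, k):.3f}")
```

Under these assumptions, a single-path model is right 90% of the time, while a 20-path model gets every path right only about 12% of the time.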

So what’s the answer? Start with a simple model…

- A single x, predicting a single y;
- Minimal measurement error for both variables;
- A large, appropriately powered sample;
- Appropriate steps to eliminate alternate explanations for the relationship between x and y (ideally by manipulating x); and
- A reproducible codebook to ensure others can follow along with what you did.

Seriously, that’s it. Now, it’s actually really hard to do steps 2, 3, and 4. These are, however, critical to yielding an unbiased estimate of the effect of x on y. Noisy measures in noisy data with small *true* effect sizes are far more likely to yield unpredictable (and, among the results that reach statistical significance, usually inflated) estimates. Even a well-developed measure, with measurement error kept to a minimum, needs a large dataset to tease out meaningful insights. Too often we see large datasets, but with measures of such convoluted construction that it is hard to understand just what the researcher did to build the measure, let alone have confidence that the observed effect is not simply an artifact of the measurement model. That makes the contribution trivial at best. By the same token, well-done measurement models tested in small, noisy samples result in a similar interpretational problem; it’s too difficult to separate the signal from the noise.
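The point about noisy measures and small samples can be checked with a quick simulation. The sketch below is an illustration under assumptions of my choosing, not an analysis of real data: a true slope of 0.1, a measure of x with reliability 0.6, classical (random) measurement error, and a two-sided cutoff of |t| > 1.96. The function name `exaggeration_ratio` and all parameter values are hypothetical. It simulates many studies, keeps only the statistically significant ones, and compares their average estimate to the true effect, which is sometimes called a Type M (magnitude) error.

```python
import numpy as np


def exaggeration_ratio(n, true_slope=0.1, reliability=0.6, n_sims=2000, seed=0):
    """Simulate many studies of one x -> y relationship.

    Returns (share of studies with |t| > 1.96,
             mean |estimated slope| among those studies / true slope).
    All parameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Noise SD chosen so that var(true x) / var(observed x) == reliability.
    noise_sd = np.sqrt((1.0 - reliability) / reliability)
    significant = []
    for _ in range(n_sims):
        x_true = rng.standard_normal(n)
        y = true_slope * x_true + rng.standard_normal(n)
        x_obs = x_true + noise_sd * rng.standard_normal(n)  # noisy measure of x

        # Simple OLS slope of y on the observed x, with its standard error.
        xc = x_obs - x_obs.mean()
        yc = y - y.mean()
        slope = (xc @ yc) / (xc @ xc)
        resid = yc - slope * xc
        se = np.sqrt((resid @ resid) / (n - 2) / (xc @ xc))

        if abs(slope / se) > 1.96:
            significant.append(abs(slope))

    share = len(significant) / n_sims
    ratio = float(np.mean(significant)) / true_slope if significant else float("nan")
    return share, ratio


if __name__ == "__main__":
    for n in (50, 5000):
        share, ratio = exaggeration_ratio(n)
        print(f"n={n}: significant in {share:.0%} of studies, "
              f"mean significant estimate = {ratio:.2f} x true effect")
```

With these settings, the small-sample studies that clear the significance bar overstate the true effect by a wide margin, because only unusually large estimates can reach significance when the standard error is that big. The large-sample studies are almost always significant, but their estimates sit below the true effect because the measurement error attenuates the slope. Neither result is the number you actually wanted, which is why measurement and sample size have to be handled together.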

Step 4, dealing with endogeneity, is a topic near and dear to my heart. Here’s my specific problem: it’s so challenging to isolate a consistent effect size estimate for ONE focal relationship. The more hypotheses and variables added to the model, assuming the researcher tests them simultaneously, the harder it becomes to recover consistent effect sizes, and the difficulty compounds with each addition; you are far more likely to screw the **entire** model up.
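To see why even one focal relationship is hard to pin down, consider a minimal simulation of a single unobserved confounder. This is a sketch with made-up parameters (a true effect of 0.2 and a confounder loading of 0.5 on both x and y), and the names `ols_slope` and `compare_designs` are hypothetical. It contrasts an observational design, where x is partly driven by the confounder, with an experimental design, where x is randomly assigned.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def compare_designs(n=100_000, true_effect=0.2, confounding=0.5, seed=1):
    """Return (observational slope, experimental slope).

    ASSUMPTIONS (illustrative only): one unobserved confounder u that
    affects both x and y with the same loading, and linear effects.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)  # unobserved confounder

    # Observational data: x is partly caused by u, so x and the error
    # term in y are correlated (the endogeneity problem).
    x_obs = confounding * u + rng.standard_normal(n)
    y_obs = true_effect * x_obs + confounding * u + rng.standard_normal(n)

    # Experimental data: x is randomly assigned, so it is independent of u.
    x_exp = rng.standard_normal(n)
    y_exp = true_effect * x_exp + confounding * u + rng.standard_normal(n)

    return ols_slope(x_obs, y_obs), ols_slope(x_exp, y_exp)


if __name__ == "__main__":
    observational, experimental = compare_designs()
    print(f"observational slope: {observational:.3f}")
    print(f"experimental slope:  {experimental:.3f}")
```

With these parameters the observational slope settles near 0.4, roughly double the true effect of 0.2, while the randomized design recovers the true effect. That is one confounder on one path; every additional path in a model is another place for the same problem to show up.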

Of course, sharing your code, and ideally your data, is pretty easy. But it’s just not something commonly done in management and entrepreneurship research. I hope that is changing, and for me, all of my papers now include posted codebooks and data. There is just no good reason not to.

I think one solution is for journals to encourage more single-hypothesis papers. Take an interesting question, say estimating the probability that a failed entrepreneur will start another new venture, and evaluate that question with 2-3 independent studies, with consistent measures, in large representative samples, and ideally with the same instruments used to address the endogeneity problem. As an incentive, journals could offer expedited review of these studies, provided that the researchers share their data and code.

The bottom line is that headline-grabbing effect sizes with sexy variables in complicated models are, over the long run, far more likely to be found wanting than vindicated as *possibly* right. Science progresses with small, incremental contributions to our knowledge base. Start with a simple model, test it rigorously, and better our management *science*.