Causal Language

by Brian Anderson

New research suggests that causal language in a paper leads readers to interpret the results as causal, even when the design is only associational. This is concerning for entrepreneurship research, which often relies on observational data and research designs without strong causal inference, yet allows authors to make arguably causal claims when developing their hypotheses.

So here’s the question… If we accept this statement, or a similar one, in a limitations section: “The current study employs cross-sectional data, and so precludes drawing causal claims”, should we also require authors to reword their hypotheses to avoid implying causal claims?

Consider this (commonly worded) hypothesis…

Increasing Research and Development intensity increases firm value.

In the development of this hypothesis, the study would have offered logic, citations to prior research, and general argumentation for why a firm that increases its R&D spend as a percentage of sales (R&D Intensity) experiences a higher valuation.

The research design, though, uses a simple random effects panel model (maybe with a lagged value of R&D intensity) to test the hypothesis. The design, data, and method are decidedly not causal, and perhaps the author made a disclosure such as the one above in the limitations section.


Side note…

It’s worth noting that a time lag between the predictor and the criterion does not establish a causal relationship either. True, temporal separation between \(x\) and \(y\) is a necessary condition to establish that \(x\) indeed causes \(y\). It is not, however, sufficient. A non-spurious relationship and the elimination of alternative explanations are co-equal conditions for drawing a causal inference. It’s also easy to argue that introducing a time lag may induce additional sample variation, or even open a new category of unobserved confounds, and so is as likely to weaken a causal claim as to support it.
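To make the point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration (the function name, the sample size, the persistence value, and the data-generating process itself); none of it comes from a real study. A persistent unobserved firm trait drives both lagged \(x\) and current \(y\), while the true effect of \(x\) on \(y\) is set to zero.

```python
import numpy as np

def simulate_lagged_confounding(n_firms=5000, persistence=0.9, seed=0):
    """Hypothetical data: lagged x has NO causal effect on y by construction."""
    rng = np.random.default_rng(seed)

    # Unobserved firm trait (e.g., managerial quality) that persists over time.
    u_prev = rng.normal(size=n_firms)
    u_curr = persistence * u_prev + np.sqrt(1 - persistence**2) * rng.normal(size=n_firms)

    # x is measured at t-1 and depends on the trait at t-1.
    x_lag = u_prev + rng.normal(size=n_firms)

    # y is measured at t and depends only on the trait at t, not on x.
    y = u_curr + rng.normal(size=n_firms)

    # Naive lagged regression of y on x_lag (slope is the first coefficient).
    naive_slope = np.polyfit(x_lag, y, 1)[0]

    # Same regression, but also adjusting for the (normally unobserved) trait.
    design = np.column_stack([np.ones(n_firms), x_lag, u_prev])
    adjusted_slope = np.linalg.lstsq(design, y, rcond=None)[0][1]

    return naive_slope, adjusted_slope

naive, adjusted = simulate_lagged_confounding()
print(f"naive lagged slope: {naive:.2f}")
print(f"slope after adjusting for the confounder: {adjusted:.2f}")
```

Under these assumptions the naive lagged slope is clearly positive even though \(x\) does nothing to \(y\), and it falls to roughly zero once the confounder is included. In real data the confounder is usually unobserved, which is exactly the problem.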


Despite the author’s honest disclosure, the implicit, or arguably explicit, assumption is that increasing R&D intensity causes firm value to increase. A reasonable reader can easily, but incorrectly, infer a causal relationship when the data and the results support no such conclusion.

By allowing authors to imply causality, but then disclaim away causal inference from associational data, we allow for an incomplete, or perhaps incorrect, nomological understanding. Getting this wrong carries implications for managers, entrepreneurs, and policy makers. If we’re going to continue accelerating the impact of entrepreneurship research on policy and practice, it’s in our best interest to be clear in our theoretical development and to be transparent in our conclusions.

In the end, I think the way forward is to encourage more precise hypotheses, starting with using associational verbiage if the research design does not allow for strong causal inference. For example, a study evaluating the effect of entrepreneurial orientation on sales growth rate, using observational data and an instrumental variable (2SLS) estimator, might read something like this:

We expect increasing levels of entrepreneurial orientation (EO) to be related to a higher rate of sales growth, such that a one standard deviation increase in EO is associated with a 5-10% increase in sales growth rate.

A trick here, though, is to be careful to avoid HARKing (hypothesizing after the results are known): it’s easy to ‘adjust’ the wording of the hypothesis once the results are in. An easy solution is to preregister the design (my go-to has become OSF), and use a version control platform for the code.

The net result is better science, better transparency, and a reduction in the tendency to infer a causal relationship from associational results.