A Brief Explanation of Four Often Misunderstood Scientific Concepts.
Lawyers litigate life, which in many circumstances means litigating with scientific concepts. Science holds the promise of objectivity and a search for truth. But people are imperfect, and our understanding of science even more so (the debacle during the COVID pandemic is only the most recent, and certainly the most sweeping, example). Below are four commonly misunderstood scientific concepts expressed (hopefully) in easy-to-digest snippets: correlation v. causation, the p-value, confidence intervals, and effect size.
I. Correlation v. Causation: How Data Looks v. How Data Behaves.
While the difference between correlation and causation seems obvious, the two terms still cause confusion. Both describe relationships between variables. Correlation simply means that two variables move together in an observable pattern. Causation, by contrast, refers to a situation in which one variable directly produces a change in another: it requires evidence that changes in one variable lead to changes in the other variable, and that this relationship holds even after accounting for other factors.
Let’s take a graph that shows an increase in per capita cheese consumption proportional to the number of people who died by becoming tangled in their bedsheets (yes, this is real data). The two lines increase in near unison, aligning with each other “nearly perfectly” over several years. This is described as a “strong positive correlation” because the two data sets increase similarly over time and do so in an upward trajectory (hence the word positive). To the naked eye, the graph reflects a correlation between these two seemingly disparate variables. However, whether more cheese-eating leads to more bedsheet deaths (or vice versa) is entirely unclear; the fact that the two data sets progress so similarly over the years may be entirely random! CDC warnings to cheese-eaters to reduce dairy consumption for fear of dying in their bedsheets would be warranted only if there is, in fact, a causal relationship between the two things. In other words, correlation is not causation because (and this is the truly tricky part) correlation can be entirely random.
Correlation tells you how data looks; causation tells you how it behaves.
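For readers who want to see the arithmetic, here is a minimal sketch in Python (using NumPy) of how a “strong positive correlation” is computed; the yearly figures below are invented for illustration and are not the actual consumption or mortality data behind the famous chart.

```python
import numpy as np

# Invented yearly figures, for illustration only.
cheese_lbs_per_capita = np.array([29.8, 30.1, 30.5, 30.6, 31.3, 31.7, 32.6, 33.1, 32.7, 32.8])
bedsheet_deaths = np.array([327, 456, 509, 497, 596, 573, 661, 741, 809, 717])

# Pearson correlation coefficient: +1 means the two series rise in lockstep,
# 0 means no linear relationship, -1 means they move in opposite directions.
r = np.corrcoef(cheese_lbs_per_capita, bedsheet_deaths)[0, 1]
print(f"correlation coefficient: {r:.2f}")  # close to +1: a "strong positive correlation"
```

A coefficient near +1 only says the two series rise together; nothing in the calculation speaks to why they do.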
II. What Is the p-value? Giving Statistical Tools Undue Grandiosity.
The p-value is the darling of scientific interpretation, presumably because it offers a “bright-line” rule, usually set at 0.05: a p-value under 0.05 is “proof” and a p-value over 0.05 is “not proof.” But this is completely wrong.
Assume someone is looking at a data set. The p-value is the probability of seeing that particular data, in that form (or with more extreme values), if there is, in fact, no relationship between the variables at play. Using the example above, the p-value would tell us the probability of the graph looking the way it does (or showing an even stronger correlation) if cheese-eating and bedsheet deaths were not, in fact, causing each other. Which brings us to the concepts of the null and alternative hypotheses. In the above scenario:
· Null hypothesis: there is no causal relationship between cheese-eating and dying in bedsheets.
· Alternative hypothesis: the more someone eats cheese, the more likely they are to die by being tangled in their bedsheets.
A very small p-value indicates only that the null hypothesis can be rejected: the data would be unlikely if there truly were no relationship between the two variables. However, a very small p-value does not prove the alternative hypothesis; it does not prove that eating cheese leads to people dying by being tangled in their sheets. Even more importantly, a small p-value does nothing to indicate the strength of the relationship between dairy and linen-induced death.
Said in plain terms: if the data set above resulted in a small p-value, we would know only that the null hypothesis can be rejected, that is, that the data would be very unlikely if cheese-eating and bedsheet deaths were truly unrelated. No more, no less.
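To make the mechanics concrete, here is a minimal sketch, assuming the same invented figures as above, showing where a p-value comes from: SciPy’s pearsonr reports, alongside the correlation coefficient, the probability of seeing a correlation at least that strong if there were no linear relationship between the two variables.

```python
import numpy as np
from scipy import stats

# Same invented yearly series as the correlation sketch above.
cheese = np.array([29.8, 30.1, 30.5, 30.6, 31.3, 31.7, 32.6, 33.1, 32.7, 32.8])
deaths = np.array([327, 456, 509, 497, 596, 573, 661, 741, 809, 717])

# pearsonr returns the correlation coefficient and a p-value testing the null
# hypothesis that the two variables have no linear relationship.
r, p_value = stats.pearsonr(cheese, deaths)
print(f"r = {r:.2f}, p = {p_value:.4f}")
# A small p-value only says the observed correlation would be surprising if the
# null hypothesis were true. It does not prove the alternative hypothesis, and
# it says nothing about how strong any real effect is.
```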
III. Confidence Intervals: How Confident and Of What?
Another often-misunderstood term is “confidence interval.” Definitions found online can be confusing, so here is a plain (and hopefully more helpful) explanation. Assume the goal is to find the mean amount of Compound X in defective pills. The researcher tasked with this cannot measure the amount of Compound X in every single defective pill. Instead, they would obtain a sample of pills (the sample needs to be large enough, but that is a whole other topic). Based on that sample, they can calculate a confidence interval at a specified level of confidence, typically 90% or 95%. Assume the researcher calculates a 95% confidence interval of 0.01 to 0.03 grams. In plain terms, we can be 95% confident that the true mean amount of Compound X per pill falls between 0.01 and 0.03 grams; more precisely, if the sampling were repeated many times, about 95% of the intervals calculated this way would capture the true mean.
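Here is a minimal sketch of the calculation, assuming a hypothetical sample of 30 defective pills (the measurements are randomly generated for illustration) and using a standard Student’s t interval for the mean:

```python
import numpy as np
from scipy import stats

# Hypothetical measurements (in grams) of Compound X from a sample of 30 defective pills.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.02, scale=0.005, size=30)

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean

# 95% confidence interval for the mean, using the Student's t distribution
# with n - 1 degrees of freedom.
low, high = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=sem)
print(f"95% confidence interval: {low:.3f} g to {high:.3f} g")
# Interpretation: over many repeated samples, about 95% of intervals built this
# way would contain the true mean amount of Compound X per defective pill.
```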
IV. Effect Size: The Underutilized Powerhouse.
Effect size is a powerful, but underutilized, tool (the p-value having unfairly taken up the spotlight). For comparison, a comprehensive search of case law across all jurisdictions returns 37 opinions using the term “effect size” versus 215 opinions using the term “p-value.” Effect size actually answers the question “how strong is the relationship between two variables?” And while there are several ways to calculate effect size, each of them measures the strength of the relationship between two variables in a population, or provides a sample-based estimate of that strength.
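To see why effect size deserves more attention, consider a sketch in which a very large sample produces a tiny (“statistically significant”) p-value even though the underlying difference is trivial. Cohen’s d, one common effect-size measure, exposes how small the effect actually is. The groups and numbers below are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

# Illustration: with a large enough sample, even a trivial difference between
# two groups produces a tiny ("statistically significant") p-value.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=100.0, scale=15.0, size=50_000)
group_b = rng.normal(loc=100.5, scale=15.0, size=50_000)

t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Cohen's d: difference in means divided by the pooled standard deviation.
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = (group_b.mean() - group_a.mean()) / pooled_sd

print(f"p-value: {p_value:.2e}")     # tiny -- "statistically significant"
print(f"Cohen's d: {cohens_d:.2f}")  # roughly 0.03 -- a negligible effect
```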
An attorney involved in a matter requiring an expert opinion would be well served to ask about effect size in conjunction with the p-value, and would be very well served by knowing how to undermine undue reliance on, and misuse of, the p-value.