In their recent Perspective article (Voelkl et al. Reproducibility of animal research in light of biological variation. Nat. Rev. Neurosci. 21, 384–393 (2020))1, Voelkl et al. recommend the use of systematic heterogenization in animal studies. The rationale for this recommendation lies in the current best practice of rigorously standardizing (that is, homogenizing) conditions within experiments. This practice has been repeatedly criticized for limiting experiments’ inference to the specific experimental conditions used and hence for causing, rather than curing, poor reproducibility in animal experiments2,3. Instead, it has been suggested to embrace biological variation and to use it actively as a tool for making study populations more representative and the results more meaningful and reproducible (for example, see refs4,5). We greatly appreciate the recommendations of Voelkl et al. and agree on the importance of a paradigm shift in animal research. However, we would like to highlight some points that, from our perspective, deserve more attention.

It is true that “direct evidence for the standardization fallacy is currently limited to simulations across replicate studies and only a few dedicated experimental studies”1. However, we think it is even more critical to highlight the lack of studies going beyond the standardization fallacy, proving the benefits of systematic heterogenization. Hypothesis-driven comparisons of standardized and heterogenized designs are needed, demonstrating, for example, that the latter lead to better reproducibility of treatment effects. Until now, only three empirical studies have adopted such an approach. A single-laboratory study showed that heterogenizing mouse study populations across two environmental factors improved the reproducibility of behavioural data3. However, in a multi-laboratory situation, the same approach did not yield similarly promising results6. A third, ecological study investigated different strategies in microcosm experiments, showing that genetic heterogenization lowered between-laboratory variation7.

Another point we wish to highlight is the lack of practical solutions for how to heterogenize a study population in an effective and feasible way. Voelkl et al. wrote: “heterogenization may be based on controlled variation, for instance by systematically varying the genotype […], the state and history of the individual […], or the test condition […]. Alternatively, heterogenization may be based on uncontrolled variation, for example by using outbred study populations, by splitting experiments into multiple independent batches of animals or by conducting multilaboratory studies.” Indeed, two recent simulation studies hinted at better reproducibility in experiments that are heterogenized either across laboratories or across the “time of day at which an experiment is conducted”8,9. However, approaches involving heterogenization across housing conditions and age classes did not lead to the desired improvements6. Thus, it is probably far more difficult to address the practical issues than has been suggested. Moreover, we argue that the concept of heterogenization relies on the introduction of systematic and hence controlled variation. Introducing uncontrolled variation instead (for example, by using outbred strains) might bear the risk of inflating sample sizes, as it is hard to control for this variation in the experimental design or the statistical analysis10. We therefore strongly recommend the use of heterogenization factors that can be systematically varied and act as a kind of ‘umbrella factor’, covering plenty of known and unknown background variables at the same time (for example, the experimenter11,12).

Taken together, although the acceptance of the heterogenization concept has greatly increased over the past decade (for example, see ref.13), it is supported by only a few empirical studies. Proving the concept empirically and not just theoretically, however, appears particularly important to overcome existing lab traditions and to “establish systematic heterogenization of study populations as a new standard”1. We therefore appeal to the scientific community not to remain at the conceptual level but, instead, to explore and validate novel strategies for putting this concept into practice.

There is a reply to this letter by Würbel, H. et al. Nat. Rev. Neurosci. https://doi.org/10.1038/s41583-020-0370-7 (2020)