Recently, one of my PhD students complained that while presenting her poster, the scientific relevance of her modeling work was questioned aggressively by an experimentalist. So even at the end of the second decade of this millennium, theoreticians still have to justify the relevance of their work towards understanding the brain.Footnote 1 In this editorial I want to demonstrate that, unfortunately, such high standards are not always applied to experimental work, in particular in mice.

A first example is the recent news that two major Phase II trials for Alzheimer drugs have been canceled.Footnote 2 These trials of humanized anti-amyloid-β monoclonal antibodies were based on the convergence of two sets of data: genetic risk for Alzheimer disease in humans, indicating the importance of amyloid metabolism, and extensive studies in transgenic mice.Footnote 3 Many studies have shown that transgenic mice expressing gene mutations associated with human familial Alzheimer disease progressively develop brain amyloid plaques and memory deficits.Footnote 4 Immunization against amyloid-β peptide rapidly reversed memory defects in some transgenic models 3,Footnote 5 leading to the subsequent clinical trials. In hindsight, is the failure of such treatments in patients surprising? There were already papers suggesting that the antibodies did not work in all mouse models of Alzheimer disease; notably, these papers were published in lower impact journals.Footnote 6 But I want to argue that, in general, mouse models are not very predictive of human disease.

This conjecture is based on a preceding sequence of failures with drugs derived from mouse and other animal studies: those used for the treatment of septic shock. Of the 69 clinical studies performed in 1982–2013 that were analyzed in a review,Footnote 7 only 8 resulted in some benefits and 4 actually harmed the patients; all others showed no effect. All of these studies used compounds that were beneficial in mice, baboons or rabbits. A simple reason for these differences may be that humans are much more sensitive to bacterial lipopolysaccharide than mice or baboons, but this has not stopped the use of these animals in drug testing. At least there were no fatal incidents in Phase I trials of septic shock drugs, as was recently the case for a new compound that was supposed to work both for Parkinson’s disease and chronic pain.Footnote 8

A rigorous reason why murine models are not predictive for septic shock was provided by a genomic study showing that changes in gene expression in response to inflammatory stress in mouse models have zero correlation with the corresponding gene expression changes in humans.Footnote 9 This study was widely advertised in the popular and scientific press and generated many reactions. Unfortunately, it did not lead to a fundamental change in our approach to murine models of human diseases and, in particular, there has been little interest in applying the lessons learned from septic shock to other categories of disease models.

Some will argue that there are no alternatives to mice, though for some diseases human organoidsFootnote 10 may soon become the primary model system. But we should not underestimate sheer inertia, the availability of easy-to-get grants for translational research and the vested interests of the industry and university centers that support murine research: all of these factors converge to ensure a bright future for the mouse model irrespective of its variable usefulness.

Returning to neuroscience, the relevance of murine models of psychiatric disease should especially be questioned. Although these mouse models are often named for the psychiatric syndrome, e.g. schizophrenia or autism, in reality they are models of a single endophenotype.Footnote 11 Endophenotypes are discrete behavioral traits that in combination form the whole syndrome, but of course there is no guarantee that the disease can really be decomposed in such a way. To compensate for this, it is now established practice to study several transgenic lines simultaneously, based on the somewhat naive expectation that the relevance of an observed effect correlates with the number of transgenic lines in which it is observed. Resolving the mechanistic cause of human psychiatric diseases, which for most syndromes remains a mystery, should really be a more pressing challenge than investigating mouse endophenotypes in detail.

If mice have limited use in studying human disease, are they at least useful in understanding brain function? The technical revolution caused by optogenetic methods and imaging of genetically expressed calcium dyes has led to a rapid shift from rats to mice as the preferred experimental animal.Footnote 12 But again, a mouse is not a human and therefore it is not the best animal to study every interesting neuroscience topic. An example is visual cortex, which is the main target of the Allen Institute’s Project MindScope.Footnote 13 It is well known that mice have low visual acuityFootnote 14 and use olfaction and whisking as their main sensory input. Although it was recently shown that mice do use vision in specific behaviors,Footnote 15 they are - like rats - not binocular animalsFootnote 16 and - lacking pinwheelsFootnote 17 - their visual cortex is organized quite differently from primate visual cortex. Given these differences, studying mouse visual cortex is comparative neuroscience, likely as relevant to understanding human vision as studying the Drosophila visual system.

Finally, even when relevant brain functions are studied in mice, the standards used to design the behavioral component are much lower than what is common in human imaging studies.Footnote 18 A basic - though inaccurateFootnote 19 - neuroimaging technique is to subtract images obtained during a control condition (e.g. pushing a button or seeing an image) from images acquired during the condition of interest (e.g. pushing a specific button for a specific image). But even this is absent in mouse behavioral design. In many papers reporting on in vivo murine experiments, all neural activity is implicitly assumed to be caused by the cognitive task; usually there is no attempt to decompose the behavior and attribute activity to specific subcomponents.
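For readers unfamiliar with the subtraction approach mentioned above, its logic can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not a real analysis pipeline: the function names (`mean_image`, `subtraction_contrast`) and the 2×2 "images" are hypothetical, and real neuroimaging analyses add statistics, registration and noise modeling that are omitted here.

```python
def mean_image(trials):
    """Average a list of equally sized 2-D images (lists of rows), pixel by pixel."""
    n = len(trials)
    rows, cols = len(trials[0]), len(trials[0][0])
    return [[sum(t[r][c] for t in trials) / n for c in range(cols)]
            for r in range(rows)]


def subtraction_contrast(task_trials, control_trials):
    """Mean task image minus mean control image (hypothetical helper)."""
    task = mean_image(task_trials)
    control = mean_image(control_trials)
    return [[t - c for t, c in zip(task_row, control_row)]
            for task_row, control_row in zip(task, control)]


# Toy data (invented for illustration): two trials per condition, 2x2 pixels.
# Only the bottom-right pixel responds more during the condition of interest.
control_trials = [[[1.0, 2.0], [3.0, 4.0]],
                  [[1.0, 2.0], [3.0, 4.0]]]
task_trials = [[[1.0, 2.0], [3.0, 6.0]],
               [[1.0, 2.0], [3.0, 8.0]]]

contrast = subtraction_contrast(task_trials, control_trials)
print(contrast)  # [[0.0, 0.0], [0.0, 3.0]]
```

Activity shared by both conditions cancels out, so only the pixel that differs between them survives in the contrast. Without such a control condition, all recorded activity would be attributed to the task.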

An example of how this lack of sophistication in study design and analysis leads to confusing results can be found in a recent study of the cortico-cerebellar loop.Footnote 20 At one level this study is ground-breaking because it demonstrates that activity in cerebellar nuclei is required for motor planning, specifically using sensory discrimination to plan a future directional licking movement. The mice have to use their whiskers to locate a pole relative to their fixed head, wait, and then lick either left or right. It was known that such a task requires persistent activity in the frontal cortex during the waiting period - akin to working memoryFootnote 21 - and the authors showed recently that the thalamus is required for this persistent activity.21,Footnote 22 It is through the thalamus that the cerebellum interacts with frontal cortex. Up to this point, this summary of the study20 describes an interesting mouse experiment that is consistent with the increasing evidence for a cognitive function of the cerebellum.Footnote 23

However, the surprise of the study is that specifically the fastigial nucleus is required for this task. At first sight this is completely unexpected, because the fastigial nucleus is the phylogenetically oldest cerebellar nucleus and is highly conserved in mammalian evolution.Footnote 24 Conversely, the dentate nucleus is known to project extensively to non-motor areas in cortex,Footnote 25 is greatly expanded in humansFootnote 26 and is generally assumed to be the structure involved in cognitive tasks of the cerebellum. The authors do not discuss why the fastigial nucleus is activated in their task and, because it was so unexpected, several colleagues in the cerebellar field do not believe the results reported in that study.20 However, if one decomposes the behavior into its different components, the result becomes much easier to explain. Discriminating the position of a structure close to the head of the animal to decide about a movement is a task for which activation of the fastigial nucleus is not surprising, because one of its main functions is axial and proximal motor control.23 In other words, it is probably the sensory component of the task, not the short-term memory component, that causes the fastigial nucleus to be involved, and changes to the sensory component may cause other cerebellar nuclei to become necessary for the frontal cortex activation, as suggested by an unpublished study.Footnote 27 But this message is not conveyed by the paper.

In this editorial I discussed only a few examples of the lack of introspection and quality control that unfortunately affects much neuroscience research, and I focused on the experimental side. Obviously more examples can be found, also in computational neuroscience,Footnote 28 and many of the challenges are not specific to neuroscience. As mentioned, these problems are exacerbated by the inertia of established scientific organizations and the groupthink that guide many of the choices scientists make. To combat this, one has to look both inward and outward. Question yourself: are you trying to solve big questions about the healthy or diseased brain, or just collecting more data that will not contribute much to better understanding or effective treatment? Look beyond the boundaries of your field - I did so extensively in this editorial - and use this knowledge to improve your scientific planning. To return to the initial point, the experimentalist who intimidated my student could instead have tried to understand what modeling can contribute to his science.