Breeding beyond genomics
Journal of Animal Breeding and Genetics (IF 1.9), Pub Date: 2021-04-22, DOI: 10.1111/jbg.12547
Miguel Pérez-Enciso 1, 2

From a statistical point of view, I tend to think of Quantitative Genetics and related fields as domains where two main “pillars” cohabit: Inference and Prediction. Even though the same tool, for example penalized linear models, can be used for both tasks, and inference and prediction may reinforce each other, they are distinct concepts. It is interesting to observe how these two pillars have reacted to big data, that is, the “large p, small n” paradigm. While studies where the main target is inference have tried (unsuccessfully) to protect against false positives, prediction practitioners have embraced the new era with joy. Why is that so? Very simple: Prediction is falsifiable via cross‐validation, whereas inference validation is not that straightforward, and an increase in the number of variables easily leads to confounding. Most relevant distributional properties in inference validation depend on knowing the actual, “true” model. Both inference and prediction are, however, encountering serious problems.

First, consider “Inference”. For many years, inference in breeding involved a few parameters and two or a very few carefully chosen models, say with or without maternal effects. Today, the literature is flooded with reports of genome‐wide association study (GWAS) signals and studies on selective footprints. In a standard GWAS, that is, when markers are individually estimated without penalization, a major issue is controlling the false positive rate. Identifying selective sweeps is also tricky: numerous statistics coexist, each pointing to different genome regions. Further, significance is not well defined in this task. Do not get me wrong, I am responsible for some GWAS and a few selective sweep studies. Large‐scale GWAS in unrelated individuals from populations with a large effective size can be very useful. Understanding patterns of DNA variability is based on solid theory. In most livestock studies, though, one should take results with caution, as few signals have been replicated in independent studies.
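The false positive problem in single‐marker testing can be illustrated with a small simulation. The sketch below is not taken from the editorial: the sample size, marker count, allele frequency, and the function name `single_marker_pvalues` are all assumptions chosen for illustration, and p‐values use a normal approximation to the t statistic. The phenotype is pure noise, so every "significant" marker is a false positive by construction.

```python
import math
import numpy as np

def single_marker_pvalues(genotypes, phenotype):
    """Two-sided p-values from regressing the phenotype on each marker separately.

    Uses the correlation-based t statistic with a normal approximation,
    which is only reasonable when the number of individuals is large.
    """
    n = genotypes.shape[0]
    g = genotypes - genotypes.mean(axis=0)   # centre each marker
    y = phenotype - phenotype.mean()         # centre the phenotype
    r = (g.T @ y) / np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])

rng = np.random.default_rng(1)
n_individuals, n_markers = 500, 5000   # hypothetical sizes

# Hypothetical data: independent markers and a phenotype with no genetic signal
genotypes = rng.binomial(2, 0.3, size=(n_individuals, n_markers)).astype(float)
phenotype = rng.normal(size=n_individuals)

pvals = single_marker_pvalues(genotypes, phenotype)
alpha = 0.05
n_nominal = int((pvals < alpha).sum())                  # no multiple-testing correction
n_bonferroni = int((pvals < alpha / n_markers).sum())   # Bonferroni-corrected threshold
print(f"nominal hits: {n_nominal}, Bonferroni hits: {n_bonferroni}")
```

At the nominal threshold, about 5% of the null markers (roughly 250 of 5000 here) are expected to pass by chance alone, whereas the Bonferroni threshold removes nearly all of them. Real GWAS data add linkage disequilibrium and population structure, which this sketch deliberately ignores.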

Prediction, in turn, has been blessed with multidimensionality. As long as penalization and cross‐validation are properly employed, having more variables is more desirable than having only a few. The success of prediction as the number of predictors has increased is astounding in the livestock and plant breeding fields, and genetic progress has accelerated since the application of genomic selection. This has been possible, I insist, because prediction is falsifiable and, therefore, pragmatism dominates. Of note, a model may predict well even without including the causative mutations, and so a better performing model may not be the one that is closest to “biological causality.” The Achilles heel of prediction is interpretability. Most prediction machines are “black boxes,” although the degree of “opacity” varies. GBLUP at least allows marginal marker effects to be recovered, whereas convolutional neural networks do not. In all, interpretability is a non‐negligible issue regarding communication of breeding methods to industry and society. Further, numerous prediction methods are available, yet they tend to perform similarly. Have we reached a “methodological” plateau?
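The claim that prediction is falsifiable can be made concrete with a minimal sketch of penalized regression evaluated by cross‐validation when markers outnumber individuals. Everything here is an assumption for illustration: the simulated data, the heritability, the penalty grid, and the function names `ridge_marker_effects` and `cross_validated_correlation`. Ridge regression is used as a stand‐in; under the usual Gaussian assumptions it is closely related to SNP‐BLUP/GBLUP, but this is not the author's implementation.

```python
import numpy as np

def ridge_marker_effects(X, y, lam):
    """Ridge marker effects via the n x n dual form, cheap when p > n.

    Computes beta = Xc' (Xc Xc' + lam I)^-1 (y - mean(y)) on centred genotypes.
    """
    mu = y.mean()
    col_means = X.mean(axis=0)
    Xc = X - col_means
    K = Xc @ Xc.T                                   # n x n kernel of centred markers
    alpha = np.linalg.solve(K + lam * np.eye(X.shape[0]), y - mu)
    beta = Xc.T @ alpha                             # one effect per marker
    return mu, col_means, beta

def cross_validated_correlation(X, y, lam, n_folds=5, seed=0):
    """Correlation between observed phenotypes and out-of-fold predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    predictions = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        mu, col_means, beta = ridge_marker_effects(X[train], y[train], lam)
        predictions[test_idx] = mu + (X[test_idx] - col_means) @ beta
    return float(np.corrcoef(predictions, y)[0, 1])

# Hypothetical simulation: 500 individuals, 1000 markers, 50 causal markers
rng = np.random.default_rng(42)
n, p, n_causal, h2 = 500, 1000, 50, 0.6
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
true_effects = np.zeros(p)
true_effects[rng.choice(p, n_causal, replace=False)] = rng.normal(size=n_causal)
g = X @ true_effects
g = (g - g.mean()) / g.std() * np.sqrt(h2)          # genetic values with variance h2
y = g + rng.normal(scale=np.sqrt(1.0 - h2), size=n)

# Compare penalties by their out-of-sample performance only
results = {lam: cross_validated_correlation(X, y, lam)
           for lam in (1.0, 100.0, 1000.0, 10000.0)}
best_lam = max(results, key=results.get)
_, _, beta_hat = ridge_marker_effects(X, y, best_lam)  # recoverable marker effects
print(results, best_lam)
```

The penalty is chosen purely on held‐out performance, which is the pragmatic, falsifiable criterion the text describes. The vector `beta_hat` shows that a linear model of this kind still yields one estimated effect per marker, in contrast to more opaque predictors.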

Quantitative Genetics skills are in high demand worldwide, yet Breeding is a mature field where scientific advances seem incremental. As in many disciplines, the population of animal breeders is rather inbred, and scientific progress will likely accelerate if we look for inspiration outside our own field of science. Where, then, do the main challenges of livestock breeding lie?

I am optimistic. I see many exciting prospects in several areas; let me mention just a few. Phenomics, the automatic measurement of numerous phenotypes, is by far the main and most attractive challenge, in my opinion. Highly unstructured, massive and heterogeneous datasets can now be cheaply produced by sensors. New opportunities exist both for developing algorithms that transform raw data into meaningful phenotypes and for implementing breeding programmes based on high‐dimensional data. Among phenotypes, analysing individual and group behaviour via, say, video recording is an exciting problem. The impact of breeding on behaviour is a topic of utmost interest in terms of research, industry, and society.

Breeding programmes are accelerated evolutionary experiments and provide unique biological knowledge. This is a second domain where I foresee relevant discoveries, once longitudinal phenomic and genomic data sets are available. Animal genomes are highly resilient but also responsive; the same selective pressure is likely to result in (slightly) different allele frequency changes. Besides, response to selection has almost never been exhausted. This intriguing observation highlights the relevance of new mutations and suggests that distinct physiological mechanisms may be activated in concert but at different stages.

Finally, domestication of terrestrial species has been a rare phenomenon in human history. Only a handful of species have been domesticated, likely because of behavioural and reproductive constraints. This scenario is completely different in aquaculture, where dozens of species have recently started to be reared in captivity, and many more are in the process. The aquaculture industry is in general more advanced technologically than terrestrial farming and poses new practical and methodological challenges. But domestication can be extended even more broadly; for example, insects can be used as animal feed and human food. There are numerous uncharted territories for the curious breeder.

I finish by thanking Miguel Toro, Daniel Gianola, Gustavo de los Campos, and Andrés Legarra for numerous discussions throughout the years.




Updated: 2021-04-22