Deep learning for population size history inference: Design, comparison and combination with approximate Bayesian computation
Molecular Ecology Resources (IF 7.7), Pub Date: 2020-07-09, DOI: 10.1111/1755-0998.13224
Théophile Sanchez, Jean Cury, Guillaume Charpiat, Flora Jay
For the past decades, simulation-based likelihood-free inference methods have enabled researchers to address numerous population genetics problems. As the richness and amount of simulated and real genetic data keep increasing, the field has a strong opportunity to tackle tasks that current methods hardly solve. However, high data dimensionality forces most methods to summarize large genomic data sets into a relatively small number of handcrafted features (summary statistics). Here, we propose an alternative to summary statistics, based on the automatic extraction of relevant information using deep learning techniques. Specifically, we design artificial neural networks (ANNs) that take as input single nucleotide polymorphic sites (SNPs) found in individuals sampled from a single population and infer the past effective population size history. First, we provide guidelines to construct artificial neural networks that comply with the intrinsic properties of SNP data such as invariance to permutation of haplotypes, long scale interactions between SNPs and variable genomic length. Thanks to a Bayesian hyperparameter optimization procedure, we evaluate the performance of multiple networks and compare them to well-established methods like Approximate Bayesian Computation (ABC). Even without the expert knowledge of summary statistics, our approach compares fairly well to an ABC approach based on handcrafted features. Furthermore, we show that combining deep learning and ABC can improve performance while taking advantage of both frameworks. Finally, we apply our approach to reconstruct the effective population size history of cattle breed populations.
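One of the properties named above, invariance to permutation of haplotypes, can be illustrated with a minimal sketch. This is not the authors' architecture: it only assumes that the same feature map is applied to every haplotype (row) and that the results are averaged over rows, so reordering the rows cannot change the output. The function names `row_features` and `invariant_embedding`, the random weights, and the toy matrix sizes are all hypothetical choices made for illustration.

```python
import math
import random


def row_features(haplotype, weights):
    """Apply one shared linear map followed by ReLU to a single haplotype.

    `haplotype` is a list of 0/1 alleles; `weights` has one row per output feature.
    """
    return [
        max(0.0, sum(w * allele for w, allele in zip(w_row, haplotype)))
        for w_row in weights
    ]


def invariant_embedding(snp_matrix, weights):
    """Average the per-haplotype features over rows.

    The mean does not depend on row order, so the embedding is
    invariant to any permutation of the haplotypes.
    """
    feats = [row_features(h, weights) for h in snp_matrix]
    n = len(feats)
    return [sum(column) / n for column in zip(*feats)]


# Toy data (assumed sizes): 6 haplotypes, 10 SNPs, 4 output features.
rng = random.Random(0)
n_haplotypes, n_snps, n_features = 6, 10, 4
snp_matrix = [[rng.randint(0, 1) for _ in range(n_snps)] for _ in range(n_haplotypes)]
weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_snps)] for _ in range(n_features)]

# Shuffle a copy of the rows and recompute the embedding.
shuffled = snp_matrix[:]
rng.shuffle(shuffled)

original = invariant_embedding(snp_matrix, weights)
permuted = invariant_embedding(shuffled, weights)

# Compare with a tolerance, since floating-point sums depend on summation order.
same = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(original, permuted))
print("embedding unchanged after shuffling haplotypes:", same)
```

Averaging is only one symmetric operation that gives this property; a sum or a maximum over rows would behave the same way with respect to row order.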
Updated: 2020-07-09