Extending approximate Bayesian computation with supervised machine learning to infer demographic history from genetic polymorphisms using DIYABC Random Forest
Molecular Ecology Resources (IF 7.7), Pub Date: 2021-05-05, DOI: 10.1111/1755-0998.13413
François-David Collin 1 , Ghislain Durif 1 , Louis Raynal 1 , Eric Lombaert 2 , Mathieu Gautier 3 , Renaud Vitalis 3 , Jean-Michel Marin 1 , Arnaud Estoup 3

Simulation-based methods such as approximate Bayesian computation (ABC) are well adapted to the analysis of complex scenarios of population and species genetic history. In this context, supervised machine learning (SML) methods provide attractive statistical solutions for conducting efficient inferences about scenario choice and parameter estimation. The Random Forest (RF) methodology is a powerful ensemble of SML algorithms used for classification or regression problems. RF allows inferences to be conducted at a low computational cost, without preliminary selection of the relevant components of the ABC summary statistics and without deriving ABC tolerance levels. We have implemented a set of RF algorithms to process inferences using simulated data sets generated from an extended version of the population genetic simulator implemented in DIYABC v2.1.0. The resulting computer package, named DIYABC Random Forest v1.0, integrates two functionalities into a user-friendly interface: the simulation of different types of molecular data (microsatellites, DNA sequences or SNPs) under custom evolutionary scenarios, and RF analyses, including statistical tools to evaluate the power and accuracy of inferences. We illustrate the functionalities of DIYABC Random Forest v1.0 for both scenario choice and parameter estimation through the analysis of pseudo-observed and real data sets corresponding to pool-sequencing and individual-sequencing SNP data sets. Because of the properties inherent to the implemented RF methods and the large feature vector (including various summary statistics and their linear combinations) available for SNP data, DIYABC Random Forest v1.0 can efficiently contribute to the analysis of large SNP data sets to make inferences about complex population genetic histories.
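
The general idea of RF-based scenario choice described in the abstract can be sketched in a few lines of Python. This is a toy illustration only, not the DIYABC Random Forest implementation: the simulator, the two scenarios, the drift priors and the four summary statistics are all hypothetical, and scikit-learn's RandomForestClassifier stands in for the package's own RF code. The sketch builds a reference table of simulated data sets, trains a classification forest on their summary statistics, and uses the out-of-bag error as a rough analogue of a prior error rate.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def simulate(scenario, n_loci=200, drift=None):
    """Hypothetical toy simulator: allele frequencies of two populations
    that drifted apart from a common ancestor.
    Scenario 0 = recent split (weak drift), scenario 1 = old split (strong drift)."""
    if drift is None:
        # Assumed priors on the drift parameter, one per scenario.
        drift = rng.uniform(0.01, 0.05) if scenario == 0 else rng.uniform(0.05, 0.20)
    ancestral = rng.uniform(0.1, 0.9, n_loci)
    sd = np.sqrt(drift * ancestral * (1.0 - ancestral))
    p1 = np.clip(ancestral + rng.normal(0.0, sd), 0.0, 1.0)
    p2 = np.clip(ancestral + rng.normal(0.0, sd), 0.0, 1.0)
    return p1, p2

def summary_stats(p1, p2):
    """Illustrative summary statistics; no pre-selection is applied."""
    p_bar = (p1 + p2) / 2.0
    het_total = 2.0 * p_bar * (1.0 - p_bar)
    het_within = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    fst = 1.0 - het_within.sum() / het_total.sum()
    diff = p1 - p2
    return np.array([fst, het_within.mean(), diff.var(), np.abs(diff).mean()])

# 1. Reference table: draw a scenario from a uniform prior, simulate, summarise.
n_sim = 2000
labels = rng.integers(0, 2, size=n_sim)
table = np.array([summary_stats(*simulate(s)) for s in labels])

# 2. Train a classification forest on (summary statistics -> scenario).
forest = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
forest.fit(table, labels)
oob_error = 1.0 - forest.oob_score_  # out-of-bag misclassification rate

# 3. Classify a pseudo-observed data set simulated under scenario 1.
observed = summary_stats(*simulate(1, drift=0.15)).reshape(1, -1)
predicted = int(forest.predict(observed)[0])
votes = forest.predict_proba(observed)[0]  # averaged tree votes per scenario

print(f"out-of-bag error: {oob_error:.3f}")
print(f"selected scenario: {predicted}, vote fractions: {votes}")
```

No tolerance level is chosen anywhere in this sketch, and all four statistics are passed to the forest without filtering, which mirrors the two advantages the abstract attributes to RF. The vote fractions are printed only as a rough indication and should not be read as posterior probabilities.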

Updated: 2021-05-05