The choices we make and the impacts they have: Machine learning and species delimitation in North American box turtles (Terrapene spp.)
Molecular Ecology Resources (IF 5.5), Pub Date: 2021-02-10, DOI: 10.1111/1755-0998.13350
Bradley T Martin 1, Tyler K Chafin 1, Marlis R Douglas 1, John S Placyk 2, 3, Roger D Birkhead 4, Christopher A Phillips 5, Michael E Douglas 1

Model-based approaches that attempt to delimit species are hampered by computational limitations as well as the unfortunate tendency by users to disregard algorithmic assumptions. Alternatives are clearly needed, and machine-learning (M-L) is attractive in this regard as it functions without the need to explicitly define a species concept. Unfortunately, its performance will vary according to which (of several) bioinformatic parameters are invoked. Herein, we gauge the effectiveness of M-L-based species-delimitation algorithms by parsing 64 variably-filtered versions of a ddRAD-derived SNP data set collected from North American box turtles (Terrapene spp.). Our filtering strategies included: (i) minor allele frequencies (MAF) of 5%, 3%, 1%, and 0% (= none), and (ii) maximum missing data per-individual/per-population at 25%, 50%, 75%, and 100% (= no filtering). We found that species-delimitation via unsupervised M-L impacted the signal-to-noise ratio in our data, as well as the discordance among resolved clades. The latter may also reflect biogeographic history, gene flow, incomplete lineage sorting, or combinations thereof (as corroborated from previously observed patterns of differential introgression). Our results substantiate M-L as a viable species-delimitation method, but also demonstrate how commonly observed patterns of phylogenetic discordance can seriously impact M-L-classification.
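The filtering design described in the abstract can be sketched as a parameter grid. The snippet below is only an illustration, not the authors' pipeline: it assumes that the per-individual and per-population missing-data thresholds were varied independently, which is one reading consistent with the stated total of 64 data sets (4 x 4 x 4). The function names, the 0/1/2 genotype coding, and the use of `None` for missing calls are all hypothetical choices made for this example.

```python
from itertools import product

# Thresholds quoted in the abstract, expressed as fractions.
MAF_THRESHOLDS = (0.05, 0.03, 0.01, 0.0)   # 0.0 means no MAF filter
MAX_MISSING = (0.25, 0.50, 0.75, 1.0)      # 1.0 means no missing-data filter


def filter_grid():
    """Enumerate every filter combination (assumed: the two missing-data
    thresholds vary independently, giving 4 * 4 * 4 = 64 settings)."""
    return [
        {"maf": maf, "max_missing_ind": ind, "max_missing_pop": pop}
        for maf, ind, pop in product(MAF_THRESHOLDS, MAX_MISSING, MAX_MISSING)
    ]


def minor_allele_frequency(site):
    """MAF of one biallelic SNP.

    `site` is a list of genotypes coded as alternate-allele counts
    (0, 1, 2), with None marking a missing call (hypothetical coding).
    """
    called = [g for g in site if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def passes_maf(site, maf):
    """Keep a site whose minor allele frequency is at least `maf`."""
    return minor_allele_frequency(site) >= maf


def keep_individuals(genotypes, max_missing):
    """Keep individuals whose fraction of missing calls is at most
    `max_missing`. `genotypes` maps a sample name to a non-empty list
    of calls."""
    return {
        name: calls
        for name, calls in genotypes.items()
        if calls.count(None) / len(calls) <= max_missing
    }


if __name__ == "__main__":
    print(len(filter_grid()), "filter settings")
```

With a threshold of 0.0 for MAF or 1.0 for missing data, the corresponding filter keeps everything, matching the "= none" and "= no filtering" settings in the abstract.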

Updated: 2021-02-10