Using natural history to guide supervised machine learning for cryptic species delimitation with genetic data
Frontiers in Zoology (IF 2.6), Pub Date: 2022-02-22, DOI: 10.1186/s12983-022-00453-0
Shahan Derkarabetian, James Starrett, Marshal Hedin

The diversity of biological and ecological characteristics of organisms, and the underlying genetic patterns and processes of speciation, make the development of universally applicable genetic species delimitation methods challenging. Many approaches, like those incorporating the multispecies coalescent, sometimes delimit populations rather than species and so overestimate species numbers. This issue is exacerbated in taxa with inherently high population structure due to low dispersal ability, and in cryptic species resulting from nonecological speciation. These taxa present a conundrum when delimiting species: analyses rely heavily, if not entirely, on genetic data, which oversplit species, while other lines of evidence lump. We showcase this conundrum in the harvestman Theromaster brunneus, a low-dispersal taxon with a wide geographic distribution and high potential for cryptic species. Integrating morphology, mitochondrial, and sub-genomic (double-digest RADSeq and ultraconserved elements) data, we find high discordance across analyses and data types in the number of inferred species, with further evidence that multispecies coalescent approaches oversplit. We demonstrate the power of a supervised machine learning approach in effectively delimiting cryptic species by creating a “custom” training data set derived from a well-studied lineage with biological characteristics similar to those of Theromaster. This novel approach uses known taxa with particular biological characteristics to inform unknown taxa with similar characteristics, using modern computational tools ideally suited for species delimitation. The approach also considers the natural history of organisms to make more biologically informed species delimitation decisions, and in principle is broadly applicable for taxa across the tree of life.

Updated: 2022-02-22