Identification of Species by Combining Molecular and Morphological Data Using Convolutional Neural Networks
Systematic Biology (IF 6.5), Pub Date: 2021-09-11, DOI: 10.1093/sysbio/syab076
Bing Yang, Zhenxin Zhang, Cai-Qing Yang, Ying Wang, Michael C Orr, Hongbin Wang, Ai-Bing Zhang

Integrative taxonomy, which draws on behavior, niche preference, distribution, morphological analysis, and DNA barcoding, is central to modern taxonomy and systematic biology. However, decades of use demonstrate that these methods can face challenges when used in isolation: for instance, morphological methods may misidentify specimens because of phenotypic plasticity, and DNA barcoding may give incorrect identifications because of introgression, incomplete lineage sorting, and horizontal gene transfer. Although researchers have advocated the use of integrative taxonomy, few detailed algorithms have been proposed. Here, we develop a convolutional neural network method (morphology-molecule network [MMNet]) that integrates morphological and molecular data for species identification. MMNet performed better than four currently available alternative methods when tested with 10 independent data sets representing varying genetic diversity from different taxa. High accuracies were achieved for all groups, including beetles (98.1% of 123 species), butterflies (98.8% of 24 species), fishes (96.3% of 214 species), and moths (96.4% of 150 total species). Further, MMNet demonstrated a high degree of accuracy (>98%) in four data sets including closely related species from the same genus. The average accuracy on two modest subgenomic (single nucleotide polymorphism) data sets, each comprising eight putative subspecies, is 90%. Additional tests show that the success rate of species identification under this method depends most strongly on the amount of training data, and is robust to sequence length and image size. Analyses of the contribution of different data types (image vs. gene) indicate that both morphological and genetic data are important to the model, and that genetic data contribute slightly more.
The approaches developed here serve as a foundation for the future integration of multimodal information for integrative taxonomy, such as image, audio, video, 3D scanning, and biosensor data, to characterize organisms more comprehensively as a basis for improved investigation, monitoring, and conservation of biodiversity. [Convolutional neural network; deep learning; integrative taxonomy; single nucleotide polymorphism; species identification.]
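
To make the idea of combining image and sequence inputs concrete, the following is a minimal sketch of how the two data types could be turned into a single numeric input. It is not the MMNet architecture, which the abstract does not specify; the function names `one_hot_dna` and `fuse_features`, the toy image, and the simple concatenation step are all assumptions made for illustration. In a multimodal convolutional network, each modality would more plausibly pass through its own convolutional layers before the features are merged.

```python
import numpy as np

# Hypothetical helpers for illustration only; none of these names come from the paper.
BASES = "ACGT"

def one_hot_dna(seq: str, length: int) -> np.ndarray:
    """Encode a DNA sequence as a (length, 4) one-hot matrix.

    The sequence is truncated or zero-padded to `length`. Characters other
    than A, C, G, T (for example N) are left as all-zero rows.
    """
    mat = np.zeros((length, 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()[:length]):
        j = BASES.find(base)
        if j >= 0:
            mat[i, j] = 1.0
    return mat

def fuse_features(image: np.ndarray, seq: str, seq_len: int = 8) -> np.ndarray:
    """Concatenate scaled image pixels with a flattened one-hot sequence.

    Assumes an 8-bit image; pixel values are scaled to [0, 1].
    """
    img_vec = image.astype(np.float32).ravel() / 255.0
    seq_vec = one_hot_dna(seq, seq_len).ravel()
    return np.concatenate([img_vec, seq_vec])

# Toy inputs: a 2x2 grayscale "specimen image" and a short barcode fragment.
image = np.full((2, 2), 255, dtype=np.uint8)
fused = fuse_features(image, "ACGTN", seq_len=6)
print(fused.shape)
```

The fused vector has one entry per pixel followed by four entries per sequence position, so changing the image size or the sequence length changes only the input dimension, not the encoding scheme.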

Updated: 2021-09-11