Planta (IF 4.3), Pub Date: 2024-04-26, DOI: 10.1007/s00425-024-04420-3
Ernesto Rios-Willars, Michelle C. Chirinos-Arias
Main conclusion
Mfind is a tool for analyzing how the presence of microsatellites affects DNA barcode specificity. We found a significant negative correlation between barcode entropy and microsatellite count in angiosperms.
Abstract
Genetic barcodes and microsatellites are among the identification methods used in taxonomy and biodiversity research, so it is important to establish how microsatellite counts relate to the genetic information carried by barcodes. To clarify the association between the genetic information in barcodes (expressed as Shannon's Measure of Information, SMI) and microsatellite counts, a total of 330,809 DNA barcodes from the BOLD database (Barcode of Life Data System) were analyzed. A parallel sliding-window algorithm was developed to compute the Shannon entropy of the barcodes, and this entropy was compared with the counts of microsatellites such as (AT)n, (AC)n, and (AG)n. Microsatellites were located with a search algorithm written in Java, which systematically scanned the genetic barcodes from an angiosperm database. For this purpose, a computational tool named Mfind was developed, and its search methodology is described in detail. This comprehensive study provides a broad overview of microsatellites within barcodes and reveals an inverse correlation between the summed microsatellite count and barcode information. Using Mfind, we showed that the presence of microsatellites affects barcode information when entropy is used as the metric. This effect may be attributed to the short length of DNA barcodes and the repetitive nature of microsatellites, which together directly influence barcode entropy.
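The abstract names three computational steps (a sliding-window Shannon entropy per barcode, a count of dinucleotide microsatellites such as (AT)n, (AC)n, and (AG)n, and a correlation between the two) but gives no parameters. The Python sketch below illustrates those steps under stated assumptions; it is not Mfind, which is written in Java. The window size (20), step (1), averaging of window entropies, minimum of five repeat units, removal of non-ACGT symbols, use of Pearson's coefficient, and all function names (`clean`, `shannon_entropy`, `sliding_window_entropy`, `count_microsatellites`, `pearson`) are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter
from statistics import mean

# Hypothetical parameters: the abstract does not state Mfind's window size,
# step, or minimum repeat number, so these values are illustrative only.
WINDOW = 20
STEP = 1
MIN_REPEATS = 5
MOTIFS = ("AT", "AC", "AG")  # dinucleotide motifs named in the abstract


def clean(seq: str) -> str:
    """Upper-case the sequence and drop non-ACGT symbols (gaps, N, IUPAC codes).

    Assumption: real BOLD records may contain such symbols; how Mfind handles
    them is not stated.
    """
    return re.sub(r"[^ACGT]", "", seq.upper())


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits, of the nucleotide composition of seq."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def sliding_window_entropy(seq: str, window: int = WINDOW, step: int = STEP) -> float:
    """Mean entropy over overlapping windows (averaging is an assumption)."""
    seq = clean(seq)
    if not seq:
        return 0.0
    if len(seq) <= window:
        return shannon_entropy(seq)
    return mean(shannon_entropy(seq[i:i + window])
                for i in range(0, len(seq) - window + 1, step))


def count_microsatellites(seq: str, motifs=MOTIFS, min_repeats: int = MIN_REPEATS) -> int:
    """Count non-overlapping runs of each motif repeated at least min_repeats times."""
    seq = clean(seq)
    total = 0
    for motif in motifs:
        # e.g. "(?:AT){5,}" matches five or more consecutive AT units
        pattern = re.compile(f"(?:{motif}){{{min_repeats},}}")
        total += sum(1 for _ in pattern.finditer(seq))
    return total


def pearson(xs, ys) -> float:
    """Pearson correlation between two equal-length samples (raises if one is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Synthetic toy sequences for demonstration, not BOLD records.
    barcodes = ["ACGT" * 10, "AT" * 20, "ACGT" * 5 + "AG" * 10, "AC" * 6 + "GT" * 14]
    entropies = [sliding_window_entropy(b) for b in barcodes]
    counts = [count_microsatellites(b) for b in barcodes]
    print("entropy:", entropies)
    print("microsatellites:", counts)
    print("Pearson r:", pearson(counts, entropies))
```

Because each barcode is processed independently, the per-barcode loop is the natural place to parallelize (for example with a process pool), which is one plausible reading of the "parallel" algorithm in the abstract. A repeat-rich window such as `"AT" * 10` scores 1 bit, while a window with all four nucleotides in equal proportion scores the maximum of 2 bits, which is the mechanism the abstract proposes for the inverse correlation.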
Title
Mfind: a tool for analyzing angiosperm DNA barcodes with a sliding-window algorithm and their relationship with microsatellites