Abstract
Main conclusion
Mfind is a tool for analyzing the impact of microsatellite presence on DNA barcode specificity. We found a significant correlation between barcode entropy and microsatellite count in angiosperms.
Abstract
Genetic barcodes and microsatellites are among the identification methods used in taxonomy and biodiversity research. It is important to establish a relationship between microsatellite quantification and the genetic information in barcodes. To clarify the association between the genetic information in barcodes (expressed as Shannon’s Measure of Information, SMI) and microsatellite count, a total of 330,809 DNA barcodes from the BOLD database (Barcode of Life Data System) were analyzed. A parallel sliding-window algorithm was developed to compute the Shannon entropy of the barcodes, and this was compared with the counts of microsatellites such as (AT)n, (AC)n, and (AG)n. The microsatellite search used an algorithm written in the Java programming language, which systematically examined the genetic barcodes from an angiosperm database. For this purpose, a computational tool named Mfind was developed, and its search methodology is detailed. This study provided a broad overview of microsatellites within barcodes and revealed an inverse correlation between the sum of microsatellite counts and barcode information. The use of the Mfind tool showed that the presence of microsatellites affects barcode information when entropy is used as the metric. This effect might be attributed to the short length of DNA barcodes and the repetitive nature of microsatellites, which directly influence the entropy of the barcodes.
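The two quantities compared in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Mfind implementation, which the authors describe as a parallel algorithm in Java. The function names, the window length, the step size, and the minimum repeat threshold are all assumptions chosen for illustration; the source does not state the values Mfind uses.

```python
import math
import re
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits per symbol) of the base frequencies in seq."""
    counts = Counter(seq)
    total = len(seq)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sliding_window_entropy(seq, window=4, step=1):
    """Entropy of each window of length `window`, advanced `step` bases at a time.

    The window and step values are illustrative assumptions, not Mfind's settings.
    """
    return [shannon_entropy(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def count_microsatellites(seq, motif, min_repeats=3):
    """Count non-overlapping tandem runs of `motif` with at least `min_repeats` copies.

    The threshold of three copies is an assumption made for this sketch.
    """
    pattern = re.compile("(?:%s){%d,}" % (re.escape(motif), min_repeats))
    return len(pattern.findall(seq))


if __name__ == "__main__":
    barcode = "ATATATGGACACACTTAT"  # made-up sequence, not taken from BOLD
    print(sliding_window_entropy(barcode, window=6, step=6))
    for motif in ("AT", "AC", "AG"):
        print(motif, count_microsatellites(barcode, motif))
```

A window that falls entirely inside a dinucleotide repeat such as ATATAT contains only two distinct bases, so its entropy cannot exceed 1 bit, whereas a window with all four bases in equal proportion reaches 2 bits. This is consistent with the inverse relationship between microsatellite count and barcode information reported above.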
Data availability
The downloadable dataset is available at https://usegalaxy.org/u/rioswillars/h/angiosperm-dataset.
Code availability
The code of the Mfind tool is available at https://github.com/riosew/Mfind in a file named Mfind_script.txt.
Abbreviations
- CR: Conserved region
- MSA: Multiple sequence alignment
- SMI: Shannon’s Measure of Information
Acknowledgements
The authors thank the Faculty of Systems of the Autonomous University of Coahuila and the Instituto de Genética Barbara McClintock (IGBM) for their support of this research.
Funding
The authors financed the research with their own resources.
Author information
Contributions
ERW conceived and designed the research, developed the code and saved the data in the cloud. MCA reviewed the code. ERW and MCA analyzed the data, wrote, reviewed, and approved the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Communicated by Dorothea Bartels.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Rios-Willars, E., Chirinos-Arias, M.C. Mfind: a tool for DNA barcode analysis in angiosperms and its relationship with microsatellites using a sliding window algorithm. Planta 259, 134 (2024). https://doi.org/10.1007/s00425-024-04420-3
DOI: https://doi.org/10.1007/s00425-024-04420-3