Abstract
This work presents a new method and tool that address a common problem for molecular biologists and geneticists who use molecular markers in research and development: the curation of sequences. Omics studies frequently rely on molecular markers; AFLP, cDNA-AFLP, and MSAP are examples that provide information at the genomic, transcriptomic, and epigenomic levels, respectively. All three marker types use adaptors that serve as templates for PCR amplification, and the adaptor sequences must be removed before the results can be analyzed. Because these studies usually yield a large number of sequences, cleaning the data by hand can take considerable time and effort. To automate this task, an R package named CleanBSequences was created. It curates sequences in bulk, quickly and without the errors of manual editing, and it can be used offline. Curation is performed by aligning the forward and/or reverse primers, or the ends of cloning vectors, with each sequence to locate the regions to be removed. After the alignment, new subsequences are generated that exclude the fragments the user does not want, i.e., the technical sequences required by the marker techniques rather than the biological sequence of interest. In conclusion, CleanBSequences facilitates the work of researchers by reducing time, effort, and handling errors, and it addresses the curation of sequences obtained with several types of molecular markers. Because it is open source, CleanBSequences is also a flexible tool that can be extended in the future to respond to new problems.
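The trimming step described above can be illustrated with a minimal sketch. It is written in Python rather than R and is not the CleanBSequences API or implementation: the function names, the adaptor sequences, and the example read are all hypothetical. It also assumes a simple mismatch-tolerant scan in place of a true pairwise alignment, and it assumes that the reverse adaptor appears in the read as its reverse complement.

```python
# Illustrative sketch only: NOT the CleanBSequences implementation.
# All names and sequences below are hypothetical.

def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_approx(seq, pattern, max_mismatches=1):
    """Return (start, end) of the first window of seq that matches pattern
    with at most max_mismatches substitutions, or None if there is none."""
    m = len(pattern)
    for start in range(len(seq) - m + 1):
        window = seq[start:start + m]
        mismatches = sum(1 for a, b in zip(window, pattern) if a != b)
        if mismatches <= max_mismatches:
            return start, start + m
    return None


def trim_adaptors(seq, forward, reverse, max_mismatches=1):
    """Return the part of seq between the forward adaptor and the
    reverse complement of the reverse adaptor. An adaptor that is
    not found leaves that end of the sequence untouched."""
    seq = seq.upper()
    start, end = 0, len(seq)

    # Everything up to and including the forward adaptor is dropped.
    hit = find_approx(seq, forward.upper(), max_mismatches)
    if hit is not None:
        start = hit[1]

    # Search only downstream of the forward adaptor for the reverse one.
    hit = find_approx(seq[start:], reverse_complement(reverse), max_mismatches)
    if hit is not None:
        end = start + hit[0]

    return seq[start:end]


# Hypothetical adaptors and insert, made up for this example.
FORWARD = "GACTGCGTACC"
REVERSE = "GATGAGTCCTGAG"
INSERT = "ATGGCCATTGTAATGGGCC"

read = FORWARD + INSERT + reverse_complement(REVERSE)
cleaned = trim_adaptors(read, FORWARD, REVERSE)
print(cleaned)
```

Applying `trim_adaptors` to every record of a sequence file would give the kind of bulk curation the abstract describes. A production tool would normally use a real alignment routine so that insertions and deletions in the adaptor region are also tolerated.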
Acknowledgements
This work was supported by the Consejo Nacional de Investigaciones Científicas y Técnicas [grant numbers PIP11220090100613 and PUE22920160100043CO (IICAR)] and the Agencia Nacional de Promoción Científica y Tecnológica [grant number PICT20121321].
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Cite this article
Pozzi, F.I., Green, G.Y., Barbona, I.G. et al. CleanBSequences: an efficient curator of biological sequences in R. Mol Genet Genomics 295, 837–841 (2020). https://doi.org/10.1007/s00438-020-01671-z
DOI: https://doi.org/10.1007/s00438-020-01671-z