CleanBSequences: an efficient curator of biological sequences in R

  • Methods Paper
Molecular Genetics and Genomics

Abstract

This work presents a new method and tool for a common problem faced by molecular biologists and geneticists who use molecular markers in their research and development: the curation of sequences. Omic studies conducted by molecular biologists and geneticists usually involve molecular markers. AFLP, cDNA-AFLP, and MSAP are examples of markers that provide information at the genomic, transcriptomic, and epigenomic levels, respectively. All three marker types use adaptors that serve as the template for PCR amplification. The adaptor sequences must be removed before the results can be analyzed. Because these studies usually yield a large number of sequences, this clean-up can demand considerable time and effort. To automate the task, an R package named CleanBSequences was created. It curates large numbers of sequences quickly, without manual errors, and can be used offline. Curation is performed by aligning the sequences to be removed (the forward and/or reverse primers, or the ends of cloning vectors) against each sequence. After alignment, new subsequences are generated that exclude the fragments the user does not want, i.e., the sequences required by the techniques themselves. In conclusion, CleanBSequences facilitates researchers' work by reducing time, effort, and working errors. It therefore addresses the problems associated with curating sequences obtained from some types of molecular markers. Finally, because it is open source, CleanBSequences is a flexible tool that can be extended in the future to respond to new problems.
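The curation step described in the abstract aligns the primers or vector ends against each sequence and keeps the subsequence that remains. The Python sketch below illustrates that idea under stated assumptions. It is not the CleanBSequences R interface. The function names (`align_primer`, `trim`, `parse_fasta`, `curate_fasta`), the scoring scheme (match +1, mismatch -1, gap -2), and the 80% acceptance threshold are all hypothetical. The sketch also assumes that the forward primer is removed together with everything 5' of it. Likewise, the reverse-primer site (its reverse complement on the read) is removed together with everything 3' of it.

```python
"""Hypothetical Python sketch of alignment-based primer trimming.

Not the CleanBSequences R API: every name, the scoring scheme and the
acceptance threshold below are illustrative assumptions.
"""
from typing import Dict, Optional, Tuple

MATCH, MISMATCH, GAP = 1, -1, -2  # assumed scoring scheme
# Bases outside ACGTN (e.g. other IUPAC codes) are left unchanged.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def align_primer(primer: str, read: str) -> Tuple[int, int, int]:
    """Semi-global alignment: every primer base must be aligned, but read
    bases before and after the match cost nothing. Returns (score, start, end)
    such that read[start:end] is the region the primer aligned to."""
    m, n = len(primer), len(read)
    # score[i][j] = best score aligning primer[:i] to some read[k:j].
    # Row 0 is all zeros, so the alignment may start anywhere in the read.
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * GAP  # skipping primer bases is penalised
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = MATCH if primer[i - 1] == read[j - 1] else MISMATCH
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + GAP,
                              score[i][j - 1] + GAP)
    # Trailing read bases are free: take the best cell in the last row.
    end = max(range(n + 1), key=lambda j: score[m][j])
    # Trace back from (m, end) to find where the aligned region starts.
    i, j = m, end
    while i > 0 and j > 0:
        sub = MATCH if primer[i - 1] == read[j - 1] else MISMATCH
        if score[i][j] == score[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + GAP:
            i -= 1
        else:
            j -= 1
    return score[m][end], j, end


def trim(read: str, fwd: Optional[str] = None, rev: Optional[str] = None,
         min_frac: float = 0.8) -> str:
    """Return the insert between the forward primer and the reverse-primer
    site. The reverse primer is given 5'->3', so on the read it appears as
    its reverse complement. A primer scoring below min_frac * len(primer)
    is treated as absent and nothing is cut on that side."""
    read = read.upper()
    start, stop = 0, len(read)
    if fwd:
        score, _, end = align_primer(fwd.upper(), read)
        if score >= min_frac * len(fwd):
            start = end  # drop the primer and everything 5' of it
    if rev:
        site = revcomp(rev.upper())
        score, begin, _ = align_primer(site, read)
        if score >= min_frac * len(site):
            stop = begin  # drop the primer site and everything 3' of it
    return read[start:stop] if start < stop else ""


def parse_fasta(text: str) -> Dict[str, str]:
    """Minimal FASTA parser mapping each header to its joined sequence."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif line and name is not None:
            records[name] += line
    return records


def curate_fasta(text: str, fwd: Optional[str],
                 rev: Optional[str]) -> Dict[str, str]:
    """Trim every record of a FASTA string in one pass."""
    return {name: trim(seq, fwd, rev) for name, seq in parse_fasta(text).items()}
```

For example, `curate_fasta(">r1\nCCGATTACATGCA\nTGCAGTTTAAAGG\n", "GATTACA", "TTTAAAC")` yields `{"r1": "TGCATGCA"}`. The sketch uses semi-global alignment rather than exact string matching so that a primer containing a sequencing error can still be located. Forcing the whole primer to align also keeps a short, spurious local hit from being accepted. The sketch does not handle reads in reverse orientation or multiple primer hits per read, which a production curator would likely need.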


Fig. 1



Acknowledgements

This work was supported by the Consejo Nacional de Investigaciones Científicas y Técnicas, grant numbers PIP11220090100613 and PUE22920160100043CO (IICAR), and by the Agencia Nacional de Promoción Científica y Tecnológica, grant number PICT20121321.

Author information

Corresponding author

Correspondence to Florencia I. Pozzi.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary file 1 (DOCX, 12 kb)

About this article

Cite this article

Pozzi, F.I., Green, G.Y., Barbona, I.G. et al. CleanBSequences: an efficient curator of biological sequences in R. Mol Genet Genomics 295, 837–841 (2020). https://doi.org/10.1007/s00438-020-01671-z

  • DOI: https://doi.org/10.1007/s00438-020-01671-z
