GAD: A Python Script for Dividing Genome Annotation Files into Feature-Based Files

  • Short communication
Interdisciplinary Sciences: Computational Life Sciences

Abstract

The manipulation and analysis of genomic data stored in publicly accessible repositories have become daily tasks in genomics and bioinformatics laboratories. Rapid advances in genome sequencing and the emergence of many sequencing projects have pushed bioinformaticians to create a variety of programs and pipelines that automatically analyze such big data, in particular gene annotation pipelines. Simple, easy-to-use programs for handling annotation files are important, particularly for non-developers, because they accelerate genomic data analysis. One of the first tasks when working with genomic annotation files is extracting the different features. To this end, we have developed GAD (https://github.com/bio-projects/GAD), a fast, easy-to-use, and controllable Python script for handling annotation files such as GFF3 and GTF. GAD is a cross-platform tool with a graphical interface that extracts genome features such as intergenic regions and the upstream and downstream regions of genes. In addition, GAD finds all ambiguous sequence ontology names and either extracts them or treats them as genes or transcripts. The results are produced in a variety of file formats supported by other bioinformatics programs, such as BED, GTF, GFF3, and FASTA. GAD can handle large genomes and any number of files with minimal user effort. Our script could therefore be integrated into pipelines in genomics laboratories to accelerate data analysis.
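To make the kind of feature extraction described above concrete, the following is a minimal sketch of deriving intergenic regions from a GFF3 file and writing them as BED intervals. It is not taken from GAD's source code: the function names `parse_gene_intervals` and `intergenic_bed` are hypothetical, and the sketch assumes that only features whose type is exactly `gene` define genic regions, that overlapping genes are merged regardless of strand, and that only gaps between genes are reported (not the regions before the first or after the last gene on a sequence).

```python
from collections import defaultdict


def parse_gene_intervals(gff3_lines):
    """Collect 1-based inclusive (start, end) pairs of 'gene' features per sequence ID."""
    genes = defaultdict(list)
    for line in gff3_lines:
        # Skip blank lines, directives ("##...") and comments ("#...").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GFF3 feature line has nine tab-separated columns; column 3 is the type.
        if len(fields) != 9 or fields[2] != "gene":
            continue
        genes[fields[0]].append((int(fields[3]), int(fields[4])))
    return genes


def intergenic_bed(genes):
    """Return BED rows (seqid, start, end) for the gaps between merged gene intervals.

    GFF3 coordinates are 1-based and inclusive, BED coordinates are 0-based and
    half-open, so a gap spanning bases prev_end+1 .. start-1 becomes
    (prev_end, start - 1) in BED.
    """
    rows = []
    for seqid in sorted(genes):
        prev_end = None  # right-most gene end seen so far on this sequence
        for start, end in sorted(genes[seqid]):
            if prev_end is not None and start > prev_end + 1:
                rows.append((seqid, prev_end, start - 1))
            prev_end = end if prev_end is None else max(prev_end, end)
    return rows


if __name__ == "__main__":
    # Illustrative records only; they do not come from a real annotation.
    demo = [
        "##gff-version 3\n",
        "chr1\tdemo\tgene\t100\t200\t.\t+\t.\tID=g1\n",
        "chr1\tdemo\tmRNA\t100\t200\t.\t+\t.\tID=t1;Parent=g1\n",
        "chr1\tdemo\tgene\t150\t300\t.\t-\t.\tID=g2\n",
        "chr1\tdemo\tgene\t500\t600\t.\t+\t.\tID=g3\n",
    ]
    for seqid, start, end in intergenic_bed(parse_gene_intervals(demo)):
        print(f"{seqid}\t{start}\t{end}")  # prints: chr1	300	499
```

Merging overlapping genes before computing gaps avoids reporting a region as intergenic when it is covered by a gene on the opposite strand; a strand-aware variant would need to keep separate interval lists per strand.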

Acknowledgements

We thank the authors' families for their continued support, and give special thanks to Assistant Professor Dr. Ahmed Ismail for his help and support.

Author information

Corresponding author

Correspondence to Ahmed Karam.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 86 kb)

About this article

Cite this article

Yasser, N., Karam, A. GAD: A Python Script for Dividing Genome Annotation Files into Feature-Based Files. Interdiscip Sci Comput Life Sci 12, 377–381 (2020). https://doi.org/10.1007/s12539-020-00378-4

  • DOI: https://doi.org/10.1007/s12539-020-00378-4
