  • Corrigendum to: Metagenomic analysis of planktonic riverine microbial consortia using nanopore sequencing reveals insight into river microbe taxonomy and function
    Gigascience (IF 5.993) Pub Date : 2020-06-24
    Kate Reddington; David Eccles; Justin O'Grady; Devin M Drown; Lars Hestbjerg Hansen; Tue Kjærgaard Nielsen; Anne-Lise Ducluzeau; Richard M Leggett; Darren Heavens; Ned Peel; Terrance P Snutch; Anthony Bayega; Spyridon Oikonomopoulos; Ioannis Ragoussis; Thomas Barry; Eric van der Helm; Dino Jolic; Hollian Richardson; Hans Jansen; John R Tyson; Miten Jain; Bonnie L Brown

    In the original publication of this article, the author Jiannis Ragoussis was not part of the author byline. His name, affiliation, and funding information have been added to this article. The authors regret this error.

  • CandiMeth: Powerful yet simple visualization and quantification of DNA methylation at candidate genes.
    Gigascience (IF 5.993) Pub Date : 2020-06-22
    Sara-Jayne Thursby,Darin K Lobo,Kristina Pentieva,Shu-Dong Zhang,Rachelle E Irwin,Colum P Walsh

    DNA methylation microarrays are widely used in clinical epigenetics and are often processed using R packages such as ChAMP or RnBeads by trained bioinformaticians. However, looking at specific genes requires bespoke coding for which wet-lab biologists or clinicians are not trained. This leads to high demands on bioinformaticians, who may lack insight into the specific biological problem. To bridge

  • Reduced chromatin accessibility underlies gene expression differences in homologous chromosome arms of diploid Aegilops tauschii and hexaploid wheat.
    Gigascience (IF 5.993) Pub Date : 2020-06-20
    Fu-Hao Lu,Neil McKenzie,Laura-Jayne Gardiner,Ming-Cheng Luo,Anthony Hall,Michael W Bevan

    Polyploidy is centrally important in the evolution and domestication of plants because it leads to major genomic changes, such as altered patterns of gene expression, which are thought to underlie the emergence of new traits. Despite the common occurrence of these globally altered patterns of gene expression in polyploids, the mechanisms involved are not well understood.

  • The democratization of bioinformatics: A software engineering perspective.
    Gigascience (IF 5.993) Pub Date : 2020-06-20
    Brendan Lawlor,Roy D Sleator

    Today, thanks to advances in cloud computing, it is possible for small teams of software developers to produce internet-scale products, a feat that was previously the preserve of large organizations. Herein, we describe how these advances in software engineering can be made more readily available to bioinformaticians. In the same way that cloud computing has democratized access to distributed systems

  • Draft genome assemblies using sequencing reads from Oxford Nanopore Technology and Illumina platforms for four species of North American Fundulus killifish.
    Gigascience (IF 5.993) Pub Date : 2020-06-18
    Lisa K Johnson,Ruta Sahasrabudhe,James Anthony Gill,Jennifer L Roach,Lutz Froenicke,C Titus Brown,Andrew Whitehead

    Whole-genome sequencing data from wild-caught individuals of closely related North American killifish species (Fundulus xenicus, Fundulus catenatus, Fundulus nottii, and Fundulus olivaceus) were obtained using long-read Oxford Nanopore Technology (ONT) PromethION and short-read Illumina platforms.

  • CRISPRcasIdentifier: Machine learning for accurate identification and classification of CRISPR-Cas systems.
    Gigascience (IF 5.993) Pub Date : 2020-06-17
    Victor A Padilha,Omer S Alkhnbashi,Shiraz A Shah,André C P L F de Carvalho,Rolf Backofen

    CRISPR-Cas genes are extraordinarily diverse and evolve rapidly when compared to other prokaryotic genes. With the rapid increase in newly sequenced archaeal and bacterial genomes, manual identification of CRISPR-Cas systems is no longer viable. Thus, an automated approach is required for advancing our understanding of the evolution and diversity of these systems and for finding new candidates for

  • Watchdog 2.0: New developments for reusability, reproducibility, and workflow execution.
    Gigascience (IF 5.993) Pub Date : 2020-06-17
    Michael Kluge,Marie-Sophie Friedl,Amrei L Menzel,Caroline C Friedel

    Advances in high-throughput methods have brought new challenges for biological data analysis, often requiring many interdependent steps applied to a large number of samples. To address this challenge, workflow management systems, such as Watchdog, have been developed to support scientists in the (semi-)automated execution of large analysis workflows.

  • An improved pig reference genome sequence to enable pig genetics and genomics research.
    Gigascience (IF 5.993) Pub Date : 2020-06-16
    Amanda Warr,Nabeel Affara,Bronwen Aken,Hamid Beiki,Derek M Bickhart,Konstantinos Billis,William Chow,Lel Eory,Heather A Finlayson,Paul Flicek,Carlos G Girón,Darren K Griffin,Richard Hall,Greg Hannum,Thibaut Hourlier,Kerstin Howe,David A Hume,Osagie Izuogu,Kristi Kim,Sergey Koren,Haibou Liu,Nancy Manchanda,Fergal J Martin,Dan J Nonneman,Rebecca E O'Connor,Adam M Phillippy,Gary A Rohrer,Benjamin D Rosen

    The domestic pig (Sus scrofa) is important both as a food source and as a biomedical model given its similarity in size, anatomy, physiology, metabolism, pathology, and pharmacology to humans. The draft reference genome (Sscrofa10.2) of a purebred Duroc female pig established using older clone-based sequencing methods was incomplete, and unresolved redundancies, short-range order and orientation errors

  • Integrative computational epigenomics to build data-driven gene regulation hypotheses.
    Gigascience (IF 5.993) Pub Date : 2020-06-16
    Tyrone Chen,Sonika Tyagi

    Diseases are complex phenotypes often arising as an emergent property of a non-linear network of genetic and epigenetic interactions. To translate this resulting state into a causal relationship with a subset of regulatory features, many experiments deploy an array of laboratory assays from multiple modalities. Often, each of these resulting datasets is large, heterogeneous, and noisy. Thus, it is

  • Bioentity2vec: Attribute- and behavior-driven representation for predicting multi-type relationships between bioentities.
    Gigascience (IF 5.993) Pub Date : 2020-06-13
    Zhen-Hao Guo,Zhu-Hong You,Yan-Bin Wang,De-Shuang Huang,Hai-Cheng Yi,Zhan-Heng Chen

    The explosive growth of genomic, chemical, and pathological data provides new opportunities and challenges for humans to thoroughly understand life activities in cells. However, there exist few computational models that aggregate various bioentities to comprehensively reveal the physical and functional landscape of biological systems.

  • Galactic Circos: User-friendly Circos plots within the Galaxy platform.
    Gigascience (IF 5.993) Pub Date : 2020-06-12
    Helena Rasche,Saskia Hiltemann

    Circos is a popular, highly flexible software package for the circular visualization of complex datasets. While especially popular in the field of genomic analysis, Circos enables interactive graphing of any analytical data, including alternative scientific domain data and non-scientific data. This high degree of flexibility also comes with a high degree of complexity, which may present an obstacle

  • Metagenomic analysis of planktonic riverine microbial consortia using nanopore sequencing reveals insight into river microbe taxonomy and function.
    Gigascience (IF 5.993) Pub Date : 2020-06-10
    Kate Reddington,David Eccles,Justin O'Grady,Devin M Drown,Lars Hestbjerg Hansen,Tue Kjærgaard Nielsen,Anne-Lise Ducluzeau,Richard M Leggett,Darren Heavens,Ned Peel,Terrance P Snutch,Anthony Bayega,Spyridon Oikonomopoulos,Ioannis Ragoussis,Thomas Barry,Eric van der Helm,Dino Jolic,Hollian Richardson,Hans Jansen,John R Tyson,Miten Jain,Bonnie L Brown

    Riverine ecosystems are biogeochemical powerhouses driven largely by microbial communities that inhabit water columns and sediments. Because rivers are used extensively for anthropogenic purposes (drinking water, recreation, agriculture, and industry), it is essential to understand how these activities affect the composition of river microbial consortia. Recent studies have shown that river metagenomes

  • Trans-NanoSim characterizes and simulates nanopore RNA-sequencing data.
    Gigascience (IF 5.993) Pub Date : 2020-06-10
    Saber Hafezqorani,Chen Yang,Theodora Lo,Ka Ming Nip,René L Warren,Inanc Birol

    Compared with second-generation sequencing technologies, third-generation single-molecule RNA sequencing has unprecedented advantages; the long reads it generates facilitate isoform-level transcript characterization. In particular, the Oxford Nanopore Technology sequencing platforms have become more popular in recent years owing to their relatively high affordability and portability compared with other

  • Genomic consequences of dietary diversification and parallel evolution due to nectarivory in leaf-nosed bats.
    Gigascience (IF 5.993) Pub Date : 2020-06-06
    Yocelyn T Gutiérrez-Guerrero,Enrique Ibarra-Laclette,Carlos Martínez Del Río,Josué Barrera-Redondo,Eria A Rebollar,Jorge Ortega,Livia León-Paniagua,Araxi Urrutia,Erika Aguirre-Planter,Luis E Eguiarte

    The New World leaf-nosed bats (Phyllostomids) exhibit a diverse spectrum of feeding habits and innovations in their nutrient acquisition and foraging mechanisms. However, the genomic signatures associated with their distinct diets are unknown.

  • SnpHub: an easy-to-set-up web server framework for exploring large-scale genomic variation data in the post-genomic era with applications in wheat.
    Gigascience (IF 5.993) Pub Date : 2020-06-05
    Wenxi Wang,Zihao Wang,Xintong Li,Zhongfu Ni,Zhaorong Hu,Mingming Xin,Huiru Peng,Yingyin Yao,Qixin Sun,Weilong Guo

    The cost of high-throughput sequencing is rapidly decreasing, allowing researchers to investigate genomic variations across hundreds or even thousands of samples in the post-genomic era. The management and exploration of these large-scale genomic variation data require programming skills. The public genotype querying databases of many species are usually centralized and implemented independently, making

  • Education in the genomics era: Generating high-quality genome assemblies in university courses.
    Gigascience (IF 5.993) Pub Date : 2020-06-03
    Stefan Prost,Sven Winter,Jordi De Raad,Raphael T F Coimbra,Magnus Wolf,Maria A Nilsson,Malte Petersen,Deepak K Gupta,Tilman Schell,Fritjof Lammers,Axel Janke

    Recent advances in genome sequencing technologies have simplified the generation of genome data and reduced the costs for genome assemblies, even for complex genomes like those of vertebrates. More practically oriented genomic courses can prepare university students for the increasing importance of genomic data used in biological and medical research. Low-cost third-generation sequencing technology

  • Iterative hard thresholding in genome-wide association studies: Generalized linear models, prior weights, and double sparsity.
    Gigascience (IF 5.993) Pub Date : 2020-06-03
    Benjamin B Chu,Kevin L Keys,Christopher A German,Hua Zhou,Jin J Zhou,Eric M Sobel,Janet S Sinsheimer,Kenneth Lange

    Consecutive testing of single nucleotide polymorphisms (SNPs) is usually employed to identify genetic variants associated with complex traits. Ideally one should model all covariates in unison, but most existing analysis methods for genome-wide association studies (GWAS) perform only univariate regression.

  • Recommendations to enhance rigor and reproducibility in biomedical research.
    Gigascience (IF 5.993) Pub Date : 2020-06-01
    Jaqueline J Brito,Jun Li,Jason H Moore,Casey S Greene,Nicole A Nogoy,Lana X Garmire,Serghei Mangul

    Biomedical research depends increasingly on computational tools, but mechanisms ensuring open data, open software, and reproducibility are variably enforced by academic institutions, funders, and publishers. Publications may present software for which source code or documentation are or become unavailable; this compromises the role of peer review in evaluating technical strength and scientific contribution

  • A catalog of microbial genes from the bovine rumen unveils a specialized and diverse biomass-degrading environment.
    Gigascience (IF 5.993) Pub Date : 2020-05-30
    Junhua Li,Huanzi Zhong,Yuliaxis Ramayo-Caldas,Nicolas Terrapon,Vincent Lombard,Gabrielle Potocki-Veronese,Jordi Estellé,Milka Popova,Ziyi Yang,Hui Zhang,Fang Li,Shanmei Tang,Fangming Yang,Weineng Chen,Bing Chen,Jiyang Li,Jing Guo,Cécile Martin,Emmanuelle Maguin,Xun Xu,Huanming Yang,Jian Wang,Lise Madsen,Karsten Kristiansen,Bernard Henrissat,Stanislav D Ehrlich,Diego P Morgavi

    The rumen microbiota provides essential services to its host and, through its role in ruminant production, contributes to human nutrition and food security. A thorough knowledge of the genetic potential of rumen microbes will provide opportunities for improving the sustainability of ruminant production systems. The availability of gene reference catalogs from gut microbiomes has advanced the understanding

  • Fcirc: A comprehensive pipeline for the exploration of fusion linear and circular RNAs.
    Gigascience (IF 5.993) Pub Date : 2020-05-29
    Zhaoqing Cai,Hongzhang Xue,Yue Xu,Jens Köhler,Xiaojie Cheng,Yao Dai,Jie Zheng,Haiyun Wang

    In cancer cells, fusion genes can produce linear and chimeric fusion-circular RNAs (f-circRNAs), which are functional in gene expression regulation and implicated in malignant transformation, cancer progression, and therapeutic resistance. For specific cancers, proteins encoded by fusion transcripts have been identified as innovative therapeutic targets (e.g., EML4-ALK). Even though RNA sequencing

  • halSynteny: a fast, easy-to-use conserved synteny block construction method for multiple whole-genome alignments.
    Gigascience (IF 5.993) Pub Date : 2020-05-28
    Ksenia Krasheninnikova,Mark Diekhans,Joel Armstrong,Aleksei Dievskii,Benedict Paten,Stephen O'Brien

    Large-scale sequencing projects provide high-quality full-genome data that can be used for reconstruction of chromosomal exchanges and rearrangements that disrupt conserved syntenic blocks. The highest resolution of cross-species homology can be obtained on the basis of whole-genome, reference-free alignments. Very large multiple alignments of full-genome sequence stored in a binary format demand an

  • CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes.
    Gigascience (IF 5.993) Pub Date : 2020-05-25
    Heiner Kuhl,Ling Li,Sven Wuertz,Matthias Stöck,Xu-Fang Liang,Christophe Klopp

    BACKGROUND Easy-to-use and fast bioinformatics pipelines for long-read assembly that go beyond the contig level to generate highly continuous chromosome-scale genomes from raw data remain scarce. RESULT Chromosome-Scale Assembler (CSA) is a novel computationally highly efficient bioinformatics pipeline that fills this gap. CSA integrates information from scaffolded assemblies (e.g., Hi-C or 10X Genomics)

  • TinderMIX: Time-dose integrated modelling of toxicogenomics data.
    Gigascience (IF 5.993) Pub Date : 2020-05-25
    Angela Serra,Michele Fratello,Giusy Del Giudice,Laura Aliisa Saarimäki,Michelangelo Paci,Antonio Federico,Dario Greco

    BACKGROUND Omics technologies have been widely applied in toxicology studies to investigate the effects of different substances on exposed biological systems. A classical toxicogenomic study consists of testing the effects of a compound at different dose levels and different time points. The main challenge lies in identifying the gene alteration patterns that are correlated with doses and time points

  • parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants.
    Gigascience (IF 5.993) Pub Date : 2020-05-23
    Alessandro Petrini,Marco Mesiti,Max Schubach,Marco Frasca,Daniel Danis,Matteo Re,Giuliano Grossi,Luca Cappelletti,Tiziana Castrignanò,Peter N Robinson,Giorgio Valentini

    BACKGROUND Several prediction problems in computational biology and genomic medicine are characterized by both big data and a high imbalance between examples to be learned, whereby positive examples can represent a tiny minority with respect to negative examples. For instance, deleterious or pathogenic variants are overwhelmed by the sea of neutral variants in the non-coding regions of the genome:

  • High-quality chromosome-scale assembly of the walnut (Juglans regia L.) reference genome.
    Gigascience (IF 5.993) Pub Date : 2020-05-01
    Annarita Marrano,Monica Britton,Paulo A Zaini,Aleksey V Zimin,Rachael E Workman,Daniela Puiu,Luca Bianco,Erica Adele Di Pierro,Brian J Allen,Sandeep Chakraborty,Michela Troggio,Charles A Leslie,Winston Timp,Abhaya Dandekar,Steven L Salzberg,David B Neale

    BACKGROUND The release of the first reference genome of walnut (Juglans regia L.) enabled many achievements in the characterization of walnut genetic and functional variation. However, it is highly fragmented, preventing the integration of genetic, transcriptomic, and proteomic information to fully elucidate walnut biological processes. FINDINGS Here, we report the new chromosome-scale assembly of

  • Smash++: an alignment-free and memory-efficient tool to find genomic rearrangements.
    Gigascience (IF 5.993) Pub Date : 2020-05-01
    Morteza Hosseini,Diogo Pratas,Burkhard Morgenstern,Armando J Pinho

    BACKGROUND The development of high-throughput sequencing technologies and the resulting production of huge volumes of genomic data have accelerated biological and medical research and discovery. The study of genomic rearrangements is crucial owing to their role in chromosomal evolution, genetic disorders, and cancer. RESULTS We present Smash++, an alignment-free and memory-efficient tool to find

  • Ewastools: Infinium Human Methylation BeadChip pipeline for population epigenetics integrated into Galaxy.
    Gigascience (IF 5.993) Pub Date : 2020-05-01
    Katarzyna Murat,Björn Grüning,Paulina Wiktoria Poterlowicz,Gillian Westgate,Desmond J Tobin,Krzysztof Poterlowicz

    BACKGROUND Infinium Human Methylation BeadChip is an array platform for complex evaluation of DNA methylation at an individual CpG locus in the human genome based on Illumina's bead technology and is one of the most common techniques used in epigenome-wide association studies. Finding associations between epigenetic variation and phenotype is a significant challenge in biomedical research. The newest

  • Sequencing smart: De novo sequencing and assembly approaches for a non-model mammal.
    Gigascience (IF 5.993) Pub Date : 2020-05-01
    Graham J Etherington,Darren Heavens,David Baker,Ashleigh Lister,Rose McNelly,Gonzalo Garcia,Bernardo Clavijo,Iain Macaulay,Wilfried Haerty,Federica Di Palma

    BACKGROUND Whilst much sequencing effort has focused on key mammalian model organisms such as mouse and human, little is known about the relationship between genome sequencing techniques for non-model mammals and genome assembly quality. This is especially relevant to non-model mammals, where the samples to be sequenced are often degraded and of low quality. A key aspect when planning a genome project

  • Community standards for open cell migration data.
    Gigascience (IF 5.993) Pub Date : 2020-05-01
    Alejandra N Gonzalez-Beltran,Paola Masuzzo,Christophe Ampe,Gert-Jan Bakker,Sébastien Besson,Robert H Eibl,Peter Friedl,Matthias Gunzer,Mark Kittisopikul,Sylvia E Le Dévédec,Simone Leo,Josh Moore,Yael Paran,Jaime Prilusky,Philippe Rocca-Serra,Philippe Roudot,Marc Schuster,Gwendolien Sergeant,Staffan Strömblad,Jason R Swedlow,Merijn van Erp,Marleen Van Troys,Assaf Zaritsky,Susanna-Assunta Sansone,Lennart

    Cell migration research has become a high-content field. However, the quantitative information encapsulated in these complex and high-dimensional datasets is not fully exploited owing to the diversity of experimental protocols and non-standardized output formats. In addition, typically the datasets are not open for reuse. Making the data open and Findable, Accessible, Interoperable, and Reusable (FAIR)

  • Biospytial: spatial graph-based computing for ecological Big Data.
    Gigascience (IF 5.993) Pub Date : 2020-05-01
    Juan M Escamilla Molgora,Luigi Sedda,Peter M Atkinson

    BACKGROUND The exponential accumulation of environmental and ecological data together with the adoption of open data initiatives bring opportunities and challenges for integrating and synthesising relevant knowledge that need to be addressed, given the ongoing environmental crises. FINDINGS Here we present Biospytial, a modular open source knowledge engine designed to import, organise, analyse and

  • Global ocean resistome revealed: Exploring antibiotic resistance gene abundance and distribution in TARA Oceans samples.
    Gigascience (IF 5.993) Pub Date : 2020-05-01
    Rafael R C Cuadrat,Maria Sorokina,Bruno G Andrade,Tobias Goris,Alberto M R Dávila

    BACKGROUND The rise of antibiotic resistance (AR) in clinical settings is of great concern. Therefore, the understanding of AR mechanisms, evolution, and global distribution is a priority for patient survival. Despite all efforts in the elucidation of AR mechanisms in clinical strains, little is known about its prevalence and evolution in environmental microorganisms. We used 293 metagenomic samples

  • MaRe: Processing Big Data with application containers on Apache Spark.
    Gigascience (IF 5.993) Pub Date : 2020-05-01
    Marco Capuccini,Martin Dahlö,Salman Toor,Ola Spjuth

    BACKGROUND Life science is increasingly driven by Big Data analytics, and the MapReduce programming model has been proven successful for data-intensive analyses. However, current MapReduce frameworks offer poor support for reusing existing processing tools in bioinformatics pipelines. Furthermore, these frameworks do not have native support for application containers, which are becoming popular in

  • Antibiotic resistomes discovered in the gut microbiomes of Korean swine and cattle.
    Gigascience (IF 5.993) Pub Date : 2020-05-01
    Suk-Kyung Lim,Dongjun Kim,Dong-Chan Moon,Youna Cho,Mina Rho

    BACKGROUND Antibiotics administered to farm animals have led to increasing prevalence of resistance genes in different microbiomes and environments. While antibiotic treatments help cure infectious diseases in farm animals, the possibility of spreading antibiotic resistance genes into the environment and human microbiomes raises significant concerns. Through long-term evolution, antibiotic resistance

  • Multi-dimensional machine learning approaches for fruit shape phenotyping in strawberry.
    Gigascience (IF 5.993) Pub Date : 2020-05-01
    Mitchell J Feldmann,Michael A Hardigan,Randi A Famula,Cindy M López,Amy Tabb,Glenn S Cole,Steven J Knapp

    BACKGROUND Shape is a critical element of the visual appeal of strawberry fruit and is influenced by both genetic and non-genetic determinants. Current fruit phenotyping approaches for external characteristics in strawberry often rely on the human eye to make categorical assessments. However, fruit shape is an inherently multi-dimensional, continuously variable trait and not adequately described by

  • The gene-rich genome of the scallop Pecten maximus.
    Gigascience (IF 5.993) Pub Date : 2020-05-01
    Nathan J Kenny,Shane A McCarthy,Olga Dudchenko,Katherine James,Emma Betteridge,Craig Corton,Jale Dolucan,Dan Mead,Karen Oliver,Arina D Omer,Sarah Pelan,Yan Ryan,Ying Sims,Jason Skelton,Michelle Smith,James Torrance,David Weisz,Anil Wipat,Erez L Aiden,Kerstin Howe,Suzanne T Williams

    BACKGROUND The king scallop, Pecten maximus, is distributed in shallow waters along the Atlantic coast of Europe. It forms the basis of a valuable commercial fishery and plays a key role in coastal ecosystems and food webs. Like other filter-feeding bivalves, it can accumulate potent phytotoxins, to which it has evolved some immunity. The molecular origins of this immunity are of interest to evolutionary

  • Expression of concern: Dissection of soybean populations according to selection signatures based on whole-genome sequences.
    Gigascience (IF 5.993) Pub Date : 2020-05-01
    Jae-Yoon Kim,Seongmun Jeong,Kyoung Hyoun Kim,Won-Jun Lim,Ho-Yeon Lee,Namhee Jeong,Jung-Kyung Moon,Namshin Kim

    In 2019, GigaScience published the article “Dissection of soybean populations according to selection signatures based on whole-genome sequences,” by Jae-Yoon Kim and colleagues: https://doi.org/10.1093/gigascience/giz151. In January 2020, the editors received credible information about attribution problems with the source of the data in the article. In an attempt to address the concerns raised, the

  • Chromosome-level reference genome of the jellyfish Rhopilema esculentum.
    Gigascience (IF 5.993) Pub Date : 2020-04-01
    Yunfeng Li,Lei Gao,Yongjia Pan,Meilin Tian,Yulong Li,Chongbo He,Ying Dong,Yamin Sun,Zunchun Zhou

    BACKGROUND Jellyfish belong to the phylum Cnidaria, which occupies an important phylogenetic location in the early-branching Metazoa lineages. The jellyfish Rhopilema esculentum is an important fishery resource in China. However, the genome resource of R. esculentum has not been reported to date. FINDINGS In this study, we constructed a chromosome-level genome assembly of R. esculentum using Pacific

  • SRPRISM (Single Read Paired Read Indel Substitution Minimizer): an efficient aligner for assemblies with explicit guarantees.
    Gigascience (IF 5.993) Pub Date : 2020-04-01
    Aleksandr Morgulis,Richa Agarwala

    BACKGROUND Alignment of sequence reads generated by next-generation sequencing is an integral part of most pipelines analyzing next-generation sequencing data. A number of tools designed to quickly align a large volume of sequences are already available. However, most existing tools lack explicit guarantees about their output. They also do not support searching genome assemblies, such as the human

  • GSuite HyperBrowser: integrative analysis of dataset collections across the genome and epigenome.
    Gigascience (IF 5.993) Pub Date : 2017-04-27
    Boris Simovski,Daniel Vodák,Sveinung Gundersen,Diana Domanska,Abdulrahman Azab,Lars Holden,Marit Holden,Ivar Grytten,Knut Rand,Finn Drabløs,Morten Johansen,Antonio Mora,Christin Lund-Andersen,Bastian Fromm,Ragnhild Eskeland,Odd Stokke Gabrielsen,Egil Ferkingstad,Sigve Nakken,Mads Bengtsen,Alexander Johan Nederbragt,Hildur Sif Thorarensen,Johannes Andreas Akse,Ingrid Glad,Eivind Hovig,Geir Kjetil Sandve

    Background Recent large-scale undertakings such as ENCODE and Roadmap Epigenomics have generated experimental data mapped to the human reference genome (as genomic tracks) representing a variety of functional elements across a large number of cell types. Despite the high potential value of these publicly available data for a broad variety of investigations, little attention has been given to the analytical

  • Calculating the quality of public high-throughput sequencing data to obtain a suitable subset for reanalysis from the Sequence Read Archive.
    Gigascience (IF 5.993) Pub Date : 2017-06-01
    Tazro Ohta,Takeru Nakazato,Hidemasa Bono

    It is important for public data repositories to promote the reuse of archived data. In the growing field of omics science, however, the increasing number of submissions of high-throughput sequencing (HTSeq) data to public repositories prevents users from choosing a suitable data set from among the large number of search results. Repository users need to be able to set a threshold to reduce the number

  • Species-level evaluation of the human respiratory microbiome.
    Gigascience (IF 5.993) Pub Date : 2020-04-01
    Olufunmilola Ibironke,Lora R McGuinness,Shou-En Lu,Yaquan Wang,Sabiha Hussain,Clifford P Weisel,Lee J Kerkhof

    BACKGROUND Changes to the human respiratory tract microbiome may contribute significantly to the progression of respiratory diseases. However, there are few studies examining the relative abundance of microbial communities at the species level along the human respiratory tract. FINDINGS Bronchoalveolar lavage, throat swab, mouth rinse, and nasal swab samples were collected from 5 participants. Bacterial

  • Artifact-free whole-slide imaging with structured illumination microscopy and Bayesian image reconstruction.
    Gigascience (IF 5.993) Pub Date : 2020-04-01
    Karl A Johnson,Guy M Hagen

    BACKGROUND Structured illumination microscopy (SIM) is a method that can be used to image biological samples and can achieve both optical sectioning and super-resolution effects. Optimization of the imaging set-up and data-processing methods results in high-quality images without artifacts due to mosaicking or due to the use of SIM methods. Reconstruction methods based on Bayesian estimation can be

  • Laniakea: an open solution to provide Galaxy “on-demand” instances over heterogeneous cloud infrastructures
    Gigascience (IF 5.993) Pub Date : 2020-04-06
    Marco Antonio Tangaro; Giacinto Donvito; Marica Antonacci; Matteo Chiara; Pietro Mandreoli; Graziano Pesole; Federico Zambelli

    While the popular workflow manager Galaxy is currently made available through several publicly accessible servers, there are scenarios where users can be better served by full administrative control over a private Galaxy instance, including, but not limited to, concerns about data privacy, customisation needs, prioritisation of particular job types, tools development, and training activities. In such

  • ShinyLearner: A containerized benchmarking tool for machine-learning classification of tabular data
    Gigascience (IF 5.993) Pub Date : 2020-04-06
    Stephen R Piccolo; Terry J Lee; Erica Suh; Kimball Hill

    Classification algorithms assign observations to groups based on patterns in data. The machine-learning community have developed myriad classification algorithms, which are used in diverse life science research domains. Algorithm choice can affect classification accuracy dramatically, so it is crucial that researchers optimize the choice of which algorithm(s) to apply in a given research domain on

  • Continuous chromosome-scale haplotypes assembled from a single interspecies F1 hybrid of yak and cattle
    Gigascience (IF 5.993) Pub Date : 2020-04-03
    Edward S Rice; Sergey Koren; Arang Rhie; Michael P Heaton; Theodore S Kalbfleisch; Timothy Hardy; Peter H Hackett; Derek M Bickhart; Benjamin D Rosen; Brian Vander Ley; Nicholas W Maurer; Richard E Green; Adam M Phillippy; Jessica L Petersen; Timothy P L Smith

    The development of trio binning as an approach for assembling diploid genomes has enabled the creation of fully haplotype-resolved reference genomes. Unlike other methods of assembly for diploid genomes, this approach is enhanced, rather than hindered, by the heterozygosity of the individual sequenced. To maximize heterozygosity and simultaneously assemble reference genomes for 2 species, we applied

  • Canfam_GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping, and Hi-C
    Gigascience (IF 5.993) Pub Date : 2020-04-01
    Matt A Field; Benjamin D Rosen; Olga Dudchenko; Eva K F Chan; Andre E Minoche; Richard J Edwards; Kirston Barton; Ruth J Lyons; Daniel Enosi Tuipulotu; Vanessa M Hayes; Arina D Omer; Zane Colaric; Jens Keilwagen; Ksenia Skvortsova; Ozren Bogdanovic; Martin A Smith; Erez Lieberman Aiden; Timothy P L Smith; Robert A Zammit; J William O Ballard

    The German Shepherd Dog (GSD) is one of the most common breeds on earth and has been bred for its utility and intelligence. It is often the first choice for police and military work, as well as protection, disability assistance, and search-and-rescue. Yet, GSDs are well known to be susceptible to a range of genetic diseases that can interfere with their training. Such diseases are of particular concern

  • Binning unassembled short reads based on k-mer abundance covariance using sparse coding.
    Gigascience (IF 5.993) Pub Date : 2020-04-01
    Olexiy Kyrgyzov, Vincent Prost, Stéphane Gazut, Bruno Farcy, Thomas Brüls

    BACKGROUND Sequence-binning techniques enable the recovery of an increasing number of genomes from complex microbial metagenomes and typically require prior metagenome assembly, incurring the computational cost and drawbacks of the latter, e.g., biases against low-abundance genomes and inability to conveniently assemble multi-terabyte datasets. RESULTS We present here a scalable pre-assembly binning

  • Multi-omics Visualization Platform: An extensible Galaxy plug-in for multi-omics data visualization and exploration
    Gigascience (IF 5.993) Pub Date : 2020-03-28
    Thomas McGowan; James E Johnson; Praveen Kumar; Ray Sajulga; Subina Mehta; Pratik D Jagtap; Timothy J Griffin

    Proteogenomics integrates genomics, transcriptomics, and mass spectrometry (MS)-based proteomics data to identify novel protein sequences arising from gene and transcript sequence variants. Proteogenomic data analysis requires integration of disparate ‘omic software tools, as well as customized tools to view and interpret results. The flexible Galaxy platform has proven valuable for proteogenomic data

  • Correction to: High-coverage genomes to elucidate the evolution of penguins
    Gigascience (IF 5.993) Pub Date : 2020-03-19
    Pan H, Cole T, Bi X, et al.

    This is a correction to: GigaScience, Volume 8, Issue 9, September 2019, giz117, https://doi.org/10.1093/gigascience/giz117

  • De novo assembly of the cattle reference genome with single-molecule sequencing.
    Gigascience (IF 5.993) Pub Date : 2020-03-01
    Benjamin D Rosen, Derek M Bickhart, Robert D Schnabel, Sergey Koren, Christine G Elsik, Elizabeth Tseng, Troy N Rowan, Wai Y Low, Aleksey Zimin, Christine Couldrey, Richard Hall, Wenli Li, Arang Rhie, Jay Ghurye, Stephanie D McKay, Françoise Thibaud-Nissen, Jinna Hoffman, Brenda M Murdoch, Warren M Snelling, Tara G McDaneld, John A Hammond, John C Schwartz, Wilson Nandolo, Darren E Hagen, Christian Dreischer, Sebastian J Schultheiss

    BACKGROUND Major advances in selection progress for cattle have been made following the introduction of genomic tools over the past 10-12 years. These tools depend upon the Bos taurus reference genome (UMD3.1.1), which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies. RESULTS We present the new reference genome for cattle, ARS-UCD1.2, based on

  • Chromosome-level genome assembly of Aldrichina grahami, a forensically important blowfly.
    Gigascience (IF 5.993) Pub Date : 2020-03-01
    Fanming Meng, Zhuoying Liu, Han Han, Dmitrijs Finkelbergs, Yangshuai Jiang, Mingfei Zhu, Yang Wang, Zongyi Sun, Chao Chen, Yadong Guo, Jifeng Cai

    BACKGROUND Blowflies (Diptera: Calliphoridae) are the most commonly found entomological evidence in forensic investigations. Unlike other blowflies, Aldrichina grahami has some unique biological characteristics and is a species of forensic importance. Its development rate, pattern, and life cycle can provide valuable information for the estimation of the minimum postmortem interval. FINDINGS

  • *-DCC: A platform to collect, annotate, and explore a large variety of sequencing experiments.
    Gigascience (IF 5.993) Pub Date : 2020-03-01
    Matthias Hörtenhuber, Abdul K Mukarram, Marcus H Stoiber, James B Brown, Carsten O Daub

    BACKGROUND Over the past few years, the variety of experimental designs and protocols for sequencing experiments has increased greatly. To ensure the wide usability of the produced data beyond an individual project, rich and systematic annotation of the underlying experiments is crucial. FINDINGS We first developed an annotation structure that captures the overall experimental design as well as the relevant

  • PEMA: a flexible Pipeline for Environmental DNA Metabarcoding Analysis of the 16S/18S ribosomal RNA, ITS, and COI marker genes.
    Gigascience (IF 5.993) Pub Date : 2020-03-01
    Haris Zafeiropoulos, Ha Quoc Viet, Katerina Vasileiadou, Antonis Potirakis, Christos Arvanitidis, Pantelis Topalis, Christina Pavloudi, Evangelos Pafilis

    BACKGROUND Environmental DNA and metabarcoding allow the identification of a mixture of species and launch a new era in bio- and eco-assessment. Many steps are required to obtain taxonomically assigned matrices from raw data. For most of these, a plethora of tools are available; each tool's execution parameters need to be tailored to reflect each experiment's idiosyncrasy. Adding to this complexity

  • An image dataset related to automated macrophage detection in immunostained lymphoma tissue samples.
    Gigascience (IF 5.993) Pub Date : 2020-03-01
    Marcus Wagner, Sarah Reinke, René Hänsel, Wolfram Klapper, Ulf-Dietrich Braumann

    BACKGROUND We present an image dataset related to automated segmentation and counting of macrophages in diffuse large B-cell lymphoma (DLBCL) tissue sections. For the classification of DLBCL subtypes, as well as for providing a prognosis of the clinical outcome, the analysis of the tumor microenvironment and, particularly, of the different types and functions of tumor-associated macrophages is indispensable

  • Interpretable and accurate prediction models for metagenomics data.
    Gigascience (IF 5.993) Pub Date : 2020-03-01
    Edi Prifti, Yann Chevaleyre, Blaise Hanczar, Eugeni Belda, Antoine Danchin, Karine Clément, Jean-Daniel Zucker

    BACKGROUND Microbiome biomarker discovery for patient diagnosis, prognosis, and risk evaluation is attracting broad interest. Selected groups of microbial features provide signatures that characterize host disease states such as cancer or cardio-metabolic diseases. Yet, the current predictive models stemming from machine learning still behave as black boxes and seldom generalize well. Their interpretation

  • Chromosome-level genome assembly and annotation of the loquat (Eriobotrya japonica) genome.
    Gigascience (IF 5.993) Pub Date : 2020-03-01
    Shuang Jiang, Haishan An, Fangjie Xu, Xueying Zhang

    BACKGROUND The loquat (Eriobotrya japonica) is a species of flowering plant in the family Rosaceae that is widely cultivated in Asian, European, and African countries. It blossoms in the winter and ripens in the early summer. To date, the loquat genome has not been published, which limits the study of molecular biology in this cultivated species. Here, we used the third-generation sequencing technology

  • Introgression of Eastern Chinese and Southern Chinese haplotypes contributes to the improvement of fertility and immunity in European modern pigs.
    Gigascience (IF 5.993) Pub Date : 2020-03-01
    Hao Chen, Min Huang, Bin Yang, Zhongping Wu, Zheng Deng, Yong Hou, Jun Ren, Lusheng Huang

    BACKGROUND Pigs were domesticated independently from European and Asian wild boars nearly 10,000 years ago. Chinese indigenous pigs were historically introduced to improve local European pigs. However, the geographic origin and biological functions of introgressed Chinese genes in modern European pig breeds remain largely unknown. RESULTS Here we explored whole-genome sequencing data from 266 Eurasian

  • Light-responsive expression atlas reveals the effects of light quality and intensity in Kalanchoë fedtschenkoi, a plant with crassulacean acid metabolism.
    Gigascience (IF 5.993) Pub Date : 2020-03-01
    Jin Zhang, Rongbin Hu, Avinash Sreedasyam, Travis M Garcia, Anna Lipzen, Mei Wang, Pradeep Yerramsetty, Degao Liu, Vivian Ng, Jeremy Schmutz, John C Cushman, Anne M Borland, Asher Pasha, Nicholas J Provart, Jin-Gui Chen, Wellington Muchero, Gerald A Tuskan, Xiaohan Yang

    BACKGROUND Crassulacean acid metabolism (CAM), a specialized mode of photosynthesis, enables plant adaptation to water-limited environments and improves photosynthetic efficiency via an inorganic carbon-concentrating mechanism. Kalanchoë fedtschenkoi is an obligate CAM model featuring a relatively small genome and easy stable transformation. However, the molecular responses to light quality and intensity

  • DeepPod: a convolutional neural network based quantification of fruit number in Arabidopsis.
    Gigascience (IF 5.993) Pub Date : 2020-03-01
    Azam Hamidinekoo, Gina A Garzón-Martínez, Morteza Ghahremani, Fiona M K Corke, Reyer Zwiggelaar, John H Doonan, Chuan Lu

    BACKGROUND High-throughput phenotyping based on non-destructive imaging has great potential in plant biology and breeding programs. However, efficient feature extraction and quantification from image data remains a bottleneck that needs to be addressed. Advances in sensor technology have led to the increasing use of imaging to monitor and measure a range of plants including the model Arabidopsis thaliana

  • A novel method for detecting morphologically similar crops and weeds based on the combination of contour masks and filtered Local Binary Pattern operators.
    Gigascience (IF 5.993) Pub Date : 2020-03-01
    Vi Nguyen Thanh Le, Selam Ahderom, Beniamin Apopei, Kamal Alameh

    BACKGROUND Weeds are a major cause of low agricultural productivity. Some weeds have morphological features similar to crops, making them difficult to discriminate. RESULTS We propose a novel method using a combination of filtered features extracted by combined Local Binary Pattern operators and features extracted by plant-leaf contour masks to improve the discrimination rate between broadleaf plants

