  • Understanding the impact of preprocessing pipelines on neuroimaging cortical surface analyses
    Gigascience (IF 5.993) Pub Date : 2021-01-22
    Bhagwat N, Barry A, Dickie E, et al.

    Background: The choice of preprocessing pipeline introduces variability in neuroimaging analyses that affects the reproducibility of scientific findings. Features derived from structural and functional MRI data are sensitive to the algorithmic or parametric differences of preprocessing tasks, such as image normalization, registration, and segmentation to name a few. Therefore it is critical

  • Multiple origins of a frameshift insertion in a mitochondrial gene in birds and turtles
    Gigascience (IF 5.993) Pub Date : 2021-01-19
    Andreu-Sánchez S, Chen W, Stiller J, et al.

    Background: During evolutionary history, molecular mechanisms have emerged to cope with deleterious mutations. Frameshift insertions in protein-coding sequences are extremely rare because they disrupt the reading frame. There are a few known examples of their correction through translational frameshifting, a process that enables ribosomes to skip nucleotides during translation to regain proper

  • A chromosome-level genome assembly of the oriental river prawn, Macrobrachium nipponense
    Gigascience (IF 5.993) Pub Date : 2021-01-18
    Jin S, Bian C, Jiang S, et al.

    Background: The oriental river prawn, Macrobrachium nipponense, is an economically important shrimp in China. Male prawns have higher commercial value than females because the former grow faster and reach larger sizes. It is therefore important to reveal sex-differentiation and development mechanisms of the oriental river prawn to enable genetic improvement. Results: We sequenced 293.3 Gb of raw

  • BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters
    Gigascience (IF 5.993) Pub Date : 2021-01-13
    Kautsar S, van der Hooft J, de Ridder D, et al.

    Background: Genome mining for biosynthetic gene clusters (BGCs) has become an integral part of natural product discovery. The >200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched

  • Streamlining data-intensive biology with workflow systems
    Gigascience (IF 5.993) Pub Date : 2021-01-13
    Reiter T, Brooks P, Irber L, et al.

    As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. These workflows can produce hundreds to thousands of intermediate

  • Genome diversity in Ukraine
    Gigascience (IF 5.993) Pub Date : 2021-01-13
    Taras K Oleksyk; Walter W Wolfsberger; Alexandra M Weber; Khrystyna Shchubelka; Olga T Oleksyk; Olga Levchuk; Alla Patrus; Nelya Lazar; Stephanie O Castro-Marquez; Yaroslava Hasynets; Patricia Boldyzhar; Mikhailo Neymet; Alina Urbanovych; Viktoriya Stakhovska; Kateryna Malyar; Svitlana Chervyakova; Olena Podoroha; Natalia Kovalchuk; Juan L Rodriguez-Flores; Weichen Zhou; Sarah Medley; Fabia Battistuzzi;

    The main goal of this collaborative effort is to provide genome-wide data for the previously underrepresented population in Eastern Europe, and to provide cross-validation of the data from genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from consented individuals representing major regions of Ukraine that were consented

  • Identifying the effect of vancomycin on health care–associated methicillin-resistant Staphylococcus aureus strains using bacteriological and physiological media
    Gigascience (IF 5.993) Pub Date : 2021-01-09
    Rajput A, Poudel S, Tsunemoto H, et al.

    Background: The evolving antibiotic-resistant behavior of health care–associated methicillin-resistant Staphylococcus aureus (HA-MRSA) USA100 strains is of major concern. They are resistant to a broad class of antibiotics such as macrolides, aminoglycosides, fluoroquinolones, and many more. Findings: The selection of appropriate antibiotic susceptibility examination media is very important. Thus

  • Significantly improving the quality of genome assemblies through curation
    Gigascience (IF 5.993) Pub Date : 2021-01-09
    Howe K, Chow W, Collins J, et al.

    Genome sequence assemblies provide the basis for our understanding of biology. Generating error-free assemblies is therefore the ultimate, but sadly still unachieved goal of a multitude of research projects. Despite the ever-advancing improvements in data generation, assembly algorithms and pipelines, no automated approach has so far reliably generated near error-free genome assemblies for

  • PM4NGS, a project management framework for next-generation sequencing data analysis
    Gigascience (IF 5.993) Pub Date : 2021-01-07
    Vera Alvarez R, Pongor L, Mariño-Ramírez L, et al.

    Background: FAIR (Findability, Accessibility, Interoperability, and Reusability) next-generation sequencing (NGS) data analysis relies on complex computational biology workflows and pipelines to guarantee reproducibility, portability, and scalability. Moreover, workflow languages, managers, and container technologies have helped address the problem of data analysis pipeline execution across

  • Chromosome-level reference genome of the European wasp spider Argiope bruennichi: a resource for studies on range expansion and evolutionary adaptation
    Gigascience (IF 5.993) Pub Date : 2021-01-07
    Sheffer M, Hoppe A, Krehenwinkel H, et al.

    Background: Argiope bruennichi, the European wasp spider, has been investigated intensively as a focal species for studies on sexual selection, chemical communication, and the dynamics of rapid range expansion at a behavioral and genetic level. However, the lack of a reference genome has limited insights into the genetic basis for these phenomena. Therefore, we assembled a high-quality chromosome-level

  • A new duck genome reveals conserved and convergently evolved chromosome architectures of birds and mammals
    Gigascience (IF 5.993) Pub Date : 2021-01-06
    Jing Li; Jilin Zhang; Jing Liu; Yang Zhou; Cheng Cai; Luohao Xu; Xuelei Dai; Shaohong Feng; Chunxue Guo; Jinpeng Rao; Kai Wei; Erich D Jarvis; Yu Jiang; Zhengkui Zhou; Guojie Zhang; Qi Zhou

    Ducks have a typical avian karyotype that consists of macro- and microchromosomes, but a pair of much less differentiated ZW sex chromosomes compared to chickens. To elucidate the evolution of chromosome architectures between ducks and chickens, and between birds and mammals, we produced a nearly complete chromosomal assembly of a female Pekin duck by combining long-read sequencing and multiplatform

  • Tool recommender system in Galaxy using deep learning
    Gigascience (IF 5.993) Pub Date : 2021-01-06
    Kumar A, Rasche H, Grüning B, et al.

    Background: Galaxy is a web-based and open-source scientific data-processing platform. Researchers compose pipelines in Galaxy to analyse scientific data. These pipelines, also known as workflows, can be complex and difficult to create from thousands of tools, especially for researchers new to Galaxy. To help researchers with creating workflows, a system is developed to recommend tools that

  • GALLO: An R package for genomic annotation and integration of multiple data sources in livestock for positional candidate loci
    Gigascience (IF 5.993) Pub Date : 2020-12-30
    Fonseca P, Suárez-Vega A, Marras G, et al.

    Background: The development of high-throughput sequencing and genotyping methodologies has enabled the identification of thousands of genomic regions associated with several complex traits. The integration of multiple sources of biological information is a crucial step required to better understand patterns regulating the development of these traits. Findings: Genomic Annotation in Livestock for

  • SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data
    Gigascience (IF 5.993) Pub Date : 2020-12-26
    Matthew D Young; Sam Behjati

    Droplet-based single-cell RNA sequence analyses assume that all acquired RNAs are endogenous to cells. However, any cell-free RNAs contained within the input solution are also captured by these assays. This sequencing of cell-free RNA constitutes a background contamination that confounds the biological interpretation of single-cell transcriptomic data.

  • Toward a scalable framework for reproducible processing of volumetric, nanoscale neuroimaging datasets
    Gigascience (IF 5.993) Pub Date : 2020-12-21
    Erik C Johnson; Miller Wilt; Luis M Rodriguez; Raphael Norman-Tenazas; Corban Rivera; Nathan Drenkow; Dean Kleissas; Theodore J LaGrow; Hannah P Cowley; Joseph Downs; Jordan K. Matelsky; Marisa J. Hughes; Elizabeth P. Reilly; Brock A. Wester; Eva L. Dyer; Konrad P. Kording; William R. Gray-Roncal

    Emerging neuroimaging datasets (collected with imaging techniques such as electron microscopy, optical microscopy, or X-ray microtomography) describe the location and properties of neurons and their connections at unprecedented scale, promising new ways of understanding the brain. These modern imaging techniques used to interrogate the brain can quickly accumulate gigabytes to petabytes of structural

  • Parliament2: Accurate structural variant calling at scale
    Gigascience (IF 5.993) Pub Date : 2020-12-21
    Samantha Zarate; Andrew Carroll; Medhat Mahmoud; Olga Krasheninina; Goo Jun; William J Salerno; Michael C Schatz; Eric Boerwinkle; Richard A Gibbs; Fritz J Sedlazeck

    Structural variants (SVs) are critical contributors to genetic diversity and genomic disease. To predict the phenotypic impact of SVs, there is a need for better estimates of both the occurrence and frequency of SVs, preferably from large, ethnically diverse cohorts. Thus, the current standard approach requires the use of short paired-end reads, which remain challenging to detect, especially at the

  • Identification of a differentiation stall in epithelial mesenchymal transition in histone H3–mutant diffuse midline glioma
    Gigascience (IF 5.993) Pub Date : 2020-12-15
    Lauren M Sanders; Allison Cheney; Lucas Seninge; Anouk van den Bout; Marissa Chen; Holly C Beale; Ellen Towle Kephart; Jacob Pfeil; Katrina Learned; A Geoffrey Lyle; Isabel Bjork; David Haussler; Sofie R Salama; Olena M Vaske

    Diffuse midline gliomas with histone H3 K27M (H3K27M) mutations occur in early childhood and are marked by an invasive phenotype and global decrease in H3K27me3, an epigenetic mark that regulates differentiation and development. H3K27M mutation timing and effect on early embryonic brain development are not fully characterized.

  • High-quality chromosome-level genome assembly and full-length transcriptome analysis of the pharaoh ant Monomorium pharaonis
    Gigascience (IF 5.993) Pub Date : 2020-12-15
    Qionghua Gao; Zijun Xiong; Rasmus Stenbak Larsen; Long Zhou; Jie Zhao; Guo Ding; Ruoping Zhao; Chengyuan Liu; Hao Ran; Guojie Zhang

    Ants with complex societies have fascinated scientists for centuries. Comparative genomic and transcriptomic analyses across ant species and castes have revealed important insights into the molecular mechanisms underlying ant caste differentiation. However, most current ant genomes and transcriptomes are highly fragmented and incomplete, which hinders our understanding of the molecular basis for complex

  • Long-read assembly of the Brassica napus reference genome Darmor-bzh
    Gigascience (IF 5.993) Pub Date : 2020-12-15
    Mathieu Rousseau-Gueutin; Caroline Belser; Corinne Da Silva; Gautier Richard; Benjamin Istace; Corinne Cruaud; Cyril Falentin; Franz Boideau; Julien Boutte; Regine Delourme; Gwenaëlle Deniot; Stefan Engelen; Julie Ferreira de Carvalho; Arnaud Lemainque; Loeiz Maillet; Jérôme Morice; Patrick Wincker; France Denoeud; Anne-Marie Chèvre; Jean-Marc Aury

    The combination of long reads and long-range information to produce genome assemblies is now accepted as a common standard. This strategy not only allows access to the gene catalogue of a given species but also reveals the architecture and organization of chromosomes, including complex regions such as telomeres and centromeres. The Brassica genus is not exempt, and many assemblies based on long reads

  • Genome sequencing of deep-sea hydrothermal vent snails reveals adaptions to extreme environments
    Gigascience (IF 5.993) Pub Date : 2020-12-15
    Xiang Zeng; Yaolei Zhang; Lingfeng Meng; Guangyi Fan; Jie Bai; Jianwei Chen; Yue Song; Inge Seim; Congyan Wang; Zenghua Shao; Nanxi Liu; Haorong Lu; Xiaoteng Fu; Liping Wang; Xin Liu; Shanshan Liu; Zongze Shao

    The scaly-foot snail (Chrysomallon squamiferum) is highly adapted to deep-sea hydrothermal vents and has drawn much interest since its discovery. However, the limited information on its genome has impeded further related research and understanding of its adaptation to deep-sea hydrothermal vents.

  • Making experimental data tables in the life sciences more FAIR: a pragmatic approach
    Gigascience (IF 5.993) Pub Date : 2020-12-15
    Daniel Jacob; Romain David; Sophie Aubin; Yves Gibon

    Making data compliant with the FAIR Data principles (Findable, Accessible, Interoperable, Reusable) is still a challenge for many researchers, who are not sure which criteria should be met first and how. Illustrated with experimental data tables associated with a Design of Experiments, we propose an approach that can serve as a model for research data management that allows researchers to disseminate

  • Comparison of the two up-to-date sequencing technologies for genome assembly: HiFi reads of Pacific Biosciences Sequel II system and ultralong reads of Oxford Nanopore
    Gigascience (IF 5.993) Pub Date : 2020-12-15
    Dandan Lang; Shilai Zhang; Pingping Ren; Fan Liang; Zongyi Sun; Guanliang Meng; Yuntao Tan; Xiaokang Li; Qihua Lai; Lingling Han; Depeng Wang; Fengyi Hu; Wen Wang; Shanlin Liu

    The availability of reference genomes has revolutionized the study of biology. Multiple competing technologies have been developed to improve the quality and robustness of genome assemblies during the past decade. The 2 widely used long-read sequencing providers—Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT)—have recently updated their platforms: PacBio enables high-throughput

  • DrivAER: Identification of driving transcriptional programs in single-cell RNA sequencing data
    Gigascience (IF 5.993) Pub Date : 2020-12-10
    Lukas M Simon; Fangfang Yan; Zhongming Zhao

    Single-cell RNA sequencing (scRNA-seq) unfolds complex transcriptomic datasets into detailed cellular maps. Despite recent success, there is a pressing need for specialized methods tailored towards the functional interpretation of these cellular maps.

  • Accurate assembly of the olive baboon (Papio anubis) genome using long-read and Hi-C data
    Gigascience (IF 5.993) Pub Date : 2020-12-07
    Sanjit Singh Batra; Michal Levy-Sakin; Jacqueline Robinson; Joseph Guillory; Steffen Durinck; Tauras P Vilgalys; Pui-Yan Kwok; Laura A Cox; Somasekar Seshagiri; Yun S Song; Jeffrey D Wall

    Baboons are a widely used nonhuman primate model for biomedical, evolutionary, and basic genetics research. Despite this importance, the genomic resources for baboons are limited. In particular, the current baboon reference genome Panu_3.0 is a highly fragmented, reference-guided (i.e., not fully de novo) assembly, and its poor quality inhibits our ability to conduct downstream genomic analyses.

  • Trajectories, bifurcations, and pseudo-time in large clinical datasets: applications to myocardial infarction and diabetes data
    Gigascience (IF 5.993) Pub Date : 2020-11-25
    Sergey E Golovenkin; Jonathan Bac; Alexander Chervov; Evgeny M Mirkes; Yuliya V Orlova; Emmanuel Barillot; Alexander N Gorban; Andrei Zinovyev

    Large observational clinical datasets are becoming increasingly available for mining associations between various disease traits and administered therapy. These datasets can be considered as representations of the landscape of all possible disease conditions, in which a concrete disease state develops through stereotypical routes, characterized by “points of no return” and “final states” (such as lethal

  • A methodological approach to correlate tumor heterogeneity with drug distribution profile in mass spectrometry imaging data
    Gigascience (IF 5.993) Pub Date : 2020-11-25
    Mridula Prasad; Geert Postma; Pietro Franceschi; Lavinia Morosi; Silvia Giordano; Francesca Falcetta; Raffaella Giavazzi; Enrico Davoli; Lutgarde M C Buydens; Jeroen Jansen

    Drug mass spectrometry imaging (MSI) data contain knowledge about drug and several other molecular ions present in a biological sample. However, a proper approach to fully explore the potential of such type of data is still missing. Therefore, a computational pipeline that combines different spatial and non-spatial methods is proposed to link the observed drug distribution profile with tumor heterogeneity

  • Chromosomal genome of Triplophysa bleekeri provides insights into its evolution and environmental adaptation
    Gigascience (IF 5.993) Pub Date : 2020-11-24
    Dengyue Yuan; Xuehui Chen; Haoran Gu; Ming Zou; Yu Zou; Jian Fang; Wenjing Tao; Xiangyan Dai; Shijun Xiao; Zhijian Wang

    Intense stresses caused by high-altitude environments may result in noticeable genetic adaptions in native species. Studies of genetic adaptations to high elevations have been largely limited to terrestrial animals. How fish adapt to high-elevation environments is largely unknown. Triplophysa bleekeri, an endemic fish inhabiting high-altitude regions, is an excellent model to investigate the genetic

  • SSNOMBACTER: A collection of scattering-type scanning near-field optical microscopy and atomic force microscopy images of bacterial cells
    Gigascience (IF 5.993) Pub Date : 2020-11-24
    Massimiliano Lucidi; Denis E Tranca; Lorenzo Nichele; Devrim Ünay; George A Stanciu; Paolo Visca; Alina Maria Holban; Radu Hristu; Gabriella Cincotti; Stefan G Stanciu

    In recent years, a variety of imaging techniques operating at nanoscale resolution have been reported. These techniques have the potential to enrich our understanding of bacterial species relevant to human health, such as antibiotic-resistant pathogens. However, owing to the novelty of these techniques, their use is still confined to addressing very particular applications, and their availability is

  • Localized effect of treated wastewater effluent on the resistome of an urban watershed
    Gigascience (IF 5.993) Pub Date : 2020-11-19
    Christopher N Thornton; Windy D Tanner; James A VanDerslice; William J Brazelton

    Wastewater treatment is an essential tool for maintaining water quality in urban environments. While the treatment of wastewater can remove most bacterial cells, some will inevitably survive treatment to be released into natural environments. Previous studies have investigated antibiotic resistance within wastewater treatment plants, but few studies have explored how a river’s complete set of antibiotic

  • GigaSOM.jl: High-performance clustering and visualization of huge cytometry datasets
    Gigascience (IF 5.993) Pub Date : 2020-11-18
    Miroslav Kratochvíl; Oliver Hunewald; Laurent Heirendt; Vasco Verissimo; Jiří Vondrášek; Venkata P Satagopam; Reinhard Schneider; Christophe Trefois; Markus Ollert

    The amount of data generated in large clinical and phenotyping studies that use single-cell cytometry is constantly growing. Recent technological advances allow the easy generation of data with hundreds of millions of single-cell data points with >40 parameters, originating from thousands of individual samples. The analysis of that amount of high-dimensional data becomes demanding in both hardware

  • Unifying package managers, workflow engines, and containers: Computational reproducibility with BioNix
    Gigascience (IF 5.993) Pub Date : 2020-11-18
    Justin Bedő; Leon Di Stefano; Anthony T Papenfuss

    A challenge for computational biologists is to make our analyses reproducible—i.e. to rerun, combine, and share, with the assurance that equivalent runs will generate identical results. Current best practice aims at this using a combination of package managers, workflow engines, and containers.

  • A proteomic approach reveals possible molecular mechanisms and roles for endosymbiotic bacteria in begomovirus transmission by whiteflies
    Gigascience (IF 5.993) Pub Date : 2020-11-13
    Adi Kliot; Richard S Johnson; Michael J MacCoss; Svetlana Kontsedalov; Galina Lebedev; Henryk Czosnek; Michelle Heck; Murad Ghanim

    Many plant viruses are vector-borne and depend on arthropods for transmission between host plants. Begomoviruses, the largest, most damaging and emerging group of plant viruses, infect hundreds of plant species, and new virus species of the group are discovered each year. Begomoviruses are transmitted by members of the whitefly Bemisia tabaci species complex in a persistent-circulative manner. Tomato

  • Efficient DNA sequence compression with neural networks
    Gigascience (IF 5.993) Pub Date : 2020-11-11
    Milton Silva; Diogo Pratas; Armando J Pinho

    The increasing production of genomic data has led to an intensified need for models that can cope efficiently with the lossless compression of DNA sequences. Important applications include long-term storage and compression-based data analysis. In the literature, only a few recent articles propose the use of neural networks for DNA sequence compression. However, they fall short when compared with specific

  • Adaptive venom evolution and toxicity in octopods is driven by extensive novel gene formation, expansion, and loss
    Gigascience (IF 5.993) Pub Date : 2020-11-10
    Brooke L Whitelaw; Ira R Cooke; Julian Finn; Rute R da Fonseca; Elena A Ritschard; M T P Gilbert; Oleg Simakov; Jan M Strugnell

    Cephalopods represent a rich system for investigating the genetic basis underlying organismal novelties. This diverse group of specialized predators has evolved many adaptations including proteinaceous venom. Of particular interest is the blue-ringed octopus genus (Hapalochlaena), which are the only octopods known to store large quantities of the potent neurotoxin, tetrodotoxin, within their tissues

  • Correcting for experiment-specific variability in expression compendia can remove underlying signals
    Gigascience (IF 5.993) Pub Date : 2020-11-03
    Alexandra J Lee; YoSon Park; Georgia Doing; Deborah A Hogan; Casey S Greene

    In the past two decades, scientists in different laboratories have assayed gene expression from millions of samples. These experiments can be combined into compendia and analyzed collectively to extract novel biological patterns. Technical variability, or "batch effects," may result from combining samples collected and processed at different times and in different settings. Such variability may distort

  • CopyDetective: Detection threshold–aware copy number variant calling in whole-exome sequencing data
    Gigascience (IF 5.993) Pub Date : 2020-11-02
    Sarah Sandmann; Marius Wöste; Aniek O de Graaf; Birgit Burkhardt; Joop H Jansen; Martin Dugas

    Copy number variants (CNVs) are known to play an important role in the development and progression of several diseases. However, detection of CNVs with whole-exome sequencing (WES) experiments is challenging. Usually, additional experiments have to be performed.

  • Prediction of single-cell gene expression for transcription factor analysis
    Gigascience (IF 5.993) Pub Date : 2020-10-30
    Fatemeh Behjati Ardakani; Kathrin Kattler; Tobias Heinen; Florian Schmidt; David Feuerborn; Gilles Gasparoni; Konstantin Lepikhov; Patrick Nell; Jan Hengstler; Jörn Walter; Marcel H Schulz

    Single-cell RNA sequencing is a powerful technology to discover new cell types and study biological processes in complex biological samples. A current challenge is to predict transcription factor (TF) regulation from single-cell RNA data.

  • A molecular map of lung neuroendocrine neoplasms
    Gigascience (IF 5.993) Pub Date : 2020-10-30
    Aurélie A G Gabriel; Emilie Mathian; Lise Mangiante; Catherine Voegele; Vincent Cahais; Akram Ghantous; James D McKay; Nicolas Alcala; Lynnette Fernandez-Cuesta; Matthieu Foll

    Lung neuroendocrine neoplasms (LNENs) are rare solid cancers, with most genomic studies including a limited number of samples. Recently, generating the first multi-omic dataset for atypical pulmonary carcinoids and the first methylation dataset for large-cell neuroendocrine carcinomas led us to the discovery of clinically relevant molecular groups, as well as a new entity of pulmonary carcinoids (supra-carcinoids)

  • The on-premise data sharing infrastructure e!DAL: Foster FAIR data for faster data acquisition
    Gigascience (IF 5.993) Pub Date : 2020-10-22
    Daniel Arend; Patrick König; Astrid Junker; Uwe Scholz; Matthias Lange

    The FAIR data principle as a commitment to support long-term research data management is widely accepted in the scientific community. Although the ELIXIR Core Data Resources and other established infrastructures provide comprehensive and long-term stable services and platforms for FAIR data management, a large quantity of research data is still hidden or at risk of getting lost. Currently, high-throughput

  • A single-cell RNA-sequencing training and analysis suite using the Galaxy framework
    Gigascience (IF 5.993) Pub Date : 2020-10-20
    Mehmet Tekman; Bérénice Batut; Alexander Ostrovsky; Christophe Antoniewski; Dave Clements; Fidel Ramirez; Graham J Etherington; Hans-Rudolf Hotz; Jelle Scholtalbers; Jonathan R Manning; Lea Bellenger; Maria A Doyle; Mohammad Heydarian; Ni Huang; Nicola Soranzo; Pablo Moreno; Stefan Mautner; Irene Papatheodorou; Anton Nekrutenko; James Taylor; Daniel Blankenberg; Rolf Backofen; Björn Grüning

    The vast ecosystem of single-cell RNA-sequencing tools has until recently been plagued by an excess of diverging analysis strategies, inconsistent file formats, and compatibility issues between different software suites. The uptake of 10x Genomics datasets has begun to calm this diversity, and the bioinformatics community leans once more towards the large computing requirements and the statistically

  • NanoGalaxy: Nanopore long-read sequencing data analysis in Galaxy
    Gigascience (IF 5.993) Pub Date : 2020-10-17
    Willem de Koning; Milad Miladi; Saskia Hiltemann; Astrid Heikema; John P Hays; Stephan Flemming; Marius van den Beek; Dana A Mustafa; Rolf Backofen; Björn Grüning; Andrew P Stubbs

    Long-read sequencing can be applied to generate very long contigs and even completely assembled genomes at relatively low cost and with minimal sample preparation. As a result, long-read sequencing platforms are becoming more popular. In this respect, the Oxford Nanopore Technologies–based long-read sequencing “nanopore” platform is becoming a widely used tool with a broad range of applications and

  • Interpreting k-mer–based signatures for antibiotic resistance prediction
    Gigascience (IF 5.993) Pub Date : 2020-10-17
    Magali Jaillard; Mattia Palmieri; Alex van Belkum; Pierre Mahé

    Recent years have witnessed the development of several k-mer–based approaches aiming to predict phenotypic traits of bacteria on the basis of their whole-genome sequences. While often convincing in terms of predictive performance, the underlying models are in general not straightforward to interpret, the interplay between the actual genetic determinant and its translation as k-mers being generally

  • The genetics-BIDS extension: Easing the search for genetic data associated with human brain imaging
    Gigascience (IF 5.993) Pub Date : 2020-10-17
    Clara A Moreau; Martineau Jean-Louis; Ross Blair; Christopher J Markiewicz; Jessica A Turner; Vince D Calhoun; Thomas E Nichols; Cyril R Pernet

    Metadata are what makes databases searchable. Without them, researchers would have difficulty finding data with features they are interested in. Brain imaging genetics is at the intersection of two disciplines, each with dedicated dictionaries and ontologies facilitating data search and analysis. Here, we present the genetics Brain Imaging Data Structure extension, consisting of metadata files for

  • IDseq—An open source cloud-based pipeline and analysis service for metagenomic pathogen detection and monitoring
    Gigascience (IF 5.993) Pub Date : 2020-10-15
    Katrina L Kalantar; Tiago Carvalho; Charles F A de Bourcy; Boris Dimitrov; Greg Dingle; Rebecca Egger; Julie Han; Olivia B Holmes; Yun-Fang Juan; Ryan King; Andrey Kislyuk; Michael F Lin; Maria Mariano; Todd Morse; Lucia V Reynoso; David Rissato Cruz; Jonathan Sheu; Jennifer Tang; James Wang; Mark A Zhang; Emily Zhong; Vida Ahyong; Sreyngim Lay; Sophana Chea; Jennifer A Bohl; Jessica E Manning; Cristina

    Metagenomic next-generation sequencing (mNGS) has enabled the rapid, unbiased detection and identification of microbes without pathogen-specific reagents, culturing, or a priori knowledge of the microbial landscape. mNGS data analysis requires a series of computationally intensive processing steps to accurately determine the microbial composition of a sample. Existing mNGS data analysis tools typically

  • Gene-set Enrichment with Mathematical Biology (GEMB)
    Gigascience (IF 5.993) Pub Date : 2020-10-09
    Amy L Cochran; Kenneth J Nieser; Daniel B Forger; Sebastian Zöllner; Melvin G McInnis

    Gene-set analyses measure the association between a disease of interest and a “set” of genes related to a biological pathway. These analyses often incorporate gene network properties to account for differential contributions of each gene. We extend this concept further—defining gene contributions based on biophysical properties—by leveraging mathematical models of biology to predict the effects of

  • An extensible big data software architecture managing a research resource of real-world clinical radiology data linked to other health data from the whole Scottish population
    Gigascience (IF 5.993) Pub Date : 2020-09-29
    Thomas Nind; James Sutherland; Gordon McAllister; Douglas Hardy; Ally Hume; Ruairidh MacLeod; Jacqueline Caldwell; Susan Krueger; Leandro Tramma; Ross Teviotdale; Mohammed Abdelatif; Kenny Gillen; Joe Ward; Donald Scobbie; Ian Baillie; Andrew Brooks; Bianca Prodan; William Kerr; Dominic Sloan-Murphy; Juan F R Herrera; Dan McManus; Carole Morris; Carol Sinclair; Rob Baxter; Mark Parsons; Andrew Morris;

    To enable a world-leading research dataset of routinely collected clinical images linked to other routinely collected data from the whole Scottish national population. This includes more than 30 million different radiological examinations from a population of 5.4 million and >2 PB of data collected since 2010.

  • Construction of a chromosome-scale long-read reference genome assembly for potato
    Gigascience (IF 5.993) Pub Date : 2020-09-23
    Gina M Pham; John P Hamilton; Joshua C Wood; Joseph T Burke; Hainan Zhao; Brieanne Vaillancourt; Shujun Ou; Jiming Jiang; C Robin Buell

    Worldwide, the cultivated potato, Solanum tuberosum L., is the No. 1 vegetable crop and a critical food security crop. The genome sequence of DM1–3 516 R44, a doubled monoploid clone of S. tuberosum Group Phureja, was published in 2011 using a whole-genome shotgun sequencing approach with short-read sequence data. Current advanced sequencing technologies now permit generation of near-complete, high-quality

  • 3D Imaging and metabolomic profiling reveal higher neuroactive kavalactone contents in lateral roots and crown root peels of Piper methysticum (kava).
    Gigascience (IF 5.993) Pub Date : 2020-09-22
    Yogini S Jaiswal, Aaron M Yerke, M Caleb Bagley, Måns Ekelöf, Daniel Weber, Daniel Haddad, Anthony Fodor, David C Muddiman, Leonard L Williams

    Kava is an important neuroactive medicinal plant. While kava has a large global consumer footprint for its clinical and recreational use, factors related to its use lack standardization and the tissue-specific metabolite profile of its neuroactive constituents is not well understood.

  • Long-read only assembly of Drechmeria coniospora genomes reveals widespread chromosome plasticity and illustrates the limitations of current nanopore methods.
    Gigascience (IF 5.993) Pub Date : 2020-09-18
    Damien Courtine, Jan Provaznik, Jerome Reboul, Guillaume Blanc, Vladimir Benes, Jonathan J Ewbank

    Long-read sequencing is increasingly being used to determine eukaryotic genomes. We used nanopore technology to generate chromosome-level assemblies for 3 different strains of Drechmeria coniospora, a nematophagous fungus used extensively in the study of innate immunity in Caenorhabditis elegans.

  • Corrigendum to: Recommendations to enhance rigor and reproducibility in biomedical research.
    Gigascience (IF 5.993) Pub Date : 2020-09-17
    Jaqueline J Brito, Jun Li, Jason H Moore, Casey S Greene, Nicole A Nogoy, Lana X Garmire, Serghei Mangul

    In the original publication of this article, the author Serghei Mangul was erroneously listed with a second affiliation: Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90090, USA, which has been removed from the author byline. The authors regret this error.

  • Machado: Open source genomics data integration framework.
    Gigascience (IF 5.993) Pub Date : 2020-09-14
    Mauricio de Alvarenga Mudadu, Adhemar Zerlotini

    Genome projects and multiomics experiments generate huge volumes of data that must be stored, mined, and transformed into useful knowledge. All this information is supposed to be accessible and, if possible, browsable afterwards. Computational biologists have been dealing with this scenario for more than a decade and have been implementing software and databases to meet this challenge. The GMOD's (Generic

  • Generation of a chromosome-scale genome assembly of the insect-repellent terpenoid-producing Lamiaceae species, Callicarpa americana.
    Gigascience (IF 5.993) Pub Date : 2020-09-01
    John P Hamilton, Grant T Godden, Emily Lanier, Wajid Waheed Bhat, Taliesin J Kinser, Brieanne Vaillancourt, Haiyan Wang, Joshua C Wood, Jiming Jiang, Pamela S Soltis, Douglas E Soltis, Bjoern Hamberger, C Robin Buell

    BACKGROUND: Plants exhibit wide chemical diversity due to the production of specialized metabolites that function as pollinator attractants, defensive compounds, and signaling molecules. Lamiaceae (mints) are known for their chemodiversity and have been cultivated for use as culinary herbs, as well as sources of insect repellents, health-promoting compounds, and fragrance. FINDINGS: We report the chromosome-scale

  • TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads.
    Gigascience (IF 5.993) Pub Date : 2020-09-01
    Mengyang Xu, Lidong Guo, Shengqiang Gu, Ou Wang, Rui Zhang, Brock A Peters, Guangyi Fan, Xin Liu, Xun Xu, Li Deng, Yongwei Zhang

    BACKGROUND: Analyses that use genome assemblies are critically affected by the contiguity, completeness, and accuracy of those assemblies. In recent years single-molecule sequencing techniques generating long-read information have become available and enabled substantial improvement in contig length and genome completeness, especially for large genomes (>100 Mb), although bioinformatic tools for these

  • Chromosome-level genome assembly of the female western mosquitofish (Gambusia affinis).
    Gigascience (IF 5.993) Pub Date : 2020-08-27
    Feng Shao, Arne Ludwig, Yang Mao, Ni Liu, Zuogang Peng

    The western mosquitofish (Gambusia affinis) is a sexually dimorphic poeciliid fish known for its worldwide biological invasion and therefore an important research model for studying invasion biology. This organism may also be used as a suitable model to explore sex chromosome evolution and reproductive development in terms of differentiation of ZW sex chromosomes, ovoviviparity, and specialization

  • ScanITD: Detecting internal tandem duplication with robust variant allele frequency estimation.
    Gigascience (IF 5.993) Pub Date : 2020-08-27
    Ting-You Wang, Rendong Yang

    Internal tandem duplications (ITDs) are tandem duplications within coding exons and are important prognostic markers and drug targets for acute myeloid leukemia (AML). Next-generation sequencing has enabled the discovery of ITD at single-nucleotide resolution. ITD allele frequency is used in the risk stratification of patients with AML; higher ITD allele frequency is associated with poorer clinical

  • Hi-C chromosome conformation capture sequencing of avian genomes using the BGISEQ-500 platform.
    Gigascience (IF 5.993) Pub Date : 2020-08-26
    Marcela Sandoval-Velasco, Juan Antonio Rodríguez, Cynthia Perez Estrada, Guojie Zhang, Erez Lieberman Aiden, Marc A Marti-Renom, M Thomas P Gilbert, Oliver Smith

    Hi-C experiments couple DNA-DNA proximity with next-generation sequencing to yield an unbiased description of genome-wide interactions. Previous methods describing Hi-C experiments have focused on the industry-standard Illumina sequencing. With new next-generation sequencing platforms such as BGISEQ-500 becoming more widely available, protocol adaptations to fit platform-specific requirements are useful

  • Technical workflows for hyperspectral plant image assessment and processing on the greenhouse and laboratory scale.
    Gigascience (IF 5.993) Pub Date : 2020-08-20
    Stefan Paulus, Anne-Katrin Mahlein

    The use of hyperspectral cameras is well established in the field of plant phenotyping, especially as a part of high-throughput routines in greenhouses. Nevertheless, the workflows used differ depending on the applied camera, the plants being imaged, the experience of the users, and the measurement set-up.

  • A hybrid pipeline for reconstruction and analysis of viral genomes at multi-organ level.
    Gigascience (IF 5.993) Pub Date : 2020-08-20
    Diogo Pratas, Mari Toppinen, Lari Pyöriä, Klaus Hedman, Antti Sajantila, Maria F Perdomo

    Advances in sequencing technologies have enabled the characterization of multiple microbial and host genomes, opening new frontiers of knowledge while kindling novel applications and research perspectives. Among these is the investigation of the viral communities residing in the human body and their impact on health and disease. To this end, the study of samples from multiple tissues is critical, yet

  • Initial data release and announcement of the 10,000 Fish Genomes Project (Fish10K).
    Gigascience (IF 5.993) Pub Date : 2020-08-18
    Guangyi Fan, Yue Song, Liandong Yang, Xiaoyun Huang, Suyu Zhang, Mengqi Zhang, Xianwei Yang, Yue Chang, He Zhang, Yongxin Li, Shanshan Liu, Lili Yu, Jeffery Chu, Inge Seim, Chenguang Feng, Thomas J Near, Rod A Wing, Wen Wang, Kun Wang, Jing Wang, Xun Xu, Huanming Yang, Xin Liu, Nansheng Chen, Shunping He

    With more than 30,000 species, fish—including bony, jawless, and cartilaginous fish—are the largest vertebrate group, and include some of the earliest vertebrates. Despite their critical roles in many ecosystems and human society, fish genomics lags behind work on birds and mammals. This severely limits our understanding of evolution and hinders progress on the conservation and sustainable utilization