Current journal: GigaScience
  • A single-cell RNA-sequencing training and analysis suite using the Galaxy framework
    Gigascience (IF 5.993) Pub Date : 2020-10-20
    Mehmet Tekman; Bérénice Batut; Alexander Ostrovsky; Christophe Antoniewski; Dave Clements; Fidel Ramirez; Graham J Etherington; Hans-Rudolf Hotz; Jelle Scholtalbers; Jonathan R Manning; Lea Bellenger; Maria A Doyle; Mohammad Heydarian; Ni Huang; Nicola Soranzo; Pablo Moreno; Stefan Mautner; Irene Papatheodorou; Anton Nekrutenko; James Taylor; Daniel Blankenberg; Rolf Backofen; Björn Grüning

    The vast ecosystem of single-cell RNA-sequencing tools has until recently been plagued by an excess of diverging analysis strategies, inconsistent file formats, and compatibility issues between different software suites. The uptake of 10x Genomics datasets has begun to calm this diversity, and the bioinformatics community leans once more towards the large computing requirements and the statistically

  • NanoGalaxy: Nanopore long-read sequencing data analysis in Galaxy
    Gigascience (IF 5.993) Pub Date : 2020-10-17
    Willem de Koning; Milad Miladi; Saskia Hiltemann; Astrid Heikema; John P Hays; Stephan Flemming; Marius van den Beek; Dana A Mustafa; Rolf Backofen; Björn Grüning; Andrew P Stubbs

    Long-read sequencing can be applied to generate very long contigs and even completely assembled genomes at relatively low cost and with minimal sample preparation. As a result, long-read sequencing platforms are becoming more popular. In this respect, the Oxford Nanopore Technologies–based long-read sequencing “nanopore” platform is becoming a widely used tool with a broad range of applications and

  • Interpreting k-mer–based signatures for antibiotic resistance prediction
    Gigascience (IF 5.993) Pub Date : 2020-10-17
    Magali Jaillard; Mattia Palmieri; Alex van Belkum; Pierre Mahé

    Recent years have witnessed the development of several k-mer–based approaches aiming to predict phenotypic traits of bacteria on the basis of their whole-genome sequences. While often convincing in terms of predictive performance, the underlying models are in general not straightforward to interpret, the interplay between the actual genetic determinant and its translation as k-mers being generally

  • The genetics-BIDS extension: Easing the search for genetic data associated with human brain imaging
    Gigascience (IF 5.993) Pub Date : 2020-10-17
    Clara A Moreau; Martineau Jean-Louis; Ross Blair; Christopher J Markiewicz; Jessica A Turner; Vince D Calhoun; Thomas E Nichols; Cyril R Pernet

    Metadata are what makes databases searchable. Without them, researchers would have difficulty finding data with features they are interested in. Brain imaging genetics is at the intersection of two disciplines, each with dedicated dictionaries and ontologies facilitating data search and analysis. Here, we present the genetics Brain Imaging Data Structure extension, consisting of metadata files for

  • IDseq—An open source cloud-based pipeline and analysis service for metagenomic pathogen detection and monitoring
    Gigascience (IF 5.993) Pub Date : 2020-10-15
    Katrina L Kalantar; Tiago Carvalho; Charles F A de Bourcy; Boris Dimitrov; Greg Dingle; Rebecca Egger; Julie Han; Olivia B Holmes; Yun-Fang Juan; Ryan King; Andrey Kislyuk; Michael F Lin; Maria Mariano; Todd Morse; Lucia V Reynoso; David Rissato Cruz; Jonathan Sheu; Jennifer Tang; James Wang; Mark A Zhang; Emily Zhong; Vida Ahyong; Sreyngim Lay; Sophana Chea; Jennifer A Bohl; Jessica E Manning; Cristina

    Metagenomic next-generation sequencing (mNGS) has enabled the rapid, unbiased detection and identification of microbes without pathogen-specific reagents, culturing, or a priori knowledge of the microbial landscape. mNGS data analysis requires a series of computationally intensive processing steps to accurately determine the microbial composition of a sample. Existing mNGS data analysis tools typically

  • Gene-set Enrichment with Mathematical Biology (GEMB)
    Gigascience (IF 5.993) Pub Date : 2020-10-09
    Amy L Cochran; Kenneth J Nieser; Daniel B Forger; Sebastian Zöllner; Melvin G McInnis

    Gene-set analyses measure the association between a disease of interest and a “set” of genes related to a biological pathway. These analyses often incorporate gene network properties to account for differential contributions of each gene. We extend this concept further, defining gene contributions based on biophysical properties, by leveraging mathematical models of biology to predict the effects of

  • Multimodal signal dataset for 11 intuitive movement tasks from single upper extremity during multiple recording sessions
    Gigascience (IF 5.993) Pub Date : 
    Jeong J, Cho J, Shim K, et al.

    BACKGROUND Non-invasive brain–computer interfaces (BCIs) have been developed for realizing natural bi-directional interaction between users and external robotic systems. However, the communication between users and BCI systems through artificial matching is a critical issue. Recently, BCIs have been developed to adopt intuitive decoding, which is the key to solving several problems such as

  • TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data
    Gigascience (IF 5.993) Pub Date : 
    Bolognini D, Magi A, Benes V, et al.

    BACKGROUND Tandem repeat sequences are widespread in the human genome, and their expansions cause multiple repeat-mediated disorders. Genome-wide discovery approaches are needed to fully elucidate their roles in health and disease, but resolving tandem repeat variation accurately remains a challenging task. While traditional mapping-based approaches using short-read data have severe limitations

  • An extensible big data software architecture managing a research resource of real-world clinical radiology data linked to other health data from the whole Scottish population
    Gigascience (IF 5.993) Pub Date : 2020-09-29
    Thomas Nind; James Sutherland; Gordon McAllister; Douglas Hardy; Ally Hume; Ruairidh MacLeod; Jacqueline Caldwell; Susan Krueger; Leandro Tramma; Ross Teviotdale; Mohammed Abdelatif; Kenny Gillen; Joe Ward; Donald Scobbie; Ian Baillie; Andrew Brooks; Bianca Prodan; William Kerr; Dominic Sloan-Murphy; Juan F R Herrera; Dan McManus; Carole Morris; Carol Sinclair; Rob Baxter; Mark Parsons; Andrew Morris;

    To enable a world-leading research dataset of routinely collected clinical images linked to other routinely collected data from the whole Scottish national population. This includes more than 30 million different radiological examinations from a population of 5.4 million and >2 PB of data collected since 2010.

  • Construction of a chromosome-scale long-read reference genome assembly for potato.
    Gigascience (IF 5.993) Pub Date : 2020-09-23
    Gina M Pham,John P Hamilton,Joshua C Wood,Joseph T Burke,Hainan Zhao,Brieanne Vaillancourt,Shujun Ou,Jiming Jiang,C Robin Buell

    Worldwide, the cultivated potato, Solanum tuberosum L., is the No. 1 vegetable crop and a critical food security crop. The genome sequence of DM1–3 516 R44, a doubled monoploid clone of S. tuberosum Group Phureja, was published in 2011 using a whole-genome shotgun sequencing approach with short-read sequence data. Current advanced sequencing technologies now permit generation of near-complete, high-quality

  • 3D Imaging and metabolomic profiling reveal higher neuroactive kavalactone contents in lateral roots and crown root peels of Piper methysticum (kava).
    Gigascience (IF 5.993) Pub Date : 2020-09-22
    Yogini S Jaiswal,Aaron M Yerke,M Caleb Bagley,Måns Ekelöf,Daniel Weber,Daniel Haddad,Anthony Fodor,David C Muddiman,Leonard L Williams

    Kava is an important neuroactive medicinal plant. While kava has a large global consumer footprint for its clinical and recreational use, factors related to its use lack standardization and the tissue-specific metabolite profile of its neuroactive constituents is not well understood.

  • Long-read only assembly of Drechmeria coniospora genomes reveals widespread chromosome plasticity and illustrates the limitations of current nanopore methods.
    Gigascience (IF 5.993) Pub Date : 2020-09-18
    Damien Courtine,Jan Provaznik,Jerome Reboul,Guillaume Blanc,Vladimir Benes,Jonathan J Ewbank

    Long-read sequencing is increasingly being used to determine eukaryotic genomes. We used nanopore technology to generate chromosome-level assemblies for 3 different strains of Drechmeria coniospora, a nematophagous fungus used extensively in the study of innate immunity in Caenorhabditis elegans.

  • Corrigendum to: Recommendations to enhance rigor and reproducibility in biomedical research.
    Gigascience (IF 5.993) Pub Date : 2020-09-17
    Jaqueline J Brito,Jun Li,Jason H Moore,Casey S Greene,Nicole A Nogoy,Lana X Garmire,Serghei Mangul

    In the original publication of this article, the author Serghei Mangul was erroneously listed with a second affiliation: Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90090, USA, which has been removed from the author byline. The authors regret this error.

  • Machado: Open source genomics data integration framework.
    Gigascience (IF 5.993) Pub Date : 2020-09-14
    Mauricio de Alvarenga Mudadu,Adhemar Zerlotini

    Genome projects and multiomics experiments generate huge volumes of data that must be stored, mined, and transformed into useful knowledge. All this information is supposed to be accessible and, if possible, browsable afterwards. Computational biologists have been dealing with this scenario for more than a decade and have been implementing software and databases to meet this challenge. The GMOD's (Generic

  • Generation of a chromosome-scale genome assembly of the insect-repellent terpenoid-producing Lamiaceae species, Callicarpa americana.
    Gigascience (IF 5.993) Pub Date : 2020-09-01
    John P Hamilton,Grant T Godden,Emily Lanier,Wajid Waheed Bhat,Taliesin J Kinser,Brieanne Vaillancourt,Haiyan Wang,Joshua C Wood,Jiming Jiang,Pamela S Soltis,Douglas E Soltis,Bjoern Hamberger,C Robin Buell

    BACKGROUND Plants exhibit wide chemical diversity due to the production of specialized metabolites that function as pollinator attractants, defensive compounds, and signaling molecules. Lamiaceae (mints) are known for their chemodiversity and have been cultivated for use as culinary herbs, as well as sources of insect repellents, health-promoting compounds, and fragrance. FINDINGS We report the chromosome-scale

  • TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads.
    Gigascience (IF 5.993) Pub Date : 2020-09-01
    Mengyang Xu,Lidong Guo,Shengqiang Gu,Ou Wang,Rui Zhang,Brock A Peters,Guangyi Fan,Xin Liu,Xun Xu,Li Deng,Yongwei Zhang

    BACKGROUND Analyses that use genome assemblies are critically affected by the contiguity, completeness, and accuracy of those assemblies. In recent years single-molecule sequencing techniques generating long-read information have become available and enabled substantial improvement in contig length and genome completeness, especially for large genomes (>100 Mb), although bioinformatic tools for these

  • Chromosome-level genome assembly of the female western mosquitofish (Gambusia affinis).
    Gigascience (IF 5.993) Pub Date : 2020-08-27
    Feng Shao,Arne Ludwig,Yang Mao,Ni Liu,Zuogang Peng

    The western mosquitofish (Gambusia affinis) is a sexually dimorphic poeciliid fish known for its worldwide biological invasion and therefore an important research model for studying invasion biology. This organism may also be used as a suitable model to explore sex chromosome evolution and reproductive development in terms of differentiation of ZW sex chromosomes, ovoviviparity, and specialization

  • ScanITD: Detecting internal tandem duplication with robust variant allele frequency estimation.
    Gigascience (IF 5.993) Pub Date : 2020-08-27
    Ting-You Wang,Rendong Yang

    Internal tandem duplications (ITDs) are tandem duplications within coding exons and are important prognostic markers and drug targets for acute myeloid leukemia (AML). Next-generation sequencing has enabled the discovery of ITD at single-nucleotide resolution. ITD allele frequency is used in the risk stratification of patients with AML; higher ITD allele frequency is associated with poorer clinical

  • Hi-C chromosome conformation capture sequencing of avian genomes using the BGISEQ-500 platform.
    Gigascience (IF 5.993) Pub Date : 2020-08-26
    Marcela Sandoval-Velasco,Juan Antonio Rodríguez,Cynthia Perez Estrada,Guojie Zhang,Erez Lieberman Aiden,Marc A Marti-Renom,M Thomas P Gilbert,Oliver Smith

    Hi-C experiments couple DNA-DNA proximity with next-generation sequencing to yield an unbiased description of genome-wide interactions. Previous methods describing Hi-C experiments have focused on the industry-standard Illumina sequencing. With new next-generation sequencing platforms such as BGISEQ-500 becoming more widely available, protocol adaptations to fit platform-specific requirements are useful

  • Technical workflows for hyperspectral plant image assessment and processing on the greenhouse and laboratory scale.
    Gigascience (IF 5.993) Pub Date : 2020-08-20
    Stefan Paulus,Anne-Katrin Mahlein

    The use of hyperspectral cameras is well established in the field of plant phenotyping, especially as a part of high-throughput routines in greenhouses. Nevertheless, the workflows used differ depending on the applied camera, the plants being imaged, the experience of the users, and the measurement set-up.

  • A hybrid pipeline for reconstruction and analysis of viral genomes at multi-organ level.
    Gigascience (IF 5.993) Pub Date : 2020-08-20
    Diogo Pratas,Mari Toppinen,Lari Pyöriä,Klaus Hedman,Antti Sajantila,Maria F Perdomo

    Advances in sequencing technologies have enabled the characterization of multiple microbial and host genomes, opening new frontiers of knowledge while kindling novel applications and research perspectives. Among these is the investigation of the viral communities residing in the human body and their impact on health and disease. To this end, the study of samples from multiple tissues is critical, yet

  • Initial data release and announcement of the 10,000 Fish Genomes Project (Fish10K).
    Gigascience (IF 5.993) Pub Date : 2020-08-18
    Guangyi Fan,Yue Song,Liandong Yang,Xiaoyun Huang,Suyu Zhang,Mengqi Zhang,Xianwei Yang,Yue Chang,He Zhang,Yongxin Li,Shanshan Liu,Lili Yu,Jeffery Chu,Inge Seim,Chenguang Feng,Thomas J Near,Rod A Wing,Wen Wang,Kun Wang,Jing Wang,Xun Xu,Huanming Yang,Xin Liu,Nansheng Chen,Shunping He

    With more than 30,000 species, fish—including bony, jawless, and cartilaginous fish—are the largest vertebrate group, and include some of the earliest vertebrates. Despite their critical roles in many ecosystems and human society, fish genomics lags behind work on birds and mammals. This severely limits our understanding of evolution and hinders progress on the conservation and sustainable utilization

  • A haplotype-resolved, de novo genome assembly for the wood tiger moth (Arctia plantaginis) through trio binning.
    Gigascience (IF 5.993) Pub Date : 2020-08-18
    Eugenie C Yen,Shane A McCarthy,Juan A Galarza,Tomas N Generalovic,Sarah Pelan,Petr Nguyen,Joana I Meier,Ian A Warren,Johanna Mappes,Richard Durbin,Chris D Jiggins

    Diploid genome assembly is typically impeded by heterozygosity because it introduces errors when haplotypes are collapsed into a consensus sequence. Trio binning offers an innovative solution that exploits heterozygosity for assembly. Short, parental reads are used to assign parental origin to long reads from their F1 offspring before assembly, enabling complete haplotype resolution. Trio binning could
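    The read-assignment idea described above can be illustrated with a toy sketch. It assumes one common way of doing this, namely comparing k-mers found only in one parent's short reads against each offspring long read; the abstract does not spell out the mechanism, so this is not the authors' pipeline. The function names and the value of `k` are hypothetical, and the sketch ignores reverse complements, sequencing errors, and k-mer count thresholds that a real tool would need.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets from short reads."""
    maternal = set().union(*(kmers(r, k) for r in maternal_reads))
    paternal = set().union(*(kmers(r, k) for r in paternal_reads))
    return maternal - paternal, paternal - maternal


def bin_read(read, maternal_only, paternal_only, k):
    """Assign an offspring long read to the parent whose private k-mers it shares most."""
    read_kmers = kmers(read, k)
    maternal_hits = len(read_kmers & maternal_only)
    paternal_hits = len(read_kmers & paternal_only)
    if maternal_hits > paternal_hits:
        return "maternal"
    if paternal_hits > maternal_hits:
        return "paternal"
    return "unassigned"  # tie, or no parent-specific k-mers at all


if __name__ == "__main__":
    k = 4  # toy value; real data would use much longer k-mers
    mat_only, pat_only = parent_specific_kmers(["ACGTACGA"], ["ACGTTCGA"], k)
    for long_read in ["GGACGTACGAGG", "CCACGTTCGACC", "ACGT"]:
        print(long_read, bin_read(long_read, mat_only, pat_only, k))
```

    Each bin of reads would then be assembled separately, which is what allows the two haplotypes to be kept apart instead of collapsed into one consensus.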

  • The chromosome-level draft genome of Dalbergia odorifera.
    Gigascience (IF 5.993) Pub Date : 2020-08-18
    Zhou Hong,Jiang Li,Xiaojin Liu,Jinmin Lian,Ningnan Zhang,Zengjiang Yang,Yongchao Niu,Zhiyi Cui,Daping Xu

    Dalbergia odorifera T. Chen (Fabaceae) is an International Union for Conservation of Nature red-listed tree. This tree is of high medicinal and commercial value owing to its officinal, insect-proof, durable heartwood. However, there is a lack of genome reference, which has hindered development of studies on the heartwood formation.

  • Scientometric trends for coronaviruses and other emerging viral infections.
    Gigascience (IF 5.993) Pub Date : 2020-08-17
    Dima Kagan,Jacob Moran-Gilad,Michael Fire

    COVID-19 is the most rapidly expanding coronavirus outbreak in the past 2 decades. To provide a swift response to a novel outbreak, prior knowledge from similar outbreaks is essential.

  • Graph2GO: a multi-modal attributed network embedding method for inferring protein functions.
    Gigascience (IF 5.993) Pub Date : 2020-08-08
    Kunjie Fan,Yuanfang Guan,Yan Zhang

    Identifying protein functions is important for many biological applications. Since experimental functional characterization of proteins is time-consuming and costly, accurate and efficient computational methods for predicting protein functions are in great demand for generating the testable hypotheses guiding large-scale experiments.

  • The intersectional genetics landscape for humans.
    Gigascience (IF 5.993) Pub Date : 2020-08-06
    Andre Macedo,Alisson M Gontijo

    The human body is made up of hundreds, perhaps thousands, of cell types and states, most of which are currently inaccessible genetically. Intersectional genetic approaches can increase the number of genetically accessible cells, but the scope and safety of these approaches have not been systematically assessed. A typical intersectional method acts like an “AND” logic gate by converting the input of 2

  • VariantSpark: Cloud-based machine learning for association study of complex phenotype and large-scale genomic data.
    Gigascience (IF 5.993) Pub Date : 2020-08-06
    Arash Bayat,Piotr Szul,Aidan R O'Brien,Robert Dunne,Brendan Hosking,Yatish Jain,Cameron Hosking,Oscar J Luo,Natalie Twine,Denis C Bauer

    Many traits and diseases are thought to be driven by >1 gene (polygenic). Polygenic risk scores (PRS) hence expand on genome-wide association studies by taking multiple genes into account when risk models are built. However, PRS only considers the additive effect of individual genes but not epistatic interactions or the combination of individual and interacting drivers. While evidence of epistatic
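    As a rough illustration of the "additive effect" mentioned above, a polygenic risk score can be written as a weighted sum of per-variant allele dosages. The sketch below uses made-up weights and dosages; the product-term form of the interaction is only one simple modeling assumption for contrast and is not taken from the article.

```python
def additive_prs(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


def pairwise_interaction(dosages, i, j, weight):
    """Hypothetical epistatic term: contributes only when both variants are present."""
    return weight * dosages[i] * dosages[j]


if __name__ == "__main__":
    dosages = [2, 1, 0]          # illustrative genotype at three variants
    weights = [0.5, 0.25, 1.0]   # illustrative per-variant effect sizes
    print("additive score:", additive_prs(dosages, weights))
    print("interaction term (variants 0 and 1):",
          pairwise_interaction(dosages, 0, 1, 0.5))
```

    The additive score cannot represent the second quantity, because each variant contributes independently of every other variant.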

  • Genomic data imputation with variational auto-encoders.
    Gigascience (IF 5.993) Pub Date : 2020-08-06
    Yeping Lina Qiu,Hong Zheng,Olivier Gevaert

    As missing values are frequently present in genomic data, practical methods to handle missing data are necessary for downstream analyses that require complete data sets. State-of-the-art imputation techniques, including methods based on singular value decomposition and K-nearest neighbors, can be computationally expensive for large data sets and it is difficult to modify these algorithms to handle

  • EHRtemporalVariability: delineating temporal data-set shifts in electronic health records.
    Gigascience (IF 5.993) Pub Date : 2020-07-30
    Carlos Sáez,Alba Gutiérrez-Sacristán,Isaac Kohane,Juan M García-Gómez,Paul Avillach

    Temporal variability in health-care processes or protocols is intrinsic to medicine. Such variability can potentially introduce dataset shifts, a data quality issue when reusing electronic health records (EHRs) for secondary purposes. Temporal data-set shifts can present as trends, as well as abrupt or seasonal changes in the statistical distributions of data over time. The latter are particularly

  • PhaseME: Automatic rapid assessment of phasing quality and phasing improvement.
    Gigascience (IF 5.993) Pub Date : 2020-07-24
    Sina Majidian,Fritz J Sedlazeck

    The detection of which mutations are occurring on the same DNA molecule is essential to predict their consequences. This can be achieved by phasing the genomic variations. Nevertheless, state-of-the-art haplotype phasing is currently a black box in which the accuracy and quality of the reconstructed haplotypes are hard to assess.

  • A generalizable data-driven multicellular model of pancreatic ductal adenocarcinoma.
    Gigascience (IF 5.993) Pub Date : 2020-07-22
    Boris Aguilar,David L Gibbs,David J Reiss,Mark McConnell,Samuel A Danziger,Andrew Dervan,Matthew Trotter,Douglas Bassett,Robert Hershberg,Alexander V Ratushny,Ilya Shmulevich

    Mechanistic models, when combined with pertinent data, can improve our knowledge regarding important molecular and cellular mechanisms found in cancer. These models make the prediction of tissue-level response to drug treatment possible, which can lead to new therapies and improved patient outcomes. Here we present a data-driven multiscale modeling framework to study molecular interactions between

  • Comparative genomics and transcriptomics of 4 Paragonimus species provide insights into lung fluke parasitism and pathogenesis.
    Gigascience (IF 5.993) Pub Date : 2020-07-20
    Bruce A Rosa,Young-Jun Choi,Samantha N McNulty,Hyeim Jung,John Martin,Takeshi Agatsuma,Hiromu Sugiyama,Thanh Hoa Le,Pham Ngoc Doanh,Wanchai Maleewong,David Blair,Paul J Brindley,Peter U Fischer,Makedonka Mitreva

    Paragonimus spp. (lung flukes) are among the most injurious foodborne helminths, infecting ∼23 million people and subjecting ∼292 million to infection risk. Paragonimiasis is acquired from infected undercooked crustaceans and primarily affects the lungs but often causes lesions elsewhere including the brain. The disease is easily mistaken for tuberculosis owing to similar pulmonary symptoms, and accordingly

  • Assessment of fecal DNA extraction protocols for metagenomic studies.
    Gigascience (IF 5.993) Pub Date : 2020-07-13
    Fangming Yang,Jihua Sun,Huainian Luo,Huahui Ren,Hongcheng Zhou,Yuxiang Lin,Mo Han,Bing Chen,Hailong Liao,Susanne Brix,Junhua Li,Huanming Yang,Karsten Kristiansen,Huanzi Zhong

    Shotgun metagenomic sequencing has improved our understanding of the human gut microbiota. Various DNA extraction methods have been compared to find protocols that robustly and most accurately reflect the original microbial community structures. However, these recommendations can be further refined by considering the time and cost demands in dealing with samples from very large human cohorts. Additionally

  • Chromosome-level de novo assembly of the pig-tailed macaque genome using linked-read sequencing and HiC proximity scaffolding.
    Gigascience (IF 5.993) Pub Date : 2020-07-10
    Morteza Roodgar,Afshin Babveyh,Lan H Nguyen,Wenyu Zhou,Rahul Sinha,Hayan Lee,John B Hanks,Mohan Avula,Lihua Jiang,Ruiqi Jian,Hoyong Lee,Giltae Song,Hassan Chaib,Irv L Weissman,Serafim Batzoglou,Susan Holmes,David G Smith,Joseph L Mankowski,Stefan Prost,Michael P Snyder

    Macaque species share >93% genome homology with humans and develop many disease phenotypes similar to those of humans, making them valuable animal models for the study of human diseases (e.g., HIV and neurodegenerative diseases). However, the quality of genome assembly and annotation for several macaque species lags behind the human genome effort.

  • Imputing missing RNA-sequencing data from DNA methylation by using a transfer learning-based neural network.
    Gigascience (IF 5.993) Pub Date : 2020-07-10
    Xiang Zhou,Hua Chai,Huiying Zhao,Ching-Hsing Luo,Yuedong Yang

    Gene expression plays a key intermediate role in linking molecular features at the DNA level and phenotype. However, owing to various limitations in experiments, RNA-seq data are missing in many samples while high-quality DNA methylation data exist. Because DNA methylation is an important epigenetic modification that regulates gene expression, it can be used to predict RNA-seq data. For this

  • Sequence Compression Benchmark (SCB) database-A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences.
    Gigascience (IF 5.993) Pub Date : 2020-07-06
    Kirill Kryukov,Mahoko Takahashi Ueda,So Nakagawa,Tadashi Imanishi

    Nearly all molecular sequence databases currently use gzip for data compression. Ongoing rapid accumulation of stored data calls for a more efficient compression tool. Although numerous compressors exist, both specialized and general-purpose, choosing one of them was difficult because no comprehensive analysis of their comparative advantages for sequence compression was available.
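    The kind of comparison described above can be tried on a small scale with the general-purpose compressors in the Python standard library. This is only a minimal sketch on a synthetic, highly repetitive FASTA record (the helper names and the toy sequence are hypothetical); it does not cover the specialized sequence compressors the benchmark evaluates, and ratios on real data will differ.

```python
import bz2
import gzip
import lzma


def make_fasta(name, sequence, width=60):
    """Build a single FASTA record as bytes, wrapped at `width` characters."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return (">" + name + "\n" + "\n".join(lines) + "\n").encode("ascii")


# Each entry maps a label to its (compress, decompress) pair.
COMPRESSORS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compression_ratios(data):
    """Return original_size / compressed_size for each compressor."""
    ratios = {}
    for name, (compress, decompress) in COMPRESSORS.items():
        packed = compress(data)
        if decompress(packed) != data:  # sanity check: lossless round trip
            raise ValueError(name + " did not round-trip the input")
        ratios[name] = len(data) / len(packed)
    return ratios


if __name__ == "__main__":
    record = make_fasta("toy_sequence", "ACGTTGCA" * 2000)
    for name, ratio in sorted(compression_ratios(record).items()):
        print(f"{name}: {ratio:.1f}x")
```

    A fuller comparison would also record compression and decompression time, since a better ratio often comes at the cost of speed.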

  • Corrigendum to: Metagenomic analysis of planktonic riverine microbial consortia using nanopore sequencing reveals insight into river microbe taxonomy and function.
    Gigascience (IF 5.993) Pub Date : 2020-06-24
    Kate Reddington,David Eccles,Justin O'Grady,Devin M Drown,Lars Hestbjerg Hansen,Tue Kjærgaard Nielsen,Anne-Lise Ducluzeau,Richard M Leggett,Darren Heavens,Ned Peel,Terrance P Snutch,Anthony Bayega,Spyridon Oikonomopoulos,Ioannis Ragoussis,Thomas Barry,Eric van der Helm,Dino Jolic,Hollian Richardson,Hans Jansen,John R Tyson,Miten Jain,Bonnie L Brown

    In the original publication of this article, the author Jiannis Ragoussis was not part of the author byline. His name, affiliation, and funding information have been added to this article. The authors regret this error.

  • CandiMeth: Powerful yet simple visualization and quantification of DNA methylation at candidate genes.
    Gigascience (IF 5.993) Pub Date : 2020-06-22
    Sara-Jayne Thursby,Darin K Lobo,Kristina Pentieva,Shu-Dong Zhang,Rachelle E Irwin,Colum P Walsh

    DNA methylation microarrays are widely used in clinical epigenetics and are often processed using R packages such as ChAMP or RnBeads by trained bioinformaticians. However, looking at specific genes requires bespoke coding for which wet-lab biologists or clinicians are not trained. This leads to high demands on bioinformaticians, who may lack insight into the specific biological problem. To bridge

  • Reduced chromatin accessibility underlies gene expression differences in homologous chromosome arms of diploid Aegilops tauschii and hexaploid wheat.
    Gigascience (IF 5.993) Pub Date : 2020-06-20
    Fu-Hao Lu,Neil McKenzie,Laura-Jayne Gardiner,Ming-Cheng Luo,Anthony Hall,Michael W Bevan

    Polyploidy is centrally important in the evolution and domestication of plants because it leads to major genomic changes, such as altered patterns of gene expression, which are thought to underlie the emergence of new traits. Despite the common occurrence of these globally altered patterns of gene expression in polyploids, the mechanisms involved are not well understood.

  • The democratization of bioinformatics: A software engineering perspective.
    Gigascience (IF 5.993) Pub Date : 2020-06-20
    Brendan Lawlor,Roy D Sleator

    Today, thanks to advances in cloud computing, it is possible for small teams of software developers to produce internet-scale products, a feat that was previously the preserve of large organizations. Herein, we describe how these advances in software engineering can be made more readily available to bioinformaticians. In the same way that cloud computing has democratized access to distributed systems

  • Draft genome assemblies using sequencing reads from Oxford Nanopore Technology and Illumina platforms for four species of North American Fundulus killifish.
    Gigascience (IF 5.993) Pub Date : 2020-06-18
    Lisa K Johnson,Ruta Sahasrabudhe,James Anthony Gill,Jennifer L Roach,Lutz Froenicke,C Titus Brown,Andrew Whitehead

    Whole-genome sequencing data from wild-caught individuals of closely related North American killifish species (Fundulus xenicus, Fundulus catenatus, Fundulus nottii, and Fundulus olivaceus) were obtained using long-read Oxford Nanopore Technology (ONT) PromethION and short-read Illumina platforms.

  • CRISPRcasIdentifier: Machine learning for accurate identification and classification of CRISPR-Cas systems.
    Gigascience (IF 5.993) Pub Date : 2020-06-17
    Victor A Padilha,Omer S Alkhnbashi,Shiraz A Shah,André C P L F de Carvalho,Rolf Backofen

    CRISPR-Cas genes are extraordinarily diverse and evolve rapidly when compared to other prokaryotic genes. With the rapid increase in newly sequenced archaeal and bacterial genomes, manual identification of CRISPR-Cas systems is no longer viable. Thus, an automated approach is required for advancing our understanding of the evolution and diversity of these systems and for finding new candidates for

  • Watchdog 2.0: New developments for reusability, reproducibility, and workflow execution.
    Gigascience (IF 5.993) Pub Date : 2020-06-17
    Michael Kluge,Marie-Sophie Friedl,Amrei L Menzel,Caroline C Friedel

    Advances in high-throughput methods have brought new challenges for biological data analysis, often requiring many interdependent steps applied to a large number of samples. To address this challenge, workflow management systems, such as Watchdog, have been developed to support scientists in the (semi-)automated execution of large analysis workflows.

  • An improved pig reference genome sequence to enable pig genetics and genomics research.
    Gigascience (IF 5.993) Pub Date : 2020-06-16
    Amanda Warr,Nabeel Affara,Bronwen Aken,Hamid Beiki,Derek M Bickhart,Konstantinos Billis,William Chow,Lel Eory,Heather A Finlayson,Paul Flicek,Carlos G Girón,Darren K Griffin,Richard Hall,Greg Hannum,Thibaut Hourlier,Kerstin Howe,David A Hume,Osagie Izuogu,Kristi Kim,Sergey Koren,Haibou Liu,Nancy Manchanda,Fergal J Martin,Dan J Nonneman,Rebecca E O'Connor,Adam M Phillippy,Gary A Rohrer,Benjamin D Rosen

    The domestic pig (Sus scrofa) is important both as a food source and as a biomedical model given its similarity in size, anatomy, physiology, metabolism, pathology, and pharmacology to humans. The draft reference genome (Sscrofa10.2) of a purebred Duroc female pig established using older clone-based sequencing methods was incomplete, and unresolved redundancies, short-range order and orientation errors

  • Integrative computational epigenomics to build data-driven gene regulation hypotheses.
    Gigascience (IF 5.993) Pub Date : 2020-06-16
    Tyrone Chen,Sonika Tyagi

    Diseases are complex phenotypes often arising as an emergent property of a non-linear network of genetic and epigenetic interactions. To translate this resulting state into a causal relationship with a subset of regulatory features, many experiments deploy an array of laboratory assays from multiple modalities. Often, each of these resulting datasets is large, heterogeneous, and noisy. Thus, it is

  • Bioentity2vec: Attribute- and behavior-driven representation for predicting multi-type relationships between bioentities.
    Gigascience (IF 5.993) Pub Date : 2020-06-13
    Zhen-Hao Guo,Zhu-Hong You,Yan-Bin Wang,De-Shuang Huang,Hai-Cheng Yi,Zhan-Heng Chen

    The explosive growth of genomic, chemical, and pathological data provides new opportunities and challenges for humans to thoroughly understand life activities in cells. However, there exist few computational models that aggregate various bioentities to comprehensively reveal the physical and functional landscape of biological systems.

  • Galactic Circos: User-friendly Circos plots within the Galaxy platform.
    Gigascience (IF 5.993) Pub Date : 2020-06-12
    Helena Rasche,Saskia Hiltemann

    Circos is a popular, highly flexible software package for the circular visualization of complex datasets. While especially popular in the field of genomic analysis, Circos enables interactive graphing of any analytical data, including alternative scientific domain data and non-scientific data. This high degree of flexibility also comes with a high degree of complexity, which may present an obstacle

  • Metagenomic analysis of planktonic riverine microbial consortia using nanopore sequencing reveals insight into river microbe taxonomy and function.
    Gigascience (IF 5.993) Pub Date : 2020-06-10
    Kate Reddington, David Eccles, Justin O'Grady, Devin M Drown, Lars Hestbjerg Hansen, Tue Kjærgaard Nielsen, Anne-Lise Ducluzeau, Richard M Leggett, Darren Heavens, Ned Peel, Terrance P Snutch, Anthony Bayega, Spyridon Oikonomopoulos, Ioannis Ragoussis, Thomas Barry, Eric van der Helm, Dino Jolic, Hollian Richardson, Hans Jansen, John R Tyson, Miten Jain, Bonnie L Brown

    Riverine ecosystems are biogeochemical powerhouses driven largely by microbial communities that inhabit water columns and sediments. Because rivers are used extensively for anthropogenic purposes (drinking water, recreation, agriculture, and industry), it is essential to understand how these activities affect the composition of river microbial consortia. Recent studies have shown that river metagenomes

  • Trans-NanoSim characterizes and simulates nanopore RNA-sequencing data.
    Gigascience (IF 5.993) Pub Date : 2020-06-10
    Saber Hafezqorani, Chen Yang, Theodora Lo, Ka Ming Nip, René L Warren, Inanc Birol

    Compared with second-generation sequencing technologies, third-generation single-molecule RNA sequencing has unprecedented advantages; the long reads it generates facilitate isoform-level transcript characterization. In particular, the Oxford Nanopore Technologies sequencing platforms have become more popular in recent years owing to their relatively high affordability and portability compared with other

  • Genomic consequences of dietary diversification and parallel evolution due to nectarivory in leaf-nosed bats.
    Gigascience (IF 5.993) Pub Date : 2020-06-06
    Yocelyn T Gutiérrez-Guerrero, Enrique Ibarra-Laclette, Carlos Martínez Del Río, Josué Barrera-Redondo, Eria A Rebollar, Jorge Ortega, Livia León-Paniagua, Araxi Urrutia, Erika Aguirre-Planter, Luis E Eguiarte

    The New World leaf-nosed bats (Phyllostomids) exhibit a diverse spectrum of feeding habits and innovations in their nutrient acquisition and foraging mechanisms. However, the genomic signatures associated with their distinct diets are unknown.

  • SnpHub: an easy-to-set-up web server framework for exploring large-scale genomic variation data in the post-genomic era with applications in wheat.
    Gigascience (IF 5.993) Pub Date : 2020-06-05
    Wenxi Wang, Zihao Wang, Xintong Li, Zhongfu Ni, Zhaorong Hu, Mingming Xin, Huiru Peng, Yingyin Yao, Qixin Sun, Weilong Guo

    The cost of high-throughput sequencing is rapidly decreasing, allowing researchers to investigate genomic variations across hundreds or even thousands of samples in the post-genomic era. The management and exploration of these large-scale genomic variation data require programming skills. The public genotype querying databases of many species are usually centralized and implemented independently, making

  • Education in the genomics era: Generating high-quality genome assemblies in university courses.
    Gigascience (IF 5.993) Pub Date : 2020-06-03
    Stefan Prost, Sven Winter, Jordi De Raad, Raphael T F Coimbra, Magnus Wolf, Maria A Nilsson, Malte Petersen, Deepak K Gupta, Tilman Schell, Fritjof Lammers, Axel Janke

    Recent advances in genome sequencing technologies have simplified the generation of genome data and reduced the costs for genome assemblies, even for complex genomes like those of vertebrates. More practically oriented genomic courses can prepare university students for the increasing importance of genomic data used in biological and medical research. Low-cost third-generation sequencing technology

  • Iterative hard thresholding in genome-wide association studies: Generalized linear models, prior weights, and double sparsity.
    Gigascience (IF 5.993) Pub Date : 2020-06-03
    Benjamin B Chu, Kevin L Keys, Christopher A German, Hua Zhou, Jin J Zhou, Eric M Sobel, Janet S Sinsheimer, Kenneth Lange

    Consecutive testing of single nucleotide polymorphisms (SNPs) is usually employed to identify genetic variants associated with complex traits. Ideally one should model all covariates in unison, but most existing analysis methods for genome-wide association studies (GWAS) perform only univariate regression.

  • Recommendations to enhance rigor and reproducibility in biomedical research.
    Gigascience (IF 5.993) Pub Date : 2020-06-01
    Jaqueline J Brito, Jun Li, Jason H Moore, Casey S Greene, Nicole A Nogoy, Lana X Garmire, Serghei Mangul

    Biomedical research depends increasingly on computational tools, but mechanisms ensuring open data, open software, and reproducibility are variably enforced by academic institutions, funders, and publishers. Publications may present software for which source code or documentation are or become unavailable; this compromises the role of peer review in evaluating technical strength and scientific contribution

  • A catalog of microbial genes from the bovine rumen unveils a specialized and diverse biomass-degrading environment.
    Gigascience (IF 5.993) Pub Date : 2020-05-30
    Junhua Li, Huanzi Zhong, Yuliaxis Ramayo-Caldas, Nicolas Terrapon, Vincent Lombard, Gabrielle Potocki-Veronese, Jordi Estellé, Milka Popova, Ziyi Yang, Hui Zhang, Fang Li, Shanmei Tang, Fangming Yang, Weineng Chen, Bing Chen, Jiyang Li, Jing Guo, Cécile Martin, Emmanuelle Maguin, Xun Xu, Huanming Yang, Jian Wang, Lise Madsen, Karsten Kristiansen, Bernard Henrissat, Stanislav D Ehrlich, Diego P Morgavi

    The rumen microbiota provides essential services to its host and, through its role in ruminant production, contributes to human nutrition and food security. A thorough knowledge of the genetic potential of rumen microbes will provide opportunities for improving the sustainability of ruminant production systems. The availability of gene reference catalogs from gut microbiomes has advanced the understanding

  • Fcirc: A comprehensive pipeline for the exploration of fusion linear and circular RNAs.
    Gigascience (IF 5.993) Pub Date : 2020-05-29
    Zhaoqing Cai, Hongzhang Xue, Yue Xu, Jens Köhler, Xiaojie Cheng, Yao Dai, Jie Zheng, Haiyun Wang

    In cancer cells, fusion genes can produce linear and chimeric fusion-circular RNAs (f-circRNAs), which are functional in gene expression regulation and implicated in malignant transformation, cancer progression, and therapeutic resistance. For specific cancers, proteins encoded by fusion transcripts have been identified as innovative therapeutic targets (e.g., EML4-ALK). Even though RNA sequencing

  • halSynteny: a fast, easy-to-use conserved synteny block construction method for multiple whole-genome alignments.
    Gigascience (IF 5.993) Pub Date : 2020-05-28
    Ksenia Krasheninnikova, Mark Diekhans, Joel Armstrong, Aleksei Dievskii, Benedict Paten, Stephen O'Brien

    Large-scale sequencing projects provide high-quality full-genome data that can be used for reconstruction of chromosomal exchanges and rearrangements that disrupt conserved syntenic blocks. The highest resolution of cross-species homology can be obtained on the basis of whole-genome, reference-free alignments. Very large multiple alignments of full-genome sequence stored in a binary format demand an

  • CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes.
    Gigascience (IF 5.993) Pub Date : 2020-05-25
    Heiner Kuhl, Ling Li, Sven Wuertz, Matthias Stöck, Xu-Fang Liang, Christophe Klopp

    BACKGROUND: Easy-to-use and fast bioinformatics pipelines for long-read assembly that go beyond the contig level to generate highly continuous chromosome-scale genomes from raw data remain scarce. RESULT: Chromosome-Scale Assembler (CSA) is a novel computationally highly efficient bioinformatics pipeline that fills this gap. CSA integrates information from scaffolded assemblies (e.g., Hi-C or 10X Genomics)

  • TinderMIX: Time-dose integrated modelling of toxicogenomics data.
    Gigascience (IF 5.993) Pub Date : 2020-05-25
    Angela Serra, Michele Fratello, Giusy Del Giudice, Laura Aliisa Saarimäki, Michelangelo Paci, Antonio Federico, Dario Greco

    BACKGROUND: Omics technologies have been widely applied in toxicology studies to investigate the effects of different substances on exposed biological systems. A classical toxicogenomic study consists in testing the effects of a compound at different dose levels and different time points. The main challenge consists in identifying the gene alteration patterns that are correlated to doses and time points

Contents have been reproduced by permission of the publishers.