-
A loss-of-function mutation in human Oxidation Resistance 1 disrupts the spatial–temporal regulation of histone arginine methylation in neurodevelopment Genome Biol. (IF 12.3) Pub Date : 2023-09-29 Xiaolin Lin, Wei Wang, Mingyi Yang, Nadirah Damseh, Mirta Mittelstedt Leal de Sousa, Fadi Jacob, Anna Lång, Elise Kristiansen, Marco Pannone, Miroslava Kissova, Runar Almaas, Anna Kuśnierczyk, Richard Siller, Maher Shahrour, Motee Al-Ashhab, Bassam Abu-Libdeh, Wannan Tang, Geir Slupphaug, Orly Elpeleg, Stig Ove Bøe, Lars Eide, Gareth J. Sullivan, Johanne Egge Rinholm, Hongjun Song, Guo-li Ming, Barbara
The Oxidation Resistance 1 (OXR1) gene is a highly conserved member of the TLDc domain-containing family. OXR1 is involved in fundamental biological and cellular processes, including DNA damage response, antioxidant pathways, cell cycle, neuronal protection, and arginine methylation. In 2019, five patients from three families carrying four biallelic loss-of-function variants in OXR1 were reported to be associated
-
Dominance is common in mammals and is associated with trans-acting gene expression and alternative splicing Genome Biol. (IF 12.3) Pub Date : 2023-09-29 Leilei Cui, Bin Yang, Shijun Xiao, Jun Gao, Amelie Baud, Delyth Graham, Martin McBride, Anna Dominiczak, Sebastian Schafer, Regina Lopez Aumatell, Carme Mont, Albert Fernandez Teruel, Norbert Hübner, Jonathan Flint, Richard Mott, Lusheng Huang
Dominance and other non-additive genetic effects arise from the interaction between alleles, and historically these phenomena play a major role in quantitative genetics. However, most genome-wide association studies (GWAS) assume alleles act additively. We systematically investigate both dominance—here representing any non-additive within-locus interaction—and additivity across 574 physiological and
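To make the additive/dominance distinction concrete, here is a sketch using the textbook coding for a single biallelic locus. It assumes an additive term x_add = g - 1 for allele count g in {0, 1, 2} and a dominance term x_dom = 1 for heterozygotes (0 otherwise). This is a standard quantitative-genetics illustration under that assumed coding, not the model used in the study, and `additive_dominance_effects` is a hypothetical name.

```python
from statistics import mean


def additive_dominance_effects(genotypes, phenotypes):
    """Estimate additive (a) and dominance (d) effects at one biallelic locus.

    genotypes: allele counts in {0, 1, 2}; phenotypes: matching trait values.
    With the assumed coding x_add = g - 1 and x_dom = 1 for heterozygotes,
    the model y = mu + a * x_add + d * x_dom has one parameter per genotype
    class, so its least-squares fit is determined by the class means.
    """
    groups = {0: [], 1: [], 2: []}
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)
    if any(len(values) == 0 for values in groups.values()):
        raise ValueError("all three genotype classes must be observed")
    m0, m1, m2 = (mean(groups[k]) for k in (0, 1, 2))
    mu = (m0 + m2) / 2  # midpoint of the two homozygote means
    a = (m2 - m0) / 2   # additive effect: half the homozygote difference
    d = m1 - mu         # dominance deviation of the heterozygote mean
    return mu, a, d


# Toy data in which heterozygotes sit close to the "2" homozygotes.
geno = [0, 0, 1, 1, 2, 2]
pheno = [1.0, 1.2, 2.9, 3.1, 3.0, 3.2]
mu, a, d = additive_dominance_effects(geno, pheno)
print(round(mu, 3), round(a, 3), round(d, 3))
```

Because the model has one parameter per genotype class, its fit reduces to class means. A purely additive GWAS model corresponds to forcing d = 0, which would miss the heterozygote deviation in this toy example.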
-
happi: a hierarchical approach to pangenomics inference Genome Biol. (IF 12.3) Pub Date : 2023-09-29 Pauline Trinh, David S. Clausen, Amy D. Willis
Recovering metagenome-assembled genomes (MAGs) from shotgun sequencing data is an increasingly common task in microbiome studies, as MAGs provide deeper insight into the functional potential of both culturable and non-culturable microorganisms. However, metagenome-assembled genomes vary in quality and may contain omissions and contamination. These errors present challenges for detecting genes and comparing
-
Dosage compensation of Z sex chromosome genes in avian fibroblast cells Genome Biol. (IF 12.3) Pub Date : 2023-09-20 Ruslan Deviatiiarov, Hiroki Nagai, Galym Ismagulov, Anastasia Stupina, Kazuhiro Wada, Shinji Ide, Noriyuki Toji, Heng Zhang, Woranop Sukparangsi, Sittipon Intarapat, Oleg Gusev, Guojun Sheng
In birds, sex is genetically determined; however, the molecular mechanism is not well understood. The avian Z sex chromosome (chrZ) lacks whole-chromosome inactivation, in contrast to the mammalian chrX. To investigate chrZ dosage compensation and its role in sex specification, we use a highly quantitative method and analyze transcriptional activities of male and female fibroblast cells from seven
-
DISCERN: deep single-cell expression reconstruction for improved cell clustering and cell subtype and state detection Genome Biol. (IF 12.3) Pub Date : 2023-09-20 Fabian Hausmann, Can Ergen, Robin Khatri, Mohamed Marouf, Sonja Hänzelmann, Nicola Gagliani, Samuel Huber, Pierre Machart, Stefan Bonn
Single-cell sequencing provides detailed insights into biological processes including cell differentiation and identity. While providing deep cell-specific information, the method suffers from technical constraints, most notably a limited number of expressed genes per cell, which leads to suboptimal clustering and cell type identification. Here, we present DISCERN, a novel deep generative network that
-
Structural variation and introgression from wild populations in East Asian cattle genomes confer adaptation to local environment Genome Biol. (IF 12.3) Pub Date : 2023-09-18 Xiaoting Xia, Fengwei Zhang, Shuang Li, Xiaoyu Luo, Lixin Peng, Zheng Dong, Hubert Pausch, Alexander S. Leonard, Danang Crysnanto, Shikang Wang, Bin Tong, Johannes A. Lenstra, Jianlin Han, Fuyong Li, Tieshan Xu, Lihong Gu, Liangliang Jin, Ruihua Dang, Yongzhen Huang, Xianyong Lan, Gang Ren, Yu Wang, Yuanpeng Gao, Zhijie Ma, Haijian Cheng, Yun Ma, Hong Chen, Weijun Pang, Chuzhao Lei, Ningbo Chen
Structural variations (SVs) in individual genomes are major determinants of complex traits, including adaptability to environmental variables. The Mongolian and Hainan cattle breeds in East Asia are of taurine and indicine origins that have evolved to adapt to cold and hot environments, respectively. However, few studies have investigated SVs in East Asian cattle genomes and their roles in environmental
-
Author Correction: Single-cell resolution analysis reveals the preparation for reprogramming the fate of stem cell niche in cotton lateral meristem Genome Biol. (IF 12.3) Pub Date : 2023-09-18 Xiangqian Zhu, Zhongping Xu, Guanying Wang, Yulong Cong, Lu Yu, Ruoyu Jia, Yuan Qin, Guangyu Zhang, Bo Li, Daojun Yuan, Lili Tu, Xiyan Yang, Keith Lindsey, Xianlong Zhang, Shuangxia Jin
Correction: Genome Biol 24, 194 (2023) https://doi.org/10.1186/s13059-023-03032-6
Following publication of the original article [1], the authors reported an error in Fig. 9, namely a missing significant-difference symbol for JCR1 and a redundant significant-difference symbol for JOE1. The updated Fig. 9 is available in this Correction.
Fig. 9 Phenotype of GhLAX1, GhLAX2, GhLOX3 knockout and overexpression
-
Disparities in spatially variable gene calling highlight the need for benchmarking spatial transcriptomics methods Genome Biol. (IF 12.3) Pub Date : 2023-09-18 Natalie Charitakis, Agus Salim, Adam T. Piers, Kevin I. Watt, Enzo R. Porrello, David A. Elliott, Mirana Ramialison
Identifying spatially variable genes (SVGs) is a key step in the analysis of spatially resolved transcriptomics data. SVGs provide biological insights by defining transcriptomic differences within tissues, which was previously unachievable using RNA-sequencing technologies. However, the many tools that have been published to define SVG sets currently lack benchmarking methods to accurately
-
ZINBMM: a general mixture model for simultaneous clustering and gene selection using single-cell transcriptomic data Genome Biol. (IF 12.3) Pub Date : 2023-09-11 Yang Li, Mingcong Wu, Shuangge Ma, Mengyun Wu
Clustering is a critical component of single-cell RNA sequencing (scRNA-seq) data analysis and can help reveal cell types and infer cell lineages. Despite considerable successes, few methods are tailored to investigating the cluster-specific genes that contribute to cell heterogeneity, although identifying such genes can deepen biological understanding of that heterogeneity. In this study, we propose a zero-inflated negative
-
The relationship between regulatory changes in cis and trans and the evolution of gene expression in humans and chimpanzees Genome Biol. (IF 12.3) Pub Date : 2023-09-11 Kenneth A. Barr, Katherine L. Rhodes, Yoav Gilad
Comparative gene expression studies in apes are fundamentally limited by the challenges associated with sampling across different tissues. Here, we used single-cell RNA sequencing of embryoid bodies to collect transcriptomic data from over 70 cell types in three humans and three chimpanzees. We find hundreds of genes whose regulation is conserved across cell types, as well as genes whose regulation
-
Coupling of co-transcriptional splicing and 3’ end Pol II pausing during termination in Arabidopsis Genome Biol. (IF 12.3) Pub Date : 2023-09-11 Sixian Zhou, Fengli Zhao, Danling Zhu, Qiqi Zhang, Ziwei Dai, Zhe Wu
In Arabidopsis, RNA Polymerase II (Pol II) often pauses within a few hundred base pairs downstream of the polyadenylation site, reflecting efficient transcriptional termination, but how such pausing is regulated remains largely elusive. Here, we analyze Pol II dynamics at 3’ ends by combining comprehensive experiments with mathematical modelling. We generate high-resolution serine 2 phosphorylated
-
PhaseDancer: a novel targeted assembler of segmental duplications unravels the complexity of the human chromosome 2 fusion going from 48 to 46 chromosomes in hominin evolution Genome Biol. (IF 12.3) Pub Date : 2023-09-11 Barbara Poszewiecka, Krzysztof Gogolewski, Justyna A. Karolak, Paweł Stankiewicz, Anna Gambin
Resolving complex genomic regions rich in segmental duplications (SDs) is challenging due to the high error rate of long-read sequencing. Here, we describe a targeted approach with a novel genome assembler, PhaseDancer, that iteratively extends SD-rich regions of interest. We validate its robustness and efficiency using a gold-standard set of human BAC clones and in silico-generated SDs with predefined
-
COLLAGENE enables privacy-aware federated and collaborative genomic data analysis Genome Biol. (IF 12.3) Pub Date : 2023-09-11 Wentao Li, Miran Kim, Kai Zhang, Han Chen, Xiaoqian Jiang, Arif Harmanci
Growing regulatory requirements set barriers around genetic data sharing and collaborations. Moreover, existing privacy-aware paradigms are challenging to deploy in collaborative settings. We present COLLAGENE, a tool base for building secure collaborative genomic data analysis methods. COLLAGENE protects data using shared-key homomorphic encryption and combines encryption with multiparty strategies
-
ChromGene: gene-based modeling of epigenomic data Genome Biol. (IF 12.3) Pub Date : 2023-09-07 Artur Jaroszewicz, Jason Ernst
Various computational approaches have been developed to annotate epigenomes on a per-position basis by modeling combinatorial and spatial patterns within epigenomic data. However, such annotations are less suitable for gene-based analyses. We present ChromGene, a method based on a mixture of learned hidden Markov models, to annotate genes based on multiple epigenomic maps across the gene body and flanks
-
Quartet protein reference materials and datasets for multi-platform assessment of label-free proteomics Genome Biol. (IF 12.3) Pub Date : 2023-09-07 Sha Tian, Dongdong Zhan, Ying Yu, Yunzhi Wang, Mingwei Liu, Subei Tan, Yan Li, Lei Song, Zhaoyu Qin, Xianju Li, Yang Liu, Yao Li, Shuhui Ji, Shanshan Wang, Yuanting Zheng, Fuchu He, Jun Qin, Chen Ding
Quantitative proteomics is an indispensable tool in life science research. However, there is a lack of reference materials for evaluating the reproducibility of label-free liquid chromatography-tandem mass spectrometry (LC–MS/MS)-based measurements among different instruments and laboratories. Here, we develop the Quartet standard as a proteome reference material with built-in truths, and distribute
-
Correcting batch effects in large-scale multiomics studies using a reference-material-based ratio method Genome Biol. (IF 12.3) Pub Date : 2023-09-07 Ying Yu, Naixin Zhang, Yuanbang Mai, Luyao Ren, Qiaochu Chen, Zehui Cao, Qingwang Chen, Yaqing Liu, Wanwan Hou, Jingcheng Yang, Huixiao Hong, Joshua Xu, Weida Tong, Lianhua Dong, Leming Shi, Xiang Fang, Yuanting Zheng
Batch effects are notoriously common technical variations in multiomics data and may result in misleading outcomes if uncorrected or over-corrected. A plethora of batch-effect correction algorithms have been proposed to facilitate data integration. However, their respective advantages and limitations have not been adequately assessed in terms of omics types, performance metrics, and application scenarios
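The title names a reference-material-based ratio method. One plausible minimal sketch assumes each batch profiles a common reference material alongside the study samples. It then expresses every feature as a log2 ratio to the mean reference value measured in the same batch, so that a purely multiplicative batch bias cancels. The data layout, the log2 scale, and the name `ratio_correct` are assumptions for illustration, not the authors' implementation.

```python
import math


def ratio_correct(batches):
    """Convert each study sample to log2 ratios against the reference
    material profiled in the same batch.

    batches maps a batch ID to a dict with two keys (hypothetical layout):
      "reference": {feature: [replicate measurements of the reference]}
      "samples":   {sample_id: {feature: measured value}}
    Values are assumed positive and sample IDs unique across batches.
    Features absent from a batch's reference are dropped for that batch.
    """
    corrected = {}
    for batch in batches.values():
        # Average the reference replicates measured in this batch.
        ref_mean = {f: sum(v) / len(v) for f, v in batch["reference"].items()}
        for sample_id, profile in batch["samples"].items():
            corrected[sample_id] = {
                f: math.log2(x / ref_mean[f])
                for f, x in profile.items()
                if f in ref_mean
            }
    return corrected


# Batch B reads everything at half the scale of batch A.
batches = {
    "A": {"reference": {"g1": [10.0, 10.0]}, "samples": {"s1": {"g1": 20.0}}},
    "B": {"reference": {"g1": [5.0, 5.0]}, "samples": {"s2": {"g1": 10.0}}},
}
result = ratio_correct(batches)
print(result)
```

In this toy case the raw values differ twofold between batches, but both samples end up at the same ratio to their batch's reference. A bias that is not multiplicative, or one that affects the reference differently from the samples, would not be removed by this simple scaling.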
-
Pervasive under-dominance in gene expression underlying emergent growth trajectories in Arabidopsis thaliana hybrids Genome Biol. (IF 12.3) Pub Date : 2023-09-04 Wei Yuan, Fiona Beitel, Thanvi Srikant, Ilja Bezrukov, Sabine Schäfer, Robin Kraft, Detlef Weigel
Complex traits, such as growth and fitness, are typically controlled by a very large number of variants, which can interact in both additive and non-additive fashion. In an attempt to gauge the relative importance of both types of genetic interactions, we turn to hybrids, which provide a facile means for creating many novel allele combinations. We focus on the interaction between alleles of the same
-
Single-cell transcriptomics reveals multiple chemoresistant properties in leukemic stem and progenitor cells in pediatric AML Genome Biol. (IF 12.3) Pub Date : 2023-08-31 Yongping Zhang, Shuting Jiang, Fuhong He, Yuanyuan Tian, Haiyang Hu, Li Gao, Lin Zhang, Aili Chen, Yixin Hu, Liyan Fan, Chun Yang, Bi Zhou, Dan Liu, Zihan Zhou, Yanxun Su, Lei Qin, Yi Wang, Hailong He, Jun Lu, Peifang Xiao, Shaoyan Hu, Qian-Fei Wang
Cancer patients can achieve dramatic responses to chemotherapy yet retain resistant tumor cells, which ultimately results in relapse. Although xenograft model studies have identified several cellular and molecular features that are associated with chemoresistance in acute myeloid leukemia (AML), to what extent AML patients exhibit these properties remains largely unknown. We apply single-cell RNA sequencing
-
A DNA adenine demethylase impairs PRC2-mediated repression of genes marked by a specific chromatin signature Genome Biol. (IF 12.3) Pub Date : 2023-08-30 Qingxiao Jia, Xinran Zhang, Qian Liu, Junjie Li, Wentao Wang, Xuan Ma, Bo Zhu, Sheng Li, Shicheng Gong, Jingjing Tian, Meng Yuan, Yu Zhao, Dao-Xiu Zhou
The Fe(II)- and α-ketoglutarate-dependent AlkB family dioxygenases are implicated in nucleotide demethylation. AlkB homolog 1 (ALKBH1) has been shown to remove DNA N6-methyladenine (6mA) preferentially from single-stranded or unpaired DNA, while its demethylase activity and function in the chromatin context are unclear. Here, we find that loss of function of the rice ALKBH1 gene leads to increased
-
Ariadne: synthetic long read deconvolution using assembly graphs Genome Biol. (IF 12.3) Pub Date : 2023-08-28 Lauren Mak, Dmitry Meleshko, David C. Danko, Waris N. Barakzai, Salil Maharjan, Natan Belchikov, Iman Hajirasouliha
Synthetic long-read sequencing techniques such as UST’s TELL-Seq and Loop Genomics’ LoopSeq combine 3′ barcoding with standard short-read sequencing to expand the range of linkage resolution from hundreds to tens of thousands of base pairs. However, the lack of a 1:1 correspondence between a long fragment and a 3′ unique molecular identifier confounds the assignment of linkage between short
-
A high-resolution genotype–phenotype map identifies the TaSPL17 controlling grain number and size in wheat Genome Biol. (IF 12.3) Pub Date : 2023-08-28 Yangyang Liu, Jun Chen, Changbin Yin, Ziying Wang, He Wu, Kuocheng Shen, Zhiliang Zhang, Lipeng Kang, Song Xu, Aoyue Bi, Xuebo Zhao, Daxing Xu, Zhonghu He, Xueyong Zhang, Chenyang Hao, Jianhui Wu, Yan Gong, Xuchang Yu, Zhiwen Sun, Botao Ye, Danni Liu, Lili Zhang, Liping Shen, Yuanfeng Hao, Youzhi Ma, Fei Lu, Zifeng Guo
Large-scale genotype–phenotype association studies of crop germplasm are important for identifying alleles associated with favorable traits. The limited number of single-nucleotide polymorphisms (SNPs) in most wheat genome-wide association studies (GWASs) restricts their power to detect marker-trait associations. Additionally, only a few genes regulating grain number per spikelet have been reported
-
Cross-species oncogenomics offers insight into human muscle-invasive bladder cancer Genome Biol. (IF 12.3) Pub Date : 2023-08-28 Kim Wong, Federico Abascal, Latasha Ludwig, Heike Aupperle-Lellbach, Julia Grassinger, Colin W. Wright, Simon J. Allison, Emma Pinder, Roger M. Phillips, Laura P. Romero, Arnon Gal, Patrick J. Roady, Isabel Pires, Franco Guscetti, John S. Munday, Maria C. Peleteiro, Carlos A. Pinto, Tânia Carvalho, João Cota, Elizabeth C. Du Plessis, Fernando Constantino-Casas, Stephanie Plog, Lars Moe, Simone de Brot
In humans, muscle-invasive bladder cancer (MIBC) is highly aggressive and associated with a poor prognosis. Because MIBC has a high mutation load and a large number of altered genes, strategies to delineate key driver events are necessary. Dogs and cats develop urothelial carcinoma (UC) with histological and clinical similarities to human MIBC. Cattle that graze on bracken fern also develop UC, associated with exposure
-
SCA: recovering single-cell heterogeneity through information-based dimensionality reduction Genome Biol. (IF 12.3) Pub Date : 2023-08-25 Benjamin DeMeo, Bonnie Berger
Dimensionality reduction summarizes the complex transcriptomic landscape of single-cell datasets for downstream analyses. Current approaches favor large cellular populations defined by many genes, at the expense of smaller and more subtly defined populations. Here, we present surprisal component analysis (SCA), a technique that newly leverages the information-theoretic notion of surprisal for dimensionality
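Surprisal is the information-theoretic quantity -log2(p): the rarer an event, the more bits it carries. The sketch below is a generic illustration, not SCA itself. It scores each detected gene in each cell by the surprisal of that gene's detection frequency across cells, so genes expressed only in small populations receive larger weights than ubiquitous ones. Binary detection calls and the name `surprisal_scores` are assumptions.

```python
import math


def surprisal_scores(detected):
    """detected: one row per cell, each a list of 0/1 detection flags per gene.

    Each detected gene scores log2(1 / p_g) = -log2(p_g), where p_g is the
    fraction of cells in which gene g is detected; undetected entries score 0.
    """
    n_cells = len(detected)
    n_genes = len(detected[0])
    freq = [sum(cell[g] for cell in detected) / n_cells for g in range(n_genes)]
    # p_g > 0 whenever cell[g] is set, so the logarithm is always defined.
    return [
        [math.log2(1 / freq[g]) if cell[g] else 0.0 for g in range(n_genes)]
        for cell in detected
    ]


cells = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],  # the only cell expressing gene 2
]
scores = surprisal_scores(cells)
for row in scores:
    print(row)
```

A gene detected everywhere contributes nothing, while the marker seen in one cell out of four contributes 2 bits. This is the intuition behind using surprisal to keep small, subtly defined populations from being drowned out.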
-
Single-cell resolution analysis reveals the preparation for reprogramming the fate of stem cell niche in cotton lateral meristem Genome Biol. (IF 12.3) Pub Date : 2023-08-25 Xiangqian Zhu, Zhongping Xu, Guanying Wang, Yulong Cong, Lu Yu, Ruoyu Jia, Yuan Qin, Guangyu Zhang, Bo Li, Daojun Yuan, Lili Tu, Xiyan Yang, Keith Lindsey, Xianlong Zhang, Shuangxia Jin
Somatic embryogenesis is a major process for plant regeneration. However, cell communication and the gene regulatory network responsible for cell reprogramming during somatic embryogenesis are still largely unclear. Recent advances in single-cell technologies enable us to explore the mechanism of plant regeneration at single-cell resolution. We generate a high-resolution single-cell transcriptomic
-
Comprehensive analyses of partially methylated domains and differentially methylated regions in esophageal cancer reveal both cell-type- and cancer-specific epigenetic regulation Genome Biol. (IF 12.3) Pub Date : 2023-08-24 Yueyuan Zheng, Benjamin Ziman, Allen S. Ho, Uttam K. Sinha, Li-Yan Xu, En-Min Li, H Phillip Koeffler, Benjamin P. Berman, De-Chen Lin
As one of the most common malignancies, esophageal cancer has two subtypes, squamous cell carcinoma and adenocarcinoma, arising from distinct cells-of-origin. Distinguishing cell-type-specific molecular features from cancer-specific characteristics is challenging. We analyze whole-genome bisulfite sequencing data on 45 esophageal tumor and nonmalignant samples from both subtypes. We develop a novel
-
Partial gene suppression improves identification of cancer vulnerabilities when CRISPR-Cas9 knockout is pan-lethal Genome Biol. (IF 12.3) Pub Date : 2023-08-23 J. Michael Krill-Burger, Joshua M. Dempster, Ashir A. Borah, Brenton R. Paolella, David E. Root, Todd R. Golub, Jesse S. Boehm, William C. Hahn, James M. McFarland, Francisca Vazquez, Aviad Tsherniak
Hundreds of functional genomic screens have been performed across a diverse set of cancer contexts, as part of efforts such as the Cancer Dependency Map, to identify gene dependencies—genes whose loss of function reduces cell viability or fitness. Recently, large-scale screening efforts have shifted from RNAi to CRISPR-Cas9, due to superior efficacy and specificity. However, many effective oncology
-
GTM-decon: guided-topic modeling of single-cell transcriptomes enables sub-cell-type and disease-subtype deconvolution of bulk transcriptomes Genome Biol. (IF 12.3) Pub Date : 2023-08-18 Lakshmipuram Seshadri Swapna, Michael Huang, Yue Li
Cell-type composition is an important indicator of health. We present Guided Topic Model for deconvolution (GTM-decon) to automatically infer cell-type-specific gene topic distributions from single-cell RNA-seq data for deconvolving bulk transcriptomes. GTM-decon performs competitively on deconvolving simulated and real bulk data compared with the state-of-the-art methods. Moreover, as demonstrated
-
Predicting the impact of sequence motifs on gene regulation using single-cell data Genome Biol. (IF 12.3) Pub Date : 2023-08-15 Jacob Hepkema, Nicholas Keone Lee, Benjamin J. Stewart, Siwat Ruangroengkulrith, Varodom Charoensawan, Menna R. Clatworthy, Martin Hemberg
The binding of transcription factors at proximal promoters and distal enhancers is central to gene regulation. Identifying regulatory motifs and quantifying their impact on expression remains challenging. Using a convolutional neural network trained on single-cell data, we infer putative regulatory motifs and cell type-specific importance. Our model, scover, explains 29% of the variance in gene expression
-
BamQuery: a proteogenomic tool to explore the immunopeptidome and prioritize actionable tumor antigens Genome Biol. (IF 12.3) Pub Date : 2023-08-15 Maria Virginia Ruiz Cuevas, Marie-Pierre Hardy, Jean-David Larouche, Anca Apavaloaei, Eralda Kina, Krystel Vincent, Patrick Gendron, Jean-Philippe Laverdure, Chantal Durette, Pierre Thibault, Sébastien Lemieux, Claude Perreault, Grégory Ehx
MHC-I-associated peptides derived from non-coding genomic regions and mutations can generate tumor-specific antigens, including neoantigens. Quantifying tumor-specific antigens’ RNA expression in malignant and benign tissues is critical for discriminating actionable targets. We present BamQuery, a tool that attributes exhaustive RNA expression to MHC-I-associated peptides of any origin from bulk and
-
Genome sequencing of 2000 canids by the Dog10K consortium advances the understanding of demography, genome function and architecture Genome Biol. (IF 12.3) Pub Date : 2023-08-15 Jennifer R. S. Meadows, Jeffrey M. Kidd, Guo-Dong Wang, Heidi G. Parker, Peter Z. Schall, Matteo Bianchi, Matthew J. Christmas, Katia Bougiouri, Reuben M. Buckley, Christophe Hitte, Anthony K. Nguyen, Chao Wang, Vidhya Jagannathan, Julia E. Niskanen, Laurent A. F. Frantz, Meharji Arumilli, Sruthi Hundi, Kerstin Lindblad-Toh, Catarina Ginja, Kadek Karang Agustina, Catherine André, Adam R. Boyko, Brian
The international Dog10K project aims to sequence and analyze several thousand canine genomes. Incorporating 20× data from 1987 individuals, including 1611 dogs (321 breeds), 309 village dogs, 63 wolves, and four coyotes, we identify genomic variation across the canid family, setting the stage for detailed studies of domestication, behavior, morphology, disease susceptibility, and genome architecture
-
Maast: genotyping thousands of microbial strains efficiently Genome Biol. (IF 12.3) Pub Date : 2023-08-10 Zhou Jason Shi, Stephen Nayfach, Katherine S. Pollard
Existing single nucleotide polymorphism (SNP) genotyping algorithms do not scale for species with thousands of sequenced strains, nor do they account for conspecific redundancy. Here we present a bioinformatics tool, Maast, which empowers population genetic meta-analysis of microbes at an unrivaled scale. Maast implements a novel algorithm to heuristically identify a minimal set of diverse conspecific
-
The CUT&RUN suspect list of problematic regions of the genome Genome Biol. (IF 12.3) Pub Date : 2023-08-10 Anna Nordin, Gianluca Zambanini, Pierfrancesco Pagella, Claudio Cantù
Cleavage Under Targets and Release Using Nuclease (CUT&RUN) is an increasingly popular technique to map genome-wide binding profiles of histone modifications, transcription factors, and co-factors. The ENCODE project and others have compiled blacklists for ChIP-seq which have been widely adopted: these lists contain regions of high and unstructured signal, regardless of cell type or protein target
-
LAST-seq: single-cell RNA sequencing by direct amplification of single-stranded RNA without prior reverse transcription and second-strand synthesis Genome Biol. (IF 12.3) Pub Date : 2023-08-09 Jun Lyu, Chongyi Chen
Existing single-cell RNA sequencing (scRNA-seq) methods rely on reverse transcription (RT) and second-strand synthesis (SSS) to convert single-stranded RNA into double-stranded DNA prior to amplification, with the limited RT/SSS efficiency compromising RNA detectability. Here, we develop a new scRNA-seq method, Linearly Amplified Single-stranded-RNA-derived Transcriptome sequencing (LAST-seq), which
-
Reconstruction of the last bacterial common ancestor from 183 pangenomes reveals a versatile ancient core genome Genome Biol. (IF 12.3) Pub Date : 2023-08-08 Jason C. Hyun, Bernhard O. Palsson
Cumulative sequencing efforts have yielded enough genomes to construct pangenomes for dozens of bacterial species and elucidate intraspecies gene conservation. Given the diversity of organisms for which this is achievable, similar analyses for ancestral species are feasible through the integration of pangenomics and phylogenetics, promising deeper insights into the nature of ancient life. We construct
-
Cross-protein transfer learning substantially improves disease variant prediction Genome Biol. (IF 12.3) Pub Date : 2023-08-07 Milind Jagota, Chengzhong Ye, Carlos Albors, Ruchir Rastogi, Antoine Koehl, Nilah Ioannidis, Yun S. Song
Genetic variation in the human genome is a major determinant of individual disease risk, but the vast majority of missense variants have unknown etiological effects. Here, we present a robust learning framework for leveraging saturation mutagenesis experiments to construct accurate computational predictors of proteome-wide missense variant pathogenicity. We train cross-protein transfer (CPT) models
-
3D organization of regulatory elements for transcriptional regulation in Arabidopsis Genome Biol. (IF 12.3) Pub Date : 2023-08-07 Li Deng, Qiangwei Zhou, Jie Zhou, Qing Zhang, Zhibo Jia, Guangfeng Zhu, Sheng Cheng, Lulu Cheng, Caijun Yin, Chao Yang, Jinxiong Shen, Junwei Nie, Jian-Kang Zhu, Guoliang Li, Lun Zhao
Although the large-scale spatial organization of compartments and topologically associating domains is relatively well studied, the fine-scale spatial organization of regulatory elements is poorly understood in plants. Here we perform high-resolution chromatin interaction analysis using paired-end tag sequencing (ChIA-PET). We map chromatin interactions tethered with RNA polymerase II and associated
-
Towards in silico CLIP-seq: predicting protein-RNA interaction via sequence-to-signal learning Genome Biol. (IF 12.3) Pub Date : 2023-08-04 Marc Horlacher, Nils Wagner, Lambert Moyon, Klara Kuret, Nicolas Goedert, Marco Salvatore, Jernej Ule, Julien Gagneur, Ole Winther, Annalisa Marsico
We present RBPNet, a novel deep learning method, which predicts CLIP-seq crosslink count distribution from RNA sequence at single-nucleotide resolution. By training on up to a million regions, RBPNet achieves high generalization on eCLIP, iCLIP and miCLIP assays, outperforming state-of-the-art classifiers. RBPNet performs bias correction by modeling the raw signal as a mixture of the protein-specific
-
A syntelog-based pan-genome provides insights into rice domestication and de-domestication Genome Biol. (IF 12.3) Pub Date : 2023-08-03 Dongya Wu, Lingjuan Xie, Yanqing Sun, Yujie Huang, Lei Jia, Chenfeng Dong, Enhui Shen, Chu-Yu Ye, Qian Qian, Longjiang Fan
Asian rice is one of the world’s most widely cultivated crops. Large-scale resequencing analyses have been undertaken to explore the domestication and de-domestication genomic history of Asian rice, but the evolution of rice is still under debate. Here, we construct a syntelog-based rice pan-genome by integrating and merging 74 high-accuracy genomes based on long-read sequencing, encompassing all ecotypes
-
BEDwARS: a robust Bayesian approach to bulk gene expression deconvolution with noisy reference signatures Genome Biol. (IF 12.3) Pub Date : 2023-08-03 Saba Ghaffari, Kelly J. Bouchonville, Ehsan Saleh, Remington E. Schmidt, Steven M. Offer, Saurabh Sinha
Differential gene expression in bulk transcriptomics data can reflect changes in transcript abundance within a cell type and/or changes in the proportions of cell types. Expression deconvolution methods can help differentiate these scenarios. BEDwARS is a Bayesian deconvolution method designed to address differences between reference signatures of cell types and corresponding true signatures underlying
-
Effective methods for bulk RNA-seq deconvolution using scnRNA-seq transcriptomes Genome Biol. (IF 12.3) Pub Date : 2023-08-01 Francisco Avila Cobos, Mohammad Javad Najaf Panah, Jessica Epps, Xiaochen Long, Tsz-Kwong Man, Hua-Sheng Chiu, Elad Chomsky, Evgeny Kiner, Michael J. Krueger, Diego di Bernardo, Luis Voloch, Jan Molenaar, Sander R. van Hooff, Frank Westermann, Selina Jansky, Michele L. Redell, Pieter Mestdagh, Pavel Sumazin
RNA profiling technologies at single-cell resolution, including single-cell and single-nuclei RNA sequencing (scRNA-seq and snRNA-seq, scnRNA-seq for short), can help characterize the composition of tissues and reveal cells that influence key functions in both healthy and diseased tissues. However, the use of these technologies is operationally challenging because of high costs and stringent sample-collection
-
Genetic impacts on DNA methylation help elucidate regulatory genomic processes Genome Biol. (IF 12.3) Pub Date : 2023-07-31 Sergio Villicaña, Juan Castillo-Fernandez, Eilis Hannon, Colette Christiansen, Pei-Chien Tsai, Jane Maddock, Diana Kuh, Matthew Suderman, Christine Power, Caroline Relton, George Ploubidis, Andrew Wong, Rebecca Hardy, Alissa Goodman, Ken K. Ong, Jordana T. Bell
Pinpointing genetic impacts on DNA methylation can improve our understanding of pathways that underlie gene regulation and disease risk. We report heritability and methylation quantitative trait locus (meQTL) analysis at 724,499 CpGs profiled with the Illumina Infinium MethylationEPIC array in 2358 blood samples from three UK cohorts. Methylation levels at 34.2% of CpGs are affected by SNPs, and 98%
-
vamos: variable-number tandem repeats annotation using efficient motif sets Genome Biol. (IF 12.3) Pub Date : 2023-07-27 Jingwen Ren, Bida Gu, Mark J. P. Chaisson
Roughly 3% of the human genome is composed of variable-number tandem repeats (VNTRs): arrays of motifs at least six bases long. These loci are highly polymorphic, yet current approaches that define and merge variants based on alignment breakpoints do not capture their full diversity. Here we present vamos (VNTR Annotation using efficient Motif Sets), a method that instead annotates VNTRs using repeat composition
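As a toy illustration of annotating a VNTR by repeat composition rather than by alignment breakpoints, the sketch below greedily decomposes a sequence into exact copies of motifs from a supplied motif set and tallies them. It assumes exact matching and a hand-picked motif set. vamos's actual motif-set selection and annotation algorithm are not reproduced here, and `annotate_vntr` is a hypothetical name.

```python
from collections import Counter


def annotate_vntr(seq, motifs):
    """Greedy left-to-right decomposition of seq into exact motif copies.

    At each position the longest matching motif is taken; a base that no
    motif matches is recorded as an 'N:<base>' token and skipped.
    """
    if any(len(m) == 0 for m in motifs):
        raise ValueError("motifs must be non-empty")
    ordered = sorted(motifs, key=len, reverse=True)
    annotation = []
    i = 0
    while i < len(seq):
        for motif in ordered:
            if seq.startswith(motif, i):
                annotation.append(motif)
                i += len(motif)
                break
        else:  # no motif matched at position i
            annotation.append("N:" + seq[i])
            i += 1
    return annotation


seq = "GGCCAT" * 3 + "GGCCTT" + "A"
annotation = annotate_vntr(seq, ["GGCCAT", "GGCCTT"])
composition = Counter(annotation)
print(annotation)
print(composition)
```

Two alleles with the same length but different motif orders or counts get different annotations here, which is the kind of diversity a breakpoint-only comparison can miss. Real VNTR reads contain sequencing errors and diverged copies, so a practical annotator would need approximate rather than exact matching.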
-
ISLET: individual-specific reference panel recovery improves cell-type-specific inference Genome Biol. (IF 12.3) Pub Date : 2023-07-26 Hao Feng, Guanqun Meng, Tong Lin, Hemang Parikh, Yue Pan, Ziyi Li, Jeffrey Krischer, Qian Li
We propose ISLET, a statistical framework to infer individual-specific and cell-type-specific transcriptome reference panels. ISLET models the repeatedly measured bulk gene expression data to optimize the usage of shared information within each subject. ISLET is the first available method to achieve individual-specific reference estimation in repeated samples. Using simulation studies, we show outstanding
-
Genetic history of East-Central Europe in the first millennium CE Genome Biol. (IF 12.3) Pub Date : 2023-07-24 Ireneusz Stolarek, Michal Zenczak, Luiza Handschuh, Anna Juras, Malgorzata Marcinkowska-Swojak, Anna Spinek, Artur Dębski, Marzena Matla, Hanna Kóčka-Krenz, Janusz Piontek, Marek Figlerowicz
The appearance of Slavs in East-Central Europe has been the subject of a debate spanning over 200 years, driven by two conflicting hypotheses. The first assumes that Slavs came to the territory of contemporary Poland no earlier than the sixth century CE; the second postulates that they already inhabited this region in the Iron Age (IA). Testing either hypothesis is not trivial given that cremation of the dead
-
Predicting disease severity in metachromatic leukodystrophy using protein activity and a patient phenotype matrix Genome Biol. (IF 12.3) Pub Date : 2023-07-21 Marena Trinidad, Xinying Hong, Steven Froelich, Jessica Daiker, James Sacco, Hong Phuc Nguyen, Madelynn Campagna, Dean Suhr, Teryn Suhr, Jonathan H. LeBowitz, Michael H. Gelb, Wyatt T. Clark
Metachromatic leukodystrophy (MLD) is a lysosomal storage disorder caused by mutations in the arylsulfatase A gene (ARSA) and categorized into three subtypes according to age of onset. The functional effect of most ARSA mutants remains unknown; better understanding of the genotype–phenotype relationship is required to support newborn screening (NBS) and guide treatment. We collected a patient data
-
L-GIREMI uncovers RNA editing sites in long-read RNA-seq Genome Biol. (IF 12.3) Pub Date : 2023-07-20 Zhiheng Liu, Giovanni Quinones-Valdez, Ting Fu, Elaine Huang, Mudra Choudhury, Fairlie Reese, Ali Mortazavi, Xinshu Xiao
Long-read RNA-seq is increasingly applied to characterize full-length transcripts, but its ability to detect nucleotide variants, such as genetic mutations or RNA editing sites, remains significantly under-explored. Here, we present an in-depth study to detect and analyze RNA editing sites in long-read RNA-seq. Our new method, L-GIREMI, effectively handles sequencing errors and read
-
MSV: a modular structural variant caller that reveals nested and complex rearrangements by unifying breakends inferred directly from reads Genome Biol. (IF 12.3) Pub Date : 2023-07-17 Markus Schmidt, Arne Kutzner
Structural variant (SV) calling is one of the standard tools of modern bioinformatics for identifying and describing alterations in genomes. This work first presents several complex genomic rearrangements that reveal conceptual ambiguities inherent in representing them with basic SVs. We contextualize these ambiguities theoretically as well as practically and propose a graph-based approach for
-
Comprehensive analysis of neoantigens derived from structural variation across whole genomes from 2528 tumors Genome Biol. (IF 12.3) Pub Date : 2023-07-17 Yang Shi, Biyang Jing, Ruibin Xi
Neoantigens are critical for anti-tumor immunity and have long been envisioned as promising therapeutic targets. However, current neoantigen analyses mostly focus on single nucleotide variations (SNVs) and indel mutations and seldom consider structural variations (SVs) that are also prevalent in cancer. Here, we develop a computational method termed NeoSV, which incorporates SV annotation, protein
-
Sensitive inference of alignment-safe intervals from biodiverse protein sequence clusters using EMERALD Genome Biol. (IF 12.3) Pub Date : 2023-07-17 Andreas Grigorjew, Artur Gynter, Fernando H. C. Dias, Benjamin Buchfink, Hajk-Georg Drost, Alexandru I. Tomescu
Sequence alignments are the foundations of life science research, but most innovation so far focuses on optimal alignments, while information derived from suboptimal solutions is ignored. We argue that one optimal alignment per pairwise sequence comparison is a reasonable approximation when dealing with very similar sequences but is insufficient when exploring the biodiversity of the protein universe
-
Identifying and quantifying isoforms from accurate full-length transcriptome sequencing reads with Mandalorion Genome Biol. (IF 12.3) Pub Date : 2023-07-17 Roger Volden, Kayla D. Schimke, Ashley Byrne, Danilo Dubocanin, Matthew Adams, Christopher Vollmers
In this manuscript, we introduce and benchmark Mandalorion v4.1 for identifying and quantifying isoforms from full-length transcriptome sequencing reads. It further improves upon the already strong performance of Mandalorion v3.6, which was used in the LRGASP consortium challenge. By processing real and simulated data, we show three main features of Mandalorion: first, Mandalorion-based isoform identification
-
Stable maternal proteins underlie distinct transcriptome, translatome, and proteome reprogramming during mouse oocyte-to-embryo transition Genome Biol. (IF 12.3) Pub Date : 2023-07-13 Hongmei Zhang, Shuyan Ji, Ke Zhang, Yuling Chen, Jia Ming, Feng Kong, Lijuan Wang, Shun Wang, Zhuoning Zou, Zhuqing Xiong, Kai Xu, Zili Lin, Bo Huang, Ling Liu, Qiang Fan, Suoqin Jin, Haiteng Deng, Wei Xie
The oocyte-to-embryo transition (OET) converts terminally differentiated gametes into a totipotent embryo and is critically controlled by maternal mRNAs and proteins, while the genome is silent until zygotic genome activation. How the transcriptome, translatome, and proteome are coordinated during this critical developmental window remains poorly understood. Utilizing a highly sensitive and quantitative
-
SEESAW: detecting isoform-level allelic imbalance accounting for inferential uncertainty Genome Biol. (IF 12.3) Pub Date : 2023-07-12 Euphy Y. Wu, Noor P. Singh, Kwangbom Choi, Mohsen Zakeri, Matthew Vincent, Gary A. Churchill, Cheryl L. Ackert-Bicknell, Rob Patro, Michael I. Love
Detecting allelic imbalance at the isoform level requires accounting for inferential uncertainty, caused by multi-mapping of RNA-seq reads. Our proposed method, SEESAW, uses Salmon and Swish to offer analysis at various levels of resolution, including gene, isoform, and groups of isoforms aggregated by transcription start site. The aggregation strategies strengthen the signal for transcripts with
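The sketch below illustrates why pooling isoforms that share a transcription start site can strengthen a weak allelic-imbalance signal. It is not SEESAW's pipeline. SEESAW works on Salmon quantifications with inferential replicates and tests with Swish, and this sketch does neither. Instead it uses made-up integer allele counts, whereas real Salmon estimates are fractional. It also swaps in a plain exact binomial test against a 50:50 null. The function names and the input layout are hypothetical.

```python
import math
from collections import defaultdict


def aggregate_by_tss(isoforms):
    """Sum allele-specific counts over isoforms that share a TSS group.

    isoforms: dict mapping isoform_id -> (tss_group, count_allele_a, count_allele_b).
    Returns a dict mapping tss_group -> (total_allele_a, total_allele_b).
    """
    totals = defaultdict(lambda: [0, 0])
    for tss_group, count_a, count_b in isoforms.values():
        totals[tss_group][0] += count_a
        totals[tss_group][1] += count_b
    return {tss: (c[0], c[1]) for tss, c in totals.items()}


def binomial_two_sided_p(count_a, count_b):
    """Exact two-sided binomial test of count_a out of count_a + count_b reads, null p = 0.5."""
    n = count_a + count_b
    probs = [math.comb(n, i) / 2 ** n for i in range(n + 1)]
    observed = probs[count_a]
    # Sum the probabilities of all outcomes no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))
```

Suppose two isoforms from the same TSS each show a 6:2 allelic split. Neither is convincing on its own (two-sided p = 74/256). Pooled to 12:4, the same 3:1 ratio gives a much smaller p-value, because it is supported by twice as many reads.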
-
Mapping genetic variants for nonsense-mediated mRNA decay regulation across human tissues Genome Biol. (IF 12.3) Pub Date : 2023-07-11 Bo Sun, Liang Chen
Nonsense-mediated mRNA decay (NMD) was originally conceived as an mRNA surveillance mechanism to prevent the production of potentially deleterious truncated proteins. Research also shows NMD is an important post-transcriptional gene regulation mechanism selectively targeting many non-aberrant mRNAs. However, how natural genetic variants affect NMD and modulate gene expression remains elusive. Here
-
CMOT: Cross-Modality Optimal Transport for multimodal inference Genome Biol. (IF 12.3) Pub Date : 2023-07-11 Sayali Anil Alatkar, Daifeng Wang
Multimodal measurements from single-cell sequencing technologies facilitate a comprehensive understanding of specific cellular and molecular mechanisms. However, simultaneous profiling of multiple modalities of single cells is challenging, and data integration remains elusive due to missing modalities and cell–cell correspondences. To address this, we developed a computational approach, Cross-Modality
-
GreenHill: a de novo chromosome-level scaffolding and phasing tool using Hi-C Genome Biol. (IF 12.3) Pub Date : 2023-07-11 Shun Ouchi, Rei Kajitani, Takehiko Itoh
Chromosome-level haplotype-resolved genome assembly is an important resource in molecular biology. However, current de novo haplotype assemblers require parental data or reference genomes and often fail to provide chromosome-level results. We present GreenHill, a novel scaffolding and phasing tool that accepts contigs from various assemblers as input to reconstruct chromosome-level haplotypes using Hi-C
-
CimpleG: finding simple CpG methylation signatures Genome Biol. (IF 12.3) Pub Date : 2023-07-10 Tiago Maié, Marco Schmidt, Myriam Erz, Wolfgang Wagner, Ivan G. Costa
DNA methylation signatures are usually based on multivariate approaches that require hundreds of sites for predictions. Here, we propose a computational framework named CimpleG for the detection of small CpG methylation signatures used for cell-type classification and deconvolution. We show that CimpleG is time-efficient and performs as well as top-performing methods for cell-type classification
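The following is a minimal sketch of what a "small" signature can look like in the extreme case: a single CpG whose beta value separates one target cell type from all others, combined with a midpoint threshold rule. It is not CimpleG's algorithm. The scoring rule is a hypothetical choice made for illustration: the gap between group means minus the within-group spread. The input layout and function names are also hypothetical.

```python
import statistics


def split_by_label(values, labels, target):
    """Split one CpG's beta values into target-cell-type samples and all other samples."""
    in_target = [v for v, lab in zip(values, labels) if lab == target]
    others = [v for v, lab in zip(values, labels) if lab != target]
    return in_target, others


def best_single_cpg(betas, labels, target):
    """Pick the CpG whose beta values best separate `target` from all other cell types.

    betas: dict mapping cpg_id -> list of beta values, one per sample, in the order of labels.
    Score (hypothetical): |mean_target - mean_others| - (sd_target + sd_others).
    """
    best_id, best_score = None, float("-inf")
    for cpg_id, values in betas.items():
        in_target, others = split_by_label(values, labels, target)
        gap = abs(statistics.mean(in_target) - statistics.mean(others))
        spread = statistics.pstdev(in_target) + statistics.pstdev(others)
        score = gap - spread
        if score > best_score:
            best_id, best_score = cpg_id, score
    return best_id, best_score


def single_cpg_classifier(betas, labels, target, cpg_id):
    """Return a rule that calls a sample `target` if its beta lies on the target side of the midpoint."""
    in_target, others = split_by_label(betas[cpg_id], labels, target)
    mean_t, mean_o = statistics.mean(in_target), statistics.mean(others)
    threshold = (mean_t + mean_o) / 2
    hypermethylated = mean_t > mean_o
    return lambda beta: (beta > threshold) == hypermethylated
```

Penalizing within-group spread favors CpGs with a clean, consistent gap over CpGs with a large but noisy mean difference. That property matters when a signature has only one or a few sites to rely on.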
-
Intronic small nucleolar RNAs regulate host gene splicing through base pairing with their adjacent intronic sequences Genome Biol. (IF 12.3) Pub Date : 2023-07-06 Danny Bergeron, Laurence Faucher-Giguère, Ann-Kathrin Emmerichs, Karine Choquet, Kristina Sungeun Song, Gabrielle Deschamps-Francoeur, Étienne Fafard-Couture, Andrea Rivera, Sonia Couture, L. Stirling Churchman, Florian Heyd, Sherif Abou Elela, Michelle S. Scott
Small nucleolar RNAs (snoRNAs) are abundant noncoding RNAs best known for their involvement in ribosomal RNA maturation. In mammals, most expressed snoRNAs are embedded in introns of longer genes and produced through transcription and splicing of their host. Intronic snoRNAs were long viewed as inert passengers with little effect on host expression. However, a recent study reported a snoRNA influencing
-
Protocadherin 20 maintains intestinal barrier function to protect against Crohn’s disease by targeting ATF6 Genome Biol. (IF 12.3) Pub Date : 2023-07-05 Shanshan Huang, Zhuo Xie, Jing Han, Huiling Wang, Guang Yang, Manying Li, Gaoshi Zhou, Ying Wang, Lixuan Li, Li Li, Zhirong Zeng, Jun Yu, Minhu Chen, Shenghong Zhang
Intestinal barrier dysfunction plays a central role in the pathological onset of Crohn’s disease. We identify the cadherin superfamily member protocadherin 20 (PCDH20) as a crucial factor in Crohn’s disease. Here we describe the function of PCDH20 and its mechanisms in gut homeostasis, barrier integrity, and Crohn’s disease development. PCDH20 mRNA and protein expression is significantly downregulated
-
HiCognition: a visual exploration and hypothesis testing tool for 3D genomics Genome Biol. (IF 12.3) Pub Date : 2023-07-05 Christoph C. H. Langer, Michael Mitter, Roman R. Stocsits, Daniel W. Gerlich
Genome browsers facilitate integrated analysis of multiple genomics datasets yet visualize only a few regions at a time and lack statistical functions for extracting meaningful information. We present HiCognition, a visual exploration and machine-learning tool based on a new genomic region set concept, enabling detection of patterns and associations between 3D chromosome conformation and collections
-
Characterization of large-scale genomic differences in the first complete human genome Genome Biol. (IF 12.3) Pub Date : 2023-07-04 Xiangyu Yang, Xuankai Wang, Yawen Zou, Shilong Zhang, Manying Xia, Lianting Fu, Mitchell R. Vollger, Nae-Chyun Chen, Dylan J. Taylor, William T. Harvey, Glennis A. Logsdon, Dan Meng, Junfeng Shi, Rajiv C. McCoy, Michael C. Schatz, Weidong Li, Evan E. Eichler, Qing Lu, Yafei Mao
The release of the first telomere-to-telomere (T2T) human genome assembly (T2T-CHM13) is a milestone in human genomics. The T2T-CHM13 genome assembly extends our understanding of telomeres, centromeres, segmental duplications, and other complex regions. The current human genome reference (GRCh38) has been widely used in various human genomic studies. However, the large-scale genomic differences between these