-
A realistic benchmark for differential abundance testing and confounder adjustment in human microbiome studies Genome Biol. (IF 10.1) Pub Date : 2024-09-25 Jakob Wirbel, Morgan Essex, Sofia Kirke Forslund, Georg Zeller
In microbiome disease association studies, it is a fundamental task to test which microbes differ in their abundance between groups. Yet, consensus on suitable or optimal statistical methods for differential abundance testing is lacking, and it remains unexplored how these cope with confounding. Previous differential abundance benchmarks relying on simulated datasets did not quantitatively evaluate
-
Recruitment of the m6A/m6Am demethylase FTO to target RNAs by the telomeric zinc finger protein ZBTB48 Genome Biol. (IF 10.1) Pub Date : 2024-09-19 Syed Nabeel-Shah, Shuye Pu, Giovanni L. Burke, Nujhat Ahmed, Ulrich Braunschweig, Shaghayegh Farhangmehr, Hyunmin Lee, Mingkun Wu, Zuyao Ni, Hua Tang, Guoqing Zhong, Edyta Marcon, Zhaolei Zhang, Benjamin J. Blencowe, Jack F. Greenblatt
N6-methyladenosine (m6A), the most abundant internal modification on eukaryotic mRNA, and N6, 2′-O-dimethyladenosine (m6Am), are epitranscriptomic marks that function in multiple aspects of posttranscriptional regulation. Fat mass and obesity-associated protein (FTO) can remove both m6A and m6Am; however, little is known about how FTO achieves its substrate selectivity. Here, we demonstrate that ZBTB48
-
A dynamic regulome of shoot-apical-meristem-related homeobox transcription factors modulates plant architecture in maize Genome Biol. (IF 10.1) Pub Date : 2024-09-19 Zi Luo, Leiming Wu, Xinxin Miao, Shuang Zhang, Ningning Wei, Shiya Zhao, Xiaoyang Shang, Hongyan Hu, Jiquan Xue, Tifu Zhang, Fang Yang, Shutu Xu, Lin Li
The shoot apical meristem (SAM), from which all above-ground tissues of plants are derived, is critical to plant morphology and development. In maize (Zea mays), loss-of-function mutant studies have identified several SAM-related genes, most encoding homeobox transcription factors (TFs), located upstream of hierarchical networks of hundreds of genes. Here, we collect 46 transcriptome and 16 translatome
-
Atlas of telomeric repeat diversity in Arabidopsis thaliana Genome Biol. (IF 10.1) Pub Date : 2024-09-16 Yueqi Tao, Wenfei Xian, Zhigui Bao, Fernando A. Rabanal, Andrea Movilli, Christa Lanz, Gautam Shirsekar, Detlef Weigel
Telomeric repeat arrays at the ends of chromosomes are highly dynamic in composition, but their repetitive nature and technological limitations have made it difficult to assess their true variation in genome diversity surveys. We have comprehensively characterized the sequence variation immediately adjacent to the canonical telomeric repeat arrays at the very ends of chromosomes in 74 genetically diverse
-
Splam: a deep-learning-based splice site predictor that improves spliced alignments Genome Biol. (IF 10.1) Pub Date : 2024-09-16 Kuan-Hao Chao, Alan Mao, Steven L. Salzberg, Mihaela Pertea
The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. We describe Splam, a novel method for predicting splice junctions in DNA using deep residual convolutional neural networks. Unlike previous models, Splam looks at a 400-base-pair window flanking each splice site, reflecting the biological splicing process that relies primarily on signals
-
ESCHR: a hyperparameter-randomized ensemble approach for robust clustering across diverse datasets Genome Biol. (IF 10.1) Pub Date : 2024-09-16 Sarah M. Goggin, Eli R. Zunder
Clustering is widely used for single-cell analysis, but current methods are limited in accuracy, robustness, ease of use, and interpretability. To address these limitations, we developed an ensemble clustering method that outperforms other methods at hard clustering without the need for hyperparameter tuning. It also performs soft clustering to characterize continuum-like regions and quantify clustering
-
Dimension reduction, cell clustering, and cell–cell communication inference for single-cell transcriptomics with DcjComm Genome Biol. (IF 10.1) Pub Date : 2024-09-09 Qian Ding, Wenyi Yang, Guangfu Xue, Hongxin Liu, Yideng Cai, Jinhao Que, Xiyun Jin, Meng Luo, Fenglan Pang, Yuexin Yang, Yi Lin, Yusong Liu, Haoxiu Sun, Renjie Tan, Pingping Wang, Zhaochun Xu, Qinghua Jiang
Advances in single-cell transcriptomics provide an unprecedented opportunity to explore complex biological processes. However, computational methods for analyzing single-cell transcriptomics still have room for improvement especially in dimension reduction, cell clustering, and cell–cell communication inference. Herein, we propose a versatile method, named DcjComm, for comprehensive analysis of single-cell
-
A comprehensive map of the aging blood methylome in humans Genome Biol. (IF 10.1) Pub Date : 2024-09-06 Kirsten Seale, Andrew Teschendorff, Alexander P. Reiner, Sarah Voisin, Nir Eynon
During aging, the human methylome undergoes both differential and variable shifts, accompanied by increased entropy. The distinction between variably methylated positions (VMPs) and differentially methylated positions (DMPs), their contribution to epigenetic age, and the role of cell type heterogeneity remain unclear. We conduct a comprehensive analysis of > 32,000 human blood methylomes from 56 datasets
-
DeepKINET: a deep generative model for estimating single-cell RNA splicing and degradation rates Genome Biol. (IF 10.1) Pub Date : 2024-09-06 Chikara Mizukoshi, Yasuhiro Kojima, Satoshi Nomura, Shuto Hayashi, Ko Abe, Teppei Shimamura
Messenger RNA splicing and degradation are critical for gene expression regulation, the abnormality of which leads to diseases. Previous methods for estimating kinetic rates have limitations, assuming uniform rates across cells. DeepKINET is a deep generative model that estimates splicing and degradation rates at single-cell resolution from scRNA-seq data. DeepKINET outperforms existing methods on
-
Author Correction: A benchmark of computational methods for correcting biases of established and unknown origin in CRISPR-Cas9 screening data Genome Biol. (IF 10.1) Pub Date : 2024-09-04 Alessandro Vinceti, Rafaele M. Iannuzzi, Isabella Boyle, Lucia Trastulla, Catarina D. Campbell, Francisca Vazquez, Joshua M. Dempster, Francesco Iorio
Correction: Genome Biol 25, 192 (2024) https://doi.org/10.1186/s13059-024-03336-1 Following publication of the original article [1], the authors identified an omission in the competing interests section. The omitted text is given in bold below. Competing interests FI receives funding from Open Targets, a public-private initiative involving academia and industry and performs consultancy for the joint
-
Publisher Correction: scParser: sparse representation learning for scalable single-cell RNA sequencing data analysis Genome Biol. (IF 10.1) Pub Date : 2024-09-04 Kai Zhao, Hon-Cheong So, Zhixiang Lin
Publisher Correction: Genome Biol 25, 223 (2024) https://doi.org/10.1186/s13059-024-03345-0 Following publication of the original article [1], the authors identified a typesetting error in Eq. 3, 4 and 10, as well as in Algorithm 1 equation. An erroneous “ll” was typeset at the start of the equations. The incorrect and corrected versions are published in this correction article. Incorrect equation
-
Improved simultaneous mapping of epigenetic features and 3D chromatin structure via ViCAR Genome Biol. (IF 10.1) Pub Date : 2024-09-03 Sean M. Flynn, Somdutta Dhir, Krzysztof Herka, Colm Doyle, Larry Melidis, Angela Simeone, Winnie W. I. Hui, Rafael de Cesaris Araujo Tavares, Stefan Schoenfelder, David Tannahill, Shankar Balasubramanian
Methods to measure chromatin contacts at genomic regions bound by histone modifications or proteins are important tools to investigate chromatin organization. However, such methods do not capture the possible involvement of other epigenomic features such as G-quadruplex DNA secondary structures (G4s). To bridge this gap, we introduce ViCAR (viewpoint HiCAR), for the direct antibody-based capture of
-
RNAseqCovarImpute: a multiple imputation procedure that outperforms complete case and single imputation differential expression analysis Genome Biol. (IF 10.1) Pub Date : 2024-09-03 Brennan H. Baker, Sheela Sathyanarayana, Adam A. Szpiro, James W. MacDonald, Alison G. Paquette
Missing covariate data is a common problem that has not been addressed in observational studies of gene expression. Here, we present a multiple imputation method that accommodates high dimensional gene expression data by incorporating principal component analysis of the transcriptome into the multiple imputation prediction models to avoid bias. Simulation studies using three datasets show that this
-
Enhlink infers distal and context-specific enhancer–promoter linkages Genome Biol. (IF 10.1) Pub Date : 2024-09-02 Olivier B. Poirion, Wulin Zuo, Catrina Spruce, Candice N. Baker, Sandra L. Daigle, Ashley Olson, Daniel A. Skelly, Elissa J. Chesler, Christopher L. Baker, Brian S. White
Enhlink is a computational tool for scATAC-seq data analysis, facilitating precise interrogation of enhancer function at the single-cell level. It employs an ensemble approach incorporating technical and biological covariates to infer condition-specific regulatory DNA linkages. Enhlink can integrate multi-omic data for enhanced specificity, when available. Evaluation with simulated and real data, including
-
Dissecting the genetic basis of UV-B responsive metabolites in rice Genome Biol. (IF 10.1) Pub Date : 2024-08-29 Feng Zhang, Chenkun Yang, Hao Guo, Yufei Li, Shuangqian Shen, Qianqian Zhou, Chun Li, Chao Wang, Ting Zhai, Lianghuan Qu, Cheng Zhang, Xianqing Liu, Jie Luo, Wei Chen, Shouchuang Wang, Jun Yang, Cui Yu, Yanyan Liu
UV-B, an important environmental factor, has been shown to affect the yield and quality of rice (Oryza sativa) worldwide. However, the molecular mechanisms underlying the response to UV-B stress remain elusive in rice. We perform comprehensive metabolic profiling of leaves from 160 diverse rice accessions under UV-B and normal light conditions using a widely targeted metabolomics approach. Our results
-
NERD-seq: a novel approach of Nanopore direct RNA sequencing that expands representation of non-coding RNAs Genome Biol. (IF 10.1) Pub Date : 2024-08-28 Luke Saville, Li Wu, Jemaneh Habtewold, Yubo Cheng, Babita Gollen, Liam Mitchell, Matthew Stuart-Edwards, Travis Haight, Majid Mohajerani, Athanasios Zovoilis
Non-coding RNAs (ncRNAs) are frequently documented RNA modification substrates. Nanopore Technologies enables the direct sequencing of RNAs and the detection of modified nucleobases. Ordinarily, direct RNA sequencing uses polyadenylation selection, studying primarily mRNA gene expression. Here, we present NERD-seq, which enables detection of multiple non-coding RNAs, excluded by the standard approach
-
Gut microbiota contributes to high-altitude hypoxia acclimatization of human populations Genome Biol. (IF 10.1) Pub Date : 2024-08-28 Qian Su, Dao-Hua Zhuang, Yu-Chun Li, Yu Chen, Xia-Yan Wang, Ming-Xia Ge, Ting-Yue Xue, Qi-Yuan Zhang, Xin-Yuan Liu, Fan-Qian Yin, Yi-Ming Han, Zong-Liang Gao, Long Zhao, Yong-Xuan Li, Meng-Jiao Lv, Li-Qin Yang, Tian-Rui Xia, Yong-Jun Luo, Zhigang Zhang, Qing-Peng Kong
The relationship between human gut microbiota and high-altitude hypoxia acclimatization remains highly controversial. This stems primarily from uncertainties regarding both the potential temporal changes in the microbiota under such conditions and the existence of any dominant or core bacteria that may assist in host acclimatization. To address these issues, and to control for variables commonly present
-
Contribution of homoeologous exchange to domestication of polyploid Brassica Genome Biol. (IF 10.1) Pub Date : 2024-08-27 Tianpeng Wang, Aalt D. J. van Dijk, Ranze Zhao, Guusje Bonnema, Xiaowu Wang
Polyploidy is widely recognized as a significant evolutionary force in the plant kingdom, contributing to the diversification of plants. One of the notable features of allopolyploidy is the occurrence of homoeologous exchange (HE) events between the subgenomes, causing changes in genomic composition, gene expression, and phenotypic variations. However, the role of HE in plant adaptation and domestication
-
Seqrutinator: scrutiny of large protein superfamily sequence datasets for the identification and elimination of non-functional homologues Genome Biol. (IF 10.1) Pub Date : 2024-08-26 Agustín Amalfitano, Nicolás Stocchi, Hugo Marcelo Atencio, Fernando Villarreal, Arjen ten Have
Seqrutinator is an objective, flexible pipeline that removes sequences with sequencing and/or gene model errors and sequences from pseudogenes from complex, eukaryotic protein superfamilies. Testing Seqrutinator on major superfamilies BAHD, CYP, and UGT removes only 1.94% of SwissProt entries, 14% of entries from the model plant Arabidopsis thaliana, but 80% of entries from Pinus taeda’s recent complete
-
Real-time identification of epistatic interactions in SARS-CoV-2 from large genome collections Genome Biol. (IF 10.1) Pub Date : 2024-08-22 Gabriel Innocenti, Maureen Obara, Bibiana Costa, Henning Jacobsen, Maeva Katzmarzyk, Luka Cicin-Sain, Ulrich Kalinke, Marco Galardini
The emergence of the SARS-CoV-2 virus has highlighted the importance of genomic epidemiology in understanding the evolution of pathogens and guiding public health interventions. The Omicron variant in particular has underscored the role of epistasis in the evolution of lineages with both higher infectivity and immune escape, and therefore the necessity to update surveillance pipelines to detect them
-
Current limitations in predicting mRNA translation with deep learning models Genome Biol. (IF 10.1) Pub Date : 2024-08-20 Niels Schlusser, Asier González, Muskan Pandey, Mihaela Zavolan
The design of nucleotide sequences with defined properties is a long-standing problem in bioengineering. An important application is protein expression, be it in the context of research or the production of mRNA vaccines. The rate of protein synthesis depends on the 5′ untranslated region (5′UTR) of the mRNAs, and recently, deep learning models were proposed to predict the translation output of mRNAs
-
Melon: metagenomic long-read-based taxonomic identification and quantification using marker genes Genome Biol. (IF 10.1) Pub Date : 2024-08-19 Xi Chen, Xiaole Yin, Xianghui Shi, Weifu Yan, Yu Yang, Lei Liu, Tong Zhang
Long-read sequencing holds great potential for characterizing complex microbial communities, yet taxonomic profiling tools designed specifically for long reads remain lacking. We introduce Melon, a novel marker-based taxonomic profiler that capitalizes on the unique attributes of long reads. Melon employs a two-stage classification scheme to reduce computational time and is equipped with an expect
-
Benchmarking computational methods for single-cell chromatin data analysis Genome Biol. (IF 10.1) Pub Date : 2024-08-16 Siyuan Luo, Pierre-Luc Germain, Mark D. Robinson, Ferdinand von Meyenn
Single-cell chromatin accessibility assays, such as scATAC-seq, are increasingly employed in individual and joint multi-omic profiling of single cells. As the accumulation of scATAC-seq and multi-omics datasets continues, challenges in analyzing such sparse, noisy, and high-dimensional data become pressing. Specifically, one challenge relates to optimizing the processing of chromatin-level measurements
-
StaVia: spatially and temporally aware cartography with higher-order random walks for cell atlases Genome Biol. (IF 10.1) Pub Date : 2024-08-16 Shobana V. Stassen, Minato Kobashi, Edmund Y. Lam, Yuanhua Huang, Joshua W. K. Ho, Kevin K. Tsia
Single-cell atlases pose daunting computational challenges pertaining to the integration of spatial and temporal information and the visualization of trajectories across large atlases. We introduce StaVia, a computational framework that synergizes multi-faceted single-cell data with higher-order random walks that leverage the memory of cells’ past states, fused with a cartographic Atlas View that offers
-
scParser: sparse representation learning for scalable single-cell RNA sequencing data analysis Genome Biol. (IF 10.1) Pub Date : 2024-08-16 Kai Zhao, Hon-Cheong So, Zhixiang Lin
The rapid rise in the availability and scale of scRNA-seq data needs scalable methods for integrative analysis. Though many methods for data integration have been developed, few focus on understanding the heterogeneous effects of biological conditions across different cell populations in integrative analysis. Our proposed scalable approach, scParser, models the heterogeneous effects from biological
-
Overlooked poor-quality patient samples in sequencing data impair reproducibility of published clinically relevant datasets Genome Biol. (IF 10.1) Pub Date : 2024-08-16 Maximilian Sprang, Jannik Möllmann, Miguel A. Andrade-Navarro, Jean-Fred Fontaine
Reproducibility is a major concern in biomedical studies, and existing publication guidelines do not solve the problem. Batch effects and quality imbalances between groups of biological samples are major factors hampering reproducibility. Yet, the latter is rarely considered in the scientific literature. Our analysis uses 40 clinically relevant RNA-seq datasets to quantify the impact of quality imbalance
-
Comprehensive network modeling approaches unravel dynamic enhancer-promoter interactions across neural differentiation Genome Biol. (IF 10.1) Pub Date : 2024-08-14 William DeGroat, Fumitaka Inoue, Tal Ashuach, Nir Yosef, Nadav Ahituv, Anat Kreimer
Increasing evidence suggests that a substantial proportion of disease-associated mutations occur in enhancers, regions of non-coding DNA essential to gene regulation. Understanding the structures and mechanisms of the regulatory programs this variation affects can shed light on the apparatuses of human diseases. We collect epigenetic and gene expression datasets from seven early time points during
-
Associating transcription factors to single-cell trajectories with DREAMIT Genome Biol. (IF 10.1) Pub Date : 2024-08-14 Nathan D. Maulding, Lucas Seninge, Joshua M. Stuart
Inferring gene regulatory networks from single-cell RNA-sequencing trajectories has been an active area of research, yet methods are still needed to identify regulators governing cell transitions. We developed DREAMIT (Dynamic Regulation of Expression Across Modules in Inferred Trajectories) to annotate transcription-factor activity along single-cell trajectory branches, using ensembles of relations
-
The GC-content at the 5′ ends of human protein-coding genes is undergoing mutational decay Genome Biol. (IF 10.1) Pub Date : 2024-08-13 Yi Qiu, Yoon Mo Kang, Christopher Korfmann, Fanny Pouyet, Andrew Eckford, Alexander F. Palazzo
In vertebrates, most protein-coding genes have a peak of GC-content near their 5′ transcriptional start site (TSS). This feature promotes both the efficient nuclear export and translation of mRNAs. Despite the importance of GC-content for RNA metabolism, its general features, origin, and maintenance remain mysterious. We investigate the evolutionary forces shaping GC-content at the transcriptional
-
SynGAP: a synteny-based toolkit for gene structure annotation polishing Genome Biol. (IF 10.1) Pub Date : 2024-08-13 Fengqi Wu, Yingxiao Mai, Chengjie Chen, Rui Xia
Genome sequencing has become a routine task for biologists, but the challenge of gene structure annotation persists, impeding accurate genomic and genetic research. Here, we present a bioinformatics toolkit, SynGAP (Synteny-based Gene structure Annotation Polisher), which uses gene synteny information to accomplish precise and automated polishing of gene structure annotation of genomes. SynGAP offers
-
Prevalence of and gene regulatory constraints on transcriptional adaptation in single cells Genome Biol. (IF 10.1) Pub Date : 2024-08-12 Ian A. Mellis, Madeline E. Melzer, Nicholas Bodkin, Yogesh Goyal
Cells and tissues have a remarkable ability to adapt to genetic perturbations via a variety of molecular mechanisms. Nonsense-induced transcriptional compensation, a form of transcriptional adaptation, has recently emerged as one such mechanism, in which nonsense mutations in a gene trigger upregulation of related genes, possibly conferring robustness at cellular and organismal levels. However, beyond
-
READv2: advanced and user-friendly detection of biological relatedness in archaeogenomics Genome Biol. (IF 10.1) Pub Date : 2024-08-12 Erkin Alaçamlı, Thijessen Naidoo, Merve N. Güler, Ekin Sağlıcan, Şevval Aktürk, Igor Mapelli, Kıvılcım Başak Vural, Mehmet Somel, Helena Malmström, Torsten Günther
The advent of genome-wide ancient DNA analysis has revolutionized our understanding of prehistoric societies. However, studying biological relatedness in these groups requires tailored approaches due to the challenges of analyzing ancient DNA. READv2, an optimized Python3 implementation of the most widely used tool for this purpose, addresses these challenges while surpassing its predecessor in speed
-
Creating large-scale genetic diversity in Arabidopsis via base editing-mediated deep artificial evolution Genome Biol. (IF 10.1) Pub Date : 2024-08-09 Xiang Wang, Wenbo Pan, Chao Sun, Hong Yang, Zhentao Cheng, Fei Yan, Guojing Ma, Yun Shang, Rui Zhang, Caixia Gao, Lijing Liu, Huawei Zhang
Base editing is a powerful tool for artificial evolution to create allelic diversity and improve agronomic traits. However, the great evolutionary potential for every sgRNA target has been overlooked. And there is currently no high-throughput method for generating and characterizing as many changes in a single target as possible based on large mutant pools to permit rapid gene directed evolution in
-
Inferring clonal somatic mutations directed by X chromosome inactivation status in single cells Genome Biol. (IF 10.1) Pub Date : 2024-08-09 Ilke Demirci, Anton J. M. Larsson, Xinsong Chen, Johan Hartman, Rickard Sandberg, Jonas Frisén
Analysis of clonal dynamics in human tissues is enabled by somatic genetic variation. Here, we show that analysis of mitochondrial mutations in single cells is dramatically improved in females when using X chromosome inactivation to select informative clonal mutations. Applying this strategy to human peripheral mononuclear blood cells reveals clonal structures within T cells that otherwise are blurred
-
Genomic reproducibility in the bioinformatics era Genome Biol. (IF 10.1) Pub Date : 2024-08-09 Pelin Icer Baykal, Paweł Piotr Łabaj, Florian Markowetz, Lynn M. Schriml, Daniel J. Stekhoven, Serghei Mangul, Niko Beerenwinkel
In biomedical research, validating a scientific discovery hinges on the reproducibility of its experimental results. However, in genomics, the definition and implementation of reproducibility remain imprecise. We argue that genomic reproducibility, defined as the ability of bioinformatics tools to maintain consistent results across technical replicates, is essential for advancing scientific knowledge
-
Benchmarking clustering, alignment, and integration methods for spatial transcriptomics Genome Biol. (IF 10.1) Pub Date : 2024-08-09 Yunfei Hu, Manfei Xie, Yikang Li, Mingxing Rao, Wenjun Shen, Can Luo, Haoran Qin, Jihoon Baek, Xin Maizie Zhou
Spatial transcriptomics (ST) is advancing our understanding of complex tissues and organisms. However, building a robust clustering algorithm to define spatially coherent regions in a single tissue slice and aligning or integrating multiple tissue slices originating from diverse sources for essential downstream analyses remains challenging. Numerous clustering, alignment, and integration methods have
-
Transcriptional and epigenetic characterization of a new in vitro platform to model the formation of human pharyngeal endoderm Genome Biol. (IF 10.1) Pub Date : 2024-08-08 Andrea Cipriano, Alessio Colantoni, Alessandro Calicchio, Jonathan Fiorentino, Danielle Gomes, Mahdi Moqri, Alexander Parker, Sajede Rasouli, Matthew Caldwell, Francesca Briganti, Maria Grazia Roncarolo, Antonio Baldini, Katja G. Weinacht, Gian Gaetano Tartaglia, Vittorio Sebastiano
The Pharyngeal Endoderm (PE) is an extremely relevant developmental tissue, serving as the progenitor for the esophagus, parathyroids, thyroids, lungs, and thymus. While several studies have highlighted the importance of PE cells, a detailed transcriptional and epigenetic characterization of this important developmental stage is still missing, especially in humans, due to technical and ethical constraints
-
Microsatellite instability at U2AF-binding polypyrimidic tract sites perturbs alternative splicing during colorectal cancer initiation Genome Biol. (IF 10.1) Pub Date : 2024-08-06 Vincent Jonchère, Hugo Montémont, Enora Le Scanf, Aurélie Siret, Quentin Letourneur, Emmanuel Tubacher, Christophe Battail, Assane Fall, Karim Labreche, Victor Renault, Toky Ratovomanana, Olivier Buhard, Ariane Jolly, Philippe Le Rouzic, Cody Feys, Emmanuelle Despras, Habib Zouali, Rémy Nicolle, Pascale Cervera, Magali Svrcek, Pierre Bourgoin, Hélène Blanché, Anne Boland, Jérémie Lefèvre, Yann Parc
Microsatellite instability (MSI) due to mismatch repair deficiency (dMMR) is common in colorectal cancer (CRC). These cancers are associated with somatic coding events, but the noncoding pathophysiological impact of this genomic instability is yet poorly understood. Here, we perform an analysis of coding and noncoding MSI events at the different steps of colorectal tumorigenesis using whole exome sequencing
-
Efficient inference of large prokaryotic pangenomes with PanTA Genome Biol. (IF 10.1) Pub Date : 2024-08-06 Duc Quang Le, Tien Anh Nguyen, Son Hoang Nguyen, Tam Thi Nguyen, Canh Hao Nguyen, Huong Thanh Phung, Tho Huu Ho, Nam S. Vo, Trang Nguyen, Hoang Anh Nguyen, Minh Duc Cao
Pangenome inference is an indispensable step in bacterial genomics, yet its scalability poses a challenge due to the rapid growth of genomic collections. This paper presents PanTA, a software package designed for constructing pangenomes of large bacterial datasets, showing unprecedented efficiency levels multiple times higher than existing tools. PanTA introduces a novel mechanism to construct the
-
DNA-binding factor footprints and enhancer RNAs identify functional non-coding genetic variants Genome Biol. (IF 10.1) Pub Date : 2024-08-06 Simon C. Biddie, Giovanna Weykopf, Elizabeth F. Hird, Elias T. Friman, Wendy A. Bickmore
Genome-wide association studies (GWAS) have revealed a multitude of candidate genetic variants affecting the risk of developing complex traits and diseases. However, the highlighted regions are typically in the non-coding genome, and uncovering the functional causative single nucleotide variants (SNVs) is challenging. Prioritization of variants is commonly based on genomic annotation with markers of
-
scPriorGraph: constructing biosemantic cell–cell graphs with prior gene set selection for cell type identification from scRNA-seq data Genome Biol. (IF 10.1) Pub Date : 2024-08-05 Xiyue Cao, Yu-An Huang, Zhu-Hong You, Xuequn Shang, Lun Hu, Peng-Wei Hu, Zhi-An Huang
Cell type identification is an indispensable analytical step in single-cell data analyses. To address the high noise stemming from gene expression data, existing computational methods often overlook the biologically meaningful relationships between genes, opting to reduce all genes to a unified data space. We assume that such relationships can aid in characterizing cell type features and improving
-
STdGCN: spatial transcriptomic cell-type deconvolution using graph convolutional networks Genome Biol. (IF 10.1) Pub Date : 2024-08-05 Yawei Li, Yuan Luo
Spatially resolved transcriptomics integrates high-throughput transcriptome measurements with preserved spatial cellular organization information. However, many technologies cannot reach single-cell resolution. We present STdGCN, a graph model leveraging single-cell RNA sequencing (scRNA-seq) as reference for cell-type deconvolution in spatial transcriptomic (ST) data. STdGCN incorporates expression
-
MAMS: matrix and analysis metadata standards to facilitate harmonization and reproducibility of single-cell data Genome Biol. (IF 10.1) Pub Date : 2024-08-01 Irzam Sarfraz, Yichen Wang, Amulya Shastry, Wei Kheng Teh, Artem Sokolov, Brian R. Herb, Heather H. Creasy, Isaac Virshup, Ruben Dries, Kylee Degatano, Anup Mahurkar, Daniel J. Schnell, Pedro Madrigal, Jason Hilton, Nils Gehlenborg, Timothy Tickle, Joshua D. Campbell
Many datasets are being produced by consortia that seek to characterize healthy and disease tissues at single-cell resolution. While biospecimen and experimental information is often captured, detailed metadata standards related to data matrices and analysis workflows are currently lacking. To address this, we develop the matrix and analysis metadata standards (MAMS) to serve as a resource for data
-
Annelid methylomes reveal ancestral developmental and aging-associated epigenetic erosion across Bilateria Genome Biol. (IF 10.1) Pub Date : 2024-08-01 Kero Guynes, Luke A. Sarre, Allan M. Carrillo-Baltodano, Billie E. Davies, Lan Xu, Yan Liang, Francisco M. Martín-Zamora, Paul J. Hurd, Alex de Mendoza, José M. Martín-Durán
DNA methylation in the form of 5-methylcytosine (5mC) is the most abundant base modification in animals. However, 5mC levels vary widely across taxa. While vertebrate genomes are hypermethylated, in most invertebrates, 5mC concentrates on constantly and highly transcribed genes (gene body methylation; GbM) and, in some species, on transposable elements (TEs), a pattern known as “mosaic”. Yet, the role
-
aKNNO: single-cell and spatial transcriptomics clustering with an optimized adaptive k-nearest neighbor graph Genome Biol. (IF 10.1) Pub Date : 2024-08-01 Jia Li, Yu Shyr, Qi Liu
Typical clustering methods for single-cell and spatial transcriptomics struggle to identify rare cell types, while approaches tailored to detect rare cell types gain this ability at the cost of poorer performance for grouping abundant ones. Here, we develop aKNNO to simultaneously identify abundant and rare cell types based on an adaptive k-nearest neighbor graph with optimization. Benchmarking on
-
Current genomic deep learning models display decreased performance in cell type-specific accessible regions Genome Biol. (IF 10.1) Pub Date : 2024-08-01 Pooja Kathail, Richard W. Shuai, Ryan Chung, Chun Jimmie Ye, Gabriel B. Loeb, Nilah M. Ioannidis
A number of deep learning models have been developed to predict epigenetic features such as chromatin accessibility from DNA sequence. Model evaluations commonly report performance genome-wide; however, cis regulatory elements (CREs), which play critical roles in gene regulation, make up only a small fraction of the genome. Furthermore, cell type-specific CREs contain a large proportion of complex
-
Modelling the demographic history of human North African genomes points to a recent soft split divergence between populations Genome Biol. (IF 10.1) Pub Date : 2024-07-30 Jose M. Serradell, Jose M. Lorenzo-Salazar, Carlos Flores, Oscar Lao, David Comas
North African human populations present a complex demographic scenario due to the presence of an autochthonous genetic component and population substructure, plus extensive gene flow from the Middle East, Europe, and sub-Saharan Africa. We conducted a comprehensive analysis of 364 genomes to construct detailed demographic models for the North African region, encompassing its two primary ethnic groups
-
Epigenomic identification of vernalization cis-regulatory elements in winter wheat Genome Biol. (IF 10.1) Pub Date : 2024-07-30 Yanhong Liu, Pan Liu, Lifeng Gao, Yushan Li, Xueni Ren, Jizeng Jia, Lei Wang, Xu Zheng, Yiping Tong, Hongcui Pei, Zefu Lu
Winter wheat undergoes vernalization, a process activated by prolonged exposure to low temperatures. During this phase, flowering signals are generated and transported to the apical meristems, stimulating the transition to the inflorescence meristem while inhibiting tiller bud elongation. Although some vernalization genes have been identified, the key cis-regulatory elements and precise mechanisms
-
Phospholipase-mediated phosphate recycling during plant leaf senescence Genome Biol. (IF 10.1) Pub Date : 2024-07-29 Bao Yang, Zengdong Tan, Jiayu Yan, Ke Zhang, Zhewen Ouyang, Ruyi Fan, Yefei Lu, Yuting Zhang, Xuan Yao, Hu Zhao, Xuemin Wang, Shaoping Lu, Liang Guo
Phosphorus is a macronutrient necessary for plant growth and development and its availability and efficient use affect crop yields. Leaves are the largest tissue that uses phosphorus in plants, and membrane phospholipids are the main source of cellular phosphorus usage. Here we identify a key process for plant cellular phosphorus recycling mediated by membrane phospholipid hydrolysis during leaf senescence
-
scCross: a deep generative model for unifying single-cell multi-omics with seamless integration, cross-modal generation, and in silico exploration Genome Biol. (IF 10.1) Pub Date : 2024-07-29 Xiuhui Yang, Koren K. Mann, Hao Wu, Jun Ding
Single-cell multi-omics data reveal complex cellular states, providing significant insights into cellular dynamics and disease. Yet, integration of multi-omics data presents challenges. Some modalities have not reached the robustness or clarity of established transcriptomics. Coupled with data scarcity for less established modalities and integration intricacies, these challenges limit our ability to
-
Mining alternative splicing patterns in scRNA-seq data using scASfind Genome Biol. (IF 10.1) Pub Date : 2024-07-29 Yuyao Song, Guillermo Parada, Jimmy Tsz Hang Lee, Martin Hemberg
Single-cell RNA-seq (scRNA-seq) is widely used for transcriptome profiling, but most analyses focus on gene-level events, with less attention devoted to alternative splicing. Here, we present scASfind, a novel computational method to allow for quantitative analysis of cell type-specific splicing events using full-length scRNA-seq data. ScASfind utilizes an efficient data structure to store the percent
-
Author Correction: DNA methylation remodeling and the functional implication during male gametogenesis in rice Genome Biol. (IF 10.1) Pub Date : 2024-07-27 Xue Li, Bo Zhu, Yue Lu, Feng Zhao, Qian Liu, Jiahao Wang, Miaomiao Ye, Siyuan Chen, Junwei Nie, Lizhong Xiong, Yu Zhao, Changyin Wu, Dao-Xiu Zhou
Author Correction: Genome Biol 25, 84 (2024) https://doi.org/10.1186/s13059-024-03222-w Following publication of the original article [1], the authors identified an error in Fig. 2. In Fig. 2B, a wild type pollen picture was wrongly used to represent cmt3b pollens that in fact are of wild type phenotype. The incorrect and correct versions of Fig. 2 are published in this correction article and the original article
-
SonicParanoid2: fast, accurate, and comprehensive orthology inference with machine learning and language models Genome Biol. (IF 10.1) Pub Date : 2024-07-25 Salvatore Cosentino, Sira Sriswasdi, Wataru Iwasaki
Accurate inference of orthologous genes constitutes a prerequisite for comparative and evolutionary genomics. SonicParanoid is one of the fastest tools for orthology inference; however, its scalability and accuracy have been hampered by time-consuming all-versus-all alignments and the existence of proteins with complex domain architectures. Here, we present a substantial update of SonicParanoid, where
-
The vast majority of somatic mutations in plants are layer-specific Genome Biol. (IF 10.1) Pub Date : 2024-07-24 Manish Goel, José A. Campoy, Kristin Krause, Lisa C. Baus, Anshupa Sahu, Hequan Sun, Birgit Walkemeier, Magdalena Marek, Randy Beaudry, David Ruiz, Bruno Huettel, Korbinian Schneeberger
Plant meristems are structured organs consisting of distinct layers of stem cells, which differentiate into new plant tissue. Mutations in meristematic layers can propagate into large sectors of the plant. However, the characteristics of meristematic mutations remain unclear, limiting our understanding of the genetic basis of somaclonal phenotypic variation. Here, we analyse the frequency and distribution
-
N6-methyladenosine writer METTL16-mediated alternative splicing and translation control are essential for murine spermatogenesis Genome Biol. (IF 10.1) Pub Date : 2024-07-19 Qian Ma, Yiqian Gui, Xixiang Ma, Bingqian Zhang, Wenjing Xiong, Shiyu Yang, Congcong Cao, Shaomei Mo, Ge Shu, Jing Ye, Kuan Liu, Xiaoli Wang, Yaoting Gui, Fengli Wang, Shuiqiao Yuan
The mitosis-to-meiosis switch during spermatogenesis requires dynamic changes in gene expression. However, the regulation of meiotic transcriptional and post-transcriptional machinery during this transition remains elusive. We report that methyltransferase-like protein 16 (METTL16), an N6-methyladenosine (m6A) writer, is required for mitosis-to-meiosis transition during spermatogenesis. Germline conditional
-
A benchmark of computational methods for correcting biases of established and unknown origin in CRISPR-Cas9 screening data Genome Biol. (IF 10.1) Pub Date : 2024-07-19 Alessandro Vinceti, Raffaele M. Iannuzzi, Isabella Boyle, Lucia Trastulla, Catarina D. Campbell, Francisca Vazquez, Joshua M. Dempster, Francesco Iorio
CRISPR-Cas9 dropout screens are formidable tools for investigating biology with unprecedented precision and scale. However, biases in data lead to potential confounding effects on interpretation and compromise overall quality. The activity of Cas9 is influenced by structural features of the target site, including copy number amplifications (CN bias). More worryingly, proximal targeted loci tend to
-
Single-cell decoding of drug induced transcriptomic reprogramming in triple negative breast cancers Genome Biol. (IF 10.1) Pub Date : 2024-07-18 Farhia Kabeer, Hoa Tran, Mirela Andronescu, Gurdeep Singh, Hakwoo Lee, Sohrab Salehi, Beixi Wang, Justina Biele, Jazmine Brimhall, David Gee, Viviana Cerda, Ciara O’Flanagan, Teresa Algara, Takako Kono, Sean Beatty, Elena Zaikova, Daniel Lai, Eric Lee, Richard Moore, Andrew J. Mungall, Marc J. Williams, Andrew Roth, Kieran R. Campbell, Sohrab P. Shah, Samuel Aparicio
The encoding of cell intrinsic drug resistance states in breast cancer reflects the contributions of genomic and non-genomic variations and requires accurate estimation of clonal fitness from co-measurement of transcriptomic and genomic data. Somatic copy number (CN) variation is the dominant mutational mechanism leading to transcriptional variation and notably contributes to platinum chemotherapy
-
Non-coding variants impact cis-regulatory coordination in a cell type-specific manner Genome Biol. (IF 10.1) Pub Date : 2024-07-18 Olga Pushkarev, Guido van Mierlo, Judith Franziska Kribelbauer, Wouter Saelens, Vincent Gardeux, Bart Deplancke
Interactions among cis-regulatory elements (CREs) play a crucial role in gene regulation. Various approaches have been developed to map these interactions genome-wide, including those relying on interindividual epigenomic variation to identify groups of covariable regulatory elements, referred to as chromatin modules (CMs). While CM mapping allows investigation of the relationship between chromatin modularity
-
Leveraging neighborhood representations of single-cell data to achieve sensitive DE testing with miloDE Genome Biol. (IF 10.1) Pub Date : 2024-07-18 Alsu Missarova, Emma Dann, Leah Rosen, Rahul Satija, John Marioni
Single-cell RNA-sequencing enables testing for differential expression (DE) between conditions at a cell type level. While powerful, one of the limitations of such approaches is that the sensitivity of DE testing is dictated by the sensitivity of clustering, which is often suboptimal. To overcome this, we present miloDE—a cluster-free framework for DE testing (available as an open-source R package)
-
Comprehensive and deep evaluation of structural variation detection pipelines with third-generation sequencing data Genome Biol. (IF 10.1) Pub Date : 2024-07-15 Zhi Liu, Zhi Xie, Miaoxin Li
Structural variation (SV) detection methods using third-generation sequencing data are widely employed, yet accurately detecting SVs remains challenging. Different methods often yield inconsistent results for certain SV types, complicating tool selection and revealing biases in detection. This study comprehensively evaluates 53 SV detection pipelines using simulated and real data from PacBio (CLR: