-
Long-read genome sequencing and variant reanalysis increase diagnostic yield in neurodevelopmental disorders Genome Res. (IF 6.2) Pub Date : 2024-09-19 Susan M Hiatt, James MJ Lawlor, Lori H Handley, Donald R Latner, Zachary T Bonnstetter, Candice R Finnila, Michelle L Thompson, Lori Beth Boston, Melissa Williams, Ivan Rodriguez-Nunez, Jerry Jenkins, Whitley V Kelley, E Martina Bebin, Michael A Lopez, Anna CE Hurst, Bruce R Korf, Jeremy Schmutz, Jane Grimwood, Gregory M. Cooper
Variant detection from long-read genome sequencing (lrGS) has proven to be more accurate and comprehensive than variant detection from short-read genome sequencing (srGS). However, the rate at which lrGS can increase molecular diagnostic yield for rare disease is not yet precisely characterized. We performed lrGS using Pacific Biosciences HiFi technology on 96 short-read-negative probands with rare
-
Chromosome-level subgenome-aware de novo assembly of Saccharomyces bayanus provides insight into genome divergence after hybridization Genome Res. (IF 6.2) Pub Date : 2024-09-17 Cory Gardner, Junhao Chen, Christina Hadfield, Zhaolian Lu, David Debruin, Yu Zhan, Maureen Donlin, Tae-Hyuk Ahn, Zhenguo Lin
Interspecies hybridization is prevalent in various eukaryotic lineages and plays important roles in phenotypic diversification, adaptation, and speciation. To better understand the changes that occurred in the different subgenomes of a hybrid species and how they facilitate adaptation, we completed chromosome-level de novo assemblies of all chromosomes for a recently formed hybrid yeast, Saccharomyces
-
Full-length RNA transcript sequencing traces brain isoform diversity in house mouse natural populations Genome Res. (IF 6.2) Pub Date : 2024-09-17 Wenyu Zhang, Anja Guenther, Yuanxiao Gao, Kristian Ullrich, Bruno Huettel, Aftab Ahmad, Lei Duan, Kaizong Wei, Diethard Tautz
The ability to generate multiple RNA transcript isoforms from the same gene is a general phenomenon in eukaryotes. However, the complexity and diversity of alternative isoforms in natural populations remain largely unexplored. Using a newly developed full-length transcripts enrichment protocol with 5' CAP selection, we sequenced full-length RNA transcripts of 48 individuals from outbred populations
-
Long-read RNA sequencing of archival tissues reveals novel genes and transcripts associated with clear cell renal cell carcinoma recurrence and immune evasion Genome Res. (IF 6.2) Pub Date : 2024-09-16 Joshua Lee, Elizabeth A Snell, Joanne Brown, Charlotte Elizabeth Booth, Rosamonde E Banks, Daniel J Turner, Naveen Vasudev, Dimitris Lagos
The use of long-read direct RNA sequencing (DRS) and PCR cDNA sequencing (PCS) in clinical oncology remains limited, with no direct comparison between the two methods. We used DRS and PCS to study clear cell renal cell carcinoma (ccRCC), focussing on new transcript and gene discovery. Twelve primary ccRCC archival tumors, six from patients who went on to relapse, were analysed. Results were validated
-
Measuring X inactivation skew for X-linked diseases with adaptive nanopore sequencing Genome Res. (IF 6.2) Pub Date : 2024-09-16 Sena A Gocuk, James Lancaster, Shian Su, Jasleen K Jolly, Thomas L Edwards, Doron G Hickey, Matthew E Ritchie, Marnie E Blewitt, Lauren N Ayton, Quentin Gouil
X-linked genetic disorders typically affect females less severely than males due to the presence of a second X Chromosome not carrying the deleterious variant. However, the phenotypic expression in females is highly variable, which may be explained by an allelic skew in X-Chromosome inactivation. Accurate measurement of X inactivation skew is crucial to understand and predict disease phenotype in carrier
-
Enhanced detection of RNA modifications and read mapping with high-accuracy nanopore RNA basecalling models Genome Res. (IF 6.2) Pub Date : 2024-09-13 Gregor Diensthuber, Leszek P Pryszcz, Laia Llovera, Morghan C Lucas, Anna Delgado-Tejedor, Sonia Cruciani, Jean-Yves Roignant, Oguzhan Begik, Eva Maria Novoa
In recent years, nanopore direct RNA sequencing (DRS) became a valuable tool for studying the epitranscriptome, due to its ability to detect multiple modifications within the same full-length native RNA molecules. While RNA modifications can be identified in the form of systematic basecalling 'errors' in DRS datasets, N6-methyladenosine (m6A) modifications produce relatively low 'errors' compared to
-
Long-read transcriptome sequencing of CLL and MDS patients uncovers molecular effects of SF3B1 mutations Genome Res. (IF 6.2) Pub Date : 2024-09-13 Alicja Pacholewska, Matthias Lienhard, Mirko Brueggemann, Heike Haenel, Lorina Bilalli, Anja Koenigs, Felix Hess, Kerstin Becker, Karl Koehrer, Jesko Fabian Kaiser, Holger Gohlke, Norbert Gattermann, Michael Hallek, Carmen Diana Herling, Julian Koenig, Christina Grimm, Ralf Herwig, Kathi Zarnack, Michal R. Schweiger
Mutations in splicing factor 3B subunit 1 (SF3B1) frequently occur in patients with chronic lymphocytic leukemia (CLL) and myelodysplastic syndromes (MDS). These mutations have different effects on the disease prognosis with beneficial effect in MDS and worse prognosis in CLL patients. A full-length transcriptome approach can expand our knowledge on SF3B1 mutation effects on RNA splicing and its contribution
-
Long-read DNA and cDNA sequencing identify cancer-predisposing deep intronic variation in tumor-suppressor genes Genome Res. (IF 6.2) Pub Date : 2024-09-13 Suleyman Gulsuner, Amal AbuRayyan, Jessica B. Mandell, Ming K. Lee, Greta V. Bernier, Barbara M. Norquist, Sarah B. Pierce, Mary-Claire King, Tom Walsh
The vast majority of deeply intronic genomic variants are benign, but some extremely rare or private deep intronic variants lead to exonification of intronic sequence with abnormal transcriptional consequences. Damaging variants of this class are likely underreported as causes of disease for several reasons: Most clinical DNA and RNA testing does not include full intronic sequences; many of these variants
-
Evidence for compensatory evolution within pleiotropic regulatory elements Genome Res. (IF 6.2) Pub Date : 2024-09-10 Zane Kliesmete, Peter Orchard, Victor Yan Kin Lee, Johanna Geuder, Simon M. Krauß, Mari Ohnuki, Jessica Jocher, Beate Vieth, Wolfgang Enard, Ines Hellmann
Pleiotropy, measured as expression breadth across tissues, is one of the best predictors for protein sequence and expression conservation. In this study, we investigated its effect on the evolution of cis-regulatory elements (CREs). To this end, we carefully reanalyzed the Epigenomics Roadmap data for nine fetal tissues, assigning a measure of pleiotropic degree to nearly half a million CREs. To assess
-
Protein domain embeddings for fast and accurate similarity search Genome Res. (IF 6.2) Pub Date : 2024-09-05 Benjamin Giovanni Iovino, Haixu Tang, Yuzhen Ye
Recently developed protein language models have enabled a variety of applications with the protein contextual embeddings they produce. Per-protein representations (each protein is represented as a vector of fixed dimension) can be derived via averaging the embeddings of individual residues, or applying matrix transformation techniques such as the discrete cosine transformation to matrices of residue
-
A Bayesian framework for inferring dynamic intercellular interactions from time-series single-cell data Genome Res. (IF 6.2) Pub Date : 2024-09-05 Cameron Y Park, Shouvik Mani, Nicolas Beltran-Velez, Katie Maurer, Teddy Huang, Shuqiang Li, Satyen Gohil, Kenneth J Livak, David A Knowles, Catherine J Wu, Elham Azizi
Characterizing cell-cell communication and tracking its variability over time are crucial for understanding the coordination of biological processes mediating normal development, disease progression, and responses to perturbations such as therapies. Existing tools fail to capture time-dependent intercellular interactions, and primarily rely on existing databases compiled from limited contexts. We introduce
-
Privacy-preserving biological age prediction over federated human methylation data using fully homomorphic encryption Genome Res. (IF 6.2) Pub Date : 2024-09-05 Meir Goldenberg, Loay Mualem, Amit Shahar, Sagi Snir, Adi Akavia
DNA methylation data plays a crucial role in estimating chronological age in mammals, offering real-time insights into an individual’s aging process. The Epigenetic Pacemaker (EPM) model allows inference of the biological age as deviations from the population trend. Given the sensitivity of this data, it is essential to safeguard both inputs and outputs of the EPM model. In a recent study, a privacy-preserving
-
Matrix sketching framework for linear mixed models in association studies Genome Res. (IF 6.2) Pub Date : 2024-09-04 Myson C Burch, Aritra Bose, Gregory Dexter, Laxmi Parida, Petros Drineas
Linear mixed models (LMMs) have been widely used in genome-wide association studies (GWAS) to control for population stratification and cryptic relatedness. However, estimating LMM parameters is computationally expensive, necessitating large-scale matrix operations to build the genetic relatedness matrix (GRM). Over the past 25 years, Randomized Linear Algebra has provided alternative approaches to
-
Spatial Cellular Networks from omics data with SpaCeNet Genome Res. (IF 6.2) Pub Date : 2024-09-04 Stefan Schrod, Niklas Lück, Robert Lohmayer, Stefan Solbrig, Dennis Völkl, Tina Wipfler, Katherine H. Shutta, Marouen Ben Guebila, Andreas Schäfer, Tim Beißbarth, Helena U. Zacharias, Peter Oefner, John Quackenbush, Michael Altenbuchinger
Advances in omics technologies have allowed spatially resolved molecular profiling of single cells, providing a window not only into the diversity and distribution of cell types within a tissue but also into the effects of interactions between cells in shaping the transcriptional landscape. Cells send chemical and mechanical signals which are received by other cells, where they can subsequently initiate
-
A best-match approach for gene set analysis in embedding spaces Genome Res. (IF 6.2) Pub Date : 2024-09-04 Lechuan Li, Ruth Dannenfelser, Charlie Cruz, Vicky Yao
Embedding methods have emerged as a valuable class of approaches for distilling essential information from complex high-dimensional data into more accessible lower-dimensional spaces. Applications of embedding methods to biological data have demonstrated that gene embeddings can effectively capture physical, structural, and functional relationships between genes. However, this utility has been primarily
-
A scalable adaptive quadratic kernel method for interpretable epistasis analysis in complex traits Genome Res. (IF 6.2) Pub Date : 2024-08-29 Boyang Fu, Prateek Anand, Aakarsh Anand, Joel Mefford, Sriram Sankararaman
Our knowledge of the contribution of genetic interactions (epistasis) to variation in human complex traits remains limited, partly due to the lack of efficient, powerful, and interpretable algorithms to detect interactions. Recently proposed approaches for set-based association tests show promise in improving power to detect epistasis by examining the aggregated effects of multiple variants. Nevertheless
-
Memory-bound k-mer selection for large and evolutionary diverse reference libraries Genome Res. (IF 6.2) Pub Date : 2024-08-29 Ali Osman Berk Sapci, Siavash Mirarab
Using k-mers to find sequence matches is increasingly used in many bioinformatic applications, including metagenomic sequence classification. The accuracy of these downstream applications relies on the density of the reference databases, which are rapidly growing. While the increased density provides hope for dramatic improvements in accuracy, scalability is a concern. The k-mers are kept in the memory
-
Visualization and analysis of medically relevant tandem repeats in nanopore sequencing of control cohorts with pathSTR Genome Res. (IF 6.2) Pub Date : 2024-08-15 Wouter De Coster, Ida Hoijer, Inge Bruggeman, Svenn D'Hert, Malin Melin, Adam Ameur, Rosa Rademakers
The lack of population-scale databases hampers research and diagnostics for medically relevant tandem repeats and repeat expansions. We attempt to fill this gap using our pathSTR web tool, which leverages long-read sequencing of large cohorts to determine repeat length and sequence composition in a healthy population. The current version includes 1040 individuals of the 1000 Genomes Project cohort
-
Bayesian inference of sample-specific coexpression networks Genome Res. (IF 6.2) Pub Date : 2024-08-12 Enakshi Saha, Viola Fanfani, Panagiotis Mandros, Marouen Ben Guebila, Jonas Fischer, Katherine H Shutta, Dawn L DeMeo, Camila M Lopes Ramos, John Quackenbush
Gene regulatory networks (GRNs) are effective tools for inferring complex interactions between molecules that regulate biological processes and hence can provide insights into drivers of biological systems. Inferring coexpression networks is a critical element of GRN inference, as the correlation between expression patterns may indicate that genes are coregulated by common factors. However, methods
-
Secure discovery of genetic relatives across large-scale and distributed genomic datasets Genome Res. (IF 6.2) Pub Date : 2024-08-07 Matthew Man-Hou Hong, David Froelicher, Ricky Magner, Victoria Popic, Bonnie Berger, Hyunghoon Cho
Finding relatives within a study cohort is a necessary step in many genomic studies. However, when the cohort is distributed across multiple entities subject to data-sharing restrictions, performing this step often becomes infeasible. Developing a privacy-preserving solution for this task is challenging due to the burden of estimating kinship between all pairs of individuals across datasets. We introduce
-
Reconstructing extrachromosomal DNA structural heterogeneity from long-read sequencing data using Decoil Genome Res. (IF 6.2) Pub Date : 2024-08-07 Madalina Giurgiu, Nadine Wittstruck, Elias Rodriguez-Fos, Rocio Chamorro Gonzalez, Lotte Brueckner, Annabell Krienelke-Szymansky, Konstantin Helmsauer, Anne Hartebrodt, Philipp Euskirchen, Richard P. Koche, Kerstin Haase, Knut Reinert, Anton G. Henssen
Circular extrachromosomal DNA (ecDNA) is a form of oncogene amplification found across cancer types and associated with poor outcome in patients. ecDNA can be structurally complex and contain rearranged DNA sequences derived from multiple chromosome locations. As the structure of ecDNA can impact oncogene regulation and may indicate mechanisms of its formation, disentangling it at high resolution from
-
Independent expansion, selection and hypervariability of the TBC1D3 gene family in humans Genome Res. (IF 6.2) Pub Date : 2024-08-06 Xavi Guitart, David Porubsky, DongAhn Yoo, Max L Dougherty, Philip Dishuck, Katherine M. Munson, Alexandra P. Lewis, Kendra Hoekzema, Jordan Knuth, Stephen Chang, Tomi Pastinen, Evan E. Eichler
TBC1D3 is a primate-specific gene family that has expanded in the human lineage and has been implicated in neuronal progenitor proliferation and expansion of the frontal cortex. The gene family and its expression have been challenging to investigate because it is embedded in high-identity and highly variable segmental duplications. We sequenced and assembled the gene family using long-read sequencing
-
A novel approach for in vivo DNA footprinting using short double-stranded cell-free DNA from plasma Genome Res. (IF 6.2) Pub Date : 2024-08-01 Jan Müller, Christina Hartwig, Mirko Sonntag, Lisa Bitzer, Christopher Adelmann, Yevhen Vainshtein, Karolina Glanz, Sebastian O. Decker, Thorsten Brenner, Georg F. Weber, Arndt von Haeseler, Kai Sohn
Here, we present a method for enrichment of double-stranded cfDNA with an average length of ∼40 bp from cfDNA for high-throughput DNA sequencing. This class of cfDNA is enriched at gene promoters and binding sites of transcription factors or structural DNA-binding proteins, so that a genome-wide DNA footprint is directly captured from liquid biopsies. In short double-stranded cfDNA from healthy individuals
-
Comprehensive identification of genomic and environmental determinants of phenotypic plasticity in maize Genome Res. (IF 6.2) Pub Date : 2024-08-01 Laura E. Tibbs-Cortes, Tingting Guo, Carson M. Andorf, Xianran Li, Jianming Yu
Maize phenotypes are plastic, determined by the complex interplay of genetics and environmental variables. Uncovering the genes responsible and understanding how their effects change across a large geographic region are challenging. In this study, we conducted systematic analysis to identify environmental indices that strongly influence 19 traits (including flowering time, plant architecture, and yield
-
Differences in activity and stability drive transposable element variation in tropical and temperate maize Genome Res. (IF 6.2) Pub Date : 2024-08-01 Shujun Ou, Armin Scheben, Tyler Collins, Yinjie Qiu, Arun S. Seetharam, Claire C. Menard, Nancy Manchanda, Jonathan I. Gent, Michael C. Schatz, Sarah N. Anderson, Matthew B. Hufford, Candice N. Hirsch
Much of the profound interspecific variation in genome content has been attributed to transposable elements (TEs). To explore the extent of TE variation within species, we developed an optimized open-source algorithm, panEDTA, to de novo annotate TEs in a pangenome context. We then generated a unified TE annotation for a maize pangenome derived from 26 reference-quality genomes, which reveals an excess
-
Genetic complexity of killer-cell immunoglobulin-like receptor genes in human pangenome assemblies Genome Res. (IF 6.2) Pub Date : 2024-08-01 Tsung-Kai Hung, Wan-Chi Liu, Sheng-Kai Lai, Hui-Wen Chuang, Yi-Che Lee, Hong-Ye Lin, Chia-Lang Hsu, Chien-Yu Chen, Ya-Chien Yang, Jacob Shujui Hsu, Pei-Lung Chen
The killer-cell immunoglobulin-like receptor (KIR) gene complex, a highly polymorphic region of the human genome that encodes proteins involved in immune responses, poses strong challenges in genotyping owing to its remarkable genetic diversity and structural intricacy. Accurate analysis of KIR alleles, including their structural variations, is crucial for understanding their roles in various immune
-
Genome-wide patterns of selection–drift variation strongly associate with organismal traits across the green plant lineage Genome Res. (IF 6.2) Pub Date : 2024-08-01 Kavitha Uthanumallian, Andrea Del Cortona, Susana M. Coelho, Olivier De Clerck, Sebastian Duchene, Heroen Verbruggen
There are many gaps in our knowledge of how life cycle variation and organismal body architecture associate with molecular evolution. Using the diverse range of green algal body architectures and life cycle types as a test case, we hypothesize that increases in cytomorphological complexity are likely to be associated with a decrease in the effective population size, because larger-bodied organisms
-
Allele-specific transcription factor binding across human brain regions offers mechanistic insight into eQTLs Genome Res. (IF 6.2) Pub Date : 2024-08-01 Ashlyn G. Anderson, Belle A. Moyers, Jacob M. Loupe, Ivan Rodriguez-Nunez, Stephanie A. Felker, James M.J. Lawlor, William E. Bunney, Blynn G. Bunney, Preston M. Cartagena, Adolfo Sequeira, Stanley J. Watson, Huda Akil, Eric M. Mendenhall, Gregory M. Cooper, Richard M. Myers
Transcription factors (TFs) regulate gene expression by facilitating or disrupting the formation of transcription initiation machinery at particular genomic loci. Because TF occupancy is driven in part by recognition of DNA sequence, genetic variation can influence TF–DNA associations and gene regulation. To identify variants that impact TF binding in human brain tissues, we assessed allele-specific
-
A simple method for finding related sequences by adding probabilities of alternative alignments Genome Res. (IF 6.2) Pub Date : 2024-08-01 Martin C. Frith
The main way of analyzing genetic sequences is by finding sequence regions that are related to each other. There are many methods to do that, usually based on this idea: Find an alignment of two sequence regions, which would be unlikely to exist between unrelated sequences. Unfortunately, it is hard to tell if an alignment is likely to exist by chance. Also, the precise alignment of related regions
-
Colibactin leads to a bacteria-specific mutation pattern and self-inflicted DNA damage Genome Res. (IF 6.2) Pub Date : 2024-08-01 Emily Lowry, Yiqing Wang, Tal Dagan, Amir Mitchell
Colibactin produced primarily by Escherichia coli strains of the B2 phylogroup cross-links DNA and can promote colon cancer in human hosts. Here, we investigate the toxin's impact on colibactin producers and on bacteria cocultured with producing cells. Using genome-wide genetic screens and mutation accumulation experiments, we uncover the cellular pathways that mitigate colibactin damage and reveal
-
Widespread natural selection on metabolite levels in humans Genome Res. (IF 6.2) Pub Date : 2024-08-01 Yanina Timasheva, Kaido Lepik, Orsolya Liska, Balázs Papp, Zoltán Kutalik
Natural selection acts ubiquitously on complex human traits, predominantly constraining the occurrence of extreme phenotypes (stabilizing selection). These constraints propagate to DNA sequence variants associated with traits under selection. The genetic signatures of such evolutionary events can thus be detected via combining effect size estimates from genetic association studies and the corresponding
-
Benchmarking bulk and single-cell variant-calling approaches on Chromium scRNA-seq and scATAC-seq libraries Genome Res. (IF 6.2) Pub Date : 2024-08-01 Matthew Wiens, Hossein Farahani, R. Wilder Scott, T. Michael Underhill, Ali Bashashati
Single-cell sequencing methodologies such as scRNA-seq and scATAC-seq have become widespread and effective tools to interrogate tissue composition. Increasingly, variant callers are being applied to these methodologies to resolve the genetic heterogeneity of a sample, especially in the case of detecting the clonal architecture of a tumor. Typically, traditional bulk DNA variant callers are applied
-
A fast and adaptive detection framework for genome-wide chromatin loop mapping from Hi-C data Genome Res. (IF 6.2) Pub Date : 2024-08-01 Siyuan Chen, Jiuming Wang, Inkyung Jung, Zhaowen Qiu, Xin Gao, Yu Li
Chromatin loop identification plays an important role in molecular biology and 3D genomics research, as it constitutes a fundamental process in transcription and gene regulation. Such precise chromatin structures can be identified across genome-wide interaction matrices via Hi-C data analysis, which is essential for unraveling the intricacies of transcriptional regulation. Given the increasing number
-
A spatiotemporally resolved atlas of mRNA decay in the C. elegans embryo reveals differential regulation of mRNA stability across stages and cell types Genome Res. (IF 6.2) Pub Date : 2024-08-01 Felicia Peng, C. Erik Nordgren, John Isaac Murray
During embryonic development, cells undergo dynamic changes in gene expression that are required for appropriate cell fate specification. Although both transcription and mRNA degradation contribute to gene expression dynamics, patterns of mRNA decay are less well understood. Here, we directly measure spatiotemporally resolved mRNA decay rates transcriptome-wide throughout C. elegans embryogenesis by
-
Accurate assembly of circular RNAs with TERRACE Genome Res. (IF 6.2) Pub Date : 2024-07-26 Tasfia Zahin, Qian Shi, Xiaofei Carl Zang, Mingfu Shao
Circular RNA (circRNA) is a class of RNA molecules that forms a closed loop with its 5' and 3' ends covalently bonded. circRNAs are known to be more stable than linear RNAs, admit distinct properties and functions, and have been proven to be promising biomarkers. Existing methods for assembling circRNAs heavily rely on the annotated transcriptomes, hence exhibiting unsatisfactory accuracy without a
-
Parameter-efficient fine-tuning on large protein language models improves signal peptide prediction Genome Res. (IF 6.2) Pub Date : 2024-07-26 Shuai Zeng, Duolin Wang, Lei Jiang, Dong Xu
Signal peptides (SP) play a crucial role in protein translocation in cells. The development of large protein language models (PLMs) and prompt-based learning provides a new opportunity for SP prediction, especially for the categories with limited annotated data. We present a parameter-efficient fine-tuning (PEFT) framework for SP prediction, PEFT-SP, to effectively utilize pretrained PLMs. We integrated
-
Scalable summary statistics-based heritability estimation method with individual genotype level accuracy Genome Res. (IF 6.2) Pub Date : 2024-07-22 Moonseong Jeong, Ali Pazokitoroudi, Zhengtong Liu, Sriram Sankararaman
SNP heritability, the proportion of phenotypic variation explained by genotyped SNPs, is an important parameter in understanding the genetic architecture underlying various diseases and traits. Methods that aim to estimate SNP heritability from individual genotype and phenotype data are limited by their ability to scale to Biobank-scale datasets and by the restrictions in access to individual-level
-
Graph-based self-supervised learning for repeat detection in metagenomic assembly Genome Res. (IF 6.2) Pub Date : 2024-07-19 Ali Azizpour, Advait Balaji, Todd J. Treangen, Santiago Segarra
Repetitive DNA (repeats) poses significant challenges for accurate and efficient genome assembly and sequence alignment. This is particularly true for metagenomic data, where genome dynamics such as horizontal gene transfer, gene duplication, and gene loss/gain complicate accurate genome assembly from metagenomic communities. Detecting repeats is a crucial first step in overcoming these challenges
-
Haplotype-aware sequence alignment to pangenome graphs Genome Res. (IF 6.2) Pub Date : 2024-07-16 Ghanshyam Chandra, Daniel Gibney, Chirag Jain
Modern pangenome graphs are built using haplotype-resolved genome assemblies. When mapping reads to a pangenome graph, prioritizing alignments that are consistent with the known haplotypes improves genotyping accuracy. However, the existing rigorous formulations for co-linear chaining and alignment problems do not consider the haplotype paths in a pangenome graph. This often leads to spurious read
-
CoRAL accurately resolves extrachromosomal DNA genome structures with long-read sequencing Genome Res. (IF 6.2) Pub Date : 2024-07-09 Kaiyuan Zhu, Matthew Gregory Jones, Jens Luebeck, Xinxin Bu, Hyerim Yi, King L. Huang, Ivy Tsz-Lo Wong, Shu Zhang, Paul S. Mischel, Howard Chang, Vineet Bafna
Extrachromosomal DNA (ecDNA) is a central mechanism for focal oncogene amplification in cancer, occurring in approximately 15% of early-stage cancers and 30% of late-stage cancers. EcDNAs drive tumor formation, evolution, and drug resistance by dynamically modulating oncogene copy-number and rewiring gene-regulatory networks. Elucidating the genomic architecture of ecDNA amplifications is critical
-
A gene regulatory network–aware graph learning method for cell identity annotation in single-cell RNA-seq data Genome Res. (IF 6.2) Pub Date : 2024-07-01 Mengyuan Zhao, Jiawei Li, Xiaoyi Liu, Ke Ma, Jijun Tang, Fei Guo
Cell identity annotation for single-cell transcriptome data is a crucial process for constructing cell atlases, unraveling pathogenesis, and inspiring therapeutic approaches. Currently, the efficacy of existing methodologies is contingent upon specific data sets. Nevertheless, such data are often sourced from various batches, sequencing technologies, tissues, and even species. Notably, the gene regulatory
-
Pangenome-spanning epistasis and coselection analysis via de Bruijn graphs Genome Res. (IF 6.2) Pub Date : 2024-07-01 Juri Kuronen, Samuel T. Horsfield, Anna K. Pöntinen, Sudaraka Mallawaarachchi, Sergio Arredondo-Alonso, Harry Thorpe, Rebecca A. Gladstone, Rob J.L. Willems, Stephen D. Bentley, Nicholas J. Croucher, Johan Pensar, John A. Lees, Gerry Tonkin-Hill, Jukka Corander
Studies of bacterial adaptation and evolution are hampered by the difficulty of measuring traits such as virulence, drug resistance, and transmissibility in large populations. In contrast, it is now feasible to obtain high-quality complete assemblies of many bacterial genomes thanks to scalable high-accuracy long-read sequencing technologies. To exploit this opportunity, we introduce a phenotype- and
-
The Chinese longsnout catfish genome provides novel insights into the feeding preference and corresponding metabolic strategy of carnivores Genome Res. (IF 6.2) Pub Date : 2024-07-01 Yulong Liu, Gang Zhai, Jingzhi Su, Yulong Gong, Bingyuan Yang, Qisheng Lu, Longwei Xi, Yutong Zheng, Jingyue Cao, Haokun Liu, Junyan Jin, Zhimin Zhang, Yunxia Yang, Xiaoming Zhu, Zhongwei Wang, Gaorui Gong, Jie Mei, Zhan Yin, Rodolphe E. Gozlan, Shouqi Xie, Dong Han
Fish show variation in feeding habits to adapt to complex environments. However, the genetic basis of feeding preference and the corresponding metabolic strategies that differentiate feeding habits remain elusive. Here, by comparing the whole genome of a typical carnivorous fish (Leiocassis longirostris Günther) with that of herbivorous fish, we identify 250 genes through both positive selection and
-
The grasshopper genome reveals long-term gene content conservation of the X Chromosome and temporal variation in X Chromosome evolution Genome Res. (IF 6.2) Pub Date : 2024-07-01 Xinghua Li, Judith E. Mank, Liping Ban
We present the first chromosome-level genome assembly of the grasshopper, Locusta migratoria, one of the largest insect genomes. We use coverage differences between females (XX) and males (X0) to identify the X Chromosome gene content, and find that the X Chromosome shows both complete dosage compensation in somatic tissues and an underrepresentation of testis-expressed genes. X-linked gene content
-
Reference-informed prediction of alternative splicing and splicing-altering mutations from sequences Genome Res. (IF 6.2) Pub Date : 2024-07-01 Chencheng Xu, Suying Bao, Ye Wang, Wenxing Li, Hao Chen, Yufeng Shen, Tao Jiang, Chaolin Zhang
Alternative splicing plays a crucial role in protein diversity and gene expression regulation in higher eukaryotes, and mutations causing dysregulated splicing underlie a range of genetic diseases. Computational prediction of alternative splicing from genomic sequences not only provides insight into gene-regulatory mechanisms but also helps identify disease-causing mutations and drug targets. However
-
High-fidelity, large-scale targeted profiling of microsatellites Genome Res. (IF 6.2) Pub Date : 2024-07-01 Caitlin A. Loh, Danielle A. Shields, Adam Schwing, Gilad D. Evrony
Microsatellites are highly mutable sequences that can serve as markers for relationships among individuals or cells within a population. The accuracy and resolution of reconstructing these relationships depends on the fidelity of microsatellite profiling and the number of microsatellites profiled. However, current methods for targeted profiling of microsatellites incur significant “stutter” artifacts
-
CodonBERT large language model for mRNA vaccines Genome Res. (IF 6.2) Pub Date : 2024-07-01 Sizhen Li, Saeed Moayedpour, Ruijiang Li, Michael Bailey, Saleh Riahi, Lorenzo Kogler-Anele, Milad Miladi, Jacob Miner, Fabien Pertuy, Dinghai Zheng, Jun Wang, Akshay Balsubramani, Khang Tran, Minnie Zacharia, Monica Wu, Xiaobo Gu, Ryan Clinton, Carla Asquith, Joseph Skaleski, Lianne Boeglin, Sudha Chivukula, Anusha Dias, Tod Strugnell, Fernando Ulloa Montoya, Vikram Agarwal, Ziv Bar-Joseph, Sven Jager
mRNA-based vaccines and therapeutics are gaining popularity and usage across a wide range of conditions. One of the critical issues when designing such mRNAs is sequence optimization. Even small proteins or peptides can be encoded by an enormously large number of mRNAs. The actual mRNA sequence can have a large impact on several properties, including expression, stability, immunogenicity, and more
-
Streamlined spatial and environmental expression signatures characterize the minimalist duckweed Wolffia australiana Genome Res. (IF 6.2) Pub Date : 2024-07-01 Tom Denyer, Pin-Jou Wu, Kelly Colt, Bradley W. Abramson, Zhili Pang, Pavel Solansky, Allen Mamerto, Tatsuya Nobori, Joseph R. Ecker, Eric Lam, Todd P. Michael, Marja C.P. Timmermans
Single-cell genomics permits a new resolution in the examination of molecular and cellular dynamics, allowing global, parallel assessments of cell types and cellular behaviors through development and in response to environmental circumstances, such as interaction with water and the light–dark cycle of the Earth. Here, we leverage the smallest, and possibly most structurally reduced, plant, the semiaquatic
-
Interspecies regulatory landscapes and elements revealed by novel joint systematic integration of human and mouse blood cell epigenomes Genome Res. (IF 6.2) Pub Date : 2024-07-01 Guanjue Xiang, Xi He, Belinda M. Giardine, Kathryn J. Isaac, Dylan J. Taylor, Rajiv C. McCoy, Camden Jansen, Cheryl A. Keller, Alexander Q. Wixom, April Cockburn, Amber Miller, Qian Qi, Yanghua He, Yichao Li, Jens Lichtenberg, Elisabeth F. Heuston, Stacie M. Anderson, Jing Luan, Marit W. Vermunt, Feng Yue, Michael E.G. Sauria, Michael C. Schatz, James Taylor, Berthold Göttgens, Jim R. Hughes, Douglas
Knowledge of locations and activities of cis-regulatory elements (CREs) is needed to decipher basic mechanisms of gene regulation and to understand the impact of genetic variants on complex traits. Previous studies identified candidate CREs (cCREs) using epigenetic features in one species, making comparisons difficult between species. In contrast, we conducted an interspecies study defining epigenetic
-
Delineating yeast cleavage and polyadenylation signals using deep learning Genome Res. (IF 6.2) Pub Date : 2024-07-01 Emily Kunce Stroup, Zhe Ji
3′-end cleavage and polyadenylation is an essential process for eukaryotic mRNA maturation. In yeast species, the polyadenylation signals that recruit the processing machinery are degenerate and remain poorly characterized compared with the well-defined regulatory elements in mammals. Here we address this issue by developing deep learning models to deconvolute degenerate cis-regulatory elements and
-
Long-read Ribo-STAMP simultaneously measures transcription and translation with isoform resolution Genome Res. (IF 6.2) Pub Date : 2024-06-21 Pratibha Jagannatha, Alexandra T Tankka, Daniel A Lorenz, Tao Yu, Brian A Yee, Kristopher W Brannan, Catherine Jiarui Zhou, Jason G Underwood, Gene W. Yeo
Transcription and translation are intertwined processes where mRNA isoforms are crucial intermediaries. However, methodological limitations in analyzing translation at the mRNA isoform level have left gaps in our understanding of critical biological processes. To address these gaps, we developed an integrated computational and experimental framework called long-read Ribo-STAMP (LR-Ribo-STAMP) that
-
Size-based expectation maximization for characterizing nucleosome positions and subtypes Genome Res. (IF 6.2) Pub Date : 2024-06-17 Jianyu Yang, Kuangyu Yen, Shaun Mahony
Genome-wide nucleosome profiles are predominantly characterized using MNase-seq, which involves extensive MNase digestion and size selection to enrich for mono-nucleosome-sized fragments. Most available MNase-seq analysis packages assume that nucleosomes uniformly protect 147-bp DNA fragments. However, some nucleosomes with atypical histone or chemical compositions protect shorter lengths of DNA. The
-
DNA-m6A calling and integrated long-read epigenetic and genetic analysis with fibertools Genome Res. (IF 6.2) Pub Date : 2024-06-07 Anupama Jha, Stephanie C. Bohaczuk, Yizi Mao, Jane Ranchalis, Benjamin J. Mallory, Alan T. Min, Morgan O Hamm, Elliott Swanson, Danilo Dubocanin, Connor Finkbeiner, Tony Li, Dale Whittington, William Stafford Noble, Andrew Ben Stergachis, Mitchell R Vollger
Long-read DNA sequencing has recently emerged as a powerful tool for studying both genetic and epigenetic architectures at single-molecule and single-nucleotide resolution. Long-read epigenetic studies encompass both the direct identification of native cytosine methylation as well as the identification of exogenously placed DNA N6-methyladenine (DNA-m6A). However, detecting DNA-m6A modifications using
-
Full resolution HLA and KIR gene annotations for human genome assemblies Genome Res. (IF 6.2) Pub Date : 2024-06-05 Ying Zhou, Li Song, Heng Li
The human leukocyte antigen (HLA) genes and the killer cell immunoglobulin-like receptor (KIR) genes are critical to immune responses and are associated with many immune-related diseases. Located in highly polymorphic regions, they are hard to study with traditional short-read alignment-based methods. Although modern long-read assemblers can often assemble these genes, using existing tools to annotate
-
Long read subcellular fractionation and sequencing reveals the translational fate of full-length mRNA isoforms during neuronal differentiation Genome Res. (IF 6.2) Pub Date : 2024-06-05 Alexander Ritter, Jolene M Draper, Christopher Vollmers, Jeremy Sanford
Alternative splicing (AS) alters the cis-regulatory landscape of mRNA isoforms leading to transcripts with distinct localization, stability and translational efficiency. To rigorously investigate mRNA isoform-specific ribosome association, we generated subcellular fractionation and sequencing (Frac-seq) libraries using both conventional short reads and long reads from human embryonic stem cells (ESC)
-
Corrigendum: Centromere RNA is a key component for the assembly of nucleoproteins at the nucleolus and centromere Genome Res. (IF 6.2) Pub Date : 2024-06-01 Lee H. Wong, Kate H. Brettingham-Moore, Lyn Chan, Julie M. Quach, Melissa A. Anderson, Emma L. Northrop, Ross Hannan, Richard Saffery, Margaret L. Shaw, Evan Williams, K.H. Andy Choo
Genome Research 17: 1146–1160 (2007)
-
Global compositional and functional states of the human gut microbiome in health and disease Genome Res. (IF 6.2) Pub Date : 2024-06-01 Sunjae Lee, Theo Portlock, Emmanuelle Le Chatelier, Fernando Garcia-Guevara, Frederick Clasen, Florian Plaza Oñate, Nicolas Pons, Neelu Begum, Azadeh Harzandi, Ceri Proffitt, Dorines Rosario, Stefania Vaga, Junseok Park, Kalle von Feilitzen, Fredric Johansson, Cheng Zhang, Lindsey A. Edwards, Vincent Lombard, Franck Gauthier, Claire J. Steves, David Gomez-Cabrero, Bernard Henrissat, Doheon Lee, Lars
The human gut microbiota is of increasing interest, with metagenomics a key tool for analyzing bacterial diversity and functionality in health and disease. Despite increasing efforts to expand microbial gene catalogs and an increasing number of metagenome-assembled genomes, there have been few pan-metagenomic association studies and in-depth functional analyses across different geographies and diseases
-
Single-cell discovery of m6A RNA modifications in the hippocampus Genome Res. (IF 6.2) Pub Date : 2024-06-01 Shuangshuang Feng, Maitena Tellaetxe-Abete, Yujie Zhang, Yan Peng, Han Zhou, Mingjie Dong, Erika Larrea, Liang Xue, Li Zhang, Magdalena J. Koziol
N6-Methyladenosine (m6A) is a prevalent and highly regulated RNA modification essential for RNA metabolism and normal brain function. It is particularly important in the hippocampus, where m6A is implicated in neurogenesis and learning. Although extensively studied, its presence in specific cell types remains poorly understood. We investigated m6A in the hippocampus at a single-cell resolution, revealing
-
DEAD box RNA helicases are pervasive protein kinase interactors and activators Genome Res. (IF 6.2) Pub Date : 2024-06-01 Alexander Hirth, Edoardo Fatti, Eugen Netz, Sergio P. Acebron, Dimitris Papageorgiou, Andrea Švorinić, Cristina-Maria Cruciat, Emil Karaulanov, Alexandr Gopanenko, Tianheng Zhu, Irmgard Sinning, Jeroen Krijgsveld, Oliver Kohlbacher, Christof Niehrs
DEAD box (DDX) RNA helicases are a large family of ATPases, many of which have unknown functions. There is emerging evidence that besides their role in RNA biology, DDX proteins may stimulate protein kinases. To investigate if protein kinase–DDX interaction is a more widespread phenomenon, we conducted three orthogonal large-scale screens, including proteomics analysis with 32 RNA helicases, protein
-
Accurate allocation of multimapped reads enables regulatory element analysis at repeats Genome Res. (IF 6.2) Pub Date : 2024-06-01 Alexis Morrissey, Jeffrey Shi, Daniela Q. James, Shaun Mahony
Transposable elements (TEs) and other repetitive regions have been shown to contain gene regulatory elements, including transcription factor binding sites. However, regulatory elements harbored by repeats have proven difficult to characterize using short-read sequencing assays such as ChIP-seq or ATAC-seq. Most regulatory genomics analysis pipelines discard “multimapped” reads that align equally well