-
Efficient taxa identification using a pangenome index Genome Res. (IF 9.438) Pub Date : 2023-05-31 Omar Ahmed, Massimiliano Rossi, Christina Boucher, Ben Langmead
Tools that classify sequencing reads against a database of reference sequences require efficient index data structures. The r-index is a compressed full-text index that answers substring presence/absence, count and locate queries in space proportional to the amount of distinct sequence in the database: O(r) space where r is the number of Burrows-Wheeler runs. To date, the r-index has lacked the ability
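As a rough illustration of the quantity r mentioned in this abstract, the sketch below computes a Burrows-Wheeler transform by sorting rotations and counts its runs. It is not the r-index and not the authors' method. The function names `bwt` and `count_runs` and the `$` sentinel are assumptions made for this example, and the quadratic construction is only suitable for toy inputs.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (illustration only).

    Assumes '$' does not occur in `text` and sorts before every other symbol.
    """
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Count maximal runs of identical characters; this is r when s is a BWT."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed, count_runs(transformed))
```

Repetitive collections, such as many similar genomes, tend to produce long runs in the BWT, which is why space proportional to r can be much smaller than the total input length.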
-
Discordant calls across genotype discovery approaches elucidate variants with systematic errors Genome Res. (IF 9.438) Pub Date : 2023-05-30 Elizabeth G Atkinson, Mykyta Artomov, Alexander A Loboda, Heidi L Rehm, Daniel G MacArthur, Konrad J. Karczewski, Benjamin Neale, Mark J Daly
Large-scale high-throughput sequencing datasets have been transformative for informing clinical variant interpretation and as reference panels for statistical and population genetic efforts. While such resources are often treated as ground truth, we find that in widely used reference datasets such as the Genome Aggregation Database (gnomAD), some variants pass gold standard filters yet are systematically
-
Extremely fast construction and querying of compacted and colored de Bruijn graphs with GGCAT Genome Res. (IF 9.438) Pub Date : 2023-05-30 Andrea Cracco, Alexandru I. Tomescu
Compacted de Bruijn graphs are one of the most fundamental data structures in computational genomics. Colored compacted de Bruijn graphs are a variant built on a collection of sequences, and associate to each k-mer the sequences in which it appears. We present GGCAT, a tool for constructing both types of graphs, based on a new approach merging the k-mer counting step with the unitig construction step
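The "color" association described in this abstract can be pictured as a map from each k-mer to the set of input sequences that contain it. The sketch below builds such a map naively. It is not GGCAT's algorithm: it performs no compaction into unitigs, ignores reverse complements, and the name `color_kmers` and the sequence labels are hypothetical.

```python
from collections import defaultdict


def color_kmers(sequences: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every k-mer to the names of the sequences in which it occurs."""
    colors: dict[str, set[str]] = defaultdict(set)
    for name, seq in sequences.items():
        # Slide a window of length k over the sequence.
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(name)
    return dict(colors)


if __name__ == "__main__":
    table = color_kmers({"s1": "ACGT", "s2": "CGTA"}, k=3)
    for kmer in sorted(table):
        print(kmer, sorted(table[kmer]))
```

A hash map of Python sets is convenient for illustration but far too memory-hungry for real datasets, which is part of why dedicated construction tools exist.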
-
Genealogical inference and more flexible sequence clustering using iterative PopPUNK Genome Res. (IF 9.438) Pub Date : 2023-05-30 Bin Zhao, John A. Lees, Hongjin Wu, Chao Yang, Daniel Falush
Bacterial genome data are accumulating at an unprecedented speed due to the routine use of sequencing in clinical diagnoses, public health surveillance and population genetics studies. Genealogical reconstruction is fundamental to many of these uses; however, inferring genealogy from large-scale genome datasets quickly, accurately, and flexibly is still a challenge. Here, we extend an alignment- and annotation-free
-
Variation in histone configurations correlates with gene expression across nine inbred strains of mice Genome Res. (IF 9.438) Pub Date : 2023-05-22 Anna L Tyler, Catrina Spruce, Romy Kursawe, Annat Haber, Robyn L Ball, Wendy A Pitman, Alexander D Fine, Narayanan Raghupathy, Michael Walker, Vivek M Philip, Christopher L Baker, J. Matthew Mahoney, Gary A. Churchill, Jennifer J Trowbridge, Michael L Stitzel, Kenneth Paigen, Petko M Petkov, Gregory W Carter
The diversity outbred (DO) mice and their inbred founders are widely used models of human disease. However, although the genetic diversity of these mice has been well documented, their epigenetic diversity has not. Epigenetic modifications, such as histone modifications and DNA methylation, are important regulators of gene expression, and as such are a critical mechanistic link between genotype and
-
A fast and scalable method for inferring phylogenetic networks from trees by aligning lineage taxon strings Genome Res. (IF 9.438) Pub Date : 2023-05-22 Louxin Zhang, Niloufar Abhari, Caroline Colijn, Yufeng Wu
The reconstruction of phylogenetic networks is an important but challenging problem in phylogenetics and genome evolution, as the space of phylogenetic networks is vast and cannot be sampled well. One approach to the problem is to solve the minimum phylogenetic network problem, in which phylogenetic trees are first inferred, then the smallest phylogenetic network that displays all the trees is computed
-
Enabling trade-offs in privacy and utility in genomic data beacons and summary statistics Genome Res. (IF 9.438) Pub Date : 2023-05-22 Rajagopal Venkatesaramani, Zhiyu Wan, Bradley A. Malin, Yevgeniy Vorobeychik
The collection and sharing of genomic data are becoming increasingly commonplace in research, clinical, and direct-to-consumer settings. The computational protocols typically adopted to protect individual privacy include sharing summary statistics, such as allele frequencies, or limiting query responses to the presence/absence of alleles of interest using web-services called Beacons. However, even
-
Unsupervised contrastive peak caller for ATAC-seq Genome Res. (IF 9.438) Pub Date : 2023-05-22 Ha T. H. Vu, Yudi Zhang, Geetu Tuteja, Karin S. Dorman
The assay for transposase-accessible chromatin with sequencing (ATAC-seq) is a common assay to identify chromatin accessible regions by using a Tn5 transposase that can access, cut, and ligate adapters to DNA fragments for subsequent amplification and sequencing. These sequenced regions are quantified and tested for enrichment in a process referred to as "peak calling". Most unsupervised peak calling
-
Entropy predicts sensitivity of pseudo-random seeds Genome Res. (IF 9.438) Pub Date : 2023-05-22 Benjamin Dominik Maier, Kristoffer Sahlin
Seed design is important for sequence similarity search applications such as read mapping and average nucleotide identity (ANI) estimation. While k-mers and spaced k-mers are likely the most well-known and used seeds, sensitivity suffers at high error rates, particularly when indels are present. Recently, we developed a pseudo-random seeding construct, strobemers, which were empirically demonstrated
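To make the distinction between k-mers and spaced k-mers concrete, the sketch below extracts seeds under a binary mask, where `1` keeps a position and `0` ignores it. An all-ones mask yields ordinary k-mers. The function name `spaced_seeds` and the mask `1101` are assumptions chosen for illustration, and strobemers themselves are not implemented here.

```python
def spaced_seeds(seq: str, mask: str) -> list[str]:
    """Extract one seed per window, keeping only positions where mask is '1'."""
    width = len(mask)
    keep = [i for i, m in enumerate(mask) if m == "1"]
    return [
        "".join(seq[start + i] for i in keep)
        for start in range(len(seq) - width + 1)
    ]


if __name__ == "__main__":
    print(spaced_seeds("ACGTAC", "1111"))  # contiguous 4-mers
    print(spaced_seeds("ACGTAC", "1101"))  # third position of each window ignored
```

A substitution that falls on an ignored position leaves that window's seed unchanged, whereas an insertion or deletion shifts every later position in the window, which is consistent with the abstract's point that indels are especially damaging to seed sensitivity.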
-
Leveraging family data to design Mendelian Randomization that is provably robust to population stratification Genome Res. (IF 9.438) Pub Date : 2023-05-17 Nathan LaPierre, Boyang Fu, Steven Turnbull, Eleazar Eskin, Sriram Sankararaman
Mendelian Randomization (MR) has emerged as a powerful approach to leverage genetic instruments to infer causality between pairs of traits in observational studies. However, the results of such studies are susceptible to biases due to weak instruments as well as the confounding effects of population stratification and horizontal pleiotropy. Here, we show that family data can be leveraged to design
-
Improving quartet graph construction for scalable and accurate species tree estimation from gene trees Genome Res. (IF 9.438) Pub Date : 2023-05-17 Yunheng Han, Erin K Molloy
Summary methods are widely employed to estimate species trees from genome-scale data. However, they can fail to produce accurate species trees when the input gene trees are highly discordant due to estimation error and biological processes, like incomplete lineage sorting. Here, we introduce TREE-QMC, a new summary method that offers accuracy and scalability under these challenging scenarios. TREE-QMC
-
Efficient and accurate KIR and HLA genotyping with massively parallel sequencing data Genome Res. (IF 9.438) Pub Date : 2023-05-11 Li Song, Gali Bai, X. Shirley Liu, Bo Li, Heng Li
Killer immunoglobulin-like receptor (KIR) genes and human leukocyte antigen (HLA) genes play important roles in innate and adaptive immunity. They are highly polymorphic and cannot be genotyped with standard variant calling pipelines. Compared with HLA genes, many KIR genes are similar to each other in sequences and may be absent in the chromosomes. Therefore, while many tools have been developed to
-
Multiplexed long-read plasmid validation and analysis using OnRamp Genome Res. (IF 9.438) Pub Date : 2023-05-08 Camille Mumm, Melissa L Drexel, Torrin L McDonald, Adam G Diehl, Jessica A Switzenberg, Alan P Boyle
Recombinant plasmid vectors are versatile tools that have facilitated discoveries in molecular biology, genetics, proteomics, and many other fields. As the enzymatic and bacterial processes used to create recombinant DNA can introduce errors, sequence validation is an essential step in plasmid assembly. Sanger sequencing is the current standard for plasmid validation; however, this method is limited
-
Comparing genomic and epigenomic features across species using the WashU Comparative Epigenome Browser Genome Res. (IF 9.438) Pub Date : 2023-05-08 Xiaoyu Zhuo, Silas Hsu, Deepak Purushotham, Prashant K Kuntala, Jessica K Harrison, Alan Y Du, Samuel Chen, Daofeng Li, Ting Wang
Genome browsers have become an intuitive and critical tool to visualize and analyze genomic features and data. Conventional genome browsers display data/annotations on a single reference genome/assembly; there are also genomic alignment viewer/browsers that help users visualize alignment, mismatch, and rearrangement between syntenic regions. However, there is a growing need for a comparative epigenome
-
Dynamic modulation of genomic enhancer elements in the suprachiasmatic nucleus, the site of the mammalian circadian clock Genome Res. (IF 9.438) Pub Date : 2023-05-08 Akanksha Bafna, Gareth Banks, Michael H Hastings, Patrick M Nolan
The mammalian suprachiasmatic nucleus (SCN), located in the ventral hypothalamus, synchronises and maintains daily cellular and physiological rhythms across the body, in accordance with environmental and visceral cues. Consequently, the systematic regulation of spatiotemporal gene transcription in the SCN is vital for daily timekeeping. So far, the regulatory elements assisting circadian gene transcription
-
Genomic insights into metabolic flux in hummingbirds Genome Res. (IF 9.438) Pub Date : 2023-05-08 Ariel Gershman, Quinn Hauck, Morag Dick, Jerrica M Jamison, Michael Tassia, Xabier Agirrezabala, Saad Muhammad, Raafay Ali, Rachael E. Workman, Mikel Valle, G William Wong, Kenneth C Welch, Jr., Winston Timp
Hummingbirds are very well adapted to sustain efficient and rapid metabolic shifts. They oxidize ingested nectar to directly fuel flight when foraging but have to switch to oxidizing stored lipids derived from ingested sugars during the night or long-distance migratory flights. Understanding how this organism moderates energy turnover is hampered by a lack of information regarding how relevant enzymes
-
CRISPR-Cas9-based repeat depletion for the high-throughput genotyping of complex plant genomes Genome Res. (IF 9.438) Pub Date : 2023-05-01 Marzia Rossato, Luca Marcolungo, Luca De Antoni, Giulia Lopatriello, Elisa Bellucci, Gaia Cortinovis, Giulia Frascarelli, Laura Nanni, Elena Bitocchi, Valerio Di Vittori, Leonardo Vincenzi, Filippo Lucchini, Kirstin E Bett, Larissa Ramsay, David Konkin, Massimo Delledonne, Roberto Papa
High-throughput genotyping enables the large-scale analysis of genetic diversity in population genomics and genome-wide association studies that combine the genotypic and phenotypic characterization of large collections of accessions. Sequencing-based approaches for genotyping are progressively replacing traditional genotyping methods due to the lower ascertainment bias. However, genome-wide genotyping
-
Highly complete long-read genomes reveal pangenomic variation underlying yeast phenotypic diversity Genome Res. (IF 9.438) Pub Date : 2023-05-01 Cory A Weller, Ilya Andreev, Michael J Chambers, Morgan Park, NISC Comparative Sequencing Program, Joshua S Bloom, Meru J Sadhu
Understanding the genetic causes of trait variation is a primary goal of genetic research. One way that individuals can vary genetically is through variable pangenomic genes - genes that are only present in some individuals in a population. The presence or absence of entire genes could have large effects on trait variation. However, variable pangenomic genes can be missed in standard genotyping workflows
-
A novel quantitative trait locus implicates Msh3 in the propensity for genome-wide short tandem repeat expansions in mice Genome Res. (IF 9.438) Pub Date : 2023-05-01 Mikhail O. Maksimov, Cynthia Wu, David G. Ashbrook, Flavia Villani, Vincenza Colonna, Nima Mousavi, Nichole Ma, Lu Lu, Jonathan K. Pritchard, Alon Goren, Robert W. Williams, Abraham A. Palmer, Melissa Gymrek
Short tandem repeats (STRs) are a class of rapidly mutating genetic elements typically characterized by repeated units of 1–6 bp. We leveraged whole-genome sequencing data for 152 recombinant inbred (RI) strains from the BXD family of mice to map loci that modulate genome-wide patterns of new mutations arising during parent-to-offspring transmission at STRs. We defined quantitative phenotypes describing
-
Aligning distant sequences to graphs using long seed sketches Genome Res. (IF 9.438) Pub Date : 2023-04-18 Amir Joudaki, Alexandru Meterez, Harun Mustafa, Ragnar Groot Koerkamp, André Kahles, Gunnar Rätsch
Sequence-to-graph alignment is crucial for applications such as variant genotyping, read error correction, and genome assembly. We propose a novel seeding approach that relies on long inexact matches rather than short exact matches, and demonstrate that it yields a better time-accuracy trade-off in settings with up to a 25% mutation rate. We use sketches of a subset of graph nodes, which are more robust
-
Gaps and complex structurally variant loci in phased genome assemblies Genome Res. (IF 9.438) Pub Date : 2023-04-01 David Porubsky, Mitchell R. Vollger, William T. Harvey, Allison N. Rozanski, Peter Ebert, Glenn Hickey, Patrick Hasenfeld, Ashley D. Sanders, Catherine Stober, Human Pangenome Reference Consortium, Jan O. Korbel, Benedict Paten, Tobias Marschall, Evan E. Eichler, Haley J. Abel, Lucinda L. Antonacci-Fulton, Mobin Asri, Gunjan Baid, Carl A. Baker, Anastasiya Belyaeva, Konstantinos Billis, Guillaume Bourque
There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still generates more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique
-
High resolution genomes of multiple Xiphophorus species provide new insights into microevolution, hybrid incompatibility, and epistasis Genome Res. (IF 9.438) Pub Date : 2023-04-01 Yuan Lu, Edward Rice, Kang Du, Susanne Kneitz, Magali Naville, Corentin Dechaud, Jean-Nicolas Volff, Mikki Boswell, William Boswell, LaDeana Hillier, Chad Tomlinson, Kremitzki Milin, Ronald B. Walter, Manfred Schartl, Wesley C. Warren
Because of diverged adaptative phenotypes, fish species of the genus Xiphophorus have contributed to a wide range of research for a century. Existing Xiphophorus genome assemblies are not at the chromosomal level and are prone to sequence gaps, thus hindering advancement of the intra- and inter-species differences for evolutionary, comparative, and translational biomedical studies. Herein, we assembled
-
Challenges and considerations for reproducibility of STARR-seq assays Genome Res. (IF 9.438) Pub Date : 2023-04-01 Maitreya Das, Ayaan Hossain, Deepro Banerjee, Craig Alan Praul, Santhosh Girirajan
High-throughput methods such as RNA-seq, ChIP-seq, and ATAC-seq have well-established guidelines, commercial kits, and analysis pipelines that enable consistency and wider adoption for understanding genome function and regulation. STARR-seq, a popular assay for directly quantifying the activities of thousands of enhancer sequences simultaneously, has seen limited standardization across studies. The
-
Accurate transcriptome-wide identification and quantification of alternative polyadenylation from RNA-seq data with APAIQ Genome Res. (IF 9.438) Pub Date : 2023-04-01 Yongkang Long, Bin Zhang, Shuye Tian, Jia Jia Chan, Juexiao Zhou, Zhongxiao Li, Yisheng Li, Zheng An, Xingyu Liao, Yu Wang, Shiwei Sun, Ying Xu, Yvonne Tay, Wei Chen, Xin Gao
Alternative polyadenylation (APA) enables a gene to generate multiple transcripts with different 3′ ends, which is dynamic across different cell types or conditions. Many computational methods have been developed to characterize sample-specific APA using the corresponding RNA-seq data, but they suffer from high error rates in both polyadenylation site (PAS) identification and quantification of PAS usage
-
Incomplete erasure of histone marks during epigenetic reprogramming in medaka early development Genome Res. (IF 9.438) Pub Date : 2023-04-01 Hiroto S. Fukushima, Hiroyuki Takeda, Ryohei Nakamura
Epigenetic modifications undergo drastic erasure and reestablishment after fertilization. This reprogramming is required for proper embryonic development and cell differentiation. In mammals, some histone modifications are not completely reprogrammed and play critical roles in later development. In contrast, in nonmammalian vertebrates, most histone modifications are thought to be more intensively
-
MYT1L is required for suppressing earlier neuronal development programs in the adult mouse brain Genome Res. (IF 9.438) Pub Date : 2023-04-01 Jiayang Chen, Nicole A. Fuhler, Kevin K. Noguchi, Joseph D. Dougherty
In vitro studies indicate the neurodevelopmental disorder gene myelin transcription factor 1-like (MYT1L) suppresses non-neuronal lineage genes during fibroblast-to-neuron direct differentiation. However, MYT1L's molecular and cellular functions in the adult mammalian brain have not been fully characterized. Here, we found that MYT1L loss leads to up-regulated deep layer (DL) gene expression, corresponding
-
A single-cell transcriptome atlas of the maturing zebrafish telencephalon Genome Res. (IF 9.438) Pub Date : 2023-04-01 Shristi Pandey, Anna J. Moyer, Summer B. Thyme
The zebrafish telencephalon is composed of highly specialized subregions that regulate complex behaviors such as learning, memory, and social interactions. The transcriptional signatures of the neuronal cell types in the telencephalon and the timeline of their emergence from larva to adult remain largely undescribed. Using an integrated analysis of single-cell transcriptomes of approximately 64,000
-
Density separation of petrous bone powders for optimized ancient DNA yields Genome Res. (IF 9.438) Pub Date : 2023-04-01 Daniel M. Fernandes, Kendra A. Sirak, Olivia Cheronet, Mario Novak, Florian Brück, Evelyn Zelger, Alejandro Llanos-Lizcano, Anna Wagner, Anna Zettl, Kirsten Mandl, Kellie Sara Duffet Carlson, Victoria Oberreiter, Kadir T. Özdoğan, Susanna Sawyer, Francesco La Pastina, Emanuela Borgia, Alfredo Coppa, Miroslav Dobeš, Petr Velemínský, David Reich, Lynne S. Bell, Ron Pinhasi
Density separation is a process routinely used to segregate minerals, organic matter, and even microplastics, from soils and sediments. Here we apply density separation to archaeological bone powders before DNA extraction to increase endogenous DNA recovery relative to a standard control extraction of the same powders. Using nontoxic heavy liquid solutions, we separated powders from the petrous bones
-
Motif conservation, stability, and host gene expression are the main drivers of snoRNA expression across vertebrates Genome Res. (IF 9.438) Pub Date : 2023-04-01 Étienne Fafard-Couture, Pierre-Étienne Jacques, Michelle S. Scott
Small nucleolar RNAs (snoRNAs) are structured noncoding RNAs present in multiple copies within eukaryotic genomes. snoRNAs guide chemical modifications on their target RNA and regulate processes like ribosome assembly and splicing. Most human snoRNAs are embedded within host gene introns, the remainder being independently expressed from intergenic regions. We recently characterized the abundance of
-
Inferring the mode and strength of ongoing selection Genome Res. (IF 9.438) Pub Date : 2023-04-01 Gustavo V. Barroso, Kirk E. Lohmueller
Genome sequence data are no longer scarce. The UK Biobank alone comprises 200,000 individual genomes, with more on the way, leading the field of human genetics toward sequencing entire populations. Within the next decades, other model organisms will follow suit, especially domesticated species such as crops and livestock. Having sequences from most individuals in a population will present new challenges
-
The motif composition of variable number tandem repeats impacts gene expression Genome Res. (IF 9.438) Pub Date : 2023-04-01 Tsung-Yu Lu, Paulina N. Smaruj, Geoffrey Fudenberg, Nicholas Mancuso, Mark J.P. Chaisson
Understanding the impact of DNA variation on human traits is a fundamental question in human genetics. Variable number tandem repeats (VNTRs) make up ∼3% of the human genome but are often excluded from association analysis owing to poor read mappability or divergent repeat content. Although methods exist to estimate VNTR length from short-read data, it is known that VNTRs vary in both length and repeat
-
Variation in mutation, recombination, and transposition rates in Drosophila melanogaster and Drosophila simulans Genome Res. (IF 9.438) Pub Date : 2023-04-01 Yiguan Wang, Paul McNeil, Rashidatu Abdulazeez, Marta Pascual, Susan E. Johnston, Peter D. Keightley, Darren J. Obbard
The rates of mutation, recombination, and transposition are core parameters in models of evolution. They impact genetic diversity, responses to ongoing selection, and levels of genetic load. However, even for key evolutionary model species such as Drosophila melanogaster and Drosophila simulans, few estimates of these parameters are available, and we have little idea of how rates vary between individuals
-
Genome enrichment of rare and unknown species from complicated microbiomes by nanopore selective sequencing Genome Res. (IF 9.438) Pub Date : 2023-04-01 Yuhong Sun, Zhanwen Cheng, Xiang Li, Qing Yang, Bixi Zhao, Ziqi Wu, Yu Xia
Rare species are vital members of a microbial community, but retrieving their genomes is difficult because of their low abundance. The ReadUntil (RU) approach allows nanopore devices to sequence specific DNA molecules selectively in real time, which provides an opportunity for enriching rare species. Despite the robustness of enriching rare species by reducing the sequencing depth of known host sequences
-
Chromatin structure influences rate and spectrum of spontaneous mutations in Neurospora crassa Genome Res. (IF 9.438) Pub Date : 2023-04-01 Mariana Villalba de la Peña, Pauliina A.M. Summanen, Martta Liukkonen, Ilkka Kronholm
Although mutation rates have been extensively studied, variation in mutation rates throughout the genome is poorly understood. To understand patterns of genetic variation, it is important to understand how mutation rates vary. Chromatin modifications may be an important factor in determining variation in mutation rates in eukaryotic genomes. To study variation in mutation rates, we performed a mutation
-
Proving sequence aligners can guarantee accuracy in almost O(m log n) time through an average-case analysis of the seed-chain-extend heuristic Genome Res. (IF 9.438) Pub Date : 2023-03-29 Jim Shaw, Yun William Yu
Seed-chain-extend with k-mer seeds is a powerful heuristic technique for sequence alignment employed by modern sequence aligners. While effective in practice for both runtime and accuracy, theoretical guarantees on the resulting alignment do not exist for seed-chain-extend. In this work, we give the first rigorous bounds for the efficacy of seed-chain-extend with k-mers in expectation. Assume we are
-
Simultaneous profiling of host expression and microbial abundance by spatial metatranscriptome sequencing Genome Res. (IF 9.438) Pub Date : 2023-03-01 Lin Lyu, Xue Li, Ru Feng, Xin Zhou, Tuhin K. Guha, Xiaofei Yu, Guo Qiang Chen, Yufeng Yao, Bing Su, Duowu Zou, Michael P. Snyder, Lei Chen
We developed an analysis pipeline that can extract microbial sequences from spatial transcriptomic (ST) data and assign taxonomic labels, generating a spatial microbial abundance matrix in addition to the default host expression matrix, enabling simultaneous analysis of host expression and microbial distribution. We called the pipeline spatial metatranscriptome (SMT) and applied it on both human and
-
Genome-wide identification of tandem repeats associated with splicing variation across 49 tissues in humans Genome Res. (IF 9.438) Pub Date : 2023-03-01 Kohei Hamanaka, Daisuke Yamauchi, Eriko Koshimizu, Kei Watase, Kaoru Mogushi, Kinya Ishikawa, Hidehiro Mizusawa, Naomi Tsuchida, Yuri Uchiyama, Atsushi Fujita, Kazuharu Misawa, Takeshi Mizuguchi, Satoko Miyatake, Naomichi Matsumoto
Tandem repeats (TRs) are one of the largest sources of polymorphism, and their length is associated with gene regulation. Although previous studies reported several tandem repeats regulating gene splicing in cis (spl-TRs), no large-scale study has been conducted. In this study, we established a genome-wide catalog of 9537 spl-TRs with a total of 58,290 significant TR–splicing associations across 49
-
A sheep pangenome reveals the spectrum of structural variations and their effects on tail phenotypes Genome Res. (IF 9.438) Pub Date : 2023-03-01 Ran Li, Mian Gong, Xinmiao Zhang, Fei Wang, Zhenyu Liu, Lei Zhang, Qimeng Yang, Yuan Xu, Mengsi Xu, Huanhuan Zhang, Yunfeng Zhang, Xuelei Dai, Yuanpeng Gao, Zhuangbiao Zhang, Wenwen Fang, Yuta Yang, Weiwei Fu, Chunna Cao, Peng Yang, Zeinab Amiri Ghanatsaman, Niloufar Jafarpour Negari, Hojjat Asadollahpour Nanaei, Xiangpeng Yue, Yuxuan Song, Xianyong Lan, Weidong Deng, Xihong Wang, Chuanying Pan, Ruidong
Structural variations (SVs) are a major contributor to genetic diversity and phenotypic variations, but their prevalence and functions in domestic animals are largely unexplored. Here we generated high-quality genome assemblies for 15 individuals from genetically diverse sheep breeds using Pacific Biosciences (PacBio) high-fidelity sequencing, discovering 130.3 Mb nonreference sequences, from which
-
SWATH-MS-based proteogenomic analysis reveals the involvement of alternative splicing in poplar upon lead stress Genome Res. (IF 9.438) Pub Date : 2023-03-01 Fu-Yuan Zhu, Xin Chen, Yu-Chen Song, Lydia Pui Ying Lam, Yuki Tobimatsu, Bei Gao, Mo-Xian Chen, Fu-Liang Cao
Alternative splicing (AS) regulates gene expression and increases proteomic diversity for the fine tuning of stress responses in plants, but the exact mechanism through which AS functions in plant stress responses is not thoroughly understood. Here, we investigated how AS functions in poplar (Populus trichocarpa), a popular plant for bioremediation, in response to lead (Pb) stress. Using a proteogenomic
-
Tn5 tagments and transposes oligos to single-stranded DNA for strand-specific RNA sequencing Genome Res. (IF 9.438) Pub Date : 2023-03-01 Yanjun Zhang, Yin Tang, Zhongxing Sun, Junqi Jia, Yuan Fang, Xinyi Wan, Dong Fang
Tn5 transposon tagments double-stranded DNA and RNA/DNA hybrids to generate nucleic acids that are ready to be amplified for high-throughput sequencing. The nucleic acid substrates for the Tn5 transposon must be explored to increase the applications of Tn5. Here, we found that the Tn5 transposon can transpose oligos into the 5′ end of single-stranded DNA longer than 140 nucleotides. Based on this property
-
Enhancers display constrained sequence flexibility and context-specific modulation of motif function Genome Res. (IF 9.438) Pub Date : 2023-03-01 Franziska Reiter, Bernardo P. de Almeida, Alexander Stark
The information about when and where each gene is to be expressed is mainly encoded in the DNA sequence of enhancers, sequence elements that comprise binding sites (motifs) for different transcription factors (TFs). Most of the research on enhancer sequences has been focused on TF motif presence, whereas the enhancer syntax, that is, the flexibility of important motif positions and how the sequence
-
The impact of SWI/SNF and NuRD inactivation on gene expression is tightly coupled with levels of RNA polymerase II occupancy at promoters Genome Res. (IF 9.438) Pub Date : 2023-03-01 Sachin Pundhir, Jinyu Su, Marta Tapia, Anne Meldgaard Hansen, James Seymour Haile, Klaus Hansen, Bo Torben Porse
SWI/SNF and NuRD are protein complexes that antagonistically regulate DNA accessibility. However, repression of their activities often leads to unanticipated changes in target gene expression (paradoxical), highlighting our incomplete understanding of their activities. Here we show that SWI/SNF and NuRD are in a tug-of-war to regulate PRC2 occupancy at lowly expressed and bivalent genes in mouse embryonic
-
Defining the separation landscape of topological domains for decoding consensus domain organization of the 3D genome Genome Res. (IF 9.438) Pub Date : 2023-03-01 Dachang Dang, Shao-Wu Zhang, Ran Duan, Shihua Zhang
Topologically associating domains (TADs) have emerged as basic structural and functional units of genome organization and have been determined by many computational methods from Hi-C contact maps. However, the TADs obtained by different methods vary greatly, which makes the accurate determination of TADs a challenging issue and hinders subsequent biological analyses about their organization and functions
-
Global loss of cellular m6A RNA methylation following infection with different SARS-CoV-2 variants Genome Res. (IF 9.438) Pub Date : 2023-03-01 Roshan Vaid, Akram Mendez, Ketan Thombare, Rebeca Burgos-Panadero, Rémy Robinot, Barbara F. Fonseca, Nikhil R. Gandasi, Johan Ringlander, Mohammad Hassan Baig, Jae-June Dong, Jae Yong Cho, Björn Reinius, Lisa A. Chakrabarti, Kristina Nystrom, Tanmoy Mondal
Insights into host–virus interactions during SARS-CoV-2 infection are needed to understand COVID-19 pathogenesis and may help to guide the design of novel antiviral therapeutics. N6-Methyladenosine modification (m6A), one of the most abundant cellular RNA modifications, regulates key processes in RNA metabolism during stress response. Gene expression profiles observed postinfection with different SARS-CoV-2
-
Complete sequencing of a cynomolgus macaque major histocompatibility complex haplotype Genome Res. (IF 9.438) Pub Date : 2023-03-01 Julie A. Karl, Trent M. Prall, Hailey E. Bussan, Joshua M. Varghese, Aparna Pal, Roger W. Wiseman, David H. O'Connor
Macaques provide the most widely used nonhuman primate models for studying the immunology and pathogenesis of human diseases. Although the macaque major histocompatibility complex (MHC) region shares most features with the human leukocyte antigen (HLA) region, macaques have an expanded repertoire of MHC class I genes. Although a chimera of two rhesus macaque MHC haplotypes was first published in 2004
-
Large haplotypes highlight a complex age structure within the maize pan-genome Genome Res. (IF 9.438) Pub Date : 2023-03-01 Jianing Liu, R. Kelly Dawe
The genomes of maize and other eukaryotes contain stable haplotypes in regions of low recombination. These regions, including centromeres, long heterochromatic blocks, and rDNA arrays, have been difficult to analyze with respect to their diversity and origin. Greatly improved genome assemblies are now available that enable comparative genomics over these and other nongenic spaces. Using 26 complete
-
Kinetic networks identify TWIST2 as a key regulatory node in adipogenesis Genome Res. (IF 9.438) Pub Date : 2023-03-01 Arun B. Dutta, Daniel S. Lank, Roza K. Przanowska, Piotr Przanowski, Lixin Wang, Bao Nguyen, Ninad M. Walavalkar, Fabiana M. Duarte, Michael J. Guertin
Adipocytes contribute to metabolic disorders such as obesity, diabetes, and atherosclerosis. Prior characterizations of the transcriptional network driving adipogenesis have overlooked transiently acting transcription factors (TFs), genes, and regulatory elements that are essential for proper differentiation. Moreover, traditional gene regulatory networks provide neither mechanistic details about individual
-
Evaluation of N6-methyldeoxyadenosine antibody-based genomic profiling in eukaryotes Genome Res. (IF 9.438) Pub Date : 2023-03-01 Brian M. Debo, Benjamin J. Mallory, Andrew B. Stergachis
Low-level DNA N6-methyldeoxyadenosine (DNA-m6A) has recently been reported across various eukaryotes. Although anti-m6A antibody–based approaches are commonly used to measure DNA-m6A levels, this approach is known to be confounded by DNA secondary structures, RNA contamination, and bacterial contamination. To evaluate for these confounding features, we introduce an approach for systematically validating
-
Complex hierarchical structures in single-cell genomics data unveiled by deep hyperbolic manifold learning Genome Res. (IF 9.438) Pub Date : 2023-02-01 Tian Tian, Cheng Zhong, Xiang Lin, Zhi Wei, Hakon Hakonarson
With the advances in single-cell sequencing techniques, numerous analytical methods have been developed for delineating cell development. However, most are based on Euclidean space, which would distort the complex hierarchical structure of cell differentiation. Recently, methods acting on hyperbolic space have been proposed to visualize hierarchical structures in single-cell RNA-seq (scRNA-seq) data
-
The aberrant epigenome of DNMT3B-mutated ICF1 patient iPSCs is amenable to correction, with the exception of a subset of regions with H3K4me3- and/or CTCF-based epigenetic memory Genome Res. (IF 9.438) Pub Date : 2023-02-01 Varsha Poondi Krishnan, Barbara Morone, Shir Toubiana, Monika Krzak, Salvatore Fioriniello, Floriana Della Ragione, Maria Strazzullo, Claudia Angelini, Sara Selig, Maria R. Matarazzo
Bi-allelic hypomorphic mutations in DNMT3B disrupt DNA methyltransferase activity and lead to immunodeficiency, centromeric instability, facial anomalies syndrome, type 1 (ICF1). Although several ICF1 phenotypes have been linked to abnormally hypomethylated repetitive regions, the unique genomic regions responsible for the remaining disease phenotypes remain largely uncharacterized. Here we explored
-
The Planemo toolkit for developing, deploying, and executing scientific data analyses in Galaxy and beyond Genome Res. (IF 9.438) Pub Date : 2023-02-01 Simon Bray, John Chilton, Matthias Bernt, Nicola Soranzo, Marius van den Beek, Bérénice Batut, Helena Rasche, Martin Čech, Peter J.A. Cock, Björn Grüning, Anton Nekrutenko
There are thousands of well-maintained high-quality open-source software utilities for all aspects of scientific data analysis. For more than a decade, the Galaxy Project has been providing computational infrastructure and a unified user interface for these tools to make them accessible to a wide range of researchers. To streamline the process of integrating tools and constructing workflows as much
-
Characterization of network hierarchy reflects cell state specificity in genome organization Genome Res. (IF 9.438) Pub Date : 2023-02-01 Jingyao Wang, Yue Xue, Yueying He, Hui Quan, Jun Zhang, Yi Qin Gao
Dynamic chromatin structure acts as the regulator of transcription program in crucial processes including cancer and cell development, but a unified framework for characterizing chromatin structural evolution remains to be established. Here, we performed graph inferences on Hi-C data sets and derived the chromatin contact networks. We discovered significant decreases in information transmission efficiencies
-
Regulation of endogenous retrovirus–derived regulatory elements by GATA2/3 and MSX2 in human trophoblast stem cells Genome Res. (IF 9.438) Pub Date : 2023-02-01 Cui Du, Jing Jiang, Yuzhuo Li, Miao Yu, Jian Jin, Shuai Chen, Hairui Fan, Todd S. Macfarlan, Bin Cao, Ming-an Sun
The placenta is an organ with extraordinary phenotypic diversity in eutherian mammals. Recent evidence suggests that numerous human placental enhancers are evolved from lineage-specific insertions of endogenous retroviruses (ERVs), yet the transcription factors (TFs) underlying their regulation remain largely elusive. Here, by first focusing on MER41, a primate-specific ERV family previously linked
-
Atlas-scale single-cell chromatin accessibility using nanowell-based combinatorial indexing Genome Res. (IF 9.438) Pub Date : 2023-02-01 Brendan L. O'Connell, Ruth V. Nichols, Dmitry Pokholok, Jerushah Thomas, Sonia N. Acharya, Andrew Nishida, Casey A. Thornton, Marissa Co, Andrew J. Fields, Frank J. Steemers, Andrew C. Adey
Here we present advancements in single-cell combinatorial indexed Assay for Transposase Accessible Chromatin (sciATAC) to measure chromatin accessibility that leverage nanowell chips to achieve atlas-scale cell throughput (>10^5 cells) at low cost. The platform leverages the core of the sciATAC workflow where multiple indexed tagmentation reactions are performed, followed by pooling and distribution
-
A temporal in vivo catalog of chromatin accessibility and expression profiles in pineoblastoma reveals a prevalent role for repressor elements Genome Res. (IF 9.438) Pub Date : 2023-02-01 Salam Idriss, Mohammad Hallal, Abdullah El-Kurdi, Hasan Zalzali, Inaam El-Rassi, Erik A. Ehli, Christel M. Davis, Philip E.D. Chung, Deena M.A. Gendoo, Eldad Zacksenhaus, Raya Saab, Pierre Khoueiry
Pediatric pineoblastomas (PBs) are rare and aggressive tumors of grade IV histology. Although some oncogenic drivers are characterized, including germline mutations in RB1 and DICER1, the role of epigenetic deregulation and cis-regulatory regions in PB pathogenesis and progression is largely unknown. Here, we generated genome-wide gene expression, chromatin accessibility, and H3K27ac profiles covering
-
Matching queried single-cell open-chromatin profiles to large pools of single-cell transcriptomes and epigenomes for reference supported analysis Genome Res. (IF 9.438) Pub Date : 2023-02-01 Shreya Mishra, Neetesh Pandey, Smriti Chawla, Madhu Sharma, Omkar Chandra, Indra Prakash Jha, Debarka SenGupta, Kedar Nath Natarajan, Vibhor Kumar
The true benefits of large single-cell transcriptome and epigenome data sets can be realized only with the development of new approaches and search tools for annotating individual cells. Matching a single-cell epigenome profile to a large pool of reference cells remains a major challenge. Here, we present scEpiSearch, which enables searching, comparison, and independent classification of single-cell
-
A chromosome-scale epigenetic map of the Hydra genome reveals conserved regulators of cell state Genome Res. (IF 9.438) Pub Date : 2023-02-01 Jack F. Cazet, Stefan Siebert, Hannah Morris Little, Philip Bertemes, Abby S. Primack, Peter Ladurner, Matthias Achrainer, Mark T. Fredriksen, R. Travis Moreland, Sumeeta Singh, Suiyuan Zhang, Tyra G. Wolfsberg, Christine E. Schnitzler, Andreas D. Baxevanis, Oleg Simakov, Bert Hobmayer, Celina E. Juliano
The epithelial and interstitial stem cells of the freshwater polyp Hydra are the best-characterized stem cell systems in any cnidarian, providing valuable insight into cell type evolution and the origin of stemness in animals. However, little is known about the transcriptional regulatory mechanisms that determine how these stem cells are maintained and how they give rise to their diverse differentiated
-
Genome-wide evaluation of the effect of short tandem repeat variation on local DNA methylation Genome Res. (IF 9.438) Pub Date : 2023-02-01 Alejandro Martin-Trujillo, Paras Garg, Nihir Patel, Bharati Jadhav, Andrew J. Sharp
Short tandem repeats (STRs) contribute significantly to genetic diversity in humans, including disease-causing variation. Although the effect of STR variation on gene expression has been extensively assessed, their impact on epigenetics has been poorly studied and limited to specific genomic regions. Here, we investigated the hypothesis that some STRs act as independent regulators of local DNA methylation
-
Multi-omics analyses demonstrate a critical role for EHMT1 methyltransferase in transcriptional repression during oogenesis Genome Res. (IF 9.438) Pub Date : 2023-01-01 Hannah Demond, Courtney W. Hanna, Juan Castillo-Fernandez, Fátima Santos, Evangelia K. Papachristou, Anne Segonds-Pichon, Kamal Kishore, Simon Andrews, Clive S. D'Santos, Gavin Kelsey
EHMT1 (also known as GLP) is a multifunctional protein, best known for its role as an H3K9me1 and H3K9me2 methyltransferase through its reportedly obligatory dimerization with EHMT2 (also known as G9A). Here, we investigated the role of EHMT1 in the oocyte in comparison to EHMT2 using oocyte-specific conditional knockout mouse models (Ehmt2 cKO, Ehmt1 cKO, Ehmt1/2 cDKO), with ablation from the early
-
Robust analysis of prokaryotic pangenome gene gain and loss rates with Panstripe Genome Res. (IF 9.438) Pub Date : 2023-01-01 Gerry Tonkin-Hill, Rebecca A. Gladstone, Anna K. Pöntinen, Sergio Arredondo-Alonso, Stephen D. Bentley, Jukka Corander
Horizontal gene transfer (HGT) plays a critical role in the evolution and diversification of many microbial species. The resulting dynamics of gene gain and loss can have important implications for the development of antibiotic resistance and the design of vaccine and drug interventions. Methods for the analysis of gene presence/absence patterns typically do not account for errors introduced in the